CN1902620B - Virtual file system - Google Patents



Publication number: CN1902620B
Application number: CN 200480039804
Other languages: Chinese (zh)
Other versions: CN1902620A (en)



A virtual file system (209) includes multiple storage processor nodes (103), including a management node (205), a backbone switch (101), a disk drive array (111), and a virtual file manager (301) executing on the management node. The backbone switch enables communication between the storage processor nodes. The disk drive array is coupled to and distributed across the storage processor nodes and stores multiple titles. Each title is divided into data subchunks (113a)-(113e) that are distributed across the disk drive array, with each subchunk stored on one disk drive of the array. The virtual file manager manages storage and access of each subchunk, and manages multiple directory entries, including a directory entry for each title. Each directory entry is a list of subchunk location entries, and each subchunk location entry includes a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title.


Virtual File System

Technical Field

[0001] The present invention relates to interactive broadband server systems, and more particularly to a virtual file system that manages and maintains information about data distributed across an array of storage devices.

Background

[0002] It has long been desirable to provide a solution for the storage and delivery of streaming media content. Although different data rates are contemplated, the initial scalability goal is 100 to 1,000,000 simultaneous, independent isochronous content streams at 4 megabits per second (Mbps) per stream. The total available bandwidth is limited by the largest available backplane switch. The largest switches at present are in the terabit-per-second range, or about 200,000 simultaneous output streams. The number of output streams is generally inversely proportional to the bit rate per stream.
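The inverse relationship between stream count and per-stream bit rate can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the 1 Tbps capacity are assumptions, and the raw quotient ignores protocol and switching overhead, which may be why the text quotes a lower figure of about 200,000 streams.

```python
def max_streams(switch_bps: int, stream_bps: int) -> int:
    """Upper bound on simultaneous streams for a given switch bandwidth."""
    return switch_bps // stream_bps


TERABIT = 10**12   # assumed switch capacity: 1 Tbps
MEGABIT = 10**6

print(max_streams(TERABIT, 4 * MEGABIT))   # raw bound at 4 Mbps per stream
print(max_streams(TERABIT, 8 * MEGABIT))   # doubling the bit rate halves the bound
```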

[0003] The simplest model of content storage is a single disk drive connected to a single processor that has a single network connector. Data is read from the disk, placed in memory, and distributed to users in packets over a network. Traditional data, such as Web pages, can be delivered asynchronously. In other words, there are random amounts of data with random time delays. Low-volume, low-resolution video can be delivered from a Web server. Real-time media content, such as video and audio, requires isochronous transmission, or transmission with guaranteed delivery times. In this scenario, the bandwidth constraint is at the disk drive, which has arm motion and rotational latency to contend with. If the system can sustain only 6 simultaneous streams of continuous content from the drive to the processor at any given time, then the 7th user's request must wait until one of the prior 6 users gives up a content stream. The upside of this design is simplicity. The downside is the disk, which, as the sole mechanical device in the design, can only access and transfer data so fast.

[0004] An improvement can be made by adding another drive, or several drives, and interleaving the drive accesses. Duplicate content can also be stored on each drive to gain redundancy and performance. This is better, but several problems remain. Only so much content can be placed on the local drive or drives. The disk drives, CPU, and memory are each single points of failure whose consequences could be catastrophic. The system can only scale to the number of drives the disk controller can handle. Even with many units, there is a problem with the distribution of titles. In the real world, everyone wants to see the latest movies. As a rule of thumb, 80% of content requests are for just 20% of the titles. One title cannot be allowed to consume all of a machine's bandwidth, because that would block access to less popular titles stored only on that machine. As a result, the "high demand" titles would have to be loaded on most or all of the machines. In short, a user who wanted to see an old movie might be out of luck, even though the movie is loaded in the system. With a large library, the ratio may be much greater than the 80/20 rule used in this example.

[0005] If the system were based on the standard local area network (LAN) used in data processing, there would be other inefficiencies. Modern Ethernet-based TCP/IP systems are a marvel of guaranteed delivery, but they pay a time price for packet collisions, for retransmission of partially lost packets, and for the management needed to make it all work. There is no guarantee that a set of content streams will be delivered on time. In addition, each user consumes a switch port and each content server consumes a switch port. The switch port count therefore has to be twice the server count, which limits the total online bandwidth.


[0006] The present invention is intended to solve the above technical problems.

[0007] According to one aspect of the present invention, a virtual file system is provided, comprising: a plurality of storage processor nodes including at least one management node, each storage processor node including a port interface and a disk drive interface; a backbone switch including a plurality of ports, each port coupled to the port interface of a corresponding one of the storage processor nodes, the backbone switch enabling communication between each of the storage processor nodes; a disk drive array coupled to and distributed across the disk drive interfaces of the storage processor nodes, the disk drive array storing a plurality of titles, each title divided into a plurality of subchunks distributed across the disk drive array, with each subchunk stored on one disk drive of the disk drive array; the at least one management node executing a virtual file manager that manages storage and access of each subchunk of the titles and maintains a plurality of directory entries including a directory entry for each title, each directory entry comprising a list of subchunk location entries, each subchunk location entry comprising a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title stored on the disk drive array; and a user process, executed on a storage processor node, that submits a title request for a selected title to the virtual file manager, receives the corresponding directory entry for the selected title from the virtual file manager, submits a subchunk read request for each subchunk location entry in the corresponding directory entry, each subchunk read request being sent to the storage processor node identified by the storage processor node identifier in the corresponding subchunk location entry, receives the subchunks, and reconstructs the selected title using the received subchunks.

[0008] According to another aspect of the present invention, a virtual file system is provided, comprising: a plurality of storage processor nodes including at least one management node, each storage processor node including a port interface and a disk drive interface; a backbone switch including a plurality of ports, each port coupled to the port interface of a corresponding one of the storage processor nodes, the backbone switch enabling communication between each of the storage processor nodes; a disk drive array coupled to and distributed across the disk drive interfaces of the storage processor nodes, the disk drive array storing a plurality of titles, each title divided into a plurality of subchunks distributed across the disk drive array, with each subchunk stored on one disk drive of the disk drive array; the at least one management node executing a virtual file manager that manages storage and access of each subchunk of the titles and maintains a plurality of directory entries including a directory entry for each title, each directory entry comprising a list of subchunk location entries, each subchunk location entry comprising a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title stored on the disk drive array; wherein the virtual file manager manages title storage in which each title is divided into a plurality of chunks, each chunk comprising a plurality of subchunks that include redundant data for the chunk; wherein the disk drive array is divided into a plurality of redundant array groups, each redundant array group comprising a plurality of disk drives distributed among a plurality of storage processor nodes; and wherein the subchunks of each chunk are distributed across the disk drives of a corresponding redundant array group.


[0009] The benefits, features, and advantages of the present invention will be better understood from the following description taken in conjunction with the accompanying drawings, in which:

[0010] FIG. 1 is a simplified block diagram of a portion of an Interactive Content Engine (ICE) implemented according to an exemplary embodiment of the present invention;

[0011] FIG. 2 is a logical block diagram of a portion of the ICE of FIG. 1 illustrating a synchronized data transfer system;

[0012] FIG. 3 is a block diagram of a portion of the ICE of FIG. 1 illustrating further details of the VFS of FIG. 2 and supporting functionality according to an embodiment of the present invention;

[0013] FIG. 4 shows Table 1, illustrating an exemplary configuration of the ICE of FIG. 1 consisting of only three disk array groups;

[0014] FIG. 5 shows Table 2, illustrating how four titles are stored using the configuration of Table 1;

[0015] FIG. 6 shows Table 3, illustrating the contents of the first 12 locators for the four titles depicted in Table 2; and

[0016] FIG. 7 shows Table 4, illustrating further details of how subchunks are stored on different groups, SPNs, and disk drives of the ICE of FIG. 1.

Detailed Description

[0017] The following description is presented to enable one of ordinary skill in the art to make and use the present invention as provided within the context of a particular application and its requirements. Various modifications to the preferred embodiment will, however, be apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0018] The architecture described herein accommodates individual components of varying capability, so that an installation is not limited to what was available when the initial system was purchased. The use of commodity components ensures recent, well-proven technology, avoids sole sources, and yields the lowest cost per stream. Failures of individual components are tolerated. In many cases, there is no noticeable change in behavior from the user's perspective. In other cases, there is a brief "self-repair" cycle. In many cases, multiple failures can be tolerated. In most if not all cases, the system can recover without requiring immediate attention, making it suitable for "lights-out" operation.

[0019] Content storage allocation and internal bandwidth are managed automatically by Least Recently Used (LRU) algorithms, which ensure that the content in the RAM cache and the hard drive array cache matches current demand and that the backbone switch bandwidth is used in the most efficient manner. Bandwidth within the system is rarely, if ever, oversubscribed, so it is not necessary to discard or delay the transmission of packets. The architecture takes full advantage of the composite bandwidth of each component, so guarantees can be met, and the network is private and under full control, so that no data path is overloaded even under unanticipated peak demand. Streams of any bit rate can be accommodated, but typical streams are expected to remain in the 1 to 20 Mbps range. Asynchronous content is accommodated on an available-bandwidth basis. Bandwidth may be reserved if the application requires it. Files may be of any size with a minimum of storage inefficiency.
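The text names LRU as the replacement policy but does not give an implementation. As a minimal sketch of the general technique, assuming a simple in-memory cache keyed by hypothetical subchunk names, an LRU cache can be built on an ordered dictionary:

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` items, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("title-A/sc0", b"...")
cache.put("title-B/sc0", b"...")
cache.get("title-A/sc0")            # touch A, so B is now the oldest
cache.put("title-C/sc0", b"...")    # evicts B
```

Popular subchunks are touched often and therefore stay cached, while rarely requested ones age out.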

[0020] FIG. 1 is a simplified block diagram of a portion of an Interactive Content Engine (ICE) 100 implemented according to an exemplary embodiment of the present invention. Portions not needed for a full and complete understanding of the present invention are omitted for clarity. The ICE 100 includes an appropriate multiple-port (or multiport) Gigabit Ethernet (GbE) switch 101 as the backbone fabric, with multiple Ethernet ports coupled to a number of Storage Processor Nodes (SPNs) 103. Each SPN 103 is a simplified server including two Gigabit Ethernet ports, one or more processors 107, memory 109 (e.g., random access memory (RAM)), and an appropriate number (e.g., four to eight) of disk drives 111. A first Gb port 105 on each SPN 103 connects to a corresponding port of the switch 101 for full-duplex operation (simultaneous transmission and reception at each SPN/port connection) and is used for moving data within the ICE 100. The other Gb port (not shown) delivers the content output to downstream users (not shown).

[0021] Each SPN 103 has high-speed access to its local disk drives and to the disk drives of the other four SPNs in each group of five SPNs. The switch 101 is the backplane of the ICE 100 rather than just a communication device between SPNs 103. Only five SPNs 103 are shown for purposes of illustration; it is understood that the ICE 100 typically includes a larger number of servers. Each SPN 103 stores, processes, and transmits content. In the configuration shown, each SPN 103 is built from off-the-shelf components and is not a computer in the usual sense. Although standard operating systems are contemplated, such interrupt-driven operating systems may pose unnecessary bottlenecks.

[0022] Each title (e.g., video, movie, or other media content) is not wholly stored on any single disk drive 111. Instead, the data for each title is divided and stored among several disk drives within the ICE 100 to achieve the speed benefits of interleaved access. The content of a single title is spread across multiple disk drives of multiple SPNs 103. Short "time frames" of title content are gathered in round-robin fashion from each drive in each SPN 103. In this manner, the physical load is spread, the drive count limits of SCSI and IDE are avoided, a form of fail-safe operation is gained, and a large set of titles can be organized and managed.

[0023] In the particular configuration shown, each content title is divided into discrete chunks of a fixed size (typically about 2 megabytes (MB) per chunk). Each chunk is stored on a different set of SPNs 103 in round-robin fashion. Each chunk is divided into four subchunks, and a fifth subchunk representing the parity is created. Each subchunk is stored on a disk drive of a different SPN 103. In the configuration shown and described, the subchunk size of about 512 kilobytes (KB) (where "K" is 1024) matches the nominal unit of data of each disk drive 111. The SPNs 103 are grouped five at a time, and each group or SPN set stores one chunk of a title. As shown, the five SPNs 103 are labeled 1-4 and "Parity", and collectively store a chunk 113 as five separate subchunks 113a, 113b, 113c, 113d, and 113e on SPNs 1, 2, 3, 4, and Parity, respectively. The subchunks 113a-113e are shown stored in a distributed manner on a different drive of each different SPN (e.g., SPN1/DRIVE1, SPN2/DRIVE2, SPN3/DRIVE3, etc.), but may be stored in any other possible combination (e.g., SPN1/DRIVE1, SPN2/DRIVE1, SPN3/DRIVE3, etc.). Subchunks 1-4 contain the data, and the Parity subchunk contains the parity information for the data subchunks. The size of each SPN set, while typically five, is arbitrary and could be any other suitable number, such as 2 to 10 SPNs. Two SPNs would use 50% of their storage for redundancy; ten would use 10%. Five is a compromise between storage efficiency and probability of failure.
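The chunk layout described above can be sketched as follows. The text says only that a fifth "parity" subchunk is created; the byte-wise XOR used here is an assumption borrowed from conventional RAID practice, and the function names and sample data are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_chunk(chunk: bytes, data_parts: int = 4) -> list:
    """Split a chunk into equal data subchunks plus one XOR parity subchunk."""
    if len(chunk) % data_parts:
        raise ValueError("chunk length must be a multiple of data_parts")
    size = len(chunk) // data_parts
    subchunks = [chunk[i * size:(i + 1) * size] for i in range(data_parts)]
    subchunks.append(reduce(xor_bytes, subchunks))   # assumed XOR parity
    return subchunks


def redundancy_fraction(group_size: int) -> float:
    """Share of a group's storage spent on parity: one subchunk in `group_size`."""
    return 1 / group_size


chunk = bytes(range(256)) * 8192          # 2 MiB of sample data
parts = split_chunk(chunk)
print(len(parts), len(parts[0]))          # 5 subchunks of 524288 bytes each
print(redundancy_fraction(2), redundancy_fraction(5), redundancy_fraction(10))
```

The last line reproduces the trade-off in the text: 50% overhead for a group of two, 20% for five, and 10% for ten.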

[0024] Distributing content in this fashion achieves at least two goals. First, the number of users that can view a single title is not limited to the number that can be served by a single set of SPNs, but by the bandwidth of all the sets of SPNs taken together. Therefore, only one copy of each content title is required. The tradeoff is a limit on the number of new viewers of a given title that can be launched each second, which is far less of a constraint than the wasted space and management overhead of redundant storage. The second goal is an increase in the overall reliability of the ICE 100. The failure of a single drive is masked by real-time regeneration of its content using the parity drive, similar to a redundant array of independent disks (RAID). The failure of an SPN 103 is masked by the fact that it contains one drive from each of several RAID sets, each of which continues to operate. Users connected to a failed SPN are quickly taken over by shadow processes running on other SPNs. When a disk drive or an entire SPN fails, the operator is notified to repair or replace the failed equipment. When a missing subchunk is rebuilt by a user process, it is transmitted back to the SPN that should have provided it, where it is cached in RAM (as it would have been had it been read from the local disk). This avoids wasting the time of other user processes on the same rebuild for a popular title, because subsequent requests are filled from RAM as long as the subchunk remains popular enough to stay cached.
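Regenerating a lost subchunk from the survivors can be sketched as below. This again assumes XOR parity, which the text does not state explicitly, and the function names and four-byte sample subchunks are illustrative only.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_missing(subchunks: list) -> list:
    """Return the subchunk list with its single missing (None) entry regenerated."""
    missing = [i for i, s in enumerate(subchunks) if s is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can regenerate exactly one missing subchunk")
    survivors = [s for s in subchunks if s is not None]
    rebuilt = list(subchunks)
    rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = reduce(xor_bytes, data)
received = [data[0], None, data[2], data[3], parity]   # second subchunk was lost
print(rebuild_missing(received)[1])                    # b'BBBB'
```

In the scheme described above, the rebuilt subchunk would then be sent back to the SPN that should have supplied it so that it can be cached there.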

[0025] The goal of a user process (UP) running on each "user" SPN 103 is to gather the subchunks from its own disk plus the corresponding four subchunks from other user SPNs and assemble a chunk of video content for delivery. User SPNs are distinguished from one or more management (MGMT) SPNs, which are configured in the same manner but perform different functions, as further described below. A pair of redundant MGMT SPNs is contemplated to enhance reliability and performance. The gathering and assembling functions performed by each UP are carried out many times on behalf of many users on each user SPN 103. As a consequence, a significant amount of data traffic passes between the user SPNs 103. The typical Ethernet protocol, with packet collision detection and retries, would be overwhelmed. Typical protocols are designed for random transmissions and depend on slack time between those events, so that approach is not used. In the ICE 100, collisions are avoided by using a full-duplex, fully switched architecture and by managing bandwidth carefully. Most communication is done synchronously. The switch 101 itself is managed in a synchronous manner, as further described below, so that transmissions are coordinated. Because it is determined which SPN 103 transmits and when, ports are not overwhelmed with more data than they can handle during a given period. Indeed, data is first gathered in the memory 109 of the user SPNs 103, and its transfer is then managed synchronously. As part of this coordination, status signals pass between the user SPNs 103. Unlike the actual content going to the end user, the signaling data exchanged between user SPNs is quite small.

[0026] If subchunks were allowed to be transmitted randomly or asynchronously, the length of each subchunk (about 512K bytes, where "K" is 1024) would overwhelm any buffering available in the GbE switch 101. Transmitting this much information takes about 4 milliseconds (ms), and it is desirable to ensure that several ports do not try to transmit to a single port simultaneously. Therefore, as further described below, the switch 101 is managed so that it operates synchronously, with all ports fully utilized under full load.
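The 4 ms figure follows directly from the subchunk size and the port speed. A quick check, with the function and constant names chosen here for illustration:

```python
def transfer_time_ms(num_bytes: int, link_bps: float) -> float:
    """Time in milliseconds to move `num_bytes` over a link of `link_bps` bits/s."""
    return num_bytes * 8 / link_bps * 1000


SUBCHUNK_BYTES = 512 * 1024      # 512K bytes, K = 1024
GBE_BPS = 1_000_000_000          # 1 Gbps per port

print(round(transfer_time_ms(SUBCHUNK_BYTES, GBE_BPS), 2))   # 4.19
```

This ignores Ethernet framing overhead, so the real figure is slightly higher.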

[0027] The redundant directory process that manages the file system (or virtual file system, or VFS) is responsible for reporting where a given content title is stored when a user requests it. It is also responsible for allocating the required storage space when a new title is loaded. All allocations are in whole chunks, each of which is composed of five subchunks. Space on each disk drive is managed within the drive by Logical Block Address (LBA). A subchunk is stored on a disk drive in contiguous sectors or LBA addresses. The capacity of each disk drive in the ICE 100 is expressed as its maximum LBA address divided by the number of sectors per subchunk.
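The capacity rule in this paragraph is a single integer division. The sketch below assumes 512-byte sectors and a hypothetical drive size, neither of which is given in the text; the slot-to-LBA helper is likewise an illustration of how contiguous storage might be addressed.

```python
SECTOR_BYTES = 512                                      # assumption: 512-byte sectors
SUBCHUNK_BYTES = 512 * 1024
SECTORS_PER_SUBCHUNK = SUBCHUNK_BYTES // SECTOR_BYTES   # 1024 under that assumption


def capacity_in_subchunks(max_lba: int) -> int:
    """Drive capacity expressed as a count of whole subchunks."""
    return max_lba // SECTORS_PER_SUBCHUNK


def subchunk_start_lba(slot: int) -> int:
    """First LBA of the contiguous sector run holding subchunk slot `slot`."""
    return slot * SECTORS_PER_SUBCHUNK


max_lba = 488_281_250    # hypothetical drive of roughly 250 GB
print(capacity_in_subchunks(max_lba))   # 476837
print(subchunk_start_lba(3))            # 3072
```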

[0028] Each title map or "directory entry" contains a list indicating where the chunks of its title are stored and, more specifically, where each subchunk of each chunk is located. In the illustrated embodiment, each item in the list represents a subchunk and contains an SPNID identifying a specific user SPN 103, a disk drive number (DD#) identifying a specific disk drive 111 of the identified user SPN 103, and a subchunk pointer (or Logical Block Address, LBA), all packed into a 64-bit value. Each directory entry contains the subchunk list for about half an hour of content at the nominal 4 Mbps. This is equal to 450 chunks, or 2250 subchunks. Each directory entry is about 20 KB including ancillary data. When a UP executing on an SPN requests a directory entry, the entire entry is sent and stored locally for the corresponding user. Even if an SPN supports 1,000 users, the local lists or directory entries consume only 20 MB of memory.
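The sizing figures in this paragraph can be reproduced, and the 64-bit packing illustrated, with the sketch below. The text does not specify how the 64 bits are divided, so the 16/8/40-bit layout is purely hypothetical, as are the function names. The chunk size is taken as a decimal 2 MB, an assumption that makes the arithmetic match the stated 450 chunks.

```python
import struct

CHUNK_BYTES = 2 * 1000 * 1000     # "2 MB" taken as decimal here (assumption)
SUBCHUNKS_PER_CHUNK = 5


def entries_for(duration_s: int, bit_rate_bps: int) -> tuple:
    """Number of chunks and subchunks needed for a stream of the given length."""
    chunks = duration_s * bit_rate_bps // 8 // CHUNK_BYTES
    return chunks, chunks * SUBCHUNKS_PER_CHUNK


# Hypothetical 64-bit layout: 16-bit SPNID | 8-bit DD# | 40-bit LBA.
def pack_locator(spn_id: int, drive: int, lba: int) -> int:
    assert spn_id < 1 << 16 and drive < 1 << 8 and lba < 1 << 40
    return (spn_id << 48) | (drive << 40) | lba


def unpack_locator(value: int) -> tuple:
    return value >> 48, (value >> 40) & 0xFF, value & ((1 << 40) - 1)


chunks, subchunks = entries_for(30 * 60, 4_000_000)
print(chunks, subchunks, subchunks * 8)       # 450 2250 18000
loc = pack_locator(spn_id=12, drive=3, lba=1_048_576)
print(len(struct.pack(">Q", loc)), unpack_locator(loc))
```

At 8 bytes per locator, 2250 subchunks occupy 18,000 bytes, which is consistent with the stated figure of about 20 KB once ancillary data is added.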

[0029] The ICE 100 maintains a database of all titles available to a user. This list includes the local optical disk library, real-time network programming, and titles at remote locations for which license and transport arrangements have been made. The database contains all the metadata for each title, including management information (licensing period, bit rate, resolution, etc.) as well as information of interest to the user (producer, director, cast, crew, author, etc.). When the user makes a selection, the directory of a virtual file system (VFS) 209 (FIG. 2) is queried to determine whether the title is already loaded in the disk array. If not, a loading process (not shown) is initiated for that content, and the UP is notified, if necessary, of when it will be available for viewing. In most cases, the latency is no more than the mechanical latency of the optical disk retrieval robot (not shown), or about 30 seconds.

[0030] The information stored on the optical disk (not shown) includes all the metadata (which is read into the database when the disk is first loaded into the library), as well as the compressed digital video and audio representing the title and all the information that can be gleaned in advance about those data streams. For example, it contains pointers to all relevant information in the data streams, such as clock values and time stamps. The content is already divided into subchunks, with the parity subchunk precalculated and stored on the disk. In general, anything that can be done in advance to save loading time and processing overhead is included on the optical disk.

[0031] Included in the resource management system is a scheduler (not shown), which a UP consults to receive a start time for its stream (usually within milliseconds of the request). The scheduler ensures that the load on the system remains even, that latency is minimized, and that the bandwidth required within the ICE 100 never exceeds what is available. Whenever a user requests a stop, pause, fast forward, rewind, or other operation that interrupts the flow of the stream, its bandwidth is released, and a new allocation is made for any new service requested (e.g., a fast-forward stream).
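The scheduler itself is not shown in the source, so the following is only a sketch of the bookkeeping the paragraph implies: a bandwidth budget that is never exceeded, with allocations released and re-made when a user changes mode. The class name, method names, stream labels, and the 10 Mbps budget are all assumptions.

```python
class BandwidthScheduler:
    """Tracks per-stream allocations against a fixed bandwidth budget."""

    def __init__(self, capacity_bps: int):
        self.capacity_bps = capacity_bps
        self._allocations = {}

    @property
    def used_bps(self) -> int:
        return sum(self._allocations.values())

    def allocate(self, stream_id: str, bps: int) -> bool:
        """Grant the request only if the budget would not be exceeded."""
        if self.used_bps + bps > self.capacity_bps:
            return False
        self._allocations[stream_id] = bps
        return True

    def release(self, stream_id: str) -> None:
        self._allocations.pop(stream_id, None)


sched = BandwidthScheduler(capacity_bps=10_000_000)
sched.allocate("user1-play", 4_000_000)
sched.allocate("user2-play", 4_000_000)
refused = not sched.allocate("user3-play", 4_000_000)   # would exceed the budget
sched.release("user1-play")                              # user 1 leaves normal play
sched.allocate("user1-ff", 6_000_000)                    # new allocation for fast forward
```

A real scheduler would also hand out start times to keep the load even; that part is omitted here.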

[0032] FIG. 2 is a logical block diagram of a portion of the ICE 100 illustrating a synchronized data transfer system 200 implemented according to an embodiment of the present invention. The switch 101 is shown coupled to several exemplary SPNs 103, including a first user SPN 201, a second user SPN 203, and a management (MGMT) SPN 205. As previously noted, many SPNs 103 are coupled to the switch 101; only two user SPNs 201, 203 are shown to illustrate the present invention, and each is physically implemented just as any SPN 103 previously described. The MGMT SPN 205 is physically implemented just like any other SPN 103, but generally performs management functions rather than the specific user functions. The SPN 201 illustrates certain functions and the SPN 203 illustrates other functions of each user SPN 103.

[0033] It is understood, however, that each user SPN 103 is configured to perform similar functions, so that the functions (and processes) described for the SPN 201 are also provided on the SPN 203, and vice versa.

[0034] As previously described, the switch 101 operates at 1 Gbps per port, so that each subchunk (about 512 KB) takes about 4 ms to pass from one SPN to another. Each user SPN 103 executes one or more user processes (UPs), each of which supports one downstream user. When a new chunk of a title is needed to refill a user output buffer (not shown), the next five subchunks from the list are requested from the other user SPNs storing those subchunks. Since many UPs may request multiple subchunks at substantially the same time, unmanaged subchunk transmissions would overwhelm the buffering capacity of almost any GbE switch for a single port, let alone for the whole switch. This is true of the illustrated switch 101. If subchunk transmission is not managed, all five subchunks for each UP could be returned simultaneously, overwhelming the output port bandwidth. It is therefore desirable to tighten the timing of the transmissions of the SPNs of the ICE 100, so that the most critical data is transmitted first, and intact.
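
The 4 ms figure can be checked with simple arithmetic. The sketch below assumes "about 512 KB" means 512 KiB and ignores packet overhead; the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the subchunk transfer time quoted above.
SUBCHUNK_BYTES = 512 * 1024          # "about 512 KB", taken here as 512 KiB
PORT_BITS_PER_SEC = 1_000_000_000    # 1 Gbps per switch port


def transfer_time_ms(num_bytes: int, bits_per_sec: int) -> float:
    """Ideal wire time in milliseconds, ignoring packet overhead."""
    return num_bytes * 8 / bits_per_sec * 1000


one = transfer_time_ms(SUBCHUNK_BYTES, PORT_BITS_PER_SEC)
# Five subchunks converging on one output port have to be serialized.
five = 5 * one
print(f"one subchunk: {one:.2f} ms, five back-to-back: {five:.2f} ms")
```

Five subchunks arriving at the same output port at once would need roughly five times the single-subchunk wire time, which is why unmanaged returns would have to be buffered by the switch.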

[0035] The SPN 201 is shown executing a UP 207 for servicing a corresponding downstream user. The user requests a title (e.g., a movie), and the request is forwarded to the UP 207. The UP 207 transmits a title request (TR) to the VFS 209 (described further below) located on the MGMT SPN 205. The VFS 209 returns a directory entry (DE) to the UP 207, which locally stores the DE, shown at 211. The DE 211 includes a list locating each subchunk of the title (SC1, SC2, etc.), each entry including an SPNID identifying a specific user SPN 103, a disk drive number (DD#) identifying a specific disk drive 111 of the identified SPN 103, and an address or LBA providing the specific location of the subchunk on the identified disk drive. The SPN 201 initiates a time-stamped read request (TSRR) for each subchunk in the DE 211, one at a time. In the ICE 100, the requests are made immediately and directly. In other words, the SPN 201 makes the request for a subchunk immediately and directly to the specific user SPN 103 storing the data. In the configuration shown, the requests are made in the same manner even if the data is stored locally. In other words, even if the requested subchunk resides on a local disk drive of the SPN 201, the SPN 201 sends the request out via the switch 101 as though the subchunk were remotely located. The network is the place that may be configured to recognize that a request is being sent from an SPN to the same SPN. It is simpler to handle all cases the same way, especially in larger installations in which it is less likely that a request will actually be local.
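
As a rough illustration of the directory entry and the requests derived from it, the following sketch models a DE as a list of subchunk locations and issues one TSRR per entry. The class names, the `up_id` and `index` fields, and the `build_requests` helper are hypothetical; the text specifies only the SPNID, DD#, LBA, and timestamp.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SubchunkLocation:
    spn_id: int   # SPNID: which user SPN holds the subchunk
    dd: int       # DD#: disk drive number on that SPN
    lba: int      # logical block address on that drive


@dataclass(frozen=True)
class TSRR:
    ts: float     # time of the original request (earlier = higher priority)
    up_id: int    # hypothetical UP identifier so the reply can be routed back
    index: int    # position of the subchunk within the directory entry
    target: SubchunkLocation


def build_requests(de: List[SubchunkLocation], up_id: int,
                   now: Callable[[], float] = time.time) -> List[TSRR]:
    """Issue one TSRR per subchunk, in order, whether local or remote."""
    return [TSRR(ts=now(), up_id=up_id, index=i, target=loc)
            for i, loc in enumerate(de)]


# A five-subchunk chunk spread over SPNs 1-5, drive 1, block address 0.
de_211 = [SubchunkLocation(spn_id=s, dd=1, lba=0) for s in range(1, 6)]
requests = build_requests(de_211, up_id=207)
```

Note that the sketch makes no distinction between local and remote targets, mirroring the choice in the text to treat all cases the same way.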

[0036] Although the requests are sent out immediately and directly, the subchunks are each returned in a fully managed manner. Each TSRR is addressed to the specific user SPN using the SPNID, and includes the DD# and LBA that the target user SPN needs to retrieve and return the data. The TSRR may further include any other identification information sufficient to ensure that the requested subchunk is properly returned to the appropriate requestor and to enable the requestor to identify the subchunk (e.g., a UP identifier to distinguish among multiple UPs executing on the destination SPN, a subchunk identifier to distinguish among the subchunks of each data chunk, etc.). Each TSRR also includes a timestamp (TS) identifying the specific time when the original request was made. The TS identifies the priority of the request for purposes of synchronous transmission, where priority is based on time such that earlier requests assume higher priority. When received, the returned subchunks of the requested title are stored in a local title memory 213 for further processing and delivery to the user who requested the title.

[0037] The user SPN 203 illustrates the operation of a transfer process (TP) 215 and supporting functions executing on each user SPN (e.g., 201, 203) for receiving TSRRs and for returning the requested subchunks. The TP 215 includes or is otherwise interfaced with a storage process (not shown), which interfaces with the local disk drives 111 on the SPN 203 for requesting and accessing the stored subchunks. The storage process may be implemented in any desired manner, such as a state machine or the like, and may be a separate process interfaced between the TP 215 and the local disk drives 111, as known to those skilled in the art. As shown, the TP 215 receives one or more TSRRs from one or more UPs executing on the other user SPNs 103 and stores each request in a read request queue (RRQ) 217 in its local memory 109. The RRQ 217 stores a list of requests for subchunks SCA, SCB, etc. The disk drive storing the requested subchunks removes the corresponding requests from the RRQ 217, sorts them in physical order, and then executes each read in the sorted order. Accesses to subchunks on each disk are managed in groups. Each group is sorted in physical order according to an "elevator seek" operation (one sweep from low to high, the next sweep from high to low, and so on, so that the disk head sweeps back and forth across the disk surface, pausing to read the next sequential subchunk). Requests for successful reads are stored in a successful read queue (SRQ) 218 sorted in TS order. Requests for failed reads (if any) are stored in a failed read queue (FRQ) 220, and failure information is forwarded to a network management system (not shown) that determines the error and the appropriate corrective action. Note that in the configuration shown, the queues 217, 218, and 220 store request information rather than the actual subchunks.
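
The elevator seek ordering can be sketched as follows, assuming each group of requests is simply a list of block addresses and that the sweep direction flips after every group. The function name and the sample addresses are illustrative.

```python
from typing import Iterable, List


def elevator_order(batches: Iterable[List[int]]) -> List[List[int]]:
    """Sort each batch of addresses by position, reversing direction each batch."""
    ordered = []
    ascending = True
    for batch in batches:
        ordered.append(sorted(batch, reverse=not ascending))
        ascending = not ascending
    return ordered


# Three groups of pending reads pulled from the RRQ, in arrival order.
rrq_batches = [[900, 100, 500], [300, 800, 200], [50, 700]]
sweeps = elevator_order(rrq_batches)
# First sweep runs low to high, second high to low, third low to high again.
```

Alternating the direction means the head ends one sweep near where the next one begins, avoiding a long seek back to the start of the disk between groups.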

[0038] Each subchunk that is successfully read is placed in memory reserved for an LRU cache of recently requested subchunks. For each retrieved subchunk, the TP 215 creates a corresponding message (MSG), which includes the TS for the subchunk, the source (SRC) of the subchunk (e.g., the SPNID from which the subchunk is being transmitted and its physical memory location, along with any other identifying information), and the destination (DST) SPN to which the subchunk is to be transmitted (e.g., SPN 201). As shown, the SRQ 218 includes messages MSGA, MSGB, etc., for subchunks SCA, SCB, etc., respectively. After the requested subchunks are read and cached, the TP 215 sends the corresponding MSGs to a synchronized switch manager (SSM) 219 executing on the MGMT SPN 205.

[0039] The SSM 219 receives and prioritizes multiple MSGs from the TPs of the user SPNs, and eventually sends a transmit request (TXR) to the TP 215 identifying one of the MSGs in its SRQ 218, such as by using a message identifier (MSGID) or the like. When the SSM 219 sends a TXR to the TP 215 with a MSGID identifying a subchunk in the SRQ 218, the request listing is moved from the SRQ 218 to a network transfer process (NTP) 221 (where "moved" denotes removing the request from the SRQ 218), which builds the packets used to transfer the subchunk to the destination user SPN. The order in which subchunk request listings are removed from the SRQ 218 is not necessarily sequential, even though the list is in timestamp order, because only the SSM 219 determines the proper ordering. The SSM 219 sends one TXR to every other SPN 103 having at least one subchunk to send, unless the subchunk is to be sent to a UP on an SPN 103 already scheduled to receive a subchunk of equal or higher priority, as described further below. The SSM 219 then broadcasts a single transmit command (TX CMD) to all user SPNs 103. In response to the TX CMD broadcast by the SSM 219, the TP 215 instructs the NTP 221 to transmit the subchunk to the requesting UP of the user SPN 103. In this manner, each SPN 103 that has received a TXR from the SSM 219 transmits simultaneously to another requesting user SPN 103.

[0040] The VFS 209 on the MGMT SPN 205 manages the list of titles and their locations in the ICE 100.

[0041] In a typical computer system, directories (data information) usually reside on the same disk on which the data resides. In the ICE 100, however, the VFS 209 is centrally located to manage the distributed data, since the data for each title is distributed across multiple disks in the disk array, which are in turn distributed across multiple user SPNs 103. As previously described, the disk drives 111 on the user SPNs 103 primarily store the subchunks of the titles. The VFS 209 includes identifiers for the location of each subchunk by SPNID, DD#, and LBA, as described above. The VFS 209 also includes identifiers for other parts of the ICE 100 that are external, such as the optical storage. When a user requests a title, the full set of directory information (IDs/addresses) is made available to the UP executing on the user SPN 103 that accepted the user's request. From there, the task is to transfer subchunks off the disk drives into memory (buffers) and move them via the switch 101 to the requesting user SPN 103, which assembles a full chunk in a buffer, delivers it to the user, and repeats until done.

[0042] The SSM 219 creates a list of "ready" messages in timestamp order in a ready message (RDY MSG) list 223. The order in which the messages are received from the TPs on the user SPNs 103 is not necessarily timestamp order, but the messages are organized in TS order in the RDY MSG list 223. Just before the next set of transfers, the SSM 219 scans the RDY MSG list 223 starting with the earliest timestamp. The SSM 219 first identifies the earliest TS in the RDY MSG list 223, and generates and sends the corresponding TXR message to the TP 215 of the user SPN 103 storing the corresponding subchunk, to initiate a pending transfer of that subchunk. The SSM 219 continues scanning the list 223 for each subsequent subchunk in TS order, generating a TXR message for each subchunk whose source and destination are not already involved in a pending subchunk transfer. For each TX CMD broadcast to all of the user SPNs 103, each user SPN 103 transmits only one subchunk at a time and receives only one subchunk at a time, although it can do both simultaneously. For example, if a TXR message is sent to the TP of SPN #10 to schedule a pending subchunk transfer to SPN #2, then SPN #10 cannot simultaneously send another subchunk. SPN #10 can, however, simultaneously receive a subchunk from another SPN. Furthermore, SPN #2 cannot receive another subchunk while receiving the subchunk from SPN #10, although SPN #2 can simultaneously transmit to another SPN, because each port of the switch 101 is full duplex.
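
The scan described above amounts to a greedy pass over the ready list in timestamp order. The following is a minimal sketch of one scheduling round; the `ReadyMsg` type and `schedule_round` function are illustrative names, and real messages carry more fields than shown here.

```python
from typing import List, NamedTuple


class ReadyMsg(NamedTuple):
    ts: float   # timestamp of the original request
    src: int    # SPN holding the subchunk
    dst: int    # SPN that requested it


def schedule_round(ready: List[ReadyMsg]) -> List[ReadyMsg]:
    """Pick the transfers for one TX CMD round.

    Scans in timestamp order and accepts a message only if its source is not
    already sending and its destination is not already receiving this round.
    """
    sending, receiving, chosen = set(), set(), []
    for msg in sorted(ready, key=lambda m: m.ts):
        if msg.src in sending or msg.dst in receiving:
            continue
        sending.add(msg.src)
        receiving.add(msg.dst)
        chosen.append(msg)
    return chosen


rdy_list = [
    ReadyMsg(ts=3.0, src=10, dst=5),   # skipped: SPN 10 is already sending
    ReadyMsg(ts=1.0, src=10, dst=2),   # earliest, so scheduled first
    ReadyMsg(ts=2.0, src=7, dst=2),    # skipped: SPN 2 is already receiving
    ReadyMsg(ts=4.0, src=2, dst=10),   # allowed: ports are full duplex
]
round_1 = schedule_round(rdy_list)
```

In this example SPN #10 sends to SPN #2 while SPN #2 sends to SPN #10 in the same round, which is permitted because sending and receiving use separate halves of a full-duplex port.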

[0043] The SSM 219 continues scanning the RDY MSG list 223 until all user SPNs 103 have been accounted for, or until the end of the RDY MSG list 223 is reached. Each entry in the RDY MSG list 223 corresponding to a TXR message is eventually removed from the RDY MSG list 223 (either when the TXR message is sent or after the transfer is completed). When the last transfer of the previous period has finished, the SSM 219 broadcasts a TX CMD packet, which signals all user SPNs 103 to begin the next round of transmissions. For the particular configuration illustrated, the transfers of each round occur synchronously within a period of approximately 4 to 5 milliseconds (ms). During each transfer round, additional MSGs are sent to the SSM 219 and new TXR messages are sent out to the user SPNs 103 to schedule the next round of transmissions, and the process is repeated. The period between successive TX CMDs is approximately equal to the period necessary to transmit all of the bytes of a subchunk, including packet overhead and interpacket delay, plus a period to clear all caching that may have occurred in the switch during the transmission of the subchunk, typically 60 microseconds (μs), plus a period to account for any jitter caused by a delay in recognition of the TX CMD by an individual SPN, typically less than 100 μs.

[0044] In one embodiment, a duplicate or mirrored MGMT SPN (not shown) mirrors the primary MGMT SPN 205, so that the SSM 219, the VFS 209, and the scheduler are each duplicated on a pair of redundant dedicated MGMT SPNs. In one embodiment, the synchronizing TX CMD broadcast acts as a heartbeat indicating the health of the MGMT SPN 205. The heartbeat signals to the secondary MGMT SPN that all is well. In the absence of the heartbeat, the secondary MGMT SPN takes over all management functions within a predetermined period of time, for example within 5 ms.

[0045] FIG. 3 is a block diagram of a portion of the ICE 100 illustrating further details of the VFS 209 and supporting functionality according to an embodiment of the present invention. As shown, the VFS 209 includes a virtual file manager (VFM) 301 and a VFS interface manager (VFSIM) 302. The VFSIM 302 is the communications conduit between the VFM 301 and the rest of the ICE 100, including a system monitor (SM) 303, a library loader (LL) 305, and a user master monitor (UMM) 307. The VFSIM 302 receives requests and directives from the SM 303 and provides services to the LL 305 and the UMM 307. Requests and directives intended for the VFM 301 are queued and held until retrieved. Responses from the VFM 301 are buffered and returned to the requestor. The VFSIM 302 manages background tasks initiated by itself and by the VFM 301. These tasks include automatic content re-striping, storage device validation/repair, and capacity upsizing and downsizing. The VFSIM 302 monitors hardware addition/removal notifications and records device serial numbers so that it can automatically initiate validation/repair when necessary. Discussion herein of the VFS 209 may involve either or both of the VFM 301 and the VFSIM 302, unless otherwise specified.

[0046] The VFS 209 manages title content storage (distributed across the storage devices or disk drives) in a manner that maximizes overall system performance and facilitates recovery from hardware failures. The VFS 209 is designed to be as flexible as possible in order to support a wide range of hardware configurations, enabling each site deployment of the ICE 100 to fine-tune hardware expenditures to meet particular usage profiles. A site can increase its capacity by adding new SPNs 103 while the overall system remains operational. Likewise, the VFS 209 can swap SPNs, as well as individual storage devices such as serial ATA (SATA) drives, in and out of service while remaining operational. The number of SPNs 103 in an ICE 100 is limited only by the bandwidth of the largest contemporary backplane switch implementing the switch 101 (e.g., currently about 500 SPNs). Each SPN 103 can have any number of storage devices (the number of storage devices per SPN is usually constant for a given site), and each storage device can have a different storage capacity (greater than or equal to the minimum designated for that site). Currently, it is typical for a site to have from 1 to 8 hard disk drives per SPN 103, although the design is flexible enough to accommodate new device types as they become available. Furthermore, if an individual physical SPN 103 has twice or three times the minimum capacity for the site, it can be added to the VFS 209 as two or three logical SPNs (this holds true for any even multiple of the minimum capacity). The VFS 209 is designed to allow each site to upgrade its hardware gradually over time, as needs dictate, using the best available hardware at the time of each addition.

[0047] The VFS 209 organizes content intelligently. It has provisions to handle peak loads smoothly, it can defer tasks that are not time-critical, it automatically redistributes content (the re-striping process) to take full advantage of increased site capacity, it prioritizes failure recovery so as to anticipate demand and rebuild content before it is needed, and it has robust abilities to salvage content from previously used storage devices. In the embodiment shown, the VFM 301 communicates exclusively with the VFSIM 302, which in turn is managed by the SM 303 and provides services to the LL 305 and the UMM 307. At power-up, the VFS 209 knows nothing of the system hardware configuration. As each user SPN 103 boots and announces itself, the SM 303 assembles the relevant details for that SPN (its group affiliation, the number of disks, the storage capacity of each disk, etc.) and registers it with the VFSIM 302, which notifies the VFM 301. While every SPN is capable of storing content, not all are required to do so. The VFS 209 allows any number of "hot spares" to be held in reserve with empty disks, ready to assume a role in failure recovery, scheduled maintenance, or other purposes.

[0048] At site inception, a decision is made concerning the number of SPNs in a RAID group. Content is spread evenly over each group of SPNs, so SPNs must be added to a site in RAID group increments. The only exceptions are SPNs designated as spares, which may be added individually in any number, and redundant management SPNs. Most SPNs 103 are added during initial system initialization, but new groups of SPNs may be added at any point during the lifetime of the system. When a site increases its capacity by adding new groups of SPNs, existing content is automatically re-striped in the background (the re-striping process is explained in more detail below) to take full advantage of the added hardware. Downsizing the ICE 100 is accomplished by first re-striping (as a background process) and then removing the de-allocated devices.

[0049] In the VFS 209, each SPN 103 is assigned a logical ID that can be completely arbitrary, but for convenience it usually corresponds to the SPN's physical address. Once added, a given SPN exists in the VFS 209 as a logical entity until it is deleted. Any non-spare SPN can be substituted with another SPN, and when that happens, the same logical address is assigned. Thus, the physical SPNs can be swapped at will (explained in more detail below), making it possible to perform periodic maintenance without interrupting service. As soon as a complete group of SPNs has been registered with the VFS 209, content can begin to be stored on that group. However, to permit uniform distribution of content over the entire system, all SPN groups intended for content storage should be registered prior to loading the first title.

[0050] As previously described, each chunk of title content is stored on a different group, and content is spread across all groups in round-robin fashion. More specifically, each chunk is divided into subchunks (the number of subchunks is equal to the group size for that site, with one of the subchunks being a parity subchunk derived from the data subchunks), and each subchunk is stored on a different SPN of a given group. For example, assume a RAID size of five disk drives, so that the SPN group size is five (and there are five subchunks per chunk of content). If each SPN contains four drives, there are a total of four RAID groups. The first group consists of drive 1 of each SPN, the second group consists of drive 2 of each SPN, and so on.
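
To make the five-subchunk layout concrete, the sketch below splits a chunk into four data subchunks plus one parity subchunk and rebuilds a missing subchunk from the remaining four. The text says only that parity is derived from the data subchunks; byte-wise XOR, as used in conventional RAID parity, is an assumption here, and all function names are illustrative.

```python
from functools import reduce
from typing import List


def xor_bytes(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_chunk(chunk: bytes, group_size: int = 5) -> List[bytes]:
    """Split a chunk into group_size - 1 data subchunks plus one parity subchunk."""
    n_data = group_size - 1
    if len(chunk) % n_data:
        raise ValueError("chunk length must be a multiple of the data subchunk count")
    size = len(chunk) // n_data
    data = [chunk[i * size:(i + 1) * size] for i in range(n_data)]
    return data + [xor_bytes(data)]


def rebuild(subchunks: List[bytes], missing: int) -> bytes:
    """Recover the subchunk at index `missing` from the other subchunks."""
    return xor_bytes([s for i, s in enumerate(subchunks) if i != missing])


chunk = bytes(range(16))
stored = split_chunk(chunk)              # one subchunk per SPN in the group
recovered = rebuild(stored, missing=2)   # pretend the third SPN is down
```

Because XOR is its own inverse, the same `rebuild` call recovers either a lost data subchunk or the lost parity subchunk.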

[0051] Consider an exemplary configuration of the ICE 100 consisting of only three groups GP1-GP3, as illustrated by Table 1 of FIG. 4 for a first title, Title 1, in which each group is designated GP, each chunk is designated C, and each subchunk of each chunk is designated SC. Table 1 of FIG. 4 shows 3 groups numbered GP1 through GP3, 12 chunks numbered C1-C12, and 5 subchunks of each chunk numbered SC1, SC2, SC3, SC4, and SCP, in which the last "P" subchunk denotes the parity subchunk. The first chunk C1 of Title 1 is recorded as five subchunks SC1-4 and SCP (the fifth subchunk being parity), one each on drive 1 of SPNs 1 through 5 of the first group GP1. The next chunk C2 of Title 1 is recorded as five subchunks (again SC1-4 and SCP), one each on drive 1 of SPNs 1 through 5 of the second group GP2. Likewise, the third chunk C3 is recorded on drive 1 of SPNs 1-5 of the third group GP3. The fourth chunk C4 is recorded on drive 2 of SPNs 1-5 of the first group GP1. Table 1 thus shows how the first title, Title 1, is stored. Losing an entire SPN (one row in Table 1) results in the loss of one drive in each of the four RAID groups. All RAID groups continue to produce content, and through parity reconstruction no content is lost. Additional titles begin on the group and drive following those on which the preceding title began. Therefore, the second title, Title 2 (not shown), begins on drive 2 of GP2 (its second chunk is on drive 2 of GP3, its third chunk is on drive 3 of group 1, and so on). Titles are distributed in this way to minimize start time latency. Each title wraps around the ICE 100 in a spiral, recycling from drive 4 on each SPN of group 3 back to drive 1 of each SPN of group 1.
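
The spiral placement can be expressed as a small index calculation. The sketch below assumes the three-group, four-drive example configuration and uses illustrative names; it returns the 1-based (group, drive) pair for each successive chunk of a title.

```python
from typing import List, Tuple

NUM_GROUPS = 3   # GP1-GP3 in the example
NUM_DRIVES = 4   # drives per SPN in the example


def chunk_placement(num_chunks: int, start_group: int = 1,
                    start_drive: int = 1) -> List[Tuple[int, int]]:
    """Return the 1-based (group, drive) for each chunk of a title.

    Groups advance fastest; after the last group the drive number advances,
    and after the last drive of the last group the pattern wraps around.
    """
    start = (start_drive - 1) * NUM_GROUPS + (start_group - 1)
    slots = NUM_GROUPS * NUM_DRIVES
    placement = []
    for i in range(num_chunks):
        pos = (start + i) % slots
        placement.append((pos % NUM_GROUPS + 1, pos // NUM_GROUPS + 1))
    return placement


title_1 = chunk_placement(13)                               # starts at GP1, drive 1
title_2 = chunk_placement(3, start_group=2, start_drive=2)  # starts at GP2, drive 2
```

For Title 1 the first four entries are GP1/drive 1, GP2/drive 1, GP3/drive 1, and GP1/drive 2, and the thirteenth chunk wraps back to GP1/drive 1, matching the description above.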

[0052] Table 2 of FIG. 5 shows how four titles are stored using the configuration of Table 1. For purposes of illustration, the first title T1 consists of 24 chunks T1 C1-T1 C24, the second title T2 has 10 chunks T2 C1-T2 C10, the third title T3 has 9 chunks T3 C1-T3 C9, and the fourth title T4 has 12 chunks T4 C1-T4 C12. For simplification, each of the 3 SPN groups (SPN group 1, SPN group 2, SPN group 3) has been collapsed into a single row, and the first chunk of each title is underlined and set in bold. A typical title at 4 Mbps consists of 1350 chunks in three VFS directory entries of 450 chunks each, each entry representing about one half-hour of content. Using 100 gigabyte (GB) disk drives, each RAID group holds more than 200,000 chunks (meaning that each drive in the group holds more than 200,000 subchunks). Subchunk allocation on each drive of a RAID group is typically at the identical point (logical block address) on each drive.
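
The half-hour figure per directory entry can be sanity-checked with a short calculation. This sketch assumes that each chunk carries four data subchunks of 512 KiB (the parity subchunk is not streamed to the user) and that 4 Mbps means 4,000,000 bits per second; these are assumptions drawn from the example configuration, not statements from the text.

```python
DATA_SUBCHUNKS_PER_CHUNK = 4      # five subchunks per chunk, one of them parity
SUBCHUNK_BYTES = 512 * 1024       # "about 512 KB", taken here as 512 KiB
TITLE_BITS_PER_SEC = 4_000_000    # 4 Mbps stream
CHUNKS_PER_ENTRY = 450
ENTRIES = 3

# Playback time covered by one chunk, one directory entry, and the whole title.
chunk_seconds = DATA_SUBCHUNKS_PER_CHUNK * SUBCHUNK_BYTES * 8 / TITLE_BITS_PER_SEC
entry_minutes = CHUNKS_PER_ENTRY * chunk_seconds / 60
title_minutes = ENTRIES * entry_minutes
print(f"{chunk_seconds:.2f} s per chunk, {entry_minutes:.1f} min per entry, "
      f"{title_minutes:.1f} min per title")
```

Under these assumptions each chunk holds a little over four seconds of content, so 450 chunks come to roughly half an hour and 1350 chunks to roughly an hour and a half.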

[0053] In the configuration illustrated, each directory entry (DE) of the VFS 209 consists of various metadata about the title and an array of chunk locators. The chunk locator data structure consists of 8 bytes: two bytes identify the group, two bytes identify the disk, and four bytes identify the disk allocation block, where each block holds one subchunk. FIG. 6 shows Table 3, which illustrates the contents of the first 12 locators for the 4 titles T1-T4 (shown as Title 1-Title 4) depicted in Table 2. The upper 12 locators of Title 1, which are not shown, use up block 2 on each disk. A lookup table that maps the logical address of each disk to the MAC (media access control) ID of the SPN to which it is connected is replicated on the VFSIM 302 and on each SPN 103. The LBA corresponding to a subchunk is obtained by simply multiplying the block number by the number of sectors per subchunk. FIG. 7 shows Table 4, which illustrates further details of how subchunks are stored on the different RAID groups, SPNs (numbered 1-5), and disk drives (numbered 1-4) of the ICE 100. For example, subchunk Sa of chunk C01 of title T1 is stored in block 0 of disk 1 of SPN 1 of RAID group 1, the next subchunk Sb of chunk C01 of title T1 is stored in block 0 of disk 1 of SPN 2 of RAID group 1, and so on.
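
The 8-byte locator and the block-to-LBA conversion can be sketched with Python's `struct` module. The field widths come from the text; the big-endian unsigned encoding, the 512-byte sector size, and the function names are assumptions made for illustration.

```python
import struct

SECTOR_BYTES = 512                    # assumed sector size, not stated in the text
SUBCHUNK_BYTES = 512 * 1024           # "about 512 KB" per subchunk
SECTORS_PER_SUBCHUNK = SUBCHUNK_BYTES // SECTOR_BYTES

# Two bytes of group, two bytes of disk, four bytes of allocation block.
# Big-endian unsigned fields are an assumption made for this sketch.
LOCATOR = struct.Struct(">HHI")


def pack_locator(group: int, disk: int, block: int) -> bytes:
    """Encode one chunk locator into its 8-byte form."""
    return LOCATOR.pack(group, disk, block)


def unpack_locator(raw: bytes) -> tuple:
    """Decode an 8-byte locator into (group, disk, block)."""
    return LOCATOR.unpack(raw)


def block_to_lba(block: int) -> int:
    """LBA of a subchunk = block number x sectors per subchunk."""
    return block * SECTORS_PER_SUBCHUNK


raw = pack_locator(group=1, disk=1, block=2)
group, disk, block = unpack_locator(raw)
lba = block_to_lba(block)
```

With these assumed sizes a subchunk spans 1024 sectors, so block 2 begins at LBA 2048.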

[0054] Variability in the length of content results in a small, unpredictable variability in the amount of content stored on each SPN 103. For these exemplary titles the variability is exaggerated, but for hundreds of titles consisting of a thousand or more chunks each, the differences between SPNs are expected to be less than 1%. Although an individual storage device can have any amount of capacity greater than the site minimum, the amount in excess of the site minimum might not be used to store isochronous content. Therefore, the site minimum should be kept as large as possible; typically, it should be set equal to the capacity of the smallest-capacity storage device at the site. The site minimum can be raised or lowered at any time; for example, it should be raised to a greater value whenever larger devices replace the lowest-capacity ones.

[0055] Depending upon where a given configuration of the ICE 100 is installed and how it is used, the VFS 209 may receive requests for storage allocation for new titles only infrequently, or it may receive hundreds of nearly simultaneous requests at the top of each half hour. To meet expected demands for storage rapidly and efficiently, the VFS 209 maintains a pool of pre-allocated directory entries. The pool size is set in advance based on the usage profile of the site, and can be changed at any time for performance tuning or in response to changes in the site profile. When the VFS 209 receives a storage allocation request, it first attempts to fulfill the request from the pool of pre-allocated directory entries. If one is available, a pre-allocated directory entry is immediately returned to the requestor. If the pool is exhausted, a fresh directory entry is created on demand, as described below. If an allocation request requires multiple directory entries for the same title, only the first entry is immediately returned. Allocation of the remaining entries for that title can take place at a later time, so that task is added to the list of background processes maintained by the VFS 209. Replenishing the pool of pre-allocated entries is also a background task.
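
The allocation path described above can be sketched as a small pool class. The class, its method names, and the tuple format of the background task list are all hypothetical; entries are represented by plain integers, and the real on-demand creation step would reserve storage.

```python
from collections import deque
from itertools import count


class DirectoryEntryPool:
    """Hypothetical pool of pre-allocated directory entries."""

    def __init__(self, target_size: int):
        self.target_size = target_size
        self._ids = count(1)
        self._pool = deque()
        self.background = deque()   # deferred work, run when the system is idle
        self.replenish()

    def _create(self) -> int:
        # Stand-in for the real on-demand allocation, which reserves storage.
        return next(self._ids)

    def replenish(self) -> None:
        """Top the pool back up to its target size."""
        while len(self._pool) < self.target_size:
            self._pool.append(self._create())

    def allocate(self, title: str, entries_needed: int = 1) -> int:
        """Return the first entry now; queue the rest as background tasks."""
        first = self._pool.popleft() if self._pool else self._create()
        for n in range(2, entries_needed + 1):
            self.background.append(("allocate", title, n))
        self.background.append(("replenish", None, None))
        return first


pool = DirectoryEntryPool(target_size=2)
a = pool.allocate("Title 1", entries_needed=3)
b = pool.allocate("Title 2")
c = pool.allocate("Title 3")   # pool is empty here, so the entry is made on demand
```

Returning only the first entry keeps the requestor's latency low during bursts, at the cost of deferring the remaining entries to background processing.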

[0056] To create a directory entry (pre-allocated or on demand), the VFS 209 first determines whether the required capacity is available (e.g., not currently in use). If so, the request is easily fulfilled. If not, the VFS 209 deallocates one or more least recently used (LRU) titles as needed to fulfill the request. When a title is deallocated in this manner, the VFS 209 informs the SM 303 and the SPNs 103 of the event. An allocation request is initially complete when the VFS 209 returns the first directory entry to the requester (or caller). When a title has multiple entries, subsequent entries are provided when needed, with the caller able to specify which entry it requires. Each entry contains a table of subchunk locators able to store up to 30 minutes of content. A 95-minute movie therefore requires 4 entries, with the fourth entry largely unused. More precisely, the table of the fourth entry is largely unused, but no space is wasted on the disk drives themselves, because the only disk space consumed is that actually required for the 5 minutes of content. Internally, the VFS 209 keeps track of the available subchunk locations on each storage device using memory-efficient data structures.
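The entry arithmetic and the LRU fallback in the preceding paragraph can be illustrated with a short sketch. The 30-minute entry size comes from the text; `TitleStore`, its block-based capacity accounting, and the function names are assumptions made for illustration only.

```python
import math
from collections import OrderedDict

ENTRY_MINUTES = 30  # each directory entry covers up to 30 minutes of content


def entries_needed(title_minutes):
    """Number of directory entries for a title of the given (positive) length."""
    return math.ceil(title_minutes / ENTRY_MINUTES)


def minutes_in_last_entry(title_minutes):
    """Minutes of content actually held by the final entry."""
    remainder = title_minutes % ENTRY_MINUTES
    return remainder if remainder else ENTRY_MINUTES


class TitleStore:
    """Hypothetical capacity tracker that deallocates least recently used titles."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.used = 0
        self.titles = OrderedDict()  # title -> blocks, least recently used first

    def touch(self, title):
        self.titles.move_to_end(title)  # record an access

    def allocate(self, title, blocks):
        """Store a title, evicting LRU titles if needed; return the evicted names."""
        evicted = []
        while self.used + blocks > self.capacity and self.titles:
            old, old_blocks = self.titles.popitem(last=False)
            self.used -= old_blocks
            evicted.append(old)  # the real system would notify SM 303 and the SPNs
        if self.used + blocks > self.capacity:
            raise MemoryError("request exceeds total capacity")
        self.titles[title] = blocks
        self.used += blocks
        return evicted


print(entries_needed(95))         # 4
print(minutes_in_last_entry(95))  # 5
```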

[0057] Reclaiming unused storage space is made possible by incorporating a Last Valid Chunk (LVC) pointer in each entry. In the example above, the fourth entry, when given to the requester, initially has 30 minutes of storage reserved. When the component actually storing the content has completed its task, it updates the LVC pointer and informs the VFS 209. The VFS 209 then releases any unused blocks, making them available for use elsewhere. Being variable in length, each title ends wherever it ends, and there is no need to waste disk space for any reason such as aligning storage to some arbitrary boundary. Thus the VFS 209 packs disks as fully as possible, using whatever block is next free on the device. Initially, for simplicity, small files (e.g., system files that fit entirely within a single block) are managed in the same way as any other content. Eventually, a micro-VFS capability can be added that treats a chunk as though it were a disk drive for the purpose of storing many small files.

[0058] The SM 303 may also direct the VFS 209 to deallocate a title at any time, such as when the license period of the title expires, or for any other reason. A commanded deallocation is complicated by the fact that the title may currently be in use; when this happens, in one embodiment, deallocation is not completed until every user accessing the title has signaled that all use of the title has ended. The VFS 209 tracks all entries currently in use by each UMM 307, and also tracks entries in use by background processes. During the delay period, no new users are permitted to access a title that has been flagged for deallocation.

[0059] After new SPNs are added or removed, existing content is redistributed, or "re-striped," so that resource utilization is as uniform as possible. The VFS 209 performs re-striping automatically whenever it is needed. To keep things simple, the new and old entries do not overlap in any way; no storage blocks are shared between new and old (see below). Once the new re-striped copy is complete (the completion time is unpredictable because the rate of progress is limited by available bandwidth), new users can access it, and the old copy is simply deallocated using standard procedures. During re-striping, most subchunks are copied from their original SPN to a different SPN, while a small fraction are copied to a different location on the same SPN. The fraction of subchunks remaining on the same SPN is m/(m*n), where "m" is the previous number of SPNs and "n" is the new number of SPNs. For a site being upgraded from 100 to 110 SPNs, 100 out of every 11,000 subchunks are copied within the same SPN.
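As a quick check of the ratio stated above, the following sketch evaluates m/(m*n) exactly for the 100-to-110 SPN example. The function name `same_spn_fraction` is illustrative only.

```python
from fractions import Fraction


def same_spn_fraction(m, n):
    """Fraction of subchunks that stay on the same SPN: m / (m * n)."""
    return Fraction(m, m * n)


frac = same_spn_fraction(100, 110)
print(frac)           # 1/110
print(frac * 11_000)  # 100
```

The expression reduces to 1/n, so the share of subchunks copied within the same SPN depends only on the new number of SPNs.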

[0060] Real-time operations include instances where the content is purely transitory and instances where it is being saved. If there is ever a need for a transitory real-time buffer, in one embodiment the ICE 100 uses a single 30-minute directory entry as a circular buffer; when it is no longer needed, the entry is deallocated using standard procedures, as for any other title. When real-time content is being saved, additional 30-minute entries are requested as needed, with the VFS 209 deallocating LRU titles if necessary. As with any other title, the raw content is immediately available for playback up to the point indicated by the LVC pointer, which is updated periodically while storage continues. In some cases, the "raw content" is divided into specific titles before being made available to users who wish to request it after its initial start time. When ready, the edited content is added to the VFS 209 like any other title, and the raw content can be deleted.

[0061] Occasionally it is desirable to take an operational SPN 103 or disk drive offline for some purpose. To accomplish this without adverse impact, the ICE 100 is configured to copy, or more precisely "clone," the device, using one of the hot spares as the content recipient. When the copy process is complete (again, at an unpredictable time, since it is limited by available bandwidth), the clone assumes the identity of the former device, and operation continues smoothly with the VFSIM 302 and the SPNs 103 receiving notification. Unless the device is physically disconnected and reconnected to the ICE 100 (i.e., unless it is unplugged and moved), no participation is required of the VFM 301, because the cloning process and the identity swap are invisible to the VFM 301 (SPNs are logical entities to the VFM 301, not physical ones, because an Internet Protocol (IP) address is used rather than a MAC ID). When a disk or SPN is connected to the ICE 100, it automatically undergoes a validation/repair process (described below) to guarantee data integrity.

[0062] From the perspective of any given content stream, the loss of a storage device and the loss of an entire SPN look the same. In particular, one subchunk is missing from every nth chunk (where n is determined by the number of SPNs 103 in the system). The ICE 100 is designed to compensate for this type of loss by parity reconstruction, allowing ample time for hardware replacement. Repair, validation, and cloning are disk-specific processes. To repair, validate, or clone an SPN, it is simply a matter of starting the process for each disk in the SPN. When a UP sends requests for the subchunks of a chunk and any one subchunk is not returned within a predetermined period of time, the UP reconstructs the missing subchunk from the subchunks it has retrieved. In one embodiment, the reconstructed subchunk is sent to the user SPN from which that subchunk should have been sourced, regardless of the reason for the failure (i.e., whether the SPN or a drive on the SPN failed or the network was simply delayed). If the user SPN that should have sourced the missing subchunk is not available to receive the reconstructed subchunk (e.g., the SPN is temporarily offline), the subchunk is simply lost in transmission. If the SPN is available to receive the reconstructed subchunk (e.g., the SPN is back online, or the failure is limited to one disk drive of that SPN), it caches the subchunk in memory as if it had been read from its local disk drive.
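The reconstruction step performed by the UP can be sketched as below. The description does not state the parity scheme, so this sketch assumes simple bytewise XOR parity over a group of four data subchunks plus one parity subchunk; the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length subchunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_subchunks):
    """Parity subchunk for a chunk (assumed XOR parity, not confirmed by the text)."""
    return reduce(xor_bytes, data_subchunks)


def reconstruct_missing(received):
    """Rebuild the single missing subchunk (marked None) from the others.

    Returns (index_of_missing, rebuilt_subchunk).
    """
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) != 1:
        raise ValueError("exactly one subchunk must be missing")
    present = [s for s in received if s is not None]
    return missing[0], reduce(xor_bytes, present)


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
group = data + [make_parity(data)]

# Simulate the third subchunk failing to arrive before the timeout.
damaged = list(group)
damaged[2] = None
index, rebuilt = reconstruct_missing(damaged)
print(index, rebuilt)  # 2 b'ijkl'
```

In the flow described above, the rebuilt subchunk would then be forwarded to the SPN that should have supplied it, so that it can be cached there.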

[0063] Hot swapping and parity reconstruction require that each SPN 103 be aware of whether each block on each device is valid. Initially, when an SPN comes online, it has no valid blocks. When the SPN receives and stores a subchunk (or validates one already there), it marks the block as valid. When an SPN receives a request for a subchunk stored in a block marked invalid, the SPN replies with a request to receive that subchunk. If the missing subchunk has been recreated elsewhere in the ICE 100 through parity reconstruction, it is sent back to the SPN (using available bandwidth) for storage, and the block is marked valid. The absence of a request for the subchunk indicates that the SPN is still non-functional, and no reconstructed subchunk need be sent. Using this protocol, a replacement device is repopulated with minimal additional overhead. Meanwhile, to catch those chunks not already taken care of because of their high demand, a simple background validation/repair process performs a beginning-to-end reconstruction, skipping over blocks already marked valid.
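The valid-block protocol above can be modeled with a minimal sketch. The class `SpnDisk`, the message tags `"data"` and `"please_send"`, and the dictionary-based storage are all assumptions made for illustration; the description does not specify message formats.

```python
class SpnDisk:
    """Hypothetical per-disk state kept by an SPN: stored blocks plus a validity set."""

    def __init__(self):
        self.blocks = {}    # logical address -> subchunk bytes
        self.valid = set()  # logical addresses known to hold valid data

    def store(self, lba, subchunk):
        """Store a received (or reconstructed) subchunk and mark its block valid."""
        self.blocks[lba] = subchunk
        self.valid.add(lba)

    def read(self, lba):
        """Serve a read, or reply with a request to receive the subchunk."""
        if lba in self.valid:
            return ("data", self.blocks[lba])
        return ("please_send", lba)


disk = SpnDisk()                 # a freshly connected device has no valid blocks
print(disk.read(7))              # ('please_send', 7)
disk.store(7, b"reconstructed")  # the rebuilt subchunk arrives and is stored
print(disk.read(7))              # ('data', b'reconstructed')
```

Because blocks are filled in as they are requested, the most heavily used content is restored first, and a background pass only needs to visit addresses absent from `valid`.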

[0064] Under certain circumstances, such as when the VFSIM 302 recognizes the reconnection of a disk previously known to hold valid content, the SPN 103 is directed to override its prohibition against sending subchunks marked invalid. If the probationary subchunk passes its checksum test, the subchunk can be used (and the source SPN can mark it valid), thereby avoiding the unnecessary overhead of parity reconstruction. Failure of an SPN to supply a requested subchunk, coupled with failure to request that subchunk, indicates SPN failure. By monitoring such failures, the ICE 100 automatically notifies the system operators and initiates recovery procedures during lights-out operation.

[0065] The VFSIM 302 automatically initiates and manages disk validation/repair when a different physical disk replaces an existing disk containing content. For disk validation/repair, the VFM 301 prepares a Disk Repair Entry (DRE) similar to the directory entries already in use, but with a few small differences. The 450 subchunks are all from the bad drive, and chunks from more than one title are included. The checksum for every subchunk (including the missing one) is also included. DREs are supplied starting with the most recently used title, followed by the next most recently used title, and so on. It does not matter if a title does not fit exactly, because the next DRE picks up where the last one left off. Since the total number of DREs is not known in advance, each DRE simply carries a flag indicating whether it is the last one. This procedure allows the repair to proceed in an orderly, prioritized fashion while maintaining the greatest possible data integrity.

[0066] Repair is desired any time data loss occurs, such as when a new disk replaces a failed disk. When the failed disk is not available anywhere in the ICE 100, recovery takes place entirely on the SPN 103 hosting the new disk. Using one DRE at a time, the host SPN requests the group-mates of each missing subchunk and uses them for parity reconstruction. The reconstructed subchunk is saved and the block is marked valid. If, on the other hand, the failed disk is connected to a spare SPN, the VFSIM 302 recognizes it and attempts to recover any available subchunks in an effort to reduce the amount of parity reconstruction required. The VFSIM 302 first sends the DRE to the spare SPN, which uses the checksums and locators to test candidate subchunks for validity. When one passes, the spare SPN marks the subchunk valid and sends it to the SPN needing it, where it is stored as valid. When the spare SPN has recovered and sent all the subchunks it can, it notifies the VFSIM 302 that it has finished the DRE. If not all subchunks have been recovered at this point, the VFSIM 302 sends the DRE to the SPN hosting the new disk, and parity reconstruction is undertaken as needed.

[0067] Content validation is desired whenever a disk or SPN is connected to the system, such as when a reconstructed disk is moved from one SPN to another. The validation process is essentially the same as the repair process, only faster. The same DREs are used, with each candidate subchunk tested one at a time. A checksum is calculated for the subchunk existing on the disk. If the calculated checksum matches the checksum in the DRE, the subchunk is considered valid. If the checksums do not match, the other four subchunks corresponding to that subchunk are requested from the other SPNs of the RAID group, and the missing subchunk is reconstructed and stored. Validation is faster than reconstruction simply because most, if not all, subchunks pass the initial checksum test. Because the validation process is the same as the reconstruction process, operators have the flexibility to move a drive to its correct slot even when the reconstruction process is only partially complete. When the operator pulls a partially reconstructed disk, that reconstruction process is abandoned, and when the disk is plugged into the new slot, a new validation/reconstruction process is started.
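The validate-then-rebuild loop described above can be sketched as follows. The checksum algorithm is not named in the description, so CRC-32 from Python's `zlib` is used purely as a stand-in; the DRE is reduced to a list of (logical address, expected checksum) pairs, and `rebuild` is a hypothetical callback standing for parity reconstruction from the RAID group.

```python
import zlib


def checksum(subchunk):
    """Stand-in checksum (CRC-32); the actual algorithm is not specified."""
    return zlib.crc32(subchunk)


def validate_disk(dre, disk, rebuild):
    """Test each candidate subchunk and rebuild only those that fail.

    dre:     list of (logical_address, expected_checksum) pairs
    disk:    dict mapping logical_address -> subchunk bytes (updated in place)
    rebuild: callable(logical_address) -> reconstructed subchunk bytes
    """
    stats = {"valid": 0, "rebuilt": 0}
    for lba, expected in dre:
        candidate = disk.get(lba)
        if candidate is not None and checksum(candidate) == expected:
            stats["valid"] += 1         # passes the test, no reconstruction needed
        else:
            disk[lba] = rebuild(lba)    # fall back to parity reconstruction
            stats["rebuilt"] += 1
    return stats


good = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
dre = [(lba, checksum(data)) for lba, data in good.items()]
disk = {0: b"aaaa", 1: b"bbXb"}         # block 1 corrupted, block 2 missing
print(validate_disk(dre, disk, lambda lba: good[lba]))
```

Since a moved or reconnected disk usually holds mostly intact data, nearly every iteration takes the cheap checksum branch, which is why validation finishes faster than a full rebuild.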

[0068] Cloning is simpler than the reconstruction/validation process because data is simply copied from the host device. The clone host sends the stored content to the recipient and, in addition, forwards changes as they occur. This means that after the entire body of content has been transferred to the recipient, the cloning process can idle indefinitely, keeping the two devices fully synchronized. When cloning is complete, the clone device assumes the logical identity of the host device and no further validation is required (unless the device is moved). Aside from a potential role in validation, the VFS 209 is not involved in cloning. Because the host is responsible for sending and synchronization, there is no need to create (and then destroy) duplicate data structures in the VFS 209 for the recipient.

[0069] Upon request from the SM 303, the VFS 209 can report information useful for managing the ICE 100, including a Most Recently Used (MRU) title list (not shown) and a device usage report (not shown) that includes statistics. The MRU list contains one record for each title currently stored, along with information specific to that title, such as the date it was last requested, the total number of times it has been requested, its total size, and whether it can be deleted. The device usage report contains one record for each SPN, giving its IP address, its group affiliation, and an array with information on each storage device, such as the device ID, its total number of blocks, and the number of blocks currently allocated. The VFS 209 also participates in system logging, adding an entry for each significant event.

[0070] It is now appreciated that a virtual file system according to the present invention provides an organized distribution of title data that maximizes both access speed and storage efficiency for each title. Each title is divided into multiple subchunks, which are distributed among the disk drives of a disk drive array coupled to multiple storage processor nodes, including a management node. A virtual file manager executing on the management node manages storage of and access to each subchunk of each title stored in the array. The virtual file manager maintains a directory entry for each title, where each directory entry is a list of subchunk location entries for that title. Each subchunk location entry includes a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title stored on the disk drive array.
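The directory structure summarized above can be pictured with a small sketch. The three fields of each location entry come from the description; the type name `SubchunkLocation`, the request dictionary layout, and the helper `build_read_requests` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class SubchunkLocation(NamedTuple):
    spn_id: int           # storage processor node identifier
    disk_id: int          # disk drive identifier on that node
    logical_address: int  # logical block address on that drive


def build_read_requests(directory_entry, destination_spn):
    """Create one read request per location entry, grouped by the SPN holding it."""
    requests = defaultdict(list)
    for loc in directory_entry:
        requests[loc.spn_id].append(
            {"dest": destination_spn, "disk": loc.disk_id, "lba": loc.logical_address}
        )
    return dict(requests)


# A directory entry is simply an ordered list of subchunk location entries.
entry = [
    SubchunkLocation(spn_id=1, disk_id=0, logical_address=4096),
    SubchunkLocation(spn_id=2, disk_id=3, logical_address=512),
    SubchunkLocation(spn_id=1, disk_id=1, logical_address=8192),
]
requests = build_read_requests(entry, destination_spn=7)
print(sorted(requests))  # [1, 2]
```

Each request already names the node, the drive, and the block, so the receiving node can go straight to the drive without consulting any local directory.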

[0071] The centralization of file management provides many advantages over prior-art disk and storage systems. A file or "title" can be of any size up to the combined storage capacity of all drives and is not limited by a single drive or redundant storage group. Since the directory information is stored centrally, the full capacity of each drive is available to store content. Each request for a title is not limited to one disk drive or a few disk drives; rather, the load is distributed among up to all of the disk drives in the array. The synchronous switch manager maximizes efficiency by ensuring that each node receives one subchunk of data at a time during sequential transmission periods. The centralized file manager allows the full output bandwidth of each disk drive to be realized without requiring any kind of local directory on any disk drive. In one embodiment, a factory-configured logical-to-physical remapping on each disk drive is employed, allowing information to be retrieved from each drive with a single seek operation. As appreciated by those skilled in the art, the standard directory seek penalty is severe and can reduce drive bandwidth to far less than half of its specification. In contrast, each subchunk location entry is sufficient to locate and access the corresponding subchunk of a title, thereby minimizing the overhead on each storage processor node of retrieving and forwarding subchunks of data. There is no need to interface with a complicated operating system, perform an intermediate directory lookup, or the like. The transfer process of the identified processor node accesses the subchunk by providing the logical address (i.e., logical block address) to the identified disk drive, which immediately returns the subchunk stored at that logical address.

[0072] The virtual file system further employs data and/or process redundancy to protect against loss of data and to allow uninterrupted service during reconstruction. Redundant storage groups span individual storage processor nodes, tolerating the failure of any drive, of any one drive of each redundant disk group (e.g., RAID array), or of any single node with the loss of all of its drives. Each drive is uniquely identified, allowing automatic system configuration at startup and faster recovery from a partial or anticipated disk failure. When a drive error occurs, parity reconstruction is performed and the reconstructed data is sent to the node from which the data should have originated, so that it can be cached there. Such structure and process avoid redundant reconstruction of popular titles until the drive and/or node is replaced, which provides a major time saving for the user processes distributed among the nodes. Further, a redundant management node executing a redundant virtual file manager allows uninterrupted operation in the event of any single point of failure in the overall system.

[0073] Many other advantages are also achieved. The interactive content engine 100 is not overloaded by hundreds of simultaneous requests for storage allocation. It allows a very large number of video streams to be recorded and played back simultaneously without overloading the system with directory processing (less than the bandwidth of 100,000 streams). It allows management functions, such as pre-allocating storage, re-striping content, deleting titles, and cloning drives and SPNs, to take place in the background without interfering with synchronous content playback and ingestion.

[0074] Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions and variations are possible and contemplated. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures that serve the same purposes as the present invention, without departing from the spirit and scope of the invention as defined by the following claims.

Claims (25)

  1. A virtual file system, comprising: a plurality of storage processor nodes including at least one management node, each storage processor node comprising a port interface and a disk drive interface; a backbone switch comprising a plurality of ports, each port coupled to the corresponding port interface of the plurality of storage processor nodes, the backbone switch enabling communication between each of the plurality of storage processor nodes; a disk drive array coupled to and distributed across the disk drive interfaces of the plurality of storage processor nodes, the disk drive array storing a plurality of titles, each title being divided into a plurality of subchunks distributed across the disk drive array, wherein each subchunk is stored on one disk drive of the disk drive array; the at least one management node executing a virtual file manager that manages storage of and access to each subchunk of the plurality of titles and maintains a plurality of directory entries including a directory entry for each title, each directory entry comprising a list of subchunk location entries, wherein each subchunk location entry comprises a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title stored on the disk drive array; and a user process, executing on a storage processor node, that submits a title request for a selected title to the virtual file manager, receives the corresponding directory entry for the selected title from the virtual file manager, submits a subchunk read request for each subchunk location entry in the corresponding directory entry, each subchunk read request being sent to the storage processor node identified by the storage processor node identifier in the corresponding subchunk location entry of the corresponding directory entry, receives subchunks from the storage processor nodes so identified, and reconstructs the selected title using the received subchunks.
  2. 2.权利要求1中的所述虚拟文件系统,其中通过对一个被标识的存储处理器节点的一个被标识的盘驱动器提供所述逻辑地址,以单个寻找操作取得所述多个子块中的每一个子块。 The virtual file system of claim 1, wherein the logical addresses provided by a disk drive is a storage processor node identified identified, made in a single seek operation of the plurality of sub-blocks each of a sub-block.
  3. 3.权利要求1中的所述虚拟文件系统,其中所述盘驱动器阵列的每个盘驱动器的全部容量可用于存储所述多个标题的所述多个子块。 The virtual file system of claim 1, wherein each of the full capacity of the disk drive of said disk drive array can be used for storing the plurality of subblocks of the plurality of titles.
  4. 4.权利要求1中的所述虚拟文件系统,其中:其中每个子块读取请求包括一个目的节点标识符、所述盘驱动器标识符和所述逻辑地址;和其中所述虚拟文件管理器取得所述选定标题的所述相应目录项,并将所述相应目录项转发给所述用户过程以回复所述标题请求。 The virtual file system of claim 1, wherein: each sub-block read request includes a destination node identifier, the disk drive identifier and the logical address; and wherein said virtual file manager to obtain the directory entry corresponding to the selected title, and the directory entry corresponding to a forwarding procedure to respond to the user request for the title.
  5. 5.权利要求4中的所述虚拟文件系统,进一步包含一个传输过程,其执行于一个存储处理器节点,其接收子块读取请求,其通过使用所述逻辑地址以从由所述盘驱动器标识符所标识的本地盘驱动器中定位一个被请求的子块,来请求所述子块,以及将所取得的子块转发到由所述目的节点标识符所标识的存储处理器节点。 The virtual file system 4 by using the logical address of the disk drive from claim 1, further comprising a transmission process, which is executed on a storage processor node, which receives a read request sub-block, local disk drive identified by the identifier to locate a requested sub-block, the sub-block request, and forwards the acquired sub-block to store the destination processor node by the node identified by the identifier.
  6. 6.权利要求1中的所述虚拟文件系统,其中每个标题分成多个数据块,每个所述数据块包括多个子块,所述多个子块联合地包含每个数据块的冗余数据,以及其中所述用户过程可操作以从包含任一数据块的所有子块中少一个子块的多个子块中重建所述数据块。 The virtual file system of claim 1, wherein each title is divided into a plurality of data blocks, each data block includes a plurality of sub-blocks, the plurality of subblocks contain redundant data combined for each data block and wherein said user process is operable from a plurality of sub-blocks comprises all sub-blocks of data in any one of at least one sub-block of the reconstructed blocks in the data block.
  7. 7.权利要求6中的所述虚拟文件系统,其中所述盘驱动器阵列分成多个冗余阵列组,其中每个冗余阵列组包括分布于多个存储处理器节点的多个盘驱动器,以及其中每个数据块的所述多个子块分布在一个相应冗余阵列组的多个盘驱动器上。 The virtual file system of claim 6, wherein said disk drive array into a plurality of redundant array groups, wherein each redundant array group comprises a plurality of disk drives distributed to a plurality of storage processor nodes, and wherein the plurality of sub-blocks for each block distributed on a corresponding plurality of disk drives in the redundant array group.
  8. 8.权利要求7中的所述虚拟文件系统,其中所述用户过程用于在以下任一情况下重建任何被存储的标题:任一盘驱动器故障;所述多个冗余阵列组的每一个的任一盘驱动器故障;和所述多个存储处理器节点的任意一个故障。 The virtual file system of claim 7, wherein the user process is used in any of the following reconstruction of any titles are stored: the failure of any one disk drive; each of said plurality of redundant array groups the failure of any one disk drive; and any of said plurality of storage processor nodes of a failure.
  9. 9.权利要求8中的所述虚拟文件系统,其中所述用户过程用于从所述数据块剩下的子块中重建该数据块丢失的一个子块,以及用于返回所述重建的丢失子块到一个应该获得所述丢失子块的存储处理器节点。 9. The loss of the virtual file system as claimed in claim 8, wherein said user process a sub-block for reconstructing the missing data block from the data block in the remaining sub-blocks, and means for returning the rebuilt sub-block to a storage processor node should obtain the missing sub-block.
  10. 10.权利要求9中的所述虚拟文件系统,其中在所述应该获得所述丢失子块的存储处理器节点故障时被一个替代存储器处理节点替代,所述替代存储处理器节点通过存储接收的子块而再存储丢失和新的标题数据,其中所述接收的子块包括返回子块和重建子块。 10. The virtual file system of claim 9, wherein in the alternative should be to obtain a processing node instead of the memory is lost when the storage processor node failed sub-blocks, the replacement storage processor node stores the received subblocks loss and re-storing the new header data, wherein said receiving sub-block comprises a return and reconstruction sub-block sub-blocks.
  11. 11.权利要求9中的所述虚拟文件系统,进一步包含缓冲存储器,其连接至所述应该获得所述丢失子块的所述存储处理器节点,并且暂时存储包括被返回子块和重建子块的被接收的子块,以用于传输至故障盘驱动器的替代盘驱动器。 11. The virtual file system of claim 9, further comprising a buffer memory, connected to the said storage processor node should obtain the missing sub-blocks, and temporarily storing a sub-block is returned and reconstructed subblock sub-block is received for transmission to the replacement disk drive failure of the disk drive.
  12. 12.权利要求1中的所述虚拟文件系统,其中每个子块存储于所述逻辑地址所标识的盘驱动器的一个块上,其中所述逻辑地址包含逻辑块地址。 12. The virtual file system of claim 1, wherein each sub-block of a block stored on the disk drive identified by logical address, wherein the logical address comprises a logical block address.
  13. The virtual file system of claim 1, wherein the virtual file manager manages title storage in which each title is divided into a plurality of data chunks, each data chunk comprising a plurality of subchunks, the plurality of subchunks incorporating redundant data for each data chunk.
  14. The virtual file system of claim 1, wherein the at least one management node comprises a mirror management node that executes a mirror virtual file manager, which mirrors the operation of the virtual file manager.
  15. The virtual file system of claim 1, wherein the virtual file manager maintains a pool of pre-allocated directory entries, each comprising a list of available subchunk location entries.
  16. The virtual file system of claim 15, wherein the number of pre-allocated directory entries in the pool is based on performance and a site usage profile.
  17. A virtual file system, comprising: a plurality of storage processor nodes including at least one management node, each storage processor node comprising a port interface and a disk drive interface; a backbone switch comprising a plurality of ports, each port coupled to the corresponding port interface of the plurality of storage processor nodes, the backbone switch enabling communication between each of the plurality of storage processor nodes; a disk drive array coupled to and distributed across the disk drive interfaces of the plurality of storage processor nodes, the disk drive array storing a plurality of titles, each title being divided into a plurality of subchunks distributed across the disk drive array, wherein each subchunk is stored on one disk drive of the disk drive array; the at least one management node executing a virtual file manager that manages storage and access of each subchunk of the plurality of titles and maintains a plurality of directory entries including a directory entry for each title, each directory entry comprising a list of subchunk location entries, wherein each subchunk location entry comprises a storage processor node identifier, a disk drive identifier, and a logical address for locating and accessing each subchunk of each title stored on the disk drive array; wherein the virtual file manager manages title storage in which each title is divided into a plurality of data chunks, each data chunk comprising a plurality of subchunks, the plurality of subchunks incorporating redundant data for each data chunk; wherein the disk drive array is divided into a plurality of redundant array groups, each redundant array group comprising a plurality of disk drives distributed among a plurality of storage processor nodes; and wherein the plurality of subchunks of each data chunk are distributed across the plurality of disk drives of a corresponding redundant array group.
  18. The virtual file system of claim 17, further comprising: a replacement disk drive with a plurality of missing subchunks, coupled to a first storage processor node; the virtual file manager preparing a disk repair directory entry that lists each missing subchunk together with the corresponding parity subchunks that make up a data chunk, and forwarding the disk repair directory entry to the first storage processor node; and a repair process, executing on the first storage processor node, that submits a subchunk read request for each parity subchunk listed in the disk repair directory entry that corresponds to each missing subchunk, reconstructs each missing subchunk using the received corresponding parity subchunks, and stores the reconstructed subchunks on the replacement disk drive.
  19. The virtual file system of claim 18, further comprising: a spare storage processor node; a partially failed disk drive, replaced by the replacement disk drive, coupled to the spare storage processor node; the virtual file manager first sending the disk repair directory entry to the spare storage processor node before sending it to the first storage processor node; and a salvage process, executing on the spare storage processor node, that uses a checksum and locator to test the validity of the missing subchunks stored on the partially failed disk drive, and forwards valid subchunks read from the partially failed disk drive to the first storage processor node for storage on the replacement disk drive.
  20. The virtual file system of claim 19, wherein the repair process discards a received valid subchunk read from the partially failed disk drive when the corresponding missing subchunk has already been reconstructed and stored on the replacement disk drive.
  21. The virtual file system of claim 17, wherein the disk drive array comprises a predetermined number of disk drives [...] chunks across the plurality of redundant array groups.
  22. The virtual file system of claim 21, wherein, in response to a change in the predetermined number of disk drives, the virtual file manager performs a re-striping process to redistribute the plurality of data chunks so as to maintain an even distribution of data.
  23. The virtual file system of claim 22, wherein the re-striping process runs as a background task.
  24. The virtual file system of claim 22, wherein, upon detecting an increase in the predetermined number of disk drives of the disk drive array, the virtual file manager performs the re-striping process to redistribute the plurality of data chunks among the new disk drives of the disk drive array so as to maintain an even distribution of data.
  25. The virtual file system of claim 22, wherein the virtual file manager detects a request to remove a specified disk drive of the disk drive array, performs the re-striping process to redistribute the plurality of data chunks so as to maintain an even distribution of data among the remaining disk drives, and de-allocates the specified disk drive.
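
As an informal illustration only (it is not part of the claims), the subchunk location entry of claim 17 and the reconstruction step of claims 9 and 18 can be sketched in Python. The claims do not name a redundancy code, so this sketch assumes a single XOR parity subchunk per data chunk; the names `SubchunkLocation`, `make_parity`, and `reconstruct_missing` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class SubchunkLocation:
    """One subchunk location entry (field names are illustrative)."""
    node_id: int   # storage processor node identifier
    drive_id: int  # disk drive identifier
    lba: int       # logical (block) address on that drive


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length subchunks."""
    if len(a) != len(b):
        raise ValueError("subchunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_subchunks: List[bytes]) -> bytes:
    """ASSUMPTION: redundancy is a single XOR parity subchunk."""
    return reduce(xor_bytes, data_subchunks)


def reconstruct_missing(subchunks: List[Optional[bytes]]) -> bytes:
    """Rebuild the one missing subchunk (marked None) of a data chunk.

    `subchunks` holds the data subchunks followed by the parity subchunk.
    XOR-ing every surviving subchunk yields the missing one.
    """
    missing = [i for i, s in enumerate(subchunks) if s is None]
    if len(missing) != 1:
        raise ValueError("exactly one subchunk must be missing")
    present = [s for s in subchunks if s is not None]
    return reduce(xor_bytes, present)


# A directory entry is an ordered list of location entries, here one
# subchunk on each of five hypothetical storage processor nodes.
directory_entry = [SubchunkLocation(node_id=n, drive_id=0, lba=4096)
                   for n in range(5)]

# Four data subchunks plus parity; lose one and rebuild it.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
chunk: List[Optional[bytes]] = data + [make_parity(data)]
chunk[2] = None
assert reconstruct_missing(chunk) == b"CCCC"
```

Under this assumption a chunk survives the loss of any single subchunk, whether caused by a failed drive or a failed node, which is the failure model claim 8 describes.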
CN 200480039804 2001-11-28 2004-12-02 Virtual file system CN1902620B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US60/526,390 2003-12-02 2003-12-02
US10/999,286 US7644136B2 (en) 2001-11-28 2004-11-30 Virtual file system
PCT/US2004/040367 WO2005057343A3 (en) 2003-12-02 2004-12-02 Virtual file system

Publications (2)

Publication Number Publication Date
CN1902620A (en) 2007-01-24
CN1902620B (en) 2011-04-13



Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200480039804 CN1902620B (en) 2001-11-28 2004-12-02 Virtual file system

Country Status (1)

Country Link
CN (1) CN1902620B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100571159C (en) 2007-09-21 2009-12-16 华为技术有限公司 Method and apparatus for managing multi-work space contents
CN101399840B (en) 2007-09-26 2013-10-23 新奥特硅谷视频技术有限责任公司 Method and system for implementing image storage by virtual file system technique

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134596A (en) 1997-09-18 2000-10-17 Microsoft Corporation Continuous media file server system and method for scheduling network resources to play multiple files having different data transmission rates
US6374336B1 (en) 1997-12-24 2002-04-16 Avid Technology, Inc. Computer system and process for transferring multiple high bandwidth streams of data between multiple storage units and multiple applications in a scalable and reliable manner

Also Published As

Publication number Publication date Type
CN1902620A (en) 2007-01-24 application

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1100191; Country of ref document: HK)
C14 Granted
REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1100191; Country of ref document: HK)