WO2018209694A1 - Distributed computing system and data processing method thereof - Google Patents

Distributed computing system and data processing method thereof

Info

Publication number
WO2018209694A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
data
ssd
platform module
computing system
Prior art date
Application number
PCT/CN2017/085109
Other languages
English (en)
French (fr)
Inventor
陆克中
毛一帆
毛睿
廖好
朱金彬
隋秀峰
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to PCT/CN2017/085109
Publication of WO2018209694A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The present invention relates to the field of data processing technologies, and in particular to a distributed computing system and a data processing method thereof.
  • Spark is an efficient big data computing framework that is widely used in industry. It is a general-purpose, fast engine for large-scale data processing.
  • Spark provides a unified solution for complex tasks such as interactive queries, real-time stream processing, and machine learning.
  • Spark divides stages and tasks by means of the Resilient Distributed Dataset (RDD), optimizes the execution order of subtasks through an efficient Directed Acyclic Graph (DAG) execution engine, and greatly improves data processing efficiency through in-memory computing.
  • Spark data management relies on multiple data sources such as HDFS and Hive, and Spark in cluster mode scales horizontally to support the processing of large-scale data.
  • The RDD is the most important concept distinguishing Spark from other big data computing frameworks. It is a read-only distributed dataset with a highly fault-tolerant mechanism. In a Spark application, each RDD is divided into multiple partitions, and Spark performs operations on the RDD in units of partitions. Persisting RDD partition data to memory or hard disk caches the intermediate results of a computing task so that subsequent iterative tasks can read them directly, which avoids repeated computation and greatly improves data processing efficiency. In addition, persisting data to hard disk removes the limit that insufficient memory capacity places on dataset size, so that Spark handles big data with ease.
  • The persistence semantics that Spark currently provides are inflexible: the processed data cannot be stored identifiably in different storage units according to the characteristics of the Spark application's data.
  • The present invention aims to solve the technical problem in the prior art that the processed data cannot be stored identifiably in different storage units according to the characteristics of the Spark application's data, and provides a distributed computing system and a data processing method thereof.
  • An embodiment of the present invention provides a distributed computing system, including a Spark platform module and a hybrid storage module, where the hybrid storage module includes an SSD unit and an HDD unit, and the Spark platform module is connected to the SSD unit and to the HDD unit respectively;
  • The Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage. The Spark platform module is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit or the HDD unit, and output the data.
  • The present invention also provides a data processing method of a distributed computing system according to an embodiment, comprising the following steps:
  • The Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage;
  • The Spark platform module receives the query instruction, obtains the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs the data.
  • Compared with the prior art, the technical solution of the present invention has the following beneficial effect: because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and saving of data can be achieved.
  • FIG. 1 is a block diagram showing an embodiment of a distributed computing system of the present invention.
  • FIG. 2 is a flow chart of an embodiment of a data processing method of a distributed computing system of the present invention.
  • SSD: solid-state drive
  • HDD: hard disk drive
  • Heterogeneous data centers based on hybrid SSD and HDD storage have been widely studied and applied.
  • The distributed computing system of the embodiment of the present invention includes a Spark platform module 1 and a hybrid storage module 2. The hybrid storage module 2 includes an SSD unit 21 and an HDD unit 22, and the Spark platform module 1 is connected to the SSD unit 21 and to the HDD unit 22 respectively;
  • The Spark platform module 1 uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit 21 or the HDD unit 22 for storage. The Spark platform module 1 is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit 21 or the HDD unit 22, and output the data.
  • Because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and storage of data can be realized.
  • The Spark platform module 1 includes a first API (Application Programming Interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit 22. The Spark platform module 1 is connected to the SSD unit 21 through the first API and to the HDD unit 22 through the second API for data transmission.
  • Through the first API and the second API, the Spark platform module 1 can expose the structural features of the hybrid storage system to the user.
  • The storage medium is selected by calling the first API or the second API; that is, whether data is stored in the SSD unit 21 or in the HDD unit 22 is determined by which API is called.
  • The SSD unit 21 and the HDD unit 22 are persistent storage units at the same tier.
  • The processed data specifically includes RDD partition data.
  • The Spark platform module is further configured to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio value.
  • The Spark platform module 1 is further configured to persist RDD partition data to the SSD unit or the HDD unit according to the heat of the RDD partition data.
  • With SSDs, I/O bandwidth can be effectively increased and access latency reduced.
  • HDDs still offer considerable storage efficiency for data with lower storage-performance requirements.
  • A large amount of the data collected and captured by data centers is infrequently accessed; this is called cold data and accounts for about 90% of global data.
  • The remaining 10% of the data, which is accessed frequently after being collected and captured, is called hot data.
  • The distributed computing system further includes a capacity monitoring module connected to the hybrid storage module, where the capacity monitoring module is configured to monitor the remaining capacity of the hybrid storage module and to output an alarm signal when the remaining capacity is less than a preset threshold.
  • That is, the distributed computing system may further include a capacity monitoring module connected to the hybrid storage module 2; the capacity monitoring module monitors the remaining capacity of the hybrid storage module 2 and outputs alarm information when the remaining capacity is less than a preset threshold.
  • The specific value of the preset threshold may be determined according to the capacity of the hybrid storage module 2, and outputting the alarm information may consist of sounding a speaker or flashing an alarm light.
  • The present invention also provides a data processing method of a distributed computing system according to an embodiment. As shown in FIG. 2, the data processing method includes the following steps:
  • Step S21: the Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage;
  • Step S22: the Spark platform module receives the query instruction, obtains the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs the data.
  • Because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and storage of data can be realized.
  • The data processing method further includes the following step: monitoring, by the capacity monitoring module, the remaining capacity of the hybrid storage module, and outputting alarm information when the remaining capacity is less than a preset threshold.
  • The specific value of the preset threshold may be determined according to the capacity of the hybrid storage module 2, and outputting the alarm information may consist of sounding a speaker or flashing an alarm light.
  • When the remaining capacity of the hybrid storage module 2 is too low, an alarm is issued to remind staff to migrate the stored data or replace the storage disk in time, which improves the reliability of data storage.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

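The present invention provides a distributed computing system comprising a Spark platform module and a hybrid storage module. The hybrid storage module comprises an SSD unit and an HDD unit, and the Spark platform module is connected to the SSD unit and to the HDD unit respectively. The Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage. The Spark platform module is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit or the HDD unit, and output the data.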

Description

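Distributed computing system and data processing method thereof
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a distributed computing system and a data processing method thereof.
Background Art
In the current era of big data, how to manage, analyze, and extract valuable information from massive amounts of data within a useful time frame has become a problem that urgently needs solving. Whether in scale, variety, or structure, however, big data poses a great challenge to people's ability to handle data.
Spark is an efficient big data computing framework that is widely used in industry; it is a general-purpose, fast engine for large-scale data processing. First, Spark provides a unified solution that can be used for complex tasks such as interactive queries, real-time stream processing, and machine learning. Second, Spark divides stages and tasks by means of the Resilient Distributed Dataset (RDD), optimizes the execution order of subtasks through an efficient Directed Acyclic Graph (DAG) execution engine, and greatly improves data processing efficiency through in-memory computing. Third, Spark's data management relies on multiple data sources such as HDFS and Hive, and Spark in cluster mode scales horizontally to support the processing of large-scale data. The RDD is the most important concept distinguishing Spark from other big data computing frameworks: it is a read-only distributed dataset with a highly fault-tolerant mechanism. In a Spark application, each RDD is divided into multiple partitions, and Spark performs operations on the RDD in units of partitions. Persisting RDD partition data to memory or hard disk caches the intermediate results of a computing task so that subsequent iterative tasks can read them directly, which avoids repeated computation and greatly improves data processing efficiency. In addition, persisting data to hard disk removes the limit that insufficient memory capacity places on dataset size, so that Spark handles big data with ease.
However, the persistence semantics that Spark currently provides are inflexible: the processed data cannot be stored identifiably in different storage units according to the characteristics of the Spark application's data.
Technical Problem
The present invention aims to solve the technical problem in the prior art that the processed data cannot be stored identifiably in different storage units according to the characteristics of the Spark application's data, and provides a distributed computing system and a data processing method thereof.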
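Technical Solution
An embodiment of the present invention provides a distributed computing system comprising a Spark platform module and a hybrid storage module, where the hybrid storage module comprises an SSD unit and an HDD unit, and the Spark platform module is connected to the SSD unit and to the HDD unit respectively;
The Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage. The Spark platform module is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit or the HDD unit, and output the data.
The present invention also provides a data processing method of a distributed computing system according to an embodiment, comprising the following steps:
The Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage;
The Spark platform module receives a query instruction, obtains the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs the data.
Beneficial Effects
Compared with the prior art, the technical solution of the present invention has the following beneficial effect: because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and saving of data can be achieved.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an embodiment of the distributed computing system of the present invention.
FIG. 2 is a flow chart of an embodiment of the data processing method of the distributed computing system of the present invention.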
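Embodiments of the Invention
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
Specifically, the emergence of the solid-state drive (SSD) has brought new opportunities for improving storage system performance. SSDs offer low power consumption, low latency, and small size. Unlike a traditional enterprise hard disk drive (HDD), which addresses data by moving a mechanical arm, an SSD is built entirely on semiconductor chips and therefore offers random-access performance. However, because of drawbacks such as a high cost per unit of capacity and a limited lifespan, replacing HDDs entirely with SSDs would greatly increase industry costs. To make reasonable use of the high performance of SSDs and the low price of HDDs, heterogeneous data centers based on hybrid SSD and HDD storage have been widely studied and applied.
As shown in FIG. 1, the distributed computing system of one embodiment of the present invention comprises a Spark platform module 1 and a hybrid storage module 2. The hybrid storage module 2 comprises an SSD unit 21 and an HDD unit 22, and the Spark platform module 1 is connected to the SSD unit 21 and to the HDD unit 22 respectively;
The Spark platform module 1 uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit 21 or the HDD unit 22 for storage. The Spark platform module 1 is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit 21 or the HDD unit 22, and output the data.
Because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and saving of data can be achieved.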
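In a specific implementation, the Spark platform module 1 includes a first API (Application Programming Interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit 22. The Spark platform module 1 is connected to the SSD unit 21 through the first API and to the HDD unit 22 through the second API for data transmission. Through the first API and the second API, the Spark platform module 1 can expose the structural features of the hybrid storage system to the user. The storage medium is selected by calling the first API or the second API; that is, whether data is stored in the SSD unit 21 or in the HDD unit 22 is determined by which API is called.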
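The two-API arrangement described above can be illustrated with a minimal sketch. The source does not give the API names or signatures, so everything here is an assumption made for illustration: the class `HybridStore`, the methods `persist_to_ssd` and `persist_to_hdd` (standing in for the first and second API), and the use of two directories as stand-ins for the SSD unit and the HDD unit.

```python
import os
import pickle
import tempfile


class HybridStore:
    """Hypothetical stand-in for the hybrid storage module (illustrative names)."""

    def __init__(self, ssd_dir, hdd_dir):
        # Assumption: each directory sits on the corresponding device.
        self._dirs = {"SSD": ssd_dir, "HDD": hdd_dir}

    def _path(self, medium, key):
        return os.path.join(self._dirs[medium], key + ".bin")

    def _write(self, medium, key, data):
        with open(self._path(medium, key), "wb") as f:
            pickle.dump(data, f)

    def persist_to_ssd(self, key, data):
        """Plays the role of the first API: store on the SSD unit."""
        self._write("SSD", key, data)

    def persist_to_hdd(self, key, data):
        """Plays the role of the second API: store on the HDD unit."""
        self._write("HDD", key, data)

    def locate(self, key):
        """Return which unit holds the key, or None if neither does."""
        for medium in ("SSD", "HDD"):
            if os.path.exists(self._path(medium, key)):
                return medium
        return None

    def load(self, key):
        """Read the data back from whichever unit holds it."""
        medium = self.locate(key)
        if medium is None:
            raise KeyError(key)
        with open(self._path(medium, key), "rb") as f:
            return pickle.load(f)


def demo():
    # Temporary directories stand in for real SSD and HDD mount points.
    with tempfile.TemporaryDirectory() as ssd, tempfile.TemporaryDirectory() as hdd:
        store = HybridStore(ssd, hdd)
        store.persist_to_ssd("part_0", [1, 2, 3])
        store.persist_to_hdd("part_1", [4, 5])
        return store.locate("part_0"), store.locate("part_1"), store.load("part_1")
```

In this sketch the caller chooses the medium simply by choosing which method to call, which mirrors the selection mechanism in the text; a real integration with Spark would look different.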
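In a specific implementation, the SSD unit 21 and the HDD unit 22 are persistent storage units at the same tier. The processed data specifically includes RDD partition data. The Spark platform module is further configured to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio value.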
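The source does not say how the preset partition ratio value is applied. As one possible reading, the sketch below treats the ratio as the fraction of an RDD's partitions that go to the SSD unit, taken in listing order; the function name `assign_by_ratio` and the parameter `ssd_ratio` are hypothetical.

```python
def assign_by_ratio(partition_ids, ssd_ratio):
    """Map each partition id to "SSD" or "HDD" using a preset ratio.

    Assumption (not stated in the source): ssd_ratio is the fraction of
    partitions sent to the SSD unit, taken in listing order.
    """
    if not 0.0 <= ssd_ratio <= 1.0:
        raise ValueError("ssd_ratio must be between 0 and 1")
    # Number of partitions that land on the SSD unit.
    ssd_count = round(len(partition_ids) * ssd_ratio)
    return {
        pid: ("SSD" if index < ssd_count else "HDD")
        for index, pid in enumerate(partition_ids)
    }
```

For example, with eight partitions and `ssd_ratio=0.25`, the first two partitions map to the SSD unit and the remaining six to the HDD unit.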
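In a specific implementation, the Spark platform module 1 is further configured to persist RDD partition data to the SSD unit or the HDD unit according to the heat of the RDD partition data. With SSDs, I/O bandwidth can be effectively increased and access latency reduced, while HDDs still offer considerable storage efficiency for data with lower storage-performance requirements. In addition, a large amount of the data collected and captured by data centers is infrequently accessed; this is called cold data and accounts for about 90% of global data. The remaining 10%, which is accessed frequently after being collected and captured, is called hot data. Clearly, storing all data on high-performance, low-latency storage devices is unreasonable and extremely expensive. Therefore, combining the SSD unit 21 and the HDD unit 22 in a reasonable manner according to the heat of the RDD partition data, and thereby building a hybrid storage system, can bring a large performance improvement while keeping costs under control.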
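The source does not define how heat is measured. The sketch below assumes the simplest measure, an access count per partition compared against a fixed threshold; the class `HeatTracker` and the parameter `hot_threshold` are hypothetical names introduced for illustration.

```python
from collections import Counter


class HeatTracker:
    """Hypothetical heat tracker: heat is assumed to be an access count."""

    def __init__(self, hot_threshold):
        # Partitions accessed at least this many times count as hot.
        self.hot_threshold = hot_threshold
        self.hits = Counter()

    def record_access(self, partition_id):
        self.hits[partition_id] += 1

    def medium_for(self, partition_id):
        """Hot partitions go to the SSD unit, cold ones to the HDD unit."""
        if self.hits[partition_id] >= self.hot_threshold:
            return "SSD"
        return "HDD"
```

A partition that has never been accessed has a count of zero and is therefore treated as cold. Other heat measures (recency, decayed counts) would fit the same interface.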
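In a specific implementation, the distributed computing system further includes a capacity monitoring module connected to the hybrid storage module. The capacity monitoring module is configured to monitor the remaining capacity of the hybrid storage module and to output an alarm signal when the remaining capacity is less than a preset threshold. That is, the distributed computing system may further include a capacity monitoring module connected to the hybrid storage module 2; the capacity monitoring module monitors the remaining capacity of the hybrid storage module 2 and outputs alarm information when the remaining capacity is less than a preset threshold. The specific value of the preset threshold may be determined according to the capacity of the hybrid storage module 2, and outputting the alarm information may consist of sounding a speaker, flashing an alarm light, or the like. An alarm is raised when the remaining capacity of the hybrid storage module 2 is too low, reminding staff to migrate the stored data or replace the storage disk in time, which improves the reliability of data storage.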
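The threshold check performed by the capacity monitoring module can be sketched as follows. This is an illustration under stated assumptions: the threshold is expressed as a fraction of total capacity (the source only says it depends on the capacity), the alarm is a returned message rather than a speaker or light, and the names `check_capacity`, `check_path`, and `threshold_fraction` are hypothetical. `shutil.disk_usage` is a standard-library call that reports total, used, and free bytes for a path.

```python
import shutil


def check_capacity(free_bytes, total_bytes, threshold_fraction=0.1):
    """Return an alarm message if free space is below the threshold, else None.

    Assumption: the preset threshold is a fraction of total capacity.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    threshold_bytes = total_bytes * threshold_fraction
    if free_bytes < threshold_bytes:
        return "ALARM: {} bytes free, below threshold of {} bytes".format(
            free_bytes, int(threshold_bytes)
        )
    return None


def check_path(path, threshold_fraction=0.1):
    """Apply the same check to the file system that holds the given path."""
    usage = shutil.disk_usage(path)
    return check_capacity(usage.free, usage.total, threshold_fraction)
```

A monitoring loop could call `check_path` periodically for the SSD and HDD mount points and forward any non-empty message to whatever alerting device is in use.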
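The present invention also provides a data processing method of a distributed computing system according to an embodiment. As shown in FIG. 2, the data processing method includes the following steps:
Step S21: the Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage;
Step S22: the Spark platform module receives a query instruction, obtains the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs the data.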
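Steps S21 and S22 can be walked through with a small sketch. It does not use Spark: a word count stands in for the computation, and two dictionaries stand in for the SSD unit and the HDD unit. The names `PlatformModule`, `store_result`, `query`, and `process` are hypothetical, and the lookup order in `query` (SSD first, then HDD) is an assumption.

```python
def process(records):
    """Toy computation standing in for a Spark job: count words."""
    counts = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


class PlatformModule:
    """Hypothetical stand-in for the Spark platform module."""

    def __init__(self):
        # In-memory dictionaries stand in for the two storage units.
        self.units = {"SSD": {}, "HDD": {}}

    def store_result(self, key, records, medium):
        """Step S21: process the data and send the result to one unit."""
        if medium not in self.units:
            raise ValueError("medium must be 'SSD' or 'HDD'")
        self.units[medium][key] = process(records)

    def query(self, key):
        """Step S22: fetch the data for a query from whichever unit holds it."""
        for medium in ("SSD", "HDD"):
            if key in self.units[medium]:
                return medium, self.units[medium][key]
        raise KeyError(key)
```

The caller does not need to know where a result was stored in order to query it; `query` returns both the unit and the data so the mapping stays visible.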
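Because the Spark platform module is connected to the SSD unit and to the HDD unit respectively, the processed data can be sent to the SSD unit or the HDD unit for storage, so that accurate mapping and saving of data can be achieved.
In a specific implementation, the data processing method further includes the following step: monitoring, by a capacity monitoring module, the remaining capacity of the hybrid storage module, and outputting alarm information when the remaining capacity is less than a preset threshold. The specific value of the preset threshold may be determined according to the capacity of the hybrid storage module 2, and outputting the alarm information may consist of sounding a speaker, flashing an alarm light, or the like. An alarm is raised when the remaining capacity of the hybrid storage module 2 is too low, reminding staff to migrate the stored data or replace the storage disk in time, which improves the reliability of data storage.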
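In a specific implementation, the Spark platform module 1 includes a first API (Application Programming Interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit 22. The Spark platform module 1 is connected to the SSD unit 21 through the first API and to the HDD unit 22 through the second API for data transmission. Through the first API and the second API, the Spark platform module 1 can expose the structural features of the hybrid storage system to the user. The storage medium is selected by calling the first API or the second API; that is, whether data is stored in the SSD unit 21 or in the HDD unit 22 is determined by which API is called.
In a specific implementation, the SSD unit 21 and the HDD unit 22 are persistent storage units at the same tier. The processed data specifically includes RDD partition data. The Spark platform module is further configured to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio value.
In a specific implementation, the Spark platform module 1 is further configured to persist RDD partition data to the SSD unit or the HDD unit according to the heat of the RDD partition data. With SSDs, I/O bandwidth can be effectively increased and access latency reduced, while HDDs still offer considerable storage efficiency for data with lower storage-performance requirements. In addition, a large amount of the data collected and captured by data centers is infrequently accessed; this is called cold data and accounts for about 90% of global data. The remaining 10%, which is accessed frequently after being collected and captured, is called hot data. Clearly, storing all data on high-performance, low-latency storage devices is unreasonable and extremely expensive. Therefore, combining the SSD unit 21 and the HDD unit 22 in a reasonable manner according to the heat of the RDD partition data, and thereby building a hybrid storage system, can bring a large performance improvement while keeping costs under control.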
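In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Illustrative uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.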

Claims (10)

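  1. A distributed computing system, characterized by comprising a Spark platform module and a hybrid storage module, wherein the hybrid storage module comprises an SSD unit and an HDD unit, and the Spark platform module is connected to the SSD unit and to the HDD unit respectively;
    the Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage, and the Spark platform module is further configured to receive a query instruction, obtain the data corresponding to the query instruction from the SSD unit or the HDD unit, and output the data.
  2. The distributed computing system according to claim 1, characterized in that the Spark platform module comprises a first API corresponding to the SSD unit and a second API corresponding to the HDD unit, the Spark platform module is connected to the SSD unit through the first API, and the Spark platform module is connected to the HDD unit through the second API.
  3. The distributed computing system according to claim 1, characterized in that the SSD unit and the HDD unit are persistent storage units at the same tier.
  4. The distributed computing system according to claim 1, characterized in that the processed data specifically comprises RDD partition data.
  5. The distributed computing system according to claim 1, characterized in that the Spark platform module is further configured to persist RDD partition data to the SSD unit or the HDD unit according to the heat of the RDD partition data.
  6. The distributed computing system according to claim 1, characterized in that the Spark platform module is further configured to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio value.
  7. The distributed computing system according to claim 1, characterized in that the distributed computing system further comprises a capacity monitoring module connected to the hybrid storage module, the capacity monitoring module being configured to monitor the remaining capacity of the hybrid storage module and to output an alarm signal when the remaining capacity is less than a preset threshold.
  8. A data processing method of a distributed computing system, characterized by comprising the following steps:
    the Spark platform module uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit or the HDD unit for storage;
    the Spark platform module receives a query instruction, obtains the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs the data.
  9. The data processing method according to claim 8, characterized in that the processed data specifically comprises RDD partition data.
  10. The data processing method according to claim 8 or 9, characterized by further comprising the following step:
    monitoring, by a capacity monitoring module, the remaining capacity of the hybrid storage module, and outputting alarm information when the remaining capacity is less than a preset threshold.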
PCT/CN2017/085109 2017-05-19 2017-05-19 Distributed computing system and data processing method thereof WO2018209694A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085109 WO2018209694A1 (zh) 2017-05-19 2017-05-19 Distributed computing system and data processing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085109 WO2018209694A1 (zh) 2017-05-19 2017-05-19 Distributed computing system and data processing method thereof

Publications (1)

Publication Number Publication Date
WO2018209694A1 2018-11-22

Family

ID=64273154

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085109 WO2018209694A1 (zh) 2017-05-19 2017-05-19 Distributed computing system and data processing method thereof

Country Status (1)

Country Link
WO (1) WO2018209694A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
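CN104991958A (zh) * 2015-07-21 2015-10-21 Shandong Luneng Software Technology Co., Ltd. Analysis system and method for power equipment monitoring data
CN105426472A (zh) * 2015-11-16 2016-03-23 Guangzhou Power Supply Bureau Co., Ltd. Distributed computing system and data processing method thereof
CN106682116A (zh) * 2016-12-08 2017-05-17 Chongqing University of Posts and Telecommunications OPTICS point-ordering clustering method based on the Spark in-memory computing big data platform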

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17910199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.03.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17910199

Country of ref document: EP

Kind code of ref document: A1