WO2014180395A1 - Massive data fusion storage method and system - Google Patents
Massive data fusion storage method and system
- Publication number
- WO2014180395A1 (application PCT/CN2014/078558)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- storage engine
- nosql
- subsystem
- data
- write
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/256—Integrating or interfacing systems involving database management systems in federated or virtual databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Definitions
- the present invention relates to the field of communications, and in particular to a massive data fusion storage method and system.
- BACKGROUND OF THE INVENTION
- SQL: Structured Query Language
- NoSQL (Not Only SQL): query language and storage model for non-relational data
- the structured storage engine mainly handles structured data definition, loading, storage, query and analysis for massive data, and provides operations similar to those of a traditional relational database (data query, statistics, grouping and sorting) through SQL.
- the NoSQL storage engine provides a highly reliable, high-performance, column-oriented, scalable, distributed key-value (KEY-VALUE) database system, including creation and management of tables, column families and indexes; creation, storage, update, deletion, query and intelligent scanning of data; and load balancing in large-scale clusters.
- the structured storage engine and NoSQL storage engine are independent of each other in data physical storage, and have different application scenarios.
- current massive data processing systems have the following main problems: (1) the structured storage engine and the NoSQL storage engine are independent of each other in the physical storage of data, so the same copy of data can hardly support both SQL and key-value access, which cannot satisfy flexible and diverse business needs; (2) if, for business needs, the structured storage engine stores one copy of the data to support SQL and the NoSQL storage engine stores another copy of the same data to support key-value access, a massive data system ends up with a large amount of redundant data, and maintaining two copies also increases operation and maintenance costs.
- Embodiments of the present invention provide a mass data fusion storage method and system to solve at least the above problems.
- a massive data fusion storage method including: a data fusion subsystem receives a write operation request sent by a structured storage engine or a NoSQL storage engine, where the structured storage engine supports SQL structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; the data fusion subsystem performs a write operation on the distributed file subsystem according to the write operation request, where the distributed file subsystem is also used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.
- before the data fusion subsystem receives the write operation request, the method includes: the data fusion subsystem establishes a mapping relationship between the structured storage engine and the NoSQL storage engine on the metadata definition; the mapping relationship is used to maintain data consistency between the structured storage engine and the NoSQL storage engine.
- when the data fusion subsystem receives the write operation request sent by the structured storage engine, performing the write operation on the distributed file subsystem includes: the data fusion subsystem determines whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the NoSQL storage engine.
- when the data fusion subsystem receives the write operation request sent by the NoSQL storage engine, performing the write operation on the distributed file subsystem includes: the data fusion subsystem determines whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the structured storage engine.
- the method further includes: the data fusion subsystem receiving the write operation result fed back by the distributed file subsystem.
- a mass data fusion storage system including: a data fusion subsystem, a structured storage engine, a NoSQL storage engine, and a distributed file subsystem, where the data fusion subsystem includes: a first receiving module configured to receive a write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations.
- the data fusion subsystem further includes: an establishing module configured to establish a mapping relationship between the structured storage engine and the NoSQL storage engine on the metadata definition, where the mapping relationship is used to maintain data consistency between the structured storage engine and the NoSQL storage engine.
- the processing module includes: a first processing unit configured to determine, when the data fusion subsystem receives the write operation request sent by the structured storage engine, whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, the mapping relationship is established on the NoSQL storage engine.
- the processing module includes: a second processing unit configured to determine, when the data fusion subsystem receives the write operation request sent by the NoSQL storage engine, whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, the mapping relationship is established on the structured storage engine.
- the data fusion subsystem further comprises: a second receiving module, configured to receive a write operation result fed back by the distributed file subsystem.
- the embodiments integrate the physical data storage of the structured storage engine and the NoSQL storage engine, so that the same copy of data can support both SQL and key-value access, thereby providing a massive data fusion storage service that supports both SQL and NoSQL. This solves the problem in the related art that the structured storage engine and the NoSQL storage engine cannot meet flexible and diverse business needs and increase operation and maintenance costs, and thus both reduces data redundancy and lowers system operation and maintenance costs.
- FIG. 1 is a flow chart of a mass data fusion storage method according to an embodiment of the present invention
- FIG. 2 is a structural block diagram of a mass data fusion storage system according to an embodiment of the present invention
- FIG. 3 is a structural block diagram of a preferred mass data fusion storage system according to an embodiment of the present invention
- FIG. 4 is a schematic structural diagram of a massive data fusion storage system that simultaneously supports SQL and NoSQL according to a preferred embodiment of the present invention
- FIG. 5 is a schematic diagram of a structured storage engine write operation process according to a preferred embodiment of the present invention
- FIG. 6 is a schematic diagram of a NoSQL storage engine write operation process according to a preferred embodiment of the present invention
- FIG. 7 is a schematic diagram of the structured storage engine and NoSQL storage engine read operation flow according to a preferred embodiment of the present invention
- Step S102: the data fusion subsystem receives the write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations;
- Step S104: the data fusion subsystem performs a write operation on the distributed file subsystem according to the write operation request, where the distributed file subsystem is further configured to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.
- before step S102 is performed, the data fusion subsystem may establish a mapping relationship between the structured storage engine and the NoSQL storage engine on the metadata definition; the mapping relationship is used to maintain data consistency between the structured storage engine and the NoSQL storage engine.
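The text does not specify how the metadata mapping is represented. As a minimal sketch, assuming a SQL table whose columns are each tied to one NoSQL column (family, qualifier) pair and whose key column doubles as the NoSQL row key, the mapping could look like the following. The names `ColumnMapping`, `MetadataMapping`, `to_nosql` and `to_sql` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMapping:
    """Ties one SQL column to one NoSQL (column family, qualifier) pair."""
    sql_column: str
    column_family: str
    qualifier: str


@dataclass
class MetadataMapping:
    """Hypothetical mapping between a structured table and a NoSQL table."""
    sql_table: str
    nosql_table: str
    key_column: str  # assumption: this SQL column doubles as the NoSQL row key
    columns: list = field(default_factory=list)

    def to_nosql(self, row):
        """Convert a SQL-style row dict into (row_key, cells)."""
        cells = {
            (c.column_family, c.qualifier): row[c.sql_column]
            for c in self.columns
        }
        return row[self.key_column], cells

    def to_sql(self, row_key, cells):
        """Convert (row_key, cells) back into a SQL-style row dict."""
        row = {self.key_column: row_key}
        for c in self.columns:
            row[c.sql_column] = cells[(c.column_family, c.qualifier)]
        return row


mapping = MetadataMapping(
    sql_table="users",
    nosql_table="users_kv",
    key_column="id",
    columns=[
        ColumnMapping("name", "info", "name"),
        ColumnMapping("age", "info", "age"),
    ],
)
row_key, cells = mapping.to_nosql({"id": "u1", "name": "Ann", "age": 30})
round_trip = mapping.to_sql(row_key, cells)
```

Because both conversions are driven by the same mapping object, a record written through either view can be reinterpreted by the other, which is the consistency property the mapping relationship is meant to provide.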
- when the data fusion subsystem receives the write operation request sent by the structured storage engine, step S104 can be implemented in the following manner: the data fusion subsystem determines whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the NoSQL storage engine.
- when the data fusion subsystem receives the write operation request sent by the NoSQL storage engine, step S104 can be implemented in the following manner: the data fusion subsystem determines whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the structured storage engine.
- the data fusion subsystem may further receive a write operation result fed back by the distributed file subsystem.
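The decision in step S104 can be sketched as a small pure function. This is an illustration under stated assumptions, not the patented implementation: the engine names `"structured"` and `"nosql"`, the `Action` enum and the `decide` function are hypothetical, and the text does not say what happens when a simultaneous write is detected, so the `WAIT_FOR_LOCK` outcome is an assumption.

```python
from enum import Enum


class Action(Enum):
    ESTABLISH_MAPPING = "establish mapping on the other engine"
    WAIT_FOR_LOCK = "simultaneous write in progress (assumed handling)"
    FORWARD_WRITE = "send write request to the distributed file subsystem"


def decide(source_engine, mapping_exists, simultaneous_write):
    """Return (action, target_engine) for a write request from source_engine."""
    if source_engine not in ("structured", "nosql"):
        raise ValueError(f"unknown engine: {source_engine}")
    # The mapping is established on the engine that did NOT send the request.
    other = "nosql" if source_engine == "structured" else "structured"
    if not mapping_exists:
        return Action.ESTABLISH_MAPPING, other
    if simultaneous_write:
        return Action.WAIT_FOR_LOCK, None
    return Action.FORWARD_WRITE, None
```

For example, `decide("structured", False, False)` selects the NoSQL storage engine as the place to establish the mapping, mirroring the first case described above, while `decide("nosql", False, False)` selects the structured storage engine.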
- the embodiment of the invention provides a massive data fusion storage system for implementing the above massive data fusion storage method.
- FIG. 2 is a structural block diagram of a mass data fusion storage system according to an embodiment of the present invention.
- the system includes: a data fusion subsystem, a structured storage engine, a NoSQL storage engine, and a distributed file subsystem.
- the data fusion subsystem includes: a first receiving module 10, configured to receive a write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; and a processing module 20, configured to perform a write operation on the distributed file subsystem according to the write operation request.
- the distributed file subsystem is also used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine, and to perform a read operation according to the read operation request.
- FIG. 3 is a structural block diagram of a preferred mass data fusion storage system according to an embodiment of the present invention. As shown in FIG. 3, in the preferred mass data fusion storage system, the data fusion subsystem may further include: an establishing module 30, configured to establish the mapping relationship between the structured storage engine and the NoSQL storage engine on metadata definitions, where the mapping relationship is used to maintain data consistency between the structured storage engine and the NoSQL storage engine.
- the processing module 20 may include: a first processing unit 22 configured to determine, when the data fusion subsystem receives the write operation request sent by the structured storage engine, whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the NoSQL storage engine.
- the processing module 20 may include: a second processing unit 24 configured to determine, when the data fusion subsystem receives the write operation request sent by the NoSQL storage engine, whether a mapping relationship already exists between the structured storage engine and the NoSQL storage engine; if the result is yes and there is no simultaneous write, the write operation request is sent to the distributed file subsystem to perform the write data operation; if the result is no, a mapping relationship is established on the structured storage engine.
- the data fusion subsystem may further include: a second receiving module 40 configured to receive a write operation result fed back by the distributed file subsystem.
- the massive data fusion storage method and system provided by the foregoing embodiments integrate the physical data storage of the structured storage engine and the NoSQL storage engine, so that the same copy of data can support both SQL and key-value access. A massive data fusion storage service that supports both SQL and NoSQL in this way can meet flexible and diverse service needs.
- the massive data fusion storage method and system provided by the foregoing embodiments are described and illustrated in more detail below with reference to FIG. 4 to FIG. 7 and a preferred embodiment.
- the preferred embodiment mainly provides a method for massive data fusion storage that simultaneously supports SQL and NoSQL, and a massive data processing system that includes a structured storage engine, a NoSQL storage engine, a data fusion subsystem, and a distributed file subsystem.
- the method for massive data fusion storage that simultaneously supports SQL and NoSQL mainly includes:
- Step 1: the structured storage engine is responsible for structured data definition, loading, storage, query and analysis, and provides data query, statistics, grouping and sorting operations similar to those of a traditional relational database through SQL;
- Step 2: the NoSQL storage engine is responsible for creation and management of tables, column families and indexes, and for creation, storage, update, deletion, query and intelligent scanning of data;
- Step 3: the data fusion subsystem is responsible for write operations between the structured storage engine and the NoSQL storage engine;
- Step 4: the distributed file subsystem is responsible for distributed storage of massive data.
- step 1 may further comprise: a structured data write operation; a structured data read operation supporting SQL.
- step 2 may further include: a NoSQL data write operation; a NoSQL data read operation supporting the key value.
- step 3 may further include: establishing a mapping relationship between the data definitions of the structured storage engine and the NoSQL storage engine; and establishing a synchronization lock mechanism that ensures data consistency when the structured storage engine and the NoSQL storage engine write the same data at the same time.
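The text names a synchronization lock mechanism but does not describe it. One common way to get this effect, shown here as a sketch rather than as the patent's design, is a per-key mutex: writes to the same key from either engine are serialized, while writes to different keys do not block each other. `SyncLockModule`, `lock_for`, `append_write` and the in-memory `store` are hypothetical names introduced for this example.

```python
import threading
from collections import defaultdict


class SyncLockModule:
    """Hands out one lock per data key (assumed granularity)."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]


store = {}                # stand-in for the shared copy of the data
sync = SyncLockModule()


def append_write(engine, key, value):
    """Read-modify-write on the shared copy, serialized per key."""
    with sync.lock_for(key):
        current = store.get(key, [])
        store[key] = current + [(engine, value)]


# Both engines write the same key concurrently.
threads = [
    threading.Thread(
        target=append_write,
        args=("structured" if i % 2 == 0 else "nosql", "row-1", i),
    )
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock held across the whole read-modify-write, none of the 50 updates to `row-1` is lost, regardless of which engine issued it.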
- step 4 may further comprise: a write operation from the data fusion subsystem, and a read operation of the structured storage engine, the NoSQL storage engine.
- step 1 and step 3 may further include: when the structured storage engine performs a write operation, it sends the write request to the data fusion subsystem, and the data fusion subsystem establishes consistency with the data definition of the NoSQL storage engine through the mapping module.
- step 1 and step 4 may further include: when the structured storage engine performs a read operation, it sends the read request to the distributed file subsystem and accesses the same data in the distributed file subsystem according to the consistent metadata definition established above; the distributed file subsystem provides the data stored jointly for the structured storage engine and the NoSQL storage engine.
- step 2 and step 3 may further include: when the NoSQL storage engine performs a write operation, it sends the write request to the data fusion subsystem, and the data fusion subsystem establishes consistency with the data definition of the structured storage engine through the mapping module.
- step 2 and step 4 may further include: when the NoSQL storage engine performs a read operation, it sends the read request to the distributed file subsystem and accesses the same data in the distributed file subsystem according to the consistent metadata definition established above; the distributed file subsystem provides the data stored jointly for the structured storage engine and the NoSQL storage engine.
- step 3 and step 4 may further include: the data fusion subsystem receives write requests from the structured storage engine and the NoSQL storage engine and writes the corresponding data to the distributed file subsystem, so that a single shared copy of the data is stored in the distributed file subsystem.
- the massive data fusion storage system includes: (1) a structured storage engine, which is responsible for structured data definition, loading, storage, query and analysis, and provides data query, statistics, grouping and sorting operations similar to those of a traditional relational database through SQL. (2) a NoSQL storage engine, which is responsible for creation and management of tables, column families and indexes, and for creation, storage, update, deletion, query and intelligent scanning of data. (3) a data fusion subsystem, which is responsible for write operations between the structured storage engine and the NoSQL storage engine.
- FIG. 4 is a schematic structural diagram of a massive data fusion storage system that simultaneously supports SQL and NoSQL according to a preferred embodiment of the present invention.
- the massive data fusion storage system includes: a structured storage engine 101, a NoSQL storage engine 102, a data fusion subsystem 103, and a distributed file subsystem 104.
- the data fusion subsystem 103 includes a mapping module, which is responsible for maintaining consistency in the definition of the structured storage engine and the NoSQL storage engine metadata.
- the structured storage engine needs to specify the metadata definition of the data in the NoSQL storage engine when defining the metadata. Accordingly, the NoSQL storage engine needs to specify the metadata of the data in the structured storage engine when defining the metadata.
- the data fusion subsystem 103 also includes a synchronization lock module, which is responsible for ensuring transaction consistency when the structured storage engine and the NoSQL storage engine simultaneously write to the same data.
- the data fusion subsystem 103 also includes a data write module, which receives write operation requests from the structured storage engine and the NoSQL storage engine and writes the corresponding data to the distributed file subsystem, so that a single shared copy of the data is stored in the distributed file subsystem.
- the name or the number of each module that is supported by the SQL and NoSQL massive data fusion storage system provided by the preferred embodiment is the same as the name of each module in the massive data fusion storage system provided by the foregoing embodiment.
- the functions of the modules are also not completely identical, and may even include or overlap one another; this is only because the preferred embodiment is one preferred implementation of the above embodiment. The solutions provided by both are fully realizable and produce the same results.
- FIG. 5 is a schematic diagram of the structured storage engine write operation process according to a preferred embodiment of the present invention. As shown in FIG. 5, the structured storage engine write operation process includes:
- step S501: the structured storage engine 101 performs a write operation and sends the operation request to the data fusion subsystem 103.
- step S502: the data fusion subsystem 103 receives the write request of the structured storage engine 101 and determines whether there is a mapping relationship between the structured storage engine 101 and the NoSQL storage engine 102. If there is a mapping relationship, it further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing simultaneously; otherwise, a mapping relationship needs to be established on the NoSQL storage engine 102.
- step S503a: the data fusion subsystem 103 determines that the mapping relationship does not exist, and establishes a mapping relationship on the NoSQL storage engine 102 to ensure consistency with the structured storage engine 101 in the metadata definition.
- step S503b: the data fusion subsystem 103 determines that there is a mapping relationship, and further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing simultaneously.
- step S504 the data fusion subsystem 103 sends a write request to the distributed file subsystem 104.
- the distributed file subsystem 104 receives the request and starts writing data, and finally feeds back the write operation result to the data fusion subsystem 103.
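The write path of FIG. 5 can be traced end to end with two small in-memory stand-ins. This is a single-threaded sketch, so the simultaneous-write check of step S503b is assumed to pass, and it assumes the write proceeds to step S504 once the mapping has been established in step S503a. The class and method names are hypothetical, and the dictionary only stands in for a real distributed file system.

```python
class DistributedFileSubsystem:
    """In-memory stand-in for distributed file subsystem 104."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, record):
        self.blocks[key] = record
        return {"key": key, "status": "ok"}   # result fed back to the caller

    def read(self, key):
        return self.blocks[key]


class DataFusionSubsystem:
    """Stand-in for data fusion subsystem 103."""

    def __init__(self, dfs):
        self.dfs = dfs
        self.mapped_tables = set()   # tables whose mapping already exists
        self.log = []

    def handle_structured_write(self, table, key, record):
        # S502: check whether the mapping relationship exists.
        if table not in self.mapped_tables:
            # S503a: establish the mapping on the NoSQL storage engine.
            self.mapped_tables.add(table)
            self.log.append("S503a: mapping established on NoSQL engine")
        else:
            # S503b: mapping exists; this sketch assumes no simultaneous write.
            self.log.append("S503b: mapping exists, no simultaneous write")
        # S504: forward the write and return the fed-back result.
        result = self.dfs.write((table, key), record)
        self.log.append("S504: write forwarded")
        return result


dfs = DistributedFileSubsystem()
fusion = DataFusionSubsystem(dfs)
first = fusion.handle_structured_write("users", "u1", {"name": "Ann"})
second = fusion.handle_structured_write("users", "u2", {"name": "Bob"})
```

The first write takes the S503a branch and the second takes S503b; in both cases the result returned by the distributed file subsystem reaches the data fusion subsystem, matching the feedback step at the end of the flow.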
- FIG. 6 is a schematic diagram of the NoSQL storage engine write operation process according to a preferred embodiment of the present invention. As shown in FIG. 6, the NoSQL storage engine write operation process includes:
- step S601: the NoSQL storage engine 102 performs a write operation and sends the operation request to the data fusion subsystem 103.
- step S602: the data fusion subsystem 103 receives the write request of the NoSQL storage engine 102 and determines whether there is a mapping relationship between the structured storage engine 101 and the NoSQL storage engine 102. If there is a mapping relationship, it further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing simultaneously; otherwise, a mapping relationship needs to be established on the structured storage engine 101.
- step S603a: the data fusion subsystem 103 determines that the mapping relationship does not exist, and establishes a mapping relationship on the structured storage engine 101 to ensure consistency with the NoSQL storage engine 102 in the metadata definition.
- step S603b: the data fusion subsystem 103 determines that there is a mapping relationship, and further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing simultaneously.
- step S604 the data fusion subsystem 103 sends a write request to the distributed file subsystem 104.
- the distributed file subsystem 104 receives the request and starts writing data, and finally feeds back the write operation result to the data fusion subsystem 103.
- FIG. 7 is a schematic diagram of the structured storage engine and NoSQL storage engine read operation flow according to a preferred embodiment of the present invention. As shown in FIG. 7, the read operation flow includes: Step S701, the structured storage engine or the NoSQL storage engine performs a read operation.
- Step S702: data is read from the distributed file subsystem. Because the write operations in FIG. 5 and FIG. 6 ensure that the data definitions of the structured storage engine 101 and the NoSQL storage engine 102 are consistent, and the distributed file subsystem 104 holds a single shared copy of the data, the structured storage engine 101 and the NoSQL storage engine 102 can read directly from the distributed file subsystem 104.
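The effect of the read flow, two access styles over one stored copy, can be illustrated as follows. This is only a sketch: `shared_store` stands in for the distributed file subsystem, `nosql_get` for a key-value read, and `sql_select` for a SQL-style query with projection and a filter. All three names are hypothetical, and a real structured engine would parse and plan SQL rather than accept a Python predicate.

```python
# Single shared copy of the data (stand-in for the distributed file subsystem).
shared_store = {
    "u1": {"name": "Ann", "age": 30},
    "u2": {"name": "Bob", "age": 25},
}


def nosql_get(key):
    """Key-value style read: fetch one record by its row key."""
    return shared_store[key]


def sql_select(columns, where):
    """SQL-style read: project columns from rows that satisfy a predicate."""
    return [
        tuple(row[c] for c in columns)
        for _, row in sorted(shared_store.items())
        if where(row)
    ]


before = sql_select(["name"], lambda r: r["age"] > 26)

# An update to the shared copy is visible through both access styles.
shared_store["u2"]["age"] = 40
after = sql_select(["name"], lambda r: r["age"] > 26)
bob = nosql_get("u2")
```

Since neither reader keeps its own copy, there is nothing to synchronize on the read path, which is why the flow in FIG. 7 needs no involvement from the data fusion subsystem.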
- the data of the previously independent structured storage engine and NoSQL storage engine can be merged into a single copy, which solves the problems that the same data can hardly support both SQL and key-value access and that operation and maintenance costs are relatively high, and yields a massive data processing system with shared storage that supports both SQL and key-value access.
- each of the above modules can be implemented by hardware, for example a processor that includes the above modules, or with each of the above modules located in its own processor.
- software is also provided for performing the technical solutions described in the above embodiments and preferred embodiments.
- a storage medium storing the above software is also provided, including but not limited to: an optical disk, a floppy disk, a hard disk, a rewritable memory, and the like.
- the embodiment of the present invention achieves the following technical effects: existing systems support only the SQL mode or only the key-value mode and cannot meet the need for service flexibility, whereas the embodiment of the present invention merges the data of the previously independent structured storage engine and NoSQL storage engine into a single copy that supports both SQL and key-value access. A business can choose the appropriate access mode according to its needs, and because the structured storage engine and the NoSQL storage engine share data in the distributed file subsystem, data redundancy and system operation and maintenance costs are both reduced.
- the above modules or steps of the embodiments of the present invention can be implemented by a general computing device; they can be concentrated on a single computing device or distributed across multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in a different order, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module.
- the invention is not limited to any specific combination of hardware and software.
- the above is only the preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.
- the technical solution provided by the embodiments of the present invention can be applied to the field of communications. It solves the problem in the related art that the structured storage engine and the NoSQL storage engine cannot meet flexible and diverse business needs and increase operation and maintenance costs, and thereby both reduces data redundancy and lowers system operation and maintenance costs.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
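Disclosed are a massive data fusion storage method and system. The method includes: a data fusion subsystem receives a write operation request sent by a structured storage engine or a NoSQL storage engine, where the structured storage engine supports SQL structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; the data fusion subsystem performs a write operation on a distributed file subsystem according to the write operation request, where the distributed file subsystem is also used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request. This both reduces data redundancy and lowers system operation and maintenance costs.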
Description
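Massive data fusion storage method and system
Technical Field
The present invention relates to the field of communications, and in particular to a massive data fusion storage method and system.
Background
With the rapid development of the Internet, massive amounts of data are generated every day. According to the US Internet Data Center, data on the Internet will grow by 50% per year and double every two years, and more than 90% of the world's data today was generated in the last few years. How to process this massive data efficiently has been a hot topic in recent years.
Existing massive data processing systems include structured storage engines, which process relational data and support a language similar to Structured Query Language (SQL), and NoSQL (Not Only SQL) storage engines, which process non-relational data. The structured storage engine mainly handles structured data definition, loading, storage, query and analysis for massive data, and provides operations similar to those of a traditional relational database (data query, statistics, grouping and sorting) through SQL. The NoSQL storage engine provides a highly reliable, high-performance, column-oriented, scalable, distributed key-value (KEY-VALUE) database system, including creation and management of tables, column families and indexes, and creation, storage, update, deletion, query and intelligent scanning of data, and can achieve load balancing in large-scale clusters.
The structured storage engine and the NoSQL storage engine are independent of each other in the physical storage of data and have different application scenarios. For the same data, depending on business needs, it may be necessary both to run massive data queries, statistics, grouping and sorting through SQL and to perform efficient queries and intelligent scans by key-value, which is difficult to achieve in existing massive data processing systems. Existing massive data processing systems therefore have the following main problems: (1) the structured storage engine and the NoSQL storage engine are independent of each other in the physical storage of data, so the same copy of data can hardly support both SQL and key-value access, which cannot satisfy flexible and diverse business needs; (2) if, for business needs, the structured storage engine stores one copy of the data to support SQL and the NoSQL storage engine stores another copy of the same data to support key-value access, a massive data system ends up with a large amount of redundant data, and maintaining two copies also increases operation and maintenance costs.
No effective solution has yet been proposed for the problem in the related art that the structured storage engine and the NoSQL storage engine cannot meet flexible and diverse business needs and increase operation and maintenance costs.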
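Summary of the Invention

Embodiments of the present invention provide a mass data fusion storage method and system to solve at least the above problem.

According to one aspect of the embodiments of the present invention, a mass data fusion storage method is provided, comprising: a data fusion subsystem receives a write operation request sent by a structured storage engine or a NoSQL storage engine, where the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; and the data fusion subsystem performs a write operation on a distributed file subsystem according to the write operation request, where the distributed file subsystem is further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.

Preferably, before the data fusion subsystem receives the write operation request sent by the structured storage engine or the NoSQL storage engine, the method includes: the data fusion subsystem establishes a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions, the mapping relation being used to keep data consistent between the structured storage engine and the NoSQL storage engine.

Preferably, when the data fusion subsystem receives a write operation request sent by the structured storage engine, performing the write operation on the distributed file subsystem according to the write operation request includes: the data fusion subsystem determines whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, the data fusion subsystem sends the write operation request to the distributed file subsystem to perform the data write; if it does not exist, the data fusion subsystem establishes the mapping relation on the NoSQL storage engine.

Preferably, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, performing the write operation on the distributed file subsystem according to the write operation request includes: the data fusion subsystem determines whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, the data fusion subsystem sends the write operation request to the distributed file subsystem to perform the data write; if it does not exist, the data fusion subsystem establishes the mapping relation on the structured storage engine.

Preferably, after the data fusion subsystem performs the write operation on the distributed file subsystem according to the write operation request, the method further includes: the data fusion subsystem receives the write operation result fed back by the distributed file subsystem.

According to another aspect of the embodiments of the present invention, a mass data fusion storage system is provided, comprising a data fusion subsystem, a structured storage engine, a NoSQL storage engine and a distributed file subsystem. The data fusion subsystem comprises: a first receiving module, configured to receive a write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; and a processing module, configured to perform a write operation on the distributed file subsystem according to the write operation request. The distributed file subsystem is further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.

Preferably, the data fusion subsystem further comprises an establishing module, configured to establish a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions, the mapping relation being used to keep data consistent between the structured storage engine and the NoSQL storage engine.

Preferably, the processing module comprises a first processing unit, configured to determine, when the data fusion subsystem receives a write operation request sent by the structured storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the NoSQL storage engine.

Preferably, the processing module comprises a second processing unit, configured to determine, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the structured storage engine.

Preferably, the data fusion subsystem further comprises a second receiving module, configured to receive the write operation result fed back by the distributed file subsystem.

In the embodiments of the present invention, the physical data storage of the structured storage engine and the NoSQL storage engine is integrated so that a single copy of the data supports both SQL and key-value access, which provides a storage service that fuses SQL and NoSQL mass data. This solves the problem in the related art that structured storage engines and NoSQL storage engines cannot meet flexible and diverse business needs and lead to higher operation and maintenance costs, and thereby both reduces data redundancy and lowers system operation and maintenance costs.

Brief Description of the Drawings

The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:

Figure 1 is a flowchart of a mass data fusion storage method according to an embodiment of the present invention;

Figure 2 is a structural block diagram of a mass data fusion storage system according to an embodiment of the present invention;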
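Figure 3 is a structural block diagram of a preferred mass data fusion storage system according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of a mass data fusion storage system supporting both SQL and NoSQL according to a preferred embodiment of the present invention;

Figure 5 is a schematic diagram of the write operation processing flow of the structured storage engine according to a preferred embodiment of the present invention;

Figure 6 is a schematic diagram of the write operation processing flow of the NoSQL storage engine according to a preferred embodiment of the present invention;

Figure 7 is a schematic diagram of the read operation flow of the structured storage engine and the NoSQL storage engine according to a preferred embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.

An embodiment of the present invention provides a mass data fusion storage method. Figure 1 is a flowchart of the mass data fusion storage method according to an embodiment of the present invention. As shown in Figure 1, the method mainly includes the following steps (steps S102 to S104):

Step S102: the data fusion subsystem receives a write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations;

Step S104: the data fusion subsystem performs a write operation on the distributed file subsystem according to the write operation request, where the distributed file subsystem is further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.

Through these steps, the physical data storage of the structured storage engine and the NoSQL storage engine is integrated so that a single copy of the data supports both SQL and key-value access, which provides a storage service that fuses SQL and NoSQL mass data.

In this embodiment, before step S102 is performed, the data fusion subsystem may establish a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions. The mapping relation is used to keep data consistent between the structured storage engine and the NoSQL storage engine.

In one preferred implementation of this embodiment, when the data fusion subsystem receives a write operation request sent by the structured storage engine, step S104 may be implemented as follows: the data fusion subsystem determines whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, the data fusion subsystem sends the write operation request to the distributed file subsystem to perform the data write; if it does not exist, the data fusion subsystem establishes the mapping relation on the NoSQL storage engine.

In another preferred implementation of this embodiment, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, step S104 may be implemented as follows: the data fusion subsystem determines whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, the data fusion subsystem sends the write operation request to the distributed file subsystem to perform the data write; if it does not exist, the data fusion subsystem establishes the mapping relation on the structured storage engine.

In this embodiment, after step S104 is performed, the data fusion subsystem may also receive the write operation result fed back by the distributed file subsystem.

An embodiment of the present invention provides a mass data fusion storage system for implementing the above method. Figure 2 is a structural block diagram of the mass data fusion storage system according to an embodiment of the present invention. As shown in Figure 2, the system includes a data fusion subsystem, a structured storage engine, a NoSQL storage engine and a distributed file subsystem. The data fusion subsystem includes: a first receiving module 10, configured to receive a write operation request sent by the structured storage engine or the NoSQL storage engine, where the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations; and a processing module 20, configured to perform a write operation on the distributed file subsystem according to the write operation request. The distributed file subsystem is further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.

Figure 3 is a structural block diagram of a preferred mass data fusion storage system according to an embodiment of the present invention. As shown in Figure 3, in this preferred system the data fusion subsystem may further include an establishing module 30, configured to establish a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions, the mapping relation being used to keep data consistent between the structured storage engine and the NoSQL storage engine.

In this preferred system, the processing module 20 may include a first processing unit 22, configured to determine, when the data fusion subsystem receives a write operation request sent by the structured storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the NoSQL storage engine.

In this preferred system, the processing module 20 may include a second processing unit 24, configured to determine, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the structured storage engine.

In this preferred system, the data fusion subsystem may further include a second receiving module 40, configured to receive the write operation result fed back by the distributed file subsystem.

With the mass data fusion storage method and system provided by the above embodiments, the physical data storage of the structured storage engine and the NoSQL storage engine can be integrated so that a single copy of the data supports both SQL and key-value access, which provides a storage service that fuses SQL and NoSQL mass data and can meet flexible and diverse business needs.

The method and system provided by the above embodiments are described in more detail below with reference to Figures 4 to 7 and a preferred embodiment.

This preferred embodiment mainly provides a method that supports fused storage of SQL and NoSQL mass data, and a mass data processing system that includes a structured storage engine, a NoSQL storage engine, a data fusion subsystem and a distributed file subsystem.

First, an overall description of how this preferred embodiment is carried out. The method provided by this preferred embodiment mainly includes:

Step 1: the structured storage engine is responsible for structured data definition, loading, storage, querying and analysis, and offers querying, statistics, grouping and sorting through SQL, much as a traditional relational database does;

Step 2: the NoSQL storage engine is responsible for the creation and management of tables, column families and indexes, and for the creation, storage, update, deletion, querying and intelligent scanning of data;

Step 3: the data fusion subsystem is responsible for write operations between the structured storage engine and the NoSQL storage engine;

Step 4: the distributed file subsystem is responsible for distributed storage of the mass data.

Preferably, step 1 may further include: structured data write operations; and SQL-based structured data read operations.

Preferably, step 2 may further include: NoSQL data write operations; and key-value NoSQL data read operations.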
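Preferably, step 3 may further include: establishing a mapping relation between the data definitions of the structured storage engine and the NoSQL storage engine; and establishing a synchronization lock mechanism that guarantees data consistency when the structured storage engine and the NoSQL storage engine write the same data at the same time.

Preferably, step 4 may further include: write operations from the data fusion subsystem, and read operations from the structured storage engine and the NoSQL storage engine.

Preferably, between step 1 and step 3 the method may further include: when the structured storage engine performs a write operation, it sends the write request to the data fusion subsystem, and the data fusion subsystem uses its mapping module to establish consistency with the data definition of the NoSQL storage engine.

Preferably, between step 1 and step 4 the method may further include: when the structured storage engine performs a read operation, it sends the read request to the distributed file subsystem and, using the consistent metadata definition established above, accesses the same copy of data stored in the distributed file subsystem; the distributed file subsystem provides the data stored jointly for the structured storage engine and the NoSQL storage engine.

Preferably, between step 2 and step 3 the method may further include: when the NoSQL storage engine performs a write operation, it sends the write request to the data fusion subsystem, and the data fusion subsystem uses its mapping module to establish consistency with the data definition of the structured storage engine.

Preferably, between step 2 and step 4 the method may further include: when the NoSQL storage engine performs a read operation, it sends the read request to the distributed file subsystem and, using the consistent metadata definition established above, accesses the same copy of data stored in the distributed file subsystem; the distributed file subsystem provides the data stored jointly for the structured storage engine and the NoSQL storage engine.

Preferably, between step 3 and step 4 the method may further include: the data fusion subsystem receives write operation requests from the structured storage engine and the NoSQL storage engine and writes the corresponding data to the distributed file subsystem, so that a single shared copy of the data is stored in the distributed file subsystem.

For ease of understanding, refer also to Figure 4. The mass data fusion storage system provided by this preferred embodiment includes: (1) a structured storage engine, responsible for structured data definition, loading, storage, querying and analysis, which offers querying, statistics, grouping and sorting through SQL, much as a traditional relational database does; (2) a NoSQL storage engine, responsible for the creation and management of tables, column families and indexes, and for the creation, storage, update, deletion, querying and intelligent scanning of data; (3) a data fusion subsystem, responsible for write operations between the structured storage engine and the NoSQL storage engine; (4) a distributed file subsystem, responsible for distributed storage of the mass data.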
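Step 3 above has the data fusion subsystem establish a mapping relation between the data definitions of the two engines. As a minimal sketch of what such a mapping could look like, the Python below links the columns of a SQL table to column families and qualifiers of a NoSQL table. The source does not specify the layout of the mapping, so `TableMapping`, `ColumnMapping`, `row_key_column` and the `family:qualifier` cell naming are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMapping:
    """Links one SQL column to a NoSQL column family and qualifier (hypothetical layout)."""
    sql_column: str
    column_family: str
    qualifier: str


@dataclass
class TableMapping:
    """Metadata mapping between a SQL table and a NoSQL table (illustrative only)."""
    sql_table: str
    nosql_table: str
    row_key_column: str  # assumption: one SQL column serves as the NoSQL row key
    columns: list = field(default_factory=list)

    def row_to_key_value(self, row: dict) -> tuple:
        """Convert a SQL-style row into (row key, {"family:qualifier": value})."""
        key = str(row[self.row_key_column])
        cells = {
            f"{c.column_family}:{c.qualifier}": row[c.sql_column]
            for c in self.columns
            if c.sql_column in row
        }
        return key, cells

    def key_value_to_row(self, key: str, cells: dict) -> dict:
        """Convert a key-value record back into a SQL-style row."""
        row = {self.row_key_column: key}
        for c in self.columns:
            name = f"{c.column_family}:{c.qualifier}"
            if name in cells:
                row[c.sql_column] = cells[name]
        return row
```

With one such mapping per dataset, either engine can interpret the same stored record in its own terms, which is the property the shared copy in the distributed file subsystem relies on. Note that in this sketch the row key is always stored as a string.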
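The method and system of this preferred embodiment, which support fused storage of SQL and NoSQL mass data, are described further below.

Figure 4 is a schematic structural diagram of a mass data fusion storage system supporting both SQL and NoSQL according to a preferred embodiment of the present invention. As shown in Figure 4, the system includes a structured storage engine 101, a NoSQL storage engine 102, a data fusion subsystem 103 and a distributed file subsystem 104, where:

The data fusion subsystem 103 contains a mapping module, which maintains consistency between the metadata definitions of the structured storage engine and the NoSQL storage engine. When the structured storage engine defines metadata, it must also specify the metadata definition of that data in the NoSQL storage engine; likewise, when the NoSQL storage engine defines metadata, it must also specify the metadata definition of that data in the structured storage engine.

The data fusion subsystem 103 also contains a synchronization lock module, which guarantees transactional consistency when the structured storage engine and the NoSQL storage engine write the same data at the same time.

The data fusion subsystem 103 also contains a data write module, which receives write operation requests from the structured storage engine and the NoSQL storage engine and writes the corresponding data to the distributed file subsystem, so that a single shared copy of the data is stored in the distributed file subsystem.

It should be noted that the names and number of the modules in the system of this preferred embodiment differ from those in the system of the foregoing embodiment, and that the modules' functions are not identical and may even contain or overlap one another. This is simply because the preferred embodiment is only one preferred implementation of the foregoing embodiment; both solutions are fully implementable and produce the same effect.

Figure 5 is a schematic diagram of the write operation processing flow of the structured storage engine according to a preferred embodiment of the present invention. As shown in Figure 5, the flow includes:

Step S501: the structured storage engine 101 performs a write operation and sends the operation request to the data fusion subsystem 103.

Step S502: the data fusion subsystem 103 receives the write request from the structured storage engine 101 and determines whether a mapping relation exists between the structured storage engine 101 and the NoSQL storage engine 102. If a mapping relation exists, it must further determine whether the structured storage engine 101 and the NoSQL storage engine 102 are writing at the same time; otherwise, a mapping relation must be established on the NoSQL storage engine 102.

Step S503a: the data fusion subsystem 103 determines that no mapping relation exists and establishes one on the NoSQL storage engine 102, guaranteeing consistency with the metadata definition of the structured storage engine 101.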
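Step S503b: the data fusion subsystem 103 determines that a mapping relation exists and then further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing at the same time.

Step S504: the data fusion subsystem 103 sends the write request to the distributed file subsystem 104; the distributed file subsystem 104 receives the request and starts writing the data, and finally feeds the write operation result back to the data fusion subsystem 103.

Figure 6 is a schematic diagram of the write operation processing flow of the NoSQL storage engine according to a preferred embodiment of the present invention. As shown in Figure 6, the flow includes:

Step S601: the NoSQL storage engine 102 performs a write operation and sends the operation request to the data fusion subsystem 103.

Step S602: the data fusion subsystem 103 receives the write request from the NoSQL storage engine 102 and determines whether a mapping relation exists between the structured storage engine 101 and the NoSQL storage engine 102. If a mapping relation exists, it must further determine whether the structured storage engine 101 and the NoSQL storage engine 102 are writing at the same time; otherwise, a mapping relation must be established on the structured storage engine 101.

Step S603a: the data fusion subsystem 103 determines that no mapping relation exists and establishes one on the structured storage engine 101, guaranteeing consistency with the metadata definition of the NoSQL storage engine 102.

Step S603b: the data fusion subsystem 103 determines that a mapping relation exists and then further determines whether the structured storage engine 101 and the NoSQL storage engine 102 are writing at the same time.

Step S604: the data fusion subsystem 103 sends the write request to the distributed file subsystem 104; the distributed file subsystem 104 receives the request and starts writing the data, and finally feeds the write operation result back to the data fusion subsystem 103.

Figure 7 is a schematic diagram of the read operation flow of the structured storage engine and the NoSQL storage engine according to a preferred embodiment of the present invention. As shown in Figure 7, the flow includes:

Step S701: the structured storage engine or the NoSQL storage engine performs a read operation.

Step S702: data is read from the distributed file subsystem.

Because the write operations in Figures 5 and 6 guarantee that the data definitions of the structured storage engine 101 and the NoSQL storage engine 102 are consistent, and because both engines share a single copy of the data in the distributed file subsystem 104, both the structured storage engine 101 and the NoSQL storage engine 102 can read directly from the distributed file subsystem 104.

By implementing the above preferred embodiment, the data of the previously independent structured storage engine and NoSQL storage engine can be merged into a single copy. This solves the problems that a single copy of data can hardly support both SQL and key-value access and that operation and maintenance costs are high when the data is kept separately, and yields a mass data processing system with shared storage that supports both SQL and key-value access.

It should be noted that the above modules can be implemented in hardware. For example, one processor includes all of the above modules, or each of the above modules resides in its own processor.

In another embodiment, software is also provided, which is used to carry out the technical solutions described in the above embodiments and preferred implementations.

In another embodiment, a storage medium is also provided, in which the above software is stored. The storage medium includes, but is not limited to, an optical disc, a floppy disk, a hard disk and rewritable memory.

From the above description it can be seen that the embodiments of the present invention achieve the following technical effects. Existing systems support either SQL access only or key-value access only and cannot meet the need for business flexibility. The embodiments of the present invention merge the data of the previously independent structured storage engine and NoSQL storage engine into a single copy that supports both SQL and key-value access, so a business can choose whichever access mode suits it. Moreover, the structured storage engine and the NoSQL storage engine share one copy of the data in the distributed file subsystem, which both reduces data redundancy and lowers system operation and maintenance costs.

Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any particular combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Industrial Applicability

The technical solution provided by the embodiments of the present invention can be applied to the field of communications. It solves the problem in the related art that structured storage engines and NoSQL storage engines cannot meet flexible and diverse business needs and lead to higher operation and maintenance costs, and thereby both reduces data redundancy and lowers system operation and maintenance costs.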
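The write flows of Figures 5 and 6 and the read flow of Figure 7 can be sketched roughly in Python as follows. This is an illustration under stated assumptions, not the implementation described in the application: the class and method names are hypothetical, the distributed file subsystem is replaced by an in-memory dictionary, the synchronization lock module is modelled as a single `threading.Lock` that serializes writes, and the write is assumed to proceed once a missing mapping has been created (the step descriptions suggest this but do not state it outright).

```python
import threading


class DistributedFileSubsystem:
    """In-memory stand-in for the distributed file subsystem (assumption)."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value
        return True  # write result fed back to the data fusion subsystem

    def read(self, key):
        # Both engines read directly from the shared copy (S701 and S702).
        return self._data.get(key)


class DataFusionSubsystem:
    """Hypothetical sketch of the mapping check, lock and write forwarding."""

    ENGINES = ("sql", "nosql")

    def __init__(self, dfs):
        self._dfs = dfs
        self._mapped = set()           # datasets whose metadata mapping exists
        self._lock = threading.Lock()  # stands in for the synchronization lock module

    def handle_write(self, source_engine, dataset, key, value):
        """Process one write request and return a list of the steps taken."""
        if source_engine not in self.ENGINES:
            raise ValueError(f"unknown engine: {source_engine}")
        steps = []
        # Holding the lock means two engines never write at the same time.
        with self._lock:
            if dataset not in self._mapped:
                # S503a / S603a: create the mapping on the *other* engine.
                other = "nosql" if source_engine == "sql" else "sql"
                self._mapped.add(dataset)
                steps.append(f"mapping created on {other}")
            # S504 / S604: forward the write and collect the result.
            ok = self._dfs.write((dataset, key), value)
            steps.append("written" if ok else "failed")
        return steps


if __name__ == "__main__":
    dfs = DistributedFileSubsystem()
    fusion = DataFusionSubsystem(dfs)
    print(fusion.handle_write("sql", "users", "7", {"name": "Li"}))
    print(fusion.handle_write("nosql", "users", "8", {"name": "Wang"}))
    print(dfs.read(("users", "7")))
```

Serializing all writes behind one lock is the simplest way to honor the "no simultaneous write" condition in a sketch; the application does not say how a simultaneous write is resolved, so a finer-grained or distributed lock would be an equally plausible reading.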
Claims
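1. A mass data fusion storage method, comprising:
a data fusion subsystem receiving a write operation request sent by a structured storage engine or a NoSQL storage engine, wherein the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations;
the data fusion subsystem performing a write operation on a distributed file subsystem according to the write operation request, wherein the distributed file subsystem is further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.
2. The method according to claim 1, wherein, before the data fusion subsystem receives the write operation request sent by the structured storage engine or the NoSQL storage engine, the method comprises:
the data fusion subsystem establishing a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions, the mapping relation being used to keep data consistent between the structured storage engine and the NoSQL storage engine.
3. The method according to claim 1 or 2, wherein, when the data fusion subsystem receives a write operation request sent by the structured storage engine, the data fusion subsystem performing a write operation on the distributed file subsystem according to the write operation request comprises:
the data fusion subsystem determining whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, sending the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, establishing the mapping relation on the NoSQL storage engine.
4. The method according to claim 1 or 2, wherein, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, the data fusion subsystem performing a write operation on the distributed file subsystem according to the write operation request comprises:
the data fusion subsystem determining whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, sending the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, establishing the mapping relation on the structured storage engine.
5. The method according to claim 1 or 2, wherein, after the data fusion subsystem performs the write operation on the distributed file subsystem according to the write operation request, the method further comprises:
the data fusion subsystem receiving the write operation result fed back by the distributed file subsystem.
6. A mass data fusion storage system, comprising a data fusion subsystem, a structured storage engine, a NoSQL storage engine and a distributed file subsystem, wherein the data fusion subsystem comprises:
a first receiving module, configured to receive a write operation request sent by the structured storage engine or the NoSQL storage engine, wherein the structured storage engine supports SQL-based structured data read and write operations and the NoSQL storage engine supports key-value NoSQL data read and write operations;
a processing module, configured to perform a write operation on the distributed file subsystem according to the write operation request;
the distributed file subsystem being further used to receive a read operation request sent by the structured storage engine or the NoSQL storage engine and to perform a read operation according to the read operation request.
7. The system according to claim 6, wherein the data fusion subsystem further comprises:
an establishing module, configured to establish a mapping relation between the structured storage engine and the NoSQL storage engine on their metadata definitions, the mapping relation being used to keep data consistent between the structured storage engine and the NoSQL storage engine.
8. The system according to claim 6 or 7, wherein the processing module comprises:
a first processing unit, configured to determine, when the data fusion subsystem receives a write operation request sent by the structured storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the NoSQL storage engine.
9. The system according to claim 6 or 7, wherein the processing module comprises:
a second processing unit, configured to determine, when the data fusion subsystem receives a write operation request sent by the NoSQL storage engine, whether the mapping relation already exists between the structured storage engine and the NoSQL storage engine; if it exists and there is no simultaneous write, to send the write operation request to the distributed file subsystem to perform the data write; and if it does not exist, to establish the mapping relation on the structured storage engine.
10. The system according to claim 6 or 7, wherein the data fusion subsystem further comprises:
a second receiving module, configured to receive the write operation result fed back by the distributed file subsystem.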
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14795196.6A EP3082050A4 (en) | 2013-12-10 | 2014-05-27 | MASS DATA FUSION STORAGE PROCESS AND SYSTEM |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310669985.2A CN104699720A (zh) | 2013-12-10 | 2013-12-10 | Mass data fusion storage method and system |
CN201310669985.2 | 2013-12-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014180395A1 (zh) | 2014-11-13 |
Family
ID=51866775
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/078558 WO2014180395A1 (zh) | 2013-12-10 | 2014-05-27 | Mass data fusion storage method and system |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3082050A4 (zh) |
CN (1) | CN104699720A (zh) |
WO (1) | WO2014180395A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10216823B2 (en) | 2017-05-31 | 2019-02-26 | HarperDB, Inc. | Systems, methods, and apparatus for hierarchical database |
CN111832034A (zh) * | 2019-04-23 | 2020-10-27 | 创新先进技术有限公司 | Multi-party data fusion method and device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111949650A (zh) * | 2019-05-15 | 2020-11-17 | 华为技术有限公司 | Multi-language fusion query method and multi-mode database system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101158958A (zh) * | 2007-10-23 | 2008-04-09 | 浙江大学 | Fusion query method based on the MySQL storage engine |
CN101477568A (zh) * | 2009-02-12 | 2009-07-08 | 清华大学 | Integrated retrieval method for structured and unstructured data |
US20130238544A1 (en) * | 2012-03-06 | 2013-09-12 | Samsung Electronics Co., Ltd. | Near real-time analysis of dynamic social and sensor data to interpret user situation |
CN103366311A (zh) * | 2013-07-11 | 2013-10-23 | 昆明能讯科技有限责任公司 | Data fusion processing method based on multiple substation systems |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9460189B2 (en) * | 2010-09-23 | 2016-10-04 | Microsoft Technology Licensing, Llc | Data model dualization |
US8954478B2 (en) * | 2010-09-28 | 2015-02-10 | Yiftach Shoolman | Systems, methods, and media for managing RAM resources for in-memory NoSQL databases |
US8447721B2 (en) * | 2011-07-07 | 2013-05-21 | Platfora, Inc. | Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines |
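CN102446226B (zh) * | 2012-01-16 | 2015-09-16 | 北大方正集团有限公司 | Method for implementing a NoSQL key-value storage engine |
CN103049482B (zh) * | 2012-11-30 | 2015-12-09 | 国家电网公司 | Method for implementing data fusion storage in a distributed heterogeneous system |
CN103198150B (zh) * | 2013-04-24 | 2016-04-20 | 清华大学 | Big data indexing method and system |
CN103425785A (zh) * | 2013-08-22 | 2013-12-04 | 新浪网技术(中国)有限公司 | Data storage system and method for storing and reading user data thereof |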
- 2013-12-10 CN CN201310669985.2A patent/CN104699720A/zh active Pending
- 2014-05-27 EP EP14795196.6A patent/EP3082050A4/en not_active Ceased
- 2014-05-27 WO PCT/CN2014/078558 patent/WO2014180395A1/zh active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10216823B2 (en) | 2017-05-31 | 2019-02-26 | HarperDB, Inc. | Systems, methods, and apparatus for hierarchical database |
US10956448B2 (en) | 2017-05-31 | 2021-03-23 | HarperDB, Inc. | Systems, methods, and apparatus for hierarchical database |
CN111832034A (zh) * | 2019-04-23 | 2020-10-27 | 创新先进技术有限公司 | Multi-party data fusion method and device |
CN111832034B (zh) * | 2019-04-23 | 2024-04-30 | 创新先进技术有限公司 | Multi-party data fusion method and device |
Also Published As
Publication number | Publication date |
---|---|
EP3082050A1 (en) | 2016-10-19 |
EP3082050A4 (en) | 2016-12-07 |
CN104699720A (zh) | 2015-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11372888B2 (en) | Adaptive distribution for hash operations | |
US10509785B2 (en) | Policy-driven data manipulation in time-series database systems | |
US10642840B1 (en) | Filtered hash table generation for performing hash joins | |
US10216770B1 (en) | Scaling stateful clusters while maintaining access | |
CN103312791B (zh) | Internet of Things heterogeneous data storage method and system | |
US20200319810A1 (en) | Deduplication of encrypted data within a remote data store | |
US10169438B2 (en) | Determining common table definitions in distributed databases | |
WO2012083679A1 (zh) | Data migration method, data migration device and data migration system | |
WO2015039569A1 (zh) | Replica storage device and replica storage method | |
US11841845B2 (en) | Data consistency mechanism for hybrid data processing | |
US10387384B1 (en) | Method and system for semantic metadata compression in a two-tier storage system using copy-on-write | |
WO2014180395A1 (zh) | Mass data fusion storage method and system | |
WO2016206100A1 (zh) | Partition management method and device for a data table | |
US11269930B1 (en) | Tracking granularity levels for accessing a spatial index | |
WO2023066222A1 (zh) | Data processing method, apparatus, electronic device, storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14795196 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REEP | Request for entry into the european phase |
Ref document number: 2014795196 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2014795196 Country of ref document: EP |