CN101950297A - Method and device for storing and inquiring mass semantic data - Google Patents

Method and device for storing and querying massive semantic data

Info

Publication number: CN101950297A
Authority: CN
Grant status: Application
Prior art keywords: data, cluster, mass, method, node
Application number: CN 201010279073
Other languages: Chinese (zh)
Inventor: 赵东岩 (Zhao Dongyan), 邹磊 (Zou Lei), 陈岩光 (Chen Yanguang)
Original Assignee: 北京大学 (Peking University)

Abstract

The invention provides a method and a device for storing and querying massive semantic data. The method comprises: building a cloud-computing-based cluster of multiple computer nodes, the cluster comprising one master node and several slave nodes; establishing a distributed database on each node of the cluster; and importing the massive semantic data into the distributed database of each node using a cloud computing programming model. Storage is thereby distributed, which resolves the storage bottleneck and management difficulties that a stand-alone system encounters when processing massive data. If the data scale continues to grow, only slave nodes need to be added to the cluster, so the system scales well. In addition, for querying massive data, the method splits a query using RMI technology so that the nodes of the cluster can process the resulting sub-queries simultaneously, which makes queries faster and more efficient.

Description

A method and apparatus for storing and querying massive semantic data

Technical Field

[0001] The present invention relates to the fields of database technology, cloud computing, distributed computing, and the Semantic Web, and in particular to a method and apparatus for storing and querying massive semantic data using cloud computing technology.

Background

[0002] Semantic data is data that represents the attribute information of entities and the semantic relationships between entities. It can generally be represented as a set of triples of the form <subject, predicate, object>. For example, the triples <Zhang San, birthplace, Beijing>, <Zhang San, advisor, Li Si>, ..., <Zhang San, graduated from, Peking University> express a series of attributes of Zhang San as well as information about the entities related to Zhang San.
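
As an illustrative sketch only (it is not part of the patent text), the triples above can be modeled as plain tuples and grouped by subject. The names `triples` and `attributes_of` are hypothetical.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object), as in the example above.
triples = [
    ("Zhang San", "birthplace", "Beijing"),
    ("Zhang San", "advisor", "Li Si"),
    ("Zhang San", "graduated from", "Peking University"),
]

def attributes_of(triples):
    """Group (predicate, object) pairs under each subject."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)

print(attributes_of(triples)["Zhang San"])
```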

[0003] The traditional way to store and query such semantic data is to keep it in a stand-alone database and to query it using the database's tables and associated indexing techniques. This approach presupposes that the data volume is small, that is, within the scale a stand-alone database can handle. However, with the continuous development of the Internet, the volume of information has grown explosively, and the scale of semantic data has grown with it: many semantic data sets now contain hundreds of millions to billions of triples. At this scale the traditional approach can no longer handle storage and querying well, because neither the memory nor the disk of a single machine can support the management of such massive data. Moreover, if the data volume keeps increasing, a stand-alone database cannot scale with it. Traditional stand-alone databases therefore can no longer meet the growing demand for storing and querying massive semantic data.

Summary of the Invention

[0004] The present invention overcomes the shortcomings of the prior art and provides a method and apparatus for storing and querying massive semantic data based on a cloud computing platform, which support the management of massive semantic data and scale well.

[0005] To achieve the storage method of the present invention, the following technical solution is adopted:

[0006] A method for storing massive semantic data, comprising the steps of:

[0007] 1) building a cloud-computing-based cluster of multiple computer nodes, the cluster comprising one master node and several slave nodes;

[0008] 2) establishing a distributed database on every node of the cluster, representing the semantic data as triples of the form <subject, predicate, object>, and mapping them to the distributed database; importing the semantic data into the distributed database using a cloud computing programming model;

[0009] 3) building indexes on the semantic data and storing them in the distributed database using the cloud computing programming model.

[0010] The mapping in step 2) is as follows: each predicate of a triple corresponds to an attribute column of the distributed database, the subject of a triple corresponds to a row key (Rowkey), and the object of a triple corresponds to the value of an attribute column in that row.

[0011] The programming model in step 3) is the MapReduce programming model, which consists of a Map function and a Reduce function. For data import, the Map and Reduce functions are implemented as follows:

[0012] Map: <file line number, triple> -> <subject, predicate + object>

[0013] Reduce: <subject, List(predicate + object)> -> import into the database.

[0014] When the indexes are built in step 3), each index table corresponds to exactly one predicate; the row key of an index table is an object, and each column of that row is a subject.

[0015] The programming model in step 3) is the MapReduce programming model, which consists of a Map function and a Reduce function. For index building, the Map and Reduce functions are implemented as follows:

[0016] Map: <file line number, triple> -> <object + predicate, subject>

[0017] Reduce: <object + predicate, List(subject)> -> import into the database.

[0018] To achieve the query method of the present invention, the following technical solution is adopted:

[0019] A method for querying massive semantic data, comprising the steps of:

[0020] 6-1) building a cloud-computing-based cluster of multiple computer nodes, the cluster comprising one master node and several slave nodes;

[0021] 6-2) establishing a distributed database on every node of the cluster, in which the semantic data is stored, the semantic data being represented as triples of the form <subject, predicate, object>;

[0022] 6-3) running a query service on every node of the cluster, dividing a given query condition into several parts, sending each part to a different node for querying, and finally collecting the query results of all nodes at the master node.

[0023] In step 6-2), the semantic data is bit-serialized: a bit sequence is set for each row of the data table, whose length is the number of distinct predicates appearing in the semantic data; for a given row, if an attribute column has a value, the corresponding bit is set to 1, otherwise it is set to 0.

[0024] In step 6-3), the priority of the query conditions is determined by counting in advance the number of triples associated with each predicate in the semantic data.

[0025] In step 6-3), the query is sent to the nodes of the cluster by means of RMI for distributed querying.

[0026] To achieve the storage apparatus of the present invention, the following technical solution is adopted:

[0027] An apparatus for storing massive semantic data, characterized in that a cluster of multiple computer nodes is built on a cloud computing platform, the cluster comprising one master node and several slave nodes, and the nodes comprise:

[0028] a data import unit, which imports the massive semantic data into the distributed database of each node using a cloud computing programming model;

[0029] an indexing unit, which builds indexes on the semantic data using the cloud computing programming model.

[0030] To achieve the query apparatus of the present invention, the following technical solution is adopted:

[0031] An apparatus for querying massive semantic data, characterized in that a cluster of multiple computer nodes is built on a cloud computing platform, the cluster comprising one master node and several slave nodes, and the nodes comprise:

[0032] a query splitting unit, which splits a query into several parts and sends them to the nodes of the cluster for distributed querying;

[0033] a query merging unit, which merges the query results of the nodes;

[0034] a query result display unit, which displays the query graph entered by the user and the query results that satisfy the query conditions.

[0035] The apparatus further comprises:

[0036] a data statistics unit, which records statistical information about the semantic data;

[0037] a bit serialization unit, which serializes the semantic data into bit sequences;

[0038] a query optimization unit, which optimizes queries using the statistical information and the bit sequences.

[0039] Compared with the prior art, the method of the present invention distributes storage, which resolves the storage bottleneck and management difficulties that a stand-alone system encounters when processing massive data. If the data scale continues to grow, only slave nodes need to be added to the cluster, so the system scales well. In addition, for querying massive data, the embodiment of the present invention uses RMI technology to split a query so that the nodes of the cluster can process the resulting sub-queries simultaneously, which makes queries faster and more efficient.

Brief Description of the Drawings

[0040] FIG. 1 is a diagram of the method for storing massive semantic data in an embodiment of the present invention;

[0041] FIG. 2 is a diagram of the method for querying massive semantic data in an embodiment of the present invention;

[0042] FIG. 3 is a structural diagram of the apparatus for storing massive semantic data in an embodiment of the present invention;

[0043] FIG. 4 is a structural diagram of the apparatus for querying massive semantic data in an embodiment of the present invention;

[0044] FIG. 5a shows the processing performed by the Map function during semantic data import;

[0045] FIG. 5b shows the processing performed by the Reduce function during semantic data import;

[0046] FIG. 6a shows the processing performed by the Map function during index building;

[0047] FIG. 6b shows the processing performed by the Reduce function during index building;

[0048] FIG. 7 is the query result diagram for a query in which all subjects and objects are variables.

Detailed Description

[0049] The embodiment of the present invention is based on the Hadoop cloud computing platform, on which a cluster of multiple computer nodes is built; the cluster comprises one master node and several slave nodes. The master node controls the detailed storage information of the slave nodes, schedules storage units, assigns computing tasks, monitors node status, and balances the cluster load. The slave nodes store the actual data, carry out the computing tasks assigned by the master node, and report their storage information, computation information, and current status to the master node. The distributed database HBase is set up on this cluster so that every node of the cluster provides storage; capacity is therefore no longer a concern when storing massive semantic data. When the data keeps growing, it suffices to add slave nodes to the cluster, so the system scales very well. The MapReduce programming framework is used for data import and index building so that all nodes of the cluster carry out computing tasks at the same time, which shortens the time needed to import massive data.

[0050] An embodiment of the present invention provides a method for storing massive semantic data, comprising:

[0051] storing the massive semantic data in HBase on the Hadoop cloud computing platform;

[0052] storing the indexes of the massive semantic data in HBase on the Hadoop cloud computing platform;

[0053] importing the massive semantic data into HBase using the MapReduce cloud computing programming model;

[0054] building indexes on the massive semantic data using the MapReduce cloud computing programming model.

[0055] An embodiment of the present invention provides a method for querying massive semantic data, comprising:

[0056] recording statistical information about the semantic data and using it for query optimization;

[0057] bit-serializing the semantic data and using the bit sequences for query optimization;

[0058] using RMI technology so that multiple nodes of the cluster execute the query simultaneously.

[0059] An embodiment of the present invention provides an apparatus for storing massive semantic data, comprising:

[0060] a data import unit, which imports the massive semantic data into HBase using the MapReduce cloud computing programming model;

[0061] an indexing unit, which builds indexes on the massive semantic data using the MapReduce cloud computing programming model.

[0062] An embodiment of the present invention provides an apparatus for querying massive semantic data, comprising:

[0063] a data statistics unit, which records statistical information about the semantic data;

[0064] a bit serialization unit, which serializes the semantic data into bit sequences;

[0065] a query optimization unit, which optimizes queries using the statistical information and the bit sequences;

[0066] a query splitting unit, which splits a query into several parts and sends them to the nodes of the cluster using RMI technology for distributed querying;

[0067] a query merging unit, which merges the query results of the nodes;

[0068] a query result display unit, which displays the query graph entered by the user and the query results that satisfy the query conditions.

[0069] Referring to FIG. 1, the method for storing massive semantic data in this embodiment comprises:

[0070] Step 101: store the massive semantic data in HBase on the Hadoop cloud computing platform.

[0071] An HBase table resembles a property table: each column can represent one attribute. Unlike a property table, however, an HBase table supports the storage of sparse data and multi-valued data. Table 1 shows an example of an HBase table:

[0072]

Figure CN101950297AD00071

[0073] Table 1

[0074] Table 1 shows the information of two entities, "Zhang San" and "Peking University". Zhang San has the two attributes "born in" and "graduated from" but no values in the attribute columns "founded in" and "president". In actual storage these two columns consume no storage space for the entity "Zhang San", which avoids the wasted space that sparse data would otherwise cause. The entity "Peking University" has three values in the attribute column "president", namely "Zhou Qifeng", "Xu Zhihong", and "Chen Jia'er". Each of the three values corresponds to a timestamp; in other words, the table is three-dimensional, which solves the problem of multi-valued data.

[0075] Table 1 shows that it suffices to map every predicate of the semantic data to an attribute column of the table: for a triple <subject, predicate, object>, the subject corresponds to a Rowkey and the object corresponds to the value of an attribute column in that row. One large table can then represent the information of all triples.
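
The mapping just described can be sketched with nested dictionaries standing in for an HBase table. This is a minimal in-memory illustration, not the HBase API: `build_table` is a hypothetical name, and the integer version counter is only a stand-in for the timestamps mentioned in the text.

```python
from collections import defaultdict

def build_table(triples):
    """Map triples onto a sparse wide table.

    row key = subject, column = predicate,
    cell    = list of (version, object) pairs, so that one column
              can hold several values for the same row.
    """
    table = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        cell = table[subject][predicate]
        cell.append((len(cell) + 1, obj))  # hypothetical version counter
    return table

triples = [
    ("Zhang San", "born in", "Beijing"),
    ("Zhang San", "graduated from", "Peking University"),
    ("Peking University", "president", "Zhou Qifeng"),
    ("Peking University", "president", "Xu Zhihong"),
    ("Peking University", "president", "Chen Jia'er"),
]
table = build_table(triples)

# Columns without a value are simply absent from a row (sparse storage).
print("president" in table["Zhang San"])
# A multi-valued column keeps every value under its own version.
print([value for _, value in table["Peking University"]["president"]])
```

Only the columns that actually carry a value appear under a row key, which mirrors the point made about sparse data above.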

[0076] Step 102: store the indexes of the massive semantic data in HBase on the Hadoop cloud computing platform.

[0077] Indexing the data is essential for fast queries. Because the index tables are themselves very large, they are also stored in HBase. An index table is built for each distinct predicate in the semantic data; that is, each index table corresponds to exactly one predicate. The Rowkey (row key) of the table is an object, and each column of that row is a subject; the triple formed by this subject, the predicate of the table, and the row-key object is a triple contained in the original semantic data. Table 2 is an example of an index table:

[0078]

Figure CN101950297AD00072

[0079] Table 2

[0080] Table 2 is a fragment of the index table for the predicate "born in". The first row lists everyone born in Beijing and can be restored to four triples: <Zhang San, born in, Beijing>, <Li Si, born in, Beijing>, <Wang Wu, born in, Beijing>, <Li Liu, born in, Beijing>.
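
A minimal sketch of such per-predicate index tables, again using plain dictionaries rather than HBase; `build_indexes` and `restore_triples` are hypothetical helper names introduced for illustration.

```python
from collections import defaultdict

def build_indexes(triples):
    """One index table per predicate: object (row key) -> list of subjects."""
    indexes = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        indexes[predicate][obj].append(subject)
    return indexes

def restore_triples(indexes, predicate, obj):
    """Rebuild the original triples from one row of an index table."""
    return [(subject, predicate, obj) for subject in indexes[predicate][obj]]

triples = [
    ("Zhang San", "born in", "Beijing"),
    ("Li Si", "born in", "Beijing"),
    ("Wang Wu", "born in", "Beijing"),
    ("Li Liu", "born in", "Beijing"),
]
indexes = build_indexes(triples)
print(restore_triples(indexes, "born in", "Beijing"))
```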

[0081] Step 103: import the massive semantic data into HBase using the MapReduce cloud computing programming model.

[0082] Because massive semantic data is very large, a traditional single-machine import would take a great deal of time. To make full use of the computing resources of every node in the cluster, the MapReduce programming model is used to import the semantic data in a distributed manner.

[0083] The MapReduce programming model consists mainly of a Map function and a Reduce function. The Map function processes input key-value pairs and outputs new key-value pairs, that is, <k1, v1> -> <k2, v2>.

[0084] The merge function combine then merges the key-value pairs output by Map that share the same key into <k2, List<v2>> and sends them to the Reduce function, which processes the input <k2, List<v2>>. MapReduce works as follows: the raw data is split into several parts, and each part is sent to a node of the cluster; each node processes the split it receives with the predefined Map function; the combine function merges the Map results of all nodes and sends them to the corresponding Reduce function, which completes the work.
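
The flow described above can be imitated in a single process. The sketch below is an assumption-laden toy, not Hadoop code: `run_mapreduce` and `num_splits` are hypothetical, the "nodes" are just list slices, and the grouping by key (performed in Hadoop by the framework's shuffle phase) is done with a dictionary.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn, num_splits=3):
    """Single-process imitation of the split / Map / merge / Reduce flow."""
    # 1. Split the raw input into parts, one per simulated node.
    splits = [records[i::num_splits] for i in range(num_splits)]
    # 2. Every "node" applies the Map function to its own split.
    mapped = []
    for split in splits:
        for k1, v1 in split:
            mapped.extend(map_fn(k1, v1))
    # 3. Merge pairs that share a key into <k2, List<v2>>.
    grouped = defaultdict(list)
    for k2, v2 in mapped:
        grouped[k2].append(v2)
    # 4. Pass every group to the Reduce function.
    return {k2: reduce_fn(k2, values) for k2, values in grouped.items()}

# Toy job: count how many triples use each predicate.
records = list(enumerate([
    ("Zhang San", "born in", "Beijing"),
    ("Zhang San", "advisor", "Li Si"),
    ("Li Si", "born in", "Beijing"),
]))
counts = run_mapreduce(
    records,
    map_fn=lambda line_no, triple: [(triple[1], 1)],
    reduce_fn=lambda predicate, ones: sum(ones),
)
print(counts)
```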

[0085] For data import, the Map and Reduce functions are as follows:

[0086] Map: <file line number, triple> -> <subject, predicate + object>

[0087] Reduce: <subject, List(predicate + object)> -> import into the database

[0088] The processing performed by the Map function is shown in FIG. 5a, and that of the Reduce function in FIG. 5b.

[0089] As the figures show, several nodes run the Map function at the same time, so multiple triples are processed in parallel; importing the data into the database this way saves a great deal of time.
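
The import Map and Reduce signatures given above can be sketched as two small functions. This is a hedged, single-process illustration: `import_map`, `import_reduce`, and `import_triples` are hypothetical names, and a plain dictionary stands in for the HBase table.

```python
from collections import defaultdict

def import_map(line_no, triple):
    """Map: <file line number, triple> -> <subject, predicate + object>."""
    subject, predicate, obj = triple
    yield subject, (predicate, obj)

def import_reduce(subject, pred_obj_list, database):
    """Reduce: <subject, List(predicate + object)> -> write one table row."""
    row = database.setdefault(subject, {})
    for predicate, obj in pred_obj_list:
        row.setdefault(predicate, []).append(obj)

def import_triples(lines, database):
    grouped = defaultdict(list)  # stand-in for the framework's grouping step
    for line_no, triple in enumerate(lines):
        for subject, value in import_map(line_no, triple):
            grouped[subject].append(value)
    for subject, values in grouped.items():
        import_reduce(subject, values, database)
    return database

database = import_triples(
    [
        ("Zhang San", "born in", "Beijing"),
        ("Li Si", "born in", "Beijing"),
        ("Zhang San", "graduated from", "Peking University"),
    ],
    {},  # plain dict standing in for the HBase table
)
print(database["Zhang San"])
```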

[0090] Step 104: build indexes on the massive semantic data using the MapReduce cloud computing programming model.

[0091] For index building, the Map and Reduce functions are as follows:

[0092] Map: <file line number, triple> -> <object + predicate, subject>

[0093] Reduce: <object + predicate, List(subject)> -> import into the database

[0094] The processing performed by the Map function is shown in FIG. 6a, and that of the Reduce function in FIG. 6b.
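
The index-building Map and Reduce signatures can be sketched in the same style. As before, this is an illustration under stated assumptions: `index_map`, `index_reduce`, and `build_index` are hypothetical names, and nested dictionaries stand in for the per-predicate HBase index tables.

```python
from collections import defaultdict

def index_map(line_no, triple):
    """Map: <file line number, triple> -> <object + predicate, subject>."""
    subject, predicate, obj = triple
    yield (obj, predicate), subject

def index_reduce(key, subjects, index_tables):
    """Reduce: <object + predicate, List(subject)> -> one index-table row."""
    obj, predicate = key
    index_tables.setdefault(predicate, {})[obj] = list(subjects)

def build_index(lines):
    grouped = defaultdict(list)  # stand-in for the framework's grouping step
    for line_no, triple in enumerate(lines):
        for key, subject in index_map(line_no, triple):
            grouped[key].append(subject)
    index_tables = {}  # plain dict standing in for the HBase index tables
    for key, subjects in grouped.items():
        index_reduce(key, subjects, index_tables)
    return index_tables

index_tables = build_index([
    ("Zhang San", "born in", "Beijing"),
    ("Zhang San", "advisor", "Li Si"),
    ("Li Si", "born in", "Beijing"),
])
print(index_tables["born in"])
```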

[0095] Referring to FIG. 2, the method for querying massive semantic data in this embodiment comprises:

[0096] Step 201: record statistical information about the semantic data and use it for query optimization.

[0097] A query over semantic data generally has the following form:

[0098] Query1: ?p1 <hasAcademicAdvisor> ?p2

[0099] ?p1 <bornIn> ?c1

[0100] ?c1 <locatedIn> "Switzerland"

[0101] ?p2 <bornIn> ?c2

[0102] ?c2 <locatedIn> "Germany"

[0103] Names prefixed with "?" are variables. A query contains several conditions, and the order in which the conditions are evaluated affects the query speed. The priority of the query conditions can therefore be determined by counting in advance the number of triples associated with each predicate. For example, for the following file fragment:

[0104] Zhang San  born in  Beijing

[0105] Zhang San  advisor  Li Si

[0106] Zhang San  graduated from  Peking University

[0107] Li Si  born in  Beijing

[0108] Li Si  graduated from  Peking University

[0109] the statistics are: born in: 2, advisor: 1, graduated from: 2. In a query, conditions containing "born in" or "graduated from" therefore take priority over conditions containing "advisor".
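
The statistics and the resulting ordering can be sketched as follows. The sketch follows the ordering stated in the text (conditions whose predicate has more triples come first); `predicate_counts`, `order_conditions`, and the `descending` flag are hypothetical, and the flag is only there so the opposite ordering can be tried.

```python
from collections import Counter

def predicate_counts(triples):
    """Count how many triples are associated with each predicate."""
    return Counter(predicate for _, predicate, _ in triples)

def order_conditions(conditions, counts, descending=True):
    """Order query conditions by the triple count of their predicate.

    descending=True follows the rule stated in the text; the sort is
    stable, so conditions with equal counts keep their original order.
    """
    return sorted(conditions, key=lambda c: counts[c[1]], reverse=descending)

triples = [
    ("Zhang San", "born in", "Beijing"),
    ("Zhang San", "advisor", "Li Si"),
    ("Zhang San", "graduated from", "Peking University"),
    ("Li Si", "born in", "Beijing"),
    ("Li Si", "graduated from", "Peking University"),
]
counts = predicate_counts(triples)

# Hypothetical query conditions in (subject, predicate, object) form.
conditions = [
    ("?x", "advisor", "?y"),
    ("?x", "born in", "?c"),
    ("?x", "graduated from", "?u"),
]
print(dict(counts))
print([c[1] for c in order_conditions(conditions, counts)])
```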

[0110] Step 202: bit-serialize the semantic data and use the bit sequences for query optimization.

[0111] The following query is one in which all subjects and objects are variables:

[0112] Query2: ?p1 <isMarriedTo> ?p2

[0113] ?p1 <bornIn> ?c1

[0114] ?p2 <diedIn> ?c1

[0115] If the index is consulted directly for this query, every condition yields a large candidate set; obtaining each candidate set requires many I/O operations, and the candidate sets must then be intersected, all of which severely slows the query. A different approach is therefore used: bit-serializing the semantic data.

[0116] Bit serialization means setting a bit sequence for each row of the HBase semantic data table. The length of the bit sequence is the number of attribute columns, that is, the number of distinct predicates appearing in the semantic data. For a given row, if an attribute column has a value, the corresponding bit is set to "1"; otherwise it is set to "0". Because each row represents one entity, this yields a bit sequence for every entity, from which the attributes of each entity are known. The bit sequences take far less space than the data table and the index tables, so they can be kept directly in memory. In addition, the bit sequences are stored in a tree structure, so entities satisfying a condition can be found in logarithmic time. For Query2, one can directly look up the entities whose bits for <isMarriedTo> and <bornIn> are both "1", which is much faster.
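
A minimal sketch of the bit sequences, assuming each sequence is held as a Python integer bitmask. It deliberately simplifies the text: entities are found by a linear scan rather than through the tree structure mentioned above, and the names `build_bit_sequences` and `entities_with` as well as the rows `e1` to `e3` are hypothetical.

```python
def build_bit_sequences(table, predicates):
    """Return {row key: bitmask}; bit i is 1 if column predicates[i] has a value."""
    position = {p: i for i, p in enumerate(predicates)}
    return {
        row_key: sum(1 << position[p] for p in columns)
        for row_key, columns in table.items()
    }

def entities_with(bit_sequences, predicates, required):
    """Entities whose bits for all `required` predicates are 1."""
    mask = sum(1 << predicates.index(p) for p in required)
    return [e for e, bits in bit_sequences.items() if (bits & mask) == mask]

predicates = ["isMarriedTo", "bornIn", "diedIn"]
table = {  # hypothetical rows: row key -> {attribute column: values}
    "e1": {"isMarriedTo": ["e2"], "bornIn": ["c1"]},
    "e2": {"isMarriedTo": ["e1"], "diedIn": ["c1"]},
    "e3": {"bornIn": ["c2"]},
}
bit_sequences = build_bit_sequences(table, predicates)
print(entities_with(bit_sequences, predicates, ["isMarriedTo", "bornIn"]))
```

Only the entities that pass this in-memory check need to be looked up in the index tables, which is where the saving in I/O described above would come from.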

[0117] Step 203: use RMI technology so that multiple nodes of the cluster execute the query simultaneously.

[0118] The MapReduce programming framework has great advantages for batch processing but is not suited to real-time tasks, so RMI technology is used to implement distributed query operations.

[0119] For Query1, the first step of the query might find all cities located in "Germany" and then iterate over each city to evaluate the remaining query conditions. If many cities are found, say tens or even hundreds of thousands, this iteration is slow. With RMI technology, a query service runs on every node of the cluster; the cities found are divided into several parts, and each part is sent to the service program on a different node for querying. Finally, each node returns its query results to the master node, where they are collected. This makes full use of the computing resources of the cluster, achieves real-time distributed computation, and greatly increases query speed.
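
The split, query, and merge pattern described above can be sketched in a single process. This is not Java RMI: a thread pool stands in for the remote query services, and `split`, `node_query`, `distributed_query`, and all city and person names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, parts):
    """Divide the candidates into at most `parts` non-empty chunks."""
    chunks = [items[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]

def node_query(chunk, born_in):
    """Work done by one node: keep people born in a city of its chunk."""
    return [person for person, city in born_in if city in chunk]

def distributed_query(cities, born_in, num_nodes=3):
    chunks = split(cities, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = pool.map(node_query, chunks, [born_in] * len(chunks))
        # The master node merges the partial results of all nodes.
        return sorted(p for part in partials for p in part)

cities = ["city_1", "city_2", "city_3", "city_4"]   # candidates from step one
born_in = [("person_a", "city_4"), ("person_b", "city_9"), ("person_c", "city_1")]
print(distributed_query(cities, born_in))
```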

[0120] Based on the above method for storing massive semantic data, an apparatus for storing massive semantic data can be constructed. Referring to FIG. 3, it comprises a data import unit 310 and an indexing unit 320.

[0121] Data import unit 310: imports the massive semantic data into HBase using the MapReduce cloud computing programming model;

[0122] Indexing unit 320: builds indexes on the massive semantic data using the MapReduce cloud computing programming model.

[0123] In the embodiment of the present invention, it suffices to feed the original semantic data triple file to the data import unit 310 and the indexing unit 320; the semantic data is then imported directly into the HBase database and the corresponding index tables are built. The implementation principle has already been explained in the method and is not repeated here.

[0124] Based on the above method for querying massive data, an apparatus for querying massive data can be constructed. Referring to FIG. 4, it comprises a data statistics unit 410, a bit serialization unit 420, a query optimization unit 430, a query splitting unit 440, a query merging unit 450, and a query result display unit 460.

[0125] Data statistics unit 410: records statistical information about the semantic data;

[0126] Bit serialization unit 420: serializes the semantic data into bit sequences;

[0127] Query optimization unit 430: optimizes queries using the statistical information and the bit sequences;

[0128] Query splitting unit 440: splits a query into several parts and sends them to the nodes of the cluster using RMI technology for distributed querying;

[0129] Query merging unit 450: merges the query results of the nodes;

[0130] Query result display unit 460: displays the query graph entered by the user and the query results that satisfy the query conditions.

[0131] The implementation principles of the five units 410 to 450 have been explained in the related methods and are not repeated here. The query result display unit 460 can present the query as a graph according to the query conditions entered by the user, so that the user can understand the structure of the entered query more intuitively.

[0132] For example, the query graph displayed for Query2 is shown in FIG. 7; the query graph conveys the query intent back to the user more accurately.

[0133] 综上所述,本发明实例中,以Hadoop为依托,搭建了一个云计算的平台,用来解决海量语义数据的存储和查询问题。 [0133] As described above, examples of the present invention, the Hadoop based, build a cloud computing platform, to solve the problem of mass storage and query semantic data. 首先将语义中的谓词映射到Hbase表的属性列中,然后对语义数据建立了索引和比特序列,实现了存储的分布式,解决了当处理海量数据时单机系统遇到的存储瓶颈和管理困难的问题,同时,如果数据规模继续扩大,只需要增加集群的从属节点,具有良好的可扩展性。 First, the semantic attributes are mapped to columns predicate Hbase table, and semantic indexing and data bit sequence is established to achieve a distributed storage, to solve difficulties when processing mass data storage management and stand-alone system bottlenecks encountered the problem, at the same time, if the data continued to expand, just add a slave node cluster, with good scalability. 另外,在解决海量数据查询的问题时,本发明实例利用了RMI技术,将查询进行了分割,使得集群的各个节点可以同时处理所得到的分割查询,使得查询速度更快,效率更高。 Further, when the mass data query problem, the present invention utilizes the RMI technology example, the query was divided, such that each node of the cluster dividing process can be obtained while the query, the query so that faster and more efficient.

[0134] Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (12)

  1. A method for storing massive semantic data, comprising the steps of: 1) building, based on cloud computing, a cluster of multiple computer nodes, the cluster comprising one master node and several slave nodes; 2) establishing a distributed database on every node in the cluster, representing the semantic data as triples <subject, predicate, object> and mapping them to the distributed database, the semantic data being imported into the distributed database using a cloud computing programming model; 3) building indexes over the semantic data and storing them in the distributed database using the cloud computing programming model.
  2. The method of claim 1, wherein the mapping in step 2) is as follows: every predicate of a triple corresponds to an attribute column of the distributed database, the subject of a triple corresponds to a row key, and the object of a triple corresponds to the value of an attribute column in that row.
  3. The method of claim 1, wherein the programming model in step 3) is the MapReduce programming model, consisting of a Map function and a Reduce function, and during data import the Map function and the Reduce function are implemented as follows: Map: <file line number, triple> -> <subject, predicate + object>; Reduce: <subject, List(predicate + object)> -> import into the database.
  4. The method of claim 1, wherein, when the indexes are built in step 3), each index table corresponds to exactly one predicate, the row key of the index table is an object, and every column of that row is a subject.
  5. The method of claim 1, wherein the programming model in step 3) is the MapReduce programming model, consisting of a Map function and a Reduce function, and when the indexes are built the Map function and the Reduce function are implemented as follows: Map: <file line number, triple> -> <object + predicate, subject>; Reduce: <object + predicate, List(subject)> -> import into the database.
  6. A method for querying massive semantic data, comprising the steps of: 6-1) building, based on cloud computing, a cluster of multiple computer nodes, the cluster comprising one master node and several slave nodes; 6-2) establishing a distributed database on every node in the cluster, in which semantic data is stored, the semantic data being represented as triples <subject, predicate, object>; 6-3) running a query service on every node of the cluster, dividing a query condition into several parts, sending the parts to different nodes to be queried, and finally having the nodes gather their query results at the master node.
  7. The method of claim 6, wherein in step 6-2) the semantic data is bit-serialized: a bit sequence is set for each row of the database, and if an attribute column has a value the corresponding bit is set to 1; otherwise it is set to 0.
  8. The method of claim 6, wherein in step 6-3) the priority of the query conditions is determined from precomputed statistics of the number of triples associated with each predicate in the semantic data.
  9. The method of claim 6, wherein in step 6-3) the RMI method is used to send the parts to the nodes in the cluster for distributed querying.
  10. An apparatus for storing massive semantic data, characterized in that a cluster of multiple computer nodes is built on a cloud computing platform, the cluster comprising one master node and several slave nodes, and the nodes comprise: a data import unit, which uses a cloud computing programming model to import massive semantic data into the distributed database of each node; and an index building unit, which uses the cloud computing programming model to build indexes over the semantic data.
  11. An apparatus for querying massive semantic data, characterized in that a cluster of multiple computer nodes is built on a cloud computing platform, the cluster comprising one master node and several slave nodes, and the nodes comprise: a query splitting unit, which splits the query into several parts and sends them to the nodes in the cluster for distributed querying; a query merging unit, which merges the query results of the individual nodes; and a query result display unit, which displays the query graph entered by the user and the query results that satisfy the query conditions.
  12. The apparatus of claim 11, further comprising: a data statistics unit for recording statistics of the semantic data, namely the number of triples associated with each predicate in the semantic data; a bit serialization unit, which serializes the semantic data; and a query optimization unit, which uses the statistics and the bit sequences to optimize queries.
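The Map and Reduce signatures given in claims 3 and 5 can be sketched as follows. This is a minimal illustration, not the claimed implementation: `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job, Python dictionaries replace the distributed database, and all function names are assumptions. The final regrouping into one index table per predicate follows the layout described in claim 4.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny in-memory stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


def import_map(line_no, triple):
    """<file line number, triple> -> <subject, predicate + object>"""
    subject, predicate, obj = triple
    yield subject, (predicate, obj)


def import_reduce(subject, pairs):
    """<subject, List(predicate + object)> -> one database row."""
    row = {}
    for predicate, obj in pairs:
        row.setdefault(predicate, []).append(obj)
    return subject, row


def index_map(line_no, triple):
    """<file line number, triple> -> <object + predicate, subject>"""
    subject, predicate, obj = triple
    yield (obj, predicate), subject


def index_reduce(key, subjects):
    """<object + predicate, List(subject)> -> one index entry."""
    return key, sorted(subjects)


triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("alice", "worksAt", "pku"),
]
records = list(enumerate(triples))  # (file line number, triple) pairs

main_table = run_mapreduce(records, import_map, import_reduce)
index = run_mapreduce(records, index_map, index_reduce)

# One index table per predicate; row key = object, columns = subjects.
index_tables = {}
for (obj, predicate), subjects in index.items():
    index_tables.setdefault(predicate, {})[obj] = subjects
```

Keying the index by object and predicate lets a lookup such as "who knows bob" be answered from `index_tables["knows"]["bob"]` without scanning the subject-keyed main table.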

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010279073 CN101950297A (en) 2010-09-10 2010-09-10 Method and device for storing and inquiring mass semantic data

Publications (1)

Publication Number Publication Date
CN101950297A (en) 2011-01-19

Family

ID=43453799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010279073 CN101950297A (en) 2010-09-10 2010-09-10 Method and device for storing and inquiring mass semantic data

Country Status (1)

Country Link
CN (1) CN101950297A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826005A (en) * 2009-06-09 2010-09-08 张艳红 Multi-dimensional image and video information mining and three-dimensional visual search engine software
CN101826092A (en) * 2009-08-24 2010-09-08 张艳红 Image search engine based on sequencing simulation technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《现代图书情报技术》 20070825 吴宝贵 et al. Research on a distributed search engine based on Map/Reduce, No. 08 2 *
《电信科学》 20100531 吴吉义 et al. Survey of research on cloud data management, No. 05 2 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102769642A (en) * 2011-06-10 2012-11-07 上海子鼠云计算技术有限公司 Mobile cloud memory system and implementation method of mobile cloud memory
CN102270232A (en) * 2011-07-21 2011-12-07 华中科技大学 Semantic data query system with optimized storage
CN102270232B (en) 2011-07-21 2012-09-26 华中科技大学 Semantic data query system with optimized storage
CN102955801A (en) * 2011-08-25 2013-03-06 中兴通讯股份有限公司 Data control method and data control system based on distributed database system
CN102955801B (en) * 2011-08-25 2017-06-16 中兴通讯股份有限公司 Data control method and system based on distributed database system
WO2013026287A1 (en) * 2011-08-25 2013-02-28 中兴通讯股份有限公司 Data control method and system based on distributed database system
CN103108000A (en) * 2011-11-09 2013-05-15 中国移动通信集团公司 Task synchronization method and system and host node and work nodes in system
CN102402606A (en) * 2011-11-28 2012-04-04 中国科学院计算机网络信息中心 High-efficiency text data mining method
CN103455374B (en) * 2012-06-05 2016-10-19 阿里巴巴集团控股有限公司 Method and device for distributed computation on basis of MapReduce
CN103455374A (en) * 2012-06-05 2013-12-18 阿里巴巴集团控股有限公司 Method and device for distributed computation on basis of MapReduce
CN102841944A (en) * 2012-08-27 2012-12-26 南京云创存储科技有限公司 Method achieving real-time processing of big data
CN103810224B (en) * 2012-11-15 2017-04-12 阿里巴巴集团控股有限公司 Information persistence and query methods and apparatus
CN103023992A (en) * 2012-11-28 2013-04-03 江苏乐买到网络科技有限公司 Mass data distributed storage method
CN103136363A (en) * 2013-03-14 2013-06-05 曙光信息产业(北京)有限公司 Inquiry processing method and cluster data base system
CN104111936B (en) * 2013-04-18 2017-12-05 阿里巴巴集团控股有限公司 Method and system for data query
CN104111936A (en) * 2013-04-18 2014-10-22 阿里巴巴集团控股有限公司 Method and system for querying data
CN103338261A (en) * 2013-07-04 2013-10-02 北京泰乐德信息技术有限公司 Storage and processing method and system of rail transit monitoring data
CN103338261B (en) * 2013-07-04 2016-06-29 北京泰乐德信息技术有限公司 Storage and processing method and system of rail transit monitoring data
CN103327128A (en) * 2013-07-23 2013-09-25 百度在线网络技术(北京)有限公司 Intermediate data transmission method and system for MapReduce
CN103500173B (en) * 2013-09-03 2017-07-28 北京泰乐德信息技术有限公司 Query method for rail traffic monitoring data
CN103500173A (en) * 2013-09-03 2014-01-08 北京泰乐德信息技术有限公司 Method for inquiring rail transit monitoring data
CN103488704B (en) * 2013-09-06 2016-10-05 乐视致新电子科技(天津)有限公司 A data storage method and apparatus
CN103488704A (en) * 2013-09-06 2014-01-01 乐视致新电子科技(天津)有限公司 Method and device for storing data
CN103491158A (en) * 2013-09-18 2014-01-01 万达信息股份有限公司 Nearby-computing cloud computing framework
CN103617232B (en) * 2013-11-26 2018-03-30 北京京东尚科信息技术有限公司 Method for paging queries on HBase tables
CN104317896A (en) * 2014-10-24 2015-01-28 浪潮软件股份有限公司 Distributed comparison collision method on basis of mass data
CN104376053A (en) * 2014-11-04 2015-02-25 南京信息工程大学 Storage and retrieval method based on massive meteorological data
CN104376053B (en) * 2014-11-04 2017-12-22 南京信息工程大学 Storage and retrieval method based on massive meteorological data
CN105022833A (en) * 2015-08-10 2015-11-04 浪潮(北京)电子信息产业有限公司 Data processing method, nodes and monitoring system

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C12 Rejection of an application for a patent