CN105608228A - High-efficiency distributed RDF data storage method - Google Patents

Publication number
CN105608228A
CN105608228A CN201610064516.1A CN201610064516A CN105608228A CN 105608228 A CN105608228 A CN 105608228A CN 201610064516 A CN201610064516 A CN 201610064516A CN 105608228 A CN105608228 A CN 105608228A
Authority
CN
China
Prior art keywords
data
predicate
triple
name
back end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610064516.1A
Other languages
Chinese (zh)
Other versions
CN105608228B (en)
Inventor
吴志坚
黎建辉
周园春
侯艳飞
韩岳岐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN201610064516.1A
Publication of CN105608228A
Application granted
Publication of CN105608228B
Active (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51: Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a high-efficiency distributed RDF data storage method. The method comprises the following steps: 1) a user selects an existing named graph or sets a new named graph for each triple to be uploaded, and sets the effective predicates and their triples according to business requirements; 2) a data control system parses each triple in the RDF data uploaded by the user, extracts the predicate of the triple and the effective predicates of the triple's named graph, and then, according to the effective predicates, splits the data into two triple sets that share the same unique identifier: the complete-predicate triples of a subject and the effective-predicate triples of the same subject, where the effective predicates are a subset of the complete predicates; 3) the data control system stores the complete-predicate triple data and the effective-predicate triple data of the same subject in different database clusters. The method improves the availability of the data.

Description

An efficient distributed RDF data storage method
Technical field
The present invention relates to the technical field of RDF data storage, and in particular to an efficient distributed RDF data storage method. It belongs to the field of computer software.
Background art
With the rapid development of Internet technology, the range of Internet applications has become ever wider, and a huge network of knowledge bases has formed. This also brings many challenges. The concept of the Semantic Web was proposed in order to link knowledge bases of different forms together and let computers understand the relations between data. The goal of the Semantic Web is to make information resources on the network machine-understandable, so that network information resources can be processed automatically and their rapid growth can be accommodated.
The Semantic Web defines the Resource Description Framework (RDF) to describe information resources on the network. RDF is a data model of network resource objects and the relations among them; it provides a general data model for describing network resources. RDF uses triples (subject, predicate, object) to describe the resources on the network and the relations between them. Viewed as a graph, the model consists of nodes and the edges between them: nodes represent subjects and objects, and edges represent predicates. Nodes can therefore represent resources, and edges the properties of resources.
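The graph view of triples described above can be illustrated with a minimal Python sketch. The example triples and the `ex:` names are purely hypothetical and do not come from the patent; the sketch only shows how subjects and objects act as nodes and predicates as labeled edges.

```python
from collections import defaultdict

# Hypothetical example triples (subject, predicate, object); names are illustrative only.
triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:name", '"Alice"'),
    ("ex:Bob", "ex:name", '"Bob"'),
]


def to_graph(triples):
    """Build an adjacency map: subject node -> list of (predicate edge, object node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(triples)
```

Here `graph["ex:Alice"]` lists the two edges that leave the node `ex:Alice`, one per predicate.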
At present, RDF data is generally stored in single-machine RDF database management systems such as GraphDB, Stardog and AllegroGraph. This approach can manage a large amount of triple data, but with the rapid growth of Internet information resources the storage capacity of a single machine is limited and cannot meet the current demand for storing massive triple data. Researchers have proposed various schemes for storing massive triple data, but all of them are still at the research stage. One example is to store triple data in a Hadoop or HBase distributed cluster, since both are inherently capable of storing and managing massive data, and to implement data queries by emulating them with MapReduce. With this storage approach, however, the triples of one subject are stored in a scattered way and may be spread over many machines. In addition, RDF data has complex relations, and any triple may be related to others, so a MapReduce-based query has to perform a large amount of data association and filtering. Current storage schemes therefore cannot query data at high speed, and query performance is rather low. Especially when the data volume is very large, a simple query may take tens of seconds, which cannot meet actual business query requirements.
Summary of the invention
In view of the above problems in RDF data storage, the present invention proposes an efficient distributed RDF data storage method that solves the problems of limited storage capacity and scattered triple data in existing RDF data storage methods.
To solve the above problems, the present invention proposes an efficient distributed RDF data storage method, which mainly comprises the following steps:
1) A data parser parses the RDF data uploaded by the user and converts every triple into a triple object of a uniform format. The parsed data is then processed: the predicates in the triples are extracted, and the effective predicates of the named graph are obtained. Effective predicates are defined by the user's business requirements: the user determines, according to the specific business requirements, which predicates are currently in use, and their triples form the effective-predicate triples. According to the effective predicates of the named graph, the triple data of one subject is split into two parts: the complete-predicate triple data of the subject and the effective-predicate triple data of the subject. The complete-predicate triple data is the complete triple data of the subject; the effective-predicate triple data is the triple data of some of the subject's predicates and is therefore a subset of the complete-predicate triple data. A unique ID is also generated that uniquely identifies the triples of the subject; the complete-predicate triple data and the effective-predicate triple data of the same subject share this unique ID.
2) The data is divided into two parts for storage management: the complete-predicate triple data and the effective-predicate triple data of the same subject are stored separately. An open-source distributed NoSQL database cluster stores the complete-predicate triple data of each subject in order to guarantee the integrity of the data, so that the effective-predicate triple data can be expanded or reduced when predicate requirements change in the future. An RDF database cluster stores the effective-predicate triple data of each subject. Because this data is a subset of the complete-predicate triple data, the system can store and manage more triple data with unchanged storage capacity, and the smaller volume of triples improves query efficiency. The RDF database cluster consists of data nodes, a routing node and a configuration node.
3) The effective-predicate triple data in the RDF database cluster is dynamically extensible. The RDF database cluster stores only the effective-predicate triple data of each subject, and the effective predicates can change dynamically. When they change, the user first submits a predicate update task. The system's predicate update task monitoring module watches for such tasks; once a task has been submitted, the monitoring module starts it in the background and detects which predicates have changed. The triples stored in the RDF database cluster must change accordingly: when the effective predicates change, the data management module imports the triple data of the changed predicates from the complete-predicate triple data in the distributed NoSQL database cluster into the RDF database cluster, which guarantees the integrity of the stored triple data.
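The split described in step 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the dictionary layout and the `graph:number` ID format are hypothetical, and the per-graph auto-increment counter is an in-memory stand-in for whatever ID generator a real system would use.

```python
from collections import defaultdict
from itertools import count

# Hypothetical per-named-graph auto-increment counters (in-memory stand-in).
_id_counters = defaultdict(lambda: count(1))


def split_by_effective_predicates(graph_name, subject, triples, effective_predicates):
    """Split one subject's triples into a complete part and an effective part.

    Both parts carry the same unique ID so they can be re-associated later.
    """
    uid = f"{graph_name}:{next(_id_counters[graph_name])}"
    complete = [t for t in triples if t[0] == subject]
    # The effective part is the subset whose predicate is currently in use.
    effective = [t for t in complete if t[1] in effective_predicates]
    return (
        {"id": uid, "graph": graph_name, "subject": subject, "triples": complete},
        {"id": uid, "graph": graph_name, "subject": subject, "triples": effective},
    )


# Usage with made-up data: only ex:name is an effective predicate of graph g1.
triples = [
    ("ex:s1", "ex:name", "A"),
    ("ex:s1", "ex:age", "30"),
    ("ex:s1", "ex:note", "x"),
]
full, eff = split_by_effective_predicates("g1", "ex:s1", triples, {"ex:name"})
```

In this sketch `full` would go to the NoSQL cluster and `eff` to the RDF cluster; the shared `id` field is what links the two records.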
Further, regarding the triples and the named graph (graphname): the basic structure of RDF data is a set of triples. Each triple consists of a subject, a predicate and an object, and the predicate expresses the relation between subject and object. A set of such triples is called an RDF graph, and a named graph (graphname) is defined by giving an RDF graph a name. A named graph is the space in which data is kept, equivalent to the concept of a database in a relational database. It is defined according to business requirements when the user uploads data; the user can select an existing named graph or add a new one.
Further, regarding the complete predicates and the effective predicates: the present invention divides the predicates of the triples of one subject into two kinds, complete predicates and effective predicates. Complete predicates are all predicates contained in a named graph. Effective predicates are defined by the user according to business requirements and are the predicates in a named graph that the user currently needs. According to this predicate information, the triples of one subject are divided into two parts: the complete-predicate triples and the effective-predicate triples.
Further, the complete-predicate triple data and the effective-predicate triple data of the same subject are stored and managed separately. The triples of one subject generally have many predicates, and in practice most of them are redundant data that the existing business requirements do not use. This data may nevertheless be needed when requirements change in the future, so it must not be lost if the integrity of the data is to be guaranteed. The data is therefore partitioned as follows: the complete-predicate triple data and the effective-predicate triple data are stored separately and associated through the unique ID; the open-source distributed NoSQL database cluster stores the complete-predicate triple data, and the RDF database cluster stores the effective-predicate triples.
Further, the RDF database cluster consists of data nodes, a routing node and a configuration node. The data nodes mainly store data and consist of multiple open-source single-machine RDF databases. The routing node (router) controls the data nodes, including data update, data node selection, data sharding and data synchronization. The configuration node (config) manages the configuration information of the data nodes, including each data node's IP address and port, name, named graphs, predicate information, stored triple data volume, maximum load factor and master/slave flag.
Further, regarding data sharding and data node selection: to solve the problem of scattered data when storing triple data, the triple data of one subject is stored on the same data node, and the data of one named graph is stored on the same data node as long as the node's maximum storage capacity allows. This reduces the amount of computation in distributed queries and the data transferred between nodes, and so increases query speed. During data sharding, the triple data of one subject is treated as one atomic unit of data; a data node is selected according to each data node's current storage volume, storage capacity and maximum load factor and the distribution of the graph, and the triple data is stored there.
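The idea of treating the triples of one subject as an atomic unit can be sketched as below. The function names and the `choose_node` callback are hypothetical; the callback stands in for the routing node's selection logic, which the sketch deliberately leaves abstract.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Group triples so that each subject's triples form one atomic unit."""
    units = defaultdict(list)
    for subject, predicate, obj in triples:
        units[subject].append((subject, predicate, obj))
    return dict(units)


def place_units(units, choose_node):
    """Store each unit whole on the single node returned by choose_node."""
    placement = defaultdict(list)
    for subject, unit in units.items():
        placement[choose_node(subject, unit)].extend(unit)
    return dict(placement)


# Usage with made-up data and a trivial, hypothetical node chooser.
triples = [
    ("ex:s1", "ex:name", "A"),
    ("ex:s2", "ex:name", "B"),
    ("ex:s1", "ex:age", "30"),
]
units = group_by_subject(triples)
placement = place_units(
    units, lambda subject, unit: "node-a" if subject == "ex:s1" else "node-b"
)
```

Because placement is decided once per unit rather than once per triple, a subject's triples can never end up on two different nodes in this sketch.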
Compared with the prior art, the positive effects of the present invention are:
For the storage of large-scale RDF data, the present invention proposes a new distributed RDF data storage scheme in which the data is divided into two parts for storage management: the complete-predicate triple data and the effective-predicate triple data are stored separately. This increases RDF storage capacity so that massive RDF data can be managed. It also improves data availability: the RDF database cluster shards and backs up data, so the system keeps running without interruption when a data node fails. The sharding strategy treats the triple data of one subject as one atomic unit and performs sharding and data node selection according to named graph and subject. This reduces the scattering of triple data across data nodes, lowers query complexity and the volume of data transferred between nodes, and improves query efficiency.
Brief description of the drawings
The accompanying drawing is the system architecture diagram of the efficient distributed RDF data storage method of the present invention.
Detailed description of the embodiments
To express the method of the present invention more clearly and intuitively, the invention is explained in further detail below with reference to the accompanying drawing. The efficient distributed RDF data storage method of the present invention comprises the following steps:
1) Data access: provides a unified external data access interface. All data access goes through the provided interfaces, which mainly include data upload, data update, data query, predicate expansion and predicate information query.
2) Data control: provides control and processing functions for the data, mainly comprising data management, predicate management and data storage management.
Data management provides the management functions for RDF data, including control of RDF data upload, update and query. RDF upload control comprises the RDF data parser, the RDF data splitting module and unique ID generation. When data is uploaded, the RDF data parser first parses the RDF data. It supports several RDF formats, including xml, json and nt, and parses the data, according to the format uploaded by the user, into RDF data objects of a uniform format. Next, the RDF data splitting module splits these objects. The user defines the named graph of the RDF data, which determines the named graph in which the uploaded data is saved; the list of effective predicates is obtained from that named graph, and the data is split according to the list into two parts: the complete-predicate triple object and the effective-predicate triple object of the same subject. Finally, a unique ID is generated to uniquely identify the triples of the subject and to associate the complete-predicate triples with the effective-predicate triples of the same subject. The ID is generated with an auto-increment strategy: a custom unique ID generator obtains the auto-increment ID of the named graph and generates a triple containing this ID, which is encapsulated in both the complete-predicate triple object and the effective-predicate triple object of the subject.
Predicate management provides the management functions for the predicates of RDF data, including predicate expansion, predicate reduction and predicate information query. Predicate expansion means expanding the effective predicates. Because the RDF database cluster stores the triples of only some predicates, when the user needs a predicate of a named graph that is not among the effective predicates, the effective predicates must be expanded and the triples of these predicates added to the database. The steps are as follows. The user submits the predicates to be expanded and their named graph. The predicate management module obtains the submitted named graph and expansion predicates, compares them with the effective predicates of the named graph and determines the predicates that actually need to be expanded; this check ensures that the existing effective predicates do not already contain the submitted predicates and so validates the user's input. A predicate expansion task is then submitted through predicate expansion scheduling and executed asynchronously in the background. The task performs the data import: it reads the corresponding triple data from the NoSQL database, extracts the triples of the expansion predicates and imports them into the RDF database cluster.
Data storage management provides the data management module and the predicate management module with operations on the databases. All database operations go through this module, which offers a unified data access interface and so separates data processing from data storage. Its functions include querying, updating and uploading data, importing data for predicate expansion, and querying, updating and uploading predicate information.
3) Data persistence: responsible for the physical storage of data, that is, saving data to disk. The data is persisted in two parts, using a NoSQL database cluster and an RDF database cluster. The NoSQL database cluster is an open-source distributed NoSQL database cluster whose capacity for managing massive data is used to store the complete-predicate triple data and so guarantee data integrity; when the effective predicates change, the triple data of the corresponding predicates is read from it and imported into the RDF database cluster. The RDF database cluster consists of multiple data nodes, a routing node and a configuration node. The data nodes mainly store triple data and consist of multiple open-source single-machine RDF databases. The routing node controls the data nodes, including data update, data node selection, data sharding and data synchronization; it manages the RDF database cluster, is the central node of the cluster and controls every RDF database data node. The configuration node manages the configuration information of the data nodes, including each data node's IP address and port, name, named graphs, predicate information, stored triple data volume, maximum load factor and master/slave flag. The load factor is the ratio of stored data volume to maximum data capacity; the maximum load factor is the largest load factor value allowed; and the current load factor is the ratio of the currently stored data volume to the maximum data capacity.
When triple data is uploaded, the routing node determines the data nodes that hold the data of the triple's named graph from the named graph and the configuration information of the configuration node. If the data of this named graph is not stored on any data node, the named graph is new, and the data node with the smallest current load factor among all data nodes is chosen to store the uploaded triple data. If the named graph is stored on some data nodes, the data node with the smallest current load factor among them is chosen. If that smallest current load factor is greater than or equal to the maximum load factor, the data of the named graph must be sharded: the data node with the smallest load factor among the other data nodes is chosen to store the uploaded triple data. Otherwise the data node with the smallest current load factor is chosen directly. After the data has been stored on the data node, the corresponding configuration information is updated, including the named graph information and the stored triple data volume of the data node.
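The predicate expansion procedure described under predicate management in step 2) can be sketched as follows. This is a simplified illustration under explicit assumptions: plain dictionaries stand in for the NoSQL cluster and the RDF cluster, the function name and record layout are hypothetical, and the import runs synchronously, whereas the description has the task executed asynchronously in the background.

```python
def expand_effective_predicates(graph_name, requested, effective, full_store, rdf_store):
    """Add requested predicates to a graph's effective set and backfill the RDF store.

    effective:  dict graph_name -> set of effective predicates
    full_store: dict graph_name -> list of (id, subject, predicate, object) records
                (stand-in for the complete-predicate data in the NoSQL cluster)
    rdf_store:  dict graph_name -> list of records for effective predicates only
                (stand-in for the RDF database cluster)
    """
    current = effective.setdefault(graph_name, set())
    # Validation: only predicates that are not yet effective need to be expanded.
    to_expand = set(requested) - current
    if not to_expand:
        return set()
    # Data import: read the matching triples from the complete store.
    imported = [r for r in full_store.get(graph_name, []) if r[2] in to_expand]
    rdf_store.setdefault(graph_name, []).extend(imported)
    current |= to_expand
    return to_expand


# Usage with made-up data: ex:age becomes an effective predicate of graph g1.
effective = {"g1": {"ex:name"}}
full_store = {"g1": [(1, "ex:s1", "ex:name", "A"), (1, "ex:s1", "ex:age", "30")]}
rdf_store = {"g1": [(1, "ex:s1", "ex:name", "A")]}
added = expand_effective_predicates(
    "g1", ["ex:name", "ex:age"], effective, full_store, rdf_store
)
```

Requesting `ex:name` again is harmless here because the set difference filters out predicates that are already effective, which mirrors the input check in the description.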
Implementation example of data upload:
1. Prepare the triple data and define its named graph (graphname), which determines the named graph to which the data is uploaded. Call the data upload interface to send the triple data and its named graph to the data management module.
2. The data management module calls the data parser, which parses the triple data and encapsulates it in triple data objects of a uniform format.
3. The data management module calls the data splitting module and queries the effective predicate list of the named graph through the predicate management module. According to this list, the uploaded triple data object is split into two parts: the complete-predicate triple data object and the effective-predicate triple data object.
4. The data management module uses the auto-increment unique ID generator to generate the unique ID of the uploaded triple data and encapsulates the ID value in both the complete-predicate and the effective-predicate triple data objects.
5. The data storage control module is called to store the complete-predicate triple data in the NoSQL database cluster and the effective-predicate triple data in the RDF database cluster. The complete-predicate triple data is stored directly in the NoSQL database cluster; the routing node of the RDF database cluster controls the storage of the effective-predicate triple data.
6. The routing node of the RDF database cluster calls the configuration node to obtain the data nodes that hold the named graph. If the data of the named graph is not stored on any data node, the named graph is new: the data node with the smallest current load factor among all data nodes is chosen to store the uploaded triple data, and processing continues with step 10.
7. If the named graph is stored on some data nodes, the data node with the smallest current load factor among them is chosen.
8. If the current load factor of the chosen data node is greater than or equal to the maximum load factor, the data of the named graph must be sharded: the data node with the smallest current load factor among the other data nodes is chosen to store the uploaded triple data, and processing continues with step 10.
9. If the current load factor of the chosen data node is less than the maximum load factor, that data node stores the uploaded triple data, and processing continues with step 10.
10. After the data has been stored on the data node, the corresponding configuration information is updated: the named graph information and the data node's stored triple data volume and current load factor.
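Steps 6 to 10 can be sketched as a small routing routine. The `DataNode` class, its field names and the triple-count capacity are assumptions made for illustration; the description only specifies that the load factor is stored volume divided by maximum capacity. The fallback used when no other node is available for sharding is also an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    capacity: int            # assumed unit: maximum number of triples
    max_load_factor: float   # largest load factor allowed for this node
    stored: int = 0
    graphs: set = field(default_factory=set)

    @property
    def load_factor(self) -> float:
        """Current load factor: stored volume / maximum capacity."""
        return self.stored / self.capacity


def select_node(nodes, graph_name):
    holders = [n for n in nodes if graph_name in n.graphs]
    if not holders:
        # Step 6: new named graph, pick the least loaded of all nodes.
        return min(nodes, key=lambda n: n.load_factor)
    # Step 7: pick the least loaded node that already holds the graph.
    best = min(holders, key=lambda n: n.load_factor)
    if best.load_factor >= best.max_load_factor:
        # Step 8: shard the graph onto the least loaded of the other nodes.
        others = [n for n in nodes if graph_name not in n.graphs]
        if others:
            return min(others, key=lambda n: n.load_factor)
        # Assumption: with no other node available, keep using `best`.
    # Step 9: the chosen holder still has room.
    return best


def store(nodes, graph_name, triple_count):
    node = select_node(nodes, graph_name)
    # Step 10: update the configuration information after storing.
    node.stored += triple_count
    node.graphs.add(graph_name)
    return node


# Usage with made-up numbers: n1 holds g1 and has reached its maximum load factor.
nodes = [
    DataNode("n1", 100, 0.8, stored=80, graphs={"g1"}),
    DataNode("n2", 100, 0.8, stored=10),
]
first = store(nodes, "g1", 5)   # g1 is sharded onto n2
second = store(nodes, "g2", 1)  # g2 is new, n2 is the least loaded node
```

Keeping the configuration fields on the node object is a simplification; in the described architecture this information lives on the separate configuration node.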

Claims (9)

1. An efficient distributed RDF data storage method, comprising the steps of:
1) a user selects a named graph or sets a new named graph for each triple to be uploaded, and sets, according to business requirements, effective predicates and their triples for the triple;
2) a data control system parses every triple in the RDF data uploaded by the user and extracts the predicate of the triple and the effective predicates of the triple's named graph; then, according to the effective predicates, the triple is split into two triples with the same unique identifier: the complete-predicate triple of the same subject and the effective-predicate triple of the same subject; wherein the complete predicates are all predicates contained in the named graph of the triple, and the effective predicates are a part of the complete predicates;
3) the data control system stores the resulting complete-predicate triple data of the same subject and effective-predicate triple data of the same subject in different database clusters.
2. The method of claim 1, characterized in that an open-source distributed NoSQL database cluster stores the complete-predicate triple data of the same subject, and an RDF database cluster stores the effective-predicate triple data of the same subject.
3. The method of claim 2, characterized in that, when the data control system receives a predicate update task, it detects the changed predicates according to the predicate update information in the task and then updates the predicates in the corresponding triples stored in the RDF database cluster.
4. The method of claim 2 or 3, characterized in that the RDF database cluster comprises data nodes, a routing node and a configuration node; wherein the data nodes store data; the routing node controls the data nodes, including data update, data node selection, data sharding and data synchronization; and the configuration node manages the configuration information of the data nodes, including each data node's IP address and port, name, named graphs, predicate information, stored triple data volume, maximum load factor and master/slave flag.
5. The method of claim 4, characterized in that the data control system stores the triple data of the same subject on the same data node.
6. The method of claim 5, characterized in that the data control system stores the data of the same named graph on the same data node, within the maximum storage capacity of the data node.
7. The method of claim 4, characterized in that the routing node determines, from the named graph of a triple and the configuration information of the configuration node, the data nodes that hold the data of the named graph; wherein, if the data of the named graph is not stored on any data node, the data node with the smallest current load factor among all data nodes is chosen to store the uploaded triple data; if data nodes storing the data of the named graph are found, the data node with the smallest current load factor among them is chosen; if that smallest current load factor is greater than or equal to the maximum load factor, the data of the named graph is sharded and the node with the smallest load factor among the other data nodes is chosen to store the uploaded triple data; otherwise the chosen data node with the smallest current load factor stores the uploaded triple data.
8. The method of claim 7, characterized in that, after a data node has stored a triple, the corresponding configuration information is updated, including the named graph information, the stored triple data volume and the current load factor.
9. The method of claim 1, characterized in that the data control system expands the extracted effective predicates: for the predicates of a named graph submitted by the user for expansion, the data control system obtains the submitted named graph and its expansion predicates, compares them with the effective predicates of the named graph, and determines the predicates to be expanded.
CN201610064516.1A 2016-01-29 2016-01-29 A kind of efficient distributed RDF data storage method Active CN105608228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610064516.1A CN105608228B (en) 2016-01-29 2016-01-29 A kind of efficient distributed RDF data storage method

Publications (2)

Publication Number Publication Date
CN105608228A (en) 2016-05-25
CN105608228B (en) 2019-05-17

Family

ID=55988167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610064516.1A Active CN105608228B (en) 2016-01-29 2016-01-29 A kind of efficient distributed RDF data storage method

Country Status (1)

Country Link
CN (1) CN105608228B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762915A (en) * 2018-04-19 2018-11-06 上海交通大学 A method of caching RDF data in GPU memories
CN108776697A (en) * 2018-06-06 2018-11-09 南京大学 A kind of multi-source data collection cleaning method based on predicate
CN109726254A (en) * 2018-12-24 2019-05-07 科大讯飞股份有限公司 A kind of construction method and device of triple knowledge base
CN110096515A (en) * 2019-05-10 2019-08-06 天津大学深圳研究院 A kind of RDF data management method, device and storage medium based on triple
CN111026747A (en) * 2019-10-25 2020-04-17 广东数果科技有限公司 Distributed graph data management system, method and storage medium
CN111090782A (en) * 2019-12-17 2020-05-01 北京锐安科技有限公司 Graph data storage method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030191727A1 (en) * 2002-04-04 2003-10-09 Ibm Corporation Managing multiple data mining scoring results
CN102521299A (en) * 2011-11-30 2012-06-27 华中科技大学 Method for processing data of resource description framework
CN104346340A (en) * 2013-07-24 2015-02-11 日电(中国)有限公司 Resource description framework data storage method and device

Non-Patent Citations (1)

Title
Wang Linbin et al.: "Survey of NoSQL-based RDF data storage and query techniques", Application Research of Computers (《计算机应用研究》) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN108762915A (en) * 2018-04-19 2018-11-06 上海交通大学 Method for caching RDF data in GPU memory
CN108776697A (en) * 2018-06-06 2018-11-09 南京大学 Multi-source data set cleaning method based on predicates
CN108776697B (en) * 2018-06-06 2020-06-09 南京大学 Multi-source data set cleaning method based on predicates
CN109726254A (en) * 2018-12-24 2019-05-07 科大讯飞股份有限公司 Method and device for constructing triple knowledge base
CN109726254B (en) * 2018-12-24 2020-12-18 科大讯飞股份有限公司 Method and device for constructing triple knowledge base
CN110096515A (en) * 2019-05-10 2019-08-06 天津大学深圳研究院 Triple-based RDF data management method, device and storage medium
CN111026747A (en) * 2019-10-25 2020-04-17 广东数果科技有限公司 Distributed graph data management system, method and storage medium
CN111090782A (en) * 2019-12-17 2020-05-01 北京锐安科技有限公司 Graph data storage method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN105608228B (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN105608228A (en) High-efficiency distributed RDF data storage method
CN107679192B (en) Multi-cluster cooperative data processing method, system, storage medium and equipment
CN104885054B (en) The system and method that affairs are performed in MPP database
US20140122510A1 (en) Distributed database managing method and composition node thereof supporting dynamic sharding based on the metadata and data transaction quantity
US9753960B1 (en) System, method, and computer program for dynamically generating a visual representation of a subset of a graph for display, based on search criteria
US20130031229A1 (en) Traffic reduction method for distributed key-value store
CN105468702A (en) Large-scale RDF data association path discovery method
CN105117171A (en) Energy SCADA massive data distributed processing system and method thereof
CN102930062A (en) Rapid horizontal extending method for databases
CN102033938A (en) Secondary mapping-based cluster dynamic expansion method
CN104408174A (en) Database routing device and method
CN106960020B (en) A kind of method and apparatus creating concordance list
CN105808746A (en) Relational big data seamless access method and system based on Hadoop system
CN103678550A (en) Mass data real-time query method based on dynamic index structure
CN102779160B (en) Mass data information index system and index structuring method
CN102609446A (en) Distributed Bloom filter system and application method thereof
CN103390018A (en) Web service data modeling and searching method based on SDD (service data description)
CN109002334A (en) A kind of operation platform and its data processing method
CN102571752A (en) Service-associative-index-map-based quality of service (QoS) perception Top-k service combination system
CN105022791A (en) Novel KV distributed data storage method
CN102158533A (en) Distributed web service selection method based on QoS (Quality of Service)
CN110007905A (en) A kind of generation method and system of the software development scheme based on big data
CN106776810B (en) Big data processing system and method
CN102955808A (en) Data acquisition method and distributed file system
CN101937455A (en) Method for establishing multi-dimensional classification cluster based on infinite hierarchy and heredity information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant