CN109582831B - Graph database management system supporting unstructured data storage and query - Google Patents

Info

Publication number
CN109582831B
Authority
CN
China
Prior art keywords
blob
neo4j
data
query
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811202708.XA
Other languages
Chinese (zh)
Other versions
CN109582831A (en
Inventor
沈志宏
周园春
赵子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN201811202708.XA
Publication of CN109582831A
Application granted
Publication of CN109582831B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph database management system supporting unstructured data storage and query, which comprises: a labeled Neo4j attribute graph model, based on the Neo4j graph database, for storing structured data and the non-content attribute information of BLOBs, and also for storing the unique ID assigned to each BLOB; a BLOB data storage model supporting the BLOB type, comprising an abstract storage model and a concrete storage model, used for storing the content of each BLOB and maintaining the mapping between the ID of a BLOB and its attribute information; and a query module for querying the stored structured data and BLOB-type data. The system can store structured data alongside unstructured data held as the BLOB type, and supports queries over both kinds of data.

Description

Graph database management system supporting unstructured data storage and query
Technical Field
The invention relates to the technical field of big data, databases and distributed systems, and provides a database management system supporting unstructured data storage and query.
Background
Typically, a conventional relational database is used to store structured data that can be represented as two-dimensional tables, and the techniques for storing and querying structured data are well established. As the data age has developed, however, the forms of data have become more complex. In practical applications, large amounts of semi-structured data with self-describing structures and unstructured data without any fixed structure have appeared; such data is highly extensible and can freely express a great deal of useful information. Because the format is so free, storing and managing this data is a difficult problem. Traditional relational database systems are aimed mainly at object processing and data analysis applications and cannot adequately store and manage massive semi-structured and unstructured data. NoSQL technology, and in particular graph database technology such as Neo4j, offers a new approach to efficiently managing and processing unstructured data.
The graph database originates from Euler and graph theory and may also be called a graph-oriented or graph-based database. Its data model is expressed as nodes and relationships, and information can also be stored as attributes of the nodes, which supports fast queries on the relationships between entities.
As data sources multiply, the variety of data becomes richer; more than 85% of newly generated data is unstructured, yet current big data applications have limited capability for processing unstructured data (http://www.cio.com.cn/eyan/2295.html). Unstructured data is typically stored as Binary Large Objects (BLOBs). Nonetheless, storing BLOB objects in a conventional relational database still has many drawbacks, such as low efficiency and inconvenient retrieval.
In addition, many application scenarios place heavy demands on complex relationship queries, and online systems are time-sensitive. The great advantage of a graph database is that it resolves complex relationship problems quickly. However, graph databases such as the currently popular Neo4j use an attribute graph model that does not natively support BLOB object storage. It is therefore particularly important to combine graph databases with BLOB storage so that BLOB data and other types of data can be managed and queried uniformly.
Disclosure of Invention
The invention aims to provide a graph database management system supporting unstructured data storage and query, which can store structured data and unstructured data as BLOB types and realize query of the two data.
In order to achieve the purpose, the invention adopts the following technical scheme:
a labeled Neo4j attribute graph model based on a Neo4j graph database for storing structured data and non-content attribute information of a BLOB, the structured data including a text type, a boolean type, a numerical type, a temporal type, the non-content attribute information of the BLOB including a length of the BLOB type data, a Mime type, and a 128-bit digest, and also storing a unique ID to which the BLOB is assigned;
a BLOB data storage model supporting the BLOB type, which comprises an abstract storage model and a concrete storage model; the abstract storage model is implemented on a local file system or a Ceph distributed file system, is used for storing the content of the BLOB, and maintains the mapping between the ID of the BLOB and the attribute information of the BLOB; the concrete storage model is implemented on a local file system, stores the content of the BLOB as files, and maintains the mapping between the ID of the BLOB and the attribute information of the BLOB;
and the query module is used for querying the stored structured data and the BLOB type data.
Further, the system also comprises a data type identification module for judging the type of the received data, and if the data is structured data, the data is stored in the Neo4j attribute graph model; if the data is BLOB type data, storing the ID corresponding to the BLOB and the attribute information of the non-content of the BLOB in a Neo4j attribute graph model, and storing the content of the BLOB in a BLOB data storage model.
Further, the system also comprises a BLOB attribute information extraction module for extracting the attribute information of the BLOB from the data text.
Further, the query module runs the Cypher language.
A method for constructing a graph database management system supporting unstructured data storage and query comprises the following steps:
taking the labeled attribute graph model of the Neo4j graph database as the Neo4j attribute graph model;
modifying the original code of Neo4j on the basis of the labeled attribute graph model of the Neo4j graph database, so that values are obtained and set through a getRecord() method and a setRecord() method;
adding BLOB type support in the PropertyType of the original code of Neo4j, implementing operations such as reading values, and creating a BLOB data storage model;
creating a query module that supports queries using Cypher.
Further, the step of adding BLOB type support in the PropertyType of the original code of Neo4j includes:
adding support for the BLOB type in the getPropertyTypeOrNull() method so that the BLOB type can be returned when the method is called;
adding registration of the BLOB type in the registerScalarsAndCollection() method, thereby injecting the Java class into Neo4j.
A storage method of a graph database management system supporting unstructured data storage and query comprises the following steps:
determining whether the received data is structured data or unstructured data;
if the data is structured data, the data is stored in a Neo4j attribute map model;
if the data is unstructured data, extracting attribute information of BLOBs, and allocating a unique ID to each BLOB;
storing the ID corresponding to the BLOB and the content of the BLOB in a BLOB data storage model, and storing the attribute information of the non-content of the BLOB in a Neo4j attribute graph model;
the content of the BLOB is read by the ID.
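The storage steps above can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the dictionaries `graph_props` and `blob_store`, the functions `store_property` and `read_blob`, and the use of `uuid4` for ID allocation are all assumptions made for the example.

```python
import hashlib
import uuid

graph_props = {}  # stand-in for the Neo4j attribute graph model (front end)
blob_store = {}   # stand-in for the BLOB data storage model (back end)


def store_property(node_id, key, value, mime_type="application/octet-stream"):
    """Route a value: structured data goes to the graph, BLOB content to the back end."""
    if isinstance(value, (bytes, bytearray)):
        # Unstructured data: allocate an ID and extract the non-content attributes.
        blob_id = uuid.uuid4().hex  # hypothetical ID scheme
        content = bytes(value)
        blob_store[blob_id] = content
        graph_props[(node_id, key)] = {
            "id": blob_id,
            "length": len(content),
            "mime": mime_type,
            "digest": hashlib.md5(content).hexdigest(),  # 128-bit digest
        }
        return blob_id
    # Structured data (text, boolean, number, time) is stored directly.
    graph_props[(node_id, key)] = value
    return None


def read_blob(blob_id):
    """Read the BLOB content back by its ID."""
    return blob_store[blob_id]
```

In this sketch only the small attribute record lives next to the node, so a lookup on length or Mime type never touches the content store.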
Further, the step of storing the content of the BLOB comprises:
creating a new file under the specified directory, writing the content of the BLOB into the new file through a file output stream, and saving it in the bid format;
another new file is created under the specified directory, and the md5 value of the BLOB digest is written to the other new file and saved in the md5 format.
Further, the step of reading the content of the BLOB comprises: and taking the bid of the BLOB as a parameter, searching a corresponding file in a specified directory, reading the content of the file by a fromFile () method and returning.
A method for creating attribute information for a BLOB of a graph database management system supporting unstructured data storage and querying, the steps comprising:
reading byte array content of the BLOB from the file as the content of the BLOB;
reading the length of the byte array of the BLOB from the file as the length of the BLOB;
reading the content digest of the BLOB from the file as the digest of the BLOB by using DigestUtils.getMd5Digest;
reading the code of the first8bytes of the BLOB content from the file as the 32-bit flag value of the BLOB;
the unique ID of the BLOB is generated by the IdGenerator method.
On the basis of the attribute graph model, the system adds BLOB storage functions to the open-source Neo4j graph database, combining the graph database with binary large objects. It supports the storage of large BLOB-type data while making full use of the graph database's strengths in handling relationship problems. The system therefore supports multiple data types: in addition to the text, Boolean, numerical and time types supported by the attribute graph model of Neo4j, it also supports the BLOB type. The system stores the attribute information of a BLOB other than its content in Neo4j, and stores the large-volume content in a pluggable back-end storage system.
The invention defines and implements the read and write operations of a BLOB, namely how to create BLOB attribute values from a file and how to read and construct a BLOB object from a given file. It enriches the intrinsic attributes of BLOB objects and the related operations: it implements methods for obtaining attribute values such as the digest, the length and the 8-byte flag from the content of the BLOB, as well as a method for allocating a unique ID to each BLOB object. It also supports searching on BLOB-related content: by providing operation functions on BLOB attribute values in the Cypher query language, it supports matching on BLOB attribute values, and results can be filtered using other attribute values and association relationships as constraints.
The invention has the following beneficial effects: graph database technology and BLOB storage are integrated so that structured and unstructured data can be stored and queried together. Compared with traditional big data fusion management tools, the ability to handle relationship problems is enhanced and relationship retrieval performance is improved, which to some extent fills a gap left by big data fusion management tools in this area.
Drawings
FIG. 1 is a diagram of a graph database management system according to an embodiment.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The present embodiment provides a graph database management system supporting unstructured data storage and query, as shown in fig. 1, specifically as follows:
1) Based on the labeled attribute graph model in Neo4j, the system adds content related to BLOB objects and creates a BLOB data storage model supporting the BLOB type. Neo4j serves as the front-end storage system and stores the structured data and part of the attribute information of each BLOB. The attribute value of the BLOB type comprises the data length, the digest, the Mime type and the unique ID assigned to the BLOB by the system. A back-end storage system is also created to store BLOB-type data.
2) The system examines the received data and stores ordinary structured data in the native graph storage of Neo4j. If the data is of the BLOB type, the attribute values other than the BLOB content are stored in the Neo4j front-end storage system, and the BLOB content is then stored in the back-end storage system according to the ID. The back-end storage system maintains the mapping between the ID and the data content of each BLOB-type attribute value.
The construction process of the system is as follows:
The system is built mainly as an extension of the open-source database Neo4j (https://neo4j.com/), so the data model used by the system has all the characteristics of the labeled attribute graph in Neo4j: a data set consists of nodes and edges, and entities can carry attribute values. The attribute values can be of a text type, a Boolean type, a numerical type or a time type, and in this system they can also be of the BLOB type. The system is built by modifying part of the native code of Neo4j to add support for BLOBs, with the following modifications:
(1) On the basis of the original functions of Neo4j, a trait named WithRecord is designed, which comprises a getRecord() method and a setRecord() method, used respectively for getting and setting values.
(2) Support for the BLOB type is added in PropertyType, and the methods for reading the value and returning the locked number are implemented. Support for the BLOB type is added to the getPropertyTypeOrNull() method so that the BLOB type can be returned when this method is called. Registration of BLOB is added in the registerScalarsAndCollection() method, that is, the Java class is injected into Neo4j.
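The idea behind modification (2) can be pictured with a small type registry. The sketch below is written in Python for illustration only and does not reproduce Neo4j's source: the numeric type codes, the `type_registry` dictionary and the snake_case function names are hypothetical stand-ins for the methods named in the text.

```python
from enum import Enum


class PropertyType(Enum):
    # Illustrative type codes, not Neo4j's real on-disk encoding.
    BOOL = 1
    INT = 2
    STRING = 3
    BLOB = 15  # the newly added type


def get_property_type_or_null(type_code):
    """Return the PropertyType for a code, or None when the code is unknown."""
    try:
        return PropertyType(type_code)
    except ValueError:
        return None


# Hypothetical registry from host-language classes to property types,
# loosely mirroring the registration step described above.
type_registry = {}


def register_scalars_and_collection():
    """Register the built-in scalar types and the added BLOB type."""
    type_registry[bool] = PropertyType.BOOL
    type_registry[int] = PropertyType.INT
    type_registry[str] = PropertyType.STRING
    type_registry[bytes] = PropertyType.BLOB
```

Once the BLOB entry exists in both places, a lookup by type code and a lookup by value class can each resolve to the same BLOB type.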
The storage and reading method of the present system is illustrated as follows:
The content of a BLOB object tends to occupy a large amount of storage space. Storing entire BLOB objects directly in the graph database would make it cumbersome and cause serious memory and storage consumption as well as performance problems. In practical use, most queries are made on some attribute information of the BLOB, so the invention proposes a storage scheme that is integrated with, yet relatively independent of, the storage of Neo4j:
(1) In the present system, several relevant attributes are extracted for each BLOB object: the unique ID assigned by the system to the BLOB object, the length of the BLOB object, the 128-bit digest of the BLOB object, and the Mime type. These attribute values are stored in the attribute graph model of Neo4j.
(2) The actual content of the BLOB is stored in a storage system outside Neo4j. The invention uses an abstract storage model, a class named BlobStorage, which can be implemented on a local file system or a Ceph distributed file system; it stores the content of the BLOB and maintains the mapping between the ID of the BLOB and the attribute information of the BLOB.
The BlobStorage system provides the following interface methods:
i. save: saves the specified BLOB content;
ii. configure: creates or modifies the path of the BLOB store;
iii. load: obtains the corresponding BLOB data according to the ID.
(3) The invention provides a back-end storage system named FileBlobStorage as a concrete storage model based on a local file system. FileBlobStorage inherits from BlobStorage, stores the content of the BLOB as files, and maintains the mapping between the ID and the BLOB. It implements the three methods save, load and configure.
When the save method is executed, a new file is created under the specified directory and the content of the BLOB is written into it through a file output stream; the new file name is BLOB. The write method is then called to create another new file under the specified directory, and the md5 value of the BLOB digest is written into that file, with the name of BLOB.
When the load method is executed, the bid of the BLOB object is passed in as a parameter, the corresponding file is looked up in the specified directory, and the data content is read and returned through the fromFile() method.
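The relationship between the abstract BlobStorage and the file-based FileBlobStorage might look roughly like the following Python sketch. It is an illustration under assumptions, not the patented code: the file naming scheme (`<id>.bid` for content and `<id>.md5` for the digest) and the method signatures are guesses made for the example.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStorage(ABC):
    """Abstract storage model: maps a BLOB ID to its content."""

    @abstractmethod
    def configure(self, path): ...

    @abstractmethod
    def save(self, blob_id, content): ...

    @abstractmethod
    def load(self, blob_id): ...


class FileBlobStorage(BlobStorage):
    """Concrete storage model backed by a local directory."""

    def __init__(self):
        self.root = None

    def configure(self, path):
        # Create or change the directory that holds the BLOB files.
        os.makedirs(path, exist_ok=True)
        self.root = path

    def save(self, blob_id, content):
        # Content file; the ".bid" extension is an assumed naming scheme.
        with open(os.path.join(self.root, f"{blob_id}.bid"), "wb") as out:
            out.write(content)
        # Companion file holding the md5 value of the content.
        with open(os.path.join(self.root, f"{blob_id}.md5"), "w") as out:
            out.write(hashlib.md5(content).hexdigest())

    def load(self, blob_id):
        with open(os.path.join(self.root, f"{blob_id}.bid"), "rb") as src:
            return src.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        storage = FileBlobStorage()
        storage.configure(directory)
        storage.save("42", b"hello")
        print(storage.load("42"))
```

Keeping the three operations on an abstract base is what would let a Ceph-backed class replace the local one without changes on the graph side.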
The attribute value creation and reading method of the BLOB of the present system is as follows:
the invention provides a method for creating BLOB attribute values by taking a file as a data source, and after the method is executed, each attribute value of the BLOB content is obtained:
Blob fromFile(file): generates the BLOB attribute values from the content of the file.
When a user calls this method, the background reads the content from the file specified by the user and creates a Blob object. It calls the calculateDigest() method to obtain the digest of the Blob object and the calculateLength() method to obtain its length, then calls the calculateFirst8Bytes() method to obtain the first 8 bytes of the Blob object's content, and finally returns all of the attribute information together.
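The sequence of calls described above can be sketched in Python as follows. The `Blob` dataclass, its field names and the snake_case helpers are illustrative counterparts of the methods named in the text, and MD5 is assumed for the digest because the document refers to md5 values elsewhere.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    content: bytes
    length: int
    digest: str
    first8bytes: bytes


def calculate_digest(content):
    """MD5 digest of the content as a hex string (128 bits)."""
    return hashlib.md5(content).hexdigest()


def calculate_length(content):
    """Length of the content in bytes."""
    return len(content)


def calculate_first8bytes(content):
    """First 8 bytes of the content (fewer if the content is shorter)."""
    return content[:8]


def from_file(path):
    """Read a file and derive all BLOB attribute values from its content."""
    with open(path, "rb") as src:
        content = src.read()
    return Blob(
        content=content,
        length=calculate_length(content),
        digest=calculate_digest(content),
        first8bytes=calculate_first8bytes(content),
    )


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    print(from_file(tmp.name))
    os.remove(tmp.name)
```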
To read the value of a BLOB, the invention adds BLOB-type handling to the readValue() method. The method for reading a Blob object value is readBlobValue(), which takes two parameters, values and conf: values holds the attribute information of the Blob object, and conf is the system configuration information, including the storage location of the Blob in the file system. The system automatically takes the Blob object's attribute information from values and reads the Blob object's content from the file system.
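A read path of this shape could be sketched as below. The dictionary keys (`id`, `length`, `blob_dir`), the one-file-per-ID layout and the length check are assumptions introduced for the example; the source only states that attribute information comes from values and the storage location from conf.

```python
import os
import tempfile


def read_blob_value(values, conf):
    """Combine stored attribute information with content read from the file system."""
    # Hypothetical layout: one content file per BLOB ID under conf["blob_dir"].
    path = os.path.join(conf["blob_dir"], values["id"])
    with open(path, "rb") as src:
        content = src.read()
    # Optional sanity check against the length recorded with the attributes.
    if len(content) != values["length"]:
        raise ValueError("stored length does not match BLOB content")
    return {**values, "content": content}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "b1"), "wb") as out:
            out.write(b"abc")
        print(read_blob_value({"id": "b1", "length": 3}, {"blob_dir": directory}))
```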
The attributes and operation of the BLOB of the present invention are illustrated below:
The invention enriches the intrinsic attributes of a BLOB, which include the content (content), length (length), digest (digest) and a 32-bit flag (first8Bytes), and provides operations on these attributes. See the tables below for details.
TABLE 1 Attribute Table for BLOB Attribute values
Figure BDA0001830395760000061
TABLE 2 Operation table for BLOB attribute values
Operation               Meaning
getBlobLength           Obtains the length of a BLOB object
calculateDigest         Obtains the digest of the BLOB content
calculateLength         Obtains the length of a BLOB
calculateFirst8Bytes    Obtains the first 8 bytes of the BLOB content
computeHash             Computes the hash value of the BLOB digest
equals                  Judges whether two BLOB objects are equal
The invention uses an IdGenerator method to generate the ID of a BLOB. The ID serves as the unique identifier of the BLOB object in the system, and the corresponding BLOB content can be found through BlobStorage.
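The computeHash and equals operations listed in Table 2 might be realised along the following lines. The source does not say how equality is decided, so comparing length and digest is an assumption of this sketch, as are the class layout and the choice of the first 8 digest bytes for the hash value.

```python
import hashlib


class Blob:
    """Illustrative BLOB object carrying content-derived attributes."""

    def __init__(self, content):
        self.content = content
        self.length = len(content)
        self.digest = hashlib.md5(content).digest()  # 16 bytes, i.e. 128 bits

    def compute_hash(self):
        # Derive an integer hash from the digest rather than the full content.
        return int.from_bytes(self.digest[:8], "big")

    def __eq__(self, other):
        # Assumed rule: two BLOBs are equal when length and digest both match.
        return (
            isinstance(other, Blob)
            and self.length == other.length
            and self.digest == other.digest
        )

    def __hash__(self):
        return self.compute_hash()
```

Comparing digests avoids loading two large contents side by side, at the cost of relying on the digest being collision-free for the data at hand.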
The BLOB Cypher query method of the system is as follows:
To take full advantage of the convenient and efficient Cypher language in Neo4j, the system is designed so that a user program can call BLOB-related operation functions within a Cypher query statement, thereby achieving associated queries.
When creating a node, a user creates the node in the conventional Neo4j manner, stores a Blob object as an attribute of the node, and specifies a source file of the Blob object using the Blob.
When a user queries related content in the Cypher language, Blob objects on a node are treated in the same way as ordinary attributes of other types, and the relevant fields are specified directly in the query.
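The equal treatment of BLOB attributes and ordinary attributes in a query can be pictured with an in-memory analogue. The sketch below is Python rather than Cypher and does not reproduce the system's query functions; the node layout and the helpers `match` and `blob_length` are hypothetical.

```python
# Each node carries ordinary attributes plus, optionally, BLOB metadata.
nodes = [
    {"label": "Person", "name": "Alice",
     "photo": {"mime": "image/png", "length": 2048}},
    {"label": "Person", "name": "Bob",
     "photo": {"mime": "image/jpeg", "length": 512}},
    {"label": "Person", "name": "Carol"},
]


def match(all_nodes, label, predicate):
    """Return the nodes with the given label that satisfy the predicate."""
    return [n for n in all_nodes if n["label"] == label and predicate(n)]


def blob_length(node, key):
    """Operation function on a BLOB attribute: its length, or None if absent."""
    blob = node.get(key)
    return blob["length"] if blob else None


# Filter on a BLOB attribute (Mime type, length) exactly as one would
# filter on an ordinary attribute such as the name.
large_png = match(
    nodes,
    "Person",
    lambda n: n.get("photo", {}).get("mime") == "image/png"
    and blob_length(n, "photo") > 1000,
)
```

Because only the metadata is inspected, a filter like this never needs the BLOB content itself.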
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (2)

1. A method for constructing a graph database management system supporting unstructured data storage and query comprises the following steps:
taking the labeled attribute graph model based on the Neo4j graph database as a Neo4j attribute graph model;
modifying the original code of Neo4j based on the labeled attribute graph model of the Neo4j graph database, and acquiring values and setting values by a getRecord() method and a setRecord() method;
adding BLOB type support in the PropertyType of the original code of Neo4j, realizing reading and returning the number of locks to be added, and creating a BLOB data storage model;
a query module is created that supports queries using Cypher.
2. The method of claim 1, wherein the step of adding BLOB type support to the PropertyType of the original code of Neo4j comprises:
adding support for BLOB types in the getPropertyTypeOrNull() method so that BLOB types can be returned when the method is called;
adding registration of BLOB type in a registerScalarsAndCollection() method, and injecting Java class into Neo4j.
CN201811202708.XA 2018-10-16 2018-10-16 Graph database management system supporting unstructured data storage and query Active CN109582831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811202708.XA CN109582831B (en) 2018-10-16 2018-10-16 Graph database management system supporting unstructured data storage and query

Publications (2)

Publication Number Publication Date
CN109582831A CN109582831A (en) 2019-04-05
CN109582831B true CN109582831B (en) 2022-02-01

Family

ID=65920177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811202708.XA Active CN109582831B (en) 2018-10-16 2018-10-16 Graph database management system supporting unstructured data storage and query

Country Status (1)

Country Link
CN (1) CN109582831B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159112B (en) * 2019-12-20 2022-03-25 新华三大数据技术有限公司 Data processing method and system
CN111309750A (en) * 2020-03-31 2020-06-19 中国邮政储蓄银行股份有限公司 Data updating method and device for graph database
CN111611011B (en) * 2020-04-13 2023-01-13 中国科学院计算机网络信息中心 JSON syntax extension method and analysis method and device supporting Blob data types
CN111831787B (en) * 2020-06-08 2021-09-28 中国科学院计算机网络信息中心 Unstructured data information query method and system based on secondary attributes
CN111897911B (en) * 2020-06-11 2021-08-31 中国科学院计算机网络信息中心 Unstructured data query method and system based on secondary attribute graph
CN112836063B (en) * 2021-01-27 2023-06-06 四川新网银行股份有限公司 Method for realizing feature tracing
CN112883249B (en) * 2021-03-26 2022-10-14 瀚高基础软件股份有限公司 Layout document processing method and device and application method of device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122486A (en) * 2017-05-09 2017-09-01 中国科学院计算机网络信息中心 A kind of polynary big data fusion method and system for supporting BLOB
CN108170847A (en) * 2018-01-18 2018-06-15 国网福建省电力有限公司 A kind of big data storage method based on Neo4j chart databases
CN108376174A (en) * 2018-02-27 2018-08-07 河北中科开元数据科技有限公司 The method and apparatus for supporting structuring to be merged with unstructured big data
CN108491511A (en) * 2018-03-23 2018-09-04 腾讯科技(深圳)有限公司 Data digging method and device, model training method based on diagram data and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10216902B2 (en) * 2014-08-31 2019-02-26 General Electric Company Methods and systems for improving connections within a healthcare ecosystem

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
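Design and Implementation of a Unified Storage Platform for Unstructured Data; 何颖鹏 (He Yingpeng); China Excellent Master's Theses Electronic Journal, Information Science and Technology Series; 20140228; full text *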

Also Published As

Publication number Publication date
CN109582831A (en) 2019-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant