CN109582831B - Graph database management system supporting unstructured data storage and query - Google Patents
- Publication number: CN109582831B (application number CN201811202708.XA)
- Authority: CN (China)
- Prior art keywords: blob, neo4j, data, query, attribute
- Legal status: Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a graph database management system supporting unstructured data storage and query, which comprises: a labeled Neo4j attribute graph model, based on the Neo4j graph database, that stores structured data and the non-content attribute information of BLOBs, together with the unique ID assigned to each BLOB; a BLOB data storage model supporting the BLOB type, comprising an abstract storage model and a concrete storage model, that stores the content of each BLOB and maintains the mapping between the BLOB's ID and its attribute information; and a query module for querying the stored structured data and BLOB-type data. The system can store both structured data and unstructured data (the latter as the BLOB type) and supports queries over both.
Description
Technical Field
The invention relates to the technical field of big data, databases and distributed systems, and provides a database management system supporting unstructured data storage and query.
Background
Conventional relational databases are typically used to store structured data that can be represented in two dimensions, and the techniques for storing and querying such data are mature. As the data era has developed, however, data has taken increasingly complex forms. Practical applications produce large amounts of semi-structured data, which carries a self-describing structure, and unstructured data, which has no fixed structure. Both are highly extensible and can freely express a great deal of useful information. Because their formats are so free, however, storing and managing such data is difficult. Traditional relational database systems are designed mainly for object processing and data analysis and cannot adequately store and manage massive volumes of semi-structured and unstructured data. NoSQL technologies, and graph databases (Graph Database) such as Neo4j in particular, offer a new approach to managing and processing unstructured data efficiently.
Graph databases, also called graph-oriented or graph-based databases, originate from Euler's work and graph theory. Their data model consists of nodes and relationships, and information can also be stored as attributes of nodes, which supports fast queries of the relationships between entities.
As data sources multiply, data becomes more varied, and more than 85% of newly generated data is unstructured, yet current big data applications have limited ability to process unstructured data (http://www.cio.com.cn/eyan/2295.html). Unstructured data is typically stored as Binary Large Objects (BLOBs). Even so, storing BLOB objects in a conventional relational database remains inconvenient in many ways, for example low efficiency and difficult retrieval.
In addition, many application scenarios demand complex relationship queries, and online systems are sensitive to response time. The great advantage of graph databases is that they solve complex relationship problems quickly. However, currently popular graph databases such as Neo4j use the attribute graph model, which does not natively support storing BLOB objects. Combining a graph database with BLOB storage, so that BLOB data and other types of data can be managed and queried uniformly, is therefore particularly important.
Disclosure of Invention
The invention aims to provide a graph database management system supporting unstructured data storage and query that can store both structured data and unstructured data (as the BLOB type) and query both.
In order to achieve the purpose, the invention adopts the following technical scheme:
a labeled Neo4j attribute graph model, based on the Neo4j graph database, for storing structured data and the non-content attribute information of each BLOB, as well as the unique ID assigned to the BLOB; the structured data include the text, boolean, numeric, and temporal types, and the non-content attribute information of a BLOB includes the length of the BLOB data, its Mime type, and a 128-bit digest;
a BLOB data storage model supporting the BLOB type, which comprises an abstract storage model and a concrete storage model; the abstract storage model can be implemented on a local file system or a Ceph distributed file system, stores the content of each BLOB, and maintains the mapping between the BLOB's ID and its attribute information; the concrete storage model is implemented on a local file system, stores the content of each BLOB as a file, and likewise maintains the mapping between the BLOB's ID and its attribute information;
and a query module for querying the stored structured data and BLOB-type data.
Further, the system also comprises a data type identification module for judging the type of the received data, and if the data is structured data, the data is stored in the Neo4j attribute graph model; if the data is BLOB type data, storing the ID corresponding to the BLOB and the attribute information of the non-content of the BLOB in a Neo4j attribute graph model, and storing the content of the BLOB in a BLOB data storage model.
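The data type identification module can be pictured as a dispatch on the type of each incoming value. The Python sketch below is illustrative only (the system itself modifies Neo4j's Java code); the function name `classify` and the rule that raw bytes mean "BLOB" are assumptions, while the four structured families follow the text, boolean, numeric, and temporal types listed above.

```python
from datetime import date, datetime, time

# Hypothetical rule: the scalar families Neo4j's attribute graph model already
# supports (text, boolean, numeric, temporal) count as structured data;
# raw byte content is treated as a BLOB.
STRUCTURED_TYPES = (str, bool, int, float, date, datetime, time)


def classify(value):
    """Return 'structured' or 'blob' for an incoming property value."""
    if isinstance(value, (bytes, bytearray)):
        return "blob"
    if isinstance(value, STRUCTURED_TYPES):
        return "structured"
    raise TypeError(f"unsupported property type: {type(value).__name__}")
```

For example, `classify(42)` yields `"structured"` and `classify(b"%PDF-1.7")` yields `"blob"`; a caller would then write the first kind straight into the graph and split the second into graph-side attributes and externally stored content.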
Further, the system also comprises a BLOB attribute information extraction module for extracting the attribute information of a BLOB from the source data.
Further, the query module executes queries written in the Cypher language.
A method for constructing a graph database management system supporting unstructured data storage and query comprises the following steps:
adopting the labeled attribute graph model of the Neo4j graph database as the Neo4j attribute graph model;
modifying the Neo4j source code on the basis of this labeled attribute graph model, with values read and set through a getRecord() method and a setRecord() method;
adding BLOB type support to PropertyType in the Neo4j source code, implementing operations such as reading values, and creating a BLOB data storage model;
a query module is created that supports queries using Cypher.
Further, the step of adding BLOB type support to PropertyType in the Neo4j source code includes:
adding support for the BLOB type to the getPropertyTypeOrNull() method so that the method can return the BLOB type when called;
adding registration of the BLOB type to the registerScalarsAndCollection() method, thereby injecting the corresponding Java class into Neo4j.
A storage method of a graph database management system supporting unstructured data storage and query comprises the following steps:
determining whether the received data is structured data or unstructured data;
if the data is structured data, storing it in the Neo4j attribute graph model;
if the data is unstructured data, extracting the attribute information of each BLOB and allocating a unique ID to each BLOB;
storing the BLOB's ID and content in the BLOB data storage model, and storing the non-content attribute information of the BLOB in the Neo4j attribute graph model;
reading the content of the BLOB by its ID.
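The storage method above can be sketched in memory, in Python for brevity. The class `HybridStore`, its two dictionaries standing in for the Neo4j attribute graph model and the BLOB data storage model, the use of a random UUID as the unique ID, and MD5 as the digest are all assumptions of this sketch; the Mime type is omitted.

```python
import hashlib
import uuid


class HybridStore:
    """Sketch of the storage method: structured values and BLOB metadata go to
    a property map standing in for the Neo4j attribute graph model, while BLOB
    content goes to a separate store keyed by a system-assigned ID."""

    def __init__(self):
        self.graph_props = {}  # node_id -> {property name: value or BLOB metadata}
        self.blob_store = {}   # blob_id -> raw content (the BLOB data storage model)

    def put(self, node_id, key, value):
        props = self.graph_props.setdefault(node_id, {})
        if isinstance(value, (bytes, bytearray)):
            content = bytes(value)
            blob_id = uuid.uuid4().hex          # unique ID assigned to the BLOB
            self.blob_store[blob_id] = content  # content kept outside the graph
            props[key] = {                      # only non-content attributes in the graph
                "id": blob_id,
                "length": len(content),
                "digest": hashlib.md5(content).hexdigest(),
            }
        else:
            props[key] = value                  # structured data stored directly

    def read_blob(self, node_id, key):
        """Read BLOB content back through the ID recorded in the graph."""
        return self.blob_store[self.graph_props[node_id][key]["id"]]
```

Keeping only the ID, length, and digest on the graph side is what keeps nodes small; the bulky bytes are touched only when `read_blob` is called.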
Further, the step of storing the content of the BLOB comprises:
creating a new file in the specified directory, writing the content of the BLOB into it through a file output stream, and saving it in the bid format;
creating another new file in the specified directory, writing the MD5 value of the BLOB digest into it, and saving it in the md5 format.
Further, the step of reading the content of the BLOB comprises: taking the bid of the BLOB as a parameter, looking up the corresponding file in the specified directory, and reading and returning the file content through the fromFile() method.
A method for creating the attribute information of a BLOB in a graph database management system supporting unstructured data storage and query, the steps comprising:
reading the byte-array content of the BLOB from the file as the content of the BLOB;
reading the length of the BLOB's byte array from the file as the length of the BLOB;
computing the content digest of the BLOB read from the file with DigestUtils.getMd5Digest and using it as the digest of the BLOB;
reading the encoding of the first 8 bytes of the BLOB content from the file as the 32-bit flag value of the BLOB;
generating the unique ID of the BLOB with the IdGenerator method.
Building on the attribute graph model, the system adds BLOB storage functions to the open-source Neo4j graph database. It combines the graph database with binary large objects, supports storing big data of the BLOB type, and fully exploits the strengths of graph databases in handling relationship problems. In addition to the text, boolean, numeric, and temporal types supported by Neo4j's attribute graph model, the system therefore also supports the BLOB type. The system stores the non-content attribute information of BLOBs in Neo4j and stores the large content itself in a pluggable back-end storage system.
The invention defines and implements read and write operations for BLOBs, namely how to create a BLOB attribute value from a file and how to read a given file and build a BLOB object from it. It enriches the intrinsic attributes and related operations of BLOB objects: it implements methods for deriving attribute values such as the digest, the length, and the 8-byte flag from the BLOB content, and a method for allocating a unique ID to each BLOB object. It also supports searching BLOB-related content: by providing functions over BLOB attribute values in the Cypher query language, it supports matching on BLOB attribute values, and results can be further filtered using other attribute values and relationships as constraints.
Beneficial effects of the invention: graph database technology and BLOB storage are organically integrated, enabling mixed storage and query of structured and unstructured data. Compared with traditional big data fusion management tools, the method strengthens the ability to handle relationship problems, improves relationship retrieval performance, and to some extent fills a gap in existing big data fusion management tools.
Drawings
FIG. 1 is a diagram of a graph database management system according to an embodiment.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The present embodiment provides a graph database management system supporting unstructured data storage and query, as shown in fig. 1, specifically as follows:
1) Based on the labeled attribute graph model of Neo4j, the system creates a BLOB data storage model that supports BLOBs, adding the content related to BLOB objects. Neo4j serves as the front-end storage system and stores the structured data and part of the attribute information of each BLOB. A BLOB-type attribute value includes the data length, the digest, the Mime type, and the unique ID assigned to the BLOB by the system. A back-end storage system is also created to store BLOB-type data.
2) The system inspects the received data. General structured data is stored in Neo4j's native graph storage. For BLOB-type data, the attribute values other than the BLOB content are stored in the Neo4j front-end storage system, and the BLOB content is then stored in the back-end storage system under its ID; the back-end storage system maintains the mapping between the ID and the data content of the BLOB-type attribute value.
The construction process of the system is as follows:
The system is built mainly as an extension of the open-source database Neo4j (https://neo4j.com/), so its data model retains all the characteristics of Neo4j's labeled attribute graph: a data set consists of nodes and edges, entities can carry attribute values, and attribute values can be of the text, boolean, numeric, or temporal type; in this system, attribute values can also be of the BLOB type. The system is built by modifying part of Neo4j's native code to add BLOB support, as follows:
(1) On top of Neo4j's original functions, a trait named WithRecord is designed, containing a getRecord() method and a setRecord() method, which get and set a value respectively.
(2) Support for the BLOB type is added to PropertyType, implementing the methods for reading the value and returning the locked number. Support for the BLOB type is added to the getPropertyTypeOrNull() method so that the method can return the BLOB type when called. Registration of the BLOB type is added to the registerScalarsAndCollection() method, i.e., the corresponding Java class is injected into Neo4j.
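To show what adding a type to getPropertyTypeOrNull() involves, here is a Python sketch of decoding a type code from a packed property block and returning None for unknown codes. It is loosely modeled on Neo4j's property-block encoding, but the bit position (bits 24 to 27), the subset of codes, and the value 15 for BLOB are all assumptions, and `get_property_type_or_null` is a hypothetical Python name rather than the Java method itself.

```python
from enum import Enum


class PropertyType(Enum):
    # Illustrative subset of 4-bit type codes; BLOB = 15 is a hypothetical choice.
    BOOL = 1
    INT = 5
    STRING = 9
    TEMPORAL = 14
    BLOB = 15


_BY_CODE = {t.value: t for t in PropertyType}


def get_property_type_or_null(prop_block: int):
    """Decode the type code from bits 24-27 of a property block header and
    return the matching PropertyType, or None if the code is not registered."""
    code = (prop_block & 0x0F000000) >> 24
    return _BY_CODE.get(code)
```

Without the `BLOB` entry, a block tagged with code 15 would decode to `None`, which is why the lookup has to be extended before BLOB values can be read back.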
The storage and reading method of the present system is illustrated as follows:
The content of a BLOB object tends to occupy a large amount of storage space. Storing entire BLOB objects directly in the graph database would make it unwieldy and cause serious memory consumption, storage consumption, and performance problems. In typical use, most queries are made on some of a BLOB's attribute information rather than on its content, so the invention proposes a storage scheme that is integrated with, yet relatively independent of, Neo4j's storage:
(1) The system extracts several attributes from each BLOB object: the unique ID assigned by the system, the length of the BLOB object, its 128-bit digest, and its Mime type. These attribute values are stored in Neo4j's attribute graph model.
(2) The actual content of the BLOB is stored in a storage system outside Neo4j. The invention uses an abstract storage model, a class named BlobStorage, which can be implemented on a local file system or a Ceph distributed file system; it stores the content of each BLOB and maintains the mapping between the BLOB's ID and its attribute information.
The BlobStorage system provides the following interface methods:
i. save: saves the specified BLOB content;
ii. configure: creates or modifies the path of the BLOB store;
iii. load: retrieves the corresponding BLOB data by ID.
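The three interface methods map naturally onto an abstract base class. This is a Python sketch assuming string IDs and byte content. `InMemoryBlobStorage` is a hypothetical backend, included only to show that callers depend on save, configure, and load rather than on a particular file system such as local disk or Ceph.

```python
from abc import ABC, abstractmethod


class BlobStorage(ABC):
    """Abstract BLOB store: content is addressed by the BLOB's ID."""

    @abstractmethod
    def configure(self, path: str) -> None: ...

    @abstractmethod
    def save(self, blob_id: str, content: bytes) -> None: ...

    @abstractmethod
    def load(self, blob_id: str) -> bytes: ...


class InMemoryBlobStorage(BlobStorage):
    """Hypothetical backend showing that callers only see the interface."""

    def __init__(self):
        self.path = None
        self._data = {}

    def configure(self, path: str) -> None:
        self.path = path  # a real backend would create or switch its root here

    def save(self, blob_id: str, content: bytes) -> None:
        self._data[blob_id] = bytes(content)

    def load(self, blob_id: str) -> bytes:
        return self._data[blob_id]
```

Swapping in a file-system or Ceph-backed subclass would leave every caller unchanged, which is the point of separating the abstract and concrete storage models.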
(3) The invention provides a back-end storage system named FileBlobStorage as a concrete storage model based on the local file system. FileBlobStorage inherits from BlobStorage, stores BLOB content as files, and maps each ID to its BLOB. It implements the three methods save, load, and configure.
When the save method is executed, a new file is created in the specified directory and the content of the BLOB is written into it through a file output stream; the new file name is BLOB. The write method is then called to create another new file in the specified directory and write the MD5 value of the BLOB digest into it, with the name of BLOB.
When the load method is executed, the bid of the BLOB object is passed in as a parameter, the corresponding file is looked up in the specified directory, and its data content is read and returned through the fromFile() method.
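A Python sketch of the file-based backend, under stated assumptions. The exact file names are not recoverable from the text above, so the `<bid>.blob` (content) and `<bid>.md5` (digest) naming is hypothetical, as is using Python's `pathlib` in place of Java file streams.

```python
import hashlib
from pathlib import Path


class FileBlobStorage:
    """File-system BLOB store. The naming scheme (<bid>.blob for content,
    <bid>.md5 for the digest) is a hypothetical choice for this sketch."""

    def __init__(self, root: str):
        self.configure(root)

    def configure(self, root: str) -> None:
        # Create (or switch to) the storage directory.
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, bid: str, content: bytes) -> None:
        # Write the content through a file stream, then the MD5 digest beside it.
        with open(self.root / f"{bid}.blob", "wb") as out:
            out.write(content)
        (self.root / f"{bid}.md5").write_text(hashlib.md5(content).hexdigest())

    def load(self, bid: str) -> bytes:
        # Look up the file for this bid in the configured directory.
        return (self.root / f"{bid}.blob").read_bytes()
```

Keeping the digest in a sidecar file lets the store verify content integrity later without rereading the graph.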
The attribute value creation and reading method of the BLOB of the present system is as follows:
The invention provides a method for creating BLOB attribute values with a file as the data source; executing it yields each attribute value of the BLOB content:
Blob fromFile(File file): generates the BLOB attribute values from the content of the file.
When a user calls this method, the back end reads the content of the file specified by the user and creates a Blob object; it calls the calculateDigest() method to obtain the digest of the Blob object and the calculateLength() method to obtain its length, then calls the calculateFirst8Bytes() method to compute the first 8 bytes of the Blob object's content, and finally returns all of this attribute information together.
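A minimal Python sketch of the attribute-creation step. `from_file` and `BlobAttributes` are hypothetical names. MD5 supplies the 128-bit digest, in line with the MD5 digest used elsewhere in this document, and the first 8 bytes are kept as raw bytes rather than packed into a flag value.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BlobAttributes:
    length: int         # number of bytes in the content
    digest: str         # MD5 digest (128 bits), hex-encoded
    first8bytes: bytes  # leading bytes, often enough to recognize the file format


def from_file(path: str) -> BlobAttributes:
    """Read a file and derive the BLOB's non-content attributes."""
    content = Path(path).read_bytes()
    return BlobAttributes(
        length=len(content),
        digest=hashlib.md5(content).hexdigest(),
        first8bytes=content[:8],
    )
```

For a PNG file, `first8bytes` holds the PNG signature, which illustrates why a short prefix is a useful cheap attribute to keep next to the digest.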
To read BLOB values, the invention adds a method for reading the BLOB type to readValue(). The method that reads a Blob object's value is readBlobValue(), which takes two parameters, values and conf: values holds the attribute information of the Blob object, and conf is the system configuration, including the storage location of Blobs in the file system. The system automatically takes the Blob object's attribute information from values and reads the Blob object's content from the file system.
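The division of labor in readBlobValue() (attributes from `values`, content location from `conf`) can be sketched in Python as follows. The configuration key `blob.storage.dir`, the `<id>.blob` file naming, and the length consistency check are assumptions added for this sketch. `read_blob_value` is a hypothetical name, not the Java method.

```python
from pathlib import Path


def read_blob_value(values: dict, conf: dict):
    """Return (attribute info, content) for a stored Blob.

    values: the Blob's attribute information as kept in the graph (must hold 'id').
    conf:   system configuration; 'blob.storage.dir' is a hypothetical key naming
            the directory where Blob content files live as '<id>.blob'."""
    attrs = dict(values)
    path = Path(conf["blob.storage.dir"]) / f"{attrs['id']}.blob"
    content = path.read_bytes()
    # Defensive check added by this sketch: the stored length must match.
    if "length" in attrs and attrs["length"] != len(content):
        raise ValueError("stored length does not match the content on disk")
    return attrs, content
```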
The attributes and operation of the BLOB of the present invention are illustrated below:
The invention enriches the intrinsic attributes of a BLOB, namely its content, length, digest, and 32-bit flag (first8Bytes), and provides operations on these attributes, as detailed in the tables below.
TABLE 1 Attribute Table for BLOB Attribute values
TABLE 2 Operations on BLOB attribute values
Operation | Meaning |
---|---|
getBlobLength | Obtains the length of a BLOB object |
calculateDigest | Obtains the digest of the BLOB content |
calculateLength | Obtains the length of a BLOB |
calculateFirst8Bytes | Obtains the first 8 bytes of the BLOB content |
computeHash | Computes a hash value from the BLOB digest |
equals | Determines whether two BLOB objects are equal |
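The operations in Table 2 can be sketched as methods on a small Python value class. Mapping computeHash to `__hash__` and equals to `__eq__` is an assumption about intent. Comparing the content itself in `__eq__`, not only the digest, is a design choice of this sketch; it avoids treating two different BLOBs whose digests collide as equal.

```python
import hashlib


class Blob:
    """In-memory BLOB value exposing the operations of Table 2."""

    def __init__(self, content: bytes):
        self.content = bytes(content)

    def get_blob_length(self) -> int:         # getBlobLength / calculateLength
        return len(self.content)

    def calculate_digest(self) -> bytes:      # 16-byte (128-bit) MD5 digest
        return hashlib.md5(self.content).digest()

    def calculate_first8bytes(self) -> bytes:
        return self.content[:8]

    def __hash__(self) -> int:                # computeHash: derived from the digest
        return int.from_bytes(self.calculate_digest()[:8], "big")

    def __eq__(self, other) -> bool:          # equals: cheap length test, then content
        return (isinstance(other, Blob)
                and self.get_blob_length() == other.get_blob_length()
                and self.content == other.content)
```

Because equal contents give equal digests, the hash stays consistent with equality, so `Blob` objects can safely be used in sets and as dictionary keys.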
The invention uses an IdGenerator method to generate the ID of a BLOB; the ID serves as the unique identifier of the BLOB object in the system, and with the help of BlobStorage the corresponding BLOB content can be found from it.
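How IdGenerator produces IDs is not specified. One plausible Python sketch is a lock-protected counter, assuming IDs only need to be unique within a running process. `IdGenerator.next_id` is a hypothetical name, and a persistent system would also have to carry the counter across restarts.

```python
import itertools
import threading


class IdGenerator:
    """Thread-safe, monotonically increasing ID source (hypothetical sketch)."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock makes each draw from the counter atomic across threads.
        with self._lock:
            return next(self._counter)
```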
The BLOB Cypher query method of the system is as follows:
To take full advantage of Neo4j's convenient and efficient Cypher language, the system lets user programs call BLOB-related functions in Cypher query statements, so that BLOB-related queries can be expressed directly.
To create a node, the user creates it in the usual Neo4j way, stores a Blob object as an attribute of the node, and specifies the source file of the Blob object using the Blob.
When querying related content in Cypher, Blob objects on nodes are treated the same as other ordinary attributes, and the relevant fields can be specified directly in the query.
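That BLOB attributes are filtered like any other property can be illustrated without Neo4j. The Python sketch below imitates a Cypher `MATCH ... WHERE` filter over a toy node list. The `match` helper and the data are hypothetical, and this is not the system's Cypher extension.

```python
def match(nodes, label, where):
    """Return nodes carrying `label` whose properties satisfy `where`,
    a stand-in for a Cypher MATCH ... WHERE clause."""
    return [n for n in nodes if label in n["labels"] and where(n["props"])]


# Toy data: the BLOB property 'photo' holds only its non-content attributes.
nodes = [
    {"labels": {"Person"}, "props": {"name": "Alice",
                                     "photo": {"length": 2048, "mime": "image/png"}}},
    {"labels": {"Person"}, "props": {"name": "Bob",
                                     "photo": {"length": 512, "mime": "image/png"}}},
    {"labels": {"Person"}, "props": {"name": "Ann",
                                     "photo": {"length": 4096, "mime": "image/jpeg"}}},
]

# BLOB attributes and ordinary properties combine freely in one predicate.
hits = match(nodes, "Person",
             lambda p: p["photo"]["mime"] == "image/png"
             and p["photo"]["length"] > 1024
             and p["name"].startswith("A"))
```

Only Alice satisfies all three conditions. Bob's photo is too small, and Ann's is a JPEG.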
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.
Claims (2)
1. A method for constructing a graph database management system supporting unstructured data storage and query comprises the following steps:
taking the labeled attribute graph model based on the Neo4j graph database as the Neo4j attribute graph model;
modifying the Neo4j source code based on the labeled attribute graph model of the Neo4j graph database, with values read and set through a getRecord() method and a setRecord() method;
adding BLOB type support to PropertyType in the Neo4j source code, implementing reading and returning the number of locks to be added, and creating a BLOB data storage model;
a query module is created that supports queries using Cypher.
2. The method of claim 1, wherein the step of adding BLOB type support to PropertyType in the Neo4j source code comprises:
adding support for the BLOB type to the getPropertyTypeOrNull() method so that the method can return the BLOB type when called;
adding registration of the BLOB type to the registerScalarsAndCollection() method, thereby injecting the corresponding Java class into Neo4j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811202708.XA CN109582831B (en) | 2018-10-16 | 2018-10-16 | Graph database management system supporting unstructured data storage and query |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109582831A CN109582831A (en) | 2019-04-05 |
CN109582831B true CN109582831B (en) | 2022-02-01 |
Family ID: 65920177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811202708.XA Active CN109582831B (en) | 2018-10-16 | 2018-10-16 | Graph database management system supporting unstructured data storage and query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109582831B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111159112B (en) * | 2019-12-20 | 2022-03-25 | 新华三大数据技术有限公司 | Data processing method and system |
CN111309750A (en) * | 2020-03-31 | 2020-06-19 | 中国邮政储蓄银行股份有限公司 | Data updating method and device for graph database |
CN111611011B (en) * | 2020-04-13 | 2023-01-13 | 中国科学院计算机网络信息中心 | JSON syntax extension method and analysis method and device supporting Blob data types |
CN111831787B (en) * | 2020-06-08 | 2021-09-28 | 中国科学院计算机网络信息中心 | Unstructured data information query method and system based on secondary attributes |
CN111897911B (en) * | 2020-06-11 | 2021-08-31 | 中国科学院计算机网络信息中心 | Unstructured data query method and system based on secondary attribute graph |
CN112836063B (en) * | 2021-01-27 | 2023-06-06 | 四川新网银行股份有限公司 | Method for realizing feature tracing |
CN112883249B (en) * | 2021-03-26 | 2022-10-14 | 瀚高基础软件股份有限公司 | Layout document processing method and device and application method of device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122486A (en) * | 2017-05-09 | 2017-09-01 | 中国科学院计算机网络信息中心 | A kind of polynary big data fusion method and system for supporting BLOB |
CN108170847A (en) * | 2018-01-18 | 2018-06-15 | 国网福建省电力有限公司 | A kind of big data storage method based on Neo4j chart databases |
CN108376174A (en) * | 2018-02-27 | 2018-08-07 | 河北中科开元数据科技有限公司 | The method and apparatus for supporting structuring to be merged with unstructured big data |
CN108491511A (en) * | 2018-03-23 | 2018-09-04 | 腾讯科技(深圳)有限公司 | Data digging method and device, model training method based on diagram data and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10216902B2 (en) * | 2014-08-31 | 2019-02-26 | General Electric Company | Methods and systems for improving connections within a healthcare ecosystem |
Non-Patent Citations (1)
Title |
---|
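Design and Implementation of a Unified Storage Platform for Unstructured Data; 何颖鹏 (He Yingpeng); China Outstanding Master's Theses Electronic Journal, Information Science and Technology; 2014-02-28; full text * |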
Also Published As
Publication number | Publication date |
---|---|
CN109582831A (en) | 2019-04-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |