CN115794875A

CN115794875A - Graph database system supporting time sequence data storage and fusion storage method

Info

Publication number: CN115794875A
Application number: CN202111055522.8A
Authority: CN
Inventors: 吴章生 (Wu Zhangsheng); 张晨 (Zhang Chen); 王海波 (Wang Haibo)
Original assignee: Zhongke Know Beijing Technology Co ltd
Current assignee: Zhongke Know Beijing Technology Co ltd
Priority date: 2021-09-09
Filing date: 2021-09-09
Publication date: 2023-03-14

Abstract

The invention discloses a graph database system supporting time-series data storage, and a fusion storage method, in the field of graph databases. After a data receiving unit receives data to be stored, a data attribute feature identification unit identifies the structure type of the data. Structured data are stored in an attribute storage engine. Unstructured data are sent to a time-series storage engine, which acquires and stores the timestamp data contained in them and builds a time-series index from those timestamps. When the data are used later, records carrying timestamp information can be retrieved and queried through this index.

Description

Graph database system supporting time sequence data storage and fusion storage method

Technical Field

The invention relates to the field of graph databases, in particular to a graph database system supporting time sequence data storage and a fusion storage method.

Background

Classified by data model, the graph database is a branch of non-relational (NoSQL) databases. By storing entities and the relationships between them as a graph, it offers an effective storage and processing solution for problems that can be expressed as graph models. Take the most common case, person-to-person relationships in a social network: a traditional relational database (RDBMS) stores such data poorly, because large volumes of complex, interconnected data are hard to search and traverse in depth, and response times become slower than expected. The storage and computation model of a graph database is an effective way to solve this problem. With the growth of social networks, e-commerce, resource retrieval and similar fields, storage technology able to handle complex associations is urgently needed, and using a graph database to organize, store, compute over, analyze and mine loosely structured, interconnected data is more effective, which has greatly promoted the rapid development of graph databases. Graph databases rest on graph theory: they describe and store the nodes of a graph and the relationships between them. Research on graph-based data mining, in China and abroad, falls into five areas: graph matching, keyword query, graph classification, graph clustering, and frequent subgraph mining.

Time-series data are sequences of values of the same metric recorded in chronological order. All values within one series must be measured on the same basis so that they are comparable. A time series may consist of period values (quantities accumulated over intervals) or point-in-time values (quantities observed at particular instants).

In the related prior art, no graph database currently supports the storage of time-series data. To better meet the demands that today's big data places on the development of artificial intelligence, a graph database system supporting time-series data storage is needed.

Disclosure of Invention

To overcome these shortcomings of the prior art, the invention provides a graph database system supporting time-series data storage and a fusion storage method, so as to better meet the demands that today's big data places on the development of artificial intelligence.

To achieve the above object, a first aspect of the present invention provides a graph database system supporting time-series data storage, comprising a data receiving unit, a data attribute feature identification unit, an attribute storage engine, a time-series storage engine, and a relationship storage unit;

the data receiving unit is connected to the data attribute feature identification unit; the data attribute feature identification unit is connected to both the time-series storage engine and the attribute storage engine; the attribute storage engine is connected to the time-series storage engine, and is also connected to it through the relationship storage unit;

after the data receiving unit receives data to be stored, the data attribute feature identification unit identifies the structure type of the data, the structure type being either structured data or unstructured data;

structured data identified by the data attribute feature identification unit are stored in the attribute storage engine, and unstructured data identified by it are stored in the time-series storage engine;

the time-series storage engine is further configured to acquire and store the timestamp data in the unstructured data and to build a time-series index from the timestamp data.
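The routing described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `identify_structure_type` and `GraphDatabaseSketch` are hypothetical, the two engines are plain in-memory lists, the relationship storage unit is omitted, and the classification rule (a dict with named fields counts as structured, anything else as unstructured) is a placeholder, because the patent does not say how the structure type is recognized.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

# A record is either a structured mapping or a raw unstructured payload.
Record = Union[Dict[str, Any], str]


def identify_structure_type(record: Record) -> str:
    """Hypothetical stand-in for the data attribute feature identification unit.

    Assumption: dict records with named fields are "structured"; anything
    else (free text, raw sensor lines) is "unstructured".
    """
    return "structured" if isinstance(record, dict) else "unstructured"


@dataclass
class GraphDatabaseSketch:
    """Toy model of the two storage engines as in-memory lists."""

    attribute_engine: List[Dict[str, Any]] = field(default_factory=list)
    time_series_engine: List[str] = field(default_factory=list)

    def receive(self, record: Record) -> str:
        """Data receiving unit: accept a record and route it by structure type."""
        kind = identify_structure_type(record)
        if kind == "structured":
            self.attribute_engine.append(record)  # attribute storage engine
        else:
            self.time_series_engine.append(record)  # time-series storage engine
        return kind
```

In use, `GraphDatabaseSketch().receive({"id": 1})` returns `"structured"` and stores the record in `attribute_engine`, while a raw text line is routed to `time_series_engine`.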

Further, the time-series storage engine comprises a data preprocessing unit and a data format unification unit;

the data preprocessing unit is connected to the data attribute feature identification unit and normalizes the unstructured data output by the data attribute feature identification unit;

the data format unification unit is connected to the data preprocessing unit and unifies the format of the data processed by the data preprocessing unit.
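The patent does not define what "normalization" means here. One common reading of the term is min-max scaling of numeric values into [0, 1]; the sketch below assumes records are hypothetical `(timestamp, value)` pairs and scales only the values, leaving the timestamps untouched for the later format-unification and indexing steps. `normalize_readings` is a hypothetical name.

```python
from typing import List, Tuple

Reading = Tuple[str, float]  # (timestamp string, numeric value), an assumed shape


def normalize_readings(readings: List[Reading]) -> List[Reading]:
    """Min-max scale the values into [0, 1], leaving timestamps untouched.

    A constant series (max == min) is mapped to all zeros to avoid
    dividing by zero.
    """
    if not readings:
        return []
    values = [value for _, value in readings]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        (ts, 0.0 if span == 0 else (value - lo) / span)
        for ts, value in readings
    ]
```

For example, values 2.0, 4.0 and 6.0 become 0.0, 0.5 and 1.0.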

Further, the time-series storage engine also comprises a data validity duplicate-checking unit;

the data validity duplicate-checking unit is connected to the data format unification unit, checks the format-unified data output by the data format unification unit for duplicates, and removes duplicated data.
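A minimal sketch of the duplicate check, assuming (as an illustration only) that two records count as duplicates exactly when they are equal after format unification. The helper name `remove_duplicates` is hypothetical. Keeping the first occurrence preserves arrival order, which matters for time-ordered data.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def remove_duplicates(records: Sequence[T]) -> List[T]:
    """Keep the first occurrence of each record, preserving arrival order."""
    seen = set()
    unique: List[T] = []
    for record in records:
        if record not in seen:  # exact equality is the assumed duplicate rule
            seen.add(record)
            unique.append(record)
    return unique
```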

Further, the time-series storage engine also comprises a timestamp data storage unit;

the timestamp data storage unit is connected to the data validity duplicate-checking unit and acquires and stores the timestamp information of the data output by the data validity duplicate-checking unit.
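The patent does not specify how timestamps are located in unstructured data. The sketch below assumes an ISO 8601-style date and time (for example `2021-09-09T08:30:00`) embedded in a text record; `extract_timestamp` and `TimestampStore` are hypothetical names, and a real deployment would need whatever patterns its sources actually use.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumed timestamp shape: ISO 8601 date and time, e.g. 2021-09-09T08:30:00.
ISO_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def extract_timestamp(text: str) -> Optional[datetime]:
    """Return the first ISO-style timestamp found in the text, or None."""
    match = ISO_TIMESTAMP.search(text)
    if match is None:
        return None
    return datetime.fromisoformat(match.group(0).replace(" ", "T"))


class TimestampStore:
    """Hypothetical timestamp data storage unit: record id -> timestamp."""

    def __init__(self) -> None:
        self.timestamps: Dict[int, datetime] = {}

    def store(self, record_id: int, text: str) -> bool:
        """Extract and keep the record's timestamp; report whether one was found."""
        ts = extract_timestamp(text)
        if ts is None:
            return False
        self.timestamps[record_id] = ts
        return True
```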

Further, the time-series storage engine also comprises a time-series index creation unit;

the time-series index creation unit is connected to the timestamp data storage unit and creates a time-series index from the timestamp data stored in the timestamp data storage unit.
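One simple way to realize a time-series index, offered here as an assumption rather than the patented design, is a list of timestamps kept in sorted order, with binary search via Python's standard `bisect` module for range queries. `TimeSeriesIndex` is a hypothetical name.

```python
import bisect
from datetime import datetime
from typing import List


class TimeSeriesIndex:
    """Sorted timestamp index supporting inclusive time-range queries."""

    def __init__(self) -> None:
        self._times: List[datetime] = []  # kept in ascending order
        self._ids: List[int] = []         # record ids aligned with _times

    def add(self, ts: datetime, record_id: int) -> None:
        # bisect_right keeps records with equal timestamps in insertion order.
        pos = bisect.bisect_right(self._times, ts)
        self._times.insert(pos, ts)
        self._ids.insert(pos, record_id)

    def range_query(self, start: datetime, end: datetime) -> List[int]:
        """Return ids of records with start <= timestamp <= end, oldest first."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._ids[lo:hi]
```

Each insertion costs O(n) because the lists are shifted, while a range query costs O(log n) plus the size of the result; a production engine would more likely use a B-tree or a similar on-disk structure.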

A second aspect of the invention provides a graph database fusion storage method supporting time-series data storage, comprising the following steps:

receiving data to be stored;

identifying a structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;

if the data are unstructured, acquiring and storing the timestamp data of the unstructured data.

Further, the method also comprises: if the data are structured, storing the structured data in the attribute storage engine.

Further, if the data are unstructured, acquiring and storing the timestamp data of the unstructured data comprises:

if the data are unstructured, normalizing the unstructured data;

unifying the format of the normalized data;

acquiring and storing the timestamp data of the unstructured data from the format-unified data.

Further, acquiring and storing the timestamp data of the unstructured data from the format-unified data comprises:

checking the format-unified data for duplicates and removing duplicated data;

acquiring and storing the timestamp data of the de-duplicated data.

Further, the method also comprises:

creating a time-series index from the timestamp data to facilitate later retrieval or querying of the data.

By adopting the above technical solution, the present application achieves at least the following beneficial effects:

the technical solution provides a graph database system supporting time-series data storage and a fusion storage method. After the data receiving unit receives data to be stored, the data attribute feature identification unit identifies its structure type. Structured data are stored in the attribute storage engine; unstructured data are sent to the time-series storage engine, which acquires and stores the timestamp data in them and builds a time-series index from those timestamps, so that data carrying timestamp information can later be retrieved and queried through the index. The graph database system can therefore store time-series data. From the stored time-series data, a user can identify the statistical characteristics and development patterns of the time series in a sample, build a time-series model, and make predictions or perform other analyses on the sample, which greatly facilitates research on and use of time-series data.

Drawings

To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a diagram illustrating a graph database system supporting time-series data storage according to an embodiment of the present invention;

FIG. 2 is a diagram of a specific example of a graph database system supporting time-series data storage according to an embodiment of the present invention;

FIG. 3 is a flowchart of a graph database fusion storage method supporting time-series data storage according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions are described in detail below with reference to the accompanying drawings and embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.

Referring to FIG. 1, a graph database system supporting time-series data storage according to an embodiment of the present invention comprises a data receiving unit 10, a data attribute feature identification unit 20, an attribute storage engine 30, a time-series storage engine 40, and a relationship storage unit 50;

the data receiving unit 10 is connected to the data attribute feature identification unit 20; the data attribute feature identification unit 20 is connected to both the time-series storage engine 40 and the attribute storage engine 30; the attribute storage engine 30 is connected to the time-series storage engine 40, and is also connected to it through the relationship storage unit 50;

after the data receiving unit 10 receives data to be stored, the data attribute feature identification unit 20 identifies the structure type of the data, the structure type being either structured data or unstructured data;

structured data identified by the data attribute feature identification unit 20 are stored in the attribute storage engine 30, and unstructured data identified by it are stored in the time-series storage engine 40;

the time-series storage engine 40 is further configured to acquire and store the timestamp data in the unstructured data and to build a time-series index from the timestamp data.

In this graph database system, after the data receiving unit receives data to be stored, the data attribute feature identification unit first identifies its structure type. Structured data are stored in the attribute storage engine; unstructured data are sent to the time-series storage engine, which acquires and stores the timestamp data in them and builds a time-series index from those timestamps, so that data carrying timestamp information can later be retrieved and queried through the index.

In one embodiment, the present invention further provides a specific graph database system supporting time-series data storage. As shown in FIG. 2, the system comprises a data receiving unit 10, a data attribute feature identification unit 20, an attribute storage engine 30, a time-series storage engine 40, and a relationship storage unit 50;

the data receiving unit 10 is connected to the data attribute feature identification unit 20; the data attribute feature identification unit 20 is connected to both the time-series storage engine 40 and the attribute storage engine 30; the attribute storage engine 30 is connected to the time-series storage engine 40, and is also connected to it through the relationship storage unit 50;

structured data identified by the data attribute feature identification unit 20 are stored in the attribute storage engine 30, and unstructured data identified by it are stored in the time-series storage engine 40;

The time-series storage engine 40 comprises a data preprocessing unit 41 and a data format unification unit 42. The data preprocessing unit 41 is connected to the data attribute feature identification unit 20 and normalizes the unstructured data identified by it; for example, timestamped data from patent documents are sent to the data preprocessing unit 41. The data format unification unit 42 is connected to the data preprocessing unit 41 and unifies the format of the preprocessed data to facilitate subsequent storage. For example, some timestamp data are stored in year-month-day order, some in month-day-year order, and some in day-month-year order; data containing timestamps must be brought into a single format.
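The date-format example above can be made concrete with Python's `datetime.strptime` and `strftime`. The sketch assumes each source's field order is known in advance (the `LAYOUTS` keys and the separators chosen for each order are hypothetical), because a string such as `03/04/2021` cannot be disambiguated from its digits alone.

```python
from datetime import datetime

# Hypothetical per-source layouts covering the three orders named in the text.
LAYOUTS = {
    "year-month-day": "%Y-%m-%d",
    "month-day-year": "%m/%d/%Y",
    "day-month-year": "%d.%m.%Y",
}


def unify_date(raw: str, order: str) -> str:
    """Parse a date written in the given field order and re-emit it as YYYY-MM-DD."""
    return datetime.strptime(raw, LAYOUTS[order]).strftime("%Y-%m-%d")
```

For example, `unify_date("09/14/2021", "month-day-year")` and `unify_date("14.09.2021", "day-month-year")` both yield `"2021-09-14"`.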

The time-series storage engine 40 further comprises a data validity duplicate-checking unit 43, which is connected to the data format unification unit 42, checks the format-unified data output by the data format unification unit 42 for duplicates, and removes duplicated data.

The time-series storage engine 40 further comprises a timestamp data storage unit 44, which is connected to the data validity duplicate-checking unit 43 and acquires and stores the timestamp information of the data output by the data validity duplicate-checking unit 43. Duplicate checking in the time-series storage engine 40 serves mainly to remove repeated data from what is received: for example, a front-end sensing device may send the same timestamped data more than once, and such repeats must be removed.

The time-series storage engine 40 further comprises a time-series index creation unit 45, which is connected to the timestamp data storage unit 44 and creates a time-series index from the timestamp data stored in the timestamp data storage unit 44.

The graph database system provided by this embodiment of the invention adds a time-series storage engine: after the type of the data received at the front end is identified, data carrying timestamps are preprocessed and stored in the time-series storage engine. This overcomes the inability of existing graph databases to acquire timestamp data, extends the graph database's capacity for processing big data, and improves the efficiency of data processing.

In one embodiment, the present invention further provides a graph database fusion storage method supporting time-series data storage which, as shown in FIG. 3, comprises the following steps:

receiving data to be stored;

identifying a structure type of data to be stored, wherein the structure type comprises structured data and unstructured data;

if the data are unstructured, acquiring and storing the timestamp data of the unstructured data. Specifically, the unstructured data are normalized; the format of the normalized data is unified; the format-unified data are checked for duplicates and duplicated data are removed; the timestamp data of the de-duplicated data are acquired and stored; and a time-series index is created from the timestamp data for later retrieval or querying.
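The order of the method steps can be traced end to end in a compact sketch. Every stage is a simplified stand-in chosen for illustration: `fusion_store`, `normalize` and `unify_format` are hypothetical names, "normalization" is taken here as whitespace cleanup of text records, only two date layouts (`YYYY-MM-DD` and `DD.MM.YYYY`) are recognized, and the index is a sorted list of `(date, position)` pairs.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Tuple, Union

Record = Union[Dict[str, Any], str]
Entry = Tuple[date, str]  # (unified date, payload)


def normalize(text: str) -> str:
    # Assumed preprocessing: trim and collapse runs of whitespace.
    return " ".join(text.split())


def unify_format(text: str) -> Entry:
    # Assumed input: a leading "YYYY-MM-DD" or "DD.MM.YYYY" date, then a payload.
    raw_date, _, payload = text.partition(" ")
    layout = "%Y-%m-%d" if "-" in raw_date else "%d.%m.%Y"
    return datetime.strptime(raw_date, layout).date(), payload


def fusion_store(
    records: List[Record],
) -> Tuple[List[Dict[str, Any]], List[Entry], List[Tuple[date, int]]]:
    attribute_store: List[Dict[str, Any]] = []
    unstructured: List[str] = []
    for record in records:
        # Identify the structure type (assumed rule: dict means structured).
        if isinstance(record, dict):
            attribute_store.append(record)
        else:
            unstructured.append(record)
    # Normalize, then unify the format of each unstructured record.
    unified = [unify_format(normalize(text)) for text in unstructured]
    # Duplicate check: dict.fromkeys keeps the first occurrence, in order.
    deduped = list(dict.fromkeys(unified))
    # Acquire each record's timestamp and index it in time order.
    index = sorted((day, pos) for pos, (day, _) in enumerate(deduped))
    return attribute_store, deduped, index
```

Given one dict and three text lines where two lines differ only in spacing, the sketch stores the dict in the attribute store, keeps two unique timestamped entries, and returns an index listing them oldest first.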

if the data are structured, storing the structured data in the attribute storage engine.

In the graph database fusion storage method supporting time-series data storage provided by this embodiment of the invention, data to be stored are classified as structured or unstructured. Structured data are stored directly in the attribute storage engine. Unstructured data are first normalized and format-unified, then checked for duplicates and de-duplicated so that their format is consistent and they contain no repeats; their timestamp data are then acquired and stored; and finally a time-series index is created from the timestamp data, through which the data can later be retrieved or queried.

Claims

1. A graph database system supporting time-series data storage, characterized in that the system comprises a data receiving unit, a data attribute feature identification unit, an attribute storage engine, a time-series storage engine, and a relationship storage unit;

the time-series storage engine is further configured to acquire and store the timestamp data in the unstructured data and to build a time-series index from the timestamp data.

2. The graph database system supporting time-series data storage according to claim 1, wherein the time-series storage engine comprises a data preprocessing unit and a data format unification unit;

3. The graph database system supporting time-series data storage according to claim 2, wherein the time-series storage engine further comprises a data validity duplicate-checking unit;

the data validity duplicate-checking unit is connected to the data format unification unit, checks the format-unified data output by the data format unification unit for duplicates, and removes duplicated data.

4. The graph database system supporting time-series data storage according to claim 3, wherein the time-series storage engine further comprises a timestamp data storage unit;

5. The graph database system supporting time-series data storage according to claim 4, wherein the time-series storage engine further comprises a time-series index creation unit;

6. A graph database fusion storage method supporting time-series data storage, characterized by comprising the following steps:

receiving data to be stored;

if the data are unstructured, acquiring and storing the timestamp data of the unstructured data.

7. The graph database fusion storage method supporting time-series data storage according to claim 6, further comprising: if the data are structured, storing the structured data in the attribute storage engine.

8. The graph database fusion storage method supporting time-series data storage according to claim 6, wherein, if the data are unstructured, acquiring and storing the timestamp data of the unstructured data comprises:

if the data are unstructured, normalizing the unstructured data;

unifying the format of the normalized data;

9. The graph database fusion storage method supporting time-series data storage according to claim 8, wherein acquiring and storing the timestamp data of the unstructured data from the format-unified data comprises:

acquiring and storing the timestamp data of the de-duplicated data.

10. The graph database fusion storage method supporting time-series data storage according to claim 9, further comprising:

creating a time-series index from the timestamp data to facilitate later retrieval or querying of the data.