CN115794875A - Graph database system supporting time sequence data storage and fusion storage method - Google Patents

Graph database system supporting time sequence data storage and fusion storage method

Info

Publication number
CN115794875A
Authority
CN
China
Prior art keywords
data
storage
unit
time sequence
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111055522.8A
Other languages
Chinese (zh)
Inventor
吴章生
张晨
王海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Know Beijing Technology Co ltd
Original Assignee
Zhongke Know Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-09-09
Filing date
2021-09-09
Publication date
2023-03-14
Application filed by Zhongke Know Beijing Technology Co ltd
Priority to CN202111055522.8A
Publication of CN115794875A
Current legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a graph database system supporting time sequence data storage and a fusion storage method, belonging to the field of graph databases. After the data receiving unit receives the data to be stored, the data attribute feature identification unit identifies the structure type of the data. If the data is structured data, it is stored in the attribute storage engine. If the data is unstructured data, it is sent to the time sequence storage engine, which acquires and stores the timestamp data in the unstructured data and constructs a time sequence index from that timestamp data, so that data containing timestamp information can later be retrieved and queried from the database through the created time sequence index.

Description

Graph database system supporting time sequence data storage and fusion storage method
Technical Field
The invention relates to the field of graph databases, in particular to a graph database system supporting time sequence data storage and a fusion storage method.
Background
A graph database is one branch of the non-relational (NoSQL) database family when databases are classified by data model. It uses a graph to store entities and the relationships among them, which provides a good storage and processing solution for problems that fit a graph model. Take the most common case, person-to-person relationships in a social network: storing social network data in a traditional relational database (RDBMS) works poorly, because large volumes of complex, interconnected data are hard to search and traverse in depth and response times are slower than expected. The storage and computation model of a graph database is an effective way to solve this problem. With the development of social networks, e-commerce, resource retrieval and similar fields, storage technology capable of handling complex associations is urgently needed, and using a graph database to organize, store, compute on, analyze and mine loosely structured, interconnected data is more effective, which has greatly promoted the rapid development of graph databases. Graph databases take graph theory as their theoretical basis and describe and store the nodes of a graph and the relationships between them. Graph-theory-based data mining work in China and abroad falls into five areas: graph matching, keyword query, graph classification, graph clustering and frequent subgraph mining.
Time sequence data refers to time series data. Time series data is a data sequence in which the same unified indicator is recorded in chronological order. All data in the same sequence must be measured on the same basis so that the values are comparable. Time series data can be period data (values accumulated over an interval) or point-in-time data (values observed at an instant).
In the related prior art, there is as yet no graph database that stores time series data. To better meet the development needs of today's big data and artificial intelligence, it is necessary to provide a graph database system that supports time series data storage.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a graph database system supporting time series data storage and a fusion storage method, so as to better adapt to the development requirement of artificial intelligence of current big data.
In order to achieve the above object, a first aspect of the present invention provides a graph database system supporting time series data storage, including a data receiving unit, a data attribute feature identifying unit, an attribute storage engine, a time series storage engine, and a relationship storage unit;
the data receiving unit is connected with the data attribute feature identification unit; the data attribute feature identification unit is respectively connected with the time sequence storage engine and the attribute storage engine; the attribute storage engine is connected with the time sequence storage engine; the attribute storage engine is connected with the time sequence storage engine through a relation storage unit;
after the data receiving unit receives data to be stored, the data attribute feature identification unit identifies the structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
the structured data identified by the data attribute feature identification unit is stored in the attribute storage engine; the unstructured data identified by the data attribute feature identification unit is stored in the time sequence storage engine;
the time sequence storage engine is also used for acquiring and storing the time stamp data in the unstructured data and constructing a time sequence index according to the time stamp data.
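The routing performed by the data attribute feature identification unit can be sketched as follows. This is a minimal illustration only: the class `FusionStore`, its method names, and above all the rule used to tell structured from unstructured data are assumptions, since the text does not state how the structure type is identified.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FusionStore:
    """Toy stand-in for the two storage engines (hypothetical names)."""
    attribute_store: list = field(default_factory=list)    # structured data
    time_series_store: list = field(default_factory=list)  # unstructured data

    def identify_structure_type(self, record: Any) -> str:
        # Assumption: a dict whose values are all scalars counts as
        # "structured"; everything else is treated as "unstructured".
        if isinstance(record, dict) and all(
            isinstance(v, (str, int, float, bool)) for v in record.values()
        ):
            return "structured"
        return "unstructured"

    def receive(self, record: Any) -> str:
        """Identify the structure type and route the record accordingly."""
        kind = self.identify_structure_type(record)
        if kind == "structured":
            self.attribute_store.append(record)
        else:
            self.time_series_store.append(record)
        return kind
```

Calling `receive` with a flat dictionary places it in `attribute_store`, while a raw string such as a sensor log line goes to `time_series_store`.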
Further, the time sequence storage engine comprises a data preprocessing unit and a data format unification unit;
the data preprocessing unit is connected with the data attribute feature identification unit and is used for carrying out normalization processing on the unstructured data obtained by the data attribute feature identification unit;
the data format unification unit is connected with the data preprocessing unit and is used for unifying formats of the data processed by the data preprocessing unit.
Further, the time sequence storage engine further comprises a data validity duplicate-checking unit;
the data validity duplicate-checking unit is connected with the data format unification unit and is used for checking the format-unified data obtained by the data format unification unit for duplicates and removing duplicated data.
Further, the time sequence storage engine further comprises a timestamp data storage unit;
the timestamp data storage unit is connected with the data validity duplicate-checking unit and is used for acquiring and storing the timestamp information of the data obtained by the data validity duplicate-checking unit.
Further, the time sequence storage engine further comprises a time sequence index creation unit;
the time sequence index creation unit is connected with the timestamp data storage unit and is used for creating a time sequence index according to the timestamp data stored in the timestamp data storage unit.
The second aspect of the invention provides a graph database fusion storage method supporting time sequence data storage, which comprises the following steps:
receiving data to be stored;
identifying a structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
and if the data is unstructured data, acquiring and storing the time stamp data of the unstructured data.
Further, the method also comprises the following steps: and if the data is the structured data, storing the structured data to the attribute storage engine.
Further, if the data is unstructured data, acquiring and storing the timestamp data of the unstructured data includes:
if the data is unstructured data, performing normalization processing on the unstructured data;
carrying out format unified processing on the data after the normalization processing;
and acquiring and storing the time stamp data of the unstructured data according to the data subjected to format unified processing.
Further, the obtaining and storing the timestamp data of the unstructured data according to the uniformly processed data in the format includes:
carrying out duplicate checking on the data subjected to the formatting unified processing and removing repeated data;
and acquiring and storing the time stamp data of the de-duplicated data.
Further, the method also comprises the following steps:
and creating a time sequence index according to the timestamp data to facilitate later data retrieval or querying.
By adopting the above technical solutions, the present application has at least the following beneficial effects:
The technical solution of the present application provides a graph database system supporting time sequence data storage and a fusion storage method. After the data receiving unit receives data to be stored, the data attribute feature identification unit identifies the structure type of the data. If the data is structured data, it is stored in the attribute storage engine. If the data is unstructured data, it is sent to the time sequence storage engine, which acquires and stores the timestamp data in the unstructured data and constructs a time sequence index from that timestamp data, so that data containing timestamp information can later be retrieved and queried through the created time sequence index. The graph database system can therefore store time sequence data, and from the stored data a user can identify the statistical characteristics and development patterns of the time series in a sample, construct a time series model, and predict the sample or carry out other analyses, which greatly facilitates the study and use of time sequence data.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating a graph database system supporting time series data storage according to an embodiment of the present invention;
FIG. 2 is a diagram of an exemplary graph database system supporting time series data storage according to an embodiment of the present invention;
FIG. 3 is a flowchart of a graph database fusion storage method supporting time series data storage according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the technical solutions of the present invention is provided with reference to the accompanying drawings and embodiments. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a graph database system supporting time series data storage according to an embodiment of the present invention includes a data receiving unit 10, a data attribute feature identifying unit 20, an attribute storage engine 30, a time series storage engine 40, and a relationship storage unit 50;
the data receiving unit 10 is connected with the data attribute feature identification unit 20; the data attribute feature identification unit 20 is respectively connected with the time sequence storage engine 40 and the attribute storage engine 30; the attribute storage engine 30 is connected with the time sequence storage engine 40; the attribute storage engine 30 is connected with the time sequence storage engine 40 through a relation storage unit 50;
after the data receiving unit 10 receives the data to be stored, the data attribute feature identification unit 20 identifies the structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
the structured data identified by the data attribute feature identification unit 20 is stored in the attribute storage engine 30; the unstructured data identified by the data attribute feature identification unit 20 are stored in the time sequence storage engine 40;
the time sequence storage engine 40 is further configured to obtain and store timestamp data in the unstructured data, and construct a time sequence index according to the timestamp data.
According to the graph database system supporting time sequence data storage, after a data receiving unit receives data to be stored, firstly, a data attribute feature identification unit identifies the structure type of the data to be stored, and if the data is structured data, the data is stored in an attribute storage engine; if the data is unstructured data, the unstructured data is sent to a time sequence storage engine, the time sequence storage engine acquires and stores timestamp data in the unstructured data and constructs a time sequence index according to the timestamp data, and therefore data containing timestamp information in a database can be called and inquired according to the created time sequence index when the data is used subsequently.
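The text says only that the attribute storage engine and the time sequence storage engine are connected through the relation storage unit; it does not describe the unit's internals. One way to picture such a link, purely as an assumed sketch, is a table that maps an identifier in the attribute store to the identifiers of related time series records. The class `RelationStore` and its methods are hypothetical.

```python
from collections import defaultdict


class RelationStore:
    """Hypothetical link table between attribute records and time series records."""

    def __init__(self):
        # node id in the attribute store -> set of time series record ids
        self._series_by_node = defaultdict(set)

    def link(self, node_id: str, series_id: str) -> None:
        """Record that a time series record belongs to an attribute record."""
        self._series_by_node[node_id].add(series_id)

    def series_for(self, node_id: str) -> list:
        """Return the linked time series ids in sorted order (empty if none)."""
        return sorted(self._series_by_node.get(node_id, ()))
```

With such a table, a query could start from a structured entity and follow its links into the time sequence storage engine.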
In one embodiment, the present invention also provides a specific graph database system supporting time series data storage. As shown in fig. 2, the system comprises a data receiving unit 10, a data attribute feature identifying unit 20, an attribute storage engine 30, a time sequence storage engine 40 and a relation storage unit 50;
the data receiving unit 10 is connected with the data attribute feature identification unit 20; the data attribute feature identification unit 20 is respectively connected with the time sequence storage engine 40 and the attribute storage engine 30; the attribute storage engine 30 is connected with the time sequence storage engine 40; the attribute storage engine 30 is connected with the time sequence storage engine 40 through the relation storage unit 50;
after the data receiving unit 10 receives the data to be stored, the data attribute feature identification unit 20 identifies the structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
the structured data identified by the data attribute feature identification unit 20 is stored in the attribute storage engine 30; the unstructured data identified by the data attribute feature identification unit 20 are stored in the time-series storage engine 40;
the time sequence storage engine 40 is further configured to obtain and store timestamp data in the unstructured data, and construct a time sequence index according to the timestamp data.
The time sequence storage engine 40 includes a data preprocessing unit 41 and a data format unification unit 42. The data preprocessing unit 41 is connected to the data attribute feature identification unit 20 and is configured to normalize the unstructured data identified by the data attribute feature identification unit 20; for example, data with a timestamp in a patent is sent to the data preprocessing unit 41. The data format unification unit 42 is connected to the data preprocessing unit 41 and is configured to unify the format of the data processed by the data preprocessing unit 41, which facilitates subsequent storage. For example, some timestamp data is stored in year, month, day order, some in month, day, year order, and some in day, month, year order. Data containing timestamps therefore needs to be brought into a uniform format.
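The format unification step can be illustrated with a short sketch that converts the three orderings mentioned above into one canonical year-month-day form. The concrete input layouts (`2021-09-09`, `09/30/2021`, `14.03.2023`) and the function name `unify_timestamp` are assumptions chosen for the example; the text does not specify separators or a target format.

```python
from datetime import datetime

# Assumed input layouts, tried in order: year-month-day, month/day/year,
# day.month.year. Distinct separators keep the three cases unambiguous.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def unify_timestamp(raw: str) -> str:
    """Return the date in ISO 8601 year-month-day form."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # layout did not match, try the next one
    raise ValueError(f"unrecognised timestamp format: {raw!r}")
```

If two layouts shared the same separator, a string such as `03/04/2021` could not be resolved from the text alone, so a real system would need the source of each record to declare its layout.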
The time sequence storage engine 40 further includes a data validity duplicate-checking unit 43. The data validity duplicate-checking unit 43 is connected to the data format unification unit 42 and is configured to check the format-unified data obtained by the data format unification unit 42 for duplicates and remove duplicate data.
The time sequence storage engine 40 further includes a timestamp data storage unit 44. The timestamp data storage unit 44 is connected to the data validity duplicate-checking unit 43 and is configured to acquire and store the timestamp information of the data obtained by the data validity duplicate-checking unit 43. The time sequence storage engine 40 performs duplicate checking on the received data and removes duplicate data. For example, the front-end sensing device may send the same timestamp data more than once, and such repeats need to be removed.
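The duplicate check can be sketched as a single pass that remembers which records have already been seen. The record shape (a pair of unified timestamp and payload) and the function name `remove_duplicates` are assumptions; the text does not define what makes two records identical.

```python
from typing import Hashable, Iterable, Iterator, Tuple

# Hypothetical record shape: (unified timestamp, payload)
Record = Tuple[str, Hashable]


def remove_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each (timestamp, payload) pair once, keeping first-seen order."""
    seen = set()
    for record in records:
        if record in seen:
            continue  # repeated transmission of the same data, drop it
        seen.add(record)
        yield record
```

Running the check after format unification matters: the same reading sent once as `2021-09-09` and once as `09.09.2021` only compares equal once both timestamps share one format.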
The time sequence storage engine 40 further includes a time sequence index creation unit 45. The time sequence index creation unit 45 is connected to the timestamp data storage unit 44 and is configured to create a time sequence index from the timestamp data stored in the timestamp data storage unit 44.
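The text does not say what data structure the time sequence index uses. As one plausible illustration, the sketch below keeps (timestamp, record id) pairs in sorted order and answers time-range queries by binary search with Python's standard `bisect` module. The class `TimeSeriesIndex` and its integer record ids are assumptions made for the example.

```python
import bisect
from datetime import datetime


class TimeSeriesIndex:
    """Sorted list of (timestamp, record_id) pairs supporting range lookups."""

    def __init__(self):
        self._entries = []  # kept sorted by timestamp, then record_id

    def add(self, timestamp: datetime, record_id: int) -> None:
        """Insert an entry while keeping the list sorted."""
        bisect.insort(self._entries, (timestamp, record_id))

    def query(self, start: datetime, end: datetime) -> list:
        """Return record ids with start <= timestamp <= end, in time order."""
        # (start,) sorts before every (start, id); (end, inf) sorts after
        # every (end, id), so both bounds are inclusive.
        lo = bisect.bisect_left(self._entries, (start,))
        hi = bisect.bisect_right(self._entries, (end, float("inf")))
        return [record_id for _, record_id in self._entries[lo:hi]]
```

A production engine would more likely use an on-disk ordered structure such as a B-tree or LSM tree; the sorted list only demonstrates why indexing by timestamp makes range queries cheap.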
The graph database system provided by the embodiment of the invention adds a time sequence storage engine: after the type of the data received at the front end is identified, data with timestamps is preprocessed and stored in the time sequence storage engine. This overcomes the inability of existing graph databases to obtain timestamp data, extends the graph database's capability for processing big data, and improves data processing efficiency.
In an embodiment, the present invention further provides a graph database fusion storage method supporting time series data storage, as shown in fig. 3, including the following steps:
receiving data to be stored;
identifying a structure type of data to be stored, wherein the structure type comprises structured data and unstructured data;
and if the data is unstructured data, acquiring and storing the timestamp data of the unstructured data. Specifically, if the data is unstructured data, normalization processing is performed on it; format unification processing is performed on the normalized data; the format-unified data is checked for duplicates and repeated data is removed; the timestamp data of the de-duplicated data is acquired and stored; and a time sequence index is created from the timestamp data for later data retrieval or querying.
And if the data is the structured data, storing the structured data to the attribute storage engine.
In the graph database fusion storage method supporting time sequence data storage provided by the embodiment of the invention, the data to be stored is classified as structured or unstructured. Structured data is stored directly in the attribute storage engine. Unstructured data first undergoes normalization and format unification, followed by duplicate checking and removal, so that the data formats are consistent and no record is repeated; the timestamp data is then acquired and stored; finally, a time sequence index is created from the timestamp data, and later data is retrieved or queried through that index.
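The order of the steps in the unstructured branch can be summarised in one compact sketch. Everything concrete here is assumed for illustration: the record layout `"<timestamp>,<payload>"`, the three timestamp layouts, the function names, and the use of a sorted list of positions as the time sequence index.

```python
from datetime import datetime

FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")  # assumed input layouts


def normalize(raw: str) -> tuple:
    # Assumed record layout: "<timestamp>,<payload>"
    stamp, payload = raw.strip().split(",", 1)
    return stamp.strip(), payload.strip()


def unify(stamp: str) -> datetime:
    """Parse any of the assumed layouts into one datetime representation."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            pass
    raise ValueError(f"unrecognised timestamp: {stamp!r}")


def fuse_store(raw_records):
    """Run the unstructured branch: normalize, unify, de-duplicate, index."""
    seen, stored = set(), []
    for raw in raw_records:
        stamp, payload = normalize(raw)      # normalization
        record = (unify(stamp), payload)     # format unification
        if record not in seen:               # duplicate check
            seen.add(record)
            stored.append(record)            # timestamp is stored with the data
    # Time sequence index: positions in `stored`, ordered by timestamp.
    index = sorted(range(len(stored)), key=lambda i: stored[i][0])
    return stored, index
```

Given the inputs `" 09/10/2021, temp=21"`, `"2021-09-09,temp=20"` and `"09.09.2021,temp=20"`, the third record is recognised as a repeat of the second only because unification runs before the duplicate check.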

Claims (10)

1. A graph database system supporting time series data storage, characterized by: the system comprises a data receiving unit, a data attribute feature identification unit, an attribute storage engine, a time sequence storage engine and a relation storage unit;
the data receiving unit is connected with the data attribute feature identification unit; the data attribute feature identification unit is respectively connected with the time sequence storage engine and the attribute storage engine; the attribute storage engine is connected with the time sequence storage engine; the attribute storage engine is connected with the time sequence storage engine through a relation storage unit;
after the data receiving unit receives data to be stored, the data attribute feature identification unit identifies the structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
the structured data identified by the data attribute feature identification unit is stored in the attribute storage engine; the unstructured data identified by the data attribute feature identification unit is stored in the time sequence storage engine;
the time sequence storage engine is also used for acquiring and storing the timestamp data in the unstructured data and constructing a time sequence index according to the timestamp data.
2. A graph database system supporting time-series data storage according to claim 1, wherein: the time sequence storage engine comprises a data preprocessing unit and a data format unification unit;
the data preprocessing unit is connected with the data attribute feature identification unit and is used for carrying out normalization processing on the unstructured data obtained by the data attribute feature identification unit;
the data format unification unit is connected with the data preprocessing unit and is used for unifying formats of the data processed by the data preprocessing unit.
3. A graph database system supporting time-series data storage according to claim 2, wherein: the time sequence storage engine also comprises a data validity duplicate-checking unit;
the data validity duplicate-checking unit is connected with the data format unification unit and is used for checking the format-unified data obtained by the data format unification unit for duplicates and removing duplicated data.
4. A graph database system supporting time-series data storage according to claim 3, wherein: the time sequence storage engine also comprises a timestamp data storage unit;
the timestamp data storage unit is connected with the data validity duplicate-checking unit and is used for acquiring and storing the timestamp information of the data obtained by the data validity duplicate-checking unit.
5. A graph database system supporting time-series data storage according to claim 4, wherein: the time sequence storage engine also comprises a time sequence index creation unit;
the time sequence index creation unit is connected with the timestamp data storage unit and is used for creating a time sequence index according to the timestamp data stored in the timestamp data storage unit.
6. A graph database fusion storage method supporting time sequence data storage is characterized by comprising the following steps:
receiving data to be stored;
identifying a structure type of the data to be stored, wherein the structure type comprises structured data and unstructured data;
and if the data is the unstructured data, acquiring and storing the time stamp data of the unstructured data.
7. The method for fusion storage of a graph database supporting time-series data storage according to claim 6, further comprising: and if the data is structured data, storing the structured data to the attribute storage engine.
8. The method for fusion storage of a graph database supporting time-series data storage according to claim 6, wherein: if the data is unstructured data, acquiring and storing the timestamp data of the unstructured data comprises:
if the data is unstructured data, normalization processing is carried out on the unstructured data;
carrying out format unified processing on the data after normalization processing;
and acquiring and storing the time stamp data of the unstructured data according to the data subjected to format unified processing.
9. The method for fusion storage of a graph database supporting time-series data storage according to claim 8, wherein: acquiring and storing the timestamp data of the unstructured data according to the data subjected to format unified processing comprises:
carrying out duplicate checking on the data subjected to the formatting unified processing and removing repeated data;
and acquiring and storing the time stamp data of the de-duplicated data.
10. The method for fusion storage of a graph database supporting time-series data storage according to claim 9, further comprising:
and creating a time sequence index according to the timestamp data to facilitate later data retrieval or querying.
CN202111055522.8A 2021-09-09 2021-09-09 Graph database system supporting time sequence data storage and fusion storage method Pending CN115794875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111055522.8A CN115794875A (en) 2021-09-09 2021-09-09 Graph database system supporting time sequence data storage and fusion storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111055522.8A CN115794875A (en) 2021-09-09 2021-09-09 Graph database system supporting time sequence data storage and fusion storage method

Publications (1)

Publication Number Publication Date
CN115794875A true CN115794875A (en) 2023-03-14

Family

ID=85473483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111055522.8A Pending CN115794875A (en) 2021-09-09 2021-09-09 Graph database system supporting time sequence data storage and fusion storage method

Country Status (1)

Country Link
CN (1) CN115794875A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination