CN110209742B

CN110209742B - Block chain based storage system and method classified according to data importance

Info

Publication number: CN110209742B
Application number: CN201910521926.8A
Authority: CN
Inventors: 于辉; 刘善武; 李进; 刘文敏; 唐坤
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2021-07-27
Anticipated expiration: 2039-06-17
Also published as: CN110209742A

Abstract

The invention particularly relates to a storage system and a storage method based on block chain classification according to data importance. The storage system based on the block chain and classified according to the data importance comprises a metadata crawler module, a data importance judgment module, a data repeatability judgment module, a data storage routing information module, an SQL Server database cluster and a block chain storage module; the data to be stored is accessed to the SQL Server database cluster and the block chain storage module through the data storage routing information module, the SQL Server database cluster stores non-importance data, and the block chain storage module stores importance data. According to the storage system and the storage method based on the block chain and classified according to the data importance, the importance data and the non-importance data are respectively stored while repeated storage is avoided, and meanwhile, the characteristics of openness, transparency, non-deletability and the like of the block chain are utilized, so that the important data in a mass data resource pool are efficiently accessed, and the storage efficiency and the equipment utilization rate of storage resources are improved.

Description

Block chain based storage system and method classified according to data importance

Technical Field

The invention relates to the technical field of block chains, in particular to a block chain-based storage system and a block chain-based storage method classified according to data importance.

Background

With the continuous development of cloud computing and big data technology, data volume is growing geometrically. As a result, data retrieval is becoming less efficient, and a great deal of important data is buried in huge amounts of garbage (non-important) data, which wastes storage resources and reduces the utilization efficiency of data resources.

Existing storage methods and storage systems do not judge the importance of data and do not store and retrieve important data effectively, and storing unimportant garbage data consumes a large amount of storage hardware.

Block chain technology is a new distributed infrastructure and computing mode that uses a chained block data structure to verify and store data, a distributed node consensus algorithm to generate and update data, cryptography to secure data transmission and access, and smart contracts (intelligent contracts) composed of automated script code to program and operate on data. Block chain technology is open, transparent, tamper-resistant and permanently stored, so the authenticity and robustness of its data are high. The block chain is already widely used in the financial industry, where Bitcoin is the most representative of the various virtual currencies; beyond finance, it has good application prospects in other industries such as logistics and real estate.

In view of these advantages of the block chain over traditional databases and other recording modes, the invention provides a storage system and a storage method based on the block chain and classified according to data importance.

Disclosure of Invention

In order to make up for the defects of the prior art, the invention provides a simple and efficient storage system and method based on block chains and classified according to data importance.

The invention is realized by the following technical scheme:

A block chain based storage system classified according to data importance comprises a metadata crawler module, a data importance judgment module, a data repeatability judgment module, a data storage routing information module, an SQL Server database cluster and a block chain storage module; the metadata crawler module is connected to the data storage routing information module through the data importance judgment module; the data storage requirement is passed to the data storage routing information module through the data repeatability judgment module; the data to be stored is passed to the SQL Server database cluster and the block chain storage module through the data storage routing information module; the SQL Server database cluster stores non-importance data, and the block chain storage module stores importance data;

the SQL Server database cluster comprises an SQL Server storage cluster management module and an SQL Server storage node, wherein the SQL Server storage cluster management module manages the storage behavior of non-important data in the SQL Server storage node and simultaneously monitors and manages the state of the SQL Server storage node;

the data storage routing information module is also connected with a data storage routing information backup module, and is connected with a block chain storage module through an intelligent contract module; the intelligent contract module automatically calls the importance data and calculates the specific storage position in the block chain storage module.

The metadata crawler module is used for crawling key data and important data sources in the resource field from the configured data crawling sources according to the resource type; the data importance judgment module is used for calculating an importance value IMP and comparing it with a data importance standard value QUA preset by the user, the data being judged to be importance data if the calculated importance value IMP is greater than the standard value QUA, and non-importance data otherwise; the data storage routing information module is used for storing the routing information records of data storage, which are equivalent to index records; the data storage routing information backup module is used for backing up the routing information records of data storage and forms a main/standby pair with the data storage routing information module; when a failure of the data storage routing information module affects the overall function, service is actively switched to the data storage routing information backup module, which provides the same data access routing retrieval function as the data storage routing information module, improving the robustness of the system; and the data repeatability judgment module is used for comparing the data storage requirement with the storage information records of the data storage routing information module to judge whether the data is repeated data, and if so, the storage is abandoned.
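The main/standby arrangement of the routing information module and its backup can be illustrated with a minimal Python sketch. The class names, the in-memory dictionaries and the `healthy` flag are assumptions made for illustration only; the source does not specify how failure is detected or how records are replicated.

```python
from typing import Dict, Optional


class RoutingStore:
    """In-memory stand-in for one routing information module (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}
        self.healthy = True  # set to False to simulate a failure

    def put(self, key: str, location: str) -> None:
        self.records[key] = location

    def get(self, key: str) -> Optional[str]:
        return self.records.get(key)


class RoutingService:
    """Primary routing store with a backup store in main/standby mode."""

    def __init__(self) -> None:
        self.primary = RoutingStore()
        self.backup = RoutingStore()

    def record(self, key: str, location: str) -> None:
        # Every routing record is also written to the backup store.
        self.backup.put(key, location)
        if self.primary.healthy:
            self.primary.put(key, location)

    def lookup(self, key: str) -> Optional[str]:
        # Retrieval switches to the backup when the primary has failed.
        active = self.primary if self.primary.healthy else self.backup
        return active.get(key)
```

In this sketch a lookup keeps working after the primary is marked unhealthy, because the backup holds the same records.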

The data crawling sources are updated and configured independently by the user and include search engines and thesis, academic and periodical websites.

The calculation formula of the importance value IMP of the resource Q is as follows:

wherein λ is a correction factor greater than 0 and less than 1; M is the number of important sources associated with the resource Q, and M_{total} is the total number of important sources configured for the system; N is the number of times the resource Q is referenced, and N_{total} is the number of times all resources crawled by the crawler are referenced; L is the number of times the resource Q is retrieved by users, and L_{total} is the number of times all resources are retrieved by users; W is the number of resources already in the system that reference the resource Q, and W_{total} is the total number of resources already in the system.
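The formula itself is not reproduced in this text, so the following Python sketch is illustrative only. It ASSUMES an additive form in which λ weights the crawler-derived ratios (M and N against their totals) and 1-λ weights the in-system ratios (L and W against their totals); the weighting by λ and 1-λ follows the description of the correction factor, but the exact way the ratios are combined, and all names in the code, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceStats:
    """Counts for one resource Q; field names are illustrative."""
    m: int        # important sources associated with Q
    m_total: int  # important sources configured for the system
    n: int        # times Q is referenced
    n_total: int  # times all crawled resources are referenced
    l: int        # times Q is retrieved by users
    l_total: int  # times all resources are retrieved by users
    w: int        # stored resources that reference Q
    w_total: int  # resources already stored in the system


def ratio(part: int, whole: int) -> float:
    """Return part/whole, treating an empty denominator as 0."""
    return part / whole if whole > 0 else 0.0


def importance(stats: ResourceStats, lam: float) -> float:
    """ASSUMED form: lam * (crawler ratios) + (1 - lam) * (in-system ratios)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must be strictly between 0 and 1")
    external = ratio(stats.m, stats.m_total) + ratio(stats.n, stats.n_total)
    internal = ratio(stats.l, stats.l_total) + ratio(stats.w, stats.w_total)
    return lam * external + (1.0 - lam) * internal


def is_important(imp: float, qua: float) -> bool:
    """Importance data only when IMP is strictly greater than QUA."""
    return imp > qua
```

The strict comparison in `is_important` mirrors the rule that data whose IMP is not greater than QUA is treated as non-importance data.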

In the storage method of the block chain based storage system classified according to data importance, data repeatability and data importance are judged when data is stored, so that repeated storage is avoided; the intelligent contract module is called according to the importance judgment result, importance data is stored using block chain technology, and non-importance data is stored in the SQL Server database cluster, so that importance data and non-importance data are stored separately and the storage efficiency of importance data and the resource utilization rate of storage equipment are improved. By using characteristics of the block chain such as openness, transparency and non-deletability, key data in a mass data resource pool can be accessed efficiently, and the storage efficiency and equipment utilization rate of storage resources are improved.

The storage method of the block chain based storage system classified according to data importance comprises the following steps:

(1) after a user inputs a data storage requirement, a data repeatability judging module searches whether repeated data information exists in a data storage routing information module or not, and if so, the data storage requirement is abandoned;

(2) the data storage requirement of the non-repeated data judged by the data repeatability judging module is accessed to the data storage routing information module, the data storage routing information module sends the data to be stored to the data importance judging module, calculates an importance value IMP, compares the importance value IMP with a data importance standard value QUA preset by a user, judges the data to be important if the calculated importance value IMP is greater than a data importance standard value QUA, and otherwise, judges the data to be non-important data;

(3) the data importance judging module returns the judging result to the data storage routing information module, the intelligent contract module automatically calls importance data and stores the importance data into the block chain storage module, and non-importance data is sent to the SQL Server database cluster for storage;

(4) the data storage routing information module stores the routing information record of the data storage for later retrieval, and the data storage routing information backup module performs backup storage on the routing information record of the data storage.
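The steps above can be sketched end to end in Python. This is a simplified illustration under several assumptions: duplicates are detected by a SHA-256 content hash, the SQL Server database cluster and the block chain storage module are replaced by in-memory stand-ins, the importance value is passed in already computed, and every class and method name is hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class StorageSystem:
    """Toy model of the storage flow; all stores are in-memory stand-ins."""

    def __init__(self, qua: float) -> None:
        self.qua = qua                            # user-preset standard value QUA
        self.routing: Dict[str, str] = {}         # routing information records
        self.routing_backup: Dict[str, str] = {}  # backup of the routing records
        self.sql_store: Dict[str, bytes] = {}     # stands in for the SQL Server cluster
        self.chain: List[dict] = []               # stands in for the block chain

    def _append_block(self, data: bytes) -> int:
        # Link each block to its predecessor by hash and return its position.
        prev_hash = self.chain[-1]["hash"] if self.chain else "0" * 64
        block_hash = hashlib.sha256(prev_hash.encode() + data).hexdigest()
        self.chain.append({"prev": prev_hash, "data": data, "hash": block_hash})
        return len(self.chain) - 1

    def store(self, data: bytes, imp: float) -> Optional[str]:
        key = hashlib.sha256(data).hexdigest()
        # Repeatability check: abandon the request if a record already exists.
        if key in self.routing:
            return None
        # Importance check: important only if IMP is greater than QUA.
        if imp > self.qua:
            location = f"chain:{self._append_block(data)}"
        else:
            self.sql_store[key] = data
            location = f"sql:{key}"
        # Final step: keep the routing record and back it up.
        self.routing[key] = location
        self.routing_backup[key] = location
        return location
```

A call such as `StorageSystem(qua=0.5).store(b"report", imp=0.9)` places the data on the stand-in chain, while a second call with the same bytes is abandoned as a duplicate.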

The invention has the beneficial effects that: according to the storage system and the storage method based on the block chain and classified according to the data importance, the importance data and the non-importance data are respectively stored while repeated storage is avoided, and meanwhile, the characteristics of openness, transparency, non-deletability and the like of the block chain are utilized, so that the important data in a mass data resource pool are efficiently accessed, and the storage efficiency and the equipment utilization rate of storage resources are improved.

Drawings

FIG. 1 is a schematic diagram of the block chain based storage system and method classified according to data importance according to the present invention.

Detailed Description

In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the present invention clearer, the invention is described in detail below with reference to the accompanying drawing and embodiments. It should be noted that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

The storage system based on the block chain and classified according to the data importance comprises a metadata crawler module, a data importance judgment module, a data repeatability judgment module, a data storage routing information module, an SQL Server database cluster and a block chain storage module; the metadata crawler module is connected to the data storage routing information module through the data importance judging module, the data storage requirement is accessed to the data storage routing information module through the data repeatability judging module, the data to be stored is accessed to the SQL Server database cluster and the block chain storage module through the data storage routing information module, the SQL Server database cluster stores non-importance data, and the block chain storage module stores importance data;

The metadata crawler module is used for crawling key data and important data sources in the resource field from the configured data crawling sources according to the resource type; the data importance judgment module is used for calculating an importance value IMP and comparing it with a data importance standard value QUA preset by the user, the data being judged to be importance data if the calculated importance value IMP is greater than the standard value QUA, and non-importance data otherwise; the data storage routing information module is used for storing the routing information records of data storage, which are equivalent to index records; the data storage routing information backup module is used for backing up the routing information records of data storage and forms a main/standby pair with the data storage routing information module; when a failure of the data storage routing information module affects the overall function, service is actively switched to the data storage routing information backup module, which provides the same data access routing retrieval function as the data storage routing information module, improving the robustness of the system; and the data repeatability judgment module is used for comparing the data storage requirement with the storage information records of the data storage routing information module to judge whether the data is repeated data, and if so, the storage is abandoned.

wherein λ is a correction factor greater than 0 and less than 1; M is the number of important sources associated with the resource Q, and M_{total} is the total number of important sources configured for the system; N is the number of times the resource Q is referenced, and N_{total} is the number of times all resources crawled by the crawler are referenced; L is the number of times the resource Q is retrieved by users, and L_{total} is the number of times all resources are retrieved by users; W is the number of resources already in the system that reference the resource Q, and W_{total} is the total number of resources already stored in the system.

The correction factor λ can be dynamically adjusted according to the operating condition of the system. When little data is stored in the system (the SQL Server database cluster and the block chain storage module), so that the measured usage of stored resources may be inaccurate, the value of λ can be increased appropriately so that 1-λ becomes smaller and the importance is calculated mainly from the internet data obtained by the metadata crawler module; as the system continues to operate and the data stored in the system gradually increases, the value of λ can be reduced appropriately, raising the proportion of the latter half of the formula.

For example, the initial value of λ can be set to 0.8 and then reduced by 0.1 each time the system has operated for a period of time; once the system gradually reaches dynamic balance, the value of λ stabilizes.
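The adjustment in this example can be sketched as a simple schedule in Python. The starting value 0.8 and the step 0.1 come from the example; the lower bound `floor` at which λ stops decreasing, the function name and the fixed number of periods are assumptions, since the text only says that λ eventually stabilizes.

```python
from typing import List


def lambda_schedule(initial: float = 0.8, step: float = 0.1,
                    floor: float = 0.2, periods: int = 8) -> List[float]:
    """Return the value of lambda used in each operating period.

    `floor` is a hypothetical stabilization value; it must stay above 0
    because the correction factor is defined as greater than 0 and less than 1.
    """
    if not 0.0 < floor <= initial < 1.0:
        raise ValueError("require 0 < floor <= initial < 1")
    values = []
    lam = initial
    for _ in range(periods):
        values.append(lam)
        # Round to suppress floating-point drift from repeated subtraction.
        lam = max(floor, round(lam - step, 10))
    return values
```

With the defaults, λ steps down from 0.8 by 0.1 per period and then holds at the assumed floor, so the in-system terms gain weight as more data is stored.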

In the storage method of the block chain based storage system classified according to data importance, data repeatability and data importance are judged when data is stored, so that repeated storage is avoided; the intelligent contract module is called according to the importance judgment result, importance data is stored using block chain technology, and non-importance data is stored in database form in the SQL Server database cluster, so that importance data and non-importance data are stored separately and the storage efficiency of importance data and the resource utilization rate of storage equipment are improved. By using the open, transparent and non-deletable characteristics of the block chain, importance data in a mass data resource pool can be accessed efficiently, and the storage efficiency and equipment utilization rate of storage resources are improved.

The storage method of the block chain based storage system classified according to data importance comprises the following steps:

(4) the data storage routing information module stores the routing information record of the data storage for calling, and the data access routing information backup module performs backup storage on the routing information record of the data storage.

Claims

1. A block chain based storage system that classifies data importance, comprising: the system comprises a metadata crawler module, a data importance judgment module, a data repeatability judgment module, a data storage routing information module, an SQL Server database cluster and a block chain storage module; the metadata crawler module is connected to the data storage routing information module through the data importance judging module, a data storage requirement input by a user comprises a resource Q to be stored, the resource Q to be stored is accessed to the data storage routing information module through the data repeatability judging module, the resource Q to be stored is accessed to the SQL Server database cluster and the block chain storage module through the data storage routing information module, the SQL Server database cluster stores non-importance data, and the block chain storage module stores importance data;

the data storage routing information module is also connected with a data storage routing information backup module, and is connected with a block chain storage module through an intelligent contract module; the intelligent contract module automatically calls the importance data and calculates the specific storage position in the block chain storage module;

the metadata crawler module is used for crawling key data and important data sources in the resource field according to the configured data crawling source aiming at the type of the resource Q; the data importance judging module is used for calculating an importance value IMP of the resource Q, comparing the importance value IMP of the resource Q with a data importance standard value QUA preset by a user, if the calculated importance value IMP of the resource Q is larger than a data importance standard value QUA, judging the resource Q to be importance data, and if not, judging the resource Q to be non-importance data; the data storage routing information module is used for storing routing information records of data storage, which are equivalent to index record information; the data storage routing information backup module is used for performing backup storage on routing information records stored in the data storage and forming a main/standby mode with the data storage routing information module; when the data storage routing information module fails to influence the overall function, the data storage routing information backup module actively switches the service to the data storage routing information backup module to operate, so that the same data access routing retrieval function as the data storage routing information module is realized, and the robustness of the system is improved; the data repeatability judging module is used for comparing the data storage requirement with the storage information record of the data storage routing information module, judging whether the resource Q is repetitive data or not, and if so, giving up the storage;

wherein λ is a correction factor greater than 0 and less than 1; M is the number of important data sources associated with the resource Q, and M_{total} is the total number of important data sources configured for the system; N is the number of times the resource Q is referenced, and N_{total} is the number of times all resources in the important data sources associated with the resource Q are referenced; L is the number of times the resource Q is retrieved, and L_{total} is the number of times all resources in the important data sources associated with the resource Q have been retrieved; W is the number of resources already in the SQL Server database cluster and the block chain storage module that reference the resource Q, and W_{total} is the total number of resources already in the SQL Server database cluster and the block chain storage module.

2. The block chain based storage system classified according to data importance according to claim 1, wherein: the data crawling sources are updated and configured independently by the user and include search engines and thesis, academic and periodical websites.

3. A storage method of the block chain based storage system classified according to data importance according to claim 1 or 2, wherein: data repeatability and data importance are judged during data storage to avoid repeated storage; the intelligent contract module is called according to the importance judgment result; importance data is stored using block chain technology and non-importance data is stored in the SQL Server database cluster, so that importance data and non-importance data are stored separately and the storage efficiency of importance data and the resource utilization rate of storage equipment are improved; and the openness, transparency and non-deletability of the block chain are used to achieve efficient access to importance data in a mass data resource pool and to improve the storage efficiency and equipment utilization rate of storage resources.

4. The method of claim 3, comprising the steps of:

(4) the data storage routing information module stores the routing information record of the data storage for calling, and the data storage routing information backup module performs backup storage on the routing information record of the data storage.