CN117519611B

CN117519611B - Data distributed storage method and system for information system

Info

Publication number: CN117519611B
Application number: CN202410014478.3A
Authority: CN
Inventors: 杜睿; 吴凯; 陆斌; 曲秀华
Original assignee: Nanjing Yangzi Information Technology Co ltd
Current assignee: Nanjing Yangzi Information Technology Co ltd
Priority date: 2024-01-05
Filing date: 2024-01-05
Publication date: 2024-03-15
Anticipated expiration: 2044-01-05
Also published as: CN117519611A

Abstract

The invention belongs to the technical field of information storage, and particularly relates to a data distributed storage method and system for an information system. The method comprises: determining the number of nodes for data storage and establishing a node architecture; slicing the data to be stored to obtain slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes; determining the data in a target node as first data; and determining the data stored again after the first data has been called as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value indexed by the data span, wherein the data span represents the initial position and the final position of the data. By backing up only the changed part of the information, the invention reduces the space occupied by information storage and backup, and improves the efficiency of information storage and retrieval while ensuring information security.

Description

Data distributed storage method and system for information system

Technical Field

The present invention relates to the field of information storage technologies, and in particular, to a data distributed storage method and system for an information system.

Background

As data volumes keep growing, traditional centralized data storage has run into a serious bottleneck. One solution is distributed storage, a technology in which a plurality of networked nodes jointly provide storage services. Its advantages include the following: because the data is stored in a distributed manner, the efficiency, reliability and availability of storage can be improved, and the data can be retrieved and processed more efficiently. During storage, data may also be backed up to avoid data loss, which further increases the complexity of data storage.

Therefore, in distributed storage, not only must the original data be stored, but the modified data must also be stored after the original data is modified, which occupies a large amount of storage space. When data needs to be called, it has to be searched for among a large amount of data, which greatly limits calling efficiency, hinders a quick response to data call requests, and affects the user experience.

Disclosure of Invention

The present invention aims to provide a data distributed storage method for an information system, so as to solve the problem raised in the background above of how to reduce the storage space of data.

In order to achieve the above purpose, the present invention provides the following technical solutions:

a method of data distributed storage for an information system, the method comprising:

determining the number of nodes for data storage and establishing a node architecture;

slicing the data to be stored to obtain slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes;

determining data in a target node as first data; determining data stored in the target node again after the first data is called as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value with the data spans as indexes, wherein the data spans are used for representing the initial position and the final position of the data;

and backing up the distinguishing value based on the node architecture.

Further, the step of determining the number of nodes for data storage and establishing a node architecture includes:

acquiring data to be stored; determining the number of nodes for data storage and service roles corresponding to the nodes;

establishing a topological logic structure of all nodes based on the service roles;

and establishing a node architecture according to the topological logic structure and the access delay of all the nodes, and determining the number of data and the type of data of each node.

Further, the step of slicing the data to be stored, obtaining slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes includes:

packaging the data to be stored into data packets, and assigning the data packets to a storage queue for storage;

inserting the slicing characteristics into the data packet and slicing the data packet to obtain slicing units;

based on the keywords in the slicing units, establishing a corresponding relation between the keywords and the nodes according to the data types of the nodes; wherein one of the slicing units corresponds to only one node;

and routing the slicing units to the corresponding nodes according to the corresponding relation.

Further, the step of calculating the distinguishing value indexed by the data span includes:

selecting a first node from the node architecture, and determining data in the first node as first data;

when the first data call is completed, determining the data stored in the first node again as second data;

determining a distinguishing value of the first data and the second data based on a data span of the first data and the second data; wherein the data span comprises at least: range spans and dimension spans;

and generating an index relation between the distinguishing value and the first data, and storing the index relation into an index database.

Further, the step of backing up the distinguishing value based on the node architecture includes:

adding a backup node in the node architecture;

and backing up the distinguishing value and storing backup data into a backup node.

Further, the method further comprises:

adding additional nodes into the node architecture, and constructing a pseudo distributed data cluster in the additional nodes;

copying the node architecture into a pseudo-distributed data cluster;

synchronizing the node architecture and the pseudo-distributed data cluster.

Further, the method further comprises:

classifying the nodes according to the data types of the nodes;

and converting and integrating the classified nodes to establish a node set conforming to the format and structure.

Further, a data distributed storage system for an information system is provided, the system including:

the confirmation module can determine the number of nodes for data storage and establish a node architecture;

the node corresponding module is capable of slicing data to be stored, acquiring slicing units, determining target nodes where the corresponding slicing units are located according to keywords in the slicing units, and storing the slicing units into the target nodes;

the distinguishing value calculation module is used for determining the data in the target node as first data; determining the data stored again after the first data call is completed as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value with the data spans as indexes, wherein the data spans are used for representing the initial position and the final position of the data;

and the backup module is used for backing up the distinguishing value based on the node architecture.

Further, the confirmation module includes:

the node determining unit is used for acquiring data to be stored; determining the number of nodes for data storage and service roles corresponding to the nodes;

the architecture building unit can build the topological logic structure of all nodes based on the service roles; and establishing a node architecture according to the topological logic structure and the access delay of all the nodes, and determining the number of data and the type of data of each node.

Further, the node correspondence module includes:

the packaging unit can carry out data packaging on the data to be stored; and assigning the data packets to a storage queue for storage;

the slicing unit is used for inserting slicing characteristics into the data packet and slicing the data packet to obtain slicing units;

the corresponding unit can establish a corresponding relation between the keywords and the nodes according to the data types of the nodes based on the keywords in the slicing unit; wherein one of the slicing units corresponds to only one node;

and the transmission unit is used for routing the slicing units to the corresponding nodes according to the corresponding relation.

Compared with the prior art, the invention has the beneficial effects that:

1. By determining the number of nodes, the node architecture, and the number and type of data of each node, the data can be stored in a distributed manner, which meets complex data storage requirements. At the same time, backing up the distinguishing value avoids data loss and greatly reduces the amount of data stored, thereby fundamentally reducing the storage space of the data. In addition, because less storage is occupied, the calling efficiency of the data is also greatly improved.

2. By setting up the pseudo-distributed data cluster, the node architecture can be checked and corrected; by establishing node sets, the range searched when calling data is narrowed, which further improves the calling efficiency and makes the data more convenient to use.

Drawings

FIG. 1 is a block flow diagram of a method for distributed storage of data for an information system according to an embodiment of the present invention;

FIG. 2 is a first sub-flowchart of a method for distributed storage of data for an information system according to an embodiment of the present invention;

FIG. 3 is a second sub-flowchart of a method for distributed storage of data for an information system according to an embodiment of the present invention;

FIG. 4 is a third sub-flowchart of a method for distributed storage of data for an information system according to an embodiment of the present invention;

FIG. 5 is a fourth sub-flowchart of a method for distributed storage of data for an information system according to an embodiment of the present invention;

FIG. 6 is a block diagram of a data distributed storage system for an information system according to an embodiment of the present invention;

FIG. 7 is a block diagram illustrating a confirmation module in a data distributed storage system for an information system according to an embodiment of the present invention;

FIG. 8 is a block diagram illustrating a node corresponding module in a data distributed storage system for an information system according to an embodiment of the present invention;

FIG. 9 is a block diagram illustrating a distinguishing value calculation module in a data distributed storage system for an information system according to an embodiment of the present invention;

FIG. 10 is a block diagram illustrating a backup module in a data distributed storage system for an information system according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. For example, a first data-distributed storage method script for an information system may be referred to as a second data-distributed storage method script for an information system, and similarly, a second data-distributed storage system script for an information system may be referred to as a first data-distributed storage system script for an information system, without departing from the scope of the present application.

Embodiment 1: FIG. 1 shows the implementation flow of a data distributed storage method for an information system according to an embodiment of the present invention, described in detail as follows:

S100: Determining the number of nodes for data storage and establishing a node architecture.

The total number of nodes available for data storage is determined, together with details such as node capacity and node transmission rate, and a node architecture is established according to the total number of nodes. A main node, sub-nodes and auxiliary nodes need to be determined in the node architecture. The main node is mainly responsible for managing the sub-nodes in the whole system and for processing client requests; while the main node is providing services, the sub-nodes do not provide services externally. The main node controls all the sub-nodes to perform data storage, and each sub-node provides data query and update services. The auxiliary nodes are mainly used when other nodes are damaged or under special conditions. After the node architecture is built, it needs to be tested to ensure that it operates normally.
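The role layout described for S100 can be illustrated with a minimal Python sketch. The names `Role`, `Node`, `NodeArchitecture` and `build_architecture`, the units of the fields, and the rule that exactly one main node exists are assumptions made for illustration only; the patent does not specify them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    MAIN = "main"            # manages sub-nodes, handles client requests
    SUB = "sub"              # stores data, serves query and update
    AUXILIARY = "auxiliary"  # standby for damaged nodes or special cases


@dataclass
class Node:
    name: str
    role: Role
    capacity: int          # assumed unit: number of slicing units
    transfer_rate: float   # assumed unit: MB/s


@dataclass
class NodeArchitecture:
    nodes: list = field(default_factory=list)

    def by_role(self, role):
        """Return all nodes that hold the given role."""
        return [n for n in self.nodes if n.role is role]

    def check(self):
        """Sanity test run once the architecture has been built."""
        if len(self.by_role(Role.MAIN)) != 1:  # assumption: one main node
            raise ValueError("expected exactly one main node")
        if not self.by_role(Role.SUB):
            raise ValueError("expected at least one sub-node")
        if any(n.capacity <= 0 for n in self.nodes):
            raise ValueError("every node needs a positive capacity")


def build_architecture(specs):
    """Build the architecture from (name, role, capacity, rate) tuples."""
    arch = NodeArchitecture([Node(*spec) for spec in specs])
    arch.check()
    return arch
```

Calling `build_architecture` with one main node, several sub-nodes and an auxiliary node returns a checked architecture; a specification without a main node raises `ValueError`, which stands in for the test step mentioned above.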

S200: Slicing the data to be stored to obtain slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes.

The acquired data to be stored is sliced; slicing features may be inserted into the data at equal intervals so that the data is divided into a plurality of slicing units, and the keywords in each slicing unit are extracted. The slicing units are then stored in different slicing nodes according to their keywords, which completes the slicing and storage of the data to be stored.
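As a rough illustration of inserting slicing features at equal intervals, the following Python sketch places a marker every `interval` characters and then splits on it. The marker character, the function names and the keyword vocabulary are hypothetical; the patent does not state what form a slicing feature takes or how keywords are extracted.

```python
MARK = "\x1f"  # hypothetical slicing feature: ASCII unit separator


def insert_slicing_features(data, interval, mark=MARK):
    """Insert the slicing feature into the data at equal intervals."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if mark in data:
        raise ValueError("slicing feature already occurs in the data")
    pieces = [data[i:i + interval] for i in range(0, len(data), interval)]
    return mark.join(pieces)


def split_into_units(marked, mark=MARK):
    """Cut the marked data at every slicing feature."""
    return marked.split(mark)


def extract_keywords(unit, vocabulary):
    """Return the vocabulary keywords that occur in a slicing unit."""
    return [kw for kw in vocabulary if kw in unit]
```

Joining the resulting units in order reproduces the original data, so slicing under this assumption loses nothing; the last unit is simply shorter when the length is not a multiple of the interval.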

S300: determining data in a target node as first data; and determining the data stored in the target node again after the first data call is completed as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value with the data spans as indexes, wherein the data spans are used for representing the initial position and the final position of the data.

After the data to be stored has been sliced, the data in a certain node is taken as the first data. After the first data is called, it is modified or converted and then stored in the node again; the modified data in the node is taken as the second data. The first data and the second data are compared to find the range of the modification (for example, a modification to data or to an image), and the distinguishing value of the first data and the second data is determined according to that range. Here the original data in the first data is the initial position and the modified data in the second data is the final position; apart from the initial position and the final position, the other parts of the first data are unchanged.

S400: Backing up the distinguishing value based on the node architecture.

A backup node is created among the auxiliary nodes of the node architecture, and the distinguishing value is backed up to it.

Embodiment 2: FIG. 2 shows the implementation flow of a data distributed storage method for an information system according to an embodiment of the present invention. The step of determining the number of nodes for data storage and establishing a node architecture is detailed as follows:

S101: Acquiring the data to be stored; determining the number of nodes used for data storage and the service roles corresponding to the nodes.

The data to be stored is acquired, and the number of nodes available for data storage, that is, the number of nodes in the node architecture, is determined. The service role of each node, that is, the classification of the node, is also determined. For example, if the service role of a certain node is nutrient data, the service role corresponds to certain keywords, and any slicing unit that contains nutrient data is assigned to that node for storage.

S102: establishing a topological logic structure of all nodes based on the service roles; and establishing a node architecture according to the topological logic structure and the access delay of all the nodes, and determining the number of data and the type of data of each node.

The topological logic structure of all nodes is established according to the service roles. The topological logic structure is the layout of the main node, the slicing nodes and the auxiliary nodes: the main node is mainly responsible for overall planning and communication, the slicing nodes are used for data storage, and the auxiliary nodes serve as standbys. Data such as the access delay and capacity of each slicing node is acquired, and the number and type of data corresponding to each slicing node are determined. If the access delay of a certain slicing node is low, that node can be used as a priority node. The number of slicing units a slicing node can store is determined according to its capacity, and data of different types, such as image data and text data, can be stored in different slicing nodes by category.
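The way access delay and capacity shape the plan in S102 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class `SlicingNode`, the function `plan_nodes`, the millisecond and byte units, and the fixed size of a slicing unit are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class SlicingNode:
    name: str
    access_delay_ms: float   # assumed unit
    capacity_bytes: int      # assumed unit
    data_type: str           # e.g. "image" or "text"


def plan_nodes(nodes, unit_size_bytes):
    """Rank nodes by access delay and derive how many units each can hold."""
    if unit_size_bytes <= 0:
        raise ValueError("unit size must be positive")
    ordered = sorted(nodes, key=lambda n: n.access_delay_ms)
    return [
        {
            "node": n.name,
            "priority": rank,  # 0 is the lowest-delay (priority) node
            "data_type": n.data_type,
            "max_units": n.capacity_bytes // unit_size_bytes,
        }
        for rank, n in enumerate(ordered)
    ]
```

The node with the lowest delay receives priority 0, and the "number of data" of each node is taken here as its capacity divided by the assumed unit size.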

Embodiment 3: FIG. 3 shows the implementation flow of a data distributed storage method for an information system according to an embodiment of the present invention. The step of slicing the data to be stored, obtaining slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes is detailed as follows:

S201: Packaging the data to be stored into data packets, and assigning the data packets to a storage queue for storage.

The received data is packaged and, after packaging, transferred to a storage queue for storage.

S202: Inserting slicing features into the data packets and slicing the data packets to obtain slicing units.

The main node controls the data packets in the storage queue, adds slicing features to the data packets, and slices the data packets to obtain slicing units.

S203: Based on the keywords in the slicing units, establishing a corresponding relation between the keywords and the nodes according to the data types of the nodes, wherein one slicing unit corresponds to only one node.

The keywords in the slicing units are extracted, and the slicing units are then assigned to different slicing nodes according to the data types of the slicing nodes, which completes the slicing of the data. Each slicing unit corresponds to one slicing node; if the number of data of a slicing node is 3, that slicing node can store 3 slicing units.

S204: Routing the slicing units to the corresponding nodes according to the corresponding relation.

After the corresponding relation between slicing units and slicing nodes has been established, the slicing units are routed to the corresponding slicing nodes, which completes the distributed storage of the data.
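Steps S203 and S204 can be sketched in Python as a queue drained into nodes by keyword. The function `route_units`, the dictionary shapes, and the choice to set aside units that match no node with free space are assumptions for illustration; the patent does not say how such units are handled.

```python
from collections import deque


def route_units(units, keyword_to_type, nodes):
    """Route each slicing unit to exactly one node.

    units           -- slicing units in arrival order
    keyword_to_type -- hypothetical map: keyword -> data type
    nodes           -- hypothetical map: node name -> (data type, number of data)
    """
    queue = deque(units)                  # storage queue, first in first out
    stored = {name: [] for name in nodes}
    unrouted = []                         # assumption: kept aside, not dropped
    while queue:
        unit = queue.popleft()
        target = None
        for keyword, dtype in keyword_to_type.items():
            if keyword not in unit:
                continue
            for name, (node_type, limit) in nodes.items():
                if node_type == dtype and len(stored[name]) < limit:
                    target = name
                    break
            if target is not None:
                break
        if target is None:
            unrouted.append(unit)
        else:
            stored[target].append(unit)   # one unit goes to one node only
    return stored, unrouted
```

A node whose number of data is 3 accepts at most three units in this sketch, mirroring the example in S203.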

Embodiment 4: FIG. 4 shows the implementation flow of a data distributed storage method for an information system according to an embodiment of the present invention. The step of calculating a distinguishing value indexed by the data span is detailed as follows:

S301: Selecting a first node from the node architecture, and determining the data in the first node as first data; when the first data call is completed, determining the data stored again in the first node as second data.

The first node is selected from the slicing nodes (for convenience of explanation; the first node may be any slicing node), and all data in the first node is taken as the first data. When the first data is called, it is changed, and the changed data is stored in the first node again and taken as the second data. After the first node has been used many times, third data or further data may of course be generated.

S302: Determining a distinguishing value of the first data and the second data based on a data span of the first data and the second data, wherein the data span comprises at least: range spans and dimension spans.

The data span of the first data and the second data is determined. The data span refers to the range over which the data has changed. For example, if a certain section of text in the first data has become a picture in the second data after being modified and used, the change is called a dimension span; if the text has been modified into another text, the change is called a range span. The changed part in the second data is thereby determined.
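One possible reading of S302 is sketched below in Python: the changed region is located by trimming the common prefix and suffix of the first and second data, and the span kind is decided by whether the content type changed. This prefix and suffix comparison is a stand-in technique chosen for illustration, not the comparison the patent defines, and the names `distinguishing_value` and `span_kind` are hypothetical.

```python
def distinguishing_value(first, second):
    """Locate the changed region between first data and second data.

    Returns the initial position and final position of the change inside
    the first data, plus the content that replaces that region.
    """
    limit = min(len(first), len(second))
    start = 0
    while start < limit and first[start] == second[start]:
        start += 1                        # skip the common prefix
    tail = 0
    while tail < limit - start and first[-1 - tail] == second[-1 - tail]:
        tail += 1                         # skip the common suffix
    return {
        "start": start,
        "end": len(first) - tail,
        "replacement": second[start:len(second) - tail],
    }


def span_kind(first_type, second_type):
    """Dimension span if the content type changed, otherwise range span."""
    return "range span" if first_type == second_type else "dimension span"
```

For the first data "the cat sat" and the second data "the dog sat", only the three changed characters and their positions are kept, which is far smaller than a second full copy.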

S303: Generating an index relation between the distinguishing value and the first data, and storing the index relation in an index database.

An index relation between the distinguishing value (the changed part) and the first data is generated and stored in an index database. Searching the index database for the first data yields the distinguishing value; similarly, searching for the distinguishing value yields the first data.
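The two-way lookup described for S303 can be illustrated with a small in-memory Python sketch. The class `IndexDatabase`, its method names and the identifier strings are hypothetical, and allowing several distinguishing values per first data is an assumption based on the remark that third or further data may be generated.

```python
class IndexDatabase:
    """In-memory stand-in for the index database (illustrative only)."""

    def __init__(self):
        self._values_by_data = {}   # first data id -> distinguishing value ids
        self._data_by_value = {}    # distinguishing value id -> first data id

    def add(self, data_id, value_id):
        """Record the index relation in both directions."""
        self._values_by_data.setdefault(data_id, []).append(value_id)
        self._data_by_value[value_id] = data_id

    def values_for(self, data_id):
        """Look up the distinguishing values recorded for a first data."""
        return list(self._values_by_data.get(data_id, []))

    def data_for(self, value_id):
        """Look up the first data that a distinguishing value belongs to."""
        return self._data_by_value.get(value_id)
```

Keeping both directions means either side of the relation can be found with a single dictionary lookup.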

Embodiment 5: FIG. 5 shows the implementation flow of a data distributed storage method for an information system according to an embodiment of the present invention. The step of backing up the distinguishing value based on the node architecture is detailed as follows:

S401: Adding a backup node in the node architecture.

A backup node is added among the auxiliary nodes; the backup node is mainly used for backing up the distinguishing value.

S402: Backing up the distinguishing value and storing the backup data in the backup node.

The distinguishing value is backed up and the backup data is stored in the backup node.

When data needs to be called in daily use, the data in the sub-nodes is integrated with the distinguishing values in the backup node to obtain the modified data. Backing up the distinguishing value ensures data security, prevents data loss, reduces the workload of data backup, and reduces the data storage space.

In the field of data storage, the original data is generally stored and backed up before use. After the data is used, the original data is not overwritten because it must be retained, so the original data and the used data are both stored and both backed up, and this storage mode occupies a large amount of storage space.
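How stored data and a backed-up distinguishing value might be integrated to recover the modified data can be sketched as follows. The dictionary layout with `start`, `end` and `replacement` keys and the function name are assumptions made for this illustration; the patent does not define a storage format for the distinguishing value.

```python
def apply_distinguishing_value(first, value):
    """Rebuild the modified data from the first data and one backed-up value.

    value is a hypothetical record: the initial and final position of the
    change inside the first data, and the content that replaces that span.
    """
    start, end = value["start"], value["end"]
    if not 0 <= start <= end <= len(first):
        raise ValueError("span lies outside the first data")
    return first[:start] + value["replacement"] + first[end:]


# Example: only the three changed characters are kept in the backup node.
original = "the cat sat"
backup = {"start": 4, "end": 7, "replacement": "dog"}
modified = apply_distinguishing_value(original, backup)
```

Under these assumptions the backup node holds the short record rather than a second full copy, which is the space saving this embodiment describes.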

Embodiment 6: Unlike embodiment 1, in this embodiment of the present invention the method further includes:

copying the node architecture into a pseudo-distributed data cluster;

synchronizing the node architecture and the pseudo-distributed data cluster.

In this embodiment, additional nodes are added among the auxiliary nodes of the node architecture, and a pseudo-distributed data cluster is built in the additional nodes. The pseudo-distributed data cluster is used to check and verify the node architecture: the node architecture is synchronized into the pseudo-distributed data cluster and verified with experimental data, which ensures that the node architecture operates normally and prevents unknown errors from arising while it is being built.

Embodiment 7: Unlike embodiment 1, in this embodiment of the present invention the method further includes:

classifying the nodes according to the data types of the nodes;

Node sets are created according to the types of the data, and each node set is given a corresponding name. When the data in a node needs to be used, screening and searching can be carried out within the relevant node set, which further improves the calling efficiency of the data.
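Grouping nodes into named sets by data type can be illustrated with a short Python sketch. The function names and the use of the data type itself as the set name are assumptions; the patent only states that each node set is given a corresponding name.

```python
from collections import defaultdict


def build_node_sets(node_types):
    """Group node names into sets keyed by data type.

    node_types is a hypothetical map: node name -> data type.
    """
    sets = defaultdict(list)
    for name, data_type in node_types.items():
        sets[data_type].append(name)
    return dict(sets)


def candidates(node_sets, data_type):
    """Nodes worth searching for a given data type; empty if none match."""
    return node_sets.get(data_type, [])
```

A call for image data then only has to search the nodes in the image set rather than every node in the architecture.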

Fig. 6 is a block diagram showing the constitution of a data distributed storage system for an information system 1 according to an embodiment of the present invention, the data distributed storage system for an information system 1 comprising:

a confirmation module 11, which can determine the number of nodes for data storage and establish a node architecture;

the node corresponding module 12 is capable of slicing data to be stored, acquiring slicing units, determining target nodes where the corresponding slicing units are located according to keywords in the slicing units, and storing the slicing units into the target nodes;

a distinguishing value calculation module 13, configured to determine data in the target node as first data; determining the data stored again after the first data call is completed as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value with the data spans as indexes, wherein the data spans are used for representing the initial position and the final position of the data;

a backup module 14, configured to back up the distinguishing value based on the node architecture.

Fig. 7 is a block diagram showing the structure of the confirmation module 11 in a data distributed storage system for an information system according to an embodiment of the present invention; the confirmation module 11 includes:

a node determining unit 111 for acquiring data to be stored; determining the number of nodes for data storage and service roles corresponding to the nodes;

an architecture establishing unit 112 capable of constructing a topology logic structure of all nodes based on the service roles; and establishing a node architecture according to the topological logic structure and the access delay of all the nodes, and determining the number of data and the type of data of each node.

Fig. 8 is a block diagram showing the structure of the node correspondence module 12 in a data distributed storage system for an information system according to an embodiment of the present invention; the node correspondence module 12 includes:

the packetizing unit 121 may perform data packetizing on the data to be stored; and assigning the data packets to a storage queue for storage;

the slicing unit 122 is configured to insert a slicing feature into the data packet and slice the data packet to obtain a slicing unit;

a correspondence unit 123, configured to establish a correspondence between the keywords and the nodes according to the data types of the nodes based on the keywords in the slicing unit; wherein one of the slicing units corresponds to only one node;

and the transmission unit 124 is configured to route the slicing unit to a corresponding node according to the corresponding relationship.

Fig. 9 is a block diagram showing the structure of the distinguishing value calculation module 13 in a data distributed storage system for an information system according to an embodiment of the present invention; the distinguishing value calculation module 13 includes:

the calling unit 131 selects a first node from the node architecture, and determines data in the first node as first data; when the first data call is completed, determining the data stored in the first node again as second data;

a calculation unit 132 that determines a distinguishing value of the first data and the second data based on a data span of the first data and the second data; wherein the data span comprises at least: range spans and dimension spans;

the index unit 133 generates an index relation between the distinguishing value and the first data, and stores the index relation in an index database.

FIG. 10 is a block diagram showing the structure of the backup module 14 in a data distributed storage system for an information system according to an embodiment of the present invention; the backup module 14 includes:

an adding unit 141 that adds a backup node in the node architecture;

and a storage unit 142 for backing up the distinguishing value and storing the backup data in the backup node.

Wherein step S100 is completed by the confirmation module 11, step S200 is completed by the node correspondence module 12, step S300 is completed by the distinguishing value calculation module 13, and step S400 is completed by the backup module 14;

specifically, the node determining unit 111 is configured to complete S101, that is, to acquire the data to be stored, the number of nodes, the corresponding service roles and the like in preparation for establishing the node architecture; the architecture establishing unit 112 is configured to complete S102, that is, to establish the node architecture and determine the number and types of slicing units that can be stored;

the packaging unit 121 is used for completing S201, the slicing unit 122 is used for completing S202, the corresponding unit 123 is used for completing S203, the transmission unit 124 is used for completing S204, and the operations of slicing data, confirming a target node and the like are completed;

the calling unit 131 is used for completing S301, the calculating unit 132 is used for completing S302, and the indexing unit 133 is used for completing S303.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above-described embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. A method for distributed storage of data for an information system, the method comprising:

slicing the data to be stored to obtain slicing units, determining the target node of each slicing unit according to the keywords in that slicing unit, and storing the slicing units in the target nodes;

and backing up the distinguishing value based on the node architecture.

2. The method of claim 1, wherein the step of determining the number of nodes for data storage and establishing a node architecture comprises:

3. The method of claim 2, wherein the step of slicing the data to be stored, obtaining a sliced unit, determining a target node where the corresponding sliced unit is located according to a keyword in the sliced unit, and storing the sliced unit in the target node comprises:

inserting the fragmentation feature into the data packet and fragmenting the data packet; obtaining a slicing unit;

4. The method of claim 2, wherein the step of calculating a discrimination value indexed by data span comprises:

5. The method of claim 4, wherein the step of backing up the discrimination values based on the node architecture comprises:

adding a backup node in the node architecture;

6. The method of claim 5, wherein the method further comprises:

copying the node architecture into a pseudo-distributed data cluster;

synchronizing the node architecture and the pseudo-distributed data cluster.

7. A method according to claim 3, characterized in that the method further comprises:

classifying the nodes according to the data types of the nodes;

8. A data distributed storage system for an information system, the system comprising:

the confirmation module is used for determining the number of nodes for data storage and establishing a node architecture;

the distinguishing value calculation module is used for determining the data in the target node as first data; determining data stored in the target node again after the first data is called as second data, comparing the data spans of the first data and the second data obtained through statistics, and calculating a distinguishing value with the data spans as indexes, wherein the data spans are used for representing the initial position and the final position of the data;

9. The system of claim 8, the confirmation module comprising:

10. The system of claim 9, the node correspondence module comprising:

the packaging unit is used for packaging the data to be stored; and assigning the data packets to a storage queue for storage;