CN104123300A

CN104123300A - Data distributed storage system and method

Info

Publication number: CN104123300A
Application number: CN201310150539.0A
Authority: CN
Inventors: 吴朱华; 潘志铭
Original assignee: SHANGHAI PEOPLEYUN INFORMATION TECHNOLOGY Co Ltd
Current assignee: SHANGHAI PEOPLEYUN INFORMATION TECHNOLOGY Co Ltd
Priority date: 2013-04-26
Filing date: 2013-04-26
Publication date: 2014-10-29
Anticipated expiration: 2033-04-26
Also published as: CN104123300B

Abstract

The invention discloses a data distributed storage system and method. The system comprises a node cluster module, a data import module, and a storage module. The node cluster module connects the data nodes in a cluster to the corresponding management nodes. The data import module scans the input data in data blocks of a set size and loads it into memory, groups the data in memory according to the characteristic values of the data, and sends the grouped data to the corresponding data nodes. The storage module keeps the data fragments in memory after the data nodes receive the file fragments, and the data nodes write logs to the hard disk; the module judges whether the data in memory exceeds a set threshold and, if it does, reorganizes and compresses the data, writes it to the hard disk, and deletes the corresponding log files used to recover the in-memory data. The system and method make it possible to build a cluster on fast in-memory computing capability, improve the real-time loading and processing capacity for large-scale data, and shorten the response time of the system.

Description

Data distributed storage system and method

Technical field

The invention belongs to the technical field of database storage and relates to a distributed storage system, in particular to a data distributed storage system; the invention also relates to a data distributed storage method.

Background technology

At present, databases store data in one of three ways: 1. single-machine data storage; 2. master-slave backup storage; 3. storage on a distributed file system. However, each of these approaches has certain shortcomings.

Although single-machine data storage is easy to manage and use, its scalability is seriously limited, so it struggles to meet the access demands of today's massive data volumes, and data security is also a problem. Master-slave backup storage solves only the security problem; the other problems remain. Database storage on a distributed file system solves data security and the access requirements of massive data, but it is not suitable for data access and processing that require low latency.

In view of this, there is an urgent need to design a new distributed storage system and method for databases that overcome the above defects of existing storage systems.

Summary of the invention

The technical problem to be solved by the invention is to provide a distributed storage system for databases that can build a cluster on fast in-memory computing capability, improve the real-time loading and processing of large-scale data, and shorten the response time of the whole system.

In addition, the invention also provides a data distributed storage method that can build a cluster on fast in-memory computing capability, improve the real-time loading and processing of large-scale data, and shorten the response time of the whole system.

To solve the above technical problems, the invention adopts the following technical solutions:

A data distributed storage system, the system comprising:

A registration module, used to register the data nodes in the cluster with the management node through a client;

A data import module, used to scan the input data in data blocks of a set size and load it into memory, group the data in memory according to the characteristic values of the data, and then send the grouped data to the corresponding data nodes. The data import module specifically comprises a data scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit. The data scanning unit is used to scan the input data in data blocks of a set size and load it into memory, to split the data according to its characteristic values, and to generate an integer value from each characteristic value as the identification code of the data. The grouping rule matching unit is used to group the data identification codes of the different data according to the grouping rules. The data grouping unit is used to group the scanned data blocks of the set size in memory according to the characteristic values of the data. The data sending unit sends the grouped data to the corresponding data nodes;

A storage module, used to keep the data fragment in memory after a data node receives a file fragment, and to judge whether the data needs to be backed up to other data nodes; if so, the backup is performed by the backup module. The data node writes a log to the hard disk for recovering the in-memory data. The module judges whether the size of the data in memory exceeds the set threshold; if it does, the data is classified according to metadata features, reorganized, and then compressed. The data is reorganized mainly by sorting it according to its characteristic values and the similarity between data items, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage. After the reorganization, because similar data is stored together, the LZAM algorithm is used to compress it to obtain a higher compression ratio; the compressed data is then written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted;

A backup module, used to back up the data according to the set number of backups after the data has been sent to the corresponding data node; the backup copies are distributed on other data nodes;

A retrieval module, used to retrieve the corresponding data after the management node receives a data retrieval request. The retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit. The management node locates the data nodes involved in the data retrieval request through the positioning unit. Through the failure judging unit, the management node uses a Lease mechanism to determine whether each such data node has failed; if a node has failed, a request failure message is returned directly; if it is valid, the management node distributes the request to the corresponding nodes through the request distribution unit. After a data node receives the data retrieval request, it retrieves the corresponding data through the retrieval unit and returns the result to the client. The client merges the received results using the result merging unit.

A data distributed storage system, the system comprising:

A node cluster module, used to connect the data nodes in the cluster to the corresponding management node;

A data import module, used to scan the input data in data blocks of a set size and load it into memory, group the data in memory according to the characteristic values of the data, and then send the grouped data to the corresponding data nodes;

A storage module, used to keep the data fragment in memory after a data node receives a file fragment; the data node writes a log to the hard disk for recovering the in-memory data. The module judges whether the size of the data in memory exceeds the set threshold; if it does, the data is reorganized, compressed, and written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted.
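The behaviour of the storage module can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DataNodeStore`, the file names, the length-prefixed log format, and the order "log first, then memory" are all hypothetical, and Python's standard `lzma` module is used on the assumption that the "LZAM algorithm" named in the text refers to LZMA. Plain sorting stands in for the reorganization step.

```python
import lzma
import os
from typing import List


class DataNodeStore:
    """Keeps fragments in memory, logs them to disk, and flushes past a threshold."""

    def __init__(self, directory: str, threshold_bytes: int) -> None:
        self.directory = directory
        self.threshold_bytes = threshold_bytes      # the "set threshold" (assumed in bytes)
        self.memory: List[bytes] = []
        self.memory_size = 0
        self.flushed_segments = 0
        self.log_path = os.path.join(directory, "memory.log")  # hypothetical name

    def receive(self, fragment: bytes) -> None:
        # Write a length-prefixed log entry so the in-memory data can be recovered.
        with open(self.log_path, "ab") as log:
            log.write(len(fragment).to_bytes(4, "big") + fragment)
        self.memory.append(fragment)
        self.memory_size += len(fragment)
        if self.memory_size > self.threshold_bytes:
            self.flush()

    def flush(self) -> str:
        # Reorganize (here: a plain sort), compress, and write to disk.
        ordered = sorted(self.memory)
        payload = b"".join(len(f).to_bytes(4, "big") + f for f in ordered)
        path = os.path.join(self.directory, f"segment-{self.flushed_segments}.xz")
        with open(path, "wb") as out:
            out.write(lzma.compress(payload))
        self.flushed_segments += 1
        self.memory.clear()
        self.memory_size = 0
        # The data is now durable in the segment, so the recovery log can go.
        if os.path.exists(self.log_path):
            os.remove(self.log_path)
        return path
```

Deleting the log only after the compressed segment has been written keeps at least one on-disk copy of every fragment at all times, which is the property the log is there to provide.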

As a preferred embodiment of the invention, the data import module specifically comprises a data cutting unit, a document scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit;

The data cutting unit is used to scan the input data in data blocks of a set size and load it into memory; the grouping rule matching unit is used to set different rules for computing the characteristic values of the data according to the different data types; the data grouping unit is used to group the scanned data blocks of the set size according to the characteristics of the data; the data sending unit sends the grouped data to the corresponding data nodes.

As a preferred embodiment of the invention, the system further comprises a backup module, used to back up the data according to the set number of backups after the data has been sent to the corresponding data node; the backup copies are distributed on other data nodes.
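One way to place the backup copies on other data nodes is sketched below in Python. The function name `choose_backup_nodes` and the ring-style choice of the nodes that follow the primary are assumptions made for illustration; the text only requires that the set number of copies end up on nodes other than the one that received the data.

```python
from typing import List


def choose_backup_nodes(primary: str, nodes: List[str], backup_count: int) -> List[str]:
    """Pick backup_count distinct nodes, none of which is the primary node."""
    others = [n for n in nodes if n != primary]
    if backup_count > len(others):
        raise ValueError("not enough other data nodes for the requested backup count")
    if not others:
        return []
    # Start with the node that follows the primary in the list and wrap around.
    start = nodes.index(primary) % len(others)
    return [others[(start + i) % len(others)] for i in range(backup_count)]
```

With nodes `["n0", "n1", "n2", "n3"]`, a fragment stored on `n1` with two backups would be copied to `n2` and `n3`, and a fragment stored on `n3` would wrap around to `n0` and `n1`.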

As a preferred embodiment of the invention, the system further comprises a retrieval module, used to retrieve the corresponding data after the management node receives a data retrieval request;

The retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit;

The management node locates the data nodes involved in the data retrieval request through the positioning unit. Through the failure judging unit, the management node uses a Lease mechanism to determine whether each such data node has failed; if a node has failed, a request failure message is returned directly; if it is valid, the management node distributes the request to the corresponding nodes through the request distribution unit. After a data node receives the data retrieval request, it retrieves the corresponding data through the retrieval unit and returns the result to the client. The client merges the received results using the result merging unit.

A data distributed storage method, the method comprising the following steps:

Node cluster step: the data nodes in the cluster are connected to the corresponding management node;

Data import step: the input data is scanned in data blocks of a set size and loaded into memory, the data in memory is grouped according to the characteristic values of the data, and the grouped data is then sent to the corresponding data nodes;

Storage step: after a data node receives a file fragment, the data fragment is kept in memory, and the data node writes a log to the hard disk for recovering the in-memory data; it is judged whether the size of the data in memory exceeds the set threshold; if it does, the data is reorganized, compressed, and written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted.

As a preferred embodiment of the invention, the data import step comprises:

A data scanning step: the input data is scanned in data blocks of a set size and loaded into memory;

A grouping rule matching step: different rules for computing the characteristic values of the data are set according to the different data types;

A data grouping step: the scanned data blocks of the set size are grouped according to the characteristics of the data;

A data sending step: the grouped data is sent to the corresponding data nodes.

As a preferred embodiment of the invention, the method further comprises a backup step: after the data has been sent to the corresponding data node, the data is backed up according to the set number of backups, and the backup copies are distributed on other data nodes.

As a preferred embodiment of the invention, the method further comprises a retrieval step, in which the corresponding data is retrieved after the management node receives a data retrieval request;

The retrieval step specifically comprises:

The management node locates the data nodes involved in the data retrieval request;

The management node uses a Lease mechanism to determine whether each such data node has failed; if a node has failed, a request failure message is returned directly; if it is valid, the management node distributes the request to the corresponding nodes;

After a data node receives the data retrieval request, it retrieves the corresponding data and returns the result to the client;

The client merges the received results.

The beneficial effects of the invention are as follows: the data distributed storage system and method proposed by the invention make it possible to build a cluster based on in-memory computing, to manage large-scale data in real time, and to improve the response time of the system. On each data node, the in-memory data is backed up to disk, which ensures the safety of the data on a single machine. The system also adopts a redundant design: every piece of data has redundant backups on different nodes, so the failure of any single node does not affect data integrity or system availability.

Brief description of the drawings

Fig. 1 is a schematic diagram of the composition of the data distributed storage system of the invention.

Fig. 2 is a flow chart of data import in the data distributed storage method of the invention.

Fig. 3 is a schematic diagram of the composition of the data import module of the system of the invention.

Fig. 4 is a flow chart of data storage in the data distributed storage method of the invention.

Fig. 5 is a flow chart of data retrieval in the data distributed storage method of the invention.

Embodiment

The preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

Embodiment 1

Referring to Fig. 1, the invention discloses a data distributed storage system comprising: a registration module 1 (which may also be called a "node cluster module"), a data import module 2, a storage module 3, a backup module, and a retrieval module 4.

The registration module 1 is used to register the data nodes in the cluster with the management node through a client.

The data import module 2 is used to scan the input data in data blocks of a set size and load it into memory, group the data in memory according to the characteristic values of the data, and then send the grouped data to the corresponding data nodes.

Specifically, referring to Fig. 3, in this embodiment the data import module comprises a data cutting unit, a document scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit.

The data cutting unit is used to scan the input data in data blocks of a set size and load it into memory; the grouping rule matching unit is used to set different rules for computing the characteristic values of the data according to the different data types; the data grouping unit is used to group the scanned data blocks of the set size in memory according to the characteristic values of the data; the data sending unit sends the grouped data to the corresponding data nodes.
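The block-wise scanning described above, together with the integer identification code mentioned in the summary, can be illustrated with a short Python sketch. The names `scan_blocks`, `identification_code`, and `tag_block` are hypothetical. The sketch also assumes that the block size is counted in records, that the characteristic value of a record is its first comma-separated field, and that CRC32 is an acceptable way to turn a characteristic value into an integer; the text does not specify any of these.

```python
import zlib
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def scan_blocks(records: Iterable[str], block_size: int) -> Iterator[List[str]]:
    """Yield the input as in-memory blocks of at most block_size records."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    iterator = iter(records)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def identification_code(characteristic_value: str) -> int:
    """Map a characteristic value to a stable non-negative integer (CRC32, assumed)."""
    return zlib.crc32(characteristic_value.encode("utf-8"))


def tag_block(block: List[str]) -> List[Tuple[int, str]]:
    """Attach an identification code to every record in a block."""
    # Assumption: the characteristic value is the first comma-separated field.
    return [(identification_code(record.split(",", 1)[0]), record) for record in block]
```

Because the code depends only on the characteristic value, two records with the same value receive the same identification code even when they are scanned in different blocks.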

The storage module 3 is used to keep the data fragment in memory after a data node receives a file fragment, and to judge whether the data needs to be backed up to other data nodes; if so, the backup is performed by the backup module. The backup module is used to back up the data according to the set number of backups after the data has been sent to the corresponding data node; the backup copies are distributed on other data nodes. The data node writes a log to the hard disk for recovering the in-memory data. The storage module judges whether the size of the data in memory exceeds the set threshold; if it does, the data is reorganized and then compressed. The data is reorganized mainly by sorting it according to its characteristic values and the similarity between data items, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage. After the reorganization, because similar data is stored together, the LZAM algorithm is used to compress it to obtain a higher compression ratio; the compressed data is then written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted.
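The reorganize-then-compress step can be sketched in Python as follows. The sketch makes two assumptions that go beyond the text: sorting records by their first comma-separated field is used as a simple stand-in for ordering by characteristic value and similarity, and the standard `lzma` module is used on the assumption that "LZAM" refers to LZMA. All function names are hypothetical.

```python
import lzma
from typing import List


def reorganize(records: List[str]) -> List[str]:
    """Place records with the same characteristic value next to each other."""
    # Assumption: the first comma-separated field is the characteristic value.
    return sorted(records, key=lambda record: record.split(",", 1)[0])


def compress_records(records: List[str]) -> bytes:
    """Compress newline-joined records with LZMA."""
    return lzma.compress("\n".join(records).encode("utf-8"))


def decompress_records(blob: bytes) -> List[str]:
    """Inverse of compress_records."""
    text = lzma.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []
```

Dictionary-based compressors such as LZMA encode repeated byte sequences as references to earlier occurrences, so placing similar records next to each other tends to help, which matches the reasoning given in the text. The actual gain depends on the data.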

The retrieval module 4 is used to retrieve the corresponding data after the management node receives a data retrieval request. The retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit.

Specifically, the management node locates the data nodes involved in the data retrieval request through the positioning unit. Through the failure judging unit, the management node uses a Lease mechanism to determine whether each such data node has failed; if a node has failed, a request failure message is returned directly; if it is valid, the management node distributes the request to the corresponding nodes through the request distribution unit. After a data node receives the data retrieval request, it retrieves the corresponding data through the retrieval unit and returns the result to the client. The client merges the received results using the result merging unit.

The composition of the data distributed storage system of the invention has been described above. In addition to the system, the invention also discloses a data distributed storage method; referring to Fig. 2 and Fig. 4, the method comprises the following steps:

[Step S1] Node cluster step (that is, the registration step): the data nodes in the cluster are connected to the corresponding management node. The connection may be completed by registration; for example, the client sends registration information and the data nodes in the cluster are registered on the management node.

[Step S2] Data import step: the input data is scanned in data blocks of a set size and loaded into memory, the data in memory is grouped according to the characteristic values of the data, and the grouped data is then sent to the corresponding data nodes. With reference to Fig. 3, the data import step specifically comprises:

Step S21, data scanning step: the input data is scanned in data blocks of a set size and loaded into memory;

Step S22, grouping rule matching step: different rules for computing the characteristic values of the data are set according to the different data types;

Step S23, data grouping step: the scanned data blocks of the set size are grouped according to the characteristics of the data;

Step S24, data sending step: the grouped data is sent to the corresponding data nodes.
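Steps S22 to S24 can be sketched in Python as shown below. The rule table `RULES`, the two data types `"csv"` and `"kv"`, the CRC32-based identification code, and the modulo mapping from code to data node are all assumptions chosen for illustration; the text only states that rules differ per data type and that grouped data goes to the corresponding data nodes.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List

# S22: one rule per data type for computing the characteristic value (hypothetical rules).
RULES: Dict[str, Callable[[str], str]] = {
    "csv": lambda record: record.split(",", 1)[0],   # first column
    "kv": lambda record: record.split("=", 1)[0],    # key of "key=value"
}


def group_block(block: List[str], data_type: str, nodes: List[str]) -> Dict[str, List[str]]:
    """S23: group one scanned block by characteristic value, keyed by target node."""
    rule = RULES[data_type]
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in block:
        code = zlib.crc32(rule(record).encode("utf-8"))   # integer identification code
        groups[nodes[code % len(nodes)]].append(record)   # assumed node mapping
    return dict(groups)


def send_groups(groups: Dict[str, List[str]],
                send: Callable[[str, List[str]], None]) -> None:
    """S24: hand every group to a caller-supplied transport function."""
    for node, records in groups.items():
        send(node, records)
```

Passing the transport in as a callable keeps the grouping logic independent of how the data actually reaches a data node, which the text leaves open.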

[Step S3] Storage step: as shown in Fig. 4, after a data node receives a file fragment, the data fragment is kept in memory, and it is judged whether the data needs to be backed up to other data nodes; if so, the backup is performed.

The backup step comprises backing up the data according to the set number of backups after the data has been sent to the corresponding data node; the backup copies are distributed on other data nodes. The data node writes a log to the hard disk for recovering the in-memory data.

It is judged whether the size of the data in memory exceeds the set threshold; if it does, the data is reorganized and then compressed. The data is reorganized mainly by sorting it according to its characteristic values and the similarity between data items, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage. After the reorganization, because similar data is stored together, the LZAM algorithm is used to compress it to obtain a higher compression ratio; the compressed data is then written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted.
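Until the compressed data reaches the hard disk, the log is the only durable copy of the fragments held in memory. The Python sketch below shows one way such a log could be written and replayed after a restart. The length-prefixed entry format and the function names `append_entry` and `replay` are assumptions; the text does not describe the log format.

```python
import io
from typing import BinaryIO, List


def append_entry(log: BinaryIO, fragment: bytes) -> None:
    """Append one fragment as a 4-byte big-endian length followed by the payload."""
    log.write(len(fragment).to_bytes(4, "big"))
    log.write(fragment)


def replay(log: BinaryIO) -> List[bytes]:
    """Rebuild the in-memory fragments from the log, skipping a torn final entry."""
    fragments: List[bytes] = []
    while True:
        header = log.read(4)
        if len(header) < 4:
            break                      # clean end of log, or an incomplete header
        size = int.from_bytes(header, "big")
        body = log.read(size)
        if len(body) < size:
            break                      # the last write was interrupted
        fragments.append(body)
    return fragments


# Usage with an in-memory buffer standing in for the log file on disk.
log = io.BytesIO()
append_entry(log, b"fragment-1")
append_entry(log, b"fragment-2")
log.seek(0)
recovered = replay(log)
```

Once the reorganized and compressed data has been written, the same fragments can be read back from that file, which is why the log can then be deleted.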

[Step S4] Retrieval step: the corresponding data is retrieved after the management node receives a data retrieval request. Referring to Fig. 5, the retrieval step specifically comprises:

Step S40: the client sends the data retrieval request to the management node;

Step S41: the management node locates the data nodes involved in the data retrieval request;

Step S42: the management node uses a Lease mechanism to determine whether each such data node has failed; if a node has failed, a request failure message is returned directly; if it is valid, the management node distributes the request to the corresponding nodes;

Step S43: after a data node receives the data retrieval request, it retrieves the corresponding data and returns the result to the client;

Step S44: the client merges the received results.
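Steps S40 to S44 can be sketched in Python as follows. This is a simplified, single-process illustration: the class `ManagementNode`, its fields, and the function `retrieve` are hypothetical names, the lease is modelled as an expiry time that each data node must renew, and the merge is a sorted concatenation of the partial results. The text names the Lease mechanism but does not give its parameters or the merge rule.

```python
import time
from typing import Callable, Dict, List


class ManagementNode:
    """Tracks leases and which data nodes hold each key (hypothetical layout)."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.lease_expiry: Dict[str, float] = {}
        self.locations: Dict[str, List[str]] = {}

    def renew_lease(self, node: str) -> None:
        # A live data node calls this periodically to extend its lease.
        self.lease_expiry[node] = self.clock() + self.lease_seconds

    def is_valid(self, node: str) -> bool:
        return self.clock() < self.lease_expiry.get(node, float("-inf"))

    def dispatch(self, key: str) -> List[str]:
        nodes = self.locations.get(key, [])                 # S41: locate
        if not nodes or not all(self.is_valid(n) for n in nodes):
            raise LookupError("request failed: a data node lease has expired")  # S42
        return nodes


def retrieve(manager: ManagementNode,
             stores: Dict[str, Dict[str, List[str]]], key: str) -> List[str]:
    """Client side: ask the management node, query data nodes, merge results."""
    targets = manager.dispatch(key)                          # S40 to S42
    partial = [stores[node].get(key, []) for node in targets]  # S43
    return sorted(item for part in partial for item in part)   # S44: merge (assumed rule)
```

Injecting the clock makes the lease check easy to exercise: advancing a fake clock past `lease_seconds` without renewing makes `dispatch` fail fast instead of waiting on an unresponsive node.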

In summary, the data distributed storage system and method proposed by the invention make it possible to build a cluster based on in-memory computing, to manage large-scale data in real time, and to improve the response time of the system. On each data node, the in-memory data is backed up to disk, which ensures the safety of the data on a single machine. The system also adopts a redundant design: every piece of data has redundant backups on different nodes, so the failure of any single node does not affect data integrity or system availability.

The description and application of the invention here are illustrative and are not intended to limit the scope of the invention to the above embodiments. Variations and changes to the embodiments disclosed here are possible, and replacements and equivalents of the various parts of the embodiments are known to those of ordinary skill in the art. It should be clear to those skilled in the art that, without departing from the spirit or essential characteristics of the invention, the invention may be realized in other forms, structures, arrangements, and proportions, and with other components, materials, and parts. Other variations and changes may be made to the embodiments disclosed here without departing from the scope and spirit of the invention.

Claims

1. A data distributed storage system, characterized in that the system comprises:

A data import module, used to scan the input data in data blocks of a set size and load it into memory, group the data in memory according to the characteristic values of the data, and then send the grouped data to the corresponding data nodes; the data import module specifically comprises a data cutting unit, a data scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit; the data cutting unit is used to scan the input data in data blocks of a set size and load it into memory; the grouping rule matching unit is used to set different rules for computing the characteristic values of the data according to the different data types; the data grouping unit is used to group the scanned data blocks of the set size in memory according to the characteristic values of the data; the data sending unit sends the grouped data to the corresponding data nodes;

2. A data distributed storage system, characterized in that the system comprises:

A storage module, used to keep the data fragment in memory after a data node receives a data fragment; the data node writes a log to the hard disk for recovering the in-memory data; the module judges whether the size of the data in memory exceeds the set threshold; if it does, the data is reorganized, compressed, and written to the hard disk, and the log file used to recover the corresponding in-memory data is deleted.

3. The data distributed storage system according to claim 2, characterized in that:

The data import module specifically comprises a data cutting unit, a document scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit;

The data cutting unit is used to scan the input data in data blocks of a set size and load it into memory; the grouping rule matching unit is used to set different rules for computing the characteristic values of the data according to the different data types; the data grouping unit is used to group the scanned data blocks of the set size according to the characteristics of the data; the data sending unit sends the grouped data to the corresponding data nodes.

4. The data distributed storage system according to claim 2, characterized in that:

The system further comprises a backup module, used to back up the data according to the set number of backups after the data has been sent to the corresponding data node; the backup copies are distributed on other data nodes.

5. The data distributed storage system according to claim 2, characterized in that:

The system further comprises a retrieval module, used to retrieve the corresponding data after the management node receives a data retrieval request;

6. A data distributed storage method, characterized in that the method comprises the following steps:

7. The data distributed storage method according to claim 6, characterized in that:

The data import step comprises:

8. The data distributed storage method according to claim 6, characterized in that:

The method further comprises a backup step: after the data has been sent to the corresponding data node, the data is backed up according to the set number of backups, and the backup copies are distributed on other data nodes.

9. The data distributed storage method according to claim 6, characterized in that:

The method further comprises a retrieval step, in which the corresponding data is retrieved after the management node receives a data retrieval request;

The retrieval step specifically comprises:

The management node locates the data nodes involved in the data retrieval request;

The client merges the received results.