CN114840488B

CN114840488B - Distributed storage method, system and storage medium based on super fusion structure

Info

Publication number: CN114840488B
Application number: CN202210778538.XA
Authority: CN
Inventors: 刘江; 龚立义; 郭军
Original assignee: Baike Data Technology Shenzhen Co ltd
Current assignee: Baike Data Technology Shenzhen Co ltd
Priority date: 2022-07-04
Filing date: 2022-07-04
Publication date: 2023-05-02
Anticipated expiration: 2042-07-04
Also published as: CN114840488A

Abstract

The invention discloses a distributed storage method, system and storage medium based on a super fusion structure. The method comprises the following steps: acquiring data to be stored, and generating log statistical information according to the data to be stored; determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file; splitting the integrated mark file through a commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The invention can automatically realize distributed storage of data, automatic allocation of resources, and efficient communication.

Description

Distributed storage method, system and storage medium based on super fusion structure

Technical Field

The present invention relates to the field of data storage technologies, and in particular, to a distributed storage method, system, and storage medium based on a super fusion (hyper-converged) structure.

Background

The storage system is one of the important components of a computer. It provides the capability of writing and reading the information (programs and data) required for the computer's work, and realizes the information memory function of the computer. Modern computer systems commonly adopt a multilevel storage architecture of registers, cache, main memory and external memory. The core of the computer storage system is the memory, the device a computer needs for storing programs and data. The internal memory (memory for short) mainly stores the programs and data required by the computer's current work, and comprises a cache memory and a main memory; its main component is semiconductor memory. External memory (external storage for short) is mainly implemented in three ways, namely magnetic, optical and semiconductor storage, and its storage media include hard disks, optical disks, magnetic tapes and removable storage.

However, in the prior art, the efficiency of storing data is relatively low, and when the data changes or needs to be updated, it may be necessary to redistribute all the data.

Accordingly, there is a need for improvement and advancement in the art.

Disclosure of Invention

The invention aims to solve the technical problems that the efficiency of data storage is low and all data may need to be redistributed when the data changes or needs to be updated in the prior art.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

In a first aspect, the present invention provides a distributed storage method based on a super fusion structure, the method including:

acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored;

determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file;

and splitting the integrated mark file through a commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.

In one implementation, the generating log statistics according to the data to be stored includes:

acquiring a file name, a keyword, a file size and a file type in the data to be stored;

and generating the log statistical information according to the file name, the keyword, the file size and the file type.
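
By way of non-limiting illustration, the generation of log statistical information from the four attributes named above might be sketched as follows. The names `LogStats` and `build_log_stats` are hypothetical, and the rules used here (file type taken from the file extension, keywords taken as the most frequent words of four or more letters) are assumptions of this sketch; the text does not specify how keywords or file types are extracted.

```python
import os
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogStats:
    """Hypothetical record holding the four attributes named in the text."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


def build_log_stats(file_name: str, payload: bytes, top_k: int = 3) -> LogStats:
    # Assumption: the file type is derived from the file extension.
    _, ext = os.path.splitext(file_name)
    file_type = ext.lstrip(".").lower() or "unknown"

    # Assumption: keywords are the most frequent words of 4+ letters.
    text = payload.decode("utf-8", errors="ignore").lower()
    counts = Counter(re.findall(r"[a-z]{4,}", text))
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    keywords = tuple(word for word, _ in ranked[:top_k])

    return LogStats(file_name, keywords, len(payload), file_type)
```

A record built this way is immutable and comparable by value, which is convenient when it is later compared against the records of existing files.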

In one implementation, the determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool includes:

searching the uniform resource pool by the file name, the keyword, the file size and the file type in sequence, and determining candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type respectively;

if the candidate files include a file whose file name, keyword, file size and file type are all the same, determining that a file that is the same as the log statistical information exists in the uniform resource pool;

if the candidate files include no file whose file name, keyword, file size and file type are all the same, determining that no file that is the same as the log statistical information exists in the uniform resource pool;

or, sequentially performing similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool;

if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold, determining that a file similar to the log statistical information exists in the uniform resource pool;

and if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold, determining that no file similar to the log statistical information exists in the uniform resource pool.
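
By way of non-limiting illustration, the same-or-similar determination might be sketched as follows. The exact-match path filters candidates attribute by attribute in the order given above. The similarity path is an assumption of this sketch: the text does not define the similarity measure or the threshold, so an unweighted average of four per-attribute scores and a threshold of 0.8 are used purely as placeholders. `LogStats`, `find_same`, `similarity` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class LogStats:
    """Hypothetical record of the four attributes in the log statistical information."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


def find_same(new: LogStats, pool: list) -> list:
    """Narrow the pool by file name, keyword, file size and file type in sequence."""
    candidates = list(pool)
    for attr in ("file_name", "keywords", "file_size", "file_type"):
        candidates = [c for c in candidates if getattr(c, attr) == getattr(new, attr)]
    return candidates


def similarity(a: LogStats, b: LogStats) -> float:
    """Assumed measure: unweighted mean of four per-attribute scores in [0, 1]."""
    name = SequenceMatcher(None, a.file_name, b.file_name).ratio()
    ka, kb = set(a.keywords), set(b.keywords)
    keyword = len(ka & kb) / len(ka | kb) if (ka | kb) else 1.0  # Jaccard overlap
    largest = max(a.file_size, b.file_size)
    size = min(a.file_size, b.file_size) / largest if largest else 1.0
    ftype = 1.0 if a.file_type == b.file_type else 0.0
    return (name + keyword + size + ftype) / 4


def classify(new: LogStats, pool: list, threshold: float = 0.8) -> str:
    """Return 'same', 'similar' or 'new' for the data to be stored."""
    if find_same(new, pool):
        return "same"
    if any(similarity(new, existing) > threshold for existing in pool):
        return "similar"
    return "new"
```

Only the `"new"` outcome leads on to integrating and marking; the other two outcomes correspond to the case in which an existing file is found.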

In one implementation, the method further comprises:

if a file that is the same as or similar to the log statistical information exists in the uniform resource pool, prompting selection items, wherein the selection items include: replacing the similar file, saving the uploaded data as a new file, or not saving the uploaded data;

and receiving an input instruction, determining the selection item corresponding to the instruction, and executing the operation corresponding to the selection item.
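
By way of non-limiting illustration, the three selection items and the handling of the input instruction might be sketched as follows. The resource pool is modelled as a plain dictionary from file name to content, and the instruction strings and the naming rule for a new file are assumptions of this sketch; `Choice` and `apply_choice` are hypothetical names.

```python
from enum import Enum


class Choice(Enum):
    """The three selection items; the string values are illustrative only."""
    REPLACE = "replace"
    SAVE_AS_NEW = "save_as_new"
    DISCARD = "discard"


def apply_choice(pool: dict, existing_name: str, new_name: str,
                 payload: bytes, instruction: str) -> dict:
    """Return an updated copy of the pool after executing the chosen operation."""
    choice = Choice(instruction)  # raises ValueError for an unknown instruction
    updated = dict(pool)
    if choice is Choice.REPLACE:
        # Overwrite the existing similar file with the uploaded data.
        updated[existing_name] = payload
    elif choice is Choice.SAVE_AS_NEW:
        # Assumption: append a numeric suffix until the name is unused.
        candidate, suffix = new_name, 1
        while candidate in updated:
            candidate = f"{new_name}.{suffix}"
            suffix += 1
        updated[candidate] = payload
    # Choice.DISCARD: nothing is stored.
    return updated
```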

In one implementation, the splitting of the integrated mark file by the commercial server to obtain a split file includes:

determining the differing portions of the integrated mark file through a computing node in the commercial server, and determining the identical portions of the integrated mark file through a fusion node in the commercial server;

and splitting the integrated mark file based on the identical portions and the differing portions to obtain the split file.

In one implementation manner, the obtaining the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file in the target storage disk includes:

determining the type information of the split file based on the file type in the log statistics;

according to the type information, the target storage disk with the same storage type as the type information is found out from the uniform resource pool;

and storing the split file into the target storage disk.
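
By way of non-limiting illustration, the selection of a target storage disk by type might be sketched as follows. `StorageDisk`, `select_target_disk` and `store_split_file` are hypothetical names, each disk is assumed to carry a single storage type, and the first matching disk is chosen; the text does not say how a choice is made when several disks share a type or when none matches.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDisk:
    """Hypothetical model of one storage disk in the uniform resource pool."""
    disk_id: str
    storage_type: str
    files: dict = field(default_factory=dict)


def select_target_disk(pool: list, type_info: str) -> StorageDisk:
    """Find the disk whose storage type equals the type information."""
    for disk in pool:
        if disk.storage_type == type_info:
            return disk
    # Assumption: a missing type is treated as an error.
    raise LookupError(f"no storage disk for type {type_info!r}")


def store_split_file(pool: list, name: str, payload: bytes, type_info: str) -> str:
    """Store the split file on the matching disk and return that disk's id."""
    disk = select_target_disk(pool, type_info)
    disk.files[name] = payload
    return disk.disk_id
```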

In a second aspect, an embodiment of the present invention further provides a distributed storage system based on a super fusion structure, the system including: a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine, and a uniform resource pool connected with the commercial server; wherein the super-fusion all-in-one machine includes:

a log statistical information acquisition module, used for acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored;

an integrated mark file acquisition module, used for determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file;

and a file splitting and storing module, used for splitting the integrated mark file through the commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.

In a third aspect, an embodiment of the present invention further provides a super-fusion all-in-one machine, where the super-fusion all-in-one machine includes a memory, a processor, and a distributed storage program based on a super-fusion structure stored in the memory and capable of running on the processor, where when the processor executes the distributed storage program based on the super-fusion structure, the steps of the distributed storage method based on the super-fusion structure according to any one of the above schemes are implemented.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a distributed storage program based on a super fusion structure is stored, where the step of the distributed storage method based on a super fusion structure according to any one of the above schemes is implemented when the distributed storage program based on a super fusion structure is executed by a processor.

The beneficial effects are as follows: compared with the prior art, the invention provides a distributed storage method based on a super fusion structure. The method acquires data to be stored, temporarily stores the data to be stored, and generates log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored. It then determines, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrates and marks the data to be stored to obtain an integrated mark file. Finally, it splits the integrated mark file through a commercial server to obtain a split file, acquires type information of the split file, selects a target storage disk matched with the type information from the uniform resource pool, and stores the split file into the target storage disk. The invention can automatically realize distributed storage of data and automatic allocation of resources. The super fusion structure has no master-slave node arrangement: each computing/data node is capable of taking over the function of another computing/data node, and the nodes cooperate through an internal efficient distributed protocol to realize efficient communication.

Drawings

Fig. 1 is a flowchart of a specific implementation of a distributed storage method based on a super fusion structure according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a framework of a distributed storage system based on a super fusion structure according to an embodiment of the present invention.

Fig. 3 is a schematic block diagram of a super-fusion integrated machine in a distributed storage system based on a super-fusion structure according to an embodiment of the present invention.

Fig. 4 is a functional schematic diagram of a super-fusion integrated machine according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

This embodiment provides a distributed storage method based on a super fusion structure, by which data can be stored efficiently. In a specific implementation, the embodiment acquires data to be stored, temporarily stores the data to be stored, and generates log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored. It then determines, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrates and marks the data to be stored to obtain an integrated mark file. Finally, it splits the integrated mark file through a commercial server to obtain a split file, acquires type information of the split file, selects a target storage disk matched with the type information from the uniform resource pool, and stores the split file into the target storage disk. The embodiment can automatically realize distributed storage of data and automatic allocation of resources. The super fusion structure has no master-slave node arrangement: each computing/data node is capable of taking over the function of another computing/data node, and the nodes cooperate through an internal efficient distributed protocol to realize efficient communication.

Exemplary method

The distributed storage method based on the super fusion structure of this embodiment can be applied to terminal equipment, which may be a super-fusion all-in-one machine. A cluster of super-fusion all-in-one machines scales elastically: when nodes or hard disks are added or removed while the system is running, the super fusion structure optimizes, automatically redistributes and rebalances the data in the cluster. Data migration and rebalancing do not affect the application's access to the data. During redistribution and rebalancing, the system moves as little data as possible rather than adjusting and migrating all the data in the system, which improves the stability and performance of the system. Specifically, as shown in Fig. 1, the distributed storage method based on the super fusion structure of this embodiment includes the following steps:

Step S100, acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored.

In this embodiment, as shown in Fig. 2, the PC end first uploads the data to be stored, and the super-fusion all-in-one machine receives and temporarily stores it. The PC end is connected with a plurality of super-fusion all-in-one machines through a protocol channel that uses the TCP/IP protocol for data transmission; after receiving the data to be stored uploaded by the PC end over TCP/IP, the super-fusion all-in-one machine temporarily stores it. The super-fusion all-in-one machine then generates log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored.
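
By way of non-limiting illustration, receiving an upload over a TCP-style stream socket and staging it temporarily might be sketched as follows. The 8-byte length prefix, the use of a temporary file as the staging area, and the names `send_payload` and `receive_to_temp` are all assumptions of this sketch; the text states only that TCP/IP is used and that the data is temporarily stored.

```python
import socket
import struct
import tempfile


def send_payload(sock: socket.socket, payload: bytes) -> None:
    """Sender side (PC end): length-prefixed framing, assumed for this sketch."""
    sock.sendall(struct.pack("!Q", len(payload)) + payload)


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, since a stream socket may return short reads."""
    buffer = bytearray()
    while len(buffer) < count:
        chunk = sock.recv(count - len(buffer))
        if not chunk:
            raise ConnectionError("peer closed before the full payload arrived")
        buffer.extend(chunk)
    return bytes(buffer)


def receive_to_temp(sock: socket.socket) -> str:
    """Receiver side: read one framed payload and stage it in a temporary file."""
    (length,) = struct.unpack("!Q", _recv_exact(sock, 8))
    payload = _recv_exact(sock, length)
    staged = tempfile.NamedTemporaryFile(delete=False, suffix=".staged")
    with staged:
        staged.write(payload)
    return staged.name  # path of the temporarily stored data
```

The staged file is deliberately not deleted on close, so that later steps can read it; the caller is responsible for removing it once the data has been stored or discarded.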

Specifically, the super fusion all-in-one machine of the embodiment obtains a file name, a keyword, a file size and a file type in the data to be stored, and then generates the log statistical information according to the file name, the keyword, the file size and the file type.

Step S200, determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file.

After the log statistical information is obtained, the super-fusion all-in-one machine of this embodiment can search a preset uniform resource pool according to the log statistical information to determine whether a file that is the same as or similar to the log statistical information exists there. Specifically, the super-fusion all-in-one machine may search the uniform resource pool by the file name, the keyword, the file size and the file type in sequence, and determine candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type respectively. If the candidate files include a file whose file name, keyword, file size and file type are all the same, it is determined that a file that is the same as the log statistical information exists in the uniform resource pool; otherwise, it is determined that no such file exists. Alternatively, the super-fusion all-in-one machine may sequentially perform similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool. If the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold, it is determined that a file similar to the log statistical information exists in the uniform resource pool; if no existing file has a similarity that exceeds the threshold, it is determined that no similar file exists.
If a file that is the same as or similar to the log statistical information exists in the uniform resource pool, selection items are prompted, the selection items including: replacing the similar file, saving the uploaded data as a new file, or not saving the uploaded data. The super-fusion all-in-one machine then receives an input instruction, determines the selection item corresponding to the instruction, and executes the operation corresponding to the selection item. Specifically, the instruction is received from the PC end, and according to it the uploaded file either replaces the similar file in the uniform resource pool, is saved as a new file, or is not saved. When no file that is the same as or similar to the log statistical information exists in the uniform resource pool, the data to be stored is judged to be a new file, and is integrated and marked to obtain an integrated mark file. Integrating and marking the data to be stored distinguishes it from the existing files, avoids confusion with them, and facilitates better storage of the data to be stored.

Step S300, splitting the integrated mark file through a commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk.

In this embodiment, each super-fusion all-in-one machine is connected with the same resource pool through a commercial server, and after the integration mark file is obtained, the integration mark file can be split through the commercial server to obtain a split file. And then, the type information of the split file can be acquired, a target storage disk matched with the type information is selected from the uniform resource pool, and the split file is stored in the target storage disk. That is, in this embodiment, when the integrated markup file is stored, the integrated markup file is split first, and then stored according to the type information, so that data management is facilitated.

In one implementation manner, the step S300 specifically includes the following steps:

Step S301, determining the differing portions of the integrated mark file through a computing node in the commercial server, and determining the identical portions of the integrated mark file through a fusion node in the commercial server;

Step S302, splitting the integrated mark file based on the identical portions and the differing portions to obtain the split file.

In a specific implementation, the commercial server of this embodiment includes a computing node and a fusion node. The computing node determines the differing portions of the integrated mark file, and the fusion node determines its identical portions. Once both have been determined, the integrated mark file is split based on the identical portions and the differing portions to obtain the split files. In other words, the identical portions of the integrated mark file are split into one file and the differing portions into another. In this embodiment, the uniform resource pool is formed by a plurality of storage disks, each used to store a different type of data file. Therefore, after the split file is obtained, the embodiment can further acquire the type information of the split file, and then store the split file into the corresponding storage disk according to the type information, thereby realizing distributed storage.
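
By way of non-limiting illustration, splitting a file into a part holding what is the same and a part holding what is different might be sketched as follows. The text does not state what the integrated mark file is compared against, so this sketch assumes a reference byte string is supplied by the caller; it uses Python's `difflib.SequenceMatcher` as a stand-in for whatever comparison the computing node and fusion node perform. `split_by_reference` is a hypothetical name.

```python
from difflib import SequenceMatcher


def split_by_reference(integrated: bytes, reference: bytes) -> tuple:
    """Split `integrated` into (same, different) relative to `reference`.

    Assumption: "same" means byte runs that also occur, in order, in the
    reference; everything else in `integrated` counts as "different".
    """
    matcher = SequenceMatcher(None, reference, integrated, autojunk=False)
    same, different = bytearray(), bytearray()
    for tag, _r1, _r2, start, end in matcher.get_opcodes():
        segment = integrated[start:end]
        if tag == "equal":
            same.extend(segment)       # found in both inputs
        else:
            different.extend(segment)  # replaced or inserted bytes
    return bytes(same), bytes(different)
```

For example, with `reference = b"HEADER-aaaa-FOOTER"` and `integrated = b"HEADER-bbbb-FOOTER"`, the shared header and footer land in the first part and `b"bbbb"` in the second. This sketch discards the offsets of each segment; reassembling the original file from the two parts would require recording them.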

Specifically, the embodiment may determine the type information of the split file based on the file type in the log statistics information. Since the log statistics information is obtained based on the file name, the keyword, the file size and the file type in the data to be stored, the log statistics information includes the file type. The split file is obtained by splitting an integrated mark file formed by integrating and marking the data to be stored, so that after the file type is determined according to the log statistical information, the type information of the split file can be determined. Then, according to the type information, the embodiment finds out the target storage disk with the same storage type as the type information from the uniform resource pool; and finally, storing the split file into the target storage disk, so that different types of information can be distributed and orderly stored into the corresponding storage disk.

In one implementation manner, after the split file is stored in the target storage disk, the embodiment can encrypt data in each storage disk in the uniform resource pool, and the data is integrated with the identity information during encryption. Only after passing the identity verification, the PC end can call the data in the uniform resource pool, thereby ensuring the security of the data.
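
By way of non-limiting illustration, the identity check that gates access to stored data might be sketched as follows. This sketch covers only the verification step; the encryption itself is left out because it would require a vetted cipher library, and the text does not name a cipher. Deriving a verifier from the identity and a secret with PBKDF2 is an assumption of this sketch, and `enroll`, `verify` and `read_if_authorized` are hypothetical names.

```python
import hashlib
import hmac
import os

_ITERATIONS = 100_000  # illustrative work factor


def _derive(identity: str, secret: str, salt: bytes) -> bytes:
    # Bind the derived value to the identity by mixing it into the salt.
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt + identity.encode("utf-8"), _ITERATIONS
    )


def enroll(identity: str, secret: str) -> tuple:
    """Create the (salt, verifier) record kept alongside the stored data."""
    salt = os.urandom(16)
    return salt, _derive(identity, secret, salt)


def verify(identity: str, secret: str, record: tuple) -> bool:
    salt, verifier = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_derive(identity, secret, salt), verifier)


def read_if_authorized(disk: dict, name: str, identity: str,
                       secret: str, record: tuple) -> bytes:
    """Return the stored data only after the identity check passes."""
    if not verify(identity, secret, record):
        raise PermissionError("identity verification failed")
    return disk[name]
```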

The super-fusion all-in-one machine of this embodiment adopts a distributed, shared-nothing design. A distributed algorithm spreads the data across all nodes in the cluster, so that cross-node redundancy with 2 or 3 copies is available and data reliability is greatly improved. The super fusion structure has no master-slave node arrangement: each computing/data node is capable of taking over the function of another computing/data node, and the nodes cooperate and communicate through an internal efficient distributed protocol. The super-fusion all-in-one machine deploys computing virtualization and distributed storage on the same server hardware. For applications with strict I/O latency requirements, such as virtualization and databases, it stores data on the local physical server, which reduces the network overhead of traditional external shared storage (SAN/NAS). Users can set the service level of computing and storage resources according to their own needs, while the management platform automatically allocates the actual resources, which keeps management simple.
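
By way of non-limiting illustration, one well-known way to place copies across nodes without any master node is rendezvous (highest-random-weight) hashing. The text does not name the distributed algorithm it uses, so this is a substituted technique shown only to make the stated properties concrete: every node can compute the placement on its own, copies land on distinct nodes, and adding a node moves only the data that the new node takes over. `place_replicas` is a hypothetical name.

```python
import hashlib


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(key: str, nodes: list, copies: int = 2) -> list:
    """Pick `copies` distinct nodes for `key`: the nodes with the highest scores."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    ranked = sorted(nodes, key=lambda node: _score(node, key), reverse=True)
    return ranked[:copies]
```

Because the ranking depends only on the key and the node names, any node can compute the same placement independently. When a node is added, a key's placement changes only if the new node ranks among its top `copies`, so the remaining data stays where it is.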

To sum up, this embodiment first acquires data to be stored, temporarily stores the data to be stored, and generates log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored. It then determines, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, integrates and marks the data to be stored to obtain an integrated mark file. Finally, it splits the integrated mark file through a commercial server to obtain a split file, acquires type information of the split file, selects a target storage disk matched with the type information from the uniform resource pool, and stores the split file into the target storage disk. The embodiment can automatically realize distributed storage of data and automatic allocation of resources. The super fusion structure has no master-slave node arrangement: each computing/data node is capable of taking over the function of another computing/data node, and the nodes cooperate through an internal efficient distributed protocol to realize efficient communication.
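
By way of non-limiting illustration, the control flow of steps S100 to S300 might be sketched end to end as follows. The existence check and the splitting rule are passed in as callables because the text leaves their details open; the `b"NEW:"` marker, the part-naming rule, the dictionary model of the resource pool and the name `store` are all assumptions of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple


def store(name: str, payload: bytes, file_type: str,
          pool: Dict[str, List[Tuple[str, bytes]]],
          exists: Callable[[str, bytes], bool],
          split: Callable[[bytes], List[bytes]]) -> Optional[List[str]]:
    """Run S100 -> S200 -> S300; return stored part names, or None if not new."""
    # S100: stage the data and record its attributes.
    staged = bytes(payload)
    stats = {"file_name": name, "file_size": len(staged), "file_type": file_type}

    # S200: stop here if the same or a similar file already exists.
    if exists(stats["file_name"], staged):
        return None
    marked = b"NEW:" + staged  # hypothetical mark identifying a new file

    # S300: split, then store every part on the disk matching the file type.
    if stats["file_type"] not in pool:
        raise LookupError(f"no storage disk for type {file_type!r}")
    stored = []
    for index, part in enumerate(split(marked)):
        part_name = f"{name}.part{index}"
        pool[stats["file_type"]].append((part_name, part))
        stored.append(part_name)
    return stored
```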

Exemplary System

Based on the above embodiment, the present invention further provides a distributed storage system based on a super fusion structure. The system includes a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine, and a uniform resource pool connected with the commercial server. As shown in Fig. 3, the super-fusion all-in-one machine includes a log statistical information acquisition module 10, an integrated mark file acquisition module 20 and a file splitting and storing module 30. Specifically, the log statistical information acquisition module 10 of this embodiment is configured to acquire data to be stored, temporarily store the data to be stored, and generate log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored. The integrated mark file acquisition module 20 is configured to determine, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, upon confirming that no such file exists in the uniform resource pool, to integrate and mark the data to be stored to obtain an integrated mark file. The file splitting and storing module 30 is configured to split the integrated mark file through a commercial server to obtain a split file, acquire type information of the split file, select a target storage disk matched with the type information from the uniform resource pool, and store the split file into the target storage disk.

In one implementation, the log statistics information acquisition module 10 includes:

an information acquisition unit, used for acquiring the file name, keyword, file size and file type in the data to be stored;

and an information generation unit, used for generating the log statistical information according to the file name, the keyword, the file size and the file type.

In one implementation, the integrated markup file acquisition module 20 includes:

a candidate matching unit, used for searching the uniform resource pool by the file name, the keyword, the file size and the file type in sequence, and determining candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type respectively;

a same judging unit, used for determining that a file that is the same as the log statistical information exists in the uniform resource pool if the candidate files include a file whose file name, keyword, file size and file type are all the same;

a different judging unit, used for determining that no file that is the same as the log statistical information exists in the uniform resource pool if the candidate files include no file whose file name, keyword, file size and file type are all the same;

a similarity analysis unit, used for sequentially performing similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool;

a similarity judging unit, used for determining that a file similar to the log statistical information exists in the uniform resource pool if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold;

and a dissimilar judging unit, used for determining that no file similar to the log statistical information exists in the uniform resource pool if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold.

In one implementation, the system further comprises:

a selection prompting module, configured to prompt selection items if a file that is the same as or similar to the log statistical information exists in the uniform resource pool, where the selection items include: replacing the similar file, saving the uploaded data as a new file, or not saving the uploaded data;

and a selection operation module, used for receiving an input instruction, determining the selection item corresponding to the instruction, and executing the operation corresponding to the selection item.

In one implementation, the file splitting and storing module 30 further includes:

a file analysis unit, configured to determine the differing parts of the integrated mark file by using a computing node in the commercial server, and to determine the identical parts of the integrated mark file by using a fusion node in the commercial server;

a file splitting unit, configured to split the integrated mark file based on the identical parts and the differing parts to obtain the split file;

a type determining unit, configured to determine the type information of the split file based on the file type in the log statistical information;

a type analysis unit, configured to find, according to the type information, the target storage disk in the uniform resource pool whose storage type is the same as the type information;

and a file storage unit, configured to store the split file into the target storage disk.
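The type analysis and file storage units can be pictured with the following minimal sketch. It assumes that each storage disk in the uniform resource pool carries a single storage type label and that the first disk with a matching label is chosen; the embodiment does not state how ties between several matching disks are resolved. The names `StorageDisk`, `select_target_disk` and `store_split_files` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageDisk:
    """One disk in the uniform resource pool (fields are assumed)."""
    disk_id: str
    storage_type: str
    files: dict[str, bytes] = field(default_factory=dict)


def select_target_disk(pool: list[StorageDisk], type_info: str) -> Optional[StorageDisk]:
    """Return the first disk whose storage type equals the type information."""
    for disk in pool:
        if disk.storage_type == type_info:
            return disk
    return None


def store_split_files(pool: list[StorageDisk], split_files: dict[str, bytes],
                      type_info: str) -> str:
    """Store the split files on the matching disk and return that disk's id."""
    disk = select_target_disk(pool, type_info)
    if disk is None:
        raise LookupError(f"no storage disk with storage type {type_info!r}")
    disk.files.update(split_files)
    return disk.disk_id
```

For example, with one disk labeled `"video"` and one labeled `"text"`, split files whose type information is `"text"` would be written to the second disk.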

The working principle of each module in the distributed storage system based on the super fusion structure in this embodiment is the same as that of the corresponding step in the above method embodiment, and is not repeated here.

Based on the above embodiment, the present invention further provides a super fusion all-in-one machine, a schematic block diagram of which may be as shown in FIG. 4. The super fusion all-in-one machine comprises a processor and a memory connected through a system bus, the processor and the memory being arranged in a host. The processor of the super fusion all-in-one machine provides computing and control capabilities. The memory of the super fusion all-in-one machine comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the super fusion all-in-one machine is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the distributed storage method based on a super fusion structure.

It will be appreciated by those skilled in the art that the block diagram shown in FIG. 4 shows only part of the structure relevant to the solution of the present invention and does not limit the super fusion all-in-one machine to which the solution is applied; a particular super fusion all-in-one machine may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a super-fusion all-in-one machine is provided, where the super-fusion all-in-one machine includes a memory, a processor, and a distributed storage method program based on a super-fusion structure stored in the memory and capable of running on the processor, where when the processor executes the distributed storage method program based on the super-fusion structure, the following operation instructions are implemented:

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

In summary, the invention discloses a distributed storage method, a system and a storage medium based on a super fusion structure, wherein the method comprises the following steps: acquiring data to be stored, and generating log statistical information according to the data to be stored; according to the log statistical information, determining whether files which are the same as or similar to the log statistical information exist in a preset uniform resource pool, and integrating and marking data to be stored when confirming that the files which are the same as or similar to the log statistical information do not exist in the uniform resource pool, so as to obtain an integrated mark file; splitting the integrated mark file through the commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk. The invention can automatically realize the distributed storage of data, realize the automatic allocation of resources and realize high-efficiency communication.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A distributed storage method based on a super fusion structure, the method comprising:

determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file; splitting the integrated mark file through a commercial server to obtain a split file, acquiring type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk;

the acquiring of the data to be stored and temporarily storing the data to be stored comprises:

a PC terminal is connected to a plurality of super-fusion all-in-one machines through a protocol channel, the protocol channel using the TCP/IP protocol; after a super-fusion all-in-one machine acquires, via the TCP/IP protocol, the data to be stored uploaded by the PC terminal, it temporarily stores the data to be stored, wherein each super-fusion all-in-one machine is connected to the uniform resource pool through a commercial server;

the generating of the log statistical information according to the data to be stored comprises:

generating the log statistical information according to the file name, the keyword, the file size and the file type;

the determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in the preset uniform resource pool comprises:

searching in the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determining candidate files matched with the file name, the keyword, the file size and the file type in the uniform resource pool respectively;

if none of the candidate files has the same file name, keyword, file size and file type, determining that no file identical to the log statistical information exists in the uniform resource pool;

the determining, according to the log statistical information, whether a file that is the same as or similar to the log statistical information exists in the preset uniform resource pool further comprises:

if any existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds a threshold, determining that a file similar to the log statistical information exists in the uniform resource pool; if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold, determining that no file similar to the log statistical information exists in the uniform resource pool;

the method further comprises the steps of:

receiving an input instruction, determining a selection item corresponding to the instruction, and executing an operation corresponding to the selection item;

the splitting of the integrated mark file through the commercial server to obtain a split file comprises:

splitting the integrated mark file based on the identical parts and the differing parts to obtain the split file;

the acquiring of the type information of the split file, selecting a target storage disk matched with the type information from the uniform resource pool, and storing the split file into the target storage disk comprises:

determining the type information of the split file based on the file type in the log statistical information;

finding, according to the type information, the target storage disk in the uniform resource pool whose storage type is the same as the type information;

storing the split file into the target storage disk;

the method further comprises the steps of:

after the split file is stored in the target storage disk, the data in each storage disk in the uniform resource pool is encrypted, and identity information is fused during encryption.

2. A distributed storage system based on a super fusion structure, the system comprising: a super-fusion all-in-one machine, a commercial server connected to the super-fusion all-in-one machine, and a uniform resource pool connected to the commercial server; wherein the super-fusion all-in-one machine comprises:

a file splitting and storing module, configured to split the integrated mark file through the commercial server to obtain a split file, acquire type information of the split file, select a target storage disk matched with the type information from the uniform resource pool, and store the split file into the target storage disk;

the log statistical information acquisition module comprises:

an information generation unit, configured to generate the log statistical information according to the file name, the keyword, the file size and the file type;

the integrated mark file obtaining module comprises:

the candidate matching unit is used for searching in the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determining candidate files matched with the file name, the keyword, the file size and the file type in the uniform resource pool respectively;

a difference determining unit, configured to determine that no file identical to the log statistical information exists in the uniform resource pool if none of the candidate files has the same file name, keyword, file size and file type;

the integrated mark file obtaining module further comprises:

a dissimilarity determining unit, configured to determine that no file similar to the log statistical information exists in the uniform resource pool if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds a threshold;

the system further comprises:

the selection operation module is used for receiving an input instruction, determining a selection item corresponding to the instruction and executing an operation corresponding to the selection item;

the file splitting and storing module further comprises:

a file analysis unit, configured to determine the differing parts of the integrated mark file by using a computing node in the commercial server, and to determine the identical parts of the integrated mark file by using a fusion node in the commercial server; a file splitting unit, configured to split the integrated mark file based on the identical parts and the differing parts to obtain the split file;

the file splitting and storing module further comprises:

the file storage unit is used for storing the split file into the target storage disk;

the system further comprises:

3. A super-fusion all-in-one machine, comprising a memory, a processor, and a distributed storage program based on a super fusion structure that is stored in the memory and can run on the processor, wherein the processor, when executing the distributed storage program based on the super fusion structure, implements the steps of the distributed storage method based on a super fusion structure according to claim 1.

4. A computer readable storage medium, wherein the computer readable storage medium has stored thereon a distributed storage program based on a super fusion structure, which when executed by a processor, implements the steps of the super fusion structure based distributed storage method according to claim 1.