CN114840488B - Distributed storage method, system and storage medium based on super fusion structure - Google Patents


Publication number
CN114840488B
CN114840488B CN202210778538.XA CN202210778538A CN114840488B CN 114840488 B CN114840488 B CN 114840488B CN 202210778538 A CN202210778538 A CN 202210778538A CN 114840488 B CN114840488 B CN 114840488B
Authority
CN
China
Prior art keywords
file
resource pool
uniform resource
statistical information
data
Prior art date
Legal status
Active
Application number
CN202210778538.XA
Other languages
Chinese (zh)
Other versions
CN114840488A (en)
Inventor
刘江
龚立义
郭军
Current Assignee
Baike Data Technology Shenzhen Co ltd
Original Assignee
Baike Data Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Baike Data Technology Shenzhen Co ltd
Priority to CN202210778538.XA
Publication of CN114840488A
Application granted
Publication of CN114840488B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a distributed storage method, system and storage medium based on a super fusion structure. The method comprises: acquiring data to be stored and generating log statistical information from it; determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when no such file exists, integrating and marking the data to be stored to obtain an integrated mark file; and splitting the integrated mark file through a commercial server to obtain split files, acquiring the type information of each split file, selecting a target storage disk matching that type information from the uniform resource pool, and storing the split file on the target storage disk. The invention stores data in a distributed manner automatically, allocates resources automatically, and achieves efficient communication between nodes.

Description

Distributed storage method, system and storage medium based on super fusion structure
Technical Field
The present invention relates to the field of data storage technologies, and in particular to a distributed storage method, system and storage medium based on a super fusion (hyper-converged) structure.
Background
Storage systems are one of the most important components of a computer. A storage system provides the ability to write and read the information (programs and data) that the computer needs to work, and so implements the computer's memory function. Modern computer systems commonly adopt a multilevel storage architecture of registers, cache, main memory and external storage. The core of a computer storage system is memory, the device in which programs and data are kept. Internal memory (memory for short) mainly holds the programs and data needed for the computer's current work and comprises the cache and the main memory; its main component is semiconductor memory. External storage is implemented mainly as magnetic, optical or semiconductor storage, on media such as hard disks, optical discs, magnetic tapes and removable storage.
However, in the prior art, data is stored relatively inefficiently, and when the data changes or needs to be updated, all of the data may have to be redistributed.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
The technical problem to be solved by the invention is that, in the prior art, data storage is inefficient and all data may have to be redistributed when the data changes or needs to be updated.
To solve this technical problem, the invention adopts the following technical solution:
In a first aspect, the present invention provides a distributed storage method based on a super fusion structure, the method comprising:
acquiring data to be stored, temporarily storing it, and generating log statistical information from it, the log statistical information reflecting attribute information of the data to be stored;
determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file; and
splitting the integrated mark file through a commercial server to obtain split files, acquiring the type information of each split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk.
In one implementation, generating the log statistical information from the data to be stored comprises:
acquiring the file name, keywords, file size and file type of the data to be stored; and
generating the log statistical information from the file name, keywords, file size and file type.
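A minimal sketch of this step in Python, assuming the log statistical information is a simple record of the four attributes. The patent does not say how keywords are extracted or how the file type is determined, so the word-frequency heuristic and the extension-based type below are illustrative assumptions, as are the names `LogStats` and `generate_log_stats`.

```python
from __future__ import annotations

import os
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogStats:
    """Hypothetical record of the attributes the method collects."""
    file_name: str
    keywords: tuple[str, ...]
    file_size: int
    file_type: str


def extract_keywords(text: str, top_n: int = 5) -> tuple[str, ...]:
    # Assumption: keywords are the most frequent words of four or more letters.
    words = re.findall(r"[a-z]{4,}", text.lower())
    return tuple(word for word, _ in Counter(words).most_common(top_n))


def generate_log_stats(file_name: str, content: bytes) -> LogStats:
    # Assumption: the file type is taken from the extension, not from the content.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    text = content.decode("utf-8", errors="ignore")
    return LogStats(
        file_name=file_name,
        keywords=extract_keywords(text),
        file_size=len(content),
        file_type=extension or "unknown",
    )
```

For example, `generate_log_stats("report.TXT", b"storage storage pool disk")` yields type `"txt"`, size 25 and keywords ordered by frequency.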
In one implementation, determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in the preset uniform resource pool comprises:
searching the uniform resource pool by file name, keywords, file size and file type in sequence, and determining the candidate files in the uniform resource pool that match the file name, keywords, file size and file type respectively;
if a candidate file has the same file name, keywords, file size and file type, determining that a file identical to the log statistical information exists in the uniform resource pool; and
if no candidate file has the same file name, keywords, file size and file type, determining that no file identical to the log statistical information exists in the uniform resource pool.
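The sequential exact-match search can be sketched as successive filters over the pool. This sketch assumes each pooled file carries the same four attributes and that keywords match when they are equal as sets; `LogStats` and `find_identical` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogStats:
    file_name: str
    keywords: tuple[str, ...]
    file_size: int
    file_type: str


def find_identical(stats: LogStats, pool: list[LogStats]) -> Optional[LogStats]:
    """Narrow the pool one attribute at a time, in the order the method lists them."""
    candidates = [f for f in pool if f.file_name == stats.file_name]
    # Assumption: keywords match when they are equal as sets (order ignored).
    candidates = [f for f in candidates if set(f.keywords) == set(stats.keywords)]
    candidates = [f for f in candidates if f.file_size == stats.file_size]
    candidates = [f for f in candidates if f.file_type == stats.file_type]
    return candidates[0] if candidates else None
```

Filtering in sequence is equivalent to intersecting the four per-attribute candidate sets, but it lets the cheapest or most selective attribute (here the name) shrink the search first.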
In another implementation, determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in the preset uniform resource pool comprises:
performing similarity analysis between the file name, keywords, file size and file type and the existing files in the uniform resource pool in sequence;
if the similarity between an existing file and the file name, keywords, file size and file type exceeds a threshold, determining that a file similar to the log statistical information exists in the uniform resource pool; and
if no existing file has a similarity to the file name, keywords, file size and file type that exceeds the threshold, determining that no file similar to the log statistical information exists in the uniform resource pool.
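One way to sketch the similarity test, assuming per-attribute scores that must each exceed the threshold: name similarity via `difflib.SequenceMatcher.ratio()`, keyword overlap as Jaccard similarity, size similarity as the ratio of the smaller to the larger size, and type similarity as exact equality. The patent names neither the measures nor the threshold; these measures, the default threshold of 0.8 and the name `find_similar` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class LogStats:
    file_name: str
    keywords: tuple[str, ...]
    file_size: int
    file_type: str


def attribute_similarities(a: LogStats, b: LogStats) -> tuple[float, float, float, float]:
    """Per-attribute scores in [0, 1]: name, keywords, size, type (all assumed measures)."""
    name = SequenceMatcher(None, a.file_name, b.file_name).ratio()
    ka, kb = set(a.keywords), set(b.keywords)
    keywords = len(ka & kb) / len(ka | kb) if ka | kb else 1.0
    largest = max(a.file_size, b.file_size)
    size = min(a.file_size, b.file_size) / largest if largest else 1.0
    ftype = 1.0 if a.file_type == b.file_type else 0.0
    return name, keywords, size, ftype


def find_similar(stats: LogStats, pool: list[LogStats], threshold: float = 0.8) -> Optional[LogStats]:
    # A pooled file counts as similar only if every attribute score exceeds the threshold.
    for existing in pool:
        if all(score > threshold for score in attribute_similarities(stats, existing)):
            return existing
    return None
```

With these measures, `report_v1.txt` and `report_v2.txt` of nearly equal size and identical keywords are judged similar, while a spreadsheet with different keywords is not.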
In one implementation, the method further comprises:
if a file identical or similar to the log statistical information exists in the uniform resource pool, presenting selection items, the selection items comprising: replacing the similar file, saving the file as a new file, or not saving the file; and
receiving an input instruction, determining the selection item corresponding to the instruction, and executing the operation corresponding to that selection item.
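The three selection items map naturally onto an enumeration dispatched on the input instruction. A minimal sketch, assuming the pool is a dictionary keyed by file name and that "save as new" appends a numeric suffix; the names and the naming scheme are hypothetical, not taken from the patent.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Choice(Enum):
    REPLACE = "replace"          # overwrite the similar file already in the pool
    SAVE_AS_NEW = "save_as_new"  # keep both, storing the upload under a new name
    SKIP = "skip"                # do not store the upload


def _new_name(name: str, pool: dict[str, bytes]) -> str:
    # Hypothetical naming scheme: "report.txt" -> "report (1).txt", "report (2).txt", ...
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    n = 1
    while f"{stem} ({n}){dot}{ext}" in pool:
        n += 1
    return f"{stem} ({n}){dot}{ext}"


def apply_choice(pool: dict[str, bytes], name: str, data: bytes, instruction: str) -> Optional[str]:
    """Map the input instruction to a selection item and execute it; return the stored name."""
    choice = Choice(instruction)  # raises ValueError for an unknown instruction
    if choice is Choice.REPLACE:
        pool[name] = data
        return name
    if choice is Choice.SAVE_AS_NEW:
        new_name = _new_name(name, pool)
        pool[new_name] = data
        return new_name
    return None
```

Converting the raw instruction through `Choice(...)` rejects unknown input up front instead of silently falling through to "do not save".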
In one implementation, splitting the integrated mark file through the commercial server to obtain split files comprises:
determining the differing parts of the integrated mark file through a computing node in the commercial server, and determining the common (identical) parts of the integrated mark file through a fusion node in the commercial server; and
splitting the integrated mark file based on the common parts and the differing parts to obtain the split files.
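The patent does not define what the "same" and "different" parts of the integrated mark file are. One possible reading, sketched here purely as an assumption, is chunk-level deduplication: the file is cut into fixed-size chunks, chunks whose hash occurs more than once form the common part (kept once), and chunks that occur only once form the differing part. The computing node and fusion node are modeled as plain functions, and the tiny `CHUNK_SIZE` only keeps the example readable.

```python
from __future__ import annotations

import hashlib
from collections import Counter

CHUNK_SIZE = 4  # hypothetical; real systems use KiB-sized or content-defined chunks


def split_common_and_distinct(data: bytes, size: int = CHUNK_SIZE):
    """Return (common, distinct, recipe) for a fixed-size chunking of `data`."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    recipe = [hashlib.sha256(chunk).hexdigest() for chunk in chunks]
    counts = Counter(recipe)
    common: dict[str, bytes] = {}    # "same places": chunks that recur, stored once
    distinct: dict[str, bytes] = {}  # "different places": chunks that occur only once
    for digest, chunk in zip(recipe, chunks):
        target = common if counts[digest] > 1 else distinct
        target[digest] = chunk
    return common, distinct, recipe


def reassemble(common: dict[str, bytes], distinct: dict[str, bytes], recipe: list[str]) -> bytes:
    # The recipe keeps the original chunk order, so the file can be rebuilt exactly.
    chunks = {**common, **distinct}
    return b"".join(chunks[digest] for digest in recipe)
```

Keeping the ordered list of chunk digests alongside the two parts is what makes the split reversible; without it the common and differing parts could not be put back together.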
In one implementation, acquiring the type information of the split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk comprises:
determining the type information of the split file based on the file type in the log statistical information;
finding, according to the type information, a target storage disk in the uniform resource pool whose storage type matches the type information; and
storing the split file on the target storage disk.
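Selecting the target disk by type reduces to a lookup from the split file's type to a disk whose storage type matches. In this sketch the disk categories, the `TYPE_CATEGORY` mapping and the `StorageDisk` class are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from file type (from the log statistical information) to disk storage type.
TYPE_CATEGORY = {"txt": "document", "docx": "document", "jpg": "image", "png": "image"}


@dataclass
class StorageDisk:
    disk_id: str
    storage_type: str
    files: dict[str, bytes] = field(default_factory=dict)


def store_split_file(pool: list[StorageDisk], name: str, data: bytes, file_type: str) -> StorageDisk:
    """Store a split file on the first disk whose storage type matches the file's type."""
    category = TYPE_CATEGORY.get(file_type.lower(), "other")
    for disk in pool:
        if disk.storage_type == category:
            disk.files[name] = data
            return disk
    raise LookupError(f"no disk in the pool has storage type {category!r}")
```

Raising an error when no disk matches makes a missing storage type visible instead of silently placing data on an arbitrary disk.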
In a second aspect, an embodiment of the present invention further provides a distributed storage system based on a super fusion structure. The system comprises a super-fusion all-in-one machine, a commercial server connected to the super-fusion all-in-one machine, and a uniform resource pool connected to the commercial server, wherein the super-fusion all-in-one machine comprises:
a log statistical information acquisition module, configured to acquire data to be stored, temporarily store it, and generate log statistical information from it, the log statistical information reflecting attribute information of the data to be stored;
an integrated mark file acquisition module, configured to determine, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, to integrate and mark the data to be stored to obtain an integrated mark file; and
a file splitting and storing module, configured to split the integrated mark file through the commercial server to obtain split files, acquire the type information of each split file, select a target storage disk matching the type information from the uniform resource pool, and store the split file on the target storage disk.
In a third aspect, an embodiment of the present invention further provides a super-fusion all-in-one machine comprising a memory, a processor, and a distributed storage program based on a super fusion structure that is stored in the memory and executable on the processor; when the processor executes the program, the steps of the distributed storage method based on a super fusion structure according to any one of the above schemes are implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a distributed storage program based on a super fusion structure; when the program is executed by a processor, the steps of the distributed storage method based on a super fusion structure according to any one of the above schemes are implemented.
Beneficial effects: compared with the prior art, the invention provides a distributed storage method based on a super fusion structure that acquires and temporarily stores data to be stored and generates log statistical information reflecting its attributes; determines, according to the log statistical information, whether a file identical or similar to it exists in a preset uniform resource pool, and, if none exists, integrates and marks the data to obtain an integrated mark file; and finally splits the integrated mark file through a commercial server, acquires the type information of each split file, selects a matching target storage disk from the uniform resource pool, and stores the split file on it. The invention stores data in a distributed manner and allocates resources automatically. Moreover, the super fusion structure has no master/slave nodes: each computing/data node can take over the functions of another, and the nodes cooperate through an efficient internal distributed protocol, which makes communication between them efficient.
Drawings
Fig. 1 is a flowchart of a specific implementation of a distributed storage method based on a super fusion structure according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a framework of a distributed storage system based on a super fusion structure according to an embodiment of the present invention.
Fig. 3 is a schematic block diagram of a super-fusion integrated machine in a distributed storage system based on a super-fusion structure according to an embodiment of the present invention.
Fig. 4 is a functional schematic diagram of a super-fusion integrated machine according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment provides a distributed storage method based on a super fusion structure that stores data efficiently. In a specific implementation, the embodiment acquires and temporarily stores the data to be stored and generates log statistical information reflecting its attributes. It then determines, according to the log statistical information, whether a file identical or similar to it exists in a preset uniform resource pool, and, if none exists, integrates and marks the data to obtain an integrated mark file. Finally, it splits the integrated mark file through a commercial server, acquires the type information of each split file, selects a matching target storage disk from the uniform resource pool, and stores the split file on it. The embodiment stores data in a distributed manner and allocates resources automatically; moreover, the super fusion structure has no master/slave nodes, each computing/data node can take over the functions of another, and the nodes cooperate through an efficient internal distributed protocol to communicate efficiently.
Exemplary method
The distributed storage method based on the super fusion structure can be applied to a terminal device, which may be a super-fusion all-in-one machine. A cluster of super-fusion all-in-one machines scales elastically: when nodes or hard disks are added or removed while the system is running, the super fusion structure automatically optimizes, redistributes and rebalances the data in the cluster, and the whole migration and rebalancing process does not affect applications' access to the data. During redistribution and rebalancing, the system moves as little data as possible rather than adjusting and migrating all data in the system, which improves the stability and performance of the system. Specifically, as shown in Fig. 1, the distributed storage method based on the super fusion structure of this embodiment comprises the following steps:
Step S100: acquire data to be stored, temporarily store it, and generate log statistical information from it, the log statistical information reflecting attribute information of the data to be stored.
In this embodiment, as shown in Fig. 2, the PC side first uploads the data to be stored, which the super-fusion all-in-one machine receives and temporarily stores. The PC side is connected to several super-fusion all-in-one machines through a protocol channel that uses TCP/IP for data transmission; after receiving the data uploaded by the PC side over TCP/IP, the super-fusion all-in-one machine stores it temporarily. The super-fusion all-in-one machine then generates log statistical information from the data to be stored, the log statistical information reflecting attribute information of that data.
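The upload over the TCP/IP protocol channel can be sketched with Python's standard `socket` module. The patent specifies only TCP/IP, so the length-prefixed framing (4-byte name length, name, 8-byte payload length, payload), the function names and the temporary staging directory are all assumptions.

```python
from __future__ import annotations

import socket
import struct
import tempfile
from pathlib import Path
from typing import Optional


def _recv_exact(conn: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until n bytes arrive.
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def send_upload(conn: socket.socket, name: str, payload: bytes) -> None:
    """PC side: send one file as [name length][name][payload length][payload]."""
    encoded = name.encode("utf-8")
    conn.sendall(struct.pack("!I", len(encoded)) + encoded + struct.pack("!Q", len(payload)) + payload)


def receive_and_stage(conn: socket.socket, staging_dir: Optional[Path] = None) -> Path:
    """Appliance side: receive one file and stage it temporarily on local disk."""
    if staging_dir is None:
        staging_dir = Path(tempfile.mkdtemp(prefix="staging-"))
    (name_len,) = struct.unpack("!I", _recv_exact(conn, 4))
    name = _recv_exact(conn, name_len).decode("utf-8")
    (size,) = struct.unpack("!Q", _recv_exact(conn, 8))
    path = staging_dir / Path(name).name  # keep only the base name to avoid path traversal
    path.write_bytes(_recv_exact(conn, size))
    return path
```

In a real deployment the PC side would open the connection with `socket.create_connection((host, port))` and the appliance would accept on a listening socket; a local `socket.socketpair()` is enough to try the framing.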
Specifically, the super-fusion all-in-one machine of this embodiment acquires the file name, keywords, file size and file type of the data to be stored, and then generates the log statistical information from them.
Step S200: determine, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists, integrate and mark the data to be stored to obtain an integrated mark file.
After obtaining the log statistical information, the super-fusion all-in-one machine of this embodiment searches the preset uniform resource pool to determine whether a file identical or similar to the log statistical information exists there. Specifically, it may search the uniform resource pool by file name, keywords, file size and file type in sequence and determine the candidate files in the uniform resource pool that match each of these attributes. If a candidate file has the same file name, keywords, file size and file type, a file identical to the log statistical information exists in the uniform resource pool; if no candidate file matches on all four, no identical file exists.
Alternatively, the super-fusion all-in-one machine of this embodiment may perform similarity analysis between the file name, keywords, file size and file type and the existing files in the uniform resource pool in sequence. If the similarity between an existing file and these attributes exceeds a threshold, a file similar to the log statistical information exists in the uniform resource pool; if no existing file exceeds the threshold, no similar file exists.
If a file identical or similar to the log statistical information exists in the uniform resource pool, selection items are presented: replacing the similar file, saving the file as a new file, or not saving it. The super-fusion all-in-one machine then receives an input instruction, determines the corresponding selection item, and executes the associated operation. Specifically, according to an instruction received from the PC side, the super-fusion all-in-one machine of this embodiment replaces the similar file in the uniform resource pool with the uploaded file, saves the uploaded file as a new file, or does not save it. When no file identical or similar to the log statistical information exists in the uniform resource pool, the data to be stored is treated as a new file and is integrated and marked to obtain an integrated mark file. Integrating and marking the data distinguishes it from the existing files, avoids confusion with them, and makes the data easier to store.
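The patent does not specify a format for the integrated mark file. As an illustration only, integrating and marking could mean concatenating the pending items behind a header that identifies and describes them. The layout below (4 magic bytes, a 4-byte header length, a JSON header with a unique mark ID, a SHA-256 digest and per-item offsets, then the body) and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import struct
import uuid

MAGIC = b"IMF1"  # hypothetical magic bytes identifying an integrated mark file


def integrate_and_mark(items: list[tuple[str, bytes]]) -> bytes:
    """Concatenate the pending items and prefix them with a JSON mark header."""
    manifest = []
    offset = 0
    for name, data in items:
        manifest.append({"name": name, "offset": offset, "length": len(data)})
        offset += len(data)
    body = b"".join(data for _, data in items)
    header = json.dumps({
        "mark_id": uuid.uuid4().hex,                 # distinguishes this file from existing ones
        "sha256": hashlib.sha256(body).hexdigest(),  # lets readers detect corruption
        "items": manifest,
    }).encode("utf-8")
    return MAGIC + struct.pack("!I", len(header)) + header + body


def read_mark(blob: bytes) -> tuple[dict, bytes]:
    """Parse and verify an integrated mark file, returning (header, body)."""
    if blob[:4] != MAGIC:
        raise ValueError("not an integrated mark file")
    (header_len,) = struct.unpack("!I", blob[4:8])
    header = json.loads(blob[8:8 + header_len])
    body = blob[8 + header_len:]
    if hashlib.sha256(body).hexdigest() != header["sha256"]:
        raise ValueError("body does not match its mark")
    return header, body
```

A random mark ID keeps two uploads with identical attributes distinguishable, which is the stated purpose of marking in this step.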
Step S300: split the integrated mark file through a commercial server to obtain split files, acquire the type information of each split file, select a target storage disk matching the type information from the uniform resource pool, and store the split file on the target storage disk.
In this embodiment, every super-fusion all-in-one machine is connected to the same resource pool through a commercial server. After the integrated mark file is obtained, the commercial server splits it into split files; the type information of each split file is then acquired, a target storage disk matching that type information is selected from the uniform resource pool, and the split file is stored on it. In other words, the integrated mark file is first split and then stored according to type, which makes the data easier to manage.
In one implementation, step S300 specifically comprises the following steps:
Step S301: determine the differing parts of the integrated mark file through a computing node in the commercial server, and determine its common (identical) parts through a fusion node in the commercial server;
Step S302: split the integrated mark file based on the common parts and the differing parts to obtain the split files.
In a specific implementation, the commercial server in this embodiment includes a computing node, which determines the differing parts of the integrated mark file, and a fusion node, which determines its common parts. Once both have been determined, the integrated mark file is split accordingly: the common parts are split into one file and the differing parts into another. In this embodiment the resource pool is formed by multiple storage disks, each storing a different type of data file. After the split files are obtained, their type information is acquired and each split file is stored on the storage disk corresponding to its type, which achieves distributed storage.
Specifically, this embodiment may determine the type information of a split file based on the file type in the log statistical information. Because the log statistical information is generated from the file name, keywords, file size and file type of the data to be stored, it contains the file type. And because the split files are obtained by splitting the integrated mark file formed from that data, the type information of the split files follows once the file type is known. The embodiment then finds, in the uniform resource pool, the target storage disk whose storage type matches the type information and stores the split file on it, so that different types of information are stored in an orderly, distributed way on their corresponding storage disks.
In one implementation, after the split files are stored on the target storage disks, the data on each storage disk in the uniform resource pool may be encrypted, with identity information incorporated during encryption. The PC side can access data in the uniform resource pool only after passing identity verification, which keeps the data secure.
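The patent names neither the cipher nor the form of the identity information. Python's standard library has no block cipher, so this sketch shows only the identity-binding part: an HMAC over the identity and the data, checked before data is released. It provides integrity and identity binding, not confidentiality; in practice an authenticated cipher such as AES-GCM with the identity passed as associated data would combine both properties. `IdentityBoundStore` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


class IdentityBoundStore:
    """Hypothetical store that binds each blob to an identity with an HMAC tag.

    This gives integrity and identity binding only; it does not encrypt the data.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._items: dict[str, tuple[bytes, bytes]] = {}

    def _tag(self, identity: str, data: bytes) -> bytes:
        # Length-prefix the identity so different (identity, data) pairs cannot collide.
        ident = identity.encode("utf-8")
        message = len(ident).to_bytes(4, "big") + ident + data
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def put(self, name: str, identity: str, data: bytes) -> None:
        self._items[name] = (data, self._tag(identity, data))

    def get(self, name: str, identity: str) -> bytes:
        data, tag = self._items[name]
        # Constant-time comparison; only the identity used at write time passes.
        if not hmac.compare_digest(tag, self._tag(identity, data)):
            raise PermissionError("identity verification failed")
        return data
```

`hmac.compare_digest` is used instead of `==` so that the check does not leak timing information about how much of the tag matched.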
The super-fusion all-in-one machine in this embodiment follows a distributed, shared-nothing design: a distributed algorithm spreads the data across all nodes in the cluster, enabling cross-node redundancy with 2 or 3 copies and greatly improving data reliability. The super fusion architecture has no master/slave nodes; each computing/data node can take over the functions of another, and the nodes cooperate and communicate through an efficient internal distributed protocol. The super-fusion all-in-one machine deploys compute virtualization and distributed storage on the same server hardware. For applications with strict I/O latency requirements, such as virtualization and databases, data is stored on the local physical server, which reduces the network overhead of traditional external shared storage (SAN/NAS). Users can set service levels for compute and storage resources as needed, the management platform allocates the actual resources automatically, and management is simple.
In summary, this embodiment first acquires and temporarily stores the data to be stored and generates log statistical information reflecting its attributes. It then determines, according to the log statistical information, whether a file identical or similar to it exists in a preset uniform resource pool, and, if none exists, integrates and marks the data to obtain an integrated mark file. Finally, it splits the integrated mark file through a commercial server, acquires the type information of each split file, selects a matching target storage disk from the uniform resource pool, and stores the split file on it. The embodiment thus stores data in a distributed manner and allocates resources automatically, and because the super fusion structure has no master/slave nodes, any computing/data node can take over the functions of another while the nodes cooperate through an efficient internal distributed protocol to communicate efficiently.
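The patent does not name its distributed algorithm. Rendezvous (highest-random-weight) hashing is one well-known technique, substituted here as an assumption, that gives both properties this embodiment describes: taking the top `replicas` nodes places 2 or 3 copies on distinct nodes, and adding a node relocates only the keys for which the new node now scores highest, so as little data as possible moves. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def _score(node: str, key: str) -> int:
    # Deterministic pseudo-random weight for a (node, key) pair.
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: list[str], replicas: int = 1) -> list[str]:
    """Rendezvous hashing: the `replicas` highest-scoring nodes hold copies of `key`."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    return sorted(nodes, key=lambda node: _score(node, key), reverse=True)[:replicas]


def moved_keys(keys: list[str], before: list[str], after: list[str]) -> list[str]:
    """Keys whose primary node changes when the node set changes."""
    return [k for k in keys if place(k, before)[0] != place(k, after)[0]]
```

Because existing nodes keep their scores, a key's primary node can change only when the new node outscores all of them. With three nodes, adding a fourth therefore moves roughly a quarter of the keys, every moved key lands on the new node, and the rest stay where they were.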
Exemplary System
Based on the above embodiment, the present invention further provides a distributed storage system based on a super fusion structure, where the system includes: the system comprises a super-fusion all-in-one machine, a commercial server connected with the super-fusion all-in-one machine and a uniform resource pool connected with the commercial server. Wherein, as shown in fig. 3, the super fusion all-in-one machine includes: the system comprises a log statistical information acquisition module 10, an integrated mark file acquisition module 20 and a file splitting and storing module 30. Specifically, the log statistics information obtaining module 10 in this embodiment is configured to obtain data to be stored, temporarily store the data to be stored, and generate log statistics information according to the data to be stored, where the log statistics information is used to reflect attribute information in the data to be stored. The integrated tag file obtaining module 20 is configured to determine, according to the log statistics information, whether a file identical or similar to the log statistics information exists in a preset unified resource pool, and integrate and tag the data to be stored to obtain an integrated tag file when it is determined that the file identical or similar to the log statistics information does not exist in the unified resource pool. The file splitting and storing module 30 is configured to split the integration markup file through a commercial server to obtain a split file, obtain type information of the split file, select a target storage disk matched with the type information from the uniform resource pool, and store the split file in the target storage disk.
In one implementation, the log statistical information acquisition module 10 includes:
an information acquisition unit, configured to acquire the file name, keyword, file size and file type of the data to be stored;
and an information generation unit, configured to generate the log statistical information according to the file name, the keyword, the file size and the file type.
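The description does not say how the file name, keywords, file size and file type are extracted. The following Python sketch shows one possible way to assemble a log statistical information record; the `LogStatistics` class, the naive word-frequency keyword extraction, the stop-word list and the use of the file extension as the file type are all hypothetical choices, not the patented implementation.

```python
import os
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a proper tokenizer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


@dataclass(frozen=True)
class LogStatistics:
    """Hypothetical record holding the four attributes named in the text."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


def extract_keywords(text: str, top_n: int = 3) -> tuple:
    """Return the top_n most frequent non-stop-words (ties keep first-seen order)."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]
    return tuple(word for word, _ in Counter(words).most_common(top_n))


def build_log_statistics(file_name: str, content: bytes) -> LogStatistics:
    """Assemble log statistical information for one file to be stored."""
    _, ext = os.path.splitext(file_name)
    file_type = ext.lstrip(".").lower() or "unknown"  # assumption: type = extension
    text = content.decode("utf-8", errors="ignore")
    return LogStatistics(file_name, extract_keywords(text), len(content), file_type)


stats = build_log_statistics("Report.TXT", b"disk pool disk node pool disk")
print(stats)
```

A production system would more likely take the file type from content sniffing or upload metadata than from the extension; the extension keeps the sketch self-contained.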
In one implementation, the integrated mark file acquisition module 20 includes:
a candidate matching unit, configured to search the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and to determine the candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type, respectively;
an identity judging unit, configured to determine that a file identical to the log statistical information exists in the uniform resource pool if one of the candidate files has the same file name, keyword, file size and file type;
and a difference judging unit, configured to determine that no file identical to the log statistical information exists in the uniform resource pool if none of the candidate files has the same file name, keyword, file size and file type.
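The candidate matching unit and the two judging units amount to an exact-match lookup over four attributes. A minimal Python sketch, assuming pool entries are in-memory records carrying the same four fields (the `FileRecord` type and `find_identical` function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical pool entry carrying the four log-statistics attributes."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


# Order in which the text says the pool is searched.
SEARCH_ORDER = ("file_name", "keywords", "file_size", "file_type")


def find_identical(stats: FileRecord, pool: list) -> list:
    """Return pool entries whose four attributes all equal those of `stats`.

    Each pass keeps only the candidates that also match the next attribute,
    so the final list is the intersection of the per-attribute matches.
    """
    candidates = list(pool)
    for attr in SEARCH_ORDER:
        target = getattr(stats, attr)
        candidates = [f for f in candidates if getattr(f, attr) == target]
        if not candidates:
            break  # nothing left to narrow: no identical file exists
    return candidates


incoming = FileRecord("a.txt", ("disk", "pool"), 1024, "txt")
pool = [FileRecord("a.txt", ("disk", "pool"), 2048, "txt"), incoming]
print(bool(find_identical(incoming, pool)))  # True: an identical file exists
```

A non-empty result corresponds to the identity judging unit's outcome and an empty one to the difference judging unit's. A real resource pool would more likely be queried through an index on these attributes than scanned linearly.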
In one implementation, the integrated mark file acquisition module 20 includes:
a similarity analysis unit, configured to perform, in sequence, similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool;
a similarity judging unit, configured to determine that a file similar to the log statistical information exists in the uniform resource pool if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold;
and a dissimilarity judging unit, configured to determine that no file similar to the log statistical information exists in the uniform resource pool if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold.
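The text does not say how the per-attribute similarities are computed or combined. One plausible reading is sketched below in Python, with hypothetical choices throughout: name similarity via `difflib.SequenceMatcher.ratio()`, Jaccard overlap for keywords, a min/max ratio for sizes, exact equality for types, and an unweighted average compared against an assumed threshold of 0.8.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical pool entry carrying the four log-statistics attributes."""
    file_name: str
    keywords: tuple
    file_size: int
    file_type: str


def attribute_similarity(a: FileRecord, b: FileRecord) -> float:
    """Average of four per-attribute similarity scores, each in [0, 1]."""
    name_sim = SequenceMatcher(None, a.file_name, b.file_name).ratio()
    ka, kb = set(a.keywords), set(b.keywords)
    kw_sim = len(ka & kb) / len(ka | kb) if (ka | kb) else 1.0  # Jaccard index
    larger = max(a.file_size, b.file_size)
    size_sim = min(a.file_size, b.file_size) / larger if larger else 1.0
    type_sim = 1.0 if a.file_type == b.file_type else 0.0
    return (name_sim + kw_sim + size_sim + type_sim) / 4


def find_similar(stats: FileRecord, pool: list, threshold: float = 0.8) -> list:
    """Return existing files whose similarity to `stats` exceeds the threshold."""
    return [f for f in pool if attribute_similarity(stats, f) > threshold]


a = FileRecord("report_v1.txt", ("disk", "pool"), 1000, "txt")
b = FileRecord("report_v2.txt", ("disk", "pool"), 1010, "txt")
c = FileRecord("photo.jpg", ("beach",), 5_000_000, "jpg")
print([f.file_name for f in find_similar(a, [b, c])])  # ['report_v2.txt']
```

The strict `>` mirrors the wording "exceeds a threshold". Weighting the attributes differently, or requiring every attribute to pass its own threshold, would be equally consistent with the text.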
In one implementation, the system further comprises:
a selection prompting module, configured to prompt selection items if a file identical or similar to the log statistical information exists in the uniform resource pool, the selection items including: replacing the similar file, saving the file as a new file, or not saving it;
and a selection operation module, configured to receive an input instruction, determine the selection item corresponding to the instruction, and perform the operation corresponding to that selection item.
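A small Python sketch of the selection prompting and selection operation steps, assuming the resource pool can be modelled as a dictionary from file name to bytes and that the input instruction arrives as a string. The `Choice` values, the `apply_choice` function and the ` (n)` renaming scheme are hypothetical.

```python
from enum import Enum


class Choice(Enum):
    """The three selection items offered when an identical or similar file exists."""
    REPLACE = "replace"
    SAVE_AS_NEW = "save_as_new"
    SKIP = "skip"


def apply_choice(pool: dict, existing_name: str, new_name: str,
                 data: bytes, choice: Choice) -> dict:
    """Carry out the selected item against an in-memory pool {name: bytes}."""
    if choice is Choice.REPLACE:
        # Drop the similar file and store the incoming data in its place.
        pool.pop(existing_name, None)
        pool[new_name] = data
    elif choice is Choice.SAVE_AS_NEW:
        # Keep the similar file; store the incoming data under a unique name.
        name, n = new_name, 1
        while name in pool:
            name = f"{new_name} ({n})"
            n += 1
        pool[name] = data
    # Choice.SKIP: the incoming data is discarded and the pool is unchanged.
    return pool


# An input instruction such as the string "save_as_new" maps to a selection item:
pool = {"a.txt": b"old"}
apply_choice(pool, "a.txt", "a.txt", b"new", Choice("save_as_new"))
print(pool)  # {'a.txt': b'old', 'a.txt (1)': b'new'}
```

`Choice(value)` raises `ValueError` for an unrecognised instruction, which gives the caller a natural place to re-prompt.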
In one implementation, the file splitting and storing module 30 further includes:
a file analysis unit, configured to determine the differing portions of the integrated mark file through a computing node in the commercial server, and to determine the common portions of the integrated mark file through a fusion node in the commercial server;
and a file splitting unit, configured to split the integrated mark file based on the common portions and the differing portions to obtain the split files.
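How the computing node and the fusion node identify the "different" and "same" places is not specified. One interpretation, sketched below in Python, swaps in fixed-size chunk deduplication: the integrated mark file is cut into fixed-size chunks, chunks whose SHA-256 digest is already known to the pool are treated as common portions, and the rest as differing portions. The sketch runs in a single process rather than across the two node roles, and `split_by_commonality` and the 4 KiB default chunk size are assumptions.

```python
import hashlib


def split_by_commonality(data: bytes, known_digests: set, chunk_size: int = 4096):
    """Split `data` into fixed-size chunks and sort them into two groups.

    Returns (same, different): lists of (offset, chunk) pairs. A chunk counts as
    "same" if its SHA-256 digest is already in `known_digests` (e.g. an index of
    chunks held in the resource pool), otherwise as "different".
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    same, different = [], []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        (same if digest in known_digests else different).append((offset, chunk))
    return same, different


known = {hashlib.sha256(b"AAAA").hexdigest()}
same, different = split_by_commonality(b"AAAABBBBAAAA", known, chunk_size=4)
print([o for o, _ in same], [o for o, _ in different])  # [0, 8] [4]
```

In such a scheme only the differing chunks would need to be written out; common chunks could be referenced by digest.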
In one implementation, the file splitting and storing module 30 further includes:
a type determining unit configured to determine the type information of the split file based on the file type in the log statistical information;
the type analysis unit is used for finding out the target storage disk with the same storage type as the type information from the uniform resource pool according to the type information;
and the file storage unit is used for storing the split file into the target storage disk.
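A minimal Python sketch of the type determining, type analysis and file storage units, assuming each disk in the uniform resource pool advertises a storage-type label that can be compared directly with the file type from the log statistical information. The `Disk` fields and the most-free-space selection policy are hypothetical, and no actual disk I/O is performed.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hypothetical view of one storage disk in the uniform resource pool."""
    disk_id: str
    storage_type: str
    free_bytes: int


def select_target_disk(type_info: str, size: int, disks: list) -> Disk:
    """Pick a disk whose storage type equals the split file's type information.

    Among matching disks with enough free space, the one with the most free
    space is chosen (an assumed selection policy).
    """
    matches = [d for d in disks if d.storage_type == type_info and d.free_bytes >= size]
    if not matches:
        raise LookupError(f"no {type_info!r} disk with {size} bytes free")
    return max(matches, key=lambda d: d.free_bytes)


def store_split_file(type_info: str, payload: bytes, disks: list) -> str:
    """Reserve space on the selected disk and return its id (no real I/O)."""
    disk = select_target_disk(type_info, len(payload), disks)
    disk.free_bytes -= len(payload)
    return disk.disk_id


disks = [Disk("d1", "video", 500), Disk("d2", "document", 100), Disk("d3", "document", 300)]
print(store_split_file("document", b"x" * 50, disks))  # d3
```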
The working principle of each module in the distributed storage system based on the super fusion structure in this embodiment is the same as that of the corresponding step in the above method embodiment, and is not repeated here.
Based on the above embodiment, the present invention further provides a super-fusion all-in-one machine, whose functional block diagram is shown in Fig. 4. The super-fusion all-in-one machine includes a processor, a memory and a network interface connected through a system bus, the processor and the memory being arranged in a host. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements the distributed storage method based on a super fusion structure.
It will be appreciated by those skilled in the art that the functional block diagram shown in FIG. 4 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the super fusion machine to which the present inventive arrangements are applied, and that a particular super fusion machine may include more or fewer components than those shown, or may combine certain components, or may have a different arrangement of components.
In one embodiment, a super-fusion all-in-one machine is provided, including a memory, a processor, and a distributed storage program based on a super fusion structure that is stored in the memory and executable on the processor. When executing the program, the processor performs the following operations:
acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored;
determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file;
splitting the integrated mark file through a commercial server to obtain split files, acquiring type information of each split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-volatile computer-readable storage medium which, when executed, may include the steps of the above method embodiments. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
In summary, the invention discloses a distributed storage method, system and storage medium based on a super fusion structure. The method includes: acquiring data to be stored and generating log statistical information from it; determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file; and splitting the integrated mark file through the commercial server to obtain split files, acquiring the type information of each split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk. The invention stores data in a distributed manner and allocates resources automatically, while achieving efficient communication.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A distributed storage method based on a super fusion structure, the method comprising:
acquiring data to be stored, temporarily storing the data to be stored, and generating log statistical information according to the data to be stored, wherein the log statistical information is used for reflecting attribute information in the data to be stored;
determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, integrating and marking the data to be stored to obtain an integrated mark file;
splitting the integrated mark file through a commercial server to obtain split files, acquiring type information of each split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk;
wherein obtaining the data to be stored and temporarily storing the data to be stored includes:
a PC is connected to a plurality of super-fusion all-in-one machines through a protocol channel that uses the TCP/IP protocol; after a super-fusion all-in-one machine obtains, through the TCP/IP protocol, the data to be stored uploaded by the PC, it temporarily stores the data to be stored, each super-fusion all-in-one machine being connected to the uniform resource pool through a commercial server;
generating the log statistical information according to the data to be stored includes:
acquiring the file name, keyword, file size and file type of the data to be stored;
generating the log statistical information according to the file name, the keyword, the file size and the file type;
determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in the preset uniform resource pool includes:
searching in the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and determining candidate files matched with the file name, the keyword, the file size and the file type in the uniform resource pool respectively;
if one of the candidate files has the same file name, keyword, file size and file type, determining that a file identical to the log statistical information exists in the uniform resource pool;
if none of the candidate files has the same file name, keyword, file size and file type, determining that no file identical to the log statistical information exists in the uniform resource pool;
determining, according to the log statistical information, whether a file identical or similar to the log statistical information exists in the preset uniform resource pool further includes:
performing, in sequence, similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool;
if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold, determining that a file similar to the log statistical information exists in the uniform resource pool;
if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold, determining that no file similar to the log statistical information exists in the uniform resource pool;
the method further includes:
if a file identical or similar to the log statistical information exists in the uniform resource pool, prompting selection items, the selection items including: replacing the similar file, saving the file as a new file, or not saving it;
receiving an input instruction, determining a selection item corresponding to the instruction, and executing an operation corresponding to the selection item;
splitting the integrated mark file through the commercial server to obtain the split files includes:
determining the differing portions of the integrated mark file through a computing node in the commercial server, and determining the common portions of the integrated mark file through a fusion node in the commercial server;
splitting the integrated mark file based on the common portions and the differing portions to obtain the split files;
obtaining the type information of each split file, selecting a target storage disk matching the type information from the uniform resource pool, and storing the split file on the target storage disk includes:
determining the type information of the split file based on the file type in the log statistical information;
finding, according to the type information, the target storage disk in the uniform resource pool whose storage type is the same as the type information;
storing the split file on the target storage disk;
the method further includes:
after the split file is stored on the target storage disk, encrypting the data on each storage disk in the uniform resource pool, with identity information fused into the encryption.
2. A distributed storage system based on a super-fusion architecture, the system including a super-fusion all-in-one machine, a commercial server connected to the super-fusion all-in-one machine, and a uniform resource pool connected to the commercial server, wherein the super-fusion all-in-one machine includes:
a log statistical information acquisition module, configured to acquire data to be stored, temporarily store the data to be stored, and generate log statistical information according to the data to be stored, wherein the log statistical information reflects attribute information of the data to be stored;
an integrated mark file acquisition module, configured to determine, according to the log statistical information, whether a file identical or similar to the log statistical information exists in a preset uniform resource pool, and, when it is confirmed that no such file exists in the uniform resource pool, to integrate and mark the data to be stored to obtain an integrated mark file;
a file splitting and storing module, configured to split the integrated mark file through the commercial server to obtain split files, acquire type information of each split file, select a target storage disk matching the type information from the uniform resource pool, and store the split file on the target storage disk;
the log statistical information acquisition module is configured such that:
a PC is connected to a plurality of super-fusion all-in-one machines through a protocol channel that uses the TCP/IP protocol; after a super-fusion all-in-one machine obtains, through the TCP/IP protocol, the data to be stored uploaded by the PC, it temporarily stores the data to be stored, each super-fusion all-in-one machine being connected to the uniform resource pool through a commercial server;
and the log statistical information acquisition module includes:
an information acquisition unit, configured to acquire the file name, keyword, file size and file type of the data to be stored;
an information generation unit, configured to generate the log statistical information according to the file name, the keyword, the file size and the file type;
the integrated mark file acquisition module includes:
a candidate matching unit, configured to search the uniform resource pool according to the file name, the keyword, the file size and the file type in sequence, and to determine the candidate files in the uniform resource pool that match the file name, the keyword, the file size and the file type, respectively;
an identity judging unit, configured to determine that a file identical to the log statistical information exists in the uniform resource pool if one of the candidate files has the same file name, keyword, file size and file type;
a difference judging unit, configured to determine that no file identical to the log statistical information exists in the uniform resource pool if none of the candidate files has the same file name, keyword, file size and file type;
the integrated mark file acquisition module further includes:
a similarity analysis unit, configured to perform, in sequence, similarity analysis between the file name, the keyword, the file size and the file type and the existing files in the uniform resource pool;
a similarity judging unit, configured to determine that a file similar to the log statistical information exists in the uniform resource pool if the similarity between an existing file and the file name, the keyword, the file size and the file type exceeds a threshold;
a dissimilarity judging unit, configured to determine that no file similar to the log statistical information exists in the uniform resource pool if no existing file has a similarity to the file name, the keyword, the file size and the file type that exceeds the threshold;
the system further includes:
a selection prompting module, configured to prompt selection items if a file identical or similar to the log statistical information exists in the uniform resource pool, the selection items including: replacing the similar file, saving the file as a new file, or not saving it;
a selection operation module, configured to receive an input instruction, determine the selection item corresponding to the instruction, and perform the operation corresponding to that selection item;
the file splitting and storing module further includes:
a file analysis unit, configured to determine the differing portions of the integrated mark file through a computing node in the commercial server, and to determine the common portions of the integrated mark file through a fusion node in the commercial server;
a file splitting unit, configured to split the integrated mark file based on the common portions and the differing portions to obtain the split files;
the file splitting and storing module further includes:
a type determining unit, configured to determine the type information of the split file based on the file type in the log statistical information;
a type analysis unit, configured to find, according to the type information, the target storage disk in the uniform resource pool whose storage type is the same as the type information;
a file storage unit, configured to store the split file on the target storage disk;
the system is further configured to, after the split file is stored on the target storage disk, encrypt the data on each storage disk in the uniform resource pool, with identity information fused into the encryption.
3. A super-fusion all-in-one machine, including a memory, a processor, and a distributed storage program based on a super-fusion structure that is stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the distributed storage method based on a super-fusion structure according to claim 1.
4. A computer-readable storage medium storing a distributed storage program based on a super fusion structure which, when executed by a processor, implements the steps of the distributed storage method based on a super fusion structure according to claim 1.
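Claims 1 and 2 specify only that the PC uploads the data to be stored to a super-fusion all-in-one machine over TCP/IP, after which the machine stores it temporarily. The Python sketch below shows one way such an upload could be framed on a stream socket. The header layout (2-byte name length, 8-byte payload length, network byte order), the function names and the in-memory `staging` dictionary standing in for temporary storage are all assumptions.

```python
import socket
import struct

# Hypothetical frame header: name length (unsigned short), payload length (unsigned long long).
HEADER = struct.Struct("!HQ")


def send_file(sock: socket.socket, name: str, data: bytes) -> None:
    """PC side: send one file as header + UTF-8 name + raw bytes."""
    encoded = name.encode("utf-8")
    sock.sendall(HEADER.pack(len(encoded), len(data)) + encoded + data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may deliver data in arbitrary pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def receive_into_staging(sock: socket.socket, staging: dict) -> str:
    """All-in-one machine side: read one frame and stage it in memory."""
    name_len, data_len = HEADER.unpack(recv_exact(sock, HEADER.size))
    name = recv_exact(sock, name_len).decode("utf-8")
    staging[name] = recv_exact(sock, data_len)
    return name


# Local demo: socketpair() stands in for a real TCP connection.
pc_end, machine_end = socket.socketpair()
staging = {}
send_file(pc_end, "report.txt", b"hello")
received_name = receive_into_staging(machine_end, staging)
print(received_name, staging)
pc_end.close()
machine_end.close()
```

Over a real network the PC would obtain its socket with `socket.create_connection((host, port))` and the machine would take one from a listening socket's `accept()`; the framing functions stay the same because both are stream sockets.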

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778538.XA CN114840488B (en) 2022-07-04 2022-07-04 Distributed storage method, system and storage medium based on super fusion structure

Publications (2)

Publication Number Publication Date
CN114840488A CN114840488A (en) 2022-08-02
CN114840488B true CN114840488B (en) 2023-05-02

Family

ID=82574251

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555874B (en) * 2024-01-11 2024-03-29 成都大成均图科技有限公司 Log storage method, device, equipment and medium of distributed database

Citations (1)

Publication number Priority date Publication date Assignee Title
CN114238240A (en) * 2022-02-14 2022-03-25 柏科数据技术(深圳)股份有限公司 Distributed multi-cluster data storage method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant