CN116701380B - Method and device for clearing redundant data based on Openstack - Google Patents


Info

Publication number
CN116701380B
Authority
CN
China
Prior art keywords
data
target data
target
stored
openstack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310960475.4A
Other languages
Chinese (zh)
Other versions
CN116701380A (en
Inventor
田晋丞
李飞
刘无敌
刘琼
姜海昆
范宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changyang Technology Beijing Co ltd
Original Assignee
Changyang Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changyang Technology Beijing Co ltd
Priority to CN202310960475.4A
Publication of CN116701380A
Application granted
Publication of CN116701380B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of cloud computing, and in particular to a method and a device for clearing redundant data based on Openstack. The method comprises: monitoring data uploaded by a target user at the Openstack front end, and calculating a unique code of the target data when completion of the upload of the target data is detected; determining whether stored data having the same unique code as the target data exists in the cloud database; if so, determining whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, deleting the target data and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; if the target data cannot be deleted, storing the target data in the cloud database and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database. With this scheme, identical data uploaded by different users can be cleaned up, so that the hardware storage space is utilized to the maximum extent.

Description

Method and device for clearing redundant data based on Openstack
Technical Field
The embodiment of the invention relates to the technical field of cloud computing, in particular to a method and a device for clearing redundant data based on Openstack.
Background
At present, more and more users store data by continuously uploading it to the cloud. Each user has a private storage space in the cloud, and the user's data is uploaded into that private storage space. Different users may upload the same data, in which case the private storage spaces of the different users each store a copy of the same data. In the face of growing data volumes and growing demand for hardware storage resources, how to clean up redundant data so as to maximize the utilization of hardware storage space has become an urgent problem.
Disclosure of Invention
The embodiment of the invention provides a method and a device for clearing redundant data based on Openstack, which can clear the same data uploaded by different users so as to maximize the utilization of hardware storage space.
In a first aspect, an embodiment of the present invention provides a method for cleaning redundant data based on Openstack, including:
monitoring data uploaded by a target user at the front end of an Openstack, and calculating a unique code of target data when the completion of the uploading of the target data is monitored;
determining whether stored data which is the same as the unique code of the target data exists in a cloud database;
if so, determining whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, deleting the target data, and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; and if the target data cannot be deleted, storing the target data in the cloud database, and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database.
In a second aspect, an embodiment of the present invention further provides an Openstack-based device for cleaning redundant data, including:
the computing unit is used for monitoring the data uploaded by the target user at the front end of the Openstack, and computing the unique code of the target data when the completion of the uploading of the target data is monitored;
the data processing unit is used for determining whether stored data having the same unique code as the target data exists in the cloud database; if so, determining whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, deleting the target data, and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; and if the target data cannot be deleted, storing the target data in the cloud database, and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, the method described in any embodiment of the present specification is implemented.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a method according to any of the embodiments of the present specification.
The embodiments of the invention provide a method and a device for clearing redundant data based on Openstack. A unique code is calculated for the target data uploaded by a target user at the Openstack front end, and the unique code is used to judge whether identical stored data already exists in the cloud database. Whether the target data can be deleted is then judged according to the cold or hot state of that stored data. If the target data can be deleted, the stored data and the target data are identical, and keeping both in the cloud database would store redundant copies of the same data. The target data can therefore be deleted as redundant data, and only the data pointer of the target user's Openstack front end for the target data needs to be pointed to the stored data in the cloud database. The scheme can thus clear identical data uploaded by different users so as to maximize the utilization of the hardware storage space.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for cleaning redundant data based on Openstack according to an embodiment of the present invention;
FIG. 2 is a hardware architecture diagram of an electronic device according to an embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for cleaning redundant data based on Openstack according to an embodiment of the present invention;
FIG. 4 is a block diagram of another device for clearing redundant data based on Openstack according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention.
As described above, each user has its own proprietary storage space in the cloud, and its own data is also uploaded into its own proprietary storage space. That is, data uploaded to the cloud is isolated between different users. However, when different users upload the same data, data redundancy is caused, and hardware storage resources of the cloud are additionally occupied.
In view of the above problems, the concept of the present invention is as follows: based on the Openstack cloud service, when a user uploads data at the Openstack front end, the uploaded target data is matched against the data stored in the cloud database to perform a redundancy judgment. If identical data is matched, the target data is redundant and can be deleted from the cloud database, and the data pointer of the user's Openstack front end for the target data is pointed to the identical data stored in the cloud database, so that the data cooperativity of the cloud service can still be satisfied.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides a method for cleaning redundant data based on Openstack, which includes:
step 100, monitoring data uploaded by a target user at the front end of an Openstack, and calculating a unique code of the target data when the completion of the uploading of the target data is monitored;
step 102, determining whether stored data having the same unique code as the target data exists in a cloud database; if so, determining whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, deleting the target data, and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; and if the target data cannot be deleted, storing the target data in the cloud database, and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database.
In the embodiment of the invention, a unique code is calculated for the target data uploaded by the target user at the Openstack front end, and the unique code is used to judge whether identical stored data already exists in the cloud database. Whether the target data can be deleted is then judged according to the cold or hot state of that stored data. If the target data can be deleted, the stored data and the target data are exactly the same data, and keeping both in the cloud database would store redundant copies of the same data. The target data can therefore be deleted as redundant data, and only the data pointer of the target user's Openstack front end for the target data needs to be pointed to the stored data in the cloud database. The scheme can thus clear identical data uploaded by different users so as to maximize the utilization of the hardware storage space.
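The flow of steps 100 and 102 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: `CloudStore`, `handle_upload`, and the `can_delete` callback are hypothetical names, the cloud database is replaced by in-memory dictionaries, and SHA-256 is assumed as the unique code (the description only says the code may be a hash value).

```python
import hashlib
from itertools import count


def unique_code(data: bytes) -> str:
    # Assumption: SHA-256 serves as the unique code.
    return hashlib.sha256(data).hexdigest()


class CloudStore:
    """Hypothetical in-memory stand-in for the cloud database."""

    def __init__(self):
        self.blobs = {}     # blob id -> stored bytes
        self.by_code = {}   # unique code -> blob id of the first stored copy
        self.pointers = {}  # (user, name) -> blob id (the front-end data pointer)
        self._ids = count(1)

    def put(self, data: bytes) -> int:
        blob_id = next(self._ids)
        self.blobs[blob_id] = data
        self.by_code.setdefault(unique_code(data), blob_id)
        return blob_id


def handle_upload(store, user, name, data, can_delete):
    """Run once the upload of `data` has completed (step 100 onward).

    `can_delete(stored_id)` is a hypothetical callback standing in for the
    cold/hot judgment on the matching stored data.
    """
    stored_id = store.by_code.get(unique_code(data))
    if stored_id is not None and can_delete(stored_id):
        # Redundant upload: discard it and point at the existing copy.
        store.pointers[(user, name)] = stored_id
        return stored_id
    # No match, or the target data cannot be deleted: store it and point at it.
    new_id = store.put(data)
    store.pointers[(user, name)] = new_id
    return new_id
```

In this sketch two users uploading the same bytes end up with pointers to a single blob whenever `can_delete` returns true, and a second copy is kept otherwise.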
The manner in which the individual steps shown in fig. 1 are performed is described below.
First, for step 100, data uploaded by a target user at the Openstack front end is monitored, and when it is monitored that the uploading of the target data is completed, a unique code of the target data is calculated.
Openstack provides scalable, elastic cloud computing services for private clouds and public clouds. A user can upload data to the user's own private storage space through the Openstack front end. The private storage spaces of different users are hardware resources provided by the cloud, so in the cloud they appear as one integral storage space, and redundant data can be cleaned up across the data uploaded by different users.
Specifically, the cloud server may monitor data uploaded by the target user at the Openstack front end, and when it is monitored that data uploading is completed, match the uploaded target data with stored data in the cloud database, so as to perform redundancy judgment on the target data.
In the embodiment of the invention, whether the storage data same as the target data exists in the cloud database can be determined by calculating the data unique code.
Because data may be lost or garbled during uploading, in one embodiment of the invention the target data can be scanned with a CRC algorithm after the upload is completed, to check the integrity of the data. After the integrity check passes, the corresponding unique code is calculated for the target data. The unique code may be a hash value.
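A minimal sketch of this check-then-hash step is given below. It assumes that the uploader supplies an expected CRC-32 value alongside the data and that SHA-256 is used as the hash; neither detail is specified in the description, and the function names are hypothetical.

```python
import hashlib
import zlib
from typing import Optional


def passes_crc(data: bytes, expected_crc32: int) -> bool:
    # zlib.crc32 returns an unsigned 32-bit value in Python 3; the mask keeps
    # the comparison explicit.
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc32


def unique_code_after_check(data: bytes, expected_crc32: int) -> Optional[str]:
    """Return the unique code, or None when the integrity check fails."""
    if not passes_crc(data, expected_crc32):
        return None  # upload was damaged; no unique code is calculated
    # Assumption: SHA-256 is the hash used as the unique code.
    return hashlib.sha256(data).hexdigest()
```

Calculating the hash only after the CRC check passes avoids registering a unique code for data that was damaged in transit.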
The cloud database of Openstack can store many types of data. For stream data, the redundancy determination can be omitted because such data occupies little storage space.
Since the type of the target data may be block data or a compressed package file, and the block data is composed of several groups of data blocks sequentially arranged together, the compressed package file may be formed by compressing at least two files together, that is, both the block data and the compressed package file may be further split. Based on this, in one embodiment of the present invention, before calculating the unique code of the target data, it further includes: and determining whether the target data can be split or not based on the type of the target data, and if so, splitting the target data into a plurality of sub-target data.
Then, when calculating the unique code of the target data, specifically including: each piece of sub-target data is used as target data respectively to calculate the unique code of each piece of target data. That is, each sub-target data may be regarded as one target data to make redundancy determination and redundancy cleaning for each sub-target data.
After one target data is split into a plurality of sub-target data, the sub-target data retain a sequential relationship within the target data. To ensure that the target user's Openstack front end displays the target data normally, and that the target data is accurate when the target user operates on it (for example, downloads it), in the embodiment of the invention the sequential relationship, within the target data, of the data in the cloud database pointed to by the plurality of data pointers corresponding one-to-one to the plurality of sub-target data needs to be determined, so that the data in the cloud database pointed to by those data pointers is integrated at the Openstack front end based on the sequential relationship.
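One way to picture the ordered data pointers is as a list of unique codes kept in the original order of the sub-target data. The sketch below assumes block data (so integration is plain concatenation), uses SHA-256 as the unique code, ignores the cold/hot judgment, and uses hypothetical names `store_parts` and `reassemble`.

```python
import hashlib


def store_parts(blobs: dict, parts: list) -> list:
    """Store each sub-target once and return the ordered list of pointers.

    `blobs` maps unique code -> bytes and stands in for the cloud database.
    """
    manifest = []
    for part in parts:
        code = hashlib.sha256(part).hexdigest()
        blobs.setdefault(code, part)  # an identical part is not stored twice
        manifest.append(code)         # position in the list records the order
    return manifest


def reassemble(blobs: dict, manifest: list) -> bytes:
    # Integrate the pointed-to data in the recorded order.
    return b"".join(blobs[code] for code in manifest)
```

If two uploads share some blocks, their pointer lists share the corresponding entries, while each list still reproduces its own upload in the right order.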
Further, determining whether the target data can be split based on the type of the target data may include: when the type of the target data is block data, determining that the target data can be split; when the type of the target data is a compressed package file and the compressed package file is formed by compressing at least two files, determining that the target data can be split; otherwise, determining that the target data cannot be split.
Then, splitting the target data into a plurality of sub-target data may include:
when the type of the target data is block data, splitting a plurality of groups of data blocks which are sequentially and continuously arranged in the block data, and taking each group of data blocks as split sub-target data;
and when the type of the target data is a compressed package file, respectively taking at least two files obtained after the compressed package file is decompressed as sub-target data after splitting.
Taking block data as an example, if the target data is block data formed by data block 1, data block 2, data block 3 and data block 4 in sequence, splitting the target data yields data block 1, data block 2, data block 3 and data block 4.
Taking a compressed package file as an example, if the target data is a compressed package file formed by compressing file 1 and file 2, splitting the target data yields file 1 and file 2.
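The splitting rules above can be sketched as follows. The sketch makes several assumptions that the description does not state: block data is cut into fixed-size groups (`BLOCK_SIZE` is an arbitrary, hypothetical value), compressed package files are ZIP archives, and the type of the target data is passed in as a string.

```python
import io
import zipfile
from typing import List

BLOCK_SIZE = 4  # hypothetical group size, kept tiny for illustration


def split_target(data: bytes, kind: str) -> List[bytes]:
    """Return the sub-target data, or [data] when the target cannot be split."""
    if kind == "block":
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        return blocks or [data]
    if kind == "archive":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
            # Only a package compressed from at least two files is split.
            if len(names) >= 2:
                return [zf.read(n) for n in names]
    return [data]
```

Each element of the returned list would then be treated as its own target data for the unique-code calculation.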
It can be seen that, without splitting, if one or more groups of data blocks differ between two pieces of block data, the two pieces of block data are different and each must be stored as a whole; likewise, if one or more compressed files differ between two compressed package files, the two compressed package files are different and each must be stored as a whole. The cloud database would then need to provide storage space for both pieces of data. In the embodiment of the invention, block data and compressed package files are split so that redundancy judgment and cleaning are performed on the split sub-target data, and a data block or compressed file shared by two pieces of block data or two compressed package files is stored only once, which further improves the utilization of hardware storage resources.
Then, for step 102, determining whether stored data having the same unique code as the target data exists in the cloud database; if so, determining whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, deleting the target data, and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; and if the target data cannot be deleted, storing the target data in the cloud database, and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database.
The cloud database stores the data uploaded by each user, and a redundancy judgment needs to be performed on the target data after the target user finishes uploading it: if the cloud database holds stored data with the same unique code as the target data, the target data is redundant at that moment. However, the stored data with the same unique code was most likely uploaded by another user and may be updated; if it is updated, its unique code also changes. The current cold or hot state of the stored data therefore needs to be determined. The state is either cold data or hot data: cold data means that the stored data is not currently being operated on, and hot data means that the stored data is currently being operated on. The operations may include accessing, downloading, updating, deleting, and the like.
Specifically, when determining whether the target data can be deleted based on the cold-hot state of the stored data, it may include:
when the cold and hot state of the stored data is cold data, determining that the target data can be deleted;
when the cold and hot state of the stored data is hot data, waiting for the stored data to be cooled, determining whether the unique codes of the stored data before and after cooling change, and if not, determining that the target data can be deleted; if the change occurs, it is determined that the target data cannot be deleted.
After the stored data cools from the hot state, its unique code may or may not have changed. If the unique code has changed, the target data is not redundant data and needs to be stored.
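The deletion judgment can be expressed as a small function. This is a sketch under assumptions: `Temperature` and `can_delete_target` are hypothetical names, and `wait_until_cold` is a hypothetical callable that blocks until the stored data is no longer being operated on and then returns its current unique code.

```python
from enum import Enum
from typing import Callable


class Temperature(Enum):
    COLD = "cold"  # the stored data is not currently being operated on
    HOT = "hot"    # the stored data is currently being operated on


def can_delete_target(state: Temperature,
                      code_before: str,
                      wait_until_cold: Callable[[], str]) -> bool:
    """Decide whether the uploaded target data is redundant."""
    if state is Temperature.COLD:
        return True
    # Hot data: wait for it to cool, then compare unique codes.
    code_after = wait_until_cold()
    return code_after == code_before
```

A hot item that is only read keeps its unique code, so the target data is still deleted; a hot item that is updated yields a different code, so the target data is kept.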
Further, after it is determined that the unique code of the stored data before and after cooling has changed (hereinafter referred to as the changed unique code), the following two cases are encountered:
in the first case, cold data with the same unique code as the changed unique code exists in the cloud database;
and in the second case, no cold data with the same unique code as the changed unique code exists in the cloud database.
When the changed unique codes respectively correspond to the two cases, the processing modes are different. Specifically:
when the changed unique code corresponds to the first case, determining the updating user who operated on the stored data, and pointing the data pointer of the updating user's Openstack front end for the stored data to the cold data in the cloud database that has the same unique code as the changed unique code.
When the changed unique code corresponds to the second case, determining the updating user who operated on the stored data, and pointing the data pointer of the updating user's Openstack front end to the cooled stored data.
Whichever case the changed unique code corresponds to, as long as the stored data has changed after cooling, it is also necessary to: determine whether the data pointers of users other than the updating user point to the stored data as it was before cooling; if so, point the data pointers of those other users' Openstack front ends for the stored data before cooling to the target data.
Because the unique code of the stored data has changed after cooling, the target data cannot be deleted and needs to be stored in the cloud database. The other users whose data pointers point to the stored data before cooling did not operate on the stored data, so their data pointers need to be redirected to the target data, which ensures that each user's Openstack front end displays the uploaded data normally.
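The pointer redirection for both cases can be sketched as follows. The data model is an assumption made for illustration: `pointers` maps each user to the key of the data that user's front-end pointer refers to, `cold_by_code` maps the unique codes of other cold data in the cloud database to their keys, and all names are hypothetical.

```python
def redirect_pointers(pointers: dict, stored_key: str, update_user: str,
                      changed_code: str, cold_by_code: dict,
                      target_key: str) -> dict:
    """Redirect pointers after the stored data cooled with a changed unique code.

    `target_key` is the key under which the newly uploaded target data
    (identical to the stored data before the update) has been stored.
    """
    # Case one: cold data with the changed unique code already exists.
    # Case two: it does not, so the updating user keeps the cooled stored data.
    match = cold_by_code.get(changed_code)
    key_for_updater = match if match is not None else stored_key

    for user, key in list(pointers.items()):
        if key != stored_key:
            continue  # this user never pointed at the stored data
        if user == update_user:
            pointers[user] = key_for_updater
        else:
            # Other users did not operate on the data: point them at the target.
            pointers[user] = target_key
    return pointers
```

In both cases every user other than the updating user keeps seeing the content they originally pointed to, now served from the stored target data.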
In connection with step 100 above, if the pointing of the data pointer of any split sub-target data is changed, the sequential relationship, within the target data, of the data in the cloud database pointed to by the data pointers of the sub-target data needs to be re-determined, so that the data in the cloud database pointed to by those data pointers is integrated at the Openstack front end based on that sequential relationship.
As shown in FIG. 2 and FIG. 3, an embodiment of the invention provides a device for clearing redundant data based on Openstack. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. In terms of hardware, FIG. 2 is a hardware architecture diagram of the electronic device in which the Openstack-based device for clearing redundant data provided by an embodiment of the invention is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in FIG. 2, the electronic device may include other hardware, such as a forwarding chip responsible for processing packets. Taking a software implementation as an example, as shown in FIG. 3, the device in a logical sense is formed by the CPU of the electronic device in which it is located reading the corresponding computer program from the nonvolatile memory into the memory and running it. The device for clearing redundant data based on Openstack provided in this embodiment includes:
the computing unit 300 is configured to monitor data uploaded by a target user at the front end of Openstack, and calculate a unique code of the target data when it is monitored that the uploading of the target data is completed;
the data processing unit 302 is configured to determine whether stored data having the same unique code as the target data exists in the cloud database; if so, determine whether the target data can be deleted based on the cold or hot state of the stored data; if the target data can be deleted, delete the target data, and point the data pointer of the target user's Openstack front end for the target data to the stored data; and if the target data cannot be deleted, store the target data in the cloud database, and point the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database.
In one embodiment of the present invention, the type of the target data is a file, block data, or compressed package file.
In one embodiment of the present invention, referring to fig. 4, the apparatus may further include:
a splitting unit 304, configured to determine whether the target data can be split based on the type of the target data, and if yes, split the target data into a plurality of sub-target data;
the calculating unit, when calculating the unique code of the target data, specifically includes: taking each piece of sub-target data as target data respectively to calculate a unique code of each piece of target data;
the data processing unit is further configured to: determine the sequential relationship, within the target data, of the data in the cloud database pointed to by the plurality of data pointers corresponding one-to-one to the plurality of sub-target data, and integrate the data in the cloud database pointed to by the plurality of data pointers at the Openstack front end based on the sequential relationship.
In one embodiment of the present invention, the splitting unit, when determining whether the target data can be split based on the type of the target data, specifically performs the following: when the type of the target data is block data, determining that the target data can be split; when the type of the target data is a compressed package file and the compressed package file is formed by compressing at least two files, determining that the target data can be split; otherwise, determining that the target data cannot be split.
In one embodiment of the present invention, the splitting unit, when splitting the target data into a plurality of sub-target data, specifically includes:
when the type of the target data is block data, splitting a plurality of groups of data blocks which are sequentially and continuously arranged in the block data, and taking each group of data blocks as split sub-target data;
and when the type of the target data is a compressed package file, respectively taking at least two files obtained after the compressed package file is decompressed as sub-target data after splitting.
In one embodiment of the present invention, when determining whether the target data can be deleted based on the cold and hot state of the stored data, the data processing unit specifically includes:
when the cold and hot state of the stored data is cold data, determining that the target data can be deleted;
when the cold and hot state of the stored data is hot data, waiting for the stored data to be cooled, determining whether the unique codes of the stored data before and after cooling change, and if not, determining that the target data can be deleted; if the change occurs, it is determined that the target data cannot be deleted.
In one embodiment of the present invention, the data processing unit is further configured to determine an update user who operates on the stored data after determining that the unique code of the stored data before and after cooling has changed; determining whether cold data with the same unique code as the changed unique code exists in the cloud database aiming at the unique code after the change of the cooled stored data; if so, the data pointer of the updated user Openstack front end aiming at the stored data is pointed to cold data which is the same as the changed unique code in a cloud database; if not, the data pointer of the updated user Openstack front end aiming at the stored data points to the cooled stored data;
the data processing unit is further configured to determine whether the data pointers of users other than the updating user point to the stored data as it was before cooling; if so, to point the data pointers of those other users' Openstack front ends for the stored data before cooling to the target data.
It will be appreciated that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on an Openstack-based apparatus for cleaning redundant data. In other embodiments of the invention, an Openstack based apparatus for cleaning redundant data may include more or fewer components than shown, or may combine certain components, or may split certain components, or may have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The information exchanged between the modules of the above device, and the processes they execute, are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
An embodiment of the invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and, when the processor executes the computer program, the method for clearing redundant data based on Openstack in any embodiment of the invention is implemented.
An embodiment of the invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the method for clearing redundant data based on Openstack in any embodiment of the invention.
Specifically, a system or apparatus may be provided with a storage medium on which software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.
Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.
Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.
Further, it is understood that the program code read out from the storage medium may be written into a memory provided in an expansion board inserted into a computer or into a memory provided in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or the expansion module may then be caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above embodiments.
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in a process, method, article or apparatus that comprises the element.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by program instructions directing the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media in which program code may be stored, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for clearing redundant data based on Openstack, characterized in that the method is executed by a cloud server and is used for clearing redundant data from data uploaded by different users, the method comprising:
monitoring data uploaded by a target user at an Openstack front end, and calculating a unique code of target data when it is detected that uploading of the target data has completed;
determining whether stored data having the same unique code as the target data exists in a cloud database;
if so, determining whether the target data can be deleted based on the cold/hot state of the stored data; if the target data can be deleted, deleting the target data, and pointing the data pointer of the target user's Openstack front end for the target data to the stored data; if the target data cannot be deleted, storing the target data in the cloud database, and pointing the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database;
wherein determining whether the target data can be deleted based on the cold/hot state of the stored data comprises: when the cold/hot state of the stored data is cold, determining that the target data can be deleted; when the cold/hot state of the stored data is hot, waiting for the stored data to cool and determining whether the unique code of the stored data has changed between before and after cooling; if it has not changed, determining that the target data can be deleted; if it has changed, determining that the target data cannot be deleted; cold data means that the stored data is not currently being operated on, and hot data means that the stored data is currently being operated on; the operations include at least one of accessing, downloading, updating, and deleting.
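The steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: SHA-256 is assumed as the unique code (the claim does not name an algorithm), the cloud database is modeled as a plain dictionary, and `handle_upload`, `find_by_code`, `is_hot`, and `wait_until_cold` are hypothetical names.

```python
import hashlib
import uuid


def unique_code(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the "unique code"; the claim names no algorithm.
    return hashlib.sha256(data).hexdigest()


def find_by_code(db, code):
    """Return the id of stored data whose unique code equals `code`, or None."""
    for oid, content in db.items():
        if unique_code(content) == code:
            return oid
    return None


def handle_upload(db, pointers, user, name, data, is_hot, wait_until_cold):
    """Deduplicate one completed upload; returns the object id the user now points to."""
    code = unique_code(data)
    stored_id = find_by_code(db, code)
    deletable = False
    if stored_id is not None:
        if is_hot(stored_id):
            # Hot: wait until no operation is in progress, then compare codes again.
            wait_until_cold(stored_id)
            after = db.get(stored_id)  # the operation may have deleted the stored data
            deletable = after is not None and unique_code(after) == code
        else:
            # Cold: the stored copy is stable, so the upload is redundant.
            deletable = True
    if deletable:
        pointers[(user, name)] = stored_id  # target data is discarded, not stored
        return stored_id
    new_id = uuid.uuid4().hex
    db[new_id] = data
    pointers[(user, name)] = new_id
    return new_id
```

A production system would index stored data by unique code rather than rehashing every object on each lookup; the linear scan here only keeps the sketch short.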
2. The method of claim 1, wherein the type of the target data is a file, block data, or compressed package file.
3. The method of claim 2, characterized in that
before calculating the unique code of the target data, the method further comprises: determining whether the target data can be split based on the type of the target data, and if so, splitting the target data into a plurality of sub-target data;
calculating the unique code of the target data comprises: treating each piece of sub-target data as target data in its own right and calculating a unique code for each;
the method further comprises: determining the order, within the target data, of the plurality of data pointers that correspond one-to-one to the plurality of sub-target data, and integrating, at the Openstack front end and based on that order, the data in the cloud database pointed to by the data pointers.
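The ordered integration in claim 3 can be sketched in Python as follows. The sketch assumes block data, where integration reduces to concatenation in order; a compressed package would need to be repackaged instead. `SubPointer` and `integrate` are hypothetical names, and the cloud database is modeled as a dictionary.

```python
from typing import NamedTuple


class SubPointer(NamedTuple):
    position: int   # order of the sub-target data within the original target data
    object_id: str  # id of the data in the cloud database that the pointer refers to


def integrate(db, sub_pointers):
    """Reassemble target data by following the data pointers in their recorded order."""
    ordered = sorted(sub_pointers, key=lambda p: p.position)
    return b"".join(db[p.object_id] for p in ordered)
```

Because each pointer records only a position and an object id, two positions may refer to the same stored object, which is how identical sub-target data ends up stored once.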
4. The method of claim 3, wherein determining whether the target data can be split based on the type of the target data comprises:
when the type of the target data is block data, determining that the target data can be split; when the type of the target data is a compressed package file and the compressed package file is formed by compressing at least two files, determining that the target data can be split;
otherwise, determining that the target data cannot be split.
5. The method of claim 4, wherein splitting the target data into a plurality of sub-target data comprises:
when the type of the target data is block data, splitting a plurality of groups of data blocks which are sequentially and continuously arranged in the block data, and taking each group of data blocks as split sub-target data;
and when the type of the target data is a compressed package file, respectively taking at least two files obtained after the compressed package file is decompressed as sub-target data after splitting.
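The splitting rules of claims 4 and 5 can be sketched in Python as follows. The sketch rests on assumptions not stated in the claims: the compressed package is taken to be a ZIP archive, and `group_size` and `block_size` are hypothetical parameters controlling how many consecutive blocks form one group. `split_target` is likewise a hypothetical name.

```python
import io
import zipfile


def split_target(data: bytes, kind: str, group_size: int = 4, block_size: int = 4096):
    """Return the sub-target data, or [data] when the target data cannot be split."""
    if kind == "block":
        # Each group is `group_size` consecutive blocks of `block_size` bytes.
        step = group_size * block_size
        return [data[i:i + step] for i in range(0, len(data), step)] or [data]
    if kind == "archive":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Ignore directory entries; only real files count toward the "at least two".
            names = [n for n in zf.namelist() if not n.endswith("/")]
            if len(names) >= 2:
                return [zf.read(n) for n in names]
    # Plain files and single-file archives are treated as unsplittable.
    return [data]
```

Each returned piece would then be hashed and deduplicated on its own, so an archive that shares one file with an earlier upload only adds its remaining files to the cloud database.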
6. The method of claim 1, further comprising, after determining that the unique code of the stored data has changed between before and after cooling:
determining the updating user who operated on the stored data;
determining, for the changed unique code of the cooled stored data, whether cold data with the same unique code exists in the cloud database;
if so, pointing the data pointer of the updating user's Openstack front end for the stored data to that cold data in the cloud database;
if not, pointing the data pointer of the updating user's Openstack front end for the stored data to the cooled stored data;
determining whether the users whose data pointers point to the stored data before cooling include other users besides the updating user; if so, pointing the data pointers of those other users' Openstack front ends for the pre-cooling stored data to the target data.
7. A device for clearing redundant data based on Openstack, characterized in that the device is located on a cloud server and is used for clearing redundant data from data uploaded by different users, the device comprising:
a computing unit, configured to monitor data uploaded by a target user at an Openstack front end, and to calculate a unique code of target data when it is detected that uploading of the target data has completed;
a data processing unit, configured to determine whether stored data having the same unique code as the target data exists in a cloud database; if so, to determine whether the target data can be deleted based on the cold/hot state of the stored data; if the target data can be deleted, to delete the target data and point the data pointer of the target user's Openstack front end for the target data to the stored data; if the target data cannot be deleted, to store the target data in the cloud database and point the data pointer of the target user's Openstack front end for the target data to the target data stored in the cloud database;
wherein the data processing unit, when determining whether the target data can be deleted based on the cold/hot state of the stored data, is specifically configured to: when the cold/hot state of the stored data is cold, determine that the target data can be deleted; when the cold/hot state of the stored data is hot, wait for the stored data to cool and determine whether the unique code of the stored data has changed between before and after cooling; if it has not changed, determine that the target data can be deleted; if it has changed, determine that the target data cannot be deleted; cold data means that the stored data is not currently being operated on, and hot data means that the stored data is currently being operated on; the operations include at least one of accessing, downloading, updating, and deleting.
8. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the method of any of claims 1-6 when the computer program is executed.
9. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-6.
CN202310960475.4A 2023-08-02 2023-08-02 Method and device for clearing redundant data based on Openstack Active CN116701380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310960475.4A CN116701380B (en) 2023-08-02 2023-08-02 Method and device for clearing redundant data based on Openstack

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310960475.4A CN116701380B (en) 2023-08-02 2023-08-02 Method and device for clearing redundant data based on Openstack

Publications (2)

Publication Number Publication Date
CN116701380A CN116701380A (en) 2023-09-05
CN116701380B true CN116701380B (en) 2023-10-27

Family

ID=87837743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310960475.4A Active CN116701380B (en) 2023-08-02 2023-08-02 Method and device for clearing redundant data based on Openstack

Country Status (1)

Country Link
CN (1) CN116701380B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502536A (en) * 2019-06-26 2019-11-26 中电万维信息技术有限责任公司 Method, apparatus and storage medium based on cache database verification business uniqueness
CN113282540A (en) * 2021-06-04 2021-08-20 深圳大学 Cloud object storage synchronization method and device, computer equipment and storage medium
CN114691617A (en) * 2022-03-29 2022-07-01 深圳市海威达科技有限公司 Intelligent terminal data compression redundancy-prevention interaction method and device and related components
CN114860726A (en) * 2022-04-29 2022-08-05 北京永信至诚科技股份有限公司 Database storage cold-hot separation method, device, equipment and readable storage medium
CN116301652A (en) * 2023-03-31 2023-06-23 众芯汉创(西安)科技有限公司 Wind turbine generator online monitoring historical data cold-hot separation system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10628070B2 (en) * 2018-03-19 2020-04-21 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Selecting and compressing target files to obtain additional free data storage space to perform an operation in a virtual machine

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
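A Secure Client Side Deduplication Scheme in Cloud Storage Environments; Nesrine Kaaniche et al.; IEEE Xplore; full text *
Research on data deduplication technology in cloud storage (云存储中数据去重技术的研究); Zhang Xiaorong; Journal of Xi'an University of Arts and Science (Natural Science Edition), No. 06; full text *
Data distribution and hybrid redundancy method in multi-cloud storage (多云存储中的数据分布及混合冗余方法); Yuan Xuemei; China Master's Theses Full-text Database, Information Science and Technology; Chapter 5 *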

Also Published As

Publication number Publication date
CN116701380A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US10649838B2 (en) Automatic correlation of dynamic system events within computing devices
US10776396B2 (en) Computer implemented method for dynamic sharding
Pelkonen et al. Gorilla: A fast, scalable, in-memory time series database
US8396840B1 (en) System and method for targeted consistency improvement in a distributed storage system
US10783115B2 (en) Dividing a dataset into sub-datasets having a subset of values of an attribute of the dataset
US8468134B1 (en) System and method for measuring consistency within a distributed storage system
KR20170054299A (en) Reference block aggregating into a reference set for deduplication in memory management
CN111247518A (en) Database sharding
CN107026881B (en) Method, device and system for processing service data
CN110765076B (en) Data storage method, device, electronic equipment and storage medium
CN104584524A (en) Aggregating data in a mediation system
CN111858520A (en) Method and device for separately storing block link point data
CN109947730B (en) Metadata recovery method, device, distributed file system and readable storage medium
CN114817651B (en) Data storage method, data query method, device and equipment
CN114490060A (en) Memory allocation method and device, computer equipment and computer readable storage medium
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
CN116701380B (en) Method and device for clearing redundant data based on Openstack
CN112783447A (en) Method, apparatus, device, medium, and article of manufacture for processing snapshots
CN114253936A (en) Capacity reduction method, device, equipment and medium for distributed database
CN110909062A (en) Data processing method and device, electronic equipment and readable storage medium
CN107609038B (en) Data cleaning method and device
CN113590703B (en) ES data importing method and device, electronic equipment and readable storage medium
CN113254269A (en) Method, system, equipment and medium for repairing abnormal event of storage system
US20130218851A1 (en) Storage system, data management device, method and program
JP2010191903A (en) Distributed file system striping class selecting method and distributed file system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant