CN114328497A - Redundant data processing method, system, computer equipment and storage medium - Google Patents


Publication number
CN114328497A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210234546.8A
Other languages
Chinese (zh)
Inventor
张毅博
漆娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongke Intelligent Technology Co ltd
Original Assignee
Shenzhen Zhongke Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to the field of data processing, and particularly discloses a redundant data processing method, a redundant data processing system, computer equipment and a storage medium. The method transfers the redundant data in a first storage area to a second storage area; extracts useful data from the redundant data and transfers the useful data to a third storage area; formats the second storage area; and monitors whether the useful data in the third storage area is called within a set time, deleting the useful data that is not called. Redundant data is thus processed across storage space partitions: useful data is extracted from the redundant data and monitored for calls within the set time, and finally both the useless data in the redundant data and the uncalled useful data are deleted. Storage waste is reduced by deleting redundant data, while important useful data within the redundant data is preserved so that the stored data does not become incomplete.

Description

Redundant data processing method, system, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, system, computer device, and storage medium for processing redundant data.
Background
Data redundancy occurs in database systems, meaning that a field is repeated in multiple tables. Data redundancy can lead to data anomalies and corruption and should generally be avoided by design. Database normalization prevents redundancy and does not waste storage capacity. Proper use of foreign keys can minimize data redundancy and anomalies. However, if efficiency and convenience are taken into consideration, redundant data is sometimes designed regardless of the risk of data corruption.
In existing redundant data processing, redundant data is usually deleted directly to reduce wasted storage. However, redundant data is not necessarily useless: deleting it outright often removes useful data and leaves the stored data incomplete, and may remove important data, which affects the user.
Disclosure of Invention
Embodiments of the present invention provide a method, a system, a computer device, and a storage medium for processing redundant data, which are intended to solve the problems set forth in the background art.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a redundant data processing method specifically comprises the following steps:
transferring redundant data in the first storage area to a second storage area;
extracting useful data in the redundant data and transferring the useful data to a third storage area;
formatting the second storage area;
and monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data which are not called.
As a further limitation of the technical solution of the embodiment of the present invention, the transferring the redundant data in the first storage area to the second storage area specifically includes the following steps:
dividing a data storage space into a first storage area, a second storage area and a third storage area;
setting the first storage area as a data storage library;
extracting redundant data in the data repository;
transferring the redundant data to the second storage area.
As a further limitation of the technical solution of the embodiment of the present invention, the extracting of the redundant data in the data repository specifically includes the following steps:
establishing a data form for each data in the data repository, wherein the data form records a data structure, a source address and an acquisition process of the data;
and judging whether similar data forms exist or not, and setting the data corresponding to the data forms as redundant data and extracting when the similar data forms exist.
As a further limitation of the technical solution of the embodiment of the present invention, the determining whether similar data forms exist, and when similar data forms exist, setting data corresponding to the data forms as redundant data and extracting specifically includes the following steps:
acquiring two data forms with the same source address and the same acquisition process;
and comparing the data structure similarity of the two data forms, and setting the two data corresponding to the two data forms as redundant data and extracting when the data structure similarity is large.
As a further limitation of the technical solution of the embodiment of the present invention, the comparing the data structure similarity of the two data forms, and when the data structure similarity is large, setting the two data corresponding to the two data forms as redundant data and extracting specifically includes the following steps:
comparing whether the data structure similarity of the two data forms is greater than a preset value;
when the similarity of the data structures is greater than a preset value, setting two data corresponding to the two data forms as redundant data and extracting the redundant data;
and when the similarity of the data structures is not greater than a preset value, judging that the two data corresponding to the two data forms are not redundant data.
As a further limitation of the technical solution of the embodiment of the present invention, the extracting useful data from the redundant data and transferring the useful data to a third storage area specifically includes the following steps:
setting a screening type of useful data;
screening the redundant data according to the screening type;
and setting the screened data as useful data, and transferring the useful data to a third storage area.
As a further limitation of the technical solution of the embodiment of the present invention, the monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data that is not called specifically includes the following steps:
setting monitoring time;
judging whether the useful data in the third storage area is called within monitoring time or not;
if the useful data is not called, deleting the useful data;
and if the useful data is called, transferring the useful data to a data storage library.
Another object of an embodiment of the present invention is to provide a redundant data processing system, which includes a redundant data transfer unit, a useful data extraction unit, a formatting unit, and a call monitoring unit, wherein:
a redundant data transfer unit for transferring the redundant data in the first storage area to the second storage area;
the useful data extraction unit is used for extracting useful data in the redundant data and transferring the useful data to a third storage area;
a formatting unit configured to format the second storage area;
the calling monitoring unit is used for monitoring the calling condition of the useful data in the third storage area within set time and deleting the useful data which are not called;
the redundant data transfer unit specifically includes:
the storage space dividing module is used for dividing the data storage space into a first storage area, a second storage area and a third storage area;
the data storage bank setting module is used for setting the first storage area as a data storage bank;
a redundant data extraction module for extracting redundant data in the data repository;
a redundant data transfer module for transferring the redundant data to the second storage area;
the redundant data extraction module specifically comprises:
the data form establishing submodule is used for establishing a data form for each data in the data repository, and the data structure, the source address and the obtaining process of the data are recorded in the data form;
and the similarity judgment processing submodule is used for judging whether similar data forms exist or not, and setting the data corresponding to the data forms as redundant data and extracting the redundant data when the similar data forms exist.
It is a further object of embodiments of the present invention to provide a computer device, comprising a memory and a processor, the memory having stored therein a computer program, which, when executed by the processor, causes the processor to perform the steps of a redundant data processing method as described above.
It is a further object of embodiments of the present invention to provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of a redundant data processing method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The embodiment of the invention transfers the redundant data in the first storage area to the second storage area; extracts useful data from the redundant data and transfers the useful data to a third storage area; formats the second storage area; and monitors whether the useful data in the third storage area is called within a set time, deleting the useful data that is not called. Redundant data is thus processed across storage space partitions: useful data is extracted from the redundant data and monitored for calls within the set time, and finally both the useless data in the redundant data and the uncalled useful data are deleted. Storage waste is reduced by deleting redundant data, while important useful data within the redundant data is preserved so that the stored data does not become incomplete.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
Fig. 1 is a diagram illustrating a network implementation environment of a method provided by an embodiment of the invention.
Fig. 2 shows a flow chart of a method provided by an embodiment of the invention.
Fig. 3 shows a flow chart of redundant data transfer in the method provided by the embodiment of the present invention.
Fig. 4 shows a flowchart of redundant data extraction in the method provided by the embodiment of the present invention.
Fig. 5 shows a flowchart of setting redundant data in the method provided by the embodiment of the present invention.
Fig. 6 shows a flowchart of data structure similarity comparison in the method provided by the embodiment of the present invention.
Fig. 7 shows a flowchart of useful data transfer in the method provided by the embodiment of the invention.
Fig. 8 shows a flow chart of useful data monitoring in the method provided by the embodiment of the invention.
Fig. 9 shows an application architecture diagram of a system provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," "third," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
It can be understood that, in the prior art, in the process of processing the redundant data, the redundant data is usually deleted directly to reduce the waste of storage, which is easy to cause some important useful data in the redundant data to be deleted, resulting in incomplete stored data and affecting the normal use of users.
In order to solve the above problems, in the embodiments of the present invention, redundant data processing is performed on a storage space partition, useful data in the redundant data is extracted, call monitoring is performed on the useful data within a set time, and finally, useless data in the redundant data and non-call data in the useful data are all deleted, so that some important useful data in the redundant data are saved while the redundant data are deleted, the storage waste is reduced, and the stored data are prevented from being incomplete.
Fig. 1 is a diagram of a network implementation environment of the method according to the embodiment of the present invention.
In the network implementation environment diagram, the data storage space is divided into a first storage area, a second storage area and a third storage area, and redundant data and useful data are transferred and deleted among these three areas, which can improve the efficiency of extracting useful data from the redundant data and of deleting useless redundant data. The data memory is the memory component that stores data information; it is a collection of memory cells arranged in order of cell number, each cell consisting of several binary bits that represent the value stored in that cell.
Fig. 2 shows a flow chart of a method provided by an embodiment of the invention.
Specifically, a redundant data processing method specifically includes the following steps:
S101, transferring the redundant data in the first storage area to a second storage area.
In the embodiment of the invention, various data information is stored in the first storage area, wherein the various data information usually contains redundant data with similar or repeated information, and the redundant data in the various data information in the first storage area is extracted and transferred to the second storage area.
Specifically, fig. 3 shows a flowchart of redundant data transfer in the method provided by the embodiment of the present invention.
In a preferred embodiment provided by the present invention, the transferring the redundant data in the first storage area to the second storage area specifically includes the following steps:
S1011, dividing the data storage space into a first storage area, a second storage area, and a third storage area.
In the embodiment of the invention, the data storage space of the data storage is divided into a first storage area, a second storage area and a third storage area, and the storage spaces of the second storage area and the third storage area are smaller than the storage space of the first storage area.
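The division into three storage areas can be illustrated with a minimal sketch. It assumes, purely for illustration, that each storage area is a directory under a common root; the function name `divide_storage` and the directory names are hypothetical and do not come from the embodiment, and the size relationship between the areas is not enforced here.

```python
from pathlib import Path
import tempfile


def divide_storage(root: Path) -> dict[str, Path]:
    """Create three sub-directories standing in for the three storage areas.

    Assumption: a directory per area. The embodiment does not say how the
    storage space is physically partitioned.
    """
    areas = {
        "first": root / "first_area",    # data repository
        "second": root / "second_area",  # receives extracted redundant data
        "third": root / "third_area",    # holds useful data under monitoring
    }
    for path in areas.values():
        path.mkdir(parents=True, exist_ok=True)
    return areas


# Demonstration on a throwaway temporary directory.
demo_root = Path(tempfile.mkdtemp())
areas = divide_storage(demo_root)
```

A real deployment might instead use disk partitions or quota-limited volumes so that the second and third areas are actually smaller than the first.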
S1012, setting the first storage area as a data storage library.
In the embodiment of the present invention, the first storage area is set as a data storage library, and the data storage library is used for storing various data information.
It is understood that the data repository is used for storing data in various formats on the device, such as data in document format TXT, DOC, XLS, PPT, DOCX, XLSX, PPTX, etc., data in picture format JPG, PNG, PDF, TIFF, SWF, etc., data in video format FLV, RMVB, MP4, MVB, etc., data in voice format WMA, MP3, etc.
S1013, extracting redundant data in the data storage library.
In the embodiment of the invention, data in the data repository that contains similar or repeated information is extracted according to a specific redundant data extraction method.
Specifically, fig. 4 shows a flowchart of extracting redundant data in the method provided by the embodiment of the present invention.
In a preferred embodiment of the present invention, the extracting redundant data in the data repository specifically includes the following steps:
S10131, establishing a data form for each data in the data repository, wherein the data form records a data structure, a source address and an acquisition process of the data.
In the embodiment of the invention, a data form is established for each data in the data repository, and the data structure, the source address and the acquisition process of the data are recorded in the data form. For example, a data form is established for data A; if the data structure, source address and acquisition process recorded in the data form correspond to 1, 2 and 3 respectively, the data form generated for data A may be denoted A123.
It will be appreciated that the data structure may be the data format, data name and data field length of the data. The source address mainly reflects where the data was acquired, for example generated by the device or acquired from the network; for network-acquired data the source address also includes the website from which the data was acquired. The acquisition process mainly reflects how and when the data was acquired, for example the device automatically generated the data at a certain time, or the device generated the data when operated by a user at a certain time.
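One possible way to represent such a data form is sketched below. The class names `DataStructure` and `DataForm`, the field names, and the sample values are all assumptions made for illustration; the embodiment only states which three kinds of information the form records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    """Structural description of one data item (hypothetical field names)."""
    data_format: str   # e.g. "JPG"
    data_name: str     # e.g. "photo_001"
    field_length: int  # data field length


@dataclass(frozen=True)
class DataForm:
    """Record of structure, source address and acquisition process."""
    structure: DataStructure
    source_address: str       # e.g. "device" or a website address
    acquisition_process: str  # how and when the data was obtained


# Illustrative data form for a single data item.
form_a = DataForm(
    structure=DataStructure("JPG", "photo_001", 2048),
    source_address="device",
    acquisition_process="generated automatically at 2022-03-01T10:00",
)
```

Making the classes frozen keeps each form hashable, which is convenient if forms are later used as dictionary keys or set members.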
S10132, judging whether similar data forms exist or not, and if the similar data forms exist, setting data corresponding to the data forms as redundant data and extracting the redundant data.
In the embodiment of the invention, the data forms of all the data are compared, whether similar data forms exist is judged, if the similar data forms exist, the data corresponding to the data forms are judged to be redundant data, and the redundant data are extracted.
Specifically, fig. 5 shows a flowchart of setting redundant data in the method provided by the embodiment of the present invention.
In a preferred embodiment provided by the present invention, the determining whether similar data forms exist, and when similar data forms exist, setting data corresponding to the data forms as redundant data and extracting specifically includes the following steps:
S101321, acquiring two data forms with the same source address and the same acquisition process.
In the embodiment of the invention, the source addresses and the acquisition processes of the data forms are compared to obtain two data forms with the same source address and the same acquisition process.
It will be appreciated that two data forms with the same source address and the same acquisition process indicate that the two data were acquired in the same manner, from the same address, through the same process and at the same time, so the two data are likely to be duplicated redundant data.
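Finding pairs of data forms that share a source address and an acquisition process can be sketched as a simple grouping step. This is only one way to do it; the function name `candidate_pairs`, the tuple layout and the sample values are assumptions, not part of the embodiment.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(forms):
    """Return pairs of item ids whose forms share source and process.

    forms: dict mapping item id -> (source_address, acquisition_process).
    """
    groups = defaultdict(list)
    for item_id, (source, process) in forms.items():
        groups[(source, process)].append(item_id)
    pairs = []
    for ids in groups.values():
        # Every pair inside one group is a candidate for structure comparison.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


forms = {
    "a": ("device", "auto@10:00"),
    "b": ("device", "auto@10:00"),
    "c": ("https://example.com", "download@11:00"),
}
pairs = candidate_pairs(forms)
```

Grouping first avoids comparing every form against every other form; only the pairs returned here need the data structure similarity comparison.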
S101322, comparing the data structure similarity of the two data forms, and setting the two data corresponding to the two data forms as redundant data and extracting when the data structure similarity is large.
In the embodiment of the invention, the data structure similarity comparison is carried out on the data corresponding to the two data forms with the same source address and the obtaining process, if the data structures of the two data forms have great similarity, the two data corresponding to the two data forms are set as redundant data, and the two redundant data are extracted.
It can be understood that the comparison of the structural similarity compares the data formats, the data names and the data field lengths of the two data, and determines whether the data formats, the data names and the data field lengths of the two data have great similarity.
Specifically, fig. 6 shows a flowchart of data structure similarity comparison in the method provided by the embodiment of the present invention.
In an embodiment of the present invention, the comparing the data structure similarity of the two data forms, and when the data structure similarity is large, setting the two data corresponding to the two data forms as redundant data and extracting specifically includes the following steps:
S1013221, comparing whether the data structure similarity of the two data forms is greater than a preset value.
In the embodiment of the present invention, the preset value is set to 80%, and it is determined whether the data structure similarity of the data forms of the two data is greater than 80%, specifically, the data formats, the data names, and the data field lengths of the two data are simultaneously compared, and it is comprehensively determined whether the similarity is greater than 80%.
S1013222, when the data structure similarity is greater than a preset value, setting two data corresponding to the two data forms as redundant data and extracting.
In the embodiment of the present invention, when the data structure similarity of the data form of the two data is greater than 80%, it is determined that the two data corresponding to the two data forms at this time are redundant data, and the two redundant data are extracted.
S1013223, when the similarity of the data structures is not greater than the preset value, determining that the two data corresponding to the two data forms are not redundant data.
In the embodiment of the present invention, when the data structure similarity of the data form of two data is not greater than 80%, it is determined that the data corresponding to the two data forms is not redundant data, and at this time, the two data are not processed.
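The threshold comparison can be sketched as follows. The embodiment does not say how the three compared attributes are combined into one similarity figure, so the scoring below (an equal-weight average of format equality, a `difflib` name ratio, and a field-length ratio) is an assumption chosen for illustration, as are the names `structure_similarity` and `is_redundant`. Only the 80% preset value comes from the text.

```python
from difflib import SequenceMatcher

PRESET_VALUE = 0.80  # the 80% preset value from the embodiment


def structure_similarity(a, b):
    """Score two structures in [0, 1]; the weighting is an assumption.

    a, b: dicts with keys "format", "name" and "field_length".
    """
    format_score = 1.0 if a["format"].lower() == b["format"].lower() else 0.0
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    longest = max(a["field_length"], b["field_length"])
    shortest = min(a["field_length"], b["field_length"])
    length_score = 1.0 if longest == 0 else shortest / longest
    return (format_score + name_score + length_score) / 3


def is_redundant(a, b, preset=PRESET_VALUE):
    """Redundant only when similarity is strictly greater than the preset."""
    return structure_similarity(a, b) > preset


x = {"format": "JPG", "name": "photo_001", "field_length": 2048}
y = {"format": "jpg", "name": "photo_002", "field_length": 2000}
z = {"format": "PNG", "name": "diagram", "field_length": 100}
```

With these sample values, `x` and `y` are judged redundant while `x` and `z` are not. The strict `>` mirrors the wording "greater than a preset value", so a pair scoring exactly 80% is left unprocessed.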
Further, the step of transferring the redundant data in the first storage area to the second storage area further comprises the following steps:
S1014, transferring the redundant data to the second storage area.
In the embodiment of the invention, the redundant data extracted from the data storage library is transferred into the second storage area, so that the redundant data does not exist in the data storage library, and the storage space in the data storage library is released.
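Assuming the storage areas are directories and each data item is a file (neither of which the embodiment specifies), the transfer can be sketched with `shutil.move`, which removes the file from the repository as it places it in the second area. The function name `transfer` and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def transfer(files, destination: Path):
    """Move each file into destination and return the new paths.

    Note: a file with the same name already in destination is overwritten
    on most platforms; this sketch does not handle name collisions.
    """
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in files:
        target = destination / f.name
        shutil.move(str(f), str(target))
        moved.append(target)
    return moved


# Demonstration with a throwaway repository and second storage area.
root = Path(tempfile.mkdtemp())
repository = root / "first_area"
second_area = root / "second_area"
repository.mkdir()
dup = repository / "copy_of_report.txt"
dup.write_text("duplicate")
moved = transfer([dup], second_area)
```

Because the file is moved rather than copied, the repository no longer holds the redundant item and its space there is released.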
Further, the method comprises the following steps:
S102, extracting useful data in the redundant data, and transferring the useful data to a third storage area.
In the embodiment of the invention, among the redundant data transferred to the second storage area, there may be some useful data for users, and according to the type of the data, useful data in the redundant data can be extracted, and then the extracted useful data is transferred to the third storage area.
Specifically, fig. 7 shows a flowchart of useful data transfer in the method provided by the embodiment of the present invention.
In a preferred embodiment provided by the present invention, the extracting useful data from the redundant data and transferring the useful data to a third storage area specifically includes the following steps:
S1021, setting the screening type of the useful data.
In the embodiment of the present invention, the filtering type of the useful data is set, for example, the filtering type may be set to a file format, and if the user is engaged in an image processing job, the data format of the filtering type may be set to data in a format such as JPG, PNG, PDF, TIFF, SWF, or the like.
S1022, screening the redundant data according to the screening type.
In the embodiment of the invention, the redundant data is screened according to the data format of JPG, PNG, PDF, TIFF, SWF and the like.
S1023, the screened data is set as useful data, and the useful data is transferred to a third storage area.
In the embodiment of the invention, the screened data with the format of JPG, PNG, PDF, TIFF, SWF and the like is set as useful data, and the useful data is transferred to the third storage area for storage.
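Screening by file format can be sketched as a filter on file extensions. This assumes the data format can be read from the file name suffix, which the embodiment does not state; the function name `screen_useful` and the sample file names are hypothetical.

```python
from pathlib import Path

# Screening type from the example in the embodiment (image-processing user).
SCREENING_FORMATS = ("JPG", "PNG", "PDF", "TIFF", "SWF")


def screen_useful(paths, screening_formats=SCREENING_FORMATS):
    """Split paths into (useful, useless) by case-insensitive extension."""
    wanted = {fmt.lower() for fmt in screening_formats}
    useful, useless = [], []
    for p in paths:
        ext = Path(p).suffix.lstrip(".").lower()
        (useful if ext in wanted else useless).append(p)
    return useful, useless


redundant = ["a.JPG", "b.txt", "c.png", "d"]
useful, useless = screen_useful(redundant)
```

Here `useful` holds the items to transfer to the third storage area, and `useless` holds what remains in the second storage area until it is formatted.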
Further, the method comprises the following steps:
S103, formatting the second storage area.
In the embodiment of the invention, after the useful data in the second storage area is transferred to the third storage area, the redundant data left in the second storage area is redundant data which is useless for users, and the second storage area at the moment is directly formatted, so that the useless redundant data is quickly deleted, and preparation is made for the second storage area to accept the redundant data transferred from the first storage area again.
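If the second storage area is modelled as a directory rather than a real partition (an assumption made only for this sketch), "formatting" can be approximated by removing everything inside it while keeping the area itself. The function name `format_area` is hypothetical, and this is a stand-in for, not an implementation of, a file-system format operation.

```python
import shutil
import tempfile
from pathlib import Path


def format_area(area: Path) -> None:
    """Delete all contents of area but keep the directory itself."""
    for entry in area.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-directories recursively
        else:
            entry.unlink()        # remove files and symlinks


# Demonstration on a throwaway second storage area.
second_area = Path(tempfile.mkdtemp())
(second_area / "leftover.txt").write_text("useless redundant data")
(second_area / "nested").mkdir()
(second_area / "nested" / "x.bin").write_bytes(b"\x00")
format_area(second_area)
```

After the call the area is empty and ready to receive the next batch of redundant data from the first storage area.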
S104, monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data which are not called.
In the embodiment of the invention, the extracted useful data is only temporarily determined to be useful for the user, but useful data which is not called for a long time may be data which is not called again by the user basically, so that the useful data stored in the third storage area is monitored, the calling condition of the useful data in the set time is judged, the useful data which is not called in the set time is deleted, and the storage space is further released.
Specifically, fig. 8 shows a flowchart of useful data monitoring in the method provided by the embodiment of the present invention.
In a preferred embodiment provided by the present invention, the monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data that is not called specifically includes the following steps:
S1041, setting the monitoring time.
In the embodiment of the present invention, the monitoring time for the useful data may be set to 10 days.
S1042, judging whether the useful data in the third storage area is called within the monitoring time.
In the embodiment of the invention, it is determined whether the useful data stored in the third storage area is called within 10 days of being transferred to the third storage area.
S1043, if the useful data is not called, deleting the useful data.
In the embodiment of the invention, useful data which are not called within 10 days are deleted, and the storage space is further released to prepare for transferring other subsequent useful data to the third storage area.
S1044, if the useful data is called, transferring the useful data to a data repository.
In the embodiment of the invention, the useful data called within 10 days is determined to be frequently called useful data, and is transferred to the data storage library for storage.
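The monitoring decision can be sketched as a pure function over timestamps. The record layout, the function name `review_third_area`, and the third "still monitored" outcome for items whose window has not yet elapsed are assumptions; the embodiment only describes the delete and return-to-repository outcomes and the 10-day example.

```python
from datetime import datetime, timedelta

MONITORING_TIME = timedelta(days=10)  # example value from the embodiment


def review_third_area(records, now, monitoring_time=MONITORING_TIME):
    """Classify useful data by whether it was called within the window.

    records: dict name -> (transferred_at, last_called_at or None).
    Returns (to_delete, to_repository, still_monitored) as sorted lists.
    """
    to_delete, to_repository, still_monitored = [], [], []
    for name, (transferred_at, last_called_at) in sorted(records.items()):
        deadline = transferred_at + monitoring_time
        called = (
            last_called_at is not None
            and transferred_at <= last_called_at <= deadline
        )
        if called:
            to_repository.append(name)    # called: move back to repository
        elif now >= deadline:
            to_delete.append(name)        # window elapsed without a call
        else:
            still_monitored.append(name)  # window still open (assumption)
    return to_delete, to_repository, still_monitored


t0 = datetime(2022, 3, 1)
records = {
    "a.jpg": (t0, t0 + timedelta(days=3)),
    "b.png": (t0, None),
    "c.tiff": (t0 + timedelta(days=8), None),
}
result = review_third_area(records, now=t0 + timedelta(days=11))
```

In this example `a.jpg` returns to the data repository, `b.png` is deleted, and `c.tiff` stays under monitoring because its own 10-day window has not yet ended.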
Further, fig. 9 shows an application architecture diagram of the system provided by the embodiment of the present invention.
In a further preferred embodiment, a redundant data processing system is provided, which comprises a redundant data transfer unit 100, a useful data extraction unit 200, a formatting unit 300 and a call monitoring unit 400, wherein:
A redundant data transfer unit 100, configured to transfer the redundant data in the first storage area to the second storage area.
In the embodiment of the present invention, various data information is stored in the first storage area, and the various data information usually contains redundant data with similar or repeated information, and the redundant data transfer unit 100 extracts the redundant data in the various data information in the first storage area and transfers the redundant data to the second storage area.
Specifically, the redundant data transfer unit 100 specifically includes:
A storage space dividing module, configured to divide the data storage space into a first storage area, a second storage area and a third storage area.
A data repository setting module, configured to set the first storage area as a data repository.
A redundant data extraction module, configured to extract the redundant data in the data repository.
A redundant data transfer module, configured to transfer the redundant data to the second storage area.
Specifically, the redundant data extraction module specifically includes:
A data form establishing submodule, configured to establish a data form for each data in the data repository, the data form recording the data structure, the source address and the acquisition process of the data.
A similarity judgment processing submodule, configured to judge whether similar data forms exist and, when similar data forms exist, to set the data corresponding to the data forms as redundant data and extract the redundant data.
A useful data extraction unit 200, configured to extract useful data from the redundant data and transfer the useful data to a third storage area.
In the embodiment of the present invention, among the redundant data transferred to the second storage area, there may be some useful data for the user, and the useful data extracting unit 200 may extract the useful data from the redundant data according to the type of the data, and then transfer the extracted useful data to the third storage area.
A formatting unit 300 for formatting the second storage area.
In the embodiment of the present invention, after the useful data in the second storage area is transferred to the third storage area, the redundant data remaining in the second storage area is redundant data that is not useful for the user, and the formatting unit 300 directly formats the second storage area at this time, so as to quickly delete the useless redundant data, and prepare for the second storage area to accept the redundant data transferred from the first storage area again.
A call monitoring unit 400, configured to monitor whether the useful data in the third storage area is called within a set time and to delete the useful data that is not called.
In the embodiment of the present invention, the extracted useful data is only temporarily determined to be useful for the user, and the useful data that is not called for a long time may be data that the user does not basically call any more, so the call monitoring unit 400 monitors the useful data stored in the third storage area, determines the call condition of the useful data within the set time, deletes the useful data that is not called within the set time, and further releases the storage space.
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
transferring redundant data in the first storage area to a second storage area;
extracting useful data in the redundant data and transferring the useful data to a third storage area;
formatting the second storage area;
and monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data which are not called.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:
transferring redundant data in the first storage area to a second storage area;
extracting useful data in the redundant data and transferring the useful data to a third storage area;
formatting the second storage area;
and monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data which are not called.
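The four steps executed by the processor can be strung together as in the following sketch. It is illustrative only: the storage areas are plain dictionaries, and the three predicates `is_redundant`, `is_useful` and `was_called` are hypothetical stand-ins for the redundancy judgment, the type screening and the call monitoring, respectively.

```python
def process_redundant(first_area, is_redundant, is_useful, was_called):
    """Run the four steps in order and return the three storage areas."""
    second_area, third_area = {}, {}

    # Step 1: transfer redundant data from the first to the second area.
    for key in [k for k, v in first_area.items() if is_redundant(k, v)]:
        second_area[key] = first_area.pop(key)

    # Step 2: extract useful data and transfer it to the third area.
    for key in [k for k, v in second_area.items() if is_useful(k, v)]:
        third_area[key] = second_area.pop(key)

    # Step 3: format the second area (drops all remaining redundant data).
    second_area.clear()

    # Step 4: delete useful data that was not called within the set time.
    for key in list(third_area):
        if not was_called(key):
            del third_area[key]

    return first_area, second_area, third_area


first, second, third = process_redundant(
    {"a": 1, "a_copy": 1, "b": 2, "b_copy": 2, "c": 3},
    is_redundant=lambda key, value: key.endswith("_copy"),
    is_useful=lambda key, value: value == 2,
    was_called=lambda key: key == "b_copy",
)
print(first)   # {'a': 1, 'b': 2, 'c': 3}
print(second)  # {}
print(third)   # {'b_copy': 2}
```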
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same moment and may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A redundant data processing method is characterized by specifically comprising the following steps:
transferring redundant data in the first storage area to a second storage area;
extracting useful data in the redundant data and transferring the useful data to a third storage area;
formatting the second storage area;
monitoring the calling condition of the useful data in the third storage area within a set time, and deleting the useful data which are not called;
the step of transferring the redundant data in the first storage area to the second storage area specifically includes the following steps:
dividing a data storage space into a first storage area, a second storage area and a third storage area;
setting the first storage area as a data repository;
extracting redundant data in the data repository;
transferring the redundant data to the second storage area;
the extracting of the redundant data in the data repository specifically includes the following steps:
establishing a data form for each data in the data repository, wherein the data form records a data structure, a source address and an acquisition process of the data;
and judging whether similar data forms exist or not, and setting the data corresponding to the data forms as redundant data and extracting when the similar data forms exist.
2. The method according to claim 1, wherein the step of judging whether similar data forms exist, and, when similar data forms exist, setting the data corresponding to the data forms as redundant data and extracting the redundant data specifically comprises the following steps:
acquiring two data forms with the same source address and the same acquisition process;
and comparing the data structure similarity of the two data forms, and, when the data structure similarity is high, setting the two pieces of data corresponding to the two data forms as redundant data and extracting the redundant data.
3. The method according to claim 2, wherein the step of comparing the data structure similarity of the two data forms, and, when the data structure similarity is high, setting the two pieces of data corresponding to the two data forms as redundant data and extracting the redundant data specifically comprises the following steps:
comparing whether the data structure similarity of the two data forms is greater than a preset value;
when the similarity of the data structures is greater than a preset value, setting two data corresponding to the two data forms as redundant data and extracting the redundant data;
and when the similarity of the data structures is not greater than a preset value, judging that the two data corresponding to the two data forms are not redundant data.
4. The method according to claim 1, wherein the step of extracting useful data from the redundant data and transferring the useful data to a third storage area comprises the following steps:
setting a screening type of useful data;
screening the redundant data according to the screening type;
and setting the screened data as useful data, and transferring the useful data to a third storage area.
5. The method according to claim 1, wherein the step of monitoring the calling of the useful data in the third storage area within a set time and deleting the useful data that is not called includes the following steps:
setting monitoring time;
judging whether the useful data in the third storage area is called within monitoring time or not;
if the useful data is not called, deleting the useful data;
and if the useful data is called, transferring the useful data to the data repository.
6. A redundant data processing system, characterized in that the system comprises a redundant data transfer unit, a useful data extraction unit, a formatting unit and a call monitoring unit, wherein:
a redundant data transfer unit for transferring the redundant data in the first storage area to the second storage area;
the useful data extraction unit is used for extracting useful data in the redundant data and transferring the useful data to a third storage area;
a formatting unit configured to format the second storage area;
the calling monitoring unit is used for monitoring the calling condition of the useful data in the third storage area within set time and deleting the useful data which are not called;
the redundant data transfer unit specifically includes:
the storage space dividing module is used for dividing the data storage space into a first storage area, a second storage area and a third storage area;
the data repository setting module is used for setting the first storage area as a data repository;
a redundant data extraction module for extracting redundant data in the data repository;
a redundant data transfer module for transferring the redundant data to the second storage area;
the redundant data extraction module specifically comprises:
the data form establishing submodule is used for establishing a data form for each data in the data repository, and the data structure, the source address and the acquisition process of the data are recorded in the data form;
and the similarity judgment processing submodule is used for judging whether similar data forms exist or not, and setting the data corresponding to the data forms as redundant data and extracting the redundant data when the similar data forms exist.
7. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the redundant data processing method according to any of claims 1 to 5.
8. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of a redundant data processing method according to any of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234546.8A CN114328497A (en) 2022-03-11 2022-03-11 Redundant data processing method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114328497A (en) 2022-04-12

Family

ID=81033687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234546.8A Pending CN114328497A (en) 2022-03-11 2022-03-11 Redundant data processing method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114328497A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120109907A1 (en) * 2010-10-30 2012-05-03 International Business Machines Corporation On-demand data deduplication
WO2013143393A1 (en) * 2012-03-30 2013-10-03 北京网秦天下科技有限公司 Method and system utilizing cloud computation for scanning files of device
JP2019045952A (en) * 2017-08-30 2019-03-22 株式会社ケーヒン Vehicle information memory device
CN110929493A (en) * 2020-02-16 2020-03-27 广州信安数据有限公司 Data management method, redundant data detection method, storage medium and data system
CN113420251A (en) * 2021-07-20 2021-09-21 湖南工业大学 Method for cleaning up unused redundant attachments uploaded to server and server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203193A (en) * 2022-09-19 2022-10-18 南京薄幕软件科技有限公司 Internet of things terminal equipment redundant data processing method and system
CN115203193B (en) * 2022-09-19 2023-01-06 南京薄幕软件科技有限公司 Internet of things terminal equipment redundant data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412