CN109344158A - Method and apparatus for deleting redundant data from mass data - Google Patents

Info

Publication number
CN109344158A
Authority
CN
China
Prior art keywords
data
database
deleted
redundant data
redundant
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811178431.1A
Other languages
Chinese (zh)
Inventor
曾成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Sipuleng Technology Co Ltd
Wuhan Sipuling Technology Co Ltd
Original Assignee
Wuhan Sipuleng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-10
Filing date
2018-10-10
Publication date
2019-02-15
Application filed by Wuhan Sipuleng Technology Co Ltd
Priority to CN201811178431.1A
Publication of CN109344158A
Legal status: Pending (current)

Abstract

Embodiments of the invention provide a method and apparatus for deleting redundant data from mass data. The method comprises: triggering a user-defined scheduled task; reading the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database; according to that proportion and/or retention duration, combined with preset data deletion rules in the database, extracting the redundant data to be deleted; and deleting the redundant data to be deleted. By triggering a user-defined scheduled task, applying the preset data deletion rules in the database to extract the redundant data to be deleted, and then deleting that data in the form of partitioned tables, the method and apparatus delete redundant data while providing a better table management strategy for operations and maintenance (O&M), and avoid the large amount of index fragmentation and deletion logging that existing deletion techniques produce and that drags down deletion efficiency.

Description

Method and apparatus for deleting redundant data from mass data
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method and apparatus for deleting redundant data from mass data.
Background art
When large amounts of data are stored in a columnar database and new data is continuously written to disk, old data must be deleted so that new writes can continue. When a simple delete operation is applied to a database containing mass data, it must cope with a sharp increase in index fragmentation and still guarantee high-speed log insertion, without degrading query and delete performance. If the prior art deletes from mass data using DELETE, a large number of log files are generated and the operation takes a long time; every deletion updates the index, which slows disk input/output and degrades the remaining insert and query operations. Finding a way to delete redundant data while providing a better table management strategy for operations and maintenance, and avoiding the index fragmentation and deletion logging that drag down deletion efficiency in existing deletion techniques, has therefore become an urgent technical problem in the industry.
Summary of the invention
In view of the above problems in the prior art, embodiments of the present invention provide a method and apparatus for deleting redundant data from mass data.
In a first aspect, an embodiment of the present invention provides a method for deleting redundant data from mass data, comprising: triggering a user-defined scheduled task; reading the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database; according to that proportion and/or retention duration, combined with preset data deletion rules in the database, extracting the redundant data to be deleted; and deleting the redundant data to be deleted.
Further, triggering the user-defined scheduled task comprises: configuring the task frequency as a user-defined frequency, and configuring the task trigger time as a user-defined time.
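The patent does not say how the user-defined frequency and trigger time are combined. The minimal Python sketch below assumes the task first fires at a user-chosen time and then repeats at a fixed user-chosen interval, and computes the next trigger time; `ScheduleConfig` and `next_trigger` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduleConfig:
    """Hypothetical user-defined schedule for the cleanup task."""
    first_trigger: datetime  # user-defined trigger time
    interval: timedelta      # user-defined task frequency, e.g. timedelta(days=1)


def next_trigger(config: ScheduleConfig, now: datetime) -> datetime:
    """Return the earliest scheduled trigger time that is not before `now`."""
    if now <= config.first_trigger:
        return config.first_trigger
    elapsed = now - config.first_trigger
    # Ceiling division: number of whole or partial intervals already elapsed.
    periods = -(-elapsed // config.interval)
    return config.first_trigger + periods * config.interval
```

A scheduler loop would sleep until `next_trigger(...)` and then run one cleanup pass.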
Further, the preset data deletion rules in the database comprise: if the proportion of disk space occupied by data in the database is greater than a proportion threshold, extracting the data with the earliest storage time in the database as the redundant data to be deleted.
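A minimal Python sketch of the disk-proportion rule follows. It assumes data is grouped into per-day partitions with known sizes, and that the oldest partitions are selected one at a time until the proportion falls back to the threshold; the patent only says the earliest-stored data is extracted, so the stopping condition is an assumption, and `select_oldest_over_quota` is a hypothetical name.

```python
from datetime import date


def select_oldest_over_quota(partitions: dict[date, int], disk_capacity: int,
                             ratio_threshold: float) -> list[date]:
    """Pick earliest-stored partitions while the disk usage ratio exceeds the threshold.

    partitions maps a partition key (the day its data was stored) to its size in bytes.
    The input dict is not modified; the caller deletes the returned partitions.
    """
    used = sum(partitions.values())
    selected = []
    for key in sorted(partitions):  # earliest storage time first
        if used / disk_capacity <= ratio_threshold:
            break
        selected.append(key)
        used -= partitions[key]
    return selected
```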
Further, the preset data deletion rules in the database comprise: if the retention duration of data in the database is greater than a duration threshold, extracting the data whose retention duration is greater than the duration threshold as the redundant data to be deleted.
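The retention-duration rule reduces to one comparison per partition. A minimal Python sketch, assuming each partition is identified by its storage time (`select_expired` is a hypothetical name):

```python
from datetime import datetime, timedelta


def select_expired(stored_at: list[datetime], now: datetime,
                   max_age: timedelta) -> list[datetime]:
    """Return storage times whose retention duration (now - stored_at) exceeds max_age.

    Each datetime stands for one partition's storage time (hypothetical layout).
    """
    return sorted(t for t in stored_at if now - t > max_age)
```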
Further, the preset data deletion rules in the database further comprise: if the proportion of disk space occupied by data in the database approaches the proportion threshold, issuing a data redundancy warning.
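The patent does not define how close counts as "approaching" the threshold. The Python sketch below assumes a fixed margin below the proportion threshold (`warn_margin`, default 0.05) and returns a status string instead of sending a real alert; both choices, and the name `check_usage`, are illustrative assumptions.

```python
def check_usage(used: int, capacity: int, ratio_threshold: float,
                warn_margin: float = 0.05) -> str:
    """Classify disk usage against the proportion threshold.

    Returns "delete" above the threshold, "warn" within warn_margin below it
    (the assumed "approaching the threshold" band), and "ok" otherwise.
    """
    ratio = used / capacity
    if ratio > ratio_threshold:
        return "delete"
    if ratio >= ratio_threshold - warn_margin:
        return "warn"
    return "ok"
```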
Further, the data in the database is managed using partitioned tables.
Further, deleting the redundant data to be deleted comprises: deleting the redundant data to be deleted according to the partitions of the partitioned table.
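Why deleting by partition helps can be illustrated with a toy in-memory model. The Python sketch below (the hypothetical `PartitionedTable` class, not a real database API) contrasts DELETE-style removal, which visits every old row, with dropping whole partitions, which removes each old partition in one step. In a real partitioned database the per-row path is what produces the log records and index updates described in the background, while dropping a partition is typically a metadata operation; the toy model only counts operations.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy in-memory table partitioned by storage day (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.partitions: dict[date, list[dict]] = defaultdict(list)

    def insert(self, day: date, row: dict) -> None:
        self.partitions[day].append(row)

    def delete_rows_before(self, cutoff: date) -> int:
        """DELETE-style removal: one operation per matching row; returns rows touched."""
        touched = 0
        for day in [d for d in self.partitions if d < cutoff]:
            rows = self.partitions[day]
            while rows:  # in a real DB each row would also cost a log record and index update
                rows.pop()
                touched += 1
            del self.partitions[day]
        return touched

    def drop_partitions_before(self, cutoff: date) -> int:
        """Partition-level removal: one operation per old partition; returns partitions dropped."""
        old = [d for d in self.partitions if d < cutoff]
        for day in old:
            del self.partitions[day]  # no per-row work
        return len(old)
```

Both calls leave the same rows behind; the difference is that the work of the partition path scales with the number of partitions rather than the number of rows.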
In a second aspect, an embodiment of the present invention provides an apparatus for deleting redundant data from mass data, comprising:
an extraction module for redundant data to be deleted, configured to trigger a user-defined scheduled task, read the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database, and, according to that proportion and/or retention duration combined with preset data deletion rules in the database, extract the redundant data to be deleted; and
a redundant data deletion module, configured to delete the redundant data to be deleted.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method for deleting redundant data from mass data provided by any of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method for deleting redundant data from mass data provided by any of the possible implementations of the first aspect.
The method and apparatus for deleting redundant data from mass data provided in the embodiments of the present invention trigger a user-defined scheduled task, apply the preset data deletion rules in the database to extract the redundant data to be deleted, and then delete that data in the form of partitioned tables. They thereby delete redundant data while providing a better table management strategy for operations and maintenance, and avoid the large amount of index fragmentation and deletion logging that drag down deletion efficiency in existing deletion techniques.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for deleting redundant data from mass data according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the apparatus for deleting redundant data from mass data according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention. In addition, the technical features in the various embodiments, or within a single embodiment, may be combined with one another in any way to form feasible technical solutions, provided that a person of ordinary skill in the art can implement the combination; where a combination of technical solutions is contradictory or cannot be implemented, that combination shall be deemed not to exist and does not fall within the protection scope claimed by the present invention.
An embodiment of the present invention provides a method for deleting redundant data from mass data. Referring to Fig. 1, the method comprises:
101: Trigger a user-defined scheduled task; read the proportion of disk space occupied by data in the database and/or the retention duration of the data in the database; and, according to that proportion and/or retention duration combined with the preset data deletion rules in the database, extract the redundant data to be deleted;
102: Delete the redundant data to be deleted.
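Steps 101 and 102 can be sketched as a single task body that applies a list of rule functions and deletes whatever any of them flags, which also matches the later note that triggering any one preset rule leads to extraction and deletion. This is a minimal Python sketch assuming partitions are held in a dict keyed by a sortable partition key; `run_scheduled_task` and `Rule` are hypothetical names.

```python
from typing import Callable, Hashable, Iterable

# A deletion rule inspects the partitions and returns the keys it judges redundant.
Rule = Callable[[dict], Iterable[Hashable]]


def run_scheduled_task(partitions: dict, rules: list[Rule]) -> list:
    """One run of the scheduled task, mirroring steps 101 and 102.

    Assumption: partition keys are sortable (e.g. storage dates).
    """
    # Step 101: apply every preset rule; data flagged by ANY rule is redundant.
    redundant = set()
    for rule in rules:
        redundant.update(rule(partitions))
    # Step 102: delete the extracted redundant data.
    for key in redundant:
        partitions.pop(key, None)
    return sorted(redundant)
```

Keeping rules as plain callables lets the disk-proportion and retention-duration rules be added or removed without changing the task body.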
In another embodiment of the present invention, the extracted redundant data to be deleted may be saved to another storage medium, such as a magnetic disk, a hard disk, or a flash drive.
The method for deleting redundant data from mass data provided in the embodiments of the present invention triggers a user-defined scheduled task, applies the preset data deletion rules in the database to extract the redundant data to be deleted, and then deletes that data in the form of partitioned tables. It thereby deletes redundant data while providing a better table management strategy for operations and maintenance, and avoids the large amount of index fragmentation and deletion logging that drag down deletion efficiency in existing deletion techniques.
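A minimal Python sketch of saving extracted data elsewhere before deleting it, assuming each partition can be serialized to JSON and written to a directory on another medium; `archive_then_delete` is a hypothetical name and JSON is only an illustrative format.

```python
import json
from pathlib import Path


def archive_then_delete(partitions: dict[str, list[dict]], keys: list[str],
                        archive_dir: Path) -> list[Path]:
    """Copy each redundant partition to archive_dir as JSON, then remove it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key in keys:
        path = archive_dir / f"{key}.json"
        path.write_text(json.dumps(partitions[key]))
        written.append(path)
        del partitions[key]  # delete only after the copy has been written
    return written
```

Deleting only after the write returns means that a failed write raises before the partition is removed, so the data is not lost.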
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, triggering the user-defined scheduled task comprises: configuring the task frequency as a user-defined frequency, and configuring the task trigger time as a user-defined time. It should be noted that the scheduled task in the embodiments of the present invention is user-defined, including a user-defined trigger time and trigger frequency. However, the user-defined scheduled task serves only to illustrate the technical solution of the present invention and does not limit it to user-defined scheduled tasks: any other type of scheduled task whose technical requirements conform to the spirit of the embodiments of the present invention is also understood to fall within the technical solutions covered by the embodiments.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the preset data deletion rules in the database comprise: if the proportion of disk space occupied by data in the database is greater than the proportion threshold, extracting the data with the earliest storage time in the database as the redundant data to be deleted.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the preset data deletion rules in the database comprise: if the retention duration of data in the database is greater than the duration threshold, extracting the data whose retention duration is greater than the duration threshold as the redundant data to be deleted.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the preset data deletion rules in the database further comprise: if the proportion of disk space occupied by data in the database approaches the proportion threshold, issuing a data redundancy warning.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the data in the database is managed using partitioned tables.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, deleting the redundant data to be deleted comprises: deleting the redundant data to be deleted according to the partitions of the partitioned table.
It should be noted that as soon as any one of the preset data deletion rules in the database is triggered (in the embodiments of the present invention the database has two preset data deletion rules), the redundant data to be deleted is extracted and then deleted.
The embodiments of the present invention are implemented through programmatic processing on a device with processing capability. In engineering practice, the technical solutions and functions of the embodiments can therefore be packaged into various modules. On this basis, and building on the above embodiments, an embodiment of the present invention provides an apparatus for deleting redundant data from mass data, configured to perform the method for deleting redundant data from mass data in the above method embodiments. Referring to Fig. 2, the apparatus comprises:
an extraction module 201 for redundant data to be deleted, configured to trigger a user-defined scheduled task, read the proportion of disk space occupied by data in the database and/or the retention duration of the data in the database, and, according to that proportion and/or retention duration combined with the preset data deletion rules in the database, extract the redundant data to be deleted; and
a redundant data deletion module 202, configured to delete the redundant data to be deleted.
The apparatus for deleting redundant data from mass data provided in the embodiments of the present invention includes the extraction module for redundant data to be deleted and the redundant data deletion module. It triggers a user-defined scheduled task, applies the preset data deletion rules in the database to extract the redundant data to be deleted, and then deletes that data in the form of partitioned tables. It thereby deletes redundant data while providing a better table management strategy for operations and maintenance, and avoids the large amount of index fragmentation and deletion logging that drag down deletion efficiency in existing deletion techniques.
The method of the embodiments of the present invention is implemented on an electronic device, so the relevant electronic device is introduced here. To this end, an embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device comprises at least one processor 301, a communications interface 304, at least one memory 302, and a communication bus 303, where the at least one processor 301, the communications interface 304, and the at least one memory 302 communicate with one another through the communication bus 303. The at least one processor 301 can call logic instructions in the at least one memory 302 to perform the following method: triggering a user-defined scheduled task; reading the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database; according to that proportion and/or retention duration, combined with the preset data deletion rules in the database, extracting the redundant data to be deleted; and deleting the redundant data to be deleted.
In addition, the logic instructions in the at least one memory 302 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention, for example: triggering a user-defined scheduled task; reading the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database; according to that proportion and/or retention duration, combined with the preset data deletion rules in the database, extracting the redundant data to be deleted; and deleting the redundant data to be deleted. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
From the description of the above embodiments, a person skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for deleting redundant data from mass data, characterized by comprising:
triggering a user-defined scheduled task; reading the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database; and, according to that proportion and/or retention duration, combined with preset data deletion rules in the database, extracting the redundant data to be deleted;
deleting the redundant data to be deleted.
2. The method for deleting redundant data from mass data according to claim 1, characterized in that triggering the user-defined scheduled task comprises:
configuring the task frequency as a user-defined frequency, and configuring the task trigger time as a user-defined time.
3. The method for deleting redundant data from mass data according to claim 1, characterized in that the preset data deletion rules in the database comprise:
if the proportion of disk space occupied by data in the database is greater than a proportion threshold, extracting the data with the earliest storage time in the database as the redundant data to be deleted.
4. The method for deleting redundant data from mass data according to claim 1, characterized in that the preset data deletion rules in the database comprise:
if the retention duration of data in the database is greater than a duration threshold, extracting the data whose retention duration is greater than the duration threshold as the redundant data to be deleted.
5. The method for deleting redundant data from mass data according to claim 3, characterized in that the preset data deletion rules in the database further comprise:
if the proportion of disk space occupied by data in the database approaches the proportion threshold, issuing a data redundancy warning.
6. The method for deleting redundant data from mass data according to claim 1, characterized in that the data in the database is managed using partitioned tables.
7. The method for deleting redundant data from mass data according to claim 6, characterized in that deleting the redundant data to be deleted comprises:
deleting the redundant data to be deleted according to the partitions of the partitioned table.
8. An apparatus for deleting redundant data from mass data, characterized by comprising:
an extraction module for redundant data to be deleted, configured to trigger a user-defined scheduled task, read the proportion of disk space occupied by data in a database and/or the retention duration of the data in the database, and, according to that proportion and/or retention duration combined with preset data deletion rules in the database, extract the redundant data to be deleted;
a redundant data deletion module, configured to delete the redundant data to be deleted.
9. An electronic device, characterized by comprising:
at least one processor, at least one memory, a communications interface, and a bus, wherein
the processor, the memory, and the communications interface communicate with one another through the bus;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811178431.1A CN109344158A (en) 2018-10-10 2018-10-10 Method and apparatus for deleting redundant data from mass data

Publications (1)

Publication Number Publication Date
CN109344158A 2019-02-15

Family

ID=65309330

Country Status (1)

Country Link
CN (1) CN109344158A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036001A (en) * 2014-06-13 2014-09-10 上海新炬网络技术有限公司 Dynamic hotlist priority scheduling based quick data cleaning method
CN106599113A (en) * 2016-11-30 2017-04-26 武汉虹信通信技术有限责任公司 Database read-write method for mass performance data of network management system
CN106648990A (en) * 2016-12-28 2017-05-10 四川秘无痕信息安全技术有限责任公司 Method for extracting data of BlueSky file system monitoring equipment rapidly
CN107295173A (en) * 2017-06-21 2017-10-24 广东欧珀移动通信有限公司 Delete the method and Related product of chat messages
CN107357686A (en) * 2017-07-20 2017-11-17 郑州云海信息技术有限公司 A kind of daily record delet method and device

Similar Documents

Publication Publication Date Title
US11568042B2 (en) System and methods for sandboxed malware analysis and automated patch development, deployment and validation
US10241681B2 (en) Management of physical extents for space efficient storage volumes
CN106201659B (en) Method and host for live migration of virtual machines
CN104317928A (en) Service ETL (extraction-transformation-loading) method and service ETL system both based on distributed database
CN108629029B (en) Data processing method and device applied to data warehouse
CN107209714A (en) The control method of distributed memory system and distributed memory system
US8898677B2 (en) Data arrangement calculating system, data arrangement calculating method, master unit and data arranging method
US10884667B2 (en) Storage controller and IO request processing method
WO2016178316A1 (en) Computer procurement predicting device, computer procurement predicting method, and program
CN105260639A (en) Face recognition system data update method and device
US8676850B2 (en) Prioritization mechanism for deletion of chunks of deduplicated data objects
EP3018581A1 (en) Data staging management system
CN109684271A (en) Snapshot data management method, device, electronic equipment and machine readable storage medium
CN110119422A (en) Small wechat borrows tenant data depot data processing system and equipment
EP4174675A1 (en) On-board data storage method and system
CN109344158A (en) The method and apparatus of redundant data is deleted from mass data
CN109977074A (en) LOB data processing method and device based on HDFS
CN110399095A (en) Method and device for collecting statistics on memory space
CN110196786A (en) Rollback database synchronizes the control method and equipment of middle memory
CN114036104A (en) Cloud filing method, device and system for re-deleted data based on distributed storage
CN110019071A (en) Data processing method and device
CN114564149A (en) Data storage method, device, equipment and storage medium
CN108459828B (en) Desktop cloud disk redistribution method
CN112685334A (en) Method, device and storage medium for block caching of data
CN110413691A (en) Database backup method, restoration methods and device based on block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190215
