CN109344158A - Method and apparatus for deleting redundant data from mass data - Google Patents
- Publication number
- CN109344158A
- Authority
- CN
- China
- Prior art keywords
- data
- database
- deleted
- redundant data
- redundant
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
An embodiment of the invention provides a method and apparatus for deleting redundant data from mass data. The method includes: triggering a user-defined scheduled task; reading the share of disk space occupied by data in a database and/or the retention duration of the data in the database; extracting the redundant data to be deleted according to that disk share and/or retention duration, in combination with data deletion rules preset in the database; and deleting the redundant data to be deleted. By triggering a user-defined scheduled task, extracting the redundant data according to the preset deletion rules, and then deleting it in the form of partitioned tables, the method and apparatus delete redundant data while giving operations and maintenance (O&M) a better table management strategy, and avoid the large amount of index fragmentation and deletion logging that existing deletion techniques produce and that drags down deletion efficiency.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method and apparatus for deleting redundant data from mass data.
Background art
When a large amount of data is stored in a columnar database and new data is continuously written to disk, old data must be deleted to guarantee that new data can still be written. When a simple delete operation is applied to a database containing mass data, it must cope with a sharp increase in index fragmentation and sustain high-speed log insertion, without degrading query or deletion performance. If the prior art removes data from mass data with DELETE, a large number of log files are generated and the operation takes a long time; every deletion updates the indexes, which slows disk input/output and degrades the performance of the remaining insert and query operations. Finding a way to delete redundant data that also gives O&M a better table management strategy, and that avoids the index fragmentation and deletion logs that drag down deletion efficiency in existing techniques, has therefore become a technical problem the industry urgently needs to solve.
Summary of the invention
In view of the above problems in the prior art, embodiments of the present invention provide a method and apparatus for deleting redundant data from mass data.
In a first aspect, an embodiment of the present invention provides a method for deleting redundant data from mass data, comprising:
triggering a user-defined scheduled task; reading the share of disk space occupied by data in a database and/or the retention duration of data in the database; extracting the redundant data to be deleted according to the disk share and/or the retention duration, in combination with data deletion rules preset in the database; and deleting the redundant data to be deleted.
Further, triggering the user-defined scheduled task comprises: setting the task frequency to a user-defined frequency, and setting the task trigger time to a user-defined time.
Further, the data deletion rules preset in the database comprise: if the share of disk space occupied by data in the database is greater than a share threshold, extracting the data with the earliest storage time in the database as the redundant data to be deleted.
Further, the data deletion rules preset in the database comprise: if the retention duration of data in the database is greater than a duration threshold, extracting the data whose retention duration is greater than the duration threshold as the redundant data to be deleted.
Further, the data deletion rules preset in the database further comprise: if the share of disk space occupied by data in the database approaches the share threshold, issuing a data redundancy warning.
Further, the data in the database is managed using partitioned tables.
Further, deleting the redundant data to be deleted comprises: deleting the redundant data to be deleted according to the partitions of the partitioned table.
In a second aspect, an embodiment of the present invention provides an apparatus for deleting redundant data from mass data, comprising:
a redundant data extraction module, configured to trigger a user-defined scheduled task, read the share of disk space occupied by data in a database and/or the retention duration of data in the database, and extract the redundant data to be deleted according to the disk share and/or the retention duration, in combination with data deletion rules preset in the database;
a redundant data deletion module, configured to delete the redundant data to be deleted.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method for deleting redundant data from mass data provided by any of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method for deleting redundant data from mass data provided by any of the possible implementations of the first aspect.
In the method and apparatus for deleting redundant data from mass data provided in the embodiments of the present invention, a user-defined scheduled task is triggered, the redundant data to be deleted is extracted according to the deletion rules preset in the database, and that data is then deleted in the form of partitioned tables. Redundant data is thereby deleted while O&M gains a better table management strategy, and the large amount of index fragmentation and deletion logging that drags down deletion efficiency in existing techniques is avoided.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for deleting redundant data from mass data according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the apparatus for deleting redundant data from mass data according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of the electronic device according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention. In addition, the technical features in the embodiments, or within a single embodiment, may be combined arbitrarily to form feasible technical solutions, provided that the combination can be implemented by a person of ordinary skill in the art. When a combination of technical solutions is contradictory or cannot be implemented, that combination shall be deemed not to exist and is not within the protection scope claimed by the present invention.
An embodiment of the present invention provides a method for deleting redundant data from mass data. Referring to Fig. 1, the method includes:
101: Trigger a user-defined scheduled task, read the share of disk space occupied by data in the database and/or the retention duration of data in the database, and extract the redundant data to be deleted according to the disk share and/or the retention duration, in combination with the data deletion rules preset in the database;
102: Delete the redundant data to be deleted.
In another embodiment of the present invention, the extracted redundant data to be deleted may be saved to another storage medium, for example a magnetic disk, a hard disk, or a flash drive.
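The text does not say how the extracted data is saved before it is deleted. A minimal sketch, assuming each partition is an SQLite table and that a CSV file in a local directory stands in for the "other storage medium"; `archive_table` and the table name in the usage are hypothetical:

```python
import csv
import sqlite3
from pathlib import Path


def archive_table(conn: sqlite3.Connection, table: str, out_dir: Path) -> Path:
    """Copy every row of a to-be-deleted partition into a CSV file before it is dropped."""
    # The table name comes from internal code (the rule evaluation), never from user input.
    cur = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cur.description]  # column names of the partition
    out_path = out_dir / f"{table}.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cur)  # iterating the cursor yields one tuple per row
    return out_path
```

Archiving first keeps a copy of the removed rows while the deletion itself can remain a cheap whole-partition operation.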
In the method for deleting redundant data from mass data provided in the embodiment of the present invention, a user-defined scheduled task is triggered, the redundant data to be deleted is extracted according to the deletion rules preset in the database, and that data is then deleted in the form of partitioned tables. Redundant data is thereby deleted while O&M gains a better table management strategy, and the large amount of index fragmentation and deletion logging that drags down deletion efficiency in existing techniques is avoided.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, triggering the user-defined scheduled task comprises: setting the task frequency to a user-defined frequency, and setting the task trigger time to a user-defined time. It should be noted that the scheduled task in the embodiments of the present invention is user-defined, including a user-defined trigger time and trigger frequency. However, the user-defined scheduled task merely illustrates the technical solution of the present invention and does not mean that the solution is limited to user-defined scheduled tasks. As long as the technical requirements of another type of scheduled task are consistent with the spirit and essence of the embodiments of the present invention, that type of scheduled task also falls within the technical solutions covered by the embodiments of the present invention.
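The embodiments leave the scheduler itself unspecified. As a minimal sketch, assuming the user supplies a first trigger time and a fixed repeat interval (`ScheduleConfig` and `next_trigger` are hypothetical names), the next run time can be computed like this:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduleConfig:
    """Hypothetical user-defined schedule: a first trigger time plus a repeat interval."""
    first_trigger: datetime  # user-defined trigger time
    interval: timedelta      # user-defined task frequency

    def __post_init__(self) -> None:
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")


def next_trigger(cfg: ScheduleConfig, now: datetime) -> datetime:
    """Return the first scheduled trigger strictly after `now`."""
    if now < cfg.first_trigger:
        return cfg.first_trigger
    # Number of whole intervals already elapsed, plus one to move past `now`.
    periods = (now - cfg.first_trigger) // cfg.interval + 1
    return cfg.first_trigger + periods * cfg.interval
```

A scheduler loop would sleep until `next_trigger(...)` and then run steps 101 and 102; a cron entry or a database's own job scheduler could play the same role.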
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the data deletion rules preset in the database comprise: if the share of disk space occupied by data in the database is greater than a share threshold, the data with the earliest storage time in the database is extracted as the redundant data to be deleted.
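The rule does not state how much of the earliest data to extract. One possible reading, sketched below on the assumption that data is kept in dated partitions with known sizes, is to take partitions oldest first until the remaining share falls to or below the threshold (`Partition` and `select_by_disk_share` are hypothetical names):

```python
from datetime import date
from typing import List, NamedTuple


class Partition(NamedTuple):
    name: str        # hypothetical partition name, e.g. "p20181001"
    start: date      # earliest storage time covered by the partition
    size_bytes: int  # on-disk size of the partition


def select_by_disk_share(partitions: List[Partition], disk_capacity: int,
                         share_threshold: float) -> List[Partition]:
    """Pick the oldest partitions until the data's disk share is at or below the threshold."""
    used = sum(p.size_bytes for p in partitions)
    selected: List[Partition] = []
    for p in sorted(partitions, key=lambda p: p.start):  # earliest storage time first
        if used / disk_capacity <= share_threshold:
            break
        selected.append(p)
        used -= p.size_bytes
    return selected
```

If the share is already at or below the threshold, nothing is selected, which matches the "greater than" condition in the rule.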
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the data deletion rules preset in the database comprise: if the retention duration of data in the database is greater than a duration threshold, the data whose retention duration is greater than the duration threshold is extracted as the redundant data to be deleted.
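A minimal sketch of the retention rule, assuming daily partitions keyed by name and dated by the day they cover, so that a partition's retention duration is approximated as today minus that date (`select_by_retention` and the naming scheme are hypothetical):

```python
from datetime import date, timedelta
from typing import Dict, List


def select_by_retention(partition_dates: Dict[str, date], today: date,
                        max_age: timedelta) -> List[str]:
    """Return names of partitions whose data has been kept longer than max_age."""
    return sorted(name for name, start in partition_dates.items()
                  if today - start > max_age)
```

Selecting whole partitions rather than individual rows keeps this rule compatible with the partition-based deletion described below.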
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the data deletion rules preset in the database further comprise: if the share of disk space occupied by data in the database approaches the share threshold, a data redundancy warning is issued.
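The text does not define "approaches". The sketch below assumes a hypothetical `warn_ratio` parameter: a warning is issued once the share reaches `warn_ratio` times the threshold but has not yet exceeded it, since above the threshold the deletion rule takes over (`check_disk_share` and the logger name are also hypothetical):

```python
import logging
from typing import Optional

logger = logging.getLogger("redundancy_monitor")  # hypothetical logger name


def check_disk_share(used_bytes: int, disk_capacity: int, share_threshold: float,
                     warn_ratio: float = 0.9) -> Optional[str]:
    """Log and return a data redundancy warning when usage nears the share threshold.

    "Near" is interpreted as warn_ratio * share_threshold <= share <= share_threshold;
    otherwise None is returned.
    """
    share = used_bytes / disk_capacity
    if warn_ratio * share_threshold <= share <= share_threshold:
        msg = (f"data redundancy warning: disk share {share:.0%} "
               f"is close to the threshold {share_threshold:.0%}")
        logger.warning(msg)
        return msg
    return None
```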
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, the data in the database is managed using partitioned tables.
On the basis of the above embodiments, in the method for deleting redundant data from mass data provided in the embodiments of the present invention, deleting the redundant data to be deleted comprises: deleting the redundant data to be deleted according to the partitions of the partitioned table.
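SQLite has no native table partitioning, so the sketch below emulates partitions with one table per day (a hypothetical `metrics_pYYYYMMDD` naming scheme) purely to illustrate the idea: redundant data is removed by dropping whole partitions instead of running row-by-row DELETE statements. In a database with native partitioning, its own drop-partition statement would take this place.

```python
import sqlite3
from typing import Iterable


def create_daily_partition(conn: sqlite3.Connection, day: str) -> str:
    """Create one table per day, e.g. metrics_p20181001 (hypothetical naming scheme)."""
    if not (len(day) == 8 and day.isascii() and day.isdigit()):  # only YYYYMMDD strings
        raise ValueError(f"expected YYYYMMDD, got {day!r}")
    table = f"metrics_p{day}"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, value REAL)")
    return table


def drop_partitions(conn: sqlite3.Connection, tables: Iterable[str]) -> None:
    """Remove whole partitions (table plus its indexes) instead of deleting rows one by one."""
    for table in tables:
        # Table names come from create_daily_partition, never from user input.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.commit()
```

Because each partition disappears in a single statement, there are no per-row deletes, which the background identifies as the source of heavy logging and index fragmentation.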
It should be noted that as soon as any one of the data deletion rules preset in the database is triggered (in the embodiments of the present invention, two data deletion rules are preset in the database), the redundant data to be deleted is extracted and then deleted.
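Because either rule alone triggers extraction, the combined result is the union of what each rule selects. A minimal sketch, assuming each rule is a hypothetical callable that maps the current partition names to the names it considers redundant (`Rule` and `extract_redundant` are illustrative names):

```python
from typing import Callable, Iterable, List, Sequence, Set

# A rule maps the current partition names to the names it considers redundant.
Rule = Callable[[Sequence[str]], Iterable[str]]


def extract_redundant(partitions: Sequence[str], rules: Sequence[Rule]) -> List[str]:
    """Apply every preset rule; a partition is redundant if any rule selects it."""
    seen: Set[str] = set()
    result: List[str] = []
    for rule in rules:
        for name in rule(partitions):
            if name not in seen:  # keep first-seen order, drop duplicates
                seen.add(name)
                result.append(name)
    return result
```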
Each embodiment of the present invention is fundamentally realized through programmed processing by a device with processor functionality. Therefore, in engineering practice, the technical solutions of the embodiments and their functions can be packaged into various modules. On this basis, and building on the above embodiments, an embodiment of the present invention provides an apparatus for deleting redundant data from mass data, which is used to perform the method for deleting redundant data from mass data in the above method embodiments. Referring to Fig. 2, the apparatus includes:
Redundant data extraction module 201, configured to trigger a user-defined scheduled task, read the share of disk space occupied by data in the database and/or the retention duration of data in the database, and extract the redundant data to be deleted according to the disk share and/or the retention duration, in combination with the data deletion rules preset in the database;
Redundant data deletion module 202, configured to delete the redundant data to be deleted.
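The two modules can be pictured as two small components wired together by the scheduled task. This is an illustrative sketch only: the class names mirror modules 201 and 202, while the `select` and `drop` hooks are hypothetical stand-ins for the rule evaluation and the partition-dropping step.

```python
from typing import Callable, Iterable, List


class RedundantDataExtractionModule:
    """Sketch of module 201: evaluates the preset rules and returns what to delete."""

    def __init__(self, select: Callable[[], Iterable[str]]):
        self._select = select  # hypothetical hook: read stats and apply the deletion rules

    def extract(self) -> List[str]:
        # Materialize the selection so the deletion step cannot mutate what is being iterated.
        return list(self._select())


class RedundantDataDeletionModule:
    """Sketch of module 202: deletes each extracted partition."""

    def __init__(self, drop: Callable[[str], None]):
        self._drop = drop  # hypothetical hook, e.g. a drop-partition call

    def delete(self, partitions: Iterable[str]) -> int:
        count = 0
        for name in partitions:
            self._drop(name)
            count += 1
        return count


def run_scheduled_cleanup(extractor: RedundantDataExtractionModule,
                          deleter: RedundantDataDeletionModule) -> int:
    """One trigger of the scheduled task: extract first, then delete; returns the count."""
    return deleter.delete(extractor.extract())
```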
In the apparatus for deleting redundant data from mass data provided in the embodiment of the present invention, the redundant data extraction module and the redundant data deletion module trigger a user-defined scheduled task, extract the redundant data to be deleted according to the deletion rules preset in the database, and then delete that data in the form of partitioned tables. Redundant data is thereby deleted while O&M gains a better table management strategy, and the large amount of index fragmentation and deletion logging that drags down deletion efficiency in existing techniques is avoided.
The method of the embodiments of the present invention is implemented by an electronic device, so the related electronic device is briefly introduced here. To this end, an embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device includes: at least one processor 301, a communications interface 304, at least one memory 302, and a communication bus 303, where the at least one processor 301, the communications interface 304, and the at least one memory 302 communicate with one another over the communication bus 303. The at least one processor 301 can invoke logic instructions in the at least one memory 302 to perform the following method: triggering a user-defined scheduled task; reading the share of disk space occupied by data in the database and/or the retention duration of data in the database; extracting the redundant data to be deleted according to the disk share and/or the retention duration, in combination with the data deletion rules preset in the database; and deleting the redundant data to be deleted.
In addition, the logic instructions in the at least one memory 302 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention, for example: triggering a user-defined scheduled task; reading the share of disk space occupied by data in the database and/or the retention duration of data in the database; extracting the redundant data to be deleted according to the disk share and/or the retention duration, in combination with the data deletion rules preset in the database; and deleting the redundant data to be deleted. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
From the above description of the embodiments, a person skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions in essence, or the part contributing to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to describe, rather than limit, the technical solutions of the present invention. Although the present invention is described in detail with reference to the foregoing embodiments, a person skilled in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for deleting redundant data from mass data, characterized by comprising:
triggering a user-defined scheduled task; reading the share of disk space occupied by data in a database and/or the retention duration of data in the database; and extracting the redundant data to be deleted according to the disk share and/or the retention duration, in combination with data deletion rules preset in the database;
deleting the redundant data to be deleted.
2. The method for deleting redundant data from mass data according to claim 1, characterized in that triggering the user-defined scheduled task comprises:
setting the task frequency to a user-defined frequency, and setting the task trigger time to a user-defined time.
3. The method for deleting redundant data from mass data according to claim 1, characterized in that the data deletion rules preset in the database comprise:
if the share of disk space occupied by data in the database is greater than a share threshold, extracting the data with the earliest storage time in the database as the redundant data to be deleted.
4. The method for deleting redundant data from mass data according to claim 1, characterized in that the data deletion rules preset in the database comprise:
if the retention duration of data in the database is greater than a duration threshold, extracting the data whose retention duration is greater than the duration threshold as the redundant data to be deleted.
5. The method for deleting redundant data from mass data according to claim 3, characterized in that the data deletion rules preset in the database further comprise:
if the share of disk space occupied by data in the database approaches the share threshold, issuing a data redundancy warning.
6. The method for deleting redundant data from mass data according to claim 1, characterized in that the data in the database is managed using partitioned tables.
7. The method for deleting redundant data from mass data according to claim 6, characterized in that deleting the redundant data to be deleted comprises:
deleting the redundant data to be deleted according to the partitions of the partitioned table.
8. An apparatus for deleting redundant data from mass data, characterized by comprising:
a redundant data extraction module, configured to trigger a user-defined scheduled task, read the share of disk space occupied by data in a database and/or the retention duration of data in the database, and extract the redundant data to be deleted according to the disk share and/or the retention duration, in combination with data deletion rules preset in the database;
a redundant data deletion module, configured to delete the redundant data to be deleted.
9. An electronic device, characterized by comprising:
at least one processor, at least one memory, a communications interface, and a bus, wherein
the processor, the memory, and the communications interface communicate with one another over the bus;
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811178431.1A CN109344158A | 2018-10-10 | 2018-10-10 | Method and apparatus for deleting redundant data from mass data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109344158A | 2019-02-15 |
Family ID: 65309330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811178431.1A (Pending, CN109344158A) | Method and apparatus for deleting redundant data from mass data | 2018-10-10 | 2018-10-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344158A |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036001A (en) * | 2014-06-13 | 2014-09-10 | 上海新炬网络技术有限公司 | Dynamic hotlist priority scheduling based quick data cleaning method |
CN106599113A (en) * | 2016-11-30 | 2017-04-26 | 武汉虹信通信技术有限责任公司 | Database read-write method for mass performance data of network management system |
CN106648990A (en) * | 2016-12-28 | 2017-05-10 | 四川秘无痕信息安全技术有限责任公司 | Method for extracting data of BlueSky file system monitoring equipment rapidly |
CN107295173A * | 2017-06-21 | 2017-10-24 | 广东欧珀移动通信有限公司 | Method for deleting chat messages and related product |
CN107357686A * | 2017-07-20 | 2017-11-17 | 郑州云海信息技术有限公司 | Log deletion method and device |
- 2018-10-10: application CN201811178431.1A filed in CN, published as CN109344158A (Pending)
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190215 |