CN109460411A - Hive-based data aging method, device and equipment

Publication number
CN109460411A
Authority
CN
China
Prior art keywords
aging
hive
data
default
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811346834.2A
Other languages
Chinese (zh)
Inventor
郑艳涛
袁益梦
林锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN201811346834.2A
Publication of CN109460411A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hive-based data aging method, comprising: when a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read/write access; judging whether a preset scan condition is currently met; and, if the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area. The invention enables automatic scanning and aging judgment of hive data areas and deletion of aging areas, with high efficiency and with higher accuracy and reliability. The invention further provides a device, equipment and a computer-readable storage medium based on the above method.

Description

Hive-based data aging method, device and equipment
Technical field
The present invention relates to the technical field of data aging processing, and in particular to a hive-based data aging method. The invention further relates to a hive-based data aging device, equipment and computer-readable storage medium.
Background
Hive is a data warehouse tool based on Hadoop. It can map structured data files to database tables, provides simple SQL query functionality, and converts SQL statements into distributed computing tasks for execution.
At present, when hive is used for big data processing, the sheer volume of big data is a problem. In many actual production systems in particular, a large amount of new data is added every day, and the ever-growing data poses a huge challenge to the system's storage resources. When the physical disks on which the distributed file system relies are full, new data cannot be written, and computations on existing data are also severely affected, because they generate temporary files that need a certain amount of disk space.
At present, when the storage space is full and hardware resources cannot be added, the only way to free space is to manually select stored tables and drop them (a delete operation). For a distributed file system serving as a big data warehouse, however, manually choosing which tables to drop from a massive number of tables is not only a complicated screening task but may also delete, by mistake, data that is still in use, so accuracy and reliability are low.
Therefore, how to provide a hive-based data aging method that solves the above problems is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide a hive-based data aging method that enables automatic scanning and aging judgment of hive data areas and deletion of aging areas, with high efficiency and with higher accuracy and reliability. Another object of the invention is to provide a device, equipment and computer-readable storage medium based on the above method.
To solve the above technical problem, the present invention provides a hive-based data aging method, comprising:
when a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read/write access;
judging whether a preset scan condition is currently met;
if the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area.
Preferably, the data area is specifically a table partition of hive.
Preferably, the preset scan condition includes a scheduled time; the process of judging whether the preset scan condition is currently met includes:
judging whether the current moment matches the scheduled time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
Preferably, the preset scan condition includes the occupancy of the storage space of the distributed system on which hive runs reaching a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
judging whether the current occupancy of the storage space of the distributed system on which hive runs has reached the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
Preferably, the preset aging condition includes the time between the access time record and the current time exceeding a preset time threshold.
Preferably, the preset aging condition includes the access time record being among the K records, of all access time records, that are farthest from the current time.
Preferably, the process of scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area includes:
scanning the access time records in turn and judging whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
after the access time records of all data areas have been scanned, deleting together the content of the aging areas corresponding to all recorded identifiers.
To solve the above technical problem, the present invention further provides a hive-based data aging device, comprising:
a time recording module, configured to refresh, in the metadata database of hive, the access time record of the data area corresponding to a read/write access when the read/write access to a data area of hive is received;
a scan judgment module, configured to judge whether the preset scan condition is currently met and, if it is met, to trigger the aging judgment module;
the aging judgment module, configured to scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
To solve the above technical problem, the present invention further provides hive-based data aging equipment, comprising:
a memory, for storing a computer program;
a processor, for implementing the steps of the hive-based data aging method described in any of the above when executing the computer program.
To solve the above technical problem, the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the hive-based data aging method described in any of the above are implemented.
The present invention provides a hive-based data aging method. When a read/write access to a hive data area is received, the access time record of the corresponding data area is first refreshed in the metadata database of hive; that is, the metadata database of hive stores an access time record for each data area, and each record is refreshed in real time as read/write accesses occur. Subsequently, when the preset scan condition is met, the invention scans all access time records and judges whether each scanned record meets the preset aging condition; if so, the data area corresponding to that record is taken as an aging area, and the content of every data area judged to be an aging area is then deleted. Thus, by setting a preset scan condition and a preset aging condition and maintaining an access time record for each data area, the invention achieves automatic scanning and aging judgment of hive data areas and deletes the content of the data areas judged to be aging areas. The whole process requires no human participation; compared with manual screening, the workload is small, the efficiency is high, mistakes are less likely, and accuracy and reliability are higher. The invention also provides a device, equipment and computer-readable storage medium based on the above method, which are not described again here.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a hive-based data aging method provided by the invention;
Fig. 2 is a flowchart of another hive-based data aging method provided by the invention;
Fig. 3 is a flowchart of yet another hive-based data aging method provided by the invention;
Fig. 4 is a structural schematic diagram of a hive-based data aging device provided by the invention.
Detailed description of the embodiments
The core of the invention is to provide a hive-based data aging method that enables automatic scanning and aging judgment of hive data areas and deletion of aging areas, with high efficiency and with higher accuracy and reliability. Another core of the invention is to provide a device, equipment and computer-readable storage medium based on the above method.
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The present invention provides a hive-based data aging method. Referring to Fig. 1, which is a flowchart of a hive-based data aging method provided by the invention, the method includes:
Step s1: when a read/write access to a data area of hive is received, refresh, in the metadata database of hive, the access time record of the data area corresponding to the read/write access;
It can be understood that a data area of hive is essentially a directory of the distributed file system (usually HDFS). The access time record can also be understood as a data access timestamp, which reflects the most recent time at which each data area was accessed. Because the read/write interface of current distributed file systems does not include the function of refreshing this access time record, the read/write interface needs to be modified to some extent; the invention does not limit the specific way of modifying it.
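As an illustration of step s1 only, the following minimal sketch wraps a read function so that every call refreshes the access time record first. The invention leaves the way of modifying the read/write interface open, so everything here is an assumption: `MetadataStore`, `with_access_tracking`, `read_area` and the example path are hypothetical names, and an in-memory dictionary stands in for the metadata database of hive.

```python
import time


class MetadataStore:
    """Hypothetical stand-in for the metadata database of hive."""

    def __init__(self):
        # data area (directory path) -> time of the most recent access, in seconds
        self.access_time = {}

    def refresh(self, area, now=None):
        # Overwrite the record so it always holds the latest access time.
        self.access_time[area] = time.time() if now is None else now


def with_access_tracking(store, io_func):
    """Wrap a read or write function so each call refreshes the record (step s1)."""
    def wrapper(area, *args, now=None, **kwargs):
        store.refresh(area, now)
        return io_func(area, *args, **kwargs)
    return wrapper


def read_area(area):
    # Placeholder for a real read of the directory behind the data area.
    return f"contents of {area}"


store = MetadataStore()
tracked_read = with_access_tracking(store, read_area)
tracked_read("/warehouse/sales/dt=2018-01-01", now=1000.0)
tracked_read("/warehouse/sales/dt=2018-01-01", now=2000.0)
```

The `now` parameter exists only so the sketch can be exercised with fixed timestamps; a real wrapper would simply take the current time.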
Step s2: judge whether the preset scan condition is currently met; if it is met, go to step s3; if not, repeat this step;
It can be understood that, in big data scenarios, hive has a very large number of data areas and therefore a very large number of access time records. Scanning in real time would place a heavy burden on the system and take a long time, and aging data that needs to be deleted does not arise in hive data areas at every moment. A preset scan condition is therefore set as a restriction, so that a scan is performed only when the condition is met, after which the aging areas among the hive data areas are deleted to free space. In this way, aging areas are still deleted automatically, while the burden that scanning and aging judgment place on the system is kept as small as possible.
Step s3: scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
It can be understood that an access time record shows the time at which its data area was last accessed. The closer the access time record is to the current time, the more recently the data area was accessed, that is, the more active the data in that data area, and this data is still in use. Therefore, by judging whether an access time record meets the preset aging condition, it is possible to determine, to a certain extent, which data areas are still in use and which have not been accessed for a long time, and thus to screen data areas for aging.
With the hive-based data aging method provided by the invention, when a read/write access to a hive data area is received, the access time record of the corresponding data area is first refreshed in the metadata database of hive; that is, the metadata database of hive stores an access time record for each data area, and each record is refreshed in real time as read/write accesses occur. Subsequently, when the preset scan condition is met, the invention scans all access time records and judges whether each scanned record meets the preset aging condition; if so, the data area corresponding to that record is taken as an aging area, and the content of every data area judged to be an aging area is then deleted. Thus, by setting a preset scan condition and a preset aging condition and maintaining an access time record for each data area, the invention achieves automatic scanning and aging judgment of hive data areas and deletes the content of the data areas judged to be aging areas. The whole process requires no human participation; compared with manual screening, the workload is small, the efficiency is high, mistakes are less likely, and accuracy and reliability are higher.
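Steps s2 and s3 can be sketched as a single pass, under the assumption that the access time records are available as a dictionary and that the scan condition, the aging condition and the delete operation are supplied as callables. `run_aging_pass` and the example values are hypothetical and only illustrate the control flow described above.

```python
def run_aging_pass(access_records, scan_condition_met, is_aged, delete_contents):
    """One pass over steps s2 and s3; returns the data areas treated as aging areas."""
    if not scan_condition_met():            # step s2: preset scan condition
        return []
    aging_areas = [area for area, last_access in access_records.items()
                   if is_aged(last_access)]  # step s3: scan and judge every record
    for area in aging_areas:
        delete_contents(area)                # step s3: delete aging area content
    return aging_areas


# Example with plain numeric timestamps (assumed values).
now = 1_000
records = {"sales/dt=2018-01": 100, "sales/dt=2018-10": 950}
removed = []
result = run_aging_pass(records, lambda: True, lambda t: now - t > 500, removed.append)
```

Passing the conditions as callables keeps the pass independent of which preset scan condition or preset aging condition is configured.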
Referring to Fig. 2, which is a flowchart of another hive-based data aging method provided by the invention, the content of an aging area may be deleted immediately each time a data area is determined to be an aging area. That is, step s3 consists of:
Step s311: judge whether there are any access time records that have not yet been scanned; if so, go to step s312; if not, the scan ends;
Step s312: scan an access time record that has not yet been scanned, and judge whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, and the method goes to step s41; otherwise, return to step s311;
That is, the access time records are scanned in turn, and each one is judged against the preset aging condition.
Step s41: delete the content of the aging area, then return to step s311.
It can be understood that, in this embodiment, each time an access time record is scanned, if the record meets the preset aging condition, the content of the corresponding data area is deleted immediately, and only then is the next access time record scanned. With this immediate deletion, if the scan is suddenly interrupted for some reason and cannot continue, the content of the aging areas found before the interruption has already been deleted, so part of the aging-area space has been released. The scanning done so far is not wasted: even if no further scanning takes place, the system's storage space has been freed to some degree, which allows subsequent normal operation. This way of deleting aging areas therefore improves the reliability of deletion.
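The immediate-deletion flow of Fig. 2 can be sketched as follows. The function and variable names are hypothetical, and the `interrupted` helper exists only to simulate a scan that stops partway, so that the behaviour described above (space already freed before the interruption) can be observed.

```python
def scan_and_delete_immediately(records, is_aged, delete_contents):
    """Steps s311, s312 and s41: delete each aging area as soon as it is found."""
    for area, last_access in records:        # s311/s312: next unscanned record
        if is_aged(last_access):
            delete_contents(area)            # s41: delete before scanning on


def interrupted(records, fail_after):
    """Yield records, then raise to simulate a scan that is cut short."""
    for index, record in enumerate(records):
        if index == fail_after:
            raise RuntimeError("scan interrupted")
        yield record


deleted = []
records = [("p1", 10), ("p2", 90), ("p3", 5)]
try:
    scan_and_delete_immediately(interrupted(records, 2), lambda t: t < 50, deleted.append)
except RuntimeError:
    pass  # "p1" was already deleted before the interruption; "p3" was never reached
```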
In another embodiment, referring to Fig. 3, which is a flowchart of yet another hive-based data aging method provided by the invention, the aging areas may instead be deleted together after all access time records have been scanned. That is, step s3 consists of:
Step s321: judge whether there are any access time records that have not yet been scanned; if so, go to step s322; if not, go to step s42;
Step s322: scan an access time record that has not yet been scanned, and judge whether the currently scanned access time record meets the preset aging condition; if it does, go to step s323; otherwise, return to step s321;
Step s323: the data area corresponding to the currently scanned access time record is an aging area; record the identifier of the aging area, then return to step s321;
Step s42: delete the content of the data areas corresponding to all recorded identifiers.
It can be understood that, in this embodiment, an aging area is not deleted immediately when it is found; only its identifier is recorded, and scanning continues with the next access time record. Only after all access time records have been scanned is the data of all recorded aging areas deleted together. This approach does not need to break off the scan each time an aging area is found in order to delete its content, so the program flow is more continuous. Moreover, compared with deleting aging areas one by one, when several aging areas to be deleted are contiguous, this unified deletion only needs to find the start and end addresses of the contiguous region to delete it as a whole, so deletion is more efficient.
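A minimal sketch of the deferred flow of Fig. 3, assuming the records are (identifier, last access time) pairs. `scan_then_delete` is a hypothetical name, and the sketch does not model the contiguous-region optimisation mentioned above.

```python
def scan_then_delete(records, is_aged, delete_contents):
    """Steps s321 to s323 followed by s42: record identifiers first, delete afterwards."""
    aging_ids = []
    for area, last_access in records:        # s321/s322: scan every record
        if is_aged(last_access):
            aging_ids.append(area)           # s323: only record the identifier
    for area in aging_ids:                   # s42: unified deletion after the scan
        delete_contents(area)
    return aging_ids


deleted = []
records = [("p1", 10), ("p2", 90), ("p3", 5)]
recorded = scan_then_delete(records, lambda t: t < 50, deleted.append)
```

Nothing is deleted until the scan loop has finished, which is the difference from deleting each aging area as soon as it is found.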
In other embodiments, instead of recording the identifier of an aging area, a label may be added to the aging area itself, and step s42 is adjusted accordingly to: delete all data areas that carry the label. Other ways of recording which data areas are aging areas may of course also be used; the invention does not limit this.
Preferably, the data area is specifically a table partition of hive.
It can be understood that, in hive, data is stored in tables, but a single table usually contains a great deal of data, for example a full year of data for some project. Because the volume is so large, one table often contains both aging data and active data, so neither deleting all the data in a table nor keeping all of it is very suitable. In this embodiment the data area is therefore preferably limited to a table partition, that is, a part of the content of a table, such as the data of a particular month or day. The amount of data in each data area is then smaller, and it is easier to judge whether the data area should be deleted. This is only a preferred embodiment; the data area may also be a whole hive table, and the invention does not limit this.
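Since a data area is essentially a directory, a partition-level data area can be identified by its directory path. The sketch below assumes the usual hive layout in which each partition column becomes a `key=value` subdirectory under the table directory; `partition_dir`, the table path and the column names are hypothetical examples, not taken from the invention.

```python
import posixpath


def partition_dir(table_dir, partition_spec):
    """Build the directory path of one partition under a table directory."""
    parts = [f"{key}={value}" for key, value in partition_spec.items()]
    return posixpath.join(table_dir, *parts)


monthly = partition_dir("/warehouse/sales", {"year": "2018", "month": "11"})
daily = partition_dir("/warehouse/sales", {"dt": "2018-11-13"})
```

Keying the access time records by such paths means a month or a day of data can age out while the rest of the table stays in place.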
The operation of judging whether the preset scan condition is met may be carried out periodically, for example every 1 s or at an even shorter interval (which can be regarded as real time). The invention does not limit the period of this judgment, nor whether the judgment is carried out periodically at all.
Preferably, the preset scan condition includes a scheduled time; the process of judging whether the preset scan condition is currently met includes:
judging whether the current moment matches the scheduled time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
It can be understood that this embodiment uses scheduled scanning. The scheduled time may be one or several fixed times of day, and a scan is triggered when the current moment reaches a fixed time; for example, the fixed time may be 12 noon. The invention does not limit the specific value of the fixed time. Alternatively, the scheduled time may be a scan period, for example four hours. In that case, judging whether the current moment matches the scheduled time means judging whether the time elapsed since the last scan was triggered has reached the scan period; if it has, the preset scan condition is met. The invention does not limit the specific value of the scan period either. These are only two specific embodiments that illustrate scheduled scanning; the invention does not limit the specific content of the scheduled time.
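Both forms of scheduled time described above can be sketched with the standard `datetime` module. The function names are hypothetical, and the fixed-time check is an assumption about how "reaching" a fixed time might be detected between two polls; it only considers fixed times on the current calendar day.

```python
from datetime import datetime, time, timedelta


def fixed_time_reached(now, last_scan, fixed_times):
    """True if one of the configured times of day falls after the last scan and not after now."""
    return any(last_scan < datetime.combine(now.date(), t) <= now for t in fixed_times)


def scan_period_elapsed(now, last_scan, period):
    """True if the time since the last triggered scan has reached the scan period."""
    return now - last_scan >= period


now = datetime(2018, 11, 13, 12, 0, 30)
noon = [time(12, 0)]                      # fixed time from the example: 12 noon
four_hours = timedelta(hours=4)           # scan period from the example
morning = datetime(2018, 11, 13, 8, 0)
```

Tracking the time of the last scan avoids triggering twice for the same fixed time when the judgment runs more often than once per scheduled moment.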
In another preferred embodiment, the preset scan condition includes the occupancy of the storage space of the distributed system on which hive runs reaching a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
judging whether the current occupancy of the storage space of the distributed system on which hive runs has reached the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
It can be understood that, in this embodiment, whether to scan is determined by the occupancy of the distributed system's storage space: a scan is triggered when the storage space is under pressure and space needs to be freed. The preset high-load threshold is a relatively high value of storage occupancy; when occupancy reaches it, storage space needs to be released. The threshold is preferably not set to one hundred percent, because at one hundred percent occupancy the storage space is completely unusable, and since scanning takes time, subsequent data could not be saved during the scan, which would affect normal use of the system. It is therefore preferable to scan before occupancy reaches one hundred percent: although occupancy is already very high, storage can still be used during the scan, which improves the reliability of the distributed system. The invention does not limit the specific value of the preset high-load threshold.
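The occupancy check reduces to a ratio comparison. In the sketch below the used and total capacity are passed in as numbers, because how they are obtained from the distributed file system is outside the scope of this text; `occupancy_reached` is a hypothetical name and the default of 0.85 is an assumed example value, chosen only to be below one hundred percent as recommended above.

```python
def occupancy_reached(used_bytes, capacity_bytes, high_load_threshold=0.85):
    """True if storage occupancy has reached the preset high-load threshold."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return used_bytes / capacity_bytes >= high_load_threshold


busy = occupancy_reached(used_bytes=90, capacity_bytes=100)
idle = occupancy_reached(used_bytes=50, capacity_bytes=100)
```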
The preset scan conditions above are only several preferred embodiments. In a specific application, they may be used in combination, or other preset scan conditions may be set as required; the invention does not limit this.
In addition, the operation of judging whether the preset scan condition is met may be carried out in real time or periodically at a predetermined period; the invention does not limit this.
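One possible reading of "used in combination" is that a scan is triggered as soon as any configured condition holds; the invention does not say how conditions are combined, so the OR semantics here are an assumption, as are the names `should_scan` and `poll`. The loop runs a fixed number of rounds so the sketch terminates; a real system would wait for the chosen judgment period between rounds.

```python
def should_scan(conditions):
    """True as soon as one configured preset scan condition holds (assumed OR semantics)."""
    return any(check() for check in conditions)


def poll(conditions, run_scan, rounds):
    """Evaluate the conditions once per round and trigger a scan when they are met."""
    triggered = 0
    for _ in range(rounds):
        if should_scan(conditions):
            run_scan()
            triggered += 1
    return triggered


# Assumed sample readings of storage occupancy, one per round.
usage = iter([0.5, 0.9, 0.6])
conditions = [lambda: next(usage) >= 0.85, lambda: False]
scans = []
count = poll(conditions, lambda: scans.append("scan"), rounds=3)
```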
Preferably, the preset aging condition includes the time between the access time record and the current time exceeding a preset time threshold.
It can be understood that the access time record shows the time at which the data area was last accessed. When the time between the access time record and the current time exceeds the preset time threshold, the corresponding data area has not been accessed for a long time; the data in it is inactive and needs to be deleted. Otherwise, the data in the data area is active and cannot be deleted. This judgment reflects the activity of the data in each data area directly in terms of time, avoids deleting active data, and makes deletion more accurate. The value of the preset time threshold can be set according to the actual situation; the invention does not limit it.
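The threshold-based aging condition can be sketched as a comparison of elapsed time against the preset time threshold. The function names, the record contents and the 90-day threshold are assumed example values, not values given by the invention.

```python
from datetime import datetime, timedelta


def exceeds_time_threshold(last_access, now, threshold):
    """True if the data area has gone unaccessed for longer than the threshold."""
    return now - last_access > threshold


def select_aging_areas(access_records, now, threshold):
    """Return the data areas whose access time record meets the aging condition."""
    return [area for area, last_access in access_records.items()
            if exceeds_time_threshold(last_access, now, threshold)]


now = datetime(2018, 11, 13)
records = {
    "logs/part-a": datetime(2018, 5, 1),    # last accessed about six months ago
    "logs/part-b": datetime(2018, 11, 12),  # last accessed yesterday
}
aging = select_aging_areas(records, now, timedelta(days=90))
```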
In another preferred embodiment, the preset aging condition includes the access time record being among the K records, of all access time records, that are farthest from the current time.
It can be understood that this embodiment deletes the data areas of the K records, among all access time records, that have gone unaccessed for the longest time. This approach is most applicable when the system holds too much data and some must be deleted, but the access time records of all data areas are fairly close to the current time; in that case the data in some data areas has to be deleted anyway, so this embodiment finds the K least active data areas and deletes them. It guarantees that, whenever the system holds too much data, data areas to delete can be found and space released, avoiding as far as possible a scan that ends without freeing any space; the space release operation is therefore more reliable.
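Selecting the K records farthest from the current time is the same as selecting the K smallest last-access timestamps, which the standard library's `heapq.nsmallest` does without sorting every record. `k_least_recent` and the sample timestamps are hypothetical.

```python
import heapq


def k_least_recent(access_records, k):
    """Return the k data areas whose last access lies farthest in the past."""
    return heapq.nsmallest(k, access_records, key=access_records.get)


# Assumed numeric timestamps; all fairly recent, yet the oldest two are still found.
records = {"a": 30, "b": 10, "c": 20}
oldest_two = k_least_recent(records, 2)
```

Unlike a time threshold, this selection always yields data areas to delete as long as any records exist, which matches the space-release guarantee described above.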
The preset aging conditions above are only several preferred embodiments. In a specific application, they may be used in combination, or other preset aging conditions may be set as required; the invention does not limit this.
The present invention further provides a hive-based data aging device. Referring to Fig. 4, which is a structural schematic diagram of a hive-based data aging device provided by the invention, the device includes:
a time recording module 1, configured to refresh, in the metadata database of hive, the access time record of the data area corresponding to a read/write access when the read/write access to a data area of hive is received;
a scan judgment module 2, configured to judge whether the preset scan condition is currently met and, if it is met, to trigger the aging judgment module 3;
the aging judgment module 3, configured to scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
The hive-based data aging device provided by the invention is used to implement the hive-based data aging method described above; the device therefore corresponds one-to-one with that method.
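The three modules of Fig. 4 can be sketched as three small classes wired together. This is only one possible decomposition under stated assumptions: the class and method names are hypothetical, an in-memory dictionary replaces the metadata database, and the conditions and delete operation are injected as callables.

```python
class TimeRecordingModule:
    """Module 1: keeps one access time record per data area."""

    def __init__(self):
        self.records = {}

    def on_access(self, area, now):
        self.records[area] = now


class AgingJudgmentModule:
    """Module 3: scans all records and deletes the content of aging areas."""

    def __init__(self, recorder, is_aged, delete_contents):
        self.recorder = recorder
        self.is_aged = is_aged
        self.delete_contents = delete_contents

    def run(self, now):
        aging = [area for area, last in self.recorder.records.items()
                 if self.is_aged(last, now)]
        for area in aging:
            self.delete_contents(area)
        return aging


class ScanJudgmentModule:
    """Module 2: triggers module 3 only when the preset scan condition is met."""

    def __init__(self, scan_condition_met, aging_module):
        self.scan_condition_met = scan_condition_met
        self.aging_module = aging_module

    def check(self, now):
        if self.scan_condition_met(now):
            return self.aging_module.run(now)
        return None


deleted = []
recorder = TimeRecordingModule()
aging_module = AgingJudgmentModule(recorder, lambda last, now: now - last > 100, deleted.append)
scanner = ScanJudgmentModule(lambda now: now % 50 == 0, aging_module)
recorder.on_access("t/p=1", now=0)
recorder.on_access("t/p=2", now=140)
skipped = scanner.check(now=149)   # scan condition not met, nothing happens
scanned = scanner.check(now=150)   # condition met, "t/p=1" is aged (150 - 0 > 100)
```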
The data aging equipment based on hive that the present invention also provides a kind of, comprising:
Memory, for storing computer program;
Processor realizes following methods when for executing computer program:
When receiving the read and write access for the data field of hive, it is corresponding to refresh read and write access in the metadatabase of hive Data field access time record;
Whether judgement currently meets the default condition of scanning;
If meeting the default condition of scanning, the access time record in the total data area of hive is scanned, will be remembered access time Record meets the data field of default aging condition as aging area, and deletes the content in aging area.
In a preferred embodiment, processor executes the tables of data point that data field corresponding to the above method is specially hive Area.
In a preferred embodiment, when the default condition of scanning in the computer program of the memory storage includes timing Between;Judge that the process for currently whether meeting the default condition of scanning includes:
Judge whether current time meets timing, otherwise is unsatisfactory for presetting if so, meeting the default condition of scanning The condition of scanning.
In another preferred embodiment, the default condition of scanning in the computer program of the memory storage includes hive The occupancy of the memory space of locating distributed system reaches default high load threshold value;Whether judgement currently meets default scan stripes The process of part includes:
Judge whether the occupancy of the memory space of distributed system locating for current hive reaches default high load threshold value, if It is then to meet the default condition of scanning, otherwise, is unsatisfactory for the default condition of scanning.
In a preferred embodiment, the preset aging condition in the computer program stored in the memory includes the access time record preceding the current time by more than a preset time threshold.
In another preferred embodiment, the preset aging condition in the computer program stored in the memory includes the access time record being among the K records, of all access time records, that are farthest from the current time.
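The two aging conditions differ in kind: the first is an absolute cutoff, the second a ranking. A minimal sketch, assuming access time records are timestamps in seconds held in a dictionary (the function names and the dictionary layout are hypothetical):

```python
import heapq
from typing import Dict, List


def aged_by_threshold(access_times: Dict[str, float], now: float, max_idle_s: float) -> List[str]:
    """Areas whose last access lies more than max_idle_s seconds before now."""
    return [a for a, t in access_times.items() if now - t > max_idle_s]


def aged_by_k_oldest(access_times: Dict[str, float], k: int) -> List[str]:
    """The K areas whose access time records are farthest from the current time."""
    # Records lie in the past, so "farthest from now" means the smallest timestamps.
    return heapq.nsmallest(k, access_times, key=access_times.__getitem__)
```

The threshold variant may select no area at all if everything was accessed recently, while the K-record variant always frees up to K areas, which suits the high-load scan condition.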
In another preferred embodiment, when the processor executes the computer program, the above process of scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area includes:
Scanning the access time records one by one and judging whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
After the access time records of all data areas have been scanned, deleting in a unified manner the content of the aging areas corresponding to all recorded identifiers.
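The mark-then-delete procedure above can be sketched as two separate loops. This is an illustrative sketch under stated assumptions: `scan_then_delete`, `is_aged`, and `delete_content` are hypothetical names, and each record is assumed to be a pair of data area identifier and access timestamp.

```python
from typing import Callable, Iterable, List, Tuple


def scan_then_delete(
    records: Iterable[Tuple[str, float]],
    is_aged: Callable[[float], bool],
    delete_content: Callable[[str], None],
) -> List[str]:
    """Scan every record first, then delete all marked aging areas in one batch."""
    marked: List[str] = []
    # Phase 1: scan records in order and record the identifier of each aging area.
    for area_id, accessed_at in records:
        if is_aged(accessed_at):
            marked.append(area_id)
    # Phase 2: only after the scan has finished, delete the marked areas together.
    for area_id in marked:
        delete_content(area_id)
    return marked
```

Separating the two phases means no deletion runs while the records are still being iterated, so the scan always sees a stable set of records.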
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the following method is implemented:
When a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area targeted by the read/write access;
Judging whether a preset scan condition is currently met;
If the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area.
In a preferred embodiment, the data area in the above method implemented by the computer program is specifically a data table partition of hive.
In a preferred embodiment, the preset scan condition in the computer program includes a timed time; the process of judging whether the preset scan condition is currently met includes:
Judging whether the current time matches the timed time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
In another preferred embodiment, the preset scan condition in the computer program includes the occupancy of the storage space of the distributed system in which hive resides reaching a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
Judging whether the current occupancy of the storage space of the distributed system in which hive resides has reached the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
In a preferred embodiment, the preset aging condition in the computer program includes the access time record preceding the current time by more than a preset time threshold.
In another preferred embodiment, the preset aging condition in the computer program includes the access time record being among the K records, of all access time records, that are farthest from the current time.
In another preferred embodiment, when the computer program is executed by the processor, the above process of scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area includes:
Scanning the access time records one by one and judging whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
After the access time records of all data areas have been scanned, deleting in a unified manner the content of the aging areas corresponding to all recorded identifiers.
The several specific embodiments above are merely preferred embodiments of the present invention and may be combined arbitrarily; embodiments obtained by such combination also fall within the protection scope of the present invention. It should be noted that other improvements and variations deduced by those of ordinary skill in the art without departing from the spirit and concept of the present invention shall all fall within the protection scope of the present invention.
It should also be noted that, in this specification, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

Claims (10)

1. A hive-based data aging method, characterized by comprising:
When a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area targeted by the read/write access;
Judging whether a preset scan condition is currently met;
If the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area.
2. The hive-based data aging method according to claim 1, characterized in that the data area is specifically a data table partition of hive.
3. The hive-based data aging method according to claim 1, characterized in that the preset scan condition includes a timed time; the process of judging whether the preset scan condition is currently met includes:
Judging whether the current time matches the timed time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
4. The hive-based data aging method according to claim 1, characterized in that the preset scan condition includes the occupancy of the storage space of the distributed system in which hive resides reaching a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
Judging whether the current occupancy of the storage space of the distributed system in which hive resides has reached the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
5. The hive-based data aging method according to claim 1, characterized in that the preset aging condition includes the access time record preceding the current time by more than a preset time threshold.
6. The hive-based data aging method according to claim 1, characterized in that the preset aging condition includes the access time record being among the K records, of all access time records, that are farthest from the current time.
7. The hive-based data aging method according to any one of claims 1 to 6, characterized in that the process of scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area includes:
Scanning the access time records one by one and judging whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
After the access time records of all data areas have been scanned, deleting in a unified manner the content of the aging areas corresponding to all recorded identifiers.
8. A hive-based data aging device, characterized by comprising:
A time recording module, configured to refresh, in the metadata database of hive, the access time record of the data area targeted by a read/write access when the read/write access to a data area of hive is received;
A scan judgment module, configured to judge whether a preset scan condition is currently met and, if the preset scan condition is met, to trigger an aging judgment module;
The aging judgment module, configured to scan the access time records of all data areas of hive, take each data area whose access time record meets a preset aging condition as an aging area, and delete the content of the aging area.
9. Hive-based data aging equipment, characterized by comprising:
A memory, for storing a computer program;
A processor, which implements the steps of the hive-based data aging method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the hive-based data aging method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811346834.2A CN109460411A (en) 2018-11-13 2018-11-13 A kind of data aging method based on hive, device and equipment

Publications (1)

Publication Number Publication Date
CN109460411A true CN109460411A (en) 2019-03-12

Family

ID=65610265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811346834.2A Pending CN109460411A (en) 2018-11-13 2018-11-13 A kind of data aging method based on hive, device and equipment

Country Status (1)

Country Link
CN (1) CN109460411A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1581100A (en) * 2003-07-31 2005-02-16 华为技术有限公司 Data aging method for network processor
CN105354193A (en) * 2014-08-19 2016-02-24 阿里巴巴集团控股有限公司 Caching method, query method, caching apparatus and query apparatus for database data
CN106354779A (en) * 2016-08-23 2017-01-25 成都卡莱博尔信息技术股份有限公司 Data management system for trunking architecture
CN107562889A (en) * 2017-09-05 2018-01-09 郑州云海信息技术有限公司 A kind of metadata aging method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457300A (en) * 2019-07-15 2019-11-15 中国平安人寿保险股份有限公司 A kind of method for cleaning and device, electronic equipment in common test library
CN110457300B (en) * 2019-07-15 2024-02-02 中国平安人寿保险股份有限公司 Method and device for cleaning public test library and electronic equipment
CN113434492A (en) * 2021-06-21 2021-09-24 青岛海尔科技有限公司 Data detection method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US8954699B1 (en) Techniques for identifying IO hot spots using range-lock information
US8782324B1 (en) Techniques for managing placement of extents based on a history of active extents
CN104978361B (en) Method and device for storing real-time monitoring data of power environment
CN105549905B (en) A kind of method that multi-dummy machine accesses distributed objects storage system
CN105677240B (en) Data-erasure method and system
CN105302478B (en) A kind of date storage method and electronic equipment
CA2442188A1 (en) Methods and mechanisms for proactive memory management
CN103440207A (en) Caching method and caching device
CN108334284A (en) Tail delay perception foreground garbage collection algorithm
CN104769520A (en) System and method for dynamic memory power management
CN109460411A (en) A kind of data aging method based on hive, device and equipment
CN106227621A (en) The data back up method of logic-based volume management simplification volume and system
US20170123975A1 (en) Centralized distributed systems and methods for managing operations
CN107193494A (en) RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system
CN101645802B (en) Method and device for controlling contents
US11829377B2 (en) Efficient storage method for time series data
CN105528274B (en) A kind of disk monitoring method and system that optimization accelerates
US20100058020A1 (en) Mobile phone and method for managing memory of the mobile phone
US8352398B2 (en) Time-based conflict resolution
CN107179883A (en) Spark architecture optimization method of hybrid storage system based on SSD and HDD
CN104050100B (en) A kind of data flow memory management method and system suitable for big data environment
CN106021124B (en) A kind of storage method and storage system of data
CN113867641B (en) Host memory buffer management method and device and solid state disk
CN112988513B (en) Method, device, equipment and medium for managing server hard disk information
CN114676127A (en) Server service analysis method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190312)