CN109460411A - Data aging method, device and equipment based on hive - Google Patents
- Publication number
- CN109460411A, CN201811346834.2A
- Authority
- CN
- China
- Prior art keywords
- aging
- hive
- data
- default
- scanning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data aging method based on hive, comprising: when a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read/write access; judging whether a preset scan condition is currently met; and, if the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area. The invention enables automatic scanning of hive data areas, automatic aging judgment, and deletion of aging areas, with high efficiency and higher accuracy and reliability. A further object of the invention is to provide a device, equipment and computer-readable storage medium based on the above method.
Description
Technical field
The present invention relates to the technical field of data aging processing, and in particular to a data aging method based on hive.
The invention further relates to a data aging device, equipment and computer-readable storage medium based on hive.
Background
Hive is a data warehouse tool based on Hadoop. It maps structured data files to database tables, provides a simple SQL query capability, and converts SQL statements into distributed computing tasks for execution.
When hive is used for big data processing, the volume of stored data is huge by nature. In many production systems in particular, a large amount of new data is added every day, and the ever-growing data poses a great challenge to the storage resources of the system. When the physical disks underlying the distributed file system are full, new data can no longer be written; moreover, computations on existing data are also severely affected, because they generate temporary files that need a certain amount of disk space.
At present, when the storage space is full and hardware resources cannot be added, the only way to free space is to manually select stored tables and drop them (drop is a delete operation). For a distributed file system serving as a big data warehouse, however, manually choosing which tables to drop from a massive number of tables is not only a complicated screening task but may also delete data that is still in use by mistake, so accuracy and reliability are low.
Therefore, how to provide a data aging method based on hive that solves the above problems is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide a data aging method based on hive that enables automatic scanning of hive data areas, automatic aging judgment, and deletion of aging areas, with high efficiency and higher accuracy and reliability. Another object of the present invention is to provide a device, equipment and computer-readable storage medium based on the above method.
To solve the above technical problem, the present invention provides a data aging method based on hive, comprising:
when a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read/write access;
judging whether a preset scan condition is currently met;
if the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets a preset aging condition as an aging area, and deleting the content of the aging area.
Preferably, the data area is specifically a data table partition of hive.
Preferably, the preset scan condition includes a scheduled time; the process of judging whether the preset scan condition is currently met includes:
judging whether the current time meets the scheduled time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
Preferably, the preset scan condition includes that the occupancy of the storage space of the distributed system on which hive runs reaches a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
judging whether the occupancy of the storage space of the distributed system on which hive runs currently reaches the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
Preferably, the preset aging condition includes that the access time record is further from the current time than a preset time threshold.
Preferably, the preset aging condition includes that the access time record is among the K records, out of all access time records, that are furthest from the current time.
Preferably, the process of scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area includes:
scanning the access time records in turn and judging whether the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
after the access time records of all data areas have been scanned, deleting together the content of the aging areas corresponding to all recorded identifiers.
To solve the above technical problem, the present invention also provides a data aging device based on hive, comprising:
a time recording module, configured to refresh, in the metadata database of hive, the access time record of the data area corresponding to a read/write access when the read/write access to a data area of hive is received;
a scan judgment module, configured to judge whether the preset scan condition is currently met and, if it is, to trigger the aging judgment module;
the aging judgment module, configured to scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
To solve the above technical problem, the present invention also provides data aging equipment based on hive, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the data aging method based on hive according to any one of the above when executing the computer program.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the data aging method based on hive according to any one of the above.
The present invention provides a data aging method based on hive. When a read/write access to a hive data area is received, the access time record of the corresponding data area is first refreshed in the metadata database of hive; that is, the metadata database of hive stores an access time record for each data area, and each record is refreshed in real time as read/write accesses occur. Subsequently, when the preset scan condition is met, the invention scans all access time records and judges whether each scanned record meets the preset aging condition; if it does, the data area corresponding to that record is taken as an aging area, and the content of the data areas judged to be aging areas is then deleted. It can be seen that, by setting a preset scan condition and a preset aging condition and by maintaining an access time record for each data area, the invention achieves automatic scanning and aging judgment of hive data areas and deletes the content of the data areas judged to be aging areas. The whole process requires no manual participation; compared with manual screening, the workload is small, the efficiency is high, mistakes are less likely, and accuracy and reliability are higher. The present invention also provides a device, equipment and computer-readable storage medium based on the above method, which are not described again here.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a data aging method based on hive provided by the present invention;
Fig. 2 is a flow chart of another data aging method based on hive provided by the present invention;
Fig. 3 is a flow chart of yet another data aging method based on hive provided by the present invention;
Fig. 4 is a schematic structural diagram of a data aging device based on hive provided by the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a data aging method based on hive that enables automatic scanning of hive data areas, automatic aging judgment, and deletion of aging areas, with high efficiency and higher accuracy and reliability. Another core of the present invention is to provide a device, equipment and computer-readable storage medium based on the above method.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides a data aging method based on hive. Referring to Fig. 1, which is a flow chart of the method, the method includes:
Step s1: when a read/write access to a data area of hive is received, refresh, in the metadata database of hive, the access time record of the data area corresponding to the read/write access;
It can be understood that a data area of hive is essentially a directory of the distributed file system (usually HDFS). The access time record here can also be understood as a data access timestamp, which reflects the most recent time at which each data area was accessed. Since the read/write interface of current distributed file systems does not include the above function of refreshing an access time record, that interface needs to be modified to some extent; the present invention does not limit the specific modification.
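As a minimal sketch of step s1, assume that a plain in-memory dictionary stands in for the metadata database of hive and that a hypothetical hook `on_access` is called by the modified read/write interface. Neither name comes from the patent; they only illustrate the refresh.

```python
import time

# Hypothetical stand-in for the hive metadata database:
# maps a data area (for example an HDFS directory) to its last access timestamp.
access_times = {}

def on_access(data_area, now=None):
    """Refresh the access time record of the data area that was read or written."""
    ts = time.time() if now is None else now
    access_times[data_area] = ts   # a later access simply overwrites the record
    return ts
```

Each data area keeps exactly one record, so the record always holds the most recent access.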
Step s2: judge whether the preset scan condition is currently met; if it is, go to step s3; if it is not, repeat this step;
It can be understood that, in big data scenarios, hive has a very large number of data areas and therefore a very large number of access time records. Scanning in real time would place a heavy burden on the system and take a long time, and the hive data areas do not produce aging data to be deleted at every moment. Therefore, a preset scan condition is set as a restriction, so that a scan is performed only when the preset scan condition is met, after which the aging hive data areas are deleted to free space. This approach minimizes the burden that scanning and aging judgment place on the system while still achieving automatic deletion of aging areas.
Step s3: scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
It can be understood that an access time record shows the time at which its data area was last accessed. The closer the access time record is to the current time, the more recently the corresponding data area was accessed, that is, the more active the data in that data area is and the more likely it is still in use. Therefore, judging whether an access time record meets the preset aging condition makes it possible to determine, to a certain extent, which data areas are still in use and which have not been accessed for a long time, and thus to decide whether a data area has aged.
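Steps s2 and s3 can be sketched together as follows. This is only an illustration under stated assumptions: `age_once`, `scan_condition_met`, `is_aged` and `delete_area` are hypothetical names, and both conditions are passed in as callables because the patent leaves their concrete form open.

```python
def age_once(access_times, scan_condition_met, is_aged, delete_area):
    """Run steps s2 and s3 once; return the list of data areas that were deleted."""
    if not scan_condition_met():
        return []                      # step s2 not satisfied: nothing is scanned
    # step s3: scan every access time record and collect the aging areas
    aging = [area for area, ts in access_times.items() if is_aged(ts)]
    for area in aging:
        delete_area(area)              # delete the content of each aging area
    return aging
```

A caller would invoke `age_once` repeatedly, which corresponds to repeating step s2 until the scan condition is met.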
In the data aging method based on hive provided by the present invention, when a read/write access to a hive data area is received, the access time record of the corresponding data area is first refreshed in the metadata database of hive; that is, the metadata database of hive stores an access time record for each data area, and each record is refreshed in real time as read/write accesses occur. Subsequently, when the preset scan condition is met, the invention scans all access time records and judges whether each scanned record meets the preset aging condition; if it does, the data area corresponding to that record is taken as an aging area, and the content of the data areas judged to be aging areas is then deleted. It can be seen that, by setting a preset scan condition and a preset aging condition and by maintaining an access time record for each data area, the invention achieves automatic scanning and aging judgment of hive data areas and deletes the content of the data areas judged to be aging areas. The whole process requires no manual participation; compared with manual screening, the workload is small, the efficiency is high, mistakes are less likely, and accuracy and reliability are higher.
Referring to Fig. 2, which is a flow chart of another data aging method based on hive provided by the present invention, the content of an aging area may be deleted immediately each time a data area is determined to be an aging area. That is, step s3 consists of:
Step s311: judge whether any unscanned access time record remains; if so, go to step s312; if not, the scan ends;
Step s312: scan an access time record that has not yet been scanned, and judge whether the data area corresponding to the currently scanned access time record meets the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, go to step s41; otherwise, return to step s311;
That is, the access time records are scanned in turn, and each scanned record is judged against the preset aging condition.
Step s41: delete the content of the aging area, after which the process may return to step s311.
It can be understood that, in this embodiment, each time an access time record is scanned and found to meet the preset aging condition, the content of the corresponding data area is deleted immediately, and only then is the next access time record scanned. With this real-time deletion, if the scan is interrupted for some reason and cannot continue, the content of the aging areas found so far has already been deleted, so part of the aging space has been released. The scanning already done is therefore not wasted: even if no further scanning takes place, some storage space has been released and the system can continue to run normally. This way of deleting aging areas therefore improves the reliability of deletion.
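The immediate-deletion flow of Fig. 2 can be sketched as a single loop. The function name and the `(data_area, timestamp)` record shape are assumptions made for illustration only.

```python
def scan_delete_immediately(records, is_aged, delete_area):
    """Steps s311, s312 and s41: delete each aging area as soon as it is found."""
    deleted = []
    for area, ts in records:           # s311: loop while unscanned records remain
        if is_aged(ts):                # s312: check the preset aging condition
            delete_area(area)          # s41: delete right away, then keep scanning
            deleted.append(area)
    return deleted
```

Because deletion happens inside the loop, any area found before an interruption has already been removed, which is the reliability property described above.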
In another embodiment, referring to Fig. 3, which is a flow chart of yet another data aging method based on hive provided by the present invention, the content of the aging areas may instead be deleted as follows: after all access time records have been scanned, all aging areas found by the scan are deleted together. That is, step s3 consists of:
Step s321: judge whether any unscanned access time record remains; if so, go to step s322; if not, go to step s42;
Step s322: scan an access time record that has not yet been scanned, and judge whether the data area corresponding to the currently scanned access time record meets the preset aging condition; if it does, go to step s323; otherwise, return to step s321;
Step s323: the data area corresponding to the currently scanned access time record is an aging area; record the identifier of the aging area, then return to step s321;
Step s42: delete the content of the data areas corresponding to all recorded identifiers.
It can be understood that, in this embodiment, an aging area is not deleted immediately when it is found. Only its identifier is recorded, and scanning continues with the next access time record; once all access time records have been scanned, the data of all recorded aging areas is deleted together. This approach does not need to interrupt the scan to delete content each time an aging area is found, so the program runs more continuously. Moreover, compared with deleting aging areas one by one, the unified deletion of this embodiment only needs to find the start and end addresses of a contiguous region when several aging areas to be deleted are contiguous, and can then delete the whole region at once, so deletion is more efficient.
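The unified-deletion flow of Fig. 3 differs from the immediate variant only in when deletion happens. A minimal sketch, again with hypothetical names and a `(data_area, timestamp)` record shape:

```python
def scan_then_delete(records, is_aged, delete_area):
    """Steps s321 to s323 and s42: record identifiers first, delete after the scan."""
    aging_ids = []
    for area, ts in records:           # s321/s322: scan every access time record
        if is_aged(ts):
            aging_ids.append(area)     # s323: only record the identifier
    for area in aging_ids:             # s42: unified deletion after the scan
        delete_area(area)
    return aging_ids
```

The sketch deletes the recorded areas one after another; merging contiguous areas into a single delete, as the text suggests, would be an optimization of the second loop and is not shown.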
In other embodiments, instead of recording the identifier of an aging area, a label may be added to the aging area itself, and step s42 is adjusted accordingly to: delete all data areas that carry the label. Of course, in other embodiments, other ways of recording which data areas are aging areas may also be used; the present invention does not limit this.
Preferably, the data area is specifically a data table partition of hive.
It can be understood that, in hive application scenarios, data is stored in data tables, but a data table usually contains a very large amount of data, for example a year of data for some project. Because the data volume is so large, a single data table often contains both aging data and active data. Deleting all the data in a table, or retaining all of it, is therefore not very suitable. This embodiment therefore preferably restricts the data area to a data table partition, that is, a part of the content of a data table, for example the data of a certain month or a certain day. The data volume of each data area is then smaller, and it is also easier to judge later whether to delete it. Of course, this is only a preferred embodiment; the data area may also be a hive data table, and the present invention does not limit this.
The operation of judging whether the preset scan condition is met may be performed periodically, for example every 1 s or at even shorter intervals (which can be regarded as real time). Of course, the present invention limits neither the period of this judgment nor whether it is performed periodically at all.
Preferably, the preset scan condition includes a scheduled time; the process of judging whether the preset scan condition is currently met includes:
judging whether the current time meets the scheduled time; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
It can be understood that this embodiment uses timed scanning. The scheduled time may be set to one or several fixed times; when the current time reaches such a fixed time, a scan is triggered. For example, the fixed time may be 12 noon. Of course, the present invention does not limit the specific value of the fixed time. Alternatively, the scheduled time may be a scan period, for example four hours; judging whether the current time meets the scheduled time then means judging whether the time elapsed since the last triggered scan has reached the scan period, and if it has, the preset scan condition is met. The present invention does not limit the specific value of the scan period either. The above are only two specific implementations, given to illustrate the timed scanning used in this embodiment; the present invention does not limit the specific content of the scheduled time.
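The two timed variants can be sketched as small predicates. The function names are hypothetical, and the `last_scan` guard in the fixed-time variant (so that the same fixed time does not trigger twice) is an assumption of this sketch that the patent does not describe.

```python
from datetime import datetime, time, timedelta

def fixed_time_reached(now, scheduled, last_scan):
    """True once `now` has passed today's scheduled time and no scan ran since then."""
    due = datetime.combine(now.date(), scheduled)
    return now >= due and (last_scan is None or last_scan < due)

def period_elapsed(now, last_scan, period):
    """True if at least `period` has passed since the last triggered scan."""
    return last_scan is None or now - last_scan >= period
```

For example, `fixed_time_reached(now, time(12, 0), last_scan)` models the 12 noon case, and `period_elapsed(now, last_scan, timedelta(hours=4))` models the four-hour scan period.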
In another preferred embodiment, the preset scan condition includes that the occupancy of the storage space of the distributed system on which hive runs reaches a preset high-load threshold; the process of judging whether the preset scan condition is currently met includes:
judging whether the occupancy of the storage space of the distributed system on which hive runs currently reaches the preset high-load threshold; if so, the preset scan condition is met; otherwise, the preset scan condition is not met.
It can be understood that this embodiment decides whether to trigger a scan according to the occupancy of the storage space of the distributed system; that is, a scan is triggered when the storage space is under pressure and space needs to be freed. The preset high-load threshold is a relatively high value of storage occupancy. When the occupancy reaches the preset high-load threshold, storage space needs to be released. The preset high-load threshold is preferably not set to one hundred percent: at one hundred percent occupancy the storage space is completely unusable, and because scanning takes a certain amount of time, new data could not be saved during the scan, which would affect normal use of the system. It is therefore preferable to scan before the occupancy reaches one hundred percent; although the occupancy is already very high at that point, the storage can still be used during the scan, which improves the reliability of the distributed system. Of course, the present invention does not limit the specific value of the preset high-load threshold.
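The occupancy check reduces to a ratio comparison. In the sketch below the function name and the default threshold of 0.85 are illustrative assumptions; the patent only states that the threshold should preferably be below one hundred percent. How the used and total byte counts are obtained from the distributed file system is left out.

```python
def usage_reached(used_bytes, total_bytes, high_load_threshold=0.85):
    """True if the storage occupancy has reached the preset high-load threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Trigger below 100% so that writes can still succeed while the scan runs.
    return used_bytes / total_bytes >= high_load_threshold
```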
The preset scan conditions above are only several preferred embodiments. In a specific application, they may be used in combination, or other preset scan conditions may be set as required; the present invention does not limit this.
In addition, the operation of judging whether the preset scan condition is met may be performed in real time or periodically according to a predetermined period; the present invention does not limit this.
Preferably, the preset aging condition includes that the access time record is further from the current time than a preset time threshold.
It can be understood that an access time record shows the time at which the data area was last accessed. Therefore, when the time between the access time record and the current time exceeds the preset time threshold, the corresponding data area has not been accessed for a long time; the data in it is inactive and needs to be deleted. Otherwise, the data in the data area is active and cannot be deleted. This judgment reflects the activity of the data in each data area directly in terms of time, avoids deleting active data, and makes deletion more accurate. The value of the preset time threshold can be set according to the actual situation; the present invention does not limit this.
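A minimal sketch of the threshold-based aging condition, assuming access time records are stored as numeric timestamps in seconds (the function and parameter names are hypothetical):

```python
def aged_by_threshold(access_times, now, max_idle_seconds):
    """Return data areas whose last access is older than the preset time threshold."""
    return [area for area, ts in access_times.items()
            if now - ts > max_idle_seconds]
```

With this condition a scan may legitimately return no aging area at all, namely when every data area was accessed within the threshold.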
In another preferred embodiment, the preset aging condition includes that the access time record is among the K records, out of all access time records, that are furthest from the current time.
It can be understood that this embodiment deletes the K data areas that have gone unaccessed for the longest time among all access time records. This approach is most suitable when the system holds too much data and some must be deleted, yet the access time records of all data areas are fairly close to the current time, because in that case the data in some data areas has to be deleted anyway. This embodiment therefore finds the K least active data areas and deletes them. In this way, whenever the system holds too much data, data areas to delete can always be found and space can be released; the situation in which no space is released after a scan is avoided as far as possible, and the space release operation is more reliable.
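Selecting the K least recently accessed data areas is a standard top-K selection. The sketch below uses `heapq.nsmallest` from the Python standard library; the function name and the dictionary of timestamps are illustrative assumptions.

```python
import heapq

def k_least_recently_accessed(access_times, k):
    """Return the k data areas whose last access lies furthest in the past."""
    # The smallest timestamps are the oldest accesses.
    return heapq.nsmallest(k, access_times, key=access_times.get)
```

Unlike the threshold condition, this selection always returns areas to delete as long as any records exist, which matches the guarantee described above.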
The preset aging conditions above are only several preferred embodiments. In a specific application, they may be used in combination, or other preset aging conditions may be set as required; the present invention does not limit this.
The present invention also provides a data aging device based on hive. Referring to Fig. 4, which is a schematic structural diagram of the device, the device includes:
a time recording module 1, configured to refresh, in the metadata database of hive, the access time record of the data area corresponding to a read/write access when the read/write access to a data area of hive is received;
a scan judgment module 2, configured to judge whether the preset scan condition is currently met and, if it is, to trigger the aging judgment module 3;
an aging judgment module 3, configured to scan the access time records of all data areas of hive, take each data area whose access time record meets the preset aging condition as an aging area, and delete the content of the aging area.
The data aging device based on hive provided by the present invention is used to implement the data aging method based on hive described above; the device therefore corresponds one-to-one to that method.
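One possible way to wire the three modules of Fig. 4 together is sketched below. The class name, method names and callable signatures are all hypothetical, and a dictionary again stands in for the metadata database of hive.

```python
class HiveAgingDevice:
    """Hypothetical wiring of the three modules of Fig. 4."""

    def __init__(self, scan_condition, aging_condition, delete_area):
        self.access_times = {}                  # stand-in for the metadata database
        self.scan_condition = scan_condition    # callable(now) -> bool
        self.aging_condition = aging_condition  # callable(last_access, now) -> bool
        self.delete_area = delete_area          # callable(area)

    def record_access(self, area, now):
        """Time recording module 1: refresh the access time record."""
        self.access_times[area] = now

    def check_scan(self, now):
        """Scan judgment module 2: trigger module 3 only if the condition is met."""
        return self.judge_aging(now) if self.scan_condition(now) else []

    def judge_aging(self, now):
        """Aging judgment module 3: find aging areas and delete their content."""
        aging = [a for a, ts in self.access_times.items()
                 if self.aging_condition(ts, now)]
        for area in aging:
            self.delete_area(area)
        return aging
```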
The present invention also provides data aging equipment based on hive, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the following method when executing the computer program:
when a read/write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read/write access;
judging whether the preset scan condition is currently met;
if the preset scan condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record meets the preset aging condition as an aging area, and deleting the content of the aging area.
In a preferred embodiment, processor executes the tables of data point that data field corresponding to the above method is specially hive
Area.
In a preferred embodiment, when the default condition of scanning in the computer program of the memory storage includes timing
Between;Judge that the process for currently whether meeting the default condition of scanning includes:
Judge whether current time meets timing, otherwise is unsatisfactory for presetting if so, meeting the default condition of scanning
The condition of scanning.
In another preferred embodiment, the preset scanning condition in the computer program stored in the memory includes the occupancy of the storage space of the distributed system on which hive runs reaching a preset high-load threshold; the process of judging whether the preset scanning condition is currently met includes:
judging whether the current occupancy of the storage space of the distributed system on which hive runs has reached the preset high-load threshold; if so, the preset scanning condition is met; otherwise, the preset scanning condition is not met.
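The occupancy check can be sketched as a simple ratio comparison. How the used and total capacity figures are obtained from the distributed system is not described in the text and is left to the caller here; the function name `occupancy_reached` and the example threshold of 0.8 are assumptions.

```python
def occupancy_reached(
    used_bytes: int, capacity_bytes: int, high_load_threshold: float = 0.8
) -> bool:
    """Return True when storage occupancy has reached the high-load threshold.

    The threshold is expressed as a fraction of total capacity (assumed
    convention); 0.8 is only an illustrative default.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes >= high_load_threshold
```

Because the condition is "reaches" rather than "exceeds", the comparison is inclusive: an occupancy exactly equal to the threshold triggers a scan.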
In a preferred embodiment, the preset aging condition in the computer program stored in the memory includes: the time elapsed between the access time record and the current time exceeds a preset time threshold.
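Expressed in code, this aging condition is a single comparison. The function name `exceeds_time_threshold` and the 90-day threshold used in the usage note are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta


def exceeds_time_threshold(
    last_access: datetime, now: datetime, threshold: timedelta
) -> bool:
    """Return True when the data area has been idle for longer than the threshold."""
    return now - last_access > threshold
```

With a threshold of 90 days and a current time of 2018-11-13, a data area last accessed on 2018-01-01 satisfies the condition, while one last accessed on 2018-11-01 does not.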
In another preferred embodiment, the preset aging condition in the computer program stored in the memory includes: the access time record is among the K records, out of all access time records, that are furthest from the current time.
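Selecting the K records furthest from the current time amounts to picking the K smallest timestamps. The sketch below uses Python's standard `heapq.nsmallest`; the function name `k_least_recently_accessed` and the dictionary layout (data area identifier mapped to last access time) are assumptions.

```python
import heapq
from typing import Dict, List


def k_least_recently_accessed(access_times: Dict[str, float], k: int) -> List[str]:
    """Return the identifiers of the K data areas with the oldest access times."""
    if k <= 0:
        return []
    oldest = heapq.nsmallest(k, access_times.items(), key=lambda item: item[1])
    return [area_id for area_id, _ in oldest]
```

Unlike the time-threshold condition, this variant always frees a bounded number of data areas per scan, which may suit the case where the scan is triggered by storage pressure.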
In another preferred embodiment, when the processor executes the computer program, the process of scanning the access time records of all data areas of hive, taking each data area whose access time record satisfies the preset aging condition as an aging area, and deleting the content of the aging area includes:
scanning the access time records one by one and judging whether the currently scanned access time record satisfies the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of that aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
after the access time records of all data areas have been scanned, deleting in one batch the content of the aging areas corresponding to all recorded identifiers.
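The two-phase procedure above (record identifiers during the scan, delete afterwards in one batch) can be sketched as follows. The function name `scan_then_delete` and the injected `is_aged` and `delete_area` callables are hypothetical; how the content is actually removed is not specified in the text.

```python
from typing import Callable, Dict, List


def scan_then_delete(
    access_times: Dict[str, float],
    now: float,
    is_aged: Callable[[float, float], bool],
    delete_area: Callable[[str], None],
) -> List[str]:
    """Scan every access time record, then delete all aging areas together."""
    aged_ids: List[str] = []
    # Phase 1: scan each record and only note the identifiers of aging areas.
    for area_id, last_access in access_times.items():
        if is_aged(last_access, now):
            aged_ids.append(area_id)
    # Phase 2: after the full scan, delete the content of every recorded area.
    for area_id in aged_ids:
        delete_area(area_id)
    return aged_ids
```

Separating the phases means the scan never observes a half-deleted state. If the data areas are table partitions, `delete_area` might issue a HiveQL `ALTER TABLE ... DROP PARTITION` statement, but that mapping is an assumption rather than something the text states.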
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the following method is implemented:
when a read or write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to that access;
judging whether a preset scanning condition is currently met;
if the preset scanning condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record satisfies a preset aging condition as an aging area, and deleting the content of the aging area.
In a preferred embodiment, the data area in the above method executed by the processor is specifically a table partition of hive.
In a preferred embodiment, the preset scanning condition in the computer program includes a timed time (a scheduled scan time); the process of judging whether the preset scanning condition is currently met includes:
judging whether the current time has reached the timed time; if so, the preset scanning condition is met; otherwise, the preset scanning condition is not met.
In another preferred embodiment, the preset scanning condition in the computer program includes the occupancy of the storage space of the distributed system on which hive runs reaching a preset high-load threshold; the process of judging whether the preset scanning condition is currently met includes:
judging whether the current occupancy of the storage space of the distributed system on which hive runs has reached the preset high-load threshold; if so, the preset scanning condition is met; otherwise, the preset scanning condition is not met.
In a preferred embodiment, the preset aging condition in the computer program includes: the time elapsed between the access time record and the current time exceeds a preset time threshold.
In another preferred embodiment, the preset aging condition in the computer program includes: the access time record is among the K records, out of all access time records, that are furthest from the current time.
In another preferred embodiment, when the computer program is executed by the processor, the process of scanning the access time records of all data areas of hive, taking each data area whose access time record satisfies the preset aging condition as an aging area, and deleting the content of the aging area includes:
scanning the access time records one by one and judging whether the currently scanned access time record satisfies the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of that aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
after the access time records of all data areas have been scanned, deleting in one batch the content of the aging areas corresponding to all recorded identifiers.
The specific embodiments above are only preferred embodiments of the present invention and may be combined in any manner; the embodiments obtained by such combination also fall within the protection scope of the present invention. It should be pointed out that other improvements and variations deduced by those of ordinary skill in the art without departing from the spirit and concept of the present invention shall likewise fall within the protection scope of the present invention.
It should also be noted that, in this specification, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or equipment that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or equipment. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or equipment that includes the element.
Claims (10)
1. A hive-based data aging method, characterized by comprising:
when a read or write access to a data area of hive is received, refreshing, in the metadata database of hive, the access time record of the data area corresponding to the read or write access;
judging whether a preset scanning condition is currently met;
if the preset scanning condition is met, scanning the access time records of all data areas of hive, taking each data area whose access time record satisfies a preset aging condition as an aging area, and deleting the content of the aging area.
2. The hive-based data aging method according to claim 1, characterized in that the data area is specifically a table partition of hive.
3. The hive-based data aging method according to claim 1, characterized in that the preset scanning condition includes a timed time; the process of judging whether the preset scanning condition is currently met includes:
judging whether the current time has reached the timed time; if so, the preset scanning condition is met; otherwise, the preset scanning condition is not met.
4. The hive-based data aging method according to claim 1, characterized in that the preset scanning condition includes the occupancy of the storage space of the distributed system on which hive runs reaching a preset high-load threshold; the process of judging whether the preset scanning condition is currently met includes:
judging whether the current occupancy of the storage space of the distributed system on which hive runs has reached the preset high-load threshold; if so, the preset scanning condition is met; otherwise, the preset scanning condition is not met.
5. The hive-based data aging method according to claim 1, characterized in that the preset aging condition includes: the time elapsed between the access time record and the current time exceeds a preset time threshold.
6. The hive-based data aging method according to claim 1, characterized in that the preset aging condition includes: the access time record is among the K records, out of all access time records, that are furthest from the current time.
7. The hive-based data aging method according to any one of claims 1 to 6, characterized in that the process of scanning the access time records of all data areas of hive, taking each data area whose access time record satisfies the preset aging condition as an aging area, and deleting the content of the aging area includes:
scanning the access time records one by one and judging whether the currently scanned access time record satisfies the preset aging condition; if it does, the data area corresponding to the currently scanned access time record is an aging area, the identifier of the aging area is recorded, and scanning continues with the next access time record; otherwise, the next access time record is scanned directly, until the access time records of all data areas have been scanned;
after the access time records of all data areas have been scanned, deleting in one batch the content of the aging areas corresponding to all recorded identifiers.
8. A hive-based data aging device, characterized by comprising:
a time recording module, configured to refresh, in the metadata database of hive, the access time record of the data area corresponding to a read or write access when the read or write access to a data area of hive is received;
a scan judgment module, configured to judge whether a preset scanning condition is currently met and, if the preset scanning condition is met, to trigger an aging judgment module;
the aging judgment module, configured to scan the access time records of all data areas of hive, take each data area whose access time record satisfies a preset aging condition as an aging area, and delete the content of the aging area.
9. Hive-based data aging equipment, characterized by comprising:
a memory, for storing a computer program;
a processor, which implements the steps of the hive-based data aging method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the hive-based data aging method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811346834.2A CN109460411A (en) | 2018-11-13 | 2018-11-13 | A kind of data aging method based on hive, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109460411A (en) | 2019-03-12 |
Family
ID=65610265
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811346834.2A Pending CN109460411A (en) | 2018-11-13 | 2018-11-13 | A kind of data aging method based on hive, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109460411A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1581100A (en) * | 2003-07-31 | 2005-02-16 | 华为技术有限公司 | Data aging method for network processor |
CN105354193A (en) * | 2014-08-19 | 2016-02-24 | 阿里巴巴集团控股有限公司 | Caching method, query method, caching apparatus and query apparatus for database data |
CN106354779A (en) * | 2016-08-23 | 2017-01-25 | 成都卡莱博尔信息技术股份有限公司 | Data management system for trunking architecture |
CN107562889A (en) * | 2017-09-05 | 2018-01-09 | 郑州云海信息技术有限公司 | A kind of metadata aging method and device |
2018-11-13: CN application CN201811346834.2A, published as CN109460411A (en), status: Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457300A (en) * | 2019-07-15 | 2019-11-15 | 中国平安人寿保险股份有限公司 | A kind of method for cleaning and device, electronic equipment in common test library |
CN110457300B (en) * | 2019-07-15 | 2024-02-02 | 中国平安人寿保险股份有限公司 | Method and device for cleaning public test library and electronic equipment |
CN113434492A (en) * | 2021-06-21 | 2021-09-24 | 青岛海尔科技有限公司 | Data detection method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190312 |