CN113282235A - Method and system for dynamically processing data set based on shift-out in cache - Google Patents


Info

Publication number
CN113282235A
Authority
CN
China
Prior art keywords
data
association
compression
level
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110406895.9A
Other languages
Chinese (zh)
Inventor
Inventor name not published
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110406895.9A
Publication of CN113282235A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Abstract

The invention discloses a method and a system for dynamically processing a data group moved out of a cache. The method comprises: counting, in real time, the number of accesses to each data group in the cache; determining a data group to be moved out of the cache into a first memory and a plurality of associated data groups; and setting an association level for each associated data group according to its degree of association with the data group to be moved out. Each data segment is scanned to determine a current data segment and a current data area. The compression rate and compression level of each associated data area in the current data segment are determined according to the highest association level of the associated data groups it contains, and the current data segment is compressed according to these compression levels and compression rates. At least one associated data segment is then determined, the compression rate and compression level of each associated data area in it are determined, and each associated data segment is compressed according to its compression levels and compression rates.

Description

Method and system for dynamically processing data set based on shift-out in cache
Divisional application
This application is a divisional of patent application No. 2018106245462, filed on 16 June 2018 and entitled "Method and system for dynamically processing a data set based on cache eviction".
Technical Field
The present invention relates to the field of data processing on mobile devices, and more particularly to a method and system for dynamically processing a data set moved out of a cache.
Background
At present, as mobile terminals such as mobile phones become ever more widely used, their processors, memories, cameras and other components have improved greatly. However, as users demand ever higher running speed, image quality and the like from applications, the processing and storage resources that applications occupy also grow. Therefore, in addition to improving components such as processors, memories and cameras, the access performance of data in mobile terminals also needs to be improved in order to raise their data processing capability.
Disclosure of Invention
According to an aspect of the present invention, there is provided a method for dynamically processing a data group moved out of a cache, the method comprising:
counting, in real time, the number of accesses to each of a plurality of data groups in a cache of a processor of a mobile terminal, and determining a data group whose number of accesses within a predetermined time period is lower than a first count threshold as a data group to be moved out of the cache into a first memory;
determining, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that are stored in the first memory and need to run in association with the data group to be moved out when it runs, and setting an association level for each associated data group according to its degree of association with the data group to be moved out, wherein the association levels comprise a high association level, a medium association level and a low association level;
scanning each of a plurality of data segments in the first memory to determine the number of associated data groups contained in each data segment, and determining as the current data segment the data segment that contains the largest number of associated data groups among those data segments whose remaining space can accommodate the data group to be moved out (starting from the data segment with the most associated data groups and continuing until a data segment with sufficient remaining space is found);
determining, among a plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out, and moving the data group to be moved out from the cache into the current data area of the current data segment;
scanning all data areas in the current data segment other than the current data area, determining each such data area that contains at least one associated data group as an associated data area, and determining the compression rate and compression level of each associated data area according to the highest association level among the associated data groups in that area, wherein the compression rates comprise a high compression rate, a medium compression rate and a low compression rate, whose degree of compression increases in that order (that is, the high compression rate compresses least); and wherein the compression levels comprise a first, a second, a third and a fourth compression level, whose precedence in the compression order decreases in that order;
setting the current data area to the high compression rate and marking it as the first compression level, wherein determining the compression rate and compression level of each associated data area according to the highest association level among its associated data groups comprises: when the highest association level is the high, medium or low association level, setting the compression rate of the associated data area to the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is high as the second compression level, one whose highest association level is medium as the third compression level, and one whose highest association level is low as the fourth compression level;
performing compression within the current data segment according to compression level and compression rate (proceeding either from low addresses to high addresses or, in an alternative embodiment, from high addresses to low addresses):
first, compressing the data group in the current data area, marked as the first compression level, at the high compression rate;
then, compressing the data groups in the associated data areas marked as the second compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked as the third compression level at the medium compression rate;
finally, compressing the data groups in the associated data areas marked as the fourth compression level at the low compression rate;
while the current data segment is being compressed according to compression level and compression rate, determining at least one associated data segment, namely a data segment other than the current data segment that contains associated data groups, wherein the associated data groups are stored in at least one associated data area of each associated data segment;
determining the compression rate and compression level of each associated data area of each associated data segment according to the highest association level among the associated data groups in that area, which comprises: when the highest association level of the associated data area is the high, medium or low association level, marking its compression rate as the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is high as the second compression level, and one whose highest association level is medium or low as the third compression level;
in response to completion of compression by compression level and compression rate within the current data segment, performing compression by compression level and compression rate within the at least one associated data segment (the associated data segments may be processed in parallel or serially):
first, compressing the associated data areas marked as the second compression level in each associated data segment at the high compression rate;
then, compressing simultaneously the associated data areas in each associated data segment that are marked as the third compression level with the medium compression rate, at the medium compression rate, and those marked as the third compression level with the low compression rate, at the low compression rate.
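The level and rate assignment in the claim above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `DataArea`, `plan_segment`, `CURRENT_SEGMENT_MAP` and `ASSOCIATED_SEGMENT_MAP` are hypothetical, list order stands in for address order, and the sketch only computes the order and rate at which areas would be compressed; it does not compress any data.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: highest association level in an area -> (compression rate, compression level),
# as described for the current data segment.
CURRENT_SEGMENT_MAP = {"high": ("high", 2), "medium": ("medium", 3), "low": ("low", 4)}
# In associated data segments, medium and low association both map to level 3,
# each keeping its own compression rate.
ASSOCIATED_SEGMENT_MAP = {"high": ("high", 2), "medium": ("medium", 3), "low": ("low", 3)}
LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class DataArea:
    name: str                                          # hypothetical area identifier
    assoc_levels: list = field(default_factory=list)   # levels of the associated groups it holds
    is_current: bool = False                           # True for the area receiving the moved-out group


def plan_segment(areas, mapping):
    """Return (area, rate, level) tuples in the order the areas would be compressed."""
    plan = []
    for area in areas:  # list order stands in for address order
        if area.is_current:
            plan.append((area.name, "high", 1))
        elif area.assoc_levels:
            highest = max(area.assoc_levels, key=LEVEL_RANK.__getitem__)
            rate, level = mapping[highest]
            plan.append((area.name, rate, level))
        # areas holding no associated data group are left out of this pass
    # sorted() is stable, so areas sharing a level keep their address order
    return sorted(plan, key=lambda entry: entry[2])


if __name__ == "__main__":
    current = [DataArea("A0", is_current=True), DataArea("A1", ["low", "medium"]),
               DataArea("A2", ["high"]), DataArea("A3")]
    print(plan_segment(current, CURRENT_SEGMENT_MAP))
    # [('A0', 'high', 1), ('A2', 'high', 2), ('A1', 'medium', 3)]
```

Sorting by level reproduces the first-to-fourth sequence. In an associated data segment, the level-3 entries (medium rate and low rate) are the ones the text allows to be compressed at the same time.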
The method further comprises: counting, in real time, the number of accesses to each of the plurality of data groups in each data segment of the first memory, and determining a data group whose number of accesses within the predetermined time period is higher than a second count threshold as a data group to be moved in.
When the ratio of remaining storage space in the cache is higher than a move-in threshold, the at least one data group to be moved in is arranged into a queue in increasing order of storage size, and, starting from the data group with the smallest storage size, the data groups are moved into the cache in queue order until moving the next data group in would make the ratio of remaining cache storage space lower than a move-out threshold; wherein the move-in threshold is greater than the move-out threshold.
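The move-in queue can be sketched as below. This is a minimal sketch under assumptions: `select_move_in` is a hypothetical name, sizes and capacities share one unit, and "ratio of remaining storage space" is taken to be free space divided by total cache capacity.

```python
def select_move_in(candidates, capacity, free, move_in_threshold, move_out_threshold):
    """Pick data groups to move into the cache, smallest first.

    candidates: list of (group_id, size) pairs for the data groups to be moved in.
    capacity, free: total and remaining cache space, in the same (assumed) unit.
    """
    if move_in_threshold <= move_out_threshold:
        raise ValueError("the move-in threshold must exceed the move-out threshold")
    if free / capacity <= move_in_threshold:
        return []  # free-space ratio is not above the move-in threshold
    selected = []
    for group_id, size in sorted(candidates, key=lambda c: c[1]):  # queue in increasing size
        if (free - size) / capacity < move_out_threshold:
            break  # this move would push the free ratio below the move-out threshold
        selected.append(group_id)
        free -= size
    return selected


if __name__ == "__main__":
    print(select_move_in([("a", 30), ("b", 10), ("c", 15)], 100, 60, 0.5, 0.2))  # ['b', 'c']
```

Stopping at the first group that does not fit, rather than skipping it and trying later ones, follows the queue-order wording of the text.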
wherein the second count threshold is greater than the first count threshold, or the second count threshold is less than the first count threshold;
wherein the predetermined time period is a period that ends at the current time and starts at a past time;
wherein the length of the predetermined time period is determined according to a system configuration or a user setting;
the cache is a cache memory inside or outside the processor;
the number of accesses is the number of times each data set is accessed by the processor.
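The access counting over a predetermined period ending at the current time can be sketched as a sliding-window counter. This is an illustrative sketch only: `AccessCounter`, `to_move_out` and `to_move_in` are hypothetical names, timestamps are assumed to be recorded in non-decreasing order, and the window length is passed in rather than read from a real system configuration.

```python
import time
from collections import defaultdict, deque


class AccessCounter:
    """Per-group access counts over a sliding window that ends at the current time."""

    def __init__(self, window):
        self.window = window              # window length (from system config or user setting)
        self.events = defaultdict(deque)  # group_id -> access timestamps, oldest first

    def record(self, group_id, now=None):
        self.events[group_id].append(time.monotonic() if now is None else now)

    def count(self, group_id, now):
        events = self.events[group_id]
        while events and events[0] <= now - self.window:
            events.popleft()  # drop accesses that fell out of the window
        return len(events)

    def to_move_out(self, cached_groups, first_threshold, now):
        # cached groups accessed fewer times than the first count threshold
        return [g for g in cached_groups if self.count(g, now) < first_threshold]

    def to_move_in(self, memory_groups, second_threshold, now):
        # first-memory groups accessed more times than the second count threshold
        return [g for g in memory_groups if self.count(g, now) > second_threshold]


if __name__ == "__main__":
    counter = AccessCounter(window=10)
    for t in (8, 9, 15):
        counter.record("x", now=t)
    counter.record("y", now=1)
    print(counter.to_move_out(["x", "y"], first_threshold=2, now=16))  # ['y']
    print(counter.to_move_in(["x", "y"], second_threshold=2, now=16))  # ['x']
```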
When several data segments have the largest number of associated data groups and have remaining space that can accommodate the data group to be moved out, one of them is selected at random as the current data segment; or
when several data segments have the largest number of associated data groups and have remaining space that can accommodate the data group to be moved out, the one with the largest remaining space is selected as the current data segment; or
when several data segments have the largest number of associated data groups and have remaining space that can accommodate the data group to be moved out, the one with the smallest remaining space is selected as the current data segment.
When the remaining space of the data segment with the largest number of associated data groups cannot accommodate the data group to be moved out, the data segment with the second-largest number of associated data groups is selected and checked in the same way, and so on, until the data segment with the largest number of associated data groups among those whose remaining space can accommodate the data group to be moved out is found.
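The segment selection with its three tie-breaking options can be sketched as follows. `Segment` and `choose_segment` are hypothetical names, and returning `None` when no segment has room is an assumption, since the text does not cover that case. Filtering by remaining space first and then taking the highest association count yields the same segment as walking down from the most-associated segment.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    name: str         # hypothetical segment identifier
    assoc_count: int  # number of associated data groups the segment already holds
    free: int         # remaining space, same unit as the group size


def choose_segment(segments, group_size, tie_break="largest_free"):
    fitting = [s for s in segments if s.free >= group_size]
    if not fitting:
        return None  # assumption: behaviour when nothing fits is not specified in the text
    best = max(s.assoc_count for s in fitting)
    tied = [s for s in fitting if s.assoc_count == best]
    if tie_break == "random":
        return random.choice(tied)
    if tie_break == "largest_free":
        return max(tied, key=lambda s: s.free)
    if tie_break == "smallest_free":
        return min(tied, key=lambda s: s.free)
    raise ValueError(f"unknown tie_break: {tie_break}")


if __name__ == "__main__":
    segs = [Segment("S1", 5, 30), Segment("S2", 5, 40), Segment("S3", 7, 5), Segment("S4", 3, 100)]
    print(choose_segment(segs, 20).name)  # S2 (S3 has more associations but too little room)
```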
When it is detected that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, a plurality of applications to be loaded on the mobile terminal are determined according to a preset loading configuration file, and the file package associated with each of these applications is copied from a second memory into the first memory.
The first memory is a volatile memory and the second memory is a non-volatile memory.
After the operating system has started and before the applications to be loaded are determined according to the preset loading configuration file, a plurality of data segments for storing data are created in the first memory, each data segment comprising a plurality of data areas.
The file package associated with each application includes at least one data group, and in the first memory or the cache the data group is the basic storage unit for storing data.
Any data group is stored in a single data area of a data segment, and a single data area can store at least one data group.
The second memory includes a plurality of compressed data segments for storing compressed data therein, wherein each compressed data segment includes a plurality of compressed data regions, and each compressed data region includes a plurality of sub-regions.
The file package associated with each application includes at least one compressed data group, and in the second memory the compressed data group is the basic storage unit for compressed storage;
any compressed data group is stored in a single data area of a data segment, and a single data area can store at least one data group.
Determining, among the plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out comprises:
randomly allocating one of the data areas of the current data segment to the data group to be moved out as the current data area; or
calculating a hash value of the identifier of the data group to be moved out and selecting one of the data areas of the current data segment as the current data area according to the hash value; or
taking the data area with the largest ratio of remaining storage space among the data areas of the current data segment as the current data area; or
taking the data area with the largest remaining storage space among the data areas of the current data segment as the current data area.
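The four data-area allocation options can be sketched in one function. This is a sketch under assumptions: `choose_area` is a hypothetical name, areas are `(area_id, capacity, free)` tuples, and SHA-256 modulo the area count is one possible way to select "according to the hash value", not necessarily the patent's. Like the text, the random and hash options do not check remaining space.

```python
import hashlib
import random


def choose_area(areas, group_id, strategy):
    """areas: list of (area_id, capacity, free) tuples; returns the chosen area_id."""
    if strategy == "random":
        return random.choice(areas)[0]
    if strategy == "hash":
        # hashlib gives a digest that is stable across runs, unlike Python's salted hash() for str
        digest = hashlib.sha256(group_id.encode("utf-8")).digest()
        return areas[int.from_bytes(digest[:8], "big") % len(areas)][0]
    if strategy == "max_free_ratio":
        return max(areas, key=lambda a: a[2] / a[1])[0]
    if strategy == "max_free":
        return max(areas, key=lambda a: a[2])[0]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    areas = [("r0", 100, 50), ("r1", 200, 80), ("r2", 50, 40)]
    print(choose_area(areas, "group-7", "max_free_ratio"))  # r2 (40/50 is the highest ratio)
    print(choose_area(areas, "group-7", "max_free"))        # r1
```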
After the applications to be loaded are determined according to the preset loading configuration file, an association statistics file is copied from the second memory into the first memory; the file contains a plurality of pieces of association statistical information, each indicating the plurality of associated data groups of one data group.
Determining the content association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of content association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein the content association degree between any two data groups is determined from the matching value obtained by content matching of the summary information of the two data groups.
The preset selection rule is: the content association degree in the ordered list is greater than an association-degree threshold, or the content association degree ranks before a predetermined rank in the ordered list.
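The ranking and selection rule for content association can be sketched as below. The text does not define how summary information is matched, so this sketch assumes, purely for illustration, that each summary is a set of keywords and that the matching value is their Jaccard similarity; `content_association` and `select_by_content` are hypothetical names.

```python
def content_association(summary_a, summary_b):
    """Hypothetical matching value: Jaccard similarity of two keyword sets."""
    a, b = set(summary_a), set(summary_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_by_content(current, summaries, threshold=None, top_k=None):
    ranked = sorted(
        ((g, content_association(summaries[current], s))
         for g, s in summaries.items() if g != current),
        key=lambda item: item[1],
        reverse=True,  # descending order of content association degree
    )
    if threshold is not None:  # rule 1: degree above an association-degree threshold
        return [g for g, score in ranked if score > threshold]
    if top_k is not None:      # rule 2: ranked before a predetermined rank
        return [g for g, _ in ranked[:top_k]]
    raise ValueError("provide either threshold or top_k")


if __name__ == "__main__":
    summaries = {"g0": {"camera", "image", "jpeg"}, "g1": {"camera", "image"},
                 "g2": {"audio"}, "g3": {"image", "png"}}
    print(select_by_content("g0", summaries, threshold=0.2))  # ['g1', 'g3']
```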
Determining the operation association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of operation association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein operation history information of each data group is obtained, statistics of the data groups run in each basic time unit are determined from the operation history information, the number of times any two data groups run in the same basic time unit is determined, and the operation association degree between the two data groups is determined based on that number.
The preset selection rule is: the operation association degree in the ordered list is greater than an association-degree threshold, or the operation association degree ranks before a predetermined rank in the ordered list.
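Counting how often two data groups run in the same basic time unit can be sketched as below. Assumptions: the operation history is a list of `(timestamp, group_id)` run events, basic time units are fixed-length buckets starting at time 0, and the raw co-occurrence count is used directly as the operation association degree (the text only says the degree is "based on" this count). `operation_association` is a hypothetical name; ranking then proceeds as for content association.

```python
from collections import defaultdict
from itertools import combinations


def operation_association(history, unit):
    """history: iterable of (timestamp, group_id) run events; unit: basic time unit length.

    Returns {frozenset({a, b}): number of basic time units in which both a and b ran}.
    """
    groups_per_unit = defaultdict(set)
    for ts, group in history:
        groups_per_unit[int(ts // unit)].add(group)  # bucket each run into its time unit
    counts = defaultdict(int)
    for groups in groups_per_unit.values():
        for a, b in combinations(sorted(groups), 2):
            counts[frozenset((a, b))] += 1
    return dict(counts)


if __name__ == "__main__":
    history = [(0, "a"), (1, "b"), (5, "a"), (12, "a"), (14, "b"), (15, "c"), (25, "c")]
    degrees = operation_association(history, unit=10)
    print(degrees[frozenset(("a", "b"))])  # 2 (units 0 and 1)
```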
Determining the feedback association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of feedback association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein the initial feedback association degree between any two data groups is set to 0, an association-degree rule preset by the user is parsed to determine, among the plurality of data groups, at least one data-group pair whose feedback association degree is to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule.
The preset selection rule is: the feedback association degree in the ordered list is greater than an association-degree threshold, or the feedback association degree ranks before a predetermined rank in the ordered list.
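The feedback association table can be sketched as follows. The format of the user's association-degree rule is not given in the text, so this sketch assumes the rule has already been parsed into hypothetical `(group_a, group_b, degree)` triples; `feedback_association` is likewise a hypothetical name.

```python
from itertools import combinations


def feedback_association(groups, rules):
    """groups: data group ids; rules: hypothetical parsed user rules (group_a, group_b, degree)."""
    # Every pair starts at 0, as the text specifies.
    degrees = {frozenset(pair): 0.0 for pair in combinations(sorted(set(groups)), 2)}
    for a, b, value in rules:
        pair = frozenset((a, b))
        if pair in degrees:  # skip rules naming unknown groups or the same group twice
            degrees[pair] = float(value)
    return degrees


if __name__ == "__main__":
    d = feedback_association(["a", "b", "c"], [("a", "c", 0.9), ("x", "a", 0.5)])
    print(d[frozenset(("a", "c"))], d[frozenset(("a", "b"))])  # 0.9 0.0
```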
Determining the comprehensive association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of comprehensive association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein content matching is performed on the summary information of any two data groups to determine the content association degree between them;
operation history information of each data group is obtained, statistics of the data groups run in each basic time unit are determined from the operation history information, the number of times any two data groups run in the same basic time unit is determined, and the operation association degree between the two data groups is determined based on that number;
and the content association degree and the operation association degree between the current data group and each other data group are combined by weighted calculation to determine the comprehensive association degree.
The preset selection rule is: the comprehensive association degree in the ordered list is greater than an association-degree threshold, or the comprehensive association degree ranks before a predetermined rank in the ordered list.
Determining the comprehensive association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of comprehensive association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein content matching is performed on the summary information of any two data groups to determine the content association degree between them;
the initial feedback association degree between any two data groups is set to 0, an association-degree rule preset by the user is parsed to determine, among the plurality of data groups, at least one data-group pair whose feedback association degree is to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule;
and the content association degree and the feedback association degree between the current data group and each other data group are combined by weighted calculation to determine the comprehensive association degree.
The preset selection rule is: the comprehensive association degree in the ordered list is greater than an association-degree threshold, or the comprehensive association degree ranks before a predetermined rank in the ordered list.
Determining the comprehensive association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of comprehensive association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein operation history information of each data group is obtained, statistics of the data groups run in each basic time unit are determined from the operation history information, the number of times any two data groups run in the same basic time unit is determined, and the operation association degree between the two data groups is determined based on that number;
the initial feedback association degree between any two data groups is set to 0, an association-degree rule preset by the user is parsed to determine, among the plurality of data groups, at least one data-group pair whose feedback association degree is to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule;
and the operation association degree and the feedback association degree between the current data group and each other data group are combined by weighted calculation to determine the comprehensive association degree.
The preset selection rule is: the comprehensive association degree in the ordered list is greater than an association-degree threshold, or the comprehensive association degree ranks before a predetermined rank in the ordered list.
Determining the comprehensive association degree between the current data group and each other data group in the plurality of data groups, arranging the other data groups in descending order of comprehensive association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
wherein content matching is performed on the summary information of any two data groups to determine the content association degree between them;
operation history information of each data group is obtained, statistics of the data groups run in each basic time unit are determined from the operation history information, the number of times any two data groups run in the same basic time unit is determined, and the operation association degree between the two data groups is determined based on that number;
the initial feedback association degree between any two data groups is set to 0, an association-degree rule preset by the user is parsed to determine, among the plurality of data groups, at least one data-group pair whose feedback association degree is to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule;
and the content association degree, the operation association degree and the feedback association degree between the current data group and each other data group are combined by weighted calculation to determine the comprehensive association degree.
The preset selection rule is: the comprehensive association degree in the ordered list is greater than an association-degree threshold, or the comprehensive association degree ranks before a predetermined rank in the ordered list.
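The weighted comprehensive association degree, covering both the two-of-three and all-three combinations above, can be sketched as a weighted sum followed by ranking. Assumptions: `comprehensive_degree` and `rank_by_comprehensive` are hypothetical names, the weights are free parameters the text does not fix, and each component degree is assumed to be pre-normalised to a comparable scale such as [0, 1] (raw co-occurrence counts, for example, would need normalising first).

```python
def comprehensive_degree(components, weights):
    """Weighted sum of the degree types an embodiment combines.

    components: e.g. {"content": 0.8, "operation": 0.2, "feedback": 0.0}
    weights: only the types being combined, e.g. {"content": 0.6, "operation": 0.4}
    """
    return sum(weight * components[name] for name, weight in weights.items())


def rank_by_comprehensive(candidates, weights, threshold=None, top_k=None):
    ranked = sorted(
        ((g, comprehensive_degree(c, weights)) for g, c in candidates.items()),
        key=lambda item: item[1],
        reverse=True,  # descending order of comprehensive association degree
    )
    if threshold is not None:
        return [g for g, score in ranked if score > threshold]
    if top_k is not None:
        return [g for g, _ in ranked[:top_k]]
    raise ValueError("provide either threshold or top_k")


if __name__ == "__main__":
    candidates = {
        "g1": {"content": 0.8, "operation": 0.2, "feedback": 0.0},
        "g2": {"content": 0.1, "operation": 0.9, "feedback": 1.0},
        "g3": {"content": 0.3, "operation": 0.3, "feedback": 0.0},
    }
    print(rank_by_comprehensive(candidates, {"content": 0.6, "operation": 0.4}, top_k=3))
    # ['g1', 'g2', 'g3']  (scores 0.56, 0.42, 0.30)
```

Changing which keys appear in `weights` switches between the content+operation, content+feedback, operation+feedback and three-way embodiments.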
Setting an association level for each associated data group according to the association degree between that associated data group and the data group to be moved out comprises:
when the association degree between the associated data group and the data group to be moved out is greater than or equal to a first association degree threshold, setting the association level of the associated data group to a high association level;
when the association degree is smaller than the first association degree threshold and greater than or equal to a second association degree threshold, setting the association level of the associated data group to a medium association level; and
when the association degree is smaller than the second association degree threshold, setting the association level of the associated data group to a low association level.
The first association degree threshold is greater than the second association degree threshold.
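The two-threshold classification can be sketched as follows. The enum and function names are illustrative assumptions, and association degrees are assumed to be plain floats on a common scale.

```python
from enum import Enum


class AssociationLevel(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def association_level(degree: float, first_threshold: float,
                      second_threshold: float) -> AssociationLevel:
    """Map an association degree to a level using two thresholds (first > second)."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be greater than the second")
    if degree >= first_threshold:
        return AssociationLevel.HIGH
    if degree >= second_threshold:
        return AssociationLevel.MEDIUM
    return AssociationLevel.LOW
```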
Alternatively, an association level is set for each associated data group in the association statistical information based on the number of associated runs, the associated running time and the number of synchronous starts between that associated data group and the data group to be moved out.
To this end, the operation history information of each data group is acquired, and the number of associated runs, the associated running time and the number of synchronous starts between each associated data group and the data group to be moved out are determined from it.
The number of associated runs is the number of times the two data groups run in association within a statistical time period;
the associated running time is the length of time for which the two data groups run in association within the statistical time period;
the number of synchronous starts is the number of times the two data groups start synchronously within the statistical time period;
running in association means that the difference between the times at which the two data groups are respectively called or started is greater than a first predetermined time interval and less than a second predetermined time interval;
starting synchronously means that the difference between the times at which the two data groups are respectively called or started is less than or equal to the first predetermined time interval.
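The following sketch counts associated runs and synchronous starts for one pair of data groups from their start (or call) timestamps within a statistical time period. How start events are paired is not specified in the description; this sketch assumes each start of the first group is compared with the nearest start of the second group. The associated running time is omitted because its measurement is not defined here.

```python
from bisect import bisect_left
from typing import List, Tuple


def pair_start_stats(starts_a: List[float], starts_b: List[float],
                     t1: float, t2: float) -> Tuple[int, int]:
    """Return (associated_runs, synchronous_starts) for two data groups.

    t1 and t2 are the first and second predetermined time intervals (t1 < t2).
    Assumption: each start of group A is paired with the nearest start of group B.
    """
    if not 0 <= t1 < t2:
        raise ValueError("require 0 <= t1 < t2")
    b_sorted = sorted(starts_b)
    associated = synchronous = 0
    for a in starts_a:
        i = bisect_left(b_sorted, a)
        candidates = b_sorted[max(i - 1, 0):i + 1]  # neighbours on either side of a
        if not candidates:
            continue
        gap = min(abs(a - b) for b in candidates)
        if gap <= t1:
            synchronous += 1   # synchronous start
        elif gap < t2:
            associated += 1    # associated run
    return associated, synchronous
```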
Determining, based on the association statistical information of the data group to be moved out, the associated data groups that need to run in association with it includes:
determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to run in association with it and whose association level is the high association level.
Alternatively, determining, based on the association statistical information of the data group to be moved out, the associated data groups that need to run in association with it includes:
determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to run in association with it and whose association level is the medium association level.
Alternatively, determining, based on the association statistical information of the data group to be moved out, the associated data groups that need to run in association with it includes:
determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to run in association with it and whose association level is the high or the medium association level.
Each associated data area contains at least one associated data group.
For example, the high compression rate may be 90%, the medium compression rate 80% and the low compression rate 70%; in practice, each may be a range of compression rates.
Compressing the data groups in the associated data areas marked with the second compression level comprises: compressing the data groups in the plurality of associated data areas marked with the second compression level at the high compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
Compressing the data groups in the associated data areas marked with the third compression level comprises: compressing the data groups in the plurality of associated data areas marked with the third compression level at the medium compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
Compressing the data groups in the associated data areas marked with the fourth compression level comprises: compressing the data groups in the plurality of associated data areas marked with the fourth compression level at the low compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
Compressing, at the high compression rate, the associated data areas marked with the second compression level within each associated compressed data segment comprises:
compressing the associated data areas marked with the second compression level in each associated compressed data segment at the high compression rate, with the plurality of associated compressed data segments processed serially or in parallel.
Compressing, at the medium compression rate, the associated data areas marked with the third compression level and the medium compression rate within each associated compressed data segment comprises:
compressing those associated data areas in each associated compressed data segment at the medium compression rate, with the plurality of associated compressed data segments processed serially or in parallel.
Compressing, at the low compression rate, the associated data areas marked with the third compression level and the low compression rate within each associated compressed data segment comprises:
compressing those associated data areas in each associated compressed data segment at the low compression rate, with the plurality of associated compressed data segments processed serially or in parallel.
Moving the compressed data groups generated by compressing the data groups in the current data area of the current data segment into a first buffer area of the first memory; and
moving the compressed data groups generated by compressing the data groups in each associated data area of the current data segment into a second buffer area of the first memory.
After the compressed data groups are moved into the first buffer area of the first memory, the storage space of the current data area is initialized.
After the compressed data groups are moved into the second buffer area of the first memory, the storage space of each associated data area is initialized.
A data segment containing at least one associated data group is selected as an associated data segment, and a data area within the associated data segment that contains at least one associated data group is determined as an associated data area.
The compressed data groups generated by compressing the data groups in each of the at least one associated data area of each associated data segment are moved into the second buffer area of the first memory.
After the compressed data groups are moved into the second buffer area of the first memory, the storage space of each of the at least one associated data area within each associated data segment is initialized.
Determining, in each third predetermined time interval, the number of times each data group in the first memory is accessed, and moving the data groups whose number of accesses in the current third predetermined time interval is higher than a third count threshold into the cache;
or determining, in each third predetermined time interval, the number of times each compressed data group in the first memory is accessed, and decompressing the compressed data groups whose number of accesses in the current third predetermined time interval is higher than the third count threshold and moving them into the cache.
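One possible reading of the interval-based move-in step is sketched below, assuming the first memory and the cache are modelled as dictionaries keyed by data group and using zlib purely as a stand-in codec. The helper name, the dictionary layout and the choice to remove a promoted group from the first memory are all assumptions.

```python
import zlib
from typing import Dict, Set


def promote_hot_groups(access_counts: Dict[str, int], first_memory: Dict[str, bytes],
                       compressed: Set[str], cache: Dict[str, bytes],
                       third_threshold: int) -> None:
    """Hypothetical end-of-interval step: move frequently accessed groups into the cache."""
    for group, count in access_counts.items():
        if count > third_threshold and group in first_memory:
            data = first_memory.pop(group)
            # Compressed data groups are decompressed before they enter the cache.
            cache[group] = zlib.decompress(data) if group in compressed else data
            compressed.discard(group)
    access_counts.clear()  # counting restarts for the next third predetermined interval
```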
According to another aspect of the present invention, there is provided a system for dynamically processing data groups moved out of a cache, the system comprising:
a statistical unit, configured to count in real time the number of times each of a plurality of data groups in a cache of a processor in a mobile terminal is accessed, and to determine a data group whose number of accesses within a predetermined time period is lower than a first count threshold as a data group to be moved out, that is, to be moved from the cache to a first memory;
an association unit, configured to determine, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that are stored in the first memory and need to run in association with it, and to set an association level for each associated data group according to its association degree with the data group to be moved out, the association levels comprising a high association level, a medium association level and a low association level;
a first scanning unit, configured to scan each of a plurality of data segments in the first memory to determine the number of associated data groups contained in each data segment, and to determine, as the current data segment, the data segment containing the largest number of associated data groups among those data segments whose remaining space can accommodate the data group to be moved out (candidate segments are examined in descending order of the number of associated data groups until a suitable one is found);
a moving unit, configured to determine, among a plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out, and to move the data group to be moved out from the cache into the current data area of the current data segment;
a second scanning unit, configured to scan all data areas in the current data segment other than the current data area, to determine those containing at least one associated data group as associated data areas, and to determine the compression rate and compression level of each associated data area according to the highest association level among the associated data groups in that area, wherein the compression rates comprise a high compression rate, a medium compression rate and a low compression rate, whose degrees of compression increase in that order; and the compression levels comprise a first, a second, a third and a fourth compression level, whose compression priority decreases in that order (the first compression level is compressed first);
a first setting unit, configured to set the current data area to the high compression rate and mark it with the first compression level, wherein determining the compression rate and compression level of each associated data area according to the highest association level among the associated data groups in that area comprises: when the highest association level is high, medium or low, setting the compression rate of the associated data area to the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is high with the second compression level, one whose highest association level is medium with the third compression level, and one whose highest association level is low with the fourth compression level;
a first compression unit, configured to compress within the current data segment by compression level and compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses:
first, compressing the data groups in the current data area marked with the first compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked with the second compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked with the third compression level at the medium compression rate;
finally, compressing the data groups in the associated data areas marked with the fourth compression level at the low compression rate;
a third scanning unit, configured to determine, while the current data segment is being compressed by compression level and compression rate, at least one associated data segment, that is, a data segment other than the current data segment that contains associated data groups, the associated data groups being stored in at least one associated data area of each associated data segment;
a second setting unit, configured to determine the compression rate and compression level of each associated data area in each associated data segment according to the highest association level among the associated data groups in that area, comprising: when the highest association level of the associated data area is high, medium or low, marking its compression rate as the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is high with the second compression level, and one whose highest association level is medium or low with the third compression level;
a second compression unit, configured to compress, in response to completion of compression by compression level and compression rate within the current data segment, within the at least one associated compressed data segment by compression level and compression rate (the associated data segments being processed in parallel or serially):
first, compressing the associated data areas marked with the second compression level in each associated compressed data segment at the high compression rate;
then, compressing the associated data areas in each associated compressed data segment that are marked with the third compression level and the medium compression rate at the medium compression rate, while simultaneously compressing those marked with the third compression level and the low compression rate at the low compression rate.
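The mapping from highest association level to compression rate and compression level, together with level-ordered compression and buffer routing, can be sketched as below. Assumptions: zlib stands in for the unspecified compressor; the three compression rates are mapped onto zlib levels 1, 6 and 9, following the statement that the degree of compression increases from the high to the low rate; and areas are compressed serially in level order. The address direction and the parallel handling of associated data segments are not modelled, and all names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical mapping of the three compression rates onto zlib levels. The degree of
# compression increases from the high to the low rate, so "high" gets the lightest level.
ZLIB_LEVEL = {"high": 1, "medium": 6, "low": 9}


@dataclass
class DataArea:
    name: str
    groups: List[bytes]
    highest_association: str = "high"  # "high", "medium" or "low"; unused for the current area


Plan = List[Tuple[int, str, DataArea]]  # (compression level, compression rate, area)


def plan_compression(associated: List[DataArea],
                     current: Optional[DataArea] = None) -> Plan:
    """Assign a compression level and rate to each area and sort by compression level.

    With a current area (current data segment): high/medium/low -> levels 2/3/4.
    Without one (associated data segment): high -> level 2, medium/low -> level 3.
    """
    plan: Plan = []
    if current is not None:
        level_for = {"high": 2, "medium": 3, "low": 4}
        plan.append((1, "high", current))
    else:
        level_for = {"high": 2, "medium": 3, "low": 3}
    for area in associated:
        rate = area.highest_association  # the rate follows the highest association level
        plan.append((level_for[rate], rate, area))
    plan.sort(key=lambda item: item[0])  # stable sort: equal levels keep input order
    return plan


def run_plan(plan: Plan, first_buffer: List[bytes], second_buffer: List[bytes]) -> None:
    """Compress level by level; level-1 output goes to the first buffer, the rest to the second."""
    for level, rate, area in plan:
        compressed = [zlib.compress(g, ZLIB_LEVEL[rate]) for g in area.groups]
        (first_buffer if level == 1 else second_buffer).extend(compressed)
        area.groups.clear()  # initialize the area's storage space after the move
```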
The statistical unit further counts in real time the number of times each of the plurality of data groups in each data segment of the first memory is accessed, and determines a data group whose number of accesses within the predetermined time period is higher than a second count threshold as a data group to be moved in.
When the ratio of remaining storage space in the cache is higher than a move-in threshold, the moving unit arranges the at least one data group to be moved in into a queue in ascending order of storage size and, starting from the data group with the smallest storage size, moves the data groups into the cache in queue order, stopping when moving the next data group in the queue would bring the ratio of remaining storage space in the cache below a move-out threshold. The move-in threshold is greater than the move-out threshold.
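The move-in queue logic can be sketched as follows, assuming sizes and capacity are given in the same unit (for example bytes) and that both thresholds are fractions of cache capacity. The function and parameter names are hypothetical.

```python
from typing import Dict, List


def select_move_in(candidates: Dict[str, int], capacity: int, used: int,
                   move_in_threshold: float, move_out_threshold: float) -> List[str]:
    """Return the data groups to move into the cache, smallest first.

    `candidates` maps each data group to be moved in to its storage size.
    """
    if move_in_threshold <= move_out_threshold:
        raise ValueError("the move-in threshold must exceed the move-out threshold")
    free = capacity - used
    if free / capacity <= move_in_threshold:
        return []  # not enough free cache space to start moving groups in
    moved: List[str] = []
    for group, size in sorted(candidates.items(), key=lambda kv: kv[1]):
        # Stop before the move that would push the free ratio below the move-out threshold.
        if (free - size) / capacity < move_out_threshold:
            break
        free -= size
        moved.append(group)
    return moved
```

Keeping the move-in threshold above the move-out threshold leaves a gap between the two, which presumably prevents groups from being moved in and immediately moved out again.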
Wherein the second count threshold is either greater than or less than the first count threshold;
the predetermined time period is a period that ends at the current time and starts at a past time;
the length of the predetermined time period is determined according to the system configuration or a user setting;
the cache is a cache memory inside or outside the processor;
the number of accesses is the number of times each data group is accessed by the processor.
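A sliding-window access counter matching this definition of the predetermined time period (a window ending at the current time) could look like the sketch below. The class, its methods, and the assumption that accesses are recorded in time order are illustrative, not taken from the description.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple


class AccessCounter:
    """Per-data-group access counts over the window [now - window, now].

    Assumes accesses are recorded in non-decreasing time order.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Dict[str, int] = {}

    def record(self, group: str, t: float) -> None:
        self.events.append((t, group))
        self.counts[group] = self.counts.get(group, 0) + 1

    def _expire(self, now: float) -> None:
        # Drop accesses that fell out of the window ending at `now`.
        while self.events and self.events[0][0] < now - self.window:
            _, group = self.events.popleft()
            self.counts[group] -= 1

    def to_move_out(self, groups: Iterable[str], first_threshold: int, now: float) -> List[str]:
        """Cached groups accessed fewer than `first_threshold` times in the window."""
        self._expire(now)
        return [g for g in groups if self.counts.get(g, 0) < first_threshold]

    def to_move_in(self, groups: Iterable[str], second_threshold: int, now: float) -> List[str]:
        """First-memory groups accessed more than `second_threshold` times in the window."""
        self._expire(now)
        return [g for g in groups if self.counts.get(g, 0) > second_threshold]
```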
When a plurality of data segments have the largest number of associated data groups and remaining space that can accommodate the data group to be moved out, the first scanning unit randomly selects one of them as the current data segment; or
When a plurality of data segments have the largest number of associated data groups and remaining space that can accommodate the data group to be moved out, the first scanning unit selects the one with the largest remaining space as the current data segment; or
When a plurality of data segments have the largest number of associated data groups and remaining space that can accommodate the data group to be moved out, the first scanning unit selects the one with the smallest remaining space as the current data segment.
When the remaining space of the data segment with the largest number of associated data groups cannot accommodate the data group to be moved out, the first scanning unit selects the data segment with the next-largest number of associated data groups and checks whether its remaining space can accommodate the data group to be moved out, repeating this until it finds the data segment with the largest number of associated data groups among the data segments whose remaining space can accommodate the data group to be moved out.
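The segment-selection rule, including the three tie-break options and the descending-order fallback, can be sketched as follows. The `Segment` record and the tie-break names are hypothetical. Filtering for segments with enough remaining space and then taking the maximum number of associated data groups gives the same result as the descending scan described above.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Segment:
    name: str
    free: int                                      # remaining space, e.g. in bytes
    groups: Set[str] = field(default_factory=set)  # data groups stored in the segment


def choose_current_segment(segments: List[Segment], associated: Set[str], size: int,
                           tie_break: str = "largest_free",
                           rng: Optional[random.Random] = None) -> Optional[Segment]:
    # Only segments whose remaining space can hold the group are candidates.
    fits = [s for s in segments if s.free >= size]
    if not fits:
        return None
    best = max(len(s.groups & associated) for s in fits)
    tied = [s for s in fits if len(s.groups & associated) == best]
    if tie_break == "random":
        return (rng or random.Random()).choice(tied)
    if tie_break == "largest_free":
        return max(tied, key=lambda s: s.free)
    if tie_break == "smallest_free":
        return min(tied, key=lambda s: s.free)
    raise ValueError(f"unknown tie_break: {tie_break}")
```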
The system further comprises an initialization unit: when it detects that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, the initialization unit determines the plurality of applications to be loaded on the mobile terminal according to a preset loading configuration file, and copies the file package associated with each of these applications from the second memory to the first memory.
The first memory is a volatile memory and the second memory is a non-volatile memory.
After the operating system has started and before the applications to be loaded are determined according to the preset loading configuration file, the initialization unit creates, in the first memory, a plurality of data segments for storing data, each data segment comprising a plurality of data areas.
The file package associated with each application includes at least one data group, and in the first memory or the cache the data group serves as the basic storage unit for storing data.
Any data group is stored in a single data area of a data segment, and a single data area can store at least one data group.
The second memory includes a plurality of compressed data segments for storing compressed data therein, wherein each compressed data segment includes a plurality of compressed data regions, and each compressed data region includes a plurality of sub-regions.
The file package associated with each application includes at least one compressed data group, and in the second memory the compressed data group serves as the basic storage unit for storing compressed data;
any compressed data group is stored in a single compressed data region of a compressed data segment, and a single compressed data region can store at least one compressed data group.
The moving unit determining, among the plurality of data areas within the current data segment, the current data area allocated to the data group to be moved out comprises:
the moving unit randomly allocates one of the data areas of the current data segment to the data group to be moved out as the current data area; or
The moving unit calculates a hash value of the identifier of the data group to be moved out and selects one of the data areas of the current data segment as the current data area according to the hash value; or
The moving unit takes the data area with the largest ratio of remaining storage space among the data areas of the current data segment as the current data area; or
The moving unit takes the data area with the largest remaining storage space among the data areas of the current data segment as the current data area.
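The four allocation strategies can be sketched as a single function. The `Area` record, the strategy names and the use of SHA-256 for the hash are assumptions; a stable hash from `hashlib` is used because Python's built-in `hash()` of a string is randomized per process.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Area:
    capacity: int
    used: int

    @property
    def free(self) -> int:
        return self.capacity - self.used


def pick_area(areas: List[Area], group_id: str, strategy: str,
              rng: Optional[random.Random] = None) -> int:
    """Return the index of the data area to use as the current data area."""
    if strategy == "random":
        return (rng or random.Random()).randrange(len(areas))
    if strategy == "hash":
        # hashlib gives a hash that is stable across runs, unlike the salted built-in hash().
        digest = hashlib.sha256(group_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(areas)
    if strategy == "largest_free_ratio":
        return max(range(len(areas)), key=lambda i: areas[i].free / areas[i].capacity)
    if strategy == "largest_free":
        return max(range(len(areas)), key=lambda i: areas[i].free)
    raise ValueError(f"unknown strategy: {strategy}")
```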
After the plurality of applications to be loaded are determined according to the preset loading configuration file, an association statistical file is copied from the second memory to the first memory; the association statistical file comprises a plurality of pieces of association statistical information, each indicating the plurality of associated data groups of a data group.
The association unit determines, for each data group other than the current data group among the plurality of data groups, its content association degree with the current data group, sorts these data groups in descending order of content association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
the content association degree between any two data groups is determined from the matching value obtained by matching the summary information of the two data groups.
The preset selection rule is: select the data groups whose content association degree in the sorted list is greater than an association degree threshold, or select the data groups ranked above a predetermined rank in the sorted list.
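The description does not define the summary information or the matching value. As a hypothetical stand-in, the sketch below treats each summary as a set of keywords and uses their Jaccard similarity as the content association degree.

```python
from typing import Iterable


def content_association(summary_a: Iterable[str], summary_b: Iterable[str]) -> float:
    """Jaccard similarity of two keyword summaries, in [0, 1] (hypothetical matching value)."""
    a, b = set(summary_a), set(summary_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```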
The association unit determines, for each data group other than the current data group among the plurality of data groups, its running association degree with the current data group, sorts these data groups in descending order of running association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the operation history information of each data group is acquired, the data groups that run in each basic time unit are determined from it, the number of times any two data groups run in the same basic time unit is counted, and the running association degree between the two data groups is determined based on that count.
The preset selection rule is: select the data groups whose running association degree in the sorted list is greater than the association degree threshold, or select the data groups ranked above the predetermined rank in the sorted list.
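Counting co-occurrences in basic time units can be sketched as follows, assuming the operation history is a list of (timestamp, data group) run records. The raw count is returned here; normalizing it (for example by the number of basic time units) is a possible but unstated choice.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def running_association(history: Iterable[Tuple[float, str]], unit: float) -> Counter:
    """Count, for each pair of data groups, the basic time units in which both ran.

    `history` holds (timestamp, data group) run records; pair keys are in sorted order.
    """
    groups_per_unit: Dict[int, Set[str]] = {}
    for t, group in history:
        groups_per_unit.setdefault(int(t // unit), set()).add(group)
    together: Counter = Counter()
    for groups in groups_per_unit.values():
        for pair in combinations(sorted(groups), 2):
            together[pair] += 1
    return together
```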
The association unit determines, for each data group other than the current data group among the plurality of data groups, its feedback association degree with the current data group, sorts these data groups in descending order of feedback association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the initial value of the feedback association degree between any two data groups is set to 0, an association degree rule preset by a user is parsed to determine the data group pairs among the plurality of data groups for which a feedback association degree needs to be set, and the feedback association degree is set for the two data groups of at least one such pair according to the association degree rule.
The preset selection rule is: select the data groups whose feedback association degree in the sorted list is greater than the association degree threshold, or select the data groups ranked above the predetermined rank in the sorted list.
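A minimal sketch of the feedback association degree follows. The rule format, a list of (data group, data group, value) triples, is a hypothetical stand-in for the user-preset association degree rule, whose syntax the description does not give.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def feedback_association(groups: Iterable[str],
                         rules: Iterable[Tuple[str, str, float]]) -> Dict[FrozenSet[str], float]:
    """Start every pair at 0, then apply user rules of the hypothetical form (a, b, value)."""
    degrees = {frozenset(pair): 0.0 for pair in combinations(sorted(set(groups)), 2)}
    for a, b, value in rules:
        key = frozenset((a, b))
        if key in degrees:  # ignore rules naming unknown groups or a group with itself
            degrees[key] = value
    return degrees
```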
The association unit determines, for each data group other than the current data group among the plurality of data groups, its comprehensive association degree with the current data group, sorts these data groups in descending order of comprehensive association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the content association degree between any two data groups is determined by matching the summary information of the two data groups;
the operation history information of each data group is acquired, the data groups that run in each basic time unit are determined from it, the number of times any two data groups run in the same basic time unit is counted, and the running association degree between the two data groups is determined based on that count;
and the comprehensive association degree between the current data group and each other data group is determined by a weighted calculation over their content association degree and running association degree.
The preset selection rule is: select the data groups whose comprehensive association degree in the sorted list is greater than an association degree threshold, or select the data groups ranked above a predetermined rank in the sorted list.
The association unit determines, for each data group other than the current data group among the plurality of data groups, its comprehensive association degree with the current data group, sorts these data groups in descending order of comprehensive association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the content association degree between any two data groups is determined by matching the summary information of the two data groups;
the initial value of the feedback association degree between any two data groups is set to 0, an association degree rule preset by a user is parsed to determine the data group pairs among the plurality of data groups for which a feedback association degree needs to be set, and the feedback association degree is set for the two data groups of at least one such pair according to the association degree rule;
and the comprehensive association degree between the current data group and each other data group is determined by a weighted calculation over their content association degree and feedback association degree.
The preset selection rule is: select the data groups whose comprehensive association degree in the sorted list is greater than an association degree threshold, or select the data groups ranked above a predetermined rank in the sorted list.
The association unit determines, for each data group other than the current data group among the plurality of data groups, its comprehensive association degree with the current data group, sorts these data groups in descending order of comprehensive association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the operation history information of each data group is acquired, the data groups that run in each basic time unit are determined from it, the number of times any two data groups run in the same basic time unit is counted, and the running association degree between the two data groups is determined based on that count;
the initial value of the feedback association degree between any two data groups is set to 0, an association degree rule preset by a user is parsed to determine the data group pairs among the plurality of data groups for which a feedback association degree needs to be set, and the feedback association degree is set for the two data groups of at least one such pair according to the association degree rule;
and the comprehensive association degree between the current data group and each other data group is determined by a weighted calculation over their running association degree and feedback association degree.
The preset selection rule is: select the data groups whose comprehensive association degree in the sorted list is greater than an association degree threshold, or select the data groups ranked above a predetermined rank in the sorted list.
The association unit determines, for each data group other than the current data group among the plurality of data groups, its comprehensive association degree with the current data group, sorts these data groups in descending order of comprehensive association degree to generate a sorted list, and selects a plurality of data groups from the sorted list according to a preset selection rule as the plurality of associated data groups of the current data group;
wherein the content association degree between any two data groups is determined by matching the summary information of the two data groups;
the operation history information of each data group is acquired, the data groups that run in each basic time unit are determined from it, the number of times any two data groups run in the same basic time unit is counted, and the running association degree between the two data groups is determined based on that count;
the initial value of the feedback association degree between any two data groups is set to 0, an association degree rule preset by a user is parsed to determine the data group pairs among the plurality of data groups for which a feedback association degree needs to be set, and the feedback association degree is set for the two data groups of at least one such pair according to the association degree rule;
and the comprehensive association degree between the current data group and each other data group is determined by a weighted calculation over their content association degree, running association degree and feedback association degree.
The preset selection rule is: select the data groups whose comprehensive association degree in the sorted list is greater than an association degree threshold, or select the data groups ranked above a predetermined rank in the sorted list.
The association unit setting an association level for each associated data group according to its association degree with the data group to be moved out includes:
when the association degree between the associated data group and the data group to be moved out is greater than or equal to a first association degree threshold, setting the association level of the associated data group to a high association level;
when the association degree between the associated data group and the data group to be moved out is less than the first association degree threshold and greater than or equal to a second association degree threshold, setting the association level of the associated data group to a medium association level; and
when the association degree between the associated data group and the data group to be moved out is less than the second association degree threshold, setting the association level of the associated data group to a low association level.
The first association degree threshold is greater than the second association degree threshold.
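The three-level classification above can be expressed as a small function. The function name and the use of strings for the levels are illustrative assumptions.

```python
def association_level(degree: float, first_threshold: float, second_threshold: float) -> str:
    """Map an association degree to "high", "medium" or "low" using two thresholds."""
    # The text requires the first threshold to exceed the second.
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must be greater than second threshold")
    if degree >= first_threshold:
        return "high"
    if degree >= second_threshold:
        return "medium"
    return "low"
```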
The association unit sets an association level for each associated data group in the association statistical information based on the associated operation count, the associated operation duration, and the synchronous start count between each of the plurality of associated data groups and the data group to be moved out.
The association unit acquires the operation history information of each data group and determines, from that information, the associated operation count, the associated operation duration, and the synchronous start count between each associated data group and the data group to be moved out.
The associated operation count is the number of times the two data groups perform associated operation within a statistical time period;
the associated operation duration is the length of time for which the two data groups perform associated operation within the statistical time period;
the synchronous start count is the number of times the two data groups start synchronously within the statistical time period;
associated operation means that the difference between the times at which the two data groups are respectively called or started is greater than a first predetermined time interval and less than a second predetermined time interval;
and synchronous start means that the difference between the times at which the two data groups are respectively called or started is less than or equal to the first predetermined time interval.
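A minimal sketch of how the three statistics might be derived from operation history, assuming each run of a data group is recorded as a (start, end) pair in seconds within the statistical time period. The text does not define how the associated operation duration is measured; here it is assumed to be the overlap of the two runs.

```python
from typing import List, Tuple

Run = Tuple[float, float]  # (start, end) of one run, in seconds (assumed representation)


def pair_statistics(runs_a: List[Run], runs_b: List[Run], t1: float, t2: float):
    """Return (associated_count, associated_duration, sync_start_count) for two data groups.

    t1 is the first predetermined time interval, t2 the second.
    """
    if not 0 <= t1 < t2:
        raise ValueError("require 0 <= t1 < t2")
    assoc_count = 0
    assoc_duration = 0.0
    sync_count = 0
    for sa, ea in runs_a:
        for sb, eb in runs_b:
            diff = abs(sa - sb)
            if diff <= t1:
                sync_count += 1  # synchronous start
            elif diff < t2:
                assoc_count += 1  # associated operation: t1 < diff < t2
                # Assumption: associated operation duration = overlap of the two runs.
                assoc_duration += max(0.0, min(ea, eb) - max(sa, sb))
    return assoc_count, assoc_duration, sync_count
```

For example, runs starting 1 s apart count as a synchronous start when t1 = 2 s, while runs starting 5 s apart count as an associated operation when t2 = 30 s.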
The association unit determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to operate in association with the data group to be moved out when it runs includes:
the association unit determines, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that need to operate in association with the data group to be moved out when it runs and whose association level is the high association level.
The association unit determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to operate in association with the data group to be moved out when it runs includes:
the association unit determines, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that need to operate in association with the data group to be moved out when it runs and whose association level is the medium association level.
The association unit determining, based on the association statistical information of the data group to be moved out, the plurality of associated data groups that need to operate in association with the data group to be moved out when it runs includes:
the association unit determines, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that need to operate in association with the data group to be moved out when it runs and whose association level is the high association level or the medium association level.
Each associated data area contains at least one associated data group.
For example, the high compression rate is 90%, the medium compression rate is 80%, and the low compression rate is 70% (in practice, each may be a range of compression rates).
Compressing the data groups in the associated data areas marked as the second compression stage includes: compressing the data groups in the plurality of associated data areas marked as the second compression stage at the high compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
Compressing the data groups in the associated data areas marked as the third compression stage includes: compressing the data groups in the plurality of associated data areas marked as the third compression stage at the medium compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
Compressing the data groups in the associated data areas marked as the fourth compression stage includes: compressing the data groups in the plurality of associated data areas marked as the fourth compression stage at the low compression rate, proceeding either from low addresses to high addresses or from high addresses to low addresses.
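The stage-dependent compression in address order might look like the sketch below. The text gives compression rates (e.g., 90%, 80%, 70%) but no codec, so this example assumes zlib and a hypothetical mapping from compression stage to zlib level; an area is modeled as a start address plus the data groups it holds.

```python
import zlib
from typing import List, Tuple

# Hypothetical mapping: stage 2 -> high rate, stage 3 -> medium, stage 4 -> low.
# zlib levels stand in for the rates, which the text does not tie to any codec.
STAGE_TO_ZLIB_LEVEL = {2: 9, 3: 6, 4: 1}

Area = Tuple[int, List[bytes]]  # (start address, data groups stored in the area)


def compress_stage(areas: List[Area], stage: int, ascending: bool = True) -> List[Tuple[int, bytes]]:
    """Compress every data group in `areas` at the level for `stage`, walking areas by address."""
    level = STAGE_TO_ZLIB_LEVEL[stage]
    # Low-to-high address order when ascending, high-to-low otherwise.
    ordered = sorted(areas, key=lambda a: a[0], reverse=not ascending)
    out = []
    for address, groups in ordered:
        for group in groups:
            out.append((address, zlib.compress(group, level)))
    return out
```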
The second compression unit compressing, at the high compression rate, the associated data area marked as the second compression stage in each associated compressed data segment includes:
the second compression unit compresses, at the high compression rate, the associated data area marked as the second compression stage in each associated compressed data segment, processing the plurality of associated compressed data segments serially or in parallel.
The second compression unit compressing, at the medium compression rate, the associated data area marked as the third compression stage and the medium compression rate in each associated compressed data segment includes:
the second compression unit compresses, at the medium compression rate, the associated data area marked as the third compression stage and the medium compression rate in each associated compressed data segment, processing the plurality of associated compressed data segments serially or in parallel.
The second compression unit compressing, at the low compression rate, the associated data area marked as the third compression stage and the low compression rate in each associated compressed data segment includes:
the second compression unit compresses, at the low compression rate, the associated data area marked as the third compression stage and the low compression rate in each associated compressed data segment, processing the plurality of associated compressed data segments serially or in parallel.
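Processing the associated compressed data segments serially or in parallel can be sketched with a thread pool. The dictionary from segment identifier to the data groups of its marked areas, and the use of zlib, are assumptions for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def compress_segments(segments: Dict[str, List[bytes]], level: int,
                      parallel: bool = False) -> Dict[str, List[bytes]]:
    """Compress the data groups of each segment, one segment per task when `parallel` is set."""

    def compress_segment(groups: List[bytes]) -> List[bytes]:
        return [zlib.compress(g, level) for g in groups]

    if not parallel:
        # Serial mode: segments are processed one after another.
        return {name: compress_segment(groups) for name, groups in segments.items()}
    # Parallel mode: each segment is submitted as an independent task.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(compress_segment, groups) for name, groups in segments.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because zlib output is deterministic for a given input and level, both modes produce identical results; the choice only affects throughput.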
The moving unit moves the compressed data group generated by compressing the data group in the current data area of the current data segment into the first buffer area of the first memory; and
the moving unit moves the compressed data groups generated by compressing the data groups in each associated data area of the current data segment into the second buffer area of the first memory.
After the compressed data group is moved into the first buffer area of the first memory, the initialization unit initializes the storage space of the current data area.
After the compressed data groups are moved into the second buffer area of the first memory, the initialization unit initializes the storage space of each associated data area.
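A simplified model of the move-and-initialize step, assuming each data area is a bytearray holding its data groups back to back, the buffer areas are plain lists, compression uses zlib, and "initializing" a data area means zero-filling it. None of these representations comes from the text.

```python
import zlib
from typing import Dict, List


def move_and_initialize(areas: Dict[str, bytearray], current_area: str,
                        first_buffer: List[bytes], second_buffer: List[bytes],
                        level: int = 6) -> None:
    """Compress each area, route it to the first or second buffer, then clear the area."""
    for name, storage in areas.items():
        compressed = zlib.compress(bytes(storage), level)
        # The current data area goes to the first buffer; associated areas go to the second.
        (first_buffer if name == current_area else second_buffer).append(compressed)
        # Assumed initialization: zero-fill while keeping the area's size.
        storage[:] = bytes(len(storage))
```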
The third scanning unit selects each data segment containing at least one associated data group as an associated data segment, and determines each data area containing at least one associated data group within the associated data segment as an associated data area.
The moving unit moves the compressed data groups generated by compressing the data groups in each of the at least one associated data area of each associated data segment into the second buffer area of the first memory.
After the compressed data groups are moved into the second buffer area of the first memory, the initialization unit initializes the storage space of each of the at least one associated data area in each associated data segment.
The number of times each data group in the first memory is accessed in each third predetermined time interval is determined, and the moving unit moves into the cache the data groups whose access count in the current third predetermined time interval is higher than a third count threshold; or
the number of times each compressed data group in the first memory is accessed in each third predetermined time interval is determined, and the moving unit decompresses the compressed data groups whose access count in the current third predetermined time interval is higher than the third count threshold and moves them into the cache.
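The promotion of frequently accessed groups can be sketched as follows, assuming the accesses in the current third predetermined time interval arrive as a list of data group identifiers and that compressed groups were produced with zlib. The function and parameter names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, Set


def promote_hot_groups(access_log: Iterable[str], first_memory: Dict[str, bytes],
                       compressed: Set[str], cache: Dict[str, bytes], threshold: int) -> None:
    """Move groups accessed more than `threshold` times in this interval into the cache."""
    counts = Counter(access_log)  # access count per group in the current interval
    for group_id, n in counts.items():
        if n > threshold and group_id in first_memory:
            data = first_memory.pop(group_id)
            if group_id in compressed:
                # Compressed groups are decompressed before entering the cache.
                data = zlib.decompress(data)
                compressed.discard(group_id)
            cache[group_id] = data
```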
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow diagram of a method for dynamic processing of data sets based on in-cache evictions, according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a logical structure of a storage device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a logical structure of a data segment according to an embodiment of the present invention;
FIG. 4 is a flow diagram of a method for compressing a current data segment according to compression levels and compression rates, according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for determining an associated data set according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for determining an associated data set according to another embodiment of the present invention;
FIG. 7 is a flow diagram of a method for determining an associated data set in accordance with yet another embodiment of the present invention;
FIG. 8 is a flow chart of a method for determining an associated data set according to yet another embodiment of the present invention; and
FIG. 9 is a block diagram of a system for dynamic processing of data sets based on in-cache evictions, according to an embodiment of the invention.
Detailed Description
FIG. 1 is a flow diagram of a method 100 for dynamic processing of data sets based on in-cache evictions, according to an embodiment of the invention. As shown in fig. 1, method 100 begins at step 101.
In step 101, the number of times each of a plurality of data groups in the cache of the processor in the mobile terminal is accessed is counted in real time, and a data group whose access count within a predetermined time period is lower than a first count threshold is determined as a data group to be moved out, i.e., to be moved from the cache to the first memory. The predetermined time period (e.g., 8 hours, 16 hours, or 1 day) ends at the current time and starts at a point in the past. The present application records in real time the accesses to each data group of the processor (whether in the cache or in the first memory), e.g., the identifier of the data group and the access time, and derives real-time statistics on the access count of each data group from these records. The predetermined time period can be regarded as a time window that moves forward with time, and its length may be determined by a user setting or a system configuration.
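The moving time window of step 101 can be sketched with a deque of access records. This is a minimal illustration under assumed names (`AccessCounter`, `eviction_candidates`), and it requires records to arrive in non-decreasing time order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class AccessCounter:
    """Access counts per data group over a window (now - window, now]."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.records: Deque[Tuple[float, str]] = deque()  # (time, group id), oldest first
        self.counts: Dict[str, int] = defaultdict(int)

    def record(self, group_id: str, t: float) -> None:
        # Assumes calls arrive in non-decreasing time order.
        self.records.append((t, group_id))
        self.counts[group_id] += 1

    def _expire(self, now: float) -> None:
        # Drop records that have slid out of the window.
        while self.records and self.records[0][0] <= now - self.window:
            _, gid = self.records.popleft()
            self.counts[gid] -= 1

    def eviction_candidates(self, cached: List[str], now: float, first_threshold: int) -> List[str]:
        """Cached groups accessed fewer than `first_threshold` times within the window."""
        self._expire(now)
        return [g for g in cached if self.counts.get(g, 0) < first_threshold]
```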
The mobile terminal may be any type of mobile device, including a mobile handset, station, unit, device, multimedia tablet, communicator, laptop, personal digital assistant (PDA), or any combination thereof. Generally, a mobile terminal may be communicatively coupled to other devices, such as mobile devices, servers, and base stations, via a network connection. The cache is a cache memory internal or external to the processor. A cache memory is a memory located between the main memory (i.e., the first memory in this application) and the processor; it is built from static random access memory (SRAM) chips and has a relatively small capacity, but it is much faster than the main memory and approaches the speed of the processor.
In addition, the present application counts in real time the number of times each of a plurality of data groups in each data segment of the first memory of the mobile terminal is accessed, and determines a data group (or compressed data group) whose access count within a predetermined time period is higher than a second count threshold as a data group (or compressed data group) to be moved in, i.e., to be moved from the first memory to the cache.
The second count threshold may be greater than or less than the first count threshold. For example, the first count threshold may be 300, 500, or 1000 accesses, and the second count threshold may likewise be 300, 500, or 1000 accesses. The access count is the number of times the processor accesses each data group.
Optionally, when the remaining storage space ratio of the cache is higher than a move-in threshold (e.g., 20%, 30%, or 40%), the at least one data group to be moved in is formed into a queue in increasing order of storage size, and the data groups are moved into the cache in queue order, starting from the data group with the smallest storage size, until the remaining storage space ratio of the cache falls below a move-out threshold (e.g., 10% or 15%) after the next data group is moved in. The move-in threshold is greater than the move-out threshold. For example, when the remaining storage space ratio of the cache is 25% and the move-in threshold is 20%, data groups 1 to 10 to be moved in are formed into a queue in increasing order of storage size. Data group 1 is moved into the cache first, and the following data groups are then processed one by one. If, after the next data group is moved into the cache, the remaining storage space ratio of the cache is 12%, which is below the move-out threshold of 15%, no further data groups are moved in after that data group.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the at least one data group to be moved in is formed into a queue in decreasing order of storage size, and the data groups are moved into the cache in queue order, starting from the data group with the largest storage size, until the remaining storage space ratio of the cache falls below the move-out threshold after the next data group is moved in.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the at least one data group to be moved in is formed into a queue in decreasing order of storage time, and the data groups are moved into the cache in queue order, starting from the data group with the longest storage time, until the remaining storage space ratio of the cache falls below the move-out threshold after the next data group is moved in.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the at least one data group to be moved in is formed into a queue in increasing order of storage time, and the data groups are moved into the cache in queue order, starting from the data group with the shortest storage time, until the remaining storage space ratio of the cache falls below the move-out threshold after the next data group is moved in.
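The four queue orderings share one loop and differ only in the sort key, so they can be sketched together. Sizes are in bytes, `stored_for` is an assumed reading of "storage time" as how long the group has been stored, and skipping a group that does not fit into the free space at all is an assumption the text does not address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    group_id: str
    size: int          # storage size in bytes
    stored_for: float  # assumed meaning of "storage time": seconds the group has been stored


# (sort key, reverse) for each of the four queue orderings described in the text.
ORDERINGS = {
    "size_asc": (lambda c: c.size, False),
    "size_desc": (lambda c: c.size, True),
    "time_desc": (lambda c: c.stored_for, True),
    "time_asc": (lambda c: c.stored_for, False),
}


def plan_move_in(candidates: List[Candidate], capacity: int, used: int,
                 move_in_threshold: float, move_out_threshold: float,
                 ordering: str = "size_asc") -> List[str]:
    """Return the ids of the groups moved into the cache, in queue order."""
    if move_in_threshold <= move_out_threshold:
        raise ValueError("move-in threshold must exceed move-out threshold")
    if (capacity - used) / capacity <= move_in_threshold:
        return []  # cache not free enough to start moving in
    key, reverse = ORDERINGS[ordering]
    moved = []
    for c in sorted(candidates, key=key, reverse=reverse):
        if c.size > capacity - used:
            continue  # assumption: skip a group that cannot fit at all
        used += c.size
        moved.append(c.group_id)
        if (capacity - used) / capacity < move_out_threshold:
            break  # remaining ratio fell below the move-out threshold: stop
    return moved
```

With a 25% free cache and groups of 3, 4, 6, and 8 units out of 100, the ascending-size queue moves the first three groups and stops when the remaining ratio reaches 12%, mirroring the example above.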
When it is detected that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, a plurality of applications to be loaded on the mobile terminal are determined according to a preset loading configuration file, and the file package associated with each of these applications is copied from a second memory into the first memory. The first memory is a volatile memory, such as random access memory (RAM), i.e., main memory, and the second memory is a non-volatile memory, such as flash memory. The operating system is stored in the second memory when the mobile terminal is powered off and is loaded from the second memory into the first memory when the mobile terminal starts. The loading configuration file may be preset when the mobile terminal leaves the factory or may be set by the user of the mobile terminal. The loading configuration file may record which of the applications on the mobile terminal are started automatically when the operating system starts. Typically, each application has at least one associated file package, and each file package may include multiple file sub-packages, i.e., the file structure of each application is hierarchical.
After the operating system of the mobile terminal has started (i.e., when the system services, system applications, resource management, network initialization, and the like of the operating system have started but no user application has been loaded), and before the plurality of applications to be loaded are determined according to the preset loading configuration file, a plurality of data segments for storing data are created in the storage area of the first memory, each data segment comprising a plurality of data areas, as shown in FIG. 2. FIG. 2 is a schematic diagram of the logical structure of a storage device 200 according to an embodiment of the present invention. The storage device 200 (e.g., the first memory) includes a boot area 201, a first buffer 202, a second buffer 203, a storage area 204, and a reserved storage area 205. The boot area 201 stores system files related to booting the operating system and directory files indicating the storage directory information of the storage device 200. The first buffer 202 and the second buffer 203 each include a plurality of compressed data segments and are used to store compressed data groups. In general, the compressed data groups stored in the first buffer 202 and the second buffer 203 may come from the storage area 204 or from the second memory.
It should be appreciated that, in the second memory, the data files associated with each application are typically stored as compressed files. A single data file generally includes at least one data group, so in the second memory data groups or data files are stored as compressed data files or compressed data groups. The second memory usually stores data in units of compressed data files; when a compressed data file is loaded into the first memory, it is split (since a data file usually includes at least one data group) into a plurality of compressed data groups. Multimedia files in the second memory, such as pictures, video, audio, and documents, may be stored in compressed or uncompressed form. The reserved storage area 205 includes a plurality of data segments and compressed data segments and is used to store backup files, reserved system resources, or emergency files, or serves as the storage area when the mobile terminal performs emergency storage.
The storage area 204 is used to store uncompressed data groups, i.e., data groups that can be directly read, acquired, accessed, processed, or computed on by a processor, a controller, a communication interface, or the like. The storage area 204 includes a plurality of data segments for storing data, such as data segment 204-1, data segment 204-2, data segment 204-3, data segment 204-5, and so on. Data segments 204-1, 204-2, 204-3, and 204-5 are drawn with different storage spaces (capacities) in FIG. 2, but the storage spaces of the data segments may be all the same, all different, or partly the same. An uncompressed data group in the storage area 204 may be moved or copied into the cache when certain conditions are met.
Each of the plurality of data segments includes a plurality of data areas, as shown in fig. 3. Fig. 3 is a schematic diagram of a logical structure of a data segment 300 according to an embodiment of the present invention. In fig. 3, the data segment 300 includes: data region 301-1, data region 301-2, data region 301-3, data region 301-4, data region 301-5, data region 301-6, and data region 301-7. The storage space (or capacity) of each data area may be the same or different, or the storage space of part of the data areas may be the same.
The file package associated with each application includes at least one data group, and in the first memory or the cache, the data group is the basic storage unit for storing data. Alternatively, in the first memory, the compressed data group is the basic storage unit. That is, in the first memory, each application is stored as a plurality of compressed or uncompressed data groups.
Any data group is stored in a single data area of a data segment, and a single data area can store at least one data group. The second memory includes a plurality of compressed data segments for storing compressed data, each compressed data segment including a plurality of compressed data areas and each compressed data area including a plurality of sub-areas. The file package associated with each application includes at least one compressed data group, and in the second memory the compressed data group is the basic storage unit for compressed storage. Any compressed data group is stored in a single data area of a data segment, and a single data area can store at least one data group.
In step 102, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that are stored in the first memory and need to operate in association with the data group to be moved out when it runs are determined, and an association level is set for each associated data group according to its association degree with the data group to be moved out. The association levels include a high association level, a medium association level, and a low association level.
Fig. 5 is a flow chart of a method 500 for determining associated data groups according to an embodiment of the present invention. As shown in fig. 5, method 500 begins at step 501. In step 501, the content association degree between the current data group and each other data group in the plurality of data groups (i.e., compressed data groups) is determined. By taking each of the plurality of data groups in turn as the current data group, the content association degree between any two of the plurality of data groups can be determined.
At step 502, the data groups other than the current data group are sorted in descending order of content association degree to generate a sorted list. By taking each data group in turn as the current data group, the present application obtains a sorted list for each data group, in which all other data groups are sorted in descending order of their content association degree with that data group. For example, in the sorted list of the current data group, the data group with the greatest content association degree with the current data group is ranked first, and the data group with the smallest content association degree is ranked last. When at least two data groups have the same content association degree with the current data group, their order is determined by their identifiers or at random.
In step 503, a plurality of data groups are selected from the sorted list according to a preset selection rule as the associated data groups of the current data group. The preset selection rule includes, for example, selecting the data groups whose content association degree in the sorted list is greater than an association degree threshold, or the data groups ranked before a predetermined ranking in the sorted list. The association degree threshold may be determined from user input or system settings, e.g., 70%, 80%, or 85%. Alternatively, the predetermined ranking is determined from user input or system settings, e.g., the top 20, 30, or 35.
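Steps 502 and 503 (and likewise steps 602 to 603 and 702 to 703) reduce to a sort with a deterministic tie-break followed by a threshold or top-k cut. A minimal sketch, using the identifier tie-break option from the text; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def select_associated(current: str, degrees: Dict[str, float],
                      threshold: Optional[float] = None,
                      top_k: Optional[int] = None) -> List[str]:
    """Rank other groups by association degree (descending) and apply a selection rule."""
    # Ties on the degree are broken by the group identifier.
    ranked = sorted((g for g in degrees if g != current), key=lambda g: (-degrees[g], g))
    if threshold is not None:
        return [g for g in ranked if degrees[g] > threshold]
    if top_k is not None:
        return ranked[:top_k]
    raise ValueError("give either a threshold or a top_k ranking")
```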
The matching value obtained by content matching of the summary information of any two data groups is used as the content association degree between them. The summary information of each data group is determined from its content, function, source, and the like, and describes the relevant characteristics of the data group. The matching value of the summary information of two data groups is determined by content matching methods such as text comparison and semantic comparison.
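The text-comparison part of the content matching could be approximated with Python's standard `difflib`. This covers only character-level text similarity; semantic comparison is not attempted, and the choice of `SequenceMatcher` is an assumption rather than the method the text prescribes.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Tuple


def content_association(summary_a: str, summary_b: str) -> float:
    """Text-similarity score in [0, 1] between two summaries (case-insensitive)."""
    return SequenceMatcher(None, summary_a.lower(), summary_b.lower()).ratio()


def pairwise_content_association(summaries: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Content association degree for every unordered pair of data groups."""
    return {
        (a, b): content_association(summaries[a], summaries[b])
        for a, b in combinations(sorted(summaries), 2)
    }
```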
Fig. 6 is a flow chart of a method 600 for determining associated data groups according to another embodiment of the present invention. As shown in fig. 6, method 600 begins at step 601. In step 601, the operational association degree between the current data group and each other data group in the plurality of data groups is determined. By taking each of the plurality of data groups in turn as the current data group, the present application can determine the operational association degree between any two of the plurality of data groups.
At step 602, the data groups other than the current data group are sorted in descending order of operational association degree to generate a sorted list. By taking each data group in turn as the current data group, the present application obtains a sorted list for each data group, in which all other data groups are sorted in descending order of their operational association degree with that data group. For example, in the sorted list of the current data group, the data group with the greatest operational association degree with the current data group is ranked first, and the data group with the smallest operational association degree is ranked last. When at least two data groups have the same operational association degree with the current data group, their order is determined by their identifiers or at random.
In step 603, a plurality of data groups are selected from the sorted list according to a preset selection rule as the associated data groups of the current data group. The preset selection rule includes selecting the data groups whose operational association degree in the sorted list is greater than the association degree threshold, or the data groups whose operational association degree ranks before the predetermined ranking in the sorted list. The association degree threshold may be determined from user input or system settings, e.g., 5, 8, or 10 times. Alternatively, the predetermined ranking is determined from user input or system settings, e.g., the top 20, 30, or 35.
Operation history information of each data group is acquired, including the run time, read time, write time, decompression time, and the like of each data group. Statistics on the data groups operated in each basic time unit, i.e., which data groups run in each basic time unit, are then determined from the operation history information. The basic time unit may be determined from user input or system settings, e.g., 5, 10, or 15 minutes. Time is divided into consecutive basic time units, which serve as the basic statistical units. The number of basic time units in which any two data groups both run is then determined, and the operational association degree between the two data groups is determined from this number. For example, if a first data group runs in the 1st, 3rd, 5th, 6th, 7th, 8th, and 9th basic time units and a second data group runs in the 2nd, 3rd, 4th, 6th, 7th, 8th, and 9th basic time units, the two data groups run in the same basic time unit 5 times (units 3, 6, 7, 8, and 9). A data group that runs multiple times within the same basic time unit is counted only once for that unit. When determining the operational association degree, the statistical period is generally predetermined, i.e., the operational association degree is calculated over a predetermined number of basic time units, such as 30, 50, or 60.
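The co-occurrence count from the example above can be reproduced with sets of basic time unit indices. The function names are hypothetical, and run times are assumed to be seconds measured from the start of the statistical period.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def units_of(run_times: Iterable[float], unit_seconds: float, n_units: int, start: float = 0.0) -> Set[int]:
    """Indices of the basic time units (within the statistical period) in which a group ran."""
    units = set()
    for t in run_times:
        idx = int((t - start) // unit_seconds)
        if 0 <= idx < n_units:
            units.add(idx)  # several runs in one unit are counted once
    return units


def operational_association(run_log: Dict[str, Iterable[float]], unit_seconds: float,
                            n_units: int, start: float = 0.0) -> Dict[Tuple[str, str], int]:
    """Number of shared basic time units for every pair of data groups."""
    units = {g: units_of(ts, unit_seconds, n_units, start) for g, ts in run_log.items()}
    return {(a, b): len(units[a] & units[b]) for a, b in combinations(sorted(units), 2)}
```

Encoding the example with 5-minute units (the first group also running twice in unit 3) yields a count of 5 for the pair.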
Fig. 7 is a flow chart of a method 700 for determining associated data groups according to yet another embodiment of the present invention. As shown in fig. 7, method 700 begins at step 701. In step 701, the feedback association degree between the current data group and each other data group in the plurality of data groups is determined. By taking each of the plurality of data groups in turn as the current data group, the feedback association degree between any two of the plurality of data groups can be determined.
At step 702, the data groups other than the current data group are sorted in descending order of feedback association degree to generate a sorted list. By taking each data group in turn as the current data group, the present application obtains a sorted list for each data group, in which all other data groups are sorted in descending order of their feedback association degree with that data group. For example, in the sorted list of the current data group, the data group with the greatest feedback association degree with the current data group is ranked first, and the data group with the smallest feedback association degree is ranked last. When at least two data groups have the same feedback association degree with the current data group, their order is determined by their identifiers or at random.
In step 703, a plurality of data groups are selected from the sorted list according to a preset selection rule as the associated data groups of the current data group. The preset selection rule includes selecting the data groups whose feedback association degree in the sorted list is greater than the association degree threshold, or the data groups whose feedback association degree ranks before the predetermined ranking in the sorted list. The association degree threshold may be determined from user input or system settings, e.g., 15, 25, or 35. Alternatively, the predetermined ranking is determined from user input or system settings, e.g., the top 20, 30, or 35.
The feedback association degree between any two data groups is initialized to 0. Preset association degree rules or dynamic operation data are then analyzed to determine which of the data group pairs need a feedback association degree, and for each such pair the feedback association degree of its two data groups is set according to the association degree rules or the dynamic operation data. A preset association degree rule is one preset by, for example, a user, the operating system, a system application or a user application, and it indicates the degree of association between two of the data groups. For example, an association degree rule may indicate that the first and fifth data groups are highly associated (feedback association degree 20), that the first and second data groups are moderately associated (feedback association degree 15), and that the first and third data groups are weakly associated (feedback association degree 10). Dynamic operation data is, for example, the degree of association between two data groups that the operating system, a system application, a user application or the like determines from the operation data of the data groups.
For example, dynamic operation data may indicate that the second and fifth data groups are highly associated (feedback association degree 25), that the second and third data groups are moderately associated (feedback association degree 15), and that the second and sixth data groups are weakly associated (feedback association degree 5). A data group pair has the format <data group name, data group name> and records the feedback association degree between two different data groups. For example, if an association degree rule indicates that the first and fifth data groups are highly associated with a feedback association degree of 15, the pair <first data group, fifth data group> has a feedback association degree of 15.
Further, the feedback association degree may be the cumulative sum of several values. For example, if the system application determines from dynamic operation data that the feedback association degree of the first and second data groups is 5, a first user application determines it to be 10, and a second user application determines it to be 15, then the feedback association degree of the first and second data groups is 5 + 10 + 15 = 30.
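The zero initialization and cumulative summation can be sketched as below. The representation is an assumption: each <data group name, data group name> pair is stored as an unordered `frozenset`, and every rule or application is modeled as a hypothetical `(group_a, group_b, value)` contribution.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Iterable, Tuple

# Unordered data group pair, standing in for <data group name, data group name>.
Pair = FrozenSet[str]


def accumulate_feedback(
    contributions: Iterable[Tuple[str, str, float]],
) -> DefaultDict[Pair, float]:
    """Sum the feedback association degrees reported by several sources.

    contributions: hypothetical (group_a, group_b, value) tuples, one per
    association degree rule or per application reading dynamic operation data.
    """
    # Every pair implicitly starts at 0; pairs never mentioned stay at 0.
    degrees: DefaultDict[Pair, float] = defaultdict(float)
    for a, b, value in contributions:
        degrees[frozenset((a, b))] += value
    return degrees
```

Feeding in the three values 5, 10 and 15 for the first and second data groups yields 30 for that pair, matching the example above, regardless of the order in which the two names are given.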
Fig. 8 is a flow chart of a method 800 for determining associated data groups according to yet another embodiment of the present invention. As shown in Fig. 8, method 800 begins at step 801. In step 801, the comprehensive association degree between the current data group and each of the other data groups in the plurality of data groups is determined. By taking each of the plurality of data groups in turn as the current data group, the application can determine the comprehensive association degree between any two of the plurality of data groups.
In step 802, the data groups other than the current data group are sorted in descending order of comprehensive association degree to generate a sorted list. By taking each data group in turn as the current data group, the application obtains a sorted list for every data group, in which all the other data groups are arranged in descending order of their comprehensive association degree with that data group. For example, in the sorted list of the current data group, the data group with the greatest comprehensive association degree with the current data group is ranked first, and the data group with the smallest comprehensive association degree is ranked last. When two or more data groups have the same comprehensive association degree with the current data group, their order is determined by their identifiers or at random.
In step 803, a plurality of data groups are selected from the sorted list according to a preset selection rule as the associated data groups of the current data group. The preset selection rule is either that the comprehensive association degree of a data group is greater than an association degree threshold, or that the data group ranks ahead of a preset ranking. The association degree threshold may be determined from user input or system settings, for example 70%, 80% or 85%. Alternatively, the preset ranking is determined from user input or system settings, for example the top 20, 30 or 35.
In the following embodiments, the content association degree, the running association degree and the feedback association degree are determined as described above with reference to Figs. 5 to 7.
First embodiment
Content matching is performed on the summary information of any two data groups, and the result is used as the content association degree (in percentage form) between them.
Operation history information of each data group is acquired. From it, the data groups operated in each basic time unit are determined, then the number of times any two data groups are operated in the same basic time unit, and finally the running association degree between any two data groups is determined from that number.
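Counting how often two data groups are operated in the same basic time unit can be sketched as follows. It assumes, hypothetically, that the operation history has already been turned into one set of data group ids per basic time unit, and it uses the raw co-occurrence count directly as the running association degree; the text only says the degree is determined "based on" this count.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def running_association(units: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered pair of data groups, the basic time units
    in which both groups were operated.

    units: hypothetical list with one set of operated data group ids per
    basic time unit, already extracted from the operation history.
    """
    counts: Counter = Counter()
    for groups in units:
        # sorted() makes pair generation deterministic; frozenset makes it unordered.
        for a, b in combinations(sorted(groups), 2):
            counts[frozenset((a, b))] += 1
    return counts
```

For the units `{g1, g2}`, `{g1, g2, g3}`, `{g2, g3}` and `{g4}`, the pairs (g1, g2) and (g2, g3) each score 2, (g1, g3) scores 1, and g4 is associated with nothing.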
The comprehensive association degree is obtained by a weighted calculation over the content association degree and the running association degree between the current data group and each of the other data groups. Specifically, the maximum of the running association degrees between the current data group and the other data groups, i.e. the maximum running association degree, is determined. The running association degree of each other data group with the current data group is then divided by the maximum running association degree, giving a running association degree in percentage form. Weights for the content association degree and the running association degree (in percentage form), equal or different, are determined by user settings or system settings, or set dynamically according to the running state. Using these weights, the content association degree and the running association degree (in percentage form) of each other data group with the current data group are combined into the comprehensive association degree.
Second embodiment
The summary information of any two data groups is subjected to content matching, and the result is used as the content association degree (in percentage form) between them;
the feedback association degree between any two data groups is initialized to 0; preset association degree rules or dynamic operation data are analyzed to determine which data group pairs need a feedback association degree, and for each such pair the feedback association degree of its two data groups is set according to the association degree rules or the dynamic operation data.
The comprehensive association degree is obtained by a weighted calculation over the content association degree and the feedback association degree between the current data group and each of the other data groups. Specifically, the maximum of the feedback association degrees between the current data group and the other data groups, i.e. the maximum feedback association degree, is determined. The feedback association degree of each other data group with the current data group is then divided by the maximum feedback association degree, giving a feedback association degree in percentage form. Weights for the content association degree and the feedback association degree (in percentage form), equal or different, are determined by user settings or system settings, or set dynamically according to the operating state. Using these weights, the content association degree and the feedback association degree (in percentage form) of each other data group with the current data group are combined into the comprehensive association degree.
Third embodiment
Operation history information of each data group is acquired; from it, the data groups operated in each basic time unit are determined, then the number of times any two data groups are operated in the same basic time unit, and the running association degree between any two data groups is determined from that number;
the feedback association degree between any two data groups is initialized to 0; preset association degree rules or dynamic operation data are analyzed to determine which data group pairs need a feedback association degree, and for each such pair the feedback association degree of its two data groups is set according to the association degree rules or the dynamic operation data.
The comprehensive association degree is obtained by a weighted calculation over the running association degree and the feedback association degree between the current data group and each of the other data groups. Specifically, the maximum of the running association degrees between the current data group and the other data groups, i.e. the maximum running association degree, is determined, and the running association degree of each other data group with the current data group is divided by it, giving a running association degree in percentage form. Likewise, the maximum of the feedback association degrees between the current data group and the other data groups, i.e. the maximum feedback association degree, is determined, and the feedback association degree of each other data group with the current data group is divided by it, giving a feedback association degree in percentage form. Weights for the running association degree (in percentage form) and the feedback association degree (in percentage form), equal or different, are determined by user settings or system settings, or set dynamically according to the operating state. Using these weights, the two percentage-form degrees of each other data group with the current data group are combined into the comprehensive association degree.
Fourth embodiment
The summary information of any two data groups is subjected to content matching, and the result is used as the content association degree (in percentage form) between them;
operation history information of each data group is acquired; from it, the data groups operated in each basic time unit are determined, then the number of times any two data groups are operated in the same basic time unit, and the running association degree between any two data groups is determined from that number;
the feedback association degree between any two data groups is initialized to 0; preset association degree rules or dynamic operation data are analyzed to determine which data group pairs need a feedback association degree, and for each such pair the feedback association degree of its two data groups is set according to the association degree rules or the dynamic operation data.
The comprehensive association degree is obtained by a weighted calculation over the content association degree, the running association degree and the feedback association degree between the current data group and each of the other data groups. Specifically, the maximum of the running association degrees between the current data group and the other data groups, i.e. the maximum running association degree, is determined, and the running association degree of each other data group with the current data group is divided by it, giving a running association degree in percentage form. Likewise, the maximum of the feedback association degrees between the current data group and the other data groups, i.e. the maximum feedback association degree, is determined, and the feedback association degree of each other data group with the current data group is divided by it, giving a feedback association degree in percentage form. Weights for the content association degree (in percentage form), the running association degree (in percentage form) and the feedback association degree (in percentage form), equal or different, are determined by user settings or system settings, or set dynamically according to the operating state. Using these weights, the three percentage-form degrees of each other data group with the current data group are combined into the comprehensive association degree.
For example, the weights of the content association degree, the running association degree and the feedback association degree (all in percentage form) may be 1/3, 1/3 and 1/3; or 1/2, 1/4 and 1/4; or 1/4, 1/4 and 1/2.
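The max-normalization and weighted combination shared by the four embodiments can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and input values are hypothetical, the content association degree is taken to be already in percentage form as the text states, and a data group missing from one component is treated as having degree 0 there.

```python
from typing import Dict, Sequence


def to_percentage(raw: Dict[str, float]) -> Dict[str, float]:
    """Divide each degree by the maximum degree, giving a percentage (0 to 100)."""
    peak = max(raw.values(), default=0.0)
    if peak == 0:
        # No association at all: avoid division by zero.
        return {g: 0.0 for g in raw}
    return {g: 100.0 * v / peak for g, v in raw.items()}


def comprehensive_degree(
    components: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted sum of percentage-form association degrees for each data group."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per association degree")
    groups = set().union(*components)
    return {
        g: sum(w * comp.get(g, 0.0) for comp, w in zip(components, weights))
        for g in groups
    }
```

For instance, with content degrees `{"g2": 80, "g3": 40}`, raw running degrees `{"g2": 4, "g3": 8}` (normalized to 50% and 100%), raw feedback degrees `{"g2": 30, "g3": 15}` (normalized to 100% and 50%) and the weights 1/4, 1/4, 1/2 from the example above, the comprehensive degrees are 82.5% for g2 and 60% for g3. Dropping a component and its weight gives the two-factor embodiments.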
Setting an association level for each associated data group according to its association degree with the data group to be moved out comprises: when the association degree between the associated data group and the data group to be moved out is greater than or equal to a first association degree threshold, setting the association level of the associated data group to the high association level; when the association degree is smaller than the first association degree threshold and greater than or equal to a second association degree threshold, setting it to the medium association level; and when the association degree is smaller than the second association degree threshold, setting it to the low association level. The first association degree threshold (e.g. 95% or 90%) is greater than the second association degree threshold (e.g. 75% or 80%).
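The three-way classification reduces to two threshold comparisons. In this minimal sketch the default thresholds are one pair of the example values above (90% and 75%), and the string labels are hypothetical.

```python
def association_level(degree: float, first: float = 90.0, second: float = 75.0) -> str:
    """Map an association degree (in percent) to an association level.

    first/second default to example thresholds from the text; any values with
    first > second may be used.
    """
    if first <= second:
        raise ValueError("first threshold must exceed second threshold")
    if degree >= first:
        return "high"
    if degree >= second:
        return "medium"
    return "low"
```

Because both comparisons are inclusive, a degree exactly equal to a threshold falls into the higher level, as the text specifies.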
Alternatively, the method further comprises setting an association level for each associated data group in the association statistical information based on the number of associated runs, the associated running time and the number of synchronous starts between that associated data group and the data group to be moved out. Before this, the operation history information of each data group is acquired (it is usually stored in an operation history file and may be obtained, for example, by recording a log file), and the number of associated runs, the associated running time and the number of synchronous starts of each associated data group with the data group to be moved out are determined from it.
The number of associated runs is the number of times (e.g. 5, 6 or 8) the two data groups run in association within a statistical time period (e.g. 10, 20 or 30 minutes); the associated running time is the length of time (e.g. 2, 3 or 5 minutes) for which the two data groups run in association within the statistical time period; and the number of synchronous starts is the number of times (e.g. 3, 4 or 5) the two data groups are started synchronously within the statistical time period. An associated run means that the difference between the times at which the two data groups are respectively called or started is greater than a first predetermined time interval and smaller than a second predetermined time interval. A synchronous start means that this difference is less than or equal to the first predetermined time interval. The second predetermined time interval is greater than the first, for example 30 seconds and 5 seconds respectively.
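The two interval tests can be sketched as follows, using the example intervals of 5 and 30 seconds. The pairing strategy is an assumption: every start of one data group is compared with every start of the other, which the text does not specify. The associated running time is not computed here.

```python
from typing import Sequence, Tuple


def count_sync_and_associated(
    starts_a: Sequence[float],
    starts_b: Sequence[float],
    t1: float = 5.0,
    t2: float = 30.0,
) -> Tuple[int, int]:
    """Return (synchronous starts, associated runs) for two data groups.

    starts_a / starts_b: hypothetical start (or call) times in seconds within
    one statistical time period. Comparing every start of A with every start
    of B is an assumption; the text does not say how starts are paired.
    """
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    sync = associated = 0
    for ta in starts_a:
        for tb in starts_b:
            gap = abs(ta - tb)
            if gap <= t1:
                sync += 1  # difference within the first interval
            elif gap < t2:
                associated += 1  # strictly between the two intervals
    return sync, associated
```

For start times `[0, 100]` and `[3, 120, 500]`, the 3-second gap counts as one synchronous start and the 20-second gap as one associated run; a gap of exactly 30 seconds counts as neither.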
After the association levels are determined, the application may filter the associated data groups of the data group to be moved out (or of any data group) by association level. For example, determining, based on the association statistical information of the data group to be moved out, the associated data groups that need to run in association with it may comprise determining only the associated data groups whose association level is high. Alternatively, it may comprise determining only the associated data groups whose association level is medium. Alternatively, it may comprise determining the associated data groups whose association level is high or medium.
In step 103, each of the data segments in the first memory is scanned to determine how many associated data groups it contains, and, among the data segments whose remaining space can accommodate the data group to be moved out, the one containing the most associated data groups is determined as the current data segment (the data segments are examined from the largest count downward until one meets the requirement). When several data segments tie for the largest number of associated data groups and each has enough remaining space for the data group to be moved out, the current data segment is chosen from them at random, or the one with the largest remaining space is chosen, or the one with the smallest remaining space is chosen.
When the remaining space of the data segment with the most associated data groups cannot accommodate the data group to be moved out, the data segment with the second most associated data groups is selected and checked in the same way, and so on, until the data segment with the most associated data groups among those whose remaining space can accommodate the data group to be moved out is found.
For example, data segment 204-1 includes 10 associated data groups, data segments 204-2 and 204-3 include 20 each, and data segment 204-4 includes 30. The remaining space of data segments 204-1, 204-2 and 204-3 can accommodate the data group to be moved out, but that of data segment 204-4 cannot (e.g. its remaining space is zero or below a minimum storage threshold, so no further data group can be stored). Data segments 204-2 and 204-3 are therefore the candidates for the current data segment, and one of them chosen at random, the one with the larger remaining space, or the one with the smaller remaining space is selected as the current data segment (e.g. data segment 204-2).
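The selection in step 103 can be sketched as below, reusing the segment counts from the example above. The `Segment` type, the byte-valued remaining space and the simple `remaining >= group_size` fit test are hypothetical simplifications (the text also mentions a minimum storage threshold). Filtering out segments that cannot fit and then taking the maximum count is equivalent to walking down from the largest count.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Segment:
    seg_id: str
    associated_count: int  # associated data groups found in this segment
    remaining: int  # remaining space, e.g. in bytes


def choose_current_segment(
    segments: Sequence[Segment],
    group_size: int,
    tie_rule: str = "largest",
    rng: Optional[random.Random] = None,
) -> Optional[Segment]:
    """Pick the segment with the most associated data groups that can still
    hold the data group to be moved out; break ties by tie_rule."""
    fits = [s for s in segments if s.remaining >= group_size]
    if not fits:
        return None
    best = max(s.associated_count for s in fits)
    candidates = [s for s in fits if s.associated_count == best]
    if tie_rule == "random":
        return (rng or random.Random()).choice(candidates)
    if tie_rule == "largest":
        return max(candidates, key=lambda s: s.remaining)
    if tie_rule == "smallest":
        return min(candidates, key=lambda s: s.remaining)
    raise ValueError(f"unknown tie rule: {tie_rule}")
```

With hypothetical remaining spaces of 4096, 2048, 8192 and 0 bytes for segments 204-1 to 204-4 and a 1024-byte data group, segment 204-4 is excluded, and the tie between 204-2 and 204-3 resolves to 204-3 under "largest" and to 204-2 under "smallest".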
In step 104, the current data area allocated to the data group to be moved out among the data areas in the current data segment is determined, and the data group to be moved out is moved from the cache to the current data area of the current data segment. Determining the current data area may comprise any one of the following: randomly allocating one of the data areas of the current data segment to the data group to be moved out as the current data area; calculating a hash value of the identifier of the data group to be moved out and selecting one of the data areas of the current data segment as the current data area according to the hash value; taking the data area with the largest remaining storage space in the current data segment as the current data area; or taking the data area with the smallest remaining storage space in the current data segment as the current data area.
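Two of the allocation options can be sketched as follows. The choice of CRC-32 as the hash function and the modulo mapping onto the area list are assumptions; the text only says a hash value of the identifier is used. `zlib.crc32` is used because Python's built-in `hash()` for strings is salted per process and would not give a stable mapping.

```python
import zlib
from typing import Dict, Sequence


def area_by_hash(group_id: str, area_ids: Sequence[str]) -> str:
    """Pick a data area from the hash of the data group identifier."""
    # zlib.crc32 is stable across runs, unlike Python's salted hash() for str.
    digest = zlib.crc32(group_id.encode("utf-8"))
    return area_ids[digest % len(area_ids)]


def area_by_space(free_space: Dict[str, int], largest: bool = True) -> str:
    """Pick the data area with the largest (or smallest) remaining space."""
    pick = max if largest else min
    return pick(free_space, key=free_space.__getitem__)
```

Hash-based placement sends the same identifier to the same data area every time, which would keep the data area index consistent if a data group is moved out repeatedly; the space-based options instead trade placement stability for fill balance.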
The first memory includes a data segment index information table, which contains data segment index information for each of the first buffer area, the second buffer area, the storage area and the reserved storage area. For the storage area, the data segment index information includes the identifier, start address, end address, storage capacity and the like of each data segment. For the first and second buffer areas, it includes the identifier, start address, end address, storage capacity and the like of each compressed data segment. For the reserved storage area, it includes the identifier, start address, end address, storage capacity and the like of each compressed or uncompressed data segment.
Preferably, the data segment index information of the first buffer area, the second buffer area, the storage area and the reserved storage area forms a single data segment index information table, which is stored in the boot area. Alternatively, the data segment index information of each of these areas forms its own data segment index information table, and each table is stored in the corresponding area. The data segment index information may include pairs <identifier of data group, identifier of the data segment to which it belongs>.
In step 105, all data areas in the current data segment other than the current data area are scanned; those containing at least one associated data group are determined as associated data areas, and the compression rate and compression stage of each associated data area are determined from the highest association level among the associated data groups it contains. The compression rate is one of a high, a medium and a low compression rate, whose degree of compression increases in that order (the high compression rate compresses least). The compression stage is one of a first, a second, a third and a fourth compression stage, whose priority in the compression order decreases in that order.
The directory area of each data segment or compressed data segment stores a data area index information table, which may contain pairs <identifier of data group, identifier of the (compressed) data area to which it belongs>. The identifier of each associated data group is determined and used to query the data area index information table, and the data area in the current data segment where each associated data group resides is determined from the query result. The associated data areas are then determined from these data areas; each associated data area contains at least one associated data group.
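The index lookup can be sketched as a dictionary query. The table is modeled, hypothetically, as a `dict` from data group identifier to data area identifier for one data segment; associated data groups not stored in this segment are simply skipped, and the current data area is excluded as step 105 requires.

```python
from typing import Dict, Iterable, Set


def associated_areas(
    area_index: Dict[str, str], associated: Iterable[str], current_area: str
) -> Dict[str, Set[str]]:
    """Group the associated data groups of one data segment by data area.

    area_index: hypothetical <data group id -> data area id> table of the segment.
    Returns {associated data area id: associated data groups it holds}.
    """
    result: Dict[str, Set[str]] = {}
    for gid in associated:
        area = area_index.get(gid)
        # Skip groups not stored in this segment and the current data area itself.
        if area is None or area == current_area:
            continue
        result.setdefault(area, set()).add(gid)
    return result
```

Every key in the returned mapping is an associated data area, since it holds at least one associated data group by construction.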
For example, the high compression rate is 90%, the medium compression rate is 80%, and the low compression rate is 70%. Alternatively, the high compression rate is a rate of 89% to 99%, the medium compression rate is a rate from 79% up to but excluding 89%, and the low compression rate is a rate below 79%. Alternatively, the high compression rate is a rate from 85% up to but excluding 100%, the medium compression rate is a rate from 70% up to but excluding 85%, and the low compression rate is a rate below 70%. These values are merely exemplary; those skilled in the art will appreciate that the compression rates or ranges may take any reasonable values.
The priority of the first, second, third and fourth compression stages in the compression order decreases in that order. It should be appreciated that the compression stages may indicate the order in which data groups are compressed, for example as a first, second, third and fourth batch. Alternatively, two or more compression stages may be treated as the same batch.
In step 106, the current data area is set to the high compression rate and marked as the first compression stage. Because the current data area is regarded as the area most likely to be accessed, it is compressed to the smallest degree and processed first.
Determining the compression rate and compression stage of each associated data area according to the highest association level among its associated data groups (that is, the association level of the highest-level associated data group in the area) comprises: when the highest association level is the high, medium or low association level, setting the compression rate of the associated data area to the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is high as the second compression stage, one whose highest association level is medium as the third compression stage, and one whose highest association level is low as the fourth compression stage.
For example, the current data segment contains associated data areas 1, 2 and 3. Associated data area 1 holds 1 associated data group of the high association level and 3 of the medium association level; associated data area 2 holds 5 data groups of the low association level; and associated data area 3 holds 2 associated data groups of the medium association level and 1 of the low association level. The highest association level in associated data area 1 is therefore high, so its compression rate is set to the high compression rate and it is marked as the second compression stage; the highest association level in associated data area 2 is low, so its compression rate is set to the low compression rate and it is marked as the fourth compression stage; and the highest association level in associated data area 3 is medium, so its compression rate is set to the medium compression rate and it is marked as the third compression stage.
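The mapping from highest association level to compression rate and stage is a small lookup table. In this sketch the string labels and table names are hypothetical; the pairings themselves follow steps 105 and 106.

```python
from typing import Iterable, Tuple

# Rank used to find the highest association level in a data area.
LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}
# Highest association level -> (compression rate, compression stage), per step 105.
PLAN = {"high": ("high", 2), "medium": ("medium", 3), "low": ("low", 4)}
# The current data area itself always gets the high rate and the first stage (step 106).
CURRENT_AREA_PLAN = ("high", 1)


def plan_associated_area(levels: Iterable[str]) -> Tuple[str, int]:
    """levels: association levels of the associated data groups in one area."""
    top = max(levels, key=LEVEL_RANK.__getitem__)
    return PLAN[top]
```

Applied to the example above, area 1 (one high, three medium) maps to the high rate and second stage, area 2 (five low) to the low rate and fourth stage, and area 3 (two medium, one low) to the medium rate and third stage.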
In step 107, the current data segment is compressed according to the compression stages and compression rates (in the direction from low to high addresses or from high to low addresses; either scheme may be used), as shown in Fig. 4. Fig. 4 is a flow chart of a method 400 of compressing the current data segment according to the compression stages and compression rates, according to an embodiment of the present invention. Method 400 begins at step 401. In step 401, the data groups in the current data area, marked as the first compression stage, are compressed at the high compression rate. In step 402, the data groups in the associated data area(s) marked as the second compression stage are compressed at the high compression rate. In step 403, the data groups in the associated data area(s) marked as the third compression stage are compressed at the medium compression rate. In step 404, the data groups in the associated data area(s) marked as the fourth compression stage are compressed at the low compression rate. The order of compression within the current data segment follows the compression stages: only after all data groups in the current data area (first stage) have been compressed are the data groups in the associated data areas of the second stage compressed, and so on.
Compressing the data groups in the associated data areas marked as the second compression level comprises compressing the data groups in all such associated data areas at the high compression rate, proceeding from low addresses to high addresses or from high addresses to low addresses. Compressing the data groups in the associated data areas marked as the third compression level comprises compressing the data groups in all such associated data areas at the medium compression rate, in the same address direction. Compressing the data groups in the associated data areas marked as the fourth compression level comprises compressing the data groups in all such associated data areas at the low compression rate, in the same address direction.
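The ordering of method 400 (all areas of compression level 1, then level 2, and so on, with areas of the same level visited in address order) can be sketched as a single sort. This is a minimal illustration under stated assumptions: the patent names no codec, so `zlib` stands in, and the mapping of "high/medium/low compression rate" to zlib levels 9/6/1 is hypothetical, as are the function name and the area dictionary layout.

```python
import zlib

# Hypothetical mapping of the text's compression rates onto zlib levels.
ZLIB_LEVEL = {"high": 9, "medium": 6, "low": 1}


def compress_segment(areas, ascending=True):
    """Compress every area of one data segment, lowest compression level first.

    areas: list of dicts with 'address', 'level' (1-4), 'rate' and 'groups'
    (a list of bytes). Within one level, areas are visited by address in the
    chosen direction. Returns [(address, [compressed bytes])] in visit order.
    """
    def visit_key(area):
        addr = area["address"] if ascending else -area["address"]
        return (area["level"], addr)

    out = []
    for area in sorted(areas, key=visit_key):
        zl = ZLIB_LEVEL[area["rate"]]
        out.append((area["address"], [zlib.compress(g, zl) for g in area["groups"]]))
    return out


areas = [
    {"address": 0x300, "level": 4, "rate": "low", "groups": [b"d" * 64]},
    {"address": 0x280, "level": 2, "rate": "high", "groups": [b"c" * 64]},
    {"address": 0x100, "level": 1, "rate": "high", "groups": [b"a" * 64]},
    {"address": 0x200, "level": 2, "rate": "high", "groups": [b"b" * 64]},
    {"address": 0x050, "level": 3, "rate": "medium", "groups": [b"e" * 64]},
]
print([hex(addr) for addr, _ in compress_segment(areas)])
# ['0x100', '0x200', '0x280', '0x50', '0x300']
```

Passing `ascending=False` keeps the level order but visits same-level areas from high to low addresses.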
In step 108, while the current data segment is being compressed according to compression level and compression rate, at least one associated data segment is determined, namely a data segment other than the current data segment that holds associated data groups; within each associated data segment, the associated data groups are stored in at least one associated data area.
The directory area of each data segment (or compressed data segment) stores a data area index information table, which may consist of pairs <identifier of data group, identifier of the (compressed) data area to which the group belongs>. The identifier of each associated data group is determined and used to query the data area index information table of every data segment other than the current data segment; the query results give the data segment (the associated data segment) and the data area in which each associated data group is located. Any data segment containing an associated data group is determined to be an associated data segment, and within it, each data area containing at least one associated data group is determined to be an associated data area.
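The index lookup above can be sketched with one dictionary per segment directory standing in for the <data group identifier, data area identifier> pairs. The dictionary layout and the function name `locate_associated` are assumptions made for illustration.

```python
from collections import defaultdict


def locate_associated(index_tables, current_segment, associated_ids):
    """Find associated data segments and areas from per-segment index tables.

    index_tables: {segment_id: {group_id: area_id}}, one table per segment
    directory (a dict stands in for the <group id, area id> pairs).
    Returns {segment_id: {area_id: set_of_group_ids}}, skipping the current segment.
    """
    found = defaultdict(lambda: defaultdict(set))
    for seg_id, table in index_tables.items():
        if seg_id == current_segment:
            continue
        for gid in associated_ids:
            area_id = table.get(gid)
            if area_id is not None:
                found[seg_id][area_id].add(gid)
    # Convert the nested defaultdicts to plain dicts for the caller
    return {seg: dict(areas) for seg, areas in found.items()}


tables = {
    "S0": {"g1": "A0", "g2": "A1"},
    "S1": {"g3": "A0", "g4": "A0", "g9": "A1"},
    "S2": {"g5": "A2"},
}
# S0 is the current segment, so g2 is ignored; g7 is in no table.
res = locate_associated(tables, "S0", ["g2", "g3", "g4", "g5", "g7"])
# S1 becomes an associated segment with area A0 (g3, g4); S2 with area A2 (g5)
```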
After compression by compression level and compression rate within the current data segment is complete, the compressed data group(s) generated from the data group(s) in the current data area are moved into the first buffer area of the first memory, and the compressed data group(s) generated from the data group(s) in each associated data area of the current data segment are moved into the second buffer area of the first memory. Once the compressed data groups have been moved into the first buffer area, the storage space of the current data area is initialized; once they have been moved into the second buffer area, the storage space of each associated data area is initialized. In other words, after all data groups in the current data area and in each associated data area of the current data segment have been compressed, the storage space of those areas is initialized. Initialization may be, for example, resetting the storage space or deleting its contents, so that the current data area and each associated data area can store new data.
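A minimal sketch of the move-then-initialize step, assuming Python lists stand in for the two buffer areas and for the data areas themselves; `move_and_initialize` and its parameters are hypothetical names, and "initialize" is modeled simply as emptying the area.

```python
def move_and_initialize(compressed, current_area, storage, first_buffer, second_buffer):
    """Route compressed groups to the buffers, then clear the source areas.

    compressed: {area_id: [compressed bytes]} produced by the compression pass.
    storage: {area_id: [raw data groups]} standing in for the data areas.
    Groups from the current data area go to the first buffer, all others to
    the second buffer; each area is emptied only after its groups were moved.
    """
    for area_id, blobs in compressed.items():
        target = first_buffer if area_id == current_area else second_buffer
        target.extend(blobs)
        storage[area_id] = []  # "initialize": clear the area so it can take new data


storage = {"cur": [b"x"], "a1": [b"y"], "a2": [b"z"]}
first, second = [], []
move_and_initialize({"cur": [b"X"], "a1": [b"Y"]}, "cur", storage, first, second)
# first == [b"X"], second == [b"Y"]; "a2" was not compressed, so it keeps its data
```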
In step 109, the compression rate and compression level of each associated data area in each associated data segment are determined according to the highest association level of the associated data groups in that area: when the highest association level is the high, medium, or low association level, the compression rate of the associated data area is set to the high, medium, or low compression rate, respectively; an associated data area whose highest association level is the high association level is marked as the second compression level, and one whose highest association level is the medium or low association level is marked as the third compression level.
Determining the compression rate and compression level of each of the plurality of associated data areas according to the highest association level of its associated data groups (that is, the association level of the associated data group with the highest association level among the associated data groups in that area) comprises: when the highest association level is the high, medium, or low association level, setting the compression rate of the associated data area to the high, medium, or low compression rate, respectively; marking an associated data area whose highest association level is the high association level as the second compression level, one whose highest association level is the medium association level as the third compression level, and one whose highest association level is the low association level as the fourth compression level.
Determining the compression rate and compression level of each associated data area in each associated data segment according to the highest association level of the associated data groups in that area (that is, the association level of the associated data group with the highest association level among the associated data groups in that area) comprises: when the highest association level is the high, medium, or low association level, marking the compression rate of the associated data area as the high, medium, or low compression rate, respectively; marking an associated data area whose highest association level is the high association level as the second compression level, and one whose highest association level is the medium or low association level as the third compression level.
For example, associated data segment 1 contains associated data areas 1 and 2, and associated data segment 2 contains associated data area 3. Associated data area 1 holds 1 associated data group of the high association level and 3 of the medium association level; associated data area 2 holds 5 associated data groups of the low association level; and associated data area 3 holds 2 associated data groups of the medium association level and 1 of the low association level. The highest association level in associated data area 1 is therefore the high association level, so its compression rate is set to the high compression rate and it is marked as the second compression level. The highest association level in associated data area 2 is the low association level, so its compression rate is set to the low compression rate and it is marked as the third compression level. The highest association level in associated data area 3 is the medium association level, so its compression rate is set to the medium compression rate and it is marked as the third compression level.
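For associated data segments, the rate still follows the highest association level, but the compression level collapses to two values (second for high, third for medium or low). The sketch below reproduces the example with that rule; `classify_associated_area` and the nested dictionary layout are hypothetical.

```python
from enum import IntEnum


class Assoc(IntEnum):
    """Association levels; a larger value means a stronger association."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


RATE = {Assoc.HIGH: "high", Assoc.MEDIUM: "medium", Assoc.LOW: "low"}


def classify_associated_area(group_levels):
    """Rate follows the highest level; the level is 2 for HIGH, otherwise 3."""
    highest = max(group_levels)
    return RATE[highest], (2 if highest == Assoc.HIGH else 3)


segments = {
    "segment 1": {
        "area 1": [Assoc.HIGH] + [Assoc.MEDIUM] * 3,
        "area 2": [Assoc.LOW] * 5,
    },
    "segment 2": {"area 3": [Assoc.MEDIUM] * 2 + [Assoc.LOW]},
}
result = {
    seg: {area: classify_associated_area(levels) for area, levels in areas.items()}
    for seg, areas in segments.items()
}
# area 1 -> ('high', 2), area 2 -> ('low', 3), area 3 -> ('medium', 3)
```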
In step 110, in response to completion of compression by compression level and compression rate within the current data segment, compression by compression level and compression rate is performed within each of the at least one associated data segment (the associated data segments may be processed in parallel or serially). First, the associated data areas marked as the second compression level in each associated data segment are compressed at the high compression rate. Then the associated data areas marked as the third compression level with the medium compression rate are compressed at the medium compression rate while, at the same time, the associated data areas marked as the third compression level with the low compression rate are compressed at the low compression rate.
Wherein a data segment having at least one associated data group is selected as an associated data segment, and a data area having at least one associated data group within the associated data segment is determined as an associated data area.
Compressing the associated data areas marked as the second compression level in each associated data segment at the high compression rate comprises compressing those areas at the high compression rate, with the plurality of associated data segments processed serially or in parallel. Compressing the associated data areas marked as the third compression level with the medium compression rate comprises compressing those areas at the medium compression rate, with the associated data segments processed serially or in parallel. Compressing the associated data areas marked as the third compression level with the low compression rate comprises compressing those areas at the low compression rate, with the associated data segments processed serially or in parallel.
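The two-phase schedule of step 110 (every second-level area in every associated data segment first, then all third-level areas at their medium or low rate) can be sketched with a thread pool for the parallel case. As before, `zlib` and the 9/6/1 level mapping are stand-ins, not the patent's codec, and all names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

ZLIB_LEVEL = {"high": 9, "medium": 6, "low": 1}  # hypothetical rate mapping


def compress_areas(areas):
    """Compress every group of the given areas at each area's own rate."""
    return {aid: [zlib.compress(g, ZLIB_LEVEL[a["rate"]]) for g in a["groups"]]
            for aid, a in areas.items()}


def compress_associated_segments(segments, parallel=True):
    """segments: {seg_id: {area_id: {'level': 2|3, 'rate': str, 'groups': [bytes]}}}.

    Phase 1 compresses every level-2 area; phase 2 compresses every level-3
    area (medium- and low-rate areas together). Segments within a phase run
    in parallel or serially. Returns (results, log of (level, seg, area)).
    """
    results = {seg: {} for seg in segments}
    log = []
    for level in (2, 3):
        jobs = {seg: {aid: a for aid, a in areas.items() if a["level"] == level}
                for seg, areas in segments.items()}
        if parallel:
            with ThreadPoolExecutor() as pool:
                # pool.map preserves input order, so zip pairs results with segments
                done = dict(zip(jobs, pool.map(compress_areas, jobs.values())))
        else:
            done = {seg: compress_areas(areas) for seg, areas in jobs.items()}
        for seg, comp in done.items():
            results[seg].update(comp)
            log.extend((level, seg, aid) for aid in comp)
    return results, log


segs = {
    "S1": {"A1": {"level": 2, "rate": "high", "groups": [b"x" * 50]},
           "A2": {"level": 3, "rate": "low", "groups": [b"y" * 50]}},
    "S2": {"A3": {"level": 3, "rate": "medium", "groups": [b"z" * 50]}},
}
res, log = compress_associated_segments(segs)
# log lists the level-2 area first, then both level-3 areas
```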
The compressed data groups generated by compressing all data groups in each of the at least one associated data area in each associated data segment are moved into the second buffer area of the first memory. After they have been moved, the storage space of each of those associated data areas is initialized; that is, once all data groups in an associated data area have been compressed, its storage space is initialized. Initialization may be, for example, resetting the storage space or deleting its contents, so that each associated data area can store new data.
In addition, the number of times each data group stored in the storage area of the first memory is accessed is determined within each third predetermined time interval, and data groups accessed more times than a third count threshold within the current third predetermined time interval are moved into the cache. Alternatively, the number of times each compressed data group stored in the first buffer area or the second buffer area of the first memory is accessed is determined within each third predetermined time interval, and compressed data groups accessed more times than the third count threshold within the current interval are decompressed and moved into the cache.
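A sketch of the per-interval promotion, assuming the accesses of the current third predetermined time interval are available as a flat list of group identifiers and that compressed groups use `zlib` (an assumption; no codec is named). `promote` and the store dictionaries are hypothetical names.

```python
import zlib
from collections import Counter


def promote(access_log, raw_store, compressed_store, cache, threshold):
    """Move groups accessed more than `threshold` times this interval into the cache.

    access_log: group ids accessed during the current third predetermined interval.
    raw_store / compressed_store: {group_id: bytes} held in the first memory.
    Compressed groups are decompressed on the way into the cache.
    """
    for gid, n in Counter(access_log).items():
        if n <= threshold:
            continue  # the count must be strictly higher than the threshold
        if gid in raw_store:
            cache[gid] = raw_store.pop(gid)
        elif gid in compressed_store:
            cache[gid] = zlib.decompress(compressed_store.pop(gid))
    return cache


raw = {"a": b"A", "d": b"D"}
comp = {"b": zlib.compress(b"B")}
accesses = ["a"] * 5 + ["b"] * 4 + ["c"] * 10 + ["d"] * 2
cache = promote(accesses, raw, comp, {}, 3)
# "a" and "b" are promoted; "c" is not in the first memory; "d" is below threshold
```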
FIG. 9 is a block diagram of a system 900 for dynamic processing of sets of data based on in-cache evictions, in accordance with an embodiment of the invention. As shown in fig. 9, the system 900 includes:
the statistical unit 901, configured to count, in real time, the number of times each of a plurality of data groups in the cache of a processor in the mobile terminal is accessed, and to determine a data group whose access count within a predetermined time period is lower than a first count threshold to be a data group to be moved out, that is, one to be moved from the cache to the first memory.
The predetermined time period (e.g., 8 hours, 16 hours, or 1 day) ends at the current time and starts at a point in the past. Access records (including the identifier of the data group and the access time) of each data group of the processor, whether in the cache or in the first memory, are collected in real time, and real-time statistics of each data group's access count are derived from these records. The predetermined time period can therefore be regarded as a moving time window that advances with time; its length may be determined by a user setting or by the system configuration.
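The moving time window can be sketched with one deque of access timestamps per data group, dropping timestamps that have fallen out of the window before counting. This assumes timestamps are recorded in non-decreasing order and that the window covers (now - length, now]; the class name `AccessStats` is hypothetical.

```python
from collections import defaultdict, deque


class AccessStats:
    """Per-group access counts over a moving window ending at the current time."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.records = defaultdict(deque)  # group_id -> access timestamps, oldest first

    def record(self, group_id, t):
        self.records[group_id].append(t)

    def count(self, group_id, now):
        q = self.records[group_id]
        # Drop accesses that are no longer inside the window
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)

    def to_move_out(self, cached_ids, now, first_threshold):
        """Groups whose in-window access count is below the first count threshold."""
        return [g for g in cached_ids if self.count(g, now) < first_threshold]


stats = AccessStats(100)
for t in (10, 50, 120, 130):
    stats.record("a", t)
stats.record("b", 5)
print(stats.to_move_out(["a", "b"], 150, 2))  # ['b']
```

Pruning lazily inside `count` keeps each access record O(1) and only pays for expiry when a count is actually requested.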
The mobile terminal may be any type of mobile device, including a mobile handset, station, unit, device, multimedia tablet, communicator, laptop, personal digital assistant (PDA), or any combination thereof. Generally, a mobile terminal may be communicatively coupled to other devices, such as mobile devices, servers, or base stations, via a network connection. The cache is a cache memory internal or external to the processor. A cache memory is a level of memory located between the main memory (the first memory in this application) and the processor; it is built from static random access memory (SRAM) chips and has a relatively small capacity but a much higher speed than the main memory, close to the speed of the processor.
In addition, the number of times each of a plurality of data groups in each data segment of the first memory of the mobile terminal is accessed is counted in real time, and a data group (or compressed data group) whose access count within the predetermined time period is higher than a second count threshold is determined to be a data group (or compressed data group) to be moved in, that is, one to be moved from the first memory into the cache.
The second count threshold may be greater than or less than the first count threshold. For example, the first count threshold may be 300, 500, or 1000 accesses, and the second count threshold may likewise be 300, 500, or 1000 accesses. The access count is the number of times each data group is accessed by the processor.
Optionally, when the remaining storage space ratio of the cache is higher than a move-in threshold (e.g., 20%, 30%, or 40%), the data groups to be moved in are arranged into a queue in increasing order of storage size and moved into the cache in queue order, starting with the smallest, until moving the next data group into the cache leaves the remaining storage space ratio of the cache below a move-out threshold (e.g., 10% or 15%). The move-in threshold is greater than the move-out threshold. For example, when the remaining storage space ratio of the cache is 25% and the move-in threshold is 20%, data groups 1 to 10 to be moved in are arranged into a queue in increasing order of storage size. Data group 1 is moved into the cache first, and each next data group is then examined in turn. If, after the next data group is moved into the cache, the remaining storage space ratio is 12%, which is below the move-out threshold of 15%, moving stops after that data group has been moved in.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the data groups to be moved in are arranged into a queue in decreasing order of storage size and moved into the cache in queue order, starting with the largest, until moving the next data group into the cache leaves the remaining storage space ratio of the cache below the move-out threshold.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the data groups to be moved in are arranged into a queue in decreasing order of storage time and moved into the cache in queue order, starting with the one with the longest storage time, until moving the next data group into the cache leaves the remaining storage space ratio of the cache below the move-out threshold.
Alternatively, when the remaining storage space ratio of the cache is higher than the move-in threshold, the data groups to be moved in are arranged into a queue in increasing order of storage time and moved into the cache in queue order, starting with the one with the shortest storage time, until moving the next data group into the cache leaves the remaining storage space ratio of the cache below the move-out threshold.
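All four queue variants share one loop and differ only in the sort key and direction, so a single sketch can cover them. It assumes each candidate is `(group_id, size, storage_time)` and treats "storage time" as how long the group has been held in the first memory (an interpretation). The capacity guard that stops before overfilling the cache is a hypothetical addition not stated in the text; `fill_cache` is likewise a hypothetical name.

```python
def fill_cache(candidates, capacity, used, move_in, move_out, key, reverse=False):
    """Move queued data groups into the cache until free space drops below move_out.

    candidates: list of (group_id, size, storage_time).
    Nothing happens unless the free ratio is above move_in. Groups are moved in
    queue order; the group whose move pushes the free ratio below move_out is
    still moved, and moving stops right after it. Returns (moved_ids, new_used).
    """
    moved = []
    if (capacity - used) / capacity <= move_in:
        return moved, used
    for gid, size, _stored in sorted(candidates, key=key, reverse=reverse):
        if size > capacity - used:
            break  # hypothetical guard: never exceed the physical capacity
        used += size
        moved.append(gid)
        if (capacity - used) / capacity < move_out:
            break
    return moved, used


cands = [("g3", 6, 30), ("g1", 3, 40), ("g4", 8, 20), ("g2", 4, 10)]
# Increasing storage size, as in the worked example: 25% free -> 12% free
print(fill_cache(cands, 100, 75, 0.20, 0.15, key=lambda c: c[1]))
# (['g1', 'g2', 'g3'], 88)
```

The other variants use `key=lambda c: c[1], reverse=True` (largest first), `key=lambda c: c[2], reverse=True` (longest storage time first) and `key=lambda c: c[2]` (shortest first).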
When it is detected that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, a plurality of applications to be loaded are determined according to a preset loading configuration file, and the file package associated with each of these applications is copied from a second memory into the first memory. The first memory is a volatile memory, such as random access memory (RAM), and the second memory is a non-volatile memory, such as flash memory. The operating system is stored in the second memory while the mobile terminal is powered off and is loaded from the second memory into the first memory when the mobile terminal starts. The loading configuration file may be preset when the mobile terminal leaves the factory or set by the user of the mobile terminal, and may record which of the applications on the mobile terminal are started automatically when the operating system starts. Typically, each application has at least one associated file package, and each file package may contain multiple sub-packages; that is, the file structure of each application is hierarchical.
After the operating system of the mobile terminal has finished starting (that is, once its system services, system applications, resource management, network initialization, and the like have started but before any user application has been loaded), and before the applications to be loaded are determined according to the preset loading configuration file, a plurality of data segments for storing data are created in the storage area of the first memory, as shown in fig. 2. Each of the data segments includes a plurality of data areas, as shown in fig. 3.
The file package associated with each application includes at least one data group, and in the first memory or the cache, the data group is the basic storage unit. Alternatively, in the first memory, the compressed data group is used as the basic storage unit. That is, in the first memory, each application is stored as a plurality of compressed or uncompressed data groups.
Any data group is stored in a single data area of a data segment, and a single data area can store at least one data group. The second memory includes a plurality of compressed data segments for storing compressed data; each compressed data segment includes a plurality of compressed data areas, and each compressed data area includes a plurality of sub-areas. The file package associated with each application includes at least one compressed data group, and in the second memory, the compressed data group is the basic storage unit for compressed storage. Any compressed data group is stored in a single compressed data area of a compressed data segment, and a single compressed data area can store at least one compressed data group.
An associating unit 902, configured to determine, based on the association statistical information of the data group to be moved out, a plurality of associated data groups stored in the first memory that need to be operated together with the data group to be moved out when it is operated, and to set an association level for each associated data group according to its degree of association with the data group to be moved out, where the association levels include a high association level, a medium association level, and a low association level. After the applications to be loaded have been determined according to the preset loading configuration file, an association statistics file is copied from the second memory into the first memory; this file contains multiple pieces of association statistical information, each indicating the associated data groups of one data group.
The associating unit 902 determines the content association degree between the current data group and each other data group, sorts the other data groups in descending order of content association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The content association degree between any two data groups is determined from the matching value obtained by content matching of their summary information. The preset selection rule is: select the data groups whose content association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
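The rank-then-select procedure can be sketched as follows. The text does not define how summary information is matched, so this sketch assumes each summary is a set of keywords and uses Jaccard similarity as the matching value; `jaccard` and `select_associated` are hypothetical names.

```python
def jaccard(a, b):
    """Hypothetical matching value: overlap of two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def select_associated(current, summaries, score, threshold=None, top_k=None):
    """Sort all other groups by score in descending order, then apply the rule:
    keep scores above `threshold`, or else the first `top_k` entries."""
    ranked = sorted(
        ((g, score(summaries[current], summaries[g])) for g in summaries if g != current),
        key=lambda item: item[1],
        reverse=True,
    )
    if threshold is not None:
        return [g for g, s in ranked if s > threshold]
    return [g for g, _ in ranked[:top_k]]


summaries = {
    "g0": {"img", "png", "thumb"},
    "g1": {"img", "png"},            # similarity to g0: 2/3
    "g2": {"audio"},                 # similarity to g0: 0
    "g3": {"img", "thumb", "cache"}, # similarity to g0: 2/4
}
print(select_associated("g0", summaries, jaccard, threshold=0.4))  # ['g1', 'g3']
```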
The associating unit 902 determines the operation association degree between the current data group and each other data group, sorts the other data groups in descending order of operation association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. To compute the operation association degree, the operation history information of each data group is obtained; from it, the data groups operated in each basic time unit are determined, the number of times any two data groups were operated in the same basic time unit is counted, and the operation association degree between the two data groups is determined from that count. The preset selection rule is: select the data groups whose operation association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
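A sketch of the co-operation count: operation records are bucketed into fixed-length basic time units, and every pair of groups seen in the same unit gets one count. The record format `(timestamp, group_id)` is an assumption, and the count itself is used as the operation association degree, which is one simple reading of "determined based on" the count.

```python
from collections import Counter
from itertools import combinations


def co_run_counts(history, unit):
    """Count, for each unordered pair of groups, the basic time units in which
    both were operated. history: list of (timestamp, group_id) records."""
    buckets = {}
    for t, gid in history:
        buckets.setdefault(int(t // unit), set()).add(gid)
    pairs = Counter()
    for groups in buckets.values():
        for a, b in combinations(sorted(groups), 2):
            pairs[(a, b)] += 1
    return pairs


def run_degree(pairs, x, y):
    """Operation association degree of two groups (0 if never operated together)."""
    return pairs[tuple(sorted((x, y)))]


history = [(0, "a"), (1, "b"), (5, "c"), (12, "a"), (13, "b"), (25, "a"), (26, "c")]
pairs = co_run_counts(history, unit=10)
# units: [0,10) -> {a, b, c}, [10,20) -> {a, b}, [20,30) -> {a, c}
print(run_degree(pairs, "b", "a"), run_degree(pairs, "c", "b"))  # 2 1
```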
The associating unit 902 determines the feedback association degree between the current data group and each other data group, sorts the other data groups in descending order of feedback association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The feedback association degree between any two data groups is initialized to 0; an association degree rule preset by the user is parsed to identify the data group pairs (each consisting of two data groups) whose feedback association degree needs to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule. The preset selection rule is: select the data groups whose feedback association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
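The feedback association degree starts at 0 for every pair and is overridden only for pairs named by the user's rule. The text does not specify the rule format, so this sketch assumes the parsed rule is a list of `(group_a, group_b, value)` triples; `feedback_degrees` is a hypothetical name.

```python
from collections import defaultdict


def feedback_degrees(groups, rules):
    """Return pair -> feedback association degree.

    Every pair implicitly starts at 0 (defaultdict); each parsed user rule
    (group_a, group_b, value) sets the degree of one unordered pair. Rules
    naming unknown groups or the same group twice are ignored.
    """
    degree = defaultdict(float)
    for a, b, value in rules:
        if a in groups and b in groups and a != b:
            degree[frozenset((a, b))] = value
    return degree


d = feedback_degrees({"a", "b", "c"}, [("a", "b", 0.9), ("b", "z", 0.5)])
print(d[frozenset(("b", "a"))], d[frozenset(("a", "c"))])  # 0.9 0.0
```

Keying by `frozenset` makes the degree symmetric, so `(a, b)` and `(b, a)` refer to the same pair.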
The associating unit 902 determines the comprehensive association degree between the current data group and each other data group, sorts the other data groups in descending order of comprehensive association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The content association degree between any two data groups is determined by content matching of their summary information. The operation history information of each data group is obtained; from it, the data groups operated in each basic time unit are determined, the number of times any two data groups were operated in the same basic time unit is counted, and the operation association degree between the two data groups is determined from that count. The comprehensive association degree is then determined by a weighted calculation over the content association degree and the operation association degree between each other data group and the current data group. The preset selection rule is: select the data groups whose comprehensive association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
The associating unit 902 determines the comprehensive association degree between the current data group and each other data group, sorts the other data groups in descending order of comprehensive association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The content association degree between any two data groups is determined by content matching of their summary information. The feedback association degree between any two data groups is initialized to 0; an association degree rule preset by the user is parsed to identify the data group pairs whose feedback association degree needs to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule. The comprehensive association degree is then determined by a weighted calculation over the content association degree and the feedback association degree between each other data group and the current data group. The preset selection rule is: select the data groups whose comprehensive association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
The associating unit 902 determines the comprehensive association degree between the current data group and each other data group, sorts the other data groups in descending order of comprehensive association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The operation history information of each data group is obtained; from it, the data groups operated in each basic time unit are determined, the number of times any two data groups were operated in the same basic time unit is counted, and the operation association degree between the two data groups is determined from that count. The feedback association degree between any two data groups is initialized to 0; an association degree rule preset by the user is parsed to identify the data group pairs whose feedback association degree needs to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule. The comprehensive association degree is then determined by a weighted calculation over the operation association degree and the feedback association degree between each other data group and the current data group. The preset selection rule is: select the data groups whose comprehensive association degree in the ordered list is greater than an association degree threshold, or those ranked before a preset rank.
The associating unit 902 determines the comprehensive association degree between the current data group and each other data group, sorts the other data groups in descending order of comprehensive association degree to produce an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group. The content association degree between any two data groups is determined by content matching of their summary information. The operation history information of each data group is obtained; from it, the data groups operated in each basic time unit are determined, the number of times any two data groups were operated in the same basic time unit is counted, and the operation association degree between the two data groups is determined from that count. The feedback association degree between any two data groups is initialized to 0; an association degree rule preset by the user is parsed to identify the data group pairs whose feedback association degree needs to be set, and the feedback association degree of the two data groups in each such pair is set according to the rule. The comprehensive association degree is then determined by a weighted calculation over the content association degree, the operation association degree, and the feedback association degree between each other data group and the current data group.
The preset selection rule is one of the following: select the data groups whose comprehensive association degree in the sorted list is greater than the association degree threshold, or select the data groups ranked before a preset rank in the sorted list.
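The weighting and selection steps can be sketched as below. The factor names, the example weights, and the helper names `comprehensive_association` and `select_associated` are hypothetical, because the text does not fix the weights. Dropping `"content"` from the weights gives the two-factor variant that uses only the operation and feedback association degrees.

```python
def comprehensive_association(scores, weights):
    """Weighted sum of the per-factor association degrees (weights are assumptions)."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


def select_associated(current, candidates, weights, threshold=None, top_n=None):
    """Rank candidates in descending order and apply one of the two selection rules.

    candidates: dict group_id -> {"content": ..., "operation": ..., "feedback": ...}
    threshold:  keep groups whose degree is greater than this value, or
    top_n:      keep the groups ranked before position top_n (used if threshold is None)
    """
    ranked = sorted(
        ((gid, comprehensive_association(s, weights))
         for gid, s in candidates.items() if gid != current),
        key=lambda item: item[1],
        reverse=True,
    )
    if threshold is not None:
        return [gid for gid, degree in ranked if degree > threshold]
    return [gid for gid, _ in ranked[:top_n]]


weights = {"content": 0.3, "operation": 0.5, "feedback": 0.2}
candidates = {
    "B": {"content": 0.8, "operation": 0.6, "feedback": 0.0},  # weighted degree 0.54
    "C": {"content": 0.1, "operation": 0.2, "feedback": 1.0},  # weighted degree 0.33
    "D": {"content": 0.0, "operation": 0.1, "feedback": 0.0},  # weighted degree 0.05
}
by_threshold = select_associated("A", candidates, weights, threshold=0.3)
by_rank = select_associated("A", candidates, weights, top_n=1)
```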
The associating unit 902 sets an association level for each associated data group according to its association degree with the data group to be moved out, as follows. When the association degree is greater than or equal to a first association degree threshold, the association level is set to the high association level. When it is less than the first association degree threshold and greater than or equal to a second association degree threshold, the level is set to the medium association level. When it is less than the second association degree threshold, the level is set to the low association level. The first association degree threshold is greater than the second association degree threshold.
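The three-level classification can be sketched as follows. The threshold values are supplied by the caller and are hypothetical.

```python
def association_level(degree, first_threshold, second_threshold):
    """Classify an association degree into "high", "medium", or "low"."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be greater than the second")
    if degree >= first_threshold:
        return "high"
    if degree >= second_threshold:
        return "medium"
    return "low"
```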
The associating unit 902 can also set an association level for each associated data group in the association statistical information. In that case the level is based on three statistics between the associated data group and the data group to be moved out: the number of associated operations, the associated operation duration, and the number of synchronous starts. The associating unit 902 acquires the operation history information of each data group and determines these three statistics from it for each associated data group.
The number of associated operations is the number of times the two data groups operate in association within a statistical time period. The associated operation duration is the length of time for which the two data groups operate in association within the statistical time period. The number of synchronous starts is the number of times the two data groups start synchronously within the statistical time period. Associated operation means that the difference between the times at which the two data groups are respectively called or started is greater than a first predetermined time interval and less than a second predetermined time interval. Synchronous start means that this difference is less than or equal to the first predetermined time interval.
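Whether two start times count as a synchronous start or as an associated operation depends only on the gap between them. The sketch below assumes the call or start times of the two data groups have already been paired up within the statistical time period. The associated operation duration is not modeled, and all names are illustrative.

```python
def classify_pair(t_a, t_b, first_interval, second_interval):
    """Classify one pair of call/start times of two data groups."""
    gap = abs(t_a - t_b)
    if gap <= first_interval:
        return "synchronous_start"
    if gap < second_interval:
        return "associated_operation"
    return None  # too far apart to count as either


def pair_statistics(start_pairs, first_interval, second_interval):
    """Count synchronous starts and associated operations in one statistical period."""
    stats = {"synchronous_start": 0, "associated_operation": 0}
    for t_a, t_b in start_pairs:
        kind = classify_pair(t_a, t_b, first_interval, second_interval)
        if kind is not None:
            stats[kind] += 1
    return stats


pairs = [(0, 1), (10, 15), (20, 50), (30, 30)]
stats = pair_statistics(pairs, first_interval=2, second_interval=10)
```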
The associating unit 902 uses the association statistical information of the data group to be moved out to determine which associated data groups need to operate in association with it when it runs. In one embodiment, only the associated data groups whose association level is the high association level are selected. In another embodiment, only the associated data groups whose association level is the medium association level are selected. In yet another embodiment, the associated data groups whose association level is the high or the medium association level are selected.
The first scanning unit 903 scans each of the plurality of data segments in the first memory to determine the number of associated data groups contained in each data segment. Among the data segments whose remaining space can accommodate the data group to be moved out, it selects the one containing the largest number of associated data groups as the current data segment. Candidates are checked in decreasing order of associated-group count until one meets the space requirement. When several such data segments tie for the largest number of associated data groups, the current data segment is chosen from them in one of three ways: at random, as the one with the largest remaining space, or as the one with the smallest remaining space.
When the remaining space of the data segment with the largest number of associated data groups cannot accommodate the data group to be moved out, the data segment with the second-largest number of associated data groups is checked in the same way, and so on. This continues until the data segment with the largest number of associated data groups, among those whose remaining space can accommodate the data group to be moved out, has been found.
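The selection of the current data segment, including the three tie-breaking options, might look like this. The `Segment` record, byte-based space accounting, and the remaining-space figures in the example are assumptions added for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    associated_count: int  # number of associated data groups in the segment
    remaining: int         # remaining free space, in bytes (assumed unit)


def choose_current_segment(segments, group_size, tie_break="largest", rng=random):
    """Pick the fitting segment with the most associated groups, then break ties."""
    # Keep only segments whose remaining space can hold the group to be moved out;
    # this is equivalent to walking the counts downward until a segment fits.
    fitting = [s for s in segments if s.remaining >= group_size]
    if not fitting:
        return None
    best = max(s.associated_count for s in fitting)
    tied = [s for s in fitting if s.associated_count == best]
    if tie_break == "random":
        return rng.choice(tied)
    if tie_break == "largest":
        return max(tied, key=lambda s: s.remaining)
    if tie_break == "smallest":
        return min(tied, key=lambda s: s.remaining)
    raise ValueError(f"unknown tie_break: {tie_break}")


# The example from the text, with hypothetical remaining-space values.
segments = [
    Segment("204-1", 10, 400),
    Segment("204-2", 20, 150),
    Segment("204-3", 20, 300),
    Segment("204-4", 30, 0),
]
```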
For example, data segment 204-1 contains 10 associated data groups, data segments 204-2 and 204-3 each contain 20, and data segment 204-4 contains 30. The remaining space of data segments 204-1, 204-2, and 204-3 can accommodate the data group to be moved out, but that of data segment 204-4 cannot. For instance, its remaining space may be zero, or below a minimum storage threshold so that no further data group can be stored. Data segments 204-2 and 204-3 are therefore the candidates for the current data segment. One of them is selected as the current data segment (e.g., data segment 204-2), either at random, as the one with the largest remaining space, or as the one with the smallest remaining space.
The moving unit 904 determines, among the plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out. It then moves the data group to be moved out from the cache into that current data area. The current data area may be determined in any of the following ways: a data area is allocated at random from the plurality of data areas of the current data segment; a hash value of the identifier of the data group to be moved out is calculated, and one data area is selected according to that hash value; the data area with the largest remaining storage space is used; or the data area with the smallest remaining storage space is used.
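The four allocation strategies can be sketched as a single function. The dictionary representation of data areas is an assumption, as is the use of SHA-256 for the identifier hash, because the text does not name a hash function.

```python
import hashlib
import random


def choose_data_area(areas, group_id, strategy="hash", rng=random):
    """areas: dict area_id -> remaining storage space (assumed representation)."""
    ids = sorted(areas)  # fixed order so that hash-based selection is reproducible
    if strategy == "random":
        return rng.choice(ids)
    if strategy == "hash":
        # A stable digest is used instead of Python's built-in hash(), which is
        # randomized per process for strings.
        digest = hashlib.sha256(group_id.encode("utf-8")).digest()
        return ids[int.from_bytes(digest[:8], "big") % len(ids)]
    if strategy == "largest":
        return max(ids, key=lambda a: areas[a])
    if strategy == "smallest":
        return min(ids, key=lambda a: areas[a])
    raise ValueError(f"unknown strategy: {strategy}")


areas = {"area-0": 100, "area-1": 300, "area-2": 50}
```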
The first memory contains a data segment index information table. This table holds data segment index information for each of the first buffer area, the second buffer area, the storage area, and the reserved storage area. For the storage area, the data segment index information includes the identifier, start address, end address, storage capacity, and the like of each data segment. For the first and second buffer areas, it includes the identifier, start address, end address, storage capacity, and the like of each compressed data segment. For the reserved storage area, it includes the identifier, start address, end address, storage capacity, and the like of each compressed or uncompressed data segment.
Preferably, the data segment index information of the first buffer area, the second buffer area, the storage area, and the reserved storage area forms a single data segment index information table, which is stored in the boot area. Alternatively, each of these four areas may have its own data segment index information table, stored in that area. The data segment index information may include a tuple <identifier of the data group, identifier of the data segment to which it belongs>.
The second scanning unit 905 scans all data areas in the current data segment other than the current data area. Each such data area that contains at least one associated data group is treated as an associated data area. The unit then determines the compression rate and compression stage of each associated data area according to the highest association level among the associated data groups it contains. The compression rate is one of a high compression rate, a medium compression rate, and a low compression rate, whose degrees of compression increase in that order. The compression stage is one of a first, second, third, and fourth compression stage, whose compression order decreases in that order; that is, the first compression stage is compressed first.
The directory area of each data segment or compressed data segment stores a data area index information table. This table may include a pair <identifier of the data group, identifier of the (compressed) data area to which it belongs>. The identifier of each associated data group is determined and used to query the data area index information table. The query result identifies the data area in which each associated data group is located within the current data segment. The associated data areas are then taken to be the data areas in which the associated data groups are located, so each associated data area contains at least one associated data group.
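Given a data area index table of `<data group identifier, data area identifier>` pairs, the associated data areas of the current data segment can be found with a simple lookup. This sketch models the table as a Python dictionary, and the names are hypothetical.

```python
def associated_data_areas(area_index, associated_ids, current_area):
    """Group associated data groups by the data area that holds them.

    area_index:     dict group_id -> area_id for one data segment (the index table)
    associated_ids: identifiers of the associated data groups
    current_area:   the area just allocated to the group to be moved out (excluded)
    """
    result = {}
    for gid in associated_ids:
        area = area_index.get(gid)
        if area is not None and area != current_area:
            result.setdefault(area, []).append(gid)
    return result


index = {"g1": "A1", "g2": "A1", "g3": "A2", "g4": "A0"}
found = associated_data_areas(index, ["g1", "g3", "g4", "g9"], current_area="A0")
```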
For example, the high compression rate is 90%, the medium compression rate is 80%, and the low compression rate is 70%. Alternatively, the high compression rate ranges from 89% to 99%, the medium compression rate from 79% up to but excluding 89%, and the low compression rate is below 79%. Alternatively, the high compression rate ranges from 85% up to but excluding 100%, the medium compression rate from 70% up to but excluding 85%, and the low compression rate is below 70%. These values are exemplary only; those skilled in the art will appreciate that the compression rates or ranges may take any reasonable values.
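As an illustration, the third set of example ranges could be applied as follows. The function name and the percentage representation are assumptions.

```python
def classify_ratio(ratio_percent):
    """Map a compression rate (in percent) to a band, using the 85% / 70% boundaries."""
    if 85 <= ratio_percent < 100:
        return "high"
    if 70 <= ratio_percent < 85:
        return "medium"
    if ratio_percent < 70:
        return "low"
    return None  # 100% or above falls outside every example range
```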
The compression order of the first, second, third, and fourth compression stages decreases in that order. The four compression stages indicate the sequence in which data groups are compressed, for example as a first, second, third, and fourth batch. Alternatively, two or more compression stages may be treated as the same batch.
The first setting unit 906 sets the current data area to the high compression rate and marks it as the first compression stage. The compression rate and compression stage of each associated data area are determined according to the highest association level among the associated data groups in that area. When the highest association level is high, medium, or low, the compression rate of the associated data area is set to the high, medium, or low compression rate, respectively. An associated data area whose highest association level is high is marked as the second compression stage, one whose highest association level is medium is marked as the third compression stage, and one whose highest association level is low is marked as the fourth compression stage.
Here, the highest association level of an associated data area is the association level of the associated data group with the highest association level among the at least one associated data group in that area. Based on this highest association level, the compression rate and compression stage of each associated data area are determined as follows. When the level is high, medium, or low, the compression rate of the associated data area is set to the high, medium, or low compression rate, respectively. The associated data area is then marked as the second, third, or fourth compression stage, respectively.
For example, suppose the current data segment contains associated data areas 1, 2, and 3. Associated data area 1 contains 1 associated data group of the high association level and 3 of the medium association level. Associated data area 2 contains 5 associated data groups of the low association level. Associated data area 3 contains 2 associated data groups of the medium association level and 1 of the low association level. The highest association level in associated data area 1 is high, so its compression rate is set to the high compression rate and it is marked as the second compression stage. The highest association level in associated data area 2 is low, so its compression rate is set to the low compression rate and it is marked as the fourth compression stage. The highest association level in associated data area 3 is medium, so its compression rate is set to the medium compression rate and it is marked as the third compression stage.
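The mapping from highest association level to compression rate and compression stage in the current data segment can be sketched as a table lookup. The level and rate labels are illustrative strings, and the usage reproduces the example above.

```python
LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}

# Highest association level -> (compression rate, compression stage) for
# associated data areas in the current data segment.
CURRENT_SEGMENT_PLAN = {"high": ("high", 2), "medium": ("medium", 3), "low": ("low", 4)}


def plan_area(levels, plan=CURRENT_SEGMENT_PLAN):
    """levels: association levels of the associated data groups in one data area."""
    highest = max(levels, key=LEVEL_RANK.__getitem__)
    return plan[highest]


area_levels = {
    "area-1": ["high", "medium", "medium", "medium"],
    "area-2": ["low"] * 5,
    "area-3": ["medium", "medium", "low"],
}
plans = {area: plan_area(levels) for area, levels in area_levels.items()}
```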
The first compression unit 907 performs compression within the current data segment according to compression stage and compression rate. In one scheme, each stage proceeds either from low addresses to high addresses or from high addresses to low addresses. The steps are as follows. First, the data groups in the current data area, marked as the first compression stage, are compressed at the high compression rate. Then, the data groups in the associated data areas marked as the second compression stage are compressed at the high compression rate. Next, the data groups in the associated data areas marked as the third compression stage (possibly several areas) are compressed at the medium compression rate. Finally, the data groups in the associated data areas marked as the fourth compression stage (possibly several areas) are compressed at the low compression rate. The compression order within the current data segment is thus determined by compression stage. The data groups in the second-stage associated data areas are compressed only after all data groups in the first-stage current data area have been compressed, and so on for the later stages.
The data groups in the associated data areas marked as the second compression stage are compressed at the high compression rate. The data groups in the associated data areas marked as the third compression stage are compressed at the medium compression rate. The data groups in the associated data areas marked as the fourth compression stage are compressed at the low compression rate. Within each stage, compression proceeds either from low addresses to high addresses or from high addresses to low addresses.
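The stage-first, address-second ordering can be sketched by sorting on a composite key. The area records and addresses are hypothetical; only the ordering rule comes from the text.

```python
def compression_order(areas, ascending=True):
    """Order data areas by compression stage, then by start address.

    areas:     list of dicts with keys "area_id", "start_address", "stage" (assumed)
    ascending: True for low-to-high addresses, False for high-to-low
    """
    sign = 1 if ascending else -1
    ordered = sorted(areas, key=lambda a: (a["stage"], sign * a["start_address"]))
    return [a["area_id"] for a in ordered]


areas = [
    {"area_id": "current", "start_address": 0x300, "stage": 1},
    {"area_id": "area-1", "start_address": 0x100, "stage": 2},
    {"area_id": "area-2", "start_address": 0x200, "stage": 4},
    {"area_id": "area-3", "start_address": 0x400, "stage": 3},
    {"area_id": "area-4", "start_address": 0x000, "stage": 3},
]
```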
While compression is being performed within the current data segment according to compression stage and compression rate, the third scanning unit 908 determines at least one associated data segment. An associated data segment is a data segment, other than the current data segment, that contains associated data groups, each stored in at least one associated data area of that segment.
The directory area of each data segment or compressed data segment stores a data area index information table. This table may include a pair <identifier of the data group, identifier of the (compressed) data area to which it belongs>. The identifier of each associated data group is determined and used to query the data area index information table of each data segment other than the current data segment. The query result identifies the data segment in which each associated data group is located (the associated data segment) and its data area within that segment. A data segment containing an associated data group is determined as an associated data segment. Its associated data areas are the data areas in which its associated data groups are located, so each associated data area contains at least one associated data group.
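Extending the same lookup across segments gives a sketch of how the third scanning unit could locate associated data segments and their associated data areas. Each segment's data area index table is again assumed to be a dictionary, and the names are illustrative.

```python
def associated_segments(segment_indexes, associated_ids, current_segment):
    """Map each associated data segment to its associated data areas.

    segment_indexes: dict segment_id -> {group_id: area_id} (per-segment index tables)
    associated_ids:  identifiers of the associated data groups
    current_segment: identifier of the current data segment (excluded)
    """
    result = {}
    for seg_id, area_index in segment_indexes.items():
        if seg_id == current_segment:
            continue
        for gid in associated_ids:
            area = area_index.get(gid)
            if area is not None:
                result.setdefault(seg_id, {}).setdefault(area, []).append(gid)
    return result


indexes = {
    "S0": {"g1": "A0"},
    "S1": {"g2": "A1", "g3": "A2", "g5": "A1"},
    "S2": {"g4": "A3"},
    "S3": {"g9": "A4"},
}
found = associated_segments(indexes, ["g1", "g2", "g3", "g4"], current_segment="S0")
```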
After compression within the current data segment according to compression stage and compression rate, the compressed data group(s) produced from the current data area are moved to the first buffer area of the first memory. The compressed data group(s) produced from each associated data area of the current data segment are moved to the second buffer area of the first memory. After the move into the first buffer area, the storage space of the current data area is initialized. After the move into the second buffer area, the storage space of each associated data area is initialized. In other words, once all data groups in the current data area and in each associated data area of the current data segment have been compressed, the storage space of those areas is initialized. Initialization may consist, for example, of resetting the storage space or deleting its contents, so that the current data area and each associated data area can store new data.
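The compress, move, and initialize sequence might be sketched as follows, using Python's `zlib` as a stand-in compressor. The mapping from the high, medium, and low compression rates to zlib levels is an assumption. It is chosen so that the degree of compression increases from the high to the low compression rate, as stated earlier. The dictionary layout of the segment and buffers is hypothetical.

```python
import zlib


def compress_and_evict(segment, plan, level_for_rate, first_buffer, second_buffer):
    """Compress areas in plan order, move results to the buffers, then initialize areas.

    segment:        dict area_id -> dict group_id -> bytes (hypothetical layout)
    plan:           list of (area_id, rate, stage), already in compression order
    level_for_rate: assumed mapping from "high"/"medium"/"low" to a zlib level
    """
    for area_id, rate, stage in plan:
        # Stage 1 is the current data area -> first buffer; others -> second buffer.
        target = first_buffer if stage == 1 else second_buffer
        for group_id, data in segment[area_id].items():
            target[group_id] = zlib.compress(data, level_for_rate[rate])
    # Initialize (here: empty) every compressed area so it can store new data.
    for area_id, _, _ in plan:
        segment[area_id].clear()


segment = {
    "current": {"g0": b"x" * 200},
    "area-1": {"g1": b"y" * 200},
    "area-2": {"g2": b"z" * 200},
}
plan = [("current", "high", 1), ("area-1", "high", 2), ("area-2", "low", 4)]
level_for_rate = {"high": 1, "medium": 6, "low": 9}  # hypothetical mapping
first_buffer, second_buffer = {}, {}
compress_and_evict(segment, plan, level_for_rate, first_buffer, second_buffer)
```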
The second setting unit 909 determines the compression rate and compression stage of each associated data area in each associated data segment. The basis is the highest association level among the associated data groups in that area. When the highest association level is high, medium, or low, the compression rate of the associated data area is marked as the high, medium, or low compression rate, respectively. An associated data area whose highest association level is high is marked as the second compression stage. One whose highest association level is medium or low is marked as the third compression stage.
The compression rate and compression stage of each of the plurality of associated data areas are determined according to the highest association level among its associated data groups. This is the association level of the associated data group with the highest association level among the at least one associated data group in the area. When the highest association level is high, medium, or low, the compression rate of the associated data area is set to the high, medium, or low compression rate, respectively. The associated data area is then marked as the second, third, or fourth compression stage, respectively.
For each associated data segment, the compression rate and compression stage of each of its associated data areas are determined according to the highest association level among the associated data groups in that area. This is the association level of the associated data group with the highest association level among the at least one associated data group in the area. When the highest association level is high, medium, or low, the compression rate of the associated data area is marked as the high, medium, or low compression rate, respectively. An associated data area whose highest association level is high is marked as the second compression stage. One whose highest association level is medium or low is marked as the third compression stage.
For example, suppose associated data segment 1 contains associated data areas 1 and 2, and associated data segment 2 contains associated data area 3. Associated data area 1 contains 1 associated data group of the high association level and 3 of the medium association level. Associated data area 2 contains 5 associated data groups of the low association level. Associated data area 3 contains 2 associated data groups of the medium association level and 1 of the low association level. The highest association level in associated data area 1 is high, so its compression rate is set to the high compression rate and it is marked as the second compression stage. The highest association level in associated data area 2 is low, so its compression rate is set to the low compression rate and it is marked as the third compression stage. The highest association level in associated data area 3 is medium, so its compression rate is set to the medium compression rate and it is marked as the third compression stage.
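For associated data segments, this embodiment uses only two stages, so the lookup table differs from the current-segment case. The usage below reproduces the example above; names and labels are illustrative.

```python
LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}

# Highest association level -> (compression rate, compression stage) for
# associated data areas in associated data segments: medium and low share stage 3.
ASSOCIATED_SEGMENT_PLAN = {"high": ("high", 2), "medium": ("medium", 3), "low": ("low", 3)}


def plan_associated_area(levels):
    """levels: association levels of the associated data groups in one data area."""
    highest = max(levels, key=LEVEL_RANK.__getitem__)
    return ASSOCIATED_SEGMENT_PLAN[highest]


example = {
    ("segment-1", "area-1"): ["high", "medium", "medium", "medium"],
    ("segment-1", "area-2"): ["low"] * 5,
    ("segment-2", "area-3"): ["medium", "medium", "low"],
}
plans = {key: plan_associated_area(levels) for key, levels in example.items()}
```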
When compression within the current data segment according to compression stage and compression rate is complete, the second compression unit 910 compresses the at least one associated data segment according to compression stage and compression rate. The associated data segments may be processed in parallel or serially. First, the associated data areas marked as the second compression stage in each associated data segment are compressed at the high compression rate. Then, two groups of third-stage areas are compressed at the same time: those with the medium compression rate are compressed at the medium compression rate, and those with the low compression rate are compressed at the low compression rate.
Here, a data segment containing at least one associated data group is selected as an associated data segment. A data area within the associated data segment that contains at least one associated data group is determined as an associated data area.
The associated data areas marked as the second compression stage in each associated data segment are compressed at the high compression rate, with the plurality of associated data segments processed either serially or in parallel. The same applies to the third compression stage: areas with the medium compression rate are compressed at the medium compression rate, and areas with the low compression rate are compressed at the low compression rate. In each case the plurality of associated data segments is processed either serially or in parallel.
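The following sketches stage-by-stage compression across associated data segments, run either serially or in parallel with a thread pool. `zlib` again stands in for the compressor, and the rate-to-level mapping is hypothetical. Stage 3 handles medium-rate and low-rate areas in the same pass.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_associated_segments(segments, plans, level_for_rate, parallel=True):
    """Compress associated data segments stage by stage, serially or in parallel.

    segments: dict seg_id -> area_id -> group_id -> bytes (hypothetical layout)
    plans:    dict seg_id -> area_id -> (rate, stage), stage being 2 or 3
    Returns dict group_id -> compressed bytes (destined for the second buffer area).
    """
    def run_stage(seg_id, stage):
        out = {}
        for area_id, (rate, area_stage) in plans[seg_id].items():
            if area_stage != stage:
                continue
            # Within stage 3, medium-rate and low-rate areas are handled in the same pass.
            for group_id, data in segments[seg_id][area_id].items():
                out[group_id] = zlib.compress(data, level_for_rate[rate])
        return out

    compressed = {}
    seg_ids = list(segments)
    for stage in (2, 3):
        # Every segment finishes stage 2 before any segment starts stage 3.
        if parallel:
            with ThreadPoolExecutor() as pool:
                results = list(pool.map(run_stage, seg_ids, [stage] * len(seg_ids)))
        else:
            results = [run_stage(seg_id, stage) for seg_id in seg_ids]
        for result in results:
            compressed.update(result)
    return compressed


segments = {
    "S1": {"A1": {"g1": b"a" * 100}, "A2": {"g2": b"b" * 100}},
    "S2": {"A3": {"g3": b"c" * 100}},
}
plans = {"S1": {"A1": ("high", 2), "A2": ("low", 3)}, "S2": {"A3": ("medium", 3)}}
level_for_rate = {"high": 1, "medium": 6, "low": 9}  # hypothetical mapping
parallel_out = compress_associated_segments(segments, plans, level_for_rate)
serial_out = compress_associated_segments(segments, plans, level_for_rate, parallel=False)
```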
The compressed data groups produced from all data groups in each of the at least one associated data area of each associated data segment are moved into the second buffer area of the first memory. After this move, the storage space of each such associated data area is initialized. In other words, once all data groups in each associated data area of each associated data segment have been compressed, the storage space of that area is initialized. Initialization may consist, for example, of resetting the storage space or deleting its contents, so that each associated data area can store new data.
The number of times each data group in the first memory (stored in the storage area) is accessed within each third predetermined time interval is determined. Data groups accessed more often than a third count threshold within the current third predetermined time interval are moved into the cache. Alternatively, the number of times each compressed data group in the first memory (stored in the first buffer area or the second buffer area) is accessed within each third predetermined time interval is determined. Compressed data groups accessed more often than the third count threshold within the current third predetermined time interval are decompressed and moved into the cache.
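The following is a minimal sketch of the promotion step. It assumes access counts for the current third predetermined time interval are already available and that compressed groups were produced with `zlib`; all names are hypothetical.

```python
import zlib


def promote_hot_groups(access_counts, storage, compressed, cache, threshold):
    """Move groups accessed more than `threshold` times into the cache.

    access_counts: dict group_id -> accesses in the current third predetermined interval
    storage:       dict group_id -> bytes for uncompressed groups in the storage area
    compressed:    dict group_id -> zlib bytes for groups in the first/second buffer area
    """
    for group_id, count in access_counts.items():
        if count <= threshold:
            continue
        if group_id in storage:
            cache[group_id] = storage.pop(group_id)
        elif group_id in compressed:
            # Compressed groups are decompressed before being moved into the cache.
            cache[group_id] = zlib.decompress(compressed.pop(group_id))


storage = {"g1": b"raw"}
compressed = {"g2": zlib.compress(b"packed")}
cache = {}
promote_hot_groups({"g1": 5, "g2": 7, "g3": 1}, storage, compressed, cache, threshold=3)
```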

Claims (10)

1. A method for dynamically processing data groups moved out of a cache, the method comprising:
counting in real time the number of times each of a plurality of data groups in a cache of a processor in a mobile terminal is accessed, and determining a data group whose access count within a predetermined time period is lower than a first count threshold as a data group to be moved out, which is to be moved from the cache to a first memory;
determining, based on association statistical information of the data group to be moved out, a plurality of associated data groups that are stored in the first memory and need to operate in association with the data group to be moved out when it runs, and setting an association level for each associated data group according to its association degree with the data group to be moved out, wherein the association levels comprise a high association level, a medium association level, and a low association level;
scanning each of a plurality of data segments in the first memory to determine the number of associated data groups contained in each data segment, and determining, among the data segments whose remaining space can accommodate the data group to be moved out, the data segment containing the largest number of associated data groups as a current data segment;
determining, among a plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out, and moving the data group to be moved out from the cache into the current data area of the current data segment;
scanning all data areas in the current data segment other than the current data area, determining each such data area that contains at least one associated data group as an associated data area, and determining the compression rate and compression stage of each associated data area according to the highest association level among the associated data groups in that area, wherein the compression rate comprises a high compression rate, a medium compression rate, and a low compression rate, whose degrees of compression increase in that order, and the compression stage comprises a first, a second, a third, and a fourth compression stage, whose compression order decreases in that order;
setting the current data area to the high compression rate and marking it as the first compression stage, wherein determining the compression rate and compression stage of each associated data area according to the highest association level among the associated data groups in that area comprises: when the highest association level is high, medium, or low, setting the compression rate of the associated data area to the high, medium, or low compression rate, respectively; and marking an associated data area whose highest association level is high as the second compression stage, one whose highest association level is medium as the third compression stage, and one whose highest association level is low as the fourth compression stage;
compressing within the current data segment according to compression level and compression rate:
first, compressing the data group in the current data area marked as the first compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked as the second compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked as the third compression level at the medium compression rate;
finally, compressing the data groups in the associated data areas marked as the fourth compression level at the low compression rate;
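The mapping from association level to compression rate and compression level, and the resulting compression order inside the current data segment, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Assoc`, `LEVEL_MAP` and `plan_current_segment` are hypothetical, compression rates are kept as plain labels, and each data area is represented only by the association levels of the associated data groups it holds.

```python
from enum import Enum


class Assoc(Enum):
    """Association level of an associated data group (hypothetical encoding)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Association level -> (compression rate label, compression level), per the claim text.
LEVEL_MAP = {
    Assoc.HIGH: ("high", 2),
    Assoc.MEDIUM: ("medium", 3),
    Assoc.LOW: ("low", 4),
}


def plan_current_segment(current_area, other_areas):
    """Return (compression level, area id, compression rate) tuples in compression order.

    other_areas maps each data area of the current data segment, other than the
    current data area, to the association levels of the associated data groups it holds.
    """
    plan = [(1, current_area, "high")]  # current data area: first level, high rate
    for area_id, levels in other_areas.items():
        if not levels:
            continue  # no associated data group, so not an associated data area
        highest = max(levels, key=lambda a: a.value)
        rate, level = LEVEL_MAP[highest]
        plan.append((level, area_id, rate))
    # Stable sort: a lower compression level number is compressed earlier.
    plan.sort(key=lambda entry: entry[0])
    return plan
```

For example, `plan_current_segment("A0", {"A1": [Assoc.LOW, Assoc.HIGH], "A2": [Assoc.MEDIUM], "A3": [], "A4": [Assoc.LOW]})` returns `[(1, "A0", "high"), (2, "A1", "high"), (3, "A2", "medium"), (4, "A4", "low")]`; area `A3` is skipped because it holds no associated data group.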
while the current data segment is being compressed according to compression level and compression rate, determining at least one associated data segment, other than the current data segment, among the plurality of data segments that stores associated data groups, wherein the associated data groups are stored in at least one associated data area of each associated data segment;
determining the compression rate and the compression level of each associated data area in each associated data segment according to the highest association level among the associated data groups in that associated data area comprises: when the highest association level of the associated data area is the high, medium or low association level, marking its compression rate as the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is the high association level as the second compression level, and one whose highest association level is the medium or low association level as the third compression level;
in response to completion of compression by compression level and compression rate within the current data segment, performing compression by compression level and compression rate within the at least one associated data segment:
first, compressing the associated data areas marked as the second compression level in each associated data segment at the high compression rate;
then, compressing the associated data areas in each associated data segment that are marked as the third compression level with the medium compression rate at the medium compression rate, while simultaneously compressing those marked as the third compression level with the low compression rate at the low compression rate;
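The two-phase compression of the associated data segments can be sketched as follows: second-level areas are compressed first, and only then are all third-level areas (medium and low rate) compressed concurrently. This sketch assumes it is called after the current data segment has finished compressing; `plan_associated_segments`, `run_phases` and the caller-supplied `compress` function are hypothetical names, and a thread pool is just one way to model "simultaneously".

```python
from concurrent.futures import ThreadPoolExecutor

RANK = {"low": 1, "medium": 2, "high": 3}


def plan_associated_segments(segments):
    """Split associated data areas of the associated data segments into two phases.

    segments maps segment id -> {area id -> association levels ("high"/"medium"/"low")
    of the associated data groups stored in that area}.
    """
    second_level, third_level = [], []
    for seg_id, areas in segments.items():
        for area_id, levels in areas.items():
            if not levels:
                continue  # area holds no associated data group
            highest = max(levels, key=RANK.__getitem__)
            if highest == "high":
                second_level.append((seg_id, area_id, "high"))
            else:
                # Medium and low both map to the third level but keep their own rate.
                third_level.append((seg_id, area_id, highest))
    return [second_level, third_level]


def run_phases(phases, compress):
    """Run phases strictly in order; tasks inside one phase run concurrently.

    compress(seg_id, area_id, rate) is a caller-supplied (hypothetical) function.
    """
    with ThreadPoolExecutor() as pool:
        for phase in phases:
            # list() waits for every task of this phase (and re-raises errors)
            # before the next phase is submitted.
            list(pool.map(lambda task: compress(*task), phase))
```

Keeping the plan separate from execution makes the ordering easy to inspect; `run_phases` only enforces that one phase completes before the next begins.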
the method further comprises: when it is detected that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, determining a plurality of applications to be loaded on the mobile terminal according to a preset loading configuration file, and copying the file package associated with each of the applications to be loaded from a second memory to the first memory;
after the applications to be loaded are determined according to the preset loading configuration file, copying an association statistics file from the second memory to the first memory, wherein the association statistics file comprises a plurality of pieces of association statistical information, each indicating the plurality of associated data groups of a data group;
determining a comprehensive association degree between the current data group and each other data group among the plurality of data groups, sorting the other data groups in descending order of comprehensive association degree to generate an ordered list, and selecting a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
performing content matching on the summary information of any two data groups to determine the content association degree between them;
acquiring the operation history information of each data group, determining from it which data groups were operated in each basic time unit, determining the number of times any two data groups were operated within the same basic time unit, and determining the operation association degree between those two data groups based on that number;
setting the initial value of the feedback association degree between any two data groups to 0, parsing an association degree rule preset by the user to determine, among the plurality of data groups, the data group pairs whose feedback association degree needs to be set, and setting the feedback association degree for the two data groups in each of the at least one data group pair according to the association degree rule.
Wherein the comprehensive association degree between the current data group and each other data group among the plurality of data groups is determined by a weighted calculation of their content association degree, operation association degree and feedback association degree.
The preset selection rule comprises: selecting the data groups whose comprehensive association degree in the ordered list is greater than an association degree threshold, or the data groups ranked before a preset rank in the ordered list.
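A small sketch of how the three association degrees could be combined, ranked and filtered is given below. The claims do not define the content-matching method, the normalisation of the operation association degree, or the weights, so all three are assumptions here: content matching is approximated by Jaccard similarity of summary keywords, the operation degree is the fraction of basic time units in which both groups were operated, and the weights `(0.4, 0.4, 0.2)` are hypothetical. Function names are likewise illustrative.

```python
def content_degree(summary_a, summary_b):
    """Assumed stand-in for content matching: Jaccard similarity of keyword sets."""
    a, b = set(summary_a), set(summary_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def operation_degree(history, g1, g2):
    """Fraction of basic time units in which both groups were operated (assumed).

    history is a list of sets, one per basic time unit, of the data groups operated in it.
    """
    if not history:
        return 0.0
    both = sum(1 for unit in history if g1 in unit and g2 in unit)
    return both / len(history)


def select_associated(current, groups, summaries, history, feedback,
                      weights=(0.4, 0.4, 0.2), threshold=None, top_k=None):
    """Rank other groups by a weighted comprehensive degree, then apply a selection rule."""
    wc, wo, wf = weights
    scored = []
    for g in groups:
        if g == current:
            continue
        score = (wc * content_degree(summaries[current], summaries[g])
                 + wo * operation_degree(history, current, g)
                 # Feedback degree defaults to 0 unless a user rule set it for this pair.
                 + wf * feedback.get(frozenset((current, g)), 0.0))
        scored.append((g, score))
    scored.sort(key=lambda item: item[1], reverse=True)  # descending ordered list
    if threshold is not None:
        return [(g, s) for g, s in scored if s > threshold]
    if top_k is not None:
        return scored[:top_k]  # ranked before a preset rank
    return scored
```

Using `frozenset` keys treats each feedback rule as applying to an unordered pair of data groups, which matches the claim's notion of a "data group pair".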
2. The method of claim 1,
wherein the second count threshold is greater than the first count threshold, or the second count threshold is less than the first count threshold;
wherein the predetermined time period is a period of time that ends at the current time and starts at a past time;
determining the length of the predetermined time period according to system configuration or user setting;
the cache is a cache memory inside or outside the processor;
the number of accesses is the number of times each data group is accessed by the processor.
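The access counting over a predetermined period ending at the current time can be modelled as a sliding-window counter; data groups whose count in the window falls below the first count threshold become candidates to be moved out. This is a sketch under assumptions: `AccessCounter` is a hypothetical name, timestamps are plain numbers recorded in non-decreasing order, and the second count threshold is not modelled because its role is not defined in this part of the claims.

```python
from collections import defaultdict, deque


class AccessCounter:
    """Hypothetical sliding-window counter of processor accesses per data group."""

    def __init__(self, window):
        self.window = window  # length of the predetermined period (configurable)
        self.accesses = defaultdict(deque)  # group id -> access timestamps

    def record(self, group, now):
        self.accesses[group].append(now)

    def count(self, group, now):
        q = self.accesses[group]
        # Drop accesses that started before the window [now - window, now].
        while q and q[0] < now - self.window:
            q.popleft()
        return len(q)

    def to_evict(self, groups, now, first_threshold):
        """Groups accessed fewer than first_threshold times in the window."""
        return [g for g in groups if self.count(g, now) < first_threshold]
```

A deque per data group keeps each window update proportional to the number of expired accesses rather than to the full history.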
3. The method of any of claims 1-2, wherein when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, one of these data segments is randomly selected as the current data segment; or
when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, the data segment with the largest remaining space among them is selected as the current data segment; or
when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, the data segment with the smallest remaining space among them is selected as the current data segment.
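Selecting the current data segment, together with the three tie-breaking options of claim 3, can be sketched as below. The sketch assumes segments are described only by their remaining space and the ids of the data groups they store; `Segment`, `pick_segment` and the `tie_break` strings are hypothetical names, and the space unit is left abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    free: int     # remaining space (unit assumed, e.g. bytes)
    groups: set   # ids of the data groups stored in this segment


def pick_segment(segments, associated, size, tie_break="largest_free", rng=random):
    """Pick the fitting segment holding the most associated groups, then break ties."""
    fits = [s for s in segments if s.free >= size]
    if not fits:
        return None  # no segment can accommodate the data group to be moved out

    def n_assoc(s):
        return len(s.groups & associated)

    best = max(n_assoc(s) for s in fits)
    ties = [s for s in fits if n_assoc(s) == best]
    if tie_break == "random":
        return rng.choice(ties)
    if tie_break == "largest_free":
        return max(ties, key=lambda s: s.free)
    if tie_break == "smallest_free":
        return min(ties, key=lambda s: s.free)
    raise ValueError(f"unknown tie_break: {tie_break}")
```

Preferring the largest remaining space leaves room for later moves into the same segment, while preferring the smallest packs segments more tightly; the claim leaves the choice open.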
4. The method of any of claims 1-3, wherein determining, from the plurality of data areas in the current data segment, the current data area allocated to the data group to be moved out comprises:
randomly allocating one of the plurality of data areas of the current data segment to the data group to be moved out as the current data area; or
calculating a hash value of the identifier of the data group to be moved out, and selecting one of the plurality of data areas of the current data segment as the current data area according to the hash value; or
taking the data area with the largest proportion of remaining storage space among the plurality of data areas of the current data segment as the current data area; or
taking the data area with the largest remaining storage space among the plurality of data areas of the current data segment as the current data area.
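The four data-area allocation options can be sketched as a single function with a strategy switch. The `Area` fields, `pick_area` and the strategy names are hypothetical; the hash option uses SHA-256 from `hashlib` as an assumed hash function because Python's built-in `hash()` of a string is salted per process and would not give a stable mapping.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class Area:
    area_id: int
    capacity: int
    used: int

    @property
    def free(self):
        return self.capacity - self.used

    @property
    def free_ratio(self):
        return self.free / self.capacity


def pick_area(areas, group_id, strategy="hash", rng=random):
    """Choose the current data area for a data group (hypothetical helper)."""
    if strategy == "random":
        return rng.choice(areas)
    if strategy == "hash":
        # Stable hash of the group identifier, reduced modulo the number of areas.
        digest = hashlib.sha256(group_id.encode()).digest()
        return areas[int.from_bytes(digest[:8], "big") % len(areas)]
    if strategy == "largest_free_ratio":
        return max(areas, key=lambda a: a.free_ratio)
    if strategy == "largest_free":
        return max(areas, key=lambda a: a.free)
    raise ValueError(f"unknown strategy: {strategy}")
```

The hash option always maps the same identifier to the same area, whereas the two space-based options depend on the current fill state of the areas.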
5. The method of any of claims 1-4, wherein determining, from the plurality of data areas in the current data segment, the current data area allocated to the data group to be moved out comprises:
randomly allocating one of the plurality of data areas of the current data segment to the data group to be moved out as the current data area; or
calculating a hash value of the identifier of the data group to be moved out, and selecting one of the plurality of data areas of the current data segment as the current data area according to the hash value; or
taking the data area with the largest proportion of remaining storage space among the plurality of data areas of the current data segment as the current data area; or
taking the data area with the largest remaining storage space among the plurality of data areas of the current data segment as the current data area.
6. A system for dynamically processing a data group moved out of a cache, the system comprising:
a counting unit, configured to count in real time the number of accesses to each of a plurality of data groups in a cache of a processor in the mobile terminal, and to determine a data group whose number of accesses within a predetermined time period is lower than a first count threshold as a data group to be moved out, which is to be moved from the cache to a first memory;
an association unit, configured to determine, based on the association statistical information of the data group to be moved out, a plurality of associated data groups that are stored in the first memory and need to run in association with the data group to be moved out when it runs, and to set an association level for each associated data group according to its degree of association with the data group to be moved out, wherein the association levels comprise: a high association level, a medium association level, and a low association level;
a first scanning unit, configured to scan each of the plurality of data segments in the first memory to determine the number of associated data groups included in each data segment, and to determine as the current data segment the data segment that contains the largest number of associated data groups among those data segments whose remaining space can accommodate the data group to be moved out;
a moving unit, configured to determine, from a plurality of data areas in the current data segment, a current data area allocated to the data group to be moved out, and to move the data group to be moved out from the cache to the current data area of the current data segment;
a second scanning unit, configured to scan all data areas in the current data segment other than the current data area, determine those data areas that contain at least one associated data group as associated data areas, and determine the compression rate and the compression level of each associated data area according to the highest association level among the associated data groups in that associated data area, wherein the compression rates comprise a high compression rate, a medium compression rate and a low compression rate, and the degree of compression increases from the high compression rate to the medium compression rate to the low compression rate; and wherein the compression levels comprise a first compression level, a second compression level, a third compression level and a fourth compression level, whose compression priority decreases in that order;
a first setting unit, configured to set the current data area to the high compression rate and mark it as the first compression level, wherein determining the compression rate and the compression level of each associated data area according to the highest association level among the associated data groups in that associated data area in the current data segment comprises: when the highest association level is the high, medium or low association level, setting the compression rate of the associated data area to the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is the high association level as the second compression level, one whose highest association level is the medium association level as the third compression level, and one whose highest association level is the low association level as the fourth compression level;
a first compression unit, configured to compress within the current data segment according to compression level and compression rate:
first, compressing the data group in the current data area marked as the first compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked as the second compression level at the high compression rate;
then, compressing the data groups in the associated data areas marked as the third compression level at the medium compression rate;
finally, compressing the data groups in the associated data areas marked as the fourth compression level at the low compression rate;
a third scanning unit, configured to determine, while the current data segment is being compressed according to compression level and compression rate, at least one associated data segment, other than the current data segment, among the plurality of data segments that stores associated data groups, wherein the associated data groups are stored in at least one associated data area of each associated data segment;
a second setting unit, configured to determine the compression rate and the compression level of each associated data area in each associated data segment according to the highest association level among the associated data groups in that associated data area, which comprises: when the highest association level of the associated data area is the high, medium or low association level, marking its compression rate as the high, medium or low compression rate, respectively; and marking an associated data area whose highest association level is the high association level as the second compression level, and one whose highest association level is the medium or low association level as the third compression level;
a second compression unit, configured to, in response to completion of compression by compression level and compression rate within the current data segment, perform compression by compression level and compression rate within the at least one associated data segment:
first, compressing the associated data areas marked as the second compression level in each associated data segment at the high compression rate;
then, compressing the associated data areas in each associated data segment that are marked as the third compression level with the medium compression rate at the medium compression rate, while simultaneously compressing those marked as the third compression level with the low compression rate at the low compression rate;
the system further comprises an initialization unit, configured to: when it is detected that the operating system of the mobile terminal has been loaded into the first memory and has finished starting, determine a plurality of applications to be loaded on the mobile terminal according to a preset loading configuration file, and copy the file package associated with each of the applications to be loaded from a second memory to the first memory; and, after the applications to be loaded are determined according to the preset loading configuration file, copy an association statistics file from the second memory to the first memory, wherein the association statistics file comprises a plurality of pieces of association statistical information, each indicating the plurality of associated data groups of a data group;
the association unit determines a comprehensive association degree between the current data group and each other data group among the plurality of data groups, sorts the other data groups in descending order of comprehensive association degree to generate an ordered list, and selects a plurality of data groups from the ordered list according to a preset selection rule as the associated data groups of the current data group;
content matching is performed on the summary information of any two data groups to determine the content association degree between them;
the operation history information of each data group is acquired, the data groups operated in each basic time unit are determined from it, the number of times any two data groups were operated within the same basic time unit is determined, and the operation association degree between those two data groups is determined based on that number;
the initial value of the feedback association degree between any two data groups is set to 0, an association degree rule preset by the user is parsed to determine, among the plurality of data groups, the data group pairs whose feedback association degree needs to be set, and the feedback association degree is set for the two data groups in each of the at least one data group pair according to the association degree rule.
Wherein the comprehensive association degree between the current data group and each other data group among the plurality of data groups is determined by a weighted calculation of their content association degree, operation association degree and feedback association degree;
the preset selection rule comprises: selecting the data groups whose comprehensive association degree in the ordered list is greater than an association degree threshold, or the data groups ranked before a preset rank in the ordered list.
7. The system of claim 6,
wherein the second count threshold is greater than the first count threshold, or the second count threshold is less than the first count threshold;
wherein the predetermined time period is a period of time that ends at the current time and starts at a past time;
the length of the predetermined time period is determined according to system configuration or user setting;
the cache is a cache memory inside or outside the processor;
the number of accesses is the number of times each data group is accessed by the processor.
8. The system according to any one of claims 6 to 7, wherein when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, the first scanning unit randomly selects one of these data segments as the current data segment; or
when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, the first scanning unit selects the data segment with the largest remaining space among them as the current data segment; or
when there are a plurality of data segments that have the largest number of associated data groups and whose remaining space can accommodate the data group to be moved out, the first scanning unit selects the data segment with the smallest remaining space among them as the current data segment.
9. The system of any of claims 6-8, wherein the moving unit determining, from the plurality of data areas in the current data segment, the current data area allocated to the data group to be moved out comprises:
the moving unit randomly allocates one of the plurality of data areas of the current data segment to the data group to be moved out as the current data area; or
the moving unit calculates a hash value of the identifier of the data group to be moved out, and selects one of the plurality of data areas of the current data segment as the current data area according to the hash value; or
the moving unit takes the data area with the largest proportion of remaining storage space among the plurality of data areas of the current data segment as the current data area; or
the moving unit takes the data area with the largest remaining storage space among the plurality of data areas of the current data segment as the current data area.
10. The system of any of claims 6-9, wherein the moving unit determining, from the plurality of data areas in the current data segment, the current data area allocated to the data group to be moved out comprises:
the moving unit randomly allocates one of the plurality of data areas of the current data segment to the data group to be moved out as the current data area; or
the moving unit calculates a hash value of the identifier of the data group to be moved out, and selects one of the plurality of data areas of the current data segment as the current data area according to the hash value; or
the moving unit takes the data area with the largest proportion of remaining storage space among the plurality of data areas of the current data segment as the current data area; or
the moving unit takes the data area with the largest remaining storage space among the plurality of data areas of the current data segment as the current data area.
CN202110406895.9A 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache Pending CN113282235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110406895.9A CN113282235A (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810624546.2A CN108804042B (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache
CN202110406895.9A CN113282235A (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810624546.2A Division CN108804042B (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache

Publications (1)

Publication Number Publication Date
CN113282235A true CN113282235A (en) 2021-08-20

Family

ID=64086783

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810624546.2A Active CN108804042B (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache
CN202110406895.9A Pending CN113282235A (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201810624546.2A Active CN108804042B (en) 2018-06-16 2018-06-16 Method and system for dynamically processing data set based on shift-out in cache

Country Status (1)

Country Link
CN (2) CN108804042B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492009B (en) * 2018-11-25 2023-06-23 广州市塞安物联网科技有限公司 Method and system for identifying relevance time units in big data storage device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007505545A (en) * 2003-09-12 2007-03-08 コニンクリユケ フィリップス エレクトロニクス エヌ.ブイ. Scalable signal processing method and apparatus
JP4653830B2 (en) * 2008-09-19 2011-03-16 株式会社東芝 Instruction cache system
US9148172B2 (en) * 2012-06-22 2015-09-29 Micron Technology, Inc. Data compression and management
US9990308B2 (en) * 2015-08-31 2018-06-05 Oracle International Corporation Selective data compression for in-memory databases
WO2017181429A1 (en) * 2016-04-22 2017-10-26 SZ DJI Technology Co., Ltd. Systems and methods for processing image data based on region-of-interest (roi) of a user
CN106446079B (en) * 2016-09-08 2019-06-18 中国科学院计算技术研究所 A kind of file of Based on Distributed file system prefetches/caching method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009775A (en) * 2022-12-20 2023-04-25 广州辰创科技发展有限公司 Database memory management system and method
CN116009775B (en) * 2022-12-20 2024-04-02 广州辰创科技发展有限公司 Database memory management system and method

Also Published As

Publication number Publication date
CN108804042B (en) 2021-06-15
CN108804042A (en) 2018-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination