CN109033462A - Method and system for determining low-frequency data items in storage devices for big data storage - Google Patents
- Publication number
- CN109033462A CN201811006475.6A
- Authority
- CN
- China
- Prior art keywords
- data item
- data
- storage
- access
- equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for determining low-frequency data items in storage devices used for big data storage. The method includes: when it is determined that no data access operation is running on any storage device in the big data storage system, determining an access information statistics file for each storage device; determining, based on the access information statistics file, a plurality of preselected data items, among all data items of each storage device, whose access count within the current statistical period is less than a low-frequency count threshold; determining the total storage capacity of each storage device from the device description information in the system recording device of the big data storage system; determining the free storage capacity of each storage device from the storage information file in the storage information area of that storage device; determining the low-frequency coefficient of each preselected data item in each storage device; and determining as low-frequency data items those preselected data items in each storage device whose low-frequency coefficient is less than a low-frequency coefficient threshold.
Description
Technical field
The present invention relates to the fields of big data storage and cloud storage, and more particularly to a method and system for determining low-frequency data items in storage devices used for big data storage.
Background
At present, as information devices of all kinds are used more and more frequently, the volume of data is growing explosively at a geometric rate. To extract useful information from massive data, that data must be stored effectively. Big data storage systems can meet this need. However, current big data storage systems cannot identify the low-frequency data items held in their storage devices. In general, as low-frequency data items gradually accumulate in a storage device, they can seriously reduce the data access efficiency of the storage device and even of the entire big data storage system.
Summary of the invention
According to one aspect of the present invention, a method for determining low-frequency data items in storage devices used for big data storage is provided, the method comprising:
In response to receiving a request to determine low-frequency data items in each of a plurality of storage devices used for big data storage in a big data storage system, redirecting new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in the new data access request and each temporary data item in a temporary data item set of the system buffer device to determine the content matching degree of each temporary data item; selecting, from the plurality of temporary data items, at least one selected temporary data item whose content matching degree is greater than a matching degree threshold; sending the at least one selected temporary data item to the data requester indicated by the new data access request; and saving the new data access request in a buffer of the system buffer device;
When it is determined that no data access operation is running on any storage device in the big data storage system, obtaining the running log file of each of the plurality of storage devices in the big data storage system; determining, based on the current statistical period and the running log file of each storage device, the aggregated access information of the plurality of data items stored in each storage device; and determining the access information statistics file of each storage device according to a preset access time interval threshold and the aggregated access information of the plurality of data items stored in each storage device, where an access time interval is the period between two adjacent accesses to a data item; where the access information statistics file includes a frequency statistics table, the frequency statistics table includes a plurality of frequency records, and the content of each frequency record is the 8-tuple <identifier of the data item, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>;
Determining, based on the access information statistics file, a plurality of preselected data items, among all data items of each storage device, whose access count within the current statistical period is less than a low-frequency count threshold; determining the total storage capacity of each storage device according to the device description information in the system recording device of the big data storage system; determining the free storage capacity of each storage device according to the storage information file in the storage information area of that storage device; and determining the low-frequency coefficient of each preselected data item in each storage device according to the following formula:
where DTF_i is the low-frequency coefficient of the i-th preselected data item in the current storage device; t_imax is the maximum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_imin is the minimum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_ibegin is the statistics start time of the i-th preselected data item in the current storage device; t_iend is the statistics end time of the i-th preselected data item in the current storage device; C is the total storage capacity of the current storage device; R is the free storage capacity of the current storage device; UN_i is the number of access time intervals of the i-th preselected data item in the current storage device that are greater than the access time interval threshold; AN_i is the access count of the i-th preselected data item in the current storage device; i is a natural number with PT ≥ i ≥ 1; and PT is the number of preselected data items in the current storage device, with PT ≥ 100; and
Determining as low-frequency data items those preselected data items, among the plurality of preselected data items in each storage device, whose low-frequency coefficient is less than a low-frequency coefficient threshold.
Wherein, when a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, the data management device sends to the big data storage system the request to determine low-frequency data items in each of the plurality of storage devices used for big data storage in the big data storage system;
Wherein redirecting new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, includes:
Taking the moment at which the big data storage system receives the request to determine low-frequency data items as the starting point, redirecting new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices;
Wherein the new data access request includes a query condition and description information of the query condition, the temporary data item set includes a plurality of temporary data items, and each temporary data item has summary information that briefly describes the content of the temporary data item;
Wherein performing, by the system buffer device, content matching between the description information of the query condition contained in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item includes:
Performing, by the system buffer device, content matching between the description information of the query condition contained in the new data access request and the summary information of each temporary data item in the temporary data item set of the system buffer device, using content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords, to determine the content matching degree between each temporary data item and the query condition;
Wherein the matching degree threshold is 60%, and the content matching degree ranges over [0%, 100%];
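The keyword-based variant of the content matching described above can be sketched as follows. The text does not say how the matching degree is computed, so the Jaccard overlap of word sets used here is an assumption, and the names keywords, matching_degree and select_temporary_items are hypothetical. Only the 60% threshold and the [0%, 100%] range come from the text.

```python
import re

MATCH_THRESHOLD = 0.60  # matching degree threshold stated in the text (60%)


def keywords(text: str) -> set[str]:
    """Lower-cased word set; a stand-in for real keyword extraction (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def matching_degree(query_description: str, summary: str) -> float:
    """Jaccard overlap of the two keyword sets, in [0.0, 1.0] (assumed measure)."""
    q, s = keywords(query_description), keywords(summary)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def select_temporary_items(query_description: str,
                           summaries: dict[str, str]) -> list[str]:
    """Return ids of temporary items whose matching degree exceeds the threshold."""
    return [
        item_id
        for item_id, summary in summaries.items()
        if matching_degree(query_description, summary) > MATCH_THRESHOLD
    ]
```

A semantic or combined matcher could replace matching_degree without changing the selection step, as long as it also returns a value between 0 and 1.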
Wherein, after saving the new data access request in the buffer of the system buffer device, the method further includes: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has paused data access and that the new data access request has been saved in the buffer of the system buffer device, the response message carrying information indicating the current queue position, in the buffer, of the new data access request from the data requester, where the current queue order of the new data access requests in the buffer is determined according to the length of time each new data access request has been saved in the buffer, and in the current queue order the new data access requests are sorted in descending order of the length of time they have been saved.
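The queue ordering rule above (the request saved longest comes first) can be illustrated with a minimal sketch. The BufferedRequest type, the field names and the response dictionary layout are hypothetical; the text only states that the response reports the pause, the buffering and the current queue position.

```python
from dataclasses import dataclass


@dataclass
class BufferedRequest:
    request_id: str
    saved_at: float  # time the request was saved in the buffer, in seconds


def queue_order(buffered: list[BufferedRequest], now: float) -> list[str]:
    """Order request ids by descending time spent in the buffer."""
    ranked = sorted(buffered, key=lambda r: now - r.saved_at, reverse=True)
    return [r.request_id for r in ranked]


def pause_response(buffered: list[BufferedRequest],
                   request_id: str, now: float) -> dict:
    """Build the response sent back to the requester (layout is an assumption)."""
    order = queue_order(buffered, now)
    return {
        "data_access_paused": True,
        "request_buffered": True,
        "queue_position": order.index(request_id) + 1,  # 1-based position
    }
```

Because the time held in the buffer only grows, sorting by it in descending order is equivalent to first-in, first-out processing.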
Wherein each storage device saves its own running log file in its system data area;
Wherein the current statistical period is a period that starts from the day before the current date on which the big data storage system receives the request to determine low-frequency data items and extends backward for a predetermined number of calendar days; where the predetermined number of calendar days is 10, 20 or 30 calendar days;
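As a worked example of the statistical period, the sketch below computes its first and last day from the date on which the request is received. The function name statistical_period is hypothetical, and treating the previous day as included in the count of calendar days is an assumption, since the text does not spell out the boundary.

```python
from datetime import date, timedelta

ALLOWED_DAYS = (10, 20, 30)  # period lengths listed in the text


def statistical_period(request_date: date, days: int = 30) -> tuple[date, date]:
    """Return (first_day, last_day) of the statistical period, both inclusive."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"days must be one of {ALLOWED_DAYS}")
    last_day = request_date - timedelta(days=1)       # day before the request
    first_day = last_day - timedelta(days=days - 1)   # count back, inclusive
    return first_day, last_day
```

For a request received on 31 August 2018 with a 10-day period, this yields 21 August through 30 August.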
Wherein determining, based on the current statistical period and the running log file of each storage device, the aggregated access information of the plurality of data items stored in each storage device includes:
Selecting, based on the current statistical period, from all log records in the running log file of each storage device, to obtain the plurality of log records of each storage device within the current statistical period;
Classifying the plurality of log records of each storage device within the current statistical period by data item, to obtain the aggregated access information of each data item;
Forming the aggregated access information of the plurality of data items stored in each storage device from the aggregated access information of each data item;
Wherein each log record includes: the identifier of the data item, access start time, access end time, storage size and storage start time;
Wherein each data item has summary information that briefly describes the content of the data item.
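The selection and classification steps above can be sketched as a filter followed by a group-by. The LogRecord fields mirror the five fields listed in the text, but the class name, the function name and the rule that a record belongs to the period when its access start time falls inside it are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    item_id: str
    access_start: datetime
    access_end: datetime
    storage_size: int
    storage_start: datetime


def group_records_by_item(records: list[LogRecord],
                          period_start: datetime,
                          period_end: datetime) -> dict[str, list[LogRecord]]:
    """Keep records inside the period and group them by data item identifier."""
    grouped: dict[str, list[LogRecord]] = defaultdict(list)
    for rec in records:
        # Assumed membership rule: the access must start within the period.
        if period_start <= rec.access_start <= period_end:
            grouped[rec.item_id].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: r.access_start)  # chronological order per item
    return dict(grouped)
```

The per-item lists returned here play the role of the aggregated access information that later steps turn into frequency records.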
Wherein the preset access time interval threshold is 5 minutes, 10 minutes, 15 minutes or 20 minutes.
Determining the access information statistics file of each storage device according to the preset access time interval threshold and the aggregated access information of the plurality of data items stored in each storage device includes:
Compiling statistics on the aggregated access information of each of the plurality of data items stored in each storage device, to determine the access count and all access time intervals of each data item;
Determining, based on all access time intervals of each data item, the number of access time intervals greater than the access time interval threshold, the maximum access time interval and the minimum access time interval of each data item;
Taking the access start time of the first access in the aggregated access information of each data item as the statistics start time, and the access end time of the last access in the aggregated access information of each data item as the statistics end time;
Determining the storage size of each data item based on the aggregated access information of each data item.
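A minimal sketch of how one frequency record could be derived from the accesses of a single data item is shown below. The FrequencyRecord class and build_frequency_record function are hypothetical names, and measuring an access time interval between the start times of two adjacent accesses is an assumption; the text only says the interval is the period between two adjacent accesses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FrequencyRecord:
    item_id: str
    access_count: int
    stat_start: datetime
    stat_end: datetime
    storage_size: int
    over_threshold_count: int
    max_interval: timedelta
    min_interval: timedelta


def build_frequency_record(item_id: str,
                           accesses: list[tuple[datetime, datetime]],
                           storage_size: int,
                           interval_threshold: timedelta = timedelta(minutes=10),
                           ) -> FrequencyRecord:
    """accesses holds (access_start, access_end) pairs for one data item."""
    ordered = sorted(accesses)
    if len(ordered) < 2:
        raise ValueError("need at least two accesses to form an interval")
    # Assumed definition: gap between the start times of adjacent accesses.
    intervals = [nxt[0] - cur[0] for cur, nxt in zip(ordered, ordered[1:])]
    return FrequencyRecord(
        item_id=item_id,
        access_count=len(ordered),
        stat_start=ordered[0][0],   # start of the first access
        stat_end=ordered[-1][1],    # end of the last access
        storage_size=storage_size,
        over_threshold_count=sum(1 for gap in intervals if gap > interval_threshold),
        max_interval=max(intervals),
        min_interval=min(intervals),
    )
```

The default threshold of 10 minutes is one of the four values the text allows.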
The low-frequency count threshold is 100, 150 or 200;
The device description information in the system recording device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device and/or the time at which each storage device joined the big data storage system;
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item and the free storage capacity of the storage device;
The low-frequency coefficient threshold is 120, 160 or 220.
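The two-stage selection (preselect by access count, then filter by low-frequency coefficient) can be sketched as below. The formula for the coefficient is not reproduced in this text, so the sketch does not attempt it and instead takes the coefficient as a caller-supplied function; that parameter and the function names preselect and low_frequency_items are hypothetical.

```python
from typing import Callable

LOW_FREQ_COUNT_THRESHOLD = 100        # the text allows 100, 150 or 200
LOW_FREQ_COEFFICIENT_THRESHOLD = 120  # the text allows 120, 160 or 220


def preselect(access_counts: dict[str, int],
              count_threshold: int = LOW_FREQ_COUNT_THRESHOLD) -> list[str]:
    """Data items accessed fewer times than the low-frequency count threshold."""
    return [item for item, n in access_counts.items() if n < count_threshold]


def low_frequency_items(access_counts: dict[str, int],
                        coefficient: Callable[[str], float],
                        count_threshold: int = LOW_FREQ_COUNT_THRESHOLD,
                        coefficient_threshold: float = LOW_FREQ_COEFFICIENT_THRESHOLD,
                        ) -> list[str]:
    """Preselected items whose coefficient is below the coefficient threshold.

    coefficient is a placeholder for the patent's formula, which is not
    available here; the caller must supply it.
    """
    return [
        item
        for item in preselect(access_counts, count_threshold)
        if coefficient(item) < coefficient_threshold
    ]
```

Keeping the coefficient behind a function boundary means the selection logic stays the same whichever formula is plugged in.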
After determining as low-frequency data items those preselected data items, among the plurality of preselected data items in each storage device, whose low-frequency coefficient is less than the low-frequency coefficient threshold, the method further includes:
Determining as candidate data items those data items, among all data items of each storage device, whose access count is greater than 2 times the low-frequency count threshold, to obtain a plurality of candidate data items; forming a respective candidate data item set from the plurality of candidate data items; and forming a respective low-frequency data item set from the plurality of low-frequency data items of each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold;
For the current storage device among the plurality of storage devices:
When the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set of the current storage device, sorting all low-frequency data items in the low-frequency data item set in ascending order of access count to generate a first sorted list, and taking the low-frequency data item ranked first in the first sorted list as the current low-frequency data item;
6.1, performing content matching between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item;
6.2, combining the current low-frequency data item with the candidate data item that, among all candidate data items in the candidate data item set, has the greatest content matching degree with it, to form a new data item, and saving the new data item in the free storage space of the current storage device;
6.3, deleting from the candidate data item set the candidate data item with the greatest content matching degree with the current low-frequency data item;
6.4, determining whether the first sorted list contains a low-frequency data item ranked one place after the current low-frequency data item; if so, proceeding to step 6.5; if not, ending;
6.5, selecting the low-frequency data item ranked one place after the current low-frequency data item in the first sorted list as the current low-frequency data item, and proceeding to step 6.1;
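Steps 6.1 to 6.5 amount to a greedy pairing loop, which the following sketch illustrates. It only decides which candidate each low-frequency item is combined with; how two data items are actually combined and stored is not described in enough detail to model, so it is left out. The function name and the match callback are hypothetical.

```python
from typing import Callable


def pair_low_frequency_items(low_items: dict[str, int],
                             candidates: list[str],
                             match: Callable[[str, str], float],
                             ) -> list[tuple[str, str]]:
    """Greedily pair each low-frequency item with its best remaining candidate.

    low_items maps a low-frequency item id to its access count.
    match(low_id, candidate_id) returns a content matching degree.
    """
    if len(low_items) > len(candidates):
        raise ValueError("this branch needs at least as many candidates as low-frequency items")
    remaining = list(candidates)
    pairs = []
    # First sorted list: ascending access count.
    for low_id in sorted(low_items, key=low_items.get):
        # 6.1 and 6.2: pick the candidate with the greatest matching degree.
        best = max(remaining, key=lambda cand: match(low_id, cand))
        pairs.append((low_id, best))
        # 6.3: a candidate is used at most once.
        remaining.remove(best)
    # 6.4 and 6.5 are the loop advance and termination.
    return pairs
```

Because the loop is greedy, an item processed early can take a candidate that a later item would have matched better; the text describes this order-dependent behaviour and does not call for a global optimum.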
Or, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in the candidate data item set of the current storage device, grouping all low-frequency data items in the low-frequency data item set of the current storage device to generate a plurality of low-frequency data item groups, such that the total access count of all low-frequency data items in each of the plurality of low-frequency data item groups is greater than 1.5 times the low-frequency count threshold, and determining the average access count of all low-frequency data items in each low-frequency data item group, where the absolute value of the difference between the average access counts of the low-frequency data item groups is less than 20.
After determining as low-frequency data items those preselected data items, among the plurality of preselected data items in each storage device, whose low-frequency coefficient is less than the low-frequency coefficient threshold, the method further includes:
Performing a data access operation for each data access request in the buffer of the system buffer device according to the current queue order of the plurality of data access requests in the buffer;
When it is determined that no saved data access request remains in the buffer of the system buffer device, parsing a new data access request received by the big data storage system from any data requester to obtain a new query condition;
Determining, in the directory storage server of the big data storage system, the plurality of data items involved in the new query condition, and determining at least one target storage device involved in the plurality of data items;
Sending the new query condition to each target storage device, and receiving from each target storage device at least one data item that satisfies the new query condition;
Forming a target data item set from all data items received from the target storage devices, and sending the target data item set to the data requester indicated by the new data access request.
8. The method according to claim 7, wherein performing a data access operation for each data access request in the buffer according to the current queue order of the plurality of data access requests in the buffer of the system buffer device includes:
8.1, determining the currently processed data access request according to the current queue order of the plurality of data access requests in the buffer of the system buffer device, where the currently processed data access request is the data access request ranked first in the current queue order of the plurality of data access requests in the buffer;
8.2, parsing the currently processed data access request to obtain the currently processed query condition;
8.3, determining, in the directory storage server of the big data storage system, the plurality of data items involved in the currently processed query condition, and determining at least one target storage device in the big data storage system involved in the plurality of data items;
8.4, sending the currently processed query condition to each target storage device, and receiving from each target storage device at least one data item that satisfies the currently processed query condition;
8.5, forming a target data item set from all data items received from the target storage devices, and sending the target data item set to the data requester indicated by the currently processed data access request;
8.6, deleting the data access request ranked first in the current queue order of the plurality of data access requests in the buffer;
8.7, determining whether any saved data access request remains in the buffer of the system buffer device; if so, proceeding to step 8.1; if not, determining that no saved data access request remains in the buffer of the system buffer device.
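Steps 8.1 to 8.7 describe a loop that drains the buffer in queue order. A minimal sketch follows, assuming the buffer is already ordered and that parsing, directory lookup, device queries and delivery are supplied as callbacks; all function and parameter names here are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Iterable


def drain_buffer(queue: deque,
                 parse: Callable[[Any], Any],
                 locate_devices: Callable[[Any], Iterable[Any]],
                 query_device: Callable[[Any, Any], list],
                 send: Callable[[Any, list], None]) -> None:
    """Process buffered data access requests until the buffer is empty."""
    while queue:                               # 8.7: repeat while requests remain
        request = queue[0]                     # 8.1: request ranked first
        condition = parse(request)             # 8.2: extract the query condition
        devices = locate_devices(condition)    # 8.3: directory lookup
        target_items: list = []
        for device in devices:                 # 8.4: query each target device
            target_items.extend(query_device(device, condition))
        send(request, target_items)            # 8.5: return the target item set
        queue.popleft()                        # 8.6: remove the processed request
```

The request is removed only after its result has been sent, which follows the order of steps 8.5 and 8.6 in the text.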
According to another aspect of the present invention, a system for determining low-frequency data items in storage devices used for big data storage is provided, the system comprising:
A preprocessing unit which, in response to receiving a request to determine low-frequency data items in each of a plurality of storage devices used for big data storage in a big data storage system, redirects new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in the new data access request and each temporary data item in a temporary data item set of the system buffer device to determine the content matching degree of each temporary data item; selects, from the plurality of temporary data items, at least one selected temporary data item whose content matching degree is greater than a matching degree threshold; sends the at least one selected temporary data item to the data requester indicated by the new data access request; and saves the new data access request in a buffer of the system buffer device;
A statistics unit which, when it is determined that no data access operation is running on any storage device in the big data storage system, obtains the running log file of each of the plurality of storage devices in the big data storage system; determines, based on the current statistical period and the running log file of each storage device, the aggregated access information of the plurality of data items stored in each storage device; and determines the access information statistics file of each storage device according to a preset access time interval threshold and the aggregated access information of the plurality of data items stored in each storage device, where an access time interval is the period between two adjacent accesses to a data item; where the access information statistics file includes a frequency statistics table, the frequency statistics table includes a plurality of frequency records, and the content of each frequency record is the 8-tuple <identifier of the data item, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>;
A computing unit which determines, based on the access information statistics file, a plurality of preselected data items, among all data items of each storage device, whose access count within the current statistical period is less than a low-frequency count threshold; determines the total storage capacity of each storage device according to the device description information in the system recording device of the big data storage system; determines the free storage capacity of each storage device according to the storage information file in the storage information area of that storage device; and determines the low-frequency coefficient of each preselected data item in each storage device according to the following formula:
where DTF_i is the low-frequency coefficient of the i-th preselected data item in the current storage device; t_imax is the maximum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_imin is the minimum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_ibegin is the statistics start time of the i-th preselected data item in the current storage device; t_iend is the statistics end time of the i-th preselected data item in the current storage device; C is the total storage capacity of the current storage device; R is the free storage capacity of the current storage device; UN_i is the number of access time intervals of the i-th preselected data item in the current storage device that are greater than the access time interval threshold; AN_i is the access count of the i-th preselected data item in the current storage device; i is a natural number with PT ≥ i ≥ 1; and PT is the number of preselected data items in the current storage device, with PT ≥ 100; and
Which determines as low-frequency data items those preselected data items, among the plurality of preselected data items in each storage device, whose low-frequency coefficient is less than a low-frequency coefficient threshold.
Wherein, when a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, the data management device sends to the big data storage system the request to determine low-frequency data items in each of the plurality of storage devices used for big data storage in the big data storage system;
Wherein the preprocessing unit redirecting new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, includes:
The preprocessing unit, taking the moment at which the big data storage system receives the request to determine low-frequency data items as the starting point, redirecting new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices;
Wherein the new data access request includes a query condition and description information of the query condition, the temporary data item set includes a plurality of temporary data items, and each temporary data item has summary information that briefly describes the content of the temporary data item;
Wherein the preprocessing unit performing, by the system buffer device, content matching between the description information of the query condition contained in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item includes:
The preprocessing unit performing, by the system buffer device, content matching between the description information of the query condition contained in the new data access request and the summary information of each temporary data item in the temporary data item set of the system buffer device, using content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords, to determine the content matching degree between each temporary data item and the query condition;
Wherein the matching degree threshold is 60%, and the content matching degree ranges over [0%, 100%];
Wherein the preprocessing unit sends, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has paused data access and that the new data access request has been saved in the buffer of the system buffer device, the response message carrying information indicating the current queue position, in the buffer, of the new data access request from the data requester, where the current queue order of the new data access requests in the buffer is determined according to the length of time each new data access request has been saved in the buffer, and in the current queue order the new data access requests are sorted in descending order of the length of time they have been saved.
Wherein running log file is saved in the system data region of each storage equipment;
Wherein current statistical time section receives the request when institute of determining low-frequency data item for big data storage system
The proxima luce (prox. luc) of the current date at place starts and a period of time of the consecutive days of predetermined quantity forward;The wherein nature of predetermined quantity
Day is 10 consecutive days, 20 consecutive days or 30 consecutive days;
Wherein the statistic unit determining, based on the current statistical time period and the running log file in each storage device, the statistical access information of the multiple data items stored in each storage device comprises:
The statistic unit selects, based on the current statistical time period, from all log records in the running log file of each storage device to obtain the multiple log records of each storage device within the current statistical time period;
The statistic unit classifies the multiple log records of each storage device within the current statistical time period by data item, to obtain the statistical access information of each data item;
The statistic unit composes the statistical access information of the multiple data items stored in each storage device from the statistical access information of each data item;
Wherein each log record includes: the identifier of the data item, access start time, access end time, storage size and storage start time;
Wherein each data item has summary information, and the summary information briefly introduces the content of the data item.
Wherein the threshold of the preset access time interval is 5 minutes, 10 minutes, 15 minutes or 20 minutes.
The statistic unit determining the access information statistics file of each storage device according to the threshold of the preset access time interval and the statistical access information of the multiple data items stored in each storage device comprises:
The statistic unit performs statistics on the statistical access information of each of the multiple data items stored in each storage device to determine the access count and all access time intervals of each data item;
The statistic unit determines, based on all access time intervals of each data item, the number of access time intervals greater than the access time interval threshold, the maximum access time interval and the minimum access time interval of each data item;
The statistic unit takes the access start time of the first access in the statistical access information of each data item as the statistics start time, and takes the access end time of the last access in the statistical access information of each data item as the statistics end time;
The statistic unit determines the storage size of each data item based on the statistical access information of each data item.
The low-frequency count threshold is 100, 150 or 200;
The device description information in the system log device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, or the time at which each storage device joined the big data storage system;
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of each storage device;
The low-frequency coefficient threshold is 120, 160 or 220.
Further including an adjustment unit, configured to determine, among all data items of each storage device, the data items whose access count is greater than 2 times the low-frequency count threshold as candidate data items so as to obtain multiple candidate data items, to form a candidate data item set from the multiple candidate data items, and to form a low-frequency data item set from the multiple low-frequency data items, among all data items of each storage device, whose low-frequency coefficient is less than the low-frequency coefficient threshold;
For a current storage device among the multiple storage devices:
When the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set, all low-frequency data items in the low-frequency data item set are sorted in ascending order of access count to generate a first sorted list, and the low-frequency data item ranked 1st in the first sorted list is taken as the current low-frequency data item,
14.1, content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item;
14.2, among all candidate data items of the candidate data item set, the candidate data item with the greatest content matching degree with the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the current storage device;
14.3, the candidate data item with the greatest content matching degree with the current low-frequency data item is deleted from the candidate data item set;
14.4, it is determined whether the first sorted list contains a low-frequency data item ranked one position after the current low-frequency data item; if so, proceed to 14.5; if not, end;
14.5, the low-frequency data item ranked one position after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and 14.1 is performed;
Or, when the number of low-frequency data items in the low-frequency data item set is greater than the number of candidate data items in the candidate data item set, all low-frequency data items in the low-frequency data item set are grouped to generate multiple low-frequency data item groups, such that the total access count of all low-frequency data items in each of the multiple low-frequency data item groups is greater than 1.5 times the low-frequency count threshold, and the average access count of all low-frequency data items in each low-frequency data item group is determined, wherein the absolute value of the difference between the average access counts of the low-frequency data item groups is less than 20.
The preprocessing unit performs a data access operation on each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device;
When it is determined that the buffer area of the system buffer device holds no saved data access request, a new data access request received by the big data storage system from any data requester is parsed to obtain a new query condition;
The multiple data items involved in the new query condition are determined in the directory storage server of the big data storage system, and at least one target storage device involved in the multiple data items is determined;
The new query condition is sent to each target storage device, and at least one data item meeting the new query condition is received from each target storage device;
All data items received from the target storage devices form a target data item set, and the target data item set is sent to the data requester indicated by the new data access request.
Wherein the preprocessing unit performing a data access operation on each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device comprises:
16.1, the currently processed data access request is determined according to the current queue order of the multiple data access requests in the buffer area of the system buffer device, wherein the currently processed data access request is the data access request ranked first in the current queue order of the multiple data access requests in the buffer area;
16.2, the currently processed data access request is parsed to obtain the currently processed query condition;
16.3, the multiple data items involved in the currently processed query condition are determined in the directory storage server of the big data storage system, and at least one target storage device involved in the multiple data items is determined;
16.4, the currently processed query condition is sent to each target storage device, and at least one data item meeting the currently processed query condition is received from each target storage device;
16.5, all data items received from the target storage devices form a target data item set, and the target data item set is sent to the data requester indicated by the currently processed data access request;
16.6, the data access request ranked first in the current queue order of the multiple data access requests in the buffer area is deleted;
16.7, it is determined whether the buffer area of the system buffer device holds any saved data access request; if so, proceed to 16.1; if not, it is determined that the buffer area of the system buffer device holds no saved data access request.
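Steps 16.1 to 16.7 describe a loop that drains the buffer area in queue order. A minimal sketch follows; the function `drain_buffer` and its four callables (`parse`, `lookup_devices`, `query_device`, `send`) are hypothetical placeholders for the request parser, the directory storage server lookup, the per-device query and the reply to the data requester, none of which are specified in the text.

```python
from collections import deque


def drain_buffer(buffer, parse, lookup_devices, query_device, send):
    """Process buffered data access requests in queue order until none remain."""
    while buffer:                                # 16.7: any saved request left?
        request = buffer[0]                      # 16.1: first in current queue order
        condition = parse(request)               # 16.2: obtain the query condition
        devices = lookup_devices(condition)      # 16.3: target storage devices
        items = []
        for device in devices:                   # 16.4: query each target device
            items.extend(query_device(device, condition))
        send(request, items)                     # 16.5: return target data item set
        buffer.popleft()                         # 16.6: delete the processed request
```

The request is removed only after its result has been sent, mirroring the order of steps 16.5 and 16.6.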
Description of the drawings
Exemplary embodiments of the present invention can be more fully understood by reference to the following drawings:
Fig. 1 is a flowchart of a method for determining low-frequency data items in storage devices used for big data storage according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of multiple access information statistics files according to an embodiment of the present invention; and
Fig. 3 is a structural schematic diagram of a system for determining low-frequency data items in storage devices used for big data storage according to an embodiment of the present invention.
Detailed description of the embodiments
Fig. 1 is a flowchart of a method 100 for determining low-frequency data items in storage devices used for big data storage according to an embodiment of the present invention.
In step 101, in response to receiving a request to determine low-frequency data items in each of the multiple storage devices used for big data storage in the big data storage system, new data access requests received by the big data storage system from any data requester are redirected to the system buffer device of the big data storage system instead of being sent to the corresponding storage devices among the multiple storage devices, so that the system buffer device performs content matching between the description information of the query condition included in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item, selects from the multiple temporary data items at least one selected temporary data item whose content matching degree is greater than a matching degree threshold, sends the at least one selected temporary data item to the data requester indicated by the new data access request, and saves the new data access request in the buffer area of the system buffer device.
When a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, the data management device sends to the big data storage system a request to determine low-frequency data items in each of the multiple storage devices used for big data storage in the big data storage system. The data management device located outside the big data storage system may be operated or controlled by maintenance personnel, administrators or operators of the big data storage system. For example, such personnel may trigger the identification or determination of low-frequency data items periodically or according to actual operating conditions. The big data storage system includes multiple storage devices; each storage device can store multiple data items, and the storage capacity of each storage device can be any reasonable value. Each data item can be a data file of various types, such as text, audio or video. A low-frequency data item is, for example, a data item whose access count within a specific time is lower than the average access count of all data items in the big data storage system, or lower than the average access count of all data items in its storage device.
Wherein redirecting the new data access requests received by the big data storage system from any data requester to the system buffer device of the big data storage system, instead of sending them to the corresponding storage devices among the multiple storage devices, comprises:
From the moment the big data storage system receives the request to determine low-frequency data items, redirecting the new data access requests subsequently received by the big data storage system from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the multiple storage devices;
Wherein the new data access request includes a query condition and description information of the query condition, the temporary data item set includes multiple temporary data items, and each temporary data item has summary information that briefly introduces the content of the temporary data item.
At the moment the big data storage system receives the request to determine low-frequency data items in each of the multiple storage devices used for big data storage, it may also receive multiple new data access requests. In that case, the big data storage system is caused to redirect all new data access requests subsequently received from one or more arbitrary data requesters to the system buffer device of the big data storage system, instead of sending them to the corresponding storage devices among the multiple storage devices. Normally, the big data storage system determines, in its directory storage server and according to the query condition included in a new data access request, the multiple data items involved in the query condition, and determines at least one target storage device involved in those data items. It then sends the query condition to each target storage device and receives from each target storage device at least one data item meeting the query condition. When low-frequency data items are to be identified or determined, however, the big data storage system redirects all new data access requests to its system buffer device. The system buffer device is located inside the big data storage system and is used to store a temporary data item set including multiple temporary data items, or to buffer data access requests. A query condition is, for example: mobile communication AND 5G AND (uplink OR downlink). In this case, the description information of the query condition is, for example: the uplink or downlink of 5G mobile communication. The temporary data item set includes multiple temporary data items, and each temporary data item can be a data file of various types, such as text, audio or video. Each temporary data item or data item has summary information, which briefly introduces its content. For example, the summary information is: C++ from zero, a straightforward introduction that lets you learn the C++ programming language in 21 days.
The description information for the querying condition for wherein being included by new data access request by the system buffer equipment with
It is each interim with determination that each ephemeral data item in the ephemeral data item set of the system buffer equipment carries out content matching
The content matching degree of data item includes:
The description information for the querying condition for being included by new data access request by the system buffer equipment with it is described
The summary info of each ephemeral data item in the ephemeral data item set of system buffer equipment compared based on semantic content
Content matching, the content matching compared based on keyword or the content matching combined based on semantic content and keyword with true
The content matching degree of fixed each ephemeral data item and the querying condition.The application can be used any existing text and compare other side
Formula determines the description information of querying condition that new data access request is included and the ephemeral data item of system buffer equipment
Content matching degree between the summary info of each ephemeral data item in set, wherein text alignments are, for example, to be based on language
Content matching that adopted content compares, the content matching compared based on keyword or in being combined based on semantic content and keyword
Hold matching.Wherein, the content matching degree of each ephemeral data item and the querying condition may be used to indicate that each ephemeral data
Item close degree, similar degree, degree of correlation or correlation degree with the querying condition.
Wherein the matching degree threshold is 55%, 60%, 65%, 70% or any reasonable value, and the range of the content matching degree is [0%, 100%], that is, the content matching degree can be any value from 0% to 100%. Selecting from the multiple temporary data items at least one selected temporary data item whose content matching degree is greater than the matching degree threshold means, for example, selecting at least one temporary data item whose content matching degree is greater than 55%, 60%, 65% or 70%. The at least one selected temporary data item is sent to the data requester indicated by the new data access request, and the new data access request is saved in the buffer area of the system buffer device. The purpose of sending the at least one selected temporary data item to the data requester is to let the data requester obtain content related to its data access request, and thus see related content, while the big data storage system has suspended its data access service.
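As an illustration of the keyword-based variant and the threshold selection, the sketch below scores each temporary data item by the share of query description keywords that appear in its summary information and keeps the items whose score exceeds the threshold. The scoring rule, the function names and the whitespace tokenization are assumptions for illustration; the text allows any existing text comparison method.

```python
def keyword_match_degree(description: str, summary: str) -> float:
    """Share of description keywords found in the summary, in [0.0, 1.0]."""
    keywords = set(description.lower().split())
    if not keywords:
        return 0.0
    return len(keywords & set(summary.lower().split())) / len(keywords)


def select_temporary_items(description, summaries, threshold=0.60):
    """Return identifiers of temporary items whose matching degree exceeds the threshold.

    summaries maps a temporary data item identifier to its summary information.
    """
    return [
        ident
        for ident, summary in summaries.items()
        if keyword_match_degree(description, summary) > threshold
    ]
```

The comparison is strict (greater than), matching the wording that the content matching degree must be greater than the matching degree threshold.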
Wherein, after the new data access request is saved in the buffer area of the system buffer device, the method further includes: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has paused data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue order in the buffer area of the new data access request from the data requester. The current queue order of a new data access request in the buffer area is determined by the length of time the request has been saved in the buffer area, and new data access requests are sorted in the current queue order in descending order of that saved time length. That is, the longer a new data access request has been saved, the nearer the front of the current queue order it is. Preferably, after the response message is sent to the data requester indicated by the new data access request, the method further includes: periodically sending, to the data requester indicated by the new data access request, a notification message indicating the current queue order in the buffer area of the new data access request from the data requester.
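The queue order rule can be sketched as a sort on how long each request has been saved. In this minimal example the function name, the request identifiers and the use of plain numeric timestamps are assumptions made for illustration.

```python
def current_queue_order(saved_at, now):
    """Map request id -> 1-based queue position; the longest-saved request is first.

    saved_at maps a request id to the timestamp (in seconds) at which it was saved.
    """
    ordered = sorted(saved_at, key=lambda rid: now - saved_at[rid], reverse=True)
    return {rid: position for position, rid in enumerate(ordered, start=1)}
```

The resulting position is the kind of value the response message and the periodic notification message could carry back to each data requester.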
In step 102, when it is determined that no data access operation is currently running in any storage device of the big data storage system, the running log file of each of the multiple storage devices in the big data storage system is obtained; the statistical access information of the multiple data items stored in each storage device is determined based on the current statistical time period and the running log file in each storage device; and the access information statistics file of each storage device is determined according to the threshold of the preset access time interval and the statistical access information of the multiple data items stored in each storage device, wherein an access time interval is the period between two adjacent accesses to a data item; wherein the access information statistics file includes a frequency statistics table, the frequency statistics table includes multiple frequency records, and the content of each frequency record is an 8-tuple <data item identifier, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>.
Wherein a currently running data access operation refers to the processing in which a storage device performs data retrieval in its own storage space according to the query condition sent by the big data storage system, forms a data item set from the data items obtained by the retrieval, and sends the data item set to the data requester through the big data storage system.
Wherein the running log file is saved in the system data region of each storage device. The running log file includes multiple log records, and each log record includes: the identifier of the data item, access start time, access end time, storage size and storage start time. The identifier of a data item can be any information that uniquely identifies the data item, such as its name, unique identification or code. The access start time is the time at which the access to the data item involved in the current log record started. The access end time is the time at which that access ended. For example, an access to a data item in a storage device may involve operations such as reading or modifying, and the access start time and access end time indicate the start and end of that operation. The storage size is the storage size of the data item in the storage device. The storage start time is the time at which the data item started to be stored in the storage device or the big data storage system, that is, the time from which the data item, once saved in the storage device or the big data storage system, is available for access. In the present application, access includes reading and/or modifying.
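The five fields of a log record map naturally onto a small record type. The sketch below is one possible representation; the class name, the field names, the byte unit for the storage size and the helper method are assumptions, as the text does not specify a file format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    item_id: str             # identifier of the data item
    access_start: datetime   # access start time
    access_end: datetime     # access end time
    storage_size: int        # storage size; bytes is an assumed unit
    storage_start: datetime  # storage start time

    def access_duration_seconds(self) -> float:
        """Length of this single access (read and/or modify) in seconds."""
        return (self.access_end - self.access_start).total_seconds()
```

Each record represents exactly one access, so counting the records of a data item later yields its access count.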
Wherein the current statistical time period is a period that starts from the day before the current date on which the big data storage system receives the request to determine low-frequency data items and extends backward over a predetermined number of calendar days; the predetermined number of calendar days is 10, 20 or 30 calendar days. For example, if the big data storage system receives the request to determine low-frequency data items at 11:25:36 on August 11, 2018, then the current date on which the request is received is August 11, 2018, and the day before that current date is August 10, 2018. With a predetermined number of 10 calendar days, the current statistical time period is therefore from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018.
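The date arithmetic in this example can be checked with a short sketch. The function name and the one-second resolution of the period end are assumptions chosen to match the 23:59:59 boundary used in the text.

```python
from datetime import datetime, time, timedelta


def current_statistical_period(request_time: datetime, days: int = 10):
    """Return (start, end) of the period ending on the day before the request date."""
    last_day = request_time.date() - timedelta(days=1)
    first_day = last_day - timedelta(days=days - 1)
    return (
        datetime.combine(first_day, time.min),         # 00:00:00 on the first day
        datetime.combine(last_day, time(23, 59, 59)),  # 23:59:59 on the last day
    )
```

For a request received at 11:25:36 on August 11, 2018 with 10 calendar days, this yields the period from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, as in the example above.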
Wherein determining, based on the current statistical time period and the running log file in each storage device, the statistical access information of the multiple data items stored in each storage device comprises:
Selecting, based on the current statistical time period, from all log records in the running log file of each storage device to obtain the multiple log records of each storage device within the current statistical time period;
Classifying the multiple log records of each storage device within the current statistical time period by data item, to obtain the statistical access information of each data item;
Composing the statistical access information of the multiple data items stored in each storage device from the statistical access information of each data item.
For example, if the current statistical time period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, that is, 10 calendar days, then all log records in the running log file of each storage device are filtered by that period to obtain all log records of each storage device from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018. The multiple log records of each storage device in that period are classified by data item (for example, by data item identifier) to obtain the statistical access information of each data item. The statistical access information of each data item is, for example, all access information of that data item within the current statistical time period. The statistical access information of the data items in each storage device together constitutes the statistical access information of the multiple data items stored in that storage device.
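The selection and classification steps amount to a filter followed by a group-by on the data item identifier. In the sketch below, log records are simplified to `(item_id, access_start, access_end)` tuples, and a record is counted as inside the period only if the whole access falls within it; both choices are assumptions, since the text does not say how accesses that straddle the period boundary are treated.

```python
from collections import defaultdict
from datetime import datetime


def group_records_in_period(records, period_start: datetime, period_end: datetime):
    """Keep records inside the period and group their access times by data item."""
    grouped = defaultdict(list)
    for item_id, access_start, access_end in records:
        # Assumed rule: the whole access must lie within the statistical period.
        if period_start <= access_start and access_end <= period_end:
            grouped[item_id].append((access_start, access_end))
    return dict(grouped)
```

The returned mapping plays the role of the statistical access information: one list of accesses per data item identifier.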
Wherein each data item has summary information, and the summary information briefly introduces the content of the data item. For example, the summary information is: C++ from zero, a straightforward introduction that lets you learn the C++ programming language in 21 days.
Wherein an access time interval is the period between two adjacent accesses to a data item, for example, the period from the access end time of the current access to the access start time of the next access. The threshold of the preset access time interval is 5 minutes, 10 minutes, 15 minutes, 20 minutes or any reasonable value. For instance, if data item A is accessed 5 times within the current statistical time period and each access lasts 30 seconds, then data item A has 4 access time intervals in the current statistical time period.
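The relationship between accesses and intervals (n accesses give n - 1 intervals) can be made concrete with a short sketch. The function name and the representation of an access as a `(access_start, access_end)` pair are assumptions.

```python
from datetime import datetime, timedelta


def access_time_intervals(accesses):
    """Return the gaps between the end of each access and the start of the next.

    accesses is a list of (access_start, access_end) pairs for one data item.
    """
    ordered = sorted(accesses)
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
```

With 5 accesses of 30 seconds each, the function returns 4 intervals, matching the data item A example.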
Determining the access information statistics file of each storage device according to the threshold of the preset access time interval and the statistical access information of the multiple data items stored in each storage device comprises:
Performing statistics on the statistical access information of each of the multiple data items stored in each storage device to determine the access count and all access time intervals of each data item;
Determining, based on all access time intervals of each data item, the number of access time intervals greater than the access time interval threshold, the maximum access time interval and the minimum access time interval of each data item;
Taking the access start time of the first access in the statistical access information of each data item as the statistics start time, and taking the access end time of the last access in the statistical access information of each data item as the statistics end time;
Determining the storage size of each data item based on the statistical access information of each data item.
Because the statistical access information of each of the multiple data items stored in each storage device includes multiple log records, and each log record represents one access to the data item, the (total) access count of each data item is determined by the number of its log records. In addition, by sorting the multiple log records by access start time or access end time, the access time interval between each pair of adjacent log records can be obtained, and all access time intervals are thereby determined. Further, by comparing the threshold of the preset access time interval with all access time intervals, the number of access time intervals of each data item that are greater than the access time interval threshold can be determined, and by performing statistics on all access time intervals, the maximum access time interval and the minimum access time interval of each data item can be determined.
For example, suppose the current statistical time period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, the first access to data item A within the current statistical time period has an access start time of 09:02:11 on August 1, 2018 and an access end time of 09:05:36 on August 1, 2018, and the last access to data item A within the current statistical time period has an access start time of 22:26:53 on August 10, 2018 and an access end time of 22:27:39 on August 10, 2018. Then the statistics start time of data item A in the current statistical time period is 09:02:11 on August 1, 2018, and the statistics end time is 22:27:39 on August 10, 2018.
In addition, the storage size of each data item is determined from the storage size in any log record of its statistical access information.
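Putting these determinations together, one frequency record can be built per data item. The sketch below is illustrative only: the `FrequencyRecord` type, its field names and field order, and the choice of a zero interval when a data item was accessed only once are assumptions not stated in the text.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class FrequencyRecord(NamedTuple):
    item_id: str
    access_count: int
    stats_start: datetime
    stats_end: datetime
    storage_size: int
    over_threshold_count: int
    max_interval: timedelta
    min_interval: timedelta


def build_frequency_record(item_id, accesses, storage_size, threshold):
    """Build the 8-tuple for one data item from its non-empty list of accesses.

    accesses is a list of (access_start, access_end) pairs within the period.
    """
    ordered = sorted(accesses)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return FrequencyRecord(
        item_id,
        len(ordered),                           # one log record per access
        ordered[0][0],                          # start of the first access
        ordered[-1][1],                         # end of the last access
        storage_size,
        sum(1 for gap in gaps if gap > threshold),
        max(gaps, default=timedelta(0)),        # assumed zero when only one access
        min(gaps, default=timedelta(0)),
    )
```

Applied to the data item A example, the statistics start time is 09:02:11 on August 1, 2018 and the statistics end time is 22:27:39 on August 10, 2018.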
In step 103, multiple preselected data items, whose access count within the current statistical time period is less than the low-frequency count threshold, are determined among all data items of each storage device based on the access information statistics file; the total storage capacity of each storage device is determined according to the device description information in the system log device of the big data storage system; the free storage capacity of each storage device is determined according to the storage information file in the storage information area of each storage device; and the low-frequency coefficient of each preselected data item in each storage device is determined according to the following formula:
Wherein DTF_i is the low-frequency coefficient of the i-th preselected data item in the current storage device, t_imax is the maximum access time interval among the multiple access time intervals of the i-th preselected data item in the current storage device, t_imin is the minimum access time interval among the multiple access time intervals of the i-th preselected data item in the current storage device, t_ibegin is the statistics start time of the i-th preselected data item in the current storage device, t_iend is the statistics end time of the i-th preselected data item in the current storage device, C is the total storage capacity of the current storage device, R is the free storage capacity of the current storage device, UN_i is the number of access time intervals of the i-th preselected data item in the current storage device that are greater than the access time interval threshold, and AN_i is the access count of the i-th preselected data item in the current storage device, wherein i is a natural number, PT is a natural number, PT >= i >= 1, PT is the number of preselected data items in the current storage device, and PT >= 100.
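The two threshold comparisons of step 103 (pre-selection by access count, then final selection by low-frequency coefficient) can be sketched as two filters. In this sketch the function names are hypothetical, the default thresholds 100 and 120 are taken from the example values in the text, and the low-frequency coefficient DTF_i is supplied by the caller through the assumed `coefficient_of` callable, because the sketch does not attempt to reproduce the formula.

```python
def preselect(access_counts, count_threshold=100):
    """Return data items whose access count is below the low-frequency count threshold.

    access_counts maps a data item identifier to its access count in the period.
    """
    return [item for item, count in access_counts.items() if count < count_threshold]


def low_frequency_items(preselected, coefficient_of, coefficient_threshold=120):
    """Return preselected items whose low-frequency coefficient is below the threshold.

    coefficient_of is a caller-supplied callable returning DTF_i for an item.
    """
    return [item for item in preselected if coefficient_of(item) < coefficient_threshold]
```

Both comparisons are strict, following the wording "less than" for the count threshold and the coefficient threshold.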
Wherein the low-frequency count threshold is 100, 150, 175, 200 or any reasonable value. The device description information in the system log device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, or the time at which each storage device joined the big data storage system. The total number of storage devices included in the big data storage system is the total number of all storage devices in the big data storage system. The total storage capacity of each storage device is the total capacity of the storage space of that storage device, or the total capacity of the storage space of that storage device that can be used to store data items. The network address of each storage device is, for example, an IP address or a MAC address. The time at which each storage device joined the big data storage system is the time from which that storage device began to store data items as a storage device of the big data storage system.
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the initial storage time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of the storage device. The total number of data items is the total number of all data items in the storage device. The storage size of each data item is the size of the data item as stored in the storage device, that is, the storage space it occupies. The initial storage time of each data item is the time at which the data item began to be stored in the storage device it belongs to, for example the time at which it was copied to that storage device. The identifier of each data item can be the title of the data item, a code that uniquely identifies it, or any other information that uniquely identifies the data item. The summary information of each data item briefly introduces the content of a temporary data item or data item. For example, the summary information may read: "C++ from Zero uses straightforward explanations to help you learn the C++ programming language in 21 days." The free storage capacity of each storage device is the free or remaining capacity available in the storage device for storing new data items. The low-frequency coefficient threshold is 90, 100, 120, 130, 150, 160, 170, 220, or any other reasonable value.
In step 104, the pre-selected data items in each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold are determined to be low-frequency data items. That is, through the above steps, the application identifies the low-frequency data items in each storage device used for big data storage in the big data storage system.
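The two-stage selection of steps 103 and 104 can be sketched as follows. This is a minimal illustration, not the application's implementation: the coefficient formula is not available in this text, so it is passed in as a caller-supplied function, and the names `Item`, `find_low_frequency_items` and `coefficient` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    """Hypothetical stand-in for one data item on a storage device."""
    identifier: str
    access_count: int


def find_low_frequency_items(
    items: List[Item],
    count_threshold: int,
    coefficient_threshold: float,
    coefficient: Callable[[Item], float],
) -> List[Item]:
    # Step 103: pre-select items accessed fewer times than the count threshold.
    preselected = [it for it in items if it.access_count < count_threshold]
    # Step 104: keep pre-selected items whose coefficient is below its threshold.
    # `coefficient` is an assumption standing in for the formula of step 103.
    return [it for it in preselected if coefficient(it) < coefficient_threshold]
```

With a count threshold of 100 and a coefficient threshold of 120, only items that pass both tests are reported as low-frequency data items.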
After the pre-selected data items in each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold have been determined to be low-frequency data items, the method further includes:
Among all data items of each storage device, the data items whose accessed count is greater than 2 times the low-frequency count threshold are determined to be candidate data items, yielding multiple candidate data items that together constitute a candidate data item set. The low-frequency data items in each storage device, whose low-frequency coefficient is less than the low-frequency coefficient threshold, constitute a low-frequency data item set. For example, when the low-frequency count threshold is 100, the data items of each storage device whose accessed count is greater than 200 (100 × 2) are determined to be candidate data items. For example, when the low-frequency coefficient threshold is 120, the low-frequency data items in each storage device whose low-frequency coefficient is less than 120 constitute the low-frequency data item set; that is, all low-frequency data items in each storage device constitute the low-frequency data item set.
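The construction of the candidate data item set can be sketched as below. The dictionary layout and the function name `build_candidate_set` are assumptions made for illustration only.

```python
from typing import Dict, Set


def build_candidate_set(access_counts: Dict[str, int], count_threshold: int) -> Set[str]:
    """Return identifiers of items accessed more than twice the count threshold."""
    return {ident for ident, n in access_counts.items() if n > 2 * count_threshold}
```

Note that the comparison is strict: with a threshold of 100, an item accessed exactly 200 times is not a candidate.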
When the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set are sorted in ascending order of accessed count to generate a first sorted list, and the low-frequency data item ranked 1st in the first sorted list is taken as the current low-frequency data item. For example, when the number of low-frequency data items in the low-frequency data item set (for example, 326) is less than the number of candidate data items in the candidate data item set (for example, 827), all low-frequency data items in the low-frequency data item set are sorted in ascending (increasing) order of accessed count to generate the first sorted list. In the first sorted list, a data item ranked earlier has a smaller accessed count, and a data item ranked later has a larger accessed count. The low-frequency data item ranked 1st in the first sorted list (that is, the data item or low-frequency data item with the smallest accessed count) is taken as the current low-frequency data item.
6.1, Content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item. The application can use any existing text comparison technique to determine the content matching degree between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set. Such techniques include, for example, content matching based on semantic comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords. The content matching degree between each candidate data item and the current low-frequency data item can indicate how close, similar, relevant, or associated the candidate data item is to the current low-frequency data item.
6.2, Among all candidate data items in the candidate data item set, the candidate data item with the greatest content matching degree to the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the current storage device. Combining these two data items means forming a file group from the best-matching candidate data item and the current low-frequency data item, and merging the summary information of the best-matching candidate data item with the summary information of the current low-frequency data item to form the summary information of the file group. The resulting file group is treated as a new data item and saved in the free storage space of the current storage device, that is, in storage space that does not hold any data item.
6.3, The candidate data item with the greatest content matching degree to the current low-frequency data item is deleted from the candidate data item set. This deletion takes place after the new data item (the resulting file group) has been saved in the free storage space of the current storage device. In addition, the best-matching candidate data item and the current low-frequency data item are deleted from the current storage device, because the file group formed from them has already been saved in the free storage space of the current storage device.
6.4, It is determined whether the first sorted list contains a low-frequency data item ranked one position after the current low-frequency data item. If so, step 6.5 is carried out; if not, the process ends. This determination means checking whether the first sorted list contains a low-frequency data item whose accessed count is higher than that of the current low-frequency data item and which is adjacent to the current low-frequency data item in the first sorted list. For example, when the current low-frequency data item is the one ranked 1st, the low-frequency data item ranked one position after it is the one ranked 2nd, that is, the low-frequency data item or data item with the second smallest accessed count in the first sorted list. If it exists, step 6.5 is carried out; if it does not exist, the above process ends.
6.5, The low-frequency data item ranked one position after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and step 6.1 is carried out. For example, the low-frequency data item ranked 2nd in the first sorted list is selected as the current low-frequency data item and step 6.1 is carried out; likewise, the low-frequency data items ranked 3rd, 4th, 5th, and so on in the first sorted list are selected in turn, until the last low-frequency data item has been selected as the current low-frequency data item.
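The loop formed by steps 6.1 to 6.5 can be sketched as follows. The text allows any existing text comparison technique; this sketch assumes a simple keyword overlap (Jaccard) score purely for illustration. The names `keyword_match` and `merge_low_frequency_items` and the tuple and dictionary layouts are hypothetical, and the actual saving and deletion of files on the storage device are left out.

```python
from typing import Dict, List, Tuple


def keyword_match(summary_a: str, summary_b: str) -> float:
    """Assumed matching degree: Jaccard overlap of lower-cased words."""
    a, b = set(summary_a.lower().split()), set(summary_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def merge_low_frequency_items(
    low_items: List[Tuple[str, int, str]],   # (identifier, accessed count, summary)
    candidates: Dict[str, str],              # identifier -> summary
) -> List[dict]:
    remaining = dict(candidates)
    groups = []
    # First sorted list: ascending order of accessed count.
    for ident, _count, summary in sorted(low_items, key=lambda x: x[1]):
        if not remaining:  # defensive; this branch assumes enough candidates
            break
        # Step 6.1: find the best-matching candidate data item.
        best = max(remaining, key=lambda c: keyword_match(summary, remaining[c]))
        # Step 6.2: form a file group and merge the two summaries.
        groups.append({"members": (ident, best),
                       "summary": summary + " " + remaining[best]})
        # Step 6.3: remove the used candidate from the candidate set.
        del remaining[best]
        # Steps 6.4 and 6.5: the for-loop advances to the next ranked item.
    return groups
```

Because each candidate is removed once it has been used, every low-frequency data item is paired with a different candidate, which is why this branch requires at least as many candidates as low-frequency data items.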
Alternatively, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are grouped to generate multiple low-frequency data item groups, such that the total accessed count of all low-frequency data items in each low-frequency data item group is greater than 1.5 times the low-frequency count threshold. The average accessed count of all low-frequency data items in each low-frequency data item group is determined. Preferably, the absolute value of the difference between the average accessed counts of any two low-frequency data item groups is less than 20, 30, 40, 50, 60, 70, or any other reasonable value.
For example, when the number of low-frequency data items in the low-frequency data item set (for example, 569) is greater than the number of candidate data items in the candidate data item set (for example, 516), the 569 low-frequency data items in the low-frequency data item set are grouped to generate multiple low-frequency data item groups. The application determines the number of groups G into which the low-frequency data items are divided from the number K of low-frequency data items in the low-frequency data item set and a grouping parameter Z, where Z equals 3, 4, 5, or any other reasonable value. When Z equals 5, the 569 low-frequency data items are divided into 113 low-frequency data item groups.
Additionally, the total accessed count of all low-frequency data items in each of the multiple low-frequency data item groups is greater than 1.1 times, 1.2 times, 1.3 times, or 1.5 times the low-frequency count threshold, or any other reasonable multiple. The average accessed count of all low-frequency data items in each low-frequency data item group, that is, the average accessed count of each low-frequency data item group, is determined. For example, if a low-frequency data item group includes low-frequency data items 1 to 5, whose accessed counts are 95, 76, 110, 82 and 102 respectively, then the average accessed count of all low-frequency data items in the group is 93. The absolute value of the difference between the average accessed counts of any two of the low-frequency data item groups is less than 20, 30, 40, 50, 60, 70, or any other reasonable value.
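The grouping arithmetic can be checked with a short sketch. The formula for G is not available in this text; the sketch assumes integer (floor) division of K by Z, which is an assumption chosen only because it agrees with the example of 569 items and Z = 5 giving 113 groups. The function names are hypothetical.

```python
from typing import List


def group_count(k: int, z: int) -> int:
    """Assumed rule: G = floor(K / Z); matches the 569 / 5 -> 113 example."""
    return k // z


def average_access_count(counts: List[int]) -> float:
    return sum(counts) / len(counts)


def groups_satisfy_constraints(
    groups: List[List[int]],      # accessed counts of the items in each group
    count_threshold: int,
    total_factor: float = 1.5,
    max_average_gap: float = 20,
) -> bool:
    # Each group's total accessed count must exceed factor * count threshold.
    totals_ok = all(sum(g) > total_factor * count_threshold for g in groups)
    # All pairwise gaps between averages are below the bound exactly when
    # the gap between the largest and smallest average is below the bound.
    averages = [average_access_count(g) for g in groups]
    gap_ok = max(averages) - min(averages) < max_average_gap
    return totals_ok and gap_ok
```

The checker only validates a proposed grouping; how items are assigned to groups so that the constraints hold is not specified in the text and is left open here.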
After the pre-selected data items in each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold have been determined to be low-frequency data items, the method further includes:
A data access operation is carried out for each data access request in the buffer area of the system buffer device, according to the current queue order of the multiple data access requests in the buffer area. For example, if the current queue order of the data access requests in the buffer area of the system buffer device is: first data access request, second data access request, third data access request, fourth data access request and fifth data access request, then the data access operation is carried out for each data access request in the buffer area in that order.
When it is determined that no saved data access request remains in the buffer area of the system buffer device, a new data access request received by the big data storage system from any data requester is parsed to obtain a new query condition. For example, suppose the first, second, third, fourth and fifth data access requests in the buffer area of the system buffer device have all been processed, so that no saved data access request remains in the buffer area. A sixth data access request received by the big data storage system from a data requester is then parsed to obtain the new query condition. The new query condition is, for example: mobile communication AND 5G AND (uplink OR downlink).
The multiple data items involved in the new query condition are determined in the catalog storage server of the big data storage system, and at least one target storage device in the big data storage system involved in those data items is determined. The catalog storage server stores the directory information of all data items in the big data storage system. The directory information is, for example, the identifier of a data item, its summary information, its metadata information, its keyword information, the storage device on which it is located, and so on. The catalog storage server queries all data items stored in the big data storage system according to the query condition or new query condition. For example, it applies the new query condition (for example, mobile communication AND 5G AND (uplink OR downlink)) to the summary information, metadata information and/or keyword information of the data items to determine the multiple data items involved in the new query condition. The storage device on which each data item is located or stored, or to which it relates, is determined from the directory information, which in turn determines the at least one target storage device involved in the multiple data items. In special cases, the multiple data items may all be located on the same target storage device.
The new query condition is sent to each target storage device, and at least one data item that meets the new query condition is received from each target storage device. Each target storage device searches all the data items it stores according to the new query condition to obtain at least one data item, and sends the obtained data items to the interface device of the big data storage system. Preferably, the big data storage system of the application contains no redundant data items, that is, each data item is unique. The interface device is used to receive data access requests from data requesters and to send a data item set or target data item set to the corresponding data requester.
All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the new data access request. Specifically, the interface device forms the target data item set from all data items received from the target storage devices and sends it to the data requester indicated by the new data access request.
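The catalog lookup and the routing to target storage devices can be sketched as below. The sketch assumes the query condition is represented as a list of clauses that must all hold, where each clause is a list of alternative terms, and it assumes plain substring matching on the summary information. The catalog layout and the names `matches` and `target_devices` are hypothetical.

```python
from typing import Dict, List

# "mobile communication AND 5G AND (uplink OR downlink)" as AND of OR-clauses.
Query = List[List[str]]


def matches(query: Query, text: str) -> bool:
    """True if every clause has at least one term occurring in the text."""
    text = text.lower()
    return all(any(term in text for term in clause) for clause in query)


def target_devices(catalog: List[dict], query: Query) -> Dict[str, List[str]]:
    """Map each target storage device to the identifiers of matching items."""
    by_device: Dict[str, List[str]] = {}
    for entry in catalog:
        if matches(query, entry["summary"]):
            by_device.setdefault(entry["device"], []).append(entry["id"])
    return by_device
```

The keys of the returned mapping are the target storage devices to which the query condition would be sent; when all matching data items sit on one device, the mapping has a single key.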
Carrying out a data access operation for each data access request in the buffer area of the system buffer device according to the current queue order of the multiple data access requests in the buffer area includes:
8.1, The currently processed data access request is determined according to the current queue order of the multiple data access requests in the buffer area of the system buffer device; the currently processed data access request is the one ranked first in the current queue order of the data access requests in the buffer area. As described above, if the current queue order of the data access requests in the buffer area of the system buffer device is: first data access request, second data access request, third data access request, fourth data access request and fifth data access request, then the currently processed data access request is determined to be the first data access request.
8.2, The currently processed data access request is parsed to obtain the currently processed query condition. Because every data access request, including the currently processed one, contains a query condition, parsing the currently processed data access request yields the currently processed query condition. The currently processed query condition is, for example: mobile communication AND 5G AND (uplink OR downlink).
8.3, The multiple data items involved in the currently processed query condition are determined in the catalog storage server of the big data storage system, and at least one target storage device involved in those data items is determined. The catalog storage server stores the directory information of all data items in the big data storage system. The directory information is, for example, the identifier of a data item, its summary information, its metadata information, its keyword information, the storage device on which it is located, and so on. The catalog storage server queries all data items stored in the big data storage system according to the currently processed query condition. For example, it applies the currently processed query condition (for example, mobile communication AND 5G AND (uplink OR downlink)) to the summary information, metadata information and/or keyword information of the data items to determine the multiple data items involved in the currently processed query condition. The storage device on which each data item is located or stored, or to which it relates, is determined from the directory information, which in turn determines the at least one target storage device involved in the multiple data items. In special cases, the multiple data items may all be located on the same target storage device.
8.4, The currently processed query condition is sent to each target storage device, and at least one data item that meets the currently processed query condition is received from each target storage device. Each target storage device searches all the data items it stores according to the currently processed query condition to obtain at least one data item, and sends the obtained data items to the interface device of the big data storage system. Preferably, the big data storage system of the application contains no redundant data items, that is, each data item is unique. The interface device is used to receive data access requests from data requesters and to send a data item set or target data item set to the corresponding data requester.
8.5, All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the currently processed data access request. Specifically, the interface device forms the target data item set from all data items received from the target storage devices and sends it to the data requester indicated by the currently processed data access request.
8.6, The data access request ranked first in the current queue order of the multiple data access requests in the buffer area is deleted. For example, the first data access request in the current queue order of the data access requests in the buffer area is deleted.
8.7, It is determined whether any saved data access request remains in the buffer area of the system buffer device. If so, step 8.1 is carried out; if not, it is determined that no saved data access request remains in the buffer area of the system buffer device.
For example, suppose the current queue order of the data access requests in the buffer area of the system buffer device is: first data access request, second data access request, third data access request, fourth data access request and fifth data access request. After the first data access request is deleted from the current queue order, it is determined that saved data access requests remain in the buffer area of the system buffer device, namely the second, third, fourth and fifth data access requests, and step 8.1 is carried out.
After the fifth data access request is deleted from the current queue order of the data access requests in the buffer area, the data access operations for the first, second, third, fourth and fifth data access requests have all been completed, so it is determined that no saved data access request remains in the buffer area of the system buffer device. In that case, a new data access request received by the big data storage system from any data requester is parsed to obtain a new query condition, and the corresponding processing is carried out.
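Steps 8.1 to 8.7 amount to draining a first-in, first-out queue. A minimal sketch follows, assuming the buffer area is modeled as a `collections.deque` and that parsing, catalog lookup, retrieval and delivery (steps 8.2 to 8.5) are bundled into a caller-supplied `handle` function; both names are hypothetical.

```python
from collections import deque
from typing import Any, Callable, List


def drain_buffer(buffer: deque, handle: Callable[[Any], Any]) -> List[Any]:
    """Process buffered data access requests in their current queue order."""
    results = []
    while buffer:                         # step 8.7: any saved request left?
        request = buffer[0]               # step 8.1: first request in the queue
        results.append(handle(request))   # steps 8.2 to 8.5
        buffer.popleft()                  # step 8.6: delete the processed request
    return results
```

The request is removed only after it has been handled, mirroring the order of steps 8.5 and 8.6. Once the loop exits, the buffer is empty and newly received data access requests can be parsed directly.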
In this application, if different data items or low-frequency data items have the same accessed count and one of them must be selected as the current data item or current low-frequency data item, the selection is made at random among the data items or low-frequency data items that share that accessed count.
Fig. 2 is a schematic diagram of multiple access information statistics files 200 according to an embodiment of the present invention. When it is determined that no data access operation is currently running on any storage device in the big data storage system, the application obtains the running log file of each of the multiple storage devices in the big data storage system. Based on the current statistical time period and the running log file of each storage device, it determines the statistical access information of the multiple data items stored in each storage device. It then determines the access information statistics file of each storage device from the preset access time interval threshold and the statistical access information of the multiple data items stored in that storage device. An access time interval is the period of time between two consecutive accesses to a data item. As shown in Fig. 2, because each storage device has its own access information statistics file, there are multiple access information statistics files 200. An access information statistics file includes a frequency statistics table 201, which contains multiple frequency records (numbered 1, 2, 3, 4, 5, 6, ...). The content of each frequency record is an 8-tuple (data item identifier, accessed count, statistics start time, statistics end time, storage size, number of access time intervals exceeding the access time interval threshold, maximum access time interval, minimum access time interval).
As shown in Fig. 2, access information statistics file 1 includes the frequency statistics table 201, which contains multiple frequency records. Only 6 frequency records are shown in the frequency statistics table 201; the identifiers of their data items are, respectively, PPT Introduction, Introduction to Big Data Systems, The Tai-Chi Master, C++ from Zero, U.S. Travel Handbook and Sanya Travel Guide. For example, PPT Introduction and Introduction to Big Data Systems are PPT files, The Tai-Chi Master and C++ from Zero are video files, and U.S. Travel Handbook and Sanya Travel Guide are PDF documents. The frequency statistics table 201 also shows, for each data item, the accessed count, statistics start time, statistics end time, storage size, number of access time intervals exceeding the access time interval threshold, maximum access time interval and minimum access time interval.
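The 8-tuple frequency record can be sketched as a small data structure built from access timestamps. The sketch assumes timestamps are plain numbers taken from a running log, and it assumes that an item with fewer than two accesses gets 0.0 for its maximum and minimum intervals; neither detail is stated in the text. `FrequencyRecord` and `build_record` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrequencyRecord:
    identifier: str
    access_count: int
    stats_start: float
    stats_end: float
    storage_size: int
    intervals_over_threshold: int
    max_interval: float
    min_interval: float


def build_record(identifier: str, access_times: List[float], stats_start: float,
                 stats_end: float, storage_size: int,
                 interval_threshold: float) -> FrequencyRecord:
    # Keep only accesses inside the current statistical time period.
    times = sorted(t for t in access_times if stats_start <= t <= stats_end)
    # An access time interval is the gap between two consecutive accesses.
    intervals = [b - a for a, b in zip(times, times[1:])]
    return FrequencyRecord(
        identifier=identifier,
        access_count=len(times),
        stats_start=stats_start,
        stats_end=stats_end,
        storage_size=storage_size,
        intervals_over_threshold=sum(1 for d in intervals if d > interval_threshold),
        max_interval=max(intervals, default=0.0),  # assumed default
        min_interval=min(intervals, default=0.0),  # assumed default
    )
```

One such record per data item, collected for a storage device, corresponds to the frequency statistics table of that device's access information statistics file.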
Fig. 3 is a structural schematic diagram of a system 300 for determining low-frequency data items in storage devices used for big data storage, according to an embodiment of the present invention. The system 300 includes: a preprocessing unit 301, a statistics unit 302, a computing unit 303, a determination unit 304 and an adjustment unit 305.
The preprocessing unit 301 acts in response to receiving a request to determine low-frequency data items in each of the multiple storage devices used for big data storage in the big data storage system. It redirects each new data access request that the big data storage system receives from any data requester to the system buffer device of the big data storage system, rather than sending the received new data access request to the corresponding storage device among the multiple storage devices. The system buffer device then performs content matching between the description information of the query condition contained in the new data access request and each temporary data item in the temporary data item set of the system buffer device, to determine the content matching degree of each temporary data item. At least one selected temporary data item whose content matching degree is greater than a matching degree threshold is selected from the multiple temporary data items and sent to the data requester indicated by the new data access request, and the new data access request is saved in the buffer area of the system buffer device.
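The behavior of the system buffer device during the scan can be sketched as follows. The matching technique is not fixed by the text; this sketch assumes a word-overlap score, and the request layout and the names `overlap` and `handle_while_scanning` are hypothetical.

```python
from collections import deque
from typing import Dict, List


def overlap(a: str, b: str) -> float:
    """Assumed content matching degree: Jaccard overlap of lower-cased words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def handle_while_scanning(request: dict, temporary_items: Dict[str, str],
                          buffer: deque, match_threshold: float) -> List[str]:
    """Answer from temporary data items and save the request for later."""
    # Match the query description against each temporary item's summary.
    selected = [ident for ident, summary in temporary_items.items()
                if overlap(request["description"], summary) > match_threshold]
    # Save the redirected request so it can be processed once scanning ends.
    buffer.append(request)
    return selected
```

The returned identifiers stand for the selected temporary data items sent to the data requester, while the saved request is later processed in queue order against the storage devices.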
When a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, it sends the big data storage system a request to determine low-frequency data items in each of the multiple storage devices used for big data storage. The data management device outside the big data storage system can be operated or controlled by maintenance, administrative or operations personnel of the big data storage system. For example, such personnel can trigger the identification or determination of low-frequency data items periodically or according to actual operating conditions. The big data storage system includes multiple storage devices; each storage device can store multiple data items, and the storage capacity of each storage device can be any reasonable value. Each data item can be a data file of any type, such as a text, audio or video file. A low-frequency data item is a data item whose accessed count within a specific time is lower than the average accessed count of all data items in the big data storage system, or lower than the average accessed count of all data items in its storage device, and so on.
Redirecting a new data access request that the big data storage system receives from any data requester to the system buffer device of the big data storage system, rather than sending it to the corresponding storage device among the multiple storage devices, includes:
From the moment the big data storage system receives the request to determine low-frequency data items, redirecting every new data access request that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, rather than sending the received new data access request to the corresponding storage device among the multiple storage devices;
The new data access request includes a query condition and description information of the query condition. The temporary data item set includes multiple temporary data items, and each temporary data item has summary information that briefly introduces the content of the temporary data item.
After the moment at which the big data storage system receives the request to determine low-frequency data items in each of its multiple storage devices used for big data storage, multiple new data access requests may be received. All new data access requests subsequently received from one or more data requesters are then redirected to the system buffer device of the big data storage system, rather than being sent to the corresponding storage devices among the multiple storage devices. Under normal operation, the big data storage system determines, in its catalog storage server, the multiple data items involved in the query condition contained in a new data access request, and determines at least one target storage device involved in those data items. The query condition is sent to each target storage device, and at least one data item that meets the query condition is received from each target storage device. When low-frequency data items are to be identified or determined, however, the big data storage system redirects all new data access requests to its system buffer device. The system buffer device is located inside the big data storage system and is used to store a temporary data item set comprising multiple temporary data items, or to buffer data access requests. The query condition is, for example: mobile communication AND 5G AND (uplink OR downlink). In this case, the description information of the query condition is, for example: the uplink or downlink of 5G mobile communication. The temporary data item set includes multiple temporary data items, and each temporary data item can be a data file of any type, such as a text, audio or video file. Each temporary data item or data item has summary information that briefly introduces its content. For example, the summary information may read: "C++ from Zero uses straightforward explanations to help you learn the C++ programming language in 21 days."
Performing, by the system buffer device, content matching between the description information of the query condition included in the new data access request and each temporary data item in the temporary data item set of the system buffer device, to determine the content matching degree of each temporary data item, includes:
Performing, by the system buffer device, content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords between the description information of the query condition included in the new data access request and the summary information of each temporary data item in the temporary data item set of the system buffer device, to determine the content matching degree between each temporary data item and the query condition. The application may use any existing text comparison method to determine the content matching degree between the description information of the query condition and the summary information of each temporary data item; the text comparison method is, for example, content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords. The content matching degree between each temporary data item and the query condition may indicate how close, similar, relevant, or associated the temporary data item is to the query condition.
The matching degree threshold is 55%, 60%, 65%, 70%, or any reasonable value, and the content matching degree lies in the range [0%, 100%]; that is, it may be any value from 0% to 100%. At least one selected temporary data item whose content matching degree is greater than the matching degree threshold (for example, greater than 55%, 60%, 65%, or 70%) is selected from the multiple temporary data items. The selected temporary data item or items are sent to the data requester indicated by the new data access request, and the new data access request is saved in the buffer area of the system buffer device. The selected temporary data items are sent so that the data requester can still obtain content related to its data access request while the big data storage system has suspended its data access service.
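The matching and selection described above can be sketched as follows. This is only an illustration under stated assumptions: the source allows any existing text comparison method, and the sketch swaps in a simple keyword overlap (Jaccard similarity of word sets). The function names, the 60% default threshold, and the dictionary layout are hypothetical.

```python
import re


def keyword_match_degree(description: str, summary: str) -> float:
    """Keyword overlap (Jaccard) of two texts as a percentage in [0, 100]."""
    a = set(re.findall(r"\w+", description.lower()))
    b = set(re.findall(r"\w+", summary.lower()))
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def select_temporary_items(description: str, summaries: dict, threshold: float = 60.0) -> list:
    """Return ids of temporary items whose matching degree exceeds the threshold."""
    return [item_id for item_id, summary in summaries.items()
            if keyword_match_degree(description, summary) > threshold]


summaries = {
    "t1": "uplink of 5G mobile communication",
    "t2": "introduction to C++ programming",
}
selected = select_temporary_items("5G mobile communication uplink", summaries)
```

A semantic comparison could replace `keyword_match_degree` without changing the selection step, since only the resulting percentage is compared with the threshold.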
After the new data access request is saved in the buffer area of the system buffer device, the method further includes: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information that indicates the current queue position, in the buffer area, of the new data access request from the data requester. The current queue order in the buffer area is determined by how long each new data access request has been saved there: requests are sorted in descending order of the time they have been saved. That is, the longer a new data access request has been saved, the nearer it is to the front of the queue. Preferably, after the response message is sent to the data requester indicated by the new data access request, the method further includes: periodically sending, to that data requester, a notification message indicating the current queue position of its new data access request in the buffer area.
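A minimal sketch of the queue ordering described above, assuming each buffered request is identified by a hypothetical request id and that the buffer records the time at which the request was saved. Requests that have been saved longest come first.

```python
from datetime import datetime


def queue_order(saved_at: dict, now: datetime) -> list:
    """Order request ids by time spent in the buffer, longest first."""
    return sorted(saved_at, key=lambda rid: now - saved_at[rid], reverse=True)


def queue_position(saved_at: dict, now: datetime, request_id: str) -> int:
    """1-based queue position that a response or notification could carry."""
    return queue_order(saved_at, now).index(request_id) + 1


saved_at = {
    "r1": datetime(2018, 8, 11, 11, 26, 0),
    "r2": datetime(2018, 8, 11, 11, 25, 40),
    "r3": datetime(2018, 8, 11, 11, 27, 0),
}
now = datetime(2018, 8, 11, 11, 30, 0)
order = queue_order(saved_at, now)
```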
When it is determined that no storage device in the big data storage system has a running data access operation, the statistics unit 302 obtains the running log file of each of the multiple storage devices in the big data storage system, determines the statistical access information of the multiple data items stored in each storage device based on the current statistical period and the running log file of each storage device, and determines the access information statistics file of each storage device according to a preset access time interval threshold and the statistical access information of the multiple data items stored in each storage device. An access time interval is the period between two adjacent accesses to a data item. The access information statistics file includes a frequency statistics table, and the frequency statistics table includes multiple frequency records; the content of each frequency record is the 8-tuple <data item identifier, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>.
A running data access operation is the processing in which a storage device performs data retrieval in its own storage space according to the query condition sent by the big data storage system, a data item set is formed from the data items obtained by the retrieval, and the big data storage system sends the data item set to the data requester.
The running log file is stored in the system data area of each storage device. The running log file includes multiple log records, each of which includes: data item identifier, access start time, access end time, storage size, and storage start time. The identifier of a data item may be any information that uniquely identifies the data item, such as its name, its unique identifier, or its code. The access start time is the time at which the access to the data item involved in the current log record started. The access end time is the time at which that access ended. For example, accessing a data item in a storage device may involve operations such as reading and modifying, and the access start time and access end time indicate the start and end of such an operation. The storage size is the storage size of the data item in the storage device. The storage start time is the time at which the data item began to be stored in the storage device or the big data storage system, that is, the time from which the saved data item was available for access. In this application, access includes reading and/or modifying.
The current statistical period is the period that starts from the day before the current date on which the big data storage system receives the request to determine low-frequency data items and extends back a predetermined number of natural days, where the predetermined number is 10, 20, or 30 natural days. For example, if the big data storage system receives the request to determine low-frequency data items at 11:25:36 on August 11, 2018, the current date is August 11, 2018, and the day before the current date is August 10, 2018. With a predetermined number of 10 natural days, the current statistical period runs from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018.
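The date arithmetic in this example can be sketched as follows. The function name and the second-level resolution of the period boundaries are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta


def statistical_period(request_time: datetime, natural_days: int = 10):
    """Period ending on the day before the request and spanning natural_days days."""
    last_day = request_time.date() - timedelta(days=1)
    first_day = last_day - timedelta(days=natural_days - 1)
    return (datetime.combine(first_day, time(0, 0, 0)),
            datetime.combine(last_day, time(23, 59, 59)))


start, end = statistical_period(datetime(2018, 8, 11, 11, 25, 36), natural_days=10)
```

For the request time used in the text, `start` is 00:00:00 on August 1, 2018 and `end` is 23:59:59 on August 10, 2018.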
Determining the statistical access information of the multiple data items stored in each storage device based on the current statistical period and the running log file of each storage device includes:
Selecting, based on the current statistical period, from all log records in the running log file of each storage device, to obtain the multiple log records of each storage device that fall within the current statistical period;
Classifying the multiple log records of each storage device within the current statistical period by data item, to obtain the statistical access information of each data item;
Forming the statistical access information of the multiple data items stored in each storage device from the statistical access information of each data item.
For example, if the current statistical period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, that is, 10 natural days, all log records in the running log file of each storage device are filtered by that period to obtain all log records of each storage device between 00:00:00 on August 1, 2018 and 23:59:59 on August 10, 2018. These log records are then classified by data item (for example, by data item identifier) to obtain the statistical access information of each data item. The statistical access information of a data item is, for example, all information about accesses to that data item within the current statistical period. The statistical access information of the individual data items in a storage device together forms the statistical access information of the multiple data items stored in that storage device.
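The selection and classification steps can be sketched as follows. The `LogRecord` fields mirror the log record contents named in the text, but the field names themselves are hypothetical, and treating a record as inside the period only when the whole access falls within it is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    item_id: str
    access_start: datetime
    access_end: datetime
    size: int
    storage_start: datetime


def group_records(records, period_start, period_end):
    """Keep records inside the statistical period and group them by data item."""
    grouped = defaultdict(list)
    for rec in records:
        if period_start <= rec.access_start and rec.access_end <= period_end:
            grouped[rec.item_id].append(rec)
    return dict(grouped)


stored = datetime(2018, 1, 1)
records = [
    LogRecord("A", datetime(2018, 8, 1, 9, 2, 11), datetime(2018, 8, 1, 9, 5, 36), 4096, stored),
    LogRecord("A", datetime(2018, 7, 31, 8, 0, 0), datetime(2018, 7, 31, 8, 1, 0), 4096, stored),
    LogRecord("B", datetime(2018, 8, 5, 12, 0, 0), datetime(2018, 8, 5, 12, 0, 30), 1024, stored),
]
grouped = group_records(records, datetime(2018, 8, 1), datetime(2018, 8, 10, 23, 59, 59))
```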
Each data item has summary information that briefly introduces the content of the data item. For example, the summary information is: "C++ from zero: a straightforward introduction that lets you learn the C++ programming language in 21 days."
An access time interval is the period between two adjacent accesses to a data item, for example the period from the access end time of the current access to the access start time of the next access. The preset access time interval threshold is 5 minutes, 10 minutes, 15 minutes, 20 minutes, or any reasonable value. For example, if data item A is accessed 5 times within the current statistical period (or statistical period) and each access lasts 30 seconds, data item A has 4 access time intervals within that period.
Determining the access information statistics file of each storage device according to the preset access time interval threshold and the statistical access information of the multiple data items stored in each storage device includes:
Compiling statistics on the statistical access information of each of the multiple data items stored in each storage device, to determine the access count and all access time intervals of each data item;
Determining, based on all access time intervals of each data item, the number of access time intervals greater than the access time interval threshold, the maximum access time interval, and the minimum access time interval of each data item;
Taking the access start time of the first access in the statistical access information of each data item as the statistics start time, and the access end time of the last access in the statistical access information of each data item as the statistics end time;
Determining the storage size of each data item based on the statistical access information of each data item.
The statistical access information of each of the multiple data items stored in each storage device includes multiple log records, and each log record represents one access to the data item, so the (total) access count of each data item is determined by the number of its log records. In addition, sorting the log records by access start time or access end time yields the access time interval between each pair of consecutive log records, and thus all access time intervals. Further, comparing the preset access time interval threshold with all access time intervals determines the number of access time intervals of each data item that are greater than the threshold, and compiling statistics on all access time intervals determines the maximum and minimum access time intervals of each data item.
For example, suppose the current statistical period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, the first access to data item A within that period has an access start time of 09:02:11 on August 1, 2018 and an access end time of 09:05:36 on August 1, 2018, and the last access to data item A within that period has an access start time of 22:26:53 on August 10, 2018 and an access end time of 22:27:39 on August 10, 2018. The statistics start time of data item A in the current statistical period is then 09:02:11 on August 1, 2018, and its statistics end time is 22:27:39 on August 10, 2018.
In addition, the storage size of each data item is determined from the storage size in any one of the log records in its statistical access information.
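The derivation of one frequency record from the accesses of a single data item can be sketched as follows. The dictionary keys, the 10-minute default threshold, and the representation of accesses as (start, end) pairs are assumptions; the list of accesses is assumed to be non-empty.

```python
from datetime import datetime, timedelta


def frequency_record(item_id, accesses, size, gap_threshold=timedelta(minutes=10)):
    """Build the eight fields of a frequency record for one data item.

    accesses: non-empty list of (access_start, access_end) pairs.
    """
    ordered = sorted(accesses)
    # An interval runs from the end of one access to the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "item_id": item_id,
        "access_count": len(ordered),
        "stat_start": ordered[0][0],
        "stat_end": max(end for _, end in ordered),
        "size": size,
        "gaps_over_threshold": sum(g > gap_threshold for g in gaps),
        "max_gap": max(gaps) if gaps else None,
        "min_gap": min(gaps) if gaps else None,
    }


accesses = [
    (datetime(2018, 8, 10, 22, 26, 53), datetime(2018, 8, 10, 22, 27, 39)),
    (datetime(2018, 8, 1, 9, 2, 11), datetime(2018, 8, 1, 9, 5, 36)),
    (datetime(2018, 8, 1, 9, 10, 36), datetime(2018, 8, 1, 9, 11, 0)),
]
rec = frequency_record("A", accesses, size=4096)
```

With n accesses there are n - 1 intervals, which matches the example of 5 accesses giving 4 access time intervals.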
Based on the access information statistics file, the computing unit 303 determines, among all data items of each storage device, the multiple pre-selected data items whose access count within the current statistical period is less than the low-frequency count threshold; determines the total storage capacity of each storage device according to the device description information in the system log device of the big data storage system; determines the free storage capacity of each storage device according to the storage information file in the storage information area of each storage device; and determines the low-frequency coefficient of each pre-selected data item in each storage device according to the following formula:
where DTF_i is the low-frequency coefficient of the i-th pre-selected data item in the current storage device; t_imax is the maximum access time interval among the multiple access time intervals of the i-th pre-selected data item in the current storage device; t_imin is the minimum access time interval among the multiple access time intervals of the i-th pre-selected data item in the current storage device; t_ibegin is the statistics start time of the i-th pre-selected data item in the current storage device; t_iend is the statistics end time of the i-th pre-selected data item in the current storage device; C is the total storage capacity of the current storage device; R is the free storage capacity of the current storage device; UN_i is the number of access time intervals of the i-th pre-selected data item in the current storage device that are greater than the access time interval threshold; and AN_i is the access count of the i-th pre-selected data item in the current storage device. Here i and PT are natural numbers with PT ≥ i ≥ 1, and PT is the number of pre-selected data items in the current storage device, with PT ≥ 100.
The low-frequency count threshold is 100, 150, 175, 200, or any reasonable value. The device description information in the system log device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, or the time at which each storage device joined the big data storage system. The total number of storage devices included in the big data storage system is the total number of all storage devices in the big data storage system. The total storage capacity of a storage device is the total capacity of its storage space, or the total capacity of the storage space that it can use for storing data items. The network address of a storage device is, for example, an IP address or a MAC address. The time at which a storage device joined the big data storage system is the time from which it began storing data items as a storage device of the big data storage system.
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the starting storage time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of the storage device. The total number of data items is the total number of all data items in the storage device. The storage size of a data item is its size when stored in the storage device, or the storage space it occupies. The starting storage time of a data item is the time at which it began to be stored in the storage device to which it belongs, for example the time at which it was copied to the storage device. The identifier of a data item may be any information that uniquely identifies the data item, such as its name, its unique identifier, or its code. The summary information of a data item briefly introduces the content of the temporary data item or data item. For example, the summary information is: "C++ from zero: a straightforward introduction that lets you learn the C++ programming language in 21 days." The free storage capacity of a storage device is the free or remaining capacity in which new data items can be stored. The low-frequency coefficient threshold is 90, 100, 120, 130, 150, 160, 170, 220, or any reasonable value.
The determination unit 304 determines, as low-frequency data items, those pre-selected data items among the multiple pre-selected data items in each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold. That is, through the above steps the application identifies the low-frequency data items in each storage device used for big data storage in the big data storage system.
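The two-stage selection (pre-selection by access count, then determination by low-frequency coefficient) can be sketched as follows. The sketch assumes the low-frequency coefficient of each data item has already been computed and is passed in as a plain number; the function name, dictionary layout, and default thresholds of 100 and 120 are illustrative assumptions.

```python
def low_frequency_items(access_counts: dict, coefficients: dict,
                        count_threshold: int = 100,
                        coeff_threshold: float = 120) -> list:
    """Return ids of data items that pass both low-frequency tests."""
    # Stage 1: pre-select items accessed fewer times than the count threshold.
    preselected = [i for i, n in access_counts.items() if n < count_threshold]
    # Stage 2: keep pre-selected items whose coefficient is below the threshold.
    return [i for i in preselected if coefficients[i] < coeff_threshold]


access_counts = {"a": 40, "b": 99, "c": 250}
coefficients = {"a": 80.0, "b": 130.0, "c": 10.0}  # assumed precomputed values
low_items = low_frequency_items(access_counts, coefficients)
```

Item "c" has a small coefficient but is never considered, because it fails the pre-selection by access count.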
After the pre-selected data items whose low-frequency coefficient is less than the low-frequency coefficient threshold are determined as low-frequency data items, the method further includes using the adjustment unit 305 to determine, as candidate data items (data items to be selected), those data items among all data items of each storage device whose access count is greater than 2 times the low-frequency count threshold, thereby obtaining multiple candidate data items, to form a candidate data item set from the multiple candidate data items, and to form a low-frequency data item set from the multiple low-frequency data items in each storage device whose low-frequency coefficient is less than the low-frequency coefficient threshold. For example, when the low-frequency count threshold is 100, the data items among all data items of each storage device whose access count is greater than 100 × 2 are determined as candidate data items. For example, when the low-frequency coefficient threshold is 120, the multiple low-frequency data items in each storage device whose low-frequency coefficient is less than 120 form the low-frequency data item set; that is, all low-frequency data items in each storage device form the low-frequency data item set.
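A minimal sketch of building the candidate data item set, assuming access counts are available as a mapping from a hypothetical data item id to its count. The comparison is strict, so with a threshold of 100 an item accessed exactly 200 times is not a candidate.

```python
def candidate_items(access_counts: dict, count_threshold: int = 100) -> list:
    """Ids of items accessed more than twice the low-frequency count threshold."""
    return sorted(i for i, n in access_counts.items() if n > 2 * count_threshold)


candidates = candidate_items({"a": 200, "b": 201, "c": 50, "d": 827})
```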
When the number of low-frequency data items in the low-frequency data item set is less than or equal to the number of candidate data items in the candidate data item set, all low-frequency data items in the low-frequency data item set are sorted in ascending order of access count to generate a first sorted list, and the low-frequency data item ranked 1st in the first sorted list is taken as the current low-frequency data item. For example, when the number of low-frequency data items in the low-frequency data item set (for example, 326) is less than the number of candidate data items in the candidate data item set (for example, 827), all low-frequency data items in the low-frequency data item set are sorted in ascending (increasing) order of access count to generate the first sorted list. In the first sorted list, the nearer the front a data item is ranked, the fewer times it has been accessed, and the nearer the back, the more times. The low-frequency data item ranked 1st in the first sorted list (that is, the data item or low-frequency data item with the fewest accesses) is taken as the current low-frequency data item.
6.1. Content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item. The application may use any existing text comparison method to determine the content matching degree between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set; the text comparison method is, for example, content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords. The content matching degree between each candidate data item and the current low-frequency data item may indicate how close, similar, relevant, or associated the candidate data item is to the current low-frequency data item.
6.2. Among all candidate data items of the candidate data item set, the candidate data item with the greatest content matching degree with the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the storage device. Combining the two data items means forming a file group from the candidate data item with the greatest content matching degree and the current low-frequency data item, and merging the summary information of that candidate data item with the summary information of the current low-frequency data item to form the summary information of the file group. The resulting file group is treated as a new data item, and the new data item is saved in the free storage space of the storage device, that is, in storage space that holds no data item.
6.3. The candidate data item with the greatest content matching degree with the current low-frequency data item is deleted from the candidate data item set. This deletion takes place after the new data item (the resulting file group) has been saved in the free storage space of the storage device. In addition, both the candidate data item with the greatest content matching degree and the current low-frequency data item are deleted from the storage device (because the file group formed from them has already been saved in the free storage space of the storage device).
6.4. It is determined whether the first sorted list contains a low-frequency data item ranked one position after the current low-frequency data item; if so, step 6.5 is performed; if not, the process ends. This means determining whether the first sorted list contains a low-frequency data item whose access count is higher than that of the current low-frequency data item and which is adjacent to the current low-frequency data item in the first sorted list. For example, when the current low-frequency data item is ranked 1st, the low-frequency data item one position after it is the one ranked 2nd, that is, the low-frequency data item or data item with the second-fewest accesses in the first sorted list. If it exists, step 6.5 is performed; if it does not exist, the above process ends.
6.5. The low-frequency data item ranked one position after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and step 6.1 is performed. For example, the low-frequency data item ranked 2nd in the first sorted list is selected as the current low-frequency data item and step 6.1 is performed, and so on for the low-frequency data items ranked 3rd, 4th, 5th, and onward, until the last low-frequency data item has been selected as the current low-frequency data item.
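Steps 6.1 to 6.5 can be sketched as a single loop over the first sorted list. This is a sketch under stated assumptions: the `match` function stands in for whichever text comparison method is used, the file group is represented as a plain dictionary, and all names are hypothetical. Ties in matching degree are broken by candidate id only to keep the result deterministic.

```python
def merge_low_frequency_items(low_counts: dict, candidates: set,
                              summaries: dict, match) -> list:
    """Pair each low-frequency item with its best-matching candidate item."""
    if len(low_counts) > len(candidates):
        raise ValueError("this branch applies only when low items <= candidates")
    remaining = set(candidates)
    groups = []
    # First sorted list: ascending order of access count.
    for low_id in sorted(low_counts, key=low_counts.get):
        # 6.1: match the current low-frequency item against every candidate.
        best = max(sorted(remaining),
                   key=lambda c: match(summaries[low_id], summaries[c]))
        # 6.2: combine the pair into a file group with a merged summary.
        groups.append({"members": (low_id, best),
                       "summary": summaries[low_id] + " " + summaries[best]})
        # 6.3: the chosen candidate leaves the candidate set.
        remaining.remove(best)
        # 6.4 / 6.5: the loop advances to the next item in the sorted list.
    return groups


def word_overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))


summaries = {
    "L1": "5G uplink basics", "L2": "C++ programming intro",
    "C1": "C++ programming in 21 days", "C2": "5G uplink and downlink",
    "C3": "audio codecs",
}
groups = merge_low_frequency_items({"L1": 30, "L2": 12}, {"C1", "C2", "C3"},
                                   summaries, word_overlap)
```

Because one candidate is removed per iteration and the number of low-frequency items does not exceed the number of candidates, the candidate set is never empty when step 6.1 runs.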
Alternatively, when the number of low-frequency data items in the low-frequency data item set is greater than the number of candidate data items in the candidate data item set, all low-frequency data items in the low-frequency data item set are grouped to generate multiple low-frequency data item groups, such that the total access count of all low-frequency data items in each of the multiple low-frequency data item groups is greater than 1.5 times the low-frequency count threshold. The average access count of all low-frequency data items in each low-frequency data item group is determined. Preferably, the absolute value of the difference between the average access counts of any two of the multiple low-frequency data item groups is less than 20, 30, 40, 50, 60, 70, or any reasonable value.
For example, when the number of low-frequency data items in the low-frequency data item set (for example, 569) is greater than the number of candidate data items in the candidate data item set (for example, 516), the 569 low-frequency data items in the low-frequency data item set are grouped to generate multiple low-frequency data item groups. The application determines the number of groups G into which the low-frequency data items are divided from the number K of low-frequency data items in the low-frequency data item set and a grouping parameter Z, where Z is equal to 3, 4, 5, or any reasonable value. When Z is equal to 5, the 569 low-frequency data items are divided into 113 low-frequency data item groups (569 divided by 5, rounded down).
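A minimal sketch of the grouping, assuming the number of groups is K divided by Z rounded down, which is consistent with the example of 569 items and Z = 5 giving 113 groups. The way items are distributed over the groups is not specified in the text; dealing the items out round-robin after sorting by access count is an illustrative choice that tends to keep group averages close, not a stated method.

```python
def group_low_frequency_items(low_counts: dict, z: int = 5) -> list:
    """Split low-frequency item ids into floor(K / Z) groups (assumed rule)."""
    k = len(low_counts)
    g = max(1, k // z)  # assumption: rounded down, matching 569 / 5 -> 113
    ordered = sorted(low_counts, key=low_counts.get)
    # Round-robin over the sorted ids so each group mixes low and high counts.
    return [ordered[i::g] for i in range(g)]


low_counts = {f"item{n}": n for n in range(569)}
item_groups = group_low_frequency_items(low_counts, z=5)
```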
In addition, the total access count of all low-frequency data items in each of the multiple low-frequency data item groups is greater than 1.1 times, 1.2 times, 1.3 times, or 1.5 times the low-frequency count threshold, or any reasonable multiple of it. The average access count of all low-frequency data items in each low-frequency data item group, that is, the average access count of the group, is determined. For example, if a low-frequency data item group includes low-frequency data items 1 to 5 whose access counts are 95, 76, 110, 82, and 102 respectively, the average access count of all low-frequency data items in the group is 93. The absolute value of the difference between the average access counts of any two of the multiple low-frequency data item groups is less than 20, 30, 40, 50, 60, 70, or any reasonable value.
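The group constraints and the worked average can be checked with a short sketch. The function names and the defaults (threshold 100, factor 1.5, maximum spread 20) are assumptions chosen from the values listed in the text.

```python
def group_average(counts: list) -> float:
    """Average access count of the low-frequency items in one group."""
    return sum(counts) / len(counts)


def groups_are_valid(groups: list, count_threshold: int = 100,
                     factor: float = 1.5, max_spread: float = 20) -> bool:
    """Check the total-count and average-spread constraints for all groups."""
    totals_ok = all(sum(g) > factor * count_threshold for g in groups)
    averages = [group_average(g) for g in groups]
    spread_ok = max(averages) - min(averages) < max_spread
    return totals_ok and spread_ok


example_group = [95, 76, 110, 82, 102]
average = group_average(example_group)  # 465 / 5
```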
After the pre-selected data items whose low-frequency coefficient is less than the low-frequency coefficient threshold are determined as low-frequency data items, the method further includes:
Performing the data access operation for each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device. For example, if the current queue order of the multiple data access requests in the buffer area of the system buffer device is the first data access request, the second data access request, the third data access request, the fourth data access request, and the fifth data access request, the data access operations for the data access requests in the buffer area are performed in that order.
When it is determined that no saved data access request remains in the buffer area of the system buffer device, a new data access request that the big data storage system receives from any data requester is parsed to obtain a new query condition. For example, once it is determined that the first through fifth data access requests in the buffer area of the system buffer device have all been processed, no saved data access request remains in the buffer area of the system buffer device. A sixth data access request that the big data storage system then receives from a data requester is parsed to obtain a new query condition. The new query condition is, for example, mobile communication AND 5G AND (uplink OR downlink).
The multiple data items involved in the new query condition are determined in the catalogue storage server of the big data storage system, and at least one target storage device involved in the multiple data items is determined. The catalogue storage server stores the directory information of all data items in the big data storage system. The directory information is, for example, the identifier of a data item, the summary information of the data item, the metadata information of the data item, the keyword information of the data item, the storage device on which the data item is located, and so on. The catalogue storage server queries all data items stored in the big data storage system according to the query condition or the new query condition, for example by applying the new query condition (for example, mobile communication AND 5G AND (uplink OR downlink)) to the summary information, the metadata information and/or the keyword information of the data items, thereby determining the multiple data items involved in the new query condition. The storage device on which each data item is located, in which it is stored, or to which it relates is determined from the directory information, thereby determining the at least one target storage device involved in the multiple data items. In special cases, the multiple data items may all be located on the same target storage device.
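The catalogue lookup described above can be illustrated with a minimal Python sketch. It assumes that a query condition is represented as a nested tuple of "and" / "or" operators over keywords and that matching is done against the keyword information only; the `DirectoryEntry` type, the tuple representation and the function names are hypothetical and are not defined in the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple, Union

# A query is either a keyword or a tuple ("and" | "or", sub-query, sub-query, ...).
Query = Union[str, tuple]


@dataclass(frozen=True)
class DirectoryEntry:
    """Simplified directory information for one data item (hypothetical layout)."""
    item_id: str
    keywords: FrozenSet[str]
    device: str  # storage device on which the data item is located


def matches(query: Query, keywords: FrozenSet[str]) -> bool:
    """Evaluate a nested AND/OR query against a data item's keywords."""
    if isinstance(query, str):
        return query in keywords
    operator, *operands = query
    if operator == "and":
        return all(matches(operand, keywords) for operand in operands)
    if operator == "or":
        return any(matches(operand, keywords) for operand in operands)
    raise ValueError(f"unknown operator: {operator!r}")


def find_targets(
    catalogue: Iterable[DirectoryEntry], query: Query
) -> Tuple[List[str], Set[str]]:
    """Return the matching data item identifiers and their target storage devices."""
    hits = [entry for entry in catalogue if matches(query, entry.keywords)]
    return [entry.item_id for entry in hits], {entry.device for entry in hits}
```

In this representation the example query would be written as `("and", "mobile communication", "5G", ("or", "uplink", "downlink"))`. If all matching entries name the same device, the returned device set has a single element, which corresponds to the special case mentioned above.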
The new query condition is sent to each target storage device, and at least one data item that satisfies the new query condition is received from each target storage device. Each target storage device searches all the data items it stores according to the new query condition to obtain at least one data item, and sends the obtained data item or items to the interface device of the big data storage system. Preferably, the big data storage system of the application contains no redundant data items, that is, each data item is unique. The interface device receives data access requests from data requesters and sends a data item set or target data item set to the corresponding data requester.
All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the new data access request. Performing a data access operation for each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device includes:
8.1. Determine the currently processed data access request according to the current queue order of the multiple data access requests in the buffer area of the system buffer device, where the currently processed data access request is the data access request ranked first in the current queue order of the multiple data access requests in the buffer area.
8.2. Parse the currently processed data access request to obtain the currently processed query condition.
8.3. Determine, in the catalogue storage server of the big data storage system, the multiple data items involved in the currently processed query condition, and determine at least one target storage device involved in the multiple data items. The catalogue storage server stores the directory information of all data items in the big data storage system.
8.4. Send the currently processed query condition to each target storage device, and receive from each target storage device at least one data item that satisfies the currently processed query condition.
8.5. Form all data items received from the target storage devices into a target data item set, and send the target data item set to the data requester indicated by the currently processed data access request.
8.6. Delete the data access request ranked first in the current queue order of the multiple data access requests in the buffer area.
8.7. Determine whether any saved data access request remains in the buffer area of the system buffer device. If so, return to step 8.1; if not, determine that no saved data access request remains in the buffer area of the system buffer device.
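Steps 8.1 to 8.7 amount to draining a first-in, first-out queue. The following Python sketch shows one way to express that loop. It assumes the buffer area is modelled as a `collections.deque` and that parsing, catalogue lookup, retrieval and delivery are supplied as callables; all parameter names are hypothetical and are not taken from the application.

```python
from collections import deque
from typing import Any, Callable, Deque, Iterable, List


def drain_buffer(
    buffer: Deque[Any],
    parse: Callable[[Any], Any],
    find_target_devices: Callable[[Any], Iterable[Any]],
    fetch: Callable[[Any, Any], Iterable[Any]],
    send: Callable[[Any, List[Any]], None],
) -> None:
    """Process buffered data access requests in queue order until none remain."""
    while buffer:                                      # 8.7: any saved request left?
        request = buffer[0]                            # 8.1: first request in the queue
        condition = parse(request)                     # 8.2: obtain the query condition
        devices = find_target_devices(condition)       # 8.3: catalogue lookup
        target_items: List[Any] = []
        for device in devices:                         # 8.4: query each target device
            target_items.extend(fetch(device, condition))
        send(request, target_items)                    # 8.5: deliver the target set
        buffer.popleft()                               # 8.6: delete the processed request
```

The request is removed only after its result has been sent, which mirrors the order of steps 8.5 and 8.6. When the loop exits, the buffer area holds no saved data access request.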
Claims (10)
1. A method for determining low-frequency data items in storage devices used for big data storage, the method comprising:
in response to receiving a request to determine low-frequency data items in each of a plurality of storage devices used for big data storage in a big data storage system, redirecting new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in a new data access request and each ephemeral data item in an ephemeral data item set of the system buffer device to determine a content matching degree of each ephemeral data item, selects from the plurality of ephemeral data items at least one ephemeral data item whose content matching degree is greater than a matching degree threshold, sends the at least one selected ephemeral data item to the data requester indicated by the new data access request, and saves the new data access request in a buffer area of the system buffer device;
when it is determined that no data access operation is running on any storage device in the big data storage system, obtaining the running log file of each of the plurality of storage devices in the big data storage system; determining, based on the current statistical time period and the running log file of each storage device, the compiled access information of the plurality of data items stored in each storage device; and determining the access information statistics file of each storage device according to a preset access time interval threshold and the compiled access information of the plurality of data items stored in each storage device, wherein an access time interval is the period of time between two adjacent accesses to a data item; wherein the access information statistics file comprises a frequency statistics table, the frequency statistics table comprises a plurality of frequency records, and the content of each frequency record is the 8-tuple <data item identifier, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>;
determining, based on the access information statistics file, a plurality of preselected data items whose access count within the current statistical time period is less than a low-frequency count threshold among all data items of each storage device; determining the total storage capacity of each storage device according to the device description information in the system log device of the big data storage system; determining the free storage capacity of each storage device according to the storage information file in the storage information area of each storage device; and determining the low-frequency coefficient of each preselected data item in each storage device according to the following formula:
where DTF_i is the low-frequency coefficient of the i-th preselected data item in the current storage device; t_imax is the maximum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_imin is the minimum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_ibegin is the statistics start time of the i-th preselected data item in the current storage device; t_iend is the statistics end time of the i-th preselected data item in the current storage device; C is the total storage capacity of the current storage device; R is the free storage capacity of the current storage device; UN_i is the number of access time intervals of the i-th preselected data item in the current storage device that are greater than the access time interval threshold; AN_i is the access count of the i-th preselected data item in the current storage device; i is a natural number with 1 ≤ i ≤ PT; and PT is the number of preselected data items in the current storage device, with PT ≥ 100; and
determining, as low-frequency data items, the preselected data items whose low-frequency coefficient is less than a low-frequency coefficient threshold among the plurality of preselected data items in each storage device.
2. The method according to claim 1, wherein, when a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, the data management device sends to the big data storage system the request to determine low-frequency data items in each of the plurality of storage devices used for big data storage in the big data storage system;
wherein redirecting new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, comprises:
from the moment the big data storage system receives the request to determine low-frequency data items, redirecting new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices;
wherein the new data access request comprises a query condition and description information of the query condition, the ephemeral data item set comprises a plurality of ephemeral data items, and each ephemeral data item has summary information that briefly introduces the content of the ephemeral data item;
wherein performing, by the system buffer device, content matching between the description information of the query condition contained in the new data access request and each ephemeral data item in the ephemeral data item set of the system buffer device to determine the content matching degree of each ephemeral data item comprises:
performing, by the system buffer device, content matching based on semantic comparison, content matching based on keyword comparison, or content matching based on a combination of semantic and keyword comparison between the description information of the query condition contained in the new data access request and the summary information of each ephemeral data item in the ephemeral data item set of the system buffer device, to determine the content matching degree between each ephemeral data item and the query condition;
wherein the matching degree threshold is 60% and the content matching degree ranges over [0%, 100%];
wherein, after the new data access request is saved in the buffer area of the system buffer device, the method further comprises: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue order, in the buffer area, of the new data access request from the data requester, wherein the current queue order of new data access requests in the buffer area is determined according to the length of time each new data access request has been saved in the buffer area, and the new data access requests are sorted in the current queue order by descending length of time saved.
3. The method according to any one of claims 1-2, wherein each storage device saves its own running log file in its system data area;
wherein the current statistical time period is the period that starts on the day before the current date on which the big data storage system receives the request to determine low-frequency data items and extends backward over a predetermined number of calendar days; wherein the predetermined number of calendar days is 10, 20 or 30 calendar days;
wherein determining, based on the current statistical time period and the running log file of each storage device, the compiled access information of the plurality of data items stored in each storage device comprises:
selecting, based on the current statistical time period, from all log records in the running log file of each storage device to obtain the log records of each storage device within the current statistical time period;
classifying the log records of each storage device within the current statistical time period by data item to obtain the compiled access information of each data item;
composing the compiled access information of the plurality of data items stored in each storage device from the compiled access information of each data item;
wherein each log record comprises: a data item identifier, an access start time, an access end time, a storage size and a storage start time;
wherein each data item has summary information that briefly introduces the content of the data item.
4. The method according to any one of claims 1-3,
wherein the preset access time interval threshold is 5 minutes, 10 minutes, 15 minutes or 20 minutes;
wherein determining the access information statistics file of each storage device according to the preset access time interval threshold and the compiled access information of the plurality of data items stored in each storage device comprises:
counting the compiled access information of each of the plurality of data items stored in each storage device to determine the access count and all access time intervals of each data item;
determining, based on all access time intervals of each data item, the number of access time intervals of each data item that are greater than the access time interval threshold, the maximum access time interval and the minimum access time interval;
determining the access start time of the first access in the compiled access information of each data item as the statistics start time, and the access end time of the last access in the compiled access information of each data item as the statistics end time;
determining the storage size of each data item based on the compiled access information of each data item.
5. The method according to any one of claims 1-4,
wherein the low-frequency count threshold is 100, 150 or 200;
the device description information in the system log device comprises: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device and/or the time at which each storage device joined the big data storage system;
the storage information file in the storage information area of each storage device comprises: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item and the free storage capacity of each storage device;
the low-frequency coefficient threshold is 120, 160 or 220.
6. A system for determining low-frequency data items in storage devices used for big data storage, the system comprising:
a preprocessing unit which, in response to receiving a request to determine low-frequency data items in each of a plurality of storage devices used for big data storage in a big data storage system, redirects new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in a new data access request and each ephemeral data item in an ephemeral data item set of the system buffer device to determine a content matching degree of each ephemeral data item, selects from the plurality of ephemeral data items at least one ephemeral data item whose content matching degree is greater than a matching degree threshold, sends the at least one selected ephemeral data item to the data requester indicated by the new data access request, and saves the new data access request in a buffer area of the system buffer device;
a statistics unit which, when it is determined that no data access operation is running on any storage device in the big data storage system, obtains the running log file of each of the plurality of storage devices in the big data storage system; determines, based on the current statistical time period and the running log file of each storage device, the compiled access information of the plurality of data items stored in each storage device; and determines the access information statistics file of each storage device according to a preset access time interval threshold and the compiled access information of the plurality of data items stored in each storage device, wherein an access time interval is the period of time between two adjacent accesses to a data item; wherein the access information statistics file comprises a frequency statistics table, the frequency statistics table comprises a plurality of frequency records, and the content of each frequency record is the 8-tuple <data item identifier, access count, statistics start time, statistics end time, storage size, number of access time intervals greater than the access time interval threshold, maximum access time interval, minimum access time interval>;
a computing unit which determines, based on the access information statistics file, a plurality of preselected data items whose access count within the current statistical time period is less than a low-frequency count threshold among all data items of each storage device; determines the total storage capacity of each storage device according to the device description information in the system log device of the big data storage system; determines the free storage capacity of each storage device according to the storage information file in the storage information area of each storage device; and determines the low-frequency coefficient of each preselected data item in each storage device according to the following formula:
where DTF_i is the low-frequency coefficient of the i-th preselected data item in the current storage device; t_imax is the maximum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_imin is the minimum access time interval among the access time intervals of the i-th preselected data item in the current storage device; t_ibegin is the statistics start time of the i-th preselected data item in the current storage device; t_iend is the statistics end time of the i-th preselected data item in the current storage device; C is the total storage capacity of the current storage device; R is the free storage capacity of the current storage device; UN_i is the number of access time intervals of the i-th preselected data item in the current storage device that are greater than the access time interval threshold; AN_i is the access count of the i-th preselected data item in the current storage device; i is a natural number with 1 ≤ i ≤ PT; and PT is the number of preselected data items in the current storage device, with PT ≥ 100; and
the preselected data items whose low-frequency coefficient is less than a low-frequency coefficient threshold among the plurality of preselected data items in each storage device are determined as low-frequency data items.
7. The system according to claim 6, wherein, when a data management device located outside the big data storage system needs to determine low-frequency data items in the storage devices of the big data storage system, the data management device sends to the big data storage system the request to determine low-frequency data items in each of the plurality of storage devices used for big data storage in the big data storage system;
wherein the preprocessing unit redirecting new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, comprises:
from the moment the big data storage system receives the request to determine low-frequency data items, the preprocessing unit redirecting new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the plurality of storage devices;
wherein the new data access request comprises a query condition and description information of the query condition, the ephemeral data item set comprises a plurality of ephemeral data items, and each ephemeral data item has summary information that briefly introduces the content of the ephemeral data item;
wherein the preprocessing unit causing the system buffer device to perform content matching between the description information of the query condition contained in the new data access request and each ephemeral data item in the ephemeral data item set of the system buffer device to determine the content matching degree of each ephemeral data item comprises:
the preprocessing unit causing the system buffer device to perform content matching based on semantic comparison, content matching based on keyword comparison, or content matching based on a combination of semantic and keyword comparison between the description information of the query condition contained in the new data access request and the summary information of each ephemeral data item in the ephemeral data item set of the system buffer device, to determine the content matching degree between each ephemeral data item and the query condition;
wherein the matching degree threshold is 60% and the content matching degree ranges over [0%, 100%];
wherein the preprocessing unit sends, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue order, in the buffer area, of the new data access request from the data requester, wherein the current queue order of new data access requests in the buffer area is determined according to the length of time each new data access request has been saved in the buffer area, and the new data access requests are sorted in the current queue order by descending length of time saved.
8. The system according to any one of claims 6-7, wherein each storage device saves its running log file in its system data area;
wherein the current statistical time period is the period that starts on the day before the current date on which the big data storage system receives the request to determine low-frequency data items and extends backward over a predetermined number of calendar days; wherein the predetermined number of calendar days is 10, 20 or 30 calendar days;
wherein the statistics unit determining, based on the current statistical time period and the running log file of each storage device, the compiled access information of the plurality of data items stored in each storage device comprises:
the statistics unit selecting, based on the current statistical time period, from all log records in the running log file of each storage device to obtain the log records of each storage device within the current statistical time period;
the statistics unit classifying the log records of each storage device within the current statistical time period by data item to obtain the compiled access information of each data item;
the statistics unit composing the compiled access information of the plurality of data items stored in each storage device from the compiled access information of each data item;
wherein each log record comprises: a data item identifier, an access start time, an access end time, a storage size and a storage start time;
wherein each data item has summary information that briefly introduces the content of the data item.
9. The system according to any one of claims 6-8,
wherein the preset access time interval threshold is 5 minutes, 10 minutes, 15 minutes or 20 minutes;
wherein the statistics unit determining the access information statistics file of each storage device according to the preset access time interval threshold and the compiled access information of the plurality of data items stored in each storage device comprises:
the statistics unit counting the compiled access information of each of the plurality of data items stored in each storage device to determine the access count and all access time intervals of each data item;
the statistics unit determining, based on all access time intervals of each data item, the number of access time intervals of each data item that are greater than the access time interval threshold, the maximum access time interval and the minimum access time interval;
the statistics unit determining the access start time of the first access in the compiled access information of each data item as the statistics start time, and the access end time of the last access in the compiled access information of each data item as the statistics end time;
the statistics unit determining the storage size of each data item based on the compiled access information of each data item.
10. The system according to any one of claims 6-9,
wherein the low-frequency count threshold is 100, 150 or 200;
the device description information in the system log device comprises: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device or the time at which each storage device joined the big data storage system;
the storage information file in the storage information area of each storage device comprises: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item and the free storage capacity of each storage device;
the low-frequency coefficient threshold is 120, 160 or 220.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811006475.6A CN109033462B (en) | 2018-08-30 | 2018-08-30 | Method and system for determining low frequency data items in a storage device for large data storage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033462A true CN109033462A (en) | 2018-12-18 |
CN109033462B CN109033462B (en) | 2023-04-28 |
Family
ID=64626509
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811006475.6A Active CN109033462B (en) | 2018-08-30 | 2018-08-30 | Method and system for determining low frequency data items in a storage device for large data storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033462B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271104A (en) * | 2018-08-30 | 2019-01-25 | 杜广香 | It is a kind of for determining the method and system of the operating status of big data storage system |
CN109739817A (en) * | 2018-12-26 | 2019-05-10 | 杜广香 | A kind of method and system of the storing data file in big data storage system |
CN109753505A (en) * | 2018-12-26 | 2019-05-14 | 杜广香 | The method and system of temporary storage cell are created in big data storage system |
CN112965810A (en) * | 2021-01-27 | 2021-06-15 | 合肥大多数信息科技有限公司 | Multi-kernel browser data integration method based on shared network channel |
CN116541365A (en) * | 2023-07-06 | 2023-08-04 | 成都泛联智存科技有限公司 | File storage method, device, storage medium and client |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106775461A (en) * | 2016-11-30 | 2017-05-31 | 华为技术有限公司 | Hot spot data determines method, equipment and device |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271104A (en) * | 2018-08-30 | 2019-01-25 | 杜广香 | Method and system for determining the operating state of a big data storage system |
CN109271104B (en) * | 2018-08-30 | 2024-07-26 | 衡阳市芊芊网络科技有限公司 | Method and system for determining operation state of big data storage system |
CN109739817A (en) * | 2018-12-26 | 2019-05-10 | 杜广香 | Method and system for storing data files in a big data storage system |
CN109753505A (en) * | 2018-12-26 | 2019-05-14 | 杜广香 | Method and system for creating temporary storage units in a big data storage system |
CN109753505B (en) * | 2018-12-26 | 2022-06-24 | 济南银华信息技术有限公司 | Method and system for creating temporary storage unit in big data storage system |
CN109739817B (en) * | 2018-12-26 | 2023-01-03 | 深圳光点软件科技有限公司 | Method and system for storing data file in big data storage system |
CN112965810A (en) * | 2021-01-27 | 2021-06-15 | 合肥大多数信息科技有限公司 | Multi-kernel browser data integration method based on shared network channel |
CN116541365A (en) * | 2023-07-06 | 2023-08-04 | 成都泛联智存科技有限公司 | File storage method, device, storage medium and client |
CN116541365B (en) * | 2023-07-06 | 2023-09-15 | 成都泛联智存科技有限公司 | File storage method, device, storage medium and client |
Also Published As
Publication number | Publication date |
---|---|
CN109033462B (en) | 2023-04-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109033462A (en) | Method and system for determining low-frequency data items in storage devices for big data storage | |
US9727572B2 (en) | Database compression system and method | |
US6751627B2 (en) | Method and apparatus to facilitate accessing data in network management protocol tables | |
CN107801086A (en) | Scheduling method and system for multiple cache servers | |
CN1392704A (en) | Selecting synchronous data | |
CN105991849B (en) | Agent seat service method, apparatus and system | |
KR101411321B1 (en) | Method and apparatus for managing neighbor node having similar characteristic with active node and computer readable medium thereof | |
CN100512152C (en) | Method of managing alarm inquiry | |
CN105550180B (en) | Data processing method, apparatus and system | |
CN106940715B (en) | Query method and apparatus based on an index table | |
CN109271104A (en) | Method and system for determining the operating state of a big data storage system | |
CN107203623B (en) | Load balancing and adjusting method of web crawler system | |
US20090037443A1 (en) | Intelligent group communication | |
CN100488173C (en) | A method for carrying out automatic selection of packet classification algorithm | |
US11681680B2 (en) | Method, device and computer program product for managing index tables | |
CN109271103A (en) | Method and system for hybrid data storage in a big data storage system | |
CN109271102A (en) | Method and system for identifying low-access storage devices in a big data storage system | |
CN109271101A (en) | Method and system for determining the data balance of a big data storage system | |
CN101848149A (en) | Method and device for scheduling graded queues in packet network | |
CN109240988A (en) | Method and system for preventing a big data storage system from entering an access-imbalanced state | |
CN115412737B (en) | Live broadcast return source relay node determining method and device | |
CN109657018B (en) | Distributed vehicle running data query method and terminal equipment | |
US6996577B1 (en) | Method and system for automatically grouping objects in a directory system based on their access patterns | |
CN109150819B (en) | Attack recognition method and recognition system | |
CN108241758A | Data query method and related device | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20230403. Address after: Room 201, No. 2-2-2 Yingcai Street, Tianhe District, Guangzhou City, Guangdong Province, 510000 (Location: 2) (Office only). Applicant after: Guangzhou sibeishou Engineering Consulting Co.,Ltd. Address before: 252659 Shandong province Liaocheng City Linqing City Dai Wan Town, the village of the South Village Health Room. Applicant before: Du Guangxiang |
GR01 | Patent grant | |