CN109271101A - Method and system for determining the data balance of a big data storage system - Google Patents

Method and system for determining the data balance of a big data storage system

Info

Publication number
CN109271101A
Authority
CN
China
Prior art keywords
data
equipment
data item
storage
low frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811005484.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Du Guangxiang
Original Assignee
Du Guangxiang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Du Guangxiang
Priority to CN201811005484.3A
Publication of CN109271101A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for determining the data balance of a big data storage system. The method includes: when it is determined that no data access operation is running on any storage device in the big data storage system, determining the access information statistics file of each storage device from the aggregated access information of the data items stored on that storage device; calculating the balance coefficient of the big data storage system; and, when the balance coefficient of the big data storage system is greater than a system balance coefficient threshold, determining that the data balance of the big data storage system is in an unbalanced state.

Description

Method and system for determining the data balance of a big data storage system
Technical field
The present invention relates to the fields of big data storage and cloud storage, and more particularly to a method and system for monitoring, calculating or determining the data balance of a big data storage system.
Background art
As information devices of all kinds are used more and more frequently, data volume is growing explosively, in a geometric progression. To extract useful information from massive data, the data must first be stored effectively. Big data storage systems meet this need. In current big data storage systems, however, data items are accessed at different rates, so the gap between the access counts of the data items on each storage device keeps widening. This gap in turn unbalances the access counts between storage devices, and therefore unbalances the data balance of the big data system as a whole. The prior art thus has a need to monitor, calculate or determine the data balance of a big data storage system.
Summary of the invention
According to one aspect of the present invention, a method for determining the data balance of a big data storage system is provided, the method comprising:
In response to receiving a request to determine the data balance of the big data storage system, redirecting each new data access request that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access request to the corresponding storage device among the plurality of storage devices; matching, by the system buffer device, the description information of the query condition contained in the new data access request against the content of each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item; selecting, from the plurality of temporary data items, at least one temporary data item whose content matching degree is greater than the matching threshold; sending the selected temporary data item or items to the data requester indicated by the new data access request; and saving the new data access request in the buffer of the system buffer device;
When it is determined that no data access operation is running on any storage device in the big data storage system, obtaining the running log file of each of the plurality of storage devices in the big data storage system; determining, from the current statistics period and the running log file of each storage device, the aggregated access information of the data items stored on each storage device; and determining the access information statistics file of each storage device from the aggregated access information of the data items stored on it; wherein the access information statistics file includes a data item statistics table, the data item statistics table includes a plurality of data item records, and the content of each data item record is a 6-tuple (data item identifier, access count, statistics start time, statistics end time, storage size, storage start time);
Parsing the access information statistics file of each storage device; determining as low-frequency data items those data items of each storage device whose access count within the current statistics period is less than the low-frequency count threshold, and determining the low-frequency item count, that is, the number of low-frequency data items on each storage device; determining as low-frequency storage devices those storage devices whose low-frequency item count is greater than the low-frequency device threshold, and determining the number of low-frequency storage devices in the big data storage system; determining as non-low-frequency storage devices those storage devices whose low-frequency item count is less than or equal to the low-frequency device threshold, and determining the number of non-low-frequency storage devices in the big data storage system;
Based on the access information statistics file of each low-frequency storage device, determining the storage size and access count of each low-frequency data item on that device and its low-frequency item count, and determining the total access count of all data items on each low-frequency storage device; determining the total storage capacity of each low-frequency storage device from its identifier and the device description information in the system record device of the big data storage system;
Based on the access information statistics file of each non-low-frequency storage device, determining the storage size and access count of each low-frequency data item on that device and its low-frequency item count, and determining the total access count of all data items on each non-low-frequency storage device; determining the total storage capacity of each non-low-frequency storage device from its identifier and the device description information in the system record device of the big data storage system;
Calculating the balance coefficient of the big data storage system:
Where DE is the balance coefficient of the big data storage system and DLB is the balance coefficient of the low-frequency storage devices in the big data storage system; LTN_i is the low-frequency item count of the i-th low-frequency storage device, and LDN is the number of low-frequency storage devices in the big data storage system; LTS_ij is the storage size of the j-th low-frequency data item on the i-th low-frequency storage device, LS_i is the total storage size of all low-frequency data items on the i-th low-frequency storage device, LC_i is the total storage capacity of the i-th low-frequency storage device, LTA_ij is the access count of the j-th low-frequency data item on the i-th low-frequency storage device, LA_i is the total access count of all low-frequency data items on the i-th low-frequency storage device, and LT_i is the total access count of all data items on the i-th low-frequency storage device; i and j are natural numbers with LDN >= i >= 1 and LTN_i >= j >= 1, where LDN >= 100 and LTN_i >= 100;
Where NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system; NLTN_m is the low-frequency item count of the m-th non-low-frequency storage device, and NLDN is the number of non-low-frequency storage devices in the big data storage system; NLTS_mn is the storage size of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLS_m is the total storage size of all low-frequency data items on the m-th non-low-frequency storage device, NLC_m is the total storage capacity of the m-th non-low-frequency storage device, NLTA_mn is the access count of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLA_m is the total access count of all low-frequency data items on the m-th non-low-frequency storage device, and NLT_m is the total access count of all data items on the m-th non-low-frequency storage device; m and n are natural numbers with NLDN >= m >= 1 and NLTN_m >= n >= 1, where NLDN >= 100 and NLTN_m >= 50; and
When the balance coefficient DE of the big data storage system is greater than the system balance coefficient threshold, determining that the data balance of the big data storage system is in an unbalanced state.
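The per-device totals named in the symbol definitions, and the final threshold test, can be sketched as follows. This is only an illustration under stated assumptions: the formulas for DE, DLB and NDLB are not reproduced in this text, so the sketch takes DE as an input rather than guessing at it. The names `LowFreqItem`, `device_totals` and `is_unbalanced` are hypothetical, and 0.60 is just one of the threshold values the description lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LowFreqItem:
    """One low-frequency data item on a storage device (hypothetical type)."""
    storage_size: int  # LTS_ij (or NLTS_mn)
    access_count: int  # LTA_ij (or NLTA_mn)


def device_totals(items: List[LowFreqItem]) -> Tuple[int, int, int]:
    """Return (LTN_i, LS_i, LA_i) for one device: the low-frequency item
    count, total storage size and total access count of those items."""
    item_count = len(items)
    total_size = sum(item.storage_size for item in items)
    total_accesses = sum(item.access_count for item in items)
    return item_count, total_size, total_accesses


def is_unbalanced(de: float, threshold: float = 0.60) -> bool:
    """Unbalanced when DE is strictly greater than the threshold.
    DE itself must come from the patent's formula, which is not shown here."""
    return de > threshold
```

Note that the comparison is strict: a balance coefficient exactly equal to the threshold does not count as unbalanced.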
Wherein, when a data management device located outside the big data storage system needs to determine the data balance of the big data storage system, the data management device sends the request to determine the data balance of the big data storage system to the big data storage system;
Wherein redirecting each new data access request that the big data storage system receives from any data requester to the system buffer device, instead of sending it to the corresponding storage device among the plurality of storage devices, comprises: starting from the moment at which the big data storage system receives the request to determine its data balance, redirecting every new data access request subsequently received from any data requester to the system buffer device of the big data storage system, without sending the received new data access request to the corresponding storage device among the plurality of storage devices;
Wherein the new data access request includes a query condition and description information of the query condition; the temporary data item set includes a plurality of temporary data items, each temporary data item has summary information, and the summary information briefly describes the content of the temporary data item;
Wherein matching, by the system buffer device, the description information of the query condition contained in the new data access request against the content of each temporary data item in the temporary data item set to determine the content matching degree of each temporary data item comprises: matching, by the system buffer device, the description information of the query condition against the summary information of each temporary data item in the temporary data item set, using content matching based on semantic comparison, content matching based on keyword comparison, or content matching that combines semantic and keyword comparison, to determine the content matching degree between each temporary data item and the query condition; wherein the matching threshold is 60% and the content matching degree lies in the range [0%, 100%];
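The keyword-based variant of this content matching can be sketched as below. The description does not define how the matching degree is computed, so the measure used here (the fraction of query keywords that also appear in a temporary data item's summary) is an assumption, as are the names `keyword_match_degree` and `select_temporary_items`. Only the 60% threshold and the strict "greater than" comparison come from the text.

```python
import re
from typing import Dict, List

MATCH_THRESHOLD = 0.60  # the 60% matching threshold stated in the text


def keyword_match_degree(query_description: str, summary: str) -> float:
    """Assumed measure: share of query keywords found in the summary (0.0 to 1.0)."""
    query_words = set(re.findall(r"\w+", query_description.lower()))
    summary_words = set(re.findall(r"\w+", summary.lower()))
    if not query_words:
        return 0.0
    return len(query_words & summary_words) / len(query_words)


def select_temporary_items(query_description: str,
                           summaries: Dict[str, str]) -> List[str]:
    """Return ids of temporary data items whose matching degree exceeds the threshold."""
    return [
        item_id
        for item_id, summary in summaries.items()
        if keyword_match_degree(query_description, summary) > MATCH_THRESHOLD
    ]
```

A semantic matcher could replace `keyword_match_degree` without changing the selection step, as long as it also returns a value in the range 0.0 to 1.0.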
Wherein, after the new data access request is saved in the buffer of the system buffer device, the method further comprises: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has paused data access and that the new data access request has been saved in the buffer of the system buffer device, the response message carrying information that indicates the current queue position of the data requester's new data access request in the buffer; wherein the current queue position of a new data access request in the buffer is determined by the length of time for which the request has been saved in the buffer, and new data access requests are ordered in the queue by descending length of time saved.
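The queue ordering described here can be sketched in a few lines. The function name `queue_positions` and the use of numeric timestamps are assumptions for illustration; the rule itself (longest-saved request first) is taken from the text.

```python
from typing import Dict


def queue_positions(saved_at: Dict[str, float], now: float) -> Dict[str, int]:
    """Map each buffered request id to its 1-based queue position.

    Requests are ordered by descending time spent in the buffer, so the
    request that was saved earliest is processed first.
    """
    ordered = sorted(saved_at, key=lambda rid: now - saved_at[rid], reverse=True)
    return {rid: position for position, rid in enumerate(ordered, start=1)}
```

The position returned for a request is the value a response message could carry back to its data requester.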
Wherein the running log file is saved in the system data area of each storage device; wherein the current statistics period starts on the day before the date on which the big data storage system receives the request to determine its data balance and extends backward over a predetermined number of calendar days; wherein the predetermined number of calendar days is 10, 20 or 30; wherein determining, from the current statistics period and the running log file of each storage device, the aggregated access information of the data items stored on each storage device comprises: selecting, from all log records in the running log file of each storage device, the log records of that storage device that fall within the current statistics period; classifying those log records by data item to obtain the aggregated access information of each data item; and forming the aggregated access information of the data items stored on each storage device from the aggregated access information of each data item; wherein each log record includes: the identifier of the data item, the access start time, the access end time, the storage size and the storage start time;
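A minimal sketch of the statistics period and the per-item grouping of log records follows. It assumes the period covers exactly the stated number of calendar days, ending on the day before the request date, and that a log record falls within the period when its access start time does. Neither detail is spelled out in the description. The dictionary keys `item_id` and `access_start` are hypothetical field names.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple


def statistics_period(request_date: date, days: int = 10) -> Tuple[date, date]:
    """Return (first_day, last_day): `days` calendar days ending on the day
    before `request_date`. The text allows 10, 20 or 30 days."""
    last_day = request_date - timedelta(days=1)
    first_day = last_day - timedelta(days=days - 1)
    return first_day, last_day


def group_records_by_item(records: List[dict], first_day: date,
                          last_day: date) -> Dict[str, List[dict]]:
    """Keep records whose access started inside the period, grouped by item id."""
    grouped = defaultdict(list)
    for record in records:
        access_day = record["access_start"].date()
        if last_day >= access_day >= first_day:
            grouped[record["item_id"]].append(record)
    return dict(grouped)
```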
Wherein each data item has summary information, and the summary information briefly describes the content of the data item.
Determining the access information statistics file of each storage device from the aggregated access information of the data items stored on it comprises: counting the aggregated access information of each data item stored on each storage device to determine the access count of each data item; taking the access start time of the first access in the aggregated access information of each data item as its statistics start time, and the access end time of the last access as its statistics end time; determining the storage size of each data item from its aggregated access information; and determining the storage start time of each data item on the storage device from the storage information file in the storage information area of that storage device.
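The derivation of one 6-tuple record from an item's log entries can be sketched as follows. The type `ItemRecord` and the function `summarize_item` are hypothetical names. Taking the storage size from the most recently finished access is an assumption, because the description only says that the size is determined from the aggregated access information.

```python
from typing import List, NamedTuple, Tuple


class ItemRecord(NamedTuple):
    """The 6-tuple stored in the data item statistics table."""
    item_id: str
    access_count: int
    stats_start: float
    stats_end: float
    storage_size: int
    storage_start: float


def summarize_item(item_id: str,
                   accesses: List[Tuple[float, float, int]],
                   storage_start: float) -> ItemRecord:
    """Build one record from a non-empty list of
    (access_start, access_end, storage_size) log entries.

    `storage_start` is supplied separately because the text reads it from
    the storage information file rather than from the running log.
    """
    latest = max(accesses, key=lambda entry: entry[1])
    return ItemRecord(
        item_id=item_id,
        access_count=len(accesses),
        stats_start=min(entry[0] for entry in accesses),
        stats_end=latest[1],
        storage_size=latest[2],  # assumption: size as of the latest access
        storage_start=storage_start,
    )
```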
The low-frequency count threshold is 100, 120, 150 or 200. The device description information in the system record device includes: the total number of storage devices in the big data storage system, the total storage capacity of each storage device, the network address of each storage device and/or the time at which each storage device joined the big data storage system. The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item and the free storage capacity of the storage device.
The low-frequency device threshold is 100, 120, 150, 200, 300, 400 or 500. The system balance coefficient threshold is 50%, 55%, 60%, 65% or 70%.
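The two thresholds work together as sketched below: the count threshold marks individual data items as low-frequency, and the device threshold then classifies each storage device by how many such items it holds. The function names and the dictionary layout are assumptions for illustration. The default values are each one of the options listed above.

```python
from typing import Dict, List, Tuple

LOW_FREQ_COUNT_THRESHOLD = 100   # one of 100, 120, 150, 200
LOW_FREQ_DEVICE_THRESHOLD = 100  # one of 100, 120, 150, 200, 300, 400, 500


def low_frequency_items(access_counts: Dict[str, int],
                        count_threshold: int = LOW_FREQ_COUNT_THRESHOLD) -> List[str]:
    """Items whose access count in the period is strictly below the count threshold."""
    return [item for item, count in access_counts.items()
            if count_threshold > count]


def classify_devices(devices: Dict[str, Dict[str, int]],
                     count_threshold: int = LOW_FREQ_COUNT_THRESHOLD,
                     device_threshold: int = LOW_FREQ_DEVICE_THRESHOLD
                     ) -> Tuple[List[str], List[str]]:
    """Split device ids into (low_frequency, non_low_frequency).

    A device is low-frequency when its number of low-frequency items is
    strictly greater than the device threshold.
    """
    low, non_low = [], []
    for device_id, counts in devices.items():
        item_count = len(low_frequency_items(counts, count_threshold))
        (low if item_count > device_threshold else non_low).append(device_id)
    return low, non_low
```

The lengths of the two returned lists correspond to LDN and NLDN in the symbol definitions.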
After determining that the data balance of the big data storage system is in an unbalanced state, the method further comprises: determining as candidate data items those data items of each storage device whose access count is greater than 2 times the low-frequency count threshold, forming the candidate data items of each storage device into its candidate data item set, and forming all low-frequency data items of each storage device into its low-frequency data item set; and, for the current storage device among the plurality of storage devices: when the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in its candidate data item set, sorting all low-frequency data items in the low-frequency data item set in ascending order of access count to generate a first sorted list, taking the low-frequency data item ranked first in the first sorted list as the current low-frequency data item, and performing the following steps:
6.1. Match the summary information of the current low-frequency data item against the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item;
6.2. Combine the current low-frequency data item with the candidate data item that has the highest content matching degree with it among all candidate data items in the candidate data item set, forming a new data item, and save the new data item in the free storage space of the current storage device;
6.3. Delete from the candidate data item set the candidate data item that has the highest content matching degree with the current low-frequency data item;
6.4. Determine whether the first sorted list contains a low-frequency data item ranked immediately after the current low-frequency data item; if so, go to step 6.5; if not, end;
6.5. Select the low-frequency data item ranked immediately after the current low-frequency data item in the first sorted list as the current low-frequency data item, and go to step 6.1;
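Steps 6.1 to 6.5 amount to a greedy pairing loop, sketched below. The sketch only decides which candidate each low-frequency item is combined with. Actually merging the two items and writing the result to free storage space is left out. The matching measure (word overlap between summaries) and all names are assumptions, since the description does not fix how the matching degree is computed.

```python
from typing import Dict, List, Tuple


def match_degree(summary_a: str, summary_b: str) -> float:
    """Assumed measure: Jaccard overlap of the words in two summaries."""
    words_a = set(summary_a.lower().split())
    words_b = set(summary_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def pair_low_frequency_items(low_items: Dict[str, Tuple[int, str]],
                             candidates: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each low-frequency item with its best-matching candidate.

    low_items maps item id to (access_count, summary); candidates maps
    candidate id to summary. Returns (low_id, candidate_id) pairs in
    processing order.
    """
    remaining = dict(candidates)
    pairs = []
    # First sorted list: ascending access count.
    for low_id in sorted(low_items, key=lambda key: low_items[key][0]):
        if not remaining:  # guard; this branch assumes enough candidates
            break
        summary = low_items[low_id][1]
        # Step 6.1 and 6.2: pick the candidate with the highest matching degree.
        best = max(remaining, key=lambda cid: match_degree(summary, remaining[cid]))
        pairs.append((low_id, best))
        del remaining[best]  # step 6.3
    return pairs
```

Because each chosen candidate is removed from the set, no candidate is combined with more than one low-frequency item.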
Alternatively, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in its candidate data item set, grouping all low-frequency data items in the low-frequency data item set of the current storage device into a plurality of low-frequency data item groups, such that the total access count of all low-frequency data items in each group is greater than 1.5 times the low-frequency count threshold, and determining the average access count of the low-frequency data items in each group, wherein the absolute value of the difference between the average access counts of the low-frequency data item groups is less than 20.
After determining that the data balance of the big data storage system is in an unbalanced state, the method further comprises:
Performing the data access operation for each data access request in the buffer of the system buffer device, in the current queue order of the data access requests in the buffer;
When it is determined that no saved data access request remains in the buffer of the system buffer device, parsing each new data access request that the big data storage system receives from any data requester to obtain a new query condition;
Determining, in the directory storage server of the big data storage system, the data items involved in the new query condition, and determining at least one target storage device involved in those data items;
Sending the new query condition to each target storage device, and receiving from each target storage device at least one data item that meets the new query condition;
Forming a target data item set from all data items received from the target storage devices, and sending the target data item set to the data requester indicated by the new data access request.
Wherein performing the data access operation for each data access request in the buffer, in the current queue order of the data access requests in the buffer of the system buffer device, comprises:
8.1. Determine the data access request currently being processed from the current queue order of the data access requests in the buffer of the system buffer device, the data access request currently being processed being the one ranked first in the current queue order;
8.2. Parse the data access request currently being processed to obtain the query condition currently being processed;
8.3. Determine, in the directory storage server of the big data storage system, the data items involved in the query condition currently being processed, and determine at least one target storage device involved in those data items;
8.4. Send the query condition currently being processed to each target storage device, and receive from each target storage device at least one data item that meets the query condition currently being processed;
8.5. Form a target data item set from all data items received from the target storage devices, and send the target data item set to the data requester indicated by the data access request currently being processed;
8.6. Delete the data access request ranked first in the current queue order of the data access requests in the buffer;
8.7. Determine whether any saved data access request remains in the buffer of the system buffer device; if so, go to step 8.1; if not, determine that no saved data access request remains in the buffer of the system buffer device.
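The loop formed by steps 8.1 to 8.7 can be sketched as follows. The directory lookup, the per-device query and the delivery to the requester are passed in as callables because the description does not define their interfaces. `locate_devices`, `query_device` and `deliver` are therefore hypothetical, and each buffered request is assumed to be an already parsed (requester, query condition) pair.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def process_buffer(buffer: Deque[Tuple[str, str]],
                   locate_devices: Callable[[str], List[str]],
                   query_device: Callable[[str, str], list],
                   deliver: Callable[[str, list], None]) -> None:
    """Serve buffered requests in queue order until the buffer is empty."""
    while buffer:                                  # step 8.7
        requester, condition = buffer[0]           # steps 8.1 and 8.2
        targets = locate_devices(condition)        # step 8.3
        target_items = []
        for device_id in targets:                  # step 8.4
            target_items.extend(query_device(device_id, condition))
        deliver(requester, target_items)           # step 8.5
        buffer.popleft()                           # step 8.6
```

The request is removed only after its result has been delivered, which mirrors the order of steps 8.5 and 8.6.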
According to another aspect of the present invention, a system for determining the data balance of a big data storage system is provided, the system comprising:
A preprocessing unit which, in response to receiving a request to determine the data balance of the big data storage system, redirects each new data access request that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access request to the corresponding storage device among the plurality of storage devices, so that the system buffer device matches the description information of the query condition contained in the new data access request against the content of each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item; selects, from the plurality of temporary data items, at least one temporary data item whose content matching degree is greater than the matching threshold; sends the selected temporary data item or items to the data requester indicated by the new data access request; and saves the new data access request in the buffer of the system buffer device;
A statistics unit which, when it is determined that no data access operation is running on any storage device in the big data storage system, obtains the running log file of each of the plurality of storage devices in the big data storage system; determines, from the current statistics period and the running log file of each storage device, the aggregated access information of the data items stored on each storage device; and determines the access information statistics file of each storage device from the aggregated access information of the data items stored on it; wherein the access information statistics file includes a data item statistics table, the data item statistics table includes a plurality of data item records, and the content of each data item record is a 6-tuple (data item identifier, access count, statistics start time, statistics end time, storage size, storage start time);
Computing unit parses the access information statistics file of each storage equipment, by current statistical time area Number, which is accessed, in all data item of interior each storage equipment is determined as low frequency number less than the data item of low frequency frequency threshold value According to item, the low frequency term quantity of low-frequency data item included by each storage equipment is determined;By low frequency item number in multiple storage equipment The storage equipment that amount is greater than low frequency equipment threshold value is determined as low frequency storage equipment and determines low frequency storage in big data storage system The quantity of equipment;The storage equipment that low frequency term quantity is less than or equal to low frequency equipment threshold value is determined as non-low frequency storage equipment simultaneously Determine the quantity of non-low frequency storage equipment in big data storage system;
Based on the access information statistics file of each low-frequency storage device, determines the storage size and access count of each low-frequency data item on that device and its low-frequency item count, and determines the total access count of all data items on each low-frequency storage device; determines the total storage capacity of each low-frequency storage device from its identifier and the device description information in the system record device of the big data storage system;
Based on the access information statistics file of each non-low-frequency storage device, determines the storage size and access count of each low-frequency data item on that device and its low-frequency item count, and determines the total access count of all data items on each non-low-frequency storage device; determines the total storage capacity of each non-low-frequency storage device from its identifier and the device description information in the system record device of the big data storage system;
The computing unit calculates the balance coefficient of the big data storage system:
Wherein DE is the balance coefficient of the big data storage system and DLB is the balance coefficient of the low-frequency storage devices in the big data storage system; LTN_i is the low-frequency item quantity of the i-th low-frequency storage device; LDN is the number of low-frequency storage devices in the big data storage system; LTS_ij is the storage size of the j-th low-frequency data item in the i-th low-frequency storage device; LS_i is the total storage size of all low-frequency data items of the i-th low-frequency storage device; LC_i is the total storage capacity of the i-th low-frequency storage device; LTA_ij is the access count of the j-th low-frequency data item in the i-th low-frequency storage device; LA_i is the total access count of all low-frequency data items of the i-th low-frequency storage device; LT_i is the total access count of all data items of the i-th low-frequency storage device; i and j are natural numbers with 1 ≤ i ≤ LDN and 1 ≤ j ≤ LTN_i, where LDN ≥ 100 and LTN_i ≥ 100;
Wherein NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system; NLTN_m is the low-frequency item quantity of the m-th non-low-frequency storage device; NLDN is the number of non-low-frequency storage devices in the big data storage system; NLTS_mn is the storage size of the n-th low-frequency data item in the m-th non-low-frequency storage device; NLS_m is the total storage size of all low-frequency data items of the m-th non-low-frequency storage device; NLC_m is the total storage capacity of the m-th non-low-frequency storage device; NLTA_mn is the access count of the n-th low-frequency data item in the m-th non-low-frequency storage device; NLA_m is the total access count of all low-frequency data items of the m-th non-low-frequency storage device; NLT_m is the total access count of all data items of the m-th non-low-frequency storage device; m and n are natural numbers with 1 ≤ m ≤ NLDN and 1 ≤ n ≤ NLTN_m, where NLDN ≥ 100 and NLTN_m ≥ 50; and
A determination unit, which determines that the data balance of the big data storage system is in an unbalanced state when the balance coefficient DE of the big data storage system is greater than the system balance coefficient threshold.
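The classification and the final threshold decision described above can be sketched as follows. This is a minimal illustration only: the names ItemStat, classify_devices and is_unbalanced are hypothetical, and the balance coefficient DE is taken as an already computed input because its formula is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass
class ItemStat:
    item_id: str
    access_count: int  # accesses within the current statistical time interval


def classify_devices(stats_by_device, item_threshold, device_threshold):
    """Split devices into low-frequency and non-low-frequency groups.

    stats_by_device maps a device identifier to a list of ItemStat.
    Each returned dict maps a device identifier to its low-frequency items.
    """
    low, non_low = {}, {}
    for device_id, items in stats_by_device.items():
        # A data item is low-frequency when its access count is below the threshold.
        low_items = [it for it in items if it.access_count < item_threshold]
        # A device is low-frequency when it holds more low-frequency items
        # than the low-frequency device threshold.
        target = low if len(low_items) > device_threshold else non_low
        target[device_id] = low_items
    return low, non_low


def is_unbalanced(balance_coefficient, system_threshold):
    """The system is unbalanced when DE exceeds the system threshold."""
    return balance_coefficient > system_threshold
```

With the thresholds named in the text, a call might look like classify_devices(stats, 100, 100) followed by is_unbalanced(de, 0.60).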
Wherein, when a data management device located outside the big data storage system needs to determine the data balance of the big data storage system, the data management device sends to the big data storage system the request for determining the data balance of the big data storage system;
Wherein the preprocessing unit redirecting new data access requests received by the big data storage system from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the multiple storage devices, includes: starting from the moment at which the big data storage system receives the request for determining its data balance, the preprocessing unit redirects the new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, without sending them to the corresponding storage devices among the multiple storage devices; wherein each new data access request includes a query condition and description information of the query condition, the temporary data item set includes multiple temporary data items, and each temporary data item has summary information that briefly describes its content; wherein the preprocessing unit performing, through the system buffer device, content matching between the description information of the query condition included in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item includes: the preprocessing unit, through the system buffer device, matches the description information of the query condition against the summary information of each temporary data item in the temporary data item set, using content matching based on semantic content comparison, on keyword comparison, or on a combination of the two, to determine the content matching degree between each temporary data item and the query condition; wherein the matching degree threshold is 60% and the content matching degree ranges over [0%, 100%];
Wherein, after saving the new data access request in the buffer of the system buffer device, the preprocessing unit further: sends to the data requester indicated by the new data access request a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer of the system buffer device, the response message carrying information indicating the current queue position in the buffer of the new data access request from that data requester; wherein the current queue order of the new data access requests in the buffer is determined by the length of time each request has been saved in the buffer, the requests being sorted in descending order of that length of time.
Wherein a running log file is saved in the system data area of each storage device;
Wherein the current statistical time interval is the period of a predetermined number of calendar days that ends with the day before the date on which the big data storage system receives the request for determining its data balance and extends backward from that day; wherein the predetermined number of calendar days is 10, 20 or 30 calendar days; wherein the statistics unit determining, based on the current statistical time interval and the running log file in each storage device, the collected access information of the multiple data items stored in each storage device includes: the statistics unit selects, from all log records in the running log file of each storage device, the log records of that storage device that fall within the current statistical time interval; the statistics unit classifies the log records of each storage device within the current statistical time interval by data item to obtain the collected access information of each data item; and the statistics unit forms the collected access information of the multiple data items stored in each storage device from the collected access information of each data item; wherein each log record includes: the identifier of the data item, the access start time, the access end time, the storage size, and the storage start time;
Wherein each data item has summary information, and the summary information briefly describes the content of the data item.
The statistics unit determining the access information statistics file of each storage device from the collected access information of the multiple data items stored in that storage device includes: the statistics unit counts the collected access information of each of the multiple data items stored in each storage device to determine the access count of each data item; the statistics unit takes the access start time of the first access in the collected access information of each data item as the statistics start time, and the access end time of the last access in the collected access information of each data item as the statistics end time; the statistics unit determines the storage size of each data item from the collected access information of that data item; and the statistics unit determines the storage start time of each data item in the storage device from the storage information file in the storage information area of that storage device.
The low-frequency count threshold is 100, 120, 150 or 200; the device description information in the system log device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, and/or the time at which each storage device joined the big data storage system; the storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of the storage device;
The low-frequency device threshold is 100, 120, 150, 200, 300, 400 or 500; the system balance coefficient threshold is 50%, 55%, 60%, 65% or 70%.
The system further includes an adjustment unit, configured to determine, among all data items of each storage device, the data items whose access count is greater than 2 times the low-frequency count threshold to be candidate data items, thereby obtaining multiple candidate data items; to form the candidate data items of each storage device into that device's candidate data item set; and to form all low-frequency data items of each storage device into that device's low-frequency data item set. For a current storage device among the multiple storage devices: when the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are sorted in ascending order of access count to generate a first sorted list, and the low-frequency data item ranked first in the first sorted list is taken as the current low-frequency data item,
14.1, content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item;
14.2, the candidate data item in the candidate data item set that has the highest content matching degree with the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the current storage device;
14.3, the candidate data item that has the highest content matching degree with the current low-frequency data item is deleted from the candidate data item set;
14.4, it is determined whether the first sorted list contains a low-frequency data item ranked immediately after the current low-frequency data item; if so, go to 14.5; if not, end;
14.5, the low-frequency data item ranked immediately after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and go to 14.1;
Alternatively, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are grouped into multiple low-frequency data item groups such that the total access count of all low-frequency data items in each group is greater than 1.5 times the low-frequency count threshold, and the average access count of the low-frequency data items in each group is determined, wherein the absolute value of the difference between the average access counts of the groups is less than 20.
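Steps 14.1 to 14.5 amount to a greedy pairing of each low-frequency data item with its best-matching candidate data item. The sketch below illustrates that loop under stated assumptions: keyword_match is a hypothetical stand-in for the content matching (a word-overlap ratio), items are plain (identifier, access count, summary) tuples, and the actual combination and storage of the new data item is reduced to recording the pair.

```python
def keyword_match(a, b):
    """Hypothetical matching degree: Jaccard overlap of lowercase words, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def pair_low_frequency_items(low_items, candidates, match=keyword_match):
    """Pair each low-frequency item with its best-matching candidate.

    low_items and candidates are lists of (item_id, access_count, summary).
    Returns a list of (low_item_id, candidate_id) pairs in processing order.
    """
    if len(low_items) > len(candidates):
        raise ValueError("more low-frequency items than candidates: "
                         "the grouping branch applies instead")
    remaining = list(candidates)
    merged = []
    # First sorted list: ascending access count.
    for low_id, _, low_summary in sorted(low_items, key=lambda it: it[1]):
        # 14.1 and 14.2: pick the candidate with the highest matching degree.
        best = max(remaining, key=lambda c: match(low_summary, c[2]))
        merged.append((low_id, best[0]))
        # 14.3: the chosen candidate is removed from the candidate set.
        remaining.remove(best)
    return merged
```

The for loop plays the role of steps 14.4 and 14.5: it advances to the next item in the sorted list and stops when none is left.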
The preprocessing unit performs a data access operation for each data access request in the buffer of the system buffer device according to the current queue order of the multiple data access requests in the buffer;
When it is determined that the buffer of the system buffer device no longer holds any saved data access request, a new data access request received by the big data storage system from any data requester is parsed to obtain a new query condition;
The multiple data items involved in the new query condition are determined in the directory storage server of the big data storage system, and at least one target storage device involved in the multiple data items is determined;
The new query condition is sent to each target storage device, and at least one data item satisfying the new query condition is received from each target storage device;
All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the new data access request.
Wherein the preprocessing unit performing a data access operation for each data access request in the buffer of the system buffer device according to the current queue order of the multiple data access requests in the buffer includes:
16.1, the currently processed data access request is determined according to the current queue order of the multiple data access requests in the buffer of the system buffer device, wherein the currently processed data access request is the data access request ranked first in the current queue order of the multiple data access requests in the buffer;
16.2, the currently processed data access request is parsed to obtain the currently processed query condition;
16.3, the multiple data items involved in the currently processed query condition are determined in the directory storage server of the big data storage system, and at least one target storage device involved in the multiple data items is determined;
16.4, the currently processed query condition is sent to each target storage device, and at least one data item satisfying the currently processed query condition is received from each target storage device;
16.5, all data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the currently processed data access request;
16.6, the data access request ranked first in the current queue order of the multiple data access requests in the buffer is deleted;
16.7, it is determined whether the buffer of the system buffer device still holds any saved data access request; if so, go to 16.1; if not, it is determined that the buffer of the system buffer device no longer holds any saved data access request.
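Steps 16.1 to 16.7 describe a loop that drains the buffer from the head of the queue. A minimal sketch follows; the callables parse, lookup_targets, query_device and send are hypothetical placeholders for the parsing, directory lookup, device query and reply operations, which the text does not specify in detail.

```python
from collections import deque


def drain_buffer(queue, parse, lookup_targets, query_device, send):
    """Process buffered requests in queue order until the buffer is empty."""
    while queue:                                  # 16.7: any request left?
        request = queue[0]                        # 16.1: first in queue order
        condition = parse(request)                # 16.2: obtain the query condition
        targets = lookup_targets(condition)       # 16.3: target storage devices
        result = []
        for device in targets:                    # 16.4: query every target device
            result.extend(query_device(device, condition))
        send(request, result)                     # 16.5: return the target data item set
        queue.popleft()                           # 16.6: delete the processed request
```

The request is removed only after its result has been sent, mirroring the order of steps 16.5 and 16.6.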
Description of the drawings
Exemplary embodiments of the present invention can be understood more fully by reference to the following drawings:
Fig. 1 is a flowchart of the method for determining the data balance of a big data storage system according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the big data storage system according to an embodiment of the present invention; and
Fig. 3 is a schematic structural diagram of the system for determining the data balance of a big data storage system according to an embodiment of the present invention.
Detailed description of embodiments
Fig. 1 is a flowchart of the method for determining the data balance of a big data storage system according to an embodiment of the present invention.
In step 101, in response to receiving a request for determining the data balance of the big data storage system, new data access requests received by the big data storage system from any data requester are redirected to the system buffer device of the big data storage system instead of being sent to the corresponding storage devices among the multiple storage devices. Through the system buffer device, the description information of the query condition included in each new data access request is content-matched against each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item; at least one selected temporary data item whose content matching degree is greater than the matching degree threshold is chosen from the multiple temporary data items and sent to the data requester indicated by the new data access request; and the new data access request is saved in the buffer of the system buffer device.
Wherein, when a data management device located outside the big data storage system needs to determine the data balance of the big data storage system, the data management device sends to the big data storage system the request for determining the data balance of the big data storage system. The data management device outside the big data storage system may be operated or controlled by maintenance, administrative or operations personnel of the big data storage system. For example, such personnel may trigger the determination of the data balance of the big data storage system periodically or according to actual operating conditions. The big data storage system includes multiple storage devices, each storage device can store multiple data items, and the storage capacity of each storage device may be any reasonable value. Each data item may be a data file of any type, for example a text, audio or video file. A low-frequency data item is a data item whose access count within a specific period is lower than the average access count of all data items of the big data storage system, or lower than the average access count of all data items of its storage device, and so on. A low-frequency storage device is, for example, a storage device whose total access count over all of its data items within a specific period is lower than the average total access count of all storage devices in the big data storage system.
Wherein redirecting new data access requests received by the big data storage system from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the multiple storage devices, includes:
Starting from the moment at which the big data storage system receives the request for determining its data balance, redirecting the new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, without sending them to the corresponding storage devices among the multiple storage devices;
Wherein each new data access request includes a query condition and description information of the query condition, the temporary data item set includes multiple temporary data items, and each temporary data item has summary information that briefly describes the content of the temporary data item.
At the moment the big data storage system receives the request for determining its data balance, it may be receiving multiple new data access requests. From that moment on, all new data access requests that the big data storage system subsequently receives from one or more data requesters are redirected to the system buffer device of the big data storage system instead of being sent to the corresponding storage devices among the multiple storage devices. In normal operation, the big data storage system determines, in its directory storage server, the multiple data items involved in the query condition included in a new data access request, and determines at least one target storage device involved in those data items. The query condition is then sent to each target storage device, and at least one data item satisfying the query condition is received from each target storage device. When the data balance of the big data storage system is being monitored or determined, however, the big data storage system redirects all new data access requests to its system buffer device. The system buffer device is located inside the big data storage system and is used to store the temporary data item set, which includes multiple temporary data items, or to buffer data access requests. A query condition is, for example, "mobile communication AND 5G AND (uplink OR downlink)". In that case the description information of the query condition is, for example, "uplink or downlink of 5G mobile communication". The temporary data item set includes multiple temporary data items, and each temporary data item may be a data file of any type, for example a text, audio or video file. Each temporary data item or data item has summary information, and the summary information briefly describes the content of the temporary data item or data item. For example, the summary information may be: "C++ from scratch: an easy-to-follow introduction that lets you learn the C++ programming language in 21 days."
Wherein performing, through the system buffer device, content matching between the description information of the query condition included in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item includes:
Through the system buffer device, matching the description information of the query condition included in the new data access request against the summary information of each temporary data item in the temporary data item set of the system buffer device, using content matching based on semantic content comparison, on keyword comparison, or on a combination of the two, to determine the content matching degree between each temporary data item and the query condition. The present application may use any existing text comparison method to determine the content matching degree between the description information of the query condition and the summary information of each temporary data item in the temporary data item set of the system buffer device, for example content matching based on semantic content comparison, on keyword comparison, or on a combination of the two. The content matching degree between a temporary data item and the query condition indicates how close, similar, relevant or related the temporary data item is to the query condition.
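As an illustration of the keyword-based variant, the sketch below scores each temporary data item by the share of description keywords that also occur in its summary and keeps the items above a threshold. The scoring rule, the function names and the whitespace tokenization are assumptions made for this example; the text allows any existing text comparison method. The default threshold of 0.60 corresponds to the 60% value named in the text.

```python
def match_degree(description, summary):
    """Share of description keywords that also occur in the summary, in [0, 1]."""
    query_words = set(description.lower().split())
    summary_words = set(summary.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & summary_words) / len(query_words)


def select_temporary_items(description, temp_items, threshold=0.60):
    """Return the ids of temporary items whose matching degree exceeds the threshold.

    temp_items maps a temporary data item id to its summary information.
    """
    return [item_id for item_id, summary in temp_items.items()
            if match_degree(description, summary) > threshold]
```

A semantic comparison could replace match_degree without changing select_temporary_items, as long as it also returns a value between 0 and 1.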
Wherein the matching degree threshold is 55%, 60%, 65%, 70% or any other reasonable value, and the content matching degree ranges over [0%, 100%], that is, it may be any value from 0% to 100%. At least one selected temporary data item whose content matching degree is greater than the matching degree threshold, that is, greater than 55%, 60%, 65% or 70%, is chosen from the multiple temporary data items. The selected temporary data item or items are sent to the data requester indicated by the new data access request, and the new data access request is saved in the buffer of the system buffer device. The purpose of sending the selected temporary data items to the data requester is to let the data requester obtain content related to its data access request while the big data storage system has suspended its data access service. After the new data access request is saved in the buffer of the system buffer device, the method further includes: sending to the data requester indicated by the new data access request a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer of the system buffer device, the response message carrying information indicating the current queue position in the buffer of the new data access request from that data requester; wherein the current queue order of the new data access requests in the buffer is determined by the length of time each request has been saved in the buffer, the requests being sorted in descending order of that length of time. That is, the longer a new data access request has been saved, the earlier its position in the current queue order. Preferably, after the response message is sent, the method further includes: periodically sending to the data requester indicated by the new data access request a notification message indicating the current queue position in the buffer of the new data access request from that data requester.
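The queue order rule (the longer a request has been saved, the earlier its position) can be sketched as below. The function name queue_positions and the representation of the buffer as a mapping from request identifier to save time are assumptions made for illustration.

```python
from datetime import datetime


def queue_positions(saved_at, now):
    """Map each request id to its 1-based queue position.

    saved_at maps a request id to the datetime at which it was saved in the
    buffer. Requests are sorted by time spent in the buffer, longest first.
    """
    ordered = sorted(saved_at, key=lambda rid: now - saved_at[rid], reverse=True)
    return {rid: pos for pos, rid in enumerate(ordered, start=1)}
```

The position returned for a request is the value that the response message, and later the periodic notification messages, would carry.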
In step 102, when it is determined that none of the storage devices in the big data storage system has a data access operation currently running, the running log file of each of the multiple storage devices in the big data storage system is obtained; the collected access information of the multiple data items stored in each storage device is determined based on the current statistical time interval and the running log file in each storage device; and the access information statistics file of each storage device is determined from the collected access information of the multiple data items stored in that storage device. The access information statistics file includes a data item statistics table, the data item statistics table includes multiple data item records, and the content of each data item record is the 6-tuple <identifier of the data item, access count, statistics start time, statistics end time, storage size, storage start time>.
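One way to picture a data item record is as a named 6-tuple built from the accesses of one data item within the statistical time interval. The sketch below is an assumption-laden illustration: DataItemRecord and build_record are hypothetical names, accesses are given as (start, end) pairs, and at least one access is assumed to exist for the item.

```python
from datetime import datetime
from typing import NamedTuple


class DataItemRecord(NamedTuple):
    item_id: str
    access_count: int
    stats_start: datetime    # access start time of the first access
    stats_end: datetime      # access end time of the last access
    storage_size: int
    storage_start: datetime


def build_record(item_id, accesses, storage_size, storage_start):
    """Build the 6-tuple for one data item.

    accesses is a non-empty list of (access_start, access_end) pairs that
    fall within the current statistical time interval.
    """
    ordered = sorted(accesses)  # chronological by access start time
    return DataItemRecord(
        item_id,
        len(ordered),
        ordered[0][0],
        ordered[-1][1],
        storage_size,
        storage_start,
    )
```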
A data access operation that is currently running refers to the processing in which a storage device performs data retrieval in its own storage space according to the query condition sent by the big data storage system, the data items obtained by the retrieval are formed into a data item set, and the big data storage system sends the data item set to the data requester.
Wherein each storage device saves its own running log file in its system data area. The running log file includes multiple log records, and each log record includes: the identifier of the data item, the access start time, the access end time, the storage size, and the storage start time. The identifier of a data item may be its name, its unique identifier, its code, or any other information that uniquely identifies the data item. The access start time is the time at which the access to the data item referred to by the log record started. The access end time is the time at which that access ended. For example, accessing a data item in a storage device may involve operations such as reading or modifying it, and the access start time and access end time indicate when that operation started and ended. The storage size is the storage size of the data item in the storage device. The storage start time is the time at which the data item began to be stored in the storage device or the big data storage system, that is, the time from which the data item has been saved there and available for access. In this application, access includes reading and/or modifying.
Wherein the current statistical time interval is the period of a predetermined number of calendar days that ends with the day before the date on which the big data storage system receives the request for determining its data balance and extends backward from that day; the predetermined number of calendar days is 10, 20 or 30 calendar days. For example, if the big data storage system receives the request for determining its data balance at 11:25:36 on August 11, 2018, then the current date at the time of the request is August 11, 2018, and the day before it is August 10, 2018. With a predetermined number of 10 calendar days, the current statistical time interval therefore runs from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018.
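The interval in the example can be computed directly from the time of the request and the number of calendar days. The function name statistical_window is hypothetical, and the end of the interval is represented with one-second resolution to match the 23:59:59 used in the text.

```python
from datetime import datetime, time, timedelta


def statistical_window(request_time, days=10):
    """Return (start, end) of the statistical time interval.

    The interval covers `days` calendar days and ends with the day before
    the date of the request.
    """
    last_day = request_time.date() - timedelta(days=1)
    first_day = last_day - timedelta(days=days - 1)
    start = datetime.combine(first_day, time(0, 0, 0))
    end = datetime.combine(last_day, time(23, 59, 59))
    return start, end
```

For a request received at 11:25:36 on August 11, 2018 with days=10, the function returns the interval from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018, as in the example above.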
Wherein determining, based on the current statistical time interval and the running log file in each storage device, the collected access information of the multiple data items stored in each storage device includes:
Selecting, from all log records in the running log file of each storage device, the log records of that storage device that fall within the current statistical time interval;
Classifying the log records of each storage device within the current statistical time interval by data item, to obtain the collected access information of each data item;
Forming the collected access information of the multiple data items stored in each storage device from the collected access information of each data item.
For example, current statistical time section is 00:00:00 to 2018 years on the 1st August 23:59:59 on the 10th of August in 2018, That is 10 consecutive days, then based on 00:00:00 to 2018 years on the 1st August of August in 2018 23:59:59 on the 10th to each storage equipment Running log file in all log recordings chosen to obtain each storage equipment in the 00:00 on the 1st of August in 2018: All log recordings in 00 to 2018 on August, 10,23:59:59.According to data item (for example, identifier of data item) to every Multiple log recordings of a storage equipment in 00:00:00 to 2018 years on the 1st August of August in 2018 23:59:59 on the 10th are divided Class, to obtain the access information by statistics of each data item.Each data item by statistics access information be, for example, All accessed information of each data item in current statistical time section.By each data item in each storage equipment By statistics access information constitute it is each storage equipment in store multiple data item by statistics access information.
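The selection and classification of log records described above might look like the sketch below. The `LogRecord` fields and the choice to test the access start time against the period are assumptions made for illustration; the application does not specify the log record layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRecord:
    # Hypothetical log record layout; one record represents one access.
    item_id: str
    access_start: datetime
    access_end: datetime
    size: int


def group_records(records, period_start, period_end):
    """Keep records inside the period and group them by data item identifier."""
    grouped = defaultdict(list)
    for rec in records:
        # Assumption: a record belongs to the period if its access started in it.
        if period_start <= rec.access_start <= period_end:
            grouped[rec.item_id].append(rec)
    return dict(grouped)


period = (datetime(2018, 8, 1, 0, 0, 0), datetime(2018, 8, 10, 23, 59, 59))
log = [
    LogRecord("A", datetime(2018, 8, 1, 9, 2, 11), datetime(2018, 8, 1, 9, 5, 36), 4096),
    LogRecord("A", datetime(2018, 8, 10, 22, 26, 53), datetime(2018, 8, 10, 22, 27, 39), 4096),
    LogRecord("B", datetime(2018, 8, 11, 0, 0, 1), datetime(2018, 8, 11, 0, 1, 0), 10),
]
per_item = group_records(log, *period)
print({item: len(recs) for item, recs in per_item.items()})
```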
Each data item has summary information, which briefly introduces the content of the data item. For example, the summary information is: "C++ from zero: a straightforward introduction that lets you learn the C++ programming language in 21 days."
Determining the access information statistics file of each storage device from the statistical access information of the multiple data items stored in that storage device includes:
Counting the statistical access information of each of the multiple data items stored in each storage device to determine the access count of each data item;
Taking the access start time of the first access in the statistical access information of each data item as the statistics start time, and taking the access end time of the last access in the statistical access information of each data item as the statistics end time;
Determining the storage size of each data item based on the statistical access information of that data item;
Determining the storage start time of each data item in its storage device from the storage information file in the storage information area of each storage device.
The statistical access information of each data item stored in a storage device consists of multiple log records, and each log record represents one access to the data item, so the (total) access count of each data item is determined by the number of its log records. For example, suppose the current statistical period is 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018. The first access to data item A within the period starts at 09:02:11 on August 1, 2018 and ends at 09:05:36 on August 1, 2018, and the last access to data item A within the period starts at 22:26:53 on August 10, 2018 and ends at 22:27:39 on August 10, 2018. The statistics start time of data item A in the current statistical period is then 09:02:11 on August 1, 2018, and its statistics end time is 22:27:39 on August 10, 2018.
In addition, the storage size of each data item is determined from the storage size given in any one of the log records in its statistical access information. The storage start time of each data item in its storage device is determined from the time, recorded in the storage information file in the storage information area of that storage device, at which the data item was copied or moved to the storage device.
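As a rough sketch of the per-item statistics just described, the function below derives the access count, the statistics start and end times, and the storage size from the log records of one data item. The dictionary keys are hypothetical and chosen only for this illustration.

```python
from datetime import datetime


def item_statistics(records):
    """Summarize the (non-empty) list of log records of one data item.

    Each record is a dict with hypothetical keys 'start', 'end', 'size'.
    """
    first = min(records, key=lambda r: r["start"])  # first access in the period
    last = max(records, key=lambda r: r["start"])   # last access in the period
    return {
        "access_count": len(records),   # one log record equals one access
        "stat_start": first["start"],   # start time of the first access
        "stat_end": last["end"],        # end time of the last access
        "size": records[0]["size"],     # size taken from any one record
    }


# Data item A from the example above.
records_a = [
    {"start": datetime(2018, 8, 10, 22, 26, 53), "end": datetime(2018, 8, 10, 22, 27, 39), "size": 4096},
    {"start": datetime(2018, 8, 1, 9, 2, 11), "end": datetime(2018, 8, 1, 9, 5, 36), "size": 4096},
]
stats_a = item_statistics(records_a)
print(stats_a["stat_start"], stats_a["stat_end"], stats_a["access_count"])
```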
In step 103, the access information statistics file of each storage device is parsed. Among all data items of each storage device, each data item whose access count in the current statistical period is less than the low-frequency count threshold is determined to be a low-frequency data item, and the number of low-frequency data items in each storage device (its low-frequency item count) is determined. Among the multiple storage devices, each storage device whose low-frequency item count is greater than the low-frequency device threshold is determined to be a low-frequency storage device, and the number of low-frequency storage devices in the big data storage system is determined. Each storage device whose low-frequency item count is less than or equal to the low-frequency device threshold is determined to be a non-low-frequency storage device, and the number of non-low-frequency storage devices in the big data storage system is determined.
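The two-level classification in step 103 can be illustrated with the short sketch below. The function names and the input shape (a mapping from device identifier to per-item access counts) are assumptions for illustration; the small threshold values in the usage lines are chosen only to keep the example short.

```python
def low_frequency_items(access_counts, freq_threshold):
    """Return the items whose access count is below the low-frequency count threshold."""
    return [item for item, n in access_counts.items() if n < freq_threshold]


def classify_devices(devices, freq_threshold=100, device_threshold=100):
    """Split devices into low-frequency and non-low-frequency storage devices.

    `devices` maps a device identifier to {item identifier: access count}.
    """
    low, non_low = [], []
    for dev_id, counts in devices.items():
        n_low = len(low_frequency_items(counts, freq_threshold))
        # Strictly greater than the device threshold means a low-frequency device.
        if n_low > device_threshold:
            low.append(dev_id)
        else:
            non_low.append(dev_id)
    return low, non_low


devices = {
    "d1": {"a": 5, "b": 7, "c": 300},  # two low-frequency items
    "d2": {"x": 5, "y": 400},          # one low-frequency item
}
print(classify_devices(devices, freq_threshold=100, device_threshold=1))
```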
Based on the access information statistics file of each low-frequency storage device, the storage size and access count of each low-frequency data item of each low-frequency storage device are determined, the low-frequency item count of each low-frequency storage device is determined, and the total access count of all data items of each low-frequency storage device is determined. The total storage capacity of each low-frequency storage device is determined from the identifier of that device and the device description information in the system recording device of the big data storage system.
Based on the access information statistics file of each non-low-frequency storage device, the storage size and access count of each low-frequency data item of each non-low-frequency storage device are determined, the low-frequency item count of each non-low-frequency storage device is determined, and the total access count of all data items of each non-low-frequency storage device is determined. The total storage capacity of each non-low-frequency storage device is determined from the identifier of that device and the device description information in the system recording device of the big data storage system;
Calculating the balance coefficient of the big data storage system:
Wherein DE is the balance coefficient of the big data storage system, and DLB is the balance coefficient of the low-frequency storage devices in the big data storage system; LTN_i is the low-frequency item count of the i-th low-frequency storage device, and LDN is the number of low-frequency storage devices in the big data storage system; LTS_ij is the storage size of the j-th low-frequency data item in the i-th low-frequency storage device, LS_i is the total storage size of all low-frequency data items of the i-th low-frequency storage device, LC_i is the total storage capacity of the i-th low-frequency storage device, LTA_ij is the access count of the j-th low-frequency data item in the i-th low-frequency storage device, LA_i is the total access count of all low-frequency data items of the i-th low-frequency storage device, and LT_i is the total access count of all data items of the i-th low-frequency storage device; i is a natural number with 1 ≤ i ≤ LDN, j is a natural number with 1 ≤ j ≤ LTN_i, LDN ≥ 100, and LTN_i ≥ 100;
Wherein NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system; NLTN_m is the low-frequency item count of the m-th non-low-frequency storage device, and NLDN is the number of non-low-frequency storage devices in the big data storage system; NLTS_mn is the storage size of the n-th low-frequency data item in the m-th non-low-frequency storage device, NLS_m is the total storage size of all low-frequency data items of the m-th non-low-frequency storage device, NLC_m is the total storage capacity of the m-th non-low-frequency storage device, NLTA_mn is the access count of the n-th low-frequency data item in the m-th non-low-frequency storage device, NLA_m is the total access count of all low-frequency data items of the m-th non-low-frequency storage device, and NLT_m is the total access count of all data items of the m-th non-low-frequency storage device; m is a natural number with 1 ≤ m ≤ NLDN, n is a natural number with 1 ≤ n ≤ NLTN_m, NLDN ≥ 100, and NLTN_m ≥ 50; and
When the balance coefficient DE of the big data storage system is greater than the system balance coefficient threshold, the data balance of the big data storage system is determined to be in an unbalanced state. The unbalanced state is a low-frequency imbalance state. When the balance coefficient DE of the big data storage system is less than or equal to the system balance coefficient threshold, the data balance of the big data storage system is determined to be in a balanced state.
The low-frequency count threshold is 100, 120, 150, 175, 200, or any other reasonable value.
The device description information in the system recording device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, and/or the time at which each storage device joined the big data storage system. The total number of storage devices is the total number of all storage devices in the big data storage system. The total storage capacity of each storage device is the total capacity of its storage space, or alternatively the total capacity of the storage space that can be used for storing data items. The network address of each storage device is, for example, an IP address or a MAC address. The time at which each storage device joined the big data storage system is the initial time from which that storage device stores data items as a storage device of the big data storage system.
The big data storage system further includes an access recording device. The access description information in the access recording device includes the total access count of the big data storage system on each calendar day before the current date. At the end of any calendar day, the big data storage system may record the total access count of all storage devices in the big data storage system during the calendar day that has just ended. In general, the access description information in the access recording device can record the total access count of the big data storage system on each of a predetermined number of calendar days before the current date (today). For example, the predetermined number of calendar days is 800.
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of the storage device. The total number of data items is the total number of all data items in the storage device. The storage size of each data item is the size of the data item as stored in the storage device, i.e., the storage space it occupies. The storage start time of each data item is the time at which the data item began to be stored in the storage device to which it belongs, for example, the time at which the data item was copied to the storage device. The identifier of each data item can be any information that uniquely identifies the data item, such as the name of the data item, a unique identification code, or an encoding of the data item. The summary information of each data item briefly introduces the content of the data item. For example, the summary information is: "C++ from zero: a straightforward introduction that lets you learn the C++ programming language in 21 days." The free storage capacity of each storage device is the free or remaining storage capacity in the storage device that can store new data items.
The low-frequency device threshold is any reasonable value such as 90, 100, 120, 130, 150, 160, 200, 220, 300, 400, or 500. The system balance coefficient threshold is any reasonable value such as 50%, 55%, 60%, 65%, 70%, or 75%.
After determining that the data balance of the big data storage system is in an unbalanced state, the method further includes:
Among all data items of each storage device, determining each data item whose access count is greater than 2 times the low-frequency count threshold to be a candidate data item, thereby obtaining multiple candidate data items; forming a candidate data item set for each storage device from its candidate data items; and forming a low-frequency data item set for each storage device from all of its low-frequency data items. For example, when the low-frequency count threshold is 100, each data item of a storage device whose access count is greater than 200 (100 × 2) is determined to be a candidate data item. Likewise, when the low-frequency count threshold is 100, the low-frequency data items of a storage device whose access count is less than 100 constitute its low-frequency data item set, i.e., the low-frequency data item set consists of all low-frequency data items in the storage device.
For the current storage device among the multiple storage devices:
When the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are sorted in ascending order of access count to generate a first sorted list, and the low-frequency data item ranked first in the first sorted list is taken as the current low-frequency data item. For example, when the number of low-frequency data items in the low-frequency data item set (for example, 326) is less than the number of candidate data items in the candidate data item set (for example, 827), all low-frequency data items in the low-frequency data item set are sorted in ascending (increasing) order of access count to generate the first sorted list. In the first sorted list, the earlier a data item is ranked, the smaller its access count, and the later it is ranked, the larger its access count. The low-frequency data item ranked first in the first sorted list (i.e., the low-frequency data item with the smallest access count) is taken as the current low-frequency data item.
6.1. Content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item. The present application may use any existing text comparison technique to determine the content matching degree between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, where the text comparison is, for example, content matching based on semantic comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords. The content matching degree between a candidate data item and the current low-frequency data item may indicate how close, similar, relevant, or associated the candidate data item is to the current low-frequency data item.
6.2. Among all candidate data items of the candidate data item set, the candidate data item with the highest content matching degree with the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the current storage device. Combining here means forming a file group from the candidate data item with the highest content matching degree and the current low-frequency data item, and merging the summary information of that candidate data item with the summary information of the current low-frequency data item to form the summary information of the file group. The resulting file group is treated as one new data item and is saved in the free storage space of the current storage device, i.e., in storage space that holds no data item.
6.3. The candidate data item with the highest content matching degree with the current low-frequency data item is deleted from the candidate data item set. This deletion takes place after the new data item (the file group) has been saved in the free storage space of the current storage device. In addition, the candidate data item with the highest content matching degree and the current low-frequency data item are both deleted from the current storage device, because the file group formed from them has already been saved in the free storage space of the current storage device.
6.4. It is determined whether the first sorted list contains a low-frequency data item ranked one position after the current low-frequency data item. If so, step 6.5 is carried out; if not, the process ends. In other words, it is determined whether the first sorted list contains a low-frequency data item that has a higher access count than the current low-frequency data item and is adjacent to it in the first sorted list. For example, when the current low-frequency data item is the low-frequency data item ranked first, the low-frequency data item one position after it is the one ranked second, i.e., the low-frequency data item with the second-smallest access count in the first sorted list. If such an item exists, step 6.5 is carried out; if not, the above process ends.
6.5. The low-frequency data item ranked one position after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and step 6.1 is carried out. For example, the low-frequency data item ranked second in the first sorted list is selected as the current low-frequency data item and step 6.1 is carried out; in the same way, the low-frequency data items ranked third, fourth, fifth, and so on up to the last one in the first sorted list are selected in turn as the current low-frequency data item.
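Steps 6.1 to 6.5 can be sketched as a single loop over the first sorted list. The sketch below uses a word-overlap (Jaccard) score as a stand-in for the content matching degree; that scoring choice, the function names, and the dictionary layout are all assumptions for illustration, since the application allows any existing text comparison technique.

```python
def keyword_match(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (stand-in for any text comparison)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def merge_low_frequency(items, freq_threshold=100):
    """items: {id: {"count": int, "summary": str}}; returns the merged file groups."""
    # First sorted list: low-frequency items in ascending order of access count.
    low = sorted((i for i in items if items[i]["count"] < freq_threshold),
                 key=lambda i: items[i]["count"])
    candidates = {i for i in items if items[i]["count"] > 2 * freq_threshold}
    if len(low) > len(candidates):
        raise ValueError("this branch requires low-frequency items <= candidates")
    merged = []
    for cur in low:                                   # steps 6.4 and 6.5
        best = max(sorted(candidates),                # step 6.1; sorted for determinism
                   key=lambda c: keyword_match(items[cur]["summary"],
                                               items[c]["summary"]))
        merged.append({                               # step 6.2: form the file group
            "members": (cur, best),
            "summary": items[cur]["summary"] + " " + items[best]["summary"],
        })
        candidates.remove(best)                       # step 6.3
    return merged


items = {
    "a": {"count": 10, "summary": "c++ tutorial basics"},
    "b": {"count": 50, "summary": "5g uplink"},
    "x": {"count": 300, "summary": "advanced c++ tutorial"},
    "y": {"count": 500, "summary": "5g downlink uplink"},
    "z": {"count": 250, "summary": "cooking recipes"},
}
for group in merge_low_frequency(items):
    print(group["members"])
```

Saving the file group to free storage space and deleting the original items from the device are omitted here; the sketch only tracks which items would be paired.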
Alternatively, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are grouped to generate multiple low-frequency data item groups, such that the total access count of all low-frequency data items in each low-frequency data item group is greater than 1.5 times the low-frequency count threshold, and the average access count of all low-frequency data items in each low-frequency data item group is determined, wherein the absolute value of the difference between the average access counts of the low-frequency data item groups is less than 20. Preferably, the absolute value of the difference between the average access counts of any two of the low-frequency data item groups is less than any reasonable value such as 20, 30, 40, 50, 60, or 70.
For example, when the number of low-frequency data items in the low-frequency data item set (for example, 569) is greater than the number of candidate data items in the candidate data item set (for example, 516), the 569 low-frequency data items in the low-frequency data item set are grouped to generate multiple low-frequency data item groups. Here, the number of groups G into which the low-frequency data items are divided is determined from the number K of low-frequency data items in the low-frequency data item set and a grouping parameter Z, where Z is equal to any reasonable value such as 3, 4, or 5. When Z is equal to 5, the 569 low-frequency data items are divided into 113 low-frequency data item groups.
Additionally, the total access count of all low-frequency data items in each of the low-frequency data item groups is greater than 1.1 times, 1.2 times, 1.3 times, or 1.5 times the low-frequency count threshold, or any other reasonable multiple. The average access count of all low-frequency data items in each low-frequency data item group, i.e., the average access count of the group, is determined. For example, if a low-frequency data item group includes low-frequency data items 1 to 5 whose access counts are 95, 76, 110, 82, and 102 respectively, then the average access count of all low-frequency data items in the group is 93. The absolute value of the difference between the average access counts of any two of the low-frequency data item groups is less than any reasonable value such as 20, 30, 40, 50, 60, or 70.
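The grouping constraints above can be checked with a short sketch. The text gives K = 569 and Z = 5 yielding 113 groups, which is consistent with integer division of K by Z; treating the group count as `K // Z` is an assumption made here, as are the function names. How items are assigned to groups is not specified, so the sketch only validates a proposed grouping.

```python
from itertools import combinations


def group_count(k: int, z: int = 5) -> int:
    """Number of groups, assuming G is K divided by Z rounded down."""
    return k // z


def groups_are_valid(groups, freq_threshold=100, total_factor=1.5, max_avg_gap=20):
    """Check a proposed grouping given as a list of lists of access counts."""
    averages = []
    for counts in groups:
        # Each group's total access count must exceed total_factor * threshold.
        if sum(counts) <= total_factor * freq_threshold:
            return False
        averages.append(sum(counts) / len(counts))
    # Average access counts of any two groups must differ by less than max_avg_gap.
    return all(abs(a - b) < max_avg_gap for a, b in combinations(averages, 2))


print(group_count(569, 5))
example_group = [95, 76, 110, 82, 102]
print(sum(example_group) / len(example_group))
print(groups_are_valid([example_group, [90, 80, 100, 85, 95]]))
```

The second printed value reproduces the average of 93 from the worked example in the text.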
After determining that the data balance of the big data storage system is in an unbalanced state, the method further includes:
Performing a data access operation for each data access request in the buffer of the system buffer device, according to the current queue order of the multiple data access requests in the buffer. For example, if the current queue order of the multiple data access requests in the buffer of the system buffer device is: first data access request, second data access request, third data access request, fourth data access request, and fifth data access request, then the data access operation is performed for each data access request in the buffer in that order.
When it is determined that the buffer of the system buffer device holds no saved data access request, a new data access request received by the big data storage system from any data requester is parsed to obtain a new query condition. For example, once the first, second, third, fourth, and fifth data access requests in the buffer of the system buffer device have all been processed, the buffer holds no saved data access request. A sixth data access request received by the big data storage system from a data requester is then parsed to obtain a new query condition. The new query condition is, for example: mobile communication AND 5G AND (uplink OR downlink).
The multiple data items involved in the new query condition are determined in the catalog storage server of the big data storage system, and at least one target storage device in the big data storage system involved in the multiple data items is determined. The catalog storage server stores the directory information of all data items in the big data storage system. The directory information includes, for example, the identifier of each data item, the summary information of each data item, the metadata information of each data item, the keyword information of each data item, and the storage device in which each data item is located. The catalog storage server queries all data items stored in the big data storage system according to the new query condition, for example, by searching the summary information, the metadata information, and/or the keyword information of the data items using the new query condition (for example, mobile communication AND 5G AND (uplink OR downlink)), to determine the multiple data items involved in the new query condition. The storage device in which each data item is located or stored, or with which it is otherwise associated, is determined from the directory information, thereby determining the at least one target storage device involved in the multiple data items. In special cases, the multiple data items may all be located in the same target storage device.
The new query condition is sent to each target storage device, and at least one data item that meets the new query condition is received from each target storage device. Each target storage device searches all data items it stores according to the new query condition to obtain at least one data item, and sends the obtained data item or items to the interface device of the big data storage system. Preferably, there are no redundant data items in the big data storage system of the present application, i.e., each data item is unique. The interface device receives data access requests from data requesters and sends data item sets or target data item sets to the corresponding data requesters.
All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the new data access request. Specifically, the interface device forms the target data item set from all data items received from the target storage devices and sends the target data item set to the data requester indicated by the new data access request.
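As a rough illustration of how a catalog lookup might map a query condition to target storage devices, the sketch below evaluates a condition of the form "all of these terms AND any of those terms" against summary information by case-insensitive substring matching. The catalog layout, the function names, and the matching rule are assumptions for illustration only.

```python
def matches(text, all_of, any_of):
    """True if text contains every term in all_of and at least one in any_of."""
    t = text.lower()
    has_all = all(k.lower() in t for k in all_of)
    has_any = (not any_of) or any(k.lower() in t for k in any_of)
    return has_all and has_any


def resolve_targets(catalog, all_of, any_of):
    """Map each target storage device to the identifiers of its matching items.

    `catalog` is a list of dicts with hypothetical keys 'id', 'summary', 'device'.
    """
    targets = {}
    for entry in catalog:
        if matches(entry["summary"], all_of, any_of):
            targets.setdefault(entry["device"], []).append(entry["id"])
    return targets


catalog = [
    {"id": "i1", "summary": "Mobile communication: 5G uplink scheduling", "device": "dev1"},
    {"id": "i2", "summary": "mobile communication 5G downlink", "device": "dev1"},
    {"id": "i3", "summary": "mobile communication 4G uplink", "device": "dev2"},
]
# mobile communication AND 5G AND (uplink OR downlink)
print(resolve_targets(catalog, ["mobile communication", "5G"], ["uplink", "downlink"]))
```

In this example both matching data items sit on the same device, which corresponds to the special case noted above where multiple data items share one target storage device.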
Wherein according to the current Queue sequence of data access requests multiple in the buffer area of system buffer equipment to buffer area In each data access request carry out data access operation include:
8.1, it is determined according to the current Queue sequence of data access requests multiple in the buffer area of system buffer equipment current The data access request of processing, wherein the currently processed data access request is multiple data access requests in buffer area Sort primary data access request in current Queue sequence.As described above, for example, more in the buffer area of system buffer equipment The current Queue sequence of a data access request are as follows: the first data access request, the second data access request, third data access Request, the 4th data access request and the 5th data access request, then according to data multiple in the buffer area of system buffer equipment The current Queue sequence of access request determines that currently processed data access request is the first data access request.
8.2, currently processed data access request is parsed to obtain currently processed querying condition.Wherein data Access request or currently processed data access request include querying condition, therefore are carried out to currently processed data access request Parsing can obtain currently processed querying condition.Wherein currently processed querying condition is, for example, mobile communication and 5G and (on Line link or downlink).
8.3, the currently processed querying condition is determined in the catalogue storage server of the big data storage system Related multiple data item, and determine at least one target storage device involved in multiple data item.Wherein, catalogue stores Server is used to store the directory information of all data item in big data storage system.For example, directory information is the mark of data item What knowledge symbol, the summary info of data item, the metadata information of data item, the keyword message of data item, data item were located at deposits Store up equipment etc..Catalogue storage server is according to currently processed querying condition to all data item in storage big data storage system It is inquired, for example, in the keyword message of the summary info of data item, the metadata information of data item and/or data item It is inquired using currently processed querying condition (for example, mobile communication and 5G and (uplink or downlink)), with true Multiple data item involved in the fixed new querying condition.Determine that each data item is located at, is stored according to directory information In or related storage equipment, thereby determine that at least one target storage device involved in multiple data item.In special feelings Under condition, multiple data item are likely located in same target storage device.
8.4, the currently processed querying condition is sent to each target storage device, and is stored from each target Equipment receives at least one data item for meeting the currently processed querying condition.Each target storage device is worked as according to The querying condition of pre-treatment is retrieved in all data item itself stored, to obtain at least one data item, and At least one data item obtained is sent to the interface equipment of big data storage system.Preferably, the big data of the application The data item of redundancy is not present in storage system, i.e., each data item is unique.Wherein, interface equipment from data for asking The side of asking receives data access request, and interface equipment is for collection of data items or target data item set to be sent to accordingly Request of data side.
8.5, target data item set will be formed from the received all data item of each target storage device institute, and by institute It states target data item set and is sent to request of data side indicated by the currently processed data access request.Interface equipment will Target data item set are formed from the received all data item of each target storage device institute, and interface equipment is by the target Collection of data items is sent to request of data side indicated by the new data access request.
8.6. Delete the data access request ranked first in the current queue order of the multiple data access requests in the buffer area. For example, the first data access request in the current queue order is deleted from the buffer area.
8.7. Determine whether any saved data access request remains in the buffer area of the system buffer device. If so, proceed to step 8.1; if not, determine that no saved data access request remains in the buffer area of the system buffer device.
For example, suppose the current queue order of the data access requests in the buffer area of the system buffer device is: first data access request, second data access request, third data access request, fourth data access request, and fifth data access request. After the first data access request is deleted from the current queue order, saved data access requests still remain in the buffer area (the second, third, fourth, and fifth data access requests), so processing proceeds to step 8.1.
After the fifth data access request is deleted from the current queue order of the data access requests in the buffer area, the first through fifth data access requests have all completed their data access operations, so it is determined that no saved data access request remains in the buffer area of the system buffer device. Once that determination is made, any new data access request that the big data storage system receives from any data requester is parsed to obtain a new query condition and is processed accordingly.
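The loop formed by steps 8.6 and 8.7 can be sketched as draining a first-in, first-out queue. This is only an illustrative sketch: the names `drain_buffer` and `handle` are hypothetical, and `handle` stands in for the per-request processing (parsing, catalogue lookup, retrieval, and reply) that the steps before 8.6 perform.

```python
from collections import deque

def drain_buffer(buffer, handle):
    """Process buffered requests in queue order until the buffer is empty."""
    processed = []
    while buffer:                # step 8.7: does any saved request remain?
        request = buffer[0]      # request ranked first in the current queue order
        handle(request)          # hypothetical stand-in for the steps before 8.6
        buffer.popleft()         # step 8.6: delete the first request
        processed.append(request)
    return processed

# Five buffered requests, as in the example above.
buffer = deque(["req1", "req2", "req3", "req4", "req5"])
handled = drain_buffer(buffer, handle=lambda request: None)
```

After the call, `buffer` is empty, which corresponds to the point at which new data access requests are again parsed and handled directly.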
In this application, when one data item or low-frequency data item must be selected as the current data item or current low-frequency data item and several candidates have the same access count, one of the candidates with the same access count is selected at random.
Fig. 2 is a schematic structural diagram of a big data storage system 200 according to an embodiment of the present invention. As shown in Fig. 2, a data management device 201 located outside the big data storage system 200 is connected to the big data storage system by a communication link and is used to manage, maintain, collect statistics on, and control the big data storage system. The data management device 201 can be operated or controlled by maintenance, administrative, or operations personnel of the big data storage system. For example, such personnel can trigger a determination of the operating state of the big data storage system periodically or as actual operating conditions require.
The big data storage system 200 includes an interface device, a catalogue storage server, a system log device, an access recording device, a system buffer device (none of these five components is shown in Fig. 2), and multiple storage devices.
The interface device receives data access requests from data requesters and sends a data item set or target data item set to the corresponding data requester. In addition, the interface device receives from the data management device requests to determine the operating state of the big data storage system, and sends the operating state of the big data storage system (for example, a low-frequency operating state or a non-low-frequency operating state) to the data management device.
The catalogue storage server stores the directory information of all data items in the big data storage system. The directory information includes, for example, the identifier of a data item, its summary information, its metadata information, its keyword information, and the storage device on which it is located. The catalogue storage server queries all data items stored in the big data storage system against a query condition or new query condition, for example by applying the new query condition (e.g., mobile communication AND 5G AND (uplink OR downlink)) to the summary information, metadata information, and/or keyword information of the data items, to determine the multiple data items that match the new query condition. The storage device on which each data item is located or stored is determined from the directory information, which yields the at least one target storage device for the multiple data items. In a special case, the multiple data items may all be located on the same target storage device.
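The catalogue lookup described above can be illustrated with a minimal sketch. The `DirectoryEntry` type, the `matches` predicate, and the `find_targets` function are hypothetical names introduced only for this example, and the query condition is hard-coded as a keyword test; the source does not specify how the catalogue storage server evaluates a query condition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DirectoryEntry:
    item_id: str          # identifier of the data item
    keywords: frozenset   # keyword information of the data item
    device: str           # storage device on which the item is located

def matches(keywords):
    # Hard-coded form of: mobile communication AND 5G AND (uplink OR downlink)
    return ("mobile communication" in keywords and "5G" in keywords
            and ("uplink" in keywords or "downlink" in keywords))

def find_targets(directory, predicate):
    """Return the matching entries and the distinct devices that hold them."""
    items = [entry for entry in directory if predicate(entry.keywords)]
    devices = sorted({entry.device for entry in items})
    return items, devices

directory = [
    DirectoryEntry("item-1", frozenset({"mobile communication", "5G", "uplink"}), "dev-A"),
    DirectoryEntry("item-2", frozenset({"mobile communication", "5G", "downlink"}), "dev-A"),
    DirectoryEntry("item-3", frozenset({"mobile communication", "4G", "uplink"}), "dev-B"),
    DirectoryEntry("item-4", frozenset({"mobile communication", "5G", "downlink"}), "dev-C"),
]
items, devices = find_targets(directory, matches)
```

Here two of the three matching items sit on the same device, so the three matching items map to only two target storage devices.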
The system log device stores device description information. The device description information includes: the total number of storage devices in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, and/or the time at which each storage device joined the big data storage system.
The access recording device stores access description information. The access description information includes the number of times the big data storage system was accessed on each calendar day before the current date.
The system buffer device stores multiple ephemeral data items and saves or buffers data access requests. The ephemeral data items can be designated by the data management device 201 or received from the data management device 201. When the big data storage system cannot respond to a data requester's data access request, ephemeral data items are provided to the data requester as an interim response. The system buffer device can match the description information of the query condition included in a new data access request against each ephemeral data item in its ephemeral data item set to determine a content matching degree for each ephemeral data item, select from the ephemeral data items at least one selected ephemeral data item whose content matching degree is greater than the matching degree threshold, and send the selected ephemeral data item(s) to the data requester indicated by the new data access request. The system buffer device can also save the new data access request in its buffer area.
The big data storage system includes multiple storage devices. Each storage device can store multiple data items, and the storage capacity of each storage device can be any reasonable value. Each data item can be a data file of any type, such as a text, audio, or video file. A low-frequency data item is a data item whose access count within a specific period is lower than the average access count of all data items in the big data storage system, or lower than the average access count of all data items on its storage device. A low-frequency storage device is, for example, a storage device whose total access count over all of its data items within a specific period is lower than the average total access count per storage device across all storage devices in the big data storage system.
Alternatively, among all data items of each storage device, the data items whose access count within the current statistical period is less than the low-frequency count threshold are determined to be low-frequency data items, and the storage devices whose number of low-frequency items is greater than the low-frequency device threshold are determined to be low-frequency storage devices. As shown in Fig. 2, the storage devices above the dashed line on which the data management device 201 lies are non-low-frequency storage devices, and the storage devices below that dashed line are low-frequency storage devices. The present application calculates the balance coefficient of the low-frequency storage devices and the balance coefficient of the non-low-frequency storage devices in the big data storage system. The balance coefficient of the low-frequency storage devices can express the ratio of the access counts to the storage sizes of the low-frequency data items on the low-frequency storage devices. The balance coefficient of the non-low-frequency storage devices can express the ratio of the access counts to the storage sizes of the low-frequency data items on the non-low-frequency storage devices.
In Fig. 2, hollow circles represent non-low-frequency storage devices, and the arrow indicates the direction in which the proportion of non-low-frequency data items gradually increases. That is, the farther a non-low-frequency storage device is from the dashed line, the smaller its ratio of low-frequency data item access count to storage size, and the larger its ratio of non-low-frequency data item access count to storage size. Hatched (non-hollow) circles represent low-frequency storage devices, and the arrow indicates the direction in which the proportion of low-frequency data items gradually increases. That is, the farther a low-frequency storage device is from the dashed line, the larger its ratio of low-frequency data item access count to storage size, and the smaller its ratio of non-low-frequency data item access count to storage size. For example, when the proportion of low-frequency data items on the low-frequency storage devices is excessively large compared with the proportion of low-frequency data items on the non-low-frequency storage devices, this may indicate that the data balance of the big data storage system is in an unbalanced state, such as a low-frequency imbalance state; otherwise, it may indicate that the data balance of the big data storage system is in a balanced state.
Fig. 3 is a schematic structural diagram of a system 300 for determining the data balance of a big data storage system according to an embodiment of the present invention. The system 300 includes a preprocessing unit 301, a statistics unit 302, a computing unit 303, a determination unit 304, and an adjustment unit 305.
In response to receiving a request to determine the data balance of the big data storage system, the preprocessing unit 301 redirects each new data access request that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access request to the corresponding storage device among the multiple storage devices. The system buffer device then matches the description information of the query condition included in the new data access request against each ephemeral data item in its ephemeral data item set to determine the content matching degree of each ephemeral data item, selects from the ephemeral data items at least one selected ephemeral data item whose content matching degree is greater than the matching degree threshold, sends the selected ephemeral data item(s) to the data requester indicated by the new data access request, and saves the new data access request in the buffer area of the system buffer device.
When the data management device located outside the big data storage system needs to determine the data balance of the big data storage system, it sends the big data storage system a request to determine the data balance. The data management device can be operated or controlled by maintenance, administrative, or operations personnel of the big data storage system. For example, such personnel can trigger the determination of the data balance of the big data storage system periodically or as actual operating conditions require. The big data storage system includes multiple storage devices; each storage device can store multiple data items, and the storage capacity of each storage device can be any reasonable value. Each data item can be a data file of any type, such as a text, audio, or video file. A low-frequency data item is a data item whose access count within a specific period is lower than the average access count of all data items in the big data storage system, or lower than the average access count of all data items on its storage device. A low-frequency storage device is, for example, a storage device whose total access count over all of its data items within a specific period is lower than the average total access count per storage device across all storage devices in the big data storage system.
Redirecting the new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending them to the corresponding storage devices among the multiple storage devices, includes:
taking the moment at which the big data storage system receives the request to determine the data balance of the big data storage system as the starting point, redirecting every new data access request subsequently received by the big data storage system from any data requester to the system buffer device of the big data storage system, instead of sending it to the corresponding storage device among the multiple storage devices;
wherein the new data access request includes a query condition and description information of the query condition, the ephemeral data item set includes multiple ephemeral data items, and each ephemeral data item has summary information that briefly introduces the content of the ephemeral data item.
After the moment at which the big data storage system receives the request to determine its data balance, multiple new data access requests may arrive. All new data access requests subsequently received from one or more data requesters are redirected to the system buffer device of the big data storage system, instead of being sent to the corresponding storage devices among the multiple storage devices. Normally, the big data storage system uses the query condition included in a new data access request to determine, in the catalogue storage server, the multiple data items that match the query condition and the at least one target storage device on which they reside; it then sends the query condition to each target storage device and receives from each target storage device at least one data item that satisfies the query condition. While the data balance of the big data storage system is being monitored or determined, however, the big data storage system redirects all new data access requests to the system buffer device. The system buffer device is located inside the big data storage system; it stores the ephemeral data item set, which includes multiple ephemeral data items, and buffers data access requests. A query condition is, for example, mobile communication AND 5G AND (uplink OR downlink). In this case, the description information of the query condition is, for example, "uplink or downlink of 5G mobile communication". The ephemeral data item set includes multiple ephemeral data items, and each ephemeral data item can be a data file of any type, such as a text, audio, or video file. Each ephemeral data item or data item has summary information that briefly introduces its content. For example, the summary information may be: "C++ from zero: an easy-to-follow introduction that lets you learn the C++ programming language in 21 days."
The description information for the querying condition for wherein being included by new data access request by the system buffer equipment with It is each interim with determination that each ephemeral data item in the ephemeral data item set of the system buffer equipment carries out content matching The content matching degree of data item includes:
The description information for the querying condition for being included by new data access request by the system buffer equipment with it is described The summary info of each ephemeral data item in the ephemeral data item set of system buffer equipment compared based on semantic content Content matching, the content matching compared based on keyword or the content matching combined based on semantic content and keyword with true The content matching degree of fixed each ephemeral data item and the querying condition.The application can be used any existing text and compare other side Formula determines the description information of querying condition that new data access request is included and the ephemeral data item of system buffer equipment Content matching degree between the summary info of each ephemeral data item in set, wherein text alignments are, for example, to be based on language Content matching that adopted content compares, the content matching compared based on keyword or in being combined based on semantic content and keyword Hold matching.Wherein, the content matching degree of each ephemeral data item and the querying condition may be used to indicate that each ephemeral data Item close degree, similar degree, degree of correlation or correlation degree with the querying condition.
The matching degree threshold is 55%, 60%, 65%, 70%, or any other reasonable value, and the content matching degree lies in the range [0%, 100%], i.e., it can be any value from 0% to 100%. At least one selected ephemeral data item whose content matching degree is greater than the matching degree threshold (for example, greater than 55%, 60%, 65%, or 70%) is selected from the multiple ephemeral data items. The selected ephemeral data item(s) are sent to the data requester indicated by the new data access request, and the new data access request is saved in the buffer area of the system buffer device. The purpose of sending the selected ephemeral data item(s) is to let the data requester obtain content related to its data access request while the big data storage system has suspended its data access service. After the new data access request is saved in the buffer area of the system buffer device, the method further includes: sending to the data requester indicated by the new data access request a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue position of the new data access request in the buffer area. The current queue order of the new data access requests in the buffer area is determined by the length of time each request has been saved in the buffer area, with requests sorted in descending order of that length of time. That is, the longer a new data access request has been saved, the earlier its position in the current queue order. Preferably, after the response message is sent, the method further includes: periodically sending to the data requester indicated by the new data access request a notification message indicating the current queue position of that new data access request in the buffer area.
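The threshold selection and the queue ordering described above can be sketched as follows. The source leaves the text comparison method open, so the word-overlap (Jaccard) score used here is purely an assumed stand-in for keyword-based content matching, and the names `match_degree`, `select_items`, and `queue_order` are hypothetical.

```python
def match_degree(description, summary):
    """Assumed keyword-overlap score in [0, 1]; not the source's own method."""
    a = set(description.lower().split())
    b = set(summary.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_items(description, summaries, threshold=0.6):
    """Keep the ephemeral items whose matching degree exceeds the threshold."""
    return [item_id for item_id, summary in summaries.items()
            if match_degree(description, summary) > threshold]

def queue_order(saved_at, now):
    """Sort requests so that the one saved for the longest time comes first."""
    return sorted(saved_at, key=lambda request: now - saved_at[request], reverse=True)

summaries = {
    "t1": "5G mobile communication uplink downlink",
    "t2": "learn C++ in 21 days",
}
selected = select_items("5G mobile communication uplink or downlink", summaries)

# Save times in arbitrary units; request "b" has waited longest at time 30.
order = queue_order({"a": 10, "b": 5, "c": 20}, now=30)
```

With a 60% threshold only `t1` is selected, and the queue order places the request saved earliest at the front.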
When it is determined that none of the storage devices in the big data storage system has a data access operation in progress, the statistics unit 302 obtains the running log file of each of the multiple storage devices in the big data storage system; determines, based on the current statistical period and the running log file of each storage device, the aggregated access information of the multiple data items stored on each storage device; and determines the access information statistics file of each storage device from that aggregated access information. The access information statistics file includes a data item statistics table, which contains multiple data item records. Each data item record is a 6-tuple <data item identifier, access count, statistics start time, statistics end time, storage size, storage start time>.
A data access operation in progress means the processing in which a storage device performs data retrieval in its own storage space according to a query condition sent by the big data storage system, the data items obtained by the retrieval are assembled into a data item set, and the big data storage system sends the data item set to the data requester.
Each storage device saves its own running log file in its system data area. The running log file contains multiple log records, each of which includes: the identifier of a data item, an access start time, an access end time, a storage size, and a storage start time. The identifier of a data item can be any information that uniquely identifies the data item, such as its name, unique ID, or code. The access start time is the time at which the access to the data item covered by the log record began. The access end time is the time at which that access ended. For example, accessing a data item on a storage device may involve operations such as reading or modifying it; the access start time and access end time give the start and end of that operation. The storage size is the size the data item occupies on the storage device. The storage start time is the time at which the data item began to be stored on the storage device or in the big data storage system, i.e., the time from which the data item was saved there and available for access. In this application, access includes reading and/or modifying.
The current statistical period is the period that ends with the day before the current date on which the big data storage system received the request to determine its data balance and extends back a predetermined number of calendar days. The predetermined number of calendar days is 10, 20, or 30. For example, if the big data storage system received the request to determine its data balance at 11:25:36 on August 11, 2018, then the current date is August 11, 2018, and the previous day is August 10, 2018. With a predetermined number of 10 calendar days, the current statistical period runs from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018.
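The date arithmetic in this example can be checked with a short sketch. The function name `statistics_window` and its `days` parameter are hypothetical; the sketch only reproduces the rule stated above (end at the last second of the previous day, extend back the predetermined number of calendar days).

```python
from datetime import datetime, time, timedelta

def statistics_window(request_time, days=10):
    """Return (start, end) of the statistical period for a request time."""
    previous_day = request_time.date() - timedelta(days=1)
    first_day = previous_day - timedelta(days=days - 1)
    start = datetime.combine(first_day, time.min)          # 00:00:00 on the first day
    end = datetime.combine(previous_day, time(23, 59, 59)) # 23:59:59 on the previous day
    return start, end

# Request received at 11:25:36 on August 11, 2018, with 10 calendar days.
start, end = statistics_window(datetime(2018, 8, 11, 11, 25, 36))
```

For this input the window starts at 2018-08-01 00:00:00 and ends at 2018-08-10 23:59:59, matching the example in the text.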
Determining, based on the current statistical period and the running log file of each storage device, the aggregated access information of the multiple data items stored on each storage device includes:
selecting, based on the current statistical period, from all log records in the running log file of each storage device to obtain the log records of each storage device that fall within the current statistical period;
classifying the log records of each storage device within the current statistical period by data item, to obtain the aggregated access information of each data item;
forming the aggregated access information of the multiple data items stored on each storage device from the aggregated access information of each data item.
For example, if the current statistical period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018 (10 calendar days), then all log records in the running log file of each storage device are filtered by that period to obtain all log records of each storage device that fall between 00:00:00 on August 1, 2018 and 23:59:59 on August 10, 2018. The log records of each storage device within that period are then classified by data item (for example, by data item identifier) to obtain the aggregated access information of each data item. The aggregated access information of a data item is, for example, all the access information of that data item within the current statistical period. The aggregated access information of the data items on each storage device together forms the aggregated access information of the multiple data items stored on that storage device.
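The filter-then-classify procedure can be sketched as below. The dictionary keys `item_id` and `access_start` and the function name `group_logs` are hypothetical, and the sketch assumes that a log record belongs to the period when its access start time falls inside it; the source does not say which timestamp decides membership.

```python
from collections import defaultdict
from datetime import datetime

def group_logs(log_records, start, end):
    """Keep the records inside [start, end] and group them by data item."""
    grouped = defaultdict(list)
    for record in log_records:
        # Assumption: membership is decided by the access start time.
        if start <= record["access_start"] <= end:
            grouped[record["item_id"]].append(record)
    return dict(grouped)

logs = [
    {"item_id": "A", "access_start": datetime(2018, 8, 1, 9, 2, 11)},
    {"item_id": "B", "access_start": datetime(2018, 8, 5, 12, 0, 0)},
    {"item_id": "A", "access_start": datetime(2018, 8, 10, 22, 26, 53)},
    {"item_id": "A", "access_start": datetime(2018, 7, 31, 23, 0, 0)},  # outside the period
]
grouped = group_logs(logs, datetime(2018, 8, 1), datetime(2018, 8, 10, 23, 59, 59))
```

Each list in `grouped` plays the role of the per-item access information for one storage device.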
Each data item has summary information that briefly introduces the content of the data item. For example, the summary information may be: "C++ from zero: a straightforward introduction that lets you learn the C++ programming language in 21 days."
Determining the access information statistics file of each storage device from the aggregated access information of the multiple data items stored on each storage device includes:
counting the aggregated access information of each of the multiple data items stored on each storage device to determine the access count of each data item;
taking the access start time of the first access in the aggregated access information of each data item as the statistics start time, and taking the access end time of the last access in the aggregated access information of each data item as the statistics end time;
determining the storage size of each data item from the aggregated access information of that data item;
determining the storage start time of each data item on its storage device from the storage information file in the storage information area of each storage device.
Because the aggregated access information of each data item stored on a storage device consists of multiple log records, and each log record represents one access to the data item, the (total) access count of each data item is determined by the number of its log records. For example, suppose the current statistical period is from 00:00:00 on August 1, 2018 to 23:59:59 on August 10, 2018. If the first access to data item A in the current statistical period has an access start time of 09:02:11 on August 1, 2018 and an access end time of 09:05:36 on August 1, 2018, and the last access to data item A in the current statistical period has an access start time of 22:26:53 on August 10, 2018 and an access end time of 22:27:39 on August 10, 2018, then the statistics start time of data item A in the current statistical period is 09:02:11 on August 1, 2018 and its statistics end time is 22:27:39 on August 10, 2018.
In addition, the storage size of each data item is determined from the storage size in any one of the log records in its aggregated access information. The storage start time of each data item on its storage device is determined from the time, recorded in the storage information file in the storage information area of each storage device, at which the data item was copied or moved onto the storage device.
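Building one 6-tuple record from a data item's log records can be sketched as follows. The `ItemRecord` type, its field names, and the `summarize` function are hypothetical; the storage size of 2048 and the storage start time used in the usage example are made-up illustration values, not figures from the source.

```python
from datetime import datetime
from typing import NamedTuple

class ItemRecord(NamedTuple):
    item_id: str
    access_count: int
    stat_start: datetime
    stat_end: datetime
    size: int
    stored_since: datetime

def summarize(item_id, logs, stored_since):
    """Collapse the log records of one data item into a 6-tuple record."""
    first = min(logs, key=lambda r: r["access_start"])  # first access in the period
    last = max(logs, key=lambda r: r["access_start"])   # last access in the period
    return ItemRecord(
        item_id=item_id,
        access_count=len(logs),          # one log record per access
        stat_start=first["access_start"],
        stat_end=last["access_end"],
        size=logs[0]["size"],            # any record carries the storage size
        stored_since=stored_since,       # taken from the storage information file
    )

# Data item A from the example; size and storage start time are illustrative only.
logs_a = [
    {"access_start": datetime(2018, 8, 1, 9, 2, 11),
     "access_end": datetime(2018, 8, 1, 9, 5, 36), "size": 2048},
    {"access_start": datetime(2018, 8, 10, 22, 26, 53),
     "access_end": datetime(2018, 8, 10, 22, 27, 39), "size": 2048},
]
record_a = summarize("A", logs_a, stored_since=datetime(2018, 1, 1))
```

With only the two listed accesses, the record has an access count of 2 and the same statistics start and end times as in the worked example.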
The computing unit 303 parses the access information statistics file of each storage device; determines as low-frequency data items those data items, among all data items of each storage device, whose access count in the current statistical period is less than the low-frequency count threshold; and determines the number of low-frequency data items on each storage device. Among the multiple storage devices, the storage devices whose number of low-frequency items is greater than the low-frequency device threshold are determined to be low-frequency storage devices, and the number of low-frequency storage devices in the big data storage system is determined. The storage devices whose number of low-frequency items is less than or equal to the low-frequency device threshold are determined to be non-low-frequency storage devices, and the number of non-low-frequency storage devices in the big data storage system is determined.
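The two-threshold classification can be sketched as below. The function name `classify` and the parameter names `item_threshold` and `device_threshold` are hypothetical, and the threshold values in the usage example are arbitrary; the source does not give concrete values.

```python
def classify(devices, item_threshold, device_threshold):
    """Split devices into low-frequency and non-low-frequency lists.

    `devices` maps a device name to the access counts of its data items.
    """
    low, non_low = [], []
    for name, access_counts in devices.items():
        # A data item is low-frequency when its access count is below the item threshold.
        low_item_count = sum(1 for count in access_counts if count < item_threshold)
        # A device is low-frequency when it holds more low-frequency items than the device threshold.
        if low_item_count > device_threshold:
            low.append(name)
        else:
            non_low.append(name)
    return low, non_low

devices = {"d1": [1, 2, 50, 3], "d2": [40, 60, 2], "d3": [0, 1]}
low, non_low = classify(devices, item_threshold=5, device_threshold=1)
```

Device `d2` holds exactly one low-frequency item, which is not greater than the device threshold of 1, so it falls on the non-low-frequency side.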
Based on the access information statistics file of each low-frequency storage device, the storage size and access count of each low-frequency data item on each low-frequency storage device are determined, together with the number of low-frequency data items on each low-frequency storage device and the total access count of all data items on each low-frequency storage device. The total storage capacity of each low-frequency storage device is determined from the identifier of the low-frequency storage device and the device description information in the system log device of the big data storage system.
Based on the access information statistics file of each non-low-frequency storage device, the storage size and access count of each low-frequency data item on each non-low-frequency storage device are determined, together with the number of low-frequency data items on each non-low-frequency storage device and the total access count of all data items on each non-low-frequency storage device. The total storage capacity of each non-low-frequency storage device is determined from the identifier of the non-low-frequency storage device and the device description information in the system log device of the big data storage system;
The balance coefficient of the big data storage system is calculated:
Here, DE is the balance coefficient of the big data storage system, and DLB is the balance coefficient of the low-frequency storage devices in the big data storage system. LTN_i is the low-frequency item count of the i-th low-frequency storage device, and LDN is the number of low-frequency storage devices in the big data storage system. LTS_ij is the storage size of the j-th low-frequency data item on the i-th low-frequency storage device, LS_i is the total storage size of all low-frequency data items on the i-th low-frequency storage device, and LC_i is the total storage capacity of the i-th low-frequency storage device. LTA_ij is the access count of the j-th low-frequency data item on the i-th low-frequency storage device, LA_i is the total access count of all low-frequency data items on the i-th low-frequency storage device, and LT_i is the total access count of all data items on the i-th low-frequency storage device. Here i and j are natural numbers with 1 ≤ i ≤ LDN and 1 ≤ j ≤ LTN_i, where LDN ≥ 100 and LTN_i ≥ 100.
Here, NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system. NLTN_m is the low-frequency item count of the m-th non-low-frequency storage device, and NLDN is the number of non-low-frequency storage devices in the big data storage system. NLTS_mn is the storage size of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLS_m is the total storage size of all low-frequency data items on the m-th non-low-frequency storage device, and NLC_m is the total storage capacity of the m-th non-low-frequency storage device. NLTA_mn is the access count of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLA_m is the total access count of all low-frequency data items on the m-th non-low-frequency storage device, and NLT_m is the total access count of all data items on the m-th non-low-frequency storage device. Here m and n are natural numbers with 1 ≤ m ≤ NLDN and 1 ≤ n ≤ NLTN_m, where NLDN ≥ 100 and NLTN_m ≥ 50.
The determination unit 304 determines that the data balance of the big data storage system is in a non-equilibrium state when the balance coefficient DE of the big data storage system is greater than the system balance coefficient threshold. The non-equilibrium state is a low-frequency imbalance state. When the balance coefficient DE is less than or equal to the system balance coefficient threshold, the data balance of the big data storage system is determined to be in an equilibrium state. The low-frequency count threshold is 100, 120, 150, 175, 200, or any other reasonable value.
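The threshold comparison made by the determination unit can be written as a one-line check. The sketch below takes an already computed DE value as input and does not attempt to compute DE itself; the function name and the representation of percentages as fractions are assumptions made for illustration.

```python
def balance_state(de, system_threshold):
    """Classify the data balance from a precomputed balance coefficient DE.

    Both arguments are fractions, e.g. 0.60 for 60%.
    """
    # Strictly greater than the threshold means non-equilibrium;
    # less than or equal means equilibrium.
    return "non-equilibrium" if de > system_threshold else "equilibrium"


state_high = balance_state(0.62, 0.60)
state_equal = balance_state(0.60, 0.60)
```

Note that a DE exactly equal to the threshold falls on the equilibrium side, matching the "less than or equal to" wording above.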
The device description information in the system record device includes: the total number of storage devices in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, and/or the time at which each storage device joined the big data storage system. The total number of storage devices is the count of all storage devices in the big data storage system. The total storage capacity of a storage device is either the total capacity of its storage space or the total capacity of the storage space that can be used for storing data items. The network address of a storage device is, for example, an IP address or a MAC address. The time at which a storage device joined the big data storage system is the initial time at which the device, as a storage device of the big data storage system, began storing data items.
The big data storage system further includes an access record device. The access description information in the access record device includes the total access count of the big data storage system on each calendar day before the current date. At the end of any calendar day, or once any calendar day has passed, the big data storage system can record the total access count of all storage devices in the big data storage system for the calendar day that has just ended. In general, the access description information can record the total access count of the big data storage system on each of a predetermined number of calendar days before the current date (today). For example, the predetermined number of calendar days is 800.
The storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of the storage device. The total number of data items is the count of all data items on the storage device. The storage size of a data item is its size when stored on the storage device, that is, the storage space it occupies. The storage start time of a data item is the time at which the data item began to be stored on the storage device to which it belongs, for example, the time at which the data item was copied to the storage device. The identifier of a data item can be the name of the data item, a unique identification code of the data item, or any other information that uniquely identifies the data item. The summary information of a data item briefly introduces the content of the temporary data item or data item. For example, the summary information may read: "C++ from zero, explained in plain language, so that you can learn the C++ programming language in 21 days." The free storage capacity of a storage device is the free or remaining capacity on the device that can store new data items.
The low-frequency device threshold is 90, 100, 120, 130, 150, 160, 200, 220, 300, 400, 500, or any other reasonable value. The system balance coefficient threshold is 50%, 55%, 60%, 65%, 70%, 75%, or any other reasonable value.
After the data balance of the big data storage system is determined to be in a non-equilibrium state, the adjustment unit 305 further determines every data item of each storage device whose access count is greater than 2 times the low-frequency count threshold to be a candidate data item, thereby obtaining a plurality of candidate data items. The candidate data items of each storage device form that device's candidate data item set, and all low-frequency data items of each storage device form that device's low-frequency data item set. For example, when the low-frequency count threshold is 100, every data item of a storage device whose access count is greater than 200 (100 × 2) is determined to be a candidate data item. Likewise, when the low-frequency count threshold is 100, the data items on a storage device whose access count is less than 100 form the low-frequency data item set; that is, the set consists of all low-frequency data items on that storage device.
For the current storage device among the plurality of storage devices:
When the number of low-frequency data items in the low-frequency data item set of the current storage device is less than or equal to the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are sorted in ascending order of access count to generate a first sorted list, and the low-frequency data item ranked first in the first sorted list is taken as the current low-frequency data item.
6.1. Content matching is performed between the summary information of the current low-frequency data item and the summary information of each candidate data item in the candidate data item set, to determine the content matching degree between the current low-frequency data item and each candidate data item.
6.2. The candidate data item in the candidate data item set that has the greatest content matching degree with the current low-frequency data item is combined with the current low-frequency data item to form a new data item, and the new data item is saved in the free storage space of the current storage device.
6.3. The candidate data item that has the greatest content matching degree with the current low-frequency data item is deleted from the candidate data item set.
6.4. It is determined whether the first sorted list contains a low-frequency data item ranked one position after the current low-frequency data item. If so, step 6.5 is performed; if not, the procedure ends.
6.5. The low-frequency data item ranked one position after the current low-frequency data item in the first sorted list is selected as the current low-frequency data item, and step 6.1 is performed.
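Steps 6.1 to 6.5 amount to a greedy pairing loop, which can be sketched as below. The text does not specify how the content matching degree is computed, so the word-overlap (Jaccard) measure used here is purely a stand-in assumption; the `Item` record and the function names are likewise hypothetical, and the actual combination and storage of the paired items is left out.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record: identifier, access count, and summary information.
    item_id: str
    access_count: int
    summary: str


def match_degree(a, b):
    """Stand-in content match: Jaccard overlap of summary words, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def pair_low_with_candidates(low_items, candidates):
    """Pair each low-frequency item with its best-matching remaining candidate."""
    if len(low_items) > len(candidates):
        raise ValueError("this branch assumes no more low-frequency items than candidates")
    remaining = list(candidates)
    pairs = []
    # First sorted list: ascending access count; the loop covers steps 6.4 and 6.5.
    for low in sorted(low_items, key=lambda it: it.access_count):
        # Steps 6.1 and 6.2: pick the candidate with the greatest matching degree.
        best = max(remaining, key=lambda c: match_degree(low.summary, c.summary))
        pairs.append((low.item_id, best.item_id))
        # Step 6.3: the chosen candidate is no longer available.
        remaining.remove(best)
    return pairs


low_items = [Item("L1", 30, "c++ tutorial basics"), Item("L2", 10, "python data analysis")]
candidates = [
    Item("C1", 500, "python data science"),
    Item("C2", 450, "c++ advanced tutorial"),
    Item("C3", 300, "cooking recipes"),
]
pairs = pair_low_with_candidates(low_items, candidates)
```

Because the list is sorted by ascending access count, `L2` is processed before `L1`, so the least-accessed items get first choice of candidates.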
Alternatively, when the number of low-frequency data items in the low-frequency data item set of the current storage device is greater than the number of candidate data items in the candidate data item set of the current storage device, all low-frequency data items in the low-frequency data item set of the current storage device are divided into a plurality of low-frequency data item groups such that the total access count of all low-frequency data items in each group is greater than 1.5 times the low-frequency count threshold, and the average access count of all low-frequency data items in each group is determined, where the absolute value of the difference between the average access counts of the groups is less than 20. Preferably, the absolute value of the difference between the average access counts of any two of the low-frequency data item groups is less than 20, 30, 40, 50, 60, 70, or any other reasonable value.
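The text states the constraints on the groups but not how the groups are formed. One possible approach, shown here only as an assumption, is a greedy pass that closes a group as soon as its total access count exceeds 1.5 times the threshold, followed by a separate check of the average-difference condition. The function names and sample counts are illustrative.

```python
def group_low_frequency(counts, count_threshold, factor=1.5):
    """Greedily group access counts so each group's total exceeds factor * threshold."""
    limit = factor * count_threshold
    groups, current = [], []
    for c in counts:
        current.append(c)
        if sum(current) > limit:
            groups.append(current)
            current = []
    if current:
        if groups:
            # Leftover items join the last group so every group still exceeds the limit.
            groups[-1].extend(current)
        else:
            # Not enough total accesses for even one qualifying group.
            groups.append(current)
    return groups


def averages_within(groups, max_diff=20):
    """Check that the group averages differ pairwise by less than max_diff."""
    avgs = [sum(g) / len(g) for g in groups]
    return max(avgs) - min(avgs) < max_diff


counts = [60, 50, 70, 40, 55, 65, 45]
groups = group_low_frequency(counts, count_threshold=100)
ok = averages_within(groups)
```

A greedy pass does not guarantee that the average-difference condition holds; when `averages_within` returns `False`, a different grouping would have to be tried.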
After the data balance of the big data storage system is determined to be in a non-equilibrium state, the method further includes:
Performing a data access operation for each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device.
When it is determined that no saved data access request remains in the buffer area of the system buffer device, a new data access request that the big data storage system receives from any data requester is parsed to obtain a new query condition.
The multiple data items involved in the new query condition are determined in the directory storage server of the big data storage system, and at least one target storage device in the big data storage system involved in those data items is determined.
The new query condition is sent to each target storage device, and at least one data item meeting the new query condition is received from each target storage device.
All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the new data access request.
Performing a data access operation for each data access request in the buffer area according to the current queue order of the multiple data access requests in the buffer area of the system buffer device includes:
8.1. The currently processed data access request is determined according to the current queue order of the multiple data access requests in the buffer area of the system buffer device, where the currently processed data access request is the request ranked first in the current queue order.
8.2. The currently processed data access request is parsed to obtain the currently processed query condition.
8.3. The multiple data items involved in the currently processed query condition are determined in the directory storage server of the big data storage system, and at least one target storage device involved in those data items is determined.
8.4. The currently processed query condition is sent to each target storage device, and at least one data item meeting the currently processed query condition is received from each target storage device.
8.5. All data items received from the target storage devices are formed into a target data item set, and the target data item set is sent to the data requester indicated by the currently processed data access request.
8.6. The data access request ranked first in the current queue order of the multiple data access requests in the buffer area is deleted.
8.7. It is determined whether any saved data access request remains in the buffer area of the system buffer device. If so, step 8.1 is performed; if not, it is determined that no saved data access request remains in the buffer area of the system buffer device.
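Steps 8.1 to 8.7 describe a first-in, first-out drain of the buffer area, sketched below. The callables `parse`, `lookup_devices`, `query_device`, and `send` are hypothetical stand-ins for request parsing, the directory storage server lookup, the per-device query, and the reply to the data requester; the in-memory `store` is toy data.

```python
from collections import deque


def drain_buffer(buffer, parse, lookup_devices, query_device, send):
    """Process buffered requests in queue order until the buffer is empty."""
    while buffer:                                # 8.7: repeat while requests remain
        request = buffer[0]                      # 8.1: request ranked first
        condition = parse(request)               # 8.2: obtain the query condition
        targets = lookup_devices(condition)      # 8.3: find target storage devices
        result = []
        for device in targets:                   # 8.4: query each target device
            result.extend(query_device(device, condition))
        send(request, result)                    # 8.5: return the target data item set
        buffer.popleft()                         # 8.6: delete the processed request


# Toy stand-ins for the storage devices and the requesters.
store = {"d1": ["apple pie", "banana"], "d2": ["apple juice"]}
buffer = deque([{"from": "u1", "q": "apple"}, {"from": "u2", "q": "banana"}])
sent = []

drain_buffer(
    buffer,
    parse=lambda r: r["q"],
    lookup_devices=lambda q: [d for d, items in store.items() if any(q in i for i in items)],
    query_device=lambda d, q: [i for i in store[d] if q in i],
    send=lambda r, res: sent.append((r["from"], res)),
)
```

The request is removed only after its result has been sent, mirroring the order of steps 8.5 and 8.6.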

Claims (10)

1. A method for determining the data balance of a big data storage system, the method comprising:
in response to receiving a request for determining the data balance of the big data storage system, redirecting new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among a plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in a new data access request and each temporary data item in a temporary data item set of the system buffer device to determine the content matching degree of each temporary data item, selects from the plurality of temporary data items at least one selected temporary data item whose content matching degree is greater than a matching degree threshold, sends the at least one selected temporary data item to the data requester indicated by the new data access request, and saves the new data access request in a buffer area of the system buffer device;
when it is determined that no data access operation is currently running on any storage device in the big data storage system, obtaining the running log file of each of the plurality of storage devices in the big data storage system, determining, based on the current statistical period and the running log file of each storage device, the collated access information of the plurality of data items stored on each storage device, and determining the access information statistics file of each storage device according to the collated access information of the plurality of data items stored on that storage device; wherein the access information statistics file comprises a data item statistics table, the data item statistics table comprises a plurality of data item records, and the content of each data item record is a 6-tuple <identifier of the data item, access count, statistics start time, statistics end time, storage size, storage start time>;
parsing the access information statistics file of each storage device; determining, among all data items of each storage device, every data item whose access count in the current statistical period is less than a low-frequency count threshold to be a low-frequency data item, and determining the low-frequency item count of the low-frequency data items on each storage device; determining each storage device whose low-frequency item count is greater than a low-frequency device threshold to be a low-frequency storage device, and determining the number of low-frequency storage devices in the big data storage system; determining each storage device whose low-frequency item count is less than or equal to the low-frequency device threshold to be a non-low-frequency storage device, and determining the number of non-low-frequency storage devices in the big data storage system;
based on the access information statistics file of each low-frequency storage device, determining the storage size and access count of each low-frequency data item on that device, the low-frequency item count of that device, and the total access count of all data items on that device; and determining the total storage capacity of each low-frequency storage device according to the identifier of the device and the device description information in a system record device of the big data storage system,
based on the access information statistics file of each non-low-frequency storage device, determining the storage size and access count of each low-frequency data item on that device, the low-frequency item count of that device, and the total access count of all data items on that device; and determining the total storage capacity of each non-low-frequency storage device according to the identifier of the device and the device description information in the system record device of the big data storage system;
calculating the balance coefficient of the big data storage system:
wherein DE is the balance coefficient of the big data storage system,
wherein DLB is the balance coefficient of the low-frequency storage devices in the big data storage system;
LTN_i is the low-frequency item count of the i-th low-frequency storage device, and LDN is the number of low-frequency storage devices in the big data storage system; LTS_ij is the storage size of the j-th low-frequency data item on the i-th low-frequency storage device, LS_i is the total storage size of all low-frequency data items on the i-th low-frequency storage device, and LC_i is the total storage capacity of the i-th low-frequency storage device,
LTA_ij is the access count of the j-th low-frequency data item on the i-th low-frequency storage device, LA_i is the total access count of all low-frequency data items on the i-th low-frequency storage device, and LT_i is the total access count of all data items on the i-th low-frequency storage device;
wherein i and j are natural numbers with 1 ≤ i ≤ LDN and 1 ≤ j ≤ LTN_i, where LDN ≥ 100 and LTN_i ≥ 100;
wherein NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system;
NLTN_m is the low-frequency item count of the m-th non-low-frequency storage device, and NLDN is the number of non-low-frequency storage devices in the big data storage system; NLTS_mn is the storage size of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLS_m is the total storage size of all low-frequency data items on the m-th non-low-frequency storage device, and NLC_m is the total storage capacity of the m-th non-low-frequency storage device,
NLTA_mn is the access count of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLA_m is the total access count of all low-frequency data items on the m-th non-low-frequency storage device, and NLT_m is the total access count of all data items on the m-th non-low-frequency storage device;
wherein m and n are natural numbers with 1 ≤ m ≤ NLDN and 1 ≤ n ≤ NLTN_m, where NLDN ≥ 100 and NLTN_m ≥ 50; and
when the balance coefficient DE of the big data storage system is greater than a system balance coefficient threshold, determining that the data balance of the big data storage system is in a non-equilibrium state.
2. The method according to claim 1, wherein, when a data management device located outside the big data storage system needs to determine the data balance of the big data storage system, the data management device sends to the big data storage system the request for determining the data balance of the big data storage system;
wherein redirecting the new data access requests that the big data storage system receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices, comprises:
starting from the moment at which the big data storage system receives the request for determining the data balance of the big data storage system, redirecting the new data access requests that the big data storage system subsequently receives from any data requester to the system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among the plurality of storage devices;
wherein the new data access request comprises a query condition and description information of the query condition, the temporary data item set comprises a plurality of temporary data items, and each temporary data item has summary information that briefly introduces the content of the temporary data item;
wherein the system buffer device performing content matching between the description information of the query condition contained in the new data access request and each temporary data item in the temporary data item set of the system buffer device to determine the content matching degree of each temporary data item comprises:
the system buffer device performing, between the description information of the query condition contained in the new data access request and the summary information of each temporary data item in the temporary data item set of the system buffer device, content matching based on semantic content comparison, content matching based on keyword comparison, or content matching based on a combination of semantic content and keywords, to determine the content matching degree between each temporary data item and the query condition;
wherein the matching degree threshold is 60%, and the content matching degree lies in the range [0%, 100%];
wherein, after the new data access request is saved in the buffer area of the system buffer device, the method further comprises: sending, to the data requester indicated by the new data access request, a response message indicating that the big data storage system has paused data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue position in the buffer area of the new data access request from the data requester, wherein the current queue order of the new data access requests in the buffer area is determined according to the length of time for which each new data access request has been saved in the buffer area, and the new data access requests are ranked in the current queue order by descending length of time saved.
3. The method according to any one of claims 1-2, wherein the running log file is saved in a system data area of each storage device;
wherein the current statistical period is a period that begins on the day before the current date on which the big data storage system receives the request for determining the data balance of the big data storage system and extends backward over a predetermined number of calendar days; wherein the predetermined number of calendar days is 10, 20, or 30; and wherein determining, based on the current statistical period and the running log file of each storage device, the collated access information of the plurality of data items stored on each storage device comprises:
selecting, based on the current statistical period, from all log records in the running log file of each storage device, to obtain a plurality of log records of each storage device within the current statistical period;
classifying the plurality of log records of each storage device within the current statistical period by data item, to obtain the collated access information of each data item;
forming, from the collated access information of each data item, the collated access information of the plurality of data items stored on each storage device;
wherein each log record comprises: the identifier of the data item, an access start time, an access end time, a storage size, and a storage start time;
wherein each data item has summary information that briefly introduces the content of the data item.
4. The method according to any one of claims 1-3,
wherein determining the access information statistics file of each storage device according to the collated access information of the plurality of data items stored on each storage device comprises:
counting the collated access information of each of the plurality of data items stored on each storage device to determine the access count of each data item;
determining the access start time of the first access in the collated access information of each data item to be the statistics start time, and determining the access end time of the last access in the collated access information of each data item to be the statistics end time;
determining the storage size of each data item based on the collated access information of that data item; and
determining the storage start time of each data item on the storage device according to the storage information file in the storage information area of each storage device.
5. The method according to any one of claims 1-4,
wherein the low-frequency count threshold is 100, 120, 150, or 200;
the device description information in the system record device comprises: the total number of storage devices in the big data storage system, the total storage capacity of each storage device, the network address of each storage device, and/or the time at which each storage device joined the big data storage system;
the storage information file in the storage information area of each storage device comprises: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item, and the free storage capacity of each storage device;
the low-frequency device threshold is 100, 120, 150, 200, 300, 400, or 500; and the system balance coefficient threshold is 50%, 55%, 60%, 65%, or 70%.
6. A system for determining the data balance of a big data storage system, the system comprising:
a preprocessing unit which, in response to receiving a request for determining the data balance of the big data storage system, redirects new data access requests that the big data storage system receives from any data requester to a system buffer device of the big data storage system, instead of sending the received new data access requests to the corresponding storage devices among a plurality of storage devices, so that the system buffer device performs content matching between the description information of the query condition contained in a new data access request and each temporary data item in a temporary data item set of the system buffer device to determine the content matching degree of each temporary data item, selects from the plurality of temporary data items at least one selected temporary data item whose content matching degree is greater than a matching degree threshold, sends the at least one selected temporary data item to the data requester indicated by the new data access request, and saves the new data access request in a buffer area of the system buffer device;
a statistics unit which, when it is determined that no data access operation is currently running on any storage device in the big data storage system, obtains the running log file of each of the plurality of storage devices in the big data storage system, determines, based on the current statistical period and the running log file of each storage device, the collated access information of the plurality of data items stored on each storage device, and determines the access information statistics file of each storage device according to the collated access information of the plurality of data items stored on that storage device; wherein the access information statistics file comprises a data item statistics table, the data item statistics table comprises a plurality of data item records, and the content of each data item record is a 6-tuple <identifier of the data item, access count, statistics start time, statistics end time, storage size, storage start time>;
a computing unit which parses the access information statistics file of each storage device; determines, among all data items of each storage device, every data item whose access count in the current statistical period is less than a low-frequency count threshold to be a low-frequency data item, and determines the low-frequency item count of the low-frequency data items on each storage device; determines each storage device whose low-frequency item count is greater than a low-frequency device threshold to be a low-frequency storage device, and determines the number of low-frequency storage devices in the big data storage system; and determines each storage device whose low-frequency item count is less than or equal to the low-frequency device threshold to be a non-low-frequency storage device, and determines the number of non-low-frequency storage devices in the big data storage system;
based on the access information statistics file of each low-frequency storage device, determines the storage size and access count of each low-frequency data item on that device, the low-frequency item count of that device, and the total access count of all data items on that device; and determines the total storage capacity of each low-frequency storage device according to the identifier of the device and the device description information in a system record device of the big data storage system,
based on the access information statistics file of each non-low-frequency storage device, determines the storage size and access count of each low-frequency data item on that device, the low-frequency item count of that device, and the total access count of all data items on that device; and determines the total storage capacity of each non-low-frequency storage device according to the identifier of the device and the device description information in the system record device of the big data storage system;
and calculates the balance coefficient of the big data storage system:
where DE is the balance coefficient of the big data storage system;
where DLB is the balance coefficient of the low-frequency storage devices in the big data storage system;
LTN_i is the low-frequency item count of the i-th low-frequency storage device, and LDN is the number of low-frequency storage devices in the big data storage system; LTS_ij is the storage size of the j-th low-frequency data item on the i-th low-frequency storage device, LS_i is the total storage size of all low-frequency data items of the i-th low-frequency storage device, and LC_i is the total storage capacity of the i-th low-frequency storage device;
LTA_ij is the accessed count of the j-th low-frequency data item on the i-th low-frequency storage device, LA_i is the total accessed count of all low-frequency data items of the i-th low-frequency storage device, and LT_i is the total accessed count of all data items of the i-th low-frequency storage device;
where i is a natural number with 1 ≤ i ≤ LDN and j is a natural number with 1 ≤ j ≤ LTN_i, where LDN ≥ 100 and LTN_i ≥ 100;
where NDLB is the balance coefficient of the non-low-frequency storage devices in the big data storage system;
NLTN_m is the low-frequency item count of the m-th non-low-frequency storage device, and NLDN is the number of non-low-frequency storage devices in the big data storage system; NLTS_mn is the storage size of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLS_m is the total storage size of all low-frequency data items of the m-th non-low-frequency storage device, and NLC_m is the total storage capacity of the m-th non-low-frequency storage device;
NLTA_mn is the accessed count of the n-th low-frequency data item on the m-th non-low-frequency storage device, NLA_m is the total accessed count of all low-frequency data items of the m-th non-low-frequency storage device, and NLT_m is the total accessed count of all data items of the m-th non-low-frequency storage device;
where m is a natural number with 1 ≤ m ≤ NLDN and n is a natural number with 1 ≤ n ≤ NLTN_m, where NLDN ≥ 100 and NLTN_m ≥ 50; and
A determination unit which, when the balance coefficient DE of the big data storage system is greater than the system balance coefficient threshold, determines that the data balance of the big data storage system is in an unbalanced state.
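The classification steps of the computing unit and the final comparison made by the determination unit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `ItemRecord`, `low_frequency_items`, `classify_devices` and `is_unbalanced` are hypothetical, the per-device sums follow the definitions of the total low-frequency storage size, total low-frequency accessed count and total accessed count given above, and the formula for DE is not reproduced in this text, so DE is taken as an input.

```python
from dataclasses import dataclass


@dataclass
class ItemRecord:
    """One 6-tuple from a device's data item statistics table."""
    item_id: str
    access_count: int
    stats_start: str
    stats_end: str
    size: int
    stored_since: str


def low_frequency_items(records, item_threshold):
    """Items accessed fewer times than the low-frequency count threshold."""
    return [r for r in records if r.access_count < item_threshold]


def classify_devices(tables, item_threshold, device_threshold):
    """Split devices into low-frequency and non-low-frequency groups.

    `tables` maps a device identifier to its list of ItemRecord.
    A device is low-frequency when its low-frequency item count is
    strictly greater than the low-frequency device threshold.
    """
    low, non_low = {}, {}
    for device_id, records in tables.items():
        items = low_frequency_items(records, item_threshold)
        summary = {
            "low_item_count": len(items),
            "low_item_size": sum(r.size for r in items),
            "low_item_accesses": sum(r.access_count for r in items),
            "total_accesses": sum(r.access_count for r in records),
        }
        target = low if len(items) > device_threshold else non_low
        target[device_id] = summary
    return low, non_low


def is_unbalanced(de, system_threshold):
    """Final comparison only; DE itself is assumed to be computed elsewhere."""
    return de > system_threshold
```

Keeping the per-device sums next to the classification means a later balance calculation would not need to re-parse the statistics files.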
7. The system according to claim 6, wherein, when a data management device located outside the big data storage system needs to determine the data balance of the big data storage system, the data management device sends to the big data storage system a request for determining the data balance of the big data storage system;
wherein the preprocessing unit redirecting new data access requests received by the big data storage system from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the multiple storage devices, comprises:
the preprocessing unit, taking as reference the moment at which the big data storage system receives the request for determining the data balance of the big data storage system, redirects new data access requests subsequently received by the big data storage system from any data requester to the system buffer device of the big data storage system, without sending the received new data access requests to the corresponding storage devices among the multiple storage devices;
wherein the new data access request includes a query condition and description information of the query condition, the temporary data item set includes multiple temporary data items, and each temporary data item has summary information that briefly describes the content of the temporary data item;
wherein the preprocessing unit, through the system buffer device, performing content matching between the description information of the query condition included in the new data access request and each temporary data item in the temporary data item set of the system buffer device, to determine the content matching degree of each temporary data item, comprises:
the preprocessing unit, through the system buffer device, performs content matching between the description information of the query condition included in the new data access request and the summary information of each temporary data item in the temporary data item set of the system buffer device, the content matching being based on semantic content comparison, on keyword comparison, or on a combination of semantic content and keyword comparison, to determine the content matching degree between each temporary data item and the query condition;
wherein the matching degree threshold is 60% and the content matching degree lies in the range [0%, 100%];
wherein, after saving the new data access request in the buffer area of the system buffer device, the preprocessing unit further: sends to the data requester indicated by the new data access request a response message indicating that the big data storage system has suspended data access and that the new data access request has been saved in the buffer area of the system buffer device, the response message carrying information indicating the current queue order, in the buffer area, of the new data access request from that data requester; wherein the current queue order of a new data access request in the buffer area is determined by the length of time the request has been saved in the buffer area, and within the current queue order the new data access requests are sorted in descending order of the length of time they have been saved.
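The queue ordering and the keyword-based variant of content matching described in this claim can be sketched in Python as follows. This is an illustration under assumptions: the names `BufferedRequest`, `queue_order`, `queue_position` and `keyword_match_degree` are hypothetical, and the claim does not say how a keyword-based matching degree is computed, so the share of query keywords found in the summary is used here purely as a stand-in.

```python
from dataclasses import dataclass

# Matching degree threshold of 60% stated in the claim, as a fraction.
MATCH_THRESHOLD = 0.60


@dataclass
class BufferedRequest:
    request_id: str
    saved_at: float  # time, in seconds, at which the request entered the buffer


def queue_order(requests, now):
    """Sort requests in descending order of time spent in the buffer."""
    return sorted(requests, key=lambda r: now - r.saved_at, reverse=True)


def queue_position(requests, request_id, now):
    """1-based position that a response message could carry."""
    for position, request in enumerate(queue_order(requests, now), start=1):
        if request.request_id == request_id:
            return position
    raise KeyError(request_id)


def keyword_match_degree(query_description, summary):
    """Assumed metric: share of query keywords present in the summary (0.0 to 1.0)."""
    query_words = set(query_description.lower().split())
    if not query_words:
        return 0.0
    summary_words = set(summary.lower().split())
    return len(query_words & summary_words) / len(query_words)
```

A caller would compare the value returned by `keyword_match_degree` with `MATCH_THRESHOLD`; a semantic comparison would need a different metric that is not sketched here.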
8. The system according to any one of claims 6-7, wherein the running log file is saved in the system data area of each storage device;
wherein the current statistical period is a period that starts from the day before the date on which the big data storage system receives the request for determining the data balance of the big data storage system and extends backwards over a predetermined number of natural days; wherein the predetermined number of natural days is 10, 20 or 30 natural days; wherein the statistics unit determining, based on the current statistical period and the running log file of each storage device, the counted access information of the multiple data items stored on each storage device comprises:
the statistics unit selects, based on the current statistical period, from all log records in the running log file of each storage device, to obtain the multiple log records of each storage device within the current statistical period;
the statistics unit classifies the multiple log records of each storage device within the current statistical period by data item, to obtain the counted access information of each data item;
the statistics unit composes, from the counted access information of each data item, the counted access information of the multiple data items stored on each storage device;
wherein each log record includes: data item identifier, access start time, access end time, storage size and storage start time;
wherein each data item has summary information that briefly describes the content of the data item.
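The statistical period and the selection and grouping of log records can be sketched in Python as follows. The function names and the dictionary keys `item_id` and `access_start` are hypothetical. Two further assumptions are made: the period is counted inclusively, so that 10 natural days ending on the day before the request cover exactly 10 calendar dates, and a record belongs to the period when its access start time falls inside it.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def statistical_period(request_date: date, natural_days: int):
    """Return (start, end): the period ends the day before the request
    and spans `natural_days` calendar days, both ends inclusive."""
    end = request_date - timedelta(days=1)
    start = end - timedelta(days=natural_days - 1)
    return start, end


def select_and_group(log_records, start: date, end: date):
    """Keep records whose access started inside the period, grouped by data item."""
    grouped = defaultdict(list)
    for record in log_records:
        started: datetime = record["access_start"]
        if start <= started.date() <= end:
            grouped[record["item_id"]].append(record)
    return dict(grouped)
```

For a request received on 2018-08-30 with 10 natural days, `statistical_period` yields the dates 2018-08-20 through 2018-08-29.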
9. The system according to any one of claims 6-8,
wherein the statistics unit determining the access information statistics file of each storage device according to the counted access information of the multiple data items stored on each storage device comprises:
the statistics unit counts the counted access information of each of the multiple data items stored on each storage device to determine the accessed count of each data item;
the statistics unit determines the access start time of the first access in the counted access information of each data item as the statistics start time, and determines the access end time of the last access in the counted access information of each data item as the statistics end time;
the statistics unit determines the storage size of each data item based on the counted access information of each data item;
the statistics unit determines the storage start time of each data item on the storage device according to the storage information file in the storage information area of each storage device.
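The assembly of one data item record from the log records of that item can be sketched in Python as follows. The function name `build_item_record` and the dictionary keys are hypothetical. It is assumed that each log record corresponds to one access, that "first" and "last" refer to the order of access start times, and that the storage size is read from the most recent record; the storage start time is passed in because the claim takes it from the storage information file.

```python
from datetime import datetime


def build_item_record(item_id, records, stored_since):
    """Assemble the 6-tuple <identifier, accessed count, statistics start time,
    statistics end time, storage size, storage start time> for one data item."""
    if not records:
        raise ValueError("no log records for data item " + item_id)
    ordered = sorted(records, key=lambda r: r["access_start"])
    first, last = ordered[0], ordered[-1]
    return (
        item_id,
        len(ordered),           # one log record per access (assumption)
        first["access_start"],  # start of the first access
        last["access_end"],     # end of the last access
        last["size"],           # size from the most recent record (assumption)
        stored_since,           # from the storage information file
    )


# Illustrative input with made-up values.
example = build_item_record(
    "item-1",
    [
        {"access_start": datetime(2018, 8, 21, 10, 0),
         "access_end": datetime(2018, 8, 21, 10, 5), "size": 4096},
        {"access_start": datetime(2018, 8, 20, 8, 0),
         "access_end": datetime(2018, 8, 20, 8, 1), "size": 4096},
    ],
    stored_since=datetime(2018, 1, 5),
)
```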
10. The system according to any one of claims 6-9,
wherein the low-frequency count threshold is 100, 120, 150 or 200;
the device description information in the system recording device includes: the total number of storage devices included in the big data storage system, the total storage capacity of each storage device, the network address of each storage device and/or the time at which each storage device joined the big data storage system;
the storage information file in the storage information area of each storage device includes: the total number of data items, the storage size of each data item, the storage start time of each data item, the identifier of each data item, the summary information of each data item and the free storage capacity of each storage device;
the low-frequency device threshold is 100, 120, 150, 200, 300, 400 or 500; and the system balance coefficient threshold is 50%, 55%, 60%, 65% or 70%.
CN201811005484.3A 2018-08-30 2018-08-30 Method and system for determining the data balance of a big data storage system Withdrawn CN109271101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811005484.3A CN109271101A (en) 2018-08-30 2018-08-30 Method and system for determining the data balance of a big data storage system

Publications (1)

Publication Number Publication Date
CN109271101A true CN109271101A (en) 2019-01-25

Family

ID=65154703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811005484.3A Withdrawn CN109271101A (en) 2018-08-30 2018-08-30 Method and system for determining the data balance of a big data storage system

Country Status (1)

Country Link
CN (1) CN109271101A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461620A (en) * 2020-04-09 2020-07-28 中振区块链(深圳)有限公司 Block chain-based distributed storage method and device for logistics data
CN111461620B (en) * 2020-04-09 2023-08-01 海口慧海医药有限公司 Distributed storage method and device based on block chain logistics data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20190125)