CN110134738A - Distributed storage system resource estimation method and device - Google Patents


Info

Publication number
CN110134738A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910425874.4A
Other languages
Chinese (zh)
Other versions
CN110134738B (en)
Inventor
穆纯进
尹正军
马骁
王项男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Application filed by China United Network Communications Group Co Ltd, Unicom Big Data Co Ltd
Priority to CN201910425874.4A
Publication of CN110134738A
Application granted
Publication of CN110134738B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Abstract

This application discloses a resource estimation method and device for a distributed storage system. The method comprises: receiving a resource occupancy query request for each cluster in the distributed storage system; obtaining the metadata of each cluster in the distributed storage system according to the resource occupancy query request; obtaining, from the metadata of each cluster, the resource parameters that each cluster currently occupies, which include the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task; and calculating the storage resource occupancy parameters of the distributed storage system from the resource parameters that each cluster currently occupies. By extracting the metadata of each cluster in the distributed storage system and using that metadata to determine the resource parameters each cluster currently occupies, the method completes the resource estimation for the distributed storage system.

Description

Distributed storage system resource estimation method and device
Technical field
The application belongs to the field of data processing, and specifically relates to a resource estimation method and device for a distributed storage system.
Background
In the big data era, enterprises generate more and more data, big data clusters keep growing, and enterprise costs rise steeply. If the resources a job will occupy could be estimated in advance, before large-scale jobs are submitted, this would greatly help job optimization, thereby reducing cluster resource consumption, safeguarding cluster stability, and lowering enterprise costs.
At present, resource occupancy is estimated either by building a test environment and testing in it, or by trial runs directly on the online cluster. This approach raises enterprise costs and adversely affects the online cluster.
Summary of the invention
Resource occupancy is currently estimated by testing in a purpose-built test environment or by trial runs directly online, which raises enterprise costs and adversely affects the online cluster. To address this problem, the application provides a distributed storage system resource estimation method and device.
The application provides a distributed storage system resource estimation method, comprising:
receiving a resource occupancy query request for each cluster in a distributed storage system;
obtaining the metadata of each cluster in the distributed storage system according to the resource occupancy query request;
obtaining, according to the metadata of each cluster, the resource parameters that each cluster currently occupies, where the currently occupied resource parameters include the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task;
calculating the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies.
Optionally, the step of obtaining the metadata of each cluster in the distributed storage system according to the resource occupancy query request comprises:
collecting, at a predetermined period, the binary-format metadata file saved in the distributed storage system, and converting the metadata file to text format;
extracting metadata from the text-format metadata file, where the metadata includes at least one of the following, or any combination of them: file system directory name, file system directory, access user, user group, permission, file path, file modification time, file access time.
Optionally, the step of obtaining, according to the metadata of each cluster, the resource parameters that each cluster currently occupies comprises:
parsing the identifier of the data to be queried from the resource occupancy query request, and obtaining the metadata of the data to be queried according to that identifier;
determining, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs;
sending the resource occupancy query request to the target cluster;
receiving the currently occupied resource parameters sent by the target cluster.
Optionally, after the step of determining the target cluster to which the data to be queried belongs, and before the step of sending the resource occupancy query request to the target cluster, the method further comprises:
judging whether the target cluster is a heterogeneous cluster and, if so, rewriting the resource occupancy query request;
in which case sending the resource occupancy query request to the target cluster comprises: sending the rewritten resource occupancy query request to the target cluster.
Optionally, the storage resource occupancy parameters include the number of tasks and the memory occupancy, and the step of calculating the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies comprises:
for each cluster, determining the maximum of the number of data files and the number of data blocks, calculating the sum of that maximum and the data volume, and calculating the ratio of that sum to the data block size, to obtain the number of tasks of the cluster;
determining the number of tasks of the distributed storage system from the number of tasks of each cluster;
for each cluster, calculating the product of the number of tasks and the memory required to process each task, to obtain the memory occupancy of the cluster;
determining the memory occupancy of the distributed storage system from the memory occupancy of each cluster.
The application also provides a distributed storage system resource estimation device, comprising:
a receiving module, configured to receive a resource occupancy query request for each cluster in a distributed storage system;
a first obtaining module, configured to obtain the metadata of each cluster in the distributed storage system according to the resource occupancy query request;
a second obtaining module, configured to obtain, according to the metadata of each cluster, the resource parameters that each cluster currently occupies, where the currently occupied resource parameters include the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task;
a computing module, configured to calculate the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies.
Optionally, the first obtaining module comprises:
a collecting submodule, configured to collect, at a predetermined period, the binary-format metadata file saved in the distributed storage system, and to convert the metadata file to text format;
an extracting submodule, configured to extract metadata from the text-format metadata file, where the metadata includes at least one of the following, or any combination of them: file system directory name, file system directory, access user, user group, permission, file path, file modification time, file access time.
Optionally, the second obtaining module comprises:
an acquisition submodule, configured to parse the identifier of the data to be queried from the resource occupancy query request, and to obtain the metadata of the data to be queried according to that identifier;
a determining submodule, configured to determine, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs;
a sending submodule, configured to send the resource occupancy query request to the target cluster;
a receiving submodule, configured to receive the currently occupied resource parameters sent by the target cluster.
Optionally, the second obtaining module further comprises:
a judging module, configured to judge whether the target cluster is a heterogeneous cluster and, if so, to rewrite the resource occupancy query request;
in which case the sending submodule is specifically configured to send the rewritten resource occupancy query request to the target cluster.
Optionally, the computing module comprises:
a first computing submodule, configured to, for each cluster, determine the maximum of the number of data files and the number of data blocks, calculate the sum of that maximum and the data volume, and calculate the ratio of that sum to the data block size, to obtain the number of tasks of the cluster;
a second computing submodule, configured to determine the number of tasks of the distributed storage system from the number of tasks of each cluster;
a third computing submodule, configured to, for each cluster, calculate the product of the number of tasks and the memory required to process each task, to obtain the memory occupancy of the cluster;
a fourth computing submodule, configured to determine the memory occupancy of the distributed storage system from the memory occupancy of each cluster.
The distributed storage system resource estimation method provided by this application builds a full-dimensional profile of the distributed storage system: it extracts the metadata of each cluster in the distributed storage system, parses the submitted resource occupancy query request, and determines from the metadata the resource parameters that each cluster currently occupies. It then combines those parameters with the calculation flow to estimate the job's total number of tasks and total memory occupancy, thereby completing the resource estimation for the distributed storage system.
Description of the drawings
Fig. 1 is a flowchart of a distributed storage system resource estimation method provided by the first embodiment of the application;
Fig. 2 shows an optional implementation of step S2 in Fig. 1, provided by the first embodiment of the application;
Fig. 3 shows an optional implementation of step S3 in Fig. 1, provided by the first embodiment of the application;
Fig. 4 shows an optional implementation of step S4 in Fig. 1, provided by the first embodiment of the application;
Fig. 5 is a structural diagram of a distributed storage system resource estimation device provided by the second embodiment of the application;
Fig. 6 is another structural diagram of the distributed storage system resource estimation device provided by the second embodiment of the application;
Fig. 7 is another structural diagram of the distributed storage system resource estimation device provided by the second embodiment of the application;
Fig. 8 is another structural diagram of the distributed storage system resource estimation device provided by the second embodiment of the application.
Detailed description
To help those skilled in the art better understand the technical solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
The application provides a distributed storage system resource estimation method and device. Each is described in detail below with reference to the drawings of the embodiments provided by the application.
The distributed storage system resource estimation method provided by the first embodiment of the application is as follows:
Fig. 1 shows a distributed storage system resource estimation method provided by an embodiment of the application, which comprises the following steps.
Step S1: receive a resource occupancy query request for each cluster in the distributed storage system.
In this step, a resource occupancy query request, that is, an SQL query request, is received for each cluster in the distributed storage system. SQL is the abbreviation of Structured Query Language. It is a database query and programming language used to access data and to query, update, and manage relational database systems; .sql is also the file extension of database script files.
Step S2: obtain the metadata of each cluster in the distributed storage system according to the resource occupancy query request.
Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes the attributes (properties) of data and supports functions such as indicating storage location, recording history, locating resources, and recording files. Metadata can be regarded as a kind of electronic catalogue: to serve as a catalogue, it must describe and collect the content or characteristics of the data, and it thereby assists data retrieval. Metadata is information about the organization of data, about data domains, and about their relationships. In short, metadata is data about data.
Preferably, as shown in Fig. 2, step S2, obtaining the metadata of each cluster in the distributed storage system according to the resource occupancy query request, comprises:
Step S201: collect, at a predetermined period, the binary-format metadata file saved in the distributed storage system, and convert the metadata file to text format.
A distributed storage system keeps the details of the whole system's directories and files in memory. To prevent the in-memory data from being lost after a crash, the storage system serializes it to disk in binary form at regular intervals.
In this step, the binary file in the distributed storage system is collected periodically and deserialized into text format so that metadata can be extracted from it. The predetermined period is a preset value that can be configured as required and is not limited here.
Step S202: extract metadata from the text-format metadata file.
In this step, the metadata of each cluster is extracted by distributed computation. The extraction involves custom key-value pairs, secondary sorting, custom partitioning, custom merging, and custom grouping. The extracted metadata includes at least one of the following, or any combination of them: file system directory name, file system directory, access user (the user who owns the file), user group, permission, file path, file modification time, file access time. It may also include other data, such as directory capacity (the capacity of a folder), directory file count (the number of files under a folder), maximum, minimum, and average file size, and file format.
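The extraction in step S202 can be illustrated with a small single-process sketch. It assumes the text-format metadata file is a tab-delimited dump with a header row; the column names (`path`, `owner`, `group`, `permission`, `mtime`, `atime`, `size`) and the sample rows are hypothetical, since the source does not specify the layout, and the distributed key-value, sort, and partition machinery is replaced here by plain in-memory aggregation.

```python
import csv
import io
from collections import defaultdict
from posixpath import dirname

# Hypothetical layout of the text-format metadata dump (an assumption).
SAMPLE_DUMP = (
    "path\towner\tgroup\tpermission\tmtime\tatime\tsize\n"
    "/warehouse/orders/part-0\tetl\tdata\trw-r--r--\t1558500000\t1558600000\t1048576\n"
    "/warehouse/orders/part-1\tetl\tdata\trw-r--r--\t1558500100\t1558600100\t2097152\n"
    "/warehouse/users/part-0\tetl\tdata\trw-r-----\t1558400000\t1558400500\t524288\n"
)

def extract_metadata(text):
    """Parse the delimited dump into one record (dict) per file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        row["size"] = int(row["size"])            # file size in bytes
        row["directory"] = dirname(row["path"])   # parent directory of the file
        records.append(row)
    return records

def summarize_directories(records):
    """Aggregate the file count and total capacity of each directory."""
    summary = defaultdict(lambda: {"file_count": 0, "capacity": 0})
    for rec in records:
        entry = summary[rec["directory"]]
        entry["file_count"] += 1
        entry["capacity"] += rec["size"]
    return dict(summary)

records = extract_metadata(SAMPLE_DUMP)
directory_summary = summarize_directories(records)
```

The per-directory summary corresponds to the optional "directory capacity" and "directory file count" fields mentioned above; a production version would run the same grouping as a distributed job.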
Specifically, an enterprise often has more than one big data cluster; it may have several clusters, including heterogeneous clusters of different types. A submitted SQL query may need several clusters to compute together. Whether it runs on a single cluster or on multiple heterogeneous clusters, the query consumes resources and requires a resource occupancy estimate, so the system must collect the metadata of every cluster. Each heterogeneous cluster exposes its metadata through an HTTP interface, and the metadata acquirer calls this HTTP interface in its program to obtain the metadata of each heterogeneous cluster. In other words, the steps above are executed for each cluster.
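A minimal sketch of collecting metadata from each cluster's HTTP interface is shown below. The endpoint URLs, the JSON response format, and the cluster names are assumptions made for illustration; the fetch function is injectable so that the example runs with a stub instead of a real network call.

```python
import json
from urllib.request import urlopen

def fetch_json(url, timeout=10):
    """Default fetcher: GET the URL and decode a JSON body (assumed format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def collect_cluster_metadata(endpoints, fetch=fetch_json):
    """Call every cluster's metadata endpoint and key the results by cluster name."""
    collected = {}
    for cluster_name, url in endpoints.items():
        collected[cluster_name] = fetch(url)
    return collected

# Hypothetical endpoints, one per cluster.
ENDPOINTS = {
    "hive-prod": "http://hive-prod.example/metadata",
    "hbase-prod": "http://hbase-prod.example/metadata",
}

def stub_fetch(url):
    """Stand-in for the network call so the example is self-contained."""
    return {"source": url, "directories": []}

metadata_by_cluster = collect_cluster_metadata(ENDPOINTS, fetch=stub_fetch)
```

Passing the fetcher as a parameter keeps the collection loop identical for every cluster type, which matches the statement that the same steps are executed for each cluster.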
Step S3: obtain, according to the metadata of each cluster, the resource parameters that each cluster currently occupies.
In this step, the data each cluster needs for resource assessment is obtained from the metadata of each cluster. This data is the set of resource parameters that each cluster currently occupies: the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task. Note that the currently occupied resource parameters are dynamic data generated during data processing.
Preferably, as shown in Fig. 3, step S3, obtaining the resource parameters that each cluster currently occupies according to the metadata of each cluster, comprises:
Step S301: parse the identifier of the data to be queried from the resource occupancy query request, and obtain the metadata of the data to be queried according to that identifier.
In this step, when an SQL job is submitted, the logical execution plan and the physical execution plan of the SQL are parsed. The identifier of the data to be queried is obtained from the parsed logical and physical execution plans, and the metadata corresponding to the data to be queried is obtained from that identifier. At this point the data itself has not been read; only the identifier of the data that the query will access has been resolved.
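The idea of step S301 can be sketched as follows. The source derives the identifiers from the logical and physical execution plans; this sketch swaps in a much simpler technique, a regular expression over FROM and JOIN clauses, purely to illustrate the flow from query text to identifiers to metadata. The catalog contents and table names are hypothetical.

```python
import re

# Matches the identifier that follows FROM or JOIN. This is a deliberate
# simplification and not a substitute for parsing an execution plan.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def identify_queried_tables(sql):
    """Return table identifiers from FROM/JOIN clauses, in order, without duplicates."""
    seen = []
    for name in TABLE_PATTERN.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen

def lookup_metadata(tables, catalog):
    """Fetch the metadata record of each identified table from a catalog mapping."""
    return {table: catalog[table] for table in tables if table in catalog}

# Hypothetical metadata catalog keyed by table identifier.
catalog = {
    "sales.orders": {"path": "/warehouse/sales/orders"},
    "sales.users": {"path": "/warehouse/sales/users"},
}
sql = "SELECT u.name, o.total FROM sales.orders o JOIN sales.users u ON o.uid = u.id"
tables = identify_queried_tables(sql)
table_metadata = lookup_metadata(tables, catalog)
```

Only identifiers are resolved here; no table data is read, which mirrors the remark that the data to be queried has not yet been parsed at this stage.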
Step S302: determine, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs.
In this step, the metadata obtained in the previous step is used: the file system directory name, file system directory, and file path in the metadata determine which cluster the file path corresponds to.
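One way to map a file path to its cluster is a longest-prefix match against the root directory of each cluster. The sketch below assumes such a root-to-cluster mapping exists; the paths and cluster names are hypothetical and the source does not prescribe this particular matching rule.

```python
# Hypothetical mapping from file-system root paths to cluster names.
CLUSTER_ROOTS = {
    "/warehouse": "hive-prod",
    "/warehouse/realtime": "hbase-prod",
    "/logs": "es-prod",
}

def find_target_cluster(file_path, cluster_roots=CLUSTER_ROOTS):
    """Return the cluster whose root is the longest path prefix of file_path, or None."""
    best_root = None
    for root in cluster_roots:
        # Match whole path components only, so "/warehousex" does not match "/warehouse".
        if file_path == root or file_path.startswith(root.rstrip("/") + "/"):
            if best_root is None or len(root) > len(best_root):
                best_root = root
    return cluster_roots[best_root] if best_root is not None else None
```

For example, `find_target_cluster("/warehouse/realtime/events")` selects the more specific `/warehouse/realtime` root rather than `/warehouse`.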
Step S303: send the resource occupancy query request to the target cluster.
In this step, once the metadata from the previous step has identified which cluster the data to be queried belongs to, the resource occupancy query request is routed to that target cluster.
Preferably, after step S302 and before step S303, the method further comprises: judging whether the target cluster is a heterogeneous cluster and, if so, rewriting the resource occupancy query request. In that case step S303, sending the resource occupancy query request to the target cluster, comprises: sending the rewritten resource occupancy query request to the target cluster.
A distributed storage system may contain heterogeneous clusters, so the SQL needs a certain amount of rewriting during routing. For each heterogeneous cluster, the SQL is rewritten into statements suited to that cluster; the specific rewriting rules are set as required and are not limited here. After the target heterogeneous cluster corresponding to the data to be queried is identified, the SQL statement is rewritten for that heterogeneous cluster, and the rewritten SQL statement is routed to the corresponding target heterogeneous cluster.
Taking a real big data system as an example: the SQL syntax supported by Hive, ES, and HBase clusters differs to some degree. The cluster type can be obtained from the metadata, and the execution syntax is adapted to the particular differences of that cluster type. For example, HBase itself does not support SQL, so the SQL statement must be converted into calls to HBase's built-in API for the computation. An API is a calling interface that the operating system leaves for applications; by calling the operating system's API, an application has the operating system execute the application's commands.
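The routing-with-rewrite step can be sketched as a dispatch table keyed by cluster type. The rewriter functions below are placeholders: the source leaves the rewriting rules open, so the functions only tag the request with the form the target cluster is assumed to need, and the actual translation (for instance from SQL to HBase API calls) is outside this sketch. All function names and the `default_type` parameter are hypothetical.

```python
def rewrite_for_es(sql):
    # Placeholder: a real rewriter would adapt the SQL dialect for ES.
    return {"kind": "sql", "dialect": "es", "body": sql}

def rewrite_for_hbase(sql):
    # HBase does not accept SQL, so the request is marked for translation
    # into native API calls; the translation itself is not implemented here.
    return {"kind": "native_api", "dialect": "hbase", "body": sql}

REWRITERS = {"es": rewrite_for_es, "hbase": rewrite_for_hbase}

def route_query(sql, cluster_type, default_type="hive"):
    """Rewrite the query only when the target cluster is heterogeneous."""
    if cluster_type == default_type:
        # Same type as the submitting system: forward the request unchanged.
        return {"kind": "sql", "dialect": default_type, "body": sql}
    if cluster_type not in REWRITERS:
        raise ValueError(f"no rewriter registered for cluster type {cluster_type!r}")
    return REWRITERS[cluster_type](sql)
```

Keeping the rewriters in a table means a new cluster type is supported by registering one more function, without touching the routing logic.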
Step S304: receive the currently occupied resource parameters sent by the target cluster.
In this step, the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task are queried from each target cluster according to the metadata. The extracted metadata mainly includes at least one of the following, or any combination of them: file system directory name, file system directory, access user, user group, permission, file path, file modification time, file access time. The file system directory name, file system directory, and file path indicate which cluster the metadata corresponds to; the access user, user group, permission, file modification time, and file access time are used to obtain each target cluster's number of data files, data volume, data block size, number of data blocks, and memory required to process each task.
Note that after the SQL is rewritten, the data to be scanned must be parsed again from the rewritten SQL. Then, according to the metadata corresponding to the data to be queried, the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task are queried from each target heterogeneous cluster.
Step S4: calculate the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies.
In this step, the data obtained in step S3 for assessing the resources of the distributed storage system, that is, the currently occupied resource parameters, is used in the calculation. The result is the final storage resource occupancy parameters of the distributed storage system, including the total number of tasks and the total memory occupancy, which completes the resource assessment.
Preferably, as shown in Fig. 4, the storage resource occupancy parameters include the number of tasks and the memory occupancy, and step S4, calculating the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies, comprises:
Step S401: for each cluster, determine the maximum of the number of data files and the number of data blocks, calculate the sum of that maximum and the data volume, and calculate the ratio of that sum to the data block size, to obtain the number of tasks of the cluster. Each task also corresponds to one CPU core, so number of CPU cores = number of tasks.
In this step, the number of tasks of one cluster is calculated as: number of tasks = max(number of data files, number of data blocks) + data volume / data block size. The number of tasks is therefore driven mainly by the number of files and the size of the data volume.
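The task-count formula in this step can be written as a short function. The sketch follows the formula exactly as printed in the paragraph above, reading the division as binding tighter than the addition; rounding the quotient up to a whole number is an assumption, since the text does not say how fractional results are handled. Function and parameter names are illustrative.

```python
import math

def estimate_task_count(file_count, block_count, data_volume, block_size):
    """number of tasks = max(file count, block count) + data volume / block size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Assumption: a fractional quotient is rounded up to a whole task.
    return max(file_count, block_count) + math.ceil(data_volume / block_size)

def estimate_cpu_cores(task_count):
    """Each task corresponds to one CPU core, as stated in step S401."""
    return task_count

# 100 files, 80 blocks, 1000 MB of data, 128 MB blocks: 100 + ceil(7.8125) = 108
tasks = estimate_task_count(100, 80, 1000, 128)
cores = estimate_cpu_cores(tasks)
```

Data volume and block size must use the same unit (megabytes in the example) for the quotient to be meaningful.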
In a preferred embodiment, if the average file size is smaller than the data block size configured for the system, merging the files during processing is recommended. The main tuning direction is to keep the minimum number of fragments greater than or equal to the number of data blocks, with the details tuned according to the resource assessment. For example, if the average file size is 1 MB, the data block size is 10 MB, and 100 files in total need to be processed, the average file size is smaller than the data block size, so the minimum number of fragments during processing is greater than or equal to 100/10 = 10.
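The merging example above can be checked with a small calculation. The sketch reads the example as "total size of the small files divided by the block size", which reproduces the 100/10 = 10 figure; returning the unmerged file count when files are not smaller than a block is an assumption added for completeness, and the function name is hypothetical.

```python
import math

def min_fragments_after_merge(file_count, avg_file_size, block_size):
    """Lower bound on the fragment count when small files are merged up to block size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if avg_file_size >= block_size:
        # Assumption: files at least one block in size are not merged.
        return file_count
    # Total data size divided by block size, rounded up to whole fragments.
    return math.ceil(file_count * avg_file_size / block_size)

# 100 files of 1 MB each with 10 MB blocks: at least 10 fragments.
fragments = min_fragments_after_merge(100, 1, 10)
```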
Step S402: determine the number of tasks of the distributed storage system from the number of tasks of each cluster.
The resource assessment results of the clusters in the distributed storage system are merged to calculate the total number of tasks and the total number of CPU cores, using the following formulas:
Total number of tasks (total_task) = cluster 1 tasks + cluster 2 tasks + cluster 3 tasks + ...;
Total number of CPU cores = total number of tasks.
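The merge in step S402 is a plain sum over clusters. A minimal sketch, with hypothetical cluster names and per-cluster task counts:

```python
def merge_task_estimates(tasks_by_cluster):
    """Sum per-cluster task counts; total CPU cores equals total tasks."""
    total = sum(tasks_by_cluster.values())
    return {"total_task": total, "total_cpu_cores": total}

# Hypothetical per-cluster task counts.
merged = merge_task_estimates({"hive-prod": 108, "hbase-prod": 22})
```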
Step S403: for each cluster, calculate the product of the number of tasks and the memory required to process each task, to obtain the memory occupancy of the cluster.
In this step, the memory occupancy of one cluster is calculated as: memory = number of tasks * memory required to process each task.
Step S404: determine the memory occupancy of the distributed storage system from the memory occupancy of each cluster.
In this step, the resource assessment results of the clusters in the distributed storage system are merged to calculate the total memory occupancy, using the following formula:
Total memory occupancy (total_memory) = cluster 1 memory + cluster 2 memory + cluster 3 memory + ...
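Steps S403 and S404 can be sketched together. The cluster names, task counts, and per-task memory figures (in GB) are hypothetical inputs chosen for illustration.

```python
def cluster_memory(task_count, memory_per_task):
    """memory = number of tasks * memory required to process each task."""
    return task_count * memory_per_task

def total_memory(estimates):
    """Sum memory over clusters; estimates maps cluster -> (task_count, memory_per_task)."""
    return sum(cluster_memory(tasks, per_task) for tasks, per_task in estimates.values())

# Hypothetical inputs: 108 tasks at 2 GB each, 22 tasks at 4 GB each.
estimates = {"hive-prod": (108, 2), "hbase-prod": (22, 4)}
total = total_memory(estimates)  # 216 + 88 = 304
```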
In a preferred embodiment, the bottleneck of each cluster in the distributed storage system is also calculated: bottleneck = required resources / total resources. The required resources are the memory occupancy of the cluster, and the total resources are the total memory of the cluster.
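The bottleneck ratio can be computed per cluster and used to find the most constrained one. Selecting the cluster with the highest ratio is an illustrative use, not something the source prescribes; the cluster names and memory figures are hypothetical.

```python
def bottleneck_ratio(required_memory, total_cluster_memory):
    """bottleneck = required resources / total resources."""
    if total_cluster_memory <= 0:
        raise ValueError("total_cluster_memory must be positive")
    return required_memory / total_cluster_memory

def most_constrained_cluster(required, capacity):
    """Return the name of the cluster with the highest bottleneck ratio, plus all ratios."""
    ratios = {name: bottleneck_ratio(required[name], capacity[name]) for name in required}
    return max(ratios, key=ratios.get), ratios

# Hypothetical required memory and total memory per cluster, in GB.
required = {"hive-prod": 216, "hbase-prod": 88}
capacity = {"hive-prod": 432, "hbase-prod": 100}
name, ratios = most_constrained_cluster(required, capacity)
```

A ratio above 1 would mean the estimated job needs more memory than the cluster has.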
The distributed storage system resource estimation method provided by this application builds a full-dimensional profile of the distributed storage system: it extracts the metadata of each cluster in the distributed storage system, parses the submitted SQL, and determines the resource parameters that each cluster currently occupies from the generated logical plan, the physical plan, and the metadata. It then combines the currently occupied resource parameters with the calculation flow to estimate the job's total number of tasks and total memory occupancy, thereby completing the resource estimation for the distributed storage system.
The distributed storage system resource estimation device provided by the second embodiment of the application is as follows:
Fig. 5 shows a structural diagram of a distributed storage system resource estimation device provided by an embodiment of the application, which comprises the following modules.
A receiving module 11, configured to receive a resource occupancy query request for each cluster in the distributed storage system;
a first obtaining module 12, configured to obtain the metadata of each cluster in the distributed storage system according to the resource occupancy query request;
a second obtaining module 13, configured to obtain, according to the metadata of each cluster, the resource parameters that each cluster currently occupies, where the currently occupied resource parameters include the number of data files, the data volume, the data block size, the number of data blocks, and the memory required to process each task;
a computing module 14, configured to calculate the storage resource occupancy parameters of the distributed storage system according to the resource parameters that each cluster currently occupies.
Optionally, as shown in fig. 6, described first obtains module 12, comprising:
Submodule 121 is acquired, for acquiring the binary format saved in the distributed memory system with predetermined period Meta data file, and the meta data file is converted into text formatting;
Extracting sub-module 122, for extracting metadata, the metadata from the meta data file of the text formatting Including at least one of following or any combination: file system directories name, file system directories, access user, user group, permission, File path, filemodetime, file access time.
Optionally, as shown in fig. 7, described second obtains module 13, comprising:
Acquisition submodule 131, for parsing the mark of data to be checked, and root from the resource occupation inquiry request The metadata of the data to be checked is obtained according to the mark of the data to be checked;
It determines submodule 132, for the metadata according to the data to be checked, determines that the data to be checked are belonged to Target cluster;
Sending submodule 133, for the resource occupation inquiry request to be sent to the target cluster;
Receiving submodule 134, for receiving the resource parameters of the object set pocket transmission currently occupied.
Optionally, described second module 13 (being not drawn into figure) is obtained, further includes:
Judgment module, for judging whether the target cluster is isomeric group, if so, inquiring the resource occupation Request is rewritten;
The sending submodule, is specifically used for: revised resource occupation inquiry request is sent to the target cluster.
Optionally, as shown in Fig. 8, the calculation module 14 comprises:
a first calculation submodule 141, configured to, for each cluster, determine the maximum of the data file quantity and the data block quantity, calculate the sum of that maximum and the data volume, and calculate the ratio of that sum to the data block size, to obtain the task quantity of the cluster;
a second calculation submodule 142, configured to determine the task quantity of the distributed storage system according to the task quantity of each cluster;
a third calculation submodule 143, configured to, for each cluster, calculate the product of the task quantity and the memory required to process each task, to obtain the memory occupancy of the cluster;
a fourth calculation submodule 144, configured to determine the memory occupancy of the distributed storage system according to the memory occupancy of each cluster.
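The four calculation steps can be sketched as follows. The per-cluster formulas follow the text literally. Summing the per-cluster values to obtain the system-wide values is an assumption, because the text only says that the system values are determined from the per-cluster values. The class name, field names, and sample figures are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ClusterResources:
    """Currently occupied resource parameters of one cluster."""
    file_count: int       # data file quantity
    data_volume: int      # data volume, in the same unit as block_size
    block_size: int       # data block size
    block_count: int      # data block quantity
    memory_per_task: int  # memory required to process each task


def cluster_task_count(r: ClusterResources) -> float:
    """(max(file quantity, block quantity) + data volume) / block size."""
    return (max(r.file_count, r.block_count) + r.data_volume) / r.block_size


def cluster_memory(r: ClusterResources) -> float:
    """Task quantity multiplied by the memory required per task."""
    return cluster_task_count(r) * r.memory_per_task


def system_totals(clusters: Iterable[ClusterResources]) -> tuple[float, float]:
    """System-wide task quantity and memory occupancy.

    Assumption: the system values are the sums of the per-cluster values.
    """
    clusters = list(clusters)
    tasks = sum(cluster_task_count(c) for c in clusters)
    memory = sum(cluster_memory(c) for c in clusters)
    return tasks, memory


# Hypothetical sample figures for two clusters.
a = ClusterResources(file_count=1000, data_volume=128000, block_size=128,
                     block_count=1500, memory_per_task=2)
b = ClusterResources(file_count=200, data_volume=25400, block_size=128,
                     block_count=100, memory_per_task=4)

total_tasks, total_memory = system_totals([a, b])
```

For cluster `a` the task quantity is (1500 + 128000) / 128 = 1011.71875, and for cluster `b` it is (200 + 25400) / 128 = 200, so the sketch yields 1211.71875 tasks and a memory occupancy of 2823.4375 in the unit of `memory_per_task`.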
It should be understood that the above embodiments are merely exemplary implementations adopted to illustrate the principles of the present invention, and the present invention is not limited thereto. Those of ordinary skill in the art can make various changes and improvements without departing from the spirit and essence of the present invention, and such changes and improvements are also regarded as falling within the protection scope of the present invention.

Claims (10)

1. A distributed storage system resource estimation method, characterized by comprising:
receiving a resource occupation query request for each cluster in a distributed storage system;
obtaining metadata of each cluster in the distributed storage system according to the resource occupation query request;
obtaining the resource parameters currently occupied by each cluster according to the metadata of each cluster, the currently occupied resource parameters including data file quantity, data volume, data block size, data block quantity, and the memory required to process each task;
calculating a storage resource occupation parameter of the distributed storage system according to the resource parameters currently occupied by each cluster.
2. The distributed storage system resource estimation method according to claim 1, characterized in that the step of obtaining metadata of each cluster in the distributed storage system according to the resource occupation query request comprises:
collecting, at a preset period, the binary-format metadata file stored in the distributed storage system, and converting the metadata file into text format;
extracting metadata from the text-format metadata file, the metadata including at least one of the following or any combination thereof: file system directory name, file system directory, access user, user group, permission, file path, file modification time, and file access time.
3. The distributed storage system resource estimation method according to claim 1, characterized in that the step of obtaining the resource parameters currently occupied by each cluster according to the metadata of each cluster comprises:
parsing the identifier of the data to be queried from the resource occupation query request, and obtaining the metadata of the data to be queried according to that identifier;
determining, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs;
sending the resource occupation query request to the target cluster;
receiving the currently occupied resource parameters sent by the target cluster.
4. The distributed storage system resource estimation method according to claim 3, characterized in that, after the step of determining, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs, and before the step of sending the resource occupation query request to the target cluster, the method further comprises:
judging whether the target cluster is a heterogeneous cluster and, if so, rewriting the resource occupation query request;
the sending of the resource occupation query request to the target cluster comprises: sending the rewritten resource occupation query request to the target cluster.
5. The distributed storage system resource estimation method according to claim 1, characterized in that the storage resource occupation parameter includes a task quantity and a memory occupancy, and the step of calculating the storage resource occupation parameter of the distributed storage system according to the resource parameters currently occupied by each cluster comprises:
for each cluster, determining the maximum of the data file quantity and the data block quantity, calculating the sum of that maximum and the data volume, and calculating the ratio of that sum to the data block size, to obtain the task quantity of the cluster;
determining the task quantity of the distributed storage system according to the task quantity of each cluster;
for each cluster, calculating the product of the task quantity and the memory required to process each task, to obtain the memory occupancy of the cluster;
determining the memory occupancy of the distributed storage system according to the memory occupancy of each cluster.
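Written as formulas, the calculation in this claim reads as follows. The symbols are introduced here for illustration only: for cluster $i$, $F_i$ is the data file quantity, $B_i$ the data block quantity, $V_i$ the data volume, $S_i$ the data block size, and $m_i$ the memory required to process each task. The summation over clusters in the last two expressions is an assumption, since the claim only states that the system-level values are determined from the per-cluster values.

$$
T_i = \frac{\max(F_i, B_i) + V_i}{S_i}, \qquad
M_i = T_i \cdot m_i, \qquad
T = \sum_i T_i, \qquad
M = \sum_i M_i
$$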
6. A distributed storage system resource estimation device, characterized by comprising:
a receiving module, configured to receive a resource occupation query request for each cluster in a distributed storage system;
a first obtaining module, configured to obtain metadata of each cluster in the distributed storage system according to the resource occupation query request;
a second obtaining module, configured to obtain the resource parameters currently occupied by each cluster according to the metadata of each cluster, the currently occupied resource parameters including data file quantity, data volume, data block size, data block quantity, and the memory required to process each task;
a calculation module, configured to calculate a storage resource occupation parameter of the distributed storage system according to the resource parameters currently occupied by each cluster.
7. The distributed storage system resource estimation device according to claim 6, characterized in that the first obtaining module comprises:
a collection submodule, configured to collect, at a preset period, the binary-format metadata file stored in the distributed storage system, and to convert the metadata file into text format;
an extraction submodule, configured to extract metadata from the text-format metadata file, the metadata including at least one of the following or any combination thereof: file system directory name, file system directory, access user, user group, permission, file path, file modification time, and file access time.
8. The distributed storage system resource estimation device according to claim 6, characterized in that the second obtaining module comprises:
an obtaining submodule, configured to parse the identifier of the data to be queried from the resource occupation query request, and to obtain the metadata of the data to be queried according to that identifier;
a determining submodule, configured to determine, according to the metadata of the data to be queried, the target cluster to which the data to be queried belongs;
a sending submodule, configured to send the resource occupation query request to the target cluster;
a receiving submodule, configured to receive the currently occupied resource parameters sent by the target cluster.
9. The distributed storage system resource estimation device according to claim 8, characterized in that the second obtaining module further comprises:
a judging module, configured to judge whether the target cluster is a heterogeneous cluster and, if so, to rewrite the resource occupation query request;
the sending submodule is specifically configured to send the rewritten resource occupation query request to the target cluster.
10. The distributed storage system resource estimation device according to claim 6, characterized in that the calculation module comprises:
a first calculation submodule, configured to, for each cluster, determine the maximum of the data file quantity and the data block quantity, calculate the sum of that maximum and the data volume, and calculate the ratio of that sum to the data block size, to obtain the task quantity of the cluster;
a second calculation submodule, configured to determine the task quantity of the distributed storage system according to the task quantity of each cluster;
a third calculation submodule, configured to, for each cluster, calculate the product of the task quantity and the memory required to process each task, to obtain the memory occupancy of the cluster;
a fourth calculation submodule, configured to determine the memory occupancy of the distributed storage system according to the memory occupancy of each cluster.
CN201910425874.4A 2019-05-21 2019-05-21 Distributed storage system resource estimation method and device Active CN110134738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910425874.4A CN110134738B (en) 2019-05-21 2019-05-21 Distributed storage system resource estimation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910425874.4A CN110134738B (en) 2019-05-21 2019-05-21 Distributed storage system resource estimation method and device

Publications (2)

Publication Number Publication Date
CN110134738A (en) 2019-08-16
CN110134738B (en) 2021-09-10

Family

ID=67572348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910425874.4A Active CN110134738B (en) 2019-05-21 2019-05-21 Distributed storage system resource estimation method and device

Country Status (1)

Country Link
CN (1) CN110134738B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569317A (en) * 2019-09-12 2019-12-13 北京明略软件系统有限公司 metadata collection method and device for data source
CN111680799A (en) * 2020-04-08 2020-09-18 北京字节跳动网络技术有限公司 Method and apparatus for processing model parameters
CN113111038A (en) * 2021-03-31 2021-07-13 北京达佳互联信息技术有限公司 File storage method, device, server and storage medium
CN113553166A (en) * 2020-04-26 2021-10-26 广州汽车集团股份有限公司 Cross-platform high-performance computing integration method and system
WO2023051270A1 (en) * 2021-09-30 2023-04-06 中兴通讯股份有限公司 Memory occupation amount pre-estimation method and apparatus, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542040A (en) * 2011-12-27 2012-07-04 北京奇虎科技有限公司 Capacity acquiring method and system
CN103678563A (en) * 2011-12-27 2014-03-26 北京奇虎科技有限公司 Capacity obtaining method and system
US20140372250A1 (en) * 2011-03-04 2014-12-18 Forbes Media Llc System and method for providing recommended content
CN104657260A (en) * 2013-11-25 2015-05-27 航天信息股份有限公司 Achievement method for distributed locks controlling distributed inter-node accessed shared resources
CN108694071A (en) * 2017-03-29 2018-10-23 瞻博网络公司 More cluster panels for distributed virtualization infrastructure elements monitoring and policy control
US20190051210A1 (en) * 2017-08-09 2019-02-14 Inchstones, LLC Distributed architecture for data synchronization

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140372250A1 (en) * 2011-03-04 2014-12-18 Forbes Media Llc System and method for providing recommended content
CN102542040A (en) * 2011-12-27 2012-07-04 北京奇虎科技有限公司 Capacity acquiring method and system
CN103678563A (en) * 2011-12-27 2014-03-26 北京奇虎科技有限公司 Capacity obtaining method and system
CN104657260A (en) * 2013-11-25 2015-05-27 航天信息股份有限公司 Achievement method for distributed locks controlling distributed inter-node accessed shared resources
CN108694071A (en) * 2017-03-29 2018-10-23 瞻博网络公司 More cluster panels for distributed virtualization infrastructure elements monitoring and policy control
US20190051210A1 (en) * 2017-08-09 2019-02-14 Inchstones, LLC Distributed architecture for data synchronization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
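Cai Tao et al.: "NVMMDS: A Metadata Management Method for Non-Volatile Memory" (蔡涛等: "NVMMDS-一种面向非易失存储器的元数据管理方法"), Journal of Computer Research and Development (《计算机研究与发展》) *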

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569317A (en) * 2019-09-12 2019-12-13 北京明略软件系统有限公司 metadata collection method and device for data source
CN111680799A (en) * 2020-04-08 2020-09-18 北京字节跳动网络技术有限公司 Method and apparatus for processing model parameters
CN111680799B (en) * 2020-04-08 2024-02-20 北京字节跳动网络技术有限公司 Method and device for processing model parameters
CN113553166A (en) * 2020-04-26 2021-10-26 广州汽车集团股份有限公司 Cross-platform high-performance computing integration method and system
CN113111038A (en) * 2021-03-31 2021-07-13 北京达佳互联信息技术有限公司 File storage method, device, server and storage medium
CN113111038B (en) * 2021-03-31 2024-01-19 北京达佳互联信息技术有限公司 File storage method, device, server and storage medium
WO2023051270A1 (en) * 2021-09-30 2023-04-06 中兴通讯股份有限公司 Memory occupation amount pre-estimation method and apparatus, and storage medium

Also Published As

Publication number Publication date
CN110134738B (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN110134738A (en) Distributed storage system resource estimation method and device
CN108009236B (en) Big data query method, system, computer and storage medium
CN105138592B (en) A kind of daily record data storage and search method based on distributed structure/architecture
US10114682B2 (en) Method and system for operating a data center by reducing an amount of data to be processed
CN102164186B (en) Method and system for realizing cloud search service
CN110515912A (en) Log processing method, device, computer installation and computer readable storage medium
WO2017167050A1 (en) Configuration information generation and transmission method, and resource loading method, apparatus and system
CN110457281A (en) Data processing method, device, equipment and medium
CN106970958B (en) A kind of inquiry of stream file and storage method and device
CN108241539B (en) Interactive big data query method and device based on distributed system, storage medium and terminal equipment
CN106503008B (en) File storage method and device and file query method and device
CN111782692B (en) Frequency control method and device
CN111797091A (en) Method and device for querying data in database, electronic equipment and storage medium
CN110147470B (en) Cross-machine-room data comparison system and method
CN109299157A (en) A kind of data export method and device of distributed big single table
CN105763595A (en) Method of improving data processing efficiency and server
CN108154024B (en) Data retrieval method and device and electronic equipment
CN111224831A (en) Method and system for generating call ticket
CN111752945A (en) Time sequence database data interaction method and system based on container and hierarchical model
CN102026228A (en) Statistical method and equipment for communication network performance data
CN109213950B (en) Data processing method and device for browser application of IPTV (Internet protocol television) intelligent set top box
CN112887113A (en) Method, device and system for processing data
CN110781430B (en) Novel virtual data center system of internet and construction method thereof
CN109586970B (en) Resource allocation method, device and system
CN108959952A (en) data platform authority control method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant