CN108363761A - Hadoop AWR automatic load analysis repository, analysis method and storage medium - Google Patents
- Publication number
- CN108363761A (application number CN201810107603.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
The present invention provides a Hadoop AWR automatic load analysis repository, an analysis method and a storage medium. The repository comprises a data acquisition module, a computing module and an AWR report repository, connected to one another over a network. The data acquisition module collects information from the Hadoop big data cluster and sends it to the computing module. The computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics and sends the results to the AWR report repository. The AWR report repository supplies this information to operations staff so that they can tune the cluster hardware and software. The invention performs a complete analysis across hosts, racks, input files, output files and tasks. This compensates for the shortcomings of existing cluster logging systems, offers more comprehensive analysis dimensions, clearly and effectively identifies cluster performance problems, and helps administrators optimize their own clusters.
Description
Technical field
The invention belongs to the field of computer big data and mainly relates to a Hadoop AWR automatic load analysis repository, an analysis method and a storage medium.
Background technology
Hadoop is a distributed system infrastructure. Users can develop distributed programs without understanding the low-level details of distribution, and can make full use of the power of a cluster for high-speed computation and storage.
Hadoop implements a distributed file system. The core of the framework's design is HDFS (Hadoop Distributed File System) and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over that data.
Hadoop is reliable: it assumes that compute elements and storage can fail, so it maintains multiple copies of working data and can redistribute processing away from failed nodes. Hadoop is efficient: it works in parallel, and parallel processing speeds up the work. Hadoop is also scalable and can handle petabyte-scale data. In addition, Hadoop relies on community services, so its cost is relatively low and anyone can use it. Hadoop is a distributed computing platform that users can easily set up and use, and on which they can easily develop and run applications that process massive data. Among its main advantages, Hadoop ships with a framework written in Java, so it is well suited to running on Linux production platforms. Applications on Hadoop can also be written in other languages, such as C++.
Hadoop's widespread use in big data processing owes much to its inherent advantages in data extraction, transformation and loading (ETL). Hadoop's distributed architecture places the big data processing engine as close as possible to the storage, which suits batch operations such as ETL, because the results of such batch jobs can go directly to storage. Hadoop's MapReduce function breaks a single job into fragments, sends the fragment tasks (Map) to multiple nodes, and afterwards loads the results into the data warehouse as a single data set.
AWR (Automatic Workload Repository) is a performance collection and analysis tool provided in Oracle 10g and later versions. It reports on resource usage across the whole system over a period of time, and from the report one can understand the overall operating state of a system. The generated report contains multiple sections. AWR samples the statistics held in memory once per hour, saves them to disk and retains them for 7 days, after which old records are overwritten. The samples are written to the AWR repository. Both the sampling frequency and the retention period can be adjusted to actual conditions, which gives DBAs a more effective system monitoring tool.
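The sampling and retention behaviour described above can be illustrated with a minimal Python sketch. It is not Oracle's implementation; the class name `SnapshotStore`, its parameters and the metric names are hypothetical, and the one-hour interval and 7-day retention are simply the values quoted in the text.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    taken_at: datetime
    metrics: dict


class SnapshotStore:
    """Hypothetical store: one sample per interval, old samples dropped."""

    def __init__(self, interval=timedelta(hours=1), retention=timedelta(days=7)):
        self.interval = interval      # adjustable sampling frequency
        self.retention = retention    # adjustable retention period
        self._snapshots = deque()

    def record(self, now, metrics):
        # Ignore samples that arrive before the sampling interval has elapsed.
        if self._snapshots and now - self._snapshots[-1].taken_at < self.interval:
            return False
        self._snapshots.append(Snapshot(now, dict(metrics)))
        # Discard snapshots that have aged out of the retention window.
        while self._snapshots and now - self._snapshots[0].taken_at > self.retention:
            self._snapshots.popleft()
        return True

    def __len__(self):
        return len(self._snapshots)
```

Shortening `interval` or lengthening `retention` trades storage for finer or longer history, which is the adjustment the paragraph refers to.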
A DBA (Database Administrator) monitors and understands the operating state of a database. When a database bottleneck is found during testing but its specific cause cannot be located, an AWR report can be used to analyze and locate it. Database performance problems generally arise in three places: I/O, memory and CPU, and the three are closely related. Assuming none of the three has a physical fault, when I/O load increases, more memory is needed to hold the data and more CPU time is needed to filter it. Conversely, high CPU time may be spent parsing SQL (Structured Query Language) statements or filtering too much data, and is not necessarily related to I/O or memory.
Fig. 1 is a schematic diagram of the system composition of a Hadoop cluster. As shown in Fig. 1, a Hadoop cluster comprises five layers: data acquisition, data storage, data scheduling, big data computing and the application layer.
Data acquisition is the data access layer. The frameworks at this level are responsible for moving data into the big data cluster and exporting data from it. Flume: heterogeneous data collection framework; Sqoop: offline collection framework for relational databases; Canal: online collection framework for relational databases.
Data storage is the storage layer. The frameworks at this layer mainly store data and provide various data interfaces for users. HDFS: Hadoop's default distributed storage system; HBase: Hadoop's default database, built on HDFS and based on columnar storage, which provides access modes such as OLTP (online transaction processing) and OLAP (online analytical processing); Kafka: streaming data storage used for stream computing, a message queue system.
Data scheduling is the resource management and scheduling layer. It manages resources such as CPU and memory and allocates them to the different computing frameworks according to resource availability. YARN: Hadoop's default scheduling system, compatible with multiple computing frameworks and multiple scheduling algorithms.
Big data computing: a variety of computing frameworks. Hive: big data SQL query engine that parses SQL into computing tasks such as MapReduce or Spark jobs and runs them; Spark: in-memory computing engine, distributed algorithms; MapReduce: Hadoop's default computing engine, distributed algorithms; Phoenix: SQL computing engine based on HBase; Other: other kinds of computing frameworks.
Application layer: the client application systems of the various big data systems. Oozie: scheduling system for big data computing tasks, which combines different tasks and schedules them according to their dependencies; Hue: unified gateway for big data development, through which developers carry out big data development.
After a Hadoop cluster has been running for some time, administrators cannot determine the following: which files the cluster uses most; which tasks the CPU and memory of each host and rack have been allocated to; how much computing resource each file occupies; whether task computation and resource allocation are reasonable; whether the HDFS file balance is reasonable; and whether the rack division is reasonable.
To solve the above problems, the present invention provides a Hadoop AWR automatic load analysis repository, an analysis method and a storage medium, so that a big data cluster can be collected from, analyzed and tuned through Hadoop AWR.
Summary of the invention
The object of the present invention is to provide a performance collection and tuning tool for Hadoop clusters: a way to analyze the current load of a cluster and output load-related report information, guiding administrators in understanding the cluster's load. It is a powerful tool for Hadoop cluster tuning.
The present invention provides a Hadoop AWR automatic load analysis repository. The repository comprises a data acquisition module, a computing module and an AWR report repository, each connected over a network, wherein:
The data acquisition module collects information from the Hadoop big data cluster and sends it to the computing module;
The computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics, and sends the results to the AWR report repository;
The AWR report repository supplies information to operations staff for tuning the cluster hardware and software.
Further, the data acquisition module comprises a Hive query engine, a MapReduce computing engine, a Spark computing engine and an Other query engine, each connected to the computing module over the network, wherein:
The Hive query engine parses the SQL queries of the big data cluster into computing tasks and sends them to the computing module;
The MapReduce computing engine allocates and schedules the compute nodes of the big data cluster, parses this into computing tasks and sends them to the computing module;
The Spark computing engine divides the data in the memory units of the big data cluster into small time slices, parses them into computing tasks and sends them to the computing module;
The Other query engine parses the unclassified metrics of the big data cluster into computing tasks and sends them to the computing module.
Further, the computing module comprises a mode parsing unit, an input unit, a computing unit and an output unit, connected to one another over the network, wherein:
The mode parsing unit parses the computing mode selected by the client, populates the data according to that mode, and sends the computing mode and data to the input unit;
The input unit formats, parses, filters and packages that data, and sends the computing mode and the processed data to the computing unit;
The computing unit receives the computing mode and the data packaged by the input unit, aggregates and classifies the relevant metrics, and sends the results to the output unit;
The output unit formats and encapsulates the results provided by the computing unit, selects the corresponding report format, and outputs the report to the AWR report repository.
Further, the computing modes parsed by the mode parsing unit include a standalone computing mode, a cluster computing mode and a MapReduce computing mode, wherein:
The standalone computing mode is for clusters with little data and provides non-distributed algorithms to the computing unit;
The cluster computing mode is for YARN clusters and provides the MapReduce and Spark distributed algorithms to the computing unit;
The MapReduce computing mode is for Hadoop 1.x clusters and provides the MapReduce distributed algorithm to the computing unit.
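The mapping from a client-selected mode to the algorithms made available to the computing unit could be sketched as follows. This is only an illustration under assumptions: the enum `ComputeMode`, the table `ALGORITHMS`, the function `resolve_mode` and the string labels are hypothetical names, not part of the patent.

```python
from enum import Enum


class ComputeMode(Enum):
    STANDALONE = "standalone"   # clusters with little data
    CLUSTER = "cluster"         # YARN clusters
    MAPREDUCE = "mapreduce"     # Hadoop 1.x clusters


# Algorithms each mode hands to the computing unit (labels are illustrative).
ALGORITHMS = {
    ComputeMode.STANDALONE: ["local"],
    ComputeMode.CLUSTER: ["mapreduce", "spark"],
    ComputeMode.MAPREDUCE: ["mapreduce"],
}


def resolve_mode(client_choice: str):
    """Parse the client's choice; raises ValueError for an unknown mode."""
    mode = ComputeMode(client_choice.strip().lower())
    return mode, ALGORITHMS[mode]
```

Keeping the table separate from the parser means a new mode only needs one enum member and one table entry.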
Further, the input unit comprises an input formatting unit, a parsing unit, a filtering unit and a packaging unit, wherein:
The input formatting unit formats the data from the mode parsing unit and sends it to the parsing unit;
The parsing unit parses the data from the input formatting unit and sends it to the filtering unit;
The filtering unit filters the data from the parsing unit and sends it to the packaging unit;
The packaging unit packages the data from the filtering unit and sends it to the computing unit.
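The four input stages listed above (formatting, parsing, filtering, packaging) form a simple pipeline. The sketch below shows one possible shape for it in Python. The JSON-lines input format, the field names in `REQUIRED` and the `TaskRecord` type are assumptions made for illustration; the patent does not specify the record layout.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    host: str
    rack: str
    cpu_seconds: float
    memory_mb: float


REQUIRED = ("task_id", "host", "rack", "cpu_seconds", "memory_mb")


def format_input(raw_lines):
    # Formatting: trim whitespace and drop blank lines.
    return [line.strip() for line in raw_lines if line.strip()]


def parse(lines):
    # Parsing: each line is assumed to be a JSON object; skip malformed lines.
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records


def filter_records(records):
    # Filtering: keep only objects that carry every required field.
    return [r for r in records
            if isinstance(r, dict) and all(k in r for k in REQUIRED)]


def package(records):
    # Packaging: wrap each record in a typed object for the computing unit.
    return [TaskRecord(str(r["task_id"]), str(r["host"]), str(r["rack"]),
                       float(r["cpu_seconds"]), float(r["memory_mb"]))
            for r in records]


def run_input_unit(raw_lines):
    return package(filter_records(parse(format_input(raw_lines))))
```

Each stage takes and returns a plain list, so stages can be tested or replaced independently.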
Further, the computing unit comprises a global statistics unit, a rack statistics unit, a host statistics unit and a time statistics unit. Each statistics unit receives the computing mode and data from the input unit, performs its statistical calculation, and sends the results to the output unit, wherein:
The global statistics unit computes statistics for the big data cluster along the global dimension;
The rack statistics unit computes statistics for the big data cluster along the rack dimension;
The host statistics unit computes statistics for the big data cluster along the host dimension;
The time statistics unit computes statistics for the big data cluster along the time dimension.
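The four statistics units differ only in the key they group by, which suggests a single aggregation routine parameterised by dimension. The Python sketch below makes that idea concrete. The sample fields (`host`, `rack`, `cpu_seconds`, `memory_mb`, `started_at`) and the hourly time bucket are assumptions for illustration, not details given in the patent.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(samples, key):
    """Sum CPU, memory and task count for each value of the grouping key."""
    totals = defaultdict(lambda: {"cpu_seconds": 0.0, "memory_mb": 0.0, "tasks": 0})
    for s in samples:
        bucket = totals[key(s)]
        bucket["cpu_seconds"] += s["cpu_seconds"]
        bucket["memory_mb"] += s["memory_mb"]
        bucket["tasks"] += 1
    return dict(totals)


# One grouping key per statistics unit; the hourly bucket is an assumption.
DIMENSIONS = {
    "global": lambda s: "cluster",
    "rack": lambda s: s["rack"],
    "host": lambda s: s["host"],
    "time": lambda s: s["started_at"].strftime("%Y-%m-%d %H:00"),
}


def compute_statistics(samples):
    return {name: aggregate(samples, key) for name, key in DIMENSIONS.items()}
```

In a distributed mode the same grouping would be expressed as a MapReduce or Spark job; the in-memory version here corresponds to the standalone case.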
Further, the output unit comprises an output formatting unit, an encapsulation unit, a format selection unit and a formatted output unit, wherein:
The output formatting unit formats the results from the computing unit and sends the formatted output data to the encapsulation unit;
The encapsulation unit encapsulates the data from the output formatting unit and sends it to the format selection unit;
The format selection unit selects a report format for the data from the encapsulation unit and sends the output data and the report format to the formatted output unit;
The formatted output unit renders the received data in the selected report format and sends the formatted report to the AWR report repository.
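As a rough illustration of the output stages above, the following Python sketch formats results, wraps them in a report envelope, selects a renderer and produces the final report. The two report formats (`json` and `text`), the rounding rule and all function names are hypothetical; the patent does not state which report formats are supported.

```python
import json


def format_results(results):
    # Output formatting: round numeric values for presentation.
    return {name: round(value, 2) for name, value in results.items()}


def encapsulate(formatted, cluster_name):
    # Encapsulation: wrap the metrics in a report envelope.
    return {"cluster": cluster_name, "metrics": formatted}


def render_text(report):
    lines = ["cluster: " + report["cluster"]]
    lines += [f"{k}: {v}" for k, v in sorted(report["metrics"].items())]
    return "\n".join(lines)


# Format selection: map a report format name to its renderer.
RENDERERS = {
    "json": lambda report: json.dumps(report, sort_keys=True),
    "text": render_text,
}


def render_report(results, cluster_name, report_format="text"):
    report = encapsulate(format_results(results), cluster_name)
    return RENDERERS[report_format](report)
```

The rendered string is what would be written to the AWR report repository; an unknown format name raises `KeyError`.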
The present invention also provides a method of load analysis using the Hadoop AWR automatic load analysis repository. The method comprises the following steps:
Step S1: the data acquisition module collects information from the big data cluster and sends it to the computing module;
Step S2: the computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics, and sends the results to the AWR report repository;
Step S3: operations staff use the AWR report repository to help tune the cluster hardware and software.
Further, in step S3, tuning the cluster by means of the AWR report repository comprises the following steps:
Analyzing which tasks the CPU and memory of each host and rack of the big data cluster have been allocated to;
Analyzing how much computing resource each task occupies;
Analyzing whether the computing resource allocation and file balance of those tasks are reasonable;
Analyzing whether the rack division is reasonable;
Optimizing any cluster hardware and software found to be unreasonable.
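One way to make "reasonable allocation" testable is to compare each host's load with the cluster average and flag large deviations. The sketch below does this in Python. The patent gives no criterion, so the 50% tolerance, the `cpu_seconds` field and the function names are assumptions chosen purely for illustration.

```python
from statistics import mean


def host_loads(allocations):
    """Total CPU seconds allocated on each host."""
    loads = {}
    for a in allocations:
        loads[a["host"]] = loads.get(a["host"], 0.0) + a["cpu_seconds"]
    return loads


def flag_unbalanced(loads, tolerance=0.5):
    """Hosts whose load deviates from the mean by more than `tolerance`."""
    if not loads:
        return []
    avg = mean(loads.values())
    if avg == 0:
        return []
    return sorted(h for h, v in loads.items() if abs(v - avg) / avg > tolerance)
```

The same check applies per rack by grouping on a rack field instead of the host, and a flagged host is only a starting point for the manual tuning described in the steps.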
The present invention also provides a computer-readable storage medium on which a processor program is stored, characterized in that, when the program is executed by a computer, the following steps are implemented:
Step S1: the data acquisition module collects information from the big data cluster and sends it to the computing module;
Step S2: the computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics, and sends the results to the AWR report repository;
Step S3: operations staff use the AWR report repository to help tune the cluster hardware and software.
Further, in step S3, tuning the cluster by means of the AWR report repository comprises the following steps:
Analyzing which tasks the CPU and memory of each host and rack of the big data cluster have been allocated to;
Analyzing how much computing resource each task occupies;
Analyzing whether the computing resource allocation and file balance of those tasks are reasonable;
Analyzing whether the rack division is reasonable;
Optimizing any cluster hardware and software found to be unreasonable.
Here, the cluster MapReduce mode is suited to Hadoop 1.x clusters, and the standalone cluster mode supports only Hadoop YARN clusters.
The present invention achieves clear benefits:
Compared with the official monitoring and conventional monitoring systems, the beneficial effects of the invention are as follows. In performance evaluation, it performs a comprehensive analysis across hosts, racks, input files, output files and tasks, offers more comprehensive analysis dimensions, and gives preliminary suggestions. For the metrics of clusters with 20 or more machines, the analysis report can clearly and effectively point out performance problems inside the cluster, helping administrators adjust their own clusters accordingly. It compensates for the shortcomings of the existing Hadoop cluster logging system and provides more effective performance analysis reports, so that users can find performance problems more easily.
Description of the drawings
Fig. 1 is a schematic diagram of the system composition of a Hadoop cluster.
Fig. 2 is a schematic diagram of the system composition of an embodiment of the present invention.
Fig. 3 is a schematic diagram of the composition of the computing module of an embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in more detail below with reference to the drawings and examples, so that the solution of the invention and the advantages of its various aspects can be better understood. However, the specific embodiments and examples described below are for illustration only and do not limit the invention. Table 1 explains the English terms used in the invention; the terms are case-insensitive.
Table 1
Fig. 2 is a schematic diagram of the system composition of an embodiment of the present invention. As shown in Fig. 2, a Hadoop AWR automatic load analysis repository comprises a data acquisition module 1, a computing module 2 and an AWR report repository 3, connected to one another over a network, wherein: the data acquisition module 1 collects information from the Hadoop big data cluster and sends it to the computing module 2; the computing module 2, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics and sends the results to the AWR report repository 3; operations staff use the AWR report repository 3 to help tune the cluster hardware and software.
The data acquisition module 1 comprises a Hive query engine 11, a MapReduce computing engine 12, a Spark computing engine 13 and an Other query engine 14, each connected to the computing module over the network, wherein: the Hive query engine 11 parses the SQL queries of the big data cluster into computing tasks and sends them to the computing module 2; the MapReduce computing engine 12 allocates and schedules the compute nodes of the big data cluster, parses this into computing tasks and sends them to the computing module 2; the Spark computing engine 13 divides the data in the memory units of the big data cluster into small time slices, parses them into computing tasks and sends them to the computing module 2; the Other query engine 14 parses the other, unclassified metrics of the big data cluster into computing tasks and sends them to the computing module.
Fig. 3 is a schematic diagram of the composition of the computing module of an embodiment of the present invention. As shown in Fig. 3, the computing module 2 comprises a mode parsing unit 21, an input unit 22, a computing unit 23 and an output unit 24, connected to one another over the network, wherein: the mode parsing unit 21 parses the computing mode selected by the client, populates the data according to that mode, and sends the computing mode and data to the input unit 22; the input unit 22 formats, parses, filters and packages that data, and sends the computing mode and the processed data to the computing unit 23; the computing unit 23 receives the computing mode and the data packaged by the input unit 22, aggregates and classifies the relevant metrics, and sends the results to the output unit 24; the output unit 24 formats and encapsulates the results provided by the computing unit 23, selects the corresponding report format, and outputs the report to the AWR report repository 3.
The computing modes parsed by the mode parsing unit 21 include a standalone computing mode 211, a cluster computing mode 212 and a MapReduce computing mode 213, wherein: the standalone computing mode 211 is for clusters with little data and provides non-distributed algorithms, which are sent to the input unit 22; the cluster computing mode 212 is for YARN clusters and provides the MapReduce and Spark distributed algorithms, which are sent to the input unit 22; the MapReduce computing mode 213 is for Hadoop 1.x clusters and provides the MapReduce distributed algorithm, which is sent to the input unit 22.
The input unit 22 comprises an input formatting unit 221, a parsing unit 222, a filtering unit 223 and a packaging unit 224, wherein: the input formatting unit 221 formats the data from the mode parsing unit 21 and sends it to the parsing unit 222; the parsing unit 222 parses the data from the input formatting unit 221 and sends it to the filtering unit 223; the filtering unit 223 filters the data from the parsing unit 222 and sends it to the packaging unit 224; the packaging unit 224 packages the data from the filtering unit 223 and sends it to the computing unit 23.
The computing unit 23 comprises a global statistics unit 231, a rack statistics unit 232, a host statistics unit 233 and a time statistics unit 234. Each statistics unit receives the computing mode and data from the input unit, performs its statistical calculation, and sends the results to the output unit 24, wherein:
The global statistics unit 231 computes statistics for the big data cluster along the global dimension; the rack statistics unit 232 computes statistics along the rack dimension; the host statistics unit 233 computes statistics along the host dimension; the time statistics unit 234 computes statistics along the time dimension.
The output unit 24 comprises an output formatting unit 241, an encapsulation unit 242, a format selection unit 243 and a formatted output unit 244, wherein: the output formatting unit 241 formats the results from the computing unit 23 and sends the formatted output data to the encapsulation unit 242; the encapsulation unit 242 encapsulates the data from the output formatting unit 241 and sends it to the format selection unit 243; the format selection unit 243 selects a report format for the data from the encapsulation unit 242 and sends the output data and the report format to the formatted output unit 244; the formatted output unit 244 renders the received data in the selected report format and sends the formatted report to the AWR report repository 3.
A method of load analysis using the Hadoop AWR automatic load analysis repository comprises the following steps: Step S1, the data acquisition module collects information from the big data cluster and sends it to the computing module; Step S2, the computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics, and sends the results to the AWR report repository; Step S3, operations staff use the AWR report repository to help tune the cluster hardware and software.
In step S3, tuning the cluster by means of the AWR report repository comprises the following steps: analyzing which tasks the CPU and memory of each host and rack of the big data cluster have been allocated to; analyzing how much computing resource each task occupies; analyzing whether the computing resource allocation and file balance of those tasks are reasonable; analyzing whether the rack division is reasonable; and optimizing any cluster hardware and software found to be unreasonable.
A computer-readable storage medium has a processor program stored on it, characterized in that, when the program is executed by a computer, the following steps are implemented: Step S1, the data acquisition module collects information from the big data cluster and sends it to the computing module; Step S2, the computing module, according to the computing mode selected by the client, uses the corresponding algorithm to aggregate and classify the relevant metrics, and sends the results to the AWR report repository; Step S3, operations staff use the AWR report repository to help tune the cluster hardware and software.
In step S3, tuning the cluster using the AWR report message library includes the following steps: analyzing which tasks the CPU and memory of each host and rack in the big data cluster have been allocated to; analyzing how many computing resources each task occupies; analyzing whether the computing resource allocation and file balance of those tasks are reasonable; analyzing whether the division of racks is reasonable; and optimizing any cluster hardware and software found to be unreasonable.
The Hadoop AWR automatic load analysis information bank, analysis method, and storage medium sit at the application layer of a Hadoop cluster. Using big data analysis, they summarize and analyze resource usage at the "computing", "scheduling", and "storage" layers and produce analysis reports for developers, administrators, and operations and maintenance personnel, making it easier to find cluster performance problems and offering ideas and plans for improvement.
The Hadoop AWR automatic load analysis information bank is a performance collection and tuning tool for a Hadoop cluster. It provides a way to analyze the cluster's current load and outputs load-related reports that help administrators understand the load on the cluster, making it a powerful tool for Hadoop cluster tuning. The report is like a comprehensive physical examination report. Snapshots of the cluster can be taken for different time periods; comparing snapshots shows the effect of cluster adjustments and reveals whether performance problems exist. The tool mainly analyzes the cluster's computing resources, storage resources, and computing tasks and outputs a detailed performance report, supplementing the default Hadoop cluster metrics and existing monitoring software. The advantage of Hadoop AWR is that its data reports cover more dimensions: across dimensions such as task distribution, aggregated data, data processing, capacity, host, rack, and calculation type, it makes up for deficiencies in the logging system of existing Hadoop clusters and can provide more effective performance analysis reports, so users can find performance problems more easily.
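The snapshot comparison described above can be sketched as a per-metric difference between two snapshots. The source does not specify a snapshot format, so the flat `metric name -> number` dictionaries and the metric names in the usage note are assumptions made for illustration.

```python
def compare_snapshots(before, after):
    """Return absolute and relative change for every metric present in both snapshots."""
    diff = {}
    for metric in sorted(before.keys() & after.keys()):
        old, new = before[metric], after[metric]
        change = new - old
        # Relative change is undefined when the old value is zero.
        pct = (change / old * 100.0) if old else None
        diff[metric] = {"before": old, "after": new, "change": change, "pct": pct}
    return diff
```

For example, comparing `{"avg_task_seconds": 120.0}` taken before a configuration change with `{"avg_task_seconds": 90.0}` taken after it reports a change of -30.0 seconds for that (hypothetical) metric, which is the kind of evidence an operator would use to judge whether an adjustment helped.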
Finally, it should be noted that the above embodiments are merely examples given for clear illustration and are not a limitation on the embodiments. Other variations or changes in different forms made on the basis of the above description remain within the protection scope of the invention.
Claims (10)
1. A Hadoop AWR automatic load analysis information bank, comprising a data acquisition module, a computing module, and an AWR report message library, the modules being connected via a network, wherein:
The data acquisition module is used to collect information about the Hadoop big data cluster and send it to the computing module;
The computing module is used to aggregate and classify the relevant metrics with the corresponding algorithm, according to the calculation mode selected by the client, and to send the calculation results to the AWR report message library;
The AWR report message library provides information to operations and maintenance personnel for tuning the cluster hardware and software.
2. The Hadoop AWR automatic load analysis information bank according to claim 1, characterized in that:
The data acquisition module includes a Hive query engine, a MapReduce computing engine, a Spark computing engine, and an Other query engine, each unit being connected to the computing module via a network, wherein:
The Hive query engine is used to parse SQL queries against the big data cluster into computing tasks and send them to the computing module;
The MapReduce computing engine is used to parse the allocation and scheduling of the big data cluster's compute nodes into computing tasks and send them to the computing module;
The Spark computing engine is used to parse data in the big data cluster's memory units, divided into small time slices, into computing tasks and send them to the computing module;
The Other query engine is used to parse the unclassified metrics of the big data cluster into computing tasks and send them to the computing module.
The computing module includes a mode resolution unit, an input unit, a computing unit, and an output unit, the units being connected via a network, wherein:
The mode resolution unit is used to parse the calculation mode selected by the client, populate data according to the corresponding calculation mode, and send the calculation mode and data to the input unit;
The input unit is used to format, parse, filter, and encapsulate the data, and to send the calculation mode and the processed data to the computing unit;
The computing unit is used to receive the calculation mode and the data encapsulated by the input unit, aggregate and classify the relevant metrics, and send the calculation results to the output unit;
The output unit is used to format and encapsulate the calculation results provided by the computing unit, select the corresponding output report format, and output the report to the AWR report message library.
3. The Hadoop AWR automatic load analysis information bank according to claim 2, characterized in that the calculation modes parsed by the mode resolution unit include a standalone calculation mode, a cluster calculation mode, and a MapReduce calculation mode, wherein:
The standalone calculation mode is for clusters with small data volumes and provides a non-distributed algorithm to the input unit;
The cluster calculation mode is for YARN clusters and provides MapReduce and Spark distributed algorithms to the input unit;
The MapReduce calculation mode is for Hadoop 1.x clusters and provides a MapReduce distributed algorithm to the input unit.
4. The Hadoop AWR automatic load analysis information bank according to claim 2, characterized in that the input unit includes an input format unit, a parsing unit, a filter unit, and a packaging unit, wherein:
The input format unit is used to format the data from the mode resolution unit and send it to the parsing unit;
The parsing unit is used to parse the data from the input format unit and send it to the filter unit;
The filter unit is used to filter the data from the parsing unit and send it to the packaging unit;
The packaging unit is used to encapsulate the data from the filter unit and send it to the computing unit.
5. The Hadoop AWR automatic load analysis information bank according to claim 2, characterized in that the computing unit includes a global statistics unit, a rack statistics unit, a host statistics unit, and a time statistics unit; each statistics unit receives the calculation mode and data from the input unit, performs its statistical calculation, and sends the calculation results to the output unit, wherein:
The global statistics unit is used to compute statistics for the big data cluster in the global dimension;
The rack statistics unit is used to compute statistics for the big data cluster in the rack dimension;
The host statistics unit is used to compute statistics for the big data cluster in the host dimension;
The time statistics unit is used to compute statistics for the big data cluster in the time dimension.
6. The Hadoop AWR automatic load analysis information bank according to claim 2, characterized in that the output unit includes an output format unit, an encapsulation unit, a format selection unit, and a formatted output unit, wherein:
The output format unit is used to format the calculation results from the computing unit and send the formatted output data to the encapsulation unit;
The encapsulation unit is used to encapsulate the data from the output format unit and send it to the format selection unit;
The format selection unit is used to select an output report format for the data from the encapsulation unit and send the output data and the report format to the formatted output unit;
The formatted output unit is used to render the received data as a report in the selected report format and send the formatted output data report to the AWR report message library.
7. A method of performing load analysis using the Hadoop AWR automatic load analysis information bank according to any one of claims 1 to 6, the method comprising the following steps:
Step S1, the data acquisition module collects information about the big data cluster and sends it to the computing module;
Step S2, according to the calculation mode selected by the client, the computing module uses the corresponding algorithm to aggregate and classify the relevant metrics and sends the calculation results to the AWR report message library;
Step S3, operations and maintenance personnel use the AWR report message library to tune the cluster hardware and software.
8. The method according to claim 7, characterized in that, in step S3, tuning the cluster using the AWR report message library includes the following steps:
Analyzing which tasks the CPU and memory of each host and rack in the big data cluster have been allocated to;
Analyzing how many computing resources each task occupies;
Analyzing whether the computing resource allocation and file balance of those tasks are reasonable;
Analyzing whether the division of racks is reasonable;
Optimizing any cluster hardware and software found to be unreasonable.
9. A computer-readable storage medium storing a processor program, characterized in that, when the program is executed by a computer, the following steps are implemented:
Step S1, the data acquisition module collects information about the big data cluster and sends it to the computing module;
Step S2, according to the calculation mode selected by the client, the computing module uses the corresponding algorithm to aggregate and classify the relevant metrics and sends the calculation results to the AWR report message library;
Step S3, operations and maintenance personnel use the AWR report message library to tune the cluster hardware and software.
10. The storage medium according to claim 9, characterized in that, in step S3, tuning the cluster using the AWR report message library includes the following steps:
Analyzing which tasks the CPU and memory of each host and rack in the big data cluster have been allocated to;
Analyzing how many computing resources each task occupies;
Analyzing whether the computing resource allocation and file balance of those tasks are reasonable;
Analyzing whether the division of racks is reasonable;
Optimizing any cluster hardware and software found to be unreasonable.
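As an illustration of the four statistics units named in claim 5 (global, rack, host, and time), the following Python sketch groups the same task records along each dimension. The record keys (`rack`, `host`, `start_hour`, `cpu_seconds`) and the function names are hypothetical; the claims do not define a record format or which quantities are summed.

```python
from collections import defaultdict


def stats_by(records, key):
    """Count tasks and sum CPU seconds, grouped by one dimension of the records."""
    out = defaultdict(lambda: {"tasks": 0, "cpu_seconds": 0.0})
    for rec in records:
        group = key(rec)
        out[group]["tasks"] += 1
        out[group]["cpu_seconds"] += rec["cpu_seconds"]
    return dict(out)


# One grouping function per statistics unit; the global unit uses a single bucket.
DIMENSIONS = {
    "global": lambda rec: "cluster",
    "rack": lambda rec: rec["rack"],
    "host": lambda rec: rec["host"],
    "time": lambda rec: rec["start_hour"],
}


def all_dimension_stats(records):
    """Run every statistics unit over the same input and collect the results."""
    return {name: stats_by(records, key) for name, key in DIMENSIONS.items()}
```

Keeping the dimensions in a table means a further dimension, such as calculation type, would only need one more entry.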
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810107603.XA CN108363761A (en) | 2018-02-02 | 2018-02-02 | Hadoop awr automatic loads analyze information bank, analysis method and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108363761A true CN108363761A (en) | 2018-08-03 |
Family
ID=63004494
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810107603.XA Pending CN108363761A (en) | 2018-02-02 | 2018-02-02 | Hadoop awr automatic loads analyze information bank, analysis method and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363761A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807017A (en) * | 2019-10-16 | 2020-02-18 | 杭州美创科技有限公司 | AWR report analysis method based on Beautiful Soup analysis technology |
CN111130987A (en) * | 2019-11-01 | 2020-05-08 | 平安科技(深圳)有限公司 | Automatic acquisition method and device for AWR report, electronic equipment and storage medium |
CN112860528A (en) * | 2021-03-31 | 2021-05-28 | 中国工商银行股份有限公司 | Database server performance testing and analyzing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103618644A (en) * | 2013-11-26 | 2014-03-05 | 曙光信息产业股份有限公司 | Distributed monitoring system based on hadoop cluster and method thereof |
CN104317658A (en) * | 2014-10-17 | 2015-01-28 | 华中科技大学 | MapReduce based load self-adaptive task scheduling method |
CN105323111A (en) * | 2015-11-17 | 2016-02-10 | 南京南瑞集团公司 | Operation and maintenance automation system and method |
CN106709012A (en) * | 2016-12-26 | 2017-05-24 | 北京锐安科技有限公司 | Method and device for analyzing big data |
- 2018-02-02 CN CN201810107603.XA patent/CN108363761A/en active Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807017A (en) * | 2019-10-16 | 2020-02-18 | 杭州美创科技有限公司 | AWR report analysis method based on Beautiful Soup analysis technology |
CN111130987A (en) * | 2019-11-01 | 2020-05-08 | 平安科技(深圳)有限公司 | Automatic acquisition method and device for AWR report, electronic equipment and storage medium |
WO2021082484A1 (en) * | 2019-11-01 | 2021-05-06 | 平安科技(深圳)有限公司 | Awr report automatic acquisition method and apparatus, electronic device, and storage medium |
CN112860528A (en) * | 2021-03-31 | 2021-05-28 | 中国工商银行股份有限公司 | Database server performance testing and analyzing method and device |
CN112860528B (en) * | 2021-03-31 | 2024-08-02 | 中国工商银行股份有限公司 | Database server performance testing and analyzing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20211027. Address after: 518102 room 404, building 37, chentian Industrial Zone, chentian community, Xixiang street, Bao'an District, Shenzhen, Guangdong Province. Applicant after: Shenzhen Huaxun ark Photoelectric Technology Co., Ltd. Address before: 518102 floor 3, building 22, chentian Industrial Zone, Baomin Second Road, Xixiang street, Bao'an District, Shenzhen, Guangdong. Applicant before: SHENZHEN CCT SOFTWARE INFORMATION Co.,Ltd. |