WO2017092600A1

WO2017092600A1 - Pointer counting method and device

Info

Publication number: WO2017092600A1
Application number: PCT/CN2016/107017
Authority: WO
Inventors: Wang Yi (王逸); Wu Chong (武翀); Liu Jian (刘键); Fang Xiaojian (方孝健); Feng Zhongyan (封仲淹)
Original assignee: Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date: 2015-12-04
Filing date: 2016-11-24
Publication date: 2017-06-08
Also published as: CN106846021A

Abstract

The present invention relates to the technical field of real-time computing and provides an indicator statistical method and device. The method comprises: for an indicator of a topology job, creating a structured indicator name for the node at each hierarchical level corresponding to the indicator, wherein the hierarchical relationship between the nodes at the respective levels can be determined from the structured indicator names (110); upon detecting lowest-level data corresponding to the indicator, performing statistics under the corresponding lowest-level structured indicator name (120); and, based on the statistical data under the lowest-level structured indicator names, aggregating the data level by level into the structured indicator names of the successive upper levels according to the hierarchical relationship between the structured indicator names (130). The present invention can reduce system overhead. Because the hierarchical relationship is built from the structured indicator names, one or more levels can be conveniently added or removed, which facilitates extension.

Description

Indicator statistical method and device

The present application claims priority to a Chinese patent application filed on December 4, 2015.

Technical field

The present application relates to the field of real-time computing technologies, and in particular, to an indicator statistical method and an indicator statistical device.

Background

With the rapid development of information technology, the amount of information has exploded, and people have more diverse and convenient ways to obtain it. At the same time, the demands placed on the timeliness of information keep rising. Take a search scenario as an example. When a seller publishes product information on an e-commerce website, the seller naturally hopes that buyers can search for, click, and purchase the product immediately. If, on the contrary, the product information can only be found the next day or even later, it lags too far behind for the seller, and its real-time value is lost. This demand has led to the emergence of real-time computing systems, including hierarchical real-time computing systems such as JStorm/Storm.

Storm is a distributed, open-source real-time computing system from the Apache community. It is written in Clojure, a Lisp dialect that runs on the Java platform. Java is an object-oriented programming language for writing cross-platform applications, and Lisp is a family of programming languages known for its expressiveness and power. Storm supports three uses. In "stream processing", it handles messages in real time. In "continuous computation", it continuously processes data streams and outputs the results to users as streams. In "distributed RPC" (Remote Procedure Call), it executes operations in parallel. JStorm is a real-time computing system based on Storm and is compatible with Storm.

In a real-time computing system, measuring the health and performance of an application usually requires collecting statistics on the application's indicators (metrics). Examples include the number of messages sent (Emitted) and the number of transmissions per second (TPS).

Many real-time computing systems, such as JStorm/Storm, have a hierarchical structure. Traditional statistical methods, however, can count an indicator at only a single level. If statistics are needed at every level, a separate indicator has to be defined for each specific level. Aggregating and merging indicator data across levels also requires additional, complex logic. The calculation process is therefore complicated and consumes considerable system resources.

Summary of the invention

In view of the above problems, embodiments of the present application provide an indicator statistical method and a corresponding indicator statistical device that overcome the above problems or at least partially solve them.

In order to solve the above problem, the present application discloses an indicator statistical method, including:

For an indicator of a topology job, creating a structured indicator name for the node at each level corresponding to the indicator, wherein the hierarchical relationship between the nodes at the respective levels is determined by the structured indicator names;

After the lowest-level data corresponding to the indicator is detected, performing statistics under the corresponding lowest-level structured indicator name;

Based on the statistical data under the lowest-level structured indicator names, aggregating the data level by level into the structured indicator names of the upper levels according to the hierarchical relationship between the structured indicator names.

The application also discloses an indicator statistical device, comprising:

a structured identifier creation module, configured to create, for an indicator of a topology job, a structured indicator name for the node at each level corresponding to the indicator, wherein the hierarchical relationship between the nodes at the respective levels is determined by the structured indicator names;

a bottom layer indicator monitoring module, configured to monitor the lowest-level data corresponding to the indicator and perform statistics under the corresponding lowest-level structured indicator name;

a layer-by-layer summary module, configured to aggregate the statistical data under the lowest-level structured indicator names level by level into the structured indicator names of the upper levels according to the hierarchical relationship between the structured indicator names.

Embodiments of the present application include the following advantages:

In the embodiments of the present application, an indicator is to be counted for a topology job of a real-time computing system. A structured indicator name is created for the node at each level corresponding to the indicator, and these names determine the hierarchical relationship between the hierarchical nodes. The embodiments then monitor the lowest-level data of the indicator and record statistics under the lowest-level structured indicator names. Based on these lowest-level statistics, the data is aggregated level by level into the structured indicator names of the upper levels, following the hierarchical relationship between the names. Indicators at every level can thus be counted simply through the hierarchical relationship of the structured indicator names. The logic is simple and system overhead is reduced. Because the hierarchical relationship is constructed from the structured indicator names, one or more levels can also be conveniently added or removed, which facilitates extension.

DRAWINGS

FIG. 1 is a flow chart of steps of an embodiment of an indicator statistical method of the present application;

FIG. 2 is a flow chart of steps of a preferred embodiment of an indicator statistical method of the present application;

FIG. 3 is a structural block diagram of an embodiment of an indicator statistical device of the present application;

FIG. 4 is a structural block diagram of an embodiment of an indicator statistical system of the present application.

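Detailed description

To make the above objects, features, and advantages of the present application more apparent and easier to understand, the present application is described in further detail below in conjunction with specific embodiments.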

For ease of describing the embodiments of the present application, a JStorm or Storm real-time computing system is taken as an example to introduce the related terms used in the embodiments.

Topology: a topology job, that is, an application running on a JStorm or Storm system. Once a topology job is submitted to the Storm or JStorm real-time computing system, it runs continuously without interruption.

Component: a topology consists of multiple components. The components of Storm and JStorm are divided into spouts and bolts. A spout is the source of the data to be processed. For example, a spout can fetch data from an external messaging component or retrieve data from a database. More generally, a spout can continuously acquire data from any external data source and send it downstream, for example to a bolt. A bolt receives the data from the spout and processes it.

Task: a task is a logical processing unit, that is, an instance of an implemented spout or bolt. A component may include multiple tasks.

Stream: Data stream. Stream is the smallest unit of indicator statistics in jstorm and storm. A Task may include multiple Streams.

In practical applications, JStorm and Storm real-time computing systems have a hierarchical structure such as stream→task→component→topology. The data of each indicator is counted at the stream level.

One of the core concepts of the embodiments of the present application is as follows. A real-time computing system has a hierarchical structure when processing data, and it needs to compute statistics on an indicator quickly at every level. For the hierarchical nodes at each level of an indicator, the embodiments create a structured indicator name for each hierarchical node, and the name itself determines the hierarchical relationship between the hierarchical nodes. Only the data under the lowest-level structured indicator names therefore needs to be counted. That data is then aggregated level by level, following the hierarchical relationship between the names, to obtain the indicator's data at every level. Indicators at every level can thus be counted simply through the hierarchical relationship of the structured indicator names. The logic is simple and system overhead is reduced. Because the hierarchical relationship is built from the structured indicator names, one or more levels can also be conveniently added or removed, which facilitates extension.

Embodiment 1

Referring to FIG. 1 , a flow chart of steps of an embodiment of an indicator statistical method of the present application is shown, which may specifically include the following steps:

Step 110: For an indicator of a topology job, create a structured indicator name for the node at each level corresponding to the indicator, wherein the hierarchical relationship between the hierarchical nodes is determined by the structured indicator names;

In the embodiment of the present application, real-time data processing in JStorm is taken as an example. JStorm first receives a topology, that is, an application is started. The embodiment then needs to collect statistics on various indicators at each level during topology processing. Examples include the number of messages sent (Emitted) and the number of transmissions per second (TPS) at each level. To do so, a structured indicator name may be created for the node at each level corresponding to the indicator, and the structured indicator names determine the hierarchical relationship between the nodes at each level.

Take the aforementioned hierarchy stream→task→component→topology as an example. The structure of the structured indicator name can be predefined as follows:

Topology@component@Task@Stream@name

Here Topology, component, Task, and Stream mark the positions of the node identifiers of the corresponding levels, and name marks the position of the indicator identifier.
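As an illustration only, the name layout above can be captured in a small Python helper. This is a minimal sketch, not the method as claimed. The names `SCHEMA`, `SEP`, and `build_name` are hypothetical, and an empty field stands for a level that has been aggregated away.

```python
# Hypothetical field layout mirroring Topology@component@Task@Stream@name.
SCHEMA = ("Topology", "component", "Task", "Stream")
SEP = "@"


def build_name(node_ids, metric, schema=SCHEMA, sep=SEP):
    """Join one node identifier per level, top to bottom, then the indicator identifier.

    node_ids may be shorter than the schema; the missing (lower) levels are left
    empty, which is how upper-level names such as tp1@spout@@@Emitted are written.
    """
    if len(node_ids) > len(schema):
        raise ValueError("more node identifiers than levels in the schema")
    for node_id in node_ids:
        if sep in node_id:
            raise ValueError(f"node identifier {node_id!r} contains the separator")
    padded = list(node_ids) + [""] * (len(schema) - len(node_ids))
    return sep.join(padded + [metric])


print(build_name(["tp1", "spout", "Task0", "Stream0"], "Emitted"))  # tp1@spout@Task0@Stream0@Emitted
print(build_name(["tp1", "spout"], "Emitted"))                      # tp1@spout@@@Emitted
```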

Assume that the node identifier of the topology is tp1. The topology includes one component-level node whose node identifier is spout. The component includes two task-level nodes, Task0 and Task1. Task0 includes two stream-level nodes, Stream0 and Stream1, and Task1 includes two stream-level nodes, Stream2 and Stream3. The identifier of the indicator being counted is Emitted.

Structured indicator names can then be created for the node at each level:

The structured indicator name corresponding to Stream0 is:

tp1@spout@Task0@Stream0@Emitted, which represents the Emitted value of Stream0.

The structured indicator name corresponding to Stream1 is:

tp1@spout@Task0@Stream1@Emitted, which represents the Emitted value of Stream1.

The structured indicator name corresponding to Task0 is:

tp1@spout@Task0@@Emitted, which represents the Emitted value counted for Task0.

The structured indicator name corresponding to Stream2 is:

tp1@spout@Task1@Stream2@Emitted, which represents the Emitted value of Stream2.

The structured indicator name corresponding to Stream3 is:

tp1@spout@Task1@Stream3@Emitted, which represents the Emitted value of Stream3.

The structured indicator name corresponding to Task1 is:

tp1@spout@Task1@@Emitted, which represents the Emitted value counted for Task1.

The structured indicator name corresponding to spout is:

tp1@spout@@@Emitted, which represents the Emitted value counted for spout.

The structured indicator name corresponding to tp1 is:

tp1@@@@Emitted, which represents the Emitted value of tp1.

The structured indicator name of Task0 is thus aggregated from the structured indicator names of Stream0 and Stream1, which gives a clear parent-child relationship. tp1@spout@Task0@Stream0@Emitted and tp1@spout@Task0@Stream1@Emitted are siblings at the same level. Similarly, tp1@spout@@@Emitted is aggregated from the structured indicator names of Task0 and Task1, again with a clear parent-child relationship. tp1@spout@Task0@@Emitted and tp1@spout@Task1@@Emitted are siblings at the same level. The structured indicator names above therefore make the hierarchical relationship between the nodes at each level explicit.
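The parent and sibling relationships described above can be read directly off the names. The following Python sketch assumes the `@` layout above and that the filled node fields always form a contiguous prefix. `parent_name`, `is_child_of`, and `are_siblings` are hypothetical helper names, not taken from the application.

```python
SEP = "@"


def split_name(name, sep=SEP):
    """Return (node_fields, metric) for a structured indicator name."""
    *nodes, metric = name.split(sep)
    return nodes, metric


def parent_name(name, sep=SEP):
    """Blank the deepest non-empty node field to get the next level up.

    Returns None for a top-level name such as tp1@@@@Emitted.
    """
    nodes, metric = split_name(name, sep)
    depth = sum(1 for n in nodes if n)  # filled levels, assumed to be a prefix
    if depth <= 1:
        return None
    nodes[depth - 1] = ""
    return sep.join(nodes + [metric])


def is_child_of(child, parent, sep=SEP):
    return parent_name(child, sep) == parent


def are_siblings(a, b, sep=SEP):
    """Two different names at the same level that share the same parent."""
    pa = parent_name(a, sep)
    return a != b and pa is not None and pa == parent_name(b, sep)


print(parent_name("tp1@spout@Task0@Stream0@Emitted"))  # tp1@spout@Task0@@Emitted
```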

In another preferred embodiment of the present application, step 110 includes:

Sub-step A11: for an indicator of a topology job, combine in sequence the node identifiers from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier, into the lowest-level structured indicator name;

In the embodiment of the present application, for ease of calculation, the lowest-level structured indicator name is built first for each indicator of a topology. The node identifiers from the top-level hierarchical node down to the lowest-level hierarchical node and the indicator identifier are combined in sequence, according to the predefined structure of the structured indicator name. As described above, Stream0 is a lowest-level node, so its structured indicator name is set first: tp1@spout@Task0@Stream0@Emitted. The other lowest-level hierarchical nodes are handled in the same way.

The bottom-level structured indicator name represents the hierarchical path from the top to the bottom.

Sub-step A12: based on the lowest-level structured indicator names, for the structured indicator name of each level, set the hierarchical node of the current level in the name to empty to obtain the structured indicator name of the next level up.

After the lowest-level structured indicator names are set, the structured indicator names of the nodes at each upper level are calculated step by step.

For example, the level above Stream is Task, so the embodiment merges the Streams that belong to the same Task into that Task at the next level up. Setting Stream0 in tp1@spout@Task0@Stream0@Emitted to empty, or Stream1 in tp1@spout@Task0@Stream1@Emitted, yields the structured indicator name of the hierarchical node Task0: tp1@spout@Task0@@Emitted. In the same way, the structured indicator name of Task1, tp1@spout@Task1@@Emitted, is obtained. Moving up one level from the Task nodes, setting Task0 in tp1@spout@Task0@@Emitted or Task1 in tp1@spout@Task1@@Emitted to empty yields the structured indicator name of spout, tp1@spout@@@Emitted. This continues until the top-level structured indicator name is generated.
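Sub-step A12 can be sketched as repeatedly blanking the deepest filled field. This Python example assumes a complete lowest-level name as input, and `upper_level_names` and `all_level_names` are hypothetical names. Collecting the results in a set reproduces the merge described above, where Stream0 and Stream1 both yield the single Task0 name.

```python
SEP = "@"


def upper_level_names(lowest_name, sep=SEP):
    """Blank one level at a time, from the bottom up.

    Returns the names from the lowest level to the top level, inclusive.
    Assumes every node field of lowest_name is filled.
    """
    *nodes, metric = lowest_name.split(sep)
    names = [lowest_name]
    # Blank the deepest level first, stopping before the topology field.
    for level in range(len(nodes) - 1, 0, -1):
        nodes[level] = ""
        names.append(sep.join(nodes + [metric]))
    return names


def all_level_names(lowest_names, sep=SEP):
    """Union of all levels; names shared by siblings (e.g. Task0) appear once."""
    result = set()
    for name in lowest_names:
        result.update(upper_level_names(name, sep))
    return result


for name in upper_level_names("tp1@spout@Task0@Stream0@Emitted"):
    print(name)
```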

It can be understood that in the above example, the symbol @ is used as a separator so that structured indicator names can easily be merged to generate the names of each level. In actual applications, a separator such as @ need not be used. In that case, the node identifier of each hierarchical node and the level it belongs to can be provided to the real-time computing system, which then performs sub-step A12 according to the node identifiers of each level.
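When no separator is used, one possible representation is a tuple of per-level node identifiers, with `None` marking an aggregated level. This representation is an assumption, not taken from the application. Because each identifier keeps its own slot, identifiers may contain any character. The `Key` alias and `parent_key` function are hypothetical.

```python
from typing import Optional, Tuple

# A key is (node identifiers per level, indicator identifier); None marks an
# aggregated level. Tuples need no separator, so identifiers may contain "@".
Key = Tuple[Tuple[Optional[str], ...], str]


def parent_key(key: Key) -> Optional[Key]:
    """Set the deepest filled level to None; return None at the top level."""
    nodes, metric = key
    filled = [i for i, n in enumerate(nodes) if n is not None]
    if len(filled) <= 1:
        return None
    deepest = filled[-1]
    new_nodes = tuple(None if i == deepest else n for i, n in enumerate(nodes))
    return new_nodes, metric


leaf: Key = (("tp1", "spout", "Task0", "Stream0"), "Emitted")
print(parent_key(leaf))  # (('tp1', 'spout', 'Task0', None), 'Emitted')
```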

Preferably, in another preferred embodiment of the present application, sub-step A11 includes:

A111: based on the lowest-level structured indicator names, for the structured indicator name of each level, set the hierarchical node corresponding to the current level's separator to empty to obtain the structured indicator name of the next level up.

In the embodiment of the present application, directly concatenating the full node identifiers of the hierarchical nodes and the indicator identifier could produce duplicate or ambiguous names. To prevent this, a separator is added between the node identifiers of adjacent levels. A separator is also added between the lowest-level node identifier and the indicator identifier, and the @ in tp1@spout@Task0@Stream0@Emitted above is such a separator. When the structured indicator name of the next level up is generated, only the node identifier is set to empty and the separator is retained, which makes the hierarchical relationship between the names easier to determine. Take tp1@spout@Task0@@Emitted, generated from tp1@spout@Task0@Stream0@Emitted. A subsequent merge only needs to check that the field between the 3rd and 4th separators is empty to determine that tp1@spout@Task0@@Emitted holds the Emitted value aggregated over all Streams under Task0.
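Because blanking keeps the separators, every name at every level has the same number of fields, so the level of a name can be found by position alone. The following is a hedged Python sketch. The level labels and `level_of` are illustrative assumptions.

```python
SEP = "@"
LEVELS = ("Topology", "component", "Task", "Stream")  # hypothetical labels


def level_of(name, sep=SEP, levels=LEVELS):
    """Return the label of the level at which a structured indicator name aggregates.

    Separators are kept when a level is blanked, so every name has the same
    number of fields and field positions can be read directly.
    """
    *nodes, _metric = name.split(sep)
    if len(nodes) != len(levels):
        raise ValueError(f"expected {len(levels)} node fields in {name!r}")
    filled = [n for n in nodes if n]
    if not filled or any(not n for n in nodes[: len(filled)]):
        raise ValueError(f"node fields of {name!r} are not a contiguous prefix")
    return levels[len(filled) - 1]


print(level_of("tp1@spout@Task0@@Emitted"))  # Task
```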

Of course, the delimiters of the embodiments of the present application may also adopt other symbols, which are not limited by the embodiments of the present application.

Preferably, in another preferred embodiment of the present application, before step 110, the method further includes:

B11: Register the indicator identifier corresponding to the lowest level node to the system.

In an actual application, the scheduling server allocates the topology to the computing nodes for execution. The worker run by each computing node can then register, as required, the indicator identifier corresponding to the lowest-level node in the system of that computing node, for example the indicator identifier of each node at the Stream level. Once the indicator identifier of a lowest-level hierarchical node is registered, the computing node can automatically generate the structured indicator names of the hierarchical nodes at each level according to the structural definition of the structured indicator name. For example, for Stream0 at the Stream level, sub-step A11 generates tp1@spout@Task0@Stream0@Emitted and so on. Sub-step A12 then generates structured indicator names until the names of the nodes at every level have been generated.

In the above process, the computing node only needs to be notified to register the indicator identifier of the lowest-level hierarchical node in the system. This is a simple notification. The structured indicator names of every hierarchical node do not have to be transmitted to the computing node, which reduces transmission overhead.

In the embodiment of the present invention, the structural definition of the structured indicator name may be configured in the scheduling server of the real-time computing system and transmitted by the scheduling server to each computing node. A technician can change this structural definition in the scheduling server as needed to change the hierarchy. The structured indicator names of the nodes at each level then change accordingly.

For example, if an indicator group (group) level needs to be added above the indicator identifier, the structure of the structured indicator name can be defined as:

Topology@component@Task@Stream@group@name.

Taking Emitted as an example, group divides the Emitted statistics according to the requirements of different services. For service A, the value under its structured indicator name may be incremented by 1 each time 10 messages are sent. For service B, the value under its structured indicator name may be incremented by 1 each time 1 message is sent. Different groups therefore have different values under their structured indicator names.
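Adding the group level only changes the schema, and the blanking logic stays the same. This sketch assumes the six-field layout above and uses the service names A and B from the example as group identifiers. `SCHEMA_WITH_GROUP`, `lowest_name`, and `upper_level_names` are hypothetical. The sketch models only the naming, not the per-service counting rule.

```python
SEP = "@"
# Extended hypothetical schema: Topology@component@Task@Stream@group@name
SCHEMA_WITH_GROUP = ("Topology", "component", "Task", "Stream", "group")


def lowest_name(values, metric, schema, sep=SEP):
    """Build the lowest-level name; values maps each schema field to a node id."""
    missing = [field for field in schema if field not in values]
    if missing:
        raise ValueError(f"missing values for levels: {missing}")
    return sep.join([values[field] for field in schema] + [metric])


def upper_level_names(name, sep=SEP):
    """Same blanking rule as before; it works for any number of levels."""
    *nodes, metric = name.split(sep)
    names = [name]
    for level in range(len(nodes) - 1, 0, -1):
        nodes[level] = ""
        names.append(sep.join(nodes + [metric]))
    return names


leaf = lowest_name(
    {"Topology": "tp1", "component": "spout", "Task": "Task0", "Stream": "Stream0", "group": "A"},
    "Emitted",
    SCHEMA_WITH_GROUP,
)
print(leaf)                        # tp1@spout@Task0@Stream0@A@Emitted
print(upper_level_names(leaf)[1])  # tp1@spout@Task0@Stream0@@Emitted
```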

It can be understood that the updated structural definition of the structured indicator name can be delivered to the scheduling server and then distributed by the scheduling server to each computing node. Alternatively, the original structural definition can continue to be used.

Of course, the embodiment of the present invention may also change the structural definition of the foregoing structured indicator name according to actual needs, which is not limited in this application.

Step 120: After the lowest-level data corresponding to the indicator is detected, perform statistics under the corresponding lowest-level structured indicator name;

In the embodiment of the present invention, the JStorm real-time computing system is taken as an example. After a topology is created in JStorm, the JStorm scheduling system can divide the topology among multiple workers. Each worker is a process that performs specific tasks. The workers are distributed on different computing nodes of the JStorm computing cluster and execute in parallel, and all actual data processing is ultimately completed in the workers. Each computing node therefore processes the topology according to its hierarchical structure. In the embodiment of the present application, each computing node obtains the structured indicator names corresponding to the hierarchical nodes of the indicator and then counts the indicator under each structured indicator name.

In practical applications, each worker can run at least one spout and/or at least one bolt. Within a worker, each spout or bolt is executed as tasks, and the tasks process data in the form of streams.

The embodiment of the present application then monitors the indicator-related data that appears in the streams. For example, for the number of messages sent (Emitted), a Tuple (the basic unit of message delivery) may be detected passing through the stream-level node Stream0. The value of the structured indicator name tp1@spout@Task0@Stream0@Emitted corresponding to Stream0 is then updated to 1.
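Step 120 touches only the lowest-level name. In the Python sketch below, `on_tuple_emitted` is a hypothetical hook standing in for wherever the framework observes an emitted Tuple. It is not a JStorm or Storm API.

```python
from collections import Counter

SEP = "@"

# Values under lowest-level structured indicator names (step 120 only).
lowest_counts = Counter()


def on_tuple_emitted(topology, component, task, stream, metric="Emitted"):
    """Hypothetical hook called once per Tuple a stream emits.

    Only the lowest-level name is updated here; upper levels are derived in step 130.
    """
    name = SEP.join([topology, component, task, stream, metric])
    lowest_counts[name] += 1
    return name


on_tuple_emitted("tp1", "spout", "Task0", "Stream0")
print(lowest_counts["tp1@spout@Task0@Stream0@Emitted"])  # 1
```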

It can be understood that in the embodiment of the present application, the system can be notified to monitor only the indicators of the lowest-level hierarchical nodes. No monitoring is performed on the levels above the lowest level.

Step 130: Based on the statistical data under the lowest-level structured indicator names, aggregate the data level by level into the structured indicator names of the upper levels according to the hierarchical relationship between the structured indicator names.

Assume that this is the first record and that the initial value of every structured indicator name above the lowest level is 0. The record is then: tp1@spout@Task0@Stream0@Emitted:1

From the structure of tp1@spout@Task0@Stream0@Emitted, the structured indicator names of its upper levels are found: tp1@spout@Task0@@Emitted, tp1@spout@@@Emitted, and tp1@@@@Emitted. The values of all three are updated to 1.

Further assume that in step 120, another Tuple is detected passing through the stream-level node Stream0. The value of the structured indicator name tp1@spout@Task0@Stream0@Emitted corresponding to Stream0 is then updated to 2.

At this time, in step 130, according to the hierarchical relationship, the layers are summarized step by step, and the summary order and results are as follows:

tp1@spout@Task0@@Emitted:2

tp1@spout@@@Emitted:2

tp1@@@@Emitted:2

Further assume that in step 120, a Tuple is detected passing through the stream-level node Stream3. The value of the structured indicator name tp1@spout@Task1@Stream3@Emitted corresponding to Stream3 is updated to 1, and the level-by-level aggregation then gives:

tp1@spout@Task1@@Emitted:1

tp1@spout@@@Emitted:3

tp1@@@@Emitted:3
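The aggregation in step 130 can be sketched as adding each lowest-level value into every upper-level name derived from it. The incremental updates above work one Tuple at a time. This hypothetical `aggregate` function instead recomputes the totals from the current lowest-level values. For additive counters such as Emitted, both approaches give the same result. The input is the state after the three Tuples in the example.

```python
from collections import Counter

SEP = "@"


def upper_level_names(name, sep=SEP):
    """Yield the upper-level names of a lowest-level name, bottom level first."""
    *nodes, metric = name.split(sep)
    for level in range(len(nodes) - 1, 0, -1):
        nodes[level] = ""
        yield sep.join(nodes + [metric])


def aggregate(lowest_counts):
    """Sum each lowest-level value into every upper-level name derived from it."""
    totals = Counter(lowest_counts)  # copy the lowest-level values
    for name, value in lowest_counts.items():
        for upper in upper_level_names(name):
            totals[upper] += value
    return totals


# State after the three Tuples in the worked example above.
lowest = {
    "tp1@spout@Task0@Stream0@Emitted": 2,
    "tp1@spout@Task1@Stream3@Emitted": 1,
}
totals = aggregate(lowest)
print(totals["tp1@spout@Task0@@Emitted"], totals["tp1@spout@Task1@@Emitted"],
      totals["tp1@spout@@@Emitted"], totals["tp1@@@@Emitted"])  # 2 1 3 3
```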

In the embodiment of the present application, each computing node of the real-time computing system performs the corresponding statistics under the structured indicator names of its hierarchical nodes.

To obtain the overall statistics of every level for the entire real-time computing system, the embodiment of the present application aggregates the records under the structured indicator names of all the computing nodes.

For example, suppose there are two compute nodes, 1 and 2. The records of compute node 1 are:

tp1@spout@Task0@Stream0@Emitted:10

tp1@spout@Task1@Stream3@Emitted:10

tp1@spout@Task0@@Emitted:10

tp1@spout@Task1@@Emitted:10

tp1@spout@@@Emitted:20

tp1@@@@Emitted:20

The records of compute node 2 are:

tp1@spout@Task0@Stream1@Emitted:20

tp1@spout@Task1@Stream3@Emitted:10

tp1@spout@Task0@@Emitted:20

tp1@spout@Task1@@Emitted:10

tp1@spout@@@Emitted:30

tp1@@@@Emitted:30

The aggregated Emitted records of the entire real-time computing system for each level of tp1 are then:

tp1@spout@Task0@Stream0@Emitted:10

tp1@spout@Task0@Stream1@Emitted:20

tp1@spout@Task1@Stream3@Emitted:20

tp1@spout@Task0@@Emitted:30

tp1@spout@Task1@@Emitted:20

tp1@spout@@@Emitted:50

tp1@@@@Emitted:50
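Merging across compute nodes is a key-by-key sum. This sketch uses Python's `collections.Counter`, whose `update` method adds values for matching keys, and it reproduces the totals above. `merge_nodes` is a hypothetical name. Summing is valid for additive counters such as Emitted. A non-additive statistic, such as a maximum, would need its own merge rule.

```python
from collections import Counter


def merge_nodes(*node_records):
    """Sum the records of several compute nodes, structured name by name."""
    merged = Counter()
    for records in node_records:
        merged.update(records)  # adds values for keys that already exist
    return merged


node1 = {
    "tp1@spout@Task0@Stream0@Emitted": 10,
    "tp1@spout@Task1@Stream3@Emitted": 10,
    "tp1@spout@Task0@@Emitted": 10,
    "tp1@spout@Task1@@Emitted": 10,
    "tp1@spout@@@Emitted": 20,
    "tp1@@@@Emitted": 20,
}
node2 = {
    "tp1@spout@Task0@Stream1@Emitted": 20,
    "tp1@spout@Task1@Stream3@Emitted": 10,
    "tp1@spout@Task0@@Emitted": 20,
    "tp1@spout@Task1@@Emitted": 10,
    "tp1@spout@@@Emitted": 30,
    "tp1@@@@Emitted": 30,
}

merged = merge_nodes(node1, node2)
print(merged["tp1@spout@Task0@@Emitted"], merged["tp1@@@@Emitted"])  # 30 50
```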

In practical applications, the real-time computing system counts the data of each structured indicator over one-minute periods. It can therefore output statistics continuously every minute, for example in the form of logs.

Preferably, in another embodiment of the present application, after step 130, the method further includes:

Step 140: Export the statistical data under each structured indicator name to a database for storage.

In the embodiment of the present application, the real-time computing system does not provide database functions, so its statistical results are inconvenient to query.

Moreover, because structured indicator names have a regular structure, they suit many big data processing tools and frameworks, such as HBase, Hadoop, and Hive. The present application can therefore export the statistical data under each structured indicator name to storage systems such as HBase, Hadoop, and Hive.

Preferably, in another preferred embodiment of the present application, step 140 includes:

Sub-step C11: export the statistical data under each structured indicator name to the database, storing it with the structured indicator name and a timestamp as the key and the statistical data as the value.

In practical applications, a real-time computing system counts the data of each structured indicator within a time period, such as 1 minute. When the period ends, the record under the structured indicator name is refreshed and recording starts again. Each structured indicator name therefore carries a timestamp at the end of each period, namely the system time at the end of that period. In the embodiment of the present application, at the end of each period, the structured indicator name and its statistical data are stored in a database, such as an HBase database. The structured indicator name and timestamp form the key, and the statistical data is the value. The timestamps then make it easy to look up the indicator values of every level over a given period of time.
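The period-based counting and the name-plus-timestamp key can be sketched as follows. A plain dictionary keyed by `(name, timestamp)` stands in for the database table. No HBase client API is used or implied, and `WindowedMetrics` and `query` are hypothetical. The clock is injectable so that the example is deterministic.

```python
import time
from collections import Counter


class WindowedMetrics:
    """Hypothetical per-period counter store for structured indicator names.

    Counts accumulate during a period (for example one minute); flush() is
    expected to be called at the end of each period.
    """

    def __init__(self, clock=time.time):
        self.clock = clock
        self.counts = Counter()

    def add(self, name, value=1):
        self.counts[name] += value

    def flush(self):
        """Return {(name, end_timestamp): value} rows and reset the counters."""
        end_ts = int(self.clock())  # system time at the end of the period
        rows = {(name, end_ts): value for name, value in self.counts.items()}
        self.counts.clear()
        return rows


def query(table, name, start_ts, end_ts):
    """(timestamp, value) pairs for one name whose period ended in [start_ts, end_ts]."""
    return sorted(
        (ts, value)
        for (row_name, ts), value in table.items()
        if row_name == name and start_ts <= ts <= end_ts
    )


now = [1000]  # fake clock for a deterministic example
store = WindowedMetrics(clock=lambda: now[0])
table = {}

store.add("tp1@@@@Emitted", 3)
table.update(store.flush())
now[0] = 1060
store.add("tp1@@@@Emitted", 5)
table.update(store.flush())
print(query(table, "tp1@@@@Emitted", 1000, 1060))  # [(1000, 3), (1060, 5)]
```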

The advantages of the embodiments of the present application are further illustrated with a JStorm topology as an example. The topology identifier is tp1, and tp1 has one component whose level identifier is spout. The component has 5 tasks, each with its own id, and their level identifiers are Task0 to Task4. Each task has 2 streams, whose level identifiers are Stream0 and Stream1. The hierarchical relationship is therefore:

Stream[0~1]→Task[0~4]→spout→tp1

In the traditional approach, counting the message volume of spout requires defining an indicator named SpoutEmitted and updating its value every time a message is sent. Counting the message volume of Task0 requires defining an indicator named Task0Emitted and updating that value as well. Counting the message volume of Stream0 in Task0 requires defining an indicator named Stream0Emitted, and so on for the other cases. However, SpoutEmitted is in fact hierarchically related to Task0Emitted through Task4Emitted: SpoutEmitted = Task0Emitted + Task1Emitted + Task2Emitted + Task3Emitted + Task4Emitted. Similarly, Task0Emitted = Stream0Emitted + Stream1Emitted. The traditional indicator statistical method does not capture this hierarchical relationship and calculation logic, and implementing it requires a great deal of additional, complex logic and calculation. The traditional method also requires indicator names to be chosen carefully, because duplicate names lead to inaccurate data.
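To illustrate the contrast, the sketch below counts only the 10 lowest-level names of the Stream[0~1]→Task[0~4]→spout→tp1 hierarchy and derives every upper level from them. Relations such as SpoutEmitted = Task0Emitted + ... + Task4Emitted therefore hold by construction, without separately maintained indicators. The per-stream values are arbitrary placeholders, not data from the application.

```python
from collections import Counter

SEP = "@"


def upper_level_names(name, sep=SEP):
    """Yield the upper-level names of a lowest-level name, bottom level first."""
    *nodes, metric = name.split(sep)
    for level in range(len(nodes) - 1, 0, -1):
        nodes[level] = ""
        yield sep.join(nodes + [metric])


# Placeholder values for the 10 lowest-level names (5 tasks x 2 streams).
lowest = {
    f"tp1@spout@Task{t}@Stream{s}@Emitted": 10 * t + s + 1
    for t in range(5)
    for s in range(2)
}

totals = Counter(lowest)
for name, value in lowest.items():
    for upper in upper_level_names(name):
        totals[upper] += value

task_totals = [totals[f"tp1@spout@Task{t}@@Emitted"] for t in range(5)]
# SpoutEmitted = Task0Emitted + ... + Task4Emitted holds without extra logic.
assert totals["tp1@spout@@@Emitted"] == sum(task_totals)
print(len(totals))  # 17: 10 streams + 5 tasks + spout + tp1
```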

In contrast, the embodiment of the present application exploits the hierarchical structure that the real-time computing system has when processing data, so the system can quickly compute statistics on an indicator at every level. A structured indicator name is created for each hierarchical node at each level of an indicator. The name itself determines the hierarchical relationship between the hierarchical nodes and hence the aggregation relationship. Only the data under the lowest-level structured indicator names needs to be counted. That data is then aggregated level by level, following the hierarchical relationship between the names, to obtain the indicator's data at every level. Indicators at every level can thus be counted simply. The logic is simple, system overhead is reduced, and the hierarchical relationship constructed from the structured indicator names makes it easy to add or remove one or more levels, which facilitates extension.

Embodiment 2

Referring to FIG. 2, a flow chart of steps of a preferred embodiment of an indicator statistical method of the present application is shown. Specifically, the method may include the following steps:

Step 210: Each computing node registers the indicator identifier corresponding to the lowest-level hierarchical node in the system of that computing node.

In the embodiments of the present application, the real-time computing system may be a distributed computing system comprising a scheduling server and a number of computing nodes.

The structure definition of the structured indicator names can be configured on the scheduling server of the real-time computing system. The scheduling server then distributes it to each computing node, so that each computing node can process the lowest-level structured indicator names according to that definition.

In the embodiments of the present application, taking JStorm as an example, the scheduling server assigns a topology to the computing nodes for execution. The worker running on each computing node can then, as required, register the indicator identifiers corresponding to the lowest-level nodes with the system of that computing node.

The indicators of a topology fall into two categories: (1) system indicators already defined inside the JStorm computing framework; and (2) user-defined, business-related indicators.

A system indicator selected by the user can be registered with the computing node's system when the worker is initialized. A user-defined business indicator can be registered when the worker initializes the user code.

Indicators are registered on each computing node. Taking JStorm as an example, for a stream, the worker calls a global static method provided by JStorm, registerStreamMetrics(metric-related parameters). This method registers the stream's indicator identifier with the system according to those parameters. The method then proceeds to step 220 to generate the structured indicator names level by level.
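The registration step can be sketched as a per-node registry of lowest-level indicator IDs, split into framework-defined and user-defined indicators. This is not JStorm's `registerStreamMetrics`, whose exact signature is not given here. `MetricRegistry`, `register`, and the indicator names are hypothetical. Rejecting IDs that contain `@` assumes that this character is the name delimiter.

```python
from dataclasses import dataclass, field
from typing import Set

DELIM = "@"  # assumed structured-name delimiter


@dataclass
class MetricRegistry:
    """Hypothetical per-node registry of lowest-level indicator IDs."""
    system_metrics: Set[str] = field(default_factory=set)
    user_metrics: Set[str] = field(default_factory=set)

    def register(self, metric_id: str, user_defined: bool = False) -> None:
        # IDs become name components, so they must be non-empty and
        # must not contain the delimiter.
        if not metric_id or DELIM in metric_id:
            raise ValueError(f"invalid indicator ID: {metric_id!r}")
        target = self.user_metrics if user_defined else self.system_metrics
        target.add(metric_id)

    def all_ids(self) -> Set[str]:
        return self.system_metrics | self.user_metrics


registry = MetricRegistry()
registry.register("Emitted")  # framework indicators: at worker initialization
registry.register("Acked")
registry.register("OrdersProcessed", user_defined=True)  # at user-code initialization
print(sorted(registry.all_ids()))  # ['Acked', 'Emitted', 'OrdersProcessed']
```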

Step 220: For each indicator of a topology job, each computing node uses a delimiter to splice the node identifiers, in order from the topmost hierarchical node down to the lowest-level hierarchical node, followed by the indicator's identifier, into the lowest-level structured indicator name.

After registering the lowest-level indicator identifiers, each computing node can generate the lowest-level structured indicator names. It does so according to the structure definition of the structured indicator names and the parent-child relationships between hierarchical nodes recorded in the system.
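Step 220 can be sketched as follows. The sketch assumes a fixed four-level hierarchy (topology, component, task, stream) and `@` as the delimiter; the text fixes neither. `leaf_name` and `LEVELS` are hypothetical names. The checks reject empty parts, which are reserved for upper-level names. They also reject parts containing the delimiter, which would make the name ambiguous.

```python
from typing import Sequence

DELIM = "@"                                           # assumed delimiter
LEVELS = ("topology", "component", "task", "stream")  # assumed hierarchy, top to bottom


def leaf_name(node_ids: Sequence[str], metric_id: str) -> str:
    """Splice top-to-bottom node IDs and the indicator ID into a leaf name."""
    if len(node_ids) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} node IDs, got {len(node_ids)}")
    parts = [*node_ids, metric_id]
    for part in parts:
        # Empty slots are reserved for upper-level names, and an embedded
        # delimiter would make the name impossible to split unambiguously.
        if not part or DELIM in part:
            raise ValueError(f"invalid name component: {part!r}")
    return DELIM.join(parts)


print(leaf_name(["topo1", "spout", "task0", "stream0"], "Emitted"))
# topo1@spout@task0@stream0@Emitted
```

Because the task ID is part of the name, stream0 under task0 and stream0 under task1 get different leaf names without the user having to choose them.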

Step 230: Starting from the lowest-level structured indicator name, each computing node takes the structured indicator name at each level and sets the hierarchical node of the current level in that name to empty. This yields the structured indicator name of the level above.
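Step 230 is shown below as a short sketch, assuming the `@` delimiter. Each upper-level name is produced by blanking one more level, starting from the bottom, while the topmost node is kept. `upper_level_names` is a hypothetical helper.

```python
from typing import List

DELIM = "@"  # assumed delimiter


def upper_level_names(leaf: str) -> List[str]:
    """From a leaf name, blank one level at a time (bottom first) and
    return the structured names of every upper level, nearest first."""
    *levels, metric = leaf.split(DELIM)
    names = []
    for i in range(len(levels) - 1, 0, -1):  # the topmost node is kept
        levels[i] = ""
        names.append(DELIM.join([*levels, metric]))
    return names


for name in upper_level_names("topo1@spout@task0@stream0@Emitted"):
    print(name)
# topo1@spout@task0@@Emitted
# topo1@spout@@@Emitted
# topo1@@@@Emitted
```

The helper reads the depth from the name itself rather than from a hard-coded level list. Under this assumption, adding or removing a level needs no change to this code.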

Step 240: Upon detecting lowest-level data corresponding to the indicator, each computing node counts it under the corresponding lowest-level structured indicator name.
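In step 240, the hot path touches only the leaf name, however deep the hierarchy is. The following is a minimal sketch of a per-node store, assuming several Python threads may report concurrently. `LeafCounters` and its methods are hypothetical.

```python
import threading
from collections import Counter


class LeafCounters:
    """Hypothetical per-node store; only lowest-level names are ever updated."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def increment(self, leaf_name: str, amount: int = 1) -> None:
        # One update per observed event, independent of hierarchy depth.
        with self._lock:
            self._counts[leaf_name] += amount

    def snapshot_and_reset(self) -> dict:
        """Return the counts gathered so far and start a fresh period."""
        with self._lock:
            snapshot, self._counts = dict(self._counts), Counter()
        return snapshot


counters = LeafCounters()
for _ in range(3):
    counters.increment("topo1@spout@task0@stream0@Emitted")
counters.increment("topo1@spout@task0@stream1@Emitted")
print(counters.snapshot_and_reset())
# {'topo1@spout@task0@stream0@Emitted': 3, 'topo1@spout@task0@stream1@Emitted': 1}
```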

Step 250: Each computing node takes the statistics under the lowest-level structured indicator names and aggregates them level by level, according to the hierarchical relationship between the structured indicator names, into the structured indicator names of the upper levels.

Each computing node collects the data under each structured indicator name per time period, for example 1 minute. At the end of each period, it sends that period's statistics for every structured indicator name to the scheduling server.
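Step 250 and the periodic report can be combined. At the end of each period, the node rolls its leaf counts up to every level and tags them with the period. The sketch below assumes the `@` delimiter, Unix-second timestamps, and a hypothetical report format. It leaves out the transport to the scheduling server.

```python
from collections import Counter
from typing import Dict, List

DELIM = "@"    # assumed delimiter
PERIOD_S = 60  # 1-minute reporting period, the example given in the text


def upper_level_names(leaf: str) -> List[str]:
    """Blank one level at a time, bottom first; the topmost node is kept."""
    *levels, metric = leaf.split(DELIM)
    names = []
    for i in range(len(levels) - 1, 0, -1):
        levels[i] = ""
        names.append(DELIM.join([*levels, metric]))
    return names


def build_report(leaf_snapshot: Dict[str, int], ts_in_period: int) -> dict:
    """Roll one period's leaf counts up to every level and tag the period.

    ts_in_period is any Unix timestamp (seconds) inside the reported period.
    """
    totals: Counter = Counter()
    for leaf, value in leaf_snapshot.items():
        totals[leaf] += value
        for name in upper_level_names(leaf):
            totals[name] += value
    period_start = ts_in_period - ts_in_period % PERIOD_S  # align to boundary
    return {"period_start": period_start, "metrics": dict(totals)}


report = build_report(
    {"topo1@spout@task0@stream0@Emitted": 3,
     "topo1@spout@task1@stream0@Emitted": 4},
    ts_in_period=1_449_187_230,
)
print(report["period_start"])                      # 1449187200
print(report["metrics"]["topo1@spout@@@Emitted"])  # 7
```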

Step 260: The scheduling server obtains the statistics for each structured indicator name from every computing node and aggregates them.

Once the scheduling server has obtained the statistics for each structured indicator name from every computing node, it can aggregate them.
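The following sketch shows the scheduling server's aggregation in step 260. Tasks of one component may run on different computing nodes, so the same component-level or topology-level name can arrive from several nodes. For counter-type indicators, the server simply sums by period and name. The report format `{"period_start": ..., "metrics": {...}}` is hypothetical. Other indicator types, such as gauges or histograms, would need a different merge rule.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def merge_reports(reports: Iterable[dict]) -> Dict[int, Counter]:
    """Sum per-node counter reports by period and structured name."""
    merged: Dict[int, Counter] = defaultdict(Counter)
    for report in reports:
        merged[report["period_start"]].update(report["metrics"])
    return merged


node_a = {"period_start": 1449187200,
          "metrics": {"topo1@spout@task0@@Emitted": 3, "topo1@spout@@@Emitted": 3}}
node_b = {"period_start": 1449187200,
          "metrics": {"topo1@spout@task1@@Emitted": 4, "topo1@spout@@@Emitted": 4}}
merged = merge_reports([node_a, node_b])
print(merged[1449187200]["topo1@spout@@@Emitted"])  # 7
```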

Step 270: The scheduling server exports the aggregated statistics under each structured indicator name to a database. The key is the structured indicator name plus a timestamp, and the value is the statistics.

In practice, the per-level indicator statistics of each computing node are aggregated on the cluster's scheduling server. Each computing node reports its indicator statistics to the scheduling server once per period, for example every minute. The scheduling server is not a storage server, so its data is continually overwritten by newer statistics, and only the aggregated statistics for the most recent period are visible.

To keep indicator statistics available over longer spans, or even the entire history, the scheduling server in the embodiments of the present application stores the aggregated indicator statistics in an external database. For each aggregated structured indicator name, the structured indicator name plus timestamp is the key and the statistics are the value.
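The storage layout of step 270 can be sketched with Python's built-in `sqlite3` standing in for the external database, which the text does not name. The table and column names are hypothetical. A composite primary key on (structured name, period timestamp) gives the key/value layout described above. `INSERT OR REPLACE` makes re-sending a period idempotent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metric_stats (
           metric_name  TEXT    NOT NULL,
           period_start INTEGER NOT NULL,
           value        INTEGER NOT NULL,
           PRIMARY KEY (metric_name, period_start)
       )"""
)


def store(period_start: int, totals: dict) -> None:
    """Persist one aggregated period keyed by (structured name, timestamp)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO metric_stats VALUES (?, ?, ?)",
            [(name, period_start, value) for name, value in totals.items()],
        )


store(1449187200, {"topo1@spout@@@Emitted": 7, "topo1@@@@Emitted": 7})
store(1449187260, {"topo1@spout@@@Emitted": 5, "topo1@@@@Emitted": 5})

# History of one indicator at one level, read back in time order.
rows = conn.execute(
    "SELECT period_start, value FROM metric_stats "
    "WHERE metric_name = ? ORDER BY period_start",
    ("topo1@spout@@@Emitted",),
).fetchall()
print(rows)  # [(1449187200, 7), (1449187260, 5)]
```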

The embodiments of the present application have the following advantages:

1. The structured indicator name combines, with separators, the hierarchical node path from the topmost level to the lowest level (the identifier of each hierarchical node on that path) and the indicator identifier. The hierarchical relationship is therefore carried by the name itself, so statistics for every level can be computed with simple logic, which reduces system overhead.

2. The names have a structured form, each topology has a distinct identifier, and each hierarchical node's identifier is likewise distinct. Structured indicator names therefore do not collide, so users need not choose indicator names with special care, which reduces the chance of error.

3. Because the hierarchical relationship is built from the structured indicator names, one or more levels can easily be added or removed, which facilitates extension.

4. Only the indicators of the lowest-level hierarchical nodes are registered with the computing node's system. The structured indicator names for the hierarchical nodes at every other level are generated automatically, so transmission overhead is small and operation is simple.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. However, those skilled in the art should understand that the embodiments of the present application are not limited by the order of the described actions, because some steps may be performed in another order or concurrently in accordance with the embodiments. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.

Embodiment 3

Referring to FIG. 3, a structural block diagram of an embodiment of an indicator statistical device of the present application is shown, which may specifically include the following modules:

The structured identifier creation module 310 is configured to create, for an indicator of a topology job, a structured indicator name for each hierarchical node corresponding to the indicator, where the hierarchical relationship between the hierarchical nodes is determined by the structured indicator names.

The underlying indicator monitoring module 320 is configured to count lowest-level data corresponding to the indicator, upon detecting it, under the corresponding lowest-level structured indicator name.

The layer-by-layer summary module 330 is configured to aggregate the statistics under the lowest-level structured indicator names level by level, according to the hierarchical relationship between the structured indicator names, into the structured indicator names of the upper levels.

In another preferred embodiment of the present application, the structured identifier creation module 310 includes:

The underlying indicator creation sub-module is configured to combine, for an indicator of a topology job, the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier, into the lowest-level structured indicator name.

The upper indicator creation sub-module is configured to start from the lowest-level structured indicator name and, in the structured indicator name at each level, set the hierarchical node of the current level to empty. This yields the structured indicator name of the level above.

In another preferred embodiment of the present application, the underlying indicator creation sub-module includes:

The underlying indicator separation creation sub-module is configured to use a separator to splice the node identifiers, in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier, into the lowest-level structured indicator name.

In another preferred embodiment of the present application, the upper indicator creation sub-module includes:

The upper indicator separation creation sub-module is configured to start from the lowest-level structured indicator name and, in the structured indicator name at each level, set the hierarchical node corresponding to the current level's separator to empty. This yields the structured indicator name of the level above.

In another preferred embodiment of the present application, the device further includes, before the structured identifier creation module 310:

The registration module is configured to register the indicator identifier corresponding to the lowest-level hierarchical node with the system.

In another preferred embodiment of the present application, the device further includes, after the layer-by-layer summary module 330:

A data storage module for exporting statistics under each structured indicator name to a database for storage.

In another preferred embodiment of the present application, the data storage module includes:

The data storage sub-module is configured to export the statistics under each structured indicator name to the database. The structured indicator name plus timestamp serves as the key and the statistics as the value.

In the embodiments of the present application, data processing in a real-time computing system is hierarchical, and the system needs to compute statistics for an indicator quickly at every level. A structured indicator name is created for each hierarchical node of an indicator. The hierarchical relationship between the nodes, and hence the aggregation relationship, is determined by the structured indicator name itself. Only the data under the lowest-level structured indicator names therefore needs to be counted. Aggregating that data level by level, according to the hierarchical relationship between the names, yields the indicator's data at every level. The embodiments can thus compute statistics for every level with simple logic, which reduces system overhead. Because the hierarchy is built from the structured indicator names, one or more levels can also be added or removed easily, which facilitates extension.

Embodiment 4

Referring to FIG. 4, a structural block diagram of an embodiment of an indicator statistical system of the present application is shown, which may specifically include:

A scheduling server 410, computing nodes 420, and a database 430.

The computing nodes 420 shown in FIG. 4 are exemplary; in practice, the number of computing nodes can be set according to the needs of the cluster. Each computing node includes a registration module 421, an underlying indicator separation creation module 422, an upper indicator separation creation module 423, an underlying indicator monitoring module 424, and a layer-by-layer summary module 425. The scheduling server includes a summary module 411 and a data storage sub-module 412. Each computing node may of course also include other required modules, which are not limited in the embodiments of the present application.

The above scheduling server 410 includes:

The summary module 411 is configured to obtain the statistics for each structured indicator name from every computing node and aggregate them.

The data storage sub-module 412 is configured to export the statistics under each structured indicator name to the database 430. The database 430 stores the statistics with the structured indicator name plus timestamp as the key and the statistics as the value.

Each computing node 420 includes:

The registration module 421 is configured to register the indicator identifier corresponding to the lowest-level hierarchical node with the system of the computing node.

The underlying indicator separation creation module 422 is configured to use a separator to splice the node identifiers, in order from the topmost hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier, into the lowest-level structured indicator name.

The upper indicator separation creation module 423 is configured to start from the lowest-level structured indicator name and, in the structured indicator name at each level, set the hierarchical node corresponding to the current level's separator to empty. This yields the structured indicator name of the level above.

The underlying indicator monitoring module 424 is configured to count lowest-level data corresponding to the indicator, upon detecting it, under the corresponding lowest-level structured indicator name.

The layer-by-layer summary module 425 is configured to aggregate the statistics under the lowest-level structured indicator names level by level, according to the hierarchical relationship between the structured indicator names, into the structured indicator names of the upper levels.


Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.

The embodiments in this specification are described progressively, and each embodiment focuses on its differences from the others. For identical or similar parts, the embodiments may be referred to one another.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. The embodiments may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments may take the form of a computer program product embodied on one or more computer-usable storage media containing computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM, and optical storage.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media, and information can be stored by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:

- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technology;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD), and other optical storage;
- magnetic cassettes, magnetic tape storage, and other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, causing a series of operational steps to be performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable terminal device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein merely to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion. A process, method, article, or terminal device that includes a series of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.

The above describes in detail the indicator statistical method, indicator statistical device, and indicator statistical system provided by the present application. Specific examples are used herein to explain the principles and implementations of the present application. The above embodiments are described only to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. The contents of this specification should therefore not be construed as limiting the present application.

Claims

1. An indicator statistical method, characterized by comprising:

For an indicator of a topology job, creating a structured indicator name for each hierarchical node corresponding to the indicator, wherein the hierarchical relationship between the hierarchical nodes is determined by the structured indicator names;

Upon detecting lowest-level data corresponding to the indicator, counting the data under the corresponding lowest-level structured indicator name; and

Based on the statistics under the lowest-level structured indicator names, aggregating the data level by level, according to the hierarchical relationship between the structured indicator names, into the structured indicator names of the upper levels.
2. The method according to claim 1, wherein the step of creating, for an indicator of a topology job, a structured indicator name for each hierarchical node corresponding to the indicator comprises:

For the indicator of the topology job, combining the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier of the indicator, into the lowest-level structured indicator name; and

Starting from the lowest-level structured indicator name, setting, in the structured indicator name at each level, the hierarchical node of the current level to empty, to obtain the structured indicator name of the level above.
3. The method according to claim 2, wherein the step of combining, for an indicator of a topology job, the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node and the indicator identifier of the indicator into the lowest-level structured indicator name comprises:

For the indicator of the topology job, using separators to splice the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier of the indicator, into the lowest-level structured indicator name.
4. The method according to claim 3, wherein the step of setting, starting from the lowest-level structured indicator name, the hierarchical node of the current level in the structured indicator name at each level to empty, to obtain the structured indicator name of the level above, comprises:

Starting from the lowest-level structured indicator name, setting, in the structured indicator name at each level, the hierarchical node corresponding to the current level's separator to empty, to obtain the structured indicator name of the level above.
5. The method according to any one of claims 2 to 4, wherein, before the structured indicator name is created for each hierarchical node corresponding to the indicator, the method further comprises:

Registering the indicator identifier corresponding to the lowest-level hierarchical node with the system.
6. The method according to any one of claims 1 to 4, further comprising:

Exporting the statistics under each structured indicator name to a database for storage.
7. The method according to claim 6, wherein the step of exporting the statistics under each structured indicator name to a database for storage comprises:

Exporting the statistics under each structured indicator name to the database, and storing them with the structured indicator name plus timestamp as the key and the statistics as the value.
8. An indicator statistical device, comprising:

A structured identifier creation module, configured to create, for an indicator of a topology job, a structured indicator name for each hierarchical node corresponding to the indicator, wherein the hierarchical relationship between the hierarchical nodes is determined by the structured indicator names;

An underlying indicator monitoring module, configured to count lowest-level data corresponding to the indicator, upon detecting it, under the corresponding lowest-level structured indicator name; and

A layer-by-layer summary module, configured to aggregate the statistics under the lowest-level structured indicator names level by level, according to the hierarchical relationship between the structured indicator names, into the structured indicator names of the upper levels.
9. The device according to claim 8, wherein the structured identifier creation module comprises:

An underlying indicator creation sub-module, configured to combine, for an indicator of a topology job, the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier of the indicator, into the lowest-level structured indicator name; and

An upper indicator creation sub-module, configured to start from the lowest-level structured indicator name and set, in the structured indicator name at each level, the hierarchical node of the current level to empty, to obtain the structured indicator name of the level above.
10. The device according to claim 9, wherein the underlying indicator creation sub-module comprises:

An underlying indicator separation creation sub-module, configured to use separators to splice the node identifiers in order from the top-level hierarchical node down to the lowest-level hierarchical node, followed by the indicator identifier of the indicator, into the lowest-level structured indicator name.
11. The device according to claim 10, wherein the upper indicator creation sub-module comprises:

An upper indicator separation creation sub-module, configured to start from the lowest-level structured indicator name and set, in the structured indicator name at each level, the hierarchical node corresponding to the current level's separator to empty, to obtain the structured indicator name of the level above.
12. The device according to any one of claims 9 to 11, further comprising, before the structured identifier creation module:

A registration module, configured to register the indicator identifier corresponding to the lowest-level hierarchical node with the system.
13. The device according to any one of claims 8 to 11, further comprising:

A data storage module, configured to export the statistics under each structured indicator name to a database for storage.
14. The device according to claim 13, wherein the data storage module comprises:

A data storage sub-module, configured to export the statistics under each structured indicator name to the database, and to store them with the structured indicator name plus timestamp as the key and the statistics as the value.