CN114241086A

CN114241086A - Mass data drawing optimization method based on maximum triangle three-segment algorithm

Info

Publication number: CN114241086A
Application number: CN202111559756.6A
Authority: CN
Inventors: 陈科明; 蔡坤; 黄盼盼; 邢豪蔚
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-03-25

Abstract

The invention discloses a massive data drawing optimization method based on the maximum triangle three-segment algorithm (Largest-Triangle-Three-Buckets, LTTB). First, the data are down-sampled with the LTTB algorithm, which compresses the original data while keeping as much detail as possible; for locally steep segments, the segmentation is adjusted dynamically. Then, data whose magnitude still exceeds a threshold are sliced, and ECharts creates a separate instance to plot each slice, yielding one sub-chart per slice. Finally, the sub-charts are positioned at the same location using Cascading Style Sheets (CSS) and overlaid to form a single complete chart. The method greatly reduces the number of data points used for drawing while losing as little detail of the original data as possible, so that industrial mass data can be drawn efficiently and with high performance.

Description

Mass data drawing optimization method based on maximum triangle three-segment algorithm

Technical Field

The invention belongs to the technical field of data visualization, and particularly relates to a massive data drawing optimization method based on a maximum triangle three-segment algorithm.

Background

In recent years, industrial remote operation and online monitoring systems have come into wide use, and in industrial monitoring scenarios the visualization of time series data is unavoidable. Many monitoring systems must frequently collect and report data from field devices and display them in real-time charts. Because the data are updated frequently and each update is large, the resulting time series data sets become extremely large.

Such massive data cause two problems in use. First, the features are not obvious. The data volume is large and fluctuates frequently, so the points overlap once they are connected by polylines; whether the data are used for subsequent data mining or drawn directly, clear trend features are hard to obtain, which hinders both data analysis and visual presentation. Second, too many data points challenge the rendering performance of the chart. The drawing performance of front-end visualization libraries such as ECharts and Highcharts degrades rapidly as the number of plotted points grows: updates and redraws become sluggish and memory usage rises, which seriously affects system performance.

Therefore, in industrial monitoring scenarios, massive time series data need some optimization. To address the large volume, the time series can be down-sampled while preserving the fluctuation details, trends and shape of the original data as far as possible; specifically, this means quantifying the statistical features (such as mean, extreme values and variance) and the morphological features (such as distance, angle and slope) of the sequence.

The LTTB algorithm proposed by Sveinn Steinarsson of the University of Iceland is currently well suited to industrial scenarios, but in practice it tends to lose "spikes" where the data are steep, so it is not perfect. Chenkodong et al. of Guangzhou Huya ("Tiger Teeth") Inc. apply mean sampling (e.g., PAA) before LTTB with the intent of reducing spikes beforehand, but PAA tends to lose even more of the shape detail of the data curve, so the result is not ideal. Wang Hui et al. first compute, for the time series data, a weighted sum of the time differences between the current data point and the preceding data points as a weight, and use it to split the data into segments for LTTB. The idea of segmentation is sound, but using the time difference as the weight ignores the fact that spikes are caused by sudden changes in value rather than by large time differences. In addition, the threshold selection is not rigorous: there is no suitable fixed value, and an appropriate threshold is hard to choose in practice.

Disclosure of Invention

In view of the above problems, the invention provides a massive data drawing optimization method based on the maximum triangle three-segment algorithm. It aims to provide an improved method for LTTB (Largest-Triangle-Three-Buckets) down-sampling of massive time series data in industrial scenarios, together with a scheme that improves the performance of the subsequent drawing process.

The invention comprises the following steps:

Step S1, acquiring the collected data, and performing down-sampling based on the maximum triangle three-segment algorithm (LTTB) and a dynamic improvement thereof;

Step S2, setting a threshold value, and slicing data that are still too large;

Step S3, creating an ECharts instance for each slice and rendering each slice individually; finally, overlaying the sub-charts to form a complete chart.

Further, in step S1, the improvement builds on the maximum triangle three-segment algorithm (LTTB). For most reasonably regular curves, the invention down-samples with the LTTB algorithm as is. For special shapes in which the data distribution is severely uneven, a dynamic segmentation algorithm is used instead: where the data are relatively flat, more points are grouped into one segment; where the data are steep, fewer points are grouped into one segment; the sampling point of each segment is then selected by its effective triangle area.

Further, in step S2, for data that are still large after down-sampling, a reasonable threshold is set and the data are sliced, so as to further reduce the magnitude of the data handled at one time.

Further, in step S3, an instance is created using a data visualization technique, and the data are rendered into a visualization chart. For sliced data, several drawing instances must be created, and these instances are overlaid at the final drawing stage to form a complete chart.

Furthermore, the data visualization technique is the open-source product ECharts. ECharts is a JavaScript charting library open-sourced by Baidu; it offers rich chart types and a self-developed, efficient rendering engine, ZRender, and supports elegant, user-friendly, responsive chart design.

Further, the ECharts chart is a line chart. The invention takes the line chart as its example because time series data are most naturally rendered as polylines. The ECharts line chart must account for the overlay of sliced data described in steps S2 and S3. Specifically, drawing an ECharts chart means creating an instance from the given data and handing it to the rendering engine ZRender. If the data are sliced, a separate instance is created for each slice, and the rendered sub-charts are each positioned at the same location so that they combine into one complete chart.

Further, Cascading Style Sheets (CSS) are used to position and stack the sub-charts. The sub-charts are given the same size (width, height), position (top, left) and positioning mode (absolute), and the parent container is relatively positioned; each sub-chart is thus absolutely positioned at the same location inside the container, and the stacked sub-charts form a complete chart.
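The positioning rule above can be sketched as plain style objects. This is a minimal illustration rather than the patent's own code: the helper name `stackedStyles` and the pixel-based sizing are assumptions made here.

```javascript
// Hypothetical helper: builds the style values described above.
// The parent is relatively positioned; every child is absolutely
// positioned at the same top/left with the same width/height.
function stackedStyles(width, height) {
  const size = { width: `${width}px`, height: `${height}px` };
  return {
    parent: { position: 'relative', ...size },
    child: { position: 'absolute', top: '0px', left: '0px', ...size },
  };
}

const styles = stackedStyles(800, 400);
// In a browser the values could be applied with, for example:
//   Object.assign(parentElement.style, styles.parent);
//   Object.assign(childElement.style, styles.child);
```

Because every child receives identical values, the sub-charts occupy exactly the same rectangle inside the parent.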

The invention mainly provides the following improvements:

First, to address the problem of too many data points, a down-sampling method based on the maximum triangle three-segment algorithm (LTTB) is provided, together with a dynamic improvement of it based on distance and degree of dispersion. The Piecewise Aggregate Approximation (PAA) algorithm mentioned above maintains a sliding window and takes the average of the data in each window over a time period. For industrial time series data, however, a frequently changing sequence loses much of its shape detail after averaging and its shape features are hard to preserve, so PAA is suitable only for data of steady amplitude.
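To make the loss of shape detail concrete, here is a minimal sketch of window averaging in the spirit of PAA. The function name `paa`, the non-overlapping windows and the sample values are assumptions chosen for illustration.

```javascript
// Window averaging: replace each window of `windowSize` values by its mean.
// Non-overlapping windows are assumed here for simplicity.
function paa(values, windowSize) {
  const out = [];
  for (let start = 0; start < values.length; start += windowSize) {
    const win = values.slice(start, start + windowSize);
    const sum = win.reduce((acc, v) => acc + v, 0);
    out.push(sum / win.length);
  }
  return out;
}

// A single spike of height 100 inside a window of four values
// is averaged down to 25, so the spike is no longer visible.
const flattened = paa([0, 0, 100, 0, 1, 1, 1, 1], 4);
```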

LTTB selects sampling points by maximum effective area. The effective area of a point is defined as the area of the triangle formed by that point and its two neighbouring points; selecting the point with the largest effective area in the current segment effectively preserves the distance, angle and contour features between points. Although the algorithm is relatively complex, it is well suited to massive time series data that change frequently.
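The effective area can be computed directly from the three vertices. The sketch below assumes points are `[x, y]` pairs; the name `triangleArea` is illustrative only.

```javascript
// Area of the triangle with vertices a, f, b, each given as [x, y].
// Uses the cross-product (shoelace) form; the result is never negative.
function triangleArea(a, f, b) {
  const cross = (a[0] - b[0]) * (f[1] - a[1]) - (a[0] - f[0]) * (b[1] - a[1]);
  return Math.abs(cross) / 2;
}

// A point far from the line through its neighbours has a large area,
// while a point lying on that line has area zero.
const spikeArea = triangleArea([0, 0], [1, 2], [2, 0]);
const flatArea = triangleArea([0, 0], [1, 0], [2, 0]);
```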

In addition, to handle extreme cases that are locally too steep, the invention adds a method that determines the segment size dynamically from distance and degree of dispersion. Its main principle is to fit a reference line (referred to as the asymptote in this description) and to use the distance from each data point to that line to represent the point's degree of dispersion. The sum of all distances is divided into thirds. As long as the fitted line is reasonably accurate, a spike then has a very high probability of falling into one of these thirds on its own and therefore being selected, because the other points lie close to the fitted line and contribute no distance or only a small one.

Second, to address the problem that the data may still be too large after down-sampling, a threshold is set, the data are further sliced, and each slice is rendered and drawn independently. Because the time ECharts needs to draw large data sets increases sharply with their size, the whole data set is split, each part is rendered separately, and finally the sub-charts are positioned and overlaid into a complete chart. This keeps the data volume of each instance within an acceptable range.

The beneficial effects of the invention are as follows: it greatly reduces the number of data points used for drawing while losing as little detail of the original data as possible, keeps the data magnitude within a reasonable threshold, and sends binary data directly to the drawing program, so that the data can be drawn efficiently and with high performance.

Drawings

FIG. 1 is an overall flow chart of the present invention.

FIG. 2 is a schematic diagram of the improved down-sampling algorithm used in the present invention.

FIG. 3 is a flow chart of the present invention for slicing and rendering data.

FIG. 4 is a graph of test plot effect of the present invention.

Detailed Description

To make the technical solution of the present invention clearer, it is described in detail below with reference to the accompanying drawings. The detailed steps are shown in FIG. 1 and are as follows:

Step S1, acquiring the collected data, and performing down-sampling based on the maximum triangle three-segment algorithm (LTTB) and a dynamic improvement thereof. The steps of the LTTB algorithm are as follows:

S1-1, determine the segment threshold: so that the segment size can be changed easily, it is passed to the algorithm as a parameter (threshold). For example, for 100-fold down-sampling, only threshold = total number of data points / 100 needs to be passed. The data points are divided equally into threshold segments. In addition, to ensure that the first and last points are always selected, each of them occupies a segment of its own.

S1-2, select the first point (i.e., the first segment).

S1-3, starting from the second segment, traverse all points in the segment, calculate the effective triangle area of each point, and select the point with the largest effective area as the selected point (sampling point) of the segment. The effective triangle area is the area of the triangle whose vertices are [the selected point A of the previous segment, the current point F, and the average point B of the next segment].

S1-4, continue traversing until the last point (i.e., the last segment) is selected, at which point the algorithm ends.
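Steps S1-1 to S1-4 can be sketched as follows. This is a compact version of the commonly published LTTB procedure, written here for illustration and not taken from the patent; it assumes `data` is an array of `[x, y]` number pairs and that `threshold` is the number of points to keep.

```javascript
// Largest-Triangle-Three-Buckets down-sampling (illustrative sketch).
function lttb(data, threshold) {
  const n = data.length;
  if (threshold >= n || threshold < 3) return data.slice();

  const sampled = [data[0]];               // S1-2: the first point is always kept
  const every = (n - 2) / (threshold - 2); // size of each middle segment
  let a = 0;                               // index of the previously selected point A

  for (let i = 0; i < threshold - 2; i++) {
    // Average point B of the next segment.
    const avgStart = Math.floor((i + 1) * every) + 1;
    const avgEnd = Math.min(Math.floor((i + 2) * every) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = avgStart; j < avgEnd; j++) {
      avgX += data[j][0];
      avgY += data[j][1];
    }
    avgX /= avgEnd - avgStart;
    avgY /= avgEnd - avgStart;

    // S1-3: pick the point F in the current segment with the largest area A-F-B.
    const start = Math.floor(i * every) + 1;
    const end = Math.floor((i + 1) * every) + 1;
    let maxArea = -1;
    let next = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (data[a][0] - avgX) * (data[j][1] - data[a][1]) -
        (data[a][0] - data[j][0]) * (avgY - data[a][1])
      ) / 2;
      if (area > maxArea) {
        maxArea = area;
        next = j;
      }
    }
    sampled.push(data[next]);
    a = next;
  }

  sampled.push(data[n - 1]);               // S1-4: the last point is always kept
  return sampled;
}

// Ten flat points with one spike at x = 4, reduced to four points.
const series = Array.from({ length: 10 }, (_, x) => [x, x === 4 ? 10 : 0]);
const reduced = lttb(series, 4);
```

In this small example the spike survives because it forms by far the largest triangle in its segment.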

The problem with the LTTB algorithm is that it divides all segments equally, which does not work well where the data curve is steep. Flat regions need fewer points to reflect their details, while steep regions need more; LTTB uses plain equal division only to save the time that segmentation takes. The invention therefore introduces dynamic segmentation, i.e., an improvement of step S1-1 above, which is replaced by the following steps:

(1-1) As before, the segment size is passed to the algorithm as a parameter (threshold), and the data are first divided equally as in LTTB, with the first and last points each occupying a segment of its own.

(1-2) Starting from the second segment, traverse all points in the segment (m = total/threshold points per segment). The selected point A of the previous segment and the average point B of the next segment define the line L (the asymptote). Calculate the perpendicular distance from each point F' in the segment to the line L, obtaining an array SS of distances and the sum SSE of all perpendicular distances.

(1-3) Take target = 1/3 × SSE as the target value and trisect the array SS by value: the indices found at each third are used as additional segmentation points. Because the perpendicular distance reflects, to some extent, the degree of dispersion relative to the line L, this divides one segment into three sub-segments with an equal share of SSE and thereby copes dynamically with abrupt changes. If no point falls exactly on the target, the nearest point to the left is taken.

(1-4) After segmentation, continue with step S1-2 of the LTTB algorithm. The flow of the final improved algorithm is shown in FIG. 2.
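Steps (1-2) and (1-3) might look like the sketch below. It reflects one possible reading of the description: distances are accumulated from left to right and a cut is placed just to the left of the point at which the running sum passes each third of SSE. The function names and the handling of empty sub-segments are assumptions.

```javascript
// Perpendicular distance from point p to the line through a and b ([x, y] pairs).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dx * (a[1] - p[1]) - (a[0] - p[0]) * dy) / len;
}

// Split one segment into up to three sub-segments by thirds of SSE.
function trisectSegment(segment, a, b) {
  const ss = segment.map((p) => perpendicularDistance(p, a, b)); // array SS
  const sse = ss.reduce((acc, d) => acc + d, 0);                 // sum SSE
  if (sse === 0) return [segment]; // perfectly flat: nothing to split

  const cuts = [];
  let running = 0;
  let k = 1;
  for (let i = 0; i < ss.length && k < 3; i++) {
    running += ss[i];
    // Assumed "nearest point to the left" rule: cut before point i
    // as soon as the running sum exceeds k/3 of SSE.
    while (k < 3 && running > (k * sse) / 3) {
      cuts.push(i);
      k++;
    }
  }
  const parts = [
    segment.slice(0, cuts[0]),
    segment.slice(cuts[0], cuts[1]),
    segment.slice(cuts[1]),
  ];
  return parts.filter((part) => part.length > 0); // drop empty sub-segments
}

// A flat segment with one spike: the cut falls directly in front of the spike.
const parts = trisectSegment(
  [[1, 0], [2, 0], [3, 9], [4, 0], [5, 0]],
  [0, 0],
  [6, 0]
);
```

Each resulting sub-segment would then be treated as a segment of its own in the area-based selection of step S1-3.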

Step S2, slicing the data. A threshold is set, data that are still large after down-sampling are sliced, and the method proceeds to the next step. Experiments show that the number of drawing instances running at the same time should not be too large, and that the drawing cost is almost the same for any data volume below the order of hundreds of points. The invention therefore sets the maximum number of slices to 10 and the minimum capacity of a single slice to 200.
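A slicing routine consistent with these two limits could look like this. How the two limits interact is not spelled out in the text, so the rule used here (as many slices as possible, up to 10, while every slice keeps at least 200 points) and the name `sliceData` are assumptions.

```javascript
// Split `data` into at most `maxSlices` contiguous slices,
// each holding at least `minCapacity` points when enough data exist.
function sliceData(data, maxSlices = 10, minCapacity = 200) {
  if (data.length === 0) return [];
  const count = Math.max(
    1,
    Math.min(maxSlices, Math.floor(data.length / minCapacity))
  );
  const slices = [];
  for (let i = 0; i < count; i++) {
    // Integer boundaries spread the remainder evenly across slices.
    const start = Math.floor((i * data.length) / count);
    const end = Math.floor(((i + 1) * data.length) / count);
    slices.push(data.slice(start, end));
  }
  return slices;
}

const big = sliceData(new Array(100000).fill(0));  // 10 slices of 10000 points
const medium = sliceData(new Array(1001).fill(0)); // 5 slices of 200 or 201 points
const small = sliceData(new Array(150).fill(0));   // below 200 points: one slice
```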

Step S3, rendering. The invention uses ECharts plotting as its example, in which the most time-consuming operation is drawing the data points. Because a down-sampling step is also performed, the time the algorithm spends processing the data must be taken into account as well when the actual effect is evaluated. The drawing process mainly comprises the following steps:

S3-1, create ECharts instances: each given slice is rendered and drawn individually.

S3-2, use CSS to position and overlay the sub-charts into a complete chart. Specifically, the parent container is relatively positioned, each sub-chart is absolutely positioned at the same location inside the container, and the axis scale is kept on only one of the charts, so that the charts can be overlaid. Although there are several sub-charts, the result appears to the viewer as one complete chart. The specific flow is shown in FIG. 3.

The technical solution is as described in the steps above. The following tests and the summary of their results illustrate the optimization the invention can achieve. The test steps are as follows:
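Steps S3-1 and S3-2 could be sketched with the public ECharts calls `echarts.init` and `setOption`. The helper names, the grid margins, the use of numeric millisecond timestamps as x values, and the decision to give every sub-chart the same axis range (so the stacked charts line up) are assumptions of this sketch, not details stated in the description.

```javascript
// Build one ECharts option per slice. All options share the same axis range,
// and only the first chart shows its axes.
function buildSliceOptions(slices) {
  let xMin = Infinity, xMax = -Infinity, yMin = Infinity, yMax = -Infinity;
  for (const slice of slices) {
    for (const [x, y] of slice) {
      if (x < xMin) xMin = x;
      if (x > xMax) xMax = x;
      if (y < yMin) yMin = y;
      if (y > yMax) yMax = y;
    }
  }
  return slices.map((slice, index) => ({
    animation: false,
    grid: { left: 60, right: 20, top: 20, bottom: 40 },
    xAxis: { type: 'time', min: xMin, max: xMax, show: index === 0 },
    yAxis: { type: 'value', min: yMin, max: yMax, show: index === 0 },
    series: [{ type: 'line', data: slice, showSymbol: false }],
  }));
}

// Create one instance per slice. `echarts` is the library object and
// `elements` are DOM nodes assumed to be already stacked with CSS.
function mountSlices(echarts, elements, slices) {
  return buildSliceOptions(slices).map((option, index) => {
    const chart = echarts.init(elements[index]);
    chart.setOption(option);
    return chart;
  });
}

const demoOptions = buildSliceOptions([
  [[0, 1], [1000, 5]],
  [[2000, 3], [3000, 9]],
]);
```

Sharing the axis range is what lets a point from the second slice land at the correct pixel position once the sub-charts are stacked.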

A. Generate test data. Randomly generate 100,000 time series data points as follows:

(1) Generate a past timestamp as the base: base = +new Date(1988, 9, 3)

(2) Traverse, advancing the base in time order on each pass: base += 3600 * 1000, now = new Date(base)

(3) Append to the data set: data.push([now, random value])

(4) After 100,000 passes, 100,000 time-ordered, consecutive random values are obtained.
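Put together, the generation steps can be written as a short runnable script. The range of the random values is not given in the text, so the 0 to 100 range below is an assumption, as is the function name.

```javascript
// Generate `count` hourly samples starting after 3 October 1988
// (JavaScript months are zero-based, so 9 means October).
function generateTestData(count) {
  let base = +new Date(1988, 9, 3); // base timestamp in milliseconds
  const oneHour = 3600 * 1000;
  const data = [];
  for (let i = 0; i < count; i++) {
    base += oneHour;                // advance by one hour per sample
    const now = new Date(base);
    data.push([now, Math.random() * 100]); // assumed value range: 0 to 100
  }
  return data;
}

const testData = generateTestData(100000);
```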

B. Down-sampling. The 100,000 data points are down-sampled with the algorithm of step S1 at sampling ratios of 1, 10, 100, 500, 1000 and 10000. The degree to which detail is retained in the final drawing is shown in FIG. 4, and the test data are given in Table 2 below.

C. Render the sliced data. For the samples with sampling ratios of 1, 10 and 100, the down-sampled data still exceed the threshold, so the data are sliced and rendered in separate instances using the method of steps S2 and S3. The test data are given in Table 3 below.

The final test data is as follows:

Table 1: Data index settings

Test data	100000
Down-sampling ratio	1, 10, 100, 500, 1000, 10000
Data slicing threshold	10 (maximum number of slices), 200 (minimum capacity of a single slice)

Table 2: Drawing test data

Table 3: Drawing test data after data slicing

The test data show that, with the down-sampling and data-slicing drawing optimization method, each time the data are down-sampled by one order of magnitude the drawing time is reduced by about 70-80% and the memory usage by about 20-30%. The effect is good, especially when a suitable sampling ratio and threshold are chosen. For example, as the chart effect diagram in FIG. 4 and the table data show, when the data are down-sampled to 1000 points and the threshold is 10 × 200, the detail features of the chart are well preserved, and the peak and valley points are retained even when the data are reduced to 10 points. The measured drawing time and memory usage are also satisfactory, so the method achieves the expected effect.

Claims

1. A massive data drawing optimization method based on a maximum triangle three-segment algorithm is characterized by comprising the following steps:

step S1, acquiring the collected data, and performing down-sampling based on an improved maximum triangle three-segment algorithm;

step S2, setting a threshold value, and slicing the oversize data;

step S3, creating an ECharts instance, and rendering and drawing each given slice data independently; overlapping each sub-chart to form a complete chart;

wherein, in the massive data drawing optimization method based on the maximum triangle three-segment algorithm as claimed in claim 1, the improved maximum triangle three-segment algorithm in step S1 determines the segment size as follows:

the segment size is passed to the algorithm as a parameter, and the first and last points each occupy a segment of its own;

starting from the second segment, all points in the segment are traversed, wherein the selected point A of the previous segment and the average point B of the next segment define the line L (the asymptote), and the perpendicular distance from each point F' in the segment to the line L is calculated to obtain an array SS and the sum SSE of all perpendicular distances;

with target = 1/3 × SSE as the target value, the array SS is trisected by value, and the trisection indices found are used as additional segmentation points.

2. The massive data drawing optimization method based on the maximum triangle three-segment algorithm as claimed in claim 1, wherein in step S3 an ECharts instance is created using a data visualization technique, and the data are rendered and drawn into a visualization chart; for sliced data, a plurality of drawing instances are created, and the plurality of instances are overlaid at the final drawing stage to form a complete chart.

3. The massive data drawing optimization method based on the maximum triangle three-segment algorithm as claimed in claim 2, wherein the positioning and stacking of the sub-charts is completed using Cascading Style Sheets: the sub-charts are given the same size, position and positioning mode, the parent container is relatively positioned, each sub-chart is absolutely positioned at the same location inside the container, and the sub-charts are overlaid to form a complete chart.