CN114241086A - Mass data drawing optimization method based on maximum triangle three-segment algorithm - Google Patents

Info

Publication number
CN114241086A
Authority
CN
China
Prior art keywords
data
segment
algorithm
chart
maximum triangle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111559756.6A
Other languages
Chinese (zh)
Inventor
陈科明
蔡坤
黄盼盼
邢豪蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202111559756.6A
Publication of CN114241086A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a massive data drawing optimization method based on a maximum triangle three-segment algorithm (Largest-Triangle-Three-Buckets, LTTB). The data are first down-sampled with LTTB, which compresses the original data while retaining as much detail as possible; for locally steep segments, the segmentation is adjusted dynamically. Next, data whose magnitude still exceeds a threshold are sliced, and ECharts creates a separate instance to plot each slice, yielding one sub-chart per slice. Finally, the sub-charts are positioned at the same location using Cascading Style Sheets (CSS) and overlaid to form a single complete chart. The method greatly reduces the number of data points used for drawing while losing as little detail of the original data as possible, so that industrial mass data can be visualized efficiently and with high performance.

Description

Mass data drawing optimization method based on maximum triangle three-segment algorithm
Technical Field
The invention belongs to the technical field of data visualization, and particularly relates to a massive data drawing optimization method based on a maximum triangle three-segment algorithm.
Background
In recent years, industrial remote operation and online monitoring systems have come into wide use, and in industrial monitoring scenarios the visualization of time series data is unavoidable. Many monitoring systems frequently collect and report data from field devices and display them in real-time charts. The data are updated often and each update is large, so the resulting time series data sets become excessively large.
Such massive data cause two problems in use. First, features are hard to see: the data volume is large and fluctuations are frequent, so data points overlap once they are connected by polylines. Whether the data are used for subsequent data mining or drawn directly, clear trend features are hard to obtain, which hinders both data analysis and visualization. Second, too many data points strain chart rendering performance: with front-end visualization libraries such as ECharts and Highcharts, drawing performance drops rapidly as the number of plotted points grows, updates and redraws become sluggish, and memory usage is high, which seriously affects system performance.
Therefore, in industrial monitoring scenarios, massive time series data need some optimization. To address the large data volume, the time series can be down-sampled while preserving the fluctuation details, trends and shape of the original data as far as possible; specifically, the statistical features (such as mean, extreme values and variance) and morphological features (such as distance, angle and slope) of the sequence are quantified.
The LTTB algorithm proposed by Sveinn Steinarsson of the University of Iceland is currently well suited to industrial scenarios, but in practice it tends to lose "spikes" where the data are steep, so it is not perfect. Chenkodong et al. of Guangzhou Huya ("Tiger Teeth") apply mean sampling (e.g., PAA) before LTTB with the intent of reducing spikes beforehand, but PAA tends to lose more of the shape details of the data curve, so this is not ideal either. Wang Hui et al. use the time difference between the current data point and the previous one as a weight, and split the time series into segments by accumulated weight before applying LTTB. Segmenting is the right idea, but weighting by time difference overlooks that spikes arise from sudden changes in value rather than from large time gaps. Moreover, the threshold is not chosen rigorously: it has no suitable fixed value and is hard to set appropriately in practice.
Disclosure of Invention
To address these problems, the invention provides a massive data drawing optimization method based on a maximum triangle three-segment algorithm. It aims to provide an improved LTTB down-sampling method for massive time series data in industrial scenarios, together with a scheme that improves the performance of the subsequent drawing process.
The invention comprises the following steps:
Step S1: acquire the collected data and down-sample them based on the maximum triangle three-segment algorithm (Largest-Triangle-Three-Buckets, LTTB) and its dynamic improvement;
Step S2: set a threshold and slice data that are still too large;
Step S3: create an ECharts instance for each slice and render each slice individually, then overlay the sub-charts to form a complete chart.
Further, step S1 improves on the maximum triangle three-segment algorithm (LTTB). For most reasonably regular data, the invention down-samples with plain LTTB. For special shapes whose data distribution is severely uneven, a dynamic segmentation algorithm is used instead: where the data are relatively flat, more points are grouped into one segment; where the data are steep, fewer points are grouped into one segment, and the point with the largest effective triangular area is selected within it.
Further, in step S2, for data that are still large after down-sampling, a reasonable threshold is set and the data are sliced, so as to further reduce the magnitude of the data.
Further, in step S3, an instance is created with a data visualization library and the data are rendered into a chart. For sliced data, multiple drawing instances are created and overlaid at drawing time to form a complete chart.
Furthermore, the data visualization library is the open-source product ECharts. ECharts is a JavaScript charting library open-sourced by Baidu; it offers a rich set of chart types and its own efficient rendering engine, ZRender, and supports elegant, user-friendly, responsive chart design.
Further, the ECharts chart is a line chart. The invention uses the line chart as its example because polylines are well suited to rendering time series data. The ECharts line chart must handle the overlay of slice data described in steps S2 and S3. Specifically, drawing an ECharts chart creates an instance from the given data and hands it to the rendering engine ZRender. If the data have been sliced, a separate instance is created for each slice, and the rendered sub-charts are each positioned at the same location so that they combine into one complete chart.
Further, Cascading Style Sheets (CSS) is used to position and overlay the sub-charts. The sub-charts are given the same size (width, height), position (top, left) and positioning mode (absolute), and the parent container is set to relative positioning, so that every sub-chart is absolutely positioned at the same place inside the container and the sub-charts overlay to form a complete chart.
The invention mainly provides the following improvements:
First, to address the excess of data points, a down-sampling method based on the maximum triangle three-segment algorithm (LTTB) is provided, together with a dynamic improvement of it based on distance and degree of dispersion. The Piecewise Aggregate Approximation (PAA) algorithm mentioned above maintains a sliding window and takes the mean of the data in each window over a time period. For industrial time series data that change frequently, however, averaging seriously loses shape details and shape features are hard to preserve, so PAA is suitable only for data of steady amplitude.
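The loss of shape under averaging can be seen in a small sketch. The function name `paa`, the window size and the sample values below are illustrative assumptions, not taken from the source; the code uses fixed, non-overlapping windows, which is the usual form of PAA.

```javascript
// Piecewise Aggregate Approximation (illustrative sketch):
// replace each window of `size` consecutive values by its mean.
function paa(values, size) {
  const out = [];
  for (let i = 0; i < values.length; i += size) {
    const win = values.slice(i, i + size);
    const mean = win.reduce((sum, v) => sum + v, 0) / win.length;
    out.push(mean);
  }
  return out;
}

// A flat signal with one spike of height 100 (made-up sample data).
const signal = [1, 1, 1, 1, 100, 1, 1, 1, 1, 1];
const averaged = paa(signal, 5);
// The first window averages (1 + 1 + 1 + 1 + 100) / 5, so the spike
// survives only as a much lower plateau.
console.log(averaged);
```

In this example the peak value no longer appears in the output at all, which is the behaviour the paragraph above describes for frequently changing data.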
LTTB selects sampling points by maximum effective area. The effective area of a point is the area of the triangle that the point forms with two neighbouring reference points; selecting the point with the largest effective area in the current segment effectively preserves the distance, angle and contour features between points. Although the algorithm is more complex, it is well suited to massive time series data that change frequently.
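For reference, the effective triangular area can be written out explicitly. The coordinate notation below is introduced here for illustration: A = (x_A, y_A) is the selected point of the previous segment, F = (x_F, y_F) the candidate point in the current segment, and B = (x_B, y_B) the average point of the next segment.

$$
S(A, F, B) = \frac{1}{2}\,\bigl|\,(x_A - x_B)(y_F - y_A) - (x_A - x_F)(y_B - y_A)\,\bigr|
$$

With A and B fixed, the area grows with the perpendicular distance of F from the line through A and B, which is why a spike in the current segment tends to be the selected point.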
In addition, to handle extreme cases that are locally too steep, the invention adds a method that determines segment size dynamically from distance and degree of dispersion. Its main principle is to fit a reference line (called the asymptote in this document) and use the distance from each data point to that line to characterize the point's dispersion. The sum of all distances is divided into three equal parts. Provided the fitted line is reasonably accurate, the other points lie close to it and contribute little or no distance, so when a spike is encountered it very probably falls in a part of its own and is therefore selected.
Second, to address data that may still be too large after down-sampling, a threshold is set, the data are further sliced, and each slice is rendered and drawn separately. Because the time ECharts needs to draw large data rises sharply with size, the whole data set is split, the slices are rendered separately, and the sub-charts are finally positioned and overlaid into a complete chart. This keeps the data magnitude within an acceptable range.
The beneficial effects of the invention are as follows: it greatly reduces the data points used for drawing while losing as little original detail as possible, keeps the data magnitude within a reasonable threshold, and sends binary data directly to the drawing program, so that the data can be visualized efficiently and with high performance.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a schematic diagram of the improved down-sampling algorithm used in the present invention.
FIG. 3 is a flow chart of data slicing and rendering in the present invention.
FIG. 4 shows the plotting results of the tests of the present invention.
Detailed Description
So that the technical solution of the invention can be understood more clearly, it is described in detail below with reference to the accompanying drawings. The detailed steps of the invention are shown in FIG. 1 and are as follows:
Step S1: acquire the collected data and down-sample them based on the maximum triangle three-segment algorithm (LTTB) and its dynamic improvement. The steps of the LTTB algorithm are as follows:
S1-1, determine the segment size (threshold): so that the segment size is easy to change, it is passed to the algorithm as a parameter (threshold); for 100-fold down-sampling, for example, only threshold = total data size / 100 needs to be passed. The data points are divided equally into threshold segments. In addition, to ensure that the first and last points are selected after the division, each of them occupies a segment of its own.
S1-2, select the first point (i.e., the first segment).
S1-3, starting from the second segment, traverse all points in the segment, calculate the effective triangular area of each point, and select the point with the largest effective area as the sampling point of that segment. The effective triangular area is the area of the triangle whose vertices are [the selected point A of the previous segment, the current point F, the average point B of the next segment].
S1-4, continue traversing until the last point (i.e., the last segment) is selected, at which point the algorithm ends.
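Steps S1-1 to S1-4 can be sketched roughly as follows. This is a minimal sketch of plain LTTB under stated assumptions, not the patent's own code: the function name `lttb`, the `[x, y]` array layout with numeric x values, and the sample series are all illustrative. Here `threshold` is the number of output points, counting the first and last point as one segment each.

```javascript
// Plain LTTB down-sampling (illustrative sketch).
// data: array of [x, y] pairs with numeric x; threshold: number of points to keep.
function lttb(data, threshold) {
  const n = data.length;
  if (threshold >= n || threshold < 3) return data.slice();

  const sampled = [data[0]]; // S1-2: the first point is a segment of its own
  const every = (n - 2) / (threshold - 2); // size of each inner segment
  let a = 0; // index of the point selected in the previous segment (point A)

  for (let i = 0; i < threshold - 2; i++) {
    // Average point B of the next segment.
    const avgStart = Math.floor((i + 1) * every) + 1;
    const avgEnd = Math.min(Math.floor((i + 2) * every) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = avgStart; j < avgEnd; j++) {
      avgX += data[j][0];
      avgY += data[j][1];
    }
    avgX /= avgEnd - avgStart;
    avgY /= avgEnd - avgStart;

    // S1-3: pick the point F of the current segment with the largest triangle A-F-B.
    const start = Math.floor(i * every) + 1;
    const end = Math.floor((i + 1) * every) + 1;
    let maxArea = -1;
    let pick = start;
    for (let j = start; j < end; j++) {
      const area =
        Math.abs(
          (data[a][0] - avgX) * (data[j][1] - data[a][1]) -
            (data[a][0] - data[j][0]) * (avgY - data[a][1])
        ) / 2;
      if (area > maxArea) {
        maxArea = area;
        pick = j;
      }
    }
    sampled.push(data[pick]);
    a = pick;
  }

  sampled.push(data[n - 1]); // S1-4: the last point is a segment of its own
  return sampled;
}

// Made-up series: flat at 0 with a single spike of 100 at x = 50.
const series = Array.from({ length: 100 }, (_, i) => [i, i === 50 ? 100 : 0]);
const reduced = lttb(series, 10);
console.log(reduced.length, reduced.some((p) => p[1] === 100));
```

In this particular series the spike sits alone in an otherwise flat segment, so it has the only non-zero triangle area there and is kept. The case plain LTTB handles less well is several distinct features falling into one equal-sized segment, which motivates the dynamic segmentation.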
The problem with the LTTB algorithm is that it divides all segments equally, which works poorly where the data curve is steep. Gentle regions need fewer points to reflect their details, while steep regions need more; LTTB simply divides equally to save the time that segmentation would take. The invention therefore introduces dynamic segmentation, i.e. an improvement of step S1-1 above, which becomes the following steps:
(1-1) As before, the segment size is passed to the algorithm as a parameter (threshold), and the data are first divided equally as in LTTB, with the first and last points each occupying a segment of their own.
(1-2) Starting from the second segment (containing m = total / threshold points in total), traverse all points in the segment. The straight line through the selected point A of the previous segment and the average point B of the next segment is taken as the asymptote L. Calculate the perpendicular distance from each point F' in the segment to L, giving the array SS and the sum SSE of all perpendicular distances.
(1-3) With target = 1/3 × SSE as the target value, trisect the array SS by value: the indices at which the accumulated distance reaches one third and two thirds of SSE are used as additional split points. Because the perpendicular distance reflects, to some extent, the dispersion relative to the asymptote, one segment with total SSE is divided into three sub-segments with equal targets, which copes dynamically with rapid change. If no point matches the target exactly, the nearest point to its left is taken.
(1-4) After segmentation, continue with step S1-2 of the LTTB algorithm. The final improved algorithm flow is shown in FIG. 2.
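One possible reading of steps (1-2) and (1-3) is sketched below. The source does not spell out the cut rule in full, so the details are assumptions: the helper names `perpendicularDistance` and `trisectSegment` are hypothetical, a sub-segment ends at the last point whose accumulated distance does not exceed the current target (the "point to the left"), and empty sub-segments are dropped, so fewer than three parts may result.

```javascript
// Perpendicular distance from point p to the line through a and b (all [x, y]).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const norm = Math.hypot(dx, dy);
  if (norm === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm;
}

// Split one segment into up to three sub-segments whose accumulated
// distance to the line A-B is roughly SSE / 3 each (assumed interpretation).
function trisectSegment(segment, a, b) {
  const ss = segment.map((p) => perpendicularDistance(p, a, b)); // array SS
  const sse = ss.reduce((sum, d) => sum + d, 0); // sum SSE
  if (sse === 0) return [segment]; // all points on the line: nothing to split

  const target = sse / 3;
  const starts = [0]; // start index of each sub-segment
  let cumulative = 0;
  let k = 1; // which multiple of the target is being searched for
  for (let i = 0; i < ss.length; i++) {
    cumulative += ss[i];
    // Point i overshoots k * target, so the cut falls to its left and
    // point i opens the next sub-segment.
    while (k < 3 && cumulative > k * target) {
      if (i > starts[starts.length - 1]) starts.push(i);
      k++;
    }
  }
  return starts.map((s, idx) =>
    segment.slice(s, idx + 1 < starts.length ? starts[idx + 1] : segment.length)
  );
}

// Evenly dispersed points: three equal parts.
const evenParts = trisectSegment(
  [[1, 1], [2, 1], [3, 1], [4, 1], [5, 1], [6, 1]],
  [0, 0],
  [7, 0]
);
// A single spike: the flat points before it are cut off into their own part.
const spikeParts = trisectSegment(
  [[1, 0.1], [2, 0.1], [3, 10], [4, 0.1], [5, 0.1]],
  [0, 0],
  [6, 0]
);
console.log(evenParts.map((p) => p.length), spikeParts.map((p) => p.length));
```

Each returned sub-segment would then be treated as a segment in its own right when the LTTB selection continues, so steep regions contribute more sampled points than flat ones.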
Step S2: slice the data. A threshold is set, data that are still large after down-sampling are sliced, and the method proceeds to the next step. Experiments show that the number of drawing instances running simultaneously should not be too large, and that drawing cost is almost the same once the data volume is below the order of hundreds of points. The invention therefore sets the maximum number of slices to 10 and the minimum capacity of a single slice to 200.
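A slicing rule consistent with these two thresholds might look like the sketch below. The source gives only the limits (at most 10 slices, at least 200 points per slice), so the function name `sliceData` and the way the slice count is derived are assumptions.

```javascript
// Split data into at most `maxSlices` contiguous slices, each holding at
// least `minSliceSize` points whenever the data are large enough (sketch).
function sliceData(data, maxSlices = 10, minSliceSize = 200) {
  const count = Math.max(
    1,
    Math.min(maxSlices, Math.floor(data.length / minSliceSize))
  );
  const slices = [];
  for (let k = 0; k < count; k++) {
    // Even boundaries keep every slice within one point of data.length / count.
    const start = Math.floor((k * data.length) / count);
    const end = Math.floor(((k + 1) * data.length) / count);
    slices.push(data.slice(start, end));
  }
  return slices;
}

const big = sliceData(new Array(100000).fill(0)); // capped at 10 slices
const medium = sliceData(new Array(1001).fill(0)); // 5 slices of at least 200
const small = sliceData(new Array(150).fill(0)); // too small to split
console.log(big.length, medium.map((s) => s.length), small.length);
```

Adjacent slices in this sketch share no points, so a line chart drawn per slice may show a small gap at each boundary unless the boundary point is duplicated; the source does not say how this is handled.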
Step S3: render. The invention uses ECharts plotting as its example, in which the most time-consuming operation is drawing the data points. Because a down-sampling step is also performed, the time the algorithm spends processing the data must be taken into account when evaluating the actual effect. The drawing process mainly comprises the following steps:
S3-1, create ECharts instances: each slice is rendered and drawn by its own instance.
S3-2, using CSS, position and overlay the sub-charts to form a complete chart. Specifically, the parent container is set to relative positioning, each sub-chart is absolutely positioned at the same place inside the container, and only one set of axis scales is kept across the charts, after which the charts can be overlaid. Although it consists of several sub-charts, the result looks like one complete chart to the viewer. The specific flow is shown in FIG. 3.
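Steps S3-1 and S3-2 could be wired together along the following lines. This is a sketch under assumptions rather than the patent's implementation: `buildSliceOptions` and `renderSlices` are hypothetical helpers, the grid margins are arbitrary, x values are assumed to be numeric timestamps, and the `echarts` object and the sized container element are passed in by the caller. Only documented ECharts calls and options are used (`echarts.init`, `setOption`, `grid`, `xAxis`/`yAxis` with `min`, `max` and `show`, and a `line` series).

```javascript
// Smallest and largest value of one dimension across all slices.
function extent(slices, dim) {
  let min = Infinity;
  let max = -Infinity;
  for (const slice of slices) {
    for (const p of slice) {
      if (p[dim] < min) min = p[dim];
      if (p[dim] > max) max = p[dim];
    }
  }
  return [min, max];
}

// One ECharts option per slice. All charts share the same grid and axis
// ranges so the overlaid lines align; only the first chart shows its axes.
function buildSliceOptions(slices) {
  const [xMin, xMax] = extent(slices, 0);
  const [yMin, yMax] = extent(slices, 1);
  return slices.map((slice, index) => ({
    animation: false,
    grid: { left: 60, right: 20, top: 20, bottom: 40 },
    xAxis: { type: "time", min: xMin, max: xMax, show: index === 0 },
    yAxis: { type: "value", min: yMin, max: yMax, show: index === 0 },
    series: [{ type: "line", data: slice, showSymbol: false }],
  }));
}

// Browser-only part: stack one absolutely positioned chart per slice
// inside a relatively positioned parent container.
function renderSlices(echarts, container, slices) {
  container.style.position = "relative";
  return buildSliceOptions(slices).map((option) => {
    const el = document.createElement("div");
    Object.assign(el.style, {
      position: "absolute",
      top: "0",
      left: "0",
      width: "100%",
      height: "100%",
    });
    container.appendChild(el);
    const chart = echarts.init(el);
    chart.setOption(option);
    return chart;
  });
}

// Made-up slices to show the shape of the generated options.
const options = buildSliceOptions([
  [[0, 1], [1, 5]],
  [[2, -3], [3, 2]],
]);
console.log(options.map((o) => o.xAxis.show));
```

Fixing the axis ranges to the global extent matters here: if each instance chose its own range, the slices would be scaled differently and the overlaid lines would no longer form one consistent curve. The container needs an explicit width and height for `echarts.init` to size the canvases.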
The technical scheme is as described in the steps above. The following tests and result summary illustrate the optimization the invention can achieve. The test steps are as follows:
A. Generate test data. 100,000 time series data points are generated randomly, as follows:
(1) Generate a base timestamp in the past: base = +new Date(1988, 9, 3)
(2) On each iteration, advance the base in time order: base += 3600 * 1000; now = new Date(base)
(3) Append to the data set: data.push([now, random value])
(4) After 100,000 iterations, 100,000 time-ordered, consecutive random values are obtained.
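Put together, the generation steps might read as follows. The function name `generateSeries` and the value range of 0 to 100 are assumptions, since the source does not state how the random values are distributed. The sketch stores the numeric millisecond timestamp instead of the `Date` object so that later arithmetic on x values works directly.

```javascript
// Generate `count` hourly samples with random values (illustrative sketch).
function generateSeries(count, start = +new Date(1988, 9, 3), stepMs = 3600 * 1000) {
  const data = [];
  let base = start; // base timestamp in milliseconds
  for (let i = 0; i < count; i++) {
    base += stepMs; // advance by one hour
    const now = new Date(base);
    // Assumed value range: uniform in [0, 100).
    data.push([now.getTime(), Math.random() * 100]);
  }
  return data;
}

const testData = generateSeries(100000);
console.log(testData.length, new Date(testData[0][0]).toISOString());
```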
B. Down-sample. The 100,000 data points are down-sampled with the algorithm of step S1 at sampling rates of 1, 10, 100, 500, 1000 and 10000. The resulting degree of detail retention is shown in FIG. 4, and the test data are given in Table 2 below.
C. Render the sliced data. For sampling rates of 1, 10 and 100, the sampled data still exceed the threshold, so the method of steps S2 and S3 is used to slice the data and render each slice in its own instance. The test data are given in Table 3 below.
The final test data are as follows:
Table 1. Data index settings
Test data: 100000
Down-sampling rate: 1, 10, 100, 500, 1000, 10000
Data slicing threshold: 10 (maximum number of slices), 200 (minimum capacity of a single slice)
Table 2. Drawing test data
(table image: BDA0003420307710000041)
Table 3. Drawing test data after data slicing
(table image: BDA0003420307710000042)
The test data show that, with the down-sampling and data slicing drawing optimization, each order-of-magnitude reduction of the data by down-sampling cuts drawing time by about 70-80% and memory usage by about 20-30%. The effect is good, especially when an appropriate sampling rate and threshold are chosen. For example, as the chart in FIG. 4 and the table data show, when the data are down-sampled to 1000 points with a threshold of 10 × 200, the detail features of the chart are well preserved, and the peak and valley points are retained even when the data are reduced to 10 points. The measured drawing time and memory usage are likewise satisfactory, so the method achieves the expected effect.

Claims (3)

1. A massive data drawing optimization method based on a maximum triangle three-segment algorithm is characterized by comprising the following steps:
step S1, acquiring the collected data, and performing down-sampling based on an improved maximum triangle three-segment algorithm;
step S2, setting a threshold value, and slicing the oversize data;
step S3, creating an ECharts instance, and rendering and drawing each given slice data independently; overlapping each sub-chart to form a complete chart;
the method for optimizing the drawing of the mass data based on the maximum triangle three-segment algorithm as claimed in claim 1, wherein, in the improved maximum triangle three-segment algorithm of step S1, the segment size is determined as follows:
the segment size is taken as a parameter and transmitted to the algorithm, and the head and tail points respectively and independently occupy one segment;
traversing all points in the segment from the second segment, wherein the straight line through a selected point A of the previous segment and an average point B of the next segment is taken as an asymptote L, and calculating the perpendicular distance from each point F' in the segment to the straight line L to obtain an array SS and the sum SSE of all the perpendicular distances;
with target = 1/3 × SSE as the target value, the array SS is trisected by value, and the trisection indices found are used as additional segmentation points.
2. The method for optimizing the drawing of the mass data based on the maximum triangle three-segment algorithm as claimed in claim 1, wherein in step S3, an ECharts instance is created by using a data visualization technology, and data is rendered and drawn into a visualization chart; for the sliced data, a plurality of drawing instances need to be created, and the plurality of instances are overlapped when the drawing is finally carried out, so that a complete chart is formed.
3. The method for optimizing the drawing of the mass data based on the maximum triangle three-segment algorithm as claimed in claim 2, wherein the positioning and overlaying of the sub-charts are completed by using a cascading style sheet language: setting the child charts to be the same in size, position and positioning mode, setting the parent container to be relatively positioned, absolutely positioning each child chart to the same position in the container, and superposing the child charts to form a complete chart.
Application CN202111559756.6A (priority date 2021-12-20, filing date 2021-12-20): Mass data drawing optimization method based on maximum triangle three-segment algorithm. Status: Pending. Publication: CN114241086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111559756.6A CN114241086A (en) 2021-12-20 2021-12-20 Mass data drawing optimization method based on maximum triangle three-segment algorithm

Publications (1)

Publication Number Publication Date
CN114241086A true CN114241086A (en) 2022-03-25

Family

ID=80758930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111559756.6A Pending CN114241086A (en) 2021-12-20 2021-12-20 Mass data drawing optimization method based on maximum triangle three-segment algorithm

Country Status (1)

Country Link
CN (1) CN114241086A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination