GB2518171A - Improvements in or relating to data processing - Google Patents

Improvements in or relating to data processing

Info

Publication number
GB2518171A
Authority
GB
United Kingdom
Prior art keywords
data
available
region
grouped
axis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1316207.8A
Other versions
GB201316207D0 (en)
Inventor
Jack Leslie Talbot
Daniel John Boa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Epistem Ltd
Original Assignee
Epistem Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Epistem Ltd
Priority to GB1316207.8A
Publication of GB201316207D0
Priority to US14/482,910
Publication of GB2518171A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/20: Drawing from basic elements, e.g. lines or circles
    • G06T11/206: Drawing of charts or graphs

Abstract

The application concerns processing data for display on a scatter plot. For many plots the available pixel resolution will be less than the resolution of the data points to be plotted, and there will be significant overlap of points when they are rendered for display. This overlap can be exploited by grouping data and representing multiple data as a single point of a displayed plot. The application renders the plot data by binning the data to fit the size of an available pixel region of a graphical output device. Data is grouped by breaking an available display area into cells and grouping points which lie within the same cell. The grouped point may be altered visually to show the number of points it represents; this may be done by storing a size parameter along with the coordinates of the grouped point, which may act as an opacity scaling factor. The invention also allows a user to zoom in on specific areas of the plot.

Description

IMPROVEMENTS IN OR RELATING TO DATA PROCESSING
BACKGROUND
The present disclosure relates to improvements in or relating to data processing, and in particular to methods of rendering data in graphical form.
SUMMARY OF THE DISCLOSURE
According to a first aspect of the disclosure there is provided a method of displaying data comprising grouping the data to form a grouped representation of the data, and displaying said grouped representation; wherein said grouping is carried out to fit an available region of a graphical output device.
Optionally said available region is an available pixel region. Alternatively, said available region is an available print region.
Optionally, grouping the data comprises aggregating the data by data binning.
Optionally, the graphical output device is a display screen.
Optionally, the binned data is output as a scatter plot with each datum represented by a graphical symbol.
Optionally a first axis of data to be plotted is binned; and then a second axis of data to be plotted is binned for each bin of the first axis that contains two or more data.
Optionally binning the data along an axis comprises: calculating a number of bins by dividing a number of output device pixels by a characteristic pixel dimension of a graphical symbol used to represent each datum; determining lower and upper bounds for each bin based on a range of an axis to be displayed and the number of bins; and allocating each datum to a bin depending on its value for that axis.
Optionally the range of an axis to be displayed comprises a lower axis bound and an upper axis bound, as defined by the data.
Optionally each bin has a size parameter associated with it that represents the number of data that the bin comprises.
Optionally the grouped binned data are altered visually according to the size parameter, to represent the number of data that each bin comprises.
Optionally said visual alteration comprises varying an opacity value of a grouped datum.
Optionally, a grouped representation is redrawn according to new boundaries in response to a zoom command.
According to a second aspect of the disclosure there is provided a system for displaying data comprising: a database storing a plurality of data; a graphical output device comprising means to present data in a pixel region; a processor arranged to group the data to form a grouped representation of the data, and to provide commands to display said grouped representation; wherein said grouping is carried out to fit an available region of the graphical output device.
Optionally said available region is an available pixel region. Alternatively, said available region is an available print region.
Optionally, the processor comprises a data binning component.
Optionally, the graphical output device is a display screen.
Optionally, the display screen is provided as a component part of a computing device.
Optionally, the system comprises a server comprising said database and said processor, and said graphical output device is a client of the server or is provided as part of a client device of the server.
Optionally, the server is a web server and the client device runs a browser application for accessing the data.
According to a third aspect of the disclosure there is provided a computer program product that includes instructions that when run on a computer, enable it to bin data to fit an available region of a graphical output device.
Optionally said available region is an available pixel region. Alternatively, said available region is an available print region.
According to a fourth aspect of the disclosure there is provided a computer program product that includes instructions that when run on a computer, enable it to request data from a database which is binned to fit an available region of a graphical output device associated with the computer running the instructions.
Optionally said available region is an available pixel region. Alternatively, said available region is an available print region.
The computer program products of the third and fourth aspects may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fibre optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infra-red, radio, and microwave, then the coaxial cable, fibre optic cable, twisted pair, DSL, or wireless technologies such as infra-red, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. The instructions or code associated with a computer-readable medium of the computer program product may be executed by a computer, e.g., by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosure will be described below, by way of example only, with reference to the accompanying drawings, in which: Figure 1 illustrates the grouping together of data depending on which bin they lie in; and Figure 2 illustrates an example application of the disclosure for illustrative purposes.
DETAILED DESCRIPTION
Data is only useful if it can be processed and interpreted, and for this reason graphical representations of data are key tools for helping humans review and understand patterns and trends represented by the data. One popular graphical representation is a scatter plot, where variables are plotted along axes of the plot and each datum is represented as a point on the plot. The representation of each datum may take the form of a graphical symbol, such as a circle, square, triangle or generally any desired shape. Different symbols can be used to represent different data sets.
Graphical representations of data are presented by use of various display technologies, including CRT, LCD or LED displays to name but some of many technologies that are available.
The present disclosure is not limited to any type of display technology. A display will have a set of display pixels for displaying information to a user, which defines the resolution of the display. For example, a popular high-definition resolution for a desktop LCD monitor used with a personal computer is 1366 x 768 pixels in a 16:9 aspect ratio. There is a wide range of display resolutions across desktop monitors and screens for mobile computing devices such as cellular telephones, tablet computing devices, laptops and so on.
As an example, we can imagine a scatter plot with 10,000 points that is 700 pixels wide by 500 pixels tall, and where each point is represented by a circular graphical symbol having a radius of three pixels. It is to be noted that real examples may have many times more than 10,000 data points.
Rendering all 10,000 points on one graph would represent a high resource cost for a computer that is performing the rendering, which could result in either a slow or unresponsive visualization, or even worse, a crash.
Furthermore, users often like to view multiple graphs side by side for comparison, say up to nine for example. If multiple graphs of similar size need to be plotted, this would represent a still further degradation of performance of a computer rendering the graphs.
For many plots, the available pixel resolution will be less than the resolution of the data points to be plotted, so there will be significant overlap of points when they are rendered for display. In the example mentioned above (10,000 data points to be plotted on a pixel area of 700 by 500 pixels), significant overlap is especially likely because the points in a realistic plot will most likely never be uniformly distributed.
This overlap can be exploited by grouping data and representing multiple data as a single point on a displayed plot. Grouping points based on the distance between each other works well but requires each point to be checked against every other point, which is computationally expensive, so the overall process of obtaining the data, filtering it and then rendering it would take a relatively long time.
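To give a sense of the cost of the distance-based approach, the following minimal sketch counts the point pairs that would need to be checked for the 10,000-point example. The function name `pairwise_comparisons` is illustrative only and does not appear in the disclosure.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered point pairs a distance-based grouping must inspect."""
    return n * (n - 1) // 2


# For the 10,000-point example, every point is checked against every other point.
n_points = 10_000
print(pairwise_comparisons(n_points))  # 49995000
```

By contrast, a cell-based grouping only needs to visit each point once, so its work grows linearly with the number of points.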
The disclosure provides for rendering data by binning the data to fit the size of an available pixel region of a graphical output device. Data are grouped by breaking an available display area into cells and grouping together points which lie within the same cell.
This is illustrated in figure 1, which shows a selected sub-portion of a display screen. In this example a data point is graphically represented by a circle which has a radius of three pixels on a display screen when rendered. Therefore, the display is broken into cells of three pixels square and points lying within each cell are grouped together and plotted as a single point.
When multiple points are grouped together, the resulting grouped point assumes the X and Y coordinates of the first point for that group or, alternatively, the center of the cell for that group.
This computationally cheap method is sufficient given the level of overlap that would typically be present. More complex schemes could be employed, such as calculating an average of all the group coordinates, but it is preferred to avoid this complexity because for the vast majority of cases it would represent an unnecessary waste of CPU time, as the visual difference would be so minor.
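The cell-based grouping described above can be sketched as follows. This is a minimal illustration, assuming the points have already been converted to pixel coordinates; the name `group_by_cell`, the tuple layout and the default `cell_px=3` are assumptions made for the example, not part of the disclosure.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates (assumed for this sketch)


def group_by_cell(points: List[Point], cell_px: int = 3) -> List[Tuple[float, float, int]]:
    """Group points by square cell and return one (x, y, size) tuple per cell.

    Each group keeps the coordinates of the first point seen in its cell,
    and `size` counts how many raw points fell into that cell.
    """
    groups: Dict[Tuple[int, int], list] = {}
    for x, y in points:
        key = (int(x // cell_px), int(y // cell_px))  # cell column and row
        if key in groups:
            groups[key][2] += 1          # another point in an occupied cell
        else:
            groups[key] = [x, y, 1]      # first point defines the group position
    return [tuple(g) for g in groups.values()]


pixels = [(0.5, 0.5), (2.9, 1.0), (3.1, 0.0), (7.0, 8.0)]
print(group_by_cell(pixels))
```

Using the first point rather than an average keeps the loop to a single dictionary lookup per point, which reflects the preference stated above for avoiding extra CPU work.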
In some embodiments, a grouped point may be altered visually to show the number of points it represents, thus creating the illusion that there is more than one point being rendered. This may be achieved by storing a size parameter along with the coordinates of the grouped point, where the size parameter represents the number of raw data that are combined to form a grouped datum. The size parameter can then be used to define a style to be applied to the displayed plot points, for example acting as an opacity scaling factor. As an illustration, if a single non-grouped point is rendered with an opacity of 0.25, a grouped point representing 3 points may be rendered with an opacity of 0.75. As the grouped point is rendered at 3 times the darkness of a regular point, from a distance it would look almost indistinguishable from 3 overlapping points.
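The opacity scaling can be sketched as a one-line function. The name `grouped_opacity` is illustrative, and the cap at 1.0 is an assumption added here because opacity values above 1 are not meaningful; the disclosure does not state how large groups are handled.

```python
def grouped_opacity(size: int, base_opacity: float = 0.25) -> float:
    """Scale the base opacity by the group's size parameter.

    The cap at 1.0 (fully opaque) is an assumption of this sketch.
    """
    return min(1.0, size * base_opacity)


print(grouped_opacity(1))   # 0.25
print(grouped_opacity(3))   # 0.75
print(grouped_opacity(10))  # 1.0
```

Note that linear scaling is an approximation: under ordinary alpha blending, three stacked points at 0.25 opacity would composite to roughly 0.58 rather than 0.75, so a renderer aiming for a closer match might choose a different mapping.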
The groups may be chosen by binning the data one dimension at a time. For example, the number of bins can be calculated by dividing the number of available pixels (px) on this axis by the characteristic dimension of the graphical symbol to be displayed (in the example mentioned above and as illustrated in figure 1, a circle of three-pixel radius, that is, 700 px / 3 px = ~234 bins). A bin size can then be calculated by dividing the range of the axis by the number of bins (e.g. if the axis is -10 to 10 then the range would be 20; thus the bin size would be 20 / 234 = 0.0855). The lower and upper bounds of each bin can then be calculated based on the size and number of bins (e.g. the first bin would span from -10 to -9.9145, the second from -9.9145 to -9.829, and so on). Each point is then put into a bin depending on its value for that axis.
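The single-axis arithmetic above can be reproduced with a short sketch. The function names `bin_edges` and `bin_index` are hypothetical, and rounding the bin count up with `math.ceil` is an assumption chosen to match the figure of about 234 bins; the disclosure does not say which rounding is used.

```python
import math
from typing import List, Tuple


def bin_edges(axis_min: float, axis_max: float, axis_px: int,
              symbol_px: int) -> Tuple[int, float, List[Tuple[float, float]]]:
    """Return (number of bins, bin size, list of (lower, upper) bounds)."""
    n_bins = math.ceil(axis_px / symbol_px)      # 700 / 3 rounds up to 234
    bin_size = (axis_max - axis_min) / n_bins    # 20 / 234
    edges = [(axis_min + i * bin_size, axis_min + (i + 1) * bin_size)
             for i in range(n_bins)]
    return n_bins, bin_size, edges


def bin_index(value: float, axis_min: float, bin_size: float, n_bins: int) -> int:
    """Allocate a value to a bin, clamping values on the axis limits."""
    i = int((value - axis_min) // bin_size)
    return min(max(i, 0), n_bins - 1)


n_bins, bin_size, edges = bin_edges(-10.0, 10.0, axis_px=700, symbol_px=3)
print(n_bins, round(bin_size, 4))
print(tuple(round(e, 4) for e in edges[0]))
print(bin_index(-9.9, -10.0, bin_size, n_bins))
```

Clamping in `bin_index` means a value exactly on the upper axis bound falls into the last bin rather than one past it.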
Using this function we can then create the grouped points. Data is binned along a first axis as described above. Then, for each bin returned that holds two or more points, data is binned along a second axis, but using the range and dimensions of that axis. Data are now grouped and should be packaged in a format that is understandable by the application running on the client. If the grouped points are to be altered visually (opacity, size etc.) then they should have their size parameter attached to them as well.
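A minimal sketch of the two-pass grouping follows, under stated assumptions: the names `axis_bin` and `group_points` are hypothetical, each group takes the coordinates of its first point, and JSON is assumed as the packaging format for the client (the disclosure does not name a format).

```python
import json
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def axis_bin(value: float, lo: float, hi: float, axis_px: int, symbol_px: int) -> int:
    """Bin index of a value on an axis spanning [lo, hi] (assumes hi > lo)."""
    n_bins = math.ceil(axis_px / symbol_px)
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def group_points(points: List[Point], x_range: Point, y_range: Point,
                 width_px: int, height_px: int, symbol_px: int = 3) -> List[Dict]:
    """Bin along x first, then bin along y only within x bins holding 2+ points."""
    x_bins: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        x_bins[axis_bin(p[0], *x_range, width_px, symbol_px)].append(p)

    grouped: List[Dict] = []
    for members in x_bins.values():
        if len(members) < 2:
            x, y = members[0]
            grouped.append({"x": x, "y": y, "size": 1})  # lone point, no second pass
            continue
        y_bins: Dict[int, List[Point]] = defaultdict(list)
        for p in members:
            y_bins[axis_bin(p[1], *y_range, height_px, symbol_px)].append(p)
        for cell in y_bins.values():
            x, y = cell[0]                               # first point positions the group
            grouped.append({"x": x, "y": y, "size": len(cell)})
    return grouped


pts = [(-9.99, 0.0), (-9.98, 0.01), (-9.97, 5.0), (3.0, 3.0)]
payload = json.dumps(group_points(pts, (-10, 10), (-10, 10), 700, 500))
print(payload)
```

Skipping the second pass for x bins that hold a single point avoids work that cannot change the result, since a lone point is always its own group.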
It is often desirable when dealing with data visualizations to allow the user to zoom in to further explore the data that are represented. The grouping algorithm described above relies on the relationship between the size of each point, the dimensions of the graph in pixels and the range of the axes. Grouping of data points also results in a loss of detail when the scale of the display axes is increased. Therefore when a user wishes to zoom in on a graph, the grouping algorithm is run again, as both the range of the axes and the pixel spacing between each point will change.
A user may interact through a suitable interface to select an area of a graph that they wish to zoom in on. This may for example be by clicking and dragging to select a rectangular area.
When an area has been selected, the lower and upper bounds of the selection box are calculated. New axis ranges are passed back to a database where the raw data are stored, requesting a new subset of the data to be displayed. When retrieving the data needed from the database, only the data which lies within the viewable area is obtained. This reduced dataset is then grouped as described above using the new ranges and rendered on the display.
Labels for the axes are also updated to match the selected data.
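The zoom sequence (compute the selection bounds, fetch only the data inside them, regroup with the new ranges) can be sketched as below. This is illustrative only: an in-memory list stands in for the database query, the name `zoom` and the returned dictionary layout are assumptions, and the selection is assumed to have non-zero width and height.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def zoom(raw_points: List[Point], selection: Tuple[Point, Point],
         width_px: int, height_px: int, symbol_px: int = 3) -> Dict:
    """Regroup the raw data for a selected rectangle given by two drag corners."""
    (x0, y0), (x1, y1) = selection
    x_lo, x_hi = sorted((x0, x1))   # lower and upper bounds of the selection box
    y_lo, y_hi = sorted((y0, y1))

    # Stand-in for the database request: keep only data inside the viewable area.
    visible = [(x, y) for x, y in raw_points
               if x_lo <= x <= x_hi and y_lo <= y <= y_hi]

    nx = math.ceil(width_px / symbol_px)
    ny = math.ceil(height_px / symbol_px)
    cells: Dict[Tuple[int, int], list] = {}
    for x, y in visible:
        ix = min(int((x - x_lo) / (x_hi - x_lo) * nx), nx - 1)
        iy = min(int((y - y_lo) / (y_hi - y_lo) * ny), ny - 1)
        if (ix, iy) in cells:
            cells[(ix, iy)][2] += 1
        else:
            cells[(ix, iy)] = [x, y, 1]

    return {
        "x_range": (x_lo, x_hi),   # new ranges, also usable for the axis labels
        "y_range": (y_lo, y_hi),
        "points": [{"x": c[0], "y": c[1], "size": c[2]} for c in cells.values()],
    }


raw = [(0.1, 0.1), (0.101, 0.101), (1.5, 1.5), (5.0, 5.0)]
view = zoom(raw, ((2.0, 2.0), (-1.0, -1.0)), width_px=700, height_px=500)
print(view["x_range"], [p["size"] for p in view["points"]])
```

Because the bins are recomputed from the new, narrower ranges, points that shared a cell at the original scale may separate into distinct groups after zooming.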
The grouping algorithm described herein may be implemented in various different ways. In one embodiment, a single computer comprises a database with raw data and an application for rendering a graph on a display screen, and the grouping algorithm operates as part of the application for rendering data from the database. In another embodiment, a database with the raw data may be provided on a server and accessed by a user with a client computing device such as a personal computer, laptop or portable computing device such as a tablet computer or a cell phone. The grouping algorithm may be performed at the server side so that the load on the front-end user application which renders the graph is minimised. The server-side grouping algorithm and client-side rendering engine may suitably be provided as a web application, where the grouped data is served as HTML documents over TCP/IP or HTTP for viewing by an appropriate browser. When the raw data is hosted on a server, it may be on a single server or may be distributed over several devices in a grid or cloud-based manner. When implemented as a web application, new axis ranges needed when zooming can be passed from the front-end to the server by an AJAX call or other suitable technique.
Figure 2 illustrates an example application of the disclosure for illustrative purposes. The left hand side shows an example appearance of a plot with ten thousand data points, while the right hand side shows an example appearance of a plot where the methods of the disclosure have been applied and a scatter plot is rendered based on a grouped data set with just two thousand data points. It can be seen that there is no appreciable difference between the two plots, although the one on the right hand side can be rendered much more quickly because the underlying set of data is greatly reduced as compared with the plot of the left hand side.
With the disclosure, high-density multidimensional data can be filtered cheaply on the backend of a web application before it is to be rendered onto a scatterplot by the client. This allows the load on the front-end of the application to be significantly reduced without visibly sacrificing information. This also reduces the data transmitted over an internet connection, making it easier, faster and more reliable to serve high density data to mobile devices or over slower internet connections. The method disclosed is simple and efficient and can be processed quickly enough to enable zooming and similar functions at minimal computational expense.
Various improvements and modifications can be made to the above without departing from the scope of the disclosure. It will be appreciated that while the embodiments described above have referred primarily to optimising graphical data for display on a display device, the techniques can also be used for a graphical output device that comprises a printer; that is, data can be binned according to an available pixel area that is governed by a printer's print resolution, reducing ink usage.

Claims (26)

  1. A method of displaying data comprising grouping the data to form a grouped representation of the data, and displaying said grouped representation; wherein said grouping is carried out to fit an available region of a graphical output device.
  2. The method of claim 1, wherein said available region is an available pixel region. Alternatively, said available region is an available print region.
  3. The method of claim 1 or claim 2, wherein grouping the data comprises aggregating the data by data binning.
  4. The method of any preceding claim, wherein the graphical output device is a display screen.
  5. The method of any preceding claim, wherein the binned data is output as a scatter plot with each datum represented by a graphical symbol.
  6. The method of any preceding claim, wherein a first axis of data to be plotted is binned; and then a second axis of data to be plotted is binned for each bin of the first axis that contains two or more data.
  7. The method of claim 6, wherein binning the data along an axis comprises: calculating a number of bins by dividing a number of output device pixels by a characteristic pixel dimension of a graphical symbol used to represent each datum; determining lower and upper bounds for each bin based on a range of an axis to be displayed and the number of bins; and allocating each datum to a bin depending on its value for that axis.
  8. The method of any preceding claim, wherein the range of an axis to be displayed comprises a lower axis bound and an upper axis bound, as defined by the data.
  9. The method of claim 8, wherein each bin has a size parameter associated with it that represents the number of data that the bin comprises.
  10. The method of any preceding claim, wherein the grouped binned data are altered visually according to the size parameter, to represent the number of data that each bin comprises.
  11. The method of claim 10, wherein said visual alteration comprises varying an opacity value of a grouped datum.
  12. The method of any preceding claim, wherein a grouped representation is redrawn according to new boundaries in response to a zoom command.
  13. A system for displaying data comprising: a database storing a plurality of data; a graphical output device comprising means to present data in a pixel region; a processor arranged to group the data to form a grouped representation of the data, and to provide commands to display said grouped representation; wherein said grouping is carried out to fit an available region of the graphical output device.
  14. The system of claim 13, wherein said available region is an available pixel region.
  15. The system of claim 13, wherein said available region is an available print region.
  16. The system of any of claims 13 to 15, wherein the processor comprises a data binning component.
  17. The system of any of claims 13 to 16, wherein the graphical output device is a display screen.
  18. The system of claim 17, wherein the display screen is provided as a component part of a computing device.
  19. The system of any of claims 13 to 18, wherein the system comprises a server comprising said database and said processor, and said graphical output device is a client of the server or is provided as part of a client device of the server.
  20. The system of claim 19, wherein the server is a web server and the client device runs a browser application for accessing the data.
  21. A computer program product that includes instructions that when run on a computer, enable it to bin data to fit an available region of a graphical output device.
  22. The product of claim 21, wherein said available region is an available pixel region.
  23. The product of claim 21, wherein said available region is an available print region.
  24. A computer program product that includes instructions that when run on a computer, enable it to request data from a database which is binned to fit an available region of a graphical output device associated with the computer running the instructions.
  25. The product of claim 24, wherein said available region is an available pixel region.
  26. The product of claim 24, wherein said available region is an available print region.
GB1316207.8A 2013-09-11 2013-09-11 Improvements in or relating to data processing Withdrawn GB2518171A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1316207.8A GB2518171A (en) 2013-09-11 2013-09-11 Improvements in or relating to data processing
US14/482,910 US10402727B2 (en) 2013-09-11 2014-09-10 Methods for evaluating and simulating data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1316207.8A GB2518171A (en) 2013-09-11 2013-09-11 Improvements in or relating to data processing

Publications (2)

Publication Number Publication Date
GB201316207D0 GB201316207D0 (en) 2013-10-23
GB2518171A true GB2518171A (en) 2015-03-18

Family

ID=49487081

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1316207.8A Withdrawn GB2518171A (en) 2013-09-11 2013-09-11 Improvements in or relating to data processing

Country Status (1)

Country Link
GB (1) GB2518171A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3460646A4 (en) * 2016-05-19 2019-04-24 Sony Corporation Information processing device, program, and information processing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006003484A2 (en) * 2004-07-01 2006-01-12 Spotfire Ab Binning system for data analysis
EP1717755A2 (en) * 2005-03-08 2006-11-02 Oculus Info Inc. System and method for large scale information analysis using data visualization techniques
US20110050702A1 (en) * 2009-08-31 2011-03-03 Microsoft Corporation Contribution based chart scaling
EP2485190A1 (en) * 2011-02-04 2012-08-08 Thomson Licensing Adapting the resolution of a graphic representation of metadata

Also Published As

Publication number Publication date
GB201316207D0 (en) 2013-10-23

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)