US20170052977A1 - Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources - Google Patents
- Publication number
- US20170052977A1 (application number US14/883,502)
- Authority
- US
- United States
- Prior art keywords
- data
- visualization
- server
- collection
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06F17/30174—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/128—Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/904—Browsing; Visualisation therefor
-
- G06F17/30088—
-
- G06F17/3056—
-
- G06F17/30595—
-
- G06F17/30994—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/134—Hyperlinking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Definitions
- This invention relates generally to data analyses in computer networks. More particularly, this invention relates to collaborative analyses of data snapshot visualizations from disparate sources.
- Existing data analysis techniques typically entail discrete analyses of discrete data sources. That is, an individual typically analyzes a single data source in an effort to derive useful information. Individual data sources continue to proliferate. Public data includes such things as census data, financial data and weather data. There are also premium data sources, such as market intelligence data, social data, rating data, user data and advertising data. Other sources of data are private, such as transactional data, click stream data, and log files.
- A server has a data processing module with instructions executed by a processor to maintain a collection of visualization frames that characterize a sequence of data analytics.
- Each visualization frame is a snapshot of data.
- The collection of visualization frames has associated permissions and visualization settings.
- A collection of discussion threads is maintained for the collection of visualization frames. Each discussion thread identifies different users and comments made by the different users.
- FIG. 1 illustrates a system configured in accordance with an embodiment of the invention.
- FIG. 2 illustrates component interactions utilized in accordance with an embodiment of the invention.
- FIG. 3 illustrates processing operations associated with the data ingest module.
- FIG. 4 illustrates a user interface for displaying inferred data types.
- FIG. 5 illustrates a user interface to display join relevance indicia utilized in accordance with an embodiment of the invention.
- FIG. 6 illustrates data merge operations performed in accordance with an embodiment of the invention.
- FIG. 7 illustrates in-memory data units and corresponding discussion threads utilized in accordance with an embodiment of the invention.
- FIG. 8 illustrates an initial graphical user interface that may be used in accordance with an embodiment of the invention.
- FIG. 9 illustrates various data streams that may be evaluated by a user in accordance with an embodiment of the invention.
- FIG. 10 illustrates data-aware convergence and visualization of disparate data sources.
- FIG. 11 illustrates context-aware data analysis collaboration.
- FIG. 12 illustrates data-aware visualization transition utilized in accordance with an embodiment of the invention.
- FIG. 13 illustrates data-aware annotations utilized in accordance with an embodiment of the invention.
- FIG. 14 illustrates context-aware annotations utilized in accordance with an embodiment of the invention.
- FIG. 15 illustrates the construction of a storyboard from different stories in accordance with an embodiment of the invention.
- FIG. 16 illustrates visualization units and discussion threads configured in accordance with an embodiment of the invention.
- FIG. 17 illustrates data refresh prompts supplied in accordance with an embodiment of the invention.
- FIG. 18 illustrates storyboard prompts and display features associated with embodiments of the invention.
- FIG. 19 illustrates storyboard discussion threads utilized in accordance with embodiments of the invention.
- FIG. 20 illustrates an embodiment of architectural components utilized to support storyboard operations disclosed herein.
- FIG. 1 illustrates a system 100 configured in accordance with an embodiment of the invention.
- The system 100 includes a client computer 102 connected to a set of servers 104_1 through 104_N via a network 106, which may be any wired or wireless network.
- The servers 104_1 through 104_N are operative as data sources.
- The figure also illustrates a cluster of servers 108_1 through 108_N connected to network 106.
- the cluster of servers is configured to implement operations of the invention.
- the client computer 102 includes standard components, such as a central processing unit 110 and input/output devices 112 connected via a bus 114 .
- the input/output devices 112 may include a keyboard, mouse, touch display and the like.
- a network interface circuit 116 is also connected to the bus 114 to provide an interface with network 106 .
- a memory 120 is also connected to the bus 114 .
- the memory 120 stores a browser 122 .
- A client machine 102, which may be a personal computer, tablet or smartphone, accesses network 106 to obtain information supplied in accordance with an embodiment of the invention.
- Servers 104_1 through 104_N also include standard components, such as a central processing unit 130 and input/output devices 132 connected via a bus 134.
- a network interface circuit 132 is also connected to the bus 134 to provide connectivity to network 106 .
- a memory 140 is also connected to the bus 134 .
- the memory 140 stores a data source 142 .
- Different servers 104 supply different data sources. For example, some servers may supply public data, such as census data, financial data and weather data. Other servers may provide premium data, such as market intelligence data, social data, rating data, user data and advertising data. Other servers may provide private data, such as transactional data, click stream data, and log files. The data may be in any form.
- the data is structured, such as data from a relational database.
- the data is semi-structured, such as data from a document-oriented database.
- the data is unstructured.
- the data is streamed.
- a data stream is a sequence of data elements and associated real time indicators.
- Each server 108 has standard components, such as a central processing unit 150 connected to input/output devices 152 via a bus 154 .
- a network interface circuit 156 is also connected to the bus 154 to provide access to network 106 .
- a memory 160 is also connected to the bus 154 .
- the memory 160 stores modules and data to implement operations of the invention.
- a web application module 162 is used to provide a relatively thin front end to the system.
- the web application module 162 operates as an interface between a browser 122 on a client machine 102 and the various modules in the software stack used to implement the invention.
- the web application module 162 uses application program interfaces (APIs) to communicate with the various modules in the software stack.
- the memory 160 also stores a data ingest module 164 .
- the data ingest module 164 consumes data from various data sources and discovers attributes of the data.
- the data ingest module 164 produces metadata characterizing ingested content, which is stored in a metadata catalog 166 .
- the ingested data is loaded into a file system 168 , as discussed below.
- a data processing module 170 includes executable instructions to support data queries and the ongoing push of information to a client device 102 , as discussed below.
- the modules in memory 160 are exemplary. The different modules may be on each server in the cluster or individual modules may be on different servers in the cluster.
- FIG. 2 is a more particular characterization of various modules shown in FIG. 1 .
- the arrows in the figure illustrate interactions between the modules, which are achieved through APIs.
- At the top of the figure is a browser 122, which is resident on a client device 102.
- the remaining modules in the figure are implemented on a cluster of servers 108 .
- The web application module 162 may include a story control module 200.
- As used here, a story references an ongoing evaluation of data, typically from disparate sources. The data is pushed to a client device as data is updated.
- a data story is a living analysis of one or more data sets, which may be either internal or external data sources. A data story can be automatically refreshed on a set cycle to keep the analysis up-to-date as data from the source gets updated or refreshed.
- the story control module 200 includes executable instructions to provide data visualizations that are data-aware.
- the data-awareness is used to appropriately scale data visualizations and harmonize data from discrete sources, as demonstrated below.
- The web application module 162 may also include a collaboration module 202, which includes executable instructions to support collaboration between end users evaluating a common story.
- the collaboration module supports context-aware data analysis collaboration, such as data-aware visualization transitions, data-aware data annotations and context-aware data annotations, as demonstrated below.
- FIG. 2 also illustrates a data ingest module 164 , which includes a data discovery module 204 .
- the data discovery module 204 includes executable instructions to evaluate attributes of ingested data.
- the data discovery module 204 communicates the attributes of the ingested data as data type metadata 208 , which is stored in the metadata catalog 166 .
- the data discovery module 204 operates in conjunction with a distributed, fault-tolerant real-time computation platform, such as the Storm open source software project.
- the computation platform has a master node and worker nodes.
- the master node operates as a coordinator and job tracker.
- the master node assigns tasks to worker nodes and monitors for failures.
- Each worker node includes a supervisor method that listens for work assigned to it.
- Each worker node executes a subset of a topology.
- a running topology contains many worker processes spread across many machines.
- a topology is a graph of a computation. Each node in a topology includes processing logic. Links between nodes indicate how data is passed between nodes.
- the computation platform may operate on a stream.
- a stream is an unbounded sequence of tuples.
- a tuple is an ordered list of elements.
- a field in a tuple can be an object of any type.
- the computation platform provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, one may transform a stream of tweets into a stream of trending topics. Stream transformations may be accomplished using spouts and bolts. Spouts and bolts have interfaces that one implements to run application-specific logic.
- a spout is a source of streams.
- a spout may read tuples and emit them as a stream.
- a spout may connect to the Twitter API and emit a stream of tweets.
- a bolt consumes any number of input streams, performs some processing and possibly emits new streams. Complex stream transformations require multiple steps and therefore multiple bolts. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
- Links between nodes in a topology indicate how tuples should be passed. For example, if there is a link between Spout A and Bolt B, a link from Spout A to Bolt C, and a link from Bolt B to Bolt C, then every time Spout A emits a tuple, it will send the tuple to both Bolt B and Bolt C. All of Bolt B's output tuples will go to Bolt C as well.
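- The Spout A, Bolt B and Bolt C example can be sketched as a toy in-process graph. This is a minimal illustration in plain Python, not the Storm API; the `Topology` class and its method names are hypothetical, and delivery is synchronous rather than distributed.

```python
from collections import defaultdict


class Topology:
    """Toy in-process model of a spout/bolt graph (not the Storm API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # emitter name -> subscribed bolt names
        self.bolts = {}                        # bolt name -> processing function
        self.received = defaultdict(list)      # bolt name -> tuples it has seen

    def add_bolt(self, name, func):
        self.bolts[name] = func

    def link(self, source, bolt):
        self.subscribers[source].append(bolt)

    def emit(self, source, tup):
        # Deliver the tuple to every bolt subscribed to this source's stream.
        for bolt in self.subscribers[source]:
            self.received[bolt].append(tup)
            # Whatever the bolt emits is forwarded to its own subscribers.
            for out in self.bolts[bolt](tup):
                self.emit(bolt, out)


topology = Topology()
topology.add_bolt("B", lambda t: [(t[0].upper(),)])  # B emits one transformed tuple
topology.add_bolt("C", lambda t: [])                  # C is a sink and emits nothing
topology.link("A", "B")
topology.link("A", "C")
topology.link("B", "C")

topology.emit("A", ("hello",))
print(topology.received["B"])  # [('hello',)]
print(topology.received["C"])  # [('HELLO',), ('hello',)]
```

- Bolt C sees both the tuple emitted by Spout A and the tuple produced by Bolt B, matching the link description above.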
- Data type metadata 208 from the data ingest module 164 is loaded into a file system 168 .
- the file system 168 is a Hadoop Distributed File System (HDFS).
- Hadoop is an open-source software framework that supports data-intensive distributed applications.
- the metadata may be stored in a separate catalog storage repository.
- HDFS supports the running of applications on large clusters of commodity hardware.
- stories metadata 212 is maintained to support the story control module 200 of the web application module.
- the stories metadata 212 characterizes the type of data to be supplied in a story.
- the stories metadata 212 also includes state information to track changes in the story over time.
- the stories metadata 212 provides contextual information to reconstruct the development of a story over time.
- the metadata catalog 166 also includes collaboration metadata 214 .
- the collaboration metadata 214 supports operations performed by the collaboration module 202 .
- the collaboration metadata 214 characterizes groups of individuals that may share a story.
- the collaboration metadata 214 may include various permissions that specify which individuals can see which data. For example, some collaborating individuals may have access to granular data, while others may only have access to aggregate data.
- the collaboration metadata 214 also maintains state information tracking collaboration over time. Consequently, the collaboration metadata 214 provides contextual information to reconstruct collaborative actions over time.
- the collaboration metadata 214 may be used in connection with data and analytic data stories, concepts that will be discussed in detail below. Different permissions can be set for data versus stories. For example, some collaborating individuals may have the permission to add data to the system and manage the data. Some individuals may have access to granular data and others have access to aggregate data. For analytic data stories, collaborators may have permission to iterate a story, view it only or view and comment on it. All permissions on data and stories are maintained as state information tracked over time.
- Collaboration metadata permissions may specify what operations may be performed on data or the view of data. For example, in one embodiment, a read only collaborator may only comment on and view data.
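- One way to picture story permissions that are tracked as state over time is an append-only history of grants. The sketch below is illustrative only: the role names, the `can` helper and the `PermissionHistory` class are assumptions, and integer timestamps stand in for real clock values.

```python
from enum import Enum


class StoryRole(Enum):
    VIEW = "view"
    VIEW_AND_COMMENT = "view_and_comment"
    ITERATE = "iterate"


# Hypothetical mapping of roles to the actions they allow on a story.
ALLOWED_ACTIONS = {
    StoryRole.VIEW: {"view"},
    StoryRole.VIEW_AND_COMMENT: {"view", "comment"},
    StoryRole.ITERATE: {"view", "comment", "edit"},
}


def can(role, action):
    """Return True if the role permits the action; None means no access."""
    return role is not None and action in ALLOWED_ACTIONS[role]


class PermissionHistory:
    """Append-only record of grants so past permission states can be rebuilt."""

    def __init__(self):
        self._events = []  # (timestamp, user, role), appended in time order

    def grant(self, timestamp, user, role):
        self._events.append((timestamp, user, role))

    def role_at(self, user, timestamp):
        role = None
        for when, who, granted in self._events:
            if who == user and when <= timestamp:
                role = granted  # later grants override earlier ones
        return role


history = PermissionHistory()
history.grant(1, "bob", StoryRole.VIEW)
history.grant(5, "bob", StoryRole.VIEW_AND_COMMENT)
print(can(history.role_at("bob", 3), "comment"))  # False
print(can(history.role_at("bob", 6), "comment"))  # True
```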
- the data processing module 170 supports distributed in-memory processing of data. As discussed below, the data processing module 170 operates on data units utilized in accordance with an embodiment of the invention.
- the data processing module 170 may utilize an open source cluster computing system, such as Spark from the University of California, Berkeley AMPLab.
- the core concept in Spark is a Resilient Distributed Dataset (RDD).
- An RDD is a data structure for a sequence of data that is fault tolerant and supports many parallel data manipulation operations, while allowing users to control in-memory caching and data placement.
- RDDs explicitly remember the derivation trees for the data sets in memory so that they can be re-derived in case of a fault. RDDs also allow explicit caching so that important intermediate results can be held in memory, which accelerates later computations that require intermediate results or if that same result needs to be sent to a client again.
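- The idea of remembering a derivation so that a lost result can be re-derived, combined with explicit caching, can be sketched in plain Python. This is a toy, single-process stand-in and not the Spark API; `LineageDataset` and its methods are hypothetical names.

```python
class LineageDataset:
    """Toy stand-in for an RDD: remembers how it was derived, not just its data."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base data (only for root datasets)
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # function applied to the parent's rows
        self._cache = None           # optional in-memory copy of the result
        self.compute_count = 0       # how many times rows were (re)derived

    def map(self, func):
        return LineageDataset(parent=self, transform=lambda rows: [func(r) for r in rows])

    def filter(self, pred):
        return LineageDataset(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Hold the computed result in memory for later reuse.
        self._cache = self.collect()
        return self

    def drop_cache(self):
        # Simulates losing the in-memory copy, e.g. after a node fault.
        self._cache = None

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        self.compute_count += 1
        if self._parent is None:
            return list(self._source)
        # Re-derive from the parent by replaying the recorded transformation.
        return self._transform(self._parent.collect())


base = LineageDataset(source=[1, 2, 3, 4, 5, 6])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
print(evens_squared.collect())  # [4, 16, 36], served from the cache
evens_squared.drop_cache()
print(evens_squared.collect())  # [4, 16, 36], re-derived from lineage
```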
- the data processing module 170 is further discussed below. Attention initially focuses on data ingestion.
- FIG. 3 illustrates processing operations associated with the data ingest module 164 .
- the data ingest module 164 evaluates a data source 300 . Based upon the data source, the module infers data types, data shape and/or data scale.
- the data types may be time data, geographical data, dollar amounts, streamed data, and the like.
- the data shape may be characterized in any number of ways, such as a continuous stream of uniform data, a continuous stream of bursty data, sparse data from a data repository, aggregated sections of data from a source, and the like.
- the data scale provides an indication of the volume of data being ingested from a data source.
- the data ingest module 164 processes all types of data, whether structured data (e.g., a relational database), semi-structured data (e.g., a document-oriented database) or unstructured data.
- the data is evaluated 302 . That is, the actual data is processed to infer data types, data shape and/or data scale.
- With respect to data types, the identification of a zip code or geo-spatial coordinates implicates a geography data type. Alternately, certain number formats implicate a time data type. A currency indicator may implicate a sales data type.
- Categories are also supported as a data type. Categories may be any data which does not conform to time, geography or numeric types. For example, in the case of hotels, the categories may be business, resort, extended stay or bed and breakfast. Categories may be hierarchical, such as a reading material category with a hierarchy of electronic books, audible books, magazines and newspapers. The system detects category types and suggests them to the user.
- the system allows one to filter by a specific category value or break down a numeric measure by available category values (e.g., view Hotel Revenue split by different hotel categories).
- evaluation of the data may lend itself to characterizations of the shape of the data.
- evaluation of the data provides an indication of the volume of data.
- FIG. 4 provides an example of such a display.
- FIG. 4 illustrates an interface 400 displaying an ingested csv file with five columns 402 , 404 , 406 , 408 and 410 .
- The first column 402 shows data in a Year/Month/Date format, which is indicated in data identification field 412.
- the second column 404 has the same format.
- a user may access a window 414 showing the confidence of the characterization.
- the third column 406 is characterized as a number data type.
- The fourth column 408 has a Year/Month/Date format, while the fifth column 410 has an identified number data type.
- the system provides for user reinforcement, validation and correction of inferred data types.
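- Inferring a column's data type together with a confidence value can be sketched as follows. The detectors, their ordering and the definition of confidence (the share of values that match the winning type) are assumptions made for illustration; the source does not specify how inference or confidence is computed.

```python
import re
from datetime import datetime


def _is_date(value):
    try:
        datetime.strptime(value, "%Y/%m/%d")
        return True
    except ValueError:
        return False


def _is_number(value):
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_zip(value):
    return re.fullmatch(r"\d{5}(-\d{4})?", value) is not None


# Checked in order; the first type that matches a value wins. Listing the zip
# detector before the number detector means "94107" counts as geography.
DETECTORS = [("date", _is_date), ("geography", _is_zip), ("number", _is_number)]


def infer_column_type(values):
    """Return (type_name, confidence), where confidence is the share of matching values."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    counts = {}
    for value in values:
        label = "category"  # fallback for values matching no detector
        for name, check in DETECTORS:
            if check(value.strip()):
                label = name
                break
        counts[label] = counts.get(label, 0) + 1
    best = max(counts, key=counts.get)
    return best, counts[best] / len(values)


print(infer_column_type(["2015/08/19", "2015/08/20", "n/a", "2015/08/22"]))  # ('date', 0.75)
print(infer_column_type(["business", "resort", "extended stay"]))            # ('category', 1.0)
```

- A user correction, as described above, could simply override the returned type for a column.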
- a dimension is a hierarchical characterization of data. For example, in the case of a time dimension or a number dimension the hierarchy is increasing values. In the case of a geographical dimension the hierarchy is expanding geographical size (e.g., address to zip code to county to state to country).
- Values are computed along dimensions 312.
- For a time dimension, days are aggregated into months, which are aggregated into individual years, which are aggregated into multiple years. This roll up of values is computed automatically.
- For example, an original data set may include data from individual days. The ingested data maintains the data from the individual days, but is also supplemented to include dimensional data of months, individual years and multiple years.
- Similarly, if an original data set includes individual zip codes, those zip codes are augmented to include dimensional data for county, state and country, or any other default or specified hierarchy. Observe that this is performed automatically without any user input.
- the original data is pre-processed to include dimensional data to facilitate subsequent analyses.
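- The time roll-up can be sketched with a small function that keeps the daily values and adds month, year and multi-year totals. Summation as the aggregate and the name `roll_up_time` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def roll_up_time(daily_values):
    """Aggregate {date: value} into month and year totals, keeping the daily data."""
    by_month = defaultdict(float)
    by_year = defaultdict(float)
    for day, value in daily_values.items():
        by_month[(day.year, day.month)] += value
        by_year[day.year] += value
    return {
        "day": dict(daily_values),             # original granularity is preserved
        "month": dict(by_month),
        "year": dict(by_year),
        "all_years": sum(daily_values.values()),
    }


daily = {
    date(2014, 12, 31): 10.0,
    date(2015, 1, 1): 20.0,
    date(2015, 1, 2): 5.0,
}
rolled = roll_up_time(daily)
print(rolled["month"][(2015, 1)])  # 25.0
print(rolled["year"][2015])        # 25.0
print(rolled["all_years"])         # 35.0
```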
- the original data may also be pre-processed to generate other types of metadata, such as the number of distinct values, a minimum value and maximum value and the like. This information may inform the selection of visualizations and filtering operations. This information may also be used to provide join relevance indicia 314 .
- FIG. 5 illustrates an interface 500 to provide join relevance indicia.
- the figure provides a textual description of a data set 502 .
- the interface provides indicia 504 of the relevance of the data to other data.
- the indicia include numeric indicia (9.5 on a scale of 10.0) and graphical indicia in the form of a 95% completed wheel.
- the indicia 504 may be accompanied by characterizations of the components of the data set. In this case, there is a chronological data type component 506 , a geographical data type component 508 and an “other” data type component 510 .
- Each data type component may include indicia 512 of confidence of the data type characterization.
- the score is a function of the percentage of columns in the two data sets that can be merged.
- User input may be collected to revise or otherwise inform the join relevance indicia. In this way, the system involves the user in reinforcement, validation and correction of join recommendations.
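- The source states only that the score is a function of the percentage of columns in the two data sets that can be merged. The sketch below picks one such function as an assumption: two columns are treated as mergeable when their inferred data types match, and the share of mergeable columns is scaled to 10.0. The function name and the sample column names are hypothetical.

```python
def join_relevance(types_a, types_b):
    """Score 0.0-10.0 from the share of columns that have a same-typed partner.

    types_a / types_b map column name -> inferred data type (e.g. "time").
    """
    remaining = list(types_b.values())
    matched = 0
    for col_type in types_a.values():
        if col_type in remaining:
            remaining.remove(col_type)  # each column in B can pair only once
            matched += 1
    total = len(types_a) + len(types_b)
    if total == 0:
        return 0.0
    # Each match accounts for one column in A and one column in B.
    return round(10.0 * (2 * matched) / total, 1)


hotel_sales = {"booking_date": "time", "zip": "geography", "revenue": "number"}
hotel_density = {"zip_code": "geography", "hotels_per_sq_mile": "number"}
print(join_relevance(hotel_sales, hotel_density))  # 8.0
```

- User feedback of the kind described above could be applied as an adjustment to this base score.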
- the next operation is to store metadata 316 .
- data type metadata 208 may be stored in the metadata catalog 166 shown in FIG. 2 .
- the final operation of FIG. 3 is to select a default visualization 318 . That is, relying upon one or more of the data type, data shape and data scale, the data ingest module 164 may establish a default visualization (e.g., map, bar chart, pie chart, etc.).
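- Selecting a default visualization from data type and data scale can be sketched as a small rule table. The specific rules and the threshold of six categories are assumptions chosen for illustration; the source does not state which rules are applied.

```python
def default_visualization(data_types, row_count):
    """Pick a chart name from inferred column types and data scale (illustrative rules)."""
    types = list(data_types)
    if "geography" in types:
        return "map"
    if "category" in types and "number" in types:
        # Few slices read well as a pie; more categories need a bar chart.
        return "pie chart" if row_count <= 6 else "bar chart"
    if "time" in types and "number" in types:
        return "bar chart"
    if types.count("number") >= 2:
        return "scatter plot"
    return "table"


print(default_visualization(["geography", "number"], 500))  # map
print(default_visualization(["category", "number"], 4))     # pie chart
print(default_visualization(["number", "number"], 100))     # scatter plot
```

- A user override would replace the returned value for a given data set.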
- an embodiment of the invention provides for data ingestion from disparate data sources and data inferences about the ingested data.
- Inferred data types are derived from structured, semi-structured and/or unstructured data sources.
- the data source may be internal private data or an external data source.
- the invention supports ingestion through any delivery mechanism. That is, the source can provide one-time data ingestion, periodic data ingestion at a specified time interval or a continuous data ingestion of streamed content.
- the data ingestion process also provides for data harmonization by leveraging identified data types. That is, the identified data types are used to automatically build an ontology of the data. For example, in the case of a recognized zip code, the harmonization process creates a hierarchy from zip code to city to county to state to country. Thus, all data associated with the zip code is automatically rolled up to a city aggregate value, a county aggregate value, a state aggregate value and a country aggregate value. This automated roll-up process supports subsequent drill-down operations from a high hierarchical value to a low hierarchical value (e.g., from state to city). This information is then used to generate the most appropriate visualization for the data. This data harmonization also accelerates the convergence of two or more data sets.
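- The zip code hierarchy and the drill-down it enables can be sketched as follows. The `ZIP_HIERARCHY` lookup is a hypothetical three-entry table standing in for a real reference geography data set, and summation is assumed as the aggregate.

```python
from collections import defaultdict

# Hypothetical lookup; a real system would use a reference geography data set.
ZIP_HIERARCHY = {
    "94107": {"city": "San Francisco", "county": "San Francisco", "state": "CA", "country": "US"},
    "94607": {"city": "Oakland", "county": "Alameda", "state": "CA", "country": "US"},
    "10001": {"city": "New York", "county": "New York", "state": "NY", "country": "US"},
}
LEVELS = ["city", "county", "state", "country"]


def roll_up_geography(values_by_zip):
    """Return {level: {name: total}} for every level above zip code."""
    totals = {level: defaultdict(float) for level in LEVELS}
    for zip_code, value in values_by_zip.items():
        parents = ZIP_HIERARCHY[zip_code]
        for level in LEVELS:
            totals[level][parents[level]] += value
    return {level: dict(names) for level, names in totals.items()}


def drill_down(values_by_zip, level, name, child_level):
    """List child totals beneath one parent, e.g. cities within a state."""
    children = defaultdict(float)
    for zip_code, value in values_by_zip.items():
        parents = ZIP_HIERARCHY[zip_code]
        if parents[level] == name:
            children[parents[child_level]] += value
    return dict(children)


revenue = {"94107": 100.0, "94607": 50.0, "10001": 70.0}
print(roll_up_geography(revenue)["state"])         # {'CA': 150.0, 'NY': 70.0}
print(drill_down(revenue, "state", "CA", "city"))  # {'San Francisco': 100.0, 'Oakland': 50.0}
```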
- FIG. 6 illustrates processing operations associated with the convergence of two or more data sets.
- a user has an opportunity to select a data set 600 . If a dataset is selected ( 600 —Yes), a data set is added 602 . After all data sets have been selected, the data sets are harmonized to the lowest common data unit granularity 604 . That is, when two or more data sets are converged, the common dimensions across the data sets are harmonized so that the converged data sets get rendered into visualizations that are common elements between the data sets.
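- One reading of harmonizing to the lowest common granularity is that every data set is aggregated to the finest level that all of them can share. The sketch below assumes a time dimension with tuple keys such as `(year, month, day)`, summation as the aggregate, and hypothetical names throughout.

```python
from collections import defaultdict

# Key length for each granularity: (year, month, day), (year, month), (year,).
KEY_WIDTH = {"day": 3, "month": 2, "year": 1}


def harmonize(datasets):
    """datasets: list of (level, {key_tuple: value}). Returns (common_level, aligned dicts)."""
    # The coarsest granularity present is the finest one all data sets can share.
    common = min((level for level, _ in datasets), key=KEY_WIDTH.get)
    width = KEY_WIDTH[common]
    aligned = []
    for _, values in datasets:
        totals = defaultdict(float)
        for key, value in values.items():
            totals[key[:width]] += value  # truncate the key to the common level
        aligned.append(dict(totals))
    return common, aligned


daily_revenue = ("day", {(2015, 1, 1): 100.0, (2015, 1, 2): 150.0, (2015, 2, 1): 80.0})
monthly_bookings = ("month", {(2015, 1): 40.0, (2015, 2): 12.0})
level, (revenue, bookings) = harmonize([daily_revenue, monthly_bookings])
print(level)     # month
print(revenue)   # {(2015, 1): 250.0, (2015, 2): 80.0}
print(bookings)  # {(2015, 1): 40.0, (2015, 2): 12.0}
```

- After this step both data sets share the same keys, so they can be rendered on a common axis.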
- the final operation of FIG. 6 is to coordinate visualizations 606 .
- the visualization may be based upon the granularity of the data set (data scale), the data shape and/or the data type.
- the system selects a default visualization, which may be overridden by a user. Examples of the foregoing operations are provided below.
- the data processing module 170 is an in-memory iterative analytic data processing engine that operates on “data units” associated with a story.
- FIG. 7 illustrates a story 700 comprising a set of data units 702_1 through 702_N. Each data unit has a corresponding discussion thread 704_1 through 704_N.
- a data unit 702 includes data 706 .
- the data 706 includes raw ingested data plus rolled-up hierarchical data, as previously discussed.
- a data unit also includes a version field 708 .
- the version field may use a temporal identifier to specify a version of data, for example, after it has been filtered during some analytic process.
- a permissions field 710 specifies permissions to access the data. Different individuals collaborating in connection with a story may have different access levels to the data. For example, one individual may have access to all data, while another individual may only have access to aggregated data.
- a bookmark field 712 may be used to persist
- Each discussion thread 704 includes a set of discussion entries 714_1 through 714_N.
- Permissions field 710 may establish individuals that may participate in a discussion thread. Example discussion threads are provided below.
- FIG. 7 illustrates the in-memory manifestation of a discussion thread and its association with an in-memory data unit 702 .
- Data operators e.g., sum, average, standard deviation
- Each data unit may also store filter information, a best fit data visualization setting, and data visualization highlight information.
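- The data unit and its discussion thread can be pictured as a small in-memory record. The sketch below is an assumption-laden illustration: the field names, the two access levels ("all" and "aggregate") and the method names are hypothetical, while the numerals in the comments refer to the elements of FIG. 7.

```python
from dataclasses import dataclass, field


@dataclass
class DiscussionEntry:
    user: str
    comment: str


@dataclass
class DataUnit:
    raw: list            # raw ingested rows (part of data 706)
    rollup: dict         # rolled-up hierarchical values (part of data 706)
    version: str         # temporal identifier for this version (708)
    permissions: dict    # user -> "all" or "aggregate" (710)
    thread: list = field(default_factory=list)  # discussion entries (714)

    def view_for(self, user):
        """Return only the data this user is permitted to see."""
        level = self.permissions.get(user)
        if level == "all":
            return {"raw": self.raw, "rollup": self.rollup}
        if level == "aggregate":
            return {"rollup": self.rollup}
        raise PermissionError(f"{user} has no access to this data unit")

    def add_comment(self, user, comment):
        """Append to the discussion thread if the user may participate."""
        if user not in self.permissions:
            raise PermissionError(f"{user} may not join this discussion")
        self.thread.append(DiscussionEntry(user, comment))


unit = DataUnit(
    raw=[{"zip": "94107", "revenue": 100.0}, {"zip": "94607", "revenue": 50.0}],
    rollup={"CA": 150.0},
    version="2015-10-14T09:00:00Z",
    permissions={"alice": "all", "bob": "aggregate"},
)
unit.add_comment("bob", "CA revenue looks high this week.")
print(unit.view_for("bob"))  # {'rollup': {'CA': 150.0}}
print(len(unit.thread))      # 1
```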
- FIG. 8 illustrates a home page 800 that may be displayed on a browser 122 of a client device 102 .
- The home page 800 may be supplied by the web application module 162.
- the home page 800 includes a settings field 802 .
- the home page 800 also includes a field 804 to list stories owned by the user. These are stories constructed by or on behalf of the user. Typically, such stories are fully controlled by the user.
- the home page 800 may also include a field 806 for stories that may be viewed by the user. The user may have limited permissions with respect to viewing certain data associated with such stories.
- the permissions field 710 of each data unit 702 specifies permissions.
- the home page 800 also has field 808 for supplying data owned by a user.
- the data owned by a user is effectively the data units 702 owned by a user.
- the home page 800 includes a collaboration field 810 to facilitate online communication with other users of the system.
- the discussion threads 704 populate the collaboration field 810 .
- The web application module 162 utilizes the story control module 200 to access stories metadata 212 and the collaboration module 202 to access collaboration metadata 214.
- The web application module 162 may pass information to the data processing module 170, which loads information into data units 702 and discussion threads 704.
- FIG. 9 illustrates an interface 900 depicting individual stories 902 .
- Each story 902 may have an associated visualization 904 and text description 906 .
- the interface 900 may also display a text description of recent activities 908 by the user.
- Collaborative members 910 may also be listed. If the user selects story 912 , the interface of FIG. 10 is provided.
- FIG. 10 illustrates an interface 1000 for the story entitled “Hotel Density and Revenue by Geography”.
- the interface 1000 indicates a first data source 1002 from a hotel transaction database and a second data source 1004 from a Dun & Bradstreet report on hotel density.
- the hotel transaction database has information organized as a function of time, while the hotel density information is organized by geography.
- the invention provides a data-aware convergence of these two data sets. More particularly, FIG. 10 illustrates data-aware convergence and visualization of disparate data sources. Observe that in FIG. 9 the story 912 is geographically scaled based upon the amount of screen space available. That is, in FIG. 9 , interface 900 simultaneously displays multiple stories.
- the story control module 200 scales the amount of displayed information in a manner consistent with the amount of screen space available.
- a data-aware visualization transition occurs, with an enhanced amount of information displayed, as shown in interface 1000 of FIG. 10 . Since more space is available in interface 1000 , the story control module 200 expands the amount of displayed information.
- the data type metadata 166 includes information on data types, data shape and data scale for ingested data. This information may be used to select appropriate visualizations.
- the interface 1000 provides different visualization options 1006 , 1007 , 1008 , such as a map, bar graph, scatter plot, table, etc.
- the map view 1006 is selected.
- Each visualization option has a set of default parameters based upon an awareness of the data.
- average hotel revenue per hotel for an arbitrary period of time is displayed in one panel 1008
- total hotel revenue for the same arbitrary period of time is displayed in another panel 1010 .
- shading may be used to reflect density of activity.
- the interface 1000 also includes a collaboration section 1012 .
- the filter indicator 1014 specifies that all data is being processed. This filter may be modified for a specific geographic location, say California, in which case the interface of FIG. 11 is provided.
- FIG. 11 illustrates an interface 1100 with the same data as in FIG. 10 , but for a smaller geographic region, namely one state, California.
- a visualization of average hotel revenue per hotel is provided in one panel 1102
- a visualization of total hotel revenue is provided in another panel 1104 .
- the visualization transition from interface 1000 to interface 1100 is data-aware in the sense that the visualization supplies data relevant to the specified filter parameter.
- the collaboration section 1106 illustrates a dialog regarding the data.
- a tab 1108 allows one to bookmark this view. That is, activating the tab 1108 sets the bookmark field 712 in a data unit 702 associated with the story.
- This view and associated dialog information is then stored in a data unit 702 and corresponding discussion thread 704 . In this way, the information can be retrieved at a later time to evaluate the evolution of a story.
- FIG. 12 illustrates an interface 1200 displaying the total hotel revenue data as a bar chart. Observe here that the filter 1014 is set for all data. Therefore, the transition to the new visualization is for all data. That is, the same data filter is used for the new visualization. Also observe that there is collaboration context awareness as the collaboration section 1012 of FIG. 10 corresponds to the collaboration section 1202 of FIG. 12 . A highlight from the visualization of FIG. 10 may carry over to the visualization of FIG. 12 .
- This process is known as highlighting and linking, where a highlight on any one visualization is then linked to every other related visualization. For example, if in FIG. 10 , the states California, New York, Texas, New Jersey and Florida are highlighted on the map, those same states are highlighted in the bar graph of FIG. 12 .
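Highlighting and linking can be illustrated with a shared highlight set that is pushed to every related visualization. The sketch below is an assumption about one way to structure this; the `LinkedHighlights` and `View` names are hypothetical and do not come from the disclosure.

```python
class View:
    """A visualization that can show a set of highlighted keys."""

    def __init__(self, name):
        self.name = name
        self.highlighted = set()


class LinkedHighlights:
    """Shared highlight state for related visualizations (hypothetical helper)."""

    def __init__(self):
        self._views = []           # registered visualizations
        self.highlighted = set()   # keys currently highlighted, e.g. state names

    def register(self, view):
        """Add a view and bring it up to date with the current highlight."""
        self._views.append(view)
        view.highlighted = set(self.highlighted)

    def highlight(self, keys):
        """Record the highlight once and copy it to every registered view."""
        self.highlighted = set(keys)
        for view in self._views:
            view.highlighted = set(self.highlighted)


links = LinkedHighlights()
map_view, bar_view = View("map"), View("bar graph")
links.register(map_view)
links.register(bar_view)
links.highlight({"California", "New York", "Texas", "New Jersey", "Florida"})
```

After the call to `highlight`, both the map and the bar graph carry the same five states, which matches the FIG. 10 to FIG. 12 example.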
- FIG. 13 illustrates an interface 1300 that displays a first data source 1302 of Tweet frequency data during Super Bowl 47 .
- a second data source 1304 is data from a data warehouse of click stream online activity during the same time period.
- Graph 1306 is for the data from the first data source 1302
- graph 1308 is for the data from the second data source 1304 .
- the time axes for the two graphs 1306 and 1308 are aligned.
- individual annotations on the two data sets are aligned, as shown by annotations 1310 and 1312 .
- when an annotation is made on one visualization, it is automatically applied to another visualization.
- Hovering over an annotation may result in the display 1314 of collaboration data.
- a separate collaboration space 1316 with a discussion thread may also be provided.
- the web application module 160 facilitates the display of annotations 1310 and 1312 , collaboration data 1314 and collaboration space 1316 through access to the collaboration metadata 214 .
- Annotations are stateful annotations in a discussion thread 704 associated with a data unit 702 .
- An annotation may have an associated threshold to trigger an alert. For example, one can specify in an annotation a threshold of $10,000 in sales. When the threshold is met, an alert in the form of a message (e.g., an email, text, collaboration panel update) is sent to the user or a group of collaborators. A marker and an indication of the message may be added to the annotations.
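The threshold-triggered alert can be sketched as follows. The `Annotation` class, its field names, and the `send` callback are hypothetical; the choice to fire only once, the first time the threshold is met, is also an assumption, since the text does not say whether alerts repeat.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    text: str
    threshold: float                 # e.g. 10_000 dollars in sales
    recipients: list[str]
    markers: list[str] = field(default_factory=list)
    triggered: bool = False

    def check(self, value: float, send) -> bool:
        """Send one alert per recipient the first time value meets the threshold.

        `send` is any callable taking (recipient, message); it stands in for
        email, text, or a collaboration panel update.
        """
        if self.triggered or value < self.threshold:
            return False
        self.triggered = True
        message = f"Threshold {self.threshold:,.0f} met: {self.text}"
        for recipient in self.recipients:
            send(recipient, message)
        # Keep a marker on the annotation to indicate the message was sent.
        self.markers.append(message)
        return True
```

Passing the delivery mechanism in as a callable keeps the annotation independent of any particular messaging channel.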
- FIG. 14 illustrates an interface 1400 corresponding to interface 1300 , but with a different period of time specified on the time axis. As a result, the five annotations shown in graph 1308 are in a condensed form in graph 1402 .
- the figure also illustrates a set of bookmarks 1404 associated with this view of data. The bookmarks 1404 are supplied by the web application module 160 through its access to the collaboration metadata 214 .
- the invention provides convergence between multiple data sources, such as public data sources, premium data sources and private data sources.
- the invention does not require rigid structuring or pre-modeling of the data.
- the invention provides harmonization across key dimensions, such as geography, time and categories.
- data is continuously pushed to a user. Consequently, a user does not have to generate a query for refreshed data.
- a user can easily collaborate with others to facilitate analyses across distributed teams. Permission settings enforce user policies on viewing and sharing of data and analyses.
- An embodiment of the invention facilitates the creation of what will be referred to as a storyboard.
- a storyboard is a collection of visualization frames.
- the collection of visualization frames characterizes a logical sequence of data analytics, although any combination of visualization frames may be used in accordance with embodiments of the invention.
- Each visualization frame is a snapshot of data. Since a snapshot of data is collected, the creator of the storyboard need not be a data analyst or other sophisticated computer user.
- permissions and visualization settings simplify storyboard creation and utilization. The permissions may be at the storyboard level and/or individual frame level.
- the collection of visualization frames has an associated collection of discussion threads.
- Each discussion thread involves different users and comments made by the different users.
- the discussion threads facilitate in-context collaboration of analytical data in the collection of visualization frames.
- FIG. 15 illustrates a first story 1500 with four story panels SP 1 , SP 2 , SP 3 and SP 4 .
- the figure also illustrates a second story 1502 with four story panels SPA, SPB, SPC and SPD.
- the web application module 160 is configured to allow a user to collect selected story panels to form a storyboard, such as storyboard 1504 . For example, hovering over a story panel may result in a prompt, such as “Move to Storyboard?” Alternately, a user may open a storyboard and receive a prompt to select story panels from different stories.
- Storyboard 1504 has a canvas with different visualization frames.
- visualization frame VF 1 corresponds to story panel SP 1
- visualization frame VF 4 corresponds to story panel SP 4
- visualization frame VFB corresponds to story panel SPB
- visualization frame VFC corresponds to story panel SPC.
- selected story panels from different stories are used to form storyboard 1504 .
- a compelling data analysis may be constructed through a logical sequence of visualization frames.
- the storyboard 1504 may also include a reference to an external media file (EMF).
- the EMF may be a link to an audio/visual resource that may be played to augment the sequence of data analytics associated with VF 1 , VF 4 , VFB and VFC.
- additional media sources may include data visualizations created in other business intelligence tools.
- FIG. 16 illustrates a storyboard 1600 comprising a set of visualization units 1602 _ 1 through 1602 _N and a collection of discussion threads 1604 _ 1 through 1604 _N.
- a visualization unit is similar to the previously discussed data units. However, each visualization unit is a simplified version of a data unit.
- the visualization units facilitate the creation and utilization of a collection of visualization frames.
- the visualization units mask data source complexity and provide automated operations, such as automated refresh of data, which allows the storyboard to be used by enterprise employees that are technically less sophisticated.
- a visualization unit includes a graphical visualization 1606 representing a snapshot of data (i.e., data at a given instance in time).
- the visualization unit also includes data 1608 associated with the visualization (i.e., the data that is expressed in the visualization).
- the visualization unit also includes metadata, such as a title for the visualization, a description of the data and the like.
- Various permissions 1612 are set for the visualization unit. The permissions are based upon the status of the user. For example, the creator of a storyboard may have more permission to manipulate the storyboard than a consumer or viewer of the storyboard.
- a visualization unit also includes a filter configuration block 1614 .
- permissions 1612 express the type of filters that one may apply to the data 1608 .
- the sophistication of the available filters is typically a function of the sophistication of the user.
- the visualization unit may also include visualization settings 1616 , such as visualization type (graph, bar, pie, etc.), visualization orientation, visualization scaling and the like.
- the storyboard 1600 also includes a collection of discussion threads 1604 _ 1 through 1604 _N. Each discussion thread lists different users and comments made by the different users. For example, entry 1618 _ 1 is a comment B from individual A, while entry 1618 _ 2 is comment D from individual C.
- the storyboard and its associated visualization units and discussion threads may be in-memory data structures that facilitate improved functioning of a computer system.
- the visualization units include automated data access for data refresh on a scheduled basis.
- the visualization units mask system complexity for a user.
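A visualization unit of the kind described for FIG. 16 might be represented as below. This is a sketch under assumptions: the field names, the role names, and the representation of permissions 1612 as a mapping from role to permitted filter attributes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VisualizationUnit:
    visualization: str                 # rendered snapshot 1606, e.g. an image path
    data: list[dict]                   # data 1608 expressed in the visualization
    metadata: dict[str, str]           # title, description and the like
    permissions: dict[str, set[str]]   # permissions 1612: role -> allowed filter attributes
    filter_config: dict[str, object] = field(default_factory=dict)   # block 1614
    settings: dict[str, str] = field(default_factory=dict)           # settings 1616

    def set_filter(self, role: str, attribute: str, value: object) -> None:
        """Record a filter only if the role's permissions allow that attribute."""
        if attribute not in self.permissions.get(role, set()):
            raise PermissionError(f"{role} may not filter on {attribute}")
        self.filter_config[attribute] = value


unit = VisualizationUnit(
    visualization="revenue_map.png",
    data=[{"state": "CA", "revenue": 5}, {"state": "NY", "revenue": 7}],
    metadata={"title": "Hotel revenue by state"},
    permissions={"creator": {"state", "revenue"}, "viewer": {"state"}},
    settings={"type": "map"},
)
unit.set_filter("viewer", "state", "CA")
```

Here a creator can filter on more attributes than a viewer, which reflects the statement that available filters depend on the status of the user.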
- an individual visualization frame such as VF 1 may have an associated data refresh prompt 1700 , which includes various data refresh configuration parameters.
- the data refresh configuration parameters include data refresh on demand (“Refresh Now”) 1701 , a scheduled data refresh 1702 and data refresh based upon a data change 1704 .
- the scheduled refresh 1702 may be based upon any specified time interval (e.g., every 15 minutes, every 30 minutes, every hour, every day, every week, every month, etc.).
- the data refresh configuration parameter may be stored in the visualization unit, which includes executable instructions to access the source data at the specified interval. Observe that this automated approach insulates the user from the complexities of data access.
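The three refresh options (on demand 1701, scheduled 1702, and on data change 1704) can be captured in a small configuration object stored with the visualization unit. The names `RefreshMode`, `RefreshConfig`, and `is_due` are hypothetical, and how a data change is detected is left outside the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RefreshMode(Enum):
    ON_DEMAND = "refresh_now"      # 1701
    SCHEDULED = "scheduled"        # 1702
    ON_CHANGE = "on_data_change"   # 1704


@dataclass
class RefreshConfig:
    mode: RefreshMode
    interval: Optional[timedelta] = None   # only used for SCHEDULED

    def is_due(self, last_refresh: datetime, now: datetime,
               source_changed: bool = False, requested: bool = False) -> bool:
        """Decide whether the snapshot should be re-read from its source."""
        if self.mode is RefreshMode.ON_DEMAND:
            return requested
        if self.mode is RefreshMode.ON_CHANGE:
            return source_changed
        if self.interval is None:
            raise ValueError("scheduled refresh needs an interval")
        return now - last_refresh >= self.interval


every_15_min = RefreshConfig(RefreshMode.SCHEDULED, timedelta(minutes=15))
```

A scheduler could call `is_due` periodically for each stored unit and re-render only the frames for which it returns true.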
- FIG. 18 illustrates a storyboard 1800 with prompts 1802 and 1804 .
- Prompt 1802 allows a user to specify whether to provide an indication of new comments or new data.
- Indicia 1806 may be used to indicate new comments and indicia 1808 may be used to indicate new data.
- the indicia may be text, a graphical symbol, an altered font and the like.
- Prompt 1804 allows one to specify filter conditions for the snapshot of data.
- the filter conditions may relate to the granularity of the data (e.g., instead of data for a country, data for a specific state).
- a user is prompted to name a filter.
- the user is then given various pull-down menu options for various filter attributes.
- the filter attributes may be based upon the permissions associated with the user. A sophisticated user may be given more filter attributes, while an unsophisticated user may be given limited filter attributes. This is another example of how the disclosed technology allows unsophisticated users to successfully work with data sources that may otherwise be inaccessible to the unsophisticated users.
- the filter condition is applied to each visualization frame that has data corresponding to the filter.
- Indicia 1810 may be used to let the user know which visualizations have been filtered.
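Applying one filter condition across a storyboard can be sketched as a function that filters only the frames whose data contains the filtered attribute and reports which frames were affected, so that indicia can be drawn on them. The function name and the row-of-dictionaries data layout are assumptions.

```python
def apply_filter(frames, attribute, value):
    """Filter every frame that has `attribute`; leave other frames untouched.

    `frames` maps a frame name to its rows (a list of dicts).
    Returns (filtered_frames, names_of_frames_that_were_filtered).
    """
    filtered = {}
    marked = []
    for name, rows in frames.items():
        if any(attribute in row for row in rows):
            filtered[name] = [row for row in rows if row.get(attribute) == value]
            marked.append(name)   # candidate for indicia such as 1810
        else:
            filtered[name] = rows
    return filtered, marked


frames = {
    "VF1": [{"state": "CA", "revenue": 1}, {"state": "NY", "revenue": 2}],
    "VFB": [{"month": "Jan", "clicks": 3}],
}
result, marked = apply_filter(frames, "state", "CA")
```

In this example only VF1 carries a `state` attribute, so only VF1 is filtered and marked; VFB is passed through unchanged.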
- FIG. 18 also illustrates a visualization frame section 1812 .
- Such frame sections 1812 may be used to segregate related data analytics. This can reduce the complexity of a storyboard with numerous visualization frames.
- FIG. 19 illustrates a storyboard 1900 with a comment feed 1902 associated with the entire storyboard 1900 .
- the comment feed 1902 is a scroll of discussion threads that may be stopped, started, rewound, etc.
- a comment feed 1904 may be associated with a specific visualization frame.
- the comment feed 1904 includes links to external media.
- Individual comments or text entries may be associated with individual data elements in the visualization frame. For example, an individual comment may relate to a section of a pie chart or two sections of a pie chart.
- Interface tools are supplied to allow a user to select individual data elements and groups of data elements which are then linked to a text entry regarding the selected data.
- the data elements may be contiguous or non-contiguous.
- a discussion thread includes automatically generated text entries that are produced in response to a data value exceeding a specified threshold. For example, a rule may be specified that if a dollar value exceeds a specified threshold, then a comment, such as, “Sales target exceeded” may be automatically inserted into the discussion thread.
- the discussion thread may include input from users and rule based input that is automatically generated by the system.
- the automatically generated text may be accompanied by an alert sent to a user, for example an email alert sent to a user.
- the automatically generated text may also be accompanied by indicia placed in the visualization (e.g., indicia in a visualization of sales volume of where the sales target is exceeded).
- the automatically generated text may be a link to an external resource, such as the original business plan expressing the sales target.
- Once a storyboard is constructed, it may be used as a template that facilitates substitution of a first set of data sources with a second set of data sources to produce a new collection of visualization frames. For example, hovering over a visualization frame may result in a prompt “Specify new data source?” A user may then enter the new data source or may alternatively be provided with a pulldown menu of data sources available to the user.
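Using a storyboard as a template amounts to copying its frames while swapping the data source behind each one. The sketch below assumes each frame records a source identifier; the `Frame` fields, the `instantiate_template` function, and the source names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    title: str
    source: str        # identifier of the data source behind the frame
    chart_type: str


def instantiate_template(frames, substitutions):
    """Return new frames with each data source replaced per `substitutions`.

    Frames whose source is not listed keep their original source, and the
    template frames themselves are not modified.
    """
    return [replace(f, source=substitutions.get(f.source, f.source))
            for f in frames]


template = [Frame("Revenue", "hotel_db", "map"),
            Frame("Density", "density_report", "bar")]
new_frames = instantiate_template(template, {"hotel_db": "hotel_db_2016"})
```

The visualization settings carry over unchanged, so only the data snapshot would need to be rendered again for the substituted frames.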
- indicia (e.g., a pin) is used to show related data in different frames of a storyboard. For example, one may hover over a data element and receive a prompt to move to another frame with the same data element. In a similar way, one may be prompted to see recent collaboration across a set of visualization frames.
- a prompt may also be supplied to export a storyboard to a different file format.
- a file format for offline processing may be used, such as PDF format, PowerPoint® format and the like.
- a storyboard provides an option to link back to a story associated with a visualization frame. For example, hovering over a visualization frame may result in a prompt “Transition to original story?” A transition to the original story may then be implemented, which allows the user to collaborate in the original story, for example, by requesting clarification about a data element.
- FIG. 20 illustrates architectural components utilized to implement the disclosed storyboards. Many components correspond to components already discussed in connection with FIG. 2 . The current discussion is limited to a discussion of the new components 2000 - 2008 .
- the web application module 162 is augmented to include a storyboard module 2000 .
- the storyboard module 2000 includes executable instructions to populate browser 122 with interfaces of the type disclosed above.
- the storyboard module 2000 interacts with a frame renderer 2002 .
- the frame renderer 2002 is configured to take a data snapshot. For example, consider the case where a story is rendered in browser 122 . A prompt may be provided to the user to move the story to a storyboard. If the user engages the prompt, the frame renderer 2002 produces a visualization unit 1602 and persistently stores the frame in a frame store 2004 .
- the storyboard module 2000 interacts with the frame renderer 2002 to update the metadata catalog 166 to create storyboard frames 2006 . That is, the metadata catalog 166 is supplemented with metadata associated with each frame and the storyboard in which it resides. In addition, the metadata catalog 166 may store storyboard permissions 2008 . The storyboard permissions may control permissions at the storyboard level. The permissions may be of the type discussed in connection with the visualization units. Thus, embodiments of the invention express permissions at the visualization unit level and/or the storyboard level.
- a scheduler (not shown) operates with the web application module 162 and the frame renderer 2002 to schedule the rendering of frames in accordance with a refresh schedule discussed in connection with FIG. 17 .
- An embodiment of the present invention relates to a computer storage product with a computer readable storage medium having computer code thereon for performing various computer-implemented operations.
- the media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
- Examples of computer-readable media include, but are not limited to: magnetic media, optical media, magneto-optical media and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices.
- Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter.
- an embodiment of the invention may be implemented using JAVA®, C++, or other object-oriented programming language and development tools.
- Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A server has a data processing module with instructions executed by a processor to maintain a collection of visualization frames that characterize a sequence of data analytics. Each visualization frame is a snapshot of data. The collection of visualization frames has associated permissions and visualization settings. A collection of discussion threads is maintained for the collection of visualization frames. Each discussion thread identifies different users and comments made by the different users.
Description
- This application is a continuation-in-part of U.S. Ser. No. 14/292,775, filed May 30, 2014, which claims priority to U.S. Provisional Patent Application Ser. No. 61/829,191, filed May 30, 2013.
- This application is related to commonly owned U.S. Ser. No. 14/292,765, filed May 30, 2014, U.S. Ser. No. 14/292,783, filed May 30, 2014 and U.S. Ser. No. 14/292,788, filed May 30, 2014.
- This invention relates generally to data analyses in computer networks. More particularly, this invention relates to collaborative analyses of data snapshot visualizations from disparate sources.
- Existing data analysis techniques typically entail discrete analyses of discrete data sources. That is, an individual typically analyzes a single data source in an effort to derive useful information. Individual data sources continue to proliferate. Public data includes such things as census data, financial data and weather data. There are also premium data sources, such as market intelligence data, social data, rating data, user data and advertising data. Other sources of data are private, such as transactional data, click stream data, and log files.
- There is a need for a scalable approach to analyses of multiple sources of data. Ideally, such an approach would support collaboration between end users.
- A server has a data processing module with instructions executed by a processor to maintain a collection of visualization frames that characterize a sequence of data analytics. Each visualization frame is a snapshot of data. The collection of visualization frames has associated permissions and visualization settings. A collection of discussion threads is maintained for the collection of visualization frames. Each discussion thread identifies different users and comments made by the different users.
- The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
- FIG. 1 illustrates a system configured in accordance with an embodiment of the invention.
- FIG. 2 illustrates component interactions utilized in accordance with an embodiment of the invention.
- FIG. 3 illustrates processing operations associated with the data ingest module.
- FIG. 4 illustrates a user interface for displaying inferred data types.
- FIG. 5 illustrates a user interface to display join relevance indicia utilized in accordance with an embodiment of the invention.
- FIG. 6 illustrates data merge operations performed in accordance with an embodiment of the invention.
- FIG. 7 illustrates in-memory data units and corresponding discussion threads utilized in accordance with an embodiment of the invention.
- FIG. 8 illustrates an initial graphical user interface that may be used in accordance with an embodiment of the invention.
- FIG. 9 illustrates various data streams that may be evaluated by a user in accordance with an embodiment of the invention.
- FIG. 10 illustrates data-aware convergence and visualization of disparate data sources.
- FIG. 11 illustrates context-aware data analysis collaboration.
- FIG. 12 illustrates data-aware visualization transition utilized in accordance with an embodiment of the invention.
- FIG. 13 illustrates data-aware annotations utilized in accordance with an embodiment of the invention.
- FIG. 14 illustrates context-aware annotations utilized in accordance with an embodiment of the invention.
- FIG. 15 illustrates the construction of a storyboard from different stories in accordance with an embodiment of the invention.
- FIG. 16 illustrates visualization units and discussion threads configured in accordance with an embodiment of the invention.
- FIG. 17 illustrates data refresh prompts supplied in accordance with an embodiment of the invention.
- FIG. 18 illustrates storyboard prompts and display features associated with embodiments of the invention.
- FIG. 19 illustrates storyboard discussion threads utilized in accordance with embodiments of the invention.
- FIG. 20 illustrates an embodiment of architectural components utilized to support storyboard operations disclosed herein.
- Like reference numerals refer to corresponding parts throughout the several views of the drawings.
- FIG. 1 illustrates a system 100 configured in accordance with an embodiment of the invention. The system 100 includes a client computer 102 connected to a set of servers 104_1 through 104_N via a network 106, which may be any wired or wireless network. The servers 104_1 through 104_N are operative as data sources. The figure also illustrates a cluster of servers 108_1 through 108_N connected to network 106. The cluster of servers is configured to implement operations of the invention.
- The client computer 102 includes standard components, such as a central processing unit 110 and input/output devices 112 connected via a bus 114. The input/output devices 112 may include a keyboard, mouse, touch display and the like. A network interface circuit 116 is also connected to the bus 114 to provide an interface with network 106. A memory 120 is also connected to the bus 114. The memory 120 stores a browser 122. Thus, a client machine 102, which may be a personal computer, tablet or Smartphone, accesses network 106 to obtain information supplied in accordance with an embodiment of the invention.
- Servers 104_1 through 104_N also include standard components, such as a central processing unit 130 and input/output devices 132 connected via a bus 134. A network interface circuit 132 is also connected to the bus 134 to provide connectivity to network 106. A memory 140 is also connected to the bus 134. The memory 140 stores a data source 142. Different servers 104 supply different data sources. For example, some servers may supply public data, such as census data, financial data and weather data. Other servers may provide premium data, such as market intelligence data, social data, rating data, user data and advertising data. Other servers may provide private data, such as transactional data, click stream data, and log files. The data may be in any form. In one form, the data is structured, such as data from a relational database. In another form the data is semi-structured, such as document-oriented database. In another form the data is unstructured. In still another form the data is streamed. A data stream is a sequence of data elements and associated real time indicators.
- Each server 108 has standard components, such as a central processing unit 150 connected to input/output devices 152 via a bus 154. A network interface circuit 156 is also connected to the bus 154 to provide access to network 106. A memory 160 is also connected to the bus 154. The memory 160 stores modules and data to implement operations of the invention. In one embodiment, a web application module 162 is used to provide a relatively thin front end to the system. The web application module 162 operates as an interface between a browser 122 on a client machine 102 and the various modules in the software stack used to implement the invention. The web application module 162 uses application program interfaces (APIs) to communicate with the various modules in the software stack.
- The memory 160 also stores a data ingest module 164. The data ingest module 164 consumes data from various data sources and discovers attributes of the data. The data ingest module 164 produces metadata characterizing ingested content, which is stored in a metadata catalog 166. The ingested data is loaded into a file system 168, as discussed below. A data processing module 170 includes executable instructions to support data queries and the ongoing push of information to a client device 102, as discussed below. The modules in memory 160 are exemplary. The different modules may be on each server in the cluster or individual modules may be on different servers in the cluster.
- FIG. 2 is a more particular characterization of various modules shown in FIG. 1. The arrows in the figure illustrate interactions between the modules, which are achieved through APIs. At the top of the figure is a browser 122, which is resident on a client device 102. The remaining modules in the figure are implemented on a cluster of servers 108.
- The web application module 160 may include a story control module 200. As used herein, the term story references an ongoing evaluation of data, typically from disparate sources. The data is pushed to a client device as data is updated. Thus, a data story is a living analysis of one or more data sets, which may be either internal or external data sources. A data story can be automatically refreshed on a set cycle to keep the analysis up-to-date as data from the source gets updated or refreshed.
- The story control module 200 includes executable instructions to provide data visualizations that are data-aware. The data-awareness is used to appropriately scale data visualizations and harmonize data from discrete sources, as demonstrated below.
- The web application module 160 may also include a collaboration module 202, which includes executable instructions to support collaboration between end users evaluating a common story. The collaboration module supports context-aware data analysis collaboration, such as data-aware visualization transitions, data-aware data annotations and context-aware data annotations, as demonstrated below.
- FIG. 2 also illustrates a data ingest module 164, which includes a data discovery module 204. The data discovery module 204 includes executable instructions to evaluate attributes of ingested data. The data discovery module 204 communicates the attributes of the ingested data as data type metadata 208, which is stored in the metadata catalog 166.
- In one embodiment, the data discovery module 204 operates in conjunction with a distributed, fault-tolerant real-time computation platform, such as the Storm open source software project. In one embodiment, the computation platform has a master node and worker nodes. The master node operates as a coordinator and job tracker. The master node assigns tasks to worker nodes and monitors for failures. Each worker node includes a supervisor method that listens for work assigned to it. Each worker node executes a subset of a topology. A running topology contains many worker processes spread across many machines.
- A topology is a graph of a computation. Each node in a topology includes processing logic. Links between nodes indicate how data is passed between nodes. The computation platform may operate on a stream. A stream is an unbounded sequence of tuples. A tuple is an ordered list of elements. A field in a tuple can be an object of any type.
- The computation platform provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, one may transform a stream of tweets into a stream of trending topics. Stream transformations may be accomplished using spouts and bolts. Spouts and bolts have interfaces that one implements to run application-specific logic.
- A spout is a source of streams. For example, a spout may read tuples and emit them as a stream. Alternately, a spout may connect to the Twitter API and emit a stream of tweets.
- A bolt consumes any number of input streams, performs some processing and possibly emits new streams. Complex stream transformations require multiple steps and therefore multiple bolts. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
- Links between nodes in a topology indicate how tuples should be passed. For example, if there is a link between Spout A and Bolt B, a link from Spout A to Bolt C, and a link from Bolt B to Bolt C, then every time Spout A emits a tuple, it will send the tuple to both Bolt B and Bolt C. All of Bolt B's output tuples will go to Bolt C as well.
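- The Spout A, Bolt B and Bolt C example can be traced with a small, single-process sketch. It is not the Storm API; the link table, the bolt functions and the `run` helper are hypothetical names chosen only to show how each emitted tuple is delivered to every subscriber.

```python
from collections import defaultdict, deque

# Hypothetical link table for the example: Spout A feeds Bolts B and C,
# and Bolt B feeds Bolt C.
LINKS = {"spout_a": ["bolt_b", "bolt_c"], "bolt_b": ["bolt_c"], "bolt_c": []}


def bolt_b(tup):
    """Illustrative processing step: tag the tuple and emit it."""
    return [tup + ("seen_by_b",)]


def bolt_c(tup):
    """Terminal bolt in this sketch: consumes the tuple and emits nothing."""
    return []


BOLTS = {"bolt_b": bolt_b, "bolt_c": bolt_c}


def run(spout_tuples):
    """Deliver every emitted tuple to every subscriber.

    Returns a dict mapping each bolt name to the tuples it received.
    """
    received = defaultdict(list)
    queue = deque(("spout_a", t) for t in spout_tuples)
    while queue:
        source, tup = queue.popleft()
        for target in LINKS[source]:
            received[target].append(tup)
            # Whatever the target bolt emits is routed along its own links.
            for out in BOLTS[target](tup):
                queue.append((target, out))
    return dict(received)


result = run([("tweet-1",)])
```

In this run Bolt C receives the original tuple from Spout A and the tagged tuple from Bolt B, which mirrors the routing described in the text.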
-
Data type metadata 208 from the data ingest module 164 is loaded into a file system 168. In one embodiment, the file system 168 is a Hadoop Distributed File System (HDFS). Hadoop is an open-source software framework that supports data-intensive distributed applications. Alternately, the metadata may be stored in a separate catalog storage repository. Advantageously, HDFS supports the running of applications on large clusters of commodity hardware. - Returning to the
metadata catalog 166, stories metadata 212 is maintained to support the story control module 200 of the web application module. The stories metadata 212 characterizes the type of data to be supplied in a story. The stories metadata 212 also includes state information to track changes in the story over time. Thus, the stories metadata 212 provides contextual information to reconstruct the development of a story over time. - The
metadata catalog 166 also includes collaboration metadata 214. The collaboration metadata 214 supports operations performed by the collaboration module 202. The collaboration metadata 214 characterizes groups of individuals that may share a story. The collaboration metadata 214 may include various permissions that specify which individuals can see which data. For example, some collaborating individuals may have access to granular data, while others may only have access to aggregate data. The collaboration metadata 214 also maintains state information tracking collaboration over time. Consequently, the collaboration metadata 214 provides contextual information to reconstruct collaborative actions over time. - The
collaboration metadata 214 may be used in connection with data and analytic data stories, concepts that will be discussed in detail below. Different permissions can be set for data versus stories. For example, some collaborating individuals may have the permission to add data to the system and manage the data. Some individuals may have access to granular data and others have access to aggregate data. For analytic data stories, collaborators may have permission to iterate a story, view it only or view and comment on it. All permissions on data and stories are maintained as state information tracked over time. Collaboration metadata permissions may specify what operations may be performed on data or the view of data. For example, in one embodiment, a read only collaborator may only comment on and view data. - In one embodiment, the
data processing module 170 supports distributed in-memory processing of data. As discussed below, the data processing module 170 operates on data units utilized in accordance with an embodiment of the invention. - The
data processing module 170 may utilize an open source cluster computing system, such as Spark from the University of California, Berkeley AMPLab. The core concept in Spark is a Resilient Distributed Dataset (RDD). An RDD is a data structure for a sequence of data that is fault tolerant and supports many parallel data manipulation operations, while allowing users to control in-memory caching and data placement. - RDDs explicitly remember the derivation trees for the data sets in memory so that they can be re-derived in case of a fault. RDDs also allow explicit caching so that important intermediate results can be held in memory, which accelerates later computations that require intermediate results or if that same result needs to be sent to a client again. The
data processing module 170 is further discussed below. Attention initially focuses on data ingestion. -
FIG. 3 illustrates processing operations associated with the data ingest module 164. Initially, the data ingest module 164 evaluates a data source 300. Based upon the data source, the module infers data types, data shape and/or data scale. The data types may be time data, geographical data, dollar amounts, streamed data, and the like. The data shape may be characterized in any number of ways, such as a continuous stream of uniform data, a continuous stream of bursty data, sparse data from a data repository, aggregated sections of data from a source, and the like. The data scale provides an indication of the volume of data being ingested from a data source. The data ingest module 164 processes all types of data, whether structured data (e.g., a relational database), semi-structured data (e.g., a document-oriented database) or unstructured data. - Next, the data is evaluated 302. That is, the actual data is processed to infer data types, data shape and/or data scale. In the case of data types, the identification of a zip code or geo-spatial coordinates implicates a geography data type. Alternately, certain number formats implicate a time data type. A currency indicator may implicate a sales data type. Categories are also supported as a data type. Categories may be any data which does not conform to time, geography or numeric types. For example, in the case of hotels, the categories may be business, resort, extended stay or bed and breakfast. Categories may be hierarchical, such as a reading material category with a hierarchy of electronic books, audible books, magazines and newspapers. The system detects category types and suggests them to the user. The system allows one to filter by a specific category value or break down a numeric measure by available category values (e.g., view Hotel Revenue split by different hotel categories). In the case of data shape, evaluation of the data may lend itself to characterizations of the shape of the data. 
In the case of the data scale, evaluation of the data provides an indication of the volume of data.
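- The type inference described above can be illustrated with a minimal sketch. The pattern checks, the 0.8 cut-off and the function name `infer_column_type` are assumptions made for illustration; the source does not specify how types or confidence values are computed.

```python
import re
from datetime import datetime


def _is_date(value):
    """True if the value parses as Year/Month/Date (an assumed format)."""
    try:
        datetime.strptime(value, "%Y/%m/%d")
        return True
    except ValueError:
        return False


def _is_number(value):
    """True if the value parses as a number once thousands separators are removed."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


# Order matters: a zip code also parses as a number, so geography is checked
# before the generic number test and wins ties.
CHECKS = [
    ("time", _is_date),
    ("geography", lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None),
    ("currency", lambda v: re.fullmatch(r"\$\d[\d,]*(\.\d{2})?", v) is not None),
    ("number", _is_number),
]


def infer_column_type(values, min_confidence=0.8):
    """Return (type_name, confidence) for a column of strings.

    Confidence is the fraction of non-empty values matching the winning
    check. Columns that match no check well enough fall back to "category".
    """
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return ("category", 0.0)
    best_type, best_score = "category", 0.0
    for name, check in CHECKS:
        score = sum(1 for v in values if check(v)) / len(values)
        if score > best_score:
            best_type, best_score = name, score
    if best_score < min_confidence:
        return ("category", 1.0 - best_score)
    return (best_type, best_score)
```

For example, a column of hotel categories such as business, resort and extended stay matches none of the typed checks and is reported as a category, while a column of five-digit values is reported as geography rather than as a plain number.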
- These evaluations result in inferred data types, which may be displayed to a
user 304. FIG. 4 provides an example of such a display. In particular, FIG. 4 illustrates an interface 400 displaying an ingested CSV file with five columns. The first column 402 shows data in a Year/Month/Date format, which is indicated in data identification field 412. The second column 404 has the same format. A user may access a window 414 showing the confidence of the characterization. The third column 406 is characterized as a number data type. The fourth column 408 has a Year/Month/Date format, while the fifth column 410 has an identified number data type. Thus, the system provides for user reinforcement, validation and correction of inferred data types. - Returning to
FIG. 3, if a user wants to refine an inferred data type, she may do so (306 - Yes). Input is then received from the user 308. For example, the window 414 of FIG. 4 may be used to receive user input that refines the data characterization. After data refinement or if data refinement is no longer required, the data is associated with one or more dimensions 310. A dimension is a hierarchical characterization of data. For example, in the case of a time dimension or a number dimension the hierarchy is increasing values. In the case of a geographical dimension the hierarchy is expanding geographical size (e.g., address to zip code to county to state to country). - Next, values are computed along
dimensions 312. For example, consider the case of ingested data with a list of days. The days are aggregated into months, which are aggregated into individual years, which are aggregated into multiple years. This roll up of values is computed automatically. Thus, while an original data set may include data from individual days, the ingested data maintains the data from the individual days, but is also supplemented to include dimensional data of months, individual years and multiple years. Similarly, in the case of geography, if an original data set includes individual zip codes, those individual zip codes are augmented to include dimensional data for county, state and country, or any other default or specified hierarchy. Observe that this is performed automatically without any user input. Thus, the original data is pre-processed to include dimensional data to facilitate subsequent analyses. The original data may also be pre-processed to generate other types of metadata, such as the number of distinct values, a minimum value and maximum value and the like. This information may inform the selection of visualizations and filtering operations. This information may also be used to provide join relevance indicia 314. -
FIG. 5 illustrates an interface 500 to provide join relevance indicia. In particular, the figure provides a textual description of a data set 502. Further, the interface provides indicia 504 of the relevance of the data to other data. In this case, the indicia include numeric indicia (9.5 on a scale of 10.0) and graphical indicia in the form of a 95% completed wheel. The indicia 504 may be accompanied by characterizations of the components of the data set. In this case, there is a chronological data type component 506, a geographical data type component 508 and an “other” data type component 510. Each data type component may include indicia 512 of confidence of the data type characterization. In one embodiment, the score is a function of the percentage of columns in the two data sets that can be merged. User input may be collected to revise or otherwise inform the join relevance indicia. In this way, the system involves the user in reinforcement, validation and correction of join recommendations. - Returning to
FIG. 3, the next operation is to store metadata 316. For example, data type metadata 208 may be stored in the metadata catalog 166 shown in FIG. 2. The final operation of FIG. 3 is to select a default visualization 318. That is, relying upon one or more of the data type, data shape and data scale, the data ingest module 164 may establish a default visualization (e.g., map, bar chart, pie chart, etc.). - Thus, an embodiment of the invention provides for data ingestion from disparate data sources and data inferences about the ingested data. Inferred data types are derived from structured, semi-structured and/or unstructured data sources. The data source may be internal private data or an external data source. The invention supports ingestion through any delivery mechanism. That is, the source can provide one-time data ingestion, periodic data ingestion at a specified time interval or a continuous data ingestion of streamed content.
- The data ingestion process also provides for data harmonization by leveraging identified data types. That is, the identified data types are used to automatically build an ontology of the data. For example, in the case of a recognized zip code, the harmonization process creates a hierarchy from zip code to city to county to state to country. Thus, all data associated with the zip code is automatically rolled up to a city aggregate value, a county aggregate value, a state aggregate value and a country aggregate value. This automated roll-up process supports subsequent drill-down operations from a high hierarchical value to a low hierarchical value (e.g., from state to city). This information is then used to generate the most appropriate visualization for the data. This data harmonization also accelerates the convergence of two or more data sets.
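- The zip code roll-up can be sketched as follows. The lookup table `ZIP_HIERARCHY`, the level names and both helper functions are hypothetical; a working system would draw the hierarchy from a reference geography data set. The second helper shows one way to pick the level at which two data sets of different granularity can be compared.

```python
from collections import defaultdict

# Hypothetical lookup table, for illustration only.
ZIP_HIERARCHY = {
    "94105": {"city": "San Francisco", "county": "San Francisco",
              "state": "CA", "country": "US"},
    "94607": {"city": "Oakland", "county": "Alameda",
              "state": "CA", "country": "US"},
    "10001": {"city": "New York", "county": "New York",
              "state": "NY", "country": "US"},
}

# Hierarchy levels ordered from the finest to the coarsest granularity.
LEVELS = ["zip", "city", "county", "state", "country"]


def roll_up(rows):
    """Aggregate (zip_code, value) rows at every level of the hierarchy.

    Returns {level: {member: total}} so that later drill-down operations
    can read a pre-computed aggregate instead of rescanning the raw rows.
    """
    totals = {level: defaultdict(float) for level in LEVELS}
    for zip_code, value in rows:
        totals["zip"][zip_code] += value
        for level in LEVELS[1:]:
            totals[level][ZIP_HIERARCHY[zip_code][level]] += value
    return {level: dict(members) for level, members in totals.items()}


def lowest_common_level(level_a, level_b):
    """Return the finest level that both data sets can be expressed at."""
    return LEVELS[max(LEVELS.index(level_a), LEVELS.index(level_b))]


aggregates = roll_up([("94105", 100.0), ("94607", 50.0), ("10001", 25.0)])
```

With these three rows the state aggregate for CA is the sum of the two California zip codes, and a zip-level data set combined with a county-level data set resolves to the county level.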
- The convergence of two or more data sets may be implemented through the
data processing module 170 and the story control module 200 of the web application module 160. FIG. 6 illustrates processing operations associated with the convergence of two or more data sets. A user has an opportunity to select a data set 600. If a data set is selected (600 - Yes), a data set is added 602. After all data sets have been selected, the data sets are harmonized to the lowest common data unit granularity 604. That is, when two or more data sets are converged, the common dimensions across the data sets are harmonized so that the converged data sets are rendered into visualizations built on the elements common to the data sets. For instance, if a first data set is at a zip code level and a second data set is at a county level, when the first data set is combined with the second data set, the combination is automatically harmonized to the lowest level of common granularity. In this example, county is the lowest common granularity across the data sets. This harmonization accelerates the process of converging multiple data sets during multi-source analyses. The final operation of FIG. 6 is to coordinate visualizations 606. The visualization may be based upon the granularity of the data set (data scale), the data shape and/or the data type. The system selects a default visualization, which may be overridden by a user. Examples of the foregoing operations are provided below. - The
data processing module 170 is an in-memory iterative analytic data processing engine that operates on “data units” associated with a story. FIG. 7 illustrates a story 700 comprising a set of data units 702_1 through 702_N. Each data unit has a corresponding discussion thread 704_1 through 704_N. In one embodiment, a data unit 702 includes data 706. The data 706 includes raw ingested data plus rolled-up hierarchical data, as previously discussed. A data unit also includes a version field 708. The version field may use a temporal identifier to specify a version of data, for example, after it has been filtered during some analytic process. A permissions field 710 specifies permissions to access the data. Different individuals collaborating in connection with a story may have different access levels to the data. For example, one individual may have access to all data, while another individual may only have access to aggregated data. A bookmark field 712 may be used to persist a data unit, as discussed below. - Each
discussion thread 704 includes a set of discussion entries 714_1 through 714_N. Permissions field 710 may establish individuals that may participate in a discussion thread. Example discussion threads are provided below. - Thus,
FIG. 7 illustrates the in-memory manifestation of a discussion thread and its association with an in-memory data unit 702. Data operators (e.g., sum, average, standard deviation) may be used to perform iterative operations on data units. Each data unit may also store filter information, a best fit data visualization setting, and data visualization highlight information. - The operations of the invention are more fully appreciated with reference to a use scenario.
FIG. 8 illustrates a home page 800 that may be displayed on a browser 122 of a client device 102. The home page 800 may be supplied by the web application module 160. In this example, the home page 800 includes a settings field 802. The home page 800 also includes a field 804 to list stories owned by the user. These are stories constructed by or on behalf of the user. Typically, such stories are fully controlled by the user. - The
home page 800 may also include a field 806 for stories that may be viewed by the user. The user may have limited permissions with respect to viewing certain data associated with such stories. In one embodiment, the permissions field 710 of each data unit 702 specifies permissions. - The
home page 800 also has a field 808 for supplying data owned by a user. The data owned by a user is effectively the data units 702 owned by a user. Finally, the home page 800 includes a collaboration field 810 to facilitate online communication with other users of the system. The discussion threads 704 populate the collaboration field 810. - Thus, all users have settings, data and stories. Access to stories and collaboration permissions may be controlled by the stories metadata 212 and
collaboration metadata 214 of the metadata catalog 166 operating in conjunction with the data units. More particularly, the web application module 160 utilizes the story control module 200 to access stories metadata 212 and the collaboration module 202 to access collaboration metadata 214. The web application module 160 may pass information to the data processing module 170, which loads information into data units 702 and discussion threads 704. - If a user activates the
link 804 for her stories, an interface, such as that shown in FIG. 9, may be supplied. FIG. 9 illustrates an interface 900 depicting individual stories 902. Each story 902 may have an associated visualization 904 and text description 906. The interface 900 may also display a text description of recent activities 908 by the user. Collaborative members 910 may also be listed. If the user selects story 912, the interface of FIG. 10 is provided. -
FIG. 10 illustrates an interface 1000 for the story entitled “Hotel Density and Revenue by Geography”. The interface 1000 indicates a first data source 1002 from a hotel transaction database and a second data source 1004 from a Dun & Bradstreet report on hotel density. In this example, the hotel transaction database has information organized as a function of time, while the hotel density information is organized by geography. The invention provides a data-aware convergence of these two data sets. More particularly, FIG. 10 illustrates data-aware convergence and visualization of disparate data sources. Observe that in FIG. 9 the story 912 is geographically scaled based upon the amount of screen space available. That is, in FIG. 9, interface 900 simultaneously displays multiple stories. Consequently, the story control module 200 scales the amount of displayed information in a manner consistent with the amount of screen space available. On the other hand, after story 912 is selected, a data-aware visualization transition occurs, with an enhanced amount of information displayed, as shown in interface 1000 of FIG. 10. Since more space is available in interface 1000, the story control module 200 expands the amount of displayed information. As previously discussed, the data type metadata 166 includes information on data types, data shape and data scale for ingested data. This information may be used to select appropriate visualizations. - The
interface 1000 provides different visualization options. In this example, the map view 1006 is selected. Each visualization option has a set of default parameters based upon an awareness of the data. In this example, average hotel revenue per hotel for an arbitrary period of time is displayed in one panel 1008, while total hotel revenue for the same arbitrary period of time is displayed in another panel 1010. As shown, shading may be used to reflect density of activity. - The
interface 1000 also includes a collaboration section 1012. The filter indicator 1014 specifies that all data is being processed. This filter may be modified for a specific geographic location, say California, in which case the interface of FIG. 11 is provided. -
FIG. 11 illustrates an interface 1100 with the same data as in FIG. 10, but for a smaller geographic region, namely one state, California. A visualization of average hotel revenue per hotel is provided in one panel 1102, while a visualization of total hotel revenue is provided in another panel 1104. Observe that the visualization transition from interface 1000 to interface 1100 is data-aware in the sense that the visualization supplies data relevant to the specified filter parameter. - The
collaboration section 1106 illustrates a dialog regarding the data. A tab 1108 allows one to bookmark this view. That is, activating the tab 1108 sets the bookmark field 712 in a data unit 702 associated with the story. This view and associated dialog information is then stored in a data unit 702 and corresponding discussion thread 704. In this way, the information can be retrieved at a later time to evaluate the evolution of a story. - As previously indicated in connection with
FIG. 10, different visualization options are available. If a user selects the bar chart option 1007, then the interface of FIG. 12 is supplied. FIG. 12 illustrates an interface 1200 displaying the total hotel revenue data as a bar chart. Observe here that the filter 1014 is set for all data. Therefore, the transition to the new visualization is for all data. That is, the same data filter is used for the new visualization. Also observe that there is collaboration context awareness as the collaboration section 1012 of FIG. 10 corresponds to the collaboration section 1202 of FIG. 12. A highlight from the visualization of FIG. 10 may carry over to the visualization of FIG. 12. This process is known as highlighting and linking, where a highlight on any one visualization is then linked to every other related visualization. For example, if in FIG. 10 the states California, New York, Texas, New Jersey and Florida are highlighted on the map, those same states are highlighted in the bar graph of FIG. 12. -
FIG. 13 illustrates an interface 1300 that displays a first data source 1302 of Tweet frequency data during Super Bowl 47. A second data source 1304 is data from a data warehouse of click stream online activity during the same time period. Graph 1306 is for the data from the first data source 1302, while graph 1308 is for the data from the second data source 1304. The time axes for the two graphs annotations - Hovering over an annotation may result in the
display 1314 of collaboration data. A separate collaboration space 1316 with a discussion thread may also be provided. The web application module 160 facilitates the display of annotations, collaboration data 1314 and collaboration space 1316 through access to the collaboration metadata 214. - Observe that the
annotations 1310 are applied to visualized data. Annotations are stateful annotations in a discussion thread 704 associated with a data unit 702. An annotation may have an associated threshold to trigger an alert. For example, one can specify in an annotation a threshold of $10,000 in sales. When the threshold is met, an alert in the form of a message (e.g., an email, text, collaboration panel update) is sent to the user or a group of collaborators. A marker and an indication of the message may be added to the annotations. -
FIG. 14 illustrates an interface 1400 corresponding to interface 1300, but with a different period of time specified on the time axis. As a result, the five annotations shown in graph 1308 are in a condensed form in graph 1402. The figure also illustrates a set of bookmarks 1404 associated with this view of data. The bookmarks 1404 are supplied by the web application module 160 through its access to the collaboration metadata 214. - Thus, the invention provides convergence between multiple data sources, such as public data sources, premium data sources and private data sources. The invention does not require rigid structuring or pre-modeling of the data. Advantageously, the invention provides harmonization across key dimensions, such as geography, time and categories.
- In certain embodiments, data is continuously pushed to a user. Consequently, a user does not have to generate a query for refreshed data. In addition, a user can easily collaborate with others to facilitate analyses across distributed teams. Permission settings enforce user policies on viewing and sharing of data and analyses.
- Those skilled in the art will appreciate the numerous benefits associated with the disclosed stories. Those benefits may be limited to data analysts and similar power users that are knowledgeable about data sources and interactions with data sources. However, in any enterprise there are numerous decision makers that do not have such expertise. Accordingly, it would be desirable to provide such decision makers with simplified tools that facilitate in-context collaboration with respect to analytical data.
- An embodiment of the invention facilitates the creation of what will be referred to as a storyboard. A storyboard is a collection of visualization frames. Typically, the collection of visualization frames characterize a logical sequence of data analytics, although any combination of visualization frames may be used in accordance with embodiments of the invention. Each visualization frame is a snapshot of data. Since a snapshot of data is collected, the creator of the storyboard need not be a data analyst or other sophisticated computer user. As discussed below, permissions and visualization settings simplify storyboard creation and utilization. The permissions may be at the storyboard level and/or individual frame level.
- The collection of visualization frames has an associated collection of discussion threads. Each discussion thread involves different users and comments made by the different users. The discussion threads facilitate in-context collaboration of analytical data in the collection of visualization frames.
-
FIG. 15 illustrates a first story 1500 with four story panels SP1, SP2, SP3 and SP4. The figure also illustrates a second story 1502 with four story panels SPA, SPB, SPC and SPD. As discussed below, the web application module 160 is configured to allow a user to collect selected story panels to form a storyboard, such as storyboard 1504. For example, hovering over a story panel may result in a prompt, such as “Move to Storyboard?” Alternately, a user may open a storyboard and receive a prompt to select story panels from different stories. -
Storyboard 1504 has a canvas with different visualization frames. In this example, visualization frame VF1 corresponds to story panel SP1, visualization frame VF4 corresponds to story panel SP4, visualization frame VFB corresponds to story panel SPB and visualization frame VFC corresponds to story panel SPC. Thus, in this example, selected story panels from different stories are used to form storyboard 1504. In this way a compelling data analysis may be constructed through a logical sequence of visualization frames. - The
storyboard 1504 may also include a reference to an external media file (EMF). For example, the EMF may be a link to an audio/visual resource that may be played to augment the sequence of data analytics associated with VF1, VF4, VFB and VFC. Thus, it can be appreciated that the data analytic and collaborative aspects of the disclosed technology may be supplemented by additional media sources. The additional media sources may include data visualizations created in other business intelligence tools. -
FIG. 16 illustrates a storyboard 1600 comprising a set of visualization units 1602_1 through 1602_N and a collection of discussion threads 1604_1 through 1604_N. A visualization unit is similar to the previously discussed data units. However, each visualization unit is a simplified version of a data unit. The visualization units facilitate the creation and utilization of a collection of visualization frames. The visualization units mask data source complexity and provide automated operations, such as automated refresh of data, which allows the storyboard to be used by enterprise employees that are technically less sophisticated. - In one embodiment, a visualization unit includes a
graphical visualization 1606 representing a snapshot of data (i.e., data at a given instance in time). The visualization unit also includes data 1608 associated with the visualization (i.e., the data that is expressed in the visualization). The visualization unit also includes metadata, such as a title for the visualization, a description of the data and the like. Various permissions 1612 are set for the visualization unit. The permissions are based upon the status of the user. For example, the creator of a storyboard may have more permission to manipulate the storyboard than a consumer or viewer of the storyboard. - A visualization unit also includes a
filter configuration block 1614. As discussed below, permissions 1612 express the type of filters that one may apply to the data 1608. The sophistication of the available filters is typically a function of the sophistication of the user. The visualization unit may also include visualization settings 1616, such as visualization type (graph, bar, pie, etc.), visualization orientation, visualization scaling and the like. - The
storyboard 1600 also includes a collection of discussion threads 1604_1 through 1604_N. Each discussion thread lists different users and comments made by the different users. For example, entry 1618_1 is a comment B from individual A, while entry 1618_2 is comment D from individual C. - The storyboard and its associated visualization units and discussion threads may be in-memory data structures that facilitate improved functioning of a computer system. For example, the visualization units include automated data access for data refresh on a scheduled basis. The visualization units mask system complexity for a user.
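- The storyboard, visualization unit and discussion thread structures can be pictured with a minimal in-memory sketch. The class names, field names and the role-to-action permission model are assumptions for illustration; the source names the components but does not prescribe their layout. Reference numerals in the comments point back to FIG. 16.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DiscussionEntry:
    """One entry in a discussion thread: who commented and what they said."""
    user: str
    comment: str


@dataclass
class VisualizationUnit:
    visualization: str                   # rendered snapshot reference (1606)
    data: list[dict[str, Any]]           # data expressed in the visualization (1608)
    metadata: dict[str, str]             # title, description and the like
    permissions: dict[str, set[str]]     # assumed model: role -> allowed actions (1612)
    filter_config: dict[str, Any] = field(default_factory=dict)            # 1614
    visualization_settings: dict[str, Any] = field(default_factory=dict)   # 1616

    def can(self, role: str, action: str) -> bool:
        """Check whether a role may perform an action on this unit."""
        return action in self.permissions.get(role, set())


@dataclass
class Storyboard:
    units: list[VisualizationUnit] = field(default_factory=list)
    threads: list[list[DiscussionEntry]] = field(default_factory=list)

    def add_frame(self, unit: VisualizationUnit) -> int:
        """Add a unit with its own empty discussion thread; return its index."""
        self.units.append(unit)
        self.threads.append([])
        return len(self.units) - 1

    def comment(self, index: int, user: str, text: str) -> None:
        """Append a comment to the thread paired with the given frame."""
        self.threads[index].append(DiscussionEntry(user, text))


board = Storyboard()
frame = board.add_frame(VisualizationUnit(
    visualization="bar-chart.png",
    data=[{"state": "CA", "revenue": 150.0}],
    metadata={"title": "Total hotel revenue"},
    permissions={"creator": {"view", "filter", "edit"}, "viewer": {"view"}},
))
board.comment(frame, "A", "B")
```

Keeping one thread per unit at the same index is one simple way to preserve the pairing between units 1602_1 through 1602_N and threads 1604_1 through 1604_N.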
- Turning now to
FIG. 17, an individual visualization frame, such as VF1, may have an associated data refresh prompt 1700, which includes various data refresh configuration parameters. In this example, the data refresh configuration parameters include data refresh on demand (“Refresh Now”) 1701, a scheduled data refresh 1702 and data refresh based upon a data change 1704. The scheduled refresh 1702 may be based upon any specified time interval (e.g., every 15 minutes, every 30 minutes, every hour, every day, every week, every month, etc.). The data refresh configuration parameter may be stored in the visualization unit, which includes executable instructions to access the source data at the specified interval. Observe that this automated approach insulates the user from the complexities of data access. -
FIG. 18 illustrates a storyboard 1800 with prompts. Indicia 1806 may be used to indicate new comments and indicia 1808 may be used to indicate new data. The indicia may be text, a graphical symbol, an altered font and the like. - Prompt 1804 allows one to specify filter conditions for the snapshot of data. For example, the filter conditions may relate to the granularity of the data (e.g., instead of data for a country, data for a specific state). In one embodiment, a user is prompted to name a filter. The user is then given various pull-down menu options for various filter attributes. The filter attributes may be based upon the permissions associated with the user. A sophisticated user may be given more filter attributes, while an unsophisticated user may be given limited filter attributes. This is another example of how the disclosed technology allows unsophisticated users to successfully work with data sources that may otherwise be inaccessible to the unsophisticated users.
- After a filter is set, the filter condition is applied to each visualization frame that has data corresponding to the filter.
Indicia 1810 may be used to let the user know which visualizations have been filtered. -
FIG. 18 also illustrates a visualization frame section 1812. Such frame sections 1812 may be used to segregate related data analytics. This can reduce the complexity of a storyboard with numerous visualization frames. -
FIG. 19 illustrates a storyboard 1900 with a comment feed 1902 associated with the entire storyboard 1900. In one embodiment, the comment feed 1902 is a scroll of discussion threads that may be stopped, started, rewound, etc. Alternately or in addition, a comment feed 1904 may be associated with a specific visualization frame. In one embodiment, the comment feed 1904 includes links to external media. Individual comments or text entries may be associated with individual data elements in the visualization frame. For example, an individual comment may relate to a section of a pie chart or two sections of a pie chart. Interface tools are supplied to allow a user to select individual data elements and groups of data elements which are then linked to a text entry regarding the selected data. The data elements may be contiguous or non-contiguous. - In one embodiment a discussion thread includes automatically generated text entries that are produced in response to a data value exceeding a specified threshold. For example, a rule may be specified that if a dollar value exceeds a specified threshold, then a comment, such as, “Sales target exceeded” may be automatically inserted into the discussion thread. Thus, the discussion thread may include input from users and rule based input that is automatically generated by the system. The automatically generated text may be accompanied by an alert sent to a user, for example an email alert sent to a user. The automatically generated text may also be accompanied by indicia placed in the visualization (e.g., indicia in a visualization of sales volume of where the sales target is exceeded). The automatically generated text may be a link to an external resource, such as the original business plan expressing the sales target.
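- The rule-based comments described above can be sketched as follows. The `ThresholdRule` class, the `apply_rules` function, the dictionary layout of thread entries and the optional `notify` callback are all hypothetical; the source states only that a comment is inserted, and an alert may be sent, when a value exceeds a threshold.

```python
from dataclasses import dataclass

SYSTEM_USER = "system"  # assumed author name for generated entries


@dataclass
class ThresholdRule:
    measure: str       # key of the data value to watch
    threshold: float   # the rule fires when the value exceeds this
    comment: str       # text inserted into the discussion thread


def apply_rules(thread, snapshot, rules, notify=None):
    """Append a generated comment for each rule whose threshold is exceeded.

    thread   -- list of {"user": ..., "comment": ...} entries, modified in place
    snapshot -- mapping of measure name to current value
    notify   -- optional callback standing in for an email or text alert
    Returns the list of entries added by this call.
    """
    added = []
    for rule in rules:
        value = snapshot.get(rule.measure)
        if value is None or value <= rule.threshold:
            continue
        entry = {"user": SYSTEM_USER, "comment": rule.comment}
        if entry in thread:
            continue  # already recorded; do not repeat on every data refresh
        thread.append(entry)
        added.append(entry)
        if notify is not None:
            notify(rule, value)
    return added


thread = [{"user": "A", "comment": "B"}]
rules = [ThresholdRule("sales", 10000.0, "Sales target exceeded")]
alerts = []
first = apply_rules(thread, {"sales": 9000.0}, rules, notify=lambda r, v: alerts.append(v))
second = apply_rules(thread, {"sales": 12000.0}, rules, notify=lambda r, v: alerts.append(v))
third = apply_rules(thread, {"sales": 12500.0}, rules, notify=lambda r, v: alerts.append(v))
```

Here user entries and generated entries share one thread, so a reader sees the system comment in the same context as the human discussion. The duplicate check is a design choice of this sketch, not something the source requires.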
- Once a storyboard is constructed, it may be used as a template that facilitates substitution of a first set of data sources with a second set of data sources to produce a new collection of visualization frames. For example, hovering over a visualization frame may result in a prompt “Specify new data source?” A user may then enter the new data source or may be alternately provided with a pulldown menu of data sources available to the user.
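One way to picture the template behavior is a function that copies each frame definition while swapping its data source. The sketch below is only illustrative: `FrameSpec`, its fields, and the source identifiers are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class FrameSpec:
    title: str
    data_source: str   # hypothetical identifier of the source the frame snapshots
    chart_type: str


def instantiate_template(frames: List[FrameSpec], substitutions: Dict[str, str]) -> List[FrameSpec]:
    """Return a new collection of frames with data sources replaced per the mapping.

    Frames whose data source is not in the mapping are copied unchanged.
    """
    return [
        replace(frame, data_source=substitutions.get(frame.data_source, frame.data_source))
        for frame in frames
    ]


template = [
    FrameSpec("Revenue by region", "sales_2014", "bar"),
    FrameSpec("Web traffic", "clickstream", "line"),
]
new_storyboard = instantiate_template(template, {"sales_2014": "sales_2015"})
```

Because `FrameSpec` is immutable here, the original storyboard is left untouched and the substitution yields a new collection, matching the description of producing a new collection of visualization frames.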
- In one embodiment, indicia (e.g., a pin) are used to show related data in different frames of a storyboard. For example, one may hover over a data element and receive a prompt to move to another frame with the same data element. In a similar way, one may be prompted to see recent collaboration across a set of visualization frames.
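The linking of common data elements across frames could be supported by a simple index from data element to the frames that display it. The following is a sketch under assumptions: the function names, the frame identifiers, and the representation of a frame as a list of element names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_pin_index(frames: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each data element to the frames showing it, keeping only shared elements."""
    index = defaultdict(list)
    for frame_id, elements in frames.items():
        for element in elements:
            index[element].append(frame_id)
    # Only elements that appear in more than one frame get a pin.
    return {element: ids for element, ids in index.items() if len(ids) > 1}


def related_frames(index: Dict[str, List[str]], element: str, current_frame: str) -> List[str]:
    """Frames other than the current one that show the same data element."""
    return [frame_id for frame_id in index.get(element, []) if frame_id != current_frame]


frames = {"f1": ["west", "east"], "f2": ["west"], "f3": ["north"]}
pin_index = build_pin_index(frames)
targets = related_frames(pin_index, "west", "f1")
```

A hover handler could call `related_frames` to decide whether to show the prompt to move to another frame.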
- A prompt may also be supplied to export a storyboard to a different file format. For example, a file format suited to offline processing may be used, such as PDF, PowerPoint®, and the like.
- In one embodiment, a storyboard provides an option to link back to a story associated with a visualization frame. For example, hovering over a visualization frame may result in a prompt “Transition to original story?” A transition to the original story may then be implemented, which allows the user to collaborate in the original story, for example, by requesting clarification about a data element.
-
FIG. 20 illustrates architectural components utilized to implement the disclosed storyboards. Many components correspond to components already discussed in connection with FIG. 2. The current discussion is limited to the new components 2000-2008. The web application module 162 is augmented to include a storyboard module 2000. The storyboard module 2000 includes executable instructions to populate browser 122 with interfaces of the type disclosed above. In addition, the storyboard module 2000 interacts with a frame renderer 2002. The frame renderer 2002 is configured to take a data snapshot. For example, consider the case where a story is rendered in browser 122. A prompt may be provided to the user to move the story to a storyboard. If the user engages the prompt, the frame renderer 2002 produces a visualization unit 1602 and persistently stores the frame in a frame store 2004.
- The storyboard module 2000 interacts with the frame renderer 2002 to update the metadata catalog 166 to create storyboard frames 2006. That is, the metadata catalog 166 is supplemented with metadata associated with each frame and the storyboard in which it resides. In addition, the metadata catalog 166 may store storyboard permissions 2008. The storyboard permissions may control permissions at the storyboard level. The permissions may be of the type discussed in connection with the visualization units. Thus, embodiments of the invention express permissions at the visualization unit level and/or the storyboard level. In one embodiment, a scheduler (not shown) operates with the web application module 162 and the frame renderer 2002 to schedule the rendering of frames in accordance with a refresh schedule discussed in connection with FIG. 17.
- An embodiment of the present invention relates to a computer storage product with a computer readable storage medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media, optical media, magneto-optical media and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using JAVA®, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
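The interaction among the frame renderer 2002, the frame store 2004, and the metadata catalog 166 described in connection with FIG. 20 might be sketched as below. This is an in-memory illustration under assumptions: the class and method names are hypothetical, persistence is simulated with dictionaries, and permissions are reduced to a per-storyboard set of user names.

```python
import copy
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class Frame:
    frame_id: str
    storyboard_id: str
    snapshot: dict          # point-in-time copy of the data behind the visualization
    captured_at: datetime


class FrameStore:
    """Stand-in for the persistent frame store (in-memory here)."""

    def __init__(self) -> None:
        self._frames: Dict[str, Frame] = {}

    def save(self, frame: Frame) -> None:
        self._frames[frame.frame_id] = frame

    def load(self, frame_id: str) -> Frame:
        return self._frames[frame_id]


class MetadataCatalog:
    """Tracks which frames belong to which storyboard, plus storyboard-level permissions."""

    def __init__(self) -> None:
        self._frames_by_storyboard: Dict[str, List[str]] = {}
        self._viewers: Dict[str, Set[str]] = {}

    def register(self, frame: Frame) -> None:
        self._frames_by_storyboard.setdefault(frame.storyboard_id, []).append(frame.frame_id)

    def frames_for(self, storyboard_id: str) -> List[str]:
        return list(self._frames_by_storyboard.get(storyboard_id, []))

    def grant(self, storyboard_id: str, user: str) -> None:
        self._viewers.setdefault(storyboard_id, set()).add(user)

    def can_view(self, storyboard_id: str, user: str) -> bool:
        return user in self._viewers.get(storyboard_id, set())


class FrameRenderer:
    def __init__(self, store: FrameStore, catalog: MetadataCatalog) -> None:
        self.store = store
        self.catalog = catalog

    def capture(self, storyboard_id: str, live_data: dict) -> Frame:
        """Snapshot the live data, store the frame, and record it in the catalog."""
        frame = Frame(
            frame_id=uuid.uuid4().hex,
            storyboard_id=storyboard_id,
            snapshot=copy.deepcopy(live_data),  # deep copy so later changes do not leak in
            captured_at=datetime.now(timezone.utc),
        )
        self.store.save(frame)
        self.catalog.register(frame)
        return frame
```

The deep copy is the design choice that makes the frame a snapshot: subsequent changes to the live data do not alter a stored frame until a refresh produces a new one.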
- The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.
Claims (15)
- 1. A server, comprising:
- a data processing module with instructions executed by a processor to:
- maintain a collection of visualization frames that characterize a sequence of data analytics, wherein each visualization frame is a snapshot of data and the collection of visualization frames has associated permissions and visualization settings; and
- maintain a collection of discussion threads for the collection of visualization frames, wherein each discussion thread identifies different users and comments made by the different users.
- 2. The server of claim 1 wherein each visualization frame has a configurable data refresh parameter.
- 3. The server of claim 2 wherein the data refresh parameter is selected from data refresh on demand, a scheduled data refresh and a data refresh based upon a data change.
- 4. The server of claim 1 wherein each visualization frame has an associated indicator of new comments.
- 5. The server of claim 1 wherein each visualization frame has an associated indicator of new data.
- 6. The server of claim 1 wherein the collection of visualization frames includes a filter configuration block based upon the associated permissions.
- 7. The server of claim 1 wherein the collection of visualization frames includes individual visualization frames with an indicator of filtered data.
- 8. The server of claim 1 wherein the collection of visualization frames is segregated into visualization frame sections.
- 9. The server of claim 1 further comprising individual discussion threads associated with individual visualization frames.
- 10. The server of claim 1 wherein the collection of visualization frames includes a frame with a link to a media file.
- 11. The server of claim 1 wherein the collection of visualization frames is operative as a template that facilitates substitution of a first set of data sources with a second set of data sources to produce a new collection of visualization frames.
- 12. The server of claim 1 wherein the collection of discussion threads includes automatically generated text entries produced in response to a data value exceeding a specified threshold.
- 13. The server of claim 1 wherein the collection of visualization frames includes visualization frames with indicia linking common data elements shown in the visualization frames.
- 14. The server of claim 1 wherein the collection of visualization frames includes visualization frames and a collection of recent discussion threads about the visualization frames.
- 15. The server of claim 1 further comprising instructions executed by the processor to export the collection of visualization frames to an offline file format.
- 16. The server of claim 1 further comprising instructions executed by the processor to transition from a visualization frame to a data source corresponding to the snapshot of data.
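The three data refresh options recited in claims 2 and 3 (on demand, scheduled, and based upon a data change) can be illustrated with a small decision function. This is a sketch under assumptions, not claim language: `RefreshMode`, `RefreshPolicy`, `needs_refresh`, and their parameters are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RefreshMode(Enum):
    ON_DEMAND = "on_demand"
    SCHEDULED = "scheduled"
    ON_DATA_CHANGE = "on_data_change"


@dataclass
class RefreshPolicy:
    mode: RefreshMode
    interval: Optional[timedelta] = None  # only meaningful for SCHEDULED


def needs_refresh(
    policy: RefreshPolicy,
    last_refresh: datetime,
    now: datetime,
    user_requested: bool = False,
    source_changed: bool = False,
) -> bool:
    """Decide whether a frame's snapshot should be re-rendered under its policy."""
    if policy.mode is RefreshMode.ON_DEMAND:
        return user_requested
    if policy.mode is RefreshMode.SCHEDULED:
        return policy.interval is not None and now - last_refresh >= policy.interval
    return source_changed  # RefreshMode.ON_DATA_CHANGE


last = datetime(2015, 10, 14, 9, 0)
hourly = RefreshPolicy(RefreshMode.SCHEDULED, timedelta(hours=1))
due = needs_refresh(hourly, last, datetime(2015, 10, 14, 10, 30))
```

A scheduler could evaluate `needs_refresh` for each frame and hand the due frames to the renderer.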
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/883,502 US20170052977A1 (en) | 2013-05-30 | 2015-10-14 | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources |
US15/225,589 US20160342604A1 (en) | 2013-05-30 | 2016-08-01 | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources Using State Based Visual Data Link Recommendations |
PCT/US2016/056915 WO2017066491A1 (en) | 2013-05-30 | 2016-10-13 | Apparatus and method for collaboratively analyzing data snapshot visualizations from disparate data sources |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361829191P | 2013-05-30 | 2013-05-30 | |
US14/292,765 US9495436B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for ingesting and augmenting data |
US14/292,775 US20140359425A1 (en) | 2013-05-30 | 2014-05-30 | Apparatus and Method for Collaboratively Analyzing Data from Disparate Data Sources |
US14/292,783 US9372913B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for harmonizing data along inferred hierarchical dimensions |
US14/292,788 US9613124B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for state management across visual transitions |
US14/883,502 US20170052977A1 (en) | 2013-05-30 | 2015-10-14 | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/292,775 Continuation-In-Part US20140359425A1 (en) | 2013-05-30 | 2014-05-30 | Apparatus and Method for Collaboratively Analyzing Data from Disparate Data Sources |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/225,589 Continuation-In-Part US20160342604A1 (en) | 2013-05-30 | 2016-08-01 | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources Using State Based Visual Data Link Recommendations |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170052977A1 (en) | 2017-02-23 |
Family
ID=51986301
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/292,783 Active 2035-02-24 US9372913B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for harmonizing data along inferred hierarchical dimensions |
US14/292,788 Active 2034-10-14 US9613124B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for state management across visual transitions |
US14/292,765 Active 2035-01-08 US9495436B2 (en) | 2013-05-30 | 2014-05-30 | Apparatus and method for ingesting and augmenting data |
US14/292,775 Abandoned US20140359425A1 (en) | 2013-05-30 | 2014-05-30 | Apparatus and Method for Collaboratively Analyzing Data from Disparate Data Sources |
US14/883,502 Abandoned US20170052977A1 (en) | 2013-05-30 | 2015-10-14 | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources |
Country Status (4)
Country | Link |
---|---|
US (5) | US9372913B2 (en) |
EP (1) | EP3005174A4 (en) |
HK (1) | HK1223701A1 (en) |
WO (2) | WO2014194251A2 (en) |
-
2014
- 2014-05-30 US US14/292,783 patent/US9372913B2/en active Active
- 2014-05-30 EP EP14803847.4A patent/EP3005174A4/en not_active Withdrawn
- 2014-05-30 WO PCT/US2014/040334 patent/WO2014194251A2/en active Application Filing
- 2014-05-30 US US14/292,788 patent/US9613124B2/en active Active
- 2014-05-30 US US14/292,765 patent/US9495436B2/en active Active
- 2014-05-30 US US14/292,775 patent/US20140359425A1/en not_active Abandoned
-
2015
- 2015-10-14 US US14/883,502 patent/US20170052977A1/en not_active Abandoned
-
2016
- 2016-10-13 HK HK16111818.9A patent/HK1223701A1/en unknown
- 2016-10-13 WO PCT/US2016/056915 patent/WO2017066491A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
US20140359425A1 (en) | 2014-12-04 |
WO2014194251A3 (en) | 2015-01-22 |
WO2014194251A2 (en) | 2014-12-04 |
US20140358846A1 (en) | 2014-12-04 |
US9613124B2 (en) | 2017-04-04 |
US20140358975A1 (en) | 2014-12-04 |
EP3005174A2 (en) | 2016-04-13 |
US9372913B2 (en) | 2016-06-21 |
HK1223701A1 (en) | 2017-08-04 |
US20140358999A1 (en) | 2014-12-04 |
EP3005174A4 (en) | 2017-02-22 |
US9495436B2 (en) | 2016-11-15 |
WO2017066491A1 (en) | 2017-04-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20170052977A1 (en) | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources | |
US20160342604A1 (en) | Apparatus and Method for Collaboratively Analyzing Data Snapshot Visualizations from Disparate Data Sources Using State Based Visual Data Link Recommendations | |
US12050598B2 (en) | Dynamic dashboard with guided discovery | |
US9582495B2 (en) | Domain knowledge driven semantic extraction system | |
US8892513B2 (en) | Method, process and system to atomically structure varied data and transform into context associated data | |
US20170139891A1 (en) | Shared elements for business information documents | |
US8180795B2 (en) | Apparatus and method for distribution of a report with dynamic write-back to a data source | |
US11537496B2 (en) | Audit logging database system and user interface | |
US20230368091A1 (en) | Systems and methods for efficiently distributing alert messages | |
US20140229222A1 (en) | Integrated project planning and management application | |
US12032586B2 (en) | Systems and methods for generation and display of query visualizations | |
US20140359742A1 (en) | Apparatus and Method for Agent Based Ingestion of Data | |
US10769164B2 (en) | Simplified access for core business with enterprise search | |
CN111143328A (en) | Agile business intelligent data construction method, system, equipment and storage medium | |
US20230087339A1 (en) | System and method for generating automatic insights of analytics data | |
Berwind et al. | Hadoop Ecosystem Tools and Algorithms | |
US20120089593A1 (en) | Query optimization based on reporting specifications | |
US20090271699A1 (en) | Apparatus and method for updating a report through view time interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: CLEARSTORY DATA INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SATTIRAJU, KIRAN; VAN DER MOLEN, DOUGLAS WAYNE; LAGERBLAD, BO JONAS BIRGER; AND OTHERS; SIGNING DATES FROM 20160428 TO 20160613; REEL/FRAME: 039566/0138 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |