US20060093222A1 - Data processing, analysis, and visualization system for use with disparate data types - Google Patents
- Publication number
- US20060093222A1 (application US 11/282,567)
- Authority
- US
- United States
- Prior art keywords
- data
- records
- attributes
- chart
- record
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9038—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/40—Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
Definitions
- This invention relates to data mining and visualization.
- The invention relates to methods for analyzing text, numerical, categorical, and sequence data within a single framework.
- The invention also relates to an integrated approach for interactively linking and visualizing disparate data types.
- This data may include not only text information, but also DNA sequences, protein sequences, numerical data (e.g., from gene chip assays), and categoric data.
- What is needed, therefore, is a tool that allows a user to analyze, mine, link, and visualize information of disparate data types within an integrated framework.
- Systems and methods consistent with the present invention aid a user in analyzing large volumes of information that contain different types of data, such as textual data, numeric data, categorical data, or sequential string data. Such systems and methods determine and display the relative content and context of information and aid in identifying relationships among disparate data types.
- One such method defines a uniform data structure for representing the content of an object of different data types, selects attributes of different objects of a variety of different data types that may be represented in the uniform data structure, and operates on the selected attributes to produce first representations of the objects in correspondence with the uniform data structure.
- The data types may include numeric, sequence string, categorical, and text data types.
- An index may be produced that includes second representations of non-selected attributes of a particular object and that associates the non-selected attributes with a particular first representation.
- The first and second representations may be vector representations.
- A first set of the selected attributes associated with a first set of objects may be used to determine the relationships among the first set of objects of a particular data type, and non-selected attributes associated with the first set of selected attributes may be used to correlate objects represented by the first set of selected attributes with a second set of objects represented by a second set of selected attributes.
- The first and second sets of objects may be displayed in first and second windows on a display screen, and the second set of objects that corresponds to the selected object or objects may be highlighted.
- A method consistent with the present invention identifies relationships among different visualizations of data sets and includes displaying first graphical results of a first type of analysis performed on selected attributes of a first set of objects and displaying second graphical results of a second type of analysis performed on selected attributes of a second set of objects. Certain objects represented in the first graphical results may be selected, and corresponding objects represented by the second graphical results that correspond to the certain objects are highlighted. The highlighting may be based on attributes not used for creating the first graphical results.
- Another aspect of the present invention is directed to a system and a method for visualization of multiple queries to a database that includes selecting multiple queries to a database, querying records in the database based on the multiple queries, creating a query matrix indexed based on the selecting, and populating the query matrix based on the querying.
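As an illustration only, the query matrix described above may be sketched as a table with one row per record and one column per selected query, populated by running each query against each record. The function name `build_query_matrix`, the use of predicate functions as queries, and the sample records are assumptions made for this sketch and are not taken from the specification.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical query type: a function returning a match score (0.0 means no match).
Query = Callable[[Record], float]


def build_query_matrix(records: List[Record], queries: Dict[str, Query]) -> List[List[float]]:
    """Return a matrix with one row per record and one column per query."""
    names = list(queries)
    # Create the matrix, indexed according to the selected queries.
    matrix = [[0.0] * len(names) for _ in records]
    # Populate the matrix from the result of querying each record.
    for i, record in enumerate(records):
        for j, name in enumerate(names):
            matrix[i][j] = float(queries[name](record))
    return matrix


# Illustrative records and queries (invented for this example).
records = [
    {"id": "r1", "text": "kinase binding domain", "length": 310},
    {"id": "r2", "text": "membrane transport", "length": 120},
    {"id": "r3", "text": "kinase inhibitor transport", "length": 95},
]
queries = {
    "mentions kinase": lambda r: 1.0 if "kinase" in r["text"] else 0.0,
    "mentions transport": lambda r: 1.0 if "transport" in r["text"] else 0.0,
    "length < 150": lambda r: 1.0 if r["length"] < 150 else 0.0,
}
query_matrix = build_query_matrix(records, queries)
```

A matrix of this shape could then be drawn as a grid of colored cells, indexed either by record or, after aggregation, by cluster.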
- Another method consistent with the present invention interactively displays records and their corresponding attributes and includes generating a first 2-D chart for a first record, where at least two attributes associated with the first record are shown along one axis, and the values of the attributes are shown along the other axis.
- Input is received from a user selecting the first record on the first 2-D chart, and an index is analyzed to determine if the first record is shown in another view. If the first record is shown in another view, the visual representation of the first record is altered in that view based on the user input.
- Another method consistent with the present invention interactively displays records and their corresponding attributes and includes generating a 2-D scatter chart that depicts a plurality of records.
- A 2-D line chart is generated for a group of records contained in a portion of the 2-D scatter chart. At least two attributes associated with the group of records are shown along one axis, and a statistical value for each of the at least two attributes is shown along the other axis.
- A 2-D line chart is superimposed at a location on the 2-D scatter chart that is based on the location of the group of records on the 2-D scatter chart.
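A minimal sketch of the data behind such a superimposed line chart follows. It assumes that the statistical value is the mean of each attribute and that the chart is anchored at the mean scatter position of the group; the specification does not fix either choice. The function name `summary_miniplot` and the keys `x`, `y`, `t0`, and `t1` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summary_miniplot(group: List[Dict[str, float]],
                     attributes: List[str]) -> Tuple[Tuple[float, float], List[float]]:
    """Return (anchor position on the scatter chart, one statistic per attribute)."""
    # Anchor the line chart at the mean scatter position of the group (an assumption).
    anchor = (mean(r["x"] for r in group), mean(r["y"] for r in group))
    # One statistical value (here the mean) per attribute, in axis order.
    profile = [mean(r[a] for r in group) for a in attributes]
    return anchor, profile


# Two records of one group, with scatter coordinates and two attributes.
group = [
    {"x": 1.0, "y": 2.0, "t0": 0.5, "t1": 1.5},
    {"x": 3.0, "y": 4.0, "t0": 1.5, "t1": 2.5},
]
anchor, profile = summary_miniplot(group, ["t0", "t1"])
```

The `profile` values would be plotted against the attribute names, and the resulting small chart drawn at `anchor` on the scatter chart.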
- FIG. 1 is a block diagram of visualization screens or views that are consistent with the present invention.
- FIG. 2a is a block diagram of a computer system and program modules consistent with the present invention.
- FIGS. 2b, 2c, 2d, and 2e are block diagrams of program modules consistent with the present invention.
- FIG. 3 is a flow diagram of processes associated with a data editor consistent with the present invention.
- FIGS. 4a and 4b are screen shots associated with a data editor consistent with the present invention.
- FIGS. 5a-5d are flow diagrams of processes associated with a view editor consistent with the present invention.
- FIGS. 6a-6m are screen shots associated with a view editor consistent with the present invention.
- FIGS. 7a and 7b are flow diagrams of processes associated with an analysis processing module consistent with the present invention.
- FIG. 8 is an example file format consistent with an embodiment of the present invention.
- FIG. 9 is a flow diagram of a clustering process consistent with the present invention.
- FIG. 10 is a flow diagram of a projection process consistent with the present invention.
- FIG. 11 is a table that identifies operations of program modules used in conjunction with the meta data consistent with the present invention.
- FIG. 12 is a flow diagram of a visualization linking process consistent with the present invention.
- FIG. 13 is a flow diagram of a method consistent with the invention for displaying information interactively by using 2-D charts.
- FIG. 14 is a representative user interface screen showing 2-D line charts consistent with the invention.
- FIG. 15 is another representative user interface screen showing 2-D point charts consistent with the invention.
- FIG. 16 is another representative user interface screen showing 2-D line charts linked to a galaxy view consistent with the invention.
- FIG. 17 is a flow diagram of a method consistent with the invention for displaying information interactively by using summary miniplots.
- FIG. 18 is a representative user interface screen showing the use of summary miniplots in a galaxy view.
- FIG. 19 provides an illustration of a multiple query tool visualization according to the present invention.
- FIG. 20 illustrates a process of creating a visualization using the multiple query tool.
- FIG. 21 illustrates a dialog box to set the type of query.
- FIGS. 22A-22C display exemplary parameter-setting dialog boxes for query types shown in FIG. 21.
- FIG. 23 illustrates a query matrix according to an aspect of the present invention.
- FIG. 24 illustrates a visualization of the query matrix of FIG. 23 indexed by records.
- FIG. 25 illustrates a visualization of the query matrix of FIG. 23 indexed by clusters.
- FIG. 26 illustrates a visualization as a three-dimensional view.
- FIG. 27 illustrates a two-dimensional scatter plot of rows vs. values.
- FIG. 28 illustrates the contents of a menu bar, with associated sub-menus, of the visualization of FIG. 19.
- FIG. 29 illustrates examples of functions of a tool bar associated with the visualization of FIG. 19.
- FIGS. 30A and 30B illustrate views of a visualization matrix having a grid and not having a grid, respectively.
- Systems and methods consistent with the present invention are useful in analyzing information that contains different types of data and presenting the information to the user in an interactive visual format that allows the user to discover relationships among the different data types.
- Such methods and systems include high-dimensional context vector creation for representing elements of a dataset, visualization techniques for representing elements of a dataset including methods for indicating relationships among objects in a proximity map, and interaction among datasets including linking the visualizations and a common set of interactive tools.
- The interactions, regardless of data type, among the visualizations and the common set of tools for the interactions are enabled by maintaining meta data, as discussed herein, in a common set of file structures (or database).
- Methods and systems consistent with the present invention may include various visualization tools for representing information used in connection with the present invention.
- A tool for visualizing multiple queries to a database is provided.
- The visual representation of the first record is altered in the second view based on the user input.
- A 2-D line chart is superimposed at a location on a 2-D scatter chart that is based on the location of a group of records on the 2-D scatter chart.
- Other tools consistent with the present invention may be used in conjunction with the methods and systems described herein.
- A record generally refers to an individual element of a data set.
- The characteristics associated with records are generally referred to herein as attributes.
- A data set containing records is generally processed as follows. First, the information represented by the records (including text, numeric, categoric, and sequence/string data) is received in electronic form. Second, the records are analyzed to produce a high-dimensional vector for each record. Third, the high-dimensional vectors may be grouped in space (i.e., a coordinate system) to identify relationships, such as clustering, among the various records of the data set. Fourth, the high-dimensional vectors are converted to a two-dimensional representation for viewing purposes.
- The two-dimensional representation of the high-dimensional vectors is generally referred to herein as “projection.”
- The projections may be viewed in different formats according to user-selected options, as shown by the four views (110, 120, 130, and 140) on display monitor 100 in FIG. 1.
- Systems and methods consistent with the present invention enable a user to select a record in view 110 and cause the corresponding record in another view to be highlighted. For example, selecting a particular record in view 110 causes the corresponding records 122 and 132 to be highlighted in views 120 and 130, respectively.
- The highlighted points may represent different analyses performed on the same records or may represent different data types associated with the records.
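As an illustration only, the linked highlighting described above may be sketched with a shared index from record identifiers to the views that display them. The class name `LinkedViews`, the view labels, and the record identifiers are hypothetical and do not appear in the specification, which maintains this kind of association in its meta data files.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LinkedViews:
    """Hypothetical shared index: record id -> the views that display it."""

    def __init__(self) -> None:
        self.views_by_record: Dict[str, Set[str]] = defaultdict(set)
        self.highlighted: Dict[str, Set[str]] = defaultdict(set)

    def register(self, view: str, record_ids: List[str]) -> None:
        """Note which records a view displays."""
        for rid in record_ids:
            self.views_by_record[rid].add(view)

    def select(self, source_view: str, record_id: str) -> Dict[str, Set[str]]:
        """Highlight record_id in every other view that shows it."""
        for view in self.views_by_record.get(record_id, set()):
            if view != source_view:
                self.highlighted[view].add(record_id)
        # Return a copy so callers cannot mutate the internal state.
        return {v: set(ids) for v, ids in self.highlighted.items()}


links = LinkedViews()
links.register("view_110", ["rec1", "rec2", "rec3"])
links.register("view_120", ["rec1", "rec3"])
links.register("view_130", ["rec1", "rec2"])
result = links.select("view_110", "rec3")
```

Because the index is keyed by record rather than by view, the views may be built from different analyses or different attributes of the same records and still be linked.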
- FIG. 2a depicts a computer system 200 consistent with the present invention.
- Computer programs used to implement methods consistent with the present invention are generally located in a memory unit 210, and the processes of the present invention are carried out through the use of a central processing unit (CPU) 280 in conjunction with application programs or modules.
- Memory unit 210 is representative of read-only memory, random access memory, and other memory elements used in a computer system. For simplicity, many components of a computer system have not been illustrated, such as address buffers and other standard control circuits; these elements are well known in the art.
- Memory unit 210 contains databases, tables, and files that are used in carrying out the processes associated with the present invention.
- CPU 280, in combination with computer software and an operating system, controls the operations of the computer system.
- Memory unit 210, CPU 280, and other components of the computer system communicate via a bus 284.
- Data or signals resulting from the processes of the present invention are output from the computer system via an input/output (I/O) interface 290.
- The computer program modules and data used by methods and systems consistent with the present invention include visualization set up programs 212, processing programs 220, meta data files 230, interactive graphics and tools programs 240, and an application interface 250.
- The visualization set up programs 212 determine the name to be used for a collection of records identified by a user, determine the formats to be used for reading files associated with the records, identify formatting conventions for storing and indexing the records, and determine parameters to be used for analysis and viewing of the records.
- The processing programs 220 transform the raw data of the identified records into meta data, which in turn is used by the interactive visualization tools.
- The meta data files 230 include the results of statistical feature extraction, n-space representation, clustering, indexing, and other information used to construct and interact among the different views.
- The interactive graphics and tools programs 240 enable the user to explore and interact with various views to identify the relationships among records.
- The application programming interface (API) 250 enables the components 212, 220, 230, and 240 to exchange and interface information as needed for use in analysis and visual display.
- The visualization set up programs 212 further include a data set editor 214 and a view editor 216.
- The processing programs 220 further include vector programs 222, cluster programs 224, and projection programs 226.
- The meta data files 230 are a subset of databases and files 260.
- The data set editor 212 enables the user to define the collection of records (i.e., a data set) to be analyzed, identifies the data type, and creates directories for use in organizing the data of the data set.
- The view editor 216 sets up the user's raw data for viewing by the interactive tools and graphics.
- Vector programs 222 create high-dimensional context vectors that represent attributes of the records of the data set.
- Cluster programs 224 group related records near each other in a given space (cluster) to enable a user to visually determine relationships.
- Projection programs 226 convert high-dimensional representations of the records of a data set to a two-dimensional or three-dimensional representation that is used for display.
- The databases and files 260 contain data used in conjunction with the present invention, such as the meta data 230.
- FIG. 3 illustrates an implementation of processes performed to define and enable the formatting of a selected data set, as performed by the data set editor 212 .
- A data file to be used as the source for the subsequent analysis is requested (step 302).
- The process determines and validates the data type indicated by the user (step 310).
- The validation process first determines whether the data of the source data file is in a common sequence data format (step 312). If the data is not in one of the common sequence data formats, the process determines whether the data is an array of data consisting of numeric, categoric, sequence, or text data (step 314). If the data is not a data array, the process determines whether the data is free form text (step 316). If the data is not free form text (step 316), an error message is generated (step 320).
- The process determines whether the sequence data is in a FASTA file format (step 322) or whether the sequence data is in a SwissProt file format (step 324).
- An example FASTA input file is provided in Appendix B. The operations and data associated with processing sequence data are discussed in more detail in U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Methods and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material” filed on Sep. 30, 1999, by Jeffrey Saffer, et al.
- If the sequence data is not in either of these formats, an error message is generated (step 320). If, however, the data is either a FASTA file (step 322) or a SwissProt file (step 324), the appropriate formats and delimiters, as discussed herein, are determined to be used for the respective FASTA file or SwissProt file (step 330). After the appropriate format/delimiters for the data type are determined (step 330), the corresponding format file/record delimiters are established (step 340). The format file/record delimiters specify the valid formats for reading the files and identify the meta data files that are to be used for subsequent processing of the data set as discussed herein.
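The validation order described above (sequence format, then data array, then free form text, otherwise an error) may be sketched as follows. This is a crude heuristic written for illustration, not the validation logic of the specification. It relies only on two widely documented conventions: a FASTA record begins with a `>` header line, and a Swiss-Prot flat-file entry begins with an `ID` line and ends with a `//` line. The function name `detect_data_type`, the returned labels, and the candidate delimiters are assumptions.

```python
def detect_data_type(text: str, delimiters: str = ",\t:") -> str:
    """Classify a source file as sequence, array, or free form text (heuristic)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty source file")
    # Sequence formats are checked first, as in the described flow.
    if lines[0].startswith(">"):
        return "sequence/fasta"
    if lines[0].startswith("ID ") and any(ln.strip() == "//" for ln in lines):
        return "sequence/swissprot"
    # Array check (assumption): several lines with the same nonzero delimiter count.
    if len(lines) > 1:
        for d in delimiters:
            counts = {ln.count(d) for ln in lines}
            if len(counts) == 1 and counts.pop() > 0:
                return "array"
    # Free form text check (assumption): the content contains letters.
    if any(ch.isalpha() for ch in text):
        return "text"
    # Nothing matched: corresponds to generating an error message.
    raise ValueError("unrecognized data type")
```

For example, `detect_data_type(">seq1\nMKT\n")` returns `"sequence/fasta"`, while a file of digits with no delimiters raises `ValueError`.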
- A file directory 360 is created for storing the meta data files associated with the data set (step 350).
- The file directory 360 includes a document catalog file (DCAT) 362 and a data set properties file 364.
- The DCAT file 362 is used as a master index for all records in the data set.
- The indexes stored in the DCAT file are used to integrate the information associated with the various views selected for the data set.
- The DCAT file 362 contains indexes that associate all the data of a data set with a particular view, although only a subset of the data set is used to create the view.
- The properties file 364 is also produced and stored in the file directory and contains information about the source data files for the view, including their type (corpus type), the number and full path (location) for the source files, the format used, and the date created. In addition, the properties file keeps track of subsequently processed views, including the subdirectory where those views reside.
- An example properties file is provided in Appendix A.
- FIGS. 4a and 4b depict exemplary screen shots presented on a display monitor to a user for defining a new data set (i.e., collection of records) using data set editor 212.
- A user names and defines a data set using the data set editor 212.
- A graphical interface screen 400 is presented to a user for use in defining options or parameters associated with the data set. For example, graphical interface screen 400 is presented to a user when the user selects the sources tab 410.
- The user may enter a name for the data set in a field 412 and may specify the data set type as indicated by the selection options 414, such as array data, protein or nucleotide sequences, or text.
- The source of this data set may be specified in the field 418 as indicated by the directory and subdirectory specification 420.
- The user may select the add, view, or delete options 424 to perform the function indicated by the name on the data set source.
- The user may save the data as indicated by the option 426 or continue to a new view as indicated by the option 428.
- The user may specify how fields contained within the source file are delimited by selection of a field delimiter option 442.
- The field delimiter options illustrated include an option to delimit the field by a colon, comma, space, tab, or a user-defined delimiter.
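A minimal sketch of applying the chosen field delimiter when reading a source file follows, using Python's standard `csv` module. The option names in `NAMED_DELIMITERS`, the `"user"` option label, and the function name `split_fields` are hypothetical; the sketch also assumes a single-character delimiter, which is a restriction of `csv.reader` rather than of the specification.

```python
import csv
import io
from typing import List, Optional

# Hypothetical option names mapped to the delimiters listed above.
NAMED_DELIMITERS = {"colon": ":", "comma": ",", "space": " ", "tab": "\t"}


def split_fields(source: str, option: str, custom: Optional[str] = None) -> List[List[str]]:
    """Split each line of `source` into fields using the selected delimiter."""
    delimiter = custom if option == "user" else NAMED_DELIMITERS[option]
    if not delimiter or len(delimiter) != 1:
        raise ValueError("csv.reader needs a single-character delimiter")
    reader = csv.reader(io.StringIO(source), delimiter=delimiter)
    return [row for row in reader if row]  # skip blank lines


rows = split_fields("gene1:0.5:1.5\ngene2:0.7:1.1\n", "colon")
custom_rows = split_fields("gene1|0.5\n", "user", custom="|")
```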
- FIG. 5a illustrates a preferred implementation of a process used for creating parameters to be used in defining the type of analyses or views for a data set, as performed by view editor 216.
- The user may enter this information using a graphical interface as depicted in FIG. 619, which shows source file tab 604, format tab 610, preparation tab 630, processing tab 660, clustering tab 680, and projection tab 690, respectively.
- FIG. 6b is a screen display showing the options presented to a user when the format tab 610 is selected.
- The user may provide, in the format file field 610, a file to use for formatting the view such as medline 31.fmt.
- The user may also specify a stop words file such as the default text stop file shown in the field 614. This stop words file is a list of words that the text engine will ignore during analysis.
- The user may input a file to specify the default punctuation of the file as indicated by the default.punc file indicated in the field 616.
- The punctuation file tells the text engine how to handle non-alphabet characters.
- The user may use the default file specified by the system or choose another.
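The roles of the stop words file and the punctuation file may be illustrated with a small tokenizer. The file formats themselves are not described in this section, so the sketch simply takes a list of stop words and a character-to-replacement mapping; the function name `tokenize` and the sample rules are assumptions.

```python
from typing import Dict, Iterable, List


def tokenize(text: str, stop_words: Iterable[str], punctuation: Dict[str, str]) -> List[str]:
    """Lower-case the text, apply punctuation rules, and drop stop words."""
    # Each non-alphabet character is replaced according to its rule,
    # for example by a space (word break) or by nothing (joined).
    table = str.maketrans(punctuation)
    stops = {w.lower() for w in stop_words}
    return [t for t in text.lower().translate(table).split() if t not in stops]


stop_words = ["the", "of", "a", "is"]
punctuation = {",": " ", ".": " ", "(": " ", ")": " ", "-": ""}
tokens = tokenize("The role of T-cell receptors (TCR) is unclear.", stop_words, punctuation)
```

Here the hyphen rule joins "T-cell" into one term, while parentheses and periods act as word breaks.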
- The user may select or view any of the files of the format screen of FIG. 6b by selecting the select option 620 or the view option 622.
- The user is also requested to provide preparation parameters (step 540).
- The processes associated with step 540 are discussed in more detail in FIG. 5b.
- The user may specify vector creation, cluster, and projection parameters to be used in constructing a view (steps 550, 560, and 570, respectively).
- The projection parameters include cluster cohesion, cluster area, and cluster spread.
- Vector creation and clustering parameter processes are discussed in more detail in FIGS. 5c and 5d, respectively.
- The view editor first checks the data type (step 541) by evaluating whether the data is sequence data (step 542). If the data is sequence data, sequence specific preparation information is requested (step 543), such as requesting number and length of n-grams, SEG parameters, substitution filter values, and motif pattern file parameters (step 544). If the data is not sequence data (step 542), the process determines whether the data is numeric data (step 545). If the data is not numeric data, no preprocessing or preparation information is required for text information (step 546). If the data is numeric data, a display screen that requests numeric data and preparation information from a user (step 547) is presented. The numeric preparation data request may include column/row specifications, operation sets, and clustering fields (step 548).
- FIG. 5c illustrates a preferred implementation of the processes associated with gathering vector creation parameters within the view editor 216 (FIG. 2).
- The view editor 216 first checks the data type (step 551). If the data is sequence data (step 552), sequence specific text engine parameters are requested or obtained for the particular data set (step 553).
- The text engine parameters requested may include the number of topics/cross terms, topicality settings, use association t/f parameters, associated matrix threshold parameters, and record filter ranges (step 554).
- The view editor determines whether the data is text data (step 555). If the data is text data, text specific text engine parameters are requested from the user (step 556), such as the text engine parameters discussed above (step 554). If the data is not text data (step 555), no user specified parameters are needed and default parameters may be used (step 557). The text engine parameters may be used if desired (step 554).
- FIG. 5d illustrates a preferred implementation of a process for specifying clustering parameters.
- Various types of clustering may be used, such as k-means or hierarchical clustering, as known to those skilled in the art.
- The view editor 216 presents a display screen to the user for the user to specify the clustering choice (step 561).
- The process determines whether k-means clustering has been chosen (step 562). If k-means clustering is requested (step 562), k-means clustering parameters are requested from a user or obtained (step 563), such as the number of clusters, the number of iterations, the cluster seed method, or whether correlation order is to be used (step 564).
- The process determines whether the user desires hierarchical clustering (step 565), and displays or gets hierarchical clustering parameters (step 566).
- The hierarchical clustering parameters may include the number of clusters or the cluster coherence values to be used and whether the user desires correlation order for the clusters (step 567). If hierarchical clustering is not desired (step 565), no parameters are required (step 568).
- When the preparation tab 630 is selected, the user is presented with a data specification option 632, an operation set option 640, and a clustering selection option 650.
- The user may enter a value for the columns in the field 634.
- The user may identify the type of data, such as numeric data, categorical data, sequence data, or text data, by selecting a data type 635.
- The user may specify the columns 636 in which that data type is located and may specify a field name for that specific data as indicated under the field name 637.
- A predefined selection field 638 may be used to specify the types of data for the field name and columns provided.
- A user may perform any number of mathematical manipulations on the numeric data (one or more manipulations or transformations of the data are referred to as an operation set). These options include various logarithmic operations, methods for normalizing data, methods for filling missing data points, and all algebraic functions. Referring to FIG. 6d, for example, the reciprocal of the value for each numeric data item may be requested and then the logarithm taken for that reciprocal, creating a new field 642 called Operation Set1.
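The reciprocal-then-logarithm example above may be sketched as a chain of named operations applied to each value. The operation names, the choice of a base-10 logarithm, and the use of `None` for values that cannot be transformed are assumptions of this sketch; the specification also mentions normalization and missing-value filling, which are not shown.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Operation = Callable[[float], float]

# Hypothetical registry of named operations.
OPERATIONS: Dict[str, Operation] = {
    "reciprocal": lambda v: 1.0 / v,
    "log10": math.log10,
    "ln": math.log,
}


def apply_operation_set(values: Sequence[Optional[float]],
                        steps: Sequence[str]) -> List[Optional[float]]:
    """Apply the named operations in order to each value; None marks a failure."""
    results: List[Optional[float]] = []
    for value in values:
        result: Optional[float] = value
        try:
            for step in steps:
                result = OPERATIONS[step](result)
        except (TypeError, ValueError, ZeroDivisionError):
            # Missing input, logarithm of a non-positive number, or division by zero.
            result = None
        results.append(result)
    return results


# "Operation Set1": reciprocal of each value, then the logarithm of that reciprocal.
operation_set1 = apply_operation_set([10.0, 100.0, 0.0, None], ["reciprocal", "log10"])
```

The first two results are approximately -1 and -2; the zero and the missing value both come back as `None`, leaving them for a separate missing-data step.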
- FIG. 6e shows the screen displayed if the clustering selection tab 650 is selected.
- The user is presented with a set of field/trench forms 652 for which clustering operations may be applied.
- Operation set 1 or numeric field name 1 may be chosen for clustering.
- The user may have motifs/n-grams, complexity filtering, exclusions, and amino acid substitutions options from which to select. Operation on or with sequence data is discussed in more detail in U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Method and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material” filed Sep. 30, 1999, which is expressly incorporated herein by reference. If the user wants to represent the sequence as a high-dimensional vector based on the occurrence of functional or structural motifs, a file is specified which defines those motifs.
- The user can have that vector based on the number of occurrences of each motif or, if desired, have the vector based on a binary format (the motif is either there or not) by checking the single motif output option.
- The user may specify any combination of overlapping n-grams to be created to represent the sequence in field 654.
- The user also has the option to specify whether the n-gram should be included based on number of occurrences within the sequence. If neither motif nor n-gram options are selected, the program will analyze the text (e.g., annotations) associated with the sequence records.
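Counting overlapping n-grams of several lengths, with an occurrence threshold, may be sketched as follows. The function name `ngram_counts` and its parameters are hypothetical. The `binary` flag mirrors the present-or-absent output described above for motifs; applying it to n-grams is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def ngram_counts(sequence: str, sizes: Iterable[int],
                 min_count: int = 1, binary: bool = False) -> Dict[str, int]:
    """Count overlapping n-grams of the given sizes within one sequence."""
    counts: Counter = Counter()
    for n in sizes:
        # Slide a window of width n one position at a time (overlapping n-grams).
        for i in range(len(sequence) - n + 1):
            counts[sequence[i:i + n]] += 1
    # Keep only n-grams that occur often enough within the sequence.
    kept = {gram: c for gram, c in counts.items() if c >= min_count}
    # Binary output records presence only, not the number of occurrences.
    return {gram: 1 for gram in kept} if binary else kept


features = ngram_counts("MKTMKT", sizes=(2, 3), min_count=2)
presence = ngram_counts("MKTMKT", sizes=(2, 3), min_count=2, binary=True)
```

Each retained n-gram would become one dimension of the high-dimensional vector for the sequence.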
- The complexity filtering options provide the user the ability to include the entire sequence or eliminate regions of low or high complexity, for example, using the public domain tool SEG.
- The user may also specify certain records to be excluded, for example, based on sequence length or title, by selecting options in the exclusion interface.
- The use of amino acid or nucleotide substitutions can be defined in the Amino Acid Substitution interface.
- The user may use a sliding scale to specify the magnitude or weight to give to associations as indicated by the association field 672.
- The user may enter the number of topics to be used in the field 674.
- The topics are the features that describe the vectors. For text, these are the vocabulary words that best describe the thematic content of the records; for sequences, the topics are the n-gram vocabulary words that best distinguish one sequence from another.
- The user may specify the requested number of cross terms as indicated in the field 676. Cross terms are the vocabulary words that are not topics.
- The user may specify the number of times that a term must appear in a record before being identified as a topic; an upper limit may be included as well, as indicated in the fields 678a and 678b.
- The user may specify the number of times that the terms must appear in other documents by specifying a lower limit in field 679a and an upper limit in field 679b.
- These fields are used as filtering fields for processing.
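One way to read the lower and upper limits on appearances in other documents is as a document-frequency filter on the vocabulary. The sketch below illustrates that reading only; the function name `filter_terms`, the parameter names, and the inclusive treatment of both limits are assumptions, and the topicality calculation itself is not shown.

```python
from collections import Counter
from typing import List, Set


def filter_terms(docs: List[List[str]], min_docs: int, max_docs: int) -> Set[str]:
    """Keep terms whose document frequency lies within [min_docs, max_docs]."""
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term at most once per record
    return {term for term, df in doc_freq.items() if min_docs <= df <= max_docs}


docs = [
    ["kinase", "binding", "protein"],
    ["kinase", "transport", "protein"],
    ["membrane", "transport", "protein"],
]
candidates = filter_terms(docs, min_docs=2, max_docs=2)
```

In this example "protein" is dropped because it appears in every record and so distinguishes none of them, while "binding" and "membrane" are dropped as too rare.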
- the topicality method for FIG. 6 g is ‘Specify the settings by the number of terms.’
- the topicality method for the processing option is specified as ‘Specify the settings by threshold.’
- the user may use the sliding scale field 680 to specify the number of associations needed.
- the user may use a sliding scale input for identifying the minimum topicality for topics weight and the minimum topicality for cross terms as indicated by the fields 682 and 684 , respectively.
- the user may specify upper and lower limits for defining the number of appearances to trigger identification for topics and cross terms, as indicated by the fields 686 a , 686 b , 688 a , and 688 b.
- the user may specify a topicality method that automatically calculates the setting for the view all indicated in the display screen illustrated.
- the user may use a sliding scale selection field that specifies the weights of association as indicated by the field 689 .
- the user may specify the weights of association for the topicality method that automatically calculates the settings with emphasis on local topics.
- the user may specify the preferred clustering method such as hierarchical or k-means.
- the user may select an option to compute clusters based on coherence.
- the user may indicate the number of clusters, and the cluster coherence.
- the user may also select whether to correlate the order after clustering.
- FIG. 6 l illustrates the graphical interface used for specifying the parameters of the k-means.
- the user may specify the number of clusters or the number of iterations to be used for the k-means.
- the user may select the cluster seeding parameters such as using random seeding or using dimensional seeding.
- the seeding may also occur by using the computer's internal clock (system time) to seed the random number generator.
- the user may alternatively specify a value for the random generator seed.
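- The two random seeding choices (system time or a user-supplied value) might look like the following sketch. The helper names `make_rng` and `pick_seed_records` are assumptions for illustration, and dimensional seeding is not shown because the text does not describe how it works.

```python
import random
import time


def make_rng(seed=None):
    """Return (rng, seed); fall back to the system clock when no seed is given."""
    if seed is None:
        seed = int(time.time())  # system time used as the seed
    return random.Random(seed), seed


def pick_seed_records(vectors, k, seed=None):
    """Choose k distinct record positions to act as initial cluster seeds."""
    rng, used_seed = make_rng(seed)
    return rng.sample(range(len(vectors)), k), used_seed


if __name__ == "__main__":
    data = [[0.0], [1.0], [2.0], [3.0]]
    print(pick_seed_records(data, 2, seed=42))
```

- Returning the seed that was actually used lets a time-seeded run be repeated later by supplying that value explicitly.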
- the user may select the type of projection to use by selecting the projection tab 695 .
- the user may select cluster cohesion, cluster area, or cluster spread.
- the user may use a weighted scale for each of the options to identify the weight to be associated with each projection option.
- FIG. 2 b illustrates vector creation engines consistent with the present invention.
- vector creation programs 222 include a numeric engine 222 a , and a text engine 222 b.
- sequence data is preprocessed (step 702 ) prior to data being input into the text engine.
- sequence data is modified to a form that is acceptable to the text engine for generating the high-dimensional context vectors.
- High-dimensional context vectors are created based upon the attributes of the objects or records to be used for a view and vector indices that correspond to the particular view are created and stored in a vector file associated with the data set (step 706 ).
- the vectors are clustered using known clustering programs based upon information from the vector files (step 708 ).
- the cluster assignment file (.hcls), as discussed below, is created (step 708 ).
- Two dimensional coordinates of the records and centroids are calculated for creating a two dimensional projection of the clustered vectors (step 710 ).
- Two dimensional coordinate files are created (.docpt) for each document.
- each type of data is represented in that manner.
- the vector representation is simply the values associated with each record attribute.
- the vector representation can be based on any method that translates categorical values or the distances between values as a number.
- the vector representation can be derived by latent semantic indexing as known to those skilled in the art or by related methods, such as described in U.S. patent application Ser. No. 08/713,313, entitled “System for Information Discovery,” filed on Sep. 13, 1996, now issued as U.S. Pat. No. 6,772,170.
- the context vector can be derived from any combination of numerical or categorical attributes of the sequence or by methods described herein.
- a user skilled in the art will recognize that the vectors created for each record do not have to be created from a single data type. Rather, the vectors can be created from mixed mode data, such as combined numeric and text data.
- (1) Files are binary, and remain within a directory established for the analysis; (2) IDs and positions are 0-based; (3) Terms have been converted to lowercase, and are listed in ascending lexical order; (4) Record IDs are listed in ascending order; (5) Index files (.<x>_index) contain cumulative counts of records written to the file they are indexing (.<x>). This cumulative count is for the current record and all previous records. This cumulative count is equivalent to the record no. of the next record; (6) Internal numerical representations in the Sun Microsystems operating system are: TermID (4 bytes), TermCount (4), DocID (4), DocCount (4), streampos (4), double (8).
- the visualization methods keep track of the location of the record representation and may use an object-oriented design.
- One type of visualization that is especially effective with high-dimensional data is a proximity map or a galaxy view. This and related visualizations can take advantage of methods to group the records in the high-dimensional space (clustering) and to project the arrangement of objects in high-dimensional space to two or three dimensions (projection).
- Clustering can be by any of a number of methods including partition methods (such as k-means) or hierarchical methods (such as complete linkage). Any of these types of methods can be used with the present invention. Despite the different methods, the computational processes that carry out the clustering create a common set of meta files that allow the chosen visualization method to access the clustering information, regardless of original data type.
- the files produced during cluster analysis are: .hcls (cluster assignment file) This file contains the assignments for each record to a cluster.
- the format of the file is as follows: Number of total Clusters For each cluster (in correlation order) Cluster ID Cluster vector as determined by taking the average of the record vectors assigned to the cluster Number of Records in the Cluster The record id's of the records assigned to the cluster
- .hcls file 9 (number of clusters) 6 (cluster ID) 0.0457451 0.0399342 0.0864002 0.0652852 0.0635923 0.0429373 0.0650352 0.0661765 0.0487868 0.0885645 0.10 0173 0.0482019 0.048553 0.091455 0.0991594 (cluster vector) 4 (number of records in the cluster) 7 (record ID) 4 (record ID) 3 (record ID) 5 (record ID) 5 0.0392523 0.0364486 0.0897196 0.0626168 0.0598131 0.0364486 0.0616822 0.0794393 0.0448598 0.0925234 0.11 215 0.0429907 0.0420561 0.0962617 0.103738 1 6 1 0.0341207 0.0209974 0.0918635 0.06824
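- The cluster assignment layout described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the clusters are emitted in the order encountered rather than in correlation order, and the text rendering is for readability (the description states that the actual files are binary).

```python
def build_cluster_entries(record_vectors, assignments):
    """Group records by cluster and average their vectors.

    record_vectors: dict of record id -> list of floats
    assignments:    dict of record id -> cluster id
    """
    members = {}
    for record_id, cluster_id in assignments.items():
        members.setdefault(cluster_id, []).append(record_id)

    entries = []
    for cluster_id, record_ids in members.items():
        count = len(record_ids)
        columns = zip(*(record_vectors[r] for r in record_ids))
        # Cluster vector = average of the record vectors assigned to it.
        centroid = [sum(col) / count for col in columns]
        entries.append({"cluster_id": cluster_id,
                        "vector": centroid,
                        "record_ids": record_ids})
    return entries


def render_entries(entries):
    """Lay the entries out in the order: total, then ID, vector, count, record IDs."""
    lines = [str(len(entries))]
    for entry in entries:
        lines.append(str(entry["cluster_id"]))
        lines.append(" ".join(f"{x:g}" for x in entry["vector"]))
        lines.append(str(len(entry["record_ids"])))
        lines.extend(str(r) for r in entry["record_ids"])
    return lines


if __name__ == "__main__":
    vectors = {0: [0.0, 0.0], 1: [2.0, 2.0], 2: [10.0, 10.0]}
    print(render_entries(build_cluster_entries(vectors, {0: 6, 1: 6, 2: 5})))
```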
- Projection can also be by any number of methods, for example, multidimensional scaling. Like cluster analysis, a specific projection method is not required for use with the present invention. However, as with clustering, the results of that projection are stored in a common format so that the visualization operations can retrieve the data independent of the original data type.
- An example cluster file 6 0.770783 0.831761 5 1 1 1 1 0.920542 0.989886 3 0.073888 0.210541 7 0.0206639 0.109404 4 0 0.13854 0 0.0187581 0.153266 2 0.139079 0.0695485 8 0.374849 0
- the present invention enables linkage among all visualizations and data types (text, categorical, numerical, or sequence).
- Prior methods simply enabled linkage between views of the same data visualized using different attributes or visualizations.
- other attributes or descriptors for each data record are linked and readily available for interaction. These interactions are possible with any of the data types. That is, additional attributes related to a record, as well as those used for vector creation, are equally available regardless of data type. This is accomplished through the use of a common set of file or database structures created by the numeric or text engines. These files store information about each record attribute, which itself can be any of the data types. These files are created during an initial processing of the data and are independent of the specific visualization method to be employed. These files provide a common framework that can be addressed by any visualization or interactive tool through an API.
- the files created to store and manage the ancillary data, such as data not used in creating a view, are: .headings (used for data input through a matrix array only) for each record (line number-1 is the record id) name of the column heading .vocab (text) for each term in the view term (i.e., a word) .vocab_index for each term in the view cumulative no.
- the inverted file index consists of .ifi and .ifi_index files. Each index is a list of the cumulative number of records in the data file.
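- A minimal sketch of an inverted file with a cumulative-count index is given below. The names are hypothetical, the in-memory lists only stand in for the .ifi and .ifi_index files, and the big-endian byte order used when packing 4-byte integers is an assumption that the description does not state.

```python
import struct


def build_inverted_index(term_records):
    """term_records: dict of term -> iterable of record ids.

    Returns (terms, postings, cumulative), where cumulative[i] is the number
    of postings written for terms 0..i inclusive.
    """
    terms = sorted(term_records)  # ascending lexical order
    postings, cumulative = [], []
    for term in terms:
        postings.extend(sorted(set(term_records[term])))  # ascending record IDs
        cumulative.append(len(postings))
    return terms, postings, cumulative


def records_for(term_id, postings, cumulative):
    """Slice out one term's records using only the cumulative counts."""
    start = cumulative[term_id - 1] if term_id > 0 else 0
    return postings[start:cumulative[term_id]]


def pack_ints(values):
    """Pack values as 4-byte unsigned integers (byte order is an assumption)."""
    return struct.pack(">%dI" % len(values), *values)


if __name__ == "__main__":
    terms, postings, cumulative = build_inverted_index(
        {"kinase": [3, 1], "actin": [2]})
    print(terms, postings, cumulative)
    print(records_for(1, postings, cumulative))
```

- Because each index entry equals the position at which the next term's data begins, a lookup needs only two adjacent index values to locate a term's record list.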
- files provide indexing of and access to the textual information associated with each record including the distribution of keywords within each record and co-occurrences of those keywords. Furthermore, the files provide a catalog of all the categorical data including the distribution of the values. For numerical attributes not used in the actual vector representation, additional files are created using the .docv format so that this type of ancillary information will also be readily available to establish interaction among the various views.
- the text engine ( 730 ) creates the files associated with text or categorical fields.
- the expected input for the text engine (block 730 ) is a tagged formatted file.
- the input is either the original format for the input or the result of a processing step to identify the beginning and end of each record along with special information, such as the record title.
- An example original input file to the text engine is provided in Appendix C.
- a software module ( 724 ) reformats the input file to contain a series of fields that delineate the initial input and meta data created for the vector representation ( 726 ).
- the reformatting and processing of sequence data is discussed in more detail in the U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Method and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material” filed Sep. 30, 1999, which is incorporated herein by reference.
- the text engine ( 730 ) is able to create all the required meta data files.
- Numerical data, or any other data presented in a data matrix, ( 750 ) is received at the numeric engine ( 752 ).
- the data in the input file can be tab delimited or use any other delimiter.
- the numeric engine ( 752 ) creates the record vectors for data presented in a data matrix instead of the text engine.
- the user may specify other columns within the table that can contain textual, sequence, or categorical information or additional numerical data that will not be used for the vector created.
- each row in the table becomes a record; however, the user can choose to make each column the record.
- Each user-defined set of columns becomes an attribute (also called fields) within the record.
- a set of numeric columns is specified by the user for subsequent clustering.
- the other fields, which can be numeric, text, categorical, or sequence will become attributes of the record that can be queried, listed, or otherwise made available within the interactive tools.
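- The split between the columns used for the vector and the remaining attribute columns can be sketched as follows. The function name `split_table`, the column names, and the use of a header row are assumptions made for the example; the description only says that the table is delimited and that the user names the numeric columns.

```python
import csv
import io


def split_table(text, vector_columns, delimiter="\t"):
    """Turn each table row into (vector, attributes).

    vector_columns: names of the numeric columns chosen for clustering.
    All other columns are kept as queryable record attributes.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    vectors, attributes = [], []
    for row in reader:
        vectors.append([float(row[c]) for c in vector_columns])
        attributes.append({k: v for k, v in row.items()
                           if k not in vector_columns})
    return vectors, attributes


if __name__ == "__main__":
    table = "id\trbc\twbc\tnote\n122C\t4.5\t6.1\tnormal\n"
    print(split_table(table, ["rbc", "wbc"]))
```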
- when categorical data is specified by the file format ( FIG. 8 ), as indicated by the index 804 for the view used, the categorical data is processed during the text engine processing steps for all types of data.
- the categorical data shown in FIG. 8 records where each unique character string in the categorical field occurs in the data set. Thus, subsequent categorical tools are enabled to correlate various records based upon the categorical values.
- Each field expected in the input file is defined by a section beginning with ⁇ F followed by the field number (e.g., ⁇ F0).
- the name is defined (in this case, title).
- the type of field is defined; this could be string (text or categorical), numeric, or sequence.
- the delimiter tag for the field is defined.
- the METHOD line indicates whether the field is on a single line or continues to the next field.
- the DOC_VECTOR line tells the clustering module whether to use this information in the cluster analysis.
- the next item designates whether the field should be accessible within the query tools.
- the CORR line determines whether the contents of the field should be indexed for all possible associations.
- the next item defines whether the content is case sensitive or not.
- the numeric engine ( 752 ) is executed on the set of columns that the user specified for clustering.
- the numeric engine ( 752 ) performs any number of user defined mathematical operations and creates a record vector that is identical in format to those produced for sequence or text data.
- the vector creation in the numeric engine ( 752 ) utilizes a user specified set of columns from the user's column/row formatted source file.
- the numeric engine automatically creates a text engine compatible source file (i.e., reverse engineered tagged text file, 754 ), and corresponding format file ( 756 ) from the input column/row formatted table.
- An example format file produced from the numeric engine is shown in Appendix D.
- the new tagged text source file and format files ( 726 ) are used so that any text, categorical, or sequence information that may have been embedded within the original column/row files, can be processed by the same programs that operate on text, categorical, or sequence information.
- This subsequent processing is performed by the text engine ( 730 ), which reads the reverse-engineered tagged text source file and indexes the textual and/or categorical data fields within each record ( 732 , 734 and 736 ).
- the result is a standardized set of meta data which is related to the user source data and which is available to all tools regardless of data type.
- numeric engine processes numerical data
- the processing steps of the numeric engine place any of the other data types (text, categorical, or sequence) into an appropriate tagged field in the data file so that the text engine will handle it appropriately.
- if the data input is array data (column/row formatted tables), the array data is processed by the numeric engine ( 752 ).
- the numeric engine 752 creates a second vector that is identical to the format of the context vectors for sequence and text data produced by the text engine ( 730 ).
- the numeric engine 1052 accepts a user defined series of mathematical operations to be performed on specified columns of the array data source file.
- a format file is produced and a tag text format file is produced for the non-numeric contents associated with the numeric file.
- the associated non-numeric contents is used as an input to the text engine and the output is associated with the numeric data.
- the textual or categorical data associated with the numeric array data may be indexed and associated with the data as produced for other text data sets that are input to the text engine ( 730 ).
- Plain text data should be in a tagged text format and does not require any pre-processing prior to input to the text engine ( 730 ).
- FIG. 2 c illustrates clustering programs.
- Three clustering modules or options k-means 224 a , cluster-sid 224 b , and correlation order 224 c are provided.
- the clustering options may have a set of user definable parameters.
- the k-means module 224 a clusters documents by establishing a user specified number of seed clusters and then iteratively assigns documents to those clusters until a user specified number of iterations is reached or the process/algorithm determines that all the documents have been assigned to the clusters.
- the k-means module 224 a moves documents to minimize the sum of squares between objects and centroids as known by those skilled in the art.
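- The k-means behaviour described above can be sketched with the standard textbook procedure. This is not the module's actual code: the function names, the random choice of seed records, and the stopping rule (no assignment changes, or an iteration cap) are assumptions for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def k_means(vectors, k, max_iterations=20, seed=0):
    """Assign each vector to one of k clusters; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Seed clusters: k distinct records chosen at random.
    centroids = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]
    assignment = None
    for _ in range(max_iterations):
        # Move every record to its nearest centroid (reduces the sum of squares).
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # stable: no record changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(k_means(data, 2))
```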
- the cluster-sid 224 b is an agglomerative/hierarchical clustering method that minimizes the maximal between-cluster distance (farthest neighbor method).
- the output of the clustering process is a file containing a correlation ordered list of clusters and the record IDs of their members. Those skilled in the art will recognize that other clustering algorithms can be used.
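- The farthest neighbor (complete linkage) idea can be sketched as a simple agglomerative loop. The code below is a generic illustration and makes no claim about how the cluster-sid module is implemented; the function name, the Euclidean distance, and the stopping rule (a target number of clusters) are assumptions.

```python
import math


def complete_linkage(vectors, num_clusters):
    """Merge clusters until num_clusters remain; returns lists of record positions."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the pair whose farthest members are closest together.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    print(complete_linkage([[0], [1], [10], [11], [50]], 3))
```

- This pairwise search is quadratic in the number of clusters at every merge, so it is suitable only as an explanation of the criterion, not for large data sets.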
- FIG. 9 shows a clustering process performed by the processing unit.
- a vector file is received from the stored context vector files (step 760 ) at the cluster implementer (step 904 ).
- the user specified clustering parameters are retrieved from stored files (step 906 ) and the clustering program and parameters associated with the files are determined (step 908 ).
- the clustering parameters associated with the clustering program are provided to the cluster implementer (step 904 ) and the clustering program associated with the vector file of the data set is selected (step 910 ).
- the clustering programs are chosen from a k-means clustering program (block 912 ), a hierarchical clustering program (block 914 ), or no clustering is selected (block 916 ).
- a cluster assignment file (.hcls) is created (step 920 ).
- FIG. 2 d illustrates projection programs 226 .
- Systems consistent with the present invention may apply three separate processes to produce the meta data used to produce visualizations. These processes are carried out by three modules, the PCA-clusters module 226 a , a triangulation module 226 b , and a document projection module 226 c .
- the PCA-clusters module 226 a determines the principal components for each cluster and then determines the two dimensional coordinates for projecting the cluster centroids as known to those skilled in the art.
- the triangulation module 226 b determines the boundaries for the area around each cluster centroid.
- the doc projection module 226 c determines the x,y projection coordinates for each record in the visual analysis.
- the cluster assignment file (.hcls) is retrieved from storage (step 1002 ) and the principal component analysis of the cluster centroid vectors is performed (step 1004 ).
- Two dimensional coordinates for the cluster (.clster) are created (step 1008 ).
- Delaunay triangulation is performed (step 1010 ) based on the vector file retrieved from storage (step 1012 ) that is associated with the data set.
- Nearest neighbor assignments are associated with the Delaunay triangulation results (step 1014 ).
- the projection program determines the two dimensional coordinates for each record (step 1018 ) based upon the vector files retrieved from storage (step 1012 ).
- the projection program also accesses and retrieves the cluster assignment file (.hcls) (step 1020 ) associated with the data set.
- the two dimensional coordinates for the group of documents of the data set are stored in a document file (.docpt) (step 1030 ).
- the interactive tools and graphics modules 240 include a galaxy module 240 a , a master query module 240 b , a plot data module 240 c , a record viewer module 240 d , a query (word) module 240 e , a query by example module 240 f , a group module 240 g , a gist module 240 h , and a surface map module 240 i.
- the galaxy module 240 a displays records as a scatter plot.
- the master query module 240 b applies a correlation algorithm to all indexed categorical data and creates a two dimensional matrix with values of a category along each axis. At each intersection in the matrix, a rectangle is drawn with sections colored to show the correlation between the categories.
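- The matrix built by the master query module can be approximated with a plain cross-tabulation. The correlation algorithm itself is not specified in the text, so the sketch below substitutes simple co-occurrence counts; the function name and the record layout (one dictionary per record) are assumptions.

```python
from collections import Counter


def cross_tabulate(records, field_a, field_b):
    """Count how often each value of field_a occurs with each value of field_b.

    Returns (row_labels, column_labels, matrix) with labels in sorted order.
    """
    counts = Counter(
        (r[field_a], r[field_b])
        for r in records
        if field_a in r and field_b in r
    )
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    matrix = [[counts[(a, b)] for b in cols] for a in rows]
    return rows, cols, matrix


if __name__ == "__main__":
    data = [
        {"organism": "human", "tissue": "liver"},
        {"organism": "human", "tissue": "liver"},
        {"organism": "mouse", "tissue": "brain"},
    ]
    print(cross_tabulate(data, "organism", "tissue"))
```

- Each matrix cell corresponds to one intersection in the display; a drawing layer could map the count (or a derived correlation score) to the colored sections of the rectangle.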
- the following are analytical tools.
- the plot data module 240 c displays a two dimensional line plot of the n-dimensional vectors created for analysis by the user; this is done for all records in the analysis or just those selected by the user. This module can also be used to examine any ancillary numerical attributes associated with the records.
- the record viewer module 240 d displays a list of the currently selected documents, displays a text of a document, highlights terms selected by other tools, such as the query tool 240 e .
- the query tools 240 e and 240 f enable the user to input requests to search for information that has been represented by a vector during the processing and analysis of the user's data set.
- the query tools 240 e and 240 f compare the user input to vectors representing the processed data set.
- the query tool 240 f performs Boolean or phrase queries in any text or categorical field based on a user's input.
- the query tool 240 f also performs n-space queries based on the user's input and compares the input to the n-dimensional vector used for clustering.
- the numeric query tool 240 f performs queries based on numeric values.
- the group tool 240 g enables users to create groups of records of a data set, based on queries or based on user selections, and colors the groups for display in the galaxy visualization created by the galaxy module 240 a .
- the gist tool 240 h determines the most frequently used terms in the currently selected set of records.
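- A gist-style summary reduces to counting terms over the selected records. The sketch below assumes a simple lowercase, alphanumeric tokenization and an optional stop-word list; both are illustrative choices, and the function name `gist` is hypothetical.

```python
import re
from collections import Counter


def gist(records, selected_ids, top_n=5, stop_words=()):
    """Return the top_n most frequent terms among the selected records.

    records: dict of record id -> text
    """
    counts = Counter()
    for record_id in selected_ids:
        terms = re.findall(r"[a-z0-9]+", records[record_id].lower())
        counts.update(t for t in terms if t not in stop_words)
    return counts.most_common(top_n)


if __name__ == "__main__":
    texts = {0: "Kinase binds kinase", 1: "kinase domain", 2: "actin filament"}
    print(gist(texts, [0, 1], top_n=2))
```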
- the surface map module 240 i provides a surface map that shows records and a plurality of attributes associated with those records.
- a table is shown that illustrates meta data files that result from statistical analyses and indexing of the data sets consistent with an embodiment of the present invention.
- the table also depicts the meta data files that are used for the various interactive tools and graphics modules. All of the meta data files except for the tab delimited column/row file, the tagged text source file(s), and the re-engineered tag text file are defined by the data set name or view name as created by the data set editor 314 or view editor 316 ( FIG. 2 ) plus an “.extension,” such as [data set name].dcat or [view name].cluster.
- the meta data files include a data set name.dcat file, a data set name.properties file, a view name.clsp file, a view name.cluster file, a view name.corrv file, a view name.dcat file, a view name.docpt file, a view name.docterm file, a view name.docterm index file, a view name.docv(vector) file, a view name.edge file, a view name.fieldoff file, a view name.gif file, a view name.groups file, a view name.fmt file, a view name.hcls file, a view name.headings file, a view name.ifi file, a view name.ifi index file, a view name.properties file, a view name.punc file, a view name.rel file, a view name.repository file, a view name.stop file, a view name.tl file, a view name.topic file,
- the table indicates which program modules create, read or update files as indicated by the letters C, R, and U, respectively.
- the view name.clsp file is created by the view editor 216 ( FIG. 2 b ) and is read by the k-means module 224 a and the cluster-sid module 224 b ( FIG. 2 c ) and is read by the galaxy module 240 a ( FIG. 2 e ).
- the view name.groups file is updated by the group module 240 g . All file access is performed through the API layer ( FIG. 2 a ).
- a system operating according to the present invention enables a user to identify relationships among different visualizations or views by maintaining all attributes associated with the data record for indexing although all attributes are not used in creating the visualization. Referring to FIG. 12 , the processes consistent with an embodiment of the present invention used to link different visualizations or views are discussed.
- the user may request to identify the relationships that exist between the attributes used to create the current visualization with the attributes used to create another visualization (step 1202 ).
- an index file associated with the user's current view or data set is accessed (step 1210 ).
- the process determines whether objects selected by the user in the current view, such as by initiating a query, correspond to objects of a target view based upon all of the attributes contained in the index file (step 1220 ). If objects of the target view or file correspond to the selected objects of the current view, the objects of the target view are highlighted (step 1230 ). Therefore, relationships among attributes of data records other than those used in creating the visualization can be used to identify relationships of another visualization as discussed in connection with FIG. 1 .
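- The linking step can be sketched as a lookup against a shared index. The data layout here (an index mapping each record ID to a dictionary of attributes) and the function name are assumptions; the sketch shows two illustrative matching rules, by record identity and by a shared attribute value.

```python
def records_to_highlight(selected_ids, target_view_ids, index, attribute=None):
    """Return the target-view records that correspond to the current selection.

    index: dict of record id -> dict of attribute name -> value
    attribute: if given, match on equal values of this attribute instead of
               matching on record identity.
    """
    selected = set(selected_ids)
    if attribute is None:
        return sorted(selected & set(target_view_ids))
    wanted = {index[r][attribute] for r in selected
              if attribute in index.get(r, {})}
    return sorted(
        r for r in target_view_ids
        if attribute in index.get(r, {}) and index[r][attribute] in wanted
    )


if __name__ == "__main__":
    idx = {1: {"organism": "human"}, 2: {"organism": "mouse"},
           3: {"organism": "human"}}
    print(records_to_highlight([1], [1, 2, 3], idx))
    print(records_to_highlight([1], [1, 2, 3], idx, attribute="organism"))
```

- Because the match uses the index rather than the attributes that built either view, the attribute chosen for linking does not have to be one used in creating the visualization.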
- Methods and apparatus consistent with the invention also provide tools that allow a user to display information interactively so that the user can explore the information to discover knowledge.
- One such tool displays a set of records and their associated attributes in the form of superimposed two-dimensional line charts.
- the tool can also generate a single two-dimensional line chart that provides the average values for the attributes associated with the set of records.
- Each of these charts is linked to other views, such that a record selected in the charts is highlighted in the other views, and vice versa.
- Another tool generates summary miniplots that may be quickly used by a user to obtain an overview of the attributes associated with a particular group of records.
- records shown in a scatter chart are organized into groups.
- the average values for the attributes associated with each group of records are used to form a two-dimensional line chart.
- the line chart is superimposed on the scatter chart, based on the location of the set of records.
- one basic visual tool implemented by the invention for viewing information is a “galaxy view” as produced by the galaxy tool 350 a .
- a galaxy view is shown in window 120 of FIG. 1 .
- the galaxy view is a two-dimensional scatter graph in which records are organized and depicted in groups (or “clusters”) based on relationships between one record and another.
- the invention provides numerous interactive visual tools that allow a user to explore and discover knowledge.
- FIG. 13 describes one method for displaying information interactively, in the form of two-dimensional line charts.
- the method begins with the user selecting a set of records and a set of attributes associated with those records (stage 1305 ).
- the attributes may comprise any of numerous data types, including the following: numeric, text, sequence (e.g., protein or DNA sequences), or categoric.
- the selected attributes are converted into numerical values, as discussed above.
- FIG. 14 represents a preferred implementation of two-dimensional charts that are consistent with the invention.
- FIG. 14 contains line chart 1405 , and legends 1440 and 1450 .
- Chart 1405 contains a collection of superimposed line charts that depict a set of records.
- line chart 1420 depicts one record within the set
- line chart 1425 depicts another.
- the x-axis e.g., as shown by 1410
- the y-axis e.g., as shown by 1415
- the scale of each axis and the colors of the line charts may be modified by the user.
- this description focuses on line charts, other types of charts may be used to depict a set of records, as shown for example by the point chart shown as 1505 in FIG. 15 .
- Legend 1440 contains a text-based description of records.
- legend 1440 contains a record described as “122C”, as shown by 1445 .
- Legend 1450 contains a text-based description of attributes.
- Methods consistent with the invention can also generate a two-dimensional line chart that shows relationships between the records shown in 1405 (stage 1320 ).
- FIG. 14 shows a line chart 1430 that depicts a statistical value corresponding to the set of records shown in 1405 .
- chart 1430 depicts the average attribute value for each record shown in 1405 .
- chart 1430 may depict other relevant characterizations of the set of records, such as median attribute values, standard deviations (as shown by 1435 ), etc.
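- The statistics plotted in a chart such as 1430 can be computed per attribute across the selected records. The sketch below is illustrative: the function name is hypothetical, every record is assumed to list its attribute values in the same order, and the population standard deviation is an arbitrary choice since the text does not say which form is used.

```python
import statistics


def summarize(records):
    """records: dict of record id -> list of attribute values (same order).

    Returns one list per statistic, with one entry per attribute.
    """
    columns = list(zip(*records.values()))
    return {
        "mean": [statistics.mean(col) for col in columns],
        "median": [statistics.median(col) for col in columns],
        "stdev": [statistics.pstdev(col) for col in columns],
    }


if __name__ == "__main__":
    print(summarize({"122C": [4.0, 10.0], "123A": [6.0, 14.0]}))
```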
- the user can interact with the line charts.
- the invention is capable of receiving input from a user selecting a portion of a chart (stage 1325 ). This may be achieved, for example, by using a device to point to a portion of map 1405 or by clicking a pointing device on a portion of map 1405 .
- the text-based description of the selected record and/or attribute is highlighted in legends 1440 and 1450 (stage 1330 ).
- the user has selected record “122C”, as shown by the highlighting in legend 1440 .
- the value of a particular attribute being pointed to in charts 1405 or 1430 can be displayed in text format.
- the user has selected attribute “RBC”, as shown by the highlighting 1515 in the legend and 1520 on the x-axis.
- any selections made by the user on charts 1405 or 1430 are propagated to other views.
- an index as discussed above, is analyzed to determine if the record is shown in another view (stage 1335 ). If the record is shown in another display (stage 1340 ), the visual representation of that record in the other view is altered (stage 1345 ).
- FIG. 16 is a diagram showing both (1) charts 1405 and 1430 , and (2) a galaxy view 1605 of records. If a record is selected on map 1405 , the record is highlighted in galaxy view 1605 , and vice versa. Similarly, the group of records shown on map 1405 may be highlighted in galaxy view 1605 (as shown by 1610 ), and vice versa.
- FIG. 17 describes another method of displaying information interactively, in the form of summary miniplots.
- the method begins with the user selecting a set of records and a set of attributes associated with those records (stage 1705 ).
- the attributes may comprise any of numerous data types, including the following: numeric, text, sequence (e.g., protein or DNA sequences), or categoric.
- the selected attributes are converted into numerical values, as discussed above (stage 1710 ).
- a two-dimensional scatter chart is generated to visually depict the records (stage 1715 ).
- An example of such a chart is galaxy view 1805 shown in FIG. 18 .
- Galaxy view 1805 contains a collection of records, one example of which is shown as 1810 .
- the records within galaxy view 1805 are organized into groups (or clusters) (stage 1720 ), based on relationships between one record and another.
- a two-dimensional line chart (summary miniplot) is generated that depicts some information about the records contained within that group (stage 1725 ).
- Each such summary miniplot is superimposed onto the two-dimensional scatter chart, based on the location of the group of records on the scatter chart (stage 1730 ).
- chart 1805 contains a group of records 1815 , for which summary miniplot 1820 represents the average attribute values.
- summary miniplot 1820 is superimposed at the centroid coordinate for the records in group 1815 .
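- The data behind a summary miniplot can be sketched as a per-group average plus an anchor point. The names below are hypothetical, and the inputs (2D coordinates, attribute lists, and group membership, each keyed by record ID) are assumed layouts rather than the system's file formats.

```python
def summary_miniplots(points, attributes, groups):
    """Compute, for each group, where to draw its miniplot and what to draw.

    points:     dict of record id -> (x, y) position on the scatter chart
    attributes: dict of record id -> list of numeric attribute values
    groups:     dict of group id -> list of record ids
    """
    result = {}
    for group_id, ids in groups.items():
        n = len(ids)
        # Anchor the miniplot at the centroid of the group's records.
        anchor = (sum(points[r][0] for r in ids) / n,
                  sum(points[r][1] for r in ids) / n)
        # One averaged value per attribute forms the miniplot's line.
        averages = [sum(col) / n for col in zip(*(attributes[r] for r in ids))]
        result[group_id] = {"anchor": anchor, "averages": averages}
    return result


if __name__ == "__main__":
    pts = {1: (0.0, 0.0), 2: (2.0, 4.0), 3: (9.0, 9.0)}
    attrs = {1: [1.0, 3.0], 2: [3.0, 5.0], 3: [7.0, 7.0]}
    print(summary_miniplots(pts, attrs, {"g1": [1, 2], "g2": [3]}))
```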
- summary miniplots may be used to represent other groupings of records.
- the records shown in a scatter chart may be grouped into quadrants of the scatter chart; and four summary miniplots could be used to represent the quadrants.
- each line chart, such as line chart 1820 , can also be coded in a variety of ways (e.g., size, color, thickness of lines, etc.) to represent additional information (e.g., the variability within the group's records, the value of an unrelated field, etc.).
- the invention is capable of receiving input from a user selecting a summary miniplot (stage 1735 ). This may be achieved, for example, by using a device to point to a portion of map 1805 or by clicking a pointing device on a portion of map 1805 .
- the user input constitutes selecting group 1825 , as shown by the fact that group 1825 is highlighted.
- a graph is generated that contains a series of superimposed line charts, with each line chart representing a record (stage 1740 ).
- An example of such a graph is shown in FIG. 18 as 1830 , which is a series of superimposed line charts that represent attribute values for the records selected by the user in group 1825 .
- any selections made by the user of a summary miniplot on chart 1805 are propagated to other views. For example, in response to receiving input from a user selecting summary miniplot 1820 , an index, as discussed above, is analyzed to determine if the records represented by summary miniplot 1820 are shown in another view (stage 1745 ). If the records are shown in another display (stage 1750 ), the visual representation of the records in the other view is altered (stage 1755 ). Similarly, if a user selects a record in another view, the summary miniplot corresponding to that record can be highlighted.
- the preceding visualizations provide the opportunity to query records by attributes represented, e.g., by categorical and numerical values and by sequence of text content. Because the visualizations support a limited number of queries, the visualizations cannot analyze large associations efficiently.
- a multiple query tool creates a visualization that provides an overview of a large number of comparisons automatically, presenting the user with information, e.g., about associations and their expectation. Further, the multiple query tool also provides information about associations between clusters and attributes as well as associations between sets of attributes.
- FIG. 19 provides an illustration of a multiple query tool visualization according to the present invention.
- the multiple query tool produces a visualization in the form of an interactive matrix that displays the requested associations and permits access to the underlying information.
- the multiple query tool can provide links back to other open visualizations and tools, or stand alone as a separate visualization.
- FIG. 20 illustrates a process of creating a visualization using the multiple query tool.
- the user accesses the multiple query in any common manner of a graphical user interface, for example, a tool bar button, a previous visualization menu, a pop-up box, or a main menu.
- Visualization of data begins with the selection of a data file. As shown in step 2020 , a user selects a data file of interest. Alternatively, the data file can be preselected, when, e.g., the multiple query visualization is linked to another visualization analysis.
- a dialog box can be displayed to the user with a drop-down menu of query types. While FIG. 21 shows a selection between query types records vs. attributes, attributes vs. attributes, current data vs. historical data, and current data vs. expert data, other query types are within the scope of the invention.
- the drop-down menu is rolled up to display only the selected query.
- FIGS. 22A-4C display exemplary parameter-setting dialog boxes for query types shown in FIG. 21 .
- In FIG. 22A, a records vs. attributes query dialog box 2200 is displayed.
- records are correlated to selected attributes.
- the records can be viewed as clusters of the records, for example, as clusters such as those defined in the galaxy view of a previous visualization or those defined using any other process.
- FIG. 22A displays four attribute sources, although other sources could be displayed.
- In attribute source area 2210 of dialog box 2200, labeled ‘Vocabulary Word(s),’ the user types in the word or words that serve as attributes.
- a delimiter such as a semicolon
- Other processing could also intelligently separate the words.
- logical operators such as Boolean AND, OR, NOT, could be included to produce a single composite attribute.
- Vocabulary files including synonyms may have the following formats in one aspect of the present invention:
  Format 1:
    Keyword1: alt_word1A; alt_word1B
    Keyword2:
    Keyword3: alt_word3A
  Format 2:
    Keyword1 alt_word1A alt_word1B
    Keyword2
    Keyword3 alt_word3A
- the processing of the identified text file will operate on files of the format(s) of existing user files, so as to avoid issues of file format conversion.
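- As a hedged sketch, a vocabulary file in Format 1 could be read as shown below, assuming one keyword per line, a colon after the keyword, and semicolons between alternate words. The function name `parse_vocabulary` is hypothetical.

```python
def parse_vocabulary(text):
    """Parse Format 1 lines such as 'Keyword1: alt_word1A; alt_word1B'.

    Returns {keyword: [alternate words]}; a keyword with no alternates
    maps to an empty list.
    """
    vocab = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        keyword, _, alternates = line.partition(":")
        vocab[keyword.strip()] = [a.strip() for a in alternates.split(";") if a.strip()]
    return vocab

sample = "Keyword1: alt_word1A; alt_word1B\nKeyword2:\nKeyword3: alt_word3A\n"
vocab = parse_vocabulary(sample)
print(vocab)
```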
- FIG. 22A also illustrates attribute source areas 2230 and 2240 for categorical values.
- In attribute source area 2230, labeled ‘Category Field(s),’ the user types in the category or categories that serve as attributes.
- a delimiter such as a semicolon
- Other processing could also intelligently separate the categories.
- Logical operators such as Boolean AND, OR, NOT, could be included to act on categories to produce a single composite attribute. Reference numeral 2250 illustrates an area to access a selectable menu of categories in the database, in the format of, e.g., a drop-down box. To develop the menu, each record in the database is parsed to identify all possible categorical values.
- In attribute source area 2240, labeled ‘Category File,’ the user can identify attribute categories by pointing to a text file that contains a list of categories. Selecting categories from a file enables the user to specify easily the order in which the categorical values would be displayed in the visualization and allows the user to specify a hierarchy for those values.
- One format for the categorical value file is the following (tab-delimited lines, with the value indicating the hierarchy level):
    categorical_value_1      1
    categorical_value_1.1    2
    categorical_value_2      1
    categorical_value_2.1    2
    categorical_value_2.2    2
    categorical_value_2.2.1  3
- the categories could be combined, similarly to the use of synonyms, or, for hierarchical categorical data, the user could select a maximum hierarchical level.
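- The categorical value file and the optional maximum hierarchical level could be handled along the following lines. This is a sketch under the assumption that each line holds a value, a tab, and an integer level; `parse_category_file` and `max_level` are illustrative names.

```python
def parse_category_file(text, max_level=None):
    """Read tab-delimited 'value<TAB>level' lines, preserving file order.

    If max_level is given, values deeper than that level are dropped.
    """
    categories = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, level = line.rsplit("\t", 1)
        level = int(level)
        if max_level is None or level <= max_level:
            categories.append((name.strip(), level))
    return categories

sample = (
    "categorical_value_1\t1\n"
    "categorical_value_1.1\t2\n"
    "categorical_value_2\t1\n"
    "categorical_value_2.1\t2\n"
    "categorical_value_2.2\t2\n"
    "categorical_value_2.2.1\t3\n"
)
all_levels = parse_category_file(sample)
top_two = parse_category_file(sample, max_level=2)
print(top_two)
```

Because the list keeps the file order, it can directly drive the display order of the categorical values.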
- the database is queried using the multiple query.
- the results of the multiple query are used to create a query matrix.
- the multiple query tool creates a query matrix of record rows and attribute columns.
- the cells of the matrix are set to binary values indicating the presence or absence of the attribute in each record.
- When a vocabulary file with synonyms is used, a single matrix cell should be created for each keyword, and the cell is marked if either the keyword or any of the alternate forms is found.
- One method of determining the presence of an attribute would be to search the original data file or any indexed files describing the distribution of words or categorical values within the data set.
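- The construction of a binary query matrix with synonym handling can be sketched as follows. The tokenization by regular expression and the exact, case-insensitive word match are assumptions of this example, as are the sample records and vocabulary.

```python
import re

def build_query_matrix(records, vocab):
    """Rows are records, columns are keywords; a cell is 1 if the keyword
    or any of its alternate forms occurs in the record text, else 0."""
    keywords = list(vocab)
    matrix = []
    for text in records:
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        row = [
            int(any(term.lower() in words for term in [kw, *vocab[kw]]))
            for kw in keywords
        ]
        matrix.append(row)
    return keywords, matrix

records = [
    "Enteral feeding of dogs and cats",
    "Shelf-life of lamb meat",
    "Diet and cholesterol in dogs",
]
vocab = {"dog": ["dogs", "canine"], "lamb": [], "diet": ["diets", "feeding"]}
keywords, matrix = build_query_matrix(records, vocab)
print(keywords, matrix)
```

For large data sets, the per-record word sets would more plausibly come from a prebuilt index than from rescanning the text.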
- the query matrix is visualized, in step 2060 .
- One visualization is a binary, co-occurrence scheme, as shown in FIG. 24 , where cells having a value of “1” are marked in a color or shade, 2410 , while cells having a value of “0” are marked in a different color or shade, 2420 .
- The user can select a size of cells, so that more or fewer cells are shown in a display of the visualization.
- the user can select a visualization based on cluster rows.
- the cluster row visualization could be set as the default.
- the cells of the visualization matrix are set to indicate the presence or absence of the attribute in each record.
- The query matrix is created or processed to create a composite value for a cell. For example, a basic scheme would involve summing the binary co-occurrence scores for a cluster and dividing by the number of records in the cluster.
- FIG. 25 shows a binary co-occurrence shading scheme that illustrates the query matrix of FIG. 23 , if records 1 and 2, 3 and 4, and 5 and 6 are assumed to be in clusters 1, 2, and 3, respectively.
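- The basic composite scheme (sum of binary scores in a cluster divided by the number of records in the cluster) could look like the sketch below. The matrix values are made up for illustration and are not the values of FIG. 23; only the pairing of six records into three clusters follows the text.

```python
def cluster_scores(matrix, cluster_of):
    """Collapse a binary record-by-attribute matrix into cluster rows.

    matrix     -> list of binary rows, one per record
    cluster_of -> cluster id for each row, in the same order
    Returns {cluster id: [fraction of records having each attribute]}.
    """
    sums, counts = {}, {}
    for row, cid in zip(matrix, cluster_of):
        acc = sums.setdefault(cid, [0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [s / counts[cid] for s in acc] for cid, acc in sums.items()}

# Hypothetical matrix: six records, two attributes.
matrix = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 1], [1, 1]]
cluster_of = [1, 1, 2, 2, 3, 3]  # records 1-2, 3-4, 5-6
scores = cluster_scores(matrix, cluster_of)
print(scores)
```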
- As shown in FIG. 26, an overall visualization can be displayed as a three-dimensional view of the rows vs. columns vs. values, with the value of each cell represented by a cube at an appropriate height on the Z-axis.
- the overall visualization is rotatable, so that the user can view 2-D scatter plots corresponding to the rows and columns.
- a 2-D row scatter plot is shown in FIG. 27 .
- Another more complex visualization serves as the default when cluster rows are used.
- the cells show association probabilities.
- One scheme for showing association probabilities would be to represent deviations as a difference from an expected value under a random distribution assumption.
- The scale could be non-linear so that only the very high and very low probabilities are highlighted.
- To calculate association probabilities, either an exact or an approximate method is used for each of the association methods of the present invention.
- the exact method is precise at the cost of being computationally intensive.
- the approximate method can reduce the number of computations when the total number of objects and total number of occurrences of the attributes are relatively large. Further, the use of the laws of logarithms to reduce products and quotients to sums and differences, respectively, and exponentiation to a product will also save computing time.
- The probability of observing what is observed given a random distribution indicates the probability of observing a certain number of occurrences of an attribute in a given cluster if the attribute is randomly distributed over all clusters. The lower the probability, the further the attribute distribution deviates from randomness. Described below are the exact method and the approximate method for calculating this probability.
- Equation 1 provides the exact method. Equation 1 is the discrete density function for a random variable having a hypergeometric distribution.
- The numerator consists of the product of two terms. The first term calculates how many ways there are to choose exactly m attributes out of M possible for the cluster of interest; the second term calculates the ways to assign the other (n − m) attributes which are not in the cluster of interest to the other clusters collectively.
- n = number of objects in the given cluster
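- The displayed form of Equation 1 is not reproduced in this text. For orientation only, the standard hypergeometric density that the description names can be written as below, under the assumption that N is the total number of objects, M is the total number of occurrences of the attribute, n is the number of objects in the given cluster, and m is the number of occurrences observed in that cluster. Only the definition of n is confirmed by the surviving text; the other three readings are assumptions chosen to agree with Equation 2.

```latex
p = \frac{\dbinom{M}{m}\,\dbinom{N-M}{n-m}}{\dbinom{N}{n}}
```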
- Equation 2 provides the approximate method.
- Equation 2 is the discrete density function for a random variable having a binomial distribution, where the probability of a success is M/N and the probability of failure is (1 − M/N).
- When N and M are large, (N − n)/(N − 1) is close to one; thus, Equation 2 provides a reasonably good approximation to the hypergeometric distribution.
- N, M, n, and m denote the same quantities as defined above in Equation 1.
- $p = \binom{n}{m}\left(\frac{M}{N}\right)^{m}\left(1-\frac{M}{N}\right)^{n-m}$ (Equation 2)
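- The exact and approximate probabilities, and the logarithm shortcut mentioned above, can be compared with the following sketch. It assumes N is the total number of objects, M the total occurrences of the attribute, n the cluster size, and m the occurrences observed in the cluster, and it uses the standard hypergeometric density for the exact method; the function names are illustrative.

```python
from math import comb, exp, lgamma, log

def p_exact(N, M, n, m):
    # Hypergeometric density (the exact method).
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)

def p_approx(N, M, n, m):
    # Binomial density with success probability M/N (Equation 2).
    q = M / N
    return comb(n, m) * q**m * (1 - q) ** (n - m)

def log_comb(a, b):
    # log of the binomial coefficient via log-gamma: products become sums.
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def p_approx_log(N, M, n, m):
    # Same quantity as p_approx, computed in log space (requires 0 < M < N).
    q = M / N
    return exp(log_comb(n, m) + m * log(q) + (n - m) * log(1 - q))

small = (10, 4, 3, 2)        # N, M, n, m
large = (10000, 2000, 20, 5)
print(p_exact(*small), p_approx(*small))
print(p_exact(*large), p_approx(*large), p_approx_log(*large))
```

With the small data set the two methods differ noticeably; with the large one they nearly coincide, which is the regime in which the approximation is intended to be used.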
- the association probability can be represented as a measure of an unusual number of occurrences, which is a deviation of observed occurrence from the expected occurrence if the attribute is randomly distributed over all clusters.
- An exact method (Equation 3) or an approximate method (Equation 4) can be used.
- N, M, n, and m denote the same quantities as in Equation 1. Note that the expectation is the sum, over the range of the random variable x, of x multiplied by p(x). Equation 3 uses the hypergeometric distribution and Equation 4 uses a binomial method, similar to Equations 1 and 2, respectively. The exact method is very computationally expensive due to the summation, while the summation in the approximate method can be carried out analytically and written in the simple form of Equation 4.
- The deviation from expected occurrence can be measured using either the ratio of the observed number of occurrences to the expected number of occurrences or the difference between them.
- The order of attributes along the columns and the order of records or clusters along the rows of the matrix can be selected by the user, using a menu item or by dragging rows and columns to new positions. For example, the order of the records or the order of the clusters is preferably automatically set to the same correlation order as known to those skilled in the art.
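- A sketch of the expected occurrence and the two deviation measures follows. Equations 3 and 4 are not reproduced in this text, so the code assumes the standard results: the exact expectation is the sum of x·p(x) over the hypergeometric density, and the binomial expectation reduces to n·M/N. Function names are hypothetical.

```python
from math import comb

def expected_exact(N, M, n):
    # Sum of x * p(x) over the support of the hypergeometric distribution.
    lo, hi = max(0, n - (N - M)), min(n, M)
    return sum(
        x * comb(M, x) * comb(N - M, n - x) / comb(N, n)
        for x in range(lo, hi + 1)
    )

def expected_approx(N, M, n):
    # Closed form of the binomial expectation.
    return n * M / N

def deviation(observed, expected, mode="difference"):
    # 'ratio' -> observed / expected; otherwise observed - expected.
    if mode == "ratio":
        return observed / expected
    return observed - expected

N, M, n, m = 10, 4, 3, 2
e_exact = expected_exact(N, M, n)
e_approx = expected_approx(N, M, n)
print(e_exact, e_approx, deviation(m, e_approx), deviation(m, e_approx, "ratio"))
```

The two expectations agree mathematically, so the closed form avoids the summation entirely when only the expected count is needed.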
- the default display for attributes is based on correlation order, with the attribute having the highest column sum being on the left-hand side.
- an attributes vs. attributes query dialog box 2260 is displayed.
- the attributes vs. attributes query type is not interested in occurrences with specific records, only in defining the associations among attributes.
- Query dialog box 2260 operates similarly to records vs. attribute query dialog box 2200 , except that the user will be specifying two sets of attributes (vocabulary words or categories).
- the matrix cell scores are generated as a cumulative measure of the number of records that contain both test attributes. Then, the score should be normalized against the number of records.
- the total number of records that have each attribute is counted so that deviation from expected frequency can be calculated.
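- For the attributes vs. attributes query, the cell scores and per-attribute totals could be derived from a binary record-by-attribute matrix as sketched below. The function name and the sample matrix are illustrative assumptions.

```python
def attribute_cooccurrence(matrix, row_attrs, col_attrs):
    """Score each (row attribute, column attribute) pair.

    matrix    -> binary rows, one per record, one column per attribute
    row_attrs -> column indices of the first attribute set
    col_attrs -> column indices of the second attribute set
    Returns (scores, totals): scores[i][j] is the number of records that
    contain both attributes divided by the number of records; totals[k]
    is the number of records containing attribute k.
    """
    n_records = len(matrix)
    scores = [
        [sum(1 for rec in matrix if rec[i] and rec[j]) / n_records for j in col_attrs]
        for i in row_attrs
    ]
    totals = [sum(col) for col in zip(*matrix)]
    return scores, totals

matrix = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1]]
scores, totals = attribute_cooccurrence(matrix, row_attrs=[0], col_attrs=[1, 2])
print(scores, totals)
```

The totals supply the per-attribute counts needed to compute a deviation from the expected frequency.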
- Another use of the multiple query tool visualization is rapid assessment of the correlation between the current experiment being analyzed and historical data. Such a visualization points to the similarities or differences for all equivalent data points (record and condition).
- a current data vs. historical data query dialog box 2270 is displayed when the user selects such a visualization.
- a file containing a data matrix is used as the historical data.
- the user would select the files of a prior visualization.
- a data matrix similar to those currently used to input data into the numerical engine, could be designated.
- In step 2040, the method determines where the current and historical experiments overlap. For example, if the current experiment contains records 1 through 10 and the historical experiment contains records 1 through 5 and records 8 through 12, then correlations would only be performed with the common records 1 to 5 and 8 to 10. Similarly, if the current experiment used conditions (components) A through E (e.g., 5 time points or distinct treatments) and the historical experiment used conditions A, C, D, and F, then the correlation would be calculated only using the common conditions A, C, and D.
- a query data matrix would then be created comparing the common entries. For record1, a correlation with the historical data set would be performed using all the common conditions (intersection). In the example given, this would be a correlation between current_record1(A,C,D) and historical_record1(A,C,D). A similar score would be derived for each record present in both data sets. For a record in the current data set that is not present in the historical set, the query matrix would be blank (or set to some flag). The calculations would be repeated for each historical set requested.
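- The overlap logic can be sketched as follows. The text does not name a specific correlation measure, so Pearson correlation is assumed here; the data values, `pearson`, and `compare_to_history` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation; returns None if either series is constant.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else None

def compare_to_history(current, historical):
    """Both arguments: {record id: {condition: value}}.

    Returns {record id: correlation over the common conditions}, with None
    (a blank cell) for records missing from the historical set.
    """
    result = {}
    for rid, cur in current.items():
        hist = historical.get(rid)
        if hist is None:
            result[rid] = None
            continue
        common = sorted(set(cur) & set(hist))  # intersection of conditions
        if len(common) < 2:
            result[rid] = None
            continue
        result[rid] = pearson([cur[c] for c in common], [hist[c] for c in common])
    return result

current = {
    1: {"A": 1.0, "B": 5.0, "C": 2.0, "D": 3.0, "E": 9.0},
    6: {"A": 2.0, "B": 1.0, "C": 0.0, "D": 4.0, "E": 3.0},
}
historical = {1: {"A": 2.0, "C": 4.0, "D": 6.0, "F": 0.0}}
cells = compare_to_history(current, historical)
print(cells)
```

Record 1 is compared over conditions A, C, and D only; record 6 has no historical counterpart and is left blank.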
- the query matrix is visualized as follows.
- the color code in each cell is based on the correlation of that record to its counterpart in the historical data.
- the matrix cell should have no color (or be colored the same as the background) or, alternatively, these cells can be hidden. If the cells not shared with the historical data set are shown, the degree of overlap between the current and the historical data sets can be visualized.
- This visualization could also be selected as a separate visualization that shows the overlap, for example by using a gray-scale color code in the matrix, where black indicates full overlap with the historical data components and white indicates no overlap.
- This query type would also be useful with other data mining tools.
- cluster assignments from one experiment to the next can be compared.
- The method can assess what fraction of other current cluster records exist in the same cluster in the historical set. Then, an average of the results from each current cluster record is computed to get a score for that cluster.
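- A minimal sketch of this cluster comparison is shown below. How records absent from the historical set are treated is not stated in the text; skipping them is an assumption of this example, and all names and data are illustrative.

```python
def cluster_agreement(current, historical):
    """current, historical: {record id: cluster id}.

    For each record in a current cluster, compute the fraction of the other
    records in that cluster that share its historical cluster, then average
    those fractions per current cluster.
    """
    members = {}
    for rid, cid in current.items():
        members.setdefault(cid, []).append(rid)

    scores = {}
    for cid, rids in members.items():
        fractions = []
        for rid in rids:
            others = [o for o in rids if o != rid]
            if not others or rid not in historical:
                continue  # assumption: skip records with nothing to compare
            same = sum(1 for o in others if historical.get(o) == historical[rid])
            fractions.append(same / len(others))
        scores[cid] = sum(fractions) / len(fractions) if fractions else None
    return scores

current = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
historical = {1: "x", 2: "x", 3: "y", 4: "z", 5: "z"}
agreement = cluster_agreement(current, historical)
print(agreement)
```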
- Another example assesses, for each record in a current cluster, what fraction of other current cluster records are found in the historical data within x Euclidean distance. An interactive slider would allow the user to change x and the method would allow viewing of the results dynamically.
- the overall value for the cluster will be represented as the average or other statistical measure, such as median of the record correlations, based only on those records that are common between the data sets.
- An indication of variation is provided since a cluster that contains 10 records with a correlation of 0.8 and a cluster that contains 10 records with a correlation of 0.9 and 1 with a correlation of −1 (clusters with similar averages, 0.8 and about 0.73) may be of different interest to the user. Such an indication can be achieved using multiple visualizations, for example by duplicating the previous query, that simultaneously show the average and the standard deviation, the minimum value or the maximum value.
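- The summary statistics that such paired visualizations would show can be computed as in this sketch, using the correlation values quoted above. The population standard deviation is an arbitrary choice for the example, and `summarize` is a hypothetical name.

```python
from statistics import mean, pstdev

def summarize(correlations):
    # Average plus three indications of variation for one cluster.
    return {
        "mean": mean(correlations),
        "stdev": pstdev(correlations),
        "min": min(correlations),
        "max": max(correlations),
    }

uniform = summarize([0.8] * 10)          # ten records, all at 0.8
mixed = summarize([0.9] * 10 + [-1.0])   # ten at 0.9 and one at -1
print(uniform)
print(mixed)
```

The spread and the minimum separate the two clusters far more clearly than the averages do.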
- a row is added that summarizes the comparison of the entire current data against each historical data set. For example, a row labeled “Summary” will be the average of all record correlations.
- The user or system could identify specific records to group together at the top of the visualization. For example, all the controls could be grouped together as opposed to in separate clusters. Also, while only one set each of current and historical data is used, several sets of data could be visualized contemporaneously. That is, any one of the data sets is treated as the prototype against which others are measured. A slider bar having each visualization would allow the user to run through multiple experiments. The progress through the slider (data sets) could be semiautomated to play like a movie, stopping whenever certain similarities or dissimilarities are found.
- the ‘current data vs. literature/expert knowledge’ query is similar to the other queries. Correlations between the current data and the literature or expert knowledge are defined either as what records have previously been found to group together or as similarity to actual published/historical values.
- the visualization as shown in FIG. 19 , will be displayed in an interactive area of a display screen, so that the user may adapt the visualization to her preferences.
- the visualization could include a menu bar and a toolbar.
- A menu bar 2810, with associated sub-menus, of the visualization could include the features shown in FIG. 28.
- the Duplicate command in the File menu of menu bar 2810 allows access to previously stored queries, so that the user can either re-run or adjust a previously run multiple query.
- The other commands in the File menu are self-explanatory.
- The Row Order menu of menu bar 2810 provides options for organizing the records, clusters, or row attributes.
- The Cluster from View command results in a correlation ordering for the records and clusters (if correlation ordering was not done for the view, then it is also not done here by default). As discussed above, this ordering is the default for a records vs. attributes query type or a current data vs. historical data query type.
- the Correlation with Columns command is an option for recalculating the cluster order based on the values in the query matrix. In a cluster view, records would remain with their cluster and the clusters are reordered according to correlation ordering. If a cluster was expanded to show records, the records in the cluster would be reordered according to correlation ordering. As discussed above, for an attributes vs. attributes query, correlation with columns is the default.
- the Advanced sub-menu of the Row Order menu allows access to the following commands.
- the Cluster Based on Column Values command recalculates the clustering of the records or the attributes using the scores along the row as the vectors for clustering.
- the user would have the choice of using any clustering algorithm, such as either the hierarchical or partition methods.
- the Sum command is an option to order the records or attributes based on the sum of the scores across the row, with the record/attribute with the highest sum being at the top and the lowest being at the bottom, for example. Rows having a value below a predetermined threshold could be placed in a low value row or removed from the visualization matrix.
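- The Sum ordering, including the optional low-value grouping, could be sketched as follows; the function name, labels, and threshold are illustrative assumptions.

```python
def order_rows_by_sum(labels, matrix, threshold=None):
    """Order rows by descending row sum.

    Returns (kept, low): rows at or above the threshold, highest sum first,
    and rows below it (which could be grouped into a low-value row or
    removed). With no threshold, every row is kept.
    """
    ranked = sorted(zip(labels, matrix), key=lambda pair: sum(pair[1]), reverse=True)
    if threshold is None:
        return ranked, []
    kept = [pair for pair in ranked if sum(pair[1]) >= threshold]
    low = [pair for pair in ranked if sum(pair[1]) < threshold]
    return kept, low

labels = ["r1", "r2", "r3"]
matrix = [[1, 0, 0], [1, 1, 1], [0, 0, 0]]
kept, low = order_rows_by_sum(labels, matrix, threshold=1)
print([label for label, _ in kept], [label for label, _ in low])
```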
- the Sum command is not valid for visualization using clusters and would be deactivated.
- the File Order sets the order of clusters or attributes to that specified by the user, for example in an input file. If no file is provided or record rows are selected, this option would be deactivated.
- The Column Order menu of menu bar 2810 provides options analogous to those of the Row Order menu for organizing the column attributes, except that there will be no clustering from the view, as records and clusters do not appear in the columns, in one aspect of the present invention.
- the Color menu of menu bar 2810 permits a selection of display colors within the multiple query tool.
- a tool bar is also provided in the visualization, either as a separate pop-up area or a bar, for example, located below a status bar, to provide access to functions with a single click.
- FIG. 29 illustrates examples of functions of a tool bar.
- the RecordViewer function displays the currently highlighted record (or records in the highlighted cluster). For a record vs. attribute cell, this shows the single record with the specific attribute highlighted in the record. For a cluster vs. attribute cell, the RecordViewer shows all the records in that cluster with the specific attribute highlighted in the records. For an attribute vs. attribute cell, the RecordViewer would display all records that contain both attributes, with both attributes highlighted. To access the records, the RecordViewer calls a process that parses the data source file in the galaxy cluster view. An interpretation tool, such as the plot data tool, could also be provided. A double click on a cell can also call the RecordViewer function.
- the Zoom function operates similarly to a zoom in the galaxy visualization. Primarily, the zoom will zoom out, so that an overview of a large multiple query tool can be obtained.
- the maximum zoom out should be based on the number of records and a user's desired minimum resolution, so that the colors of the visualization will be readily discernable.
- a possible default size for a cell in the multiple query tool is 12 by 12 pixels. This is large enough to display text labels at 10 point Helvetica for both rows and columns. Zooming out would provide an overview for large data sets.
- the Zoom Reset function returns the visualization to its default size.
- The Pan function takes the form of a hand and allows the user to drag the graphic around the window, so that areas hidden by display objects or the physical dimensions of a display screen can be viewed.
- Scroll bars, as shown in the multiple query tool above, could be employed instead of, or in addition to, the Pan tool. Nevertheless, labels for the rows and columns would always remain visible.
- the Expand Row Clusters and Expand Column Clusters functions open the selected cluster(s) to display all their records or attributes as separate rows. If no clusters are selected, all clusters are expanded. If no clusters are defined (either from the associated view or by having done a cluster ordering within the multiple query tool), these functions are deactivated.
- The Collapse Row Clusters and Collapse Column Clusters functions close the cluster that contains the selected record(s) or attribute(s). If no record or attribute is selected, all clusters are collapsed. If no clusters are defined (either from the associated view or by having done a cluster ordering within the multiple query tool), these functions are deactivated. Although not illustrated in FIG. 29, a single button could also collapse all rows and columns with a deviation from expectation between, e.g., −0.5 and +0.5 (or other definable range) into a single group or remove rows and columns that do not have values above a predetermined threshold.
- the Orient Rows vs. Values and Orient Columns vs. Values functions orient the visualization so that the view is perpendicular to the row axis or column axis, respectively. This provides views of the 2-D scatterplot, as shown in FIG. 27 , for example.
- the Reset Orientation function orients the visualization to the default ‘overhead’ view showing rows vs. columns.
- the Spacing Toggle function toggles the matrix between the two types of views shown in FIGS. 30A and 30B .
- Providing a grid as shown in FIG. 30A allows viewing of cells as discrete entities, for easier selection.
- Removing the grid, as shown in FIG. 30B, allows more information to be compressed into the same space and could enhance structure distinctions in the visualization matrix.
- the visualization area itself consists not only of the colored visualization matrix, but also includes labels for the rows and columns.
- The row labels are the record titles. Since record titles may be long, approximately the first 20 characters could be displayed, with a scroll bar or pop-up function to enable viewing of all of the characters.
- The row labels are labeled by cluster number.
- The categorical value or vocabulary word itself serves as the label.
- the rows and columns could have a master label indicating the content.
- the label would say “RECORDS.”
- the label would be “VOCABULARY”.
- the label would be the file name.
- the field name would be shown. If multiple fields were requested, each field name would be shown, centered over its collection of row or column labels. The user could also edit or define the row, column, and major labels.
- Rows and columns are selected and highlighted by clicking on the row and column labels using a mouse input device, for example. Shift-clicking and control-clicking can be used to select multiple labels.
- the visualization is preferably interactive. In addition to highlighting labels for selecting rows and columns, clicking on a cell should display key information regarding the cell. This pop-up information would be context sensitive, depending on the type of query and whether the cell represents an individual record or attribute as opposed to a cluster or group.
- the following provide suggested formats of the key attributes of a cell of the different groups and query types:
- Systems and methods consistent with the present invention employ an open architecture that enables different types of data to be used for analysis and visualization.
- any genetic material from organism to microbe, could be represented using the context vectors of the present invention.
- the present invention is not limited to genetic material, and any material or energy could also be represented.
- the rows and columns used in the description are illustrative only, and, for example, records could be placed along the columns.
- the attributes used are not limited to text and categorical features. Numerical values could be set as attributes, for example using binning where adjacent ranges of numbers are defined.
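- Binning of numerical values into adjacent ranges, so that each range can act as an attribute, might be sketched as follows. The bin edges and the label format are assumptions made for this example.

```python
from bisect import bisect_right

def bin_label(value, edges):
    """Map a number to the label of its bin.

    edges -> ascending bin boundaries; bins are half-open, [low, high).
    """
    i = bisect_right(edges, value)
    if i == 0:
        return f"< {edges[0]}"
    if i == len(edges):
        return f">= {edges[-1]}"
    return f"[{edges[i - 1]}, {edges[i]})"

edges = [0, 10, 20]
labels = [bin_label(v, edges) for v in (-1, 5, 10, 25)]
print(labels)
```

Each distinct label can then be treated like a categorical value when building the query matrix.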
- Categorical data could be presented in a single column rather than multiple columns for each categorical value as described above; in this case, the occurrence of a specific categorical value could be represented as a specific color.
- the resulting matrix could also be dynamically controllable by the user. The order of rows or columns could be adjusted by dragging or sorted according to the information within the row or column.
- Although the described implementation includes software, the invention may be implemented as a combination of hardware and software or in hardware alone.
- Although aspects of the present invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer-readable media, such as secondary storage devices, like hard disks, floppy disks, or CD-ROM; a carrier wave from the Internet; or other forms of memory.
- metabisulphite concentrations required to inhibit spore production as well as alkaline phosphatase synthesis/activity were found to be relatively low and well within safety levels for human consumption. It is concluded that metabisulphite is an effective anti-sporulation agent and a recommendation for its general use in semi-solid and liquid foods is proposed.
- HICARB high-complex carbohydrate low-saturated fat diet
- NIDDM non-insulin-dependent diabetes mellitus
- a high-fat diet (43% carbohydrate; 42% fat, polyunsaturated to saturated 0.3 ; fiber 9 g/1000 kcal; cholesterol 550 mg/day) for 7-10 days.
- Control subjects (3 NIDDM, 3 nondiabetic) continued this diet for 5 wk.
- the 13 subjects changed to a HICARB diet (65% carbohydrate; 21% fat, polyunsaturated to saturated 1.2; fiber 18 g/1000 kcal; cholesterol 550 mg/day) for 5 wk.
- NIDDM subjects on the HICARB diet had decreased low-density lipoprotein cholesterol (LDL-chol) concentrations (107 vs.
- LDL-chol low-density lipoprotein cholesterol
- Feeding commercial enteral diets to critically ill dogs and cats via nasogastric tubes was an appropriate means for providing nutritional support and was associated with few complications. Twenty-six cats and 25 dogs in the intensive care unit of our teaching hospital were evaluated for malnutrition and identified as candidates for nutritional support via nasogastric tube. Four commercial liquid formula diets and one protein supplement designed for use in human beings were fed to the dogs and cats. Outcome variables used to assess efficacy and safety of nutritional support were return to voluntary food intake, maintenance of body weight to within 10% of admission weight, and complications associated with feeding liquid diets. Sixty-three percent of animals experienced no complications with enteral feedings; resumption of food intake began for most animals (52%) while they were still in the hospital.
- Nutritional support as a component of the therapy in small animals often is initiated late in the course of the disease when animals have not recovered as quickly as expected. If begun before the animal becomes nutrient depleted, enteral feeding may better support the animal and avoid serious complications.
- Microbiology of meats has been a subject of great concern in food science and public health in recent years. Although many articles have been devoted to the microbiology of beef, pork, and poultry meats, much less has been written about microbiology of lamb meat and even less on restructured lamb meat. This article presents data on microbiology and shelf-life of fresh lamb meat, restructured meat products, restructured lamb meat products, bacteriology of restructured meat products, and important foodborne pathogens such as Salmonella, Escherichia coli O157:H7, and Listeria monocytogenes in meats and lamb meats.
- t-PA tissue plasminogen activator
- t-PA can be administered with an acceptable margin of safety within 5 hours of stroke, to evaluate the therapeutic benefits of intraarterial prourokinase, and to assess the use of magnetic resonance spectroscopy to identify which patients are most likely to benefit from thrombolysis.
- Combination thrombolytic-neuroprotectant therapy is also being studied. In theory, patients could be given an initial dose of a neuroprotectant by paramedics and receive thrombolytic therapy in the hospital.
- Policosanol is a natural mixture of higher aliphatic primary alcohols. Oral toxicity of policosanol was evaluated in a 12-month study in which doses from 0.5 to 500 mg/kg were given orally to Sprague Dawley (SD) rats (20/sex/group) daily. There was no treatment-related toxicity. Thus, effects on body weight gain, food consumption, clinical observations, blood biochemistry, hematology, organ weight ratios and histopathological findings were similar in control and treated groups. This study supports the wide safety margin of policosanol when administered chronically.
Abstract
A system or method consistent with an embodiment of the present invention is useful in analyzing large volumes of different types of data, such as textual data, numeric data, categorical data, or sequential string data, for use in identifying relationships among the data types or different operations that have been performed on the data. A system or method consistent with the present invention determines and displays the relative content and context of related information and is operative to aid in identifying relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information, such as annotations associated with the numerical data or research papers may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or various operations performed on the data. Furthermore, the user may explore the information contained in sets of records and their associated attributes through the use of interactive 2-D line charts and interactive summary miniplots.
Description
- This is a division of application Ser. No. 09/410,367, filed Sep. 30, 1999.
- The following identified U.S. patent applications are relied upon and are incorporated by reference in this application:
- U.S. patent application Ser. No. 09/409,260, entitled “METHOD AND APPARATUS FOR EXTRACTING ATTRIBUTES FROM SEQUENCE STRINGS AND BIOPOLYMER MATERIAL,” now issued as U.S. Pat. No. 6,898,530, filed on Sep. 30, 1999, by Jeffrey Saffer, et al.;
- U.S. patent application Ser. No. 08/695,455, entitled “THREE-DIMENSIONAL DISPLAY OF DOCUMENT SET,” filed on Aug. 12, 1996; and
- U.S. patent application Ser. No. 08/713,313, entitled “SYSTEM FOR INFORMATION DISCOVERY,” filed on Sep. 13, 1996.
- The disclosures of each of these applications are herein incorporated by reference in their entirety.
- This invention relates to data mining and visualization. In particular, the invention relates to methods for analyzing text, numerical, categorical, and sequence data within a single framework. The invention also relates to an integrated approach for interactively linking and visualizing disparate data types.
- A problem today for many practitioners, particularly in the science disciplines, is the scarcity of time to review the large volumes of information that are being collected. For example, modern methods in the life and chemical sciences are producing data at an unprecedented pace. This data may include not only text information, but also DNA sequences, protein sequences, numerical data (e.g., from gene chip assays), and categoric data.
- Effective and timely use of this array of information is no longer possible using traditional approaches, such as lists, tables, or even simple graphs. Furthermore, it is clear that more valuable hypotheses can be derived by simultaneous consideration of multiple types of experimental data (e.g., protein sequence in addition to gene expression data), a process that is currently problematic with large amounts of data.
- Visualization-based tools for analyzing data are discussed in, for example, Nielson G M, Hagen H, Müller H, eds. (1997) Scientific Visualization, IEEE Computer Society, Los Alamitos; Becker R A, Cleveland W S (1987) Brushing Scatterplots, Technometrics 29:127-142; Cleveland W S (1993) Visualizing Data, Hobart Press, Summit, N.J.; and Bertin J (1983) Semiology of Graphics, University of Wisconsin Press, London. These tools have focused largely on data characterization and have provided limited user interactivity. For example, the user may gain access to underlying information by “brushing” an item with a pointer.
- These tools, however, have significant drawbacks. Although current tools can handle certain data types (e.g., text, or numerical data), they do not allow a user to interact with disparate data types (i.e., text, numerical, categoric, and sequence data) within an integrated data analysis, mining, and visualization framework. Furthermore, these tools do not allow a user to interact well between different visualizations in the manner required to gain knowledge.
- What is needed, therefore, is a tool that allows a user to analyze, mine, link, and visualize information of disparate data types within an integrated framework.
- Systems and methods consistent with the present invention aid a user in analyzing large volumes of information that contain different types of data, such as textual data, numeric data, categorical data, or sequential string data. Such systems and methods determine and display the relative content and context of information and aid in identifying relationships among disparate data types.
- More specifically, one such method defines a uniform data structure for representing the content of an object of different data types, selects attributes of different objects of a variety of different data types that may be represented in the uniform data structure and operates on the selected attributes to produce first representations of the objects in correspondence with the uniform data structure.
- The data types may include numeric, sequence string, categorical and text data types. An index may be produced that includes second representations of non-selected attributes of a particular object and that associates the non-selected attributes with a particular first representation. The first and second representations may be vector representations. A first set of the selected attributes associated with a first set of objects may be used to determine the relationships among the first set of objects of a particular data type and non-selected attributes associated with the first set of selected attributes may be used to correlate objects represented by the first set of selected attributes with a second set of objects represented by a second set of selected attributes. The first and second set of objects may be displayed in first and second windows on a display screen and the second set of objects that corresponds to the selected object or objects may be highlighted.
- A method consistent with the present invention identifies relationships among different visualizations of data sets and includes displaying first graphical results of a first type analysis performed on selected attributes of a first set of objects and displaying second graphical results of a second type analysis performed on selected attributes of a second set of objects. Certain objects represented in the first graphical results may be selected and corresponding objects represented by the second graphical results that correspond to the certain objects are highlighted. The highlighting may be based on attributes not used for creating the first graphical results.
- Another aspect of the present invention is directed to a system and a method for visualization of multiple queries to a database that includes selecting multiple queries to a database, querying records in the database based on the multiple queries, creating a query matrix indexed based on the selecting, and populating the query matrix based on the querying.
- Another method consistent with the present invention interactively displays records and their corresponding attributes and includes generating a first 2-D chart for a first record, where at least two attributes associated with the first record are shown along one axis, and the values of the attributes are shown along the other axis. Input is received from a user selecting the first record on the first 2-D chart, and an index is analyzed to determine if the first record is shown in another view. If the first record is shown in another view, the visual representation of the first record is altered in that other view based on the user input.
- Another method consistent with the present invention interactively displays records and their corresponding attributes and includes generating a 2-D scatter chart that depicts a plurality of records. A 2-D line chart is generated for a group of records contained in a portion of the 2-D scatter chart. At least two attributes associated with the group of records are shown along one axis, and a statistical value for each of the at least two attributes is shown along the other axis. A 2-D line chart is superimposed at a location on the 2-D scatter chart that is based on the location of the group of records on the 2-D scatter chart.
- The accompanying drawings, which are incorporated in, and constitute a part of, this specification illustrate at least one embodiment of the invention and, together with the description, serve to explain the advantages and principles of the invention. In the drawings,
- FIG. 1 is a block diagram of visualization screens or views that are consistent with the present invention;
- FIG. 2 a is a block diagram of a computer system and program modules consistent with the present invention;
- FIGS. 2 b, 2 c, 2 d and 2 e are block diagrams of program modules consistent with the present invention;
- FIG. 3 is a flow diagram of processes associated with a data editor consistent with the present invention;
- FIGS. 4 a and 4 b are screen shots associated with a data editor consistent with the present invention;
- FIGS. 5 a-5 d are flow diagrams of processes associated with a view editor consistent with the present invention;
- FIGS. 6 a-6 m are screen shots associated with a view editor consistent with the present invention;
- FIGS. 7 a and 7 b are flow diagrams of processes associated with an analysis processing module consistent with the present invention;
- FIG. 8 is an example file format consistent with an embodiment of the present invention;
- FIG. 9 is a flow diagram of a clustering process consistent with the present invention;
- FIG. 10 is a flow diagram of a projection process consistent with the present invention;
- FIG. 11 is a table that identifies operations of program modules used in conjunction with the meta data consistent with the present invention;
- FIG. 12 is a flow diagram of a visualization linking process consistent with the present invention;
- FIG. 13 is a flow diagram of a method consistent with the invention for displaying information interactively by using 2-D charts;
- FIG. 14 is a representative user interface screen showing 2-D line charts consistent with the invention;
- FIG. 15 is another representative user interface screen showing 2-D point charts consistent with the invention;
- FIG. 16 is another representative user interface screen showing 2-D line charts linked to a galaxy view consistent with the invention;
- FIG. 17 is a flow diagram of a method consistent with the invention for displaying information interactively by using summary miniplots;
- FIG. 18 is a representative user interface screen showing the use of summary miniplots in a galaxy view;
- FIG. 19 provides an illustration of a multiple query tool visualization according to the present invention;
- FIG. 20 illustrates a process of creating a visualization using the multiple query tool;
- FIG. 21 illustrates a dialog box to set the type of query;
- FIGS. 22A-22C display exemplary parameter-setting dialog boxes for query types shown in FIG. 21;
- FIG. 23 illustrates a query matrix according to an aspect of the present invention;
- FIG. 24 illustrates a visualization of the query matrix of FIG. 23 indexed by records;
- FIG. 25 illustrates a visualization of the query matrix of FIG. 23 indexed by clusters;
- FIG. 26 illustrates a visualization as a three-dimensional view;
- FIG. 27 illustrates a two-dimensional scatter plot of rows vs. values;
- FIG. 28 illustrates the contents of a menu bar, with associated sub-menus, of the visualization of FIG. 19;
- FIG. 29 illustrates examples of functions of a tool bar associated with the visualization of FIG. 19; and
- FIGS. 30A and 30B illustrate views of a visualization matrix having a grid and not having a grid, respectively.
- Reference will now be made in detail to one or more embodiments of the present invention as illustrated in the accompanying drawings. The same reference numbers may be used throughout the drawings and the following description to refer to the same or like parts.
- A. Overview
- Systems and methods consistent with the present invention are useful in analyzing information that contains different types of data and presenting the information to the user in an interactive visual format that allows the user to discover relationships among the different data types. Such methods and systems include high-dimensional context vector creation for representing elements of a dataset, visualization techniques for representing elements of a dataset including methods for indicating relationships among objects in a proximity map, and interaction among datasets including linking the visualizations and a common set of interactive tools. In an embodiment, the interactions among the visualizations, regardless of data type, and the common set of tools for those interactions are enabled by maintaining meta data, as discussed herein, in a common set of file structures (or database).
- Methods and systems consistent with the present invention may include various visualization tools for representing information used in connection with the present invention. A tool for visualizing multiple queries to a database is provided. In another visualization tool, if a first record of a 2-D chart of one view is shown in a second view, the visual representation of the first record is altered in the second view based on the user input. In another visualization tool, a 2-D line chart is superimposed at a location on a 2-D scatter chart that is based on the location of a group of records on the 2-D scatter chart. Other tools consistent with the present invention may be used in conjunction with the methods and systems described herein.
- As used herein, a record (or object) generally refers to an individual element of a data set. The characteristics associated with records are generally referred to herein as attributes. A data set containing records is generally processed as follows. First, the information represented by the records (including text, numeric, categoric, and sequence/string data) is received in electronic form. Second, the records are analyzed to produce a high-dimensional vector for each record. Third, the high-dimensional vectors may be grouped in space (i.e., a coordinate system) to identify relationships, such as clustering, among the various records of the data set. Fourth, the high-dimensional vectors are converted to a two-dimensional representation for viewing purposes. The two-dimensional representation of the high-dimensional vectors is generally referred to herein as a “projection.” Fifth, the projections may be viewed in different formats according to user-selected options, as shown by the four views (110, 120, 130, and 140) on display monitor 100 in FIG. 1.
- Systems and methods consistent with the present invention enable a user to select a record in view 110 and cause the corresponding record in another view to be highlighted. For example, selecting a particular record in view 110 causes the corresponding records to be highlighted in the other views.
- B. Architecture
- FIG. 2 a depicts a computer system 200 consistent with the present invention. Computer programs used to implement methods consistent with the present invention are generally located in a memory unit 210, and the processes of the present invention are carried out through the use of a central processing unit (CPU) 280 in conjunction with application programs or modules. Those skilled in the art will appreciate that memory unit 210 is representative of read-only memory, random access memory, and other memory elements used in a computer system. For simplicity, many components of a computer system have not been illustrated, such as address buffers and other standard control circuits; these elements are well known in the art.
- Memory unit 210 contains databases, tables, and files that are used in carrying out the processes associated with the present invention. CPU 280, in combination with computer software and an operating system, controls the operations of the computer system. Memory unit 210, CPU 280, and other components of the computer system communicate via a bus 284. Data or signals resulting from the processes of the present invention are output from the computer system via an input/output (I/O) interface 290.
- The computer program modules and data used by methods and systems consistent with the present invention include visualization setup programs 212, processing programs 220, meta data files 230, interactive graphics and tools programs 240, and an application interface 250. The visualization setup programs 212 determine the name to be used for a collection of records identified by a user, determine the formats to be used for reading files associated with the records, identify formatting conventions for storing and indexing the records, and determine parameters to be used for analysis and viewing of the records. The processing programs 220 transform the raw data of the identified records into meta data, which in turn is used by the interactive visualization tools. The meta data files 230 include the results of statistical feature extraction, n-space representation, clustering, indexing, and other information used to construct and interact among the different views. The interactive graphics and tools programs 240 enable the user to explore and interact with various views to identify the relationships among records. The application programming interface (API) 250 enables the components
- The visualization setup programs 212 further include a data set editor 214 and a view editor 216. The processing programs 220 further include vector programs 222, cluster programs 224, and projection programs 226. The meta data files 230 are a subset of databases and files 260.
- The data set editor 214 enables the user to define the collection of records (i.e., a data set) to be analyzed, identifies the data type, and creates directories for use in organizing the data of the data set. The view editor 216 sets up the user's raw data for viewing by the interactive tools and graphics. Vector programs 222 create high-dimensional context vectors that represent attributes of the records of the data set. Cluster programs 224 group related records near each other in a given space (cluster) to enable a user to visually determine relationships. Projection programs 226 convert high-dimensional representations of the records of a data set to a two-dimensional or three-dimensional representation that is used for display. The databases and files 260 contain data used in conjunction with the present invention, such as the meta data 230.
- C. Architectural Operation
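- As a rough illustration of how vector programs 222, cluster programs 224, and projection programs 226 hand data to one another, the following Python sketch groups a handful of record vectors and then reduces them to 2-D points. The function names, the toy records, the plain k-means loop with a seeded random start, and the random linear projection are all assumptions made for illustration; the specification does not prescribe these particular algorithms at this point.

```python
import math
import random

def kmeans(vectors, k, iterations=10, seed=0):
    """Plain k-means; returns one cluster index per vector."""
    rng = random.Random(seed)            # explicit seed so runs are repeatable
    centroids = rng.sample(vectors, k)   # seed the centroids from the records
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def project_2d(vectors, seed=0):
    """Random linear projection to 2-D (an illustrative stand-in only)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    axes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
    return [tuple(sum(a * x for a, x in zip(axis, v)) for axis in axes)
            for v in vectors]

# Hypothetical records, already expressed as high-dimensional vectors.
records = {
    "r1": [1.0, 1.1, 0.9, 1.0],
    "r2": [0.9, 1.0, 1.1, 1.0],
    "r3": [9.0, 9.2, 8.8, 9.1],
    "r4": [9.1, 8.9, 9.0, 9.0],
}
names = list(records)
vectors = [records[n] for n in names]
labels = kmeans(vectors, k=2)
points = project_2d(vectors)
for name, label, (x, y) in zip(names, labels, points):
    print(f"{name}: cluster {label}, at ({x:.2f}, {y:.2f})")
```

Keeping the record names alongside the labels and points is what would let a later step relate a point in one view back to the same record in another.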
- 1. Data Collection (Data Set Editor)
- FIG. 3 illustrates an implementation of processes performed to define and enable the formatting of a selected data set, as performed by the data set editor 214. A data file to be used as the source for the subsequent analysis is requested (step 302). After a file name, data type, and directory location are entered (step 304), the process determines and validates the data type indicated by the user (step 310). The validation process first determines whether the data of the source data file is in a common sequence data format (step 312). If the data is not in one of the common sequence data formats, the process determines whether the data is an array of data consisting of numeric, categoric, sequence, or text data (step 314). If the data is not a data array, the process determines whether the data is free form text (step 316). If the data is not free form text (step 316), an error message is generated (step 320).
- If the validation process determines that the data is sequence data, such as genome sequence data (step 312), the process determines whether the sequence data is in FASTA file format (step 322) or in SwissProt file format (step 324). An example FASTA input file is provided in Appendix B. The operations and data associated with processing sequence data are discussed in more detail in U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Method and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material,” filed on Sep. 30, 1999, by Jeffrey Saffer, et al. If the sequence data is not in either of these formats, an error message is generated (step 320). If, however, the data is either a FASTA file (step 322) or a SwissProt file (step 324), the appropriate formats and delimiters, as discussed herein, are determined for the respective FASTA file or SwissProt file (step 330). After the appropriate format/delimiters for the data type are determined (step 330), the corresponding format file/record delimiters are established (step 340). The format file/record delimiters specify the valid formats for reading the files and identify the meta data files that are to be used for subsequent processing of the data set as discussed herein.
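- The validation order of FIG. 3 (sequence format, then data array, then free form text, otherwise an error) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `classify_source`, the returned labels, and the detection heuristics are hypothetical. It relies only on two well-known conventions, namely that a FASTA record begins with a ">" header line and that a SwissProt flat-file entry begins with an "ID   " line; the array test (every row has the same number of delimited fields) is an illustrative choice, not the specification's rule.

```python
def classify_source(text, delimiter="\t"):
    """Classify source data in the order used by the validation flow."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("unrecognized data type")   # error branch (step 320)
    # Sequence formats first (steps 312, 322, 324).
    if lines[0].startswith(">"):
        return "sequence/fasta"
    if lines[0].startswith("ID   "):
        return "sequence/swissprot"
    # Data array next (step 314): consistent field count, more than one field.
    widths = {len(ln.split(delimiter)) for ln in lines}
    if len(widths) == 1 and widths.pop() > 1:
        return "array"
    # Free form text last (step 316): anything containing letters.
    if any(ch.isalpha() for ln in lines for ch in ln):
        return "text"
    raise ValueError("unrecognized data type")       # error branch (step 320)

print(classify_source(">example header\nMKVLAAGIV"))
print(classify_source("gene\tvalue\nabc\t1.5"))
print(classify_source("A short free form note."))
```

The check order matters: a tab-delimited table also contains letters, so testing for an array before testing for free text keeps the two from being confused.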
- A file directory 360 is created for storing the meta data files associated with the data set (step 350). The file directory 360 includes a document catalog file (DCAT) 362 and a data set properties file 364. The DCAT file 362 is used as a master index for all records in the data set. The indexes stored in the DCAT file are used to integrate the information associated with the various views selected for the data set. For example, the DCAT file 362 contains indexes that associate all the data of a data set with a particular view, although only a subset of the data set is used to create the view. The properties file 364 is also produced and stored in the file directory and contains information about the source data files for the view, including their type (corpus type), the number and full path (location) for the source files, the format used, and the date created. In addition, the properties file keeps track of subsequently processed views including the subdirectory where those views reside. An example properties file is provided in Appendix A.
-
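The role of the DCAT file as a master index can be pictured with a small Python sketch. Everything here is hypothetical: the dictionary layout, the record and view names, and the `lookup` helper are assumptions chosen to show the idea that a view is built from a subset of attributes while the index still reaches the rest. It does not reflect the actual DCAT file layout.

```python
# Hypothetical in-memory stand-in for the master index: every record keeps
# all of its attributes, whether or not a given view uses them.
catalog = {
    "rec1": {"sequence": "MKVLA", "expression": 2.5, "annotation": "kinase"},
    "rec2": {"sequence": "MTTQA", "expression": 0.4, "annotation": "transporter"},
}

# Each view is created from only a subset of the attributes.
views = {"galaxy": ["sequence"], "line_chart": ["expression"]}

def lookup(record_id, view_name):
    """Return (attributes used by the view, attributes held only in the index)."""
    attributes = catalog[record_id]
    used = {name: attributes[name] for name in views[view_name]}
    unused = {name: value for name, value in attributes.items()
              if name not in used}
    return used, unused

used, unused = lookup("rec1", "galaxy")
print(used)
print(unused)
```

Because the unused attributes stay reachable through the record identifier, a selection made in one view can be related to a view that was built from different attributes.
-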
FIGS. 4 a and 4 b depict exemplary screen shots presented on a display monitor to a user for defining a new data set (i.e., collection of records) using the data set editor. A user names and defines a data set using the data set editor. When the data set editor is selected, a graphical interface screen 400 is presented to the user for use in defining options or parameters associated with the data set. For example, graphical interface screen 400 is presented to a user when the user selects the sources tab 410.
- The user may enter a name for the data set in a field 412 and may specify the data set type as indicated by the selection options 414, such as array data, protein or nucleotide sequences, or text. The source of this data set may be specified in the field 418 as indicated by the directory and subdirectory specification 420. The user may select the add, view, or delete options 424 to perform the function indicated by the name on the data set source. The user may save the data as indicated by the option 426 or continue to a new view as indicated by the option 428.
- By selecting the format tab 440, the user may specify how fields contained within the source file are delimited by selection of a field delimiter option 442. The field delimiter options illustrated include an option to delimit the field by a colon, comma, space, tab, or a user-defined delimiter.
- 2. Analysis and View Setup (View Editor)
- FIG. 5 a illustrates a preferred implementation of a process used for creating parameters to be used in defining the type of analyses or views for a data set, as performed by view editor 216. The user may enter this information using a graphical interface as depicted in FIGS. 6 a-6 m, which show source file tab 604, format tab 610, preparation tab 630, processing tab 660, clustering tab 680, and projection tab 690.
- The user is first requested to name the view (step 510) and also is requested to identify the directory locations of the source files (step 520). The user is requested to specify the format of the source data (step 530). FIG. 6 b is a screen display showing the options presented to a user when the format tab 610 is selected. In the format file field 610, the user may provide a file to use for formatting the view, such as medline 31.fmt. The user may also specify a stop words file, such as the default text stop file shown in the field 614. This stop words file is a list of words that the text engine will ignore during analysis. The user may input a file to specify the default punctuation of the file, as indicated by the default.punc file shown in the field 616. The punctuation file tells the text engine how to handle non-alphabet characters. For each of the files requested, the user may use the default file specified by the system or choose another. The user may select or view any of the files of the format screen of FIG. 6 b by selecting the select option 620 or the view option 622.
- The user is also requested to provide preparation parameters (step 540). The processes associated with step 540 are discussed in more detail in FIG. 5 b. The user may specify vector creation, cluster, and projection parameters to be used in constructing a view, as discussed with reference to FIGS. 5 c and 5 d.
- Referring to FIG. 5 b, the view editor processes are discussed. The view editor first checks the data type (step 541) by evaluating whether the data is sequence data (step 542). If the data is sequence data, sequence-specific preparation information is requested (step 543), such as the number and length of n-grams, SEG parameters, substitution filter values, and motif pattern file parameters (step 544). If the data is not sequence data (step 542), the process determines whether the data is numeric data (step 545). If the data is not numeric data, it is treated as text, for which no preprocessing or preparation information is required (step 546). If the data is numeric data, a display screen that requests numeric data and preparation information from a user (step 547) is presented. The numeric preparation data request may include column/row specifications, operation sets, and clustering fields (step 548).
-
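To make the "number and length of n-grams" requested at step 544 more concrete, the following Python sketch counts overlapping n-grams in a sequence string and keeps only those that occur often enough. The function name `ngram_counts` and its `lengths` and `min_occurrences` parameters are hypothetical names introduced for illustration; how the actual system builds sequence attributes is described in the incorporated application, not here.

```python
from collections import Counter

def ngram_counts(sequence, lengths=(2, 3), min_occurrences=1):
    """Count overlapping n-grams of the requested lengths in a sequence."""
    counts = Counter(
        sequence[i:i + n]
        for n in lengths
        for i in range(len(sequence) - n + 1)   # slide one position at a time
    )
    # Keep only n-grams that occur at least min_occurrences times.
    return {gram: c for gram, c in counts.items() if c >= min_occurrences}

print(ngram_counts("MKVMKV", lengths=(3,), min_occurrences=2))
print(ngram_counts("MKVMKV", lengths=(2,)))
```

Each surviving n-gram can then serve as one coordinate of a record's vector, with its count (or simply its presence) as the value.
-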
FIG. 5 c illustrates a preferred implementation of the processes associated with gathering vector creation parameters within the view editor 216 (FIG. 2 ). The view editor 216 first checks the data type (step 551). If the data is sequence data (step 552), sequence-specific text engine parameters are requested or obtained for the particular data set (step 553). The text engine parameters requested may include the number of topics/cross terms, topicality settings, use association t/f parameters, associated matrix threshold parameters, and record filter ranges (step 554).
- If the data is not sequence data (step 552), the view editor determines whether the data is text data (step 555). If the data is text data, text-specific text engine parameters are requested from the user (step 556), such as the text engine parameters discussed above (step 554). If the data is not text data (step 555), no user-specified parameters are needed and default parameters may be used (step 557). The text engine parameters may be used if desired (step 554).
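- One way to picture how a "number of topics" setting and a "record filter range" might interact is the Python sketch below. It is only a sketch under loud assumptions: the function `select_topics`, its parameters, and especially the ranking by raw record frequency are hypothetical stand-ins. The specification's actual topicality measure is not described in this passage and is not reproduced here.

```python
from collections import Counter

def select_topics(records, num_topics, min_records=1, max_records=None):
    """Pick topic terms by record frequency; the remaining terms are returned
    separately (treated here, as an assumption, as cross terms)."""
    max_records = len(records) if max_records is None else max_records
    # Record frequency: in how many records does each term appear?
    freq = Counter(term for rec in records for term in set(rec.lower().split()))
    # Record filter range: drop terms that are too rare or too common.
    eligible = [t for t, n in freq.items() if min_records <= n <= max_records]
    # Stand-in ranking: most widely used first, ties broken alphabetically.
    ranked = sorted(eligible, key=lambda t: (-freq[t], t))
    return ranked[:num_topics], ranked[num_topics:]

records = ["kinase binds atp", "kinase binds dna", "membrane protein"]
topics, cross_terms = select_topics(records, num_topics=2)
print(topics)
print(cross_terms)
```

Tightening the range (for example, `min_records=2`) shrinks the eligible vocabulary before the top terms are taken, which is the filtering role the range parameters play in this sketch.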
- FIG. 5 d illustrates a preferred implementation of a process for specifying clustering parameters. Various types of clustering may be used, such as k-means or hierarchical clustering, as known to those skilled in the art. The view editor 216 presents a display screen to the user for the user to specify the clustering choice (step 561). The process determines whether k-means clustering has been chosen (step 562). If k-means clustering is requested (step 562), k-means clustering parameters are requested from a user or obtained (step 563), such as the number of clusters, the number of iterations, the cluster seed method, or whether correlation order is to be used (step 564). If k-means clustering is not requested (step 562), the process determines whether the user desires hierarchical clustering (step 565), and displays or gets hierarchical clustering parameters (step 566). The hierarchical clustering parameters may include the number of clusters or cluster coherence values to be used and whether the user desires correlation order for the clusters (step 567). If hierarchical clustering is not desired (step 565), no parameters are required (step 568).
- Referring to FIG. 6 c, when the preparation tab 630 is selected, the user is presented with a data specification option 632, an operation set option 640, and a clustering selection option 650. The user may enter a value for the columns in the field 634. For the data set specified, the user may identify the type of data, such as numeric data, categorical data, sequence data, or text data, by selecting a data type 635. The user may specify the columns 636 in which that data type is located and may specify a field name for that specific data as indicated under the field name 637. A predefined selection field 638 may be used to specify the types of data for the field name and columns provided.
- A user may perform any number of mathematical manipulations on the numeric data (one or more manipulations or transformations of the data is referred to as an operation set). These options include various logarithmic operations, methods for normalizing data, methods for filling missing data points, and all algebraic functions. Referring to FIG. 6 d, for example, the reciprocal of the value for each numeric data item may be requested and then the logarithm taken for that reciprocal, creating a new field 642 called Operation Set1.
-
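The Operation Set1 example (take the reciprocal, then the logarithm of that reciprocal) amounts to applying an ordered list of transformations to each numeric value. The Python sketch below shows that idea; the names `apply_operation_set` and `reciprocal`, the choice of a base-10 logarithm, and the decision to pass missing values (`None`) through untouched are all assumptions for illustration, since the text does not state the logarithm base or how missing points are handled in this example.

```python
import math

def reciprocal(value):
    """Return 1/value."""
    return 1.0 / value

def apply_operation_set(values, operations):
    """Apply each operation in order to every value; None marks missing data."""
    result = list(values)
    for operation in operations:
        result = [None if v is None else operation(v) for v in result]
    return result

# Hypothetical operation set: reciprocal first, then base-10 logarithm.
operation_set_1 = [reciprocal, math.log10]
new_field = apply_operation_set([10.0, 100.0, None], operation_set_1)
print(new_field)
```

Because the operations are kept as an ordered list, the same set can be reapplied to another column, and reordering the list changes the result, which is why the sequence of manipulations is treated as a unit.
-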
FIG. 6 e shows the screen displayed if theclustering selection tab 650 is selected. The user is presented with a set of field/trench forms 652 for which clustering operations may be applied. In the example illustrated, operation set 1, ornumeric field name 1 may be chosen for clustering. - Referring to
FIG. 6 , for a sequence, the user may have motifs/n-grams, complexity filtering, exclusions, and amino acid substitutions options from which to select. Operation on or with sequence data is discussed in more detail in U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Method and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material” filed Sep. 30, 1999, which is expressly incorporated herein by reference. If the user wants to represent the sequence as a high-dimensional vector based on the occurrence of functional or structural motifs, a file is specified which defines those motifs. The user can have that vector based on the number of occurrences of each motif or, if desired, have the vector based on a binary format (the motif is either there or not) by checking the single motif output option. Alternatively, or in addition, the user may specify any combination of overlapping n-grams to be created to represent the sequence infield 654. The user also has the option to specify whether the n-gram should be included based on number of occurrences within the sequence. If neither motif nor n-gram options are selected, the program will analyze the text (e.g., annotations) associated with the sequence records. The complexity filtering options provide the user the ability to include the entire sequence or eliminate regions of low or high complexity, for example, using the public domain tool SEG. The user may also specify certain records to be excluded, for example, based on sequence length, or title, by selecting options in the exclusion interface. Finally, the use of amino acid or nucleotide substitutions can be defined in the Amino Acid Substitution interface. - Referring to
FIG. 6 g, the options provided to the user for processing data is illustrated. The user may use a sliding scale to specify the magnitude or weight to give to associations as indicated by theassociation field 672. The user may enter the number of topics to be used in thefield 674. The topics are the features that describe the vectors. For text, these are the vocabulary words that best describe the thematic content of the records; for sequences, the topics are the n-gram vocabulary words that best distinguish one sequence from another. The user may specify the requested number of cross terms as indicated in thefield 676. Cross terms are the vocabulary words that are not topics. The user may specify the number of times that the topics may appear in a record before being identified as a topic and an upper limit may be included as well as indicated in the fields 678 a and 678 b. In the field 679 a and 679 b, the user may specify the number of times that the terms must appear in other documents by specifying a lower limit in field 679 a and an upper limit in field 679 b. These fields are used as filtering fields for processing. The topicality method forFIG. 6 g is ‘Specify the settings by the number of terms.’ - Referring to
FIG. 6 h, the topicality method for the processing option is specified as ‘Specify the settings by threshold.’ The user may use the sliding scale field 680 to specify the number of associations needed. The user may use a sliding scale input for identifying the minimum topicality for topics weight and the minimum topicality for cross terms, as indicated by the corresponding fields.
- Referring to
FIG. 6 i, the user may specify a topicality method that automatically calculates the settings for the view, as indicated in the display screen illustrated. The user may use a sliding scale selection field that specifies the weights of association as indicated by the field 689. Referring to FIG. 6 j, the user may specify the weights of association for the topicality method that automatically calculates the settings with emphasis on local topics.
- Referring to
FIG. 6 k, when a user selects the clustering tab 690, the user may specify the preferred clustering method such as hierarchical or k-means. When hierarchical clustering is chosen, the user may select an option to compute clusters based on coherence. The user may indicate the number of clusters and the cluster coherence. The user may also select whether to correlate the order after clustering. Referring to FIG. 6 l, the graphical interface used for specifying the parameters of the k-means is illustrated. The user may specify the number of clusters or the number of iterations to be used for the k-means. When k-means is used, the user may select the cluster seeding parameters such as using random seeding or using dimensional seeding. The seeding may also occur by using the computer's internal clock (system time) to seed the random number generator. The user may alternatively specify a value for the random generator seed.
- Referring to
FIG. 6 m, the user may select the type of projection to use by selecting the projection tab 695. The user may select cluster cohesion, cluster area, or cluster spread. When the user selects any of these options, the user may use a weighted scale for each of the options to identify the weight to be associated with each projection option.
- 3. Common Formatting, Vector Creation, and Index Creation
-
FIG. 2 b illustrates vector creation engines consistent with the present invention. In a preferred implementation, vector creation programs 222 include a numeric engine 222 a and a text engine 222 b.
- Referring to
FIG. 7 a, the general processes performed by the processing programs are discussed. Certain types of data, such as sequence data, are preprocessed (step 702) prior to being input into the text engine. The sequence data is modified to a form that is acceptable to the text engine for generating the high-dimensional context vectors.
- High-dimensional context vectors are created based upon the attributes of the objects or records to be used for a view, and vector indices that correspond to the particular view are created and stored in a vector file associated with the data set (step 706). The vectors are clustered using known clustering programs based upon information from the vector files (step 708). The cluster assignment file (.hcls), as discussed below, is created (step 708). Two dimensional coordinates of the records and centroids are calculated for creating a two dimensional projection of the clustered vectors (step 710). Two dimensional coordinate files (.docpt) are created for each document.
- i. Vector Creation and Formatting
- The visualizations discussed herein are based on high-dimensional context vector representations of the data. Thus, each type of data is represented in that manner. For purely numeric data, the vector representation is simply the values associated with each record attribute. For categorical data, the vector representation can be based on any method that translates categorical values or the distances between values as a number. For text data, the vector representation can be derived by latent semantic indexing as known to those skilled in the art or by related methods, such as described in U.S. patent application Ser. No. 08/713,313, entitled “System for Information Discovery,” filed on Sep. 13, 1996, now issued as U.S. Pat. No. 6,772,170. For sequence data, the context vector can be derived from any combination of numerical or categorical attributes of the sequence or by methods described herein. In addition, a user skilled in the art will recognize that the vectors created for each record do not have to be created from a single data type. Rather, the vectors can be created from mixed mode data, such as combined numeric and text data.
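One way to picture a sequence-derived vector is as counts of overlapping n-grams laid out along a fixed vocabulary. The following is a minimal sketch only: the function name `ngram_vector`, the example vocabulary, and the `binary` flag (which mirrors the presence/absence option described for motifs) are illustrative assumptions, not the method of the incorporated application.

```python
from collections import Counter

def ngram_vector(sequence, n, vocabulary, binary=False):
    """Lay out counts of overlapping n-grams of `sequence` along `vocabulary`.

    Hypothetical helper: `vocabulary` is an ordered list of n-grams that
    defines the vector dimensions. With binary=True each dimension only
    records whether the n-gram is present.
    """
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    if binary:
        return [1 if counts[gram] > 0 else 0 for gram in vocabulary]
    return [counts[gram] for gram in vocabulary]

# Example: bigrams of a short nucleotide string.
vocab = ["AC", "CA", "CG", "GT", "TT"]
print(ngram_vector("ACACGT", 2, vocab))
print(ngram_vector("ACACGT", 2, vocab, binary=True))
```

Fixing the vocabulary order up front keeps every record's vector in the same dimensions, which is what later clustering steps would need.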
- Not only are high-dimensional vectors created for each record of a data type, but also a common method is used to store that information about the records and their vectors so that later processes can access the data. Methods consistent with the present invention create a group of meta data files through the action of a series of computational steps (collectively referred to as the numeric engine) alone, or in conjunction with another series of computational steps, referred to as the text engine. The files that are produced are binary, for reasons of access speed and storage compactness. The files produced during vector creation are discussed below in more detail.
- Unless otherwise noted, the files discussed below have the following characteristics: (1) Files are binary, and remain within a directory established for the analysis; (2) IDs and positions are 0-based; (3) Terms have been converted to lowercase, and are listed in ascending lexical order; (4) Record IDs are listed in ascending order; (5) Index files (.<x>_index) contain cumulative counts of records written to the file they are indexing (.<x>). This cumulative count is for the current record and all previous records, and is equivalent to the record number of the next record; (6) Internal numerical representations in the Sun Microsystems operating system are:
TermID: 4 bytes
TermCount: 4 bytes
DocID: 4 bytes
DocCount: 4 bytes
streampos: 4 bytes
double: 8 bytes
- Although the examples provided refer to flat file storage of the relevant information, one skilled in the art will recognize that a database could equally serve as the method for storing and retrieving the meta data.
- The files produced during vector creation are:
.dcat (document catalog)
  number of records in the source file
  for each record (line number-2 is the record id)
    Source file id
    Starting byte offset within the source file
    Length (in bytes) of the record
.tl (title file)
  for each record (line number-1 is the record id)
    title field
.docv (vector file)
  no. of records in view
  no. of dimensions for vectors (= no. of topics)
  for each record
    for each dimension
      coordinate value (float)
- ii. Visualization and Formatting
- The visualization methods keep track of the location of the record representation and may use an object-oriented design. One type of visualization that is especially effective with high-dimensional data is a proximity map or a galaxy view. This and related visualizations can take advantage of methods to group the records in the high-dimensional space (clustering) and to project the arrangement of objects in high-dimensional space to two or three dimensions (projection).
- Clustering can be by any of a number of methods including partition methods (such as k-means) or hierarchical methods (such as complete linkage). Any of these type methods can be used with the present invention. Despite the different methods, the computational processes that carry out the clustering create a common set of meta files that allow the chosen visualization method to access the clustering information, regardless of original data type.
- The files produced during cluster analysis are:
.hcls (cluster assignment file)
This file contains the assignment of each record to a cluster. The format of the file is as follows:
  Number of total clusters
  For each cluster (in correlation order)
    Cluster ID
    Cluster vector, as determined by taking the average of the record vectors assigned to the cluster
    Number of records in the cluster
    The record IDs of the records assigned to the cluster
- After the .hcls file is produced, it may be resorted in correlation order (a user-definable option).
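The per-cluster entries listed above can be sketched in a few lines. This is an illustrative in-memory sketch under stated assumptions: the function name `cluster_entries` and the dictionary layout are hypothetical, and no attempt is made to reproduce the binary encoding or the correlation ordering.

```python
def cluster_entries(assignments, vectors):
    """Build one entry per cluster: ID, mean vector, member count, member IDs.

    assignments: dict mapping record ID -> cluster ID (hypothetical input shape)
    vectors:     dict mapping record ID -> list of floats, all the same length
    """
    members = {}
    for record_id, cluster_id in assignments.items():
        members.setdefault(cluster_id, []).append(record_id)

    entries = []
    for cluster_id, record_ids in members.items():
        rows = [vectors[r] for r in record_ids]
        # The cluster vector is the per-dimension average of the member vectors.
        centroid = [sum(column) / len(rows) for column in zip(*rows)]
        entries.append({
            "cluster_id": cluster_id,
            "vector": centroid,
            "count": len(record_ids),
            "record_ids": record_ids,
        })
    return entries

entries = cluster_entries(
    {0: 1, 1: 1, 2: 0},
    {0: [0.0, 2.0], 1: [2.0, 4.0], 2: [1.0, 1.0]},
)
for entry in entries:
    print(entry)
```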
An example .hcls file:
9 (number of clusters)
6 (cluster ID)
0.0457451 0.0399342 0.0864002 0.0652852 0.0635923 0.0429373 0.0650352 0.0661765 0.0487868 0.0885645 0.100173 0.0482019 0.048553 0.091455 0.0991594 (cluster vector)
4 (number of records in the cluster)
7 (record ID)
4 (record ID)
3 (record ID)
5 (record ID)
5
0.0392523 0.0364486 0.0897196 0.0626168 0.0598131 0.0364486 0.0616822 0.0794393 0.0448598 0.0925234 0.11215 0.0429907 0.0420561 0.0962617 0.103738
1
6
1
0.0341207 0.0209974 0.0918635 0.0682415 0.0603675 0.0314961 0.0629921 0.0656168 0.0393701 0.11811 0.104987 0.0393701 0.0393701 0.112861 0.110236
1
8
3
0.0587949 0.0578231 0.0739416 0.0695847 0.0651338 0.0544486 0.0705118 0.0665825 0.0739358 0.0612976 0.0711892 0.0697833 0.0711892 0.0645948 0.0711892
3
12
13
2
- iii. Projection and Formatting
- Projection can also be by any number of methods, for example, multidimensional scaling. Like cluster analysis, a specific projection method is not required for use with the present invention. However, as with clustering, the results of that projection are stored in a common format so that the visualization operations can retrieve the data independent of the original data type.
- Files created during projection from high-dimensional space to 2 or 3 dimensions are:
- .cluster (2-D Coordinates for the Cluster Centroids)
-
- This file contains the 2-D coordinates for placing the cluster centroid on a galaxy view. For each cluster, a single line in the file contains:
- Cluster ID
- X coordinate
- Y coordinate
- An example cluster file:
6 0.770783 0.831761
5 1 1
1 0.920542 0.989886
3 0.073888 0.210541
7 0.0206639 0.109404
4 0 0.13854
0 0.0187581 0.153266
2 0.139079 0.0695485
8 0.374849 0
- .docpt (2-D Coordinates for the Individual Records)
-
- This file contains the 2-D coordinates for placing the records on the galaxy view.
- For each record, a single line in the file contains:
- Record ID
- X coordinate
- Y coordinate
- Cluster ID that the record belongs to
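A record line with these four fields could be read as sketched below. The sketch assumes a whitespace-separated text rendering like the one used in the example listing; the name `parse_docpt_line` and the `DocPoint` type are hypothetical, and the actual files are described elsewhere in this document as binary.

```python
from typing import NamedTuple

class DocPoint(NamedTuple):
    record_id: int
    x: float
    y: float
    cluster_id: int

def parse_docpt_line(line):
    """Parse 'record_id x y cluster_id' from one whitespace-separated line."""
    # Printed listings sometimes use the Unicode minus sign; normalize it.
    fields = line.replace("\u2212", "-").split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return DocPoint(int(fields[0]), float(fields[1]), float(fields[2]), int(fields[3]))

print(parse_docpt_line("3 0.861783 0.90898 6"))
```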
- Example of a .docpt file:
0 0.374849 −4.46282e−07 8
1 0.0300137 0.145639 0
2 0.0890008 0.222 3
3 0.861783 0.90898 6
4 0.745403 0.813245 6
5 0.84583 0.896318 6
6 1 1 5
7 0.630116 0.708499 6
8 0.920542 0.989886 1
9 0.0206639 0.109405 7
10 0.0206639 0.109405 7
11 −4.91018e−08 0.1385 4
- Note that the X and Y coordinates in the .cluster and .docpt files are represented by a number between 0 and 1 inclusive. Also note that analogous file structures would be used for a 3D projection.
- iv. Data Linkage and Formatting
- iv. Data Linkage and Formatting
- Advantageously, the present invention enables linkage among all visualizations and data types (text, categorical, numerical, or sequence). Prior methods simply enabled linkage between views of the same data visualized using different attributes or visualizations. In addition to the attributes used to create the visualization, other attributes or descriptors for each data record are linked and readily available for interaction. These interactions are possible with any of the data types. That is, additional attributes related to a record, as well as those used for vector creation, are equally available regardless of data type. This is accomplished through the use of a common set of file or database structures created by the numeric or text engines. These files store information about each record attribute, which itself can be any of the data types. These files are created during an initial processing of the data and are independent of the specific visualization method to be employed. These files provide a common framework that can be addressed by any visualization or interactive tool through an API.
- The files created to store and manage the ancillary data, such as data not used in creating a view, are:
.headings (used for data input through a matrix array only)
  for each record (line number-1 is the record id)
    name of the column heading
.vocab (text)
  for each term in the view
    term (i.e., a word)
.vocab_index
  for each term in the view
    cumulative no. of chars written to .vocab (including \n's)
.field_off
  for each record
    for each field defined in the format file
      starting position (in bytes) of the field from the start of the record and the number of bytes in the field
.corrv
  for each correlatable field defined in the format file
    number of unique values of field
    for each unique value of the field
      number of records that contain the unique value
      record id's of the records that contain the value
.ifi (inverted file index)
  for each term in the view
    for each record containing that term
      doc ID
      frequency of term within the record
.ifi_index
  for each term in the view
    cumulative no. of records written to .ifi
.docterm (document term file)
  for each record
    for each term in the record
      term ID
      frequency of term within the record
.docterm_index
  for each record
    cumulative no. of records written to .docterm
.topic (topic file)
  no. of topics
  minimum topicality for topics
  minimum no. of docs containing a topic
  maximum no. of docs containing a topic
  no. of cross terms
  minimum topicality for cross terms
  minimum no. of docs containing a cross term
  maximum no. of docs containing a cross term
  for each major term (topic or cross term)
    term ID
    topicality
    no. of docs containing the term
    term strength (4 bytes; 0=MINOR_TERM, 1=CROSS_TERM, 2=TOPIC_TERM)
.rel (Association matrix file)
  no. of major terms
  no. of topics
  conditional correction
  for each major term
    for each topic
      relation value of major term to topic (values are encoded as four-bits and packed into bytes)
    four zero bits to pad last byte for major term, if needed
- In each of the above files, “terms” refer to text vocabulary words; “topics” refer to text vocabulary words deemed by statistical analysis to be most likely to convey the thematic meaning of the text; and “crossterms” refer to text vocabulary words that provide some meaningful description of the text content but are not topics. U.S. patent application Ser. No. 08/713,313, entitled “System for Information Discovery,” filed on Sep. 13, 1996 discusses topics and crossterms in more detail.
- Many of the binary files are paired, with the first file holding the information, and the second providing an easily accessed index into the first. For example, the inverted file index consists of .ifi and .ifi_index files. Each index is a list of the cumulative number of records in the data file.
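The data-file/index-file pairing can be illustrated with an in-memory inverted index. This is a sketch under assumptions: the function names and the list-of-dicts input are hypothetical, and Python lists stand in for the binary .ifi and .ifi_index files.

```python
def build_inverted_index(doc_terms):
    """Build (ifi, ifi_index) from per-record term frequencies.

    doc_terms: list indexed by record ID; each item maps term ID -> frequency.
    ifi:       flat list of (doc ID, frequency) entries, grouped by term ID.
    ifi_index: for each term ID, the cumulative number of entries written so far.
    """
    num_terms = 1 + max((t for terms in doc_terms for t in terms), default=-1)
    ifi, ifi_index = [], []
    for term_id in range(num_terms):
        for doc_id, terms in enumerate(doc_terms):
            if term_id in terms:
                ifi.append((doc_id, terms[term_id]))
        ifi_index.append(len(ifi))
    return ifi, ifi_index

def postings(ifi, ifi_index, term_id):
    """Use the cumulative counts to slice out the entries for one term."""
    start = ifi_index[term_id - 1] if term_id > 0 else 0
    return ifi[start:ifi_index[term_id]]

ifi, ifi_index = build_inverted_index([{0: 2, 1: 1}, {1: 3}, {0: 1, 2: 4}])
print(ifi_index)
print(postings(ifi, ifi_index, 1))
```

Because each index entry is a running total, the entries for a term start where the previous term's total ends, so a lookup needs only two index reads and one contiguous slice of the data file.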
- Together these files provide indexing of and access to the textual information associated with each record including the distribution of keywords within each record and co-occurrences of those keywords. Furthermore, the files provide a catalog of all the categorical data including the distribution of the values. For numerical attributes not used in the actual vector representation, additional files are created using the .docv format so that this type of ancillary information will also be readily available to establish interaction among the various views.
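The catalog of categorical values and their distribution can be pictured as a map from each unique value to the records that contain it. The sketch below is illustrative only; `categorical_index` is a hypothetical name and the dictionary stands in for the stored per-field structure.

```python
def categorical_index(values):
    """Map each unique value of one categorical field to its record IDs.

    values: list indexed by record ID holding that record's field value.
    Returns a dict ordered by value; record IDs stay in ascending order.
    """
    index = {}
    for record_id, value in enumerate(values):
        index.setdefault(value, []).append(record_id)
    return {value: index[value] for value in sorted(index)}

index = categorical_index(["kinase", "protease", "kinase", "kinase"])
for value, record_ids in index.items():
    # The number of record IDs gives the distribution of each value.
    print(value, len(record_ids), record_ids)
```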
- The processes associated with producing the series of common files described above are depicted in
FIG. 7 b. Referring to FIG. 7 b, the text engine (730) creates the files associated with text or categorical fields. The expected input for the text engine (block 730) is a tagged formatted file. For text data sets, the input is either the original format for the input or the result of a processing step to identify the beginning and end of each record along with special information, such as the record title. An example original input file to the text engine is provided in Appendix C.
- For sequence data in the commonly used formats FASTA (720) or SwissProt (722), a software module (724) reformats the input file to contain a series of fields that delineate the initial input and meta data created for the vector representation (726). The reformatting and processing of sequence data is discussed in more detail in the U.S. patent application Ser. No. 09/409,260, now issued as U.S. Pat. No. 6,898,530, entitled “Method and Apparatus for Extracting Attributes from Sequence Strings and Biopolymer Material” filed Sep. 30, 1999, which is incorporated herein by reference. Once in this tagged format (726), the text engine (730) is able to create all the required meta data files.
- Numerical data, or any other data presented in a data matrix, (750) is received at the numeric engine (752). The data in the input file can be tab delimited or use any other delimiter. The numeric engine (752) creates the record vectors for data presented in a data matrix instead of the text engine. In addition to the numerical columns, the user may specify other columns within the table that can contain textual, sequence, or categorical information or additional numerical data that will not be used for the vector created. Usually, each row in the table becomes a record; however, the user can choose to make each column the record. Each user-defined set of columns becomes an attribute (also called fields) within the record. A set of numeric columns is specified by the user for subsequent clustering. The other fields, which can be numeric, text, categorical, or sequence, will become attributes of the record that can be queried, listed, or otherwise made available within the interactive tools.
- If categorical data is specified by the file format (
FIG. 8), as indicated by the index 804 for the view used, categorical data is processed during the text engine processing steps for all types of data. The categorical data shown in FIG. 8 records where each unique character string in the categorical field occurs in the data set. Thus, subsequent categorical tools are enabled to correlate various records based upon the categorical values.
- Each field expected in the input file is defined by a section beginning with ∥F followed by the field number (e.g., ∥F0). For each field, the name is defined (in this case, title). Then the type of field is defined; this could be string (text or categorical), numeric, or sequence. Next, the delimiter tag for the field is defined. The METHOD line indicates whether the field is on a single line or continues to the next field. The DOC_VECTOR line tells the clustering module whether to use this information in the cluster analysis. The next item designates whether the field should be accessible within the query tools. The CORR line determines whether the contents of the field should be indexed for all possible associations. The next item defines whether the content is case sensitive or not. The following lines describe the behavior of the delimiter tag. WHOLE_BOUNDARY indicates whether the tag must be a single word or could be embedded within other text; LINEPOS indicates whether the tag must start at the beginning of a line or may be found elsewhere. Similar information would be given about each field in the data. This format file is stored in a directory associated with the view created.
- Referring again to
FIG. 7 b, the numeric engine (752) is executed on the set of columns that the user specified for clustering. The numeric engine (752) performs any number of user defined mathematical operations and creates a record vector that is identical in format to those produced for sequence or text data. In contrast to the text engine (730), which automatically determines the features to use in the record vector, the vector creation in the numeric engine (752) utilizes a user specified set of columns from the user's column/row formatted source file.
- Once the record vector is created (758), the numeric engine automatically creates a text engine compatible source file (i.e., reverse engineered tagged text file, 754), and corresponding format file (756) from the input column/row formatted table. An example format file produced from the numeric engine is shown in Appendix D. The new tagged text source file and format files (726) are used so that any text, categorical, or sequence information that may have been embedded within the original column/row files can be processed by the same programs that operate on text, categorical, or sequence information. This subsequent processing is performed by the text engine (730), which reads the reverse-engineered tagged text source file and indexes the textual and/or categorical data fields within each record (732, 734 and 736). The result is a standardized set of meta data which is related to the user source data and which is available to all tools regardless of data type.
- Although the numeric engine processes numerical data, the processing steps of the numeric engine place any of the other data types (text, categorical, or sequence) into an appropriate tagged field in the data file so that the text engine will handle it appropriately.
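The split between vector columns and tagged ancillary fields can be sketched as follows. This is a rough illustration under assumptions: `split_table` is a hypothetical name, and the `name: value` tag layout is invented for the example; the actual tagged format is the one defined by the format file and appendices, which are not reproduced here.

```python
import csv
import io

def split_table(table_text, vector_columns, delimiter="\t"):
    """Split a delimited table into record vectors and tagged text records.

    vector_columns: names of the numeric columns the user chose for clustering.
    Every other column is written out as a 'name: value' line (a hypothetical
    tag layout) so that a text-oriented step could index it.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    vectors, tagged_records = [], []
    for row in reader:
        vectors.append([float(row[name]) for name in vector_columns])
        tagged_records.append("\n".join(
            f"{name}: {value}" for name, value in row.items()
            if name not in vector_columns
        ))
    return vectors, tagged_records

table = "id\tt0\tt1\tnote\ng1\t0.5\t1.5\theat shock\ng2\t2.0\t0.25\tcontrol\n"
vectors, tagged = split_table(table, ["t0", "t1"])
print(vectors)
print(tagged[0])
```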
- In summary, if the data input is array data, the array data (column/row formatted tables) is processed by the numeric engine (752). The numeric engine (752) creates a second vector that is identical in format to the context vectors for sequence and text data produced by the text engine (730). However, in contrast to the text engine, which can automatically determine the features to use in the second vector, the numeric engine (752) accepts a user defined series of mathematical operations to be performed on specified columns of the array data source file. In order to make the non-numeric contents, such as annotated notes, associated with the array file accessible for subsequent analysis, a format file and a tagged text file are produced for the non-numeric contents associated with the numeric file. The associated non-numeric contents are used as an input to the text engine and the output is associated with the numeric data. Thus, the textual or categorical data associated with the numeric array data may be indexed and associated with the data as produced for other text data sets that are input to the text engine (730). Plain text data should be in a tagged text format and does not require any pre-processing prior to input to the text engine (730).
- 4. Clustering
-
FIG. 2 c illustrates clustering programs. Three clustering modules or options are provided: k-means 224 a, cluster-sid 224 b, and correlation order 224 c. The clustering options may have a set of user definable parameters. The k-means module 224 a clusters documents by establishing a user specified number of seed clusters and then iteratively assigns documents to those clusters until a user specified number of iterations is reached or the process/algorithm determines that all the documents have been assigned to the clusters.
- The k-means module 224 a moves documents to minimize the sum of squares between objects and centroids as known by those skilled in the art. The cluster-sid 224 b is an agglomerative/hierarchical clustering method that minimizes the maximal between-cluster distance (farthest neighbor method). The output of the clustering process is a file containing a correlation ordered list of clusters and the record IDs of their members. Those skilled in the art will recognize that other clustering algorithms can be used.
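The k-means behavior described above (seed clusters, iterative reassignment, an iteration cap, and an optional user-supplied seed) can be sketched with the textbook algorithm. This is a generic illustration, not the module's implementation; the function name `kmeans` and its parameters are assumptions.

```python
import random

def kmeans(vectors, k, iterations=10, seed=None):
    """Textbook k-means: returns (assignment per record, centroids).

    seed=None lets Python seed the generator from an OS entropy source or the
    system time; passing a value makes the seeding reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random seeding
    assignment = [None] * len(vectors)
    for _ in range(iterations):
        # Assign each record to the centroid with the smallest squared distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no record moved, so the assignment is stable
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignment, centroids = kmeans(data, k=2, seed=42)
print(assignment, centroids)
```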
FIG. 9 shows a clustering process performed by the processing unit. A vector file is received from the stored context vector files (step 760) at the cluster implementer (step 904). The user specified clustering parameters are retrieved from stored files (step 906) and the clustering program and parameters associated with the files are determined (step 908). The clustering parameters associated with the clustering program are provided to the cluster implementer (step 904) and the clustering program associated with the vector file of the data set is selected (step 910). The clustering programs are chosen from a k-means clustering program (block 912), a hierarchical clustering program (block 914), or no clustering is selected (block 916). After the clustering program performs its operations (step 910), a cluster assignment file (.hcls) is created (step 920). - 5. Projection
-
FIG. 2 d illustrates projection programs 226. Systems consistent with the present invention may apply three separate processes to produce the meta data used to produce visualizations. These processes are carried out by three modules: the PCA-clusters module 226 a, a triangulation module 226 b, and a document projection module 226 c. The PCA-clusters module 226 a determines the principal components for each cluster and then determines the two dimensional coordinates for projecting the cluster centroids as known to those skilled in the art. The triangulation module 226 b determines the boundaries for the area around each cluster centroid. These boundaries are later used in the doc projection module 226 c to take into account the influence of records and neighboring clusters when determining how far from the center and on what side of the cluster centroid a record will be projected. The doc projection module 226 c determines the x,y projection coordinates for each record in the visual analysis.
- Referring to
FIG. 10 , the processes associated with creating a two dimensional projection from the cluster assignment files are illustrated. The cluster assignment file (.hcls) is retrieved from storage (step 1002) and the principal component analysis of the cluster centroid vectors is performed (step 1004). Two dimensional coordinates for the clusters (.cluster) are created (step 1008). Delaunay triangulation is performed (step 1010) based on the vector file retrieved from storage (step 1012) that is associated with the data set. Nearest neighbor assignments are associated with the Delaunay triangulation results (step 1014). The projection program determines the two dimensional coordinates for each record (step 1018) based upon the vector files retrieved from storage (step 1012). The projection program also accesses and retrieves the cluster assignment file (.hcls) (step 1020) associated with the data set. The two dimensional coordinates for the group of documents of the data set are stored in a document file (.docpt) (step 1030).
- 6. Graphic Modules and Tools
- Referring to
FIG. 2 e, the interactive tools and graphics modules are illustrated. The interactive tools and graphics modules 240 include a galaxy module 240 a, a master query module 240 b, a plot data module 240 c, a record viewer module 240 d, a query (word) module 240 e, a query by example module 240 f, a group module 240 g, a gist module 240 h, and a surface map module 240 i.
- The
galaxy module 240 a displays records as a scatter plot. The master query module 240 b applies a correlation algorithm to all indexed categorical data and creates a two dimensional matrix with values of a category along each axis. At each intersection in the matrix, a rectangle is drawn with sections colored to show the correlation between the categories. The following are analytical tools. The plot data module 240 c displays a two dimensional line plot of the n-dimensional vectors created for analysis by the user; this is done for all records in the analysis or just those selected by the user. This module can also be used to examine any ancillary numerical attributes associated with the records. The record viewer module 240 d displays a list of the currently selected documents, displays the text of a document, and highlights terms selected by other tools, such as the query tool 240 e. The query tool 240 f performs Boolean or phrase queries in any text or categorical field based on a user's input. The query tool 240 f also performs n-space queries based on the user's input and compares the input to the n-dimensional vector used for clustering. Thus, vectors that correspond to the user's input can be identified and highlighted. The numeric query tool 240 f performs queries based on numeric values. The group tool 240 g enables users to create groups of records of a data set, based on queries or based on user selections, and colors the groups for display in the galaxy visualization created by the galaxy module 240 a. The gist tool 240 h determines the most frequently used terms in the currently selected set of records. The surface map module 240 i provides a surface map that shows records and a plurality of attributes associated with those records.
- Referring to
FIG. 11 , a table is shown that illustrates meta data files that result from statistical analyses and indexing of the data sets consistent with an embodiment of the present invention. The table also depicts the meta data files that are used for the various interactive tools and graphics modules. All of the meta data files except for the tab delimited column/row file, the tagged text source file(s), and the re-engineered tag text file are defined by the data set name or view name as created by the data set editor 314 or view editor 316 (FIG. 2 ) plus an “.extension,” such as [data set name].dcat or [view name].cluster. The meta data files include a data set name.dcat file, a data set name.properties file, a view name.clsp file, a view name.cluster file, a view name.corrv file, a view name.dcat file, a view name.docpt file, a view name.docterm file, a view name.docterm index file, a view name.docv (vector) file, a view name.edge file, a view name.fieldoff file, a view name.gif file, a view name.groups file, a view name.fmt file, a view name.hcls file, a view name.headings file, a view name.ifi file, a view name.ifi index file, a view name.properties file, a view name.punc file, a view name.rel file, a view name.repository file, a view name.stop file, a view name.tl file, a view name.topic file, a view name.vocab file, a view name.vocab index file, a tab delimited column/row file, a tag text source file(s), and a re-engineered tag text file. The table indicates which program modules create, read or update files as indicated by the letters C, R, and U, respectively. For example, the view name.clsp file is created by the view editor 216 (FIG. 2 b), is read by the k-means module 224 a and the cluster-sid module 224 b (FIG. 2 c), and is read by the galaxy module 240 a (FIG. 2 e). The view name.groups file is updated by the group module 240 g. All file access is performed through the API layer (FIG. 2 a).
- After the clustering and projection processes have been completed, the user may now view the results of the various operations performed on the user's data set. As discussed above, prior methods of visualization do not adequately provide access to relationships among attributes of data records other than those used in creating the visualization and, consequently, do not enable the identification of relationships between attributes of different visualizations or views. A system operating according to the present invention enables a user to identify relationships among different visualizations or views by maintaining all attributes associated with the data record for indexing although all attributes are not used in creating the visualization. Referring to
FIG. 12 , the processes consistent with an embodiment of the present invention used to link different visualizations or views are discussed. When a user is viewing a particular visualization or view, the user may request to identify the relationships that exist between the attributes used to create the current visualization and the attributes used to create another visualization (step 1202). After the user initiates a request to explore the data of another view (a target view), an index file associated with the user's current view or data set is accessed (step 1210). After the index file is accessed (step 1210), the process determines whether objects selected by the user in the current view, such as by initiating a query, correspond to objects of a target view based upon all of the attributes contained in the index file (step 1220). If objects of the target view or file correspond to the selected objects of the current view, the objects of the target view are highlighted (step 1230). Therefore, relationships among attributes of data records other than those used in creating the visualization can be used to identify relationships of another visualization as discussed in connection with FIG. 1.
- Methods and apparatus consistent with the invention also provide tools that allow a user to display information interactively so that the user can explore the information to discover knowledge. One such tool displays a set of records and their associated attributes in the form of superimposed two-dimensional line charts. The tool can also generate a single two-dimensional line chart that provides the average values for the attributes associated with the set of records. Each of these charts is linked to other views, such that a record selected in the charts is highlighted in the other views, and vice versa.
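The linkage between views can be sketched as a selection made through an indexed attribute and then applied to whichever view is displayed. The sketch assumes that views of the same data set share record IDs; the function names and the attribute index layout are hypothetical.

```python
def select_by_attribute(attribute_index, value):
    """Return the record IDs whose indexed attribute equals `value`.

    attribute_index: dict mapping attribute value -> list of record IDs.
    The attribute need not be one that was used to build either view.
    """
    return set(attribute_index.get(value, []))

def highlight(view_record_ids, selected_ids):
    """Flag each record shown in a view according to the shared selection."""
    return [(record_id, record_id in selected_ids) for record_id in view_record_ids]

organism_index = {"yeast": [0, 2, 5], "human": [1, 3, 4]}
selection = select_by_attribute(organism_index, "yeast")

current_view = [0, 1, 2, 3]   # records drawn in the current view
target_view = [2, 3, 4, 5]    # records drawn in the target view
print(highlight(current_view, selection))
print(highlight(target_view, selection))
```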
- Another tool generates summary miniplots that a user may quickly use to obtain an overview of the attributes associated with a particular group of records. In particular, records shown in a scatter chart are organized into groups. The average values for the attributes associated with each group of records are used to form a two-dimensional line chart. The line chart is superimposed on the scatter chart, based on the location of the group of records.
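As a rough sketch of how one summary miniplot could be derived, the Python below averages each attribute over a group and computes the group's centroid as the place to draw the miniplot. The record layout (`xy` and `values` keys) and the function name are assumptions for illustration; the centroid is the placement the description later uses in its own example, but other placements are possible.

```python
from statistics import mean


def summary_miniplot(records):
    """Return (centroid, mean_profile) for one group of records.

    Each record is a dict with 'xy' (its scatter-chart position) and
    'values' (numeric attribute values in a fixed attribute order).
    """
    xs = [r["xy"][0] for r in records]
    ys = [r["xy"][1] for r in records]
    centroid = (mean(xs), mean(ys))
    # zip(*...) transposes the per-record value lists into per-attribute columns.
    profile = [mean(col) for col in zip(*(r["values"] for r in records))]
    return centroid, profile


group = [
    {"xy": (1.0, 2.0), "values": [0.0, 4.0, 2.0]},
    {"xy": (3.0, 4.0), "values": [2.0, 6.0, 4.0]},
]
centroid, profile = summary_miniplot(group)
```

The `profile` list is what would be drawn as the small line chart, and `centroid` is where it would be anchored on the scatter chart.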
- As described above, one basic visual tool implemented by the invention for viewing information is a “galaxy view” as produced by the galaxy tool 350a. A galaxy view is shown in
window 120 of FIG. 1. The galaxy view is a two-dimensional scatter graph in which records are organized and depicted in groups (or “clusters”) based on relationships between one record and another. In addition to this galaxy view tool, the invention provides numerous interactive visual tools that allow a user to explore and discover knowledge.
-
FIG. 13 describes one method for displaying information interactively, in the form of two-dimensional line charts. The method begins with the user selecting a set of records and a set of attributes associated with those records (stage 1305). The attributes may comprise any of numerous data types, including the following: numeric, text, sequence (e.g., protein or DNA sequences), or categoric. The selected attributes are converted into numerical values, as discussed above. - Next, a two-dimensional line chart is generated to visually depict the records and their associated attributes (stage 1315).
FIG. 14 represents a preferred implementation of two-dimensional charts that are consistent with the invention. FIG. 14 contains line chart 1405 and legends 1440 and 1450.
-
Chart 1405 contains a collection of superimposed line charts that depict a set of records. For example, line chart 1420 depicts one record within the set, while line chart 1425 depicts another. In the line charts, the x-axis (e.g., as shown by 1410) represents attributes associated with the records, and the y-axis (e.g., as shown by 1415) represents the value of each attribute. The scale of each axis and the colors of the line charts may be modified by the user. Although this description focuses on line charts, other types of charts may be used to depict a set of records, as shown for example by the point chart shown as 1505 in FIG. 15. Legend 1440 contains a text-based description of records. For example, legend 1440 contains a record described as “122C”, as shown by 1445. Legend 1450 contains a text-based description of attributes.
- Methods consistent with the invention can also generate a two-dimensional line chart that shows relationships between the records shown in 1405 (stage 1320). For example,
FIG. 14 shows a line chart 1430 that depicts a statistical value corresponding to the set of records shown in 1405. In the example shown in FIG. 14, chart 1430 depicts the average value of each attribute across the records shown in 1405. In alternative implementations, however, chart 1430 may depict other relevant characterizations of the set of records, such as median attribute values or standard deviations (as shown by 1435).
- In addition to viewing the information in graphical form, the user can interact with the line charts. The invention is capable of receiving input from a user selecting a portion of a chart (stage 1325). This may be achieved, for example, by using a device to point to a portion of
map 1405 or by clicking a pointing device on a portion of map 1405. In response to this user input, the text-based description of the selected record and/or attribute is highlighted in legends 1440 and 1450 (stage 1330). In the example shown in FIG. 14, the user has selected record “122C”, as shown by the highlighting in legend 1440. Similarly, the value of a particular attribute being pointed to in the charts can be indicated. In the example shown in FIG. 15, the user has selected attribute “RBC”, as shown by the highlighting 1515 in the legend and 1520 on the x-axis.
- Furthermore, any selections made by the user on
the charts are propagated to other views. For example, in response to receiving input from a user selecting a record on chart 1405, an index, as discussed above, is analyzed to determine if the record is shown in another view (stage 1335). If the record is shown in another display (stage 1340), the visual representation of that record in the other view is altered (stage 1345). FIG. 16 is a diagram showing both (1) the charts and (2) galaxy view 1605 of records. If a record is selected on map 1405, the record is highlighted in galaxy view 1605, and vice versa. Similarly, the group of records shown on map 1405 may be highlighted in galaxy view 1605 (as shown by 1610), and vice versa.
-
FIG. 17 describes another method of displaying information interactively, in the form of summary miniplots. The method begins with the user selecting a set of records and a set of attributes associated with those records (stage 1705). The attributes may comprise any of numerous data types, including the following: numeric, text, sequence (e.g., protein or DNA sequences), or categoric. The selected attributes are converted into numerical values, as discussed above (stage 1710). - Next, a two-dimensional scatter chart is generated to visually depict the records (stage 1715). An example of such a chart is
galaxy view 1805 shown in FIG. 18. Galaxy view 1805 contains a collection of records, one example of which is shown as 1810. The records within galaxy view 1805 are organized into groups (or clusters) (stage 1720), based on relationships between one record and another.
- For each group shown in
galaxy view 1805, a two-dimensional line chart (summary miniplot) is generated that depicts some information about the records contained within that group (stage 1725). Each such summary miniplot is superimposed onto the two-dimensional scatter chart, based on the location of the group of records on the scatter chart (stage 1730). For example, chart 1805 contains a group of records 1815, for which summary miniplot 1820 represents the average attribute values. In the example shown, summary miniplot 1820 is superimposed at the centroid coordinate for the records in group 1815.
- In alternate implementations, summary miniplots may be used to represent other groupings of records. For example, the records shown in a scatter chart may be grouped into quadrants of the scatter chart, and four summary miniplots could be used to represent the quadrants. Furthermore, each line chart, such as
line chart 1820, can also be coded in a variety of ways (e.g., size, color, thickness of lines, etc.) to represent additional information (e.g., the variability within the group's records, the value of an unrelated field, etc.). - In addition to viewing the information in graphical form, the user can interact with the summary miniplots. The invention is capable of receiving input from a user selecting a summary miniplot (stage 1735). This may be achieved, for example, by using a device to point to a portion of
map 1805 or by clicking a pointing device on a portion of map 1805. In FIG. 18, the user input constitutes selecting group 1825, as shown by the fact that group 1825 is highlighted. In response to this user input, a graph is generated that contains a series of superimposed line charts, with each line chart representing a record (stage 1740). An example of such a graph is shown in FIG. 18 as 1830, which is a series of superimposed line charts that represent attribute values for the records selected by the user in group 1825.
- Furthermore, any selections made by the user of a summary miniplot on
chart 1805 are propagated to other views. For example, in response to receiving input from a user selecting summary miniplot 1820, an index, as discussed above, is analyzed to determine if the records represented by summary miniplot 1820 are shown in another view (stage 1745). If the records are shown in another display (stage 1750), the visual representation of the records in the other view is altered (stage 1755). Similarly, if a user selects a record in another view, the summary miniplot corresponding to that record can be highlighted.
- The preceding visualizations provide the opportunity to query records by attributes represented, e.g., by categorical and numerical values and by sequence of text content. Because these visualizations support only a limited number of queries at a time, they cannot analyze large numbers of associations efficiently. A multiple query tool creates a visualization that provides an overview of a large number of comparisons automatically, presenting the user with information, e.g., about associations and their expectation. The multiple query tool also provides information about associations between clusters and attributes as well as associations between sets of attributes.
-
FIG. 19 provides an illustration of a multiple query tool visualization according to the present invention. The multiple query tool produces a visualization in the form of an interactive matrix that displays the requested associations and permits access to the underlying information. For example, the multiple query tool can provide links back to other open visualizations and tools, or stand alone as a separate visualization. -
FIG. 20 illustrates a process of creating a visualization using the multiple query tool. As shown in step 2010, the user accesses the multiple query tool in any common manner of a graphical user interface, for example, a tool bar button, a previous visualization menu, a pop-up box, or a main menu.
- Visualization of data begins with the selection of a data file. As shown in
step 2020, a user selects a data file of interest. Alternatively, the data file can be preselected, when, e.g., the multiple query visualization is linked to another visualization analysis. - After a data set is selected, as shown in
step 2030, the user sets the type of query. As shown in FIG. 21, a dialog box can be displayed to the user with a drop-down menu of query types. While FIG. 21 shows a selection between the query types records vs. attributes, attributes vs. attributes, current data vs. historical data, and current data vs. expert data, other query types are within the scope of the invention. Once selected, the drop-down menu is rolled up to display only the selected query.
- Upon selection of a query type, a dialog box specific to the query type is displayed so that the user can set the parameters of the query.
FIGS. 22A-22C display exemplary parameter-setting dialog boxes for query types shown in FIG. 21.
- For example, in
FIG. 22A, a record vs. attributes query dialog box 2200 is displayed. In this query, records are correlated to selected attributes. In one of its aspects, the records can be viewed as clusters of the records, for example, clusters such as those defined in the galaxy view of a previous visualization or those defined using any other process. FIG. 22A displays four attribute sources, although other sources could be displayed.
- In
attribute source area 2210, labeled ‘Vocabulary Word(s),’ of dialog box 2200, the user types in the word or words that serve as attributes. For multiple words, a delimiter, such as a semicolon, could be used to separate entries. Other processing could also intelligently separate the words. Also, logical operators, such as Boolean AND, OR, NOT, could be included to produce a single composite attribute.
- Also, the user can identify attribute words by pointing to a text file that contains a list of words. The user can identify the text file in
attribute source area 2220, labeled ‘Vocabulary File.’ One format for this list would be a single keyword per line or a single phrase per line. With the text file, synonyms can also be identified. Vocabulary files including synonyms may have the following formats in one aspect of the present invention:
Format 1
Keyword1: alt_word1A; alt_word1B
Keyword2:
Keyword3: alt_word3A
Format 2
Keyword1 alt_word1A alt_word1B
Keyword2
Keyword3 alt_word3A
- The processing of the identified text file will operate on files of the format(s) of existing user files, so as to avoid issues of file format conversion.
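A minimal Python sketch of reading a Format 1 vocabulary file follows. The function name and the returned dictionary shape are assumptions for illustration; the sketch also accepts a bare keyword with no colon, which covers the single-keyword-per-line case.

```python
def parse_vocabulary(text):
    """Parse 'Keyword: alt1; alt2' lines into {keyword: [alternate forms]}."""
    vocab = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition() leaves alts empty when the line has no colon.
        keyword, _, alts = line.partition(":")
        vocab[keyword.strip()] = [a.strip() for a in alts.split(";") if a.strip()]
    return vocab


sample = "Keyword1: alt_word1A; alt_word1B\nKeyword2:\nKeyword3: alt_word3A\n"
vocab = parse_vocabulary(sample)
```

A keyword with no synonyms simply maps to an empty list, so later matching code can treat every keyword uniformly.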
-
FIG. 22A also illustrates attribute source areas 2230 and 2240. In attribute source area 2230, labeled ‘Category Field(s),’ the user types in the category or categories that serve as attributes. For multiple categories, a delimiter, such as a semicolon, could be used to separate entries. Other processing could also intelligently separate the categories. Also, logical operators, such as Boolean AND, OR, NOT, could be included to act on categories to produce a single composite attribute. Area 2250 provides access to a selectable menu of categories in the database, in the format of, e.g., a drop-down box. To develop the menu, each record in the database is parsed to identify all possible categorical values.
- In
attribute source area 2240, labeled ‘Category File,’ the user can identify attribute categories by pointing to a text file that contains a list of categories. Selecting categories from a file enables the user to specify easily the order in which the categorical values would be displayed in the visualization and allows the user to specify a hierarchy for those values. One format for the categorical value file (tab-delimited lines, with the value indicating hierarchy level) is:
categorical_value_1	1
categorical_value_1.1	2
categorical_value_2	1
categorical_value_2.1	2
categorical_value_2.2	2
categorical_value_2.2.1	3
- Further, to collapse the number of attribute columns, the categories could be combined, similarly to the use of synonyms, or, for hierarchical categorical data, the user could select a maximum hierarchical level. As shown in
step 2040 of FIG. 20, after the user selects the attributes, the database is queried using the multiple query. In step 2050, the results of the multiple query are used to create a query matrix.
- For example, as shown in
FIG. 23, from the attribute words or categories, the multiple query tool creates a query matrix of record rows and attribute columns. The cells of the matrix are set to binary values indicating the presence or absence of the attribute in each record. When a vocabulary file with synonyms is used, a single matrix cell should be created for each keyword, and the cell is marked if either the keyword or any of the alternate forms is found. One method of determining the presence of an attribute would be to search the original data file or any indexed files describing the distribution of words or categorical values within the data set.
- Following creation of the query matrix, the query matrix is visualized, in
step 2060. One visualization is a binary, co-occurrence scheme, as shown in FIG. 24, where cells having a value of “1” are marked in a color or shade, 2410, while cells having a value of “0” are marked in a different color or shade, 2420. The user can select a size of cells, so that more or fewer cells are shown in a display of the visualization.
- To minimize the display, the user can select a visualization based on cluster rows. When large numbers of records are to be analyzed, the cluster row visualization could be set as the default.
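The binary query matrix of FIG. 23 and a simple cluster-row averaging can be sketched in Python as follows. This is only an illustration under stated assumptions: records are modeled as plain text strings, matching is case-insensitive on whole words, and the function names, sample vocabulary, and cluster assignment are all hypothetical.

```python
def build_query_matrix(records, vocab):
    """Return (keywords, matrix): one 0/1 row per record, one column per keyword.

    A cell is 1 if the keyword or any of its alternate forms occurs among
    the record's words (assumed: case-insensitive whole-word match).
    """
    keywords = list(vocab)
    matrix = []
    for text in records:
        words = set(text.lower().split())
        row = []
        for kw in keywords:
            forms = {kw.lower(), *(a.lower() for a in vocab[kw])}
            row.append(1 if words & forms else 0)
        matrix.append(row)
    return keywords, matrix


def cluster_composite(matrix, clusters):
    """Average the binary scores of each cluster's rows, column by column."""
    composite = {}
    for name, rows in clusters.items():
        columns = zip(*(matrix[i] for i in rows))
        composite[name] = [sum(col) / len(rows) for col in columns]
    return composite


vocab = {"kinase": ["kinases"], "liver": ["hepatic"]}
records = ["hepatic kinase activity", "renal function", "kinases in liver tissue"]
keywords, matrix = build_query_matrix(records, vocab)
composite = cluster_composite(matrix, {"c1": [0, 1], "c2": [2]})
```

Each composite value is the fraction of a cluster's records that contain the attribute, which is the quantity a cluster-row cell would be shaded by in the simplest scheme.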
- In the cluster row case, as shown in
FIG. 25, the cells of the visualization matrix are set to indicate the presence or absence of the attribute in each record. To set the cell values, the query matrix is created or processed to yield a composite value for each cell. For example, a basic scheme would involve summing the binary co-occurrence scores for a cluster and dividing by the number of records in the cluster.
- When the matrix using cluster rows is visualized in
step 2060, cells are colored or shaded to indicate their composite values. FIG. 25 shows a binary co-occurrence shading scheme that illustrates the query matrix of FIG. 23 when its records are grouped into clusters. As shown in FIG. 26, an overall visualization can be displayed as a three-dimensional view of the rows vs. columns vs. values, with the value of each cell represented by a cube at an appropriate height on the Z-axis. The overall visualization is rotatable, so that the user can view 2-D scatter plots corresponding to the rows and columns. A 2-D row scatter plot is shown in FIG. 27.
- Another, more complex visualization, however, serves as the default when cluster rows are used. In this alternative visualization of cluster rows, the cells show association probabilities. One scheme for showing association probabilities represents deviations as a difference from an expected value under a random distribution assumption. To calculate expected values, the total number of records containing each attribute, i.e., the sum of each column of the query matrix, is computed. Lower than expected values could be shown, for example, in cool colors (blue (=−1) to green) and higher than expected values in hot colors (inverted black body with red=1). Deviations from an expected value under a random distribution assumption could also be represented as a ratio. Alternatively, the cells could represent the probability of observing a given number of occurrences of an attribute in a cluster of a given size, assuming that the total number of occurrences of the attribute is randomly distributed over all the clusters. In this case, the values will range from 0 to 1, and the color display would have, for example, blue=0, white=0.5, and red=1. To highlight extreme behaviors, the scale could be non-linear so that only the very high and very low probabilities are highlighted.
- To compute association probabilities either an exact or approximate method is used for each of the association methods of the present invention. The exact method is precise at the cost of being computationally intensive. The approximate method can reduce the number of computations when the total number of objects and total number of occurrences of the attributes are relatively large. Further, the use of the laws of logarithms to reduce products and quotients to sums and differences, respectively, and exponentiation to a product will also save computing time.
- The probability of observing what is observed given a random distribution indicates the probability of observing a certain number of occurrences of an attribute in a given cluster if the attribute is randomly distributed over all clusters. The lower the probability, the further the attribute distribution deviates from randomness. Described below are the exact method and the approximate method for calculating this probability.
-
Equation 1 provides the exact method. Equation 1 is the discrete density function for a random variable having a hypergeometric distribution. The numerator consists of the product of two terms. The first term calculates how many ways there are to choose exactly m occurrences of the attribute out of the M possible for the cluster of interest; the second term calculates the ways to fill the other (n−m) places in the cluster with objects that do not have the attribute. The denominator calculates the total number of ways to assign N objects to a cluster of size n.
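Written in the symbols defined immediately below, the standard hypergeometric density that matches this description is the following. It is a reconstruction from the surrounding text, not a copy of the original typeset equation:

```latex
P(m) \;=\; \frac{\dbinom{M}{m}\,\dbinom{N-M}{n-m}}{\dbinom{N}{n}}
```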
where N: total number of objects in the data set
- M: total number of occurrences of the attribute
- n: number of objects in the given cluster
- m: number of occurrences of the attribute in the given cluster
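The exact method can be sketched in Python using only the standard library. The function names are hypothetical. The second variant follows the earlier remark about logarithms by turning the products and quotients of binomial coefficients into sums and differences of log-gamma values, which is one common way to do this and is an assumption rather than the document's stated procedure.

```python
from math import comb, exp, lgamma


def _in_support(N, M, n, m):
    """True when m is a feasible count for a cluster of n objects."""
    return max(0, n - (N - M)) <= m <= min(n, M)


def hypergeom_pmf(N, M, n, m):
    """Exact probability of m occurrences in a cluster of n objects."""
    if not _in_support(N, M, n, m):
        return 0.0
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)


def log_comb(a, b):
    """Natural log of C(a, b) via log-gamma, avoiding very large integers."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf_log(N, M, n, m):
    """Same density, computed as sums and differences of logarithms."""
    if not _in_support(N, M, n, m):
        return 0.0
    return exp(log_comb(M, m) + log_comb(N - M, n - m) - log_comb(N, n))


# Example: 10 objects, 4 with the attribute, cluster of 3, 2 occurrences seen.
p_exact = hypergeom_pmf(10, 4, 3, 2)
p_log = hypergeom_pmf_log(10, 4, 3, 2)
```

For this small example the two functions agree to floating-point precision; the log form mainly matters when N and M are large enough that the integer binomial coefficients become expensive.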
-
Equation 2 provides the approximate method. Equation 2 is the discrete density function for a random variable having a binomial distribution, where the probability of a success is M/N and the probability of failure is (1−M/N). When N and M are large, (N−n)/(N−1) is close to one; thus, Equation 2 provides a reasonably good approximation to the hypergeometric distribution. N, M, n, and m denote the same quantities as defined above in Equation 1.
- Alternatively, the association probability can be represented as a measure of an unusual number of occurrences, which is a deviation of the observed occurrence from the expected occurrence if the attribute is randomly distributed over all clusters. An exact method (Equation 3) or an approximate method (Equation 4) can be used. N, M, n, and m denote the same quantities as in
Equation 1. Note that the expectation is the sum, over the range of the random variable x, of x multiplied by p(x). Equation 3 uses the hypergeometric distribution and Equation 4 uses the binomial distribution, similar to Equations 1 and 2, respectively.
- The deviation from expected occurrence can be measured using either the ratio or the difference of the observed number of occurrences over (or from) the expected number of occurrences. The range of the ratio is between zero and infinity. A ratio value further away from 1 indicates a larger deviation from randomness.
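For orientation, the standard forms of the quantities just described can be written as below, using N, M, n, and m as defined for Equation 1. These are textbook expressions offered as a reconstruction, and the correspondence to the original equation numbers is not asserted. The hypergeometric and binomial distributions share the same mean, so the expected number of occurrences is nM/N under either method.

```latex
\begin{aligned}
p(m) &\approx \binom{n}{m}\left(\frac{M}{N}\right)^{m}\left(1-\frac{M}{N}\right)^{n-m}
  && \text{(binomial approximation)} \\
E[m] &= \sum_{x} x\,p(x) = \frac{nM}{N}
  && \text{(expected occurrences)} \\
\text{ratio} &= \frac{m}{E[m]} = \frac{mN}{nM}
  && \text{(observed over expected)}
\end{aligned}
```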
- Alternatively, to make the deviation more comparable across clusters of various sizes, the difference between observed and expected occurrences is divided by the size of the cluster (Equation 6). Therefore, the range of this deviation measure is normalized between −1 and 1. A value further away from zero indicates a larger deviation from randomness.
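The approximate density and the deviation measures can be sketched together in Python. The function names are hypothetical, and the expected count nM/N is the standard mean of both the hypergeometric and binomial distributions rather than a formula quoted from the source.

```python
from math import comb


def binom_pmf(N, M, n, m):
    """Binomial approximation with success probability M / N."""
    p = M / N
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def expected_occurrences(N, M, n):
    """Expected occurrences in a cluster of n objects under randomness."""
    return n * M / N


def deviation_ratio(N, M, n, m):
    """Observed over expected; 1 means no deviation (undefined when M == 0)."""
    return m / expected_occurrences(N, M, n)


def normalized_difference(N, M, n, m):
    """(observed - expected) / cluster size; always lies between -1 and 1."""
    return (m - expected_occurrences(N, M, n)) / n


# Example: 100 objects, 20 with the attribute, cluster of 10, 6 occurrences.
ratio = deviation_ratio(100, 20, 10, 6)
diff = normalized_difference(100, 20, 10, 6)
```

In this example two occurrences are expected and six are observed, so the ratio is 3 and the normalized difference is 0.4, both indicating an attribute that is over-represented in the cluster.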
- While the order of attributes along the columns and the order of records or clusters along the rows of the matrix can be selected by the user, for example using a menu item or by dragging rows and columns to new positions, the order of the records or clusters is preferably set automatically to the same correlation order, as known to those skilled in the art. The default display for attributes is based on correlation order, with the attribute having the highest column sum being on the left-hand side.
- Thus, visualizations for the records vs. attributes query type have been explained. The processing involved in creating the query matrix and visualization for the remaining query types is similar to that for the records vs. attributes query type.
- If the user selects an attribute vs. attributes query type in
step 2030, as shown in FIG. 22B, an attributes vs. attributes query dialog box 2260 is displayed. The attributes vs. attributes query type is not concerned with occurrences in specific records, only with defining the associations among attributes.
-
Query dialog box 2260 operates similarly to records vs. attributes query dialog box 2200, except that the user will be specifying two sets of attributes (vocabulary words or categories).
- When querying the database in
step 2040 and creating the query matrix in step 2050, the matrix cell scores are generated as a cumulative measure of the number of records that contain both test attributes. Then, the score should be normalized against the number of records. In other words, for n records, I row attributes, and J column attributes:
for i = 1 to I
  for j = 1 to J
    score(i,j) = 0
    for record = 1 to n
      if record contains both row_attribute(i) and column_attribute(j), then score(i,j) = score(i,j) + 1
    next record
    norm_score(i,j) = score(i,j)/n
  next j
next i
- Also, the total number of records that have each attribute is counted so that deviation from expected frequency can be calculated.
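The same scoring loop can be written as a short runnable Python sketch. Records are modeled here as sets of attribute names, which is an assumption made to keep the example self-contained; the function name and the sample data are likewise hypothetical.

```python
def cooccurrence_scores(records, row_attrs, col_attrs):
    """Return (norm_scores, counts) for an attributes vs. attributes query.

    norm_scores[i][j] is the fraction of records containing both
    row_attrs[i] and col_attrs[j]; counts[a] is the number of records
    containing attribute a, kept for later expected-frequency calculations.
    """
    n = len(records)
    norm_scores = [
        [sum(1 for r in records if ra in r and ca in r) / n for ca in col_attrs]
        for ra in row_attrs
    ]
    counts = {
        a: sum(1 for r in records if a in r)
        for a in set(row_attrs) | set(col_attrs)
    }
    return norm_scores, counts


records = [{"a", "x"}, {"a", "y"}, {"b", "x"}, {"a", "x", "y"}]
norm, counts = cooccurrence_scores(records, ["a", "b"], ["x", "y"])
```

Keeping the per-attribute counts alongside the normalized scores means the deviation-from-expectation coloring can be computed without a second pass over the records.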
- In
step 2060, the attributes vs. attributes visualization follows the same mechanics as for records vs. attributes, with a few differences. Specifically, in the default view for the attributes vs. attributes visualization, the default order for both axes would be the correlation order, with the row or column having the highest total score (e.g., the highest average value) on the top or left, and the default mode for showing associations uses deviation from expectation, with lower than expected values shown as cool colors (blue (=−1) to green) and higher than expected values shown as hot colors (inverted black body with red=1).
- Another use of the multiple query tool visualization is rapid assessment of the correlation between the current experiment being analyzed and historical data. Such a visualization points to the similarities or differences for all equivalent data points (record and condition).
- As shown in
FIG. 22C, a current data vs. historical data query dialog box 2270 is displayed when the user selects such a visualization. A file containing a data matrix is used as the historical data. In other words, the user would select the files of a prior visualization. Alternatively, a data matrix, similar to those currently used to input data into the numerical engine, could be designated.
- In
step 2040, the method determines where the current and historical experiments overlap. For example, if the current experiment contains records 1 through 10 and the historical experiment contains records 1 through 5 and records 8 through 12, then correlations would only be performed with the common records 1 to 5 and 8 to 10. Similarly, if the current experiment used conditions (components) A through E (e.g., 5 time points or distinct treatments) and the historical experiment used conditions A, C, D, and F, then the correlation would be calculated only using the common conditions A, C, and D.
- In
step 2050, a query data matrix would then be created comparing the common entries. For record1, a correlation with the historical data set would be performed using all the common conditions (intersection). In the example given, this would be a correlation between current_record1(A,C,D) and historical_record1(A,C,D). A similar score would be derived for each record present in both data sets. For a record in the current data set that is not present in the historical set, the query matrix would be blank (or set to some flag). The calculations would be repeated for each historical set requested. - In
step 2060, the query matrix is visualized as follows. The color code in each cell is based on the correlation of that record to its counterpart in the historical data. The correlation values will range from −1 to +1 and be presented using, for example, a modified rainbow with negative correlations being cool colors (blue=−1) and positive correlations being hot colors (red=1). For records that are not shared with the historical data set, the matrix cell should have no color (or be colored the same as the background) or, alternatively, these cells can be hidden. If the cells not shared with the historical data set are shown, the degree of overlap between the current and the historical data sets can be visualized. This visualization could also be selected as a separate visualization that shows the overlap, for example by using a gray-scale color code in the matrix, where black indicates full overlap with the historical data components and white indicates no overlap. This query type would also be useful with other data mining tools.
- Instead of comparing the records of the current and historical data, cluster assignments from one experiment to the next can be compared, even when the experiment types are quite different. Preferably, for each record in a current data cluster, the method can assess what fraction of the other current cluster records exist in the same cluster in the historical set. Then, an average of the results from each current cluster record is computed to get a score for that cluster. Another example assesses, for each record in a current cluster, what fraction of the other current cluster records are found in the historical data within a Euclidean distance x. An interactive slider would allow the user to change x, and the method would allow viewing of the results dynamically.
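The record-by-record comparison against historical data can be sketched as follows. Several things here are assumptions: Pearson correlation is used as the correlation measure (the description does not name one), each data set is modeled as a dictionary of records mapping condition names to values, the common conditions are taken per record, and `None` stands in for the blank or flagged cell.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def compare_with_history(current, historical):
    """Correlate each current record with its historical counterpart.

    Only conditions present in both data sets are used; records missing
    from the historical set get None (the blank cell).
    """
    result = {}
    for rid, cur in current.items():
        hist = historical.get(rid)
        if hist is None:
            result[rid] = None
            continue
        common = sorted(set(cur) & set(hist))
        result[rid] = pearson([cur[c] for c in common], [hist[c] for c in common])
    return result


current = {
    "record1": {"A": 1.0, "B": 5.0, "C": 2.0, "D": 3.0, "E": 9.0},
    "record6": {"A": 2.0, "B": 1.0, "C": 0.0, "D": 4.0, "E": 3.0},
}
historical = {"record1": {"A": 2.0, "C": 4.0, "D": 6.0, "F": 0.0}}
scores = compare_with_history(current, historical)
```

Here record1 is compared over the common conditions A, C, and D only, while record6 has no historical counterpart and is left blank.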
- When records are combined into clusters, the overall value for the cluster will be represented as the average or another statistical measure, such as the median, of the record correlations, based only on those records that are common between the data sets. An indication of variation is provided, since a cluster that contains 10 records with a correlation of 0.8 and a cluster that contains 10 records with a correlation of 0.9 and one with a correlation of −1 (averages of 0.8 and about 0.73, respectively) may be of different interest to the user. Such an indication can be achieved using multiple visualizations, for example by duplicating the previous query, that simultaneously show the average and the standard deviation, the minimum value, or the maximum value.
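A small Python sketch shows why a variation measure is useful next to the average. The function name and the choice of the population standard deviation are assumptions for illustration. With the numbers used in the example (ten records at 0.8, versus ten records at 0.9 plus one at −1), the means come out at 0.8 and about 0.73, while the spreads differ sharply.

```python
from statistics import mean, pstdev


def summarize_cluster(correlations):
    """Return the average plus simple indications of variation."""
    return {
        "mean": mean(correlations),
        "stdev": pstdev(correlations),  # population standard deviation
        "min": min(correlations),
        "max": max(correlations),
    }


uniform = summarize_cluster([0.8] * 10)
mixed = summarize_cluster([0.9] * 10 + [-1.0])
```

The uniform cluster has zero spread, whereas the mixed cluster's standard deviation and minimum immediately expose the single strongly anti-correlated record.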
- The default order of clusters and records in this visualization should be the same as in the records vs. attributes query tool. In addition, a row is added that summarizes the comparison of the entire current data against each historical data set. For example, a row labeled “Summary” will be the average of all record correlations.
- Alternatively, the user or system could identify specific records to group together at the top of the visualization. For example, all the controls could be grouped together as opposed to in separate clusters. Also, while only one set each of current and historical data is described above, several sets of data could be visualized contemporaneously. That is, any one of the data sets is treated as the prototype against which the others are measured. A slider bar on each visualization would allow the user to run through multiple experiments. The progress through the slider (data sets) could be semiautomated to play like a movie, stopping whenever certain similarities or dissimilarities are found.
- The ‘current data vs. literature/expert knowledge’ query is similar to the other queries. Correlations between the current data and the literature or expert knowledge are defined either as what records have previously been found to group together or as similarity to actual published/historical values.
- Regardless of the query type, the visualization, as shown in
FIG. 19 , will be displayed in an interactive area of a display screen, so that the user may adapt the visualization to her preferences. - For example, to provide commands, the visualization could include a menu bar and a toolbar. A
menu bar 2810, with associated sub-menus, of the visualization could include the features shown in FIG. 28.
- The Duplicate command in the File menu of menu bar 2810 allows access to previously stored queries, so that the user can either re-run or adjust a previously run multiple query. The other commands in the File menu are self-explanatory.
- The Row Order menu of menu bar 2810 provides options for organizing the records, clusters, or row attributes. The Cluster from View command results in a correlation ordering for the records and clusters (if correlation ordering was not done for the view, then it is also not done here by default). As discussed above, this ordering is the default for a records vs. attributes query type or a current data vs. historical data query type. The Correlation with Columns command is an option for recalculating the cluster order based on the values in the query matrix. In a cluster view, records would remain with their cluster and the clusters are reordered according to correlation ordering. If a cluster was expanded to show records, the records in the cluster would be reordered according to correlation ordering. As discussed above, for an attributes vs. attributes query, correlation with columns is the default.
- The Advanced sub-menu of the Row Order menu allows access to the following commands. The Cluster Based on Column Values command recalculates the clustering of the records or the attributes using the scores along the row as the vectors for clustering. The user would have the choice of using any clustering algorithm, such as hierarchical or partition methods. The Sum command is an option to order the records or attributes based on the sum of the scores across the row, with the record/attribute with the highest sum being at the top and the lowest being at the bottom, for example. Rows having a value below a predetermined threshold could be placed in a low value row or removed from the visualization matrix. The Sum command is not valid for visualization using clusters and would be deactivated. The File Order command sets the order of clusters or attributes to that specified by the user, for example in an input file. If no file is provided or record rows are selected, this option would be deactivated.
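The Sum ordering, including the optional threshold, can be sketched in a few lines of Python. The function name, the return shape, and the decision to hand back the low-value rows separately (so the caller can either pool or drop them) are assumptions for illustration.

```python
def order_rows_by_sum(labels, matrix, threshold=None):
    """Order rows by descending row sum; optionally split off low-value rows.

    Returns (kept, low), each a list of (label, row) pairs. Rows whose sum
    falls below the threshold go to `low` instead of `kept`.
    """
    rows = sorted(zip(labels, matrix), key=lambda lr: sum(lr[1]), reverse=True)
    if threshold is None:
        return rows, []
    kept = [(label, row) for label, row in rows if sum(row) >= threshold]
    low = [(label, row) for label, row in rows if sum(row) < threshold]
    return kept, low


labels = ["r1", "r2", "r3"]
matrix = [[1, 0], [1, 1], [0, 0]]
kept, low = order_rows_by_sum(labels, matrix, threshold=1)
```

Because Python's sort is stable, rows with equal sums keep their original relative order, which gives a predictable display when many rows tie.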
- The Column Order menu of menu bar 2810 provides options analogous to those of the Row Order menu for organizing the column attributes, except that there will be no clustering from the view, as records and clusters do not appear in the columns, in one aspect of the present invention.
- To provide the user the ability to choose a custom coloring scheme, the Color menu of menu bar 2810 permits a selection of display colors within the multiple query tool.
- A tool bar is also provided in the visualization, either as a separate pop-up area or a bar, for example, located below a status bar, to provide access to functions with a single click. FIG. 29 illustrates examples of functions of a tool bar.
- The RecordViewer function displays the currently highlighted record (or records in the highlighted cluster). For a record vs. attribute cell, this shows the single record with the specific attribute highlighted in the record. For a cluster vs. attribute cell, the RecordViewer shows all the records in that cluster with the specific attribute highlighted in the records. For an attribute vs. attribute cell, the RecordViewer would display all records that contain both attributes, with both attributes highlighted. To access the records, the RecordViewer calls a process that parses the data source file in the galaxy cluster view. An interpretation tool, such as the plot data tool, could also be provided. A double click on a cell can also call the RecordViewer function.
- The Zoom function operates similarly to a zoom in the galaxy visualization. The zoom is used primarily to zoom out, so that an overview of a large multiple query tool can be obtained. The maximum zoom out should be based on the number of records and a user's desired minimum resolution, so that the colors of the visualization will be readily discernible. A possible default size for a cell in the multiple query tool is 12 by 12 pixels. This is large enough to display text labels at 10 point Helvetica for both rows and columns. Zooming out would provide an overview for large data sets. The Zoom Reset function returns the visualization to its default size.
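- One way to derive a maximum zoom out from the number of records and a minimum resolution is sketched below. This is only an illustration under stated assumptions: the viewport dimensions, the `min_cell_px` parameter, and the function name `max_zoom_out` are hypothetical, and only the 12-pixel default cell size comes from the description above.

```python
DEFAULT_CELL_PX = 12  # default cell edge length suggested in the text


def max_zoom_out(
    n_rows: int,
    n_cols: int,
    viewport_w: float,
    viewport_h: float,
    min_cell_px: float = 2.0,
) -> float:
    """Return the smallest allowed scale factor (1.0 = default size).

    The scale shrinks until the whole matrix fits the viewport, but never
    so far that a cell becomes smaller than min_cell_px, so that cell
    colors stay discernible. All parameters are assumptions of this sketch.
    """
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("matrix must have at least one row and one column")
    # Scale at which the full matrix just fits inside the viewport.
    fit_scale = min(
        viewport_w / (n_cols * DEFAULT_CELL_PX),
        viewport_h / (n_rows * DEFAULT_CELL_PX),
    )
    # Scale at which a cell reaches the user's minimum resolution.
    floor_scale = min_cell_px / DEFAULT_CELL_PX
    # Never zoom in past the default size, never out past the floor.
    return min(1.0, max(fit_scale, floor_scale))


if __name__ == "__main__":
    print(max_zoom_out(n_rows=1000, n_cols=50, viewport_w=600, viewport_h=600))
```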
- The Pan function takes the form of a hand and allows the user to drag the graphic around the window, so that areas hidden by display objects or by the physical dimensions of a display screen can be viewed. Scroll bars, as shown in the multiple query tool above, could be employed instead of, or in addition to, the Pan tool. Nevertheless, labels for the rows and columns would always remain visible.
- The Expand Row Clusters and Expand Column Clusters functions open the selected cluster(s) to display all their records or attributes as separate rows. If no clusters are selected, all clusters are expanded. If no clusters are defined (either from the associated view or by having done a cluster ordering within the multiple query tool), these functions are deactivated.
- The Collapse Row Clusters and Collapse Column Clusters functions close the cluster that contains the selected record(s) or attribute(s). If no record or attribute is selected, all clusters are collapsed. If no clusters are defined (either from the associated view or by having done a cluster ordering within the multiple query tool), these functions are deactivated. Although not illustrated in FIG. 29, a single button could also collapse all rows and columns with a deviation from expectation between, e.g., −0.5 and +0.5 (or other definable range) into a single group or remove rows and columns that do not have values above a predetermined threshold.
- The Orient Rows vs. Values and Orient Columns vs. Values functions orient the visualization so that the view is perpendicular to the row axis or column axis, respectively. This provides views of the 2-D scatterplot, as shown in FIG. 27, for example. The Reset Orientation function orients the visualization to the default ‘overhead’ view showing rows vs. columns.
- The Spacing Toggle function toggles the matrix between the two types of views shown in FIGS. 30A and 30B. Providing a grid as shown in FIG. 30A allows viewing of cells as discrete entities, for easier selection. Removing the grid, as shown in FIG. 30B, allows more information to be compressed into the same space and could enhance structure distinctions in the visualization matrix.
- In addition to the command bars, the visualization area itself, as shown in FIG. 19, includes not only the colored visualization matrix but also labels for the rows and columns.
- When the rows are records, the row labels are the record titles. Since record titles may be long, approximately the first 20 characters could be displayed, with a scroll bar or pop-up function to enable viewing of all of the characters. When collapsed into clusters, the rows are labeled by cluster number. For attributes, the categorical value or vocabulary word itself serves as the label. In addition to the labels themselves, the rows and columns could have a master label indicating the content. For records as rows, the label would say “RECORDS.” For vocabulary words input directly in the initial dialog box, the label would be “VOCABULARY”. For vocabulary words input through a file, the label would be the file name. For categories as attributes, the field name would be shown. If multiple fields were requested, each field name would be shown, centered over its collection of row or column labels. The user could also edit or define the row, column, and major labels.
- Rows and columns are selected and highlighted by clicking on the row and column labels using a mouse input device, for example. Shift-clicking and control-clicking can be used to select multiple labels.
- The visualization is preferably interactive. In addition to highlighting labels for selecting rows and columns, clicking on a cell should display key information regarding the cell. This pop-up information would be context sensitive, depending on the type of query and whether the cell represents an individual record or attribute as opposed to a cluster or group. The following provide suggested formats of the key attributes of a cell of the different groups and query types:
- For a cell intersecting a record and attribute in a records vs. attributes query:
Row: Record_name
Column: Column_attribute_name
Co-occurrence: 0 (or 1)
Attribute found in ##/total_rows records
- For a cell intersecting a cluster and attribute in a records vs. attributes query:
Row: Cluster# containing ## members
Column: Column_attribute_name
Co-occurrences: ##
Number of co-occurrences expected: ##
Deviation from expected co-occurrence: ##
Probability of observation: ##
- For a cell intersecting an attribute and attribute in an attributes vs. attributes query:
Row: Row_attribute_name
Column: Column_attribute_name
Co-occurrences: ##
Row attribute found in ##/total_columns columns
Column attribute found in ##/total_rows rows
Number of co-occurrences expected: ##
Deviation from expected co-occurrence: ##
Probability of observation: ##
- For the cell intersecting a record and historical data in a current data vs. historical data query:
Probability of observation: ##
Row: Record_name
Column: historical_experiment_name
Correlation: ## (if this record does not intersect with historical data, ‘no intersection’)
- For the cell intersecting a cluster and historical data in a current data vs. historical data query:
Probability of observation: ##
Row: Record_name
Column: historical_experiment_name
Average Correlation: ## (if this cluster does not contain any genes that intersect with historical data this should say ‘no intersection’)
Maximum Correlation: ## with record_name
Minimum Correlation: ## with record_name
Records that do not intersect historical data (could be a scrollable list): record_name1 record_name5...
- Systems and methods consistent with the present invention employ an open architecture that enables different types of data to be used for analysis and visualization.
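- The suggested pop-up formats for a cell in a records vs. attributes query could be produced along the following lines. This is a hedged sketch only: the function names, the parameter names, and the two-decimal number formatting are assumptions, and the expected co-occurrence, deviation, and probability values are taken as inputs rather than computed here.

```python
def record_cell_popup(
    record_name: str,
    attribute_name: str,
    present: bool,
    records_with_attribute: int,
    total_rows: int,
) -> str:
    """Pop-up text for a cell where a single record meets an attribute."""
    return "\n".join([
        f"Row: {record_name}",
        f"Column: {attribute_name}",
        f"Co-occurrence: {1 if present else 0}",
        f"Attribute found in {records_with_attribute}/{total_rows} records",
    ])


def cluster_cell_popup(
    cluster_id: int,
    members: int,
    attribute_name: str,
    co_occurrences: int,
    expected: float,
    deviation: float,
    probability: float,
) -> str:
    """Pop-up text for a cell where a cluster meets an attribute.

    The statistical values are supplied by the caller; how they are
    derived is outside the scope of this sketch.
    """
    return "\n".join([
        f"Row: Cluster {cluster_id} containing {members} members",
        f"Column: {attribute_name}",
        f"Co-occurrences: {co_occurrences}",
        f"Number of co-occurrences expected: {expected:.2f}",
        f"Deviation from expected co-occurrence: {deviation:.2f}",
        f"Probability of observation: {probability:.2f}",
    ])


if __name__ == "__main__":
    print(record_cell_popup("MJ0001", "kinase", True, 3, 10))
    print(cluster_cell_popup(4, 12, "kinase", 5, 2.5, 1.0, 0.03))
```

Keeping one formatter per cell type makes the context sensitivity explicit: the caller picks the formatter from the query type and from whether the row is an individual record or a cluster.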
- It will be understood by those skilled in the art that various changes and modifications may be made, and equivalents may be substituted for elements thereof without departing from the true scope of the invention.
- Modifications may be made to adapt a particular element, technique, or implementation to the teachings of the present invention without departing from the spirit of the invention. For example, any genetic material, from organism to microbe, could be represented using the context vectors of the present invention. Further, the present invention is not limited to genetic material, and any material or energy could also be represented. Additionally, the rows and columns used in the description are illustrative only, and, for example, records could be placed along the columns. Also, the attributes used are not limited to text and categorical features. Numerical values could be set as attributes, for example using binning where adjacent ranges of numbers are defined. Additionally, for queries against individual records, categorical data could be presented in a single column rather than multiple columns for each categorical value as described above; in this case, the occurrence of a specific categorical value could be represented as a specific color. The resulting matrix could also be dynamically controllable by the user. The order of rows or columns could be adjusted by dragging or sorted according to the information within the row or column.
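- The binning of numerical values into attributes mentioned above can be illustrated with a short sketch. The bin edges, the half-open interval convention, and the label format are assumptions chosen for the example and are not prescribed by the description.

```python
from bisect import bisect_right
from typing import Sequence


def bin_label(value: float, edges: Sequence[float]) -> str:
    """Map a number to the label of the adjacent range that contains it.

    edges must be sorted in ascending order. Ranges are half-open,
    [lower, upper), so each value falls into exactly one bin.
    """
    i = bisect_right(edges, value)
    if i == 0:
        return f"< {edges[0]}"
    if i == len(edges):
        return f">= {edges[-1]}"
    return f"[{edges[i - 1]}, {edges[i]})"


if __name__ == "__main__":
    edges = [0, 10, 20]
    for v in (-1, 5, 10, 25):
        print(v, "->", bin_label(v, edges))
```

Each resulting label can then be treated like any other categorical attribute value when building the query matrix.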
- Moreover, although the described implementation includes software, the invention may be implemented as a combination of hardware and software or in hardware alone. Additionally, although aspects of the present invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer-readable media, such as secondary storage devices, like hard disks, floppy disks, or CD-ROM; a carrier wave from the Internet; or other forms of memory.
- Therefore, it is intended that this invention not be limited to the particular embodiment and method disclosed herein, but that the invention include all embodiments falling within the scope of the appended claims.
-
APPENDIX A
Example Data Set Properties File
CORPUS_TYPE=1
VIEW=protein.aa\ gene.expression
source_file_0.com.bmi.vision.api.FastaDataFile.format=
source_file_class_0=com.bmi.vision.api.FastaDataFile
source_file_0.com.bmi.vision.api.FastaDataFile.fullpath=/home/battelle/omniviz_data/sources/yeast.fasta
number_sources=1
-
>MJ0001 aspartate aminotransferase
MISSRCKNIKPSAIREIFNLATSDCINLGIGEPDFDTPKHIIEAAKRALDEGKTHYSPNN
GIPELREEISNKLKDDYNLDVDKDNIIVTCGASEALMLSIMTLIDRGDEVLIPNPSFVSY
FSLTEFAEGKIKNIDLDENFNIDLEKVKESITKKTKLIIFNSPSNPTGKVYDKETIKGLA
EIAEDYNLIIVSDEVYDKIIYDKKHYSPMQFTDRCILINGFSKTYAMTGWRIGYLAVSDE
LNKELDLINNMIKIHQYSFACATTFAQYGALAALRGSQKCVEDMVREFKMRRDLIYNGLK
DIFKVNKPDGAFYIFPDVSEYGDGVEVAKKLIENKVLCVPGVAFGENGANYIRFSYATKY
EDIEKALGIIKEIFE
>MJ0002
MEIFMEVPIFVVISGSDLYGIPNPSDVDIRGAHILDRELFIKNCLYKSKEEEVINKMFGK
CDFVSFELGKFLRELLKPNANFIEIALSDKVLYSSKYHEDVKGIAYNCICKKLYHHWKGF
AKPLQKLCEKESYNNPKTLLYILRAYYQGILCLESGEFKSDFSSFRCLDCYDEDIVSYLF
ECKVNKKPVDESYKKKIKSYFYELGVLLDESYKNSNLIDEPSETAKIKAIELYKKLYFED
VRE
>MJ0003
MKGKRIAIVSHRILNQNSVVNGLERAEGAFNEVVEILLKNNYGIIQLPCPELIYLGIDRE
GKTKEEYDTKEYRELCKKLLEPIIKYLQEYKKDNYKFILIGIENSTTCDIFKNRGILMEE
FFKEVEKLNIIIKAIEYPKNEKDYNKFVKTLEKMIK
>MJ0004 activator of (R)-2-hydroxyglutaryl-CoA dehydratase
MILGIDVGSTTTKMVLMEDSKIIWYKIEDIGVVIEEDILLKMVKEIEQKYPIDKIVATGY
GRHKVSFADKIVPEVIALGKGANYFFNEADGVIDIGGQDTKVLKIDKNGKVVDFILSDKC
AAGTGKFLEKALDILKIDKNEINKYKSDNIAKISSMCAVFAESEIISLLSKKVPKEGILM
GVYESIINRVIPMTNRLKIQNIVFSGGVAKNKVLVEMFEKKLNKKLLIPKEPQIVCCVGA
ILV
>MJ0005 formate dehydrogenase, beta subunit
MKYVLIQATDNGILRRAECGGAVTALFKYLLDKKLVDGVLALKRGEDVYDGIPTFITNSN
ELVETAGSLHCAPTNFGKLIAKYLADKKIAVPAKPCDAMAIRELAKLNQINLDNVYMIGL
NCGGTISPITAMKMIELFYEVNPLDVVKEEIDKGKFIIELKNGEHKAVKIEELEEKGFGR
RKNCQRCEIMIPRNADLACGNWGAEKGWTFVEICSERGRKLVEDAEKDGYIKIKQPSEKA
IQVREKIESIMIKLAKKFQKKHLEEEYPSLEKWKKYWNRCIKCYGCRDNCPLCFCVECSL
EKDYIEEKGKIPPNPLIFQGIRLSHISQSCINCGQCEDACPMDIPLAYIFHRMQLKIRDT
LGYIPGVDNSLPPLFNIER
>MJ0006 formate dehydrogenase, alpha subunit
MKVVHTICPGCSVGCGIDLIVKDDKVVGTYPYKRHPINEGKNCSNGKNSYKIIYHEKRLK
KPLIKKNGKLVEATWDEALSFIAEKLKNYNADDITFIASGKCTNEDNYALKKLVDSLKAK
IGHCICNSPKVNYAEVSTTIDDIENAKNIIIIGDVFSEHALIGRKVIKAKEKGSKVTIFN
TEEKEILKLNADEFVKVDSYLGVDLSNVDKNTIIIINAPVNVDEIIKTAKENKAKVLPVA
KHCNTVGATLIGIPALNKDEYFELLKNSKFLYIMGENPALVDKDVLKNVEFLVVQDIIMT
ETAEMADVVLPSTCWAEKDGTFINTDKRIQKINKAVNPPGDAMDDWLIIKSLAEKLGSDL
GFNSLEDIQQDIHRNKLL
>MJ0007 2-hydroxyglutaryl-CoA dehydratase, subunit beta
MMKLKAIEKLMQKFASRKEQLYKQKEEGRKVFGMFCAYVPIEIILAANAIPVGLCGGKND
TIPIAEEDLPRNLCPLIKSSYGFKKAKTCPYFEASDIVIGETTCEGKKKMFELMERLVPM
HIMHLPHMKDEDSLKIWIKEVEKLKELVEKETGNKITEEKLKETVDKVNKVRELFYKLYE
LRKNKPAPIKGLDVLKLFQFAYLLDIDDTIGILEDLIEELEERVKKGEGYEGKRILITGC
PMVAGNNKIVEIIEEVGGVVVGEESCTGTRFFENFVEGYSVEDIAKRYFKIPCACRFKND
ERVENIKRLVKELDVDGVVYYTLQYCHTFNIEGAKVEEALKEEGIPIIRIETDYSESDRE
QLKTRLEAFIEMI
>MJ0008
MFCGSMIAICMRSKEGFLFNNKLMDWGLHYNPKIVKDNNIIGYHAPILDLDKKESIIILK
NIIENIKGRDYLTIHLHNGKYGKINKETLIENLSIVNEFAEKNGIKLCIENLRKGFSSNP
NNIIEIADEINCYITFDVGHIPYNRRLEFLEICSDRVYNSHVYEIEVDGKHLPPKNLNNL
KPILDRLLDIKCKMFLIELMDIKEVLRTERMLKDYLEMYR
>MJ0009
MIFNENTPNFIDFKESFKELPLSDETFKIIEENGIKLREIAIGEFSGRDSVAAIIKAIEE
GIDFVLPVVAFTGTDYGNINIFYKNWEIVNKRIKEIDKDKILLPLHFMFEPKLWNALNGR
WVVLSFKRYGYYRPCIGCHAYLRIIRIPLAKHLGGKIISGERLYHNGDFKIDQIEEVLNV
YSKICRDFDVELILPIRYIREGKKIKEIIGEEWEQGEKQFSCVFSGNYRDKDGKVIFDKE
GILKMLNEFIYPASVEILKEGYKGNFNYLNIVKKLI
>MJ0010 phosphonopyruvate decarboxylase
MRAILILLDGLGDRASEILNNKTPLQFAKTPNLDRLAENGMCGLMTTYKEGIPLGTEVAH
FLLWGYSLEEFPGRGVIEALGEDIEIEKNAIYLRASLGFVKKDEKGFLVIDRRTKDISRE
EIEKLVDSLPTCVDGYKFELFYSFDVHFILKIKERNGWISDKISDSDPFYKNRYVMKVKA
IRELCKSEVEYSKAKDTARALNKYLLNVYKILQNHKINRKRRKLEKMPANFLLTKWASRY
KRVESFKEKWGMNAVILAESSLFKGLAKFLGMDFIKIESFEEGIDLIPELDYDFIHLHTK
ETDEAAHTKNPLNKVKVIEKIDKLIGNLKLREDDLLIITADHSTPSVGNLIHSGESVPIL
FYGKNVRVDNVKEFNEISCSNGHLRIRGEELMHLILNYTDRALLYGLRSGDRLRYYIPKD
DEIDLLEG
- RECORDKEY
- Title: Effect of metabisulphite on sporulation and alkaline phosphatase in Bacillus subtilis and Bacillus cereus
- Date: 1990
- The effect of metabisulphite on spore formation and alkaline phosphatase activity/production in Bacillus subtilis and Bacillus cereus was investigated both in liquid and semi-solid substrates. While supplementary nutrient broth (SNB) and sporulation medium (SM) were used as the liquid growth media, two brands of powdered milk were used as the food (semi-solid) substrates. Under both aerobic and anaerobic conditions, B. subtilis was more resistant to metabisulphite than B. cereus while the level of enzyme production and spores formed were generally higher under aerobic than anaerobic conditions. The metabisulphite concentrations required to inhibit spore production as well as alkaline phosphatase synthesis/activity were found to be relatively low and well within safety levels for human consumption. It is concluded that metabisulphite is an effective anti-sporulation agent and a recommendation for its general use in semi-solid and liquid foods is proposed.
- RECORDKEY
- Title: Effects of replacing saturated fat with complex carbohydrate in diets of subjects with NIDDM
- Date: 1989
- This study examined the safety of an isocaloric high-complex carbohydrate low-saturated fat diet (HICARB) in obese patients with non-insulin-dependent diabetes mellitus (NIDDM). Although hypocaloric diets should be recommended to these patients, many find compliance with this diet difficult; therefore, the safety of an isocaloric increase in dietary carbohydrate needs assessment. Lipoprotein cholesterol and triglyceride (TG, mg/dl) concentrations in isocaloric high-fat and HICARB diets were compared in 7 NIDDM subjects (fat 32 +/− 3%, fasting glucose 190 +/− 38 mg/dl) and 6 nondiabetic subjects (fat 33 +/− 5%). They ate a high-fat diet (43% carbohydrate; 42% fat, polyunsaturated to saturated 0.3; fiber 9 g/1000 kcal; cholesterol 550 mg/day) for 7-10 days. Control subjects (3 NIDDM, 3 nondiabetic) continued this diet for 5 wk. The 13 subjects changed to a HICARB diet (65% carbohydrate; 21% fat, polyunsaturated to saturated 1.2; fiber 18 g/1000 kcal; cholesterol 550 mg/day) for 5 wk. NIDDM subjects on the HICARB diet had decreased low-density lipoprotein cholesterol (LDL-chol) concentrations (107 vs. 82, P less than .001), but their high-density lipoprotein cholesterol (HDL-chol) concentrations, glucose, and body weight were unchanged. Changes in total plasma TG concentrations in NIDDM subjects were heterogeneous. Concentrations were either unchanged or had decreased in 5 and increased in 2 NIDDM subjects. Nondiabetic subjects on the HICARB diet had decreased LDL-chol (111 vs. 81, P less than .01) and unchanged HDL-chol and plasma TG concentrations. (ABSTRACT TRUNCATED AT 250 WORDS)
- RECORDKEY
- Title: Enteral feeding of dogs and cats: 51 cases (1989-1991)
- Date: 1992
- Feeding commercial enteral diets to critically ill dogs and cats via nasogastric tubes was an appropriate means for providing nutritional support and was associated with few complications. Twenty-six cats and 25 dogs in the intensive care unit of our teaching hospital were evaluated for malnutrition and identified as candidates for nutritional support via nasogastric tube. Four commercial liquid formula diets and one protein supplement designed for use in human beings were fed to the dogs and cats. Outcome variables used to assess efficacy and safety of nutritional support were return to voluntary food intake, maintenance of body weight to within 10% of admission weight, and complications associated with feeding liquid diets. Sixty-three percent of animals experienced no complications with enteral feedings; resumption of food intake began for most animals (52%) while they were still in the hospital. Weight was maintained in 61% of the animals (16 of 26 cats and 15 of 25 dogs). Complications that did occur included vomiting, diarrhea, and inadvertent tube removal. Most problems were resolved by changing the diet or adhering to the recommended feeding protocol. Nutritional support as a component of the therapy in small animals often is initiated late in the course of the disease when animals have not recovered as quickly as expected. If begun before the animal becomes nutrient depleted, enteral feeding may better support the animal and avoid serious complications.
- Title: Microbiology of fresh and restructured lamb meat: a review
- Date: 1995
- Microbiology of meats has been a subject of great concern in food science and public health in recent years. Although many articles have been devoted to the microbiology of beef, pork, and poultry meats, much less has been written about microbiology of lamb meat and even less on restructured lamb meat. This article presents data on microbiology and shelf-life of fresh lamb meat, restructured meat products, restructured lamb meat products, bacteriology of restructured meat products, and important foodborne pathogens such as Salmonella, Escherichia coli 0157:H7, and Listeria monocytogenes in meats and lamb meats. Also, the potential use of sodium and potassium lactates to control foodborne pathogens in meats and restructured lamb meat is reviewed. This article should be of interest to all meat scientists, food scientists, and public health microbiologists who are concerned with the safety of meats in general and lamb meats in particular.
- RECORDKEY
- Title: Hyperacute stroke therapy with tissue plasminogen activator
- Date: 1997
- The past year has seen tremendous progress in developing new therapies aimed at reversing the effects of acute stroke. Thrombolytic therapy with various agents has been extensively studied in stroke patients for the past 7 years. Tissue plasminogen activator (t-PA) received formal US Food and Drug Administration approval in June 1996 for use in patients within 3 hours of onset of an ischemic stroke. Treatment with t-PA improves neurologic outcome and functional disability to such a degree that, for every 100 stroke patients treated with t-PA, an additional 11-13 will be normal or nearly normal 3 months after their stroke. The downside of t-PA therapy is a 6% rate of symptomatic intracerebral hemorrhage (ICH) and a 3% rate of fatal ICH. Studies are under way to determine whether t-PA can be administered with an acceptable margin of safety within 5 hours of stroke, to evaluate the therapeutic benefits of intraarterial prourokinase, and to assess the use of magnetic resonance spectroscopy to identify which patients are most likely to benefit from thrombolysis. Combination thrombolytic- neuroprotectant therapy is also being studied. In theory, patients could be given an initial dose of a neuroprotectant by paramedics and receive thrombolytic therapy in the hospital. We are now entering an era of proactive, not reactive, stroke therapies. These treatments may reverse some or all acute stroke symptoms and improve functional outcomes.
- RECORDKEY
- Title: A 12-month study of policosanol oral toxicity in Sprague Dawley rats
- Date: 1994
- Policosanol is a natural mixture of higher aliphatic primary alcohols. Oral toxicity of policosanol was evaluated in a 12-month study in which doses from 0.5 to 500 mg/kg were given orally to Sprague Dawley (SD) rats (20sex/group) daily. There was no treatment-related toxicity. Thus, effects on body weight gain, food consumption, clinical observations, blood biochemistry, hematology, organ weight ratios and histopathological findings were similar in control and treated groups. This study supports the wide safety margin of policosanol when administered chronically.
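- The example records above share a simple layout: a RECORDKEY delimiter line, a Title: line, a Date: line, and free text. A minimal parsing sketch under that assumption follows; the function name `parse_records`, the output dictionary keys, and the handling of text before the first delimiter are hypothetical, and the sketch assumes the source file contains plain lines without list markers.

```python
from typing import Dict, List


def parse_records(text: str, delimiter: str = "RECORDKEY") -> List[Dict[str, str]]:
    """Split a source file into records at each delimiter line.

    Each record becomes a dict with "Title", "Date", and "Body" keys.
    Lines that are not tagged are appended to the body text.
    """
    records: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == delimiter:
            # A delimiter line starts a new record.
            current = {"Title": "", "Date": "", "Body": ""}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first delimiter
        if line.startswith("Title:"):
            current["Title"] = line[len("Title:"):].strip()
        elif line.startswith("Date:"):
            current["Date"] = line[len("Date:"):].strip()
        else:
            current["Body"] = (current["Body"] + " " + line).strip()
    return records


if __name__ == "__main__":
    sample = (
        "RECORDKEY\nTitle: First\nDate: 1990\nBody line one.\nBody line two.\n"
        "RECORDKEY\nTitle: Second\nDate: 1989\nOther body.\n"
    )
    for record in parse_records(sample):
        print(record)
```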
-
APPENDIX D
NAME= NumericEngine
DESC= Format for source file produced by numeric engine.
END_DESC
DELIMITER= RECORDKEY
||F0 NAME= Title TYPE= STRING TAG= TITLE: METHOD= LINES:1 DOC_VECTOR= TRUE SEARCH= TRUE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F1 NAME= Components TYPE= STRING TAG= COMPONENTS: METHOD= LINES:1 DOC_VECTOR= TRUE SEARCH= FALSE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F2 NAME= ChipData TYPE= NUMERIC TAG= ChipData: METHOD= LINES:1 DOC_VECTOR= FALSE SEARCH= FALSE CORR= FALSE CASE_SENSITIVE= FALSE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F3 NAME= SGD_Name TYPE= STRING TAG= SGD_Name: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F4 NAME= Description TYPE= STRING TAG= Description: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F5 NAME= Location TYPE= STRING TAG= Location: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F6 NAME= Deletion TYPE= STRING TAG= Deletion: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= TRUE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F7 NAME= Peak TYPE= STRING TAG= Peak: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= TRUE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F8 NAME= MCB_sites TYPE= STRING TAG= MCB_sites: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= TRUE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F9 NAME= SFF_sites TYPE= STRING TAG= SFF_sites: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= TRUE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F10 NAME= Swi5e_sites TYPE= STRING TAG= Swi5e_sites: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= TRUE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
||F11 NAME= Sequence_ TYPE= STRING TAG= Sequence_: METHOD= NEXT_TAG DOC_VECTOR= TRUE SEARCH= TRUE CORR= FALSE CASE_SENSITIVE= TRUE WHOLE_BOUNDARY= FALSE LINEPOS= FLOAT
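- A format definition like the one in Appendix D could be read into per-field settings as sketched below. This is an illustrative reading of the layout only: the function name `parse_field_definitions` is hypothetical, and the sketch assumes that every field starts with a `||F<number>` marker and that every key has a single whitespace-free value.

```python
import re
from typing import Dict

# "||F0", "||F1", ... introduce one field definition each.
FIELD_SPLIT = re.compile(r"\|\|F(\d+)")
# Upper-case key, "=", optional whitespace, then one whitespace-free value.
KEY_VALUE = re.compile(r"([A-Z_]+)=\s*(\S+)")


def parse_field_definitions(text: str) -> Dict[int, Dict[str, str]]:
    """Return {field_number: {key: value}} for every ||F<n> block."""
    # With one capture group, split() yields [header, "0", body0, "1", body1, ...].
    parts = FIELD_SPLIT.split(text)
    fields: Dict[int, Dict[str, str]] = {}
    for index, body in zip(parts[1::2], parts[2::2]):
        fields[int(index)] = dict(KEY_VALUE.findall(body))
    return fields


if __name__ == "__main__":
    sample = (
        "DELIMITER= RECORDKEY "
        "||F0 NAME= Title TYPE= STRING TAG= TITLE: METHOD= LINES:1 SEARCH= TRUE "
        "||F2 NAME= ChipData TYPE= NUMERIC TAG= ChipData: METHOD= LINES:1 SEARCH= FALSE"
    )
    for number, settings in parse_field_definitions(sample).items():
        print(number, settings)
```

A reader of the source file could then use each field's TAG and METHOD settings to decide where a field's value starts and how far it extends.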
Claims (25)
1. A method of interactively displaying records and their corresponding attributes, comprising:
generating a first 2-D chart for a first record, wherein at least two attributes associated with the first record are shown along one axis, and wherein the values of the attributes are shown along the other axis;
receiving input from a user selecting the first record on the first 2-D chart;
analyzing an index to determine if the first record is shown in another view; and
if the first record is shown in another view, altering the visual representation of the first record in the another view based on the user input.
2. The method of claim 1 , wherein the first 2-D chart is a line chart.
3. The method of claim 1 , wherein the first 2-D chart is a scatter chart.
4. The method of claim 1 , wherein the user can select the scale of the axes.
5. The method of claim 1 , wherein the another view comprises a galaxy view of groups of records.
6. The method of claim 1 , further comprising generating a second 2-D chart for a second record, wherein at least two attributes associated with the second record are shown along one axis, and wherein the values of the attributes are shown along the other axis.
7. The method of claim 6 , wherein the first 2-D chart is shown in a first color and the second 2-D chart is shown in a second color.
8. The method of claim 6 , wherein the second 2-D chart is superimposed upon the first 2-D chart.
9. The method of claim 6 , further comprising:
displaying text-based descriptions of the first and second records;
receiving input from the user selecting a text-based description; and
highlighting the 2-D chart of the record corresponding to the selected description.
10. The method of claim 6 , further comprising:
displaying text-based descriptions of each attribute shown in the first and second 2-D charts;
receiving input from the user selecting a text-based description; and
highlighting the attributes and values in the 2-D chart that correspond to the description.
11. The method of claim 6 , further comprising generating a third 2-D chart, wherein at least two attributes associated with the first and second records are shown along one axis, and wherein statistical values of the attributes are shown along the other axis.
12. The method of claim 11 , wherein the statistical values comprise average values.
13. The method of claim 11 , wherein the statistical values comprise median values.
14. The method of claim 1 , further comprising displaying a text-based identification of the record selected by the user.
15. The method of claim 14 , further comprising:
receiving input from a user pointing to a portion of the 2-D chart; and
displaying a text-based identification of the attribute and value corresponding to the pointed portion.
16. The method of claim 1 , further comprising:
receiving input from a user selecting a record in another view;
analyzing an index to determine if the record is shown in the 2-D line chart; and
if the record is shown in the 2-D line chart, altering the visual representation of the record in the 2-D line chart.
17. A method of interactively displaying records and their corresponding attributes, comprising:
selecting a record and its associated attributes, wherein the associated attributes are any combination of numeric, categoric, sequence, and text information; converting the associated attributes into numeric values; and
generating a 2-D chart for the record, wherein at least two attributes associated with the record are shown along one axis, and wherein the values of the attributes are shown along the other axis.
18. A method of interactively displaying records and their corresponding attributes, comprising:
generating a 2-D scatter chart that depicts a plurality of records;
generating a 2-D line chart for a group of records contained in a portion of the 2-D scatter chart, wherein at least two attributes associated with the group of records are shown along one axis, and wherein a statistical value for each of the at least two attributes is shown along the other axis; and
superimposing the 2-D line chart at a location on the 2-D scatter chart that is based on the location of the group of records on the 2-D scatter chart.
19. The method of claim 18 , wherein the statistical value is an average value.
20. The method of claim 18 , wherein the statistical value is a median value.
21. The method of claim 18 , wherein the portion is a quadrant.
22. The method of claim 18 , wherein the portion is a cluster.
23. The method of claim 18 , further comprising selecting a color for the 2-D line chart based on user-defined criteria.
24. The method of claim 18 , further comprising selecting a size for the 2-D line chart based on user-defined criteria.
25. A method of interactively displaying records and their corresponding attributes, comprising:
selecting a set of records and their associated attributes, wherein the associated attributes are any combination of numeric, categoric, sequence, and text information;
converting the associated attributes into numeric values;
generating a first chart that depicts the set of records;
generating a second chart for a subset of records depicted in the first chart, wherein at least two attributes associated with the subset of records are shown along one axis, and wherein a statistical value for each of the at least two attributes is shown along the other axis; and
superimposing the second chart at a location on the first chart that is based on the location of the subset of records on the first chart.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/282,567 US20060093222A1 (en) | 1999-09-30 | 2005-11-21 | Data processing, analysis, and visualization system for use with disparate data types |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/410,367 US6990238B1 (en) | 1999-09-30 | 1999-09-30 | Data processing, analysis, and visualization system for use with disparate data types |
US11/282,567 US20060093222A1 (en) | 1999-09-30 | 2005-11-21 | Data processing, analysis, and visualization system for use with disparate data types |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/410,367 Division US6990238B1 (en) | 1999-09-30 | 1999-09-30 | Data processing, analysis, and visualization system for use with disparate data types |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060093222A1 (en) | 2006-05-04 |
Family
ID=23624417
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/410,367 Expired - Lifetime US6990238B1 (en) | 1999-09-30 | 1999-09-30 | Data processing, analysis, and visualization system for use with disparate data types |
US11/282,567 Abandoned US20060093222A1 (en) | 1999-09-30 | 2005-11-21 | Data processing, analysis, and visualization system for use with disparate data types |
US11/282,569 Abandoned US20060106783A1 (en) | 1999-09-30 | 2005-11-21 | Data processing, analysis, and visualization system for use with disparate data types |
Country Status (6)
Country | Link |
---|---|
US (3) | US6990238B1 (en) |
EP (1) | EP1323069A2 (en) |
JP (2) | JP2003532943A (en) |
AU (1) | AU7741300A (en) |
CA (1) | CA2385836A1 (en) |
WO (1) | WO2001024060A2 (en) |
USD769916S1 (en) * | 2014-04-13 | 2016-10-25 | Jan Magnus Edman | Display screen with graphical user interface |
US9501851B2 (en) | 2014-10-03 | 2016-11-22 | Palantir Technologies Inc. | Time-series analysis system |
US9514200B2 (en) | 2013-10-18 | 2016-12-06 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US9547316B2 (en) | 2012-09-07 | 2017-01-17 | Opower, Inc. | Thermostat classification method and system |
US9558352B1 (en) | 2014-11-06 | 2017-01-31 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9619557B2 (en) | 2014-06-30 | 2017-04-11 | Palantir Technologies, Inc. | Systems and methods for key phrase characterization of documents |
US9646396B2 (en) | 2013-03-15 | 2017-05-09 | Palantir Technologies Inc. | Generating object time series and data objects |
USD789409S1 (en) * | 2014-08-26 | 2017-06-13 | Jan Magnus Edman | Display screen with graphical user interface |
US20170185666A1 (en) * | 2015-12-28 | 2017-06-29 | Facebook, Inc. | Aggregated Broad Topics |
USD792452S1 (en) * | 2014-05-14 | 2017-07-18 | Jan Magnus Edman | Display screen with graphical user interface |
US9727560B2 (en) | 2015-02-25 | 2017-08-08 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US9727063B1 (en) | 2014-04-01 | 2017-08-08 | Opower, Inc. | Thermostat set point identification |
US20170251980A1 (en) * | 2016-03-02 | 2017-09-07 | Roche Diabetes Care, Inc. | Patient diabetes monitoring system with clustering of unsupervised daily cgm profiles (or insulin profiles) and method thereof |
US9767172B2 (en) | 2014-10-03 | 2017-09-19 | Palantir Technologies Inc. | Data aggregation and analysis system |
US9804666B2 (en) | 2015-05-26 | 2017-10-31 | Samsung Electronics Co., Ltd. | Warp clustering |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US9823818B1 (en) | 2015-12-29 | 2017-11-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US9852205B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | Time-sensitive cube |
US9852195B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | System and method for generating event visualizations |
US9857958B2 (en) | 2014-04-28 | 2018-01-02 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US9870389B2 (en) | 2014-12-29 | 2018-01-16 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US9880987B2 (en) | 2011-08-25 | 2018-01-30 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US9891808B2 (en) | 2015-03-16 | 2018-02-13 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US9898335B1 (en) | 2012-10-22 | 2018-02-20 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US9953445B2 (en) | 2013-05-07 | 2018-04-24 | Palantir Technologies Inc. | Interactive data object map |
US9958360B2 (en) | 2015-08-05 | 2018-05-01 | Opower, Inc. | Energy audit device |
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US9984133B2 (en) | 2014-10-16 | 2018-05-29 | Palantir Technologies Inc. | Schematic and database linking system |
US9998485B2 (en) | 2014-07-03 | 2018-06-12 | Palantir Technologies, Inc. | Network intrusion data item clustering and analysis |
US10001792B1 (en) | 2013-06-12 | 2018-06-19 | Opower, Inc. | System and method for determining occupancy schedule for controlling a thermostat |
US10019739B1 (en) | 2014-04-25 | 2018-07-10 | Opower, Inc. | Energy usage alerts for a climate control device |
US10033184B2 (en) | 2014-11-13 | 2018-07-24 | Opower, Inc. | Demand response device configured to provide comparative consumption information relating to proximate users or consumers |
US10037383B2 (en) | 2013-11-11 | 2018-07-31 | Palantir Technologies, Inc. | Simple web search |
US10037314B2 (en) | 2013-03-14 | 2018-07-31 | Palantir Technologies, Inc. | Mobile reports |
US20180240256A1 (en) * | 2017-02-23 | 2018-08-23 | Wipro Limited. | Method and system for processing input data for display in an optimal visualization format |
US10067516B2 (en) | 2013-01-22 | 2018-09-04 | Opower, Inc. | Method and system to control thermostat using biofeedback |
US10074097B2 (en) | 2015-02-03 | 2018-09-11 | Opower, Inc. | Classification engine for classifying businesses based on power consumption |
US10180977B2 (en) | 2014-03-18 | 2019-01-15 | Palantir Technologies Inc. | Determining and extracting changed data from a data source |
US10180929B1 (en) | 2014-06-30 | 2019-01-15 | Palantir Technologies, Inc. | Systems and methods for identifying key phrase clusters within documents |
US10198515B1 (en) | 2013-12-10 | 2019-02-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US10198483B2 (en) | 2015-02-02 | 2019-02-05 | Opower, Inc. | Classification engine for identifying business hours |
US10216801B2 (en) | 2013-03-15 | 2019-02-26 | Palantir Technologies Inc. | Generating data clusters |
US10230746B2 (en) | 2014-01-03 | 2019-03-12 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US10229284B2 (en) | 2007-02-21 | 2019-03-12 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10296617B1 (en) | 2015-10-05 | 2019-05-21 | Palantir Technologies Inc. | Searches of highly structured data |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US10324609B2 (en) | 2016-07-21 | 2019-06-18 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10371861B2 (en) | 2015-02-13 | 2019-08-06 | Opower, Inc. | Notification techniques for reducing energy usage |
US10410130B1 (en) | 2014-08-07 | 2019-09-10 | Opower, Inc. | Inferring residential home characteristics based on energy data |
US10423582B2 (en) | 2011-06-23 | 2019-09-24 | Palantir Technologies, Inc. | System and method for investigating large amounts of data |
US10438308B2 (en) | 2003-02-04 | 2019-10-08 | Lexisnexis Risk Solutions Fl Inc. | Systems and methods for identifying entities using geographical and social mapping |
US10437612B1 (en) | 2015-12-30 | 2019-10-08 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US10437649B2 (en) * | 2016-03-11 | 2019-10-08 | Intel Corporation | Task mapping for heterogeneous platforms |
US10444941B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10452678B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Filter chains for exploring large data sets |
US10474510B2 (en) | 2016-03-11 | 2019-11-12 | Intel Corporation | Declarative properties for data collections |
US10484407B2 (en) | 2015-08-06 | 2019-11-19 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US10559044B2 (en) | 2015-11-20 | 2020-02-11 | Opower, Inc. | Identification of peak days |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10572305B2 (en) | 2016-03-11 | 2020-02-25 | Intel Corporation | Multi-grained memory operands |
US10657684B1 (en) | 2018-12-19 | 2020-05-19 | EffectiveTalent Office LLC | Matched array alignment system and method |
US10678860B1 (en) | 2015-12-17 | 2020-06-09 | Palantir Technologies, Inc. | Automatic generation of composite datasets based on hierarchical fields |
US10684891B2 (en) | 2016-03-11 | 2020-06-16 | Intel Corporation | Memory operand descriptors |
US10698938B2 (en) | 2016-03-18 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10699071B2 (en) | 2013-08-08 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for template based custom document generation |
US10706434B1 (en) | 2015-09-01 | 2020-07-07 | Palantir Technologies Inc. | Methods and systems for determining location information |
US10719188B2 (en) | 2016-07-21 | 2020-07-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
US10719797B2 (en) | 2013-05-10 | 2020-07-21 | Opower, Inc. | Method of tracking and reporting energy performance for businesses |
US10795723B2 (en) | 2014-03-04 | 2020-10-06 | Palantir Technologies Inc. | Mobile tasks |
US10803085B1 (en) | 2018-12-19 | 2020-10-13 | Airspeed Systems LLC | Matched array airspeed and angle of attack alignment system and method |
US10817789B2 (en) | 2015-06-09 | 2020-10-27 | Opower, Inc. | Determination of optimal energy storage methods at electric customer service points |
US10817513B2 (en) | 2013-03-14 | 2020-10-27 | Palantir Technologies Inc. | Fair scheduling for mixed-query loads |
US10885021B1 (en) | 2018-05-02 | 2021-01-05 | Palantir Technologies Inc. | Interactive interpreter and graphical user interface |
US10896529B1 (en) | 2018-12-19 | 2021-01-19 | EffectiveTalent Office LLC | Matched array talent architecture system and method |
US10929436B2 (en) | 2014-07-03 | 2021-02-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US11010941B1 (en) | 2018-12-19 | 2021-05-18 | EffectiveTalent Office LLC | Matched array general talent architecture system and method |
US11016988B1 (en) | 2018-12-19 | 2021-05-25 | Airspeed Systems LLC | Matched array flight alignment system and method |
US11093950B2 (en) | 2015-02-02 | 2021-08-17 | Opower, Inc. | Customer activity score |
US11138180B2 (en) | 2011-09-02 | 2021-10-05 | Palantir Technologies Inc. | Transaction protocol for reading database values |
US11150917B2 (en) | 2015-08-26 | 2021-10-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US11188929B2 (en) | 2014-08-07 | 2021-11-30 | Opower, Inc. | Advisor and notification to reduce bill shock |
US20210396585A1 (en) * | 2019-03-06 | 2021-12-23 | Electric Pocket Limited | Thermal quality mappings |
US11403268B2 (en) * | 2020-08-06 | 2022-08-02 | Sap Se | Predicting types of records based on amount values of records |
US11599369B1 (en) | 2018-03-08 | 2023-03-07 | Palantir Technologies Inc. | Graphical user interface configuration system |
Families Citing this family (136)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6990238B1 (en) * | 1999-09-30 | 2006-01-24 | Battelle Memorial Institute | Data processing, analysis, and visualization system for use with disparate data types |
US6751621B1 (en) * | 2000-01-27 | 2004-06-15 | Manning & Napier Information Services, Llc. | Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors |
DE60005293T2 (en) * | 2000-02-23 | 2004-07-01 | Ser Solutions Inc. | Method and device for processing electronic documents |
WO2001083058A2 (en) | 2000-05-01 | 2001-11-08 | Cfph, L.L.C. | Real-time interactive wagering on event outcomes |
US7912868B2 (en) * | 2000-05-02 | 2011-03-22 | Textwise Llc | Advertisement placement method and system using semantic analysis |
US6879332B2 (en) * | 2000-05-16 | 2005-04-12 | Groxis, Inc. | User interface for displaying and exploring hierarchical information |
US7043457B1 (en) | 2000-06-28 | 2006-05-09 | Probuild, Inc. | System and method for managing and evaluating network commodities purchasing |
US9177828B2 (en) * | 2011-02-10 | 2015-11-03 | Micron Technology, Inc. | External gettering method and device |
US6615211B2 (en) * | 2001-03-19 | 2003-09-02 | International Business Machines Corporation | System and methods for using continuous optimization for ordering categorical data sets |
US7191250B1 (en) * | 2001-03-19 | 2007-03-13 | Palmsource, Inc. | Communication protocol for wireless data exchange via a packet transport based system |
JP4629280B2 (en) * | 2001-08-24 | 2011-02-09 | 富士通株式会社 | Knowledge discovery support apparatus and support method |
ES2375403T3 (en) | 2001-08-27 | 2012-02-29 | BDGB Enterprise Software Sàrl | A METHOD FOR THE AUTOMATIC INDEXATION OF DOCUMENTS. |
US6888548B1 (en) | 2001-08-31 | 2005-05-03 | Attenex Corporation | System and method for generating a visualized data representation preserving independent variable geometric relationships |
US6778995B1 (en) | 2001-08-31 | 2004-08-17 | Attenex Corporation | System and method for efficiently generating cluster groupings in a multi-dimensional concept space |
US6978274B1 (en) | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
US7752266B2 (en) | 2001-10-11 | 2010-07-06 | Ebay Inc. | System and method to facilitate translation of communications between entities over a network |
JP2003126045A (en) * | 2001-10-22 | 2003-05-07 | Olympus Optical Co Ltd | Diagnostic assistant system |
US9374451B2 (en) | 2002-02-04 | 2016-06-21 | Nokia Technologies Oy | System and method for multimodal short-cuts to digital services |
US7271804B2 (en) | 2002-02-25 | 2007-09-18 | Attenex Corporation | System and method for arranging concept clusters in thematic relationships in a two-dimensional visual display area |
US8078505B2 (en) | 2002-06-10 | 2011-12-13 | Ebay Inc. | Method and system for automatically updating a seller application utilized in a network-based transaction facility |
US20040125143A1 (en) * | 2002-07-22 | 2004-07-01 | Kenneth Deaton | Display system and method for displaying a multi-dimensional file visualizer and chooser |
US7383513B2 (en) * | 2002-09-25 | 2008-06-03 | Oracle International Corporation | Graphical condition builder for facilitating database queries |
GB2395807A (en) * | 2002-11-27 | 2004-06-02 | Sony Uk Ltd | Information retrieval |
US20050171948A1 (en) * | 2002-12-11 | 2005-08-04 | Knight William C. | System and method for identifying critical features in an ordered scale space within a multi-dimensional feature space |
US20040138988A1 (en) * | 2002-12-20 | 2004-07-15 | Bart Munro | Method to facilitate a search of a database utilizing multiple search criteria |
US7341517B2 (en) * | 2003-04-10 | 2008-03-11 | Cantor Index, Llc | Real-time interactive wagering on event outcomes |
US9026901B2 (en) | 2003-06-20 | 2015-05-05 | International Business Machines Corporation | Viewing annotations across multiple applications |
US8321470B2 (en) * | 2003-06-20 | 2012-11-27 | International Business Machines Corporation | Heterogeneous multi-level extendable indexing for general purpose annotation systems |
GB2403636A (en) * | 2003-07-02 | 2005-01-05 | Sony Uk Ltd | Information retrieval using an array of nodes |
US7610313B2 (en) | 2003-07-25 | 2009-10-27 | Attenex Corporation | System and method for performing efficient document scoring and clustering |
US7548651B2 (en) * | 2003-10-03 | 2009-06-16 | Asahi Kasei Kabushiki Kaisha | Data process unit and data process unit control program |
US20050149351A1 (en) * | 2003-12-29 | 2005-07-07 | Jiao Gong | Method of businessman hospital inquiring system |
US7191175B2 (en) | 2004-02-13 | 2007-03-13 | Attenex Corporation | System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space |
US20050210005A1 (en) * | 2004-03-18 | 2005-09-22 | Lee Thompson | Methods and systems for searching data containing both text and numerical/tabular data formats |
US9189568B2 (en) | 2004-04-23 | 2015-11-17 | Ebay Inc. | Method and system to display and search in a language independent manner |
US7296021B2 (en) * | 2004-05-21 | 2007-11-13 | International Business Machines Corporation | Method, system, and article to specify compound query, displaying visual indication includes a series of graphical bars specify weight relevance, ordered segments of unique colors where each segment length indicative of the extent of match of each object with one of search parameters |
AU2005253141A1 (en) | 2004-06-07 | 2005-12-22 | Cfph, Llc | System and method for managing financial market information |
US7890396B2 (en) | 2005-06-07 | 2011-02-15 | Cfph, Llc | Enhanced system and method for managing financial market information |
JP4249674B2 (en) * | 2004-08-25 | 2009-04-02 | 株式会社東芝 | Multidimensional data display method and display device |
JP2006091994A (en) * | 2004-09-21 | 2006-04-06 | Toshiba Corp | Device, method and program for processing document information |
US20060112089A1 (en) * | 2004-11-22 | 2006-05-25 | International Business Machines Corporation | Methods and apparatus for assessing web page decay |
CA2500573A1 (en) * | 2005-03-14 | 2006-09-14 | Oculus Info Inc. | Advances in nspace - system and method for information analysis |
US7404151B2 (en) | 2005-01-26 | 2008-07-22 | Attenex Corporation | System and method for providing a dynamic user interface for a dense three-dimensional scene |
US7356777B2 (en) | 2005-01-26 | 2008-04-08 | Attenex Corporation | System and method for providing a dynamic user interface for a dense three-dimensional scene |
JP2008541696A (en) * | 2005-04-27 | 2008-11-27 | エミリーム インコーポレイテッド | Novel method and device for assessing poisons |
CN1855094A (en) * | 2005-04-28 | 2006-11-01 | 国际商业机器公司 | Method and device for processing electronic files of users |
US7461039B1 (en) * | 2005-09-08 | 2008-12-02 | International Business Machines Corporation | Canonical model to normalize disparate persistent data sources |
WO2007034482A2 (en) * | 2005-09-20 | 2007-03-29 | Sterna Technologies (2005) Ltd. | A method and system for managing data and organizational constraints |
JP4205090B2 (en) * | 2005-09-30 | 2009-01-07 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Apparatus for displaying text information in association with numerical information, and method thereof |
US7571151B1 (en) * | 2005-12-15 | 2009-08-04 | Gneiss Software, Inc. | Data analysis tool for analyzing data stored in multiple text files |
US20070150864A1 (en) * | 2005-12-26 | 2007-06-28 | Goh Chee Ying Josiah | Visual-Based Object Oriented Programming Language & System |
US20070211056A1 (en) * | 2006-03-08 | 2007-09-13 | Sudip Chakraborty | Multi-dimensional data visualization |
US7603351B2 (en) * | 2006-04-19 | 2009-10-13 | Apple Inc. | Semantic reconstruction |
US20070250476A1 (en) * | 2006-04-21 | 2007-10-25 | Lockheed Martin Corporation | Approximate nearest neighbor search in metric space |
US20080082567A1 (en) * | 2006-05-01 | 2008-04-03 | Bezanson Jeffrey W | Apparatuses, Methods And Systems For Vector Operations And Storage In Matrix Models |
JP4555256B2 (en) * | 2006-05-24 | 2010-09-29 | Necソフト株式会社 | Analysis method aiming at feature extraction and comparative classification of time-series gene expression data, and analysis apparatus based on the analysis method |
US8464066B1 (en) * | 2006-06-30 | 2013-06-11 | Amazon Technologies, Inc. | Method and system for sharing segments of multimedia data |
WO2008047383A2 (en) * | 2006-07-28 | 2008-04-24 | Persistent Systems Private Limited | System and method for network association inference, validation and pruning based on integrated constraints from diverse data |
US8639782B2 (en) | 2006-08-23 | 2014-01-28 | Ebay, Inc. | Method and system for sharing metadata between interfaces |
US8452767B2 (en) * | 2006-09-15 | 2013-05-28 | Battelle Memorial Institute | Text analysis devices, articles of manufacture, and text analysis methods |
US8996993B2 (en) * | 2006-09-15 | 2015-03-31 | Battelle Memorial Institute | Text analysis devices, articles of manufacture, and text analysis methods |
US8562422B2 (en) | 2006-09-28 | 2013-10-22 | Cfph, Llc | Products and processes for processing information related to weather and other events |
WO2008055034A2 (en) * | 2006-10-30 | 2008-05-08 | Noblis, Inc. | Method and system for personal information extraction and modeling with fully generalized extraction contexts |
US8122045B2 (en) * | 2007-02-27 | 2012-02-21 | International Business Machines Corporation | Method for mapping a data source to a data target |
US8880564B2 (en) * | 2007-10-11 | 2014-11-04 | Microsoft Corporation | Generic model editing framework |
US20090112533A1 (en) * | 2007-10-31 | 2009-04-30 | Caterpillar Inc. | Method for simplifying a mathematical model by clustering data |
US8290966B2 (en) * | 2007-11-29 | 2012-10-16 | Sap Aktiengesellschaft | System and method for implementing a non-destructive tree filter |
US20090204921A1 (en) * | 2008-02-07 | 2009-08-13 | Vestyck Anthony R | System and Method for Organizing, Managing, and Using Electronic Files |
US8965881B2 (en) * | 2008-08-15 | 2015-02-24 | Athena A. Smyros | Systems and methods for searching an index |
US9424339B2 (en) | 2008-08-15 | 2016-08-23 | Athena A. Smyros | Systems and methods utilizing a search engine |
US8229971B2 (en) * | 2008-09-29 | 2012-07-24 | Efrem Meretab | System and method for dynamically configuring content-driven relationships among data elements |
US8576218B2 (en) * | 2008-12-18 | 2013-11-05 | Microsoft Corporation | Bi-directional update of a grid and associated visualizations |
US20100204923A1 (en) * | 2009-02-10 | 2010-08-12 | Bruce Alan White | Comparing Accuracies Of Lie Detection Methods |
US9235563B2 (en) * | 2009-07-02 | 2016-01-12 | Battelle Memorial Institute | Systems and processes for identifying features and determining feature associations in groups of documents |
US9626339B2 (en) * | 2009-07-20 | 2017-04-18 | Mcap Research Llc | User interface with navigation controls for the display or concealment of adjacent content |
US8572084B2 (en) | 2009-07-28 | 2013-10-29 | Fti Consulting, Inc. | System and method for displaying relationships between electronically stored information to provide classification suggestions via nearest neighbor |
US8612446B2 (en) | 2009-08-24 | 2013-12-17 | Fti Consulting, Inc. | System and method for generating a reference set for use during document review |
US9152883B2 (en) * | 2009-11-02 | 2015-10-06 | Harry Urbschat | System and method for increasing the accuracy of optical character recognition (OCR) |
US9213756B2 (en) * | 2009-11-02 | 2015-12-15 | Harry Urbschat | System and method of using dynamic variance networks |
US9158833B2 (en) * | 2009-11-02 | 2015-10-13 | Harry Urbschat | System and method for obtaining document information |
US20110179003A1 (en) * | 2010-01-21 | 2011-07-21 | Korea Advanced Institute Of Science And Technology | System for Sharing Emotion Data and Method of Sharing Emotion Data Using the Same |
US8402027B1 (en) * | 2010-02-11 | 2013-03-19 | Disney Enterprises, Inc. | System and method for hybrid hierarchical segmentation |
WO2011137368A2 (en) | 2010-04-30 | 2011-11-03 | Life Technologies Corporation | Systems and methods for analyzing nucleic acid sequences |
US20110295592A1 (en) * | 2010-05-28 | 2011-12-01 | Bank Of America Corporation | Survey Analysis and Categorization Assisted by a Knowledgebase |
US8667333B2 (en) * | 2010-06-01 | 2014-03-04 | The United States Of America As Represented By The Secretary Of The Navy | Extensible testing system |
US9268903B2 (en) | 2010-07-06 | 2016-02-23 | Life Technologies Corporation | Systems and methods for sequence data alignment quality assessment |
US9043296B2 (en) | 2010-07-30 | 2015-05-26 | Microsoft Technology Licensing, Llc | System of providing suggestions based on accessible and contextual information |
JP2012038239A (en) * | 2010-08-11 | 2012-02-23 | Sony Corp | Information processing equipment, information processing method and program |
US9959263B2 (en) * | 2010-12-07 | 2018-05-01 | Microsoft Technology Licensing, Llc. | User interface form field expansion |
US9135241B2 (en) | 2010-12-08 | 2015-09-15 | At&T Intellectual Property I, L.P. | System and method for learning latent representations for natural language tasks |
US9665643B2 (en) | 2011-12-30 | 2017-05-30 | Microsoft Technology Licensing, Llc | Knowledge-based entity detection and disambiguation |
US9864817B2 (en) * | 2012-01-28 | 2018-01-09 | Microsoft Technology Licensing, Llc | Determination of relationships between collections of disparate media types |
US20130207980A1 (en) * | 2012-02-13 | 2013-08-15 | Anil Babu Ankisettipalli | Visualization of data clusters |
US20130232157A1 (en) * | 2012-03-05 | 2013-09-05 | Tammer Eric Kamel | Systems and methods for processing unstructured numerical data |
US9202139B2 (en) * | 2012-07-18 | 2015-12-01 | Nvidia Corporation | System, method, and computer program product for generating a subset of a low discrepancy sequence |
GB2504356B (en) * | 2012-07-27 | 2015-02-25 | Thales Holdings Uk Plc | Detection of anomalous behaviour in computer network activity |
US20140046879A1 (en) * | 2012-08-13 | 2014-02-13 | Predixion Software, Inc. | Machine learning semantic model |
JP2014085921A (en) * | 2012-10-25 | 2014-05-12 | Sony Corp | Information processing apparatus and method, and program |
US9208254B2 (en) * | 2012-12-10 | 2015-12-08 | Microsoft Technology Licensing, Llc | Query and index over documents |
KR102029055B1 (en) * | 2013-02-08 | 2019-10-07 | 삼성전자주식회사 | Method and apparatus for high-dimensional data visualization |
US9070227B2 (en) | 2013-03-04 | 2015-06-30 | Microsoft Technology Licensing, Llc | Particle based visualizations of abstract information |
US9754392B2 (en) | 2013-03-04 | 2017-09-05 | Microsoft Technology Licensing, Llc | Generating data-mapped visualization of data |
US10656800B2 (en) * | 2013-03-29 | 2020-05-19 | Microsoft Technology Licensing, Llc | Visual configuration and activation |
US10452222B2 (en) | 2013-05-29 | 2019-10-22 | Microsoft Technology Licensing, Llc | Coordination of system readiness tasks |
US9158809B2 (en) | 2013-06-04 | 2015-10-13 | Microsoft Technology Licensing, Llc | Grid queries |
US9720972B2 (en) | 2013-06-17 | 2017-08-01 | Microsoft Technology Licensing, Llc | Cross-model filtering |
US9189517B2 (en) | 2013-10-02 | 2015-11-17 | Microsoft Technology Licensing, Llc | Integrating search with application analysis |
US8744840B1 (en) * | 2013-10-11 | 2014-06-03 | Realfusion LLC | Method and system for n-dimentional, language agnostic, entity, meaning, place, time, and words mapping |
US10002149B2 (en) * | 2014-02-22 | 2018-06-19 | SourceThought, Inc. | Relevance ranking for data and transformations |
US10191956B2 (en) | 2014-08-19 | 2019-01-29 | New England Complex Systems Institute, Inc. | Event detection and characterization in big data streams |
US9335979B1 (en) * | 2014-09-15 | 2016-05-10 | The Mathworks, Inc. | Data type visualization |
US10095487B1 (en) | 2014-09-15 | 2018-10-09 | The Mathworks, Inc. | Data type visualization |
US10452651B1 (en) * | 2014-12-23 | 2019-10-22 | Palantir Technologies Inc. | Searching charts |
US9922037B2 (en) * | 2015-01-30 | 2018-03-20 | Splunk Inc. | Index time, delimiter based extractions and previewing for use in indexing |
DE102015111549A1 (en) * | 2015-07-16 | 2017-01-19 | Wolfgang Grond | Method for visually displaying electronic output data sets |
US10866992B2 (en) | 2016-05-14 | 2020-12-15 | Gratiana Denisa Pol | System and methods for identifying, aggregating, and visualizing tested variables and causal relationships from scientific research |
WO2017210618A1 (en) | 2016-06-02 | 2017-12-07 | Fti Consulting, Inc. | Analyzing clusters of coded documents |
JP2018022204A (en) * | 2016-08-01 | 2018-02-08 | Dmg森精機株式会社 | Processing state display device, nc program generation device with the same, and nc program editing device |
US10929264B2 (en) * | 2016-09-14 | 2021-02-23 | International Business Machines Corporation | Measuring effective utilization of a service practitioner for ticket resolution via a wearable device |
US20180113938A1 (en) * | 2016-10-24 | 2018-04-26 | Ebay Inc. | Word embedding with generalized context for internet search queries |
US10235784B2 (en) * | 2016-12-22 | 2019-03-19 | Sap Se | Color synchronization across a story |
US10739955B2 (en) * | 2017-06-12 | 2020-08-11 | Royal Bank Of Canada | System and method for adaptive data visualization |
US11494395B2 (en) | 2017-07-31 | 2022-11-08 | Splunk Inc. | Creating dashboards for viewing data in a data storage system based on natural language requests |
US10635700B2 (en) * | 2017-11-09 | 2020-04-28 | Cloudera, Inc. | Design-time information based on run-time artifacts in transient cloud-based distributed computing clusters |
US10514948B2 (en) | 2017-11-09 | 2019-12-24 | Cloudera, Inc. | Information based on run-time artifacts in a distributed computing cluster |
US10417798B2 (en) * | 2017-11-19 | 2019-09-17 | Cadreon LLC | System and method based on sliding-scale cluster groups for precise look-alike modeling |
US11281726B2 (en) | 2017-12-01 | 2022-03-22 | Palantir Technologies Inc. | System and methods for faster processor comparisons of visual graph features |
US11195119B2 (en) | 2018-01-05 | 2021-12-07 | International Business Machines Corporation | Identifying and visualizing relationships and commonalities amongst record entities |
CN108846880A (en) * | 2018-04-25 | 2018-11-20 | 云南中烟工业有限责任公司 | A kind of cigarette quality feature visualization method |
RU2728506C2 (en) * | 2018-06-29 | 2020-07-29 | Акционерное общество "Лаборатория Касперского" | Method of blocking network connections |
US11675926B2 (en) * | 2018-12-31 | 2023-06-13 | Dathena Science Pte Ltd | Systems and methods for subset selection and optimization for balanced sampled dataset generation |
RU2737720C1 (en) * | 2019-11-20 | 2020-12-02 | Общество с ограниченной ответственностью "Аби Продакшн" | Retrieving fields using neural networks without using templates |
US11210824B2 (en) | 2020-05-21 | 2021-12-28 | At&T Intellectual Property I, L.P. | Integer-based graphical representations of words and texts |
CN111723206B (en) * | 2020-06-19 | 2024-01-19 | 北京明略软件系统有限公司 | Text classification method, apparatus, computer device and storage medium |
CN113392294B (en) * | 2020-10-15 | 2023-11-10 | 腾讯科技(深圳)有限公司 | Sample labeling method and device |
US12032598B2 (en) | 2021-12-27 | 2024-07-09 | Data Ramp Technologies Llc | Personal data association method |
Citations (94)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5047842A (en) * | 1989-11-03 | 1991-09-10 | The Trustees Of Princeton University | Color image display with a limited palette size |
US5121337A (en) * | 1990-10-15 | 1992-06-09 | Exxon Research And Engineering Company | Method for correcting spectral data for data due to the spectral measurement process itself and estimating unknown property and/or composition data of a sample using such method |
US5261093A (en) * | 1990-11-05 | 1993-11-09 | David Sarnoff Research Center, Inc. | Interactive relational database analysis with successive refinement steps in selection of output data from underlying database |
US5325298A (en) * | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
US5361326A (en) * | 1991-12-31 | 1994-11-01 | International Business Machines Corporation | Enhanced interface for a neural network engine |
US5446681A (en) * | 1990-10-12 | 1995-08-29 | Exxon Research And Engineering Company | Method of estimating property and/or composition data of a test sample |
US5506937A (en) * | 1993-04-02 | 1996-04-09 | University Of West Florida | Concept map-based multimedia computer system for facilitating user understanding of a domain of knowledge |
US5528735A (en) * | 1993-03-23 | 1996-06-18 | Silicon Graphics Inc. | Method and apparatus for displaying data within a three-dimensional information landscape |
US5546472A (en) * | 1992-08-07 | 1996-08-13 | Arch Development Corp. | Feature guided method and apparatus for obtaining an image of an object |
US5555354A (en) * | 1993-03-23 | 1996-09-10 | Silicon Graphics Inc. | Method and apparatus for navigation within three-dimensional information landscape |
US5574873A (en) * | 1993-05-07 | 1996-11-12 | Apple Computer, Inc. | Decoding guest instruction to directly access emulation routines that emulate the guest instructions |
US5592599A (en) * | 1991-12-18 | 1997-01-07 | Ampex Corporation | Video special effects system with graphical operator interface |
US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
US5623679A (en) * | 1993-11-19 | 1997-04-22 | Waverley Holdings, Inc. | System and method for creating and manipulating notes each containing multiple sub-notes, and linking the sub-notes to portions of data objects |
US5623681A (en) * | 1993-11-19 | 1997-04-22 | Waverley Holdings, Inc. | Method and apparatus for synchronizing, displaying and manipulating text and image documents |
US5625767A (en) * | 1995-03-13 | 1997-04-29 | Bartell; Brian | Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents |
US5659766A (en) * | 1994-09-16 | 1997-08-19 | Xerox Corporation | Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision |
US5675788A (en) * | 1995-09-15 | 1997-10-07 | Infonautics Corp. | Method and apparatus for generating a composite document on a selected topic from a plurality of information sources |
US5687364A (en) * | 1994-09-16 | 1997-11-11 | Xerox Corporation | Method for learning to infer the topical content of documents based upon their lexical content |
US5692107A (en) * | 1994-03-15 | 1997-11-25 | Lockheed Missiles & Space Company, Inc. | Method for generating predictive models in a computer system |
US5721912A (en) * | 1994-08-05 | 1998-02-24 | Data Integration Solutions Corp. | Graphical user interface for creating database integration specifications |
US5721903A (en) * | 1995-10-12 | 1998-02-24 | Ncr Corporation | System and method for generating reports from a computer database |
US5732260A (en) * | 1994-09-01 | 1998-03-24 | International Business Machines Corporation | Information retrieval system and method |
US5737591A (en) * | 1996-05-23 | 1998-04-07 | Microsoft Corporation | Database view generation system |
US5751612A (en) * | 1995-08-24 | 1998-05-12 | Lockheed Martin Corporation | System and method for accurate and efficient geodetic database retrieval |
US5767854A (en) * | 1996-09-27 | 1998-06-16 | Anwar; Mohammed S. | Multidimensional data display and manipulation system and methods for using same |
US5784544A (en) * | 1996-08-30 | 1998-07-21 | International Business Machines Corporation | Method and system for determining the data type of a stream of data |
US5787274A (en) * | 1995-11-29 | 1998-07-28 | International Business Machines Corporation | Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records |
US5819258A (en) * | 1997-03-07 | 1998-10-06 | Digital Equipment Corporation | Method and apparatus for automatically generating hierarchical categories from large document collections |
US5838973A (en) * | 1996-05-03 | 1998-11-17 | Andersen Consulting Llp | System and method for interactively transforming a system or process into a visual representation |
US5842206A (en) * | 1996-08-20 | 1998-11-24 | Iconovex Corporation | Computerized method and system for qualified searching of electronically stored documents |
US5857185A (en) * | 1995-10-20 | 1999-01-05 | Fuji Xerox Co., Ltd. | Method and system for searching and for presenting the search results in an attribute that corresponds to the retrieved documents |
US5857179A (en) * | 1996-09-09 | 1999-01-05 | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords |
US5861891A (en) * | 1997-01-13 | 1999-01-19 | Silicon Graphics, Inc. | Method, system, and computer program for visually approximating scattered data |
US5864863A (en) * | 1996-08-09 | 1999-01-26 | Digital Equipment Corporation | Method for parsing, indexing and searching world-wide-web pages |
US5873076A (en) * | 1995-09-15 | 1999-02-16 | Infonautics Corporation | Architecture for processing search queries, retrieving documents identified thereby, and method for using same |
US5907838A (en) * | 1996-12-10 | 1999-05-25 | Seiko Epson Corporation | Information search and collection method and system |
US5913214A (en) * | 1996-05-30 | 1999-06-15 | Massachusetts Inst Technology | Data extraction from world wide web pages |
US5918010A (en) * | 1997-02-07 | 1999-06-29 | General Internet, Inc. | Collaborative internet data mining systems |
US5926806A (en) * | 1996-10-18 | 1999-07-20 | Apple Computer, Inc. | Method and system for displaying related information from a database |
US5926820A (en) * | 1997-02-27 | 1999-07-20 | International Business Machines Corporation | Method and system for performing range max/min queries on a data cube |
US5930784A (en) * | 1997-08-21 | 1999-07-27 | Sandia Corporation | Method of locating related items in a geometric space for data mining |
US5930803A (en) * | 1997-04-30 | 1999-07-27 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing an evidence classifier |
US5945982A (en) * | 1995-05-30 | 1999-08-31 | Minolta Co., Ltd. | Data administration apparatus that can search for desired image data using maps |
US5953716A (en) * | 1996-05-30 | 1999-09-14 | Massachusetts Inst Technology | Querying heterogeneous data sources distributed over a network using context interchange |
US5953006A (en) * | 1992-03-18 | 1999-09-14 | Lucent Technologies Inc. | Methods and apparatus for detecting and displaying similarities in large data sets |
US5963965A (en) * | 1997-02-18 | 1999-10-05 | Semio Corporation | Text processing and retrieval system and method |
US5966139A (en) * | 1995-10-31 | 1999-10-12 | Lucent Technologies Inc. | Scalable data segmentation and visualization system |
US6012053A (en) * | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
US6014661A (en) * | 1996-05-06 | 2000-01-11 | Ivee Development Ab | System and method for automatic analysis of data bases and for user-controlled dynamic querying |
US6023694A (en) * | 1996-01-02 | 2000-02-08 | Timeline, Inc. | Data retrieval method and apparatus with multiple source capability |
US6026409A (en) * | 1996-09-26 | 2000-02-15 | Blumenthal; Joshua O. | System and method for search and retrieval of digital information by making and scaled viewing |
US6029176A (en) * | 1997-11-25 | 2000-02-22 | Cannon Holdings, L.L.C. | Manipulating and analyzing data using a computer system having a database mining engine resides in memory |
US6032157A (en) * | 1994-03-17 | 2000-02-29 | Hitachi, Ltd. | Retrieval method using image information |
US6034697A (en) * | 1997-01-13 | 2000-03-07 | Silicon Graphics, Inc. | Interpolation between relational tables for purposes of animating a data visualization |
US6035057A (en) * | 1997-03-10 | 2000-03-07 | Hoffman; Efrem H. | Hierarchical data matrix pattern recognition and identification system |
US6038561A (en) * | 1996-10-15 | 2000-03-14 | Manning & Napier Information Services | Management and analysis of document information text |
US6038538A (en) * | 1997-09-15 | 2000-03-14 | International Business Machines Corporation | Generating process models from workflow logs |
US6044366A (en) * | 1998-03-16 | 2000-03-28 | Microsoft Corporation | Use of the UNPIVOT relational operator in the efficient gathering of sufficient statistics for data mining |
US6049806A (en) * | 1997-04-11 | 2000-04-11 | Multimedia Archival Systems, Inc. | Computer system for managing a plurality of data types |
US6058391A (en) * | 1997-12-17 | 2000-05-02 | Mci Communications Corporation | Enhanced user view/update capability for managing data from relational tables |
US6067542A (en) * | 1995-10-20 | 2000-05-23 | Ncr Corporation | Pragma facility and SQL3 extension for optimal parallel UDF execution |
US6073138A (en) * | 1998-06-11 | 2000-06-06 | Boardwalk A.G. | System, method, and computer program product for providing relational patterns between entities |
US6073115A (en) * | 1992-09-30 | 2000-06-06 | Marshall; Paul Steven | Virtual reality generator for displaying abstract information |
US6078314A (en) * | 1997-04-11 | 2000-06-20 | Samsung Electronics Co., Ltd. | Mobile information terminal and operating method thereof |
US6078914A (en) * | 1996-12-09 | 2000-06-20 | Open Text Corporation | Natural language meta-search system and method |
US6081802A (en) * | 1997-08-12 | 2000-06-27 | Microsoft Corporation | System and method for accessing compactly stored map element information from memory |
US6085190A (en) * | 1996-11-15 | 2000-07-04 | Digital Vision Laboratories Corporation | Apparatus and method for retrieval of information from various structured information |
US6088032A (en) * | 1996-10-04 | 2000-07-11 | Xerox Corporation | Computer controlled display system for displaying a three-dimensional document workspace having a means for prefetching linked documents |
US6092061A (en) * | 1997-08-15 | 2000-07-18 | International Business Machines Corporation | Data partitioning by co-locating referenced and referencing records |
US6094648A (en) * | 1995-01-11 | 2000-07-25 | Philips Electronics North America Corporation | User interface for document retrieval |
US6094649A (en) * | 1997-12-22 | 2000-07-25 | Partnet, Inc. | Keyword searches of structured databases |
US6098065A (en) * | 1997-02-13 | 2000-08-01 | Nortel Networks Corporation | Associative search engine |
US6100901A (en) * | 1998-06-22 | 2000-08-08 | International Business Machines Corporation | Method and apparatus for cluster exploration and visualization |
US6108651A (en) * | 1997-09-09 | 2000-08-22 | Netscape Communications Corporation | Heuristic co-identification of objects across heterogeneous information sources |
US6108666A (en) * | 1997-06-12 | 2000-08-22 | International Business Machines Corporation | Method and apparatus for pattern discovery in 1-dimensional event streams |
US6108004A (en) * | 1997-10-21 | 2000-08-22 | International Business Machines Corporation | GUI guide for data mining |
US6111578A (en) * | 1997-03-07 | 2000-08-29 | Silicon Graphics, Inc. | Method, system and computer program product for navigating through partial hierarchies |
US6112209A (en) * | 1998-06-17 | 2000-08-29 | Gusack; Mark David | Associative database model for electronic-based informational assemblies |
US6112194A (en) * | 1997-07-21 | 2000-08-29 | International Business Machines Corporation | Method, apparatus and computer program product for data mining having user feedback mechanism for monitoring performance of mining tasks |
US6121969A (en) * | 1997-07-29 | 2000-09-19 | The Regents Of The University Of California | Visual navigation in perceptual databases |
US6122636A (en) * | 1997-06-30 | 2000-09-19 | International Business Machines Corporation | Relational emulation of a multi-dimensional database index |
US6128624A (en) * | 1997-11-12 | 2000-10-03 | Ncr Corporation | Collection and integration of internet and electronic commerce data in a database during web browsing |
US6252597B1 (en) * | 1997-02-14 | 2001-06-26 | Netscape Communications Corporation | Scalable user interface for graphically representing hierarchical data |
US6289353B1 (en) * | 1997-09-24 | 2001-09-11 | Webmd Corporation | Intelligent query system for automatically indexing in a database and automatically categorizing users |
US6301579B1 (en) * | 1998-10-20 | 2001-10-09 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing a data structure |
US6307573B1 (en) * | 1999-07-22 | 2001-10-23 | Barbara L. Barros | Graphic-information flow method and system for visually analyzing patterns and relationships |
US6374251B1 (en) * | 1998-03-17 | 2002-04-16 | Microsoft Corporation | Scalable system for clustering of large databases |
US6738502B1 (en) * | 1999-06-04 | 2004-05-18 | Kairos Scientific, Inc. | Multispectral taxonomic identification |
US6772170B2 (en) * | 1996-09-13 | 2004-08-03 | Battelle Memorial Institute | System and method for interpreting document contents |
US6898530B1 (en) * | 1999-09-30 | 2005-05-24 | Battelle Memorial Institute | Method and apparatus for extracting attributes from sequence strings and biopolymer material |
US6940509B1 (en) * | 2000-09-29 | 2005-09-06 | Battelle Memorial Institute | Systems and methods for improving concept landscape visualizations as a data analysis tool |
US6990238B1 (en) * | 1999-09-30 | 2006-01-24 | Battelle Memorial Institute | Data processing, analysis, and visualization system for use with disparate data types |
US7212999B2 (en) * | 1999-04-09 | 2007-05-01 | Trading Technologies International, Inc. | User interface for an electronic trading system |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4823306A (en) * | 1987-08-14 | 1989-04-18 | International Business Machines Corporation | Text search system |
US5559940A (en) * | 1990-12-14 | 1996-09-24 | Hutson; William H. | Method and system for real-time information analysis of textual material |
US5257349A (en) * | 1990-12-18 | 1993-10-26 | David Sarnoff Research Center, Inc. | Interactive data visualization with smart object |
JPH0736915A (en) * | 1993-06-28 | 1995-02-07 | Sanyo Electric Co Ltd | Information file device |
CA2127764A1 (en) * | 1993-08-24 | 1995-02-25 | Stephen Gregory Eick | Displaying query results |
US5696963A (en) | 1993-11-19 | 1997-12-09 | Waverley Holdings, Inc. | System, method and computer program product for searching through an individual document and a group of documents |
US5574837A (en) * | 1995-01-17 | 1996-11-12 | Lucent Technologies Inc. | Method of generating a browser interface for representing similarities between segments of code |
US5699507A (en) * | 1995-01-17 | 1997-12-16 | Lucent Technologies Inc. | Method of identifying similarities in code segments |
JPH0969107A (en) * | 1995-06-20 | 1997-03-11 | Casio Comput Co Ltd | Method for retrieving and extracting record and record extracting device |
US5999192A (en) | 1996-04-30 | 1999-12-07 | Lucent Technologies Inc. | Interactive data exploration apparatus and methods |
US6166739A (en) * | 1996-11-07 | 2000-12-26 | Natrificial, Llc | Method and apparatus for organizing and processing information using a digital computer |
US5999937A (en) | 1997-06-06 | 1999-12-07 | Madison Information Technologies, Inc. | System and method for converting data between data sets |
US5982370A (en) | 1997-07-18 | 1999-11-09 | International Business Machines Corporation | Highlighting tool for search specification in a user interface of a computer system |
JP4005672B2 (en) * | 1997-07-28 | 2007-11-07 | 株式会社ジャストシステム | Document processing apparatus, storage medium storing document processing program, and document processing method |
US5987470A (en) | 1997-08-21 | 1999-11-16 | Sandia Corporation | Method of data mining including determining multidimensional coordinates of each item using a predetermined scalar similarity value for each item pair |
US5986673A (en) | 1997-10-17 | 1999-11-16 | Martz; David R. | Method for relational ordering and displaying multidimensional data |
US5986652A (en) | 1997-10-21 | 1999-11-16 | International Business Machines Corporation | Method for editing an object wherein steps for creating the object are preserved |
US5983224A (en) * | 1997-10-31 | 1999-11-09 | Hitachi America, Ltd. | Method and apparatus for reducing the computational requirements of K-means data clustering |
US6188403B1 (en) * | 1997-11-21 | 2001-02-13 | Portola Dimensional Systems, Inc. | User-friendly graphics generator using direct manipulation |
US5991714A (en) | 1998-04-22 | 1999-11-23 | The United States Of America As Represented By The National Security Agency | Method of identifying data type and locating in a file |
US6449643B1 (en) * | 1998-05-14 | 2002-09-10 | Nortel Networks Limited | Access control with just-in-time resource discovery |
US6327574B1 (en) | 1998-07-07 | 2001-12-04 | Encirq Corporation | Hierarchical models of consumer attributes for targeting content in a privacy-preserving manner |
US6493709B1 (en) * | 1998-07-31 | 2002-12-10 | The Regents Of The University Of California | Method and apparatus for digitally shredding similar documents within large document sets in a data processing environment |
- 1999-09-30: US, application US09/410,367, publication US6990238B1 (not active; Expired - Lifetime)
- 2000-09-29: AU, application AU77413/00A, publication AU7741300A (not active; Abandoned)
- 2000-09-29: WO, application PCT/US2000/026964, publication WO2001024060A2 (not active; Application Discontinuation)
- 2000-09-29: JP, application JP2001526758A, publication JP2003532943A (active; Pending)
- 2000-09-29: EP, application EP00967172A, publication EP1323069A2 (not active; Withdrawn)
- 2000-09-29: CA, application CA002385836A, publication CA2385836A1 (not active; Abandoned)
- 2005-11-21: US, application US11/282,567, publication US20060093222A1 (not active; Abandoned)
- 2005-11-21: US, application US11/282,569, publication US20060106783A1 (not active; Abandoned)
- 2010-09-06: JP, application JP2010199149A, publication JP2010282655A (active; Pending)
Patent Citations (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5047842A (en) * | 1989-11-03 | 1991-09-10 | The Trustees Of Princeton University | Color image display with a limited palette size |
US5446681A (en) * | 1990-10-12 | 1995-08-29 | Exxon Research And Engineering Company | Method of estimating property and/or composition data of a test sample |
US5121337A (en) * | 1990-10-15 | 1992-06-09 | Exxon Research And Engineering Company | Method for correcting spectral data for data due to the spectral measurement process itself and estimating unknown property and/or composition data of a sample using such method |
US5261093A (en) * | 1990-11-05 | 1993-11-09 | David Sarnoff Research Center, Inc. | Interactive relational database analysis with successive refinement steps in selection of output data from underlying database |
US5325298A (en) * | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
US5592599A (en) * | 1991-12-18 | 1997-01-07 | Ampex Corporation | Video special effects system with graphical operator interface |
US5361326A (en) * | 1991-12-31 | 1994-11-01 | International Business Machines Corporation | Enhanced interface for a neural network engine |
US5953006A (en) * | 1992-03-18 | 1999-09-14 | Lucent Technologies Inc. | Methods and apparatus for detecting and displaying similarities in large data sets |
US5546472A (en) * | 1992-08-07 | 1996-08-13 | Arch Development Corp. | Feature guided method and apparatus for obtaining an image of an object |
US6073115A (en) * | 1992-09-30 | 2000-06-06 | Marshall; Paul Steven | Virtual reality generator for displaying abstract information |
US5671381A (en) * | 1993-03-23 | 1997-09-23 | Silicon Graphics, Inc. | Method and apparatus for displaying data within a three-dimensional information landscape |
US5528735A (en) * | 1993-03-23 | 1996-06-18 | Silicon Graphics Inc. | Method and apparatus for displaying data within a three-dimensional information landscape |
US5555354A (en) * | 1993-03-23 | 1996-09-10 | Silicon Graphics Inc. | Method and apparatus for navigation within three-dimensional information landscape |
US5506937A (en) * | 1993-04-02 | 1996-04-09 | University Of West Florida | Concept map-based multimedia computer system for facilitating user understanding of a domain of knowledge |
US5574873A (en) * | 1993-05-07 | 1996-11-12 | Apple Computer, Inc. | Decoding guest instruction to directly access emulation routines that emulate the guest instructions |
US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
US5794178A (en) * | 1993-09-20 | 1998-08-11 | Hnc Software, Inc. | Visualization of information using graphical representations of context vector based relationships and attributes |
US5623679A (en) * | 1993-11-19 | 1997-04-22 | Waverley Holdings, Inc. | System and method for creating and manipulating notes each containing multiple sub-notes, and linking the sub-notes to portions of data objects |
US5623681A (en) * | 1993-11-19 | 1997-04-22 | Waverley Holdings, Inc. | Method and apparatus for synchronizing, displaying and manipulating text and image documents |
US5692107A (en) * | 1994-03-15 | 1997-11-25 | Lockheed Missiles & Space Company, Inc. | Method for generating predictive models in a computer system |
US6032157A (en) * | 1994-03-17 | 2000-02-29 | Hitachi, Ltd. | Retrieval method using image information |
US5721912A (en) * | 1994-08-05 | 1998-02-24 | Data Integration Solutions Corp. | Graphical user interface for creating database integration specifications |
US5732260A (en) * | 1994-09-01 | 1998-03-24 | International Business Machines Corporation | Information retrieval system and method |
US5659766A (en) * | 1994-09-16 | 1997-08-19 | Xerox Corporation | Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision |
US5687364A (en) * | 1994-09-16 | 1997-11-11 | Xerox Corporation | Method for learning to infer the topical content of documents based upon their lexical content |
US6094648A (en) * | 1995-01-11 | 2000-07-25 | Philips Electronics North America Corporation | User interface for document retrieval |
US5625767A (en) * | 1995-03-13 | 1997-04-29 | Bartell; Brian | Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents |
US5945982A (en) * | 1995-05-30 | 1999-08-31 | Minolta Co., Ltd. | Data administration apparatus that can search for desired image data using maps |
US5751612A (en) * | 1995-08-24 | 1998-05-12 | Lockheed Martin Corporation | System and method for accurate and efficient geodetic database retrieval |
US5675788A (en) * | 1995-09-15 | 1997-10-07 | Infonautics Corp. | Method and apparatus for generating a composite document on a selected topic from a plurality of information sources |
US5873076A (en) * | 1995-09-15 | 1999-02-16 | Infonautics Corporation | Architecture for processing search queries, retrieving documents identified thereby, and method for using same |
US5721903A (en) * | 1995-10-12 | 1998-02-24 | Ncr Corporation | System and method for generating reports from a computer database |
US6067542A (en) * | 1995-10-20 | 2000-05-23 | Ncr Corporation | Pragma facility and SQL3 extension for optimal parallel UDF execution |
US5857185A (en) * | 1995-10-20 | 1999-01-05 | Fuji Xerox Co., Ltd. | Method and system for searching and for presenting the search results in an attribute that corresponds to the retrieved documents |
US5966139A (en) * | 1995-10-31 | 1999-10-12 | Lucent Technologies Inc. | Scalable data segmentation and visualization system |
US5787274A (en) * | 1995-11-29 | 1998-07-28 | International Business Machines Corporation | Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records |
US6023694A (en) * | 1996-01-02 | 2000-02-08 | Timeline, Inc. | Data retrieval method and apparatus with multiple source capability |
US5838973A (en) * | 1996-05-03 | 1998-11-17 | Andersen Consulting Llp | System and method for interactively transforming a system or process into a visual representation |
US6014661A (en) * | 1996-05-06 | 2000-01-11 | Ivee Development Ab | System and method for automatic analysis of data bases and for user-controlled dynamic querying |
US5737591A (en) * | 1996-05-23 | 1998-04-07 | Microsoft Corporation | Database view generation system |
US5953716A (en) * | 1996-05-30 | 1999-09-14 | Massachusetts Inst Technology | Querying heterogeneous data sources distributed over a network using context interchange |
US5913214A (en) * | 1996-05-30 | 1999-06-15 | Massachusetts Inst Technology | Data extraction from world wide web pages |
US5864863A (en) * | 1996-08-09 | 1999-01-26 | Digital Equipment Corporation | Method for parsing, indexing and searching world-wide-web pages |
US5842206A (en) * | 1996-08-20 | 1998-11-24 | Iconovex Corporation | Computerized method and system for qualified searching of electronically stored documents |
US5784544A (en) * | 1996-08-30 | 1998-07-21 | International Business Machines Corporation | Method and system for determining the data type of a stream of data |
US5857179A (en) * | 1996-09-09 | 1999-01-05 | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords |
US6772170B2 (en) * | 1996-09-13 | 2004-08-03 | Battelle Memorial Institute | System and method for interpreting document contents |
US6026409A (en) * | 1996-09-26 | 2000-02-15 | Blumenthal; Joshua O. | System and method for search and retrieval of digital information by making and scaled viewing |
US5767854A (en) * | 1996-09-27 | 1998-06-16 | Anwar; Mohammed S. | Multidimensional data display and manipulation system and methods for using same |
US6088032A (en) * | 1996-10-04 | 2000-07-11 | Xerox Corporation | Computer controlled display system for displaying a three-dimensional document workspace having a means for prefetching linked documents |
US6038561A (en) * | 1996-10-15 | 2000-03-14 | Manning & Napier Information Services | Management and analysis of document information text |
US6101493A (en) * | 1996-10-18 | 2000-08-08 | Apple Computer, Inc. | Method and system for displaying related information from a database |
US5926806A (en) * | 1996-10-18 | 1999-07-20 | Apple Computer, Inc. | Method and system for displaying related information from a database |
US6085190A (en) * | 1996-11-15 | 2000-07-04 | Digital Vision Laboratories Corporation | Apparatus and method for retrieval of information from various structured information |
US6078914A (en) * | 1996-12-09 | 2000-06-20 | Open Text Corporation | Natural language meta-search system and method |
US5907838A (en) * | 1996-12-10 | 1999-05-25 | Seiko Epson Corporation | Information search and collection method and system |
US5861891A (en) * | 1997-01-13 | 1999-01-19 | Silicon Graphics, Inc. | Method, system, and computer program for visually approximating scattered data |
US6034697A (en) * | 1997-01-13 | 2000-03-07 | Silicon Graphics, Inc. | Interpolation between relational tables for purposes of animating a data visualization |
US6081788A (en) * | 1997-02-07 | 2000-06-27 | About.Com, Inc. | Collaborative internet data mining system |
US5918010A (en) * | 1997-02-07 | 1999-06-29 | General Internet, Inc. | Collaborative internet data mining systems |
US6098065A (en) * | 1997-02-13 | 2000-08-01 | Nortel Networks Corporation | Associative search engine |
US6252597B1 (en) * | 1997-02-14 | 2001-06-26 | Netscape Communications Corporation | Scalable user interface for graphically representing hierarchical data |
US5963965A (en) * | 1997-02-18 | 1999-10-05 | Semio Corporation | Text processing and retrieval system and method |
US5926820A (en) * | 1997-02-27 | 1999-07-20 | International Business Machines Corporation | Method and system for performing range max/min queries on a data cube |
US6111578A (en) * | 1997-03-07 | 2000-08-29 | Silicon Graphics, Inc. | Method, system and computer program product for navigating through partial hierarchies |
US6259451B1 (en) * | 1997-03-07 | 2001-07-10 | Silicon Graphics, Inc. | Method, system, and computer program product for mapping between an overview and a partial hierarchy |
US5819258A (en) * | 1997-03-07 | 1998-10-06 | Digital Equipment Corporation | Method and apparatus for automatically generating hierarchical categories from large document collections |
US6035057A (en) * | 1997-03-10 | 2000-03-07 | Hoffman; Efrem H. | Hierarchical data matrix pattern recognition and identification system |
US6078314A (en) * | 1997-04-11 | 2000-06-20 | Samsung Electronics Co., Ltd. | Mobile information terminal and operating method thereof |
US6049806A (en) * | 1997-04-11 | 2000-04-11 | Multimedia Archival Systems, Inc. | Computer system for managing a plurality of data types |
US5930803A (en) * | 1997-04-30 | 1999-07-27 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing an evidence classifier |
US6108666A (en) * | 1997-06-12 | 2000-08-22 | International Business Machines Corporation | Method and apparatus for pattern discovery in 1-dimensional event streams |
US6012053A (en) * | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
US6122636A (en) * | 1997-06-30 | 2000-09-19 | International Business Machines Corporation | Relational emulation of a multi-dimensional database index |
US6112194A (en) * | 1997-07-21 | 2000-08-29 | International Business Machines Corporation | Method, apparatus and computer program product for data mining having user feedback mechanism for monitoring performance of mining tasks |
US6121969A (en) * | 1997-07-29 | 2000-09-19 | The Regents Of The University Of California | Visual navigation in perceptual databases |
US6081802A (en) * | 1997-08-12 | 2000-06-27 | Microsoft Corporation | System and method for accessing compactly stored map element information from memory |
US6092061A (en) * | 1997-08-15 | 2000-07-18 | International Business Machines Corporation | Data partitioning by co-locating referenced and referencing records |
US5930784A (en) * | 1997-08-21 | 1999-07-27 | Sandia Corporation | Method of locating related items in a geometric space for data mining |
US6108651A (en) * | 1997-09-09 | 2000-08-22 | Netscape Communications Corporation | Heuristic co-identification of objects across heterogeneous information sources |
US6038538A (en) * | 1997-09-15 | 2000-03-14 | International Business Machines Corporation | Generating process models from workflow logs |
US6289353B1 (en) * | 1997-09-24 | 2001-09-11 | Webmd Corporation | Intelligent query system for automatically indexing in a database and automatically categorizing users |
US6108004A (en) * | 1997-10-21 | 2000-08-22 | International Business Machines Corporation | GUI guide for data mining |
US6128624A (en) * | 1997-11-12 | 2000-10-03 | Ncr Corporation | Collection and integration of internet and electronic commerce data in a database during web browsing |
US6029176A (en) * | 1997-11-25 | 2000-02-22 | Cannon Holdings, L.L.C. | Manipulating and analyzing data using a computer system having a database mining engine resides in memory |
US6058391A (en) * | 1997-12-17 | 2000-05-02 | Mci Communications Corporation | Enhanced user view/update capability for managing data from relational tables |
US6094649A (en) * | 1997-12-22 | 2000-07-25 | Partnet, Inc. | Keyword searches of structured databases |
US6044366A (en) * | 1998-03-16 | 2000-03-28 | Microsoft Corporation | Use of the UNPIVOT relational operator in the efficient gathering of sufficient statistics for data mining |
US6374251B1 (en) * | 1998-03-17 | 2002-04-16 | Microsoft Corporation | Scalable system for clustering of large databases |
US6073138A (en) * | 1998-06-11 | 2000-06-06 | Boardwalk A.G. | System, method, and computer program product for providing relational patterns between entities |
US6112209A (en) * | 1998-06-17 | 2000-08-29 | Gusack; Mark David | Associative database model for electronic-based informational assemblies |
US6100901A (en) * | 1998-06-22 | 2000-08-08 | International Business Machines Corporation | Method and apparatus for cluster exploration and visualization |
US6301579B1 (en) * | 1998-10-20 | 2001-10-09 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing a data structure |
US7212999B2 (en) * | 1999-04-09 | 2007-05-01 | Trading Technologies International, Inc. | User interface for an electronic trading system |
US6738502B1 (en) * | 1999-06-04 | 2004-05-18 | Kairos Scientific, Inc. | Multispectral taxonomic identification |
US6307573B1 (en) * | 1999-07-22 | 2001-10-23 | Barbara L. Barros | Graphic-information flow method and system for visually analyzing patterns and relationships |
US6898530B1 (en) * | 1999-09-30 | 2005-05-24 | Battelle Memorial Institute | Method and apparatus for extracting attributes from sequence strings and biopolymer material |
US6990238B1 (en) * | 1999-09-30 | 2006-01-24 | Battelle Memorial Institute | Data processing, analysis, and visualization system for use with disparate data types |
US6940509B1 (en) * | 2000-09-29 | 2005-09-06 | Battelle Memorial Institute | Systems and methods for improving concept landscape visualizations as a data analysis tool |
Cited By (170)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10438308B2 (en) | 2003-02-04 | 2019-10-08 | Lexisnexis Risk Solutions Fl Inc. | Systems and methods for identifying entities using geographical and social mapping |
US9412141B2 (en) | 2003-02-04 | 2016-08-09 | Lexisnexis Risk Solutions Fl Inc | Systems and methods for identifying entities using geographical and social mapping |
US20080015870A1 (en) * | 2003-05-30 | 2008-01-17 | Lawrence Benjamin Elowitz | Apparatus and method for facilitating a search for gems |
US10229284B2 (en) | 2007-02-21 | 2019-03-12 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10719621B2 (en) | 2007-02-21 | 2020-07-21 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10248294B2 (en) | 2008-09-15 | 2019-04-02 | Palantir Technologies, Inc. | Modal-less interface enhancements |
US9383911B2 (en) | 2008-09-15 | 2016-07-05 | Palantir Technologies, Inc. | Modal-less interface enhancements |
US10747952B2 (en) | 2008-09-15 | 2020-08-18 | Palantir Technologies, Inc. | Automatic creation and server push of multiple distinct drafts |
US20110310039A1 (en) * | 2010-06-16 | 2011-12-22 | Samsung Electronics Co., Ltd. | Method and apparatus for user-adaptive data arrangement/classification in portable terminal |
US9160808B2 (en) * | 2010-11-17 | 2015-10-13 | Alibaba Group Holding Limited | Transmitting product information |
US20130227054A1 (en) * | 2010-11-17 | 2013-08-29 | Alibaba Group Holding Limited | Transmitting Product Information |
US9105118B2 (en) * | 2011-03-15 | 2015-08-11 | Oracle International Corporation | Galaxy views for visualizing large numbers of nodes |
US20130113820A1 (en) * | 2011-03-15 | 2013-05-09 | Oracle International Corporation | Galaxy views for visualizing large numbers of nodes |
US11392550B2 (en) | 2011-06-23 | 2022-07-19 | Palantir Technologies Inc. | System and method for investigating large amounts of data |
US10423582B2 (en) | 2011-06-23 | 2019-09-24 | Palantir Technologies, Inc. | System and method for investigating large amounts of data |
US10706220B2 (en) | 2011-08-25 | 2020-07-07 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US9880987B2 (en) | 2011-08-25 | 2018-01-30 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US11138180B2 (en) | 2011-09-02 | 2021-10-05 | Palantir Technologies Inc. | Transaction protocol for reading database values |
US9087064B2 (en) * | 2011-10-27 | 2015-07-21 | International Business Machines Corporation | User-defined hierarchies in file system data sets |
WO2013126281A1 (en) * | 2012-02-24 | 2013-08-29 | Lexisnexis Risk Solutions Fl Inc. | Systems and methods for putative cluster analysis |
US20140006382A1 (en) * | 2012-06-29 | 2014-01-02 | International Business Machines Corporation | Predicate pushdown with late materialization in database query processing |
US8856103B2 (en) * | 2012-06-29 | 2014-10-07 | International Business Machines Corporation | Predicate pushdown with late materialization in database query processing |
US8862571B2 (en) | 2012-06-29 | 2014-10-14 | International Business Machines Corporation | Predicate pushdown with late materialization in database query processing |
US9547316B2 (en) | 2012-09-07 | 2017-01-17 | Opower, Inc. | Thermostat classification method and system |
US9898335B1 (en) | 2012-10-22 | 2018-02-20 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US11182204B2 (en) | 2012-10-22 | 2021-11-23 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US10067516B2 (en) | 2013-01-22 | 2018-09-04 | Opower, Inc. | Method and system to control thermostat using biofeedback |
US10313833B2 (en) | 2013-01-31 | 2019-06-04 | Palantir Technologies Inc. | Populating property values of event objects of an object-centric data model using image metadata |
US10743133B2 (en) | 2013-01-31 | 2020-08-11 | Palantir Technologies Inc. | Populating property values of event objects of an object-centric data model using image metadata |
US9380431B1 (en) | 2013-01-31 | 2016-06-28 | Palantir Technologies, Inc. | Use of teams in a mobile application |
US10997363B2 (en) | 2013-03-14 | 2021-05-04 | Palantir Technologies Inc. | Method of generating objects and links from mobile reports |
US10817513B2 (en) | 2013-03-14 | 2020-10-27 | Palantir Technologies Inc. | Fair scheduling for mixed-query loads |
US10037314B2 (en) | 2013-03-14 | 2018-07-31 | Palantir Technologies, Inc. | Mobile reports |
US9646396B2 (en) | 2013-03-15 | 2017-05-09 | Palantir Technologies Inc. | Generating object time series and data objects |
US9852205B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | Time-sensitive cube |
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US10452678B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Filter chains for exploring large data sets |
US10977279B2 (en) | 2013-03-15 | 2021-04-13 | Palantir Technologies Inc. | Time-sensitive cube |
US9852195B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | System and method for generating event visualizations |
US10216801B2 (en) | 2013-03-15 | 2019-02-26 | Palantir Technologies Inc. | Generating data clusters |
US10482097B2 (en) | 2013-03-15 | 2019-11-19 | Palantir Technologies Inc. | System and method for generating event visualizations |
US10453229B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Generating object time series from data objects |
US9779525B2 (en) | 2013-03-15 | 2017-10-03 | Palantir Technologies Inc. | Generating object time series from data objects |
US10264014B2 (en) | 2013-03-15 | 2019-04-16 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures |
US10360705B2 (en) | 2013-05-07 | 2019-07-23 | Palantir Technologies Inc. | Interactive data object map |
US9953445B2 (en) | 2013-05-07 | 2018-04-24 | Palantir Technologies Inc. | Interactive data object map |
US10719797B2 (en) | 2013-05-10 | 2020-07-21 | Opower, Inc. | Method of tracking and reporting energy performance for businesses |
US10001792B1 (en) | 2013-06-12 | 2018-06-19 | Opower, Inc. | System and method for determining occupancy schedule for controlling a thermostat |
US9424337B2 (en) | 2013-07-09 | 2016-08-23 | Sas Institute Inc. | Number of clusters estimation |
US10699071B2 (en) | 2013-08-08 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for template based custom document generation |
US10719527B2 (en) | 2013-10-18 | 2020-07-21 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US9514200B2 (en) | 2013-10-18 | 2016-12-06 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US11100174B2 (en) | 2013-11-11 | 2021-08-24 | Palantir Technologies Inc. | Simple web search |
US10037383B2 (en) | 2013-11-11 | 2018-07-31 | Palantir Technologies, Inc. | Simple web search |
US10198515B1 (en) | 2013-12-10 | 2019-02-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US11138279B1 (en) | 2013-12-10 | 2021-10-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10230746B2 (en) | 2014-01-03 | 2019-03-12 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US10805321B2 (en) | 2014-01-03 | 2020-10-13 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US9483162B2 (en) | 2014-02-20 | 2016-11-01 | Palantir Technologies Inc. | Relationship visualizations |
EP2911100A1 (en) * | 2014-02-20 | 2015-08-26 | Palantir Technologies, Inc. | Relationship visualizations |
US10402054B2 (en) | 2014-02-20 | 2019-09-03 | Palantir Technologies Inc. | Relationship visualizations |
US10795723B2 (en) | 2014-03-04 | 2020-10-06 | Palantir Technologies Inc. | Mobile tasks |
US9495414B2 (en) * | 2014-03-11 | 2016-11-15 | Sas Institute Inc. | Cluster computation using random subsets of variables |
US20150261846A1 (en) * | 2014-03-11 | 2015-09-17 | Sas Institute Inc. | Computerized cluster analysis framework for decorrelated cluster identification in datasets |
US20160048579A1 (en) * | 2014-03-11 | 2016-02-18 | Sas Institute Inc. | Probabilistic cluster assignment |
US20160048577A1 (en) * | 2014-03-11 | 2016-02-18 | Sas Institute Inc. | Cluster computation using random subsets of variables |
US9489621B2 (en) * | 2014-03-11 | 2016-11-08 | Sas Institute Inc. | Graph based selection of decorrelated variables |
US20160048557A1 (en) * | 2014-03-11 | 2016-02-18 | Sas Institute Inc. | Graph based selection of decorrelated variables |
US9202178B2 (en) * | 2014-03-11 | 2015-12-01 | Sas Institute Inc. | Computerized cluster analysis framework for decorrelated cluster identification in datasets |
US9367602B2 (en) * | 2014-03-11 | 2016-06-14 | Sas Institute Inc. | Probabilistic cluster assignment |
US10180977B2 (en) | 2014-03-18 | 2019-01-15 | Palantir Technologies Inc. | Determining and extracting changed data from a data source |
US9727063B1 (en) | 2014-04-01 | 2017-08-08 | Opower, Inc. | Thermostat set point identification |
USD769916S1 (en) * | 2014-04-13 | 2016-10-25 | Jan Magnus Edman | Display screen with graphical user interface |
US10019739B1 (en) | 2014-04-25 | 2018-07-10 | Opower, Inc. | Energy usage alerts for a climate control device |
US10871887B2 (en) | 2014-04-28 | 2020-12-22 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US9857958B2 (en) | 2014-04-28 | 2018-01-02 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
USD792452S1 (en) * | 2014-05-14 | 2017-07-18 | Jan Magnus Edman | Display screen with graphical user interface |
US10162887B2 (en) | 2014-06-30 | 2018-12-25 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US10180929B1 (en) | 2014-06-30 | 2019-01-15 | Palantir Technologies, Inc. | Systems and methods for identifying key phrase clusters within documents |
US11341178B2 (en) | 2014-06-30 | 2022-05-24 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US9619557B2 (en) | 2014-06-30 | 2017-04-11 | Palantir Technologies, Inc. | Systems and methods for key phrase characterization of documents |
US10798116B2 (en) | 2014-07-03 | 2020-10-06 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US10929436B2 (en) | 2014-07-03 | 2021-02-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US9998485B2 (en) | 2014-07-03 | 2018-06-12 | Palantir Technologies, Inc. | Network intrusion data item clustering and analysis |
USD762690S1 (en) * | 2014-07-08 | 2016-08-02 | Jan Magnus Edman | Display screen with graphical user interface |
US11188929B2 (en) | 2014-08-07 | 2021-11-30 | Opower, Inc. | Advisor and notification to reduce bill shock |
US10410130B1 (en) | 2014-08-07 | 2019-09-10 | Opower, Inc. | Inferring residential home characteristics based on energy data |
USD789409S1 (en) * | 2014-08-26 | 2017-06-13 | Jan Magnus Edman | Display screen with graphical user interface |
US10866685B2 (en) | 2014-09-03 | 2020-12-15 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9454281B2 (en) | 2014-09-03 | 2016-09-27 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9880696B2 (en) | 2014-09-03 | 2018-01-30 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9501851B2 (en) | 2014-10-03 | 2016-11-22 | Palantir Technologies Inc. | Time-series analysis system |
US10360702B2 (en) | 2014-10-03 | 2019-07-23 | Palantir Technologies Inc. | Time-series analysis system |
US10664490B2 (en) | 2014-10-03 | 2020-05-26 | Palantir Technologies Inc. | Data aggregation and analysis system |
US9767172B2 (en) | 2014-10-03 | 2017-09-19 | Palantir Technologies Inc. | Data aggregation and analysis system |
US11004244B2 (en) | 2014-10-03 | 2021-05-11 | Palantir Technologies Inc. | Time-series analysis system |
US9984133B2 (en) | 2014-10-16 | 2018-05-29 | Palantir Technologies Inc. | Schematic and database linking system |
US11275753B2 (en) | 2014-10-16 | 2022-03-15 | Palantir Technologies Inc. | Schematic and database linking system |
US10135863B2 (en) | 2014-11-06 | 2018-11-20 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9558352B1 (en) | 2014-11-06 | 2017-01-31 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US10728277B2 (en) | 2014-11-06 | 2020-07-28 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US10033184B2 (en) | 2014-11-13 | 2018-07-24 | Opower, Inc. | Demand response device configured to provide comparative consumption information relating to proximate users or consumers |
US10447712B2 (en) | 2014-12-22 | 2019-10-15 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9367872B1 (en) | 2014-12-22 | 2016-06-14 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9589299B2 (en) | 2014-12-22 | 2017-03-07 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US10157200B2 (en) | 2014-12-29 | 2018-12-18 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US10552998B2 (en) | 2014-12-29 | 2020-02-04 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US9870389B2 (en) | 2014-12-29 | 2018-01-16 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US11093950B2 (en) | 2015-02-02 | 2021-08-17 | Opower, Inc. | Customer activity score |
US10198483B2 (en) | 2015-02-02 | 2019-02-05 | Opower, Inc. | Classification engine for identifying business hours |
US10074097B2 (en) | 2015-02-03 | 2018-09-11 | Opower, Inc. | Classification engine for classifying businesses based on power consumption |
US10371861B2 (en) | 2015-02-13 | 2019-08-06 | Opower, Inc. | Notification techniques for reducing energy usage |
US9727560B2 (en) | 2015-02-25 | 2017-08-08 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10474326B2 (en) | 2015-02-25 | 2019-11-12 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10459619B2 (en) | 2015-03-16 | 2019-10-29 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US9891808B2 (en) | 2015-03-16 | 2018-02-13 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
USD760761S1 (en) | 2015-04-07 | 2016-07-05 | Domo, Inc. | Display screen or portion thereof with a graphical user interface |
US9804666B2 (en) | 2015-05-26 | 2017-10-31 | Samsung Electronics Co., Ltd. | Warp clustering |
US10817789B2 (en) | 2015-06-09 | 2020-10-27 | Opower, Inc. | Determination of optimal energy storage methods at electric customer service points |
US9454785B1 (en) | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US10223748B2 (en) | 2015-07-30 | 2019-03-05 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US11501369B2 (en) | 2015-07-30 | 2022-11-15 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US9958360B2 (en) | 2015-08-05 | 2018-05-01 | Opower, Inc. | Energy audit device |
US10484407B2 (en) | 2015-08-06 | 2019-11-19 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US10444941B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10444940B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US11150917B2 (en) | 2015-08-26 | 2021-10-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US11934847B2 (en) | 2015-08-26 | 2024-03-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US10706434B1 (en) | 2015-09-01 | 2020-07-07 | Palantir Technologies Inc. | Methods and systems for determining location information |
US10296617B1 (en) | 2015-10-05 | 2019-05-21 | Palantir Technologies Inc. | Searches of highly structured data |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10559044B2 (en) | 2015-11-20 | 2020-02-11 | Opower, Inc. | Identification of peak days |
US10678860B1 (en) | 2015-12-17 | 2020-06-09 | Palantir Technologies, Inc. | Automatic generation of composite datasets based on hierarchical fields |
US20170185666A1 (en) * | 2015-12-28 | 2017-06-29 | Facebook, Inc. | Aggregated Broad Topics |
US10459950B2 (en) * | 2015-12-28 | 2019-10-29 | Facebook, Inc. | Aggregated broad topics |
US10540061B2 (en) | 2015-12-29 | 2020-01-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US9823818B1 (en) | 2015-12-29 | 2017-11-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US10437612B1 (en) | 2015-12-30 | 2019-10-08 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US10575790B2 (en) * | 2016-03-02 | 2020-03-03 | Roche Diabetes Care, Inc. | Patient diabetes monitoring system with clustering of unsupervised daily CGM profiles (or insulin profiles) and method thereof |
US20170251980A1 (en) * | 2016-03-02 | 2017-09-07 | Roche Diabetes Care, Inc. | Patient diabetes monitoring system with clustering of unsupervised daily cgm profiles (or insulin profiles) and method thereof |
US10437649B2 (en) * | 2016-03-11 | 2019-10-08 | Intel Corporation | Task mapping for heterogeneous platforms |
US10977093B2 (en) | 2016-03-11 | 2021-04-13 | Intel Corporation | Declarative properties for data collections |
US10684891B2 (en) | 2016-03-11 | 2020-06-16 | Intel Corporation | Memory operand descriptors |
US10474510B2 (en) | 2016-03-11 | 2019-11-12 | Intel Corporation | Declarative properties for data collections |
US10572305B2 (en) | 2016-03-11 | 2020-02-25 | Intel Corporation | Multi-grained memory operands |
US10698938B2 (en) | 2016-03-18 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10324609B2 (en) | 2016-07-21 | 2019-06-18 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10698594B2 (en) | 2016-07-21 | 2020-06-30 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10719188B2 (en) | 2016-07-21 | 2020-07-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US20180240256A1 (en) * | 2017-02-23 | 2018-08-23 | Wipro Limited. | Method and system for processing input data for display in an optimal visualization format |
US10628978B2 (en) * | 2017-02-23 | 2020-04-21 | Wipro Limited | Method and system for processing input data for display in an optimal visualization format |
US11599369B1 (en) | 2018-03-08 | 2023-03-07 | Palantir Technologies Inc. | Graphical user interface configuration system |
US10885021B1 (en) | 2018-05-02 | 2021-01-05 | Palantir Technologies Inc. | Interactive interpreter and graphical user interface |
US11016988B1 (en) | 2018-12-19 | 2021-05-25 | Airspeed Systems LLC | Matched array flight alignment system and method |
US10803085B1 (en) | 2018-12-19 | 2020-10-13 | Airspeed Systems LLC | Matched array airspeed and angle of attack alignment system and method |
US10896529B1 (en) | 2018-12-19 | 2021-01-19 | EffectiveTalent Office LLC | Matched array talent architecture system and method |
US10769825B1 (en) | 2018-12-19 | 2020-09-08 | EffectiveTalent Office LLC | Matched array talent alignment system and method |
US10657684B1 (en) | 2018-12-19 | 2020-05-19 | EffectiveTalent Office LLC | Matched array alignment system and method |
US11508103B2 (en) | 2018-12-19 | 2022-11-22 | EffectiveTalent Office LLC | Matched array general talent architecture system and method |
US11010941B1 (en) | 2018-12-19 | 2021-05-18 | EffectiveTalent Office LLC | Matched array general talent architecture system and method |
US11630836B2 (en) | 2018-12-19 | 2023-04-18 | Airspeed Systems LLC | Matched array flight alignment system and method |
US11010940B2 (en) | 2018-12-19 | 2021-05-18 | EffectiveTalent Office LLC | Matched array alignment system and method |
US20210396585A1 (en) * | 2019-03-06 | 2021-12-23 | Electric Pocket Limited | Thermal quality mappings |
US11403268B2 (en) * | 2020-08-06 | 2022-08-02 | Sap Se | Predicting types of records based on amount values of records |
Also Published As
Publication number | Publication date |
---|---|
JP2003532943A (en) | 2003-11-05 |
WO2001024060A9 (en) | 2002-12-05 |
WO2001024060A2 (en) | 2001-04-05 |
CA2385836A1 (en) | 2001-04-05 |
EP1323069A2 (en) | 2003-07-02 |
US6990238B1 (en) | 2006-01-24 |
JP2010282655A (en) | 2010-12-16 |
US20060106783A1 (en) | 2006-05-18 |
WO2001024060A3 (en) | 2003-04-17 |
AU7741300A (en) | 2001-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20060093222A1 (en) | | Data processing, analysis, and visualization system for use with disparate data types |
US9984484B2 (en) | | Computer-implemented system and method for cluster spine group arrangement |
Hearst et al. | | Cat-a-Cone: an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy |
Minghim et al. | | Content-based text mapping using multi-dimensional projections for exploration of document collections |
US20170249369A1 (en) | | Systems and methods for ranking data visualizations using different data fields |
Fried et al. | | Maps of computer science |
Becker | | Visualizing decision table classifiers |
Keim et al. | | Using visualization to support data mining of large existing databases |
Sannakki et al. | | Memory learning framework for retrieval of neural objects |
Johnson | | Methods for domain-specific information retrieval |
Mandreoli et al. | | Text clustering as a mining task |
Aono et al. | | Text document cluster analysis through visualization of 3D projections |
Ontrup et al. | | Interactive information retrieval as a step towards effective knowledge management in healthcare |
Badr et al. | | Automatic image description based on textual data |
Karadi | | Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy |
Mohamed et al. | | Evaluation of Partitional Algorithms for Clustering Medical Documents |
Gregory et al. | | Shape Search in Temporal Data to Facilitate Knowledge Discovery: A User Interface to Find Spikes, Sinks and Slopes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |