EP1483688A1 - Methods and apparatus for statistical data analysis - Google Patents
- Publication number
- EP1483688A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- store
- rdf
- triples
- epoch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
- G06F16/86—Mapping to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
Definitions
- the invention pertains to digital data processing and, more particularly, to methods and apparatus for enterprise business visibility and insight using real-time reporting tools.
- a major impediment to enterprise business visibility is the consolidation of data from these disparate legacy databases with one another and with that from newer e-commerce databases.
- inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
- An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for enterprise business visibility and insight (hereinafter, "enterprise business visibility").
- a further object is to provide such methods and apparatus as can rapidly and accurately retrieve information responsive to user inquiries.
- a further object of the invention is to provide such methods and apparatus as can be readily and inexpensively integrated with legacy, current and future database management systems.
- a still further object of the invention is to provide such methods and apparatus as can be implemented incrementally or otherwise without interruption of enterprise operation.
- Yet a still further object of the invention is to provide such methods and apparatus as to facilitate ready access to up-to-date enterprise data, regardless of its underlying source.
- Yet still a further object of the invention is to provide such methods and apparatus as permit flexible presentation of enterprise data in an easily understood manner.
- the aforementioned are among the objects attained by the invention, one aspect of which provides a method of time-wise data reduction that includes the steps of inputting data from a source; summarizing that data according to one or more selected epochs in which it belongs; and generating for each such selected epoch one or more RDF triples characterizing the summarized data.
- the data source may be, for example, a database, a data stream or otherwise.
- the selected epoch may be a second, minute, hour, week, month, year, or so forth.
- RDF triples in the form of RDF document objects.
- RDF document objects can be stored, for example, in a hierarchical data store such as, for example, a WebDAV server.
- Still further related aspects of the invention provide for parsing triples from the RDF document objects and storing them in a relational data store.
- a further related aspect of the invention provides for storing the triples in a relational store that is organized according to a hashed with origin approach.
- Still yet other aspects of the invention provide for retrieving information represented by the triples in the hierarchical and/or relational data stores, e.g., for presentation to a user.
- Related aspects of the invention provide for retrieving triples containing time-wise reduced data, e.g., for presentation to a user.
- Related aspects of the invention provide methods as described above including summarizing the input data according to one or more epochs of differing length. Further aspects of the invention provide methods as described above including querying the source, e.g., a legacy database, in order to obtain the input data. Related aspects of the invention provide for generating such queries in SQL format.
- Still other aspects of the invention provide methods as described above including the step of inputting an XML file that identifies one or more sources of input data, one or more fields thereof to be summarized in the time-wise reduction, and/or one or more epochs for which those fields are to be summarized.
- Further aspects of the invention provide methods as described above including responding to an input datum by updating summary data for an epoch of the shortest duration, e.g., a store of per day data.
- Related aspects of the invention provide for updating a store of summary data for epochs of greater duration, e.g., stores of per week or per month data, from summary data maintained in a store for an epoch of lesser duration, e.g., a store of per day data.
- Figure 1 depicts an improved enterprise business visibility and insight system according to the invention;
- Figure 1A depicts an architecture for a hologram data store according to the invention, e.g., in the system of Figure 1;
- Figure 1B depicts the tables in a model store and a triples store of the hologram data store of Figure 1A;
- Figure 2 depicts a directed graph representing data triples of the type maintained in a data store according to the invention.
- Figure 3 is a functional block diagram of a time-wise data reduction module in a system according to the invention.
- FIG. 1 depicts a real-time enterprise business visibility and insight system according to the invention.
- the illustrated system 100 includes connectors 108 that provide software interfaces to legacy, e-commerce and other databases 140 (hereinafter, collectively, “legacy databases”).
- a “hologram” database 114 (hereinafter, “data store” or “hologram data store”), which is coupled to the legacy databases 140 via the connectors 108, stores data from those databases 140.
- a framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118.
- the server 116 can also permit updates to data in the data store 114 and, thereby, in the legacy databases 140.
- Legacy databases 140 represent existing (and future) databases and other sources of information (including data streams) in a company, organization or other entity (hereinafter
- databases 140 include a retail e-commerce database (e.g., as indicated by the cloud and server icons adjacent database 140c) maintained with a Sybase® database management system, an inventory database maintained with an Oracle® database management system and an ERP database maintained with a SAP® Enterprise Resource Planning system.
- Connectors 108 serve as an interface to legacy database systems 140. Each connector applies requests to, and receives information from, a respective legacy database, using that database's API or other interface mechanism. Thus, for example, connector 108a applies requests to legacy database 140a using the corresponding SAP API; connector 108b, to legacy database 140b using the corresponding Oracle API; and connector 108c, to legacy database 140c using the corresponding Sybase API.
- these requests are for purposes of accessing data stored in the respective databases 140.
- the requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining.
- one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases.
- Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors.
- the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
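The template idea above can be sketched as follows. This is a minimal illustration, not the connectors' actual mechanism: the `orders` table, its columns, and the helper `run_template` are hypothetical names, and an in-memory SQLite database stands in for a legacy database.

```python
import sqlite3

# Hypothetical query template a connector might hold; the table and
# column names are assumptions made for this sketch.
ORDER_TEMPLATE = (
    "SELECT order_id, quantity FROM orders "
    "WHERE customer = ? AND order_date >= ?"
)

def run_template(conn, template, params):
    """Plug request-specific items into a stored template and run it."""
    return conn.execute(template, params).fetchall()

def demo():
    # In-memory database standing in for a legacy source.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER, customer TEXT, order_date TEXT, quantity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme", "2002-01-05", 10),
            (2, "acme", "2002-03-01", 4),
            (3, "globex", "2002-03-02", 7),
        ],
    )
    # The same template can be reapplied later with different items.
    return run_template(conn, ORDER_TEMPLATE, ("acme", "2002-02-01"))
```

Because the template text is fixed and only the parameters change, the same stored request can be reapplied for one-time or periodic updates.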
- the requests can be stored in the connectors 108 for application and/or reapplication to the respective legacy databases 140 to provide one-time or periodic data store updates.
- Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
- Data and other information generated by the databases 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data.
- the messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
- the hologram data store 114 stores data from the legacy databases 140 (and from the framework server 116, as discussed below) as RDF triples.
- the data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116.
- the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though, this may not be required for any given implementation.
- Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the legacy databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
- a preferred data store 114 stores the data from the legacy databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead.
- RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects.
- an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
- Subjects, also referred to as resources, can be anything that is described by an RDF expression.
- a subject can be a person, place or thing, though typically only an identifier of the subject is used in an actual RDF expression, not the person, place or thing itself. Examples of subjects might be "car," "Joe," "http://www.metatomix.com."
- a predicate identifies a property of a subject. According to the RDF specification, this may be any "specific aspect, characteristic, attribute, or relation used to describe a resource." For the three exemplary subjects above, examples of predicates might be "make," "citizenship," "owner."
- Objects can be literals, i.e., strings that identify or name the corresponding property
- a given subject may have multiple predicates, each predicate indexing an object.
- a subject postal zip code might have an index to an object town and an index to an object state, either (or both) index being a predicate URI
- RDF triples here, expressed in extensible markup language (XML) syntax.
- the listing shows only a sampling of the triples in a database 114, which typically would contain tens of thousands or more of such triples.
- Subjects are indicated within the listing using an "rdf:about" statement.
- the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration.
- the subjects and predicates are expressed as uniform resource indicators (URIs), e.g., of the type defined in Berners-Lee et al., Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396) (March 1998), and can be said to be expressed in a form <scheme>://<path>#<fragment>.
- <scheme> is "postal"
- <path> is "zip"
- <fragment> is, for example, "02886" and "02901."
- predicates are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art.
- predicates that are formally expressed as: "http://www.metatomix.com/postalCode/1.0#town," "http://www.metatomix.com/postalCode/1.0#state," "http://www.metatomix.com/postalCode/1.0#country" and "http://www.metatomix.com/postalCode/1.0#zip."
- the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0."
- the <fragment> portions are <town>, <state>, <country> and <zip>, respectively.
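The scheme, path and fragment decomposition described above can be illustrated with a short sketch. The helper name `split_uri` is hypothetical, and the function handles only the simple scheme, path, fragment form used in these examples rather than the full RFC 2396 grammar.

```python
def split_uri(uri):
    """Split '<scheme>://<path>#<fragment>' into its three parts.

    Simplified on purpose: it covers the form used in the examples above,
    not every URI that RFC 2396 permits.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not in scheme://path#fragment form: {uri!r}")
    # Everything before the first '#' is the path; the remainder is the fragment.
    path, _, fragment = rest.partition("#")
    return scheme, path, fragment
```

For the subject "postal://zip#02886" this yields the scheme "postal", the path "zip" and the fragment "02886"; for the town predicate it yields "http", "www.metatomix.com/postalCode/1.0" and "town".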
- Figure 2 depicts a directed graph composed of RDF triples of the type stored by the illustrated data store 114, here, by way of non-limiting example, triples representing relationships among four companies (id#1, id#2, id#3 and id#4) and between two of those companies (id#1 and id#2) and their employees.
- subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
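A directed graph of this kind can be sketched as an adjacency map built from triples. The company identifiers follow the example above, but the predicate URIs, the literal "Joe" as an employee, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

def walk(graph, start):
    """Return every node reachable from start, depth first."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Literal objects have no entry in the graph, so they end a branch.
        for _, obj in reversed(graph.get(node, [])):
            stack.append(obj)
    return seen

# Hypothetical triples: the predicates are invented for this sketch.
TRIPLES = [
    ("company://id#1", "rel://customerOf", "company://id#2"),
    ("company://id#2", "rel://supplierOf", "company://id#3"),
    ("company://id#1", "rel://employs", "Joe"),
]
```

Subjects and resource-type objects become keys or targets in the map, while literal-type objects appear only as arc targets.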
- Figure 1A depicts an architecture for a preferred hologram data store 114 according to the invention.
- the illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
- RDF triples maintained by the store 114 are received (from the legacy databases 140 via connectors 108 and/or from time-based data reduction module 150, described below) in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application.
- these are stored in the model document store 114A as such (i.e., document objects) particularly, using the tables and inter-table relationships shown in Figure 1B (see dashed box labelled 114B).
- the model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A.
- the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol.
- Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
- WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150.
- This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
- RDF triples received by the store 114 are also stored to a relational database, here, store
- the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach.
- a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database.
- Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
- the illustrated database store 114C has five tables interrelated as particularly shown in Figure 1B (see dashed box labelled 114C).
- these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
- the "triples" table 534 maintains one record for each stored triple.
- Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag (“resource_flg”) indicating whether that object is of the resource or literal type.
- Each record also includes an aforementioned hash code ("m_hash") identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
- the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536.
- the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects, whereas the literals table 536 stores the literal-type objects.
- the resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed.
- the r_value contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
- the namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
- the literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Again, it is the latter on which this table is indexed. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth).
- the models table 538 maintains one record for each RDF document object contained in the model document store 114A.
- Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed.
- each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
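As a rough sketch of how one triple might be decomposed into rows for these tables: the field names `resource_flg`, `m_hash` and `r_value` come from the text, while the function names, the dictionary layout and the choice to hash the full URI are assumptions. The 32-bit `zlib.crc32` is a stand-in for the 64-bit CRC-based hash, which the Python standard library does not provide.

```python
import zlib

def crc(value):
    """Stand-in hash: 32-bit CRC, not the 64-bit CRC-based hash of the text."""
    return zlib.crc32(value.encode("utf-8"))

def resource_rows(uri):
    """Return (resources row, namespaces row) for one resource URI.

    Assumes the URI contains '#': the part before it is the shared
    namespace and the part after it is the unique r_value.
    """
    namespace, _, fragment = uri.rpartition("#")
    return (crc(uri), fragment), (crc(namespace), namespace)

def triple_record(subject, predicate, obj, obj_is_resource, model_uri):
    """Return a triples-table record holding hashes only, never values."""
    return {
        "subject": crc(subject),
        "predicate": crc(predicate),
        "object": crc(obj),
        "resource_flg": obj_is_resource,
        "m_hash": crc(model_uri),
    }
```

Two predicates such as the town and state URIs share one namespaces row, so the common portion is stored once and only the fragment is repeated per resource.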
- relational triples store 114C is a schema-less structure for storing RDF triples.
- triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
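The statement itself is not reproduced in this text. The sketch below shows one way such a query could look, under stated assumptions: every table layout and column name other than `resource_flg`, `m_hash` and `r_value` is hypothetical, SQLite stands in for the RDBMS, and a 32-bit `zlib.crc32` stands in for the 64-bit CRC-based hash.

```python
import sqlite3
import zlib

def crc(value):
    """Stand-in hash (32-bit CRC) for the 64-bit hash described in the text."""
    return zlib.crc32(value.encode("utf-8"))

# Assumed schema: column names are illustrative only.
SCHEMA = """
CREATE TABLE namespaces (ns_hash INTEGER PRIMARY KEY, ns_value TEXT);
CREATE TABLE resources (r_hash INTEGER PRIMARY KEY, ns_hash INTEGER, r_value TEXT);
CREATE TABLE literals (l_hash INTEGER PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE triples (s_hash INTEGER, p_hash INTEGER, o_hash INTEGER,
                      resource_flg INTEGER, m_hash INTEGER);
"""

# Join hashes back to values to rebuild one literal-object triple.
QUERY = """
SELECT sn.ns_value || '#' || s.r_value,
       pn.ns_value || '#' || p.r_value,
       l.l_value
FROM triples t
JOIN resources s ON s.r_hash = t.s_hash
JOIN namespaces sn ON sn.ns_hash = s.ns_hash
JOIN resources p ON p.r_hash = t.p_hash
JOIN namespaces pn ON pn.ns_hash = p.ns_hash
JOIN literals l ON l.l_hash = t.o_hash
WHERE t.s_hash = ? AND t.p_hash = ? AND t.o_hash = ? AND t.resource_flg = 0
"""

def add_resource(conn, uri):
    """Store a resource as a namespace row plus a fragment row."""
    namespace, _, fragment = uri.rpartition("#")
    conn.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)",
                 (crc(namespace), namespace))
    conn.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
                 (crc(uri), crc(namespace), fragment))
    return crc(uri)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    subject = "postal://zip#02886"
    predicate = "http://www.metatomix.com/postalCode/1.0#town"
    s_hash = add_resource(conn, subject)
    p_hash = add_resource(conn, predicate)
    conn.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, ?)",
                 (crc("Warwick"), "Warwick", "string"))
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
                 (s_hash, p_hash, crc("Warwick"), crc("doc://zip.rdf")))
    return conn.execute(
        QUERY, (crc(subject), crc(predicate), crc("Warwick"))).fetchall()
```

The lookup hashes the requested subject, predicate and object first, so the triples table is searched on integer keys and the string values are only joined in for the matching rows.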
- RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
- time-wise data reduction component 150 comprises an XML parser 504, a query module 506, an analysis module 507 and an output module 508.
- the component 150 performs a time-wise reduction on data from the legacy databases 140. In some embodiments, that data is supplied to the component 150 by the connectors 108 in the form of RDF documents. In the illustrated embodiment, the component 150 functions, in part, like a connector itself — obtaining data directly from the legacy databases 140 before time-wise reducing it.
- illustrated component 150 outputs the reduced data in the form of RDF triples contained in RDF documents.
- these are stored in the model store 114A (and the underlying triples, in relational store 114C), alongside the RDF documents (and their respective underlying triples) from which the reduced data was generated. This facilitates, for example, reporting of the time-wise reduced data, e.g., by the framework server 116, since that data is readily available for display to the user and does not require ad hoc generation of data summaries in response to user requests.
- Module 504 parses an XML file 502 which specifies one or more sources of data to be time-wise reduced. That file may be supplied by the framework server 116, or otherwise.
- the specified sources may be legacy databases 140, data streams, or otherwise. They may also be connectors 108, e.g., identified by symbolic name, virtual port number, or otherwise.
- the XML specification file 502 specifies the data items which are to be time-wise reduced. These can be field names, identifiers or otherwise.
- the XML file 502 further specifies the time periods or epochs over which data is to be time-wise reduced. These can be seconds, minutes, hours, days, months, weeks, years, and so forth, depending on the type of data to be reduced. For example, if the data source contains hospital patient data, the specified epochs may be weeks and months; whereas, if the data source contains web site access data, the specified epochs may be hours and days.
- the parser component 504 parses the XML file 502 to discern the aforementioned data source identifiers, field identifiers, and epochs. To this end, the parser 504 may be constructed and operated in the conventional manner known in the art.
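The layout of the XML specification file is not given here, so the sketch below invents one purely for illustration: the element names `reduction`, `source`, `field` and `epoch`, the `name` attribute, and the function `parse_spec` are all assumptions. It shows how source identifiers, field identifiers and epochs might be discerned with a conventional parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification file; the real schema is not described.
SPEC = """
<reduction>
  <source name="weblog">
    <field>hits</field>
    <epoch>day</epoch>
    <epoch>month</epoch>
  </source>
  <source name="retail">
    <field>visitors</field>
    <epoch>month</epoch>
    <epoch>year</epoch>
  </source>
</reduction>
"""

def parse_spec(xml_text):
    """Return one dict per source with its fields and epochs."""
    root = ET.fromstring(xml_text)
    specs = []
    for source in root.findall("source"):
        specs.append({
            "source": source.get("name"),
            "fields": [f.text for f in source.findall("field")],
            "epochs": [e.text for e in source.findall("epoch")],
        })
    return specs
```

The resulting list mirrors the web site and retail store example: daily and monthly epochs for one source, monthly and yearly epochs for the other.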
- the query module 506 generates queries in order to obtain the field specified in the XML specification file 502. It queries the identified data source(s) in the manner appropriate to those sources. For example, the processing module 510 queries SQL-compatible databases using an SQL query. Other data sources are queried via their respective applications program interfaces (APIs), or otherwise. In embodiments where source data is supplied to the component 150 by the connectors 108, querying may be performed explicitly or implicitly by those connectors 108. Moreover, querying might not need to be performed on some data sources, e.g., data streams, from which data is broadcast or otherwise available without the need for request. In such instances, filtering may be substituted for querying in order that the specific fields or other items of data specified in the XML file are obtained.
- the analysis module 507 compiles time-wise statistics or summaries for each epoch specified in the XML file 502. To this end, it maintains for each such epoch one or more running statistics (e.g., sums or averages) for each data field specified by the file 502 and received from the sources. As data for each field are input, the running statistics for that field are updated. Such updating can include incrementing a count maintained for the field, recomputing a numerical total, modifying a concatenated string, and so forth, as appropriate to the type of the underlying field data.
- the analysis module 507 would maintain a store reflecting the number of hits thus far counted on a given day for that web site (e.g., based on data received from a source identifying each hit as it occurs, or otherwise).
- When no further data is received from the source for that day, the module generates RDF output (via the output module 508) reflecting that number of counts (or other specified summary information) for output to the hologram store 114.
- the analysis module 507 would maintain a separate store of counts for the month for which data is currently being received from the source. As above, when no further data is received from the source for that month, the module generates RDF output reflecting the total number of counts (or other specified summary information) for output to the hologram store 114.
- An analysis module 507 maintains stores for each epoch for which running statistics (i.e., time-wise summaries) are to be maintained.
- the stores 514 can be allocated from an array, a pointer table or other data structure, with specific allocations made depending on the specific number of running statistics being tracked.
- an XML file 502 specifies that access statistics are to be maintained for a web site on daily and monthly bases using data from a first data source, and that running statistics for the numbers of visitors to a retail store are to be maintained on monthly and yearly bases from data from a second data source
- the analysis module 507 can maintain four stores: store 514A maintaining a daily count for the web site; store 514B maintaining a monthly count for the web site; store 514C maintaining a monthly count for the retail store; and store 514D maintaining a yearly count for the retail store.
- Each of the stores 514 is updated as corresponding data is received from the respective data sources.
- a count maintained in the first store 514A is incremented.
- the output module 508 can generate one or more RDF triples reflecting a count for the (then-complete) prior day for storage in the hologram store 114.
- the store 514A can be reset to zero and the process restarted for tracking accesses on that succeeding day.
- the second store 514B i.e., that tracking the longer epoch for data from the first source, can be incremented in parallel with the first store 514A as web access data is received from the source or, alternatively, can be updated when the first store 514A is rolled over, i.e. reset for tracking statistics for each successive day.
- RDF triples can be generated to reflect web access statistics for the then-completed prior month, concurrently with zeroing the second store 514B for tracking of statistics for the succeeding month.
- the analysis module 507 maintains running statistics for the epochs specified in the XML file 502, outputting RDF triples reflecting those statistics as data for each successive epoch is received.
- running statistics may be maintained in other ways, as well. For example, continuing the above example, in instances where data received from the first source is not received ordered by day (but, rather, is intermingled with respect to many days), multiple stores can be maintained, one for each day (or other epoch).
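The running-count and rollover behavior described above can be sketched as follows. This is a simplified illustration that assumes data arrives ordered by date (the intermingled case would need one store per epoch key); the class name `RunningCounts`, the subject and predicate URI patterns, and the epoch labels are hypothetical.

```python
from datetime import date

def epoch_key(day, epoch):
    """Label the epoch (day, month or year) that a date falls in."""
    if epoch == "day":
        return day.isoformat()
    if epoch == "month":
        return f"{day.year:04d}-{day.month:02d}"
    if epoch == "year":
        return f"{day.year:04d}"
    raise ValueError(f"unsupported epoch: {epoch}")

class RunningCounts:
    """One running count per epoch, reset when the epoch rolls over."""

    def __init__(self, subject, epochs):
        self.subject = subject
        self.stores = {epoch: None for epoch in epochs}  # epoch -> [key, count]

    def add(self, day, amount=1):
        """Update every store; return triples for epochs that just completed."""
        completed = []
        for epoch, store in self.stores.items():
            key = epoch_key(day, epoch)
            if store is not None and store[0] != key:
                # Epoch rolled over: emit its summary and start afresh.
                completed.append(
                    (f"{self.subject}#{store[0]}", f"stats://count#{epoch}", store[1])
                )
                store = None
            if store is None:
                store = [key, 0]
            store[1] += amount
            self.stores[epoch] = store
        return completed
```

In this sketch the daily and monthly stores are incremented in parallel; the alternative mentioned above, updating the longer epoch only when the shorter store rolls over, would move the monthly increment into the rollover branch.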
- the output module 508 generates RDF documents reflecting the summarized data stored in stores 514 for output to the hologram data store 114.
- This can be performed by generating an RDF stream ad hoc or, preferably, by utilizing native commands, e.g., of the Java programming language, to gather the epoch data into a document object model (DOM).
- the DOM can be output in RDF format to the hologram store 114 directly.
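A minimal sketch of gathering epoch summaries into a DOM and serializing it as RDF/XML is shown below. The text mentions Java; Python's `xml.dom.minidom` is used here only for consistency with the other sketches. The `stats` namespace URI, the property names and the function `summaries_to_rdf` are hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
STATS_NS = "http://example.org/stats/1.0#"  # hypothetical namespace

def summaries_to_rdf(summaries):
    """Build an RDF/XML string from (subject URI, property name, value) items."""
    doc = getDOMImplementation().createDocument(RDF_NS, "rdf:RDF", None)
    root = doc.documentElement
    # minidom does not emit namespace declarations by itself, so add them.
    root.setAttribute("xmlns:rdf", RDF_NS)
    root.setAttribute("xmlns:stats", STATS_NS)
    for subject, prop, value in summaries:
        description = doc.createElement("rdf:Description")
        description.setAttribute("rdf:about", subject)
        element = doc.createElement(f"stats:{prop}")
        element.appendChild(doc.createTextNode(str(value)))
        description.appendChild(element)
        root.appendChild(description)
    return doc.toxml()
```

Each summary becomes one `rdf:Description` whose `rdf:about` attribute names the subject and whose child element carries the predicate and literal value, matching the subject declaration style of the listing discussed earlier.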
- the store 114 supports SQL-like query languages called HxQL and HxML. These allow retrieval of RDF triples matching defined criteria.
- the data store 114 includes a graph generator (not shown) that uses RDF triples to generate directed graphs in response to queries (e.g., in HxQL or HxML form) from the framework server 116. These may be queries for information reflected by triples originating from data in one or more of the legacy databases 140 (one example might be a request for the residence cities of hotel guests who booked reservations on account over Independence Day weekend, as reflected by data from an e-Commerce database and an Accounts Receivable database).
- the data store 114 utilizes genetic, self-adapting algorithms to traverse the RDF triples in response to queries from the framework server 116.
- genetic, self-adapting algorithms can be beneficially applied to the RDF database which, due to its inherently flexible (i.e., schema-less) structure, is not readily searched using traditional search techniques.
- the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
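The compare-and-refine loop described above can be illustrated with a simplified sketch. It is not a full genetic algorithm and is not the patent's method: the two search strategies, the scoring rule (number of results), and all names are assumptions chosen to keep the example small.

```python
def exact(triples, terms):
    # Every term must equal some element of the triple exactly.
    return [t for t in triples if all(term in t for term in terms)]


def substring(triples, terms):
    # Every term must occur inside some element, ignoring case.
    return [t for t in triples
            if all(any(term.lower() in x.lower() for x in t) for term in terms)]


def score(results):
    """Hypothetical quantitative measure: how many triples were found."""
    return len(results)


def adaptive_search(triples, terms, extra_terms=()):
    strategies = {"exact": exact, "substring": substring}
    # Run every strategy against the same underlying query and score each.
    scored = {name: score(fn(triples, terms)) for name, fn in strategies.items()}
    best = max(scored, key=scored.get)
    # Reapply the winning strategy with additional terms to narrow the result.
    refined = strategies[best](triples, list(terms) + list(extra_terms))
    return best, refined


triples = [
    ("guest:1", "residesIn", "Boston"),
    ("guest:2", "residesIn", "Boston Heights"),
    ("guest:3", "residesIn", "Austin"),
]
best, refined = adaptive_search(triples, ["boston"], extra_terms=["guest:1"])
```

Here the exact strategy finds nothing for the lowercase term, the substring strategy finds two triples, and the refinement pass narrows those to one.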
- the framework server 116 generates requests to the data store 114 (and/or indirectly to the legacy databases via connectors 108, as discussed above) and presents information therefrom to the user via browser 118.
- the requests can be based on HxQL or HxML requests entered directly by the user, though they may instead be generated by the server 116 based on user selections/responses to questions, dialog boxes or other user-input controls.
- the framework server includes one or more user interface modules, plug-ins, or the like, each for generating queries of a particular nature.
- One such module, for example, generates queries pertaining to marketing information; another such module generates queries pertaining to financial information, and so forth.
- queries to the data store are structured on a SQL-based RDF query language, in the general manner of SquishQL, as known in the art.
- In addition to generating queries, the framework server (and/or the aforementioned modules) "walks" directed graphs generated by the data store 114 to present to the user (via browser 118) any specific items of requested information. Such walking of the directed graphs can be accomplished via any conventional technique known in the art. Presentation of questions, dialog boxes or other user-input controls to the user and, likewise, presentation of responses thereto based on the directed graph can be accomplished via conventional server/browser or other user interface technology.
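Since the text leaves the graph-walking technique open ("any conventional technique"), the sketch below uses a breadth-first traversal as one such choice. The graph layout (node mapped to a list of (predicate, object) edges), the function name `walk`, and the sample data are assumptions for illustration.

```python
from collections import deque


def walk(graph, start, wanted_predicate):
    """Breadth-first walk from `start`, collecting every object reached
    through an edge labeled `wanted_predicate`."""
    found, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for predicate, obj in graph.get(node, []):
            if predicate == wanted_predicate:
                found.append(obj)
            if obj not in seen:   # guard against cycles in the graph
                seen.add(obj)
                queue.append(obj)
    return found


graph = {
    "res:100": [("bookedBy", "guest:1"), ("paidBy", "account:7")],
    "guest:1": [("residesIn", "Boston")],
    "account:7": [("billedTo", "guest:1")],
}
cities = walk(graph, "res:100", "residesIn")
```

The `seen` set matters because directed graphs built from triples can contain cycles, as with `account:7` pointing back to `guest:1`.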
- the framework server 116 permits a user to update data stored in the data store 114 and, thereby, that stored in the legacy databases 140.
- changes made to data displayed by the browser 118 are transmitted by server 116 to data store 114.
- any triples implicated by the change are updated in store 114C, as are the corresponding RDF document objects in store 114A.
- An indication of these changes can be forwarded to the respective legacy databases 140, which utilize the corresponding API (or other interface mechanisms) to update their respective stores.
- changes made directly to the store 114C as discussed above, e.g., using a WebDAV client, can likewise be forwarded to the respective legacy database.
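The update path described above (a change updates the stored triple, and an indication of the change is forwarded to the originating legacy database) can be sketched with a callback per source. The class `TripleStore`, the idea of tagging each triple with its source, and the callback interface are assumptions. The source says only that the legacy databases use their corresponding API or other interface mechanisms.

```python
class TripleStore:
    """Toy store that remembers which source each triple came from."""

    def __init__(self):
        self.triples = {}    # (subject, predicate) -> (object, source name)
        self.listeners = {}  # source name -> callback standing in for its API

    def register(self, source, callback):
        self.listeners[source] = callback

    def load(self, subject, predicate, obj, source):
        self.triples[(subject, predicate)] = (obj, source)

    def update(self, subject, predicate, new_obj):
        # Update the triple in place, keeping its recorded origin.
        _, source = self.triples[(subject, predicate)]
        self.triples[(subject, predicate)] = (new_obj, source)
        # Forward an indication of the change to the originating database.
        callback = self.listeners.get(source)
        if callback is not None:
            callback(subject, predicate, new_obj)


forwarded = []
store = TripleStore()
store.register("accounts_receivable",
               lambda s, p, o: forwarded.append((s, p, o)))
store.load("guest:1", "residesIn", "Boston", source="accounts_receivable")
store.update("guest:1", "residesIn", "Austin")
```

Recording the source alongside each triple is one simple way to know which legacy database should receive a given change.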
- the server 116 can present to the user not only data from the data store 114, but also data gleaned by the server directly from other sources.
- the server 116 can directly query an enterprise web site for statistics regarding web page usage, or otherwise.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US33205301P | 2001-11-21 | 2001-11-21 | |
US33221901P | 2001-11-21 | 2001-11-21 | |
US332053P | 2001-11-21 | ||
US332219P | 2001-11-21 | ||
PCT/US2002/037727 WO2003046769A1 (fr) | 2001-11-21 | 2002-11-21 | Procedes et appareil d'analyse de donnees statistiques |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1483688A1 (fr) | 2004-12-08 |
Family
ID=26988039
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02791310A Withdrawn EP1483688A1 (fr) | 2001-11-21 | 2002-11-21 | Procedes et appareil d'analyse de donnees statistiques |
EP02784576A Withdrawn EP1546921A2 (fr) | 2001-11-21 | 2002-11-21 | Procedes et appareil permettant d'interroger une memoire de donnees relationnelles a l'aide d'interrogations sans schema |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02784576A Withdrawn EP1546921A2 (fr) | 2001-11-21 | 2002-11-21 | Procedes et appareil permettant d'interroger une memoire de donnees relationnelles a l'aide d'interrogations sans schema |
Country Status (4)
Country | Link |
---|---|
EP (2) | EP1483688A1 (fr) |
AU (2) | AU2002365577A1 (fr) |
CA (2) | CA2471467A1 (fr) |
WO (2) | WO2003044634A2 (fr) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9460129B2 (en) | 2013-10-01 | 2016-10-04 | Vmware, Inc. | Method for tracking a schema in a schema-less database |
US8458191B2 (en) | 2010-03-15 | 2013-06-04 | International Business Machines Corporation | Method and system to store RDF data in a relational store |
US10353966B2 (en) | 2015-11-19 | 2019-07-16 | BloomReach, Inc. | Dynamic attributes for searching |
CN108762915B (zh) * | 2018-04-19 | 2020-11-06 | 上海交通大学 | 一种在gpu内存中缓存rdf数据的方法 |
CN113836316B (zh) * | 2021-09-23 | 2023-01-03 | 北京百度网讯科技有限公司 | 三元组数据的处理方法、训练方法、装置、设备及介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5907837A (en) * | 1995-07-17 | 1999-05-25 | Microsoft Corporation | Information retrieval system in an on-line network including separate content and layout of published titles |
US5822780A (en) * | 1996-12-31 | 1998-10-13 | Emc Corporation | Method and apparatus for hierarchical storage management for data base management systems |
US20020049788A1 (en) * | 2000-01-14 | 2002-04-25 | Lipkin Daniel S. | Method and apparatus for a web content platform |
2002
- 2002-11-21 WO PCT/US2002/037729 patent/WO2003044634A2/fr not_active Application Discontinuation
- 2002-11-21 CA CA002471467A patent/CA2471467A1/fr not_active Abandoned
- 2002-11-21 EP EP02791310A patent/EP1483688A1/fr not_active Withdrawn
- 2002-11-21 CA CA002471468A patent/CA2471468A1/fr not_active Abandoned
- 2002-11-21 WO PCT/US2002/037727 patent/WO2003046769A1/fr not_active Application Discontinuation
- 2002-11-21 AU AU2002365577A patent/AU2002365577A1/en not_active Abandoned
- 2002-11-21 AU AU2002346510A patent/AU2002346510A1/en not_active Abandoned
- 2002-11-21 EP EP02784576A patent/EP1546921A2/fr not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO03046769A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2003046769A1 (fr) | 2003-06-05 |
CA2471468A1 (fr) | 2003-06-05 |
CA2471467A1 (fr) | 2003-05-30 |
AU2002346510A8 (en) | 2003-06-10 |
EP1546921A2 (fr) | 2005-06-29 |
WO2003044634A2 (fr) | 2003-05-30 |
WO2003044634A3 (fr) | 2003-12-11 |
AU2002346510A1 (en) | 2003-06-10 |
AU2002365577A1 (en) | 2003-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7302440B2 (en) | Methods and apparatus for statistical data analysis and reduction for an enterprise application | |
US10275540B2 (en) | Methods and apparatus for querying a relational data store using schema-less queries | |
US6856992B2 (en) | Methods and apparatus for real-time business visibility using persistent schema-less data storage | |
US6826557B1 (en) | Method and apparatus for characterizing and retrieving query results | |
US7805465B2 (en) | Metadata management for a data abstraction model | |
US7673065B2 (en) | Support for sharing computation between aggregations in a data stream management system | |
US7606791B2 (en) | Internal parameters (parameters aging) in an abstract query | |
CN103460208B (zh) | 用于将数据加载到时态数据仓库的方法和系统 | |
US8521867B2 (en) | Support for incrementally processing user defined aggregations in a data stream management system | |
Snodgrass et al. | Aggregates in the temporal query language TQuel | |
US20130110766A1 (en) | Method for performing transactions on data and a transactional database | |
US20070130171A1 (en) | Techniques for implementing indexes on columns in database tables whose values specify periods of time | |
US20100250574A1 (en) | User dictionary term criteria conditions | |
US20240119071A1 (en) | Relationship-based display of computer-implemented documents | |
WO2003046769A1 (fr) | Procedes et appareil d'analyse de donnees statistiques | |
US11347804B2 (en) | Methods and apparatus for querying a relational data store using schema-less queries | |
US7203677B1 (en) | Creation of duration episodes from single time events | |
Chandrasekaran | Query processing over live and archived data streams | |
Lee et al. | Query optimization for web BBS by analytic function and function-based index in oracle DBMS | |
Parsian | Exploring the Statement | |
Brooke III | A Web-enabled temporal database human resources application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20041004 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SK TR |
| AX | Request for extension of the european patent | Extension state: AL LT LV MK RO SI |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor name: GREENBLATT, HOWARD; Inventor name: BIGWOOD, DAVID; Inventor name: KUMAR, ASHOK; Inventor name: BRITTON, COLIN, P. |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20070531 |