US20140330859A1 - Systems and methods for information retrieval - Google Patents

Systems and methods for information retrieval

Publication number
US20140330859A1
Authority
US
United States
Prior art keywords
query
data
clause
model
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/276,118
Inventor
Aaron N. Bawcom
Christopher M. Madden
Michael C. Wronski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
StrataCloud Inc
Original Assignee
Reflex Systems LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reflex Systems LLC
Priority to US14/276,118
Assigned to STRATACLOUD, INC. Assignment of assignors interest (see document for details). Assignors: Reflex Systems, LLC
Publication of US20140330859A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G06F17/30424

Definitions

  • This invention relates generally to computer system data retrieval techniques.
  • Information retrieval systems are known in the art. Such systems generally offer users a variety of means of expressing user intentions through queries. These include text search, parametric search, structured queries, selection from alternatives (i.e., browsing or navigation), and range specification. In general, the systems offer users a means of expressing queries using either a structured language (e.g., a language like SQL) or an informal input mechanism (e.g., English keyword search). When the input mechanism is informal, the problems of ambiguity may arise from the language itself. But, even when the input mechanism is formal, the user may not always succeed in expressing his or her intention in the formal query language.
  • IT infrastructures can become large and complex, and this creates a need to provide administrators and others with the ability to query, display, and manipulate their operation in a timely manner.
  • both experienced and inexperienced users need the ability to quickly find data within the IT infrastructure; likewise, external software programs that can be executed from a programmatic environment need the ability to quickly find specific data based on complex conditions and then output results in an object-oriented structured form.
  • the subject matter of this disclosure provides for enhanced information retrieval, e.g., in an IT infrastructure information retrieval system.
  • the techniques provide the ability to produce query results based on codification of IT infrastructure with complex relationships, even where the user has only minimal knowledge of the environment.
  • the information retrieval system implements a search language, through which a querying entity (e.g., a user, a program or process, or the like) formulates a search query.
  • a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations applied to it.
  • Each clause includes a clause pipeline, and a time constraint.
  • a clause pipeline includes an ordered set of clause specifications separated by a pipeline operator.
  • a clause specification can be either an expansion operation or a filtering operation.
  • a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification.
  • the search language is exposed to users (typically, IT administrators), and one or more builder programs within the system (each referred to as a “model builder”) are used internally to present data models to the search language.
  • a model builder extracts data from a given type of data source (including, without limitation, a relational database system, an application programming interface (API), or the like), and enables that data to be presented to one or more constructs of the search language according to a single unified data model.
  • the model builder functions as an abstraction layer, which can be used to introduce new objects, relationships and properties to the search mechanism without modifying the search language itself.
  • Object relationships either can be encoded as facts in the underlying model or can be derived from facts within the model builder itself.
  • Expansion operations in the query remove the complexity of object navigation by using the object schema provided by model builders to traverse an object graph for desired objects.
  • Filtering operations in the query use standard grouping Boolean logic for object property comparisons.
  • the language need not include any type information for objects but instead provides operators that the model builders map to native object type operations to produce results.
  • a query system that implements the search language comprises a parser, a lexical analyzer, an abstract syntax tree, and one or more model builders.
  • the processing of the query is distributed across multiple model builders for the purpose of data correlation and improving query performance.
  • because model builders can provide data from any data source, the system can also provide query specifications to a model builder that processes streaming data (as opposed to mere static data or data that is retrieved on-demand).
  • the language can specify a single query across multiple model builders to simultaneously provide query results from a static model builder, a dynamic model builder, and a streaming model builder. In this way, the language correlates data from multiple data sources into a single unified data model.
  • the disclosed system provides for a domain-specific language to map abstract data structures into different specific data types of varying composition and then into a format that is known and can be utilized by an application program (such as a data query and retrieval system).
  • the language provides natural language expressions and default contextual type conversions to handle a full range of user-based queries for a given environment, a virtualized IT infrastructure being just one non-limiting example.
  • FIG. 1 is an exemplary block diagram of a data processing environment in which exemplary aspects of the illustrative embodiments may be implemented;
  • FIG. 2 is a data flow diagram of the subject matter described herein;
  • FIG. 3 illustrates the processing steps carried out by the compiler;
  • FIG. 4 illustrates a graphical depiction of an abstract syntax tree generated by the compiler;
  • FIG. 5 illustrates the processing steps of the query processor;
  • FIG. 6 illustrates components of a model builder object;
  • FIG. 7 is an example of an HTTP-based query entity making a request to the query system;
  • FIG. 8 is a UML-based model of a unified type model generated from different model builders;
  • FIG. 9 is a representative UML-based model of an IT infrastructure that is generated by a model builder;
  • FIG. 10 is a representative UML-based model of a streaming performance data model that is generated by a model builder; and
  • FIG. 11 is a set of representative examples of requests for desired information and, for each request, the query string that is used to generate the desired results using the techniques described herein.
  • a “query” is a string of characters representing data;
  • an “abstract syntax tree” is a graph of data nodes that represent a parsed structure of a query;
  • a “query processor” is a component that iterates over the abstract syntax tree to produce query results;
  • a “compiler” is a set of software components that translate a query into an abstract syntax tree;
  • a “model builder” is a software component that conforms to a defined interface to the query processor for the purpose of providing data from a specific type of data source; and
  • a “query system” is a collection of components that perform data retrieval.
  • FIG. 1 is a simplified block diagram of a representative information retrieval system in which the subject matter described herein may be implemented and comprises a data processing system.
  • a data processing system 100 suitable for storing and/or executing program code will include at least one processor 102 coupled directly or indirectly to memory elements through a system bus 105 .
  • the memory elements can include local memory 104 employed during actual execution of the program code, bulk storage 106 , and cache memories 108 that provide temporary storage of at least some program code to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices can be coupled to the system either directly or through intervening I/O controllers 116 .
  • Network adapters 118 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or devices through intervening private or public networks 120 .
  • the techniques described herein may tax the limits of a single computational server's resources, and thus it is contemplated that one or more process steps or functions may be distributed onto a set or hierarchy of multiple computational servers.
  • any other hardware, software, systems, devices and the like may be used.
  • the subject matter described herein may be implemented with any collection of one or more autonomous computers (together with their associated software, systems, protocols and techniques) linked by a network or networks.
  • a representative information retrieval system in which the disclosed subject matter may be implemented comprises a set of computer resources (typically, software processes, programs, utilities and associated interfaces, data and protocols) to provide a monitoring and reporting tool for IT system administrators and that includes a search capability.
  • the tool has the capability of consolidating logs, metrics and other data from various applications, servers and network devices into a searchable repository and can be used to generate graphs, reports, and alerts, and to take other actions.
  • the information retrieval system helps administrators to identify anomalies and to diagnose problems in the infrastructure.
  • the tool may have the ability to correlate log files across systems and software components, and such data can help users determine root causes when problems arise within the infrastructure.
  • the described subject matter may be implemented within the information retrieval system, or as an adjunct to that system.
  • the IT infrastructure is a virtualized infrastructure, such as the Virtualization Management Center™ (VMC) solution available from Reflex Systems, LLC, of Atlanta, Ga.
  • VMC provides a single authoritative visual interface, central management and security for heterogeneous virtual environments.
  • VMC enables the user to administer, audit, secure and monitor complex, dynamic, virtual infrastructures. This results in better network and event visibility for a faster and more efficient management and security response.
  • FIG. 2 is a data flow diagram illustrating the basic components and operation of an information retrieval system and method according to this disclosure.
  • a query entity 101 (which need not be part of the system itself) formulates a query (basically a string of characters) and supplies it to the information retrieval system.
  • the system comprises compiler 200 , abstract syntax tree 300 , query processor 400 , and one or more model builders 500 .
  • the querying entity may be a user, a program or process, or the like.
  • Each model builder 500 generates an associated data model 600 , and the data models are mapped to a uniform data model.
  • the output of the query processor 400 is an object set 700 , which is the desired response to the query.
  • the querying entity submits the query to the compiler 200 , which may be implemented in the manner shown in FIG. 3 .
  • the compiler 200 performs lexical analysis 302 of the query string 103 based off of a set of lexical constructs (as described below).
  • the lexical analysis produces a set of tokens 304 , which are then applied through a parser 306 to generate the abstract syntax tree 300 .
  • the compiler may be implemented in any known manner. If desired, the lexical analysis and the parsing may be carried out with just a parser.
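  • The lexical-analysis step described above can be sketched in Python as follows. This is a minimal illustration, not the compiler of the disclosure: the token names (NUMBER, WORD, STRING, PIPE), the regular expressions, and the use of the `|` character as the pipeline operator are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical token classes. The disclosure names digit, word, and
# double-quoted-string constructs, but its full token table is not shown.
TOKEN_RE = re.compile(r'''
    (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z][A-Za-z0-9_.]*)
  | (?P<STRING>"[^"]*")
  | (?P<PIPE>\|)
  | (?P<WS>\s+)
''', re.VERBOSE)


def tokenize(query: str) -> List[Tuple[str, str]]:
    """Turn a query string into (kind, text) tokens, skipping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

  • The resulting token list is what a parser would consume to build the abstract syntax tree; a single-pass parser could fold both steps together, as the text notes.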
  • FIG. 4 is a graphical depiction of the abstract syntax tree 300 generated by the compiler.
  • a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations (the “set operator”) applied to it.
  • Each individual clause includes a clause pipeline, and a time constraint (or “specifier”).
  • a clause pipeline includes an ordered set of clause specifications separated by a pipeline operator (or “pipe”).
  • a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification.
  • a clause specification is either a filtering operation or an expansion operation.
  • Filtering operations in the query use standard grouping Boolean logic for object property comparisons.
  • Expansion operations in the query remove the complexity of object navigation by using an object schema provided by model builders to traverse an object graph for desired objects.
  • the expansion operation typically includes an expansion source type, an expansion keyword, and an expansion target type.
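  • One way to picture the abstract syntax tree described above is as a small set of nested record types. The sketch below is illustrative only; the class and field names are assumptions and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Expansion:
    source_type: str   # expansion source type, e.g. "vm"
    keyword: str       # expansion keyword, e.g. "project" or "expand"
    target_type: str   # expansion target type, e.g. "vnic"


@dataclass
class Filter:
    expression: str    # Boolean property comparison, kept as raw text here


# A clause specification is either an expansion or a filtering operation.
ClauseSpec = Union[Expansion, Filter]


@dataclass
class Clause:
    pipeline: List[ClauseSpec]           # ordered, separated by the pipe
    time_spec: Optional[str] = None      # the clause's time constraint
    set_operator: Optional[str] = None   # set membership operation, if any


@dataclass
class Query:
    clauses: List[Clause] = field(default_factory=list)  # ordered clauses
```

  • Keeping the pipeline as an ordered list preserves the rule that each clause specification consumes the object set produced by the one before it.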
  • FIG. 5 illustrates the query processor operation.
  • the query processor takes as an input the abstract syntax tree generated by the compiler.
  • a test is performed at step 502 to determine whether the first clause specification is to be processed. If so, the routine branches to step 504 .
  • the query processor iterates through all model builders, calling a set expansion for each object type in the object schema (as will be described below).
  • a result set is added to a current object set 700 . If, however, the result of the test at step 502 is negative, a test 508 is performed to determine the clause specification type (as the processing is now being performed within a given clause pipeline).
  • If the clause specification type is expansion, step 510 expands the current object set based on the expansion target type. This step calls a set expansion from the model builder that provides the expansion source type. The output of step 510 is provided to the current object set 700 . If the clause specification type is reduction, however, the routine branches to step 512 to reduce the current object set based on the given Boolean logic. The routine then moves to the next clause specification at step 514 , and the process iterates until all clause specifications in the query are processed. At the conclusion of the clause definition processing, a set of objects is produced. If there is more than one clause definition in the query, the user may specify standard mathematical set operations to be applied between the ordered sets; these operations may include union, intersection, and complement (or “subtraction”). A single final object set is constructed from all set operations applied to the object sets produced from each clause definition.
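  • The control flow of FIG. 5 can be approximated by the following Python sketch. It is a simplification under stated assumptions: objects are modeled as (type, name) tuples, `DictBuilder` is a hypothetical in-memory model builder, and clause specifications are plain (kind, argument) pairs rather than abstract syntax tree nodes.

```python
from typing import Dict, Iterable, List, Set, Tuple

Obj = Tuple[str, str]  # (object type, name); a hypothetical representation


class DictBuilder:
    """Toy model builder backed by in-memory data (illustrative only)."""

    def __init__(self, objects: Set[Obj], edges: Dict[Obj, List[Obj]]):
        self.objects, self.edges = objects, edges

    def all_objects(self) -> Set[Obj]:
        return set(self.objects)

    def expand(self, obj: Obj, target_type: str) -> Set[Obj]:
        return {o for o in self.edges.get(obj, ()) if o[0] == target_type}


def run_clause(pipeline: List[tuple], builders: Iterable[DictBuilder]) -> Set[Obj]:
    builders = list(builders)
    # First step: gather the initial universe from every model builder.
    current: Set[Obj] = set()
    for builder in builders:
        current |= builder.all_objects()
    for kind, arg in pipeline:
        if kind == "expansion":        # arg is the expansion target type
            expanded: Set[Obj] = set()
            for builder in builders:
                for obj in current:
                    expanded |= builder.expand(obj, arg)
            current = expanded
        elif kind == "reduction":      # arg is a Boolean predicate
            current = {o for o in current if arg(o)}
        else:
            raise ValueError(f"unknown clause specification type: {kind}")
    return current


SET_OPS = {
    "union": set.union,
    "intersection": set.intersection,
    "complement": set.difference,      # the "subtraction" operation
}


def combine(results: List[Set[Obj]], operators: List[str]) -> Set[Obj]:
    """Fold clause results left to right; operators[i] joins results i and i+1."""
    final = results[0]
    for op, nxt in zip(operators, results[1:]):
        final = SET_OPS[op](final, nxt)
    return final
```

  • For instance, a pipeline that first reduces the universe to vm objects and then expands toward vnic objects returns only the vnic objects reachable from those virtual machines.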
  • one or more builder programs within the system (each referred to as a “model builder”) are used internally to present data models to the query processor of the system.
  • a model builder extracts data from a given type of data source, and it enables that data to be presented to one or more constructs of the search language according to a single unified data model.
  • the model builder functions as an abstraction layer, which can be used to introduce new objects, relationships and properties to the search mechanism without modifying the search language itself.
  • a model builder object 601 is characterized by one or more of the following: an object schema 602 , a set expansion 604 and a set reduction 606 .
  • the properties of an object are provided by the object schema 602 , and a representative schema is illustrated using a UML-based model.
  • a set expansion 604 produces a set of new objects based off of a source object.
  • a set reduction filters a superset of objects down to a smaller set based off of given filter criteria.
  • the objects in an object set pipeline are provided by an underlying system of model builders.
  • a given model builder provides the query processor with an object schema, object set expansions, and object set reductions.
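  • The three responsibilities of a model builder (object schema, set expansion, set reduction) suggest an interface along the following lines. The method names and signatures are assumptions chosen for illustration; the disclosure does not specify them.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List


class ModelBuilder(ABC):
    """Hypothetical rendering of the model-builder interface."""

    @abstractmethod
    def object_schema(self) -> Dict[str, List[str]]:
        """Map each object type to the property names it exposes."""

    @abstractmethod
    def set_expansion(self, source: Any, target_type: str) -> List[Any]:
        """Produce a set of new objects from one source object."""

    @abstractmethod
    def set_reduction(self, objects: Iterable[Any],
                      criteria: Callable[[Any], bool]) -> List[Any]:
        """Filter a superset of objects down to those meeting the criteria."""
```

  • Because the query processor depends only on this boundary, a new data source can be supported by adding another subclass, without changing the search language.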
  • the expansion specifications are defined in more detail below in the “Parse Constructs” section.
  • the model builders provide this data to the search language constructs in a single unified data model.
  • the object types constitute generalized IT objects. Examples of object types include, but are not limited to: virtual machines, physical hosts, groups of systems, clusters, network interfaces, switches, software, processes, protocols, memory, CPUs, system events, storage, files, and networks. These examples are merely illustrative, and the object types will vary depending on the implementation environment.
  • a model builder thus provides primitive data objects to the query processor.
  • a model builder obtains data from a data source and generates a data model.
  • a data source may be one of: a historical database, an on-demand data source accessed via an API, a streaming data source, or the like.
  • the models are used during the processing by the query processor of the abstract syntax tree, which is generated as an output from the compiler (which receives as input tokens produced from lexical analysis of the query string).
  • the boundary between the query processor and a model builder conforms to a well-defined interface.
  • this interface ensures that a model builder provides the query processor with a heterogeneous object schema, object set expansions based on expansion specifications, and object set reductions based off of filters.
  • an object set expansion produces a set of new objects based off a source object.
  • the query processor calls on the model builder-provided object set expansion function and preferably passes in an instance of a vm object; the model builder expands that object to a set of vnic objects, which set is then returned to the query processor.
  • an example object set reduction simply filters a superset of objects down to a smaller set based on a given filter criteria.
  • the query processor preferably passes in the filter vm.mem ⁇ 2048 to the model builder provided set reduction function. This method then returns a subset of the vm objects that were passed into the method call.
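  • The vm-to-vnic expansion and the memory filter described above might look like this in a toy in-memory form. The `Vm` and `Vnic` classes, the MB unit, and the sample data are assumptions. The comparison operator in the filter is not legible in the text, so the `>` used below is chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Vnic:
    name: str


@dataclass
class Vm:
    name: str
    mem: int                                   # memory in MB (assumed unit)
    vnics: List[Vnic] = field(default_factory=list)


def expand_vm_to_vnics(vm: Vm) -> List[Vnic]:
    """Set expansion: one vm object in, its vnic objects out."""
    return list(vm.vnics)


def reduce_vms(vms: List[Vm], keep: Callable[[Vm], bool]) -> List[Vm]:
    """Set reduction: return the subset of the passed-in vms that pass the filter."""
    return [vm for vm in vms if keep(vm)]


# Hypothetical sample data.
vms = [Vm("foo", 4096, [Vnic("eth0"), Vnic("eth1")]), Vm("bar", 1024)]
big = reduce_vms(vms, lambda vm: vm.mem > 2048)   # ">" is an illustrative choice
```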
  • the query processor maps the different heterogeneous object schemas together from multiple model builders to provide a single unified query interface, even from potentially heterogeneous data storage systems.
  • the query processor uses two different model builders.
  • the first model builder uses a historical SQL-based database system
  • the second model builder uses a SOAP based API.
  • the query processor has integrated the two different heterogeneous object types into a single unified object model.
  • the query processor deduces that the vm object type is provided by the historical database system and delegates the filtering of the vm object named foo to it.
  • the query processor then deduces that the cpuperf object type is provided by the second model builder.
  • Once the vm object is returned by the first model builder, it is then passed in conjunction with a time specification as filter parameters to the second model builder, thereby producing a resulting cpuperf object.
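  • The delegation just described can be sketched as follows. Both builder classes are in-memory stand-ins (no SQL or SOAP calls are made), and the `provides` attribute, the integer timestamps, and the dictionary shapes are assumptions for the example.

```python
from typing import Dict, List, Optional, Tuple


class HistoricalDbBuilder:
    """Stand-in for the SQL-backed builder; serves the 'vm' object type."""
    provides = {"vm"}

    def __init__(self, vms: List[dict]):
        self._vms = vms

    def find(self, name: str) -> Optional[dict]:
        return next((vm for vm in self._vms if vm["name"] == name), None)


class PerfApiBuilder:
    """Stand-in for the API-backed builder; serves the 'cpuperf' object type."""
    provides = {"cpuperf"}

    def __init__(self, samples: Dict[Tuple[str, int], float]):
        self._samples = samples        # (vm name, timestamp) -> CPU percent

    def cpuperf(self, vm: dict, at: int) -> Optional[dict]:
        value = self._samples.get((vm["name"], at))
        if value is None:
            return None
        return {"vm": vm["name"], "at": at, "cpu": value}


def builder_for(object_type: str, builders: list):
    """Pick the builder that advertises the requested object type."""
    for builder in builders:
        if object_type in builder.provides:
            return builder
    raise LookupError(f"no model builder provides {object_type!r}")


def cpuperf_for_vm(name: str, at: int, builders: list) -> Optional[dict]:
    # The vm filter is delegated to whichever builder provides 'vm'.
    vm = builder_for("vm", builders).find(name)
    if vm is None:
        return None
    # The vm and the time specification are passed to the 'cpuperf' builder.
    return builder_for("cpuperf", builders).cpuperf(vm, at)
```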
  • When the query processor concludes the execution of a query, the processor returns to the query entity the current object set (as illustrated in FIG. 5 ).
  • object interchange formats can be used to return the resulting data objects including, without limitation, JSON, YAML, XML, .Net, Java, or Google Protocol Buffers.
  • the output of the query can be used by any existing procedural language to conduct any further operations on the data, such as auto-response capabilities.
  • FIG. 7 is an example of an HTTP-based query entity making a request and the results returned from that request.
  • FIG. 8 is a UML-based model of a unified type model generated from different model builders.
  • FIG. 9 is a representative UML-based model of an IT infrastructure that is generated by a model builder.
  • FIG. 10 is a representative UML-based model of a streaming performance data model that is generated by a model builder.
  • FIG. 11 is a set of representative examples of requests for desired information and, for each request, the query string that is used to generate the desired results using the techniques described herein.
  • the various components typically are implemented as one or more computer programs, where a computer program is a set of computer program instructions stored in a computer memory or data store and adapted to be executed by one or more processors to perform a process or method having a set of functions or operation steps.
  • a model builder extracts data from any type of data source, such as a relational database system (RDBMS) or an application programming interface (API).
  • the language can query data in several different heterogeneous data storage systems, and it can then provide any type of dictionary of objects and properties.
  • the model builder can introduce new objects, relationships, and properties without modifying the language structure itself.
  • the model builder can query a single relational database system and the contents of that database can include data from multiple heterogeneous systems.
  • Model builders can provide data from any data source. Preferably, querying a model builder not only queries factual data but can also query correlated information, as well as logically-derived information at the time of query execution from a data source.
  • a model builder can store the fact that virtual network interface A is on VLAN 12 and virtual network interface B is on VLAN 12 .
  • a model builder can correlate those two facts into a new primitive property that has a relationship with both virtual machines. The model builder can present this new property to the language without requiring a change to the language.
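  • As a rough illustration of deriving a relationship from stored facts, the sketch below groups interface facts by VLAN and reports which virtual machines share one. The fact tuple layout and the virtual machine names are hypothetical; only the interfaces A and B on VLAN 12 come from the text.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Stored facts: (virtual machine, virtual network interface, VLAN id).
Fact = Tuple[str, str, int]


def derive_shared_vlan(facts: List[Fact]) -> Dict[int, Set[Tuple[str, str]]]:
    """For each VLAN, list the pairs of virtual machines that share it."""
    vms_by_vlan: Dict[int, Set[str]] = defaultdict(set)
    for vm, _vnic, vlan in facts:
        vms_by_vlan[vlan].add(vm)
    return {
        vlan: set(combinations(sorted(vms), 2))
        for vlan, vms in vms_by_vlan.items()
        if len(vms) > 1
    }


# Hypothetical facts: interfaces A and B sit on VLAN 12, interface C on VLAN 30.
facts = [("vm1", "A", 12), ("vm2", "B", 12), ("vm3", "C", 30)]
shared = derive_shared_vlan(facts)
```

  • The derived pairs are computed at query time from persisted facts, so exposing them as a new property requires no change to the language itself.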
  • model builders can derive new properties and objects based on factual data
  • model builders themselves can create new language properties and objects based on persisted state within a source data structure.
  • a model builder can provide a named property on a virtual machine object if there is data within the source data store that provides evidence that the virtual machine was connected to an insecure network in the past.
  • object relationships either can be encoded as facts in the underlying model, or they can be derived from facts within the model builder itself.
  • Expansion operations in the query remove the complexity of object navigation by using the object schema provided by model builders to traverse an object graph for desired objects.
  • Filtering operations in the query use standard grouping Boolean logic for object property comparisons.
  • the language need not include any type information for objects but instead provides operators that the model builders map to native object type operations to produce results.
  • When a user issues an on-demand query to the query system, the model builders preferably distribute the processing of the query across multiple data processing engines for the purpose of data correlation and improving query performance. Moreover, because model builders can provide data from any data source, the system can also provide query specifications to a model builder that processes streaming data (as opposed to mere static data or data that is retrieved on-demand). Thus, the language can specify a single query across multiple model builders to simultaneously provide query results from a static model builder, a dynamic model builder, and a streaming model builder. In this way, and as has been described, the system correlates data from multiple data sources.
  • a query processor that implements the language comprises a parser, a lexical analyzer, an abstract syntax tree, and one or more model builders for different types of data storage systems, providing high-performance pipeline selection logic over disparate objects and properties.
  • the execution of the query processor may or may not occur within the process space of a model builder. In appropriate circumstances, it may be desirable to implement remote execution of the query processor to improve query performance over local-process execution.
  • the language formatting is not line terminated and preferably uses keywords, spacing, and identifiers suitable for UTF-8 encoding in URLs.
  • a context-free grammar for the language is described below.
  • the compiler performs lexical analysis of a query string based off a set of lexical constructs. This lexical analysis produces a set of tokens that are then parsed to produce the abstract syntax tree. These constructs are set forth below:
  • DIGIT+
  • WORD :: LETTER (LETTER
  • DOUBLE_QUOTE_STRING
  • TIME_INTERVAL :: WORD // allow for minute, minutes, min, etc.
  • a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations applied to it.
  • Each clause includes a clause pipeline, and a time constraint.
  • a clause pipeline includes an ordered set of clause specifications separated by a pipeline operator.
  • a clause specification can be either an expansion operation or a filtering operation.
  • a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification.
  • An expansion specification uses the object schema provided by the model builder to navigate to the specific information the user desires without requiring the user to understand all of the relationships between objects. As noted above, this technique minimizes query specification size and reduces complexity.
  • Clause specifications are pipelined together, meaning that the objects from a previous clause specification are used as input to a subsequent clause specification. If the previous objects are used to produce the subsequent set but are not included, the keyword “project” is used. If the user wishes to include the previous objects into the new set of objects, then the user specifies the “expand” keyword.
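  • The difference between the “project” and “expand” keywords can be shown with a short Python sketch. The function name and the `related` callback are assumptions; only the two keywords and their semantics come from the text above.

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


def apply_expansion(keyword: str, previous: Set[T],
                    related: Callable[[T], Set[T]]) -> Set[T]:
    """Apply one expansion step to the object set from the previous step."""
    produced: Set[T] = set()
    for obj in previous:
        produced |= related(obj)
    if keyword == "project":   # previous objects are not carried forward
        return produced
    if keyword == "expand":    # previous objects are kept with the new ones
        return previous | produced
    raise ValueError(f"unknown expansion keyword: {keyword!r}")
```

  • With a virtual machine linked to two interfaces, “project” yields only the two interfaces, while “expand” yields the virtual machine together with both interfaces.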
  • There are two primary methods for time specification. The first uses the keyword “at,” and the second uses the keywords “between” and “thru.” Certain types of data can exist across a span of time. An example of this type of data is the running state of a virtual machine, e.g., the virtual machine ran from time A to time B. In these instances, the keyword “at” is used to denote a specific place within that span of time. Other types of data exist only at a particular instant of time as opposed to a time span. An example of that type of data is a particular event, such as an alarm that occurred at an exact time. For that type of data, a time span is specified using the keywords “between” and “thru,” and the data is included if it falls within the span.
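  • The two forms of time specification amount to two different containment tests, sketched below. Integer timestamps and inclusive boundaries are assumptions; the disclosure does not state how endpoints are treated.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Data that exists across a span of time, e.g. a VM's running state."""
    start: int
    end: int


@dataclass
class Event:
    """Data that exists at one instant, e.g. an alarm."""
    when: int


def matches_at(span: Span, instant: int) -> bool:
    """'at': the requested instant falls inside the data's time span."""
    return span.start <= instant <= span.end


def matches_between(event: Event, start: int, thru: int) -> bool:
    """'between ... thru': the event's instant falls inside the requested range."""
    return start <= event.when <= thru
```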
  • the subject matter disclosed herein and described above provides many advantages.
  • the disclosed language differs from prior art techniques and existing languages by simultaneously allowing both natural language search and a relationship-driven structured search to be used within a single query specification on a unified data model.
  • the language also adds two other major differences from existing languages.
  • the language provides for the codification of time in query terms, thereby allowing users to limit query results based off of object property relationships, for example, using instances or ranges of time.
  • the described subject matter also can take advantage of set membership capabilities, including query result unions, intersections, and subtractions.
  • LEFT_PAREN FILTER_SPEC RIGHT_PAREN
  • PROPERTY_SPEC :: // [object.]property[.property]... op value
  • OBJECT_SPECIFIER?
  • PROPERTY_SPECIFIER SIMPLE_PROPERTY_OP DATE_PROPERTY_VALUE
  • NUMBER
  • DATE_PROPERTY_VALUE :: ABSOLUTE_TIME_SPEC
  • RELATIVE_TIME_SPEC
  • CLAUSE_TIME_SPEC :: AT ABSOLUTE_TIME_SPEC
  • the present invention provides for a data retrieval system from dissimilar computer software environments to a storage system of computer data.
  • a domain specific language is provided to map abstract data structures into different specific data types of varying composition into a format that is known and can be utilized by an application program.
  • the language provides natural language expressions and default contextual type conversions to handle a full range of user based queries for IT infrastructure.
  • the query system is composed of a compiler as well as model builders to different types of data storage systems providing high performance pipeline selection logic of disparate objects and properties.
  • the disclosed techniques provide the ability to produce query results based on the codification of IT infrastructure with complex relationships requiring minimal knowledge of the environment.
  • the invention combines both approaches of free-form search as well as structured search using a query based language grammar that executes at real-time speeds.
  • the invention provides the ability to simplify query creation by providing natural language search capabilities where simple strings of characters can be used without specifying object relationships or special character delimiters or tokens.
  • the language is able to use object relationships to produce more accurate query results whose results can be relied upon, even by unattended software automation. More generally, the techniques described herein provide for a single unified language construct that simplifies query processing.
  • the functionality described above may be implemented as a standalone approach, e.g., a software-based function executed by a processor, or it may be available as a managed service (including as a web service via a SOAP/XML interface).
  • computing devices within the context of the disclosed invention are each a data processing system (such as shown in FIG. 1 ) comprising hardware and software, and these entities communicate with one another over a network, such as the Internet, an intranet, an extranet, a private network, or any other communications medium or link.
  • the applications on the data processing system provide native support for Web and other known services and protocols including, without limitation, support for HTTP, FTP, SMTP, SOAP, XML, WSDL, UDDI, and WSFL, among others.
  • W3C World Wide Web Consortium
  • IETF Internet Engineering Task Force
  • the scheme described herein may be implemented in or in conjunction with various server-side architectures including simple n-tier architectures, web portals, federated systems, and the like.
  • the subject matter described herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
  • the function is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
  • the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer readable medium can be any apparatus that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or a semiconductor system (or apparatus or device).
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • the computer-readable medium is a tangible item.
  • the computer program product may be a product having program instructions (or program code) to implement one or more of the described functions.
  • Those instructions or code may be stored in a computer readable storage medium in a data processing system after being downloaded over a network from a remote data processing system.
  • those instructions or code may be stored in a computer readable storage medium in a server data processing system and adapted to be downloaded over a network to a remote data processing system for use in a computer readable storage medium within the remote system.
  • the parser, lexical analyzer and model builder components are implemented in a special purpose computer, preferably in software executed by one or more processors.
  • the software is maintained in one or more data stores or memories associated with the one or more processors, and the software may be implemented as one or more computer programs.
  • this special-purpose hardware and software comprises a language processor that provides the described functionality.
  • the query mechanism may be any convenient client-side application, such as a web browser, a command line interface (CLI), a programmatic interface, or the like.
  • client-side application should be broadly construed to refer to an application, a page associated with that application, or some other resource or function invoked by a client-side request to the application.
  • a “browser” as used herein is not intended to refer to any specific browser (e.g., Internet Explorer, Safari, Firefox, or the like), but should be broadly construed to refer to any client-side rendering engine that can access and display Internet-accessible resources. Further, while typically the client-server interactions occur using HTTP, this is not a limitation either.
  • Any client-server interaction may be formatted to conform to the Simple Object Access Protocol (SOAP) and travel over HTTP (over the public Internet) or FTP, or any other reliable transport mechanism may be used.
  • SOAP Simple Object Access Protocol
  • web site or “service provider” should be broadly construed to cover a web site (a set of linked web pages), a domain at a given web site or server, a trust domain associated with a server or set of servers, or the like.
  • a “service provider domain” may include a web site or a portion of a web site.
  • Any application or functionality described herein may be implemented as native code, by providing hooks into another application, by facilitating use of the mechanism as a plug-in, by linking to the mechanism, and the like.

Abstract

Systems and Methods for information retrieval, comprising: receiving object-oriented data from multiple data sources; receiving a query from a query application that formulates the query and supplies the query to an information retrieval system; parsing the query into a graph of data nodes; processing the data nodes in the graph on the object-oriented data to generate a current object set; and returning the current object set to the query application in response to the query.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation application of U.S. patent application Ser. No. 12/573,587, filed Oct. 5, 2009, which claims the benefit of U.S. Provisional Application No. 61/102,823, filed Oct. 4, 2008. All of the foregoing are incorporated by reference in their entireties.
  • BACKGROUND OF THE INVENTION
  • 1. Technical Field
  • This invention relates generally to computer system data retrieval techniques.
  • 2. Background of the Related Art
  • Information retrieval systems are known in the art. Such systems generally offer users a variety of means of expressing user intentions through queries. These include text search, parametric search, structured queries, selection from alternatives (i.e., browsing or navigation), and range specification. In general, the systems offer users a means of expressing queries using either a structured language (e.g., a language like SQL) or an informal input mechanism (e.g., English keyword search). When the input mechanism is informal, the problems of ambiguity may arise from the language itself. But, even when the input mechanism is formal, the user may not always succeed in expressing his or her intention in the formal query language.
  • Information Technology (IT) infrastructures can become large and complex, and this creates a need to provide administrators and others with the ability to query, display, and manipulate their operation in a timely manner. In particular, both experienced and inexperienced users need the ability to quickly find data within the IT infrastructure; likewise, external software programs that can be executed from a programmatic environment need the ability to quickly find specific data based on complex conditions and then output results in an object-oriented structured form.
  • Several organizations have attempted to provide a query language for IT infrastructure including the Distributed Management Task Force (DMTF) CIM Schema definitions, Microsoft Corporation's SQL for WMI (WQL), the Splunk search language, the Hyper9 natural language, HQL, and other XQuery based language representations. These languages have one or more deficiencies. They are narrowly defined for a specific vendor implementation, they do not provide domain specific capabilities, they are overly focused on natural language or structured queries, as the case may be, or they are so broad and verbose that they are only suited for programmatic execution (as opposed to, for example, execution from a simple command line). These existing languages also can require special programming expertise, and they may use rigid language constructs that do not allow for domain level data interpretation.
  • BRIEF SUMMARY
  • The subject matter of this disclosure provides for enhanced information retrieval, e.g., in an IT infrastructure information retrieval system. In one embodiment, the techniques provide the ability to produce query results based on codification of IT infrastructure with complex relationships, even where the user has only minimal knowledge of the environment.
  • The information retrieval system implements a search language, through which a querying entity (e.g., a user, a program or process, or the like) formulates a search query. Preferably, a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations applied to it. Each clause includes a clause pipeline, and a time constraint. A clause pipeline includes an ordered set of clause specifications separated by a pipeline operator. A clause specification can be either an expansion operation or a filtering operation. Preferably, a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification.
  • The search language is exposed to users (typically, IT administrators), and one or more builder programs within the system (each referred to as a “model builder”) are used internally to present data models to the search language. A model builder extracts data from a given type of data source (including, without limitation, a relational database system, an application programming interface (API), or the like), and enables that data to be presented to one or more constructs of the search language according to a single unified data model. Thus, the model builder functions as an abstraction layer, which can be used to introduce new objects, relationships and properties to the search mechanism without modifying the search language itself. Object relationships either can be encoded as facts in the underlying model or can be derived from facts within the model builder itself. Expansion operations in the query remove the complexity of object navigation by using the object schema provided by model builders to traverse an object graph for desired objects. Filtering operations in the query use standard grouping Boolean logic for object property comparisons. Preferably, the language need not include any type information for objects but instead provides operators that the model builders map to native object type operations to produce results.
  • A query system that implements the search language comprises a parser, a lexical analyzer, an abstract syntax tree, and one or more model builders. When a user issues an on-demand query to the query system, the processing of the query is distributed across multiple model builders for the purpose of data correlation and improving query performance. Moreover, because model builders can provide data from any data source, the system can also provide query specifications to a model builder that processes streaming data (as opposed to mere static data or data that is retrieved on-demand). Thus, the language can specify a single query across multiple model builders to simultaneously provide query results from a static model builder, a dynamic model builder, and a streaming model builder. In this way, the language correlates data from multiple data sources into a single unified data model.
  • More generally, the disclosed system provides for a domain-specific language to map abstract data structures into different specific data types of varying composition and then into a format that is known and can be utilized by an application program (such as a data query and retrieval system). The language provides natural language expressions and default contextual type conversions to handle a full range of user-based queries for a given environment, a virtualized IT infrastructure being just one non-limiting example.
  • The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
  • FIG. 1 is an exemplary block diagram of a data processing environment in which exemplary aspects of the illustrative embodiments may be implemented;
  • FIG. 2 is a data flow diagram of the subject matter described herein;
  • FIG. 3 illustrates the processing steps carried out by the compiler;
  • FIG. 4 illustrates a graphical depiction of an abstract syntax tree generated by the compiler;
  • FIG. 5 illustrates the processing steps of the query processor;
  • FIG. 6 illustrates components of a model builder object;
  • FIG. 7 is an example of an HTTP-based query entity making a request to the query system and the results returned from that request;
  • FIG. 8 is a UML-based model of a unified type model generated from different model builders;
  • FIG. 9 is a representative UML-based model of an IT infrastructure that is generated by a model builder;
  • FIG. 10 is a representative UML-based model of a streaming performance data model that is generated by a model builder;
  • FIG. 11 is a set of representative examples of requests for desired information and, for each request, the query string that is used to generate the desired results using the techniques described herein.
  • DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
  • The following definitions apply to components of the system:
  • A “query” is a string of characters representing data;
  • An “abstract syntax tree” is a graph of data nodes that represent a parsed structure of a query;
  • A “query processor” refers to iterations of the abstract syntax tree to produce query results;
  • A “compiler” is a set of software components that translate a query into an abstract syntax tree;
  • A “model builder” is a software component that conforms to a defined interface to the query processor for the purpose of providing data from a specific type of data source;
  • A “query system” is a collection of components that perform data retrieval.
  • FIG. 1 is a simplified block diagram of a representative information retrieval system in which the subject matter described herein may be implemented and comprises a data processing system. As seen in FIG. 1, a data processing system 100 suitable for storing and/or executing program code will include at least one processor 102 coupled directly or indirectly to memory elements through a system bus 105. The memory elements can include local memory 104 employed during actual execution of the program code, bulk storage 106, and cache memories 108 that provide temporary storage of at least some program code to reduce the number of times code must be retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards 110, displays 112, pointing devices 114, etc.) can be coupled to the system either directly or through intervening I/O controllers 116. Network adapters 118 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or devices through intervening private or public networks 120.
  • In some embodiments, the techniques described herein may task the limitations of a single computational server's resources, and thus it is contemplated that one or more process steps or functions may be distributed onto a set or hierarchy of multiple computational servers. Of course, any other hardware, software, systems, devices and the like may be used. More generally, the subject matter described herein may be implemented with any collection of one or more autonomous computers (together with their associated software, systems, protocols and techniques) linked by a network or networks.
  • A representative information retrieval system in which the disclosed subject matter may be implemented comprises a set of computer resources (typically, software processes, programs, utilities and associated interfaces, data and protocols) to provide a monitoring and reporting tool for IT system administrators and that includes a search capability. Preferably, the tool has the capability of consolidating logs, metrics and other data from various applications, servers and network devices into a searchable repository and can be used to generate graphs, reports, and alerts, and to take other actions. The information retrieval system helps administrators to identify anomalies and to diagnose problems in the infrastructure. The tool may have the ability to correlate log files across systems and software components, and such data can help users determine root causes when problems arise within the infrastructure. The described subject matter may be implemented within the information retrieval system, or as an adjunct to that system. In one particular implementation, the IT infrastructure is a virtualized infrastructure, such as the Virtualization Management Center™ (VMC) solution available from Reflex Systems, LLC, of Atlanta, Ga. VMC enables next generation data centers to enforce IT policies, ensure compliance with government mandates and protect virtual servers, desktops, and networks across multiple platforms. In particular, VMC provides a single authoritative visual interface, central management and security for heterogeneous virtual environments. By combining a centralized event database, virtual infrastructure integration and analysis engines with a robust visual interface, VMC enables the user to administer, audit, secure and monitor complex, dynamic, virtual infrastructures. This results in better network and event visibility for a faster and more efficient management and security response.
  • FIG. 2 is a data flow diagram illustrating the basic components and operation of an information retrieval system and method according to this disclosure. A query entity 101 (which need not be part of the system itself) formulates a query (basically a string of characters) and supplies it to the information retrieval system. The system comprises compiler 200, abstract syntax tree 300, query processor 400, and one or more model builders 500. The querying entity may be a user, a program or process, or the like. Each model builder 500 generates an associated data model 600, and the data models are mapped to a uniform data model. The output of the query processor 400 is an object set 700, which is the desired response to the query.
  • The querying entity submits the query to the compiler 200, which may be implemented in the manner shown in FIG. 3. In particular, the compiler 200 performs lexical analysis 302 of the query string 103 based off of a set of lexical constructs (as described below). The lexical analysis produces a set of tokens 304, which are then applied through a parser 306 to generate the abstract syntax tree 300. The compiler may be implemented in any known manner. If desired, the lexical analysis and the parsing may be carried out with just a parser.
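  • The lexical analysis stage described above can be sketched as follows. This is a minimal illustration only: the token kinds, the keyword list, and the `tokenize` function are assumptions made for the example, and the actual lexical constructs of the language may differ.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token kinds; the language's real lexical constructs may differ.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<PIPE>\|)
  | (?P<OP><=|>=|!=|=|<|>)
  | (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_.]*)
  | (?P<SKIP>\s+)
    """,
    re.VERBOSE,
)

# Assumed keyword set, taken from keywords mentioned in this description.
KEYWORDS = {"project", "at", "between", "thru"}


def tokenize(query: str) -> List[Token]:
    """Split a query string into tokens, raising on unknown characters."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        text = match.group()
        if kind == "WORD" and text.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append(Token(kind, text))
    return tokens
```

  • In this sketch, a parser would consume the returned token list to build the abstract syntax tree; that step is omitted here.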
  • FIG. 4 is a graphical depiction of the abstract syntax tree 300 generated by the compiler. As can be seen, a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations (the “set operator”) applied to it. Each individual clause includes a clause pipeline, and a time constraint (or “specifier”). A clause pipeline includes an ordered set of clause specifications separated by a pipeline operator (or “pipe”). Preferably, a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification. Preferably, a clause specification is either a filtering operation or an expansion operation. Filtering operations in the query use standard grouping Boolean logic for object property comparisons. Expansion operations in the query remove the complexity of object navigation by using an object schema provided by model builders to traverse an object graph for desired objects. The expansion operation typically includes an expansion source type, an expansion keyword, and an expansion target type.
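  • The clause structure described above can be sketched as plain data classes. The class and field names below are hypothetical and chosen only for illustration; they are not the node types of FIG. 4. The example instance encodes the query vm.name=foo project cpuperf at 1 week ago, which is discussed later in this description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Filter:
    """A filtering operation: a property comparison such as vm.mem > 2048."""
    property_path: str
    op: str
    value: str


@dataclass
class Expansion:
    """An expansion operation: source type, expansion keyword, target type."""
    source_type: str
    keyword: str
    target_type: str


ClauseSpec = Union[Filter, Expansion]


@dataclass
class ClauseDefinition:
    """One clause: an ordered pipeline plus an optional time constraint."""
    pipeline: List[ClauseSpec]
    time_spec: Optional[str] = None      # kept as raw text in this sketch
    set_operator: Optional[str] = None   # how this clause combines with the prior one


@dataclass
class Query:
    """A search query: an ordered set of clause definitions."""
    clauses: List[ClauseDefinition] = field(default_factory=list)


example = Query(
    clauses=[
        ClauseDefinition(
            pipeline=[
                Filter("vm.name", "=", "foo"),
                Expansion("vm", "project", "cpuperf"),
            ],
            time_spec="at 1 week ago",
        )
    ]
)
```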
  • FIG. 5 illustrates the query processor operation. As noted above, the query processor takes as an input the abstract syntax tree generated by the compiler. A test is performed at step 502 to determine whether the first clause specification is to be processed. If so, the routine branches to step 504. At this step, the query processor iterates through all model builders, calling a set expansion for each object type in the object schema (as will be described below). At step 506, a result set is added to a current object set 700. If, however, the result of the test at step 502 is negative, a test 508 is performed to determine the clause specification type (as the processing is now being performed within a given clause pipeline). If the clause specification type is expansion, the routine branches to step 510 to expand the current object set based on an expansion target type. This step calls a set expansion from the model builder that provides the expansion source type. The output of step 510 is provided to the current object set 700. If the clause specification type is reduction (i.e., a filtering operation), however, the routine branches to step 512 to reduce the current object set based on the given Boolean logic. The routine then moves to a next clause specification at step 514 and the process iterates until all clause specifications in the query are processed. At the conclusion of the clause definition processing, a set of objects is produced. If there is more than one clause definition in the query, the user may specify standard mathematical set operations to be applied between the ordered sets, and these operations may include union, intersection, and complement (or “subtraction”). A single final object set is constructed from all set operations applied to the object sets produced from each clause definition.
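  • The processing loop of FIG. 5 can be approximated by the sketch below. It is an illustration under stated assumptions: objects are reduced to (type, name) tuples, the `UNIVERSE` and `EDGES` tables stand in for data that model builders would supply, and the sketch assumes the first clause specification is applied to the seeded universe of all objects. The function names `run_pipeline` and `combine` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple, Union

Obj = Tuple[str, str]  # (object type, name): a stand-in for real model objects

# Hypothetical in-memory data standing in for what model builders would supply.
UNIVERSE: FrozenSet[Obj] = frozenset(
    {("vm", "foo"), ("vm", "bar"), ("vnic", "eth0"), ("vnic", "eth1")}
)
EDGES: Dict[Obj, FrozenSet[Obj]] = {
    ("vm", "foo"): frozenset({("vnic", "eth0"), ("vnic", "eth1")}),
}

# A clause specification is ("filter", predicate) or ("expand", target_type).
Spec = Tuple[str, Union[str, Callable[[Obj], bool]]]


def run_pipeline(pipeline: List[Spec]) -> Set[Obj]:
    """Process one clause pipeline and return its object set."""
    current: Set[Obj] = set()
    for index, (kind, arg) in enumerate(pipeline):
        if index == 0:
            # Steps 504 and 506: seed the current object set with all objects.
            current = set(UNIVERSE)
        if kind == "expand":
            # Step 510: replace each object with its related objects of the target type.
            current = {
                target
                for obj in current
                for target in EDGES.get(obj, frozenset())
                if target[0] == arg
            }
        elif kind == "filter":
            # Step 512: keep only the objects that satisfy the Boolean predicate.
            current = {obj for obj in current if arg(obj)}
        else:
            raise ValueError(f"unknown clause specification type: {kind}")
    return current


def combine(result_sets: List[Set[Obj]], operators: List[str]) -> Set[Obj]:
    """Apply set operations, in order, between the per-clause object sets."""
    if not result_sets:
        return set()
    final = set(result_sets[0])
    for op, nxt in zip(operators, result_sets[1:]):
        if op == "union":
            final |= nxt
        elif op == "intersection":
            final &= nxt
        elif op == "subtraction":
            final -= nxt
        else:
            raise ValueError(f"unknown set operator: {op}")
    return final
```

  • For example, `run_pipeline([("filter", lambda o: o == ("vm", "foo")), ("expand", "vnic")])` first narrows the universe to the vm named foo and then expands it to that vm's vnic objects.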
  • As noted above, one or more builder programs within the system (each referred to as a “model builder”) are used internally to present data models to the query processor of the system. A model builder extracts data from a given type of data source, and it enables that data to be presented to one or more constructs of the search language according to a single unified data model. Thus, the model builder functions as an abstraction layer, which can be used to introduce new objects, relationships and properties to the search mechanism without modifying the search language itself.
  • Referring now to FIG. 6, a model builder object 601 is characterized by one or more of the following: an object schema 602, a set expansion 604 and a set reduction 606. The properties of an object are provided by the object schema 602, and a representative schema is illustrated using a UML-based model. A set expansion 604 produces a set of new objects based off of a source object. A set reduction filters a superset of objects down to a smaller set based off of given filter criteria. The objects in an object set pipeline are provided by an underlying system of model builders. Thus, a given model builder provides the query processor with an object schema, object set expansions, and object set reductions. The expansion specifications are defined in more detail below in the “Parse Constructs” section.
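  • One way to picture the three parts of a model builder (object schema, set expansion, set reduction) is as a small interface. The sketch below is an assumption for illustration: the `ModelBuilder` protocol, its method names, and the dictionary-based objects are hypothetical and are not the interface defined by the disclosure.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple


class ModelBuilder(Protocol):
    """Hypothetical interface between the query processor and a model builder."""

    def object_schema(self) -> Dict[str, List[str]]:
        """Map each object type to its property names."""
        ...

    def expand(self, source: dict, target_type: str) -> List[dict]:
        """Set expansion: produce new objects from a source object."""
        ...

    def reduce(self, objects: Iterable[dict], predicate: Callable[[dict], bool]) -> List[dict]:
        """Set reduction: filter a superset down to a smaller set."""
        ...


class InMemoryBuilder:
    """Toy builder backed by dictionaries instead of a real data source."""

    def __init__(
        self,
        schema: Dict[str, List[str]],
        links: Dict[Tuple[str, str, str], List[dict]],
    ) -> None:
        self._schema = schema
        self._links = links  # (source type, source name, target type) -> objects

    def object_schema(self) -> Dict[str, List[str]]:
        return self._schema

    def expand(self, source: dict, target_type: str) -> List[dict]:
        key = (source["type"], source["name"], target_type)
        return list(self._links.get(key, []))

    def reduce(self, objects: Iterable[dict], predicate: Callable[[dict], bool]) -> List[dict]:
        return [obj for obj in objects if predicate(obj)]


builder: ModelBuilder = InMemoryBuilder(
    schema={"vm": ["name", "mem"], "vnic": ["name"]},
    links={("vm", "foo", "vnic"): [{"type": "vnic", "name": "eth0"}]},
)
```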
  • Advantageously, and as noted above, the model builders provide this data to the search language constructs in a single unified data model. In an IT infrastructure environment, the object types constitute generalized IT objects. Examples of object types include, but are not limited to: virtual machines, physical hosts, groups of systems, clusters, network interfaces, switches, software, processes, protocols, memory, CPUs, system events, storage, files, and networks. Of course, these examples are merely illustrative, and the object type of course will vary depending on the implementation environment.
  • A model builder thus provides primitive data objects to the query processor. A model builder obtains data from a data source and generates a data model. As noted, a data source may be one of: a historical database, an on-demand data source accessed via an API, a streaming data source, or the like. The models are used during the processing by the query processor of the abstract syntax tree, which is generated as an output from the compiler (which receives as input tokens produced from lexical analysis of the query string).
  • The boundary between the query processor and a model builder conforms to a well-defined interface. As noted above, this interface ensures that a model builder provides the query processor with a heterogeneous object schema, object set expansions based on expansion specifications, and object set reductions based on filters. As noted above, an object set expansion produces a set of new objects based on a source object. For example, the query vm.name=foo project vnic may expand a set of one vm object to a set of more than one vnic objects. In this case, the query processor calls on the model builder provided object set expansion function and preferably passes in an instance of a vm object; the model builder expands that object to a set of vnic objects, which set is then returned to the query processor. In contrast, an example object set reduction simply filters a superset of objects down to a smaller set based on given filter criteria. Continuing with this example, if the current object set comprises vm objects and each has a mem (memory) property, a clause specification of vm.mem>2048 would filter out any vm objects whose mem properties were less than or equal to 2048; thus, an object set reduction would produce a reduced set of objects from the original set. In this case, the query processor preferably passes in the filter vm.mem>2048 to the model builder provided set reduction function. This method then returns a subset of the vm objects that were passed into the method call.
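  • The set reduction for a filter such as vm.mem>2048 can be sketched by compiling the filter text into a predicate. This is a simplified illustration: `compile_filter`, the regular expression, and the dictionary-based vm objects are assumptions, and the sketch does not attempt the contextual type conversions that the language provides.

```python
import operator
import re
from typing import Callable, Dict

# Comparison operators assumed for this sketch.
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}
FILTER_RE = re.compile(r"^\s*(\w+)\.(\w+)\s*(<=|>=|!=|=|<|>)\s*(\S+)\s*$")


def compile_filter(text: str) -> Callable[[Dict], bool]:
    """Turn 'type.property op value' into a predicate over dict objects."""
    match = FILTER_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse filter: {text!r}")
    obj_type, prop, op, raw = match.groups()
    # Digits are compared as integers; anything else is compared as text.
    value = int(raw) if raw.isdigit() else raw
    compare = OPS[op]

    def predicate(obj: Dict) -> bool:
        return obj.get("type") == obj_type and prop in obj and compare(obj[prop], value)

    return predicate


# Illustrative objects only; the names and memory sizes are made up.
vms = [
    {"type": "vm", "name": "a", "mem": 1024},
    {"type": "vm", "name": "b", "mem": 2048},
    {"type": "vm", "name": "c", "mem": 4096},
]
keep = compile_filter("vm.mem>2048")
reduced = [vm for vm in vms if keep(vm)]
```

  • Here `reduced` retains only the vm whose mem property exceeds 2048; the vm with exactly 2048 is filtered out, matching the "less than or equal to" wording above.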
  • As these examples illustrate, the query processor maps the different heterogeneous object schemas together from multiple model builders to provide a single unified query interface, even from potentially heterogeneous data storage systems.
  • As another concrete example, consider the query vm.name=foo project cpuperf at 1 week ago. This query uses two different model builders. The first model builder uses a historical SQL-based database system, and the second model builder uses a SOAP based API. Using the techniques described herein, the query processor has integrated the two different heterogeneous object types into a single unified object model. In particular, the query processor deduces that the vm object type is provided by the historical database system and delegates the filtering of the vm object named foo to it. The query processor then deduces that the cpuperf object type is provided by the second model builder. Once the vm object is returned by the first model builder, it is then passed in conjunction with a time specification as filter parameters to the second model builder, thereby producing a resulting cpuperf object.
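  • The delegation in this example can be sketched as routing by object type. Both builder classes below are in-memory stand-ins (no SQL database or SOAP API is contacted), their method names are hypothetical, the time specification is passed through as unparsed text, and the returned performance value is made up for illustration.

```python
from typing import Dict, List


class HistoricalVmBuilder:
    """Stand-in for the model builder backed by a historical SQL database."""

    object_types = {"vm"}
    _rows = [{"type": "vm", "name": "foo"}, {"type": "vm", "name": "bar"}]

    def reduce_by_name(self, name: str) -> List[Dict]:
        return [row for row in self._rows if row["name"] == name]


class CpuPerfBuilder:
    """Stand-in for the model builder backed by a SOAP-based API."""

    object_types = {"cpuperf"}

    def expand(self, source: Dict, time_spec: str) -> List[Dict]:
        # A real builder would call the remote API here; this value is invented.
        return [
            {"type": "cpuperf", "vm": source["name"], "at": time_spec, "cpu_pct": 37.5}
        ]


def builder_for(object_type: str, builders: list):
    """Find the builder whose schema provides the requested object type."""
    for candidate in builders:
        if object_type in candidate.object_types:
            return candidate
    raise LookupError(f"no model builder provides {object_type!r}")


def run_example(builders: list) -> List[Dict]:
    """Evaluate: vm.name=foo project cpuperf at 1 week ago."""
    vm_builder = builder_for("vm", builders)
    perf_builder = builder_for("cpuperf", builders)
    vms = vm_builder.reduce_by_name("foo")  # filtering delegated to first builder
    return [
        perf
        for vm in vms
        for perf in perf_builder.expand(vm, "1 week ago")  # vm plus time spec
    ]


result = run_example([HistoricalVmBuilder(), CpuPerfBuilder()])
```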
  • When the query processor concludes the execution of a query, the processor returns to the query entity the current object set (as illustrated in FIG. 5). Several different types of object interchange formats can be used to return the resulting data objects including, without limitation, JSON, YAML, XML, .Net, Java, or Google Protocol Buffers. The output of the query can be used by any existing procedural language to conduct any further operations on the data, such as auto-response capabilities. FIG. 7 is an example of an HTTP-based query entity making a request and the results returned from that request.
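  • As one illustration of returning an object set in an interchange format, the sketch below serializes a result to JSON. The envelope fields (`count`, `objects`) and the function name are assumptions for the example; the format shown in FIG. 7 is not reproduced here.

```python
import json
from typing import Dict, List


def serialize_object_set(objects: List[Dict]) -> str:
    """Serialize an object set to JSON in a stable, sorted order."""
    ordered = sorted(objects, key=lambda obj: (obj["type"], obj["name"]))
    return json.dumps({"count": len(ordered), "objects": ordered}, sort_keys=True)


payload = serialize_object_set(
    [{"type": "vnic", "name": "eth1"}, {"type": "vnic", "name": "eth0"}]
)
```

  • A calling program in any procedural language could parse this payload and act on the objects it contains.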
  • FIG. 8 is a UML-based model of a unified type model generated from different model builders. FIG. 9 is a representative UML-based model of an IT infrastructure that is generated by a model builder. FIG. 10 is a representative UML-based model of a streaming performance data model that is generated by a model builder.
  • FIG. 11 is a set of representative examples of requests for desired information and, for each request, the query string that is used to generate the desired results using the techniques described herein.
  • The various components (e.g., compiler, query processor, model builder) typically are implemented as one or more computer programs, where a computer program is a set of computer program instructions stored in a computer memory or data store and adapted to be executed by one or more processors to perform a process or method having a set of functions or operation steps. As noted above, a model builder extracts data from any type of data source, such as a relational database system (RDBMS) or an application programming interface (API). By providing the model builder abstraction layer, the language itself can query data in several different heterogeneous data storage systems, and it can then provide any type of dictionary of objects and properties. Using this abstraction, the model builder can introduce new objects, relationships, and properties without modifying the language structure itself. In addition, the model builder can query a single relational database system and the contents of that database can include data from multiple heterogeneous systems.
  • Model builders can provide data from any data source. Preferably, querying a model builder not only queries factual data but can also query correlated information, as well as logically-derived information at the time of query execution from a data source. For example, a model builder can store the fact that virtual network interface A is on VLAN 12 and virtual network interface B is on VLAN 12. A model builder can correlate those two facts into a new primitive property that has a relationship with both virtual machines. The model builder can present this new property to the language without requiring a change to the language.
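  • The VLAN example above can be sketched as follows. The record layout and the function name `derive_shared_vlan` are assumptions for illustration; the point is only that two stored facts are correlated into a new, derived property without any change to the language.

```python
from collections import defaultdict

# Facts as a model builder might store them (hypothetical records).
facts = [
    {"interface": "A", "vm": "vm-1", "vlan": 12},
    {"interface": "B", "vm": "vm-2", "vlan": 12},
    {"interface": "C", "vm": "vm-3", "vlan": 30},
]


def derive_shared_vlan(facts):
    """Derive, for each virtual machine, the other machines on the same VLAN."""
    by_vlan = defaultdict(set)
    for fact in facts:
        by_vlan[fact["vlan"]].add(fact["vm"])

    derived = {}
    for fact in facts:
        peers = by_vlan[fact["vlan"]] - {fact["vm"]}
        derived.setdefault(fact["vm"], set()).update(peers)
    return {vm: sorted(peers) for vm, peers in derived.items()}


shared = derive_shared_vlan(facts)
```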
  • Moreover, because model builders can derive new properties and objects based on factual data, model builders themselves can create new language properties and objects based on persisted state within a source data structure. For example, a model builder can provide a named property on a virtual machine object if there is data within the source data store that provides evidence that the virtual machine was connected to an insecure network in the past.
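  • A named property derived from persisted state might be computed along the following lines. This is a sketch under stated assumptions: the history records, the set of insecure network names, and the property name `wasOnInsecureNetwork` are all invented for the example.

```python
# Hypothetical persisted connection history held in the source data store.
connection_history = [
    {"vm": "vm-1", "network": "prod", "when": "2009-01-10"},
    {"vm": "vm-1", "network": "guest-wifi", "when": "2009-02-03"},
    {"vm": "vm-2", "network": "prod", "when": "2009-02-04"},
]

# Assumed classification of networks; not part of the source text.
insecure_networks = {"guest-wifi"}


def with_derived_property(vm_name, history, insecure):
    """Build a VM object carrying a property derived from historical evidence."""
    evidence = any(
        record["vm"] == vm_name and record["network"] in insecure
        for record in history
    )
    return {"type": "VirtualMachine", "name": vm_name, "wasOnInsecureNetwork": evidence}


vm1 = with_derived_property("vm-1", connection_history, insecure_networks)
vm2 = with_derived_property("vm-2", connection_history, insecure_networks)
```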
  • As noted above, object relationships either can be encoded as facts in the underlying model, or they can be derived from facts within the model builder itself. Expansion operations in the query remove the complexity of object navigation by using the object schema provided by model builders to traverse an abstract syntax tree for desired objects. Filtering operations in the query use standard grouping Boolean logic for object property comparisons. Preferably, the language need not include any type information for objects but instead provides operators that the model builders map to native object type operations to produce results.
  • When a user issues an on-demand query to the query system, the model builders preferably distribute the processing of the query across multiple data processing engines for the purpose of data correlation and improving query performance. Moreover, because model builders can provide data from any data source, the system can also provide query specifications to a model builder that processes streaming data (as opposed to mere static data or data that is retrieved on-demand). Thus, the language can specify a single query across multiple model builders to simultaneously provide query results from a static model builder, a dynamic model builder, and a streaming model builder. In this way, and as has been described, the system correlates data from multiple data sources.
  • Because model builders provide most of the breadth of the language, the set of tokens that constitute the language itself is fairly small. To provide natural language query simplicity, the language considers unrecognized tokens as string query specifications for the string versions of all object properties within the universe of the current clause specification the token was found in.
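  • The treatment of unrecognized tokens can be sketched as a string search over the string form of every property of every object in the current universe. The matching rule used here (case-insensitive substring) is an assumption; the text above does not specify how the string comparison is performed.

```python
def matches_bare_token(obj, token):
    """Return True if any property of obj, rendered as a string, contains token.

    Case-insensitive substring matching is an assumed rule for this sketch.
    """
    needle = token.lower()
    return any(needle in str(value).lower() for value in obj.values())


# Hypothetical universe of objects for the current clause specification.
universe = [
    {"type": "VirtualMachine", "name": "web01", "vlan": 12},
    {"type": "VirtualMachine", "name": "db01", "vlan": 30},
    {"type": "Host", "name": "rack-web", "vlan": 12},
]

# A bare token such as "web" needs no object or property name.
hits = [obj["name"] for obj in universe if matches_bare_token(obj, "web")]
numeric_hits = [obj["name"] for obj in universe if matches_bare_token(obj, "30")]
```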
  • A query processor that implements the language comprises a parser, a lexical analyzer, an abstract syntax tree, and one or more model builders to different types of data storage systems, which together provide high-performance pipeline selection logic over disparate objects and properties.
  • As illustrated in FIG. 2, the execution of the query processor may or may not occur within the process space of a model builder. In appropriate circumstances, it may be desirable to implement remote execution of the query processor to improve query performance over local-process execution. In addition, the language formatting is not line terminated and preferably uses keywords, spacing, and identifiers suitable for UTF-8 encoding in URLs.
  • Grammar
  • A context-free grammar for the language is described below. As noted above, the compiler performs lexical analysis of a query string based off a set of lexical constructs. This lexical analysis produces a set of tokens that are then parsed to produce the abstract syntax tree. These constructs are set forth below:
  • Lexical Constructs
      LOGICAL_CONDITIONAL ::= ‘AND’ | ‘OR’
      EXPANSION ::= ‘PROJECT’ | ‘EXPAND’
      SET_OPERATION ::= ‘INCLUDE’ | ‘IN’ | ‘WITHOUT’
      BOOLEAN ::= ‘true’ | ‘false’
      IS ::= ‘IS’
      AT ::= ‘AT’
      AGO ::= ‘AGO’
      NOT ::= ‘NOT’ | ‘!’
      BETWEEN ::= ‘BETWEEN’
      THRU ::= ‘THRU’
      EMPTY ::= ‘EMPTY’
      STARTSWITH ::= ‘STARTSWITH’
      ENDSWITH ::= ‘ENDSWITH’
      CONTAINS ::= ‘CONTAINS’
      REGEX ::= ‘REGEX’
      GREATER ::= ‘GREATER’
      LESS ::= ‘LESS’
      THAN ::= ‘THAN’
      EQUALS ::= ‘EQUALS’
      LEFT_PAREN ::= ‘(’
      RIGHT_PAREN ::= ‘)’
      PIPELINE ::= ‘|’
      OBJECT_DELIMITER ::= ‘.’
      COLON ::= ‘:’
      DASH ::= ‘-’
      SIMPLE_PROPERTY_OP ::= ‘=’ | ‘!=’ | ‘>’ | ‘>=’ | ‘<’ | ‘<=’ |
        CONTAINS   |
        STARTSWITH   |
        ENDSWITH   |
        REGEX   |
        NOT? EQUALS |
        GREATER THAN EQUALS?   |
        LESS THAN EQUALS?
      IS_PROPERTY_OP ::= IS NOT?
      UPPER ::= ‘A’ .. ‘Z’
      LOWER ::= ‘a’ .. ‘z’
      LETTER ::= UPPER | LOWER
      DIGIT ::= ‘0’ .. ‘9’
      SPACE ::= ‘ ’ | ‘\t’
      SINGLE_QUOTE ::= ‘\'’
      DOUBLE_QUOTE ::= ‘"’
      NUMBER ::= DASH? DIGIT+
      WORD ::= LETTER (LETTER | DIGIT | ‘_’)*
      SINGLE_QUOTE_STRING ::= SINGLE_QUOTE (~SINGLE_QUOTE)* SINGLE_QUOTE
      DOUBLE_QUOTE_STRING ::= DOUBLE_QUOTE (~DOUBLE_QUOTE)* DOUBLE_QUOTE
      STRING_LITERAL ::= SINGLE_QUOTE_STRING | DOUBLE_QUOTE_STRING
      TIME_INTERVAL ::= WORD // allow for minute, minutes, min, etc
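  • The lexical constructs above can be approximated with a small regular-expression tokenizer. The following is a sketch, not the compiler described herein. It assumes that keywords are matched case-insensitively, that whitespace is discarded, and that punctuation tokens are reported under a single `PUNCT` kind; none of these choices is stated in the text. Note that, as in the grammar, a dash immediately followed by digits is read as part of a NUMBER, so date literals would need extra handling.

```python
import re

KEYWORDS = {
    "AND", "OR", "PROJECT", "EXPAND", "INCLUDE", "IN", "WITHOUT", "IS", "AT",
    "AGO", "NOT", "BETWEEN", "THRU", "EMPTY", "STARTSWITH", "ENDSWITH",
    "CONTAINS", "REGEX", "GREATER", "LESS", "THAN", "EQUALS",
}

# One named group per lexical class; alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"""
    (?P<STRING>'[^']*'|"[^"]*")
  | (?P<NUMBER>-?\d+)
  | (?P<WORD>[A-Za-z][A-Za-z0-9_]*)
  | (?P<OP>!=|>=|<=|=|>|<|!)
  | (?P<PUNCT>[()|.:-])
  | (?P<SPACE>[ \t]+)
    """,
    re.VERBOSE,
)


def tokenize(query):
    """Split a query string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SPACE":
            continue  # whitespace only separates tokens
        if kind == "WORD":
            if text.upper() in KEYWORDS:
                kind = text.upper()  # promote reserved words to their own kind
            elif text in ("true", "false"):
                kind = "BOOLEAN"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("vm.name STARTSWITH 'web' AND cpu >= 4 | PROJECT host")
```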
  • Parse Constructs
  • The following is a set of parsing constructs (described according to BNF, an industry standard grammar) that are implemented by the parser. Preferably, a search query is composed of an ordered set of clause definitions, and each clause can have set membership operations applied to it. Each clause includes a clause pipeline, and a time constraint. A clause pipeline includes an ordered set of clause specifications separated by a pipeline operator. A clause specification can be either an expansion operation or a filtering operation. Preferably, a first clause specification in a pipeline operates on an initial universe of all objects, and each subsequent clause specification operates on a set of objects produced from the previous clause specification.
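  • The pipeline behavior just described, in which the first clause specification sees the initial universe and each later one sees the output of its predecessor, can be sketched as a left fold over functions. Representing each clause specification as a Python callable is an assumption made for the example.

```python
from functools import reduce


def run_pipeline(universe, clause_specs):
    """Apply each clause specification to the object set produced by the previous one."""
    return reduce(lambda current, spec: spec(current), clause_specs, universe)


# Hypothetical initial universe of all objects.
universe = [
    {"name": "web01", "cpus": 2, "state": "on"},
    {"name": "web02", "cpus": 8, "state": "off"},
    {"name": "db01", "cpus": 8, "state": "on"},
]

# Two filtering operations, written as callables for illustration.
running = lambda objs: [o for o in objs if o["state"] == "on"]
large = lambda objs: [o for o in objs if o["cpus"] >= 4]

result = run_pipeline(universe, [running, large])
names = [o["name"] for o in result]
```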
  • An expansion specification uses the object schema provided by the model builder to navigate to specific information the user desires without requiring the user to understand the relationships between all objects. As noted above, this technique minimizes query specification size and reduces complexity. Preferably, there are two types of expansion mechanisms defined: “project” and “expand.” Clause specifications are pipelined together, meaning that the objects from a previous clause specification are used as input to a subsequent clause specification. If the previous objects are used to produce the subsequent set but are not included, the keyword “project” is used. If the user wishes to include the previous objects into the new set of objects, then the user specifies the “expand” keyword.
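  • The difference between the two expansion keywords can be sketched as follows. The relationship table and object identifiers are hypothetical; in the described system the relationships would come from the object schema provided by a model builder.

```python
# Hypothetical relationship data: virtual machine -> hosts it relates to.
vm_to_host = {"vm-1": ["host-a"], "vm-2": ["host-a"], "vm-3": ["host-b"]}


def project(current, related):
    """Return only the related objects; the previous objects are not included."""
    out = []
    for obj in current:
        for target in related.get(obj, []):
            if target not in out:
                out.append(target)
    return out


def expand(current, related):
    """Return the previous objects together with the related objects."""
    out = list(current)
    for target in project(current, related):
        if target not in out:
            out.append(target)
    return out


projected = project(["vm-1", "vm-2"], vm_to_host)
expanded = expand(["vm-1", "vm-2"], vm_to_host)
```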
  • Another advantageous capability of the language is the inclusion of temporal specification within clause definitions. There are two primary methods for time specification. The first uses the keyword “at,” and the second uses the keywords “between” and “thru.” Certain types of data can exist across a span of time. An example of this type of data is the running state of a virtual machine, e.g., the virtual machine ran from time A to time B. In these instances, the keyword “at” is used to denote a specific place within that span of time. Other types of data exist only at a particular instance of time as opposed to a time span. An example of that type of data is a particular event, such as an alarm which occurred at an exact time. Thus, a time span is used to include that type of data using the keywords between and thru.
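  • The two time specifications can be sketched as two predicates: “at” tests whether an instant falls inside a span of time, while “between ... thru” tests whether a point-in-time event falls inside a queried range. Treating both bounds as inclusive is an assumption; the text above does not say how boundaries are handled.

```python
from datetime import datetime


def matches_at(span_start, span_end, instant):
    """'AT': data that exists across a span matches if the instant lies within it."""
    return span_start <= instant <= span_end


def matches_between(event_time, range_start, range_end):
    """'BETWEEN ... THRU': an instantaneous event matches if it lies in the range."""
    return range_start <= event_time <= range_end


# A virtual machine that ran from 09:00 to 17:00 (hypothetical data).
ran_from = datetime(2009, 10, 5, 9, 0, 0)
ran_to = datetime(2009, 10, 5, 17, 0, 0)
was_running_at_noon = matches_at(ran_from, ran_to, datetime(2009, 10, 5, 12, 0, 0))
was_running_at_night = matches_at(ran_from, ran_to, datetime(2009, 10, 5, 23, 0, 0))

# An alarm raised at an exact time (hypothetical data).
alarm = datetime(2009, 10, 5, 12, 30, 0)
alarm_in_range = matches_between(
    alarm, datetime(2009, 10, 5, 12, 0, 0), datetime(2009, 10, 5, 13, 0, 0)
)
```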
  • The subject matter disclosed herein and described above provides many advantages. The disclosed language differs from prior art techniques and existing languages by simultaneously allowing both natural language search and a relationship-driven structured search to be used within a single query specification on a unified data model. In addition, the language also adds two other major differences from existing languages. The language provides for the codification of time in query terms, thereby allowing users to limit query results based off of object property relationships, for example, using instances or ranges of time. The described subject matter also can take advantage of set membership capabilities, including query result unions, intersections, and subtractions.
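  • The set membership capabilities mentioned above might be evaluated as in the sketch below. The mapping of the keywords INCLUDE, IN, and WITHOUT to union, intersection, and subtraction respectively is an assumption inferred from the order in which they are listed, as is the left-to-right evaluation order.

```python
def apply_set_operations(first, rest):
    """Combine clause results left to right.

    `first` is the object set of the first clause; `rest` is a list of
    (keyword, object_set) pairs for the clauses that follow it.
    """
    result = set(first)
    for keyword, other in rest:
        if keyword == "INCLUDE":      # assumed: union
            result |= set(other)
        elif keyword == "IN":         # assumed: intersection
            result &= set(other)
        elif keyword == "WITHOUT":    # assumed: subtraction
            result -= set(other)
        else:
            raise ValueError(f"unknown set operation: {keyword}")
    return result


combined = apply_set_operations(
    {"vm-1", "vm-2"},
    [("INCLUDE", {"vm-3"}), ("WITHOUT", {"vm-1"}), ("IN", {"vm-2", "vm-9"})],
)
```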
  •    QUERY ::= CLAUSE_DEFINITION (SET_OPERATION CLAUSE_DEFINITION) *
       CLAUSE_DEFINITION ::= CLAUSE_PIPELINE CLAUSE_TIME_SPEC?
       CLAUSE_PIPELINE ::= CLAUSE_SPEC (PIPELINE CLAUSE_SPEC) *
       CLAUSE_SPEC ::= EXPANSION_SPEC | FILTER_SPEC EXPANSION_SPEC?
       EXPANSION_SPEC ::= EXPANSION WORD (COLON NUMBER)?
       FILTER_SPEC ::= NOT? PROPERTY_FILTER_SPEC (LOGICAL_CONDITIONAL? PROPERTY_FILTER_SPEC)*
       PROPERTY_FILTER_SPEC ::= PROPERTY_SPEC | LEFT_PAREN FILTER_SPEC RIGHT_PAREN
       PROPERTY_SPEC ::=
          // [object.]property[.property]... op value
          OBJECT_SPECIFIER? PROPERTY_SPECIFIER SIMPLE_PROPERTY_OP SIMPLE_PROPERTY_VALUE |
          OBJECT_SPECIFIER? PROPERTY_SPECIFIER IS_PROPERTY_OP ASSERTION_PROPERTY_VALUE |
          OBJECT_SPECIFIER? PROPERTY_SPECIFIER SIMPLE_PROPERTY_OP DATE_PROPERTY_VALUE |
          // really short cut, short cuts to object.property operator value
          SIMPLE_PROPERTY_VALUE
       OBJECT_SPECIFIER ::= WORD
       PROPERTY_SPECIFIER ::= (OBJECT_DELIMITER WORD)+ | WORD
       ASSERTION_PROPERTY_VALUE ::= EMPTY
       SIMPLE_PROPERTY_VALUE ::= WORD | STRING_LITERAL | BOOLEAN | NUMBER
       DATE_PROPERTY_VALUE ::= ABSOLUTE_TIME_SPEC | RELATIVE_TIME_SPEC
       CLAUSE_TIME_SPEC ::= AT ABSOLUTE_TIME_SPEC |
          AT RELATIVE_TIME_SPEC AGO |
          BETWEEN ABSOLUTE_TIME_SPEC THRU ABSOLUTE_TIME_SPEC |
          BETWEEN RELATIVE_TIME_SPEC AGO? THRU RELATIVE_TIME_SPEC AGO?
       // YYYY-MM-DD HH:MM:SS
       ABSOLUTE_TIME_SPEC ::= NUMBER DASH NUMBER DASH NUMBER NUMBER COLON NUMBER COLON NUMBER
       // X intervals
       RELATIVE_TIME_SPEC ::= NUMBER TIME_INTERVAL
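  • A relative time specification (a NUMBER followed by a TIME_INTERVAL word, optionally followed by AGO) could be resolved to a concrete instant as sketched below. The table of accepted interval words and the plural-stripping rule are assumptions; the grammar only notes that forms such as minute, minutes, and min are allowed.

```python
from datetime import datetime, timedelta

# Assumed interval vocabulary, in seconds per unit.
UNIT_SECONDS = {
    "second": 1, "sec": 1,
    "minute": 60, "min": 60,
    "hour": 3600,
    "day": 86400,
}


def parse_relative(number, interval):
    """Convert NUMBER TIME_INTERVAL into a timedelta."""
    unit = interval.lower()
    if unit not in UNIT_SECONDS and unit.endswith("s"):
        unit = unit[:-1]  # accept simple plurals such as "minutes"
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown time interval: {interval}")
    return timedelta(seconds=number * UNIT_SECONDS[unit])


def resolve_ago(number, interval, now):
    """Resolve 'NUMBER TIME_INTERVAL AGO' relative to the supplied current time."""
    return now - parse_relative(number, interval)


now = datetime(2009, 10, 5, 12, 0, 0)
five_minutes_ago = resolve_ago(5, "minutes", now)
two_days_ago = resolve_ago(2, "day", now)
```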
  • The present invention provides for a data retrieval system from dissimilar computer software environments to a storage system of computer data. In order to interface the environments, a domain specific language is provided to map abstract data structures into different specific data types of varying composition into a format that is known and can be utilized by an application program. The language provides natural language expressions and default contextual type conversions to handle a full range of user based queries for IT infrastructure. The query system is composed of a compiler as well as model builders to different types of data storage systems providing high performance pipeline selection logic of disparate objects and properties.
  • The disclosed techniques provide the ability to produce query results based on the codification of IT infrastructure with complex relationships requiring minimal knowledge of the environment. As described, the invention combines both approaches of free-form search as well as structured search using a query based language grammar that executes at real-time speeds. The invention provides the ability to simplify query creation by providing natural language search capabilities where simple strings of characters can be used without specifying object relationships or special character delimiters or tokens. The language is able to use object relationships to produce more accurate query results whose results can be relied upon, even by unattended software automation. More generally, the techniques described herein provide for a single unified language construct that simplifies query processing.
  • The functionality described above may be implemented as a standalone approach, e.g., a software-based function executed by a processor, or it may be available as a managed service (including as a web service via a SOAP/XML interface). The particular hardware and software implementation details described herein are merely for illustrative purposes and are not meant to limit the scope of the described subject matter.
  • More generally, computing devices within the context of the disclosed invention are each a data processing system (such as shown in FIG. 1) comprising hardware and software, and these entities communicate with one another over a network, such as the Internet, an intranet, an extranet, a private network, or any other communications medium or link. The applications on the data processing system provide native support for Web and other known services and protocols including, without limitation, support for HTTP, FTP, SMTP, SOAP, XML, WSDL, UDDI, and WSFL, among others. Information regarding SOAP, WSDL, UDDI and WSFL is available from the World Wide Web Consortium (W3C), which is responsible for developing and maintaining these standards; further information regarding HTTP, FTP, SMTP and XML is available from Internet Engineering Task Force (IETF). Familiarity with these known standards and protocols is presumed.
  • The scheme described herein may be implemented in or in conjunction with various server-side architectures including simple n-tier architectures, web portals, federated systems, and the like.
  • Still more generally, the subject matter described herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the function is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like. Furthermore, as noted above, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or a semiconductor system (or apparatus or device). Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD. The computer-readable medium is a tangible item.
  • The computer program product may be a product having program instructions (or program code) to implement one or more of the described functions. Those instructions or code may be stored in a computer readable storage medium in a data processing system after being downloaded over a network from a remote data processing system. Or, those instructions or code may be stored in a computer readable storage medium in a server data processing system and adapted to be downloaded over a network to a remote data processing system for use in a computer readable storage medium within the remote system.
  • In a representative embodiment, the parser, lexical analyzer and model builder components are implemented in a special purpose computer, preferably in software executed by one or more processors. The software is maintained in one or more data stores or memories associated with the one or more processors, and the software may be implemented as one or more computer programs. Collectively, this special-purpose hardware and software comprises a language processor that provides the described functionality.
  • While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
  • Finally, while given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.
  • The query mechanism may be any convenient client-side application, such as a web browser, a command line interface (CLI), a programmatic interface, or the like. As used herein, the “client-side” application should be broadly construed to refer to an application, a page associated with that application, or some other resource or function invoked by a client-side request to the application. A “browser” as used herein is not intended to refer to any specific browser (e.g., Internet Explorer, Safari, Firefox, or the like), but should be broadly construed to refer to any client-side rendering engine that can access and display Internet-accessible resources. Further, while typically the client-server interactions occur using HTTP, this is not a limitation either. Any client-server interaction may be formatted to conform to the Simple Object Access Protocol (SOAP) and may travel over HTTP (over the public Internet), FTP, or any other reliable transport mechanism. Also, the term “web site” or “service provider” should be broadly construed to cover a web site (a set of linked web pages), a domain at a given web site or server, a trust domain associated with a server or set of servers, or the like. A “service provider domain” may include a web site or a portion of a web site. Any application or functionality described herein may be implemented as native code, by providing hooks into another application, by facilitating use of the mechanism as a plug-in, by linking to the mechanism, and the like.
  • Having described our invention, what we now claim is as follows.

Claims (1)

1. A machine-implemented method of information retrieval, comprising:
receiving, at a processing device, object-oriented data from multiple data sources;
receiving a query comprising an ordered set of clause definitions each including a clause pipeline and a time constraint;
parsing the query into a graph of one or more data nodes;
processing, on an iterative basis, each data node in the graph to generate a current object set; and
returning the current object set to the query application in response to the query.
US14/276,118 2008-10-04 2014-05-13 Systems and methods for information retrieval Abandoned US20140330859A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/276,118 US20140330859A1 (en) 2008-10-04 2014-05-13 Systems and methods for information retrieval

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10282308P 2008-10-04 2008-10-04
US12/573,587 US8805875B1 (en) 2008-10-04 2009-10-05 Systems and methods for information retrieval
US14/276,118 US20140330859A1 (en) 2008-10-04 2014-05-13 Systems and methods for information retrieval

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/573,587 Continuation US8805875B1 (en) 2008-10-04 2009-10-05 Systems and methods for information retrieval

Publications (1)

Publication Number Publication Date
US20140330859A1 true US20140330859A1 (en) 2014-11-06

Family

ID=51267436

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/573,587 Expired - Fee Related US8805875B1 (en) 2008-10-04 2009-10-05 Systems and methods for information retrieval
US14/276,118 Abandoned US20140330859A1 (en) 2008-10-04 2014-05-13 Systems and methods for information retrieval

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/573,587 Expired - Fee Related US8805875B1 (en) 2008-10-04 2009-10-05 Systems and methods for information retrieval

Country Status (1)

Country Link
US (2) US8805875B1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200349555A1 (en) * 2018-01-16 2020-11-05 Zoe Life Technologies Holding AG Knowledge currency units
CN113486092A (en) * 2021-07-30 2021-10-08 苏州工业职业技术学院 Time graph approximate query method and device based on time constraint
US11218374B2 (en) * 2019-07-30 2022-01-04 Microsoft Technology Licensing, Llc Discovery and resolution of network connected devices
US11899643B2 (en) * 2015-05-22 2024-02-13 Open Text Holdings, Inc. System and method for generating, maintaining, and querying a database for computer investigations cross-reference to related application

Also Published As

Publication number Publication date
US8805875B1 (en) 2014-08-12

Legal Events

Date Code Title Description
AS Assignment

Owner name: STRATACLOUD, INC., GEORGIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:REFLEX SYSTEMS, LLC;REEL/FRAME:033113/0141

Effective date: 20140402

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION