US20150026153A1 - Search engine for information retrieval system - Google Patents
- Publication number
- US20150026153A1 (application number US14/252,553)
- Authority
- US
- United States
- Prior art keywords
- tokens
- token
- search
- user
- computing device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06F17/30864—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2423—Interactive query statement specification based on a database schema
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90324—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G06F17/30336—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Definitions
- BI business intelligence
- BI is a set of theories, methodologies, processes, architectures, and technologies that may transform raw data into meaningful and useful information for business purposes.
- BI may handle large amounts of information to help identify and develop new opportunities. Making use of new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.
- the form is taken to an information technology (IT) specialist who generates the correct relational database management system (RDBMS) queries and then creates reports from the query results.
- the process of the business user specifying what data to retrieve and then getting the final report back from the IT specialist can often take several weeks.
- Existing database solutions may not be fast enough to provide an interactive business user experience when large volumes of data need to be processed.
- Internet search engines are designed to search through documents and help users formulate questions. However, Internet search engines are not aware of the relationships that exist in typical relational databases. Internet search queries perform only simple string matches and do not have the intelligence to search structured data and correlate, aggregate, or filter the search results.
- FIG. 1 depicts an example information retrieval system.
- FIG. 2 depicts an example process flow for the information retrieval system.
- FIG. 3 depicts an example of how data columns are joined in the database system.
- FIG. 4 depicts an example process for generating structured search instructions from ad-hoc user inputs.
- FIGS. 5A-5C depict example inputs entered into the search engine.
- FIGS. 6A-6C depict example auto-completions and explanations performed by the search engine.
- FIGS. 7A-7B depict example error messages generated by the search engine.
- FIG. 8 depicts an example process for ranking and displaying tokens.
- FIG. 9 depicts example indexes having different token weightings.
- FIG. 10 depicts another example process for ranking tokens.
- FIG. 11 depicts an example state machine.
- FIGS. 12A-12G depict example electronic pages displayed based on states identified by the state machine in FIG. 11 .
- FIG. 13 depicts an example index used by the search engine.
- FIG. 14 depicts an example index for ranking tokens.
- FIG. 15 depicts an example priority queue.
- FIG. 16 depicts example token security bitmasks.
- FIG. 17 depicts an example index with security bitmasks.
- FIG. 18 depicts an example index structure used by the information retrieval system.
- FIG. 19 depicts an example distributed index structure.
- FIG. 20 depicts an example table mapping in the information retrieval system.
- FIG. 21 depicts an example user selectable table mapping.
- FIG. 22 depicts a user selectable table mapping for a first search term.
- FIG. 23 depicts an alternative user selectable table mapping for a second search term.
- FIG. 24 depicts another alternative user selectable table mapping for a third search term.
- FIG. 25 depicts an example ranking scheme for a user selectable table mapping.
- FIG. 26 depicts a scheme for dynamically linking together different tables.
- FIGS. 27 and 28 depict a scheme for generating a worksheet table.
- FIG. 29 depicts an example scheme for processing formulaic search queries.
- FIG. 30 depicts an example computing device for implementing the information retrieval system.
- a search engine converts unstructured ad-hoc search queries into search instructions that retrieve data from a structured relational database.
- the search engine allows most users to express search requirements by just typing a few words.
- Tokens are generated based on content, attributes, measures, and other metadata located in the relational database and in any other structured or unstructured data sources.
- the search engine identifies and displays the tokens in response to user inputs.
- the tokens may include any word, phrase, set of characters, value, symbols, or the like, or any combination thereof.
- the search engine uses the tokens to guide the user through a selection of search terms that are then used for generating a structured query instruction.
- the structured query instruction is then used for retrieving data in the database system.
- the search engine may determine and/or predict what tokens and/or data are associated with the user input. For example, the search engine may provide suggestions and display data before the user completes entering a search term or completes the search query. The suggestions help guide the user to relevant data that currently exists in the database system.
- the retrieval system may use a ranking-based scheme to predict user inputs and fetch corresponding results.
- the search engine may predict the user is looking for sales data based on only a few search characters.
- the search engine then may suggest search terms for accessing different types of sales data and may initiate a preliminary search to retrieve and display sales data for different sales regions.
- the preliminary search may display the precise information the user was initially searching. In other situations, the preliminary search may display other helpful information that the user did not know existed, that the user would not have normally requested, or that helps the user locate the correct information.
- the search engine may resolve ambiguities between the search input and the available data. For example, the user may enter the input REVENUE BY YEAR.
- a transaction table may include revenue data based on order date and ship date.
- the search engine may ask the user whether revenue should be calculated based on the order date or the ship date.
- the search engine also may provide an explanation of how operations were performed on particular data to generate the results.
- the search engine also may use a security scheme to restrict what data can be viewed by different users.
- the search engine also may allow users to select mappings between search terms and columns in different tables.
- the search engine may generate a worksheet table containing the selected mappings.
- the search engine also may perform user definable formulaic operations on the data in the different user selected table mappings.
- FIG. 1 depicts an example information retrieval system 100 (retrieval system).
- Information retrieval system 100 may comprise an array of local and/or cloud-based computing and storage devices, such as servers, database systems and/or any other computing logic.
- the different computing devices and database systems within retrieval system 100 may be coupled together via one or more buses or networks.
- the buses or networks may comprise local area networks (LANs), wide area networks (WANs), fiber channel networks, Internet networks, or the like, or any combination thereof.
- a user may use computing device 114 to access retrieval system 100 .
- Computing device 114 may comprise a personal computer, computer terminal, mobile device, smart phone, electronic notebook, or the like, or any combination thereof.
- Relational database system 106 may comprise one or more structured relational databases and/or unstructured databases, such as any database system storing business intelligence (BI) information for an enterprise.
- the BI information may include sales data, revenue data, profit data, tax data, or the like, or any combination thereof.
- database system 106 may comprise dynamic random access memory (DRAM), Flash memory, disc memory, or any other type of data storage medium with relatively fast data access.
- some of the data in database system 106 is stored in a column/row structure and assigned additional metadata identifying particular data parameters and relationships.
- retrieval system 100 may identify different characteristics of the data, such as attributes, measures, values, unique identifiers, tags, links, column and row keys, etc.
- Retrieval system 100 may generate tokens based on the data characteristics and associate the tokens with different columns in database system 106 .
- a column in database system 106 may contain revenue data.
- Retrieval system 100 may generate a token comprising the word REVENUE and load the token into an index that compares user inputs with the letters in the word revenue.
- Search engine 104 may use the indexes and state machines to guide the user into using search terms associated with the data and data structures contained in database system 106 .
- the state machine may identify different states and types of data associated with the search input. For example, some search terms may be identified as attributes or measures.
- the attributes may comprise any data characteristic, category, content, etc. that in one example may be non-quantifiable or non-numeric.
- the attributes are drawn from a fixed universe of numeric values, such as age, zip codes, Boolean operators, status codes, or the like, or any combination thereof.
- the measures may comprise quantifiable numeric values such as sizes, amounts, degrees, etc.
- a first column containing the names of states may be considered an attribute column and a second column containing the numbers of orders received for the different states may be considered a measure column.
- Search engine 104 then may initiate a search query in response to a user input requesting revenue data (measure column) for a state (attribute column).
- Search engine 104 displays different tokens and/or data associated with current states of the user inputs.
- the tokens may be associated with the structured tables or columns in database system 106 .
- the tokens help guide the user into using search terms that refer to specific structured data within database system 106 .
- retrieval system 100 may bind the user into using tokens that are associated with data that currently exists within database system 106 . This prevents the user from generating search queries for data that does not exist in database system 106 and from using invalid search terms that cannot be linked with the data in database system 106 .
- the user also may enter valid search terms that have no corresponding data in database system 106 .
- Retrieval system 100 may generate a response that indicates no data exists for the search query.
- the search term may request the number of orders received for the month of July in California.
- the database system may contain columns for orders and California. However, the columns may have no orders for the month of July from California.
- the retrieval system may generate a response that indicates that no orders were received for the month of July from California.
- the user accesses user interface 102 via a web browser or a web application.
- the user may access a local native application that contacts information retrieval system 100 and search engine 104 .
- the user enters characters into a search field within user interface 102 .
- Search engine 104 compares the characters with the tokens previously generated from database system 106 .
- Search engine 104 may display suggested tokens to the user before the user completes the search query. For example, search engine 104 may display a revenue token immediately after the user enters the letter R. If relevant, the user may select the displayed revenue token. Otherwise, the user may enter additional characters into the search field and search engine 104 may compare the additional characters with the tokens.
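The character-by-character comparison described above can be sketched as a prefix lookup over a sorted token list. This is only an illustration under stated assumptions: the `TokenIndex` class, its `suggest` method, and the sample tokens are hypothetical and are not taken from the described system, which may use a different index structure.

```python
from bisect import bisect_left


class TokenIndex:
    """Hypothetical index: a sorted list of upper-cased tokens."""

    def __init__(self, tokens):
        self._tokens = sorted({t.upper() for t in tokens})

    def suggest(self, typed, limit=5):
        """Return up to `limit` tokens that start with the typed characters."""
        prefix = typed.upper()
        if not prefix:
            return []
        # Binary search finds the first token that is >= the prefix.
        start = bisect_left(self._tokens, prefix)
        matches = []
        for token in self._tokens[start:]:
            if not token.startswith(prefix):
                break  # sorted order: no later token can match
            matches.append(token)
            if len(matches) == limit:
                break
        return matches


# Sample tokens are invented for illustration.
index = TokenIndex(["revenue", "region", "state", "ship date", "order date"])
print(index.suggest("r"))    # every token beginning with R
print(index.suggest("rev"))  # additional characters narrow the suggestions
```

Each additional character shrinks the candidate set, which is why a suggestion can be shown after a single letter and refined as the user keeps typing.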
- Search engine 104 may use ranking algorithms before displaying the tokens to the user. In other words, search engine 104 may try to predict what data in database system 106 is associated with the user inputs. For example, there may be hundreds of different tokens associated with the letter R. Search engine 104 may rank the tokens and initiate a search query based on the ranking.
- Token ranking may be based on usage, table cardinality, or any other criteria. For example, a first token may be associated with a first column in the database system having four values and a second token may be associated with a second column in the database system having hundreds of values. Search engine 104 may rank the first token higher than the second token, since the first column, with fewer values, may have a higher likelihood of containing the correct data. Search engine 104 also may rank the tokens based on how frequently the associated columns have been accessed in prior search queries.
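A minimal sketch of such a ranking follows. The text names usage and cardinality as criteria but does not say how they are combined, so the ordering used here (usage first, then fewer distinct values) is an assumption, as are the `ColumnToken` type and the sample numbers.

```python
from dataclasses import dataclass


@dataclass
class ColumnToken:
    name: str
    cardinality: int   # distinct values in the associated column
    usage_count: int   # times the column was accessed in prior queries


def rank_tokens(tokens):
    """Assumed ordering: most-used first, then fewer distinct values first."""
    return sorted(tokens, key=lambda t: (-t.usage_count, t.cardinality, t.name))


# Invented sample data for tokens matching the letter R.
candidates = [
    ColumnToken("RETAILER", cardinality=300, usage_count=10),
    ColumnToken("REGION", cardinality=4, usage_count=10),
    ColumnToken("REVENUE", cardinality=5000, usage_count=25),
]
ranked = [t.name for t in rank_tokens(candidates)]
print(ranked)
```

With equal usage, the four-value REGION column outranks the 300-value RETAILER column, matching the cardinality example in the text.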
- Search engine 104 initiates a search query by sending a search object to a business intelligence server 108 (BI server).
- Search query and structured search query are used interchangeably below and refer to any combination of text, symbols, operators, commands, instructions, code, or the like, used for accessing any type of data in database 106.
- Search queries may be initiated before, during and/or after search terms are entered into the search field.
- search engine 104 may send a search object to BI server 108 after the user enters a few characters into user interface 102 .
- Search engine 104 may initiate other search queries and send associated search objects to BI server 108 after additional characters are entered for the user search query.
- BI server 108 generates instructions based on the search objects received from search engine 104 .
- the instructions comprise SQL-like queries.
- Database system 106 sends data back to BI server 108 in response to the search instructions.
- the search object and associated structured search instructions also may direct database system 106 to perform different operations.
- the structured search instructions may request database system 106 to join different columns and sum together different data in the columns.
- BI server 108 receives data back from database system 106 in response to the structured search instructions and displays the data within user interface 102 .
- the data may be displayed before the user completes entering a search term.
- search engine 104 may identify a revenue token and send an associated search object to BI server 108 .
- BI server 108 may generate structured search instructions based on the search object that retrieve content from database system 106 associated with the revenue token.
- BI server 108 receives the associated revenue data from database system 106 and displays the revenue data within user interface 102 .
- a cluster manager 110 manages the configuration and distribution of processes within retrieval system 100 .
- cluster manager 110 may assign user interface 102, search engine 104, BI server 108, and associated backup processes to different computing devices within retrieval system 100.
- FIG. 2 depicts in further detail an example of operations performed by information retrieval system 100 .
- a user enters portions of search term 126 into search field 120 of user interface 102 .
- the portions of search term 126 are analyzed by search engine 104 .
- index 128 , state machine 130 , and a ranking algorithm 132 may analyze each character entered into search field 120 .
- search engine 104 may anticipate, predict, suggest, and/or direct the user to available data as the user is entering search term 126 .
- Search engine 104 may generate a search object 134 for different logic states of search term 126 .
- search engine 104 may generate a first search object 134 after detecting the measure search term REVENUE.
- the search engine 104 may generate a second search object 134 after the user completes entering the entire search term REVENUE BY STATE.
- search object 134 comprises pseudo-code, instructions, text or some other type of message 136 A that identifies the matching revenue token associated with measure column 146 in database system 106 .
- Instruction 136 A also includes an operator indicating the revenue data should be summed together.
- search object 134 also includes an instruction 136 B that identifies the matching state token associated with state attribute column 142 in database system 106 .
- Instruction 136 B also includes an operator indicating the revenue data in column 146 should be grouped according to the attributes in state column 142 .
- Search object 134 may include any other information needed for generating structured search query instructions.
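One possible shape for a search object built from REVENUE BY STATE is sketched below. The `Instruction` and `SearchObject` types, the token sets, and the treatment of BY as a keyword are assumptions made for illustration; the text does not specify the actual message format.

```python
from dataclasses import dataclass, field

# Hypothetical token classifications; in the described system these would
# come from indexes built over the database columns.
MEASURES = {"REVENUE"}
ATTRIBUTES = {"STATE"}
KEYWORDS = {"BY"}


@dataclass
class Instruction:
    token: str
    kind: str       # "measure" or "attribute"
    operator: str   # "sum" for measures, "group_by" for attributes


@dataclass
class SearchObject:
    instructions: list = field(default_factory=list)


def build_search_object(search_term):
    """Map each recognized word of the search term to an instruction."""
    obj = SearchObject()
    for word in search_term.upper().split():
        if word in KEYWORDS:
            continue  # connective word, produces no instruction of its own
        if word in MEASURES:
            obj.instructions.append(Instruction(word, "measure", "sum"))
        elif word in ATTRIBUTES:
            obj.instructions.append(Instruction(word, "attribute", "group_by"))
        else:
            raise ValueError(f"no token matches {word!r}")
    return obj


search_object = build_search_object("revenue by state")
print(search_object)
```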
- search engine 104 may include instructions 138 for filtering the data based on an identified search term.
- Search engine 104 may generate filter instructions 138 in search object 134 such as:
- Search object 134 also may include instructions for displaying the retrieved data.
- a time value may direct the BI server to plot the retrieved data on a time graph.
- BI server 108 operates as a metadata server and as a query delivery server.
- the metadata server generates relational models that identify the locations and relationships for different tables, columns, and data within database system 106 .
- BI server 108 may generate the relationship models from metadata extracted from the tables, columns, data, and link information in the enterprise database. In another example, a system administrator may generate some of the relationship models.
- the query delivery server portion of BI server 108 converts search object 134 into structured search instructions 140 , sends instructions 140 to database system 106 , and delivers the data retrieved from database system 106 to user interface 102 .
- Structured search instructions 140 are based on the table, column, and/or data locations and relationships identified in the relational models.
- the relational models in BI server 108 may indicate that revenue column 146 is located in a table T 1 within database system 106 and state column 142 is located within a table T 2 of database system 106 .
- BI server 108 generates structured search instructions 140 that reference the tables T 1 and T 2 containing columns 146 and 142 , respectively.
- Search engine 104 may include relational models for determining when queries can be processed. For example, search engine 104 may identify valid tokens that are associated with columns in database system 106 . Search terms that do not correspond with valid tokens may be identified as invalid search requests.
- the user may enter the term REVENUE COUNTRY as the query.
- the search engine may determine the query is invalid because revenue comes from table T 1 and country comes from table T 2 and there is no relationship that connects table T 1 and table T 2 .
- the search engine uses an awareness of the relationships between data elements to improve efficiency and effectiveness of user search queries.
- the user may be able to dynamically map or link together table T 1 and table T 2 and generate search results for related columns.
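The relationship check described above can be sketched as a reachability test over a graph of table links. The function name and the link representation (pairs of table names) are assumptions; the text does not describe how the relational models are stored.

```python
from collections import deque


def tables_connected(links, start, goal):
    """Return True if `goal` can be reached from `start` through table links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            return True
        for neighbor in graph.get(table, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# With no relationship defined, REVENUE (T1) and COUNTRY (T2) cannot be joined.
print(tables_connected([], "T1", "T2"))
# After the user links the two tables, the same query becomes answerable.
print(tables_connected([("T1", "T2")], "T1", "T2"))
```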
- Search engine 104 also may use relational models for resolving search ambiguities.
- a user query comprising valid tokens may potentially map to multiple tables or multiple columns in database system 106 .
- search engine 104 may ask the user for more information to disambiguate user search terms or may suggest alternative search terms.
- Search engine 104 may determine that a first table or column exists for products sold in California and a second table or column exists for products produced in California. Search engine 104 may query the user to select between products sold and products produced.
- Structured search instructions 140 also may include operators that indicate how database system 106 should process the identified data.
- instructions 140 may include a sum operator that instructs database 106 to sum the data in revenue column 146 of table T 1 .
- Instructions 140 also may include a GROUP BY operator that instructs database 106 to group the revenue data in revenue column 146 of table T1 according to the state names contained in state column 142 of table T2.
- search instructions 140 may comprise or be similar to SQL instructions.
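As a rough illustration of how a relational model might be used to produce a SQL-like instruction, the sketch below maps a measure token and an attribute token to their tables and emits a join with SUM and GROUP BY. The `MODEL` mapping, the column names, and the shared `state_id` key are all assumptions; only the table names T1 and T2 come from the text.

```python
# Hypothetical relational model: token -> (table, column).
MODEL = {
    "REVENUE": ("T1", "revenue"),
    "STATE": ("T2", "state"),
}
JOIN_KEY = "state_id"  # assumed name of the common state identifier column


def to_sql(measure, attribute):
    """Build a SQL-like string that sums a measure grouped by an attribute."""
    m_table, m_col = MODEL[measure]
    a_table, a_col = MODEL[attribute]
    return (
        f"SELECT {a_table}.{a_col}, SUM({m_table}.{m_col}) AS total "
        f"FROM {m_table} JOIN {a_table} "
        f"ON {m_table}.{JOIN_KEY} = {a_table}.{JOIN_KEY} "
        f"GROUP BY {a_table}.{a_col}"
    )


print(to_sql("REVENUE", "STATE"))
```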
- BI server 108 displays data retrieved back from database 106 within user interface 102 .
- BI server 108 may display a table 122 that identifies the total revenue for individual states.
- BI server 108 also may display the same data or different data in different formats, such as within bar graph 124 .
- BI server 108 may automatically change the format used for displaying data in user interface 102 based on the parameters and dimensions of the associated data in database system 106 .
- state column 142 may have relatively few entries and revenue column 146 may have a relatively large number of entries.
- BI server 108 may determine that bar graph 124 is the best format for displaying the data from columns 142 and 146 and may determine that the relatively few entries in state column 142 should be displayed on the x-axis of bar graph 124 while the relatively large number of entries in revenue column 146 should be displayed along the y-axis of bar graph 124 .
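A small sketch of this axis choice, assuming the only input is a count of entries per column: the column with fewer entries goes on the x-axis and the other on the y-axis. The function name and the sample counts are hypothetical.

```python
def choose_axes(entry_counts):
    """entry_counts maps a column name to its number of entries."""
    if len(entry_counts) != 2:
        raise ValueError("this sketch handles exactly two columns")
    # Sort column names by their entry counts, fewest first.
    x_column, y_column = sorted(entry_counts, key=entry_counts.get)
    return {"x": x_column, "y": y_column}


# Invented counts: few states, many revenue entries.
axes = choose_axes({"revenue": 5000, "state": 50})
print(axes)
```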
- BI server 108 may use other formats or criteria for displaying data.
- BI server 108 also may display the data based on any filter values 138 identified in search object 134 .
- filter instruction 138 may direct BI server 108 to display revenue data only for California or for a particular year.
- BI server 108 may automatically and dynamically suggest additional filters based on the retrieved data.
- the relational models generated by BI server 108 may identify other columns in database system 106 associated with column 142 and/or column 146 .
- a third column may include the same state identifiers used in columns 142 and 146 .
- the third column may contain zip codes.
- BI server 108 may display a filter within user interface 102 that allows the user to filter the retrieved data by the different zip codes contained in the third column.
- FIG. 3 depicts an example of how database system 106 may join revenue column 146 from table T 1 with state column 142 from table T 2 in response to structured search instructions 140 .
- Database system 106 may join columns 146 and 142 by appending rows with common state identifiers 144 .
- the second row of state column 142 may include state identifier 2 associated with the attribute Alaska.
- the seventh row of revenue column 146 also may include the state identifier 2 associated with the measure $4,500.
- Database system 106 may join the data contained in the second row of column 142 in table T 2 with the data contained in the seventh row of column 146 of table T 1 .
- Database system 106 then may sum and group the joined columns based on operators in structured search instructions 140 . For example, database system 106 may group together all revenue data associated with California state identifier 5 and sum the group of data together to generate a total of $139,995. Database system 106 may perform similar operations for the other entries in columns 142 and 146 to generate results 147 . Database system 106 sends results 147 back to the BI server and the BI server then displays the results within the user interface.
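The join, group, and sum steps can be reproduced with an in-memory SQLite database. The table layout, column names, and the individual California rows are invented for illustration; only the state identifiers (2 for Alaska, 5 for California), the $4,500 Alaska entry, and the $139,995 California total come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (state_id INTEGER, revenue INTEGER);  -- measure column
CREATE TABLE T2 (state_id INTEGER, state TEXT);       -- attribute column
INSERT INTO T2 VALUES (2, 'Alaska'), (5, 'California');
-- The two California rows are assumed values that add up to 139,995.
INSERT INTO T1 VALUES (5, 100000), (2, 4500), (5, 39995);
""")

# Join rows on the common state identifier, then group and sum.
results = conn.execute("""
SELECT T2.state, SUM(T1.revenue)
FROM T1 JOIN T2 ON T1.state_id = T2.state_id
GROUP BY T2.state
ORDER BY T2.state
""").fetchall()
print(results)
```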
- Joined table 145 may be cached by a particular database coordinator operating in database system 106 .
- the database coordinator that computes the final result may insert the results into the cache. If the BI server issues the same query to the same database coordinator, the BI server may receive the previously cached results.
- the BI server also may cache some results or may cache responses provided to the user interface instead of caching database query results.
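The coordinator-side caching described above might look like the following sketch, which keys cached results on the instruction text. The `DatabaseCoordinator` class, its `execute` callback, and the `executions` counter are hypothetical, and the sketch does not address cache invalidation when data changes.

```python
class DatabaseCoordinator:
    """Hypothetical coordinator that caches results per instruction string."""

    def __init__(self, execute):
        self._execute = execute  # callable that actually runs the query
        self._cache = {}
        self.executions = 0      # counts real executions, for illustration

    def query(self, instructions):
        if instructions not in self._cache:
            self.executions += 1
            self._cache[instructions] = self._execute(instructions)
        return self._cache[instructions]


# Stand-in for the database: always returns the same invented result.
coordinator = DatabaseCoordinator(lambda sql: [("California", 139995)])
first = coordinator.query("SUM revenue GROUP BY state")
second = coordinator.query("SUM revenue GROUP BY state")
print(coordinator.executions)  # the second call is served from the cache
```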
- FIG. 4 depicts an example process for retrieving structured relational data based on unstructured user inputs.
- Operation 143 A loads data into the information retrieval system.
- the data may comprise any combination of structured relational data and/or unstructured data.
- Operation 143 B identifies data, columns, attributes, measures, values and any other information associated with the data. Metadata tokens and data tokens are generated based on the identified data and used for building indexes.
- Operation 143 C receives a user input.
- the user input may comprise one or more characters or any other input entered into the search field of the user interface.
- Operation 143 D uses the knowledge of the metadata tokens, data tokens, operators and their rankings to parse the user input.
- Operation 143 E may generate suggestions based on the parsed user input.
- Operation 143 F generates a query object based on the parsing of the user input. For example, a first character of the user input may match a first character of four different tokens. Operation 143 F may generate a query object that requests the column data associated with all four tokens.
- the query object may identify the tokens and identify token classifications, such as an attribute, measure, or value.
- the query object also may identify operators such as a group by operator, a summation operator, a standard deviation operator, an average operator, or a count operator.
- Operation 143 G generates structured search instructions based on the query object. For example, operation 143 G may identify the tables and columns in the database system associated with the tokens identified in the search object. The structured search instructions retrieve and join the data in the identified tables and columns. The structured search instructions also may instruct the database system to perform functions identified by the operators in the search object.
- Operation 143H may display the data received back from the database system. For example, operation 143H may display a sum for all revenue in the revenue column and may display tokens associated with each entry in the state column. Selection of a state token may display the revenue for the state associated with the selected token. Operation 143H may identify multiple columns associated with the search query. All columns may be ranked based on cardinality and usage. The data and/or tokens associated with the highest ranked columns may be displayed in the user interface.
- FIGS. 5-7 show inputs and responses received and generated by the search engine.
- the search engine may receive a natural input, such as “how many people live in America”.
- natural language algorithms may extract relevant information and map the information to a more structured form.
- the search engine may provide sophisticated spell correction. For example, if the input is CALP, the search engine provides feedback asking if the input should really be CLAP. After analyzing and recognizing the input, the search engine may provide additional guidance. For example, the search engine may provide some indication of possible additional search terms.
- Different search terms may be correlated to determine the correct search query. For example, if the user input includes the words Dublin and California, the search engine may determine that the search is related to Dublin Calif. and not Dublin Ireland.
- the search engine may provide prefix completions. For example, if the input is currently CAL, the search engine may automatically complete the input to CALIFORNIA.
- the search engine can also provide suffix completion. For example, if the input is currently PAL, the search engine may complete the input with PRINCIPAL.
- the search engine may complete the phrase and/or provide a ghost term completing the search term.
- the user may type in REVENUE.
- the search engine also may anticipate what may come after a properly entered input and query the user. For example, for the input REVENUE FOR CALIFORNIA, the search engine may ask DO YOU WANT QUARTERLY REVENUE FOR CALIFORNIA? Thus, even with a correct and complete input, the search engine still may provide additional guidance for accessing other available and related information.
- the search engine may provide intelligent analysis to determine the error. If an input is not recognized, the search engine may provide suggestions of what should be typed or ask questions about the input. For example, the search engine may provide feedback “This doesn't make sense, did you mean this?” Certain keywords are part of the search engine language. If these keywords are used incorrectly, the search engine may provide suggestions for the correct usage of the keyword, phrase, or language construct.
- the search engine also may generate a query for ambiguous inputs.
- a transaction table may have data associated with an order date and a ship date.
- the input may request REVENUE BY YEAR.
- the search engine may resolve the ambiguity by asking the user if they want revenue data based on order date or ship date.
- the search engine also may sort out inconsistent data.
- the search engine may perform unstructured text searches.
- the database system may include unstructured data, such as employee notes, in addition to the structured data.
- the search engine may perform a keyword search for any other unstructured data.
- the search engine allows the user to select different join paths. For example, the search engine may allow the user to select which columns in which tables are associated with different search terms. The search engine then may generate a worksheet table containing the selected mappings.
- FIG. 5A depicts an example of inputs entered into a user interface 102 operated by the information retrieval system. This is just one example of a variety of different ways search terms may be entered and feedback presented to a user.
- User interface 102 may display an electronic page 148 within a web browser or other mobile or desktop application.
- a user may use computing device 114 in FIG. 1 to enter a search term 152 A into a search field 150 within electronic page 148 .
- the search engine may display a menu 154 including a first list of tokens 156 A alternatively referred to as suggestions.
- Menu 154 also may include a second list of operators 158 alternatively referred to as useful keywords.
- the search engine may display some tokens 156 A as soon as the user mouse clicks within search field 150 and before the user enters any characters into search field 150 .
- an initial set of tokens 156 A may be associated with the most frequently used search queries or the most frequently accessed data.
- Operators 158 may determine how search results are retrieved and displayed.
- a FOR operator 158 may direct the retrieval system to retrieve and display data for a particular attribute or category of search data, such as REVENUE DATA FOR CARS.
- a BY operator 158 may direct the retrieval system to group an identified category of search data by a particular sub-category, such as REVENUE BY YEAR.
- the retrieval system may automatically figure out the operator from the context. For example, the user may enter the search terms REVENUE STATE. The retrieval system may interpret the search term as REVENUE BY STATE since state is the name of a column. In another example, the user may enter the search term REVENUE CALIFORNIA. The retrieval system may interpret the search term as REVENUE FOR CALIFORNIA since California is a value in a column and is not the name of the column.
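The operator inference described above can be sketched as a lookup against column names and column values. The `COLUMN_NAMES` and `COLUMN_VALUES` tables and the `infer_operator` function are hypothetical names introduced for this example; an actual system would presumably consult its indexes instead.

```python
# Assumed metadata: names of columns, and values mapped to their column.
COLUMN_NAMES = {"STATE", "YEAR"}
COLUMN_VALUES = {"CALIFORNIA": "STATE", "OREGON": "STATE"}

def infer_operator(measure, term):
    """Insert BY when the term names a column, FOR when it is a column value."""
    measure, term = measure.upper(), term.upper()
    if term in COLUMN_NAMES:
        return f"{measure} BY {term}"
    if term in COLUMN_VALUES:
        return f"{measure} FOR {term}"
    return None  # term not recognized; no operator can be inferred
```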
- Operators 158 also may include mathematical functions. For example, a count operator 158 may direct the retrieval system to display the number of an identified type of data, and minimum and maximum operators 158 may direct the retrieval system to display minimum and maximum values for the data, respectively. A summation (sum) operator 158 may direct the retrieval system to add up a total value for the data, an average operator 158 may direct the retrieval system to calculate an average value for the data, and a standard deviation (std. dev.) operator 158 may direct the retrieval system to calculate a standard deviation for the data. The retrieval system also may use other types of operators 158 .
- the retrieval system may repeatedly update tokens 156 A in response to any combination of characters or terms entered into search field 150 .
- the user may initially enter the character C into search field 150 .
- the search engine may identify and display a first set of tokens CAR, CALIFORNIA, and COMPUTERS. Simultaneously, the search engine may initiate a structured search query for data in the database system associated with the displayed tokens.
- the user may enter a second character A into search field 150 .
- the search engine may identify and display a new second set of tokens CAR and CALIFORNIA.
- the search engine may refine the previously structured search query for data in the database system associated with the second set of displayed tokens.
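The narrowing of displayed tokens as characters are entered can be sketched as a simple prefix filter. The token list and the `matching_tokens` function are illustrative assumptions; the source does not specify the matching structure, and a production index would likely use a trie or sorted index rather than a linear scan.

```python
# Illustrative token vocabulary (assumed for this example).
TOKENS = ["CAR", "CALIFORNIA", "COMPUTERS", "REVENUE"]

def matching_tokens(prefix, tokens=TOKENS):
    """Return the tokens that start with the characters entered so far."""
    p = prefix.upper()
    return [t for t in tokens if t.startswith(p)]

# Each keystroke refines the previous candidate set.
first = matching_tokens("C")            # candidates after typing C
second = matching_tokens("CA", first)   # candidates after typing CA
```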
- FIG. 5B shows another state of user interface 102 after a user has completed entry of a search term 152 B.
- the user may type in search term 152 B or the user may have selected the associated CALIFORNIA token 156 A previously displayed in menu 154 of FIG. 5A .
- the retrieval system may initiate another search in response to the user entering the complete CALIFORNIA token into search field 150 .
- the retrieval system may automatically update tokens 156 B to correspond with search term 152 B.
- columns in the database system associated with tokens 156 B may be linked to the column in the database system associated with the CALIFORNIA token.
- REVENUE, YEAR, PART, TRANSACTION, and WIDGET columns in the database system may be linked to the CALIFORNIA column in the database system.
- the retrieval system may display tokens 156 B associated with the REVENUE, YEAR, PART, TRANSACTION, and WIDGET columns.
- the retrieval system may rank tokens based on the state of the search query, data usage, cardinality, or user preferences. For example, the retrieval system may have identified dozens of tokens associated with the CALIFORNIA token. The retrieval system may have given the REVENUE token 156 B a highest ranking. Accordingly, the retrieval system may display the REVENUE token 156 B in menu 154 and may display associated revenue data for the state of California within a graph 164 . The retrieval system also may display total revenue for the state of California within a display box 160 .
- the retrieval system also may display filters 162 based on the data associated with search term 152 B. For example, filters 162 may identify other revenue data associated with other geographic regions.
- FIG. 5C depicts yet another state of a user search query where additional parameters have been added to form search term 152 C. For example, a year value, a transaction value, and a product type have been added to search term 152 C.
- the retrieval system may identify a new refined set of tokens 156 C and also refine the search query and/or the displayed data in response to the additional parameters in search term 152 C. For example, the retrieval system may display the revenue data for California, for year 2013, for debit transactions, and for different types of widgets.
- the retrieval system also may display a time period 166 , a number of parts 168 , a total revenue 170 , and a total tax 172 within a headline view associated with search term 152 C.
- the retrieval system also may display other tables 174 and graphs 176 associated with the retrieved data.
- the retrieval system may automatically determine a best format for displaying the data.
- the data associated with a search term may comprise a large range of values for a limited number of attribute categories.
- the retrieval system may decide a bar graph similar to graph 176 is the best way to display the data.
- the retrieval system may determine the number of measures, number of attributes, and number of values associated with the data and present data based on these data dimensions.
- the retrieval system may use other formats for displaying data, such as bubble charts, line graphs, pie charts, or the like, or combinations thereof, based on the dimensions of the associated tables or columns in the database system.
- FIG. 6A depicts another example of how the search engine may suggest, predict, and/or complete search terms.
- the user may enter a search term 152 A and the search engine may complete the input with search term 152 B.
- for the search term REVENUE FOR EACH CUSTOMER REGION, the search engine may determine that revenue data associated with the search term includes associated commit order dates.
- the search engine may complete the search term by adding FOR COMMIT DATA BEFORE MM/DD/YYYY.
- the search engine may display additional search term 152 B in a different font, color, and/or image intensity to represent the portion of the input added by the search engine.
- the search engine may generate a search request obtaining data 184 B identifying the unique number of customer regions and data 184 C identifying the total revenue for each of the identified customer regions.
- the search engine may automatically generate a bar graph 184 D also displaying the total revenue for each of the identified customer regions.
- the search engine also may sum the revenue for all of the different regions and display the results as total revenue 184 A.
- FIG. 6B depicts one example of how the search engine provides explanations regarding data 184 .
- An inquiry icon 186 is displayed in search field 150 .
- the search engine displays information box 188 explaining how revenue data was computed for each customer region and then sorted in ascending order.
- Information box 188 further explains that the revenue information was generated by linking data from a line order table with data in a customer table. In this example, the columns and/or tables containing the data are underlined in information box 188 .
- FIG. 6C shows another example of how the search engine may suggest additional search terms based on a partial search term 152 .
- the search engine may identify tokens and/or operators 192 matching with the partial search term 152 .
- the search engine may identify tokens matching the partial input MICHAEL JAC.
- the search engine also may identify other attributes, measures, values and/or operators associated with the identified tokens.
- the search engine displays tokens 192 along with the other parameters in menu 190 as suggested or completed search terms.
- the search engine uses state machines, indexes, and the content of data to perform a structured analysis of the search terms. For example, the search engine may identify search terms as measures, attributes, values, and operators and determine a correct sequence of the search terms based on these characteristics of the search term. The search engine may identify valid data associated with the search term and may determine that the search term is not valid for a current state of the search term. The search engine also may provide suggestions for correct search terms to add to the current state of the search input.
- FIG. 7A depicts an example error message 194 generated by the search engine.
- the search engine may generate different error messages based on both the understood tokens and the unrecognized words in the search term.
- the search engine generates error message 194 in response to the search term GROWTH BY. OF may be the only valid input after GROWTH.
- the search engine may generate error message 194 identifying the incorrect input and suggesting a correct input for that particular state.
- FIG. 7B shows another example error message generated by the search engine.
- error message 196 explains that the search engine could not find customer data metadata associated with search term 198 .
- Error message 196 also indicates search term 198 was expected to be either a keyword or a numeric column name.
- Error messages may be generated in a variety of different ways.
- One scheme may provide error messages for different keywords.
- the search engine checks rules associated with the error messages that indicate where or how keywords should be used in a search term.
- the search engine displays the associated error message when the keyword is located in the wrong location or used in the wrong syntax within the search term.
- the error message may identify the current incorrect location of the keyword and the expected correct location of the keyword. For example, the error message may state: THE KEYWORD TOP IS LOCATED AT THE END OF YOUR SEARCH BUT WAS EXPECTED AT THE BEGINNING OF THE SEARCH.
- the state machines may branch to different error messages based on the state of the search.
- an acceptable search term may be MEASURE BY ATTRIBUTE.
- the entered search term may comprise MEASURE BY MEASURE.
- the search engine may include a state machine that branches to an associated error message in response to receiving the second measure.
- the search engine also may identify a current state of the search, determine all acceptable inputs for a next state, and generate error messages based on the analysis. For example, the search engine may generate an error message: YOU COULD HAVE TYPED A NUMERIC ATTRIBUTE OR TYPED “GREATER THAN (>)” BUT YOU DIDN'T AND ACTUALLY TYPED AN INPUT THAT WE DO NOT UNDERSTAND.
- the search engine may determine the meaning or type of search term by searching through tables and columns.
- the search engine may generate an error message when the type of search term does not correlate with a current state of the search. For example, the search engine may generate the error message: THE WORD YOU TYPED IS ACTUALLY A CUSTOMER NAME, BUT WE WERE EXPECTING A COLUMN NAME.
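The state-dependent error messages described above can be sketched with a small transition table. The grammar here covers only the MEASURE BY ATTRIBUTE example; the `TOKEN_TYPES` and `TRANSITIONS` tables, the state names, and the message wording are assumptions made for illustration.

```python
# Assumed token classifications (normally resolved from tables and columns).
TOKEN_TYPES = {"REVENUE": "measure", "TAX": "measure",
               "STATE": "attribute", "BY": "keyword"}

# state -> {accepted token type: next state}, for MEASURE BY ATTRIBUTE only.
TRANSITIONS = {
    "start": {"measure": "after_measure"},
    "after_measure": {"keyword": "after_by"},
    "after_by": {"attribute": "done"},
}

def validate(term):
    """Return None for a valid term, else a message naming what was expected."""
    state = "start"
    for word in term.upper().split():
        kind = TOKEN_TYPES.get(word)
        allowed = TRANSITIONS.get(state, {})
        if kind not in allowed:
            expected = " or ".join(sorted(allowed)) or "end of input"
            found = kind or "unrecognized word"
            return f"'{word}' ({found}) is not valid here; expected {expected}"
        state = allowed[kind]
    return None
```

With these tables, MEASURE BY MEASURE input such as REVENUE BY TAX branches to a message stating that a measure was found where an attribute was expected.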
- the user may enter the search term REVENUE FOR TAX BEFORE.
- the token BEFORE may be a valid keyword for date but not a valid floating operator.
- the user may enter the keyword AND.
- the acceptable types of token entries for the current state may be attribute, measure, or value tokens.
- the search engine may generate a generic error message: THE WORD “AND” DOES NOT MAKE SENSE.
- the user may enter STATE as a search term.
- the user may enter a search term SUM REVENUE>100.
- the search term may be invalid and the user may be trying to sum up all of the rows of a revenue column that are greater than the value 100.
- the search engine may generate the error message: WE DON'T SUPPORT THIS SYNTAX. YOU COULD TRY ENTERING “SUM REVENUE, REVENUE>100”.
- the user may input the search term, REVENUE FOR CUSTOMER CITY MEERUT.
- the city MEERUT may not exist in any index.
- the search engine 104 may display a message saying: “MEERUT” DOES NOT EXIST IN THE DATABASE SYSTEM.
- the search engine generates different categories of error messages based on the semantics of the search term that help the search engine understand/predict what the user was trying to search.
- the search engine then generates context sensitive messages that guide the user back on a correct search path.
- the search engine might not display certain tokens and/or associated data to certain users.
- the search engine may provide both row level security and object level security.
- Row level security, for example, may restrict a salesman in the state of California to only viewing transactions in a sale table related to the state of California.
- Object level security, for example, may restrict a user from viewing all transactions from the sales table or from particular columns of the same table.
- a salesman may only be authorized to view sales figures under a certain amount.
- human resources personnel may be authorized to view employee work history data but may not be authorized to view sensitive company financial data.
- the search engine not only limits what data is displayed to different users but also restricts which tokens are displayed to the users.
- Tokens may identify columns of data that some users should not know exist.
- the tokens may identify companies, people, products, locations, transactions, etc. in the database system that should be kept confidential.
- the tokens and associated data may be assigned security values.
- the tokens associated with the data may be assigned a particular security flag that only allows access by managers.
- the search engine compares a security tag for the user with the security flags assigned to the data and tokens. If the two security identifiers do not match, the token and associated data will not be displayed to the user.
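The comparison of security identifiers can be sketched as a filter applied before tokens are displayed. The `(name, flag)` token representation, the `visible_tokens` function, and the convention that a flag of `None` means unrestricted are assumptions for this example.

```python
def visible_tokens(tokens, user_tags):
    """Keep tokens whose security flag matches one of the user's tags.

    tokens: iterable of (name, flag) pairs; a flag of None is treated as
    unrestricted (an assumption, not stated in the source).
    """
    return [name for name, flag in tokens
            if flag is None or flag in user_tags]

# Example: a user tagged "sales" does not see the manager-only SALARY token.
catalog = [("STATE", None), ("SALARY", "manager"), ("REVENUE", "sales")]
shown = visible_tokens(catalog, {"sales"})
```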
- Tokens may be ranked based on different parameters, such as usage. Some columns may be used more than others. For example, a STATE column may be used more than a SALARY column. Based on usage, the search engine may determine which columns/data are more likely to be requested. The search engine then ranks the tokens associated with the columns based on the most likely data associated with the user.
- Ranking also may take into account the sequence of tokens associated with the search terms. For example, the STATE column may be used more often than the SALARY column. However, the sequence or combination of the SALARY column and a REVENUE column may be used more frequently than a combination of the STATE column and the REVENUE column. Therefore, when the term REVENUE is previously entered into the search field, a next SALARY token may be ranked higher than a next STATE token.
- the tokens also may be ranked based on the context of the search term, such as based on a search term being an attribute, value, or measure.
- the search engine also may rank tokens based on the cardinality of the associated columns. For example, a column #1 may identify 10 million products for the state of California. A column #2 may identify 50 states that include the state of California. If the user enters the search term CALIFORNIA, the search engine may display the list of states in column #2, since the answer may have a better chance of being associated with one of the 50 states versus being associated with one of the 10 million products in column #1.
- FIG. 8 depicts an example process for ranking tokens.
- Operation 200 A receives the user inputs.
- Operation 200 B identifies tokens associated with the user inputs.
- tokens may be identified based on similarities between the characters in the user inputs and the characters in the tokens and the attribute, measure, and value classifications of the tokens.
- the state machine may require the first search term to be an attribute.
- Operation 200 B may only compare tokens classified as attributes with the first portion of the user input.
- Operation 200 B may query the user for a clarification if the user input is ambiguous.
- Operation 200 C identifies the usage and cardinality of the columns associated with the identified tokens and operation 200 D ranks the tokens based on the usage and cardinality. Operation 200 D displays the tokens based on the rankings. For example, the highest ten ranked tokens may be displayed to the user.
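Operations 200 C and 200 D can be sketched as a scoring function over usage and cardinality. The source does not give a formula; dividing usage by the logarithm of the column cardinality is one hypothetical choice that favors frequently used, low-cardinality columns, and the dictionary field names are likewise assumed.

```python
import math

def rank(candidates, top_n=10):
    """Rank candidate tokens by usage, discounted by column cardinality.

    candidates: list of dicts with "token", "usage", and "cardinality" keys.
    The scoring formula is an illustrative assumption.
    """
    def score(c):
        # +10 keeps the logarithm at or above 1 for very small columns.
        return c["usage"] / math.log10(c["cardinality"] + 10)
    return sorted(candidates, key=score, reverse=True)[:top_n]

# CALIFORNIA matches a 50-row state column and a 10-million-row product column.
matches = [
    {"token": "CALIFORNIA (product)", "usage": 100, "cardinality": 10_000_000},
    {"token": "CALIFORNIA (state)", "usage": 100, "cardinality": 50},
]
ranked = [c["token"] for c in rank(matches)]
```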
- FIG. 9 depicts examples of different indexes that the search engine may use for ranking tokens.
- indexes 210 A may be global indexes that track token usage for all users of an enterprise.
- Indexes 210 B may be department indexes that track token usage for different departments within the enterprise.
- a first department index 210 B may track token usage for a group of users associated with the sales department and a second department index 210 B may track token usage for a group of users associated with the finance department.
- User indexes 210 C may track token usage for individual users of the enterprise.
- Indexes 210 A, 210 B, and 210 C may have different token scores for different tokens.
- a California token may have a highest score in global index 210 A based on usage from all users. For example, the California token, and/or the data associated with the California token, may have been displayed and/or selected by users of the enterprise more than any other token. However, the group of users in the legal department may have selected a different token more often than the California token. Accordingly, the tokens in global indexes 210 A may have different token scores than the tokens in legal department index 210 B.
- Token scores in user indexes 210 C may be different from token scores in global indexes 210 A and department indexes 210 B. For example, a user responsible for sales in Colorado may select a Colorado token more often than a California token. Accordingly, the Colorado token in user index 210 C associated with the user may have a highest token score.
- the search engine may rank tokens based on the token scores in indexes 210 A, 210 B, and 210 C.
- the search engine may identify one or more department indexes 210 B and a user index 210 C associated with the user.
- the search engine may identify the indexes 210 A, 210 B, and 210 C associated with the user and identify tokens in each of the indexes associated with the search term entered by the user.
- the search engine then may identify the top ranked tokens in each of indexes 210 A, 210 B, and 210 C based on the associated token scores. Some tokens may be assigned higher weights than other tokens. For example, tokens from user index 210 C associated with the user may be given a highest weighting, tokens from department indexes 210 B associated with the user may be given a next highest weighting, and tokens from global indexes 210 A may be assigned the lowest weighting.
- tokens with the highest scores in user index 210 C may be ranked higher than the tokens with the highest scores in department index 210 B and global index 210 A. If the user is associated with a particular department index 210 B, tokens identified in the associated department index 210 B having a given token score may be ranked higher than tokens in global index 210 A with an equivalent token score. Of course, other weighting can be assigned to the token scores in the different indexes 210 .
- weightings and ranking criteria may be assigned to the global indexes 210 A, department indexes 210 B, and user indexes 210 C. For example, the top three ranked tokens from each of the indexes 210 A, 210 B, and 210 C may be displayed. This enables the search engine to display a wider variety of tokens from a wider variety of columns.
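The weighting of token scores across the global, department, and user indexes can be sketched as below. The numeric weights and the rule of keeping the best weighted score per token are assumptions; the source states only that user indexes may be weighted highest and global indexes lowest.

```python
# Hypothetical weights: user index highest, global index lowest.
WEIGHTS = {"user": 3.0, "department": 2.0, "global": 1.0}

def rank_tokens(indexes, top_n=3):
    """Merge per-index token scores into one ranking.

    indexes: dict mapping index level ("user", "department", "global")
    to a dict of token -> usage score.
    """
    best = {}
    for level, scores in indexes.items():
        for token, score in scores.items():
            weighted = WEIGHTS[level] * score
            best[token] = max(best.get(token, 0.0), weighted)
    # Highest weighted score first; ties broken alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:top_n]

# A user responsible for Colorado sales sees COLORADO ahead of CALIFORNIA.
result = rank_tokens({
    "global": {"CALIFORNIA": 10, "COLORADO": 4},
    "user": {"COLORADO": 5},
})
```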
- the search engine may perform another ranking based on the state machines that analyze the search term. Based on a state of the search term, the state machine may determine that one particular category of token is more likely than another type of token.
- the user may enter the search term REVENUE BY.
- the search engine may determine that the next token could either be an attribute or a measure.
- the state machine also may determine that the next search term is more likely an attribute token than a measure token.
- the search engine may determine that REVENUE BY STATE is a more likely search term than REVENUE BY AGE. Accordingly, the search engine may rank the state attribute token higher than the age measure token.
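The state-based ranking in the REVENUE BY example can be sketched by scaling token scores with a per-category weight for the current state. The `CATEGORY_WEIGHT` table, its values, and the tuple layout of the candidates are hypothetical.

```python
# Assumed likelihood of each token category after a given partial search term.
CATEGORY_WEIGHT = {"REVENUE BY": {"attribute": 1.0, "measure": 0.5}}

def rank_next(prefix, candidates):
    """Rank candidate next tokens for the current state of the search term.

    candidates: list of (token, category, score) tuples. Categories that are
    not valid in the current state are dropped.
    """
    weights = CATEGORY_WEIGHT.get(prefix, {})
    scored = [(score * weights[cat], tok)
              for tok, cat, score in candidates if cat in weights]
    return [tok for _, tok in sorted(scored, key=lambda p: (-p[0], p[1]))]

order = rank_next("REVENUE BY", [
    ("AGE", "measure", 10),
    ("STATE", "attribute", 8),
    ("CALIFORNIA", "value", 20),   # values are not valid after BY here
])
```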
- FIG. 10 summarizes some of the ranking operations used by the search engine for predicting and displaying tokens.
- the search engine may identify potential tokens for displaying to the user. For example, based on a user input, one or more indexes may identify matching tokens or other tokens associated with the predicted intent of the user.
- Operation 220 B identifies the scores for the identified tokens. Not all tokens need to be identified in operation 220 A. As mentioned above, a ranked subset of tokens may be identified based on the token scores identified in operation 220 B. Operation 220 C weights the token scores based on the associated indexes. As mentioned above, token scores from a user index may be assigned a higher weighting than a token score from a global index.
- Operation 220 D may further weight the token scores based on the token column cardinality.
- a token associated with a column with few entries may be given a higher weight than a token associated with a column with a large number of entries.
- Operation 220 E also may weight token scores based on the types of tokens and the current state of the search term. As explained above, the search engine may determine a first token category is more likely than other token categories. The tokens with the first token category may be assigned higher weights than the other token categories. Operation 220 F displays tokens to the user based on the associated weighted token scores.
- FIG. 11 depicts an example state machine 300 used by the search engine.
- State machine 300 is just one example of almost limitless combinations of functions that may be performed based on any combination of search terms.
- state machine 300 identifies valid inputs and any constraints on completion of a search term.
- State machine 300 generates query instructions from the search term that are then used for performing a structured SQL type search or an unstructured keyword search in the database system.
- State machine 300 may accept search terms that are relatively unconstrained. For example, state machine 300 may accept search terms with different column orderings. In one example, the user may specify a value in the search term and then specify a filter in the search term. In another example, the user may specify the filter in the search term and then specify the value.
- state machine 300 may not have a designated end state thus allowing the user to continue to add search terms and change search results. For example, state machine 300 may determine the search term comprises an attribute and guide the user into selecting a filter. State machine 300 then may return to a previous state allowing the user to enter additional search terms.
- state machine 300 also may rank tokens. For example, state machine 300 may predict a next entry in the search term is most likely a measure, and assign a higher weight to measure tokens and assign a lower weight to attribute tokens. State machine 300 may determine that the next search term is least likely a value and assign even lower weights to the value tokens.
- State machine 300 could start either in state 302 A or state 302 D.
- the only valid input for state 302 A is GROWTH OF.
- State machine 300 may predict that GROWTH OF is a likely input and display GROWTH OF in menu 154 as one of keywords 158 .
- An index associated with state 302 D may include a GEORGIA token and a GREECE token.
- GEORGIA and GREECE are valid value type tokens for state 302 D. Accordingly, the search engine may display the GEORGIA token and the GREECE tokens in menu 154 .
- a REVENUE token may have a relatively high token score.
- the REVENUE token might be the most frequently used token in the global index, and/or a department index associated with the user, and/or a user index associated with the user.
- the REVENUE token is also a valid measure type input for state 302 D. Accordingly, the search engine also displays the REVENUE token in menu 154 .
- a GREEN token also may exist within one of the indexes associated with state 302 D and a value type GREEN token is also a valid input for state 302 D. Accordingly, the search engine also displays the GREEN token in menu 154 .
- the user may add a second letter R to search term 152 .
- GR is still a valid search term for state 302 A and GREECE and GREEN are still valid search terms for state 302 D.
- the search engine may continue to display GROWTH OF, GREECE and GREEN in menu 154 .
- the search engine now determines that the GEORGIA and REVENUE tokens are now unlikely inputs and removes them from menu 154 and/or displays other more likely tokens.
- FIG. 12C shows a next state where the user adds the letter O to search term 152 .
- state machine 300 determines that GREECE and GREEN are now unlikely tokens and removes them from menu 154 .
- state 302 D may no longer be valid and the user is limited to entering GROWTH OF into search field 150 or selecting the GROWTH OF keyword from menu 154 .
- FIG. 12D shows electronic page 148 displayed by the user interface after the user enters GROWTH OF into search field 150 .
- Search term 152 moves state machine 300 into state 302 B.
- the only valid entries in state 302 B are measure type tokens. Accordingly, state machine 300 may display the highest ranked REVENUE and TAX measure tokens. The user may type or select the REVENUE token moving state machine 300 into state 302 C.
- State machine 300 may automatically display FOR in menu 154 or may automatically add the FOR operator to search term 152 . State machine 300 then may display a correct format for a next valid input such as a date in menu 154 .
- FIG. 12F shows the electronic page 148 after the user adds a date to search term 152 and state machine 300 moves into state 302 D.
- state machine 300 may generate instructions for calculating revenue growth and display the revenue growth data 320 A and 320 B.
- State machine 300 also may display filters 322 identifying other columns associated with the revenue growth data.
- state machine 300 may generate growth data 320 for all products sold by the enterprise in 2013. In another example, state machine 300 may display revenue growth data 320 for the highest ranked tokens. For example, state machine 300 may generate and display revenue growth data for the state of California.
- valid tokens for state 302 D may include a measure token 310 A, an attribute token 310 B or 310 F, or a value token 310 E.
- Attribute tokens may include a highly ranked STATE token and value tokens may include a highly ranked CALIFORNIA token and a highly ranked OREGON token. Accordingly, state machine 300 may display the STATE, CALIFORNIA, and OREGON tokens in menu 154 in response to moving into state 302 D.
- state machine 300 may display different state names in menu 154 .
- State machine 300 now moves back to state 302 D, and derives and displays more specific revenue growth data for the state of California. For example, state machine 300 may display overall revenue growth data 324 A for California, revenue growth data 324 B for different products sold in California, and display California revenue growth data 324 C for different years.
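The walk through states 302 A to 302 D described above can be sketched as a transition table with no designated end state. The state labels follow the description; the simplified input kinds ("GROWTH OF", "measure", "date", and so on) and the function names are assumptions for this example.

```python
# State labels follow the description; input kinds are simplified assumptions.
TRANSITIONS = {
    "302A": {"GROWTH OF": "302B"},
    "302B": {"measure": "302C"},
    "302C": {"date": "302D"},
    # 302D loops back to itself, so the user can keep refining the search.
    "302D": {"measure": "302D", "attribute": "302D", "value": "302D"},
}

def valid_inputs(state):
    """List the input kinds accepted in the given state."""
    return sorted(TRANSITIONS[state])

def advance(state, kind):
    """Move to the next state, or raise if the input is not valid here."""
    try:
        return TRANSITIONS[state][kind]
    except KeyError:
        raise ValueError(
            f"{kind!r} is not valid in state {state}; "
            f"expected one of {valid_inputs(state)}"
        ) from None

# GROWTH OF REVENUE FOR 2013 CALIFORNIA ends in, and stays in, state 302D.
state = "302A"
for kind in ["GROWTH OF", "measure", "date", "value"]:
    state = advance(state, kind)
```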
- Filters 322 may be updated to show other tokens associated with California.
- filters 322 may include tokens for other cities in the state of California or for other states.
- State machine 300 may automatically update growth data 344 based on selection or de-selection of filters 342 .
- the user may un-check the California filter.
- State machine 300 may automatically change search term 152 back to GROWTH OF REVENUE FOR 2013 and automatically regenerate the growth data 320 previously shown in FIG. 12F .
- the user may select LA filter 322 .
- the search may be modified by entering search terms into search field 150 or by selecting one of filters 322 .
- the search engine automatically displays the associated search terms in search field 150 .
- the search engine displays to the user the correct syntax for switching between the different data 320 and 324 .
- the search engine may automatically uncheck the CALIFORNIA filter and/or the 2013 filter when the user removes CALIFORNIA or 2013 from search term 152 .
- State machine 300 generates query generation instructions (QGI) at different states 302 .
- state 302 B may generate a first QGI instruction 312 A in response to detecting REVENUE in the search term, such as INSTRUCTION 1: ADD COLUMN FOR GROWTH (REVENUE).
- Instruction 312 A specifies the column containing the revenue data and includes a modifier that directs the database to calculate growth for the revenue data.
- State 302 B generates instruction 312 B in response to receiving a date in the search term, such as INSTRUCTION 2: USE COLUMN FOR GROWTH DIMENSION (DATE). Instruction 2 directs the database to compare the growth for the revenue column along a date dimension.
- Instruction 312 C adds a filter to the column where the revenue data is extracted for the value California in the state column.
- instructions 312 are sent from search engine 104 in FIG. 1 to the BI server 108 .
- BI server 108 uses instructions 312 to generate SQL type instructions that access the identified data in database system 106 in FIG. 1 .
- the data generated by database system 106 is displayed in user interface 102 .
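The instruction sequence described above can be sketched as a small builder. This is a minimal illustration only: the function name `build_instructions` is hypothetical, the wording of the first two instructions is taken from the text, and the wording of the filter instruction is an assumption because the text describes instruction 312 C without quoting it.

```python
def build_instructions(measure, date_column=None, filters=()):
    """Return numbered query generation instructions (QGI) for a growth search."""
    # Wording of the first two instructions follows the examples in the text.
    instructions = [f"ADD COLUMN FOR GROWTH ({measure})"]
    if date_column:
        instructions.append(f"USE COLUMN FOR GROWTH DIMENSION ({date_column})")
    # Assumed wording: the text only says a filter is added for a column value.
    for column, value in filters:
        instructions.append(f"ADD FILTER ({column} = {value})")
    return [f"INSTRUCTION {n}: {text}" for n, text in enumerate(instructions, start=1)]


if __name__ == "__main__":
    for line in build_instructions("REVENUE", "DATE", [("STATE", "CALIFORNIA")]):
        print(line)
```

A downstream component, such as the BI server, would then translate these instructions into SQL type statements; that translation is not sketched here.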
- State machine 300 can customize error messages and predict tokens based on the different phrases. For example, in state 302 C state machine 300 can generate a message that directs the user to use the REVENUE token. In state 302 D, state machine 300 may generate more general messages for a variety of possible token types.
- State machine 300 predicts the intent of the user and obtains data for the predicted inputs. For example, the user may enter the search term REVENUE. Since REVENUE is a measure, state machine 300 may predict that the user would like the measure values summed together. State machine 300 also may assume that since REVENUE is a measure, the user may want to view other related attributes and measures, such as revenue for state, city, tax, etc. Accordingly, state machine 300 may suggest STATE (attribute) and/or display revenue data for particular states, such as for CALIFORNIA (value).
- State machine 300 also may predict the intent of the user based on keywords. For example, the user may enter the measure REVENUE followed by a keyword, such as FOR. State machine 300 may predict that the user is searching for revenue data for a particular geographic region or time period and display associated columns in the database, such as city, state, time periods, etc. This allows the user to explore the data within the structured database without having to manually generate SQL instructions.
- the search engine may have a hierarchy of state machines that include, but are not limited to, deterministic finite automata state machines and non-deterministic finite automata state machines.
- the search engine also may use a natural language processor (NLP) to analyze the search term, explain search results, display errors, and make suggestions for correctly entering search terms.
- the search engine also may perform unstructured text searches.
- the enterprise may include unstructured data, such as employee notes, in addition to the structured data in enterprise database 112 .
- the search engine may perform a keyword search with the search term for any other unstructured data that includes the keywords GROWTH, REVENUE and/or CALIFORNIA.
- FIG. 13 depicts an example of how the search engine may associate non-structured user inputs with columns in a relational database system.
- the retrieval system may generate different indexes based on the database tables uploaded from the enterprise relational database or based on any other structured or unstructured data uploaded from the enterprise database. For example, the retrieval system may generate an index 400 that identifies the names of states contained in column 180 .
- the retrieval system may identify three different state names California, Colorado, and Connecticut in column 402 in response to the user entering the letter C into the search field.
- the retrieval system not only associates the letter C with the three state names in column 402 but also may associate the user input C with revenue column 404 via state identifiers 5 , 6 , and 7 .
- the search engine also may anticipate or predict that the user is searching for revenue data for California, Colorado, or Connecticut.
- Index 400 moves into a second logic state after the user enters a second letter A.
- the retrieval system now may anticipate/predict that the user is searching for data related to California.
- the retrieval system may refine the data displayed in the user interface. For example, the retrieval system may display total revenue of $139,993 for California.
- the retrieval system also may display other data associated with California.
- the retrieval system may identify other columns in the database system containing state identifier 5 and display tokens and/or data associated with the identified columns.
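The prefix lookup described for index 400 can be sketched as follows. This is a simplified illustration using plain dictionaries. The state identifiers 5, 6, and 7 and the California total of $139,993 come from the text; assigning identifier 6 to Colorado and 7 to Connecticut, and the two other revenue amounts, are assumptions made only so the example runs.

```python
# Index 400 (simplified): state name -> state identifier.
STATE_INDEX = {"California": 5, "Colorado": 6, "Connecticut": 7}

# Revenue column keyed by state identifier. Only the California total appears
# in the text; the other two amounts are placeholders.
REVENUE_BY_STATE_ID = {5: 139_993, 6: 1_000, 7: 2_000}


def match_prefix(prefix):
    """Return (name, identifier) pairs whose name starts with the typed prefix."""
    p = prefix.lower()
    return sorted((name, sid) for name, sid in STATE_INDEX.items()
                  if name.lower().startswith(p))


def revenue_for_prefix(prefix):
    """Follow the state identifiers into the revenue column."""
    return {name: REVENUE_BY_STATE_ID[sid] for name, sid in match_prefix(prefix)}


if __name__ == "__main__":
    print(match_prefix("C"))         # three candidate states
    print(revenue_for_prefix("CA"))  # narrowed to California
```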
- FIG. 14 depicts another example index 500 generated from columns in database system 106 in FIG. 1 .
- Hundreds of indexes 500 may be generated from database system 106 and for different users and departments within an enterprise.
- Large numbers of tokens 504 may exist in each index 500 .
- a user may enter a first character of a search term into a search field.
- the search engine may identify millions of potential tokens 504 based on the first character.
- the search engine ranks tokens 504 and uses the rankings to predict which tokens 504 the user intends to enter into the search field.
- the search engine may assign each token 504 a score 506 based on any combination of factors. For example, scores 506 may be based on the number of times a particular token 504 is selected by users. Scores 506 also may be based on the number or types of columns, rows, measures, attributes, values, etc. associated with tokens 504 .
- the search engine also may rank tokens 504 based on which type of token is most likely to follow a currently entered search term. For example, a current search term may comprise a measure token. It may be more likely that the next token used in the search term is an attribute type of token. Accordingly, attribute tokens may be ranked higher than measure or value tokens.
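Ranking by likely next token type and by score can be combined in a single sort key. The sketch below is an assumption about how the two signals might be combined: the weight table `NEXT_TYPE_WEIGHT` and the candidate scores are hypothetical, and the text only states that attribute tokens may rank above measure or value tokens after a measure.

```python
# Hypothetical weights: how likely each token type is to follow the previous type.
NEXT_TYPE_WEIGHT = {
    "measure": {"attribute": 3, "value": 2, "measure": 1},
}


def rank_candidates(prev_type, candidates):
    """Sort (token, token_type, score) tuples by type weight, then by score."""
    weights = NEXT_TYPE_WEIGHT.get(prev_type, {})
    return sorted(candidates,
                  key=lambda c: (-weights.get(c[1], 0), -c[2], c[0]))


if __name__ == "__main__":
    candidates = [
        ("TAX", "measure", 90),
        ("STATE", "attribute", 40),
        ("CALIFORNIA", "value", 100),
    ]
    print([token for token, _, _ in rank_candidates("measure", candidates)])
```

With these weights the attribute token is listed first even though its score is the lowest of the three.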
- the search engine may identify a root state 502 A based on the search term entered by the user. For example, the user may enter the search term STATES.
- the search engine may identify root state 502 A and display different tokens 504 within index 500 based on the associated scores 506 .
- Alabama token 504 A may have a score 506 A of 12
- Alaska token 504 B may have a score 506 B of 25
- Arizona token 504 C may have a score 506 C of 75
- California token 504 D may have a score 506 D of 100
- Connecticut token 504 E may have a score 506 E of 8
- Colorado token 504 F may have a score 506 F of 55.
- the search engine may display California token 504 D first, Arizona token 504 C second, Colorado token 504 F third, etc.
- in root state 502 A the user may enter a character, such as the letter C.
- the search engine parses down the right branch of root state 502 A to identify tokens 504 D, 504 E, and 504 F and identifies the associated token scores 506 D, 506 E, and 506 F, respectively.
- the search engine ranks tokens 504 D, 504 E, and 504 F based on the associated scores 506 D, 506 E, and 506 F, respectively, and then displays one or more of the tokens in the ranked order CALIFORNIA, COLORADO, CONNECTICUT.
- tokens 504 may be associated with a particular index 500 .
- token scores 506 may be propagated up index 500 .
- the search engine then only needs to propagate down the branches associated with the highest ranked tokens 504 .
- the search engine may identify the highest ranked token in each separate branch. For example, California token 504 D has the highest score of 100 in the right branch of root state 502 A. The right branch of root state 502 A is assigned the score 100 and the left branch of state 502 D is also assigned the score 100. Colorado token 504 F has the highest score in the right branch of state 502 D and is assigned Colorado token score 55. Arizona token 504 C has the highest token score 506 C in the left branch of root state 502 A. The left branch of root state 502 A is therefore assigned the Arizona token score 75.
- the search engine can quickly determine the highest ranked tokens 504 .
- the search engine may want to display the two top ranked tokens 504 when in root state 502 A.
- the search engine would typically need to parse down each individual branch in index 500 to identify all of the possible tokens 504 and associated token scores 506 , and then display the two tokens 504 with the highest scores 506 .
- the search engine first identifies the right branch of root state 502 A as containing the highest token score. The search engine then determines that the left branch of state 502 D has a higher score of 100 than the token score of 55 for the right branch of state 502 D. Since California token 504 D is the only token in the left branch of state 502 D, the search engine displays California token 504 D as the highest ranked token.
- the search engine determines that the next highest token score for the right branch of state 502 D is 55 and previously determined that the highest token score for the left branch of root state 502 A is 75. The search engine then moves down the left branch of root state 502 A to find the second highest token score. State 502 B indicates the right branch contains the second highest token score. Therefore, the search engine does not need to parse down the left branch of state 502 B. Arizona token 504 C is the only token in the right branch of state 502 B and is accordingly displayed to the user as the second highest ranked token.
- the search engine may generate a priority queue 520 for index 500 .
- the search engine first may identify the scores for each branch in root state 502 A. In this example, only two branches and associated branch scores 100 and 75 are associated with root state 502 A.
- the search engine may parse down to a next level of the branch with the highest score and identify additional branch or token scores.
- state 502 D includes a first branch score 100 and a second branch score 55.
- Operation 522 B adds branch score 55 as a current third highest branch score in index 500 .
- the search engine parses down a next level of index 500 for the branch with the highest branch score.
- the search engine moves down the left branch of state 502 D.
- the left branch of state 502 D has no additional branches and terminates at California token 504 D. Accordingly, the search engine identifies California token 504 D as the highest ranked token for index 500 .
- priority queue 520 remains with branch scores 75 and 55.
- the search engine in operation 522 E moves down the right branch in state 502 B associated with the next highest branch score 75.
- the right branch of state 502 B has no additional branches and terminates at Arizona token 504 C.
- the search engine in operation 522 F identifies Arizona token 504 C as the next highest ranked token for index 500 . Operations 522 continue until all, or some specified number of, tokens in index 500 have been ranked.
- the search engine displays tokens 504 based on the rankings in priority queue 520 .
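The priority queue walk described above can be sketched as a best-first traversal in which each branch carries the highest token score beneath it. This is a minimal sketch, not the patented implementation: the `Node` class, the function `top_tokens`, and the name `state_502c` for the unnamed state holding the Alabama and Alaska tokens are assumptions. The token scores are the ones listed in the text.

```python
import heapq
from itertools import count


class Node:
    """A leaf holds a token and its score; an inner state holds child branches."""

    def __init__(self, token=None, score=0, children=()):
        self.token = token
        self.children = list(children)
        # Propagate scores upward: a branch score is the best score below it.
        if token is not None:
            self.score = score
        else:
            self.score = max(child.score for child in self.children)


def top_tokens(root, k):
    """Return the k highest scoring tokens without visiting every branch."""
    tie = count()  # tie-breaker so heapq never compares Node objects
    heap = [(-root.score, next(tie), root)]
    ranked = []
    while heap and len(ranked) < k:
        neg_score, _, node = heapq.heappop(heap)
        if node.token is not None:
            ranked.append((node.token, -neg_score))
            continue
        for child in node.children:
            heapq.heappush(heap, (-child.score, next(tie), child))
    return ranked


# Index 500 as described in the text (state_502c is an assumed name).
state_502c = Node(children=[Node("Alabama", 12), Node("Alaska", 25)])
state_502b = Node(children=[state_502c, Node("Arizona", 75)])
state_502e = Node(children=[Node("Connecticut", 8), Node("Colorado", 55)])
state_502d = Node(children=[Node("California", 100), state_502e])
root_502a = Node(children=[state_502b, state_502d])

if __name__ == "__main__":
    print(top_tokens(root_502a, 3))
```

Requesting the top two tokens never expands the branch holding Alabama and Alaska, or the branch holding Connecticut and Colorado, which is the saving the propagated branch scores are meant to provide.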
- a user may enter a search term that moves the search engine into root state 502 A.
- the search engine may automatically display the three top ranked tokens in index 500 .
- the search engine displays California token 504 , Arizona token 504 , and Colorado token 504 F. The user then may add the letter A to the search term.
- the search engine may display Arizona token 504 , Alaska token 504 , and Alabama token 504 in order as the three top ranked tokens that begin with the letter A.
- the search engine may assign security bitmasks 540 to tokens 504 .
- Security bitmasks 540 may include different bits associated with different user groups. For example, a first bit may be associated with a management group, a second bit may be associated with a human resources group, and a third bit may be associated with a sales group. These of course are just examples and any number of bits may be associated with any number of different groups.
- the search engine may set token security bitmasks 540 based on which groups of users are authorized to view the tokens and/or the associated data. For example, the search engine may set token bitmask 540 A so only management and sales groups can view California token 504 D and any associated data. The search engine may set token security bitmask 540 B so only the management group can view Arizona token 504 C and any associated data.
- different bits in security bitmasks 540 may be associated with different geographic regions. For example, different bits may be associated with different cities, states, countries, sales regions, etc. The geographic bits may limit some users to only viewing tokens and associated data for particular regions. For example, a salesman assigned to an Oregon sales region may be prevented from viewing sales data for other states.
- FIG. 17 shows how the security bitmaps may determine which tokens are displayed to users.
- Branch security bitmaps 542 may be generated based on token security bitmaps 540 .
- the left branch of state 502 E and the right branch of index state 502 E each comprise one token. Accordingly, left branch security bitmap 542 E and token security bitmap 540 F are the same, and right branch security bitmap 542 F and token security bitmap 540 C are the same.
- a next higher level branch security bitmap 542 D combines lower level branch security bitmaps 542 E and 542 F.
- branch security bitmaps 542 E and 542 F are logically OR'd together to derive the upper level branch security bitmap 542 D.
- root level branch security bitmap 542 B combines branch security bitmaps 542 C and 542 D.
- a user may be assigned a user security bitmap 544 .
- the user may enter the information retrieval system using a login identifier and password.
- the login identifier may be associated with a user profile that associates the user with certain groups and/or geographic regions.
- the user may belong to the sales group and the search engine sets the third bit in user security bitmap 544 . If authorization were controlled for different geographical regions, the search engine may set bits in user security bitmap 544 associated with the sales regions where the user has sales responsibilities.
- the user may enter a search term 546 , or enter a character for search term 546 , into the user interface.
- the user may enter a character C for search term 546 .
- the search engine parses down the right branch of root state 502 A.
- California token 504 D is the highest ranked token in index 500 but can only be viewed by the management group since only the first bit is set in branch security bitmap 542 C. Since the user only belongs to the sales group, the search engine does not parse down the left branch of state 502 D and does not display California token 504 D.
- the sales group is allowed to view certain tokens and associated data in the right branch of state 502 A.
- the search engine determines from branch security bitmap 542 E that the user is authorized to view the token in the left branch of state 502 E. Based on branch security bitmap 542 E, the token score for token 504 E, and the other weightings and rankings described above, the search engine may display Connecticut token 504 E and display the data associated with token 504 E, such as sales figures for the state of Connecticut.
- the right branch of state 502 E can only be viewed by users in the management group. Therefore, the search engine also does not display Colorado token 504 F or any data associated with Colorado token 504 F even though Colorado token 504 F has a higher token score than Connecticut token 504 E.
- the left branch of root state 502 A also can only be viewed by users in the management group.
- the user while in root state 502 A may enter the character A as search term 546 .
- the search engine does not display any tokens that reside in the left branch of root state 502 A.
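The bitmap checks described above can be sketched by OR-ing token bitmaps up the branches and skipping any branch that shares no bit with the user. This is an illustrative sketch only. The bit order (management, human resources, sales) follows the text, while the class and function names and the specific bitmap assigned to each token are assumptions chosen to reproduce the sales-user walkthrough for the letter C.

```python
MANAGEMENT, HUMAN_RESOURCES, SALES = 0b001, 0b010, 0b100


class Node:
    def __init__(self, token=None, score=0, mask=0, children=()):
        self.token = token
        self.score = score
        self.children = list(children)
        # A branch bitmap is the logical OR of the bitmaps beneath it.
        self.mask = mask
        for child in self.children:
            self.mask |= child.mask


def visible_tokens(node, user_mask):
    """Collect tokens the user may see, pruning branches with no shared bit."""
    if (node.mask & user_mask) == 0:
        return []  # no need to parse further down this branch
    if node.token is not None:
        return [(node.token, node.score)]
    hits = []
    for child in node.children:
        hits.extend(visible_tokens(child, user_mask))
    return hits


def ranked_visible_tokens(root, user_mask):
    return sorted(visible_tokens(root, user_mask), key=lambda hit: -hit[1])


# Right branch of the root state (tokens beginning with C); bitmaps are assumed.
state_502e = Node(children=[
    Node("Connecticut", 8, MANAGEMENT | SALES),
    Node("Colorado", 55, MANAGEMENT),
])
state_502d = Node(children=[Node("California", 100, MANAGEMENT), state_502e])

if __name__ == "__main__":
    print(ranked_visible_tokens(state_502d, SALES))
    print(ranked_visible_tokens(state_502d, MANAGEMENT))
```

A sales user sees only the Connecticut token even though California and Colorado score higher, and a user whose bitmap shares no bit with the branch bitmap causes the whole branch to be skipped.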
- user security bitmap 544 may have the first management bit set and have another geographic bit set associated with the state of Colorado.
- Colorado token 504 F may have an associated geographic bit set in token security bitmap 540 C.
- the search engine may have entered root state 502 A in response to the user entering REVENUE as search term 546 . Since the user is specifically authorized to view data associated with Colorado token 504 F, the search engine may automatically display revenue figures for Colorado in response to the user simply entering the search term REVENUE. In other words, the matching geographic regions in the security bitmaps may produce a highest token ranking for Colorado token 504 F.
- user security bitmap 544 may include bits associated with different revenue levels. For example, a bit in user security bitmap 544 may enable the user to view financial transactions over $1 million. However, the user may not have authorization to view transactions over $1 million and therefore the search engine does not set the $1 million revenue level bit in user security bitmap 544 .
- the search engine may not access or display column rows with revenue values greater than $1 million.
- the search engine also may not access or display names, tokens, attributes, measures, values, etc. associated with any of the rows with values greater than $1 million.
- the search engine may only display tokens with associated column rows having values less than $1 million.
- the following three search queries may be received by the search engine.
- the information retrieval system may take into account all three queries when building global indexes 550 A. For example, token scores in global indexes 550 A may be adjusted based on the tokens identified in query 1, query 2, and query 3.
- the information retrieval system may add a revenue token and a state token to a first user index 550 C associated with user 1 in response to receiving query 1. If the revenue and/or state tokens already exist in the first user index 550 C, the associated token scores may be increased.
- the search engine may adjust a second user index 550 C associated with user 2 in response to query 2, and adjust a third user index 550 C associated with user 3 in response to query 3.
- the information retrieval system may add a TAX token to the second user index.
- the information retrieval system may adjust a first department index 550 B associated with a first department of an enterprise in response to queries 1 and 2. For example, user 1 and user 2 may both be associated with the marketing department of the enterprise.
- the information retrieval system may add tokens or adjust token scores in department index 550 B based on the tokens in query 1 and query 2.
- the search engine may adjust a second department index 550 B associated with a second department of the enterprise in response to query 3.
- indexes 550 may need to be updated whenever new data is added to database system 106 in FIG. 1 .
- the information retrieval system may build incremental indexes 550 D to avoid continuously rebuilding large indexes, such as global indexes 550 A and/or department indexes 550 B.
- rows may be added to a customer column that includes names for 100 new customers.
- the information retrieval system may need to add 100 new tokens to global index 550 A for the new customer data. Instead of rebuilding global index 550 A, the information retrieval system may add the 100 new tokens to an incremental index 550 D associated with the global index 550 A.
- the search engine uses both global index 550 A and incremental index 550 D to identify the customer tokens.
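The pairing of a large index with a small incremental index can be sketched as a two-layer lookup. This is a minimal sketch under stated assumptions: the class name `LayeredIndex`, the customer names, and the `compact` step that later folds the incremental tokens into the large index are hypothetical and not described in the text.

```python
class LayeredIndex:
    """A large base index plus a small incremental index for new tokens."""

    def __init__(self, base_tokens):
        self.base = dict(base_tokens)  # expensive to rebuild, changed rarely
        self.incremental = {}          # cheap to update as new rows arrive

    def add(self, token, score=0):
        # New tokens go to the incremental index; the base index is untouched.
        self.incremental[token] = score

    def lookup(self, prefix):
        # Searches consult both indexes and rank the combined hits by score.
        merged = {**self.base, **self.incremental}
        p = prefix.lower()
        hits = [(t, s) for t, s in merged.items() if t.lower().startswith(p)]
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

    def compact(self):
        # Assumed maintenance step: merge during an off-peak rebuild.
        self.base.update(self.incremental)
        self.incremental.clear()


if __name__ == "__main__":
    index = LayeredIndex({"Acme": 5, "Borealis": 2})
    index.add("Apex", 1)
    print(index.lookup("a"))
```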
- FIG. 19 depicts an example of how indexes may be divided over different servers.
- a subset of each global index 550 A and/or department index 550 B may be distributed over different servers 570 A, 570 B, and 570 C.
- server 570 A may include a subset of indexes 550 containing tokens starting with letters A-H
- server 570 B may include a subset of indexes 550 containing tokens starting with letters I-P
- server 570 C may include a subset of indexes 550 containing tokens starting with letters Q-Z.
- a user may enter the search term CALIFORNIA.
- the search engine may send the search term to server 570 A. Search terms with no particular token characters may be sent in parallel to all three servers 570 A, 570 B, and 570 C. For example, the search engine may send the search term REVENUE to all three servers 570 A, 570 B, and 570 C. The highest ranking tokens from among all of indexes 550 A and 550 B are displayed in the user interface.
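Routing by leading letter and merging the per-server results can be sketched as follows. The letter ranges come from the text; the server keys, the function names, and the sample scores for Iowa and Texas are assumptions. The sketch also assumes that a term such as REVENUE leaves the characters of the next token unknown, which is modeled here as an empty partial token.

```python
import heapq

# Letter ranges from the text; the dictionary keys are illustrative names.
SERVER_RANGES = {
    "server_570a": ("A", "H"),
    "server_570b": ("I", "P"),
    "server_570c": ("Q", "Z"),
}


def route(partial_token):
    """Pick the server for the typed characters, or fan out to every server."""
    first = partial_token[:1].upper()
    if not first:
        return list(SERVER_RANGES)
    matches = [server for server, (low, high) in SERVER_RANGES.items()
               if low <= first <= high]
    return matches or list(SERVER_RANGES)


def merge_top(results_by_server, k):
    """Combine (token, score) hits from all servers and keep the k best."""
    all_hits = [hit for hits in results_by_server.values() for hit in hits]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(route("CALIFORNIA"))
    print(route(""))
    print(merge_top({"server_570a": [("California", 100)],
                     "server_570b": [("Iowa", 10)],
                     "server_570c": [("Texas", 60)]}, 2))
```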
- indexes may include token prefixes and/or suffixes that detect incorrectly spelled search terms or sub-portions of search terms.
- server 570 A may contain a primary index with the correct token spellings and servers 570 B and 570 C may contain the prefix and suffix indexes with the partial or incorrect token spellings.
- the search engine may have to send the search term to the different indexes 550 on different servers 570 A, 570 B, and/or 570 C.
- the search engine may first send the search term to server 570 A containing the primary index with the correct token spelling.
- the search engine may limit the number of displayed tokens. For example, the search engine may only display ten tokens. If the primary index with the correct token spellings on server 570 A identifies ten tokens, the search engine might not search the other prefix and suffix indexes on servers 570 B and 570 C. If the primary index does not produce ten results, the search engine may send the search term to the other prefix and suffix indexes located on servers 570 B and 570 C.
- the search engine may terminate the search on servers 570 B and 570 C when no substring match is identified after some predetermined time period, such as 10 milliseconds. However, the search engine may need to identify some number of tokens from the primary index on server 570 A regardless of the required amount of time.
- the user may incorrectly type in the search term ROVENUE.
- the search engine may try dropping each one of the letters, such as rvenue, ovenue, roenue, rovnue, etc.
- the search engine then may try replacing each letter with other letters, such as aovenue, bovenue . . . ravenue, rbvenue, etc.
- the search engine then may insert characters between each existing character, such as arovenue, brovenue, . . . raovenue, rbovenue, etc.
- the search engine may apply the modified search terms into a bloom filter that determines if any of the candidate search terms exist in indexes 550 .
- the bloom filter may reduce the number of candidate search terms, such as down to ten candidate search terms.
- the ten candidate terms then may be submitted to the primary index. If four matching tokens are identified in the primary index, the search engine may use state machine rankings and other rankings described above to determine which matching tokens are displayed to the user.
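The candidate generation and bloom filter steps can be sketched as follows. This is a simplified sketch: the `BloomFilter` class is a small illustrative implementation built on SHA-256 from the standard library rather than any particular production filter, and the filter size, hash count, and sample token set are assumptions.

```python
import hashlib
import string


def edits(term):
    """Candidates made by dropping, replacing, or inserting one character."""
    term = term.lower()
    letters = string.ascii_lowercase
    deletes = {term[:i] + term[i + 1:] for i in range(len(term))}
    replaces = {term[:i] + c + term[i + 1:]
                for i in range(len(term)) for c in letters}
    inserts = {term[:i] + c + term[i:]
               for i in range(len(term) + 1) for c in letters}
    return (deletes | replaces | inserts) - {term}


class BloomFilter:
    """Compact membership test: may give false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def correct(term, bloom, primary_index):
    """Filter candidates cheaply, then confirm survivors in the primary index."""
    candidates = [c for c in edits(term) if bloom.might_contain(c)]
    return sorted(c for c in candidates if c in primary_index)


if __name__ == "__main__":
    primary_index = {"revenue", "tax", "state"}
    bloom = BloomFilter()
    for token in primary_index:
        bloom.add(token)
    print(correct("ROVENUE", bloom, primary_index))
```

Because a bloom filter can report false positives, the surviving candidates are still confirmed against the primary index before being ranked and displayed.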
- FIG. 20 depicts one example of how the search engine may provide selectable join paths.
- a table 600 in the database system identifies amounts for different transactions. The transactions may have different dates, such as an order date, a commit date, and a ship date.
- a second table 602 may identify dates associated with different holidays. For example, December 25 and January 1 are identified as holidays and December 5 and January 2 are identified as non-holidays.
- a system administrator may have created links 603 A, 603 B, and 603 C between the order, commit, and ship date columns, respectively, and table 602 .
- search term 152 may be considered ambiguous since three different values may exist in table 600 .
- the result for AMOUNT IS HOLIDAY is zero based on the commit date. However, the result for AMOUNT IS HOLIDAY would be $100 based on the order date and $200 based on the ship date.
- the search engine determines that search query 152 is ambiguous and displays a message 610 asking how to map the order table 600 to holiday table 602 .
- the search engine may display the different date columns of table 600 in menu 614 , display some of the different associated amounts in menu 612 , and display some of the associated values from holiday table 602 in menu 616 .
- the search engine may select a default date column and highlight the associated column name and associated values in menus 612 , 614 , and 616 . For example, the search engine highlights the amount $100 in menu 612 , the order date column name in menu 614 , and the true holiday value in menu 616 .
- the search engine may display other amounts or other column names in response to other selections in menus 612 , 614 , and/or 616 .
- the search engine may show the amount for the commit date column as $725 and show the holiday value as false in menu 616 . If the ship date column name in menu 614 is selected, the search engine may highlight the amount of $200 in menu 612 and highlight the true value in menu 616 .
- the user may select a confirm icon 618 to confirm the selection of different items in menus 612 , 614 , and 616 .
- the search engine may not query the user and/or display menus 612 , 614 , and 616 when search term 152 is unambiguous. For example, the user may enter the search term AMOUNT BY COMMIT DATE. The search term may generate one definite answer from table 600 . Therefore, no other selectable join paths need to be displayed in menus 612 , 614 , and/or 616 .
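The ambiguity check can be sketched as counting the candidate join paths that remain after the search term is examined. This is a rough sketch under assumptions: the `DATE_LINKS` mapping stands in for links 603 A, 603 B, and 603 C, the target name `holiday.date` is hypothetical, and matching column names by substring is a simplification of the token matching described elsewhere in the text.

```python
# Stand-in for links 603 A-C: date columns of table 600 linked to table 602.
# The target name "holiday.date" is a hypothetical label.
DATE_LINKS = {
    "order date": "holiday.date",
    "commit date": "holiday.date",
    "ship date": "holiday.date",
}


def candidate_join_paths(search_term):
    """Return the date columns that could join the two tables for this term."""
    term = search_term.lower()
    named = [column for column in DATE_LINKS if column in term]
    # If the term names a date column, only that path applies; otherwise all do.
    return named or list(DATE_LINKS)


def is_ambiguous(search_term):
    return len(candidate_join_paths(search_term)) > 1


if __name__ == "__main__":
    print(is_ambiguous("amount is holiday"))        # several possible paths
    print(candidate_join_paths("AMOUNT BY COMMIT DATE"))
```

When more than one path remains, the search engine would prompt the user as described for message 610; when a single path remains, no prompt is needed.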
- FIG. 22 shows another example of how the search engine provides user selectable join paths.
- a table 620 in the database system may identify persons who referred someone for an employment position (referrers) and persons who were referred by the referrers (referees).
- a table 622 may contain the names of the referrers and referees and include identifiers for a current employer and a previous employer of the referrers and referees.
- Table 624 contains the names of the current and previous employers. For example, tables 620 and 622 indicate that Jones referred Smith. Tables 622 and 624 identify the current employer of Jones as Software, Inc. and the previous employer of Jones as Eagle, Corp.
- the search engine determines search term 152 is not ambiguous since one list can be provided from tables 622 and 624 . Accordingly, the search engine displays a list 626 of employee names from table 622 and associated current employers from table 624 .
- the search engine may display all names in table 622 for the first search term name and all names in table 622 for the second search term name. However, these search results may not provide useful information.
- the search engine may display windows 630 A and 630 B in response to the user clicking the first or second search term NAME, respectively. Windows 630 may ask the user if they would like to change a current table mapping. The search engine may identify other related column names in response to selection of window 630 A or window 630 B.
- the referrer and referee columns of table 620 may be linked to the employee identifiers in table 622 .
- the search engine may identify the names of the referrer and referee columns of table 620 in menus 632 A and 632 B. Selecting the referrer name in menu 632 A may cause the search engine to map the first NAME in search term 152 with the referrer column in table 620 . Selecting the referee name in menu 632 B may cause the search engine to map the second NAME in search term 152 with the referee column in table 620 . Accordingly, the search engine displays a list 636 of referrer names and associated referee names. Thus, the search engine allows the user to selectively map search terms to columns in different tables.
- FIG. 24 depicts another example where the user enters the search term COMPANY COMPANY.
- the search engine may display change mapping windows 630 A and 630 B in response to the user selecting the first or second COMPANY search term, respectively.
- the search engine may display different columns that are directly or indirectly linked with company name table 624 in FIG. 23 .
- the current employer and previous employer columns in table 622 may be directly linked to the company identifier column in company table 624 .
- the referrer and referee columns in table 620 are indirectly linked to company table 624 through employee table 622 .
- the search engine may identify the different columns in tables 620 , 622 , and 624 that can be joined with the two COMPANY tokens in search term 152 .
- sixteen different combinations may exist between referrer, referee, current employer, and previous employer.
- the first item in menu 632 A may identify the join path between the referrer column in table 620 and the current employer column in table 622 .
- the second item in menu 632 A may identify the join path between the referrer column in table 620 and the previous employer column in table 622 .
- the search engine maps the first COMPANY token in search term 152 with the referrer column in table 620 , the associated entries in the current employer column of table 622 , and the associated company names in table 624 .
- the search engine maps the second COMPANY token in search term 152 with the referee column in table 620 , the associated entries in the current employer column of table 622 , and the associated company names in table 624 .
- the search engine displays a list 636 A of current employers for employees that are identified as referrers in table 620 .
- the search engine also displays an associated list 636 B of current employers for the associated referees identified in table 620 .
- the search engine allows users to dynamically assign join paths to search terms/tokens 152 .
- the join paths may identify a root table and other selected paths/links from the root table to other tables.
- the possible join paths are displayed in a drop down menu in response to the user selecting search terms/tokens 152 in search field 150 .
- the user may dynamically change current join paths by selecting different column names displayed in the drop down menus.
- the search engine may identify new join paths as additional tokens are added to search term 152 . For example, the user may add the token AMOUNT to search term 152 .
- Another table may be linked to the employee identifiers in table 622 that contains an amount column.
- the search engine may display additional join paths between the amount column in the new table and the columns in tables 620 , 622 , and 624 .
- FIG. 25 depicts an example of how the search engine may rank join paths.
- the join paths may be ranked based on usage or cardinality similar to the ways described above for tokens.
- the user may enter the search term AMOUNT NAME. Users may have selected the sales agent column in table 640 more than the customer column in table 640 . Based on usage, the search engine may rank the sales agent column higher than the customer column. Accordingly, the search engine may display the sales agent column name above the customer column name in menu 632 .
- the search engine also may rank join paths based on the number and level of join paths between a root table and other tables. For example, the search engine may have selected orders table 640 as a root table for search term 152 .
- a direct join path/link 643 links the tokens in search term 152 with the customer and sales agent columns in table 640 .
- Secondary join path/links 644 A-C link the NAME token in search term 152 to a boss identifier column in table 642 .
- the search engine may rank the customer and sales agent columns associated with search term 152 through direct link 643 higher than the boss column in table 642 associated with search term 152 through secondary links 644 A-C.
- the search engine accordingly displays the sales agent column and customer column names above the boss column name in menu 632 .
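- The two ranking criteria described for FIG. 25 may be sketched as a single sort. This is an assumption-laden illustration: the description does not say how usage and join depth are combined, so the sketch arbitrarily orders by number of links from the root table first and by prior usage second. The `JoinPath` class, the hop counts, and the usage counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPath:
    column: str   # column name shown in the menu
    hops: int     # links between the root table and the column (0 = direct link)
    usage: int    # how often users previously selected this path


def rank_join_paths(paths):
    """Fewer hops first; among paths with equal hops, higher usage first."""
    return sorted(paths, key=lambda p: (p.hops, -p.usage))


menu = rank_join_paths([
    JoinPath("boss", hops=3, usage=50),         # reached through secondary links
    JoinPath("customer", hops=0, usage=12),     # direct link
    JoinPath("sales agent", hops=0, usage=40),  # direct link, selected more often
])
print([p.column for p in menu])
```

- With these made-up numbers the menu order is sales agent, customer, boss, which matches the ordering described for menu 632.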
- FIG. 26 shows an example of how the search engine may handle multiple root tables.
- a user may enter the search term BUDGET AMOUNT EXPENSE AMOUNT.
- the search engine may identify a budget table 650 as the root table for the BUDGET AMOUNT portion of search term 152 and may identify an expense table 652 as the root table for the EXPENSE AMOUNT portion of search term 152 .
- the search engine may derive the total budget amount from root table 650 as $300 and derive the total expense amount from root table 652 as $270.
- search term 152 now comprises BUDGET AMOUNT EXPENSE AMOUNT PERSON.
- Each table 650 and 652 may have multiple columns associated with the PERSON search term.
- BUDGET AMOUNT may be associated with a buyer column or a signer column in table 650 .
- the EXPENSE AMOUNT search term may be associated with a buyer column and a signer column in table 652 .
- the search engine may display menu 632 A querying the user to select either the buyer or signer column in table 650 for mapping to the BUDGET AMOUNT in search term 152 and display menu 632 B querying the user to select either the buyer or signer column in table 652 for mapping to the EXPENSE AMOUNT in search term 152.
- the search engine may select one join path between table 650 and table 654 as a default and one join path between table 652 and table 654 as a default.
- a system administrator may create links that link columns in different tables together. The user then selects between the different links to create the different join paths between the search terms and the table columns. However, links may not exist between certain tables.
- the search engine may identify currently unlinked tables that may have related columns and enable the user to then create links and associated join paths between the related columns.
- the search engine may determine that the buyer and signer identifiers in table 650 have similarities with the person identifiers in table 654. For example, the values in tables 650 and 654 may use similar numbers or string values.
- the search engine may display an icon 658 that allows the user to create new links between tables 650 and 654 .
- the search engine may display a message asking the user if they would like to link budget table 650 to person table 654 .
- the search engine creates links 660 A and 660 B mapping the buyer and signer columns in table 650 with the person identifier column in table 654 .
- the search engine may display the names in table 654 associated with the buyer and signer columns of table 650 .
- the search engine also may give the user the option to select link names. For example, the user may assign the name AUTHORIZED SIGNER to link 660 B. The name AUTHORIZED SIGNER would then appear in menu 632 A.
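- One simple way to detect that unlinked columns "have similarities" is to measure how many of a column's distinct values also appear in a candidate key column. The following is a minimal sketch under that assumption; the description does not specify the similarity test, and the 0.8 threshold, the function names, and the sample rows are hypothetical.

```python
def overlap_ratio(candidate_values, key_values):
    """Fraction of distinct candidate values that also appear in key_values."""
    candidate = set(candidate_values)
    if not candidate:
        return 0.0
    return len(candidate & set(key_values)) / len(candidate)


def suggest_links(table, key_column_values, threshold=0.8):
    """Names of columns whose values mostly match the key column."""
    return [
        name for name, values in table.items()
        if overlap_ratio(values, key_column_values) >= threshold
    ]


# Hypothetical contents for budget table 650 and the identifier column of person table 654.
budget_650 = {
    "buyer": [1, 2, 2, 3],
    "signer": [3, 3, 4, 1],
    "amount": [100, 120, 50, 30],
}
person_ids_654 = [1, 2, 3, 4, 5]

print(suggest_links(budget_650, person_ids_654))
```

- Here the buyer and signer columns are suggested as link candidates while the amount column is not, which is the point at which a prompt such as icon 658 could be shown to the user.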
- FIG. 27 depicts an example of how the search engine may create a worksheet.
- table 620 contains referrer and referee identifier columns and table 622 contains the names associated with the referrer and referee identifiers in table 620 and also contains current and previous employer identifiers.
- Table 624 contains the names associated with the current and previous employer identifiers in table 622 .
- the tables may have been previously linked together and the search engine may display different menus showing selectable join paths between the search terms and the different columns in tables 620, 622, and 624.
- the search engine may display menu 632 showing different join paths for associating with the first COMPANY in search term 152 .
- FIG. 28 shows how the search engine dynamically creates a worksheet based on the user selected join paths.
- the user may not want to continuously map search terms with different join paths.
- the search engine allows the user to create a worksheet table 680 that contains the selected table entries for the selected join paths. For example, the user may select the referrer column name in menu 632 A, the referee column name in menu 632 B, the referrer current company name in menu 632 C, and the referee current company name in menu 632 D.
- These user selections cause the search engine to map the first NAME in search term 152 to the referrer column in table 620 , map the second NAME in search term 152 to the referee entries in table 620 , map the first COMPANY in search term 152 to the referrer current company entries in table 622 , and map the second COMPANY entry in search term 152 to the referee current company entries in table 622 .
- In response to the user selecting icon 678, the search engine generates a worksheet table 680 that contains the user selected table mappings. For example, the search engine adds all the referrer names identified in table 620 to a first column of worksheet 680 and adds the referee names associated with the referrer names identified in table 620 to a second column of worksheet 680. The search engine creates a third column in worksheet table 680 containing the current company names associated with the referrer names and adds a fourth column containing the current company names associated with the referee names.
- the search engine may link any subsequent searches using all or part of search term 152 to worksheet table 680 .
- Worksheet 680 may be published to any defined group of users and may be assigned any selectable name. Any new data added to any of tables 620, 622, or 624 in FIG. 27 is automatically added to worksheet 680. For example, any new referrer names added to tables 620 and 622 are automatically added to worksheet 680 as well as any associated referee names, current company names, or previous company names.
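- The four-column worksheet and its automatic updates may be approximated with a relational view over the three linked tables. This is only a sketch using Python's standard `sqlite3` module; the description does not state that worksheets are implemented as views, and the table names, column names, and sample rows are hypothetical stand-ins for tables 620, 622, and 624.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT);   -- stands in for table 624
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                       current_employer INTEGER,
                       previous_employer INTEGER);           -- stands in for table 622
CREATE TABLE referral (referrer INTEGER, referee INTEGER);   -- stands in for table 620

-- Stands in for worksheet 680: one column per user selected join path.
CREATE VIEW worksheet AS
SELECT r1.name AS referrer_name,
       r2.name AS referee_name,
       c1.name AS referrer_company,
       c2.name AS referee_company
FROM referral f
JOIN employee r1 ON r1.id = f.referrer
JOIN employee r2 ON r2.id = f.referee
JOIN company  c1 ON c1.id = r1.current_employer
JOIN company  c2 ON c2.id = r2.current_employer;

INSERT INTO company  VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO employee VALUES (10, 'Ann', 1, 2), (11, 'Bob', 2, 1);
INSERT INTO referral VALUES (10, 11);
""")


def worksheet_rows():
    """Current contents of the worksheet, ordered by referrer name."""
    return conn.execute(
        "SELECT * FROM worksheet ORDER BY referrer_name"
    ).fetchall()


print(worksheet_rows())
# A row added to a base table appears in the worksheet without rebuilding it.
conn.execute("INSERT INTO referral VALUES (11, 10)")
print(worksheet_rows())
```

- Because the view is evaluated against the base tables, the second call returns the newly added referral as well, which mirrors the behavior described for new data added to tables 620, 622, or 624.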
- Worksheet table 680 may be assigned the same row level and object level security as other tables as described above. For example, a system administrator may restrict users to only viewing certain rows of worksheet 680 where the user is either the referrer or referee.
- the worksheet 680 may include a column of social security numbers associated with the referrers (not shown). The system administrator may assign a security bitmap to worksheet 680 that prevents certain groups of users from viewing the data in the social security column.
- the links joining columns from different tables together also may have associated security levels.
- the search engine may assign a security bitmap to the links between table 622 and table 624 that prevents certain users from seeing the current company names and previous company names contained in table 624 or contained in worksheet 680 .
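- A security bitmap of the kind described above may be sketched as one bit per user group, with each column carrying the bitmap of groups allowed to view it. The group names, the column names, and the layout of the bitmap are assumptions made for illustration and are not taken from the description.

```python
# Hypothetical group bits; one bit per user group.
GROUP_HR = 0b001
GROUP_SALES = 0b010
GROUP_ADMIN = 0b100

# Assumed layout: for each worksheet column, the bitmap of groups allowed to view it.
COLUMN_ACCESS = {
    "referrer_name": GROUP_HR | GROUP_SALES | GROUP_ADMIN,
    "referrer_company": GROUP_HR | GROUP_SALES | GROUP_ADMIN,
    "referrer_ssn": GROUP_HR | GROUP_ADMIN,
}


def visible_columns(user_groups, column_access=COLUMN_ACCESS):
    """Columns whose access bitmap shares at least one bit with the user's groups."""
    return [col for col, allowed in column_access.items() if allowed & user_groups]


def redact(row, user_groups):
    """Drop the columns the user is not permitted to view."""
    allowed = set(visible_columns(user_groups))
    return {col: val for col, val in row.items() if col in allowed}


row = {"referrer_name": "Ann", "referrer_company": "Acme", "referrer_ssn": "000-00-0000"}
print(redact(row, GROUP_SALES))
print(redact(row, GROUP_HR))
```

- The same bitwise test could be attached to a link rather than a column, so that a user without the required bit never receives values reached through that link.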
- the search engine may generate indexes for the entries in worksheet 680 .
- links may be created between worksheet 680 and tables 620 , 622 , and 624 in FIG. 27 .
- the same names may exist in the first column of worksheet 680 and the name column of table 622 .
- the first column of worksheet 680 may be linked to the indexes previously created for the name column of table 622 .
- Worksheet 680 may only contain a subset of entries from tables 620 , 622 , or 624 . For example, only a subset of names from table 622 may be identified in worksheet 680 .
- the indexes for tables 620 , 622 , or 624 may include bitmaps that identify the subset of entries associated with worksheet 680 . These bitmaps may be similar to the priority bitmaps described above.
- the search engine may receive a search term.
- the search engine may map the search term to worksheet 680 .
- the worksheet may be linked to the indexes for table 622 .
- the indexes for table 622 may include a worksheet bitmap that identifies the different branches/tokens of the indexes associated with worksheet 680 .
- the search engine only parses down the branches of the indexes with bitmap values associated with entries in worksheet 680 .
- a large number of worksheets may create an unwieldy number of associated worksheet bitmaps.
- Each worksheet may be assigned a different bit in the worksheet bitmap.
- the search engine may limit the number of worksheets to restrict the amount of additional processing and data needed for handling the worksheet bitmaps.
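- The worksheet bitmap may be pictured as a bit field stored on each branch of a character index, where bit i is set when some token at or below that branch belongs to worksheet i. The sketch below assumes a plain trie and is not the index structure of the earlier figures; the class names, field names, and sample tokens are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.subtree_bits = 0    # worksheets with a token at or below this node
        self.token_bits = None   # worksheets containing the token ending here; None if no token


class Index:
    def __init__(self):
        self.root = Node()

    def add(self, token, worksheet_bits=0):
        node = self.root
        node.subtree_bits |= worksheet_bits
        for ch in token:
            node = node.children.setdefault(ch, Node())
            node.subtree_bits |= worksheet_bits
        node.token_bits = (node.token_bits or 0) | worksheet_bits

    def search(self, prefix, worksheet_bit):
        """Tokens starting with prefix that belong to the given worksheet."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None or not node.subtree_bits & worksheet_bit:
                return []
        results, stack = [], [(node, prefix)]
        while stack:
            node, text = stack.pop()
            if node.token_bits is not None and node.token_bits & worksheet_bit:
                results.append(text)
            for ch, child in node.children.items():
                # Only parse down branches flagged for this worksheet.
                if child.subtree_bits & worksheet_bit:
                    stack.append((child, text + ch))
        return sorted(results)


WORKSHEET_680 = 0b1  # hypothetical bit assigned to worksheet 680
index = Index()
index.add("ann", WORKSHEET_680)
index.add("anna")
index.add("andy", WORKSHEET_680)
index.add("bob")
print(index.search("an", WORKSHEET_680))
```

- Since every worksheet consumes one bit on every branch, the width of the bit field grows with the number of worksheets, which is one reason a cap on the worksheet count may be useful.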
- FIG. 29 shows an example formulaic search query.
- the search engine allows the user to specify arithmetic operations between different table columns. For example, the user may enter the search term ORDER AMOUNT - CUSTOMER SUPPORT COST.
- the search engine may identify the order amounts in table 690 and the customer costs in table 692 .
- the search engine calculates the differences between the transaction amounts in table 690 and the costs in table 692 and displays the results 694 in electronic page 148.
- the search engine may reduce tables 690 and 692 into a customer profitability table that has the granularity for the smallest identified column. For example, order table 690 may contain thousands of entries but customer support table 692 may only contain a few dozen entries.
- the new table may include only the transactions in table 690 that also have associated cost amounts in table 692.
- the search engine then calculates the profitability from the values in the new table.
- the search engine also may allow the user to select a name for the new table, such as customer profitability.
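- The reduction to a customer profitability table may be sketched as an aggregation of the finer-grained table followed by a subtraction. The sketch assumes that the two tables share a customer key, which the description implies but does not state, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def customer_profitability(orders, support_costs):
    """orders: iterable of (customer, amount); support_costs: {customer: cost}.

    Order amounts are summed per customer, and only customers that also have a
    support cost entry are kept before the subtraction.
    """
    totals = defaultdict(float)
    for customer, amount in orders:
        if customer in support_costs:
            totals[customer] += amount
    return {c: totals[c] - support_costs[c] for c in totals}


# Hypothetical rows standing in for order table 690 and customer support table 692.
orders_690 = [("acme", 100.0), ("acme", 50.0), ("globex", 80.0), ("initech", 40.0)]
support_692 = {"acme": 30.0, "globex": 100.0}

print(customer_profitability(orders_690, support_692))
```

- In this example the orders for initech are dropped because no support cost exists for that customer, and the result has one row per customer rather than one row per order.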
- the user may enter the search term COUNT FLIGHTS (FOR ARRIVAL TIME - DEPARTURE TIME < 3 HOURS).
- the search engine may use a different language for parsing the search terms within the parentheses. For example, the search engine may subtract the departure time from the arrival time for every row in the flights table. The search engine then may compare the subtracted values to a fixed quantity of 3 hours and count the number of flights with a total flight time of less than 3 hours.
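- The row-by-row evaluation described for the flights example may be sketched as follows, using Python's standard `datetime` module. The function name and the sample flights are hypothetical.

```python
from datetime import datetime, timedelta


def count_short_flights(flights, limit=timedelta(hours=3)):
    """Count (departure, arrival) rows whose duration is strictly below limit."""
    return sum(1 for departure, arrival in flights if arrival - departure < limit)


flights = [
    (datetime(2014, 4, 14, 8, 0), datetime(2014, 4, 14, 10, 30)),   # 2 h 30 min
    (datetime(2014, 4, 14, 9, 0), datetime(2014, 4, 14, 12, 0)),    # exactly 3 h
    (datetime(2014, 4, 14, 22, 0), datetime(2014, 4, 15, 0, 45)),   # 2 h 45 min, past midnight
]
print(count_short_flights(flights))
```

- The flight lasting exactly three hours is not counted because the comparison is a strict less-than, so two of the three sample flights qualify.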
- the search engine also may perform arithmetic operations for user defined join paths. For example, the search engine allows the user to select columns in different tables to map together as described above. The search engine then performs the above described arithmetic operations between the user selected table columns.
- FIG. 30 shows a computing device 1000 that may be used for operating the information retrieval system and performing any combination of the information retrieval processes discussed above.
- the computing device 1000 may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- computing device 1000 may be a personal computer (PC), a tablet, a Personal Digital Assistant (PDA), a cellular telephone, a smart phone, a web appliance, or any other machine or device capable of executing instructions 1006 (sequential or otherwise) that specify actions to be taken by that machine.
- computing device 1000 may include any collection of devices or circuitry that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the operations discussed above.
- Computing device 1000 may be part of an integrated control system or system manager, or may be provided as a portable electronic device configured to interface with a networked system either locally or remotely via wireless transmission.
- Processors 1004 may comprise a central processing unit (CPU), a graphics processing unit (GPU), programmable logic devices, dedicated processor systems, micro controllers, or microprocessors that may perform some or all of the operations described above. Processors 1004 may also include, but may not be limited to, an analog processor, a digital processor, a microprocessor, multi-core processor, processor array, network processor, etc.
- Processors 1004 may execute instructions or “code” 1006 stored in any one of memories 1008 , 1010 , or 1020 .
- the memories may store data as well. Instructions 1006 and data can also be transmitted or received over a network 1014 via a network interface device 1012 utilizing any one of a number of well-known transfer protocols.
- Memories 1008 , 1010 , and 1020 may be integrated together with processing device 1000 , for example RAM or FLASH memory disposed within an integrated circuit microprocessor or the like.
- the memory may comprise an independent device, such as an external disk drive, storage array, or any other storage devices used in database systems.
- the memory and processing devices may be operatively coupled together, or in communication with each other, for example by an I/O port, network connection, etc. such that the processing device may read a file stored on the memory.
- Some memory may be “read only” by design (ROM) by virtue of permission settings, or not.
- Other examples of memory may include, but may not be limited to, WORM, EPROM, EEPROM, FLASH, etc., which may be implemented in solid state semiconductor devices.
- Other memories may comprise moving parts, such as a conventional rotating disk drive. All such memories may be “machine-readable” in that they may be readable by a processing device.
- Computer-readable storage medium may include all of the foregoing types of memory, as well as new technologies that may arise in the future, as long as they may be capable of storing digital information in the nature of a computer program or other data, at least temporarily, in such a manner that the stored information may be “read” by an appropriate processing device.
- the term “computer-readable” may not be limited to the historical usage of “computer” to imply a complete mainframe, mini-computer, desktop, wireless device, or even a laptop computer. Rather, “computer-readable” may comprise storage medium that may be readable by a processor, processing device, or any computing system. Such media may be any available media that may be locally and/or remotely accessible by a computer or processor, and may include volatile and non-volatile media, and removable and non-removable media.
- Computing device 1000 can further include a video display 1016, such as a liquid crystal display (LCD) or a cathode ray tube (CRT), and a user interface 1018, such as a keyboard, mouse, touch screen, etc. All of the components of computing device 1000 may be connected together via a bus 1002 and/or network.
Abstract
A search engine performs semantic analysis of unstructured search terms to generate relational database queries. By understanding the semantics, the search engine more effectively identifies and predicts what information the user is searching for and provides more in-depth user assistance. The search engine uses different ranking schemes to identify which data is most likely associated with search terms and provides both row level and object level data security. The search engine may identify ambiguous search requests and allows the user to select different join paths that create different mappings between the data. The search engine may generate worksheets containing the selected mappings.
Description
- The present application is a continuation-in-part of U.S. patent application Ser. No. 13/944,491, Filed Jul. 17, 2013, Entitled: Information Retrieval System which is incorporated by reference in its entirety.
- Enterprises need to access specific business intelligence (BI) information. Business intelligence (BI) is a set of theories, methodologies, processes, architectures, and technologies that may transform raw data into meaningful and useful information for business purposes. BI may handle large amounts of information to help identify and develop new opportunities. Making use of new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.
- Business intelligence software tools attempt to automate the process of translating raw data sitting in relational databases into meaningful information that a business decision maker can use. However, the creation of new reports is often complex and involves several information technology (IT) specialists. A business user first may need to fill out a form that describes what data to provide in a BI report.
- The form is taken to an IT specialist who generates the correct relational database management system (RDBMS) queries and then creates reports from the query results. The process of the business user specifying what data to retrieve and then getting the final report back from the IT specialist can often take several weeks. Existing database solutions may not be fast enough to provide an interactive business user experience when large volumes of data need to be processed.
- Internet search engines are designed to search through documents and help users formulate questions. However, Internet search engines are not aware of the relationships that exist in typical relational databases. Internet search queries only perform simple string matches and do not have the intelligence to perform searches on structured data and correlate/aggregate/filter the search results.
- FIG. 1 depicts an example information retrieval system.
- FIG. 2 depicts an example process flow for the information retrieval system.
- FIG. 3 depicts an example of how data columns are joined in the database system.
- FIG. 4 depicts an example process for generating structured search instructions from ad-hoc user inputs.
- FIGS. 5A-5C depict example inputs entered into the search engine.
- FIGS. 6A-6C depict example auto-completions and explanations performed by the search engine.
- FIGS. 7A-7B depict example error messages generated by the search engine.
- FIG. 8 depicts an example process for ranking and displaying tokens.
- FIG. 9 depicts example indexes having different token weightings.
- FIG. 10 depicts another example process for ranking tokens.
- FIG. 11 depicts an example state machine.
- FIGS. 12A-12G depict example electronic pages displayed based on states identified by the state machine in FIG. 11.
- FIG. 13 depicts an example index used by the search engine.
- FIG. 14 depicts an example index for ranking tokens.
- FIG. 15 depicts an example priority queue.
- FIG. 16 depicts example token security bitmasks.
- FIG. 17 depicts an example index with security bitmasks.
- FIG. 18 depicts an example index structure used by the information retrieval system.
- FIG. 19 depicts an example distributed index structure.
- FIG. 20 depicts an example table mapping in the information retrieval system.
- FIG. 21 depicts an example user selectable table mapping.
- FIG. 22 depicts a user selectable table mapping for a first search term.
- FIG. 23 depicts an alternative user selectable table mapping for a second search term.
- FIG. 24 depicts another alternative user selectable table mapping for a third search term.
- FIG. 25 depicts an example ranking scheme for a user selectable table mapping.
- FIG. 26 depicts a scheme for dynamically linking together different tables.
- FIGS. 27 and 28 depict a scheme for generating a worksheet table.
- FIG. 29 depicts an example scheme for processing formulaic search queries.
- FIG. 30 depicts an example computing device for implementing the information retrieval system.
- A search engine converts unstructured ad-hoc search queries into search instructions that retrieve data from a structured relational database. The search engine allows most users to express search requirements by just typing a few words.
- Tokens are generated based on content, attributes, measures, and other metadata located in the relational database and in any other structured or unstructured data sources. The search engine identifies and displays the tokens in response to user inputs. The tokens may include any word, phrase, set of characters, value, symbols, or the like, or any combination thereof.
- The search engine uses the tokens to guide the user through a selection of search terms that are then used for generating a structured query instruction. The structured query instruction is then used for retrieving data in the database system.
- The search engine may determine and/or predict what tokens and/or data are associated with the user input. For example, the search engine may provide suggestions and display data before the user completes entering a search term or completes the search query. The suggestions help guide the user to relevant data that currently exists in the database system.
- The retrieval system may use a ranking-based scheme to predict user inputs and fetch corresponding results. For example, the search engine may predict the user is looking for sales data based on only a few search characters. The search engine then may suggest search terms for accessing different types of sales data and may initiate a preliminary search to retrieve and display sales data for different sales regions.
- The preliminary search may display the precise information the user was initially searching. In other situations, the preliminary search may display other helpful information that the user did not know existed, that the user would not have normally requested, or that helps the user locate the correct information.
- The search engine may resolve ambiguities between the search input and the available data. For example, the user may enter the input REVENUE BY YEAR. A transaction table may include revenue data based on order date and ship date. The search engine may ask the user whether revenue should be calculated based on the order date or the ship date. The search engine also may provide an explanation of how operations were performed on particular data to generate the results. The search engine also may use a security scheme to restrict what data can be viewed by different users.
- The search engine also may allow users to select mappings between search terms and columns in different tables. The search engine may generate a worksheet table containing the selected mappings. The search engine also may perform user definable formulaic operations on the data in the different user selected table mappings.
- FIG. 1 depicts an example information retrieval system 100 (retrieval system). Information retrieval system 100 may comprise an array of local and/or cloud-based computing and storage devices, such as servers, database systems and/or any other computing logic. The different computing devices and database systems within retrieval system 100 may be coupled together via one or more buses or networks. The buses or networks may comprise local area networks (LANs), wide area networks (WANs), fiber channel networks, Internet networks, or the like, or any combination thereof. A user may use computing device 114 to access retrieval system 100. Computing device 114 may comprise a personal computer, computer terminal, mobile device, smart phone, electronic notebook, or the like, or any combination thereof.
- Relational database system 106 may comprise one or more structured relational databases and/or unstructured databases, such as any database system storing business intelligence (BI) information for an enterprise. The BI information may include sales data, revenue data, profit data, tax data, or the like, or any combination thereof. In one example, database system 106 may comprise dynamic random access memory (DRAM), Flash memory, disc memory, or any other type of data storage medium with relatively fast data access. In one example, some of the data in database system 106 is stored in a column/row structure and assigned additional metadata identifying particular data parameters and relationships. For example, retrieval system 100 may identify different characteristics of the data such as attributes, measures, values, unique identifiers, tags, links, column and row keys, etc.
- Retrieval system 100 may generate tokens based on the data characteristics and associate the tokens with different columns in database system 106. For example, a column in database system 106 may contain revenue data. Retrieval system 100 may generate a token comprising the word REVENUE and load the token into an index that compares user inputs with the letters in the word revenue.
- Search engine 104 may use the indexes and state machines to guide the user into using search terms associated with the data and data structures contained in database system 106. The state machine may identify different states and types of data associated with the search input. For example, some search terms may be identified as attributes or measures.
- The attributes may comprise any data characteristic, category, content, etc. that in one example may be non-quantifiable or non-numeric. In another example, the attributes are drawn from a fixed universe of numeric values, such as age, zip codes, Boolean operators, status codes, or the like, or any combination thereof.
- The measures may comprise quantifiable numeric values such as sizes, amounts, degrees, etc. A first column containing the names of states may be considered an attribute column and a second column containing the numbers of orders received for the different states may be considered a measure column. Search engine 104 then may initiate a search query in response to a user input requesting revenue data (measure column) for a state (attribute column).
- Search engine 104 displays different tokens and/or data associated with current states of the user inputs. Again, the tokens may be associated with the structured tables or columns in database system 106. The tokens help guide the user into using search terms that refer to specific structured data within database system 106. For example, retrieval system 100 may bind the user into using tokens that are associated with data that currently exists within database system 106. This prevents the user from generating search queries for data that does not exist in database system 106 and from using invalid search terms that cannot be linked with the data in database system 106.
- The user also may enter valid search terms that have no corresponding data in database system 106. Retrieval system 100 may generate a response that indicates no data exists for the search query. For example, the search term may request the number of orders received for the month of July in California. The database system may contain columns for orders and California. However, the columns may have no orders for the month of July from California. The retrieval system may generate a response that indicates that no orders were received for the month of July from California.
- In one example, the user accesses user interface 102 via a web browser or a web application. In another example, the user may access a local native application that contacts information retrieval system 100 and search engine 104. The user enters characters into a search field within user interface 102. Search engine 104 compares the characters with the tokens previously generated from database system 106. Search engine 104 may display suggested tokens to the user before the user completes the search query. For example, search engine 104 may display a revenue token immediately after the user enters the letter R. If relevant, the user may select the displayed revenue token. Otherwise, the user may enter additional characters into the search field and search engine 104 may compare the additional characters with the tokens.
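- The character-by-character comparison between the search field and the tokens may be sketched as a prefix lookup over a sorted token list. This is a simplified stand-in for the indexes described later, using the standard `bisect` module; the `TokenIndex` class, its `suggest` method, and the sample tokens are hypothetical.

```python
import bisect


class TokenIndex:
    """Prefix lookup over tokens generated from database column names."""

    def __init__(self, tokens):
        self._tokens = sorted({t.upper() for t in tokens})

    def suggest(self, typed, limit=5):
        """Tokens that start with the characters typed so far."""
        prefix = typed.upper()
        start = bisect.bisect_left(self._tokens, prefix)
        matches = []
        for token in self._tokens[start:]:
            if not token.startswith(prefix) or len(matches) == limit:
                break
            matches.append(token)
        return matches


index = TokenIndex(["revenue", "region", "ship date", "state", "order date"])
print(index.suggest("r"))
print(index.suggest("rev"))
```

- After the single letter R the sketch offers both REGION and REVENUE, and the list narrows to REVENUE once more characters are typed, which mirrors the suggestion behavior described above.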
- Search engine 104 may use ranking algorithms before displaying the tokens to the user. In other words, search engine 104 may try to predict what data in database system 106 is associated with the user inputs. For example, there may be hundreds of different tokens associated with the letter R. Search engine 104 may rank the tokens and initiate a search query based on the ranking.
- Token ranking may be based on usage, table cardinality, or any other criteria. For example, a first token may be associated with a first column in the database system having four values and a second token may be associated with a second column in the database system having hundreds of values. Search engine 104 may rank the first token higher than the second token, since the smaller number of values in the first column may have a higher likelihood of containing the correct data. Search engine 104 also may rank the tokens based on how frequently associated columns have been accessed in prior search queries.
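- A ranking that uses both criteria may be sketched as a sort on usage and cardinality. The description does not say how the two criteria are weighted, so the sketch assumes, arbitrarily, that prior usage is the primary key and lower cardinality breaks ties. The `TokenStats` class and the statistics shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenStats:
    token: str
    cardinality: int   # number of distinct values in the associated column
    usage: int         # times the column was accessed in prior search queries


def rank_tokens(candidates):
    """Most-used first; ties broken by lower cardinality, then alphabetically."""
    return sorted(candidates, key=lambda t: (-t.usage, t.cardinality, t.token))


candidates = [
    TokenStats("REP NAME", cardinality=300, usage=10),
    TokenStats("REGION", cardinality=4, usage=10),
    TokenStats("REVENUE", cardinality=5000, usage=25),
]
ranked = [t.token for t in rank_tokens(candidates)]
print(ranked)
```

- With equal usage, the four-value REGION column outranks the 300-value REP NAME column, as in the cardinality example above, while the more frequently accessed REVENUE column is ranked first.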
- Search engine 104 initiates a search query by sending a search object to a business intelligence server 108 (BI server). Search query and structured search query are used interchangeably below and refer to any combination of text, symbols, operators, commands, instructions, code, or the like, or any combination thereof, used for accessing any type of data in database 106.
- Search queries may be initiated before, during and/or after search terms are entered into the search field. For example, search engine 104 may send a search object to BI server 108 after the user enters a few characters into user interface 102. Search engine 104 may initiate other search queries and send associated search objects to BI server 108 after additional characters are entered for the user search query.
- BI server 108 generates instructions based on the search objects received from search engine 104. In one example, the instructions comprise SQL-like queries. However, other query formats also may be used. Database system 106 sends data back to BI server 108 in response to the search instructions. The search object and associated structured search instructions also may direct database system 106 to perform different operations. For example, the structured search instructions may request database system 106 to join different columns and sum together different data in the columns.
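- The translation from a search object to an SQL-like query may be sketched as follows, using Python's standard `sqlite3` module to run the result. The dictionary layout of the search object, the `build_query` function, and the `sales` table are assumptions made for illustration; the description does not define the format of a search object.

```python
import sqlite3


def build_query(search_object):
    """Turn a hypothetical search object into a SQL string that sums a measure.

    Identifiers are taken from matched tokens, never from raw user text, so
    they are interpolated directly in this sketch.
    """
    table = search_object["table"]
    measure = search_object["measure"]
    group_by = search_object.get("group_by")
    aggregate = f"SUM({measure}) AS total_{measure}"
    if group_by:
        return (f"SELECT {group_by}, {aggregate} FROM {table} "
                f"GROUP BY {group_by} ORDER BY {group_by}")
    return f"SELECT {aggregate} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("CA", 100.0), ("CA", 50.0), ("NY", 70.0)],
)

partial = {"table": "sales", "measure": "revenue"}                    # user typed REVENUE
full = {"table": "sales", "measure": "revenue", "group_by": "state"}  # REVENUE BY STATE

print(conn.execute(build_query(partial)).fetchall())
print(conn.execute(build_query(full)).fetchall())
```

- The first search object yields a single summed value and the second yields one sum per state, which illustrates how a preliminary result may be shown before the user finishes typing the full search term.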
BI server 108 receives data back from database system 106 in response to the structured search instructions and displays the data within user interface 102. As mentioned above, the data may be displayed before the user completes entering a search term. For example, after the user enters the letter R, search engine 104 may identify a revenue token and send an associated search object to BI server 108. BI server 108 may generate structured search instructions based on the search object that retrieve content from database system 106 associated with the revenue token. BI server 108 receives the associated revenue data from database system 106 and displays the revenue data within user interface 102. - A
cluster manager 110 manages the configuration and distribution of processes within retrieval system 100. For example, cluster manager 110 may assign user interface 102, search engine 104, BI server 108, and associated backup processes to different computing devices within retrieval system 100. - 
FIG. 2 depicts in further detail an example of operations performed by information retrieval system 100. A user enters portions of search term 126 into search field 120 of user interface 102. The portions of search term 126 are analyzed by search engine 104. For example, index 128, state machine 130, and a ranking algorithm 132 may analyze each character entered into search field 120. As explained above, based on the data identified in database system 106, search engine 104 may anticipate, predict, suggest, and/or direct the user to available data as the user is entering search term 126. - 
Search engine 104 may generate a search object 134 for different logic states of search term 126. For example, search engine 104 may generate a first search object 134 after detecting the measure search term REVENUE. The search engine 104 may generate a second search object 134 after the user completes entering the entire search term REVENUE BY STATE. In this example, search object 134 comprises pseudo-code, instructions, text or some other type of message 136A that identifies the matching revenue token associated with measure column 146 in database system 106. Instruction 136A also includes an operator indicating the revenue data should be summed together. - In this example,
search object 134 also includes an instruction 136B that identifies the matching state token associated with state attribute column 142 in database system 106. Instruction 136B also includes an operator indicating the revenue data in column 146 should be grouped according to the attributes in state column 142. - 
Search object 134 may include any other information needed for generating structured search query instructions. For example, search engine 104 may include instructions 138 for filtering the data based on an identified search term. For example, search term 126 may have included the value REVENUE BY STATE=CALIFORNIA. Search engine 104 may generate filter instructions 138 in search object 134 such as: - FILTER 1: STATE=CALIFORNIA.
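The search object described above, with a summed measure, a grouping attribute, and an optional filter, might be represented as follows. This is only a minimal sketch: the class names `TokenRef` and `SearchObject`, their fields, and the helper `build_search_object` are assumptions made for illustration and are not the message format used by the described system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TokenRef:
    """One matched token plus the operator applied to it (hypothetical shape)."""
    token: str                      # e.g. "REVENUE"
    kind: str                       # "measure", "attribute", or "value"
    operator: Optional[str] = None  # e.g. "SUM" or "GROUP BY"


@dataclass
class SearchObject:
    """Illustrative stand-in for a search object such as search object 134."""
    instructions: List[TokenRef] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)


def build_search_object(measure, attribute, filter_value=None):
    """Build a search object for a query of the form MEASURE BY ATTRIBUTE[=VALUE]."""
    obj = SearchObject()
    # Instruction for the measure column: sum the values.
    obj.instructions.append(TokenRef(measure, "measure", "SUM"))
    # Instruction for the attribute column: group the measure by it.
    obj.instructions.append(TokenRef(attribute, "attribute", "GROUP BY"))
    # Optional filter, such as STATE=CALIFORNIA.
    if filter_value is not None:
        obj.filters.append(f"{attribute}={filter_value}")
    return obj


search_object = build_search_object("REVENUE", "STATE", "CALIFORNIA")
```

Here `search_object.filters` holds the single filter string `STATE=CALIFORNIA`, mirroring the FILTER 1 example above.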
-
Search object 134 also may include instructions for displaying the retrieved data. For example, a time value may direct the BI server to plot the retrieved data on a time graph. -
BI server 108 operates as a metadata server and as a query delivery server. The metadata server generates relational models that identify the locations and relationships for different tables, columns, and data within database system 106. BI server 108 may generate the relationship models from metadata extracted from the tables, columns, data, and link information in the enterprise database. In another example, a system administrator may generate some of the relationship models. - The query delivery server portion of
BI server 108 converts search object 134 into structured search instructions 140, sends instructions 140 to database system 106, and delivers the data retrieved from database system 106 to user interface 102. Structured search instructions 140 are based on the table, column, and/or data locations and relationships identified in the relational models. For example, the relational models in BI server 108 may indicate that revenue column 146 is located in a table T1 within database system 106 and state column 142 is located within a table T2 of database system 106. BI server 108 generates structured search instructions 140 that reference the tables T1 and T2 containing columns 146 and 142. - 
Search engine 104 may include relational models for determining when queries can be processed. For example, search engine 104 may identify valid tokens that are associated with columns in database system 106. Search terms that do not correspond with valid tokens may be identified as invalid search requests. - In another example, the user may enter the term REVENUE COUNTRY as the query. The search engine may determine the query is invalid because revenue comes from table T1 and country comes from table T2 and there is no relationship that connects table T1 and table T2. Thus, the search engine uses an awareness of the relationships between data elements to improve efficiency and effectiveness of user search queries. In another scheme described below, the user may be able to dynamically map or link together table T1 and table T2 and generate search results for related columns.
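The validity check described above can be sketched as a reachability test over a graph of table relationships. The token-to-table mapping, the relationship set, the CUSTOMER token, and table T3 below are hypothetical sample data, and the breadth-first search is one possible way to test connectivity rather than the method used by the described search engine.

```python
from collections import deque

# Hypothetical metadata: the table that holds each token's column.
TOKEN_TABLE = {"REVENUE": "T1", "COUNTRY": "T2", "CUSTOMER": "T3"}

# Hypothetical relationships: pairs of tables that can be joined.
RELATIONSHIPS = {("T1", "T3")}


def tables_connected(start, goal, relationships):
    """Return True if a chain of relationships links the two tables."""
    if start == goal:
        return True
    neighbors = {}
    for a, b in relationships:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        table = queue.popleft()
        for nxt in neighbors.get(table, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def query_is_valid(tokens):
    """A query is valid if every token is known and all tables are linked."""
    if not tokens:
        return False
    tables = []
    for token in tokens:
        if token not in TOKEN_TABLE:
            return False  # not a valid token
        tables.append(TOKEN_TABLE[token])
    return all(tables_connected(tables[0], t, RELATIONSHIPS) for t in tables[1:])
```

With this sample data, `query_is_valid(["REVENUE", "COUNTRY"])` is false because no relationship connects T1 and T2, matching the REVENUE COUNTRY example.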
-
Search engine 104 also may use relational models for resolving search ambiguities. A user query comprising valid tokens may potentially map to multiple tables or multiple columns in database system 106. In these examples, search engine 104 may ask the user for more information to disambiguate user search terms or may suggest alternative search terms. - For example, a user may enter the search term PRODUCTS FOR CALIFORNIA.
Search engine 104 may determine that a first table or column exists for products sold in California and a second table or column exists for products produced in California. Search engine 104 may query the user to select between products sold and products produced. - 
Structured search instructions 140 also may include operators that indicate how database system 106 should process the identified data. For example, instructions 140 may include a sum operator that instructs database 106 to sum the data in revenue column 146 of table T1. Instructions 140 also may include a GROUP BY operator that instructs database 106 to group the revenue data in revenue column 146 of table T1 according to the state names contained in state column 142 of table T2. In one example, search instructions 140 may comprise or be similar to SQL instructions. - 
BI server 108 displays data retrieved back from database 106 within user interface 102. For example, BI server 108 may display a table 122 that identifies the total revenue for individual states. BI server 108 also may display the same data or different data in different formats, such as within bar graph 124. - 
BI server 108 may automatically change the format used for displaying data in user interface 102 based on the parameters and dimensions of the associated data in database system 106. For example, state column 142 may have relatively few entries and revenue column 146 may have a relatively large number of entries. - 
BI server 108 may determine that bar graph 124 is the best format for displaying the data from columns 142 and 146, and that the relatively small number of entries in state column 142 should be displayed on the x-axis of bar graph 124 while the relatively large number of entries in revenue column 146 should be displayed along the y-axis of bar graph 124. Of course, BI server 108 may use other formats or criteria for displaying data. - 
BI server 108 also may display the data based on any filter values 138 identified in search object 134. For example, filter instruction 138 may direct the BI server to only display revenue data for California or for a particular year. - 
BI server 108 may automatically and dynamically suggest additional filters based on the retrieved data. For example, the relational models generated by BI server 108 may identify other columns in database system 106 associated with column 142 and/or column 146. For example, a third column may include the same state identifiers used in columns 142 and 146. BI server 108 may display a filter within user interface 102 that allows the user to filter the retrieved data by the different zip codes contained in the third column. - 
FIG. 3 depicts an example of how database system 106 may join revenue column 146 from table T1 with state column 142 from table T2 in response to structured search instructions 140. Database system 106 may join columns 142 and 146 based on common state identifiers 144. For example, the second row of state column 142 may include state identifier 2 associated with the attribute Alaska. The seventh row of revenue column 146 also may include the state identifier 2 associated with the measure $4,500. Database system 106 may join the data contained in the second row of column 142 in table T2 with the data contained in the seventh row of column 146 of table T1. - 
Database system 106 then may sum and group the joined columns based on operators in structured search instructions 140. For example, database system 106 may group together all revenue data associated with California state identifier 5 and sum the group of data together to generate a total of $139,995. Database system 106 may perform similar operations for the other entries in columns 142 and 146 to generate results 147. Database system 106 sends results 147 back to the BI server and the BI server then displays the results within the user interface. - Joined table 145 may be cached by a particular database coordinator operating in
database system 106. The database coordinator that computes the final result may insert the results into the cache. If the BI server issues the same query to the same database coordinator, the BI server may receive the previously cached results. The BI server also may cache some results or may cache responses provided to the user interface instead of caching database query results. -
FIG. 4 depicts an example process for retrieving structured relational data based on unstructured user inputs. Operation 143A loads data into the information retrieval system. As mentioned above, the data may comprise any combination of structured relational data and/or non-structured data. - 
Operation 143B identifies data, columns, attributes, measures, values and any other information associated with the data. Metadata tokens and data tokens are generated based on the identified data and used for building indexes. -
Operation 143C receives a user input. The user input may comprise one or more characters or any other input entered into the search field of the user interface. Operation 143D uses the knowledge of the metadata tokens, data tokens, operators and their rankings to parse the user input. Operation 143E may generate suggestions based on the parsed user input. - 
Operation 143F generates a query object based on the parsing of the user input. For example, a first character of the user input may match a first character of four different tokens. Operation 143F may generate a query object that requests the column data associated with all four tokens. The query object may identify the tokens and identify token classifications, such as an attribute, measure, or value. The query object also may identify operators, such as a group by operator, a summation operator, a standard deviation operator, an average operator, or a count operator. - 
Operation 143G generates structured search instructions based on the query object. For example, operation 143G may identify the tables and columns in the database system associated with the tokens identified in the search object. The structured search instructions retrieve and join the data in the identified tables and columns. The structured search instructions also may instruct the database system to perform functions identified by the operators in the search object. - 
Operation 143H may display the data received back from the database system. For example, operation 143H may display a sum for all revenue in the revenue column and may display tokens associated with each entry in the state column. Selection of a state token may display the revenue for the state associated with the selected token. Operation 143H may identify multiple columns associated with the search query. All columns may be ranked based on cardinality and usage. The data and/or tokens associated with the highest ranked columns may be displayed in the user interface. - 
FIGS. 5-7 show inputs and responses received and generated by the search engine. The search engine may receive a natural input, such as “how many people live in America”. For natural language inputs, natural language algorithms may extract relevant information and map the information to a more structured form. The input also may be structured, such as “count of social security numbers, country=america.” - The search engine may provide sophisticated spell correction. For example, if the input is CALP, the search engine provides feedback asking if the input should really be CLAP. After analyzing and recognizing the input, the search engine may provide additional guidance. For example, the search engine may provide some indication of possible additional search terms.
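The CALP to CLAP example is a swap of two adjacent characters, so one simple way to sketch this kind of spell correction is to try adjacent swaps first and then fall back to a general similarity match. The vocabulary below is hypothetical, and the use of Python's standard `difflib.get_close_matches` is an illustrative choice, not the correction technique of the described search engine.

```python
import difflib

# Hypothetical vocabulary of known tokens.
KNOWN_TOKENS = ["CLAP", "CALIFORNIA", "CAR", "REVENUE"]


def suggest_correction(term, vocabulary=KNOWN_TOKENS):
    """Return a suggested replacement for term, or None if none is needed or found."""
    term = term.upper()
    if term in vocabulary:
        return None  # already a known token
    # First try swapping each pair of adjacent characters.
    for i in range(len(term) - 1):
        swapped = term[:i] + term[i + 1] + term[i] + term[i + 2:]
        if swapped in vocabulary:
            return swapped
    # Otherwise fall back to the closest token by similarity ratio.
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

For the input CALP, the swap of the second and third characters yields CLAP, which is in the sample vocabulary, so the function suggests it.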
- Different search terms may be correlated to determine the correct search query. For example, if the user input includes the words Dublin and California, the search engine may determine that the search is related to Dublin, Calif. and not Dublin, Ireland.
- The search engine may provide prefix completions. For example, if the input is currently CAL, the search engine may automatically complete the input to CALIFORNIA. The search engine can also provide suffix completion. For example, if the input is currently PAL, the search engine may complete the input with PRINCIPAL.
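Prefix and suffix completion as described above can be sketched with simple string matching over a token list. The token list and function names below are assumptions for illustration; a production system would more likely use an index structure rather than a linear scan.

```python
# Hypothetical list of tokens known to the search engine.
TOKENS = ["CALIFORNIA", "CAR", "COMPUTERS", "PRINCIPAL"]


def prefix_completions(text, tokens=TOKENS):
    """Tokens that begin with the characters typed so far."""
    text = text.upper()
    return [t for t in tokens if t.startswith(text)]


def suffix_completions(text, tokens=TOKENS):
    """Tokens that end with the characters typed so far."""
    text = text.upper()
    return [t for t in tokens if t.endswith(text)]
```

With this sample list, the input CAL completes to CALIFORNIA and the input PAL completes to PRINCIPAL, and each additional character narrows the candidate set.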
- The search engine may complete the phrase and/or provide a ghost term completing the search term. For example, the user may type in REVENUE. The search engine may complete the input with REVENUE FOR STATE=CALIFORNIA.
- The search engine also may anticipate what may come after a properly entered input and query the user. For example, for the input REVENUE FOR CALIFORNIA, the search engine may ask DO YOU WANT QUARTERLY REVENUE FOR CALIFORNIA? Thus, even with a correct and complete input, the search engine still may provide additional guidance for accessing other available and related information.
- If an input is not recognized, the search engine may provide intelligent analysis to determine the error. If an input cannot be matched with stored data, the search engine may provide suggestions of what should be typed or ask questions about the input. For example, the search engine may provide feedback “This doesn't make sense, did you mean this?” Certain keywords are part of the search engine language. If these keywords are used incorrectly, the search engine may provide suggestions for the correct usage of the keyword, phrase, or language construct.
- The search engine also may generate a query for ambiguous inputs. For example, a transaction table may have data associated with an order date and a ship date. The input may request REVENUE BY YEAR. The search engine may resolve the ambiguity by asking the user if they want revenue data based on order date or ship date. The search engine also may sort out inconsistent data. For example, the input may include year=1980. Since there were no smart phones for sale in 1980, the search engine may filter out tokens and data from the search query related to smart phones.
- The search engine may perform unstructured text searches. For example, the database system may include unstructured data, such as employee notes, in addition to the structured data. The search engine may perform a keyword search for any other unstructured data.
- The search engine allows the user to select different join paths. For example, the search engine may allow the user to select which columns in which tables are associated with different search terms. The search engine then may generate a worksheet table containing the selected mappings.
-
FIG. 5A depicts an example of inputs entered into a user interface 102 operated by the information retrieval system. This is just one example of a variety of different ways search terms may be entered and feedback presented to a user. User interface 102 may display an electronic page 148 within a web browser or other mobile or desktop application. A user may use computing device 114 in FIG. 1 to enter a search term 152A into a search field 150 within electronic page 148. - The search engine may display a
menu 154 including a first list of tokens 156A alternatively referred to as suggestions. Menu 154 also may include a second list of operators 158 alternatively referred to as useful keywords. The search engine may display some tokens 156A as soon as the user mouse clicks within search field 150 and before the user enters any characters into search field 150. For example, an initial set of tokens 156A may be associated with the most frequently used search queries or the most frequently accessed data. - 
Operators 158 may determine how search results are retrieved and displayed. A FOR operator 158 may direct the retrieval system to retrieve and display data for a particular attribute or category of search data, such as REVENUE DATA FOR CARS. A BY operator 158 may direct the retrieval system to group an identified category of search data by a particular sub-category, such as REVENUE BY YEAR. - If operators are not specified, the retrieval system may automatically infer the operator from the context. For example, the user may enter the search terms REVENUE STATE. The retrieval system may interpret the search term as REVENUE BY STATE since state is the name of a column. In another example, the user may enter the search term REVENUE CALIFORNIA. The retrieval system may interpret the search term as REVENUE FOR CALIFORNIA since California is a value in a column and is not the name of the column.
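The operator inference described above can be sketched as a lookup that distinguishes column names from column values. The sets `COLUMN_NAMES` and `COLUMN_VALUES` and the function names are hypothetical; the sketch only handles the BY and FOR cases named in the text.

```python
OPERATORS = {"FOR", "BY"}

# Hypothetical metadata: column names, and values mapped to their columns.
COLUMN_NAMES = {"STATE", "YEAR", "REVENUE"}
COLUMN_VALUES = {"CALIFORNIA": "STATE", "2013": "YEAR"}


def infer_operator(term):
    """Pick BY for a column name, FOR for a column value, else None."""
    if term in COLUMN_NAMES:
        return "BY"
    if term in COLUMN_VALUES:
        return "FOR"
    return None


def expand(query):
    """Insert an inferred operator wherever two terms are adjacent."""
    words = query.upper().split()
    if not words:
        return ""
    result = [words[0]]
    for word in words[1:]:
        previous = result[-1]
        # Only infer when neither neighbor is already an operator.
        if word not in OPERATORS and previous not in OPERATORS:
            operator = infer_operator(word)
            if operator:
                result.append(operator)
        result.append(word)
    return " ".join(result)
```

With this sample metadata, `expand("REVENUE STATE")` yields REVENUE BY STATE and `expand("REVENUE CALIFORNIA")` yields REVENUE FOR CALIFORNIA, while a query that already contains an operator is left unchanged.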
-
Operators 158 also may include mathematical functions. For example, a count operator 158 may direct the retrieval system to display the number of an identified type of data, and minimum and maximum operators 158 may direct the retrieval system to display minimum and maximum values for the data, respectively. A summation (sum) operator 158 may direct the retrieval system to add up a total value for the data, an average operator 158 may direct the retrieval system to calculate an average value for the data, and a standard deviation (std. dev.) operator 158 may direct the retrieval system to calculate a standard deviation for the data. The retrieval system also may use other types of operators 158. - The retrieval system may repeatedly update
tokens 156A in response to any combination of characters or terms entered into search field 150. For example, the user may initially enter the character C into search field 150. The search engine may identify and display a first set of tokens CAR, CALIFORNIA, and COMPUTERS. Simultaneously, the search engine may initiate a structured search query for data in the database system associated with the displayed tokens. - The user may enter a second character A into
search field 150. The search engine may identify and display a new second set of tokens CAR and CALIFORNIA. At the same time, the search engine may refine the previously structured search query for data in the database system associated with the second set of displayed tokens. -
FIG. 5B shows another state of user interface 102 after a user has completed entry of a search term 152B. The user may type in search term 152B or the user may have selected the associated CALIFORNIA token 156A previously displayed in menu 154 of FIG. 5A. The retrieval system may initiate another search in response to the user entering the complete CALIFORNIA token into search field 150. - The retrieval system may automatically update
tokens 156B to correspond with search term 152B. For example, columns in the database system associated with tokens 156B may be linked to the column in the database system associated with the CALIFORNIA token. For example, REVENUE, YEAR, PART, TRANSACTION, and WIDGET columns in the database system may be linked to the CALIFORNIA column in the database system. The retrieval system may display tokens 156B associated with the REVENUE, YEAR, PART, TRANSACTION, and WIDGET columns. - The retrieval system may rank tokens based on the state of the search query, data usage, cardinality, or user preferences. For example, the retrieval system may have identified dozens of tokens associated with the CALIFORNIA token. The retrieval system may have given the REVENUE token 156B the highest ranking. Accordingly, the retrieval system may display the REVENUE token 156B in
menu 154 and may display associated revenue data for the state of California within a graph 164. The retrieval system also may display total revenue for the state of California within a display box 160. - The retrieval system also may display
filters 162 based on the data associated with search term 152B. For example, filters 162 may identify other revenue data associated with other geographic regions. - 
FIG. 5C depicts yet another state of a user search query where additional parameters have been added to form search term 152C. For example, a year value, a transaction value, and a product type have been added to search term 152C. The retrieval system may identify a new refined set of tokens 156C and also refine the search query and/or the displayed data in response to the additional parameters in search term 152C. For example, the retrieval system may display the revenue data for California, for year 2013, for debit transactions, and for different types of widgets. - The retrieval system also may display a
time period 166, a number of parts 168, a total revenue 170, and a total tax 172 within a headline view associated with search term 152C. Of course, these are only examples of any data that may be extracted from the database system and displayed within user interface 102. The retrieval system also may display other tables 174 and graphs 176 associated with the retrieved data. - In one example, the retrieval system may automatically determine a best format for displaying the data. For example, the data associated with a search term may comprise a large range of values for a limited number of attribute categories. The retrieval system may decide a bar graph similar to
graph 176 is the best way to display the data. - The retrieval system may determine the number of measures, number of attributes, and number of values associated with the data and present data based on these data dimensions. The retrieval system may use other formats for displaying data, such as bubble charts, line graphs, pie charts, or the like, or combinations thereof, based on the dimensions of the associated tables or columns in the database system.
-
FIG. 6A depicts another example of how the search engine may suggest, predict, and/or complete search terms. The user may enter a search term 152A and the search engine may complete the input with search term 152B. For example, the user may enter the search term REVENUE FOR EACH CUSTOMER REGION. The search engine may determine that revenue data associated with the search term includes associated commit order dates. The search engine may complete the search term by adding FOR COMMIT DATA BEFORE MM/DD/YYYY. The search engine may display additional search term 152B in a different font, color, and/or image intensity to represent the portion of the input added by the search engine. - The search engine may generate a search
request obtaining data 184B identifying the unique number of customer regions and data 184C identifying the total revenue for each of the identified customer regions. The search engine may automatically generate a bar graph 184D also displaying the total revenue for each of the identified customer regions. The search engine also may sum the revenue for all of the different regions and display the results as total revenue 184A. - 
FIG. 6B depicts one example of how the search engine provides explanations regarding data 184. An inquiry icon 186 is displayed in search field 150. In response to selection of icon 186, the search engine displays information box 188 explaining how revenue data was computed for each customer region and then sorted in ascending order. Information box 188 further explains that the revenue information was generated by linking data from a line order table with data in a customer table. In this example, the columns and/or tables containing the data are underlined in information box 188. - 
FIG. 6C shows another example of how the search engine may suggest additional search terms based on a partial search term 152. The search engine may identify tokens and/or operators 192 matching with the partial search term 152. For example, the search engine may identify tokens matching the partial input MICHAEL JAC. The search engine also may identify other attributes, measures, values and/or operators associated with the identified tokens. The search engine displays tokens 192 along with the other parameters in menu 190 as suggested or completed search terms. - The search engine uses state machines, indexes, and the content of data to perform a structured analysis of the search terms. For example, the search engine may identify search terms as measures, attributes, values, and operators and determine a correct sequence of the search terms based on these characteristics of the search term. The search engine may identify valid data associated with the search term and may determine that the search term is not valid for a current state of the search term. The search engine also may provide suggestions for correct search terms to add to the current state of the search input.
-
FIG. 7A depicts an example error message 194 generated by the search engine. The search engine may generate different error messages based on both the understood tokens and the unrecognized words in the search term. In this example, the search engine generates error message 194 in response to the search term GROWTH BY. OF may be the only valid input after GROWTH. The search engine may generate error message 194 identifying the incorrect input and suggesting a correct input for that particular state. - 
FIG. 7B shows another example error message generated by the search engine. In this example, error message 196 explains that the search engine could not find customer data metadata associated with search term 198. Error message 196 also indicates search term 198 was expected to be either a keyword or a numeric column name. - Error messages may be generated in a variety of different ways. One scheme may provide error messages for different keywords. The search engine checks rules associated with the error messages that indicate where or how keywords should be used in a search term. The search engine displays the associated error message when the keyword is located in the wrong location or used in the wrong syntax within the search term. The error message may identify the current incorrect location of the keyword and the expected correct location of the keyword. For example, the error message may state: THE KEYWORD TOP IS LOCATED AT THE END OF YOUR SEARCH BUT WAS EXPECTED AT THE BEGINNING OF THE SEARCH.
- The state machines may branch to different error messages based on the state of the search. For example, an acceptable search term may be MEASURE BY ATTRIBUTE. However, the entered search term may comprise MEASURE BY MEASURE. The search engine may include a state machine that branches to an associated error message in response to receiving the second measure.
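The MEASURE BY ATTRIBUTE example can be sketched as a small table-driven state machine that either advances on an accepted input or branches to an error message. The state names, transition table, and message wording below are assumptions for illustration and are not the state machines of the described search engine.

```python
# Hypothetical transition table: (current state, input kind) -> next state.
TRANSITIONS = {
    ("START", "measure"): "MEASURE",
    ("MEASURE", "BY"): "BY",
    ("BY", "attribute"): "DONE",
}

# Hypothetical error branches for specific invalid inputs.
ERRORS = {
    ("BY", "measure"): "EXPECTED AN ATTRIBUTE AFTER BY BUT FOUND A MEASURE.",
}


def run(token_kinds):
    """Walk the state machine; return (final state, error message or None)."""
    state = "START"
    for kind in token_kinds:
        key = (state, kind)
        if key in TRANSITIONS:
            state = TRANSITIONS[key]
            continue
        # No valid transition: list what would have been accepted here.
        expected = sorted(k for (s, k) in TRANSITIONS if s == state)
        default = f"UNEXPECTED {kind}; EXPECTED ONE OF {expected}"
        return state, ERRORS.get(key, default)
    return state, None
```

Feeding the kinds for MEASURE BY MEASURE stops in the BY state with the specific error for a second measure, while any other unexpected input produces a generic message listing the inputs accepted in the current state.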
- The search engine also may identify a current state of the search, determine all acceptable inputs for a next state, and generate error messages based on the analysis. For example, the search engine may generate an error message: YOU COULD HAVE TYPED A NUMERIC ATTRIBUTE OR TYPED “GREATER THAN (>)” BUT YOU DIDN'T AND ACTUALLY TYPED AN INPUT THAT WE DO NOT UNDERSTAND.
- The search engine may determine the meaning or type of search term by searching through tables and columns. The search engine may generate an error message when the type of search term does not correlate with a current state of the search. For example, the search engine may generate the error message: THE WORD YOU TYPED IS ACTUALLY A CUSTOMER NAME, BUT WE WERE EXPECTING A COLUMN NAME.
- In another example, the user may enter the search term REVENUE FOR TAX BEFORE. The token BEFORE may be a valid keyword for date but not a valid floating operator. The search engine may generate the error message: THE KEYWORD “BEFORE” IS ONLY APPLICABLE WHEN PRECEDING A DATE COLUMN. TRY ENTERING AN OPERATOR, SUCH AS =, <, >.
- The user may enter the keyword AND. The acceptable types of token entries for the current state may be attribute, measure, or value tokens. The search engine may generate a generic error message: THE WORD “AND” DOES NOT MAKE SENSE.
- In another example, the user may enter STATE as a search term. The search engine may generate the helpful hint message: YOU TYPED IN “STATE” YOU CAN NOW TYPE IN “=” OR ANOTHER COLUMN NAME.
- In another example, the user may enter a search term SUM REVENUE>100. The search term may be invalid and the user may be trying to sum up all of the rows of a revenue column that are greater than the
value 100. The search engine may generate the error message: WE DON'T SUPPORT THIS SYNTAX. YOU COULD TRY ENTERING “SUM REVENUE, REVENUE>100”. - In another example, the user may input the search term, REVENUE FOR CUSTOMER CITY MEERUT. The city MEERUT may not exist in any index. The
search engine 104 may display a message saying: “MEERUT” DOES NOT EXIST IN THE DATABASE SYSTEM. - Thus, the search engine generates different categories of error messages based on the semantics of the search term that help the search engine understand/predict what the user was trying to search. The search engine then generates context sensitive messages that guide the user back on a correct search path.
- For security reasons, the search engine might not display certain tokens and/or associated data to certain users. The search engine may provide both row level security and object level security. Row level security, for example, may restrict a salesman in the state of California to only viewing transactions in a sale table related to the state of California. Object level security, for example, may restrict a user from viewing all transactions from the sales table or from particular columns of the same table.
- In another example, a salesman may only be authorized to view sales figures under a certain amount. In yet another example, human resources personnel may be authorized to view employee work history data but may not be authorized to view sensitive company financial data.
- The search engine not only limits what data is displayed to different users but also restricts which tokens are displayed to the users. Tokens may identify columns of data that some users should not know exists. For example, the tokens may identify companies, people, products, locations, transactions, etc. in the database system that should be kept confidential.
- The tokens and associated data may be assigned security values. The tokens associated with the data may be assigned a particular security flag that only allows access by managers. The search engine compares a security tag for the user with the security flags assigned to the data and tokens. If the two security identifiers do not match, the token and associated data will not be displayed to the user.
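The comparison of user security tags with token security flags can be sketched as a filter applied before any tokens are displayed. The flag values, the `TOKEN_FLAGS` mapping, and the idea that a user holds a set of tags are assumptions made for illustration.

```python
# Hypothetical security flags assigned to tokens.
TOKEN_FLAGS = {"REVENUE": "public", "SALARY": "manager", "STATE": "public"}


def visible_tokens(candidates, user_tags):
    """Keep only tokens whose security flag matches one of the user's tags.

    Tokens with no assigned flag are hidden by default in this sketch.
    """
    return [t for t in candidates if TOKEN_FLAGS.get(t) in user_tags]
```

A user holding only the `public` tag would see REVENUE and STATE but never learn that a SALARY token exists, while a user who also holds the `manager` tag would see all three.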
- Tokens may be ranked based on different parameters, such as usage. Some columns may be used more than others. For example, a STATE column may be used more than a SALARY column. Based on usage, the search engine may determine which columns/data are more likely to be requested. The search engine then ranks the tokens associated with the columns based on the most likely data associated with the user.
- Ranking also may take into account the sequence of tokens associated with the search terms. For example, the STATE column may be used more often than the SALARY column. However, the sequence or combination of the SALARY column and a REVENUE column may be used more frequently than a combination of the STATE column and the REVENUE column. Therefore, when the term REVENUE is previously entered into the search field, a next SALARY token may be ranked higher than a next STATE token.
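As a non-limiting sketch, sequence-aware ranking of this kind could count how often each token follows the previously entered token and fall back to overall usage otherwise. The counts below and the function `rank_next_tokens` are hypothetical.

```python
from collections import Counter

# Hypothetical usage counts: STATE is used more often overall, but SALARY
# follows REVENUE more often than STATE does.
USAGE = Counter({"STATE": 90, "SALARY": 40})
PAIR_USAGE = Counter({("REVENUE", "SALARY"): 30, ("REVENUE", "STATE"): 10})


def rank_next_tokens(candidates, previous=None):
    """Order candidates by how often they follow `previous`, then by overall usage."""
    def key(token):
        follows = PAIR_USAGE[(previous, token)] if previous else 0
        return (follows, USAGE[token])

    return sorted(candidates, key=key, reverse=True)


first_token_order = rank_next_tokens(["SALARY", "STATE"])
after_revenue_order = rank_next_tokens(["SALARY", "STATE"], previous="REVENUE")
```

With no previous token the overall usage decides the order; once REVENUE has been entered, the pair counts take precedence.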
- The tokens also may be ranked based on the context of the search term, such as based on a search term being an attribute, value, or measure. The search engine also may rank tokens based on the cardinality of the associated columns. For example, a column #1 may identify 10 million products for the state of California. A column #2 may identify 50 states that include the state of California. If the user enters the search term CALIFORNIA, the search engine may display the list of states in column #2, since the answer may have a better chance of being associated with one of the 50 states versus being associated with one of the 10 million products in column #1.
-
FIG. 8 depicts an example process for ranking tokens. Operation 200A receives the user inputs. Operation 200B identifies tokens associated with the user inputs. As explained above, tokens may be identified based on similarities between the characters in the user inputs and the characters in the tokens and the attribute, measure, and value classifications of the tokens. For example, the state machine may require the first search term to be an attribute. Operation 200B may only compare tokens classified as attributes with the first portion of the user input. Operation 200B may query the user for a clarification if the user input is ambiguous.
-
Operation 200C identifies the usage and cardinality of the columns associated with the identified tokens and operation 200D ranks the tokens based on the usage and cardinality. Operation 200D displays the tokens based on the rankings. For example, the ten highest ranked tokens may be displayed to the user.
-
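The flow of FIG. 8 might be sketched, under stated assumptions, as follows. Prefix matching stands in for the character similarity test, and the ordering rule (most used first, lower cardinality breaking ties) is only one possible way to combine usage and cardinality. The `Token` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str         # "attribute", "measure", or "value"
    usage: int        # times the associated column has been used
    cardinality: int  # number of distinct entries in the associated column


def rank_tokens(user_input, tokens, required_kind=None, limit=10):
    """Match tokens against the input, then order them by usage and cardinality."""
    prefix = user_input.upper()
    matches = [
        token for token in tokens
        if token.text.startswith(prefix)
        and (required_kind is None or token.kind == required_kind)
    ]
    # Assumed ordering: most used first, lower cardinality breaking ties.
    matches.sort(key=lambda token: (-token.usage, token.cardinality))
    return [token.text for token in matches[:limit]]


TOKENS = [
    Token("STATE", "attribute", usage=90, cardinality=50),
    Token("SALARY", "attribute", usage=40, cardinality=1000),
    Token("SALES", "measure", usage=90, cardinality=100000),
]
```

For example, `rank_tokens("S", TOKENS, required_kind="attribute")` considers only attribute tokens, as when the state machine requires the first search term to be an attribute.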
FIG. 9 depicts examples of different indexes that the search engine may use for ranking tokens. For example, indexes 210A may be global indexes that track token usage for all users of an enterprise. Indexes 210B may be department indexes that track token usage for different departments within the enterprise. A first department index 210B may track token usage for a group of users associated with the sales department and a second department index 210B may track token usage for a group of users associated with the finance department. User indexes 210C may track token usage for individual users of the enterprise.
-
Indexes global index 210A based on usage from all users. For example, the California token, and/or the data associated with the California token, may have been displayed and/or selected by users of the enterprise more than any other token. However, the group of users in the legal department may have selected a different token more often than the California token. Accordingly, the tokens in global indexes 210A may have different token scores than the tokens in legal department index 210B.
- Token scores in
user indexes 210C may be different from token scores in global indexes 210A and department indexes 210B. For example, a user responsible for sales in Colorado may select a Colorado token more often than a California token. Accordingly, the Colorado token in user index 210C associated with the user may have the highest token score.
- The search engine may rank tokens based on the token scores in
indexes more department indexes 210B and a user index 210C associated with the user. The search engine may identify the indexes
- The search engine then may identify the top ranked tokens in each of
indexes user index 210C associated with the user may be given a highest weighting, tokens from department indexes 210B associated with the user may be given a next highest weighting, and tokens from global indexes 210A may be assigned the lowest weighting.
- With all else equal, tokens with the highest scores in
user index 210C may be ranked higher than the tokens with the highest scores in department index 210B and global index 210A. If the user is associated with a particular department index 210B, tokens identified in the associated department index 210B having a given token score may be ranked higher than tokens in global index 210A with an equivalent token score. Of course, other weightings can be assigned to the token scores in the different indexes 210.
- Any combination of weightings and ranking criteria may be assigned to the
global indexes 210A, department indexes 210B, and user indexes 210C. For example, the top three ranked tokens from each of the indexes
- The search engine may perform another ranking based on the state machines that analyze the search term. Based on a state of the search term, the state machine may determine that one particular category of token is more likely than another type of token.
- For example, the user may enter the search term REVENUE BY. The search engine may determine that the next token could either be an attribute or a measure. The state machine also may determine that the next search term is more likely an attribute token than a measure token. For example, the search engine may determine that REVENUE BY STATE is a more likely search term than REVENUE BY AGE. Accordingly, the search engine may rank the state attribute token higher than the age measure token.
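For illustration only, the index weighting described above with reference to FIG. 9 might be sketched as follows. The weight values, the sample token scores, and the function `rank_across_indexes` are assumptions. The only property taken from the description is that a user index outweighs a department index, which in turn outweighs a global index.

```python
# Assumed weights: user > department > global.
WEIGHTS = {"user": 3.0, "department": 2.0, "global": 1.0}


def rank_across_indexes(indexes, top_n=3):
    """Merge token scores from several indexes, keeping each token's best weighted score."""
    best = {}
    for level, scores in indexes.items():
        for token, score in scores.items():
            weighted = WEIGHTS[level] * score
            best[token] = max(best.get(token, 0.0), weighted)
    # Highest weighted score first; token name breaks exact ties.
    return sorted(best, key=lambda token: (-best[token], token))[:top_n]


# Hypothetical scores for a user responsible for sales in Colorado.
INDEXES = {
    "global": {"CALIFORNIA": 100, "COLORADO": 40},
    "department": {"CALIFORNIA": 30, "OREGON": 45},
    "user": {"COLORADO": 50},
}
ranking = rank_across_indexes(INDEXES)
```

Here the Colorado token overtakes the globally more popular California token because its score comes from the user's own index.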
-
FIG. 10 summarizes some of the ranking operations used by the search engine for predicting and displaying tokens. In operation 220A, the search engine may identify potential tokens for displaying to the user. For example, based on a user input, one or more indexes may identify matching tokens or other tokens associated with the predicted intent of the user.
-
Operation 220B identifies the scores for the identified tokens. Not all tokens need to be identified in operation 220A. As mentioned above, a ranked subset of tokens may be identified based on the token scores identified in operation 220B. Operation 220C weights the token scores based on the associated indexes. As mentioned above, token scores from a user index may be assigned a higher weighting than a token score from a global index.
-
Operation 220D may further weight the token scores based on the token column cardinality. A token associated with a column with few entries may be given a higher weight than a token associated with a column with a large number of entries. -
Operation 220E also may weight token scores based on the types of tokens and the current state of the search term. As explained above, the search engine may determine a first token category is more likely than other token categories. The tokens with the first token category may be assigned higher weights than the other token categories. Operation 220F displays tokens to the user based on the associated weighted token scores.
-
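One possible way to combine the weightings of operations 220C through 220E is sketched below. Multiplying the factors together, the logarithmic cardinality falloff, and all numeric values are assumptions made for the sketch; the description does not prescribe a formula.

```python
import math

INDEX_WEIGHT = {"user": 3.0, "department": 2.0, "global": 1.0}  # assumed values


def cardinality_weight(cardinality):
    """Fewer column entries give a larger weight (assumed logarithmic falloff)."""
    return 1.0 / math.log10(cardinality + 10)


def final_score(candidate, category_weight):
    """Weight a token score by index, column cardinality, and token category."""
    return (
        candidate["score"]
        * INDEX_WEIGHT[candidate["index"]]
        * cardinality_weight(candidate["cardinality"])
        * category_weight.get(candidate["kind"], 0.0)
    )


def tokens_to_display(candidates, category_weight, limit=10):
    ranked = sorted(
        candidates,
        key=lambda candidate: final_score(candidate, category_weight),
        reverse=True,
    )
    return [candidate["token"] for candidate in ranked[:limit]]


# Hypothetical candidates after the user has entered REVENUE BY, where an
# attribute is assumed to be twice as likely as a measure.
AFTER_REVENUE_BY = {"attribute": 1.0, "measure": 0.5}
CANDIDATES = [
    {"token": "AGE", "kind": "measure", "score": 70, "index": "global", "cardinality": 80},
    {"token": "PRODUCT", "kind": "attribute", "score": 60, "index": "global", "cardinality": 10_000_000},
    {"token": "STATE", "kind": "attribute", "score": 60, "index": "global", "cardinality": 50},
]
display_order = tokens_to_display(CANDIDATES, AFTER_REVENUE_BY)
```

In this sketch the STATE token is displayed ahead of the AGE token even though AGE has the higher raw score, and the high cardinality PRODUCT token is displayed last.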
FIG. 11 depicts an example state machine 300 used by the search engine. State machine 300 is just one example of almost limitless combinations of functions that may be performed based on any combination of search terms. In one example, state machine 300 identifies valid inputs and any constraints on completion of a search term. State machine 300 generates query instructions from the search term that are then used for performing a structured SQL-type search or an unstructured keyword search in the database system.
-
State machine 300 may accept search terms that are relatively unconstrained. For example, state machine 300 may accept search terms with different column orderings. In one example, the user may specify a value in the search term and then specify a filter in the search term. In another example, the user may specify the filter in the search term and then specify the value.
- In one example,
state machine 300 may not have a designated end state, thus allowing the user to continue to add search terms and change search results. For example, state machine 300 may determine the search term comprises an attribute and guide the user into selecting a filter. State machine 300 then may return to a previous state allowing the user to enter additional search terms.
- As mentioned above,
state machine 300 also may rank tokens. For example, state machine 300 may predict a next entry in the search term is most likely a measure, and assign a higher weight to measure tokens and assign a lower weight to attribute tokens. State machine 300 may determine that the next search term is least likely a value and assign even lower weights to the value tokens.
- Referring both to
FIG. 11 and FIG. 12A, the user may enter the letter G into search field 150. State machine 300 could start either in state 302A or state 302D. The only valid input for state 302A is GROWTH OF. State machine 300 may predict that GROWTH OF is a likely input and display GROWTH OF in menu 154 as one of keywords 158.
- An index associated with
state 302D may include a GEORGIA token and a GREECE token. GEORGIA and GREECE are valid value type tokens for state 302D. Accordingly, the search engine may display the GEORGIA token and the GREECE token in menu 154.
- While not starting with the letter G, a REVENUE token may have a relatively high token score. For example, the REVENUE token might be the most frequently used token in the global index, and/or a department index associated with the user, and/or a user index associated with the user. The REVENUE token is also a valid measure type input for
state 302D. Accordingly, the search engine also displays the REVENUE token in menu 154. A GREEN token also may exist within one of the indexes associated with state 302D and a value type GREEN token is also a valid input for state 302D. Accordingly, the search engine also displays the GREEN token in menu 154.
- In
FIG. 12B, the user may add a second letter R to search term 152. GR is still a valid search term for state 302A and GREECE and GREEN are still valid search terms for state 302D. Accordingly, the search engine may continue to display GROWTH OF, GREECE and GREEN in menu 154. The search engine now determines that the GEORGIA and REVENUE tokens are unlikely inputs and removes them from menu 154 and/or displays other more likely tokens.
-
FIG. 12C shows a next state where the user adds the letter O to search term 152. In this example, state machine 300 determines that GREECE and GREEN are now unlikely tokens and removes them from menu 154. In this example, state 302D may no longer be valid and the user is limited to entering GROWTH OF into search field 150 or selecting the GROWTH OF keyword from menu 154.
-
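The narrowing of menu 154 across FIGS. 12A-12C might be sketched as a prefix filter over the entries that are valid in each candidate starting state. This is a simplification made for illustration: score-based suggestions that do not match the typed characters, such as the REVENUE token in FIG. 12A, are left out, and the names `VALID_ENTRIES`, `menu`, and `live_states` are hypothetical.

```python
# Entries assumed valid in the two possible starting states of FIG. 11.
VALID_ENTRIES = {
    "302A": ["GROWTH OF"],
    "302D": ["GEORGIA", "GREECE", "GREEN"],
}


def menu(typed):
    """Entries from any candidate state that still match the typed characters."""
    prefix = typed.upper()
    return [
        entry
        for entries in VALID_ENTRIES.values()
        for entry in entries
        if entry.startswith(prefix)
    ]


def live_states(typed):
    """States that still have at least one entry matching the typed characters."""
    prefix = typed.upper()
    return [
        state
        for state, entries in VALID_ENTRIES.items()
        if any(entry.startswith(prefix) for entry in entries)
    ]
```

Typing G, then GR, then GRO shrinks the menu step by step until only the GROWTH OF keyword and state 302A remain.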
FIG. 12D shows electronic page 148 displayed by the user interface after the user enters GROWTH OF into search field 150. Search term 152 moves state machine 300 into state 302B. The only valid entries in state 302B are measure type tokens. Accordingly, state machine 300 may display the highest ranked REVENUE and TAX measure tokens. The user may type or select the REVENUE token, moving state machine 300 into state 302C.
- Referring to
FIG. 12E, the only valid input in state 302C is a FOR operator. State machine 300 may automatically display FOR in menu 154 or may automatically add the FOR operator to search term 152. State machine 300 then may display a correct format for a next valid input, such as a date, in menu 154.
-
FIG. 12F shows the electronic page 148 after the user adds a date to search term 152 and state machine 300 moves into state 302D. In response to entering state 302D, state machine 300 may generate instructions for calculating revenue growth and display the revenue growth data. State machine 300 also may display filters 322 identifying other columns associated with the revenue growth data.
- Since no specific geographic region or product category is identified prior to moving into
state 302D, state machine 300 may generate growth data 320 for all products sold by the enterprise in 2013. In another example, state machine 300 may display revenue growth data 320 for the highest ranked tokens. For example, state machine 300 may generate and display revenue growth data for the state of California.
- Different types of tokens can be added to
search term 152 after state machine 300 moves into state 302D. For example, valid tokens for state 302D may include a measure token 310A, an attribute token 310B or 310F, or a value token 310E. Attribute tokens may include a highly ranked STATE token and value tokens may include a highly ranked CALIFORNIA token and a highly ranked OREGON token. Accordingly, state machine 300 may display the STATE, CALIFORNIA, and OREGON tokens in menu 154 in response to moving into state 302D.
- The user may add a FOR operator followed by a STATE attribute token to
search term 152, moving state machine 300 into state 302E. In state 302E, state machine 300 requires an “=” operator 310C. In response to the user entering the “=” operator, state machine 300 may display different state names in menu 154.
-
FIG. 12G shows user interface 102 after the user completes entering the search term GROWTH OF REVENUE FOR 2013 FOR STATE=CALIFORNIA. State machine 300 now moves back to state 302D, and derives and displays more specific revenue growth data for the state of California. For example, state machine 300 may display overall revenue growth data 324A for California, revenue growth data 324B for different products sold in California, and California revenue growth data 324C for different years.
-
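The walkthrough of FIGS. 12D-12G can be sketched as a small transition table. The table below is inferred from the walkthrough only and models a single path through state machine 300. The two intermediate states entered after a FOR operator, the placement of state 302F between the “=” operator and the value, the vocabulary, and the function names are assumptions made for the sketch.

```python
TRANSITIONS = {
    ("302A", "GROWTH OF"): "302B",
    ("302B", "measure"): "302C",
    ("302C", "FOR"): "302C+FOR",       # hypothetical intermediate state
    ("302C+FOR", "date"): "302D",
    ("302D", "FOR"): "302D+FOR",       # hypothetical intermediate state
    ("302D+FOR", "attribute"): "302E",
    ("302E", "="): "302F",
    ("302F", "value"): "302D",
}

VOCABULARY = {
    "GROWTH OF": "GROWTH OF", "FOR": "FOR", "=": "=",
    "REVENUE": "measure", "TAX": "measure",
    "STATE": "attribute",
    "CALIFORNIA": "value", "OREGON": "value",
}


def classify(token):
    """Map a token to its kind; a four digit number is treated as a date."""
    if token.isdigit() and len(token) == 4:
        return "date"
    return VOCABULARY.get(token, "unknown")


def valid_inputs(state):
    """Token kinds that the given state accepts."""
    return [kind for (source, kind) in TRANSITIONS if source == state]


def run(tokens, state="302A"):
    """Advance through the states, rejecting any token the current state does not accept."""
    for token in tokens:
        kind = classify(token)
        if (state, kind) not in TRANSITIONS:
            raise ValueError(
                f"{token!r} is not valid in state {state}; expected {valid_inputs(state)}"
            )
        state = TRANSITIONS[(state, kind)]
    return state


FULL_TERM = ["GROWTH OF", "REVENUE", "FOR", "2013", "FOR", "STATE", "=", "CALIFORNIA"]
```

Because the path returns to state 302D rather than to a terminal state, further FOR clauses can be appended, which mirrors the absence of a designated end state. The error raised for an invalid token lists the valid inputs for the current state, in the spirit of the context-sensitive messages described earlier.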
Filters 322 may be updated to show other tokens associated with California. For example, filters 322 may include tokens for other cities in the state of California or for other states. State machine 300 may automatically update growth data 324 based on selection or de-selection of filters 322. For example, the user may un-check the California filter. State machine 300 may automatically change search term 152 back to GROWTH OF REVENUE FOR 2013 and automatically regenerate the growth data 320 previously shown in FIG. 12F. In another example, the user may select LA filter 322. State machine 300 may automatically change search term 152 to GROWTH OF REVENUE FOR 2013 FOR CITY=LOS ANGELES and display growth data for Los Angeles.
- Thus, the search may be modified by entering search terms into
search field 150 or by selecting one of filters 322. In response to selecting one of filters 322, the search engine automatically displays the associated search terms in search field 150. Thus, the search engine displays to the user the correct syntax for switching between the different data 320 and 324. The search engine may automatically uncheck the CALIFORNIA filter and/or the 2013 filter when the user removes CALIFORNIA or 2013 from search term 152.
-
State machine 300 generates query generation instructions (QGI) at different states 302. For example, state 302B may generate a first QGI instruction 312A in response to detecting REVENUE in the search term, such as INSTRUCTION 1: ADD COLUMN FOR GROWTH (REVENUE). Instruction 312A specifies the column containing the revenue data and includes a modifier that directs the database to calculate growth for the revenue data.
-
State 302B generates instruction 312B in response to receiving a date in the search term, such as INSTRUCTION 2: USE COLUMN FOR GROWTH DIMENSION (DATE). Instruction 2 directs the database to compare the growth for the revenue column along a date dimension.
-
State 302F generates an instruction 312C in response to receiving the value California in the search term, such as INSTRUCTION 3: ADD FILTER (COLUMN=STATE, OPERATOR=“=”, VALUE=CALIFORNIA). Instruction 312C adds a filter to the column where the revenue data is extracted for the value California in the state column.
- As explained above, instructions 312 are sent from
search engine 104 in FIG. 1 to the BI server 108. BI server 108 uses instructions 312 to generate SQL-type instructions that access the identified data in database system 106 in FIG. 1. The data generated by database system 106 is displayed in user interface 102.
- As shown above,
state machine 300 may break down a search term into different phrases. For example, state machine 300 broke the search term GROWTH OF REVENUE FOR DATE STATE=CALIFORNIA into a first phrase GROWTH OF REVENUE FOR DATE and a second phrase STATE=CALIFORNIA. This allows state machine 300 to make different annotations and assumptions regarding the search term.
- The search term is relatively confined in
states 302A-302C but then can expand in multiple directions after reaching state 302D. State machine 300 can customize error messages and predict tokens based on the different phrases. For example, in state 302C, state machine 300 can generate a message that directs the user to use the REVENUE token. In state 302D, state machine 300 may generate more general messages for a variety of possible token types.
-
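As a hedged illustration of how a server might turn the query generation instructions 312 described above into an SQL-type query, the following sketch models the three example instructions as Python records and runs the resulting query against an in-memory SQLite table. The record types, the table layout, the helper names, and the choice to compute growth as a difference between consecutive periods are all assumptions; the description does not specify how BI server 108 builds its queries.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class AddColumn:          # INSTRUCTION 1: ADD COLUMN FOR GROWTH (REVENUE)
    column: str
    modifier: str = ""


@dataclass
class GrowthDimension:    # INSTRUCTION 2: USE COLUMN FOR GROWTH DIMENSION (DATE)
    column: str


@dataclass
class AddFilter:          # INSTRUCTION 3: ADD FILTER (COLUMN, OPERATOR, VALUE)
    column: str
    operator: str
    value: str


def build_sql(instructions, table):
    """Build a parameterized query that sums the measure per growth dimension value."""
    measure = next(i for i in instructions if isinstance(i, AddColumn))
    dimension = next(i for i in instructions if isinstance(i, GrowthDimension))
    filters = [i for i in instructions if isinstance(i, AddFilter)]
    if any(f.operator not in {"=", "<", ">"} for f in filters):
        raise ValueError("unsupported filter operator")
    where = " AND ".join(f'"{f.column}" {f.operator} ?' for f in filters)
    sql = (
        f'SELECT "{dimension.column}", SUM("{measure.column}") FROM "{table}"'
        + (f" WHERE {where}" if where else "")
        + f' GROUP BY "{dimension.column}" ORDER BY "{dimension.column}"'
    )
    return sql, [f.value for f in filters]


def growth(rows):
    """Change of the summed measure between consecutive dimension values."""
    return [(current[0], current[1] - previous[1]) for previous, current in zip(rows, rows[1:])]


INSTRUCTIONS = [
    AddColumn("REVENUE", modifier="GROWTH"),
    GrowthDimension("DATE"),
    AddFilter("STATE", "=", "CALIFORNIA"),
]

# Hypothetical sales data.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "sales" ("STATE" TEXT, "DATE" INTEGER, "REVENUE" REAL)')
db.executemany(
    'INSERT INTO "sales" VALUES (?, ?, ?)',
    [("CALIFORNIA", 2012, 100.0), ("CALIFORNIA", 2013, 90.0),
     ("CALIFORNIA", 2013, 60.0), ("OREGON", 2013, 70.0)],
)
sql, params = build_sql(INSTRUCTIONS, "sales")
rows = db.execute(sql, params).fetchall()
california_growth = growth(rows)
```

Filter values are passed as query parameters rather than being interpolated into the SQL text, which keeps user-entered values out of the generated statement.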
State machine 300 predicts the intent of the user and obtains data for the predicted inputs. For example, the user may enter the search term REVENUE. Since REVENUE is a measure, state machine 300 may predict that the user would like the measure values summed together. State machine 300 also may assume that since REVENUE is a measure, the user may want to view other related attributes and measures, such as revenue for state, city, tax, etc. Accordingly, state machine 300 may suggest STATE (attribute) and/or display revenue data for particular states, such as for CALIFORNIA (value).
-
State machine 300 also may predict the intent of the user based on keywords. For example, the user may enter the measure REVENUE followed by a keyword, such as FOR. State machine 300 may predict that the user is searching for revenue data for a particular geographic region or time period and display associated columns in the database, such as city, state, time periods, etc. This allows the user to explore the data within the structured database without having to manually generate SQL instructions.
- The search engine may have a hierarchy of state machines that include, but are not limited to, deterministic finite automata state machines and non-deterministic finite automata state machines. The search engine also may use a natural language processor (NLP) to analyze the search term, explain search results, display errors, and make suggestions for correctly entering search terms.
- The search engine also may perform unstructured text searches. For example, the enterprise may include unstructured data, such as employee notes, in addition to the structured data in enterprise database 112. The search engine may perform a keyword search with the search term for any other unstructured data that includes the keywords GROWTH, REVENUE and/or CALIFORNIA.
-
FIG. 13 depicts an example of how the search engine may associate non-structured user inputs with columns in a relational database system. The retrieval system may generate different indexes based on the database tables uploaded from the enterprise relational database or based on any other structured or unstructured data uploaded from the enterprise database. For example, the retrieval system may generate an index 400 that identifies the names of states contained in column 180.
- The retrieval system may identify three different state names California, Colorado, and Connecticut in
column 402 in response to the user entering the letter C into the search field. The retrieval system not only associates the letter C with the three state names in column 402 but also may associate the user input C with revenue column 404 via state identifiers
-
Index 400 moves into a second logic state after the user enters a second letter A. The retrieval system now may anticipate/predict that the user is searching for data related to California. In response to detecting the second index logic state, the retrieval system may refine the data displayed in the user interface. For example, the retrieval system may display total revenue of $139,993 for California. The retrieval system also may display other data associated with California. For example, the retrieval system may identify other columns in the database system containing state identifier 5 and display tokens and/or data associated with the identified columns.
-
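The association between typed characters, state names, and the revenue column might be sketched as follows. State identifier 5 is taken here to denote California, the other identifiers and the individual revenue rows are invented for the sketch, and only the California total of $139,993 comes from the example above. The function `lookup` is hypothetical.

```python
# State identifier -> state name; identifiers other than 5 are assumed.
STATE_COLUMN = {5: "California", 6: "Colorado", 7: "Connecticut"}

# (state identifier, revenue) rows; the individual amounts are assumed.
REVENUE_ROWS = [(5, 100000), (5, 39993), (6, 52000), (7, 18500)]


def lookup(typed):
    """Map typed characters to matching state names and their total revenue."""
    prefix = typed.lower()
    totals = {}
    for state_id, name in STATE_COLUMN.items():
        if name.lower().startswith(prefix):
            totals[name] = sum(
                amount for row_id, amount in REVENUE_ROWS if row_id == state_id
            )
    return totals
```

Entering C matches all three states, and adding the letter A narrows the result to California and its total revenue.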
FIG. 14 depicts another example index 500 generated from columns in database system 106 in FIG. 1. Hundreds of indexes 500 may be generated from database system 106 and for different users and departments within an enterprise. Large numbers of tokens 504 may exist in each index 500. A user may enter a first character of a search term into a search field. The search engine may identify millions of potential tokens 504 based on the first character.
- The search engine ranks tokens 504 and uses the rankings to predict which tokens 504 the user intends to enter into the search field. The search engine may assign each token 504 a score 506 based on any combination of factors. For example, scores 506 may be based on the number of times a particular token 504 is selected by users. Scores 506 also may be based on the number or types of columns, rows, measures, attributes, values, etc. associated with tokens 504.
- The search engine also may rank tokens 504 based on which type of token is most likely to follow a currently entered search term. For example, a current search term may comprise a measure token. It may be more likely that the next token used in the search term is an attribute type of token. Accordingly, attribute tokens may be ranked higher than measure or value tokens.
- The search engine may identify a
root state 502A based on the search term entered by the user. For example, the user may enter the search term STATES. The search engine may identify root state 502A and display different tokens 504 within index 500 based on the associated scores 506.
- For example, Alabama token 504A may have a
score 506A of 12, Alaska token 504B may have a score 506B of 25, Arizona token 504C may have a score 506C of 75, California token 504D may have a score 506D of 100, Connecticut token 504E may have a score 506E of 8, and Colorado token 504F may have a score 506F of 55. Based on scores 506, the search engine may display California token 504D first, Arizona token 504C second, Colorado token 504F third, etc.
- In
root state 502A, the user may enter a character, such as the letter C. The search engine parses down the right branch of root state 502A to identify tokens tokens
- As mentioned above, hundreds, thousands, or millions of tokens 504 may be associated with a
particular index 500. To reduce processing bandwidth, token scores 506 may be propagated up index 500. The search engine then only needs to parse down the branches associated with the highest ranked tokens 504.
- The search engine may identify the highest ranked token in each separate branch. For example,
California token 504D has the highest score of 100 in the right branch of root state 502A. The right branch of root state 502A is assigned the score 100 and the left branch of state 502D is also assigned the score 100. Colorado token 504F has the highest score in the right branch of state 502D, and that branch is assigned the Colorado token score 55. Arizona token 504C has the highest token score 506C in the left branch of root state 502A. The left branch of root state 502A is therefore assigned the Arizona token score 75.
- The search engine can quickly determine the highest ranked tokens 504. For example, the search engine may want to display the two top ranked tokens 504 when in
root state 502A. As described above, the search engine would typically need to parse down each individual branch in index 500 to identify all of the possible tokens 504 and associated token scores 506, and then display the two tokens 504 with the highest scores 506.
- Instead the search engine first identifies the right branch of
root state 502A as containing the highest token score. The search engine then determines that the left branch of state 502D has a higher score of 100 than the token score of 55 for the right branch of state 502D. Since California token 504D is the only token in the left branch of state 502D, the search engine displays California token 504D as the highest ranked token.
- The search engine determines that the next highest token score for the right branch of
state 502D is 55 and previously determined that the highest token score for the left branch of root state 502A is 75. The search engine then moves down the left branch of root state 502A to find the second highest token score. State 502B indicates the right branch contains the second highest token score. Therefore, the search engine does not need to parse down the left branch of state 502B. Arizona token 504C is the only token in the right branch of state 502B and is accordingly displayed to the user as the second highest ranked token.
- Referring to
FIGS. 14 and 15, the search engine may generate a priority queue 520 for index 500. In operation 522A, the search engine first may identify the scores for each branch in root state 502A. In this example, only two branches and associated branch scores root state 502A.
- In
operation 522B, the search engine may parse down to a next level of the branch with the highest score and identify additional branch or token scores. In this example, state 502D includes a first branch score 100 and a second branch score 55. Operation 522B adds branch score 55 as a current third highest branch score in index 500.
- In
operation 522C, the search engine parses down a next level of index 500 for the branch with the highest branch score. In this example, the search engine moves down the left branch of state 502D. The left branch of state 502D has no additional branches and terminates at California token 504D. Accordingly, the search engine identifies California token 504D as the highest ranked token for index 500.
- In
operation 522D, priority queue 520 remains with branch scores operation 522E moves down the right branch in state 502B associated with the next highest branch score 75. In this example, the right branch of state 502B has no additional branches and terminates at Arizona token 504C. Accordingly, the search engine in operation 522F identifies Arizona token 504C as the next highest ranked token for index 500. Operations 522 continue until all, or some specified number of, tokens in index 500 have been ranked.
- The search engine displays tokens 504 based on the rankings in
priority queue 520. For example, a user may enter a search term that moves the search engine into root state 502A. In response to moving into root state 502A, the search engine may automatically display the three top ranked tokens in index 500. Based on the rankings in priority queue 520, the search engine displays California token 504, Arizona token 504, and Colorado token 504F. The user then may add the letter A to the search term. Based on priority queue 520, the search engine may display Arizona token 504, Alaska token 504, and Alabama token 504 in order as the three top ranked tokens that begin with the letter A.
- Referring to
FIG. 16, the search engine may assign security bitmasks 540 to tokens 504. Security bitmasks 540 may include different bits associated with different user groups. For example, a first bit may be associated with a management group, a second bit may be associated with a human resources group, and a third bit may be associated with a sales group. These of course are just examples and any number of bits may be associated with any number of different groups.
- The search engine may set token security bitmasks 540 based on which groups of users are authorized to view the tokens and/or the associated data. For example, the search engine may set
token bitmask 540A so only management and sales groups can view California token 504D and any associated data. The search engine may set token security bitmask 540B so only the management group can view Arizona token 504C and any associated data.
- In another example, different bits in security bitmasks 540 may be associated with different geographic regions. For example, different bits may be associated with different cities, states, countries, sales regions, etc. The geographic bits may limit some users to only viewing tokens and associated data for particular regions. For example, a salesman assigned to an Oregon sales region may be prevented from viewing sales data for other states.
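By way of a non-limiting sketch, the score propagation and priority queue of FIGS. 14 and 15 can be combined with the token security bitmasks just described. The tree shape and the token scores follow the example of index 500, and the masks for California and Arizona follow the example above. The bit layout, the masks of the remaining tokens, and the names `Node`, `propagate`, and `top_tokens` are assumptions. The sketch uses Python's standard `heapq` module as the priority queue.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

MANAGEMENT, HUMAN_RESOURCES, SALES = 0b001, 0b010, 0b100  # assumed bit layout


@dataclass
class Node:
    name: str = ""   # token name for a leaf, empty for a branch state
    score: int = 0   # token score (leaf) or best score propagated from below (branch)
    mask: int = 0    # token security bitmask (leaf only)
    children: list = field(default_factory=list)


def leaf(name, score, mask):
    return Node(name, score, mask)


def branch(*children):
    return Node(children=list(children))


def propagate(node):
    """Copy the best score found below each branch up onto that branch."""
    if node.children:
        node.score = max(propagate(child) for child in node.children)
    return node.score


def top_tokens(root, k, user_mask):
    """Best-first walk that always expands the pending branch with the highest score."""
    order = count()  # tie-breaker so the heap never compares Node objects
    heap = [(-root.score, next(order), root)]
    found = []
    while heap and len(found) < k:
        _, _, node = heapq.heappop(heap)
        if node.children:
            for child in node.children:
                heapq.heappush(heap, (-child.score, next(order), child))
        elif node.mask & user_mask:  # show only tokens the user's groups may view
            found.append(node.name)
    return found


INDEX = branch(
    branch(  # left branch of the root state: tokens beginning with A
        branch(leaf("Alabama", 12, MANAGEMENT | SALES), leaf("Alaska", 25, MANAGEMENT)),
        leaf("Arizona", 75, MANAGEMENT),
    ),
    branch(  # right branch of the root state: tokens beginning with C
        leaf("California", 100, MANAGEMENT | SALES),
        branch(leaf("Connecticut", 8, MANAGEMENT | SALES), leaf("Colorado", 55, MANAGEMENT)),
    ),
)
propagate(INDEX)

manager_view = top_tokens(INDEX, 3, MANAGEMENT)
sales_view = top_tokens(INDEX, 3, SALES)
```

Because every branch carries the best score beneath it, the walk reaches California, Arizona, and Colorado for a manager without visiting the Alabama and Alaska leaves. For a user in the sales group only, the tokens whose masks lack the sales bit are skipped.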
-
FIG. 17 shows how the security bitmaps may determine which tokens are displayed to users. Branch security bitmaps 542 may be generated based on token security bitmaps 540. For example, the left branch of state 502E and the right branch of index state 502E each comprise one token. Accordingly, left branch security bitmap 542E and token security bitmap 540F are the same, and right branch security bitmap 542F and token security bitmap 540C are the same.
- A next higher level
branch security bitmap 542D combines lower level branch security bitmaps branch security bitmaps branch security bitmap 542D. Similarly, root level branch security bitmap 542B combines branch security bitmaps
- A user may be assigned a
user security bitmap 544. For example, the user may enter the information retrieval system using a login identifier and password. The login identifier may be associated with a user profile that associates the user with certain groups and/or geographic regions. In this example, the user may belong to the sales group and the search engine sets the third bit in user security bitmap 544. If authorization were controlled for different geographical regions, the search engine may set bits in user security bitmap 544 associated with the sales regions where the user has sales responsibilities.
- The user may enter a
search term 546, or enter a character for search term 546, into the user interface. In a first example, the user may enter a character C for search term 546. The search engine parses down the right branch of root state 502A. California token 504D is the highest ranked token in index 500 but can only be viewed by the management group since only the first bit is set in branch security bitmap 542C. Since the user only belongs to the sales group, the search engine does not parse down the left branch of state 502D and does not display California token 504D.
- The sales group is allowed to view certain tokens and associated data in the right branch of
state 502A. The search engine determines from branch security bitmap 542E that the user is authorized to view the token in the left branch of state 502E. Based on branch security bitmap 542E, the token score for token 504E, and the other weightings and rankings described above, the search engine may display Connecticut token 504E and display the data associated with token 504E, such as sales figures for the state of Connecticut.
- The right branch of
state 502E can only be viewed by users in the management group. Therefore, the search engine also does not display Colorado token 504F or any data associated with Colorado token 504F even though Colorado token 504F has a higher token score than Connecticut token 504E.
- The left branch of
root state 502A also can only be viewed by users in the management group. The user while in root state 502A may enter the character A as search term 546. Based on branch security bitmap 542A, the search engine does not display any tokens that reside in the left branch of root state 502A.
- In another example, user
security bitmap 544 may have the first management bit set and have another geographic bit set associated with the state of Colorado. Colorado token 504F may have an associated geographic bit set in token security bitmap 540C. The search engine may have entered root state 502A in response to the user entering REVENUE as search term 546. Since the user is specifically authorized to view data associated with Colorado token 504F, the search engine may automatically display revenue figures for Colorado in response to the user simply entering the search term REVENUE. In other words, the matching geographic regions in the security bitmaps may produce a highest token ranking for Colorado token 504F.
- In yet another example,
user security bitmap 544 may include bits associated with different revenue levels. For example, a bit inuser security bitmap 544 may enable the user to view financial transactions over $1 million. However, the user may not have authorization to view transactions over $1 million and therefore the search engine does not set the $1 million revenue level bit inuser security bitmap 544. - The user again may enter REVENUE as
search term 546. The search engine may not access or display column rows with revenue values greater than $1 million. The search engine also may not access or display names, tokens, attributes, measures, values, etc. associated with any of the rows with values greater $1 million. For example, the user may enter the search term REVENUE NAME=. The search engine may only display tokens with associated columns rows having values less than $1 million. - Referring to
FIG. 18 , the following three search queries may be received by the search engine. -
Query 1, user 1, department 1, search term=revenue by state. Query 2, user 2, department 1, search term=growth of tax by order date. Query 3, user 3, department 2, search term=ship priority delay by state. - The information retrieval system may take into account all three queries when building
global indexes 550A. For example, token scores in global indexes 550A may be adjusted based on the tokens identified in query 1, query 2, and query 3. - The information retrieval system may add a revenue token and a state token to a
first user index 550C associated with user 1 in response to receiving query 1. If the revenue and/or state tokens already exist in the first user index 550C, the associated token scores may be increased. - The search engine may adjust a
second user index 550C associated with user 2 in response to query 2, and adjust a third user index 550C associated with user 3 in response to query 3. For example, the information retrieval system may add a TAX token to the second user index. - The information retrieval system may adjust a
first department index 550B associated with a first department of an enterprise in response to the queries from user 1 and user 2, who may both be associated with the marketing department of the enterprise. The information retrieval system may add tokens or adjust token scores in department index 550B based on the tokens in query 1 and query 2. The search engine may adjust a second department index 550B associated with a second department of the enterprise in response to query 3. - A substantial amount of processing bandwidth may be needed for building and maintaining large indexes 550. For example, indexes 550 may need to be updated whenever new data is added to
database system 106 in FIG. 1. The information retrieval system may build incremental indexes 550D to avoid continuously rebuilding large indexes, such as global indexes 550A and/or department indexes 550B. - For example, on a
particular day, 1000 rows may be added to a customer column that includes names for 100 new customers. The information retrieval system may need to add 100 new tokens to global index 550A for the new customer data. Instead of rebuilding global index 550A, the information retrieval system may add the 100 new tokens to an incremental index 550D associated with the global index 550A. - Subsequently a user may perform a search for some of the new customer data. The search engine uses both
global index 550A and incremental index 550D to identify the customer tokens. -
FIG. 19 depicts an example of how indexes may be divided over different servers. A subset of each global index 550A and/or department index 550B may be distributed over different servers: server 570A may include a subset of indexes 550 containing tokens starting with letters A-H, server 570B may include a subset of indexes 550 containing tokens starting with letters I-P, and server 570C may include a subset of indexes 550 containing tokens starting with letters Q-Z. - A user may enter the search term CALIFORNIA. The search engine may send the search term to
server 570A. Search terms with no particular token characters may be sent in parallel to all threeservers servers indexes 550A and 510B are displayed in the user interface. - Some indexes may include token prefixes and/or suffixes that detect incorrectly spelled search terms or sub-portions of search terms. For example,
server 570A may contain a primary index with the correct token spellings andservers different severs - To save processing bandwidth, the search engine may first send the search term to
server 570A containing the primary index with the correct token spelling. The search engine may limit the number of displayed tokens. For example, the search engine may only display ten tokens. If the primary index with the correct token spellings onserver 570A identifies ten tokens, the search engine might not search the other prefix and suffix indexes onservers servers - The search engine may terminate the search on
servers server 570A regardless of the required amount of time. - In another example, instead of typing in REVENUE, the user may incorrectly type in the search term ROVENUE. The search engine may try dropping each one of the letters, such as rvenue, ovenue, roenue, rovnue, etc. The search engine then may try replacing each letter with other letters, such as aovenue, bovenue . . . ravenue, rbvenue, etc. The search engine then may insert characters between each existing character, such as arovenue, brovenue, . . . raovenue, rbovenue, etc.
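The three kinds of variation described above (dropping a letter, replacing a letter, inserting a letter) can be sketched as follows. This is only an illustrative sketch, not the implementation used by the search engine; the function name `candidate_terms`, the lowercase English alphabet, and the choice to also insert letters at the very start and end of the term are assumptions.

```python
import string


def candidate_terms(term, alphabet=string.ascii_lowercase):
    """Return the single-edit variations of a search term (hypothetical helper)."""
    term = term.lower()
    candidates = set()

    # Drop each letter in turn: rovenue -> ovenue, rvenue, roenue, ...
    for i in range(len(term)):
        candidates.add(term[:i] + term[i + 1:])

    # Replace each letter with every other letter: aovenue, bovenue, ... ravenue, ...
    for i in range(len(term)):
        for ch in alphabet:
            if ch != term[i]:
                candidates.add(term[:i] + ch + term[i + 1:])

    # Insert a letter at every position: arovenue, brovenue, ... raovenue, ...
    for i in range(len(term) + 1):
        for ch in alphabet:
            candidates.add(term[:i] + ch + term[i:])

    return candidates


variants = candidate_terms("ROVENUE")
print(len(variants), "revenue" in variants)
```

For a seven-letter term this yields a few hundred candidates, one of which is the intended spelling, so the candidates still need to be checked against the indexes.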
- These search term variations may produce hundreds of candidate search terms. The search engine may apply the candidate search terms to a Bloom filter that determines whether any of the candidate search terms may exist in indexes 550. The Bloom filter may reduce the number of candidate search terms, such as down to ten candidate search terms. The ten candidate terms then may be submitted to the primary index. If four matching tokens are identified in the primary index, the search engine may use state machine rankings and other rankings described above to determine which matching tokens are displayed to the user.
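The filtering step can be illustrated with a small Bloom filter. The sketch below is an assumption-laden example rather than the patented design: the class name `BloomFilter`, the bit-array size, the number of hash functions, the use of SHA-256, and the sample tokens are all hypothetical. A Bloom filter never rejects a term that was added, but it can occasionally accept a term that was not, which is why the surviving candidates are still confirmed against the primary index.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical tokens standing in for the contents of indexes 550.
index_tokens = ["revenue", "region", "state", "tax"]
bloom = BloomFilter()
for token in index_tokens:
    bloom.add(token)

candidates = ["rvenue", "ovenue", "revenue", "ravenue"]
survivors = [c for c in candidates if bloom.might_contain(c)]

# Confirm the survivors against the primary index to discard false positives.
primary_index = set(index_tokens)
confirmed = [c for c in survivors if c in primary_index]
print(confirmed)
```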
-
FIG. 20 depicts one example of how the search engine may provide selectable join paths. A table 600 in the database system identifies amounts for different transactions. The transactions may have different dates, such as an order date, a commit date, and a ship date. A second table 602 may identify dates associated with different holidays. For example, December 25 and January 1 are identified as holidays and December 5 and January 2 are identified as non-holidays. A system administrator may have created links - A user may enter the search term AMOUNT IS HOLIDAY into
search field 150. Search term 152 may be considered ambiguous since three different values may exist in table 600. For example, the holiday amount is zero based on the commit date. However, the holiday amount would be $100 based on the order date and $200 based on the ship date. - Referring to
FIG. 21 , the search engine determines thatsearch query 152 is ambiguous and displays amessage 610 asking how to map the order table 600 to holiday table 602. The search engine may display the different date columns of table 600 inmenu 614, display some of the different associated amounts inmenu 612, and display some of the associated values from holiday table 602 inmenu 616. The search engine may select a default date column and highlight the associated column name and associated values inmenus menu 612, the order date column name inmenu 614, and the true holiday value inmenu 616. - The search engine may display other amounts or other column names in response to other selections in
menus menu 616. If the ship date column name inmenu 614 is selected, the search engine may highlight the amount of $200 inmenu 612 and highlight the true value inmenu 616. The user may select aconfirm icon 618 to confirm the selection of different items inmenus - The search engine may not query the user and/or
display menus search term 152 is unambiguous. For example, the user may enter the search term AMOUNT BY COMMIT DATE. The search term may generate one definite answer from table 600. Therefore, no other selectable join paths need to be displayed inmenus -
FIG. 22 shows another example of how the search engine provides user selectable join paths. A table 620 in the database system may identify persons who referred someone for an employment position (referrers) and persons who were referred by the referrers (referees). A table 622 may contain the names of the referrers and referees and include identifiers for a current employer and a previous employer of the referrers and referees. Table 624 contains the names of the current and previous employers. For example, tables 620 and 622 indicate that Jones referred Smith. Tables 622 and 624 identify the current employer of Jones as Software, Inc. and the previous employer of Jones as Eagle, Corp. - A user may enter the search term NAME BY CURRENT EMPLOYER into
search field 150. The search engine determines search term 152 is not ambiguous since one list can be provided from tables 622 and 624. Accordingly, the search engine displays a list 626 of employee names from table 622 and associated current employers from table 624. - Referring to
FIG. 23, the user may enter an ambiguous search term 152 into search field 150, such as NAME NAME. The two search terms NAME and NAME may refer to the name of the referrer or the name of the referee. The search engine could display all names in table 622 for the first search term name and all names in table 622 for the second search term name. However, these search results may not provide useful information. - The search engine may display
windows window 630A orwindow 630B. - For example, the referrer and referee columns of table 620 may be linked to the employee identifiers in table 622. The search engine may identify the names of the referrer and referee columns of table 620 in
menus menu 632A may cause the search engine to map the first NAME in search term 152 with the referrer column in table 620. Selecting the referee name in menu 632B may cause the search engine to map the second NAME in search term 152 with the referee column in table 620. Accordingly, the search engine displays a list 636 of referrer names and associated referee names. Thus, the search engine allows the user to selectively map search terms to columns in different tables. -
FIG. 24 depicts another example where the user enters the search term COMPANY COMPANY. Referring toFIGS. 23 and 24 , simply displaying a list of all company names in table 624 for both the first search term COMPANY and the second search term COMPANY may not provide useful information. The search engine may displaychange mapping windows - In response to selection of windows 630, the search engine may display different columns that are directly or indirectly linked with company name table 624 in
FIG. 23. For example, the current employer and previous employer columns in table 622 may be directly linked to the company identifier column in company table 624. The referrer and referee columns in table 620 are indirectly linked to company table 624 through employee table 622. - The search engine may identify the different columns in tables 620, 622, and 624 that can be joined with the two COMPANY tokens in
search term 152. In this example, sixteen different combinations may exist between referrer, referee, current employer, and previous employer. For example, the first item in menu 632A may identify the join path between the referrer column in table 620 and the current employer column in table 622. The second item in menu 632A may identify the join path between the referrer column in table 620 and the previous employer column in table 622. - When the top entry in
menu 632A is selected, the search engine maps the first COMPANY token in search term 152 with the referrer column in table 620, the associated entries in the current employer column of table 622, and the associated company names in table 624. When the third entry in menu 632B is selected, the search engine maps the second COMPANY token in search term 152 with the referee column in table 620, the associated entries in the current employer column of table 622, and the associated company names in table 624. - Accordingly, the search engine displays a
list 636A of current employers for employees that are identified as referrers in table 620. The search engine also displays an associated list 636B of current employers for the associated referees identified in table 620. - Thus, the search engine allows users to dynamically assign join paths to search terms/
tokens 152. The join paths may identify a root table and other selected paths/links from the root table to other tables. The possible join paths are displayed in a drop-down menu in response to the user selecting search terms/tokens 152 in search field 150. The user may dynamically change current join paths by selecting different column names displayed in the drop-down menus. - The search engine may identify new join paths as additional tokens are added to
search term 152. For example, the user may add the token AMOUNT to search term 152. Another table that contains an amount column may be linked to the employee identifiers in table 622. The search engine may display additional join paths between the amount column in the new table and the columns in tables 620, 622, and 624. -
FIG. 25 depicts an example of how the search engine may rank join paths. The join paths may be ranked based on usage or cardinality similar to the ways described above for tokens. For example, the user may enter the search term AMOUNT NAME. Users may have selected the sales agent column in table 640 more than the customer column in table 640. Based on usage, the search engine may rank the sales agent column higher than the customer column. Accordingly, the search engine may display the sales agent column name above the customer column name in menu 632. - The search engine also may rank join paths based on the number and level of join paths between a root table and other tables. For example, the search engine may have selected orders table 640 as a root table for
search term 152. A direct join path/link 643 links the tokens in search term 152 with the customer and sales agent columns in table 640. Secondary join path/links 644A-C link the NAME token in search term 152 to a boss identifier column in table 642. - With all other ranking and weighting considerations being equal, the search engine may rank the customer and sales agent columns associated with
search term 152 through direct link 643 higher than the boss column in table 642 associated with search term 152 through secondary links 644A-C. The search engine accordingly displays the sales agent column and customer column names above the boss column name in menu 632. -
FIG. 26 shows an example of how the search engine may handle multiple root tables. A user may enter the search term BUDGET AMOUNT EXPENSE AMOUNT. The search engine may identify a budget table 650 as the root table for the BUDGET AMOUNT portion of search term 152 and may identify an expense table 652 as the root table for the EXPENSE AMOUNT portion of search term 152. The search engine may derive the total budget amount from root table 650 as $300 and derive the total expense amount from root table 652 as $270. - The user may add the token PERSON so
search term 152 now comprises BUDGET AMOUNT EXPENSE AMOUNT PERSON. Each table 650 and 652 may have multiple columns associated with the PERSON search term. For example, BUDGET AMOUNT may be associated with a buyer column or a signer column in table 650. The EXPENSE AMOUNT search term may be associated with a buyer column and a signer column in table 652. - The search engine may display
menu 632A querying the user to select either the buyer or signer column in table 650 for mapping to the BUDGET AMOUNT in search term 152 and display menu 632B querying the user to select either the buyer or signer column in table 652 for mapping to the EXPENSE AMOUNT in search term 152. Prior to displaying menus 632, the search engine may select one join path between table 650 and table 654 as a default and one join path between table 652 and table 654 as a default. - A system administrator may create links that link columns in different tables together. The user then selects between the different links to create the different join paths between the search terms and the table columns. However, links may not exist between certain tables. The search engine may identify currently unlinked tables that may have related columns and enable the user to then create links and associated join paths between the related columns.
- For example, no link may currently exist between budget table 650 and name table 654. The search engine may determine that the buyer and signer identifiers in table 650 have similarities with the person identifiers in table 654. For example, the values in tables 650 and 654 may use similar numbers or string values. The search engine may display an
icon 658 that allows the user to create new links between tables 650 and 654. - In response to selecting
icon 658, the search engine may display a message asking the user if they would like to link budget table 650 to person table 654. In response to an additional user input, the search engine createslinks - The search engine also may give the user the option to select link names. For example, the user may assign the name AUTHORIZED SIGNER to link 660B. The name AUTHORIZED SIGNER would then appear in
menu 632A. -
FIG. 27 depicts an example of how the search engine may create a worksheet. As previously shown, table 620 contains referrer and referee identifier columns and table 622 contains the names associated with the referrer and referee identifiers in table 620 and also contains current and previous employer identifiers. Table 624 contains the names associated with the current and previous employer identifiers in table 622. - As explained above, the tables may have been previously linked together and the search engine may display different menus showing selectable join paths between the search terms and the different columns in tables 620, 622, and 624. For example, the search engine may display
menu 632 showing different join paths for associating with the first COMPANY in search term 152. -
FIG. 28 shows how the search engine dynamically creates a worksheet based on the user selected join paths. Referring to FIGS. 27 and 28, the user, or a group of users, may not want to continuously map search terms with different join paths. The search engine allows the user to create a worksheet table 680 that contains the selected table entries for the selected join paths. For example, the user may select the referrer column name in menu 632A, the referee column name in menu 632B, the referrer current company name in menu 632C, and the referee current company name in menu 632D. - These user selections cause the search engine to map the first NAME in
search term 152 to the referrer column in table 620, map the second NAME in search term 152 to the referee entries in table 620, map the first COMPANY in search term 152 to the referrer current company entries in table 622, and map the second COMPANY entry in search term 152 to the referee current company entries in table 622. - In response to selecting
icon 678, the search engine generates a worksheet table 680 that contains the user selected table mappings. For example, the search engine adds all the referrer names identified in table 620 to a first column of worksheet 680 and adds the referee names associated with the referrer names identified in table 620 to a second column of worksheet 680. The search engine creates a third column in worksheet table 680 containing the current company names associated with the referrer names and adds a fourth column containing the current company names associated with the referee names. - The search engine may link any subsequent searches using all or part of
search term 152 to worksheet table 680. Worksheet 680 may be published to any defined group of users and may be assigned any selectable name. Any new data added to any of tables 620, 622, or 624 in FIG. 27 is automatically added to worksheet 680. For example, any new referrer names added to tables 620 and 622 are automatically added to worksheet 680 as well as any associated referee names, current company names, or previous company names. - Worksheet table 680 may be assigned the same row level and object level security as other tables as described above. For example, a system administrator may restrict users to only viewing certain rows of
worksheet 680 where the user is either the referrer or referee. In another example, the worksheet 680 may include a column of social security numbers associated with the referrers (not shown). The system administrator may assign a security bitmap to worksheet 680 that prevents certain groups of users from viewing the data in the social security column. - The links joining columns from different tables together also may have associated security levels. For example, the search engine may assign a security bitmap to the links between table 622 and table 624 that prevents certain users from seeing the current company names and previous company names contained in table 624 or contained in
worksheet 680. - The search engine may generate indexes for the entries in
worksheet 680. To reduce the processing required to maintain additional indexes, links may be created between worksheet 680 and tables 620, 622, and 624 in FIG. 27. For example, the same names may exist in the first column of worksheet 680 and the name column of table 622. The first column of worksheet 680 may be linked to the indexes previously created for the name column of table 622. -
Worksheet 680 may only contain a subset of entries from tables 620, 622, or 624. For example, only a subset of names from table 622 may be identified in worksheet 680. The indexes for tables 620, 622, or 624 may include bitmaps that identify the subset of entries associated with worksheet 680. These bitmaps may be similar to the priority bitmaps described above. - For example, the search engine may receive a search term. The search engine may map the search term to
worksheet 680. The worksheet may be linked to the indexes for table 622. The indexes for table 622 may include a worksheet bitmap that identifies the different branches/tokens of the indexes associated with worksheet 680. The search engine only parses down the branches of the indexes with bitmap values associated with entries in worksheet 680. - A large number of worksheets may create an unwieldy number of associated worksheet bitmaps. Each worksheet may be assigned a different bit in the worksheet bitmap. In one example, the search engine may limit the number of worksheets to restrict the amount of additional processing and data needed for handling the worksheet bitmaps.
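One way to picture the worksheet bitmap is as a per-branch bit mask on a shared prefix index. The following is a minimal sketch under several assumptions: the index is modeled as a simple character trie, the names `Node`, `insert`, `search`, `branch_bits`, and `token_bits` are hypothetical, and the token "johnson" is invented sample data (Jones and Smith come from the example tables). Each worksheet is assigned one bit, and the search only descends into branches whose mask has that bit set.

```python
class Node:
    """Trie node carrying worksheet bit masks (hypothetical structure)."""

    def __init__(self):
        self.children = {}
        self.token = None      # set when a complete token ends at this node
        self.token_bits = 0    # worksheets that contain this exact token
        self.branch_bits = 0   # worksheets with at least one token below this node


def insert(root, token, worksheet_bits=0):
    node = root
    node.branch_bits |= worksheet_bits
    for ch in token:
        node = node.children.setdefault(ch, Node())
        node.branch_bits |= worksheet_bits
    node.token = token
    node.token_bits |= worksheet_bits


def search(root, prefix, worksheet_bit):
    """Return tokens matching prefix that belong to the given worksheet."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None or not (node.branch_bits & worksheet_bit):
            return []
    matches, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.token is not None and current.token_bits & worksheet_bit:
            matches.append(current.token)
        # Skip branches that hold no entries for this worksheet.
        for child in current.children.values():
            if child.branch_bits & worksheet_bit:
                stack.append(child)
    return sorted(matches)


WORKSHEET_680 = 1 << 0  # each worksheet is assigned its own bit

root = Node()
insert(root, "jones", WORKSHEET_680)
insert(root, "smith", WORKSHEET_680)
insert(root, "johnson")  # present in the table index only, not in the worksheet

print(search(root, "jo", WORKSHEET_680))
```

Because one machine word holds a fixed number of bits, a design like this also suggests why limiting the number of worksheets keeps the bitmaps manageable.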
-
FIG. 29 shows an example formulaic search query. The search engine allows the user to specify arithmetic operations between different table columns. For example, the user may enter the search term ORDER AMOUNT - CUSTOMER SUPPORT COST. The search engine may identify the order amounts in table 690 and the customer costs in table 692. The search engine calculates the differences between the transaction amounts in table 690 and the costs in table 692 and displays the results 694 in electronic page 148. - The search engine may reduce tables 690 and 692 into a customer profitability table that has the granularity for the smallest identified column. For example, order table 690 may contain thousands of entries but customer support table 692 may only contain a few dozen entries. The new table may include only the transactions in table 690 that also include associated cost amounts in table 692. The search engine then calculates the profitability from the values in the new table. The search engine also may allow the user to select a name for the new table, such as customer profitability.
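The reduction to a customer profitability table can be sketched as below. This is an illustration only: the row layout, the sample values, the per-customer aggregation, and the function name `customer_profitability` are assumptions, since the text does not specify how tables 690 and 692 are keyed.

```python
# Hypothetical rows standing in for order table 690.
orders = [
    {"customer": "A", "amount": 500},
    {"customer": "A", "amount": 300},
    {"customer": "B", "amount": 400},
    {"customer": "C", "amount": 900},  # no support cost recorded for C
]

# Hypothetical contents of customer support table 692 (customer -> cost).
support_costs = {"A": 200, "B": 150}


def customer_profitability(order_rows, costs):
    """Compute ORDER AMOUNT - CUSTOMER SUPPORT COST per customer.

    Only customers present in the smaller cost table are kept, so the
    result has the granularity of the cost table.
    """
    totals = {}
    for row in order_rows:
        customer = row["customer"]
        if customer in costs:
            totals[customer] = totals.get(customer, 0) + row["amount"]
    return {customer: totals[customer] - costs[customer] for customer in totals}


profit = customer_profitability(orders, support_costs)
print(profit)
```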
- In another example, the user may enter the search term COUNT FLIGHTS (FOR ARRIVAL TIME - DEPARTURE TIME < 3 HOURS). The search engine may use a different language for parsing the search terms within the parentheses. For example, the search engine may subtract the departure time from the arrival time for every row in the flights table. The search engine then may compare the subtracted values to a fixed quantity of 3 hours and count the number of flights with a total flight time of less than 3 hours.
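The row-by-row evaluation described above might look like the following sketch. The flight rows, column names, and the function name `count_short_flights` are hypothetical, and the parsing of the parenthesized expression itself is not shown.

```python
from datetime import datetime, timedelta

# Hypothetical rows standing in for the flights table.
flights = [
    {"departure": datetime(2014, 4, 14, 8, 0), "arrival": datetime(2014, 4, 14, 9, 30)},
    {"departure": datetime(2014, 4, 14, 10, 0), "arrival": datetime(2014, 4, 14, 15, 0)},
    {"departure": datetime(2014, 4, 14, 12, 0), "arrival": datetime(2014, 4, 14, 14, 59)},
]


def count_short_flights(rows, limit=timedelta(hours=3)):
    """Count rows where ARRIVAL TIME - DEPARTURE TIME is less than the limit."""
    return sum(1 for row in rows if row["arrival"] - row["departure"] < limit)


short_count = count_short_flights(flights)
print(short_count)
```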
- The search engine also may perform arithmetic operations for user defined join paths. For example, the search engine allows the user to select columns in different tables to map together as described above. The search engine then performs the above described arithmetic operations between the user selected table columns.
-
FIG. 30 shows a computing device 1000 that may be used for operating the information retrieval system and performing any combination of the information retrieval processes discussed above. The computing device 1000 may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. In other examples, computing device 1000 may be a personal computer (PC), a tablet, a Personal Digital Assistant (PDA), a cellular telephone, a smart phone, a web appliance, or any other machine or device capable of executing instructions 1006 (sequential or otherwise) that specify actions to be taken by that machine. - While only a
single computing device 1000 is shown, the computing device 1000 may include any collection of devices or circuitry that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the operations discussed above. Computing device 1000 may be part of an integrated control system or system manager, or may be provided as a portable electronic device configured to interface with a networked system either locally or remotely via wireless transmission. -
Processors 1004 may comprise a central processing unit (CPU), a graphics processing unit (GPU), programmable logic devices, dedicated processor systems, micro controllers, or microprocessors that may perform some or all of the operations described above. Processors 1004 may also include, but may not be limited to, an analog processor, a digital processor, a microprocessor, multi-core processor, processor array, network processor, etc. - Some of the operations described above may be implemented in software and other operations may be implemented in hardware. One or more of the operations, processes, or methods described herein may be performed by an apparatus, device, or system similar to those as described herein and with reference to the illustrated figures.
-
Processors 1004 may execute instructions or “code” 1006 stored in any one ofmemories Instructions 1006 and data can also be transmitted or received over anetwork 1014 via anetwork interface device 1012 utilizing any one of a number of well-known transfer protocols. -
Memories processing device 1000, for example RAM or FLASH memory disposed within an integrated circuit microprocessor or the like. In other examples, the memory may comprise an independent device, such as an external disk drive, storage array, or any other storage devices used in database systems. The memory and processing devices may be operatively coupled together, or in communication with each other, for example by an I/O port, network connection, etc. such that the processing device may read a file stored on the memory. - Some memory may be “read only” by design (ROM) by virtue of permission settings, or not. Other examples of memory may include, but may be not limited to, WORM, EPROM, EEPROM, FLASH, etc. which may be implemented in solid state semiconductor devices. Other memories may comprise moving parts, such a conventional rotating disk drive. All such memories may be “machine-readable” in that they may be readable by a processing device.
- “Computer-readable storage medium” (or alternatively, “machine-readable storage medium”) may include all of the foregoing types of memory, as well as new technologies that may arise in the future, as long as they may be capable of storing digital information in the nature of a computer program or other data, at least temporarily, in such a manner that the stored information may be “read” by an appropriate processing device. The term “computer-readable” may not be limited to the historical usage of “computer” to imply a complete mainframe, mini-computer, desktop, wireless device, or even a laptop computer. Rather, “computer-readable” may comprise storage medium that may be readable by a processor, processing device, or any computing system. Such media may be any available media that may be locally and/or remotely accessible by a computer or processor, and may include volatile and non-volatile media, and removable and non-removable media.
-
Computing device 1000 can further include a video display 1016, such as a liquid crystal display (LCD) or a cathode ray tube (CRT), and a user interface 1018, such as a keyboard, mouse, touch screen, etc. All of the components of computing device 1000 may be connected together via a bus 1002 and/or network. - For the sake of convenience, operations may be described as various interconnected or coupled functional blocks or diagrams. However, there may be cases where these functional blocks or diagrams may be equivalently aggregated into a single logic device, program or operation with unclear boundaries.
- Having described and illustrated the principles of a preferred embodiment, it should be apparent that the embodiments may be modified in arrangement and detail without departing from such principles. Claim is made to all modifications and variations coming within the spirit and scope of the following claims.
Claims (36)
1. A method, comprising:
identifying, by a computing device, data items in a relational database;
generating, by the computing device, tokens representing the data items in the relational database;
generating, by the computing device, indexes containing the tokens;
receiving, by the computing device, a search term from a user interface;
comparing, by the computing device, the search term with the tokens in the indexes; and
selecting, by the computing device, the tokens for displaying in the user interface based on the comparison of the tokens with the search term.
2. The method of claim 1 , further comprising:
identifying, by the processing device, a current state of the search term;
predicting, by the processing device, a next token for adding to the search term, based on the current state of the search term; and
displaying, by the processing device, the next token in the user interface.
3. The method of claim 1 , further comprising:
identifying, by the processing device, at least one current token that currently exists in the search term;
identifying, by the processing device, the current token as an attribute token, a measure token, or a value token;
predicting, by the processing device, a next token for adding to the search term based on the current token being the attribute token, the measure token, or value token; and
displaying, by the processing device, the next token in the user interface.
4. The method of claim 1 , further comprising:
identifying, by the computing device, token scores for the tokens;
ranking, by the computing device, the tokens based on the associated token scores; and
displaying, by the computing device, the tokens based on the ranking.
5. The method of claim 4 , further comprising generating, by the computing device, the token scores based on usage of single tokens or on usage of a combination of the tokens.
6. The method of claim 1 , further comprising:
identifying, by the computing device, token scores for the tokens in the indexes;
assigning, by the computing device, branch scores for branches in the indexes based on the token scores for the tokens in the branches;
parsing, by the computing device, down the branches in the indexes based on portions of the search term matching the tokens and the branch scores for the branches.
7. The method of claim 1 , further comprising:
generating, by the processing device, priority queues for the indexes, wherein the priority queues rank the tokens based on associated token scores; and
displaying, by the processing device, the tokens based on the rank of the tokens in the priority queues.
8. The method of claim 1 , further comprising:
identifying, by the processing device, token scores for the tokens;
identifying, by the processing device, types of the indexes containing the tokens;
weighting, by the processing device, the token scores based on the types of the indexes containing the tokens;
displaying, by the processing device, the tokens based on the weighting of the token scores.
9. The method of claim 8 , further comprising:
identifying, by the processing device, a first one of the indexes as a global index for an enterprise;
identifying, by the processing device, a second one of the indexes as a department index associated with a department within the enterprise;
weighting, by the processing device, the token scores for the tokens in the department index higher than the token scores for the tokens in the global index.
10. The method of claim 9 , further comprising:
identifying, by the processing device, a third one of the indexes as a user index associated with an enterprise user; and
weighting, by the processing device, the token scores for the tokens identified in the user index higher than the token scores for the tokens identified in the global index or the tokens identified in the department index.
11. The method of claim 1 , further comprising:
generating, by the processing device, indexes that contain prefixes and/or suffixes of the tokens;
identifying, by the computing device, the prefixes and/or suffixes that at least partially match the search term; and
displaying, by the computing device, the tokens associated with the identified prefixes or suffixes.
12. The method of claim 1 , further comprising:
identifying, by the computing device, tables in the relational database associated with the search term;
identifying, by the computing device, different join paths between the tables;
displaying, by the computing device, the different join paths to the user;
identifying, by the computing device, the join paths selected by the user; and
displaying, by the computing device, data from the tables based on the join paths selected by the user.
13. The method of claim 1 , further comprising:
identifying, by the computing device, new data items added to the relational database;
generating, by the computing device, new tokens representing the new data items in the relational database;
generating, by the computing device, incremental indexes containing the new tokens;
comparing, by the computing device, the search term with the tokens in the indexes and the new tokens in the incremental indexes; and
displaying, by the computing device, the tokens in the indexes and the new tokens in the incremental indexes that at least partially match the search term.
14. The method of claim 1 , further comprising:
distributing, by the computing device, different portions of the indexes in different data servers;
comparing, by the computing device, the search term with the tokens in the different portions of the indexes in the different data servers; and
displaying the tokens in the different portions of the indexes in the different data servers that at least partially match the search term.
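As an illustration of claims 1, 4, and 8 to 11, the following minimal Python sketch builds prefix indexes over tokens, matches a partial search term against them, and ranks the matches by weighted token scores. The names `TokenIndex`, `suggest`, and `INDEX_WEIGHTS`, along with the weight values, are hypothetical and chosen only for this example; the claims do not prescribe a data structure or specific weights.

```python
from collections import defaultdict

# Hypothetical weights: the claims only require user > department > global.
INDEX_WEIGHTS = {"global": 1.0, "department": 2.0, "user": 3.0}


class TokenIndex:
    """Maps every prefix of a token to the tokens that start with it."""

    def __init__(self, index_type):
        self.index_type = index_type
        self.prefixes = defaultdict(set)
        self.scores = {}

    def add(self, token, score=1.0):
        token = token.lower()
        self.scores[token] = score
        # Register each prefix so partial search terms can be matched.
        for end in range(1, len(token) + 1):
            self.prefixes[token[:end]].add(token)

    def match(self, term):
        return self.prefixes.get(term.lower(), set())


def suggest(term, indexes, limit=5):
    """Return matching tokens, highest weighted score first."""
    best = {}
    for index in indexes:
        weight = INDEX_WEIGHTS[index.index_type]
        for token in index.match(term):
            weighted = index.scores[token] * weight
            best[token] = max(best.get(token, 0.0), weighted)
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:limit]]


global_index = TokenIndex("global")
for item in ["revenue", "region", "retail"]:
    global_index.add(item)

user_index = TokenIndex("user")
user_index.add("region")

suggestions = suggest("re", [global_index, user_index])
```

In this sketch, "region" ranks first because it also appears in the more heavily weighted user index. A suffix index could be built the same way by registering `token[start:]` for each start position.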
15. An apparatus, comprising:
a processing system configured to:
identify content in a database system;
generate tokens based on the content in the database system;
identify security values associated with the tokens;
receive a search term;
identify at least some of the tokens as candidate tokens based on the search terms; and
display the candidate tokens based on the associated security values.
16. The apparatus of claim 11 , wherein the processing system is further configured to:
identify token security bitmasks associated with the tokens;
identify a user security bitmask for a user entering the search term;
display the candidate tokens based on a comparison of the associated token security bitmasks with the user security bitmask.
17. The apparatus of claim 16 , wherein the processing system is further configured to:
identify a geographic region bit in the token security bitmasks and the user security bitmask;
display the candidate tokens based on a comparison of the geographic region bit in the associated token security bitmasks with the geographic region bit in the user security bitmask.
18. The apparatus of claim 16 , wherein the processing system is further configured to:
identify a department bit in the token security bitmasks and the user security bitmask; and
display the candidate tokens based on a comparison of the department bit in the associated token security bitmasks with the department bit in the user security bitmask.
19. The apparatus of claim 16 , wherein the processing system is further configured to:
identify an amount limit bit in the token security bitmasks and the user security bitmask associated with a limit on an amount or size;
display the candidate tokens based on a comparison of the amount limit bit in the token security bitmasks with the amount limit bit in the user security bitmask.
20. The apparatus of claim 15 , wherein the processing system is further configured to:
generate indexes for the tokens;
identify token security bitmasks for the tokens in the indexes;
assign branch security bitmasks to branches in the indexes based on the token security bitmasks for the tokens in the branches;
identify a user security bitmask for a user entering the search term;
search the branches of the indexes for the candidate tokens based on a comparison of the branch security bitmasks with the user security bitmask.
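The bitmask comparisons in claims 16 to 20 can be sketched as follows. This is one possible reading, under the assumption that a set bit in a token mask means "the user must hold this permission"; the bit positions and the names `is_visible`, `visible_candidates`, and `branch_mask` are hypothetical and do not come from the claims.

```python
from functools import reduce
from operator import and_

# Hypothetical bit layout; the claims name the kinds of bits, not their positions.
REGION_BIT = 0b001
DEPARTMENT_BIT = 0b010
AMOUNT_LIMIT_BIT = 0b100


def is_visible(token_mask, user_mask):
    """A token is shown only if the user holds every bit the token requires."""
    return (token_mask & user_mask) == token_mask


def visible_candidates(candidates, user_mask):
    """Filter (token, mask) pairs down to the tokens this user may see."""
    return [token for token, mask in candidates if is_visible(mask, user_mask)]


def branch_mask(token_masks):
    """Bits required by every token in a non-empty branch (bitwise AND)."""
    return reduce(and_, token_masks)


candidates = [
    ("sales", 0),
    ("emea_sales", REGION_BIT),
    ("payroll", DEPARTMENT_BIT | AMOUNT_LIMIT_BIT),
]
user_mask = REGION_BIT | DEPARTMENT_BIT
shown = visible_candidates(candidates, user_mask)

# If the user lacks a bit that every token in a branch requires,
# the whole branch can be skipped without visiting its tokens.
finance_branch = branch_mask([DEPARTMENT_BIT | AMOUNT_LIMIT_BIT, AMOUNT_LIMIT_BIT])
skip_finance_branch = not is_visible(finance_branch, user_mask)
```

Under this convention the branch mask is a conservative filter: a branch that passes the check may still contain tokens the user cannot see, so each token is checked again before display.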
21. An apparatus, comprising:
indexes containing tokens associated with data in a relational database, wherein the tokens have associated token categories; and
a state machine configured to:
receive a search input;
identify the tokens associated with the search input; and
display some of the identified tokens based on the associated token categories.
22. The apparatus of claim 21 , wherein the token categories comprise an attribute category, a measure category, and a value category.
23. The apparatus of claim 21 , wherein the state machine is further configured to generate error or help messages based on the token categories for the tokens matching a current portion of the search input and the token categories for tokens matching a newly added portion of the search input.
24. The apparatus of claim 21 , wherein the state machine is further configured to predict an additional search input based on the token categories associated with a selected one of the displayed tokens.
25. The apparatus of claim 24 , wherein the state machine is further configured to display the tokens associated with the additional search input based on rankings of the tokens in the indexes.
26. The apparatus of claim 21 , wherein the state machine is further configured to:
identify a first segment of the search term associated with a first group of the data in the relational database;
generate a first query instruction accessing and displaying the first group of the data;
identify a second segment of the search term associated with a second sub-group of data within the first group of data in the relational database;
generate a second query instruction for displaying the second sub-group of data.
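A category-driven state machine of the kind recited in claims 21 to 24 (and in method claims 2 and 3) might look like the sketch below. The transition table `ALLOWED_NEXT`, the class name `SearchStateMachine`, and the message wording are assumptions made for illustration; the claims state only that suggestions and error or help messages depend on the token categories.

```python
from enum import Enum


class Category(Enum):
    ATTRIBUTE = "attribute"
    MEASURE = "measure"
    VALUE = "value"


# Hypothetical transition table: which categories may follow the current one.
ALLOWED_NEXT = {
    None: {Category.MEASURE, Category.ATTRIBUTE, Category.VALUE},
    Category.MEASURE: {Category.ATTRIBUTE, Category.VALUE},
    Category.ATTRIBUTE: {Category.MEASURE, Category.VALUE},
    Category.VALUE: {Category.MEASURE, Category.ATTRIBUTE, Category.VALUE},
}


class SearchStateMachine:
    def __init__(self, token_categories):
        self.token_categories = token_categories
        self.state = None  # category of the most recently accepted token
        self.accepted = []

    def suggestions(self):
        """Tokens whose category is allowed after the current state."""
        allowed = ALLOWED_NEXT[self.state]
        return sorted(t for t, c in self.token_categories.items() if c in allowed)

    def add(self, token):
        """Accept the token and return None, or return an error message."""
        category = self.token_categories.get(token)
        if category is None:
            return f"unknown token: {token}"
        if category not in ALLOWED_NEXT[self.state]:
            return f"a {category.value} cannot follow a {self.state.value}"
        self.state = category
        self.accepted.append(token)
        return None


tokens = {
    "revenue": Category.MEASURE,
    "profit": Category.MEASURE,
    "region": Category.ATTRIBUTE,
    "california": Category.VALUE,
}
machine = SearchStateMachine(tokens)
first_error = machine.add("revenue")
next_tokens = machine.suggestions()
second_error = machine.add("profit")
```

Here the machine accepts "revenue", then offers only attribute and value tokens, and rejects a second measure with a help message derived from the two categories involved.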
27. A method, comprising:
receiving, by a computing device, search terms;
identifying, by the computing device, tables associated with the search terms;
identifying, by the computing device, join paths between the tables;
displaying, by the computing device, the join paths;
receiving, by the computing device, inputs selecting the join paths; and
displaying, by the computing device, data from the tables associated with the selected join paths.
28. The method of claim 27 , further comprising:
displaying, by the computing device, names of columns in the tables associated with the join paths; and
identifying, by the computing device, the names of the columns selected by the inputs; and
displaying, by the computing device, data from the columns with the selected names.
29. The method of claim 28 , further comprising:
ranking, by the computing device, the columns; and
displaying, by the computing device, the names of the columns based on the ranking.
30. The method of claim 29 , further comprising:
identifying, by the computing device, the join paths associated with the columns as direct links or indirect links to a root table associated with the search terms; and
ranking, by the computing device, the columns based on the associated join paths being identified as the direct links or the indirect links.
31. The method of claim 27 , further comprising:
identifying, by the computing device, additional tables having no join paths with the previously identified tables;
displaying, by the computing device, additional join paths between the additional tables and the previously identified tables;
linking, by the computing device, the additional tables to the previously identified tables in response to user selections of the additional join paths.
32. The method of claim 27 , further comprising generating, by the processing device, a worksheet table containing the data from the tables associated with the selected join paths.
33. The method of claim 27 , further comprising linking, by the processing device, the worksheet table to indexes associated with tables.
34. The method of claim 33 , further comprising assigning, by the processing device, a worksheet bitmap to the indexes identifying branches of the indexes associated with the worksheet table.
35. The method of claim 33 , further comprising assigning, by the computing device, row and object level security access to the worksheet table.
36. The method of claim 27 , further comprising:
identifying, by the processing device, arithmetic operations associated with the search terms;
identifying, by the processing device, columns in different tables associated with the arithmetic operations; and
performing, by the processing device, the arithmetic operations on the data contained in the columns of the different tables.
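The join-path identification in claims 27 to 30 can be illustrated with a small graph search. The sketch below treats tables as nodes and declared joins as undirected edges, then lists the simple paths between two tables with direct links ranked ahead of indirect ones. The function name `join_paths`, the `max_hops` limit, and the sample schema are hypothetical and are not taken from the specification.

```python
def join_paths(joins, start, goal, max_hops=3):
    """Enumerate simple join paths from start to goal, shortest first."""
    graph = {}
    for left, right in joins:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)

    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            paths.append(path)
            continue
        # A path of n tables uses n - 1 joins; stop extending at max_hops joins.
        if len(path) > max_hops:
            continue
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in path:  # keep paths simple (no repeated table)
                stack.append(path + [neighbor])
    # Direct links (two tables) sort ahead of indirect links.
    return sorted(paths, key=lambda p: (len(p), p))


def is_direct(path):
    """True when the path joins the two tables without an intermediate table."""
    return len(path) == 2


joins = [
    ("orders", "customers"),
    ("orders", "shipments"),
    ("shipments", "customers"),
]
paths = join_paths(joins, "orders", "customers")
labels = ["direct" if is_direct(p) else "indirect" for p in paths]
```

A user interface could present `paths` as the selectable join paths of claim 27 and use the direct or indirect label as one input to the column ranking of claim 30.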
Priority Applications (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/252,553 US20150026153A1 (en) | 2013-07-17 | 2014-04-14 | Search engine for information retrieval system |
EP14733015.3A EP3022665B1 (en) | 2013-07-17 | 2014-05-22 | Search engine for information retrieval system |
PCT/US2014/039230 WO2015009353A1 (en) | 2013-07-17 | 2014-05-22 | Search engine for information retrieval system |
CN201480050848.3A CN105531706A (en) | 2013-07-17 | 2014-05-22 | Search engine for information retrieval system |
EP20176612.8A EP3719672A1 (en) | 2013-07-17 | 2014-05-22 | Search engine for information retrieval system |
HK16108989.8A HK1221033A1 (en) | 2013-07-17 | 2016-07-27 | Search engine for information retrieval system |
US16/173,320 US11017035B2 (en) | 2013-07-17 | 2018-10-29 | Token based dynamic data indexing with integrated security |
US17/327,949 US11599587B2 (en) | 2013-07-17 | 2021-05-24 | Token based dynamic data indexing with integrated security |
US18/169,268 US11899638B2 (en) | 2013-07-17 | 2023-02-15 | Token based dynamic data indexing with integrated security |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/944,491 US9405794B2 (en) | 2013-07-17 | 2013-07-17 | Information retrieval system |
US14/252,553 US20150026153A1 (en) | 2013-07-17 | 2014-04-14 | Search engine for information retrieval system |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/944,491 Continuation-In-Part US9405794B2 (en) | 2013-07-17 | 2013-07-17 | Information retrieval system |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/173,320 Continuation US11017035B2 (en) | 2013-07-17 | 2018-10-29 | Token based dynamic data indexing with integrated security |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150026153A1 (en) | 2015-01-22 |
Family
ID=51014634
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/252,553 Abandoned US20150026153A1 (en) | 2013-07-17 | 2014-04-14 | Search engine for information retrieval system |
US16/173,320 Active 2034-08-15 US11017035B2 (en) | 2013-07-17 | 2018-10-29 | Token based dynamic data indexing with integrated security |
US17/327,949 Active 2033-10-25 US11599587B2 (en) | 2013-07-17 | 2021-05-24 | Token based dynamic data indexing with integrated security |
US18/169,268 Active US11899638B2 (en) | 2013-07-17 | 2023-02-15 | Token based dynamic data indexing with integrated security |
Country Status (5)
Country | Link |
---|---|
US (4) | US20150026153A1 (en) |
EP (2) | EP3022665B1 (en) |
CN (1) | CN105531706A (en) |
HK (1) | HK1221033A1 (en) |
WO (1) | WO2015009353A1 (en) |
Cited By (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150019197A1 (en) * | 2013-07-09 | 2015-01-15 | Oracle International Corporation | Database modeling and analysis |
US20150310021A1 (en) * | 2014-04-28 | 2015-10-29 | International Business Machines Corporation | Big data analytics brokerage |
US20160147879A1 (en) * | 2014-11-24 | 2016-05-26 | Qiurong Huang | Fuzzy Search and Highlighting of Existing Data Visualization |
US20160217179A1 (en) * | 2015-01-27 | 2016-07-28 | International Business Machines Corporation | Search-based detection, link, and acquisition of data |
CN105843793A (en) * | 2015-01-30 | 2016-08-10 | 国际商业机器公司 | Detection and creation of appropriate row concept during automated model generation |
US20160283564A1 (en) * | 2015-03-26 | 2016-09-29 | Dejavuto Corp. | Predictive visual search enginge |
US20160350775A1 (en) * | 2014-01-27 | 2016-12-01 | Umbel Corporation | Systems and methods of generating and using a bitmap index |
US20160358352A1 (en) * | 2015-06-02 | 2016-12-08 | Kabushiki Kaisha Toshiba | Information generation system, method, and computer program product |
US20170060992A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | System and a method for associating contextual structured data with unstructured documents on map-reduce |
WO2017042744A1 (en) * | 2015-09-09 | 2017-03-16 | Quixey, Inc. | System for tokenizing text in languages without inter-word separation |
US20170124101A1 (en) * | 2015-10-29 | 2017-05-04 | Google Inc. | Indexing native application data |
US20170192991A1 (en) * | 2015-12-30 | 2017-07-06 | Google Inc. | Auto-completion of widely shared search queries |
EP3142028A3 (en) * | 2015-09-11 | 2017-07-12 | Google, Inc. | Handling failures in processing natural language queries through user interactions |
CN107357830A (en) * | 2017-06-19 | 2017-11-17 | 北京百度网讯科技有限公司 | Retrieval statement semantics fragment acquisition methods, device and terminal based on artificial intelligence |
WO2018022102A1 (en) * | 2016-07-28 | 2018-02-01 | Umbel Corporation | Systems and methods of managing data rights and selective data sharing |
US10002149B2 (en) | 2014-02-22 | 2018-06-19 | SourceThought, Inc. | Relevance ranking for data and transformations |
US20180253461A1 (en) * | 2014-08-21 | 2018-09-06 | Dropbox, Inc. | Multi-user search system with methodology for personalized search query autocomplete |
US10078696B1 (en) * | 2006-12-26 | 2018-09-18 | Paul Jerome Sadler | Relevant social searching and user centric data analysis via user and peer group parameters via a dynamic interface |
US20180285775A1 (en) * | 2017-04-03 | 2018-10-04 | Salesforce.Com, Inc. | Systems and methods for machine learning classifiers for support-based group |
US10157175B2 (en) | 2013-03-15 | 2018-12-18 | International Business Machines Corporation | Business intelligence data models with concept identification using language-specific clues |
US10282444B2 (en) * | 2015-09-11 | 2019-05-07 | Google Llc | Disambiguating join paths for natural language queries |
US10417247B2 (en) | 2014-09-25 | 2019-09-17 | Oracle International Corporation | Techniques for semantic searching |
US10430428B2 (en) * | 2016-03-31 | 2019-10-01 | Experian Health, Inc. | Smart mapping |
US10467294B2 (en) | 2016-04-29 | 2019-11-05 | Pilosa Corp. | Systems and methods of using a bitmap index to determine bicliques |
US10516980B2 (en) | 2015-10-24 | 2019-12-24 | Oracle International Corporation | Automatic redisplay of a user interface including a visualization |
US20200089700A1 (en) * | 2018-09-18 | 2020-03-19 | Tableau Software, Inc. | Natural Language Interface for Building Data Visualizations, Including Cascading Edits to Filter Expressions |
WO2020060720A1 (en) * | 2018-09-18 | 2020-03-26 | Tableau Software, Inc. | Analyzing natural language expressions in a data visualization user interface |
US10664488B2 (en) | 2014-09-25 | 2020-05-26 | Oracle International Corporation | Semantic searches in a business intelligence system |
US10698924B2 (en) | 2014-05-22 | 2020-06-30 | International Business Machines Corporation | Generating partitioned hierarchical groups based on data sets for business intelligence data models |
US10747824B2 (en) | 2016-12-06 | 2020-08-18 | International Business Machines Corporation | Building a data query engine that leverages expert data preparation operations |
US10762091B2 (en) * | 2014-09-08 | 2020-09-01 | Salesforce.Com, Inc. | Interactive feedback for changes in search relevancy parameters |
EP3771990A1 (en) * | 2019-07-29 | 2021-02-03 | ThoughtSpot, Inc. | Object indexing |
US10917587B2 (en) | 2017-06-02 | 2021-02-09 | Oracle International Corporation | Importing and presenting data |
US10956237B2 (en) | 2017-06-02 | 2021-03-23 | Oracle International Corporation | Inter-application sharing of business intelligence data |
US10977324B2 (en) | 2015-01-30 | 2021-04-13 | Dropbox, Inc. | Personal content item searching system and method |
US10984030B2 (en) | 2017-03-20 | 2021-04-20 | International Business Machines Corporation | Creating cognitive intelligence queries from multiple data corpuses |
US10997184B2 (en) | 2015-05-22 | 2021-05-04 | Coveo Solutions, Inc. | System and method for ranking search results |
US11023486B2 (en) | 2018-11-13 | 2021-06-01 | Thoughtspot, Inc. | Low-latency predictive database analysis |
CN112905591A (en) * | 2021-02-04 | 2021-06-04 | 成都信息工程大学 | Data table connection sequence selection method based on machine learning |
US11036696B2 (en) | 2016-06-07 | 2021-06-15 | Oracle International Corporation | Resource allocation for database provisioning |
US11048871B2 (en) * | 2018-09-18 | 2021-06-29 | Tableau Software, Inc. | Analyzing natural language expressions in a data visualization user interface |
US11120089B2 (en) | 2015-01-30 | 2021-09-14 | Dropbox, Inc. | Personal content item searching system and method |
US11194829B2 (en) | 2017-03-24 | 2021-12-07 | Experian Health, Inc. | Methods and system for entity matching |
US11256671B2 (en) | 2019-09-13 | 2022-02-22 | Oracle International Corporation | Integrated transition control center |
US11281672B2 (en) * | 2019-03-19 | 2022-03-22 | Sigma Computing, Inc. | Join key propagation |
US11294906B2 (en) * | 2019-06-05 | 2022-04-05 | Sap Se | Database record searching with multi-tier queries |
US11301631B1 (en) | 2020-10-05 | 2022-04-12 | Tableau Software, LLC | Visually correlating individual terms in natural language input to respective structured phrases representing the natural language input |
US11308527B1 (en) * | 2012-05-02 | 2022-04-19 | James E. Plankey | System and method for creating and managing multimedia sales promotions |
US11429579B2 (en) * | 2019-10-28 | 2022-08-30 | International Business Machines Corporation | Building a word embedding model to capture relational data semantics |
US11455339B1 (en) | 2019-09-06 | 2022-09-27 | Tableau Software, LLC | Incremental updates to natural language expressions in a data visualization user interface |
US20220318243A1 (en) * | 2021-04-06 | 2022-10-06 | Thoughtspot, Inc. | Distributed Pseudo-Random Subset Generation |
US20220414171A1 (en) * | 2021-06-28 | 2022-12-29 | Flipkart Internet Private Limited | System and method for generating a user query based on a target context aware token |
WO2023278037A1 (en) * | 2021-06-30 | 2023-01-05 | Microsoft Technology Licensing, Llc | Multiple semantic hypotheses for search query intent understanding |
US11604790B2 (en) | 2020-08-31 | 2023-03-14 | Unscrambl Inc | Conversational interface for generating and executing controlled natural language queries on a relational database |
US11614857B2 (en) | 2017-06-02 | 2023-03-28 | Oracle International Corporation | Importing, interpreting, and presenting data |
US11620397B2 (en) * | 2016-05-27 | 2023-04-04 | Intel Corporation | Methods and apparatus to provide group-based row-level security for big data platforms |
US11698933B1 (en) | 2020-09-18 | 2023-07-11 | Tableau Software, LLC | Using dynamic entity search during entry of natural language commands for visual data analysis |
US11734286B2 (en) * | 2017-10-10 | 2023-08-22 | Thoughtspot, Inc. | Automatic database insight analysis |
US11755328B2 (en) | 2019-01-08 | 2023-09-12 | Apple Inc. | Coprocessor operation bundling |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE48527E1 (en) | 2007-01-05 | 2021-04-20 | Agjunction Llc | Optical tracking vehicle control system and method |
US20150026153A1 (en) | 2013-07-17 | 2015-01-22 | Thoughtspot, Inc. | Search engine for information retrieval system |
CN106599083A (en) * | 2016-11-24 | 2017-04-26 | 海南州云藏藏文信息技术有限公司 | Cloud Tibetan search engine service system |
US11157564B2 (en) | 2018-03-02 | 2021-10-26 | Thoughtspot, Inc. | Natural language question answering systems |
EP3550444B1 (en) | 2018-04-02 | 2023-12-27 | Thoughtspot Inc. | Query generation based on a logical data model |
CN109446219B (en) * | 2018-10-10 | 2021-06-29 | 新华三大数据技术有限公司 | Authority management method and device |
CN111078680B (en) * | 2018-10-18 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Table information processing method, apparatus, electronic device and readable storage medium |
US11544239B2 (en) | 2018-11-13 | 2023-01-03 | Thoughtspot, Inc. | Low-latency database analysis using external data sources |
US11580147B2 (en) | 2018-11-13 | 2023-02-14 | Thoughtspot, Inc. | Conversational database analysis |
US11416477B2 (en) | 2018-11-14 | 2022-08-16 | Thoughtspot, Inc. | Systems and methods for database analysis |
CN109669959B (en) * | 2018-11-27 | 2020-08-18 | 武汉达梦数据库有限公司 | One-key query method and device for structured database |
US11334548B2 (en) | 2019-01-31 | 2022-05-17 | Thoughtspot, Inc. | Index sharding |
US11928114B2 (en) | 2019-04-23 | 2024-03-12 | Thoughtspot, Inc. | Query generation based on a logical data model with one-to-one joins |
US11442932B2 (en) | 2019-07-16 | 2022-09-13 | Thoughtspot, Inc. | Mapping natural language to queries using a query grammar |
US10970319B2 (en) | 2019-07-29 | 2021-04-06 | Thoughtspot, Inc. | Phrase indexing |
US11586620B2 (en) | 2019-07-29 | 2023-02-21 | Thoughtspot, Inc. | Object scriptability |
US11200227B1 (en) | 2019-07-31 | 2021-12-14 | Thoughtspot, Inc. | Lossless switching between search grammars |
US11409744B2 (en) | 2019-08-01 | 2022-08-09 | Thoughtspot, Inc. | Query generation based on merger of subqueries |
CN110912794B (en) * | 2019-11-15 | 2021-07-16 | 国网安徽省电力有限公司安庆供电公司 | Approximate matching strategy based on token set |
US11544272B2 (en) | 2020-04-09 | 2023-01-03 | Thoughtspot, Inc. | Phrase translation for a low-latency database analysis system |
US11379495B2 (en) | 2020-05-20 | 2022-07-05 | Thoughtspot, Inc. | Search guidance |
CN112328915B (en) * | 2020-11-25 | 2023-02-28 | 山东师范大学 | Multi-source interest point fusion method and system based on spatial entity matching performance evaluation |
CN116361346B (en) * | 2023-06-02 | 2023-08-08 | 山东浪潮科学研究院有限公司 | Data table analysis method, device and equipment based on mask calculation and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050027717A1 (en) * | 2003-04-21 | 2005-02-03 | Nikolaos Koudas | Text joins for data cleansing and integration in a relational database management system |
US20070219974A1 (en) * | 2006-03-17 | 2007-09-20 | Microsoft Corporation | Using generic predictive models for slot values in language modeling |
US20090019019A1 (en) * | 2005-07-12 | 2009-01-15 | The Diallog Works Ltd. | Method and system for obtaining information |
US20090019022A1 (en) * | 2007-07-15 | 2009-01-15 | Dawning Technologies, Inc. | Rules-based data mining |
US20130339370A1 (en) * | 2012-06-19 | 2013-12-19 | Mary Holstege | Apparatus and Method for Executing Different Query Language Queries on Tree Structured Data Using Pre-Computed Indices of Selective Document Paths |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1265871A (en) * | 1986-11-18 | 1990-02-13 | Yawar Bakht Ali | Domain-independent natural language database interface |
US6523028B1 (en) * | 1998-12-03 | 2003-02-18 | Lockhead Martin Corporation | Method and system for universal querying of distributed databases |
US7225187B2 (en) | 2003-06-26 | 2007-05-29 | Microsoft Corporation | Systems and methods for performing background queries from content and activity |
US7428529B2 (en) * | 2004-04-15 | 2008-09-23 | Microsoft Corporation | Term suggestion for multi-sense query |
US7720674B2 (en) * | 2004-06-29 | 2010-05-18 | Sap Ag | Systems and methods for processing natural language queries |
US7689575B2 (en) * | 2004-12-29 | 2010-03-30 | Yahoo! Inc. | Bitmask access for managing blog content |
US8713025B2 (en) | 2005-03-31 | 2014-04-29 | Square Halt Solutions, Limited Liability Company | Complete context search system |
US8065204B2 (en) * | 2005-09-29 | 2011-11-22 | Sony Corporation | System and method for software integration and factory deployment |
US8386469B2 (en) | 2006-02-16 | 2013-02-26 | Mobile Content Networks, Inc. | Method and system for determining relevant sources, querying and merging results from multiple content sources |
US20070244990A1 (en) * | 2006-04-03 | 2007-10-18 | National Instruments Corporation | Web browser graph user interface element with rich interactive capabilities |
US20070234195A1 (en) * | 2006-04-03 | 2007-10-04 | National Instruments Corporation | Simultaneous update of a plurality of user interface elements displayed in a web browser |
US9519715B2 (en) | 2006-11-02 | 2016-12-13 | Excalibur Ip, Llc | Personalized search |
US7962532B1 (en) * | 2008-04-30 | 2011-06-14 | Netapp, Inc. | Management of directory quotas for file systems |
US20100082636A1 (en) * | 2008-09-25 | 2010-04-01 | Nec Laboratories America, Inc. | Methods and Apparatus for Content-Defined Node Splitting |
EP2207106A3 (en) | 2008-12-19 | 2011-03-02 | Aprimo, Incorporated | Complex relational database extraction system and method with respective based dynamic data modeling |
US9229982B2 (en) | 2008-12-23 | 2016-01-05 | SAP France S.A. | Processing queries using oriented query paths |
US9361346B2 (en) * | 2009-04-14 | 2016-06-07 | Oracle America, Inc. | Mapping information stored in a LDAP tree structure to a relational database structure |
EP2499562A4 (en) | 2009-11-09 | 2016-06-01 | Arcsight Inc | Enabling faster full-text searching using a structured data store |
US8577913B1 (en) | 2011-05-27 | 2013-11-05 | Google Inc. | Generating midstring query refinements |
US20140201241A1 (en) | 2013-01-15 | 2014-07-17 | EasyAsk | Apparatus for Accepting a Verbal Query to be Executed Against Structured Data |
US20170270159A1 (en) | 2013-03-14 | 2017-09-21 | Google Inc. | Determining query results in response to natural language queries |
US9223898B2 (en) | 2013-05-08 | 2015-12-29 | Facebook, Inc. | Filtering suggested structured queries on online social networks |
US20150026153A1 (en) | 2013-07-17 | 2015-01-22 | Thoughtspot, Inc. | Search engine for information retrieval system |
US9405794B2 (en) | 2013-07-17 | 2016-08-02 | Thoughtspot, Inc. | Information retrieval system |
EP3143526A4 (en) | 2014-05-12 | 2017-10-04 | Diffeo, Inc. | Entity-centric knowledge discovery |
WO2019075070A1 (en) * | 2017-10-10 | 2019-04-18 | Thoughtspot, Inc. | Automatic database analysis |
US11157564B2 (en) * | 2018-03-02 | 2021-10-26 | Thoughtspot, Inc. | Natural language question answering systems |
US11586620B2 (en) * | 2019-07-29 | 2023-02-21 | Thoughtspot, Inc. | Object scriptability |
US10970319B2 (en) * | 2019-07-29 | 2021-04-06 | Thoughtspot, Inc. | Phrase indexing |
US11200227B1 (en) * | 2019-07-31 | 2021-12-14 | Thoughtspot, Inc. | Lossless switching between search grammars |
2014
- 2014-04-14: US, application US14/252,553, publication US20150026153A1, status: Abandoned
- 2014-05-22: WO, application PCT/US2014/039230, publication WO2015009353A1, status: Application Filing
- 2014-05-22: EP, application EP14733015.3A, publication EP3022665B1, status: Active
- 2014-05-22: EP, application EP20176612.8A, publication EP3719672A1, status: Withdrawn
- 2014-05-22: CN, application CN201480050848.3A, publication CN105531706A, status: Pending
2016
- 2016-07-27: HK, application HK16108989.8A, publication HK1221033A1, status: unknown
2018
- 2018-10-29: US, application US16/173,320, publication US11017035B2, status: Active
2021
- 2021-05-24: US, application US17/327,949, publication US11599587B2, status: Active
2023
- 2023-02-15: US, application US18/169,268, publication US11899638B2, status: Active
Cited By (98)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10078696B1 (en) * | 2006-12-26 | 2018-09-18 | Paul Jerome Sadler | Relevant social searching and user centric data analysis via user and peer group parameters via a dynamic interface |
US11308527B1 (en) * | 2012-05-02 | 2022-04-19 | James E. Plankey | System and method for creating and managing multimedia sales promotions |
US10157175B2 (en) | 2013-03-15 | 2018-12-18 | International Business Machines Corporation | Business intelligence data models with concept identification using language-specific clues |
US20150019197A1 (en) * | 2013-07-09 | 2015-01-15 | Oracle International Corporation | Database modeling and analysis |
US11157664B2 (en) * | 2013-07-09 | 2021-10-26 | Oracle International Corporation | Database modeling and analysis |
US9626687B2 (en) * | 2014-01-27 | 2017-04-18 | Umbel Corporation | Systems and methods of generating and using a bitmap index |
US10318510B2 (en) | 2014-01-27 | 2019-06-11 | Pilosa Corp. | Systems and methods of generating and using a bitmap index |
US20160350775A1 (en) * | 2014-01-27 | 2016-12-01 | Umbel Corporation | Systems and methods of generating and using a bitmap index |
US10002149B2 (en) | 2014-02-22 | 2018-06-19 | SourceThought, Inc. | Relevance ranking for data and transformations |
US20150310021A1 (en) * | 2014-04-28 | 2015-10-29 | International Business Machines Corporation | Big data analytics brokerage |
US10430401B2 (en) * | 2014-04-28 | 2019-10-01 | International Business Machines Corporation | Big data analytics brokerage |
US10698924B2 (en) | 2014-05-22 | 2020-06-30 | International Business Machines Corporation | Generating partitioned hierarchical groups based on data sets for business intelligence data models |
US10853348B2 (en) * | 2014-08-21 | 2020-12-01 | Dropbox, Inc. | Multi-user search system with methodology for personalized search query autocomplete |
US20180253461A1 (en) * | 2014-08-21 | 2018-09-06 | Dropbox, Inc. | Multi-user search system with methodology for personalized search query autocomplete |
US10762091B2 (en) * | 2014-09-08 | 2020-09-01 | Salesforce.Com, Inc. | Interactive feedback for changes in search relevancy parameters |
US11334583B2 (en) | 2014-09-25 | 2022-05-17 | Oracle International Corporation | Techniques for semantic searching |
US10417247B2 (en) | 2014-09-25 | 2019-09-17 | Oracle International Corporation | Techniques for semantic searching |
US10664488B2 (en) | 2014-09-25 | 2020-05-26 | Oracle International Corporation | Semantic searches in a business intelligence system |
US9798449B2 (en) * | 2014-11-24 | 2017-10-24 | Business Objects Software Ltd. | Fuzzy search and highlighting of existing data visualization |
US20160147879A1 (en) * | 2014-11-24 | 2016-05-26 | Qiurong Huang | Fuzzy Search and Highlighting of Existing Data Visualization |
US10318527B2 (en) * | 2015-01-27 | 2019-06-11 | International Business Machines Corporation | Search-based detection, link, and acquisition of data |
US10242046B2 (en) * | 2015-01-27 | 2019-03-26 | International Business Machines Corporation | Search-based detection, link, and acquisition of data |
CN105824872A (en) * | 2015-01-27 | 2016-08-03 | 国际商业机器公司 | Search-based detection, link, and acquisition of data |
US20160217180A1 (en) * | 2015-01-27 | 2016-07-28 | International Business Machines Corporation | Search-based detection, link, and acquisition of data |
US20160217179A1 (en) * | 2015-01-27 | 2016-07-28 | International Business Machines Corporation | Search-based detection, link, and acquisition of data |
US10002179B2 (en) | 2015-01-30 | 2018-06-19 | International Business Machines Corporation | Detection and creation of appropriate row concept during automated model generation |
US10891314B2 (en) | 2015-01-30 | 2021-01-12 | International Business Machines Corporation | Detection and creation of appropriate row concept during automated model generation |
CN105843793A (en) * | 2015-01-30 | 2016-08-10 | 国际商业机器公司 | Detection and creation of appropriate row concept during automated model generation |
US10977324B2 (en) | 2015-01-30 | 2021-04-13 | Dropbox, Inc. | Personal content item searching system and method |
US10019507B2 (en) | 2015-01-30 | 2018-07-10 | International Business Machines Corporation | Detection and creation of appropriate row concept during automated model generation |
US11120089B2 (en) | 2015-01-30 | 2021-09-14 | Dropbox, Inc. | Personal content item searching system and method |
US20160283564A1 (en) * | 2015-03-26 | 2016-09-29 | Dejavuto Corp. | Predictive visual search enginge |
US10997184B2 (en) | 2015-05-22 | 2021-05-04 | Coveo Solutions, Inc. | System and method for ranking search results |
US20160358352A1 (en) * | 2015-06-02 | 2016-12-08 | Kabushiki Kaisha Toshiba | Information generation system, method, and computer program product |
US10861201B2 (en) * | 2015-06-02 | 2020-12-08 | Kabushiki Kaisha Toshiba | Information generation system, method, and computer program product |
US20170060915A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | System and a method for associating contextual structured data with unstructured documents on map-reduce |
US20170060992A1 (en) * | 2015-08-27 | 2017-03-02 | International Business Machines Corporation | System and a method for associating contextual structured data with unstructured documents on map-reduce |
US10915537B2 (en) * | 2015-08-27 | 2021-02-09 | International Business Machines Corporation | System and a method for associating contextual structured data with unstructured documents on map-reduce |
US10885042B2 (en) * | 2015-08-27 | 2021-01-05 | International Business Machines Corporation | Associating contextual structured data with unstructured documents on map-reduce |
US10002128B2 (en) | 2015-09-09 | 2018-06-19 | Samsung Electronics Co., Ltd. | System for tokenizing text in languages without inter-word separation |
WO2017042744A1 (en) * | 2015-09-09 | 2017-03-16 | Quixey, Inc. | System for tokenizing text in languages without inter-word separation |
US10997167B2 (en) | 2015-09-11 | 2021-05-04 | Google Llc | Disambiguating join paths for natural language queries |
EP3142028A3 (en) * | 2015-09-11 | 2017-07-12 | Google, Inc. | Handling failures in processing natural language queries through user interactions |
US10282444B2 (en) * | 2015-09-11 | 2019-05-07 | Google Llc | Disambiguating join paths for natural language queries |
US10516980B2 (en) | 2015-10-24 | 2019-12-24 | Oracle International Corporation | Automatic redisplay of a user interface including a visualization |
US11956701B2 (en) | 2015-10-24 | 2024-04-09 | Oracle International Corporation | Content display and interaction according to estimates of content usefulness |
US11748429B2 (en) * | 2015-10-29 | 2023-09-05 | Google Llc | Indexing native application data |
US20170124101A1 (en) * | 2015-10-29 | 2017-05-04 | Google Inc. | Indexing native application data |
US10120949B2 (en) * | 2015-10-29 | 2018-11-06 | Google Llc | Indexing native application data |
US20230044789A1 (en) * | 2015-10-29 | 2023-02-09 | Google Llc | Indexing Native Application Data |
US20190026294A1 (en) * | 2015-10-29 | 2019-01-24 | Google Llc | Indexing native application data |
US10915592B2 (en) * | 2015-10-29 | 2021-02-09 | Google Llc | Indexing native application data |
US11500953B2 (en) * | 2015-10-29 | 2022-11-15 | Google Llc | Indexing native application data |
US20170192991A1 (en) * | 2015-12-30 | 2017-07-06 | Google Inc. | Auto-completion of widely shared search queries |
US10474719B2 (en) * | 2015-12-30 | 2019-11-12 | Google Llc | Auto-completion of widely shared search queries |
CN108027817A (en) * | 2015-12-30 | 2018-05-11 | 谷歌有限责任公司 | Inquiry is suggested in group association based on user |
US10430428B2 (en) * | 2016-03-31 | 2019-10-01 | Experian Health, Inc. | Smart mapping |
US10467294B2 (en) | 2016-04-29 | 2019-11-05 | Pilosa Corp. | Systems and methods of using a bitmap index to determine bicliques |
US11620397B2 (en) * | 2016-05-27 | 2023-04-04 | Intel Corporation | Methods and apparatus to provide group-based row-level security for big data platforms |
US11036696B2 (en) | 2016-06-07 | 2021-06-15 | Oracle International Corporation | Resource allocation for database provisioning |
WO2018022102A1 (en) * | 2016-07-28 | 2018-02-01 | Umbel Corporation | Systems and methods of managing data rights and selective data sharing |
US10747824B2 (en) | 2016-12-06 | 2020-08-18 | International Business Machines Corporation | Building a data query engine that leverages expert data preparation operations |
US10984030B2 (en) | 2017-03-20 | 2021-04-20 | International Business Machines Corporation | Creating cognitive intelligence queries from multiple data corpuses |
US11194829B2 (en) | 2017-03-24 | 2021-12-07 | Experian Health, Inc. | Methods and system for entity matching |
US20180285775A1 (en) * | 2017-04-03 | 2018-10-04 | Salesforce.Com, Inc. | Systems and methods for machine learning classifiers for support-based group |
US11614857B2 (en) | 2017-06-02 | 2023-03-28 | Oracle International Corporation | Importing, interpreting, and presenting data |
US10917587B2 (en) | 2017-06-02 | 2021-02-09 | Oracle International Corporation | Importing and presenting data |
US10956237B2 (en) | 2017-06-02 | 2021-03-23 | Oracle International Corporation | Inter-application sharing of business intelligence data |
CN107357830A (en) * | 2017-06-19 | 2017-11-17 | 北京百度网讯科技有限公司 | Retrieval statement semantics fragment acquisition methods, device and terminal based on artificial intelligence |
US11734286B2 (en) * | 2017-10-10 | 2023-08-22 | Thoughtspot, Inc. | Automatic database insight analysis |
US10902045B2 (en) * | 2018-09-18 | 2021-01-26 | Tableau Software, Inc. | Natural language interface for building data visualizations, including cascading edits to filter expressions |
US20200089700A1 (en) * | 2018-09-18 | 2020-03-19 | Tableau Software, Inc. | Natural Language Interface for Building Data Visualizations, Including Cascading Edits to Filter Expressions |
WO2020060720A1 (en) * | 2018-09-18 | 2020-03-26 | Tableau Software, Inc. | Analyzing natural language expressions in a data visualization user interface |
US11048871B2 (en) * | 2018-09-18 | 2021-06-29 | Tableau Software, Inc. | Analyzing natural language expressions in a data visualization user interface |
US11023486B2 (en) | 2018-11-13 | 2021-06-01 | Thoughtspot, Inc. | Low-latency predictive database analysis |
US11620306B2 (en) | 2018-11-13 | 2023-04-04 | Thoughtspot, Inc. | Low-latency predictive database analysis |
US11755328B2 (en) | 2019-01-08 | 2023-09-12 | Apple Inc. | Coprocessor operation bundling |
US11281672B2 (en) * | 2019-03-19 | 2022-03-22 | Sigma Computing, Inc. | Join key propagation |
US11294906B2 (en) * | 2019-06-05 | 2022-04-05 | Sap Se | Database record searching with multi-tier queries |
EP3771990A1 (en) * | 2019-07-29 | 2021-02-03 | ThoughtSpot, Inc. | Object indexing |
US11354326B2 (en) | 2019-07-29 | 2022-06-07 | Thoughtspot, Inc. | Object indexing |
US11797614B2 (en) | 2019-09-06 | 2023-10-24 | Tableau Software, LLC | Incremental updates to natural language expressions in a data visualization user interface |
US11550853B2 (en) | 2019-09-06 | 2023-01-10 | Tableau Software, Inc. | Using natural language expressions to define data visualization calculations that span across multiple rows of data from a database |
US11455339B1 (en) | 2019-09-06 | 2022-09-27 | Tableau Software, LLC | Incremental updates to natural language expressions in a data visualization user interface |
US11822526B2 (en) | 2019-09-13 | 2023-11-21 | Oracle International Corporation | Integrated transition control center |
US11256671B2 (en) | 2019-09-13 | 2022-02-22 | Oracle International Corporation | Integrated transition control center |
US11429579B2 (en) * | 2019-10-28 | 2022-08-30 | International Business Machines Corporation | Building a word embedding model to capture relational data semantics |
US11604790B2 (en) | 2020-08-31 | 2023-03-14 | Unscrambl Inc | Conversational interface for generating and executing controlled natural language queries on a relational database |
US11698933B1 (en) | 2020-09-18 | 2023-07-11 | Tableau Software, LLC | Using dynamic entity search during entry of natural language commands for visual data analysis |
US11301631B1 (en) | 2020-10-05 | 2022-04-12 | Tableau Software, LLC | Visually correlating individual terms in natural language input to respective structured phrases representing the natural language input |
US11842154B2 (en) | 2020-10-05 | 2023-12-12 | Tableau Software, LLC | Visually correlating individual terms in natural language input to respective structured phrases representing the natural language input |
CN112905591A (en) * | 2021-02-04 | 2021-06-04 | 成都信息工程大学 | Data table connection sequence selection method based on machine learning |
US11580111B2 (en) * | 2021-04-06 | 2023-02-14 | Thoughtspot, Inc. | Distributed pseudo-random subset generation |
US11836136B2 (en) | 2021-04-06 | 2023-12-05 | Thoughtspot, Inc. | Distributed pseudo-random subset generation |
US20220318243A1 (en) * | 2021-04-06 | 2022-10-06 | Thoughtspot, Inc. | Distributed Pseudo-Random Subset Generation |
US20220414171A1 (en) * | 2021-06-28 | 2022-12-29 | Flipkart Internet Private Limited | System and method for generating a user query based on a target context aware token |
WO2023278037A1 (en) * | 2021-06-30 | 2023-01-05 | Microsoft Technology Licensing, Llc | Multiple semantic hypotheses for search query intent understanding |
US11914600B2 (en) | 2021-06-30 | 2024-02-27 | Microsoft Technology Licensing, Llc | Multiple semantic hypotheses for search query intent understanding |
Also Published As
Publication number | Publication date |
---|---|
US11899638B2 (en) | 2024-02-13 |
EP3719672A1 (en) | 2020-10-07 |
EP3022665A1 (en) | 2016-05-25 |
US20190065604A1 (en) | 2019-02-28 |
US20230195807A1 (en) | 2023-06-22 |
EP3022665B1 (en) | 2020-07-08 |
US11599587B2 (en) | 2023-03-07 |
US11017035B2 (en) | 2021-05-25 |
US20210279292A1 (en) | 2021-09-09 |
CN105531706A (en) | 2016-04-27 |
WO2015009353A1 (en) | 2015-01-22 |
HK1221033A1 (en) | 2017-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11899638B2 (en) | | Token based dynamic data indexing with integrated security |
US9405794B2 (en) | | Information retrieval system |
US11386085B2 (en) | | Deriving metrics from queries |
US20230273923A1 (en) | | Generating and/or utilizing a machine learning model in response to a search request |
US9626453B2 (en) | | Analytic solution with a self-learning and context-sensitive semantic layer |
US7840601B2 (en) | | Editable table modification |
US8131748B2 (en) | | Search query formulation |
US11550838B2 (en) | | Providing information cards using semantic graph data |
US9805142B2 (en) | | Ranking suggestions based on user attributes |
US8463807B2 (en) | | Augmented search suggest |
US9268822B2 (en) | | System and method for determining organizational hierarchy from business card data |
US20190108235A1 (en) | | Alternative query suggestion in electronic searching |
US10152694B2 (en) | | Semi-automatic object reuse across application parts |
US11238102B1 (en) | | Providing an object-based response to a natural language query |
US10467292B2 (en) | | Suggesting query items based on database fields |
US9477768B1 (en) | | Disambiguation of online social mentions |
US11301636B2 (en) | | Analyzing resumes and highlighting non-traditional resumes |
US11423102B2 (en) | | Learning model based search engine |
US20160019204A1 (en) | | Matching large sets of words |
US20230297565A1 (en) | | System and method for distribution, searching, and retrieval of data assets |
US20240104112A1 (en) | | Identity resolution in knowledge graph databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: THOUGHTSPOT, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GUPTA, SHASHANK;PRAKASH, AMIT;SINGH, AJEET;AND OTHERS;SIGNING DATES FROM 20140411 TO 20140414;REEL/FRAME:032669/0716 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |