WO2012174632A1 - Method and apparatus for preference guided data exploration - Google Patents

Method and apparatus for preference guided data exploration

Info

Publication number
WO2012174632A1
Authority
WO
WIPO (PCT)
Prior art keywords
preferences
preference
order
user
item
Prior art date
Application number
PCT/CA2011/001382
Other languages
French (fr)
Inventor
Ihab F. Ilyas
Mohamed A. Soliman
Original Assignee
Primal Fusion Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Primal Fusion Inc.
Publication of WO2012174632A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Definitions

  • the present disclosure relates generally to methods and systems for querying a database storing a plurality of items.
  • Data exploration systems such as search engines and database management systems, manage enormous volumes of information.
  • locating information of interest to a user in response to a search query (e.g., in the form of a set of keywords)
  • Query interface 12 is used to collect query predicates in the form of keywords and/or attribute values (e.g., "used Toyota" with price in [$2000-$5000]). Query results are then sorted (14) on the values of one or more attributes (e.g., order by Price then by Rating) in a major sort/minor sort fashion. The user then scans (16) through the sorted query answers to locate items of interest, refines query predicates, and repeats the exploration cycle (18).
  • This "Query, Sort, then Scan” model limits the flexibility of preference specification and imposes rigid data exploration schemes as highlighted in the following example.
  • Amy is searching online catalogs for a camera to buy. Amy is looking for a reasonably-priced camera, whose color is preferably silver and less preferably black or gray, and whose reviews contain the keywords "High Quality.” Amy is a money saver, so her primary concern is satisfying her Price preferences followed by her Color and Reviews preferences.
  • the data exploration model of FIG. 1 allows Amy to sort results in ascending price order. Amy then needs to scan through the results comparing colors and inspecting reviews to find the camera that she wants.
  • the path followed by Amy to explore search results is mainly dictated by her price preference, while other preferences are incorporated in the exploration task through Amy's effort, which can limit the possibility of finding items that closely match her requirements.
  • Conventional approaches to specifying user preferences suffer from a number of other drawbacks in addition to not simultaneously supporting different types of preferences. For example, preference specifications may be inconsistent with one another.
  • a typical example is having cycles in preferences among first-order preferences (preferences among attributes of items such as preferring one car to another car based on the price or on brand), which implies non-transitivity of preferences.
  • a user may indicate that a Hyundai is preferred to a Toyota, Toyota is preferred to a Nissan and a Nissan is preferred to a Hyundai.
  • second order preferences can result in further problems. For example, prioritized composition of a set of partial orders does not generally maintain the transitivity property in the resulting order.
  • Conventional systems for data exploration are unable to rank search results when preference specifications may be inconsistent.
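The inconsistency described above (a cycle among first-order preferences) can be detected mechanically. The following is a minimal sketch, not part of the disclosure: it stores "preferred to" statements as a dictionary and runs a depth-first search to find one cycle. The function name `find_cycle` and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(prefers: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of 'is preferred to' statements, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(prefers) | {v for vs in prefers.values() for v in vs}
    color = {n: WHITE for n in nodes}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in prefers.get(node, []):
            if color[nxt] == GREY:  # back edge: the current path closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for n in sorted(nodes):
        if color[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


# The three statements from the example above.
stated = {"Hyundai": ["Toyota"], "Toyota": ["Nissan"], "Nissan": ["Hyundai"]}
cycle = find_cycle(stated)
print(cycle)
```

A non-empty result signals that the stated preferences cannot be turned into a strict ranking without additional information such as preference strength.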
  • FIG. 1 is a diagram of a "query, sort, then scan" data exploration model, in accordance with prior art.
  • FIG. 2 is a diagram illustrating a relation, in accordance with some embodiments.
  • FIG. 3 is a flowchart of an illustrative preference modeling process, in accordance with some embodiments.
  • FIG. 4 is a diagram illustrating scopes obtained from a relation, in accordance with some embodiments.
  • FIG. 5 is a diagram illustrating scope comparators, in accordance with some embodiments.
  • FIG. 6 is a diagram illustrating conjoint preferences, in accordance with some embodiments.
  • FIG. 7 is a diagram of an illustrative mapping of a partial order to linear extensions, in accordance with some embodiments.
  • FIG. 8 is a diagram of an illustrative preference graph, in accordance with some embodiments.
  • FIG. 10 is a diagram of an illustrative page-rank based matrix for prioritized comparators, in accordance with some embodiments.
  • FIG. 11 is a diagram of an illustrative weighted preference graph and tournaments derived from it, in accordance with some embodiments.
  • FIG. 12 is a flowchart for an illustrative process for interactively specifying preferences, in accordance with some embodiments.
  • FIG. 13 is an illustrative computer system on which some embodiments of the present disclosure may be implemented.
  • preferences include an ordering on all prices (a "total order” preference), an ordering between some colors (a "partial order” preference), a Boolean predicate from the presence of the words "High Quality” in the reviews, and an indication that price is more important than the other preferences.
  • Another situation in which it may be useful to specify different types of preferences is one in which a user has precise preferences for information in one domain because the user possesses a large amount of knowledge about that domain. Such precise preferences may be specified, for example, in the form of one or more scoring functions. However, the same user may have imprecise preferences for information in another domain because the user may not possess a large amount of knowledge about the other domain. In this case, preferences may be specified, for example, in the form of one or more partial orders on attribute values. There are many instances in which the user may need to specify both types of preferences (i.e., using a scoring function and using a partial order) as shown in Example 2 below.
  • Example 2 Alice is searching for a car to buy.
  • Alice has specific preferences regarding sport cars, and more relaxed preferences regarding SUVs.
  • Alice supplies the data exploration system with a scoring function to rank sport cars, and a set of partial orders encoding SUVs preferences.
  • Alice expects reported results to be ranked according to her preferences.
  • a data exploration system capable of integrating different preference types and ranking search results in response to a user query, in accordance with user-specified preferences, may address some of the above-discussed drawbacks of conventional approaches. However, not every embodiment addresses every one of these drawbacks, and some embodiments may not address any of them. As such, it should be appreciated that embodiments of the invention are not limited to addressing all or any of the above-discussed drawbacks of these conventional approaches.
  • a preference language for specifying different types of user preferences among items.
  • a data exploration system may assist a user to specify preferences using the preference language.
  • the specified preferences may be used to construct a general preference model that, in turn, may be used to produce a ranking of items in accordance with any user preferences.
  • Items may be any suitable items about which a user may express preferences.
  • an item may be any item that may be manufactured, sold and/or purchased.
  • an item may be a car or an airplane ticket— a user (e.g., a consumer) may have preferences for one car over another car and/or may prefer one airplane ticket to another airplane ticket.
  • an item may comprise information.
  • Users may prefer one item over another item based at least in part on the information that these items contain. For example, when searching for content (e.g., movie, music, images, webpages, text, sound, etc.) a user may prefer some content to other content. For instance, a user may prefer to see a webpage that contains information related to cars over a webpage that contains information related to bicycles.
  • An item may comprise, or have associated with it, one or more attributes.
  • An attribute of an item may be related to the item and may be a characteristic of the item.
  • An attribute of an item may be a characteristic descriptive of the item. For instance, if an item is an item that may be purchased, an attribute of the item may be a price related to the item.
  • An attribute of an item may be a characteristic that may identify the item. For example, a characteristic of an item may be an identifier (e.g., name, serial number, or model number) of the item.
  • Attributes may be numerical attributes and may be categorical attributes.
  • Numerical attributes may comprise one or more values.
  • a numerical attribute may comprise a single number (e.g., 5) or a range of numbers (e.g., 1-1000).
  • Categorical attributes may also comprise one or more values.
  • a categorical value for the category "Color" may comprise a single color (e.g., Red) or a set of colors (e.g., {"Red", "Green"}).
  • attribute values are not limited to being numbers and/or categories and may be any of numerous other types of values.
  • values may comprise alphabetic and alphanumeric strings.
  • An item may be represented by one or more tuples comprising values for one or more attributes associated with the item.
  • a tuple representing an item may comprise a value for each attribute associated with the item.
  • a tuple representing an item may comprise a value for only a portion of the attributes associated with the item.
  • FIG. 2 shows an illustrative example of a set of items, each item being represented by a tuple comprising values for the attributes of the items.
  • each item is a car and is associated with six attributes: "ID,” “Make,” “Model,” “Color,” “Price,” and “Deposit.” Though in this example all items share the same attributes, this is not a limitation of the present invention as different items may have different attributes from one another and some attributes may have unknown values.
  • Each item is represented by a tuple (i.e., a set) of attribute values. Accordingly, the first item has characteristics indicated by the first set of attribute values.
  • the first item is represented by the tuple in the first row of the table shown in FIG. 2. As illustrated, this first item is an $1800 Red Honda® identified by identifier "t1". A deposit of $500 may be required to purchase this car.
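As an illustration of the tuple representation described above, the following minimal sketch models the six attributes named in the text as a typed record. The class name `Car`, the field names, and the use of `None` for an unknown value are assumptions; only the first row's make, color, price and deposit are taken from the text, the identifier is written here as "t1", and the model is left unknown because the text does not state it.

```python
from typing import List, NamedTuple, Optional


class Car(NamedTuple):
    id: str
    make: str
    model: Optional[str]  # None marks an unknown attribute value
    color: str
    price: int    # in dollars
    deposit: int  # in dollars


# First row as described in the text; the model is not given there.
t1 = Car(id="t1", make="Honda", model=None, color="Red", price=1800, deposit=500)

# A relation is simply a collection of such tuples.
relation: List[Car] = [t1]
print(relation[0])
```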
  • a user may express preferences for one item over another item in a set of items. User preferences may be of any suitable type and may be first-order user preferences, second-order user preferences, and even further-order preferences.
  • First-order preferences are preferences associated with attributes of items.
  • First-order preferences may be based on values of attributes of items.
  • a first-order preference may express a preference for an item over another item based on values of one or more attributes of the two items.
  • a first-order preference may indicate an item with a lower price (value of the attribute "price") is preferred to an item with a higher price.
  • a first-order preference may indicate that a red (value of the attribute "color”) item (e.g., car) is preferred to a blue item.
  • Second-order preferences are preferences across first-order preferences. Second-order preferences may indicate which first-order preferences are more important to a user.
  • first-order preference A may be based on values of one attribute (e.g., "price") while first-order preference B may be based on values of another attribute (e.g., "color”).
  • a second-order preference may indicate that first-order preference B is preferred to first-order preference A (i.e., color may be more important than price).
  • there may be many different types of first-order and second-order preferences; these types, along with other aspects of first-order and second-order preferences, are discussed in greater detail below in Sections II and III, respectively.
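The distinction between first-order and second-order preferences can be sketched as follows. Each first-order preference is a function that compares two items and returns 1, -1, or `None` (no preference); the second-order preference is modeled, as one possible semantics only, by an ordered list in which the first comparator that expresses a preference decides. All names and the sample items are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, object]
Comparator = Callable[[Item, Item], Optional[int]]


def cheaper_first(a: Item, b: Item) -> Optional[int]:
    """First-order preference A: a lower price is preferred."""
    if a["price"] == b["price"]:
        return None
    return 1 if a["price"] < b["price"] else -1


def red_over_blue(a: Item, b: Item) -> Optional[int]:
    """First-order preference B: a red item is preferred to a blue item."""
    if a["color"] == "Red" and b["color"] == "Blue":
        return 1
    if a["color"] == "Blue" and b["color"] == "Red":
        return -1
    return None


def prioritized(a: Item, b: Item, by_importance: List[Comparator]) -> Optional[int]:
    """Assumed semantics: the most important comparator with an opinion wins."""
    for compare in by_importance:
        result = compare(a, b)
        if result is not None:
            return result
    return None


# Second-order preference: color (B) is more important than price (A).
importance = [red_over_blue, cheaper_first]

x = {"price": 100, "color": "Blue"}
y = {"price": 200, "color": "Red"}
print(cheaper_first(x, y), red_over_blue(x, y), prioritized(x, y, importance))
```

Here the two first-order preferences disagree about `x` and `y`, and the second-order preference resolves the disagreement in favor of the red item.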
  • the data exploration system may be any system for exploring data, information or knowledge.
  • the data exploration system may allow one or more users to query the system.
  • a data exploration system may be a search engine such as an Internet search engine or a domain-specific search engine (e.g., a search engine created to search a particular information domain such as a company's or institution's intranet, or a specific subject-matter information repository).
  • a data exploration system may be a database system that may allow user queries.
  • a query input by a user into a data exploration system may be any of numerous types of queries.
  • a query may comprise one or more keywords indicating what the user is seeking.
  • a query may comprise user preferences.
  • user preferences may be specified separately and/or independently from any user query.
  • a user may specify preferences that may apply to multiple user queries.
  • the specified preferences may comprise preferences of any suitable type such as first-order and/or second-order user preferences.
  • a data exploration system may assist a user to specify preferences.
  • a data exploration system may assist a user to specify preferences using the preference language, for example.
  • a preference model may be constructed from these preferences.
  • the preference model may be constructed from different types of preferences and may be constructed from first-order preferences of different types and/or from second-order preferences of different types.
  • a preference model may be represented by a data structure encoding the preference model.
  • the data structure may comprise any data necessary for representing the preference model and, for example, may comprise any parameters associated with the preference model.
  • a data structure encoding a preference model may be stored on any tangible computer-readable storage medium.
  • the computer-readable storage medium may be any suitable computer-readable storage medium and may be accessed by any physical computing device that may use the preference model encoded by the data structure.
  • the preference model may be a graph-based preference model and the data structure encoding the preference model may encode a graph, termed a preference graph, characterizing the graph-based preference model.
  • the preference graph may comprise a set of nodes (vertices) and a set of edges connecting nodes in the set of nodes. The edges may be directed edges or may be undirected edges. Accordingly, the data structure encoding the preference graph may encode the preference graph by encoding the graph's vertices and edges.
  • nodes of the graph may be associated with items.
  • a node in the graph may be associated with a tuple that, in turn, represents an item.
  • the graph may represent items that are related with one or more keywords in a query. For instance, a set of items may be selected in response to a user-provided query.
  • a first-order preference for one item over another item may be represented as an edge in the graph, with the edge connecting nodes associated with the tuples associated with the two items.
  • a weight may be associated with each edge in the graph to provide an indication of a degree of preference for one of the nodes terminating the edge.
  • the weight may be computed based on first-order and/or second-order preferences. Aspects of a graph-based preference model, including how such a preference model may be constructed from user-specified preferences, are described in greater detail in Section IV, below.
  • the preference model may be used to obtain a ranking of items in a set of items.
  • a graph-based preference model may be used to construct such a ranking.
  • a graph-based preference model may be used to construct such a ranking in any of numerous ways. For instance, a complete directed graph may be obtained from the graph-based preference model and a ranking of items may be obtained based on the complete directed graph. As another example, a Markov-chain based algorithm may be applied to the graph-based preference model to obtain a ranking of items.
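One way to picture a Markov-chain based ranking is a random walk that moves from an item to the items preferred over it, with items ranked by how often the walk visits them. The sketch below is an assumption-laden illustration and not the algorithm of the disclosure: the function name `markov_rank`, the PageRank-style damping factor, and the uniform handling of items that nothing is preferred to are all choices made here.

```python
from typing import Dict, List, Tuple


def markov_rank(items: List[str],
                weights: Dict[Tuple[str, str], float],
                damping: float = 0.85,
                iterations: int = 100) -> List[str]:
    """Rank items by the visit frequency of a damped random walk.

    weights[(i, j)] is the strength of the preference for i over j;
    the walk steps from j to i with probability proportional to it.
    """
    n = len(items)
    score = {i: 1.0 / n for i in items}
    # Total preference weight held against each item.
    against = {j: sum(w for (_, loser), w in weights.items() if loser == j)
               for j in items}
    for _ in range(iterations):
        nxt = {i: (1.0 - damping) / n for i in items}
        for j in items:
            if against[j] == 0:
                # Nothing is preferred to j: spread its mass uniformly.
                for i in items:
                    nxt[i] += damping * score[j] / n
            else:
                for (winner, loser), w in weights.items():
                    if loser == j:
                        nxt[winner] += damping * score[j] * w / against[j]
        score = nxt
    return sorted(items, key=lambda i: score[i], reverse=True)


# Hypothetical chain of preferences: a over b, b over c.
ranking = markov_rank(["a", "b", "c"], {("a", "b"): 1.0, ("b", "c"): 1.0})
print(ranking)
```

Because the walk is damped, the procedure still produces scores when the preference graph contains cycles.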
  • constructs described below are convenient abstractions used in various fields such as computer science, but each construct may be realized, in practice, by a data structure representing data characterizing the construct and/or processor-executable instructions for carrying out functions associated with the construct.
  • Such data structures and processor-executable instructions may be encoded on any suitable tangible computer-readable storage medium.
  • every reference to a construct is a reference to a data structure encoding the construct and/or processor-executable instructions that when executed by a processor perform functions associated with the construct, since explicitly referring to such data structures and processor-executable instructions for every reference to a construct is tedious.
  • the above-described embodiments can be implemented in any of numerous ways.
  • the embodiments may be implemented using hardware, software, or a combination thereof.
  • the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.
  • Software modules comprising program instructions may be provided to perform any of numerous tasks in accordance with some embodiments. For example, one or multiple software modules for constructing a preference model may be provided. As another example, software modules for obtaining a ranking for a set of items based on (a data structure representing) the preference model may be provided. As another example, software modules comprising instructions for implementing any of numerous functions associated with a data exploration system may be provided. It should be recognized, though, that the above examples are not limiting and software modules may be provided to perform any functions in addition to or instead of the above examples.
  • a data exploration system that utilizes user preference may reflect some or all of the following design goals:
  • the system may assist users to formulate their preferences.
  • the system may support interactive preference management. For instance, the system may provide users with information to help users specify and/or modify preferences. As a specific example, the system may provide users with information about how to modify their preferences to widen or narrow the scope of their search. As another specific example, the system may provide users with information about how to modify their preferences such that the ranking of items presented to a user is modified. Though, these are only examples and the system may aid the user to formulate their preferences in any of numerous ways as described in greater detail below, in Section VI.
  • the data exploration system may be a system that may receive a query from one or more users.
  • the system may be a database system or a search engine and the query may comprise one or more keywords.
  • the system may assist a user to specify preferences.
  • such support may be based on pre-computed summaries in the form of facets that may be used for guiding data exploration.
  • Each facet may be associated with a number that may provide the user with an estimate on the expected number of results. Accordingly, facets may allow a user to get a quick and dirty view of the underlying set of items and/or domain, and how search results may be affected by tuning preferences.
  • the system may comprise a memory configured to store a plurality of tuples (recall that each tuple comprises one or more values for one or more attributes) and may receive a range of desired values for an attribute from a user. In response the system may output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.
  • the system may associate a number to each of these facets, the number indicating an expected number of tuples consistent with these facets.
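The facet count described above (the number of stored tuples whose value for an attribute falls in a user-supplied range) can be sketched in a few lines. The function name `count_in_range` and the sample tuples are hypothetical; a real system would likely use pre-computed summaries rather than a scan.

```python
from typing import Dict, List


def count_in_range(tuples: List[Dict[str, object]], attribute: str,
                   low: float, high: float) -> int:
    """Count tuples that have the attribute with a value in [low, high]."""
    return sum(1 for t in tuples
               if attribute in t and low <= t[attribute] <= high)


# Hypothetical stored tuples; the last one has no known price.
cars = [
    {"id": "c1", "price": 1800},
    {"id": "c2", "price": 3200},
    {"id": "c3", "price": 5200},
    {"id": "c4"},
]

matches = count_in_range(cars, "price", 2000, 5000)
print(matches)
```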
  • the system may adopt the concept of contextualized preferences, where a user can assign different preference specifications to different subsets (contexts) of items.
  • a user may define a context by using predetermined facets or by defining custom facets.
  • the user has the flexibility of expressing first-order and second-order preferences within and across contexts.
  • Contextualized preferences may also be part of a user's profile, which may be ascertained by any of the techniques disclosed herein as well as those disclosed in U.S. Non-Provisional Application Serial No. 12/555,293, filed September 08, 2009, and titled Synthesizing Messaging Using Context Provided By Consumers, which is hereby incorporated by reference. This way, they may be loaded, saved, and/or refined upon the user's request.
  • the data exploration system illustrated in FIG. 3 may maintain information regarding which of the input preferences affect the relative order of each pair of items in the final results ranking. This feature may be useful for the analysis and refinement of preferences in different scenarios. Examples include finding preference constructs that have a dominating effect on the ranking of results, decreasing or increasing the influence of some preference constructs, and understanding the effect of removing a certain preference construct. Additional ways in which a data exploration system may assist a user to input preferences are discussed below in Section VI.
  • the preference language may be based on capturing pairwise preferences on different granularity levels.
  • An item's description may follow a relational model, where each item may be represented as a tuple.
  • Preferences may be cast against a relation R with a known schema.
  • Our first construct is used to define a context for expressing first-order preferences.
  • a scope $R_i$ is an arbitrary non-empty subset of tuples in $R$.
  • a scope defines a Boolean membership property that restricts the space of all possible tuples to a subset of tuples that are interesting for building preference relations.
  • Such a membership property may be defined using a SQL query posed against R.
  • FIG. 4 shows six different scopes $R_1, \ldots, R_6$ in the relation "Car" illustrated in FIG. 2, where scopes are defined using SQL queries.
  • a database query language other than SQL may be used to define such a membership property.
  • the membership property may be defined using a set of variables and a database language may not be needed.
  • scopes may intersect.
  • a tuple in the relation R may belong to zero, one or two or more scopes. Tuples that do not belong to any scopes may be non-interesting with respect to a preference specification. Thus, for clarity, all subsequent discussion is with respect to tuples that belong to at least one scope.
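Since a scope is defined by a Boolean membership property, it can be sketched without a database language as a predicate applied to the relation. The helper name `scope`, the sample tuples, and the two predicates below are hypothetical and do not reproduce the scopes of FIG. 4.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def scope(relation: List[Row], member: Callable[[Row], bool]) -> List[Row]:
    """Return the subset of the relation that satisfies the membership property."""
    return [t for t in relation if member(t)]


# Hypothetical relation.
cars = [
    {"id": "t1", "make": "Honda", "color": "Red", "price": 1800},
    {"id": "t2", "make": "Toyota", "color": "Blue", "price": 1500},
    {"id": "t3", "make": "Nissan", "color": "Black", "price": 4000},
]

red_cars = scope(cars, lambda t: t["color"] == "Red")
cheap_cars = scope(cars, lambda t: t["price"] < 2000)

# Scopes may intersect, and some tuples may belong to no scope at all.
in_both = [t["id"] for t in red_cars if t in cheap_cars]
covered = {t["id"] for t in red_cars + cheap_cars}
in_none = [t["id"] for t in cars if t["id"] not in covered]
print(in_both, in_none)
```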
  • the scope comparator $f_{i,j}$ is a function that takes a pair of distinct tuples (one from $R_i$ and the other from $R_j$), and returns a first value such as 1 (e.g., if the tuple from $R_i$ is preferred), a second value such as -1 (e.g., if the tuple from $R_j$ is preferred), or a null value "$\perp$" (e.g., if there is no preference).
  • a scope comparator is a preference language construct for defining first-order preferences.
  • the scope comparator may be user-defined.
  • a scope comparator may be defined, automatically, by a computer.
  • a scope comparator may be defined by a combination of manual and automatic techniques.
  • a generic interface to a scope comparator may accept two tuples and return either a preference of one tuple over the other or an indication that no preference can be made. Whenever a tuple $t_i$ is …
  • FIG. 5 illustrates five different scope comparators defined on the scopes shown in FIG. 4.
  • the scope comparators f iA and f u are unconditional (i.e., they produce first-order preferences without testing any conditions beyond the conditions captured by scope definition).
  • the scope comparators f u , fs.e , fe.2 are conditional (i.e., they produce preference relations conditioned on some logic).
  • Conditional scope comparators allow defining composite preferences that span multiple attributes given in scope definition and/or comparator logic (e.g., $f_{6,2}$ defines a composite preference on Price and Make attributes).
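The comparator interface described above can be sketched as a function built from two scope membership tests and a piece of comparison logic. It returns 1 if the tuple from the first scope is preferred, -1 if the tuple from the second scope is preferred, and `None` for the null value. The factory `make_comparator`, the sample scopes, and the sample tuples are hypothetical and do not reproduce the comparators of FIG. 5.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
Member = Callable[[Row], bool]
Logic = Callable[[Row, Row], Optional[int]]


def make_comparator(in_scope_i: Member, in_scope_j: Member,
                    logic: Logic) -> Logic:
    """Build a scope comparator over a first scope and a second scope."""
    def compare(t_i: Row, t_j: Row) -> Optional[int]:
        if t_i["id"] == t_j["id"]:
            raise ValueError("a scope comparator takes two distinct tuples")
        if not (in_scope_i(t_i) and in_scope_j(t_j)):
            return None  # comparator does not apply to this pair
        return logic(t_i, t_j)
    return compare


def is_red(t: Row) -> bool:
    return t["color"] == "Red"


def is_blue(t: Row) -> bool:
    return t["color"] == "Blue"


def cheaper(a: Row, b: Row) -> Optional[int]:
    if a["price"] == b["price"]:
        return None
    return 1 if a["price"] < b["price"] else -1


# Unconditional: any red car is preferred to any blue car.
red_over_blue = make_comparator(is_red, is_blue, lambda a, b: 1)
# Conditional: between a red car and a blue car, prefer the cheaper one.
cheaper_red_or_blue = make_comparator(is_red, is_blue, cheaper)

honda = {"id": "t1", "color": "Red", "price": 1800}
toyota = {"id": "t2", "color": "Blue", "price": 1500}
print(red_over_blue(honda, toyota), cheaper_red_or_blue(honda, toyota))
```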
  • a total order on a scope $R_i$ (which can be the whole relation $R$) may be encoded by defining a comparator $f_{i,i}$, using the template in Algorithm 1, where $f_{i,i}$ operates on pairs of distinct tuples belonging to $R_i$.
  • Let $P_x$ be a partial order defined on the domain of $x$.
  • Partial order-based preferences may be encoded using the template given by Algorithm
  • Template 3 [Skyline Preferences]. Given a set of attributes $A$, a tuple $t_i$ is preferred to tuple $t_j$ if there exists a non-empty subset $A' \subseteq A$ where, for every $x \in A'$, $t_i.x$ is preferred to $t_j.x$, while for any other attribute no preference can be made between $t_i.x$ and $t_j.x$. Skyline preferences may be encoded as shown in the template given by Algorithm 3.
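The skyline template can be sketched as follows. This is not Algorithm 3 itself, which is not reproduced in this section: it is a minimal reading of the definition above, in which each attribute has its own preference function returning 1, -1 or `None`, and a tuple is preferred only if it wins on at least one attribute and loses on none. The function names and sample values are hypothetical.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
AttrPref = Callable[[object, object], Optional[int]]


def lower_is_better(x, y) -> Optional[int]:
    """Per-attribute preference: smaller values are preferred."""
    if x == y:
        return None
    return 1 if x < y else -1


def skyline_compare(t_i: Row, t_j: Row,
                    attr_prefs: Dict[str, AttrPref]) -> Optional[int]:
    """Return 1 if t_i is preferred, -1 if t_j is preferred, else None."""
    results = [pref(t_i[a], t_j[a]) for a, pref in attr_prefs.items()]
    if 1 in results and -1 not in results:
        return 1
    if -1 in results and 1 not in results:
        return -1
    return None  # attributes disagree, or none expresses a preference


prefs = {"price": lower_is_better, "deposit": lower_is_better}
a = {"price": 1800, "deposit": 500}
b = {"price": 2500, "deposit": 500}
c = {"price": 1500, "deposit": 900}
print(skyline_compare(a, b, prefs), skyline_compare(a, c, prefs))
```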
  • Example 3 is an example for specifying and managing conjoint analysis preferences.
  • Alice's preferences regarding cars may be expressed conjointly over the attribute pairs (Make, Color), and (Make, Price), as shown in FIG. 6.
  • the value in each cell is the rank assigned to each combination of attribute values.
  • Conjoint analysis may be based on an additive utility model in which ranks, assigned to combinations of attribute values, may be used to derive a utility (part-worth) of each attribute value. The objective is that the utility summation of attribute values reconstructs the given ranking.
  • 'Honda' is assigned utility value 40
  • 'Red' is assigned utility value 50.
  • the score of 'Honda, Red' is 90, which matches the assigned rank 1 in the given Make-Color preferences.
  • Utility values may be computed using regression. For instance, they may be computed using linear regression. Note the mapping between combinations of attribute values and ranks is modeled.
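The additive utility model can be illustrated with a small sketch in which the score of a combination is the sum of the part-worth utilities of its attribute values, and ranks are read off the sorted scores. The utilities for 'Honda' (40) and 'Red' (50) come from the text above; the values for 'Toyota' and 'Blue' are hypothetical, and the regression step that would derive the utilities from the ranks is not shown.

```python
from itertools import product

# Part-worth utilities. 'Honda' = 40 and 'Red' = 50 come from the text;
# the 'Toyota' and 'Blue' values are hypothetical.
make_utility = {"Honda": 40, "Toyota": 20}
color_utility = {"Red": 50, "Blue": 10}


def score(make: str, color: str) -> int:
    """Additive utility: sum of the part-worths of the attribute values."""
    return make_utility[make] + color_utility[color]


# Rank every (make, color) combination by descending score.
combos = sorted(product(make_utility, color_utility),
                key=lambda mc: score(*mc), reverse=True)
ranks = {combo: rank for rank, combo in enumerate(combos, start=1)}
print(score("Honda", "Red"), ranks[("Honda", "Red")])
```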
  • a POrder represents an ordering of scope comparators based on their relative importance.
  • a POrder may quantify the strength of different first-order preferences based on the semantics of second-order preferences, as discussed in greater detail below in Section IV.
  • Second-order preferences may be encoded using POrders.
  • a partial order PO on the set of scope comparators may encode partial information on the relative importance of different scope comparators.
  • the set of total orders that are consistent with PO is called the set of linear extensions of PO.
  • FIG. 7 shows a partial order defined on four comparators and the corresponding set of linear extensions.
  • the set of linear extensions may be obtained using a simple recursive algorithm on the PO graph.
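A simple recursive enumeration of linear extensions can be sketched as follows: repeatedly pick an element that no remaining element must precede, place it next, and recurse on the rest. The function name, the pair-set encoding of the partial order, and the four-comparator example are assumptions; the example does not reproduce the partial order of FIG. 7.

```python
from typing import List, Set, Tuple


def linear_extensions(elements: List[str],
                      precedes: Set[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate all total orders consistent with a partial order.

    precedes holds pairs (a, b) meaning a is more important than b.
    """
    if not elements:
        return [[]]
    result = []
    for e in elements:
        # e may come next only if no remaining element must precede it.
        if not any((other, e) in precedes for other in elements if other != e):
            rest = [x for x in elements if x != e]
            for tail in linear_extensions(rest, precedes):
                result.append([e] + tail)
    return result


# Hypothetical partial order on four comparators: f1 outranks f2 and f3,
# while f4 is unrelated to the others.
extensions = linear_extensions(["f1", "f2", "f3", "f4"],
                               {("f1", "f2"), ("f1", "f3")})
print(len(extensions))
```

The number of linear extensions can grow factorially with the number of comparators, so full enumeration is only practical for small partial orders.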
  • the pairwise second-order preference $(f_i \succ f_j)$ expresses the requirement that the first-order preferences corresponding to $f_i$ are more important than the first-order preferences corresponding to $f_j$.
  • Pairwise second-order preferences $PW$ may be formulated as the set of POrders $\{(f_i \succ f_j) : (f_i \succ f_j) \in PW\}$.
  • Pareto Preference Composition: the importance of all scope comparators is equal.
  • the first-order preference $(t_i \succ t_j)$ is produced if and only if at least one scope comparator states that $(t_i \succ t_j)$, and no other scope comparator states that $(t_j \succ t_i)$.
  • Pareto preference composition is formulated as a set of singleton POrders, where each POrder is composed of a single comparator.
  • the first-order preference $(t_i \succ t_j)$ is produced if and only if at least one scope comparator states that $(t_i \succ t_j)$.
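The difference between the two composition semantics can be sketched as follows, assuming the reading that Pareto composition produces a preference for one tuple over another only when some comparator supports it and none opposes it, whereas preferences aggregation produces it whenever some comparator supports it. The function names, comparators and sample tuples are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Comparator = Callable[[Row, Row], Optional[int]]


def cheaper(a: Row, b: Row) -> Optional[int]:
    if a["price"] == b["price"]:
        return None
    return 1 if a["price"] < b["price"] else -1


def red_over_blue(a: Row, b: Row) -> Optional[int]:
    if a["color"] == "Red" and b["color"] == "Blue":
        return 1
    if a["color"] == "Blue" and b["color"] == "Red":
        return -1
    return None


def pareto_prefers(t_i: Row, t_j: Row, comparators: List[Comparator]) -> bool:
    """Some comparator prefers t_i and no comparator prefers t_j."""
    votes = [f(t_i, t_j) for f in comparators]
    return 1 in votes and -1 not in votes


def aggregation_prefers(t_i: Row, t_j: Row, comparators: List[Comparator]) -> bool:
    """Some comparator prefers t_i, regardless of what the others say."""
    return any(f(t_i, t_j) == 1 for f in comparators)


fs = [cheaper, red_over_blue]
a = {"price": 100, "color": "Blue"}
b = {"price": 200, "color": "Red"}
c = {"price": 300, "color": "Blue"}
print(pareto_prefers(a, b, fs), aggregation_prefers(a, b, fs),
      aggregation_prefers(b, a, fs), pareto_prefers(a, c, fs))
```

Under aggregation, conflicting comparators yield preferences in both directions for the same pair, which is one way cycles can arise.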
  • Preferences aggregation may be formulated as a set of singleton POrders, where each POrder may be composed of a single comparator.
  • IV. Compilation
  • a preference graph may be formally defined as follows: Definition 5 [Preference Graph] : A directed graph (V,E), where V is the set of tuples in
  • the label of edge $e_{i,j}$, denoted $l(e_{i,j})$, is the set of comparators inducing preference of $t_i$ over $t_j$.
  • the compilation algorithm is described in Algorithm 5.
  • the algorithm constructs the set of vertices also termed nodes of the preference graph using the union of tuples involved in all input scopes.
  • each node in the preference graph is associated with a tuple.
  • each node in the preference graph may represent an item.
  • the set of applicable scope comparators may be found and used to compute graph edges and their labels.
  • an edge in the preference graph may correspond to a first-order preference, which may indicate a user preference for one of the two items represented by the nodes terminating the edge.
  • Edges of the preference graph may be directed edges and may be directed to the node associated with a preferred data item as indicated by the first-order preference associated with the edge.
  • edges may be undirected and an indication of which of nodes terminating the edge is preferred may be provided differently. For instance, such an indication may be provided by using a signed weight, with a negative weight indicating a preference for one node and a positive weight indicating a preference for the other node.
  • FIG. 8 illustrates an example of the output of the compilation algorithm.
  • FIG. 8 shows the preference graph obtained from the set of scope comparators
  • Each edge is labeled with a set of supporting comparators. For example, for the edge $e_{2,6}$, we have $l(e_{2,6}) = \{f_{1,2}, f_{6,2}\}$, since the tuple $t_2$ is preferred over the tuple $t_6$ according to the scope comparators $f_{1,2}$ and $f_{6,2}$.
  • the induced preference graph may be a cyclic graph.
  • a cycle exists since t t is preferred over t 6 according to f 62 , while t 6 is preferred over t ⁇ according to f u .
  • Construction of a preference graph according to Algorithm 5 does not guarantee transitivity of graph edges.
  • the existence of the edges e 2 6 and e 6 does not imply the existence of the edge e 2 , / .
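The compilation step described above can be sketched as follows. Algorithm 5 is not reproduced in this section, so this is only an illustration of the stated behavior: vertices are the union of tuples in all scopes, and each applicable comparator contributes a directed, labeled edge from the preferred tuple to the other tuple. The function name `compile_prefs`, the dictionary layouts, and the sample data are assumptions.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
Comparator = Callable[[Row, Row], Optional[int]]
Edge = Tuple[str, str]  # (preferred tuple id, other tuple id)


def compile_prefs(scopes: Dict[str, List[Row]],
                  comparators: Dict[Tuple[str, str], Comparator]):
    """Build vertices and labeled edges of a preference graph."""
    # Vertices: union of the tuples involved in all input scopes.
    vertices = {t["id"]: t for rows in scopes.values() for t in rows}
    labels: Dict[Edge, Set[Tuple[str, str]]] = {}
    for (i, j), f in comparators.items():
        for a in scopes[i]:
            for b in scopes[j]:
                if a["id"] == b["id"]:
                    continue  # comparators take distinct tuples
                result = f(a, b)
                if result == 1:
                    labels.setdefault((a["id"], b["id"]), set()).add((i, j))
                elif result == -1:
                    labels.setdefault((b["id"], a["id"]), set()).add((i, j))
    return vertices, labels


def cheaper(a: Row, b: Row) -> Optional[int]:
    if a["price"] == b["price"]:
        return None
    return 1 if a["price"] < b["price"] else -1


# Hypothetical scopes and comparators that disagree, producing a cycle.
t1 = {"id": "t1", "price": 1800}
t2 = {"id": "t2", "price": 1500}
scopes = {"R1": [t1], "R2": [t2]}
comparators = {("R1", "R2"): lambda a, b: 1, ("R2", "R1"): cheaper}

vertices, labels = compile_prefs(scopes, comparators)
print(sorted(vertices), labels)
```

The nested loops over scope pairs reflect the quadratic cost in the number of tuples that the text attributes to constructing a preference graph.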
  • COMPILE-PREFS (S: a set of scopes, F: a set of comparators)
  • the computational complexity of constructing and processing a preference graph is quadratic in the number of tuples. There is a tradeoff between a preference graph's expressiveness and the scalability of its implementation. Though in some embodiments, preferences may be highly “selective" and, consequently, the preference graph may be sparse.
  • Scalability issues due to the size of the preference graph may be addressed in any of numerous ways.
  • One approach is to use distributed processing in a cloud environment, where storing and managing the preference graph is distributed over multiple nodes in the cloud.
  • a ranking algorithm described below in Section V.A may be easily adapted to function in a cloud environment.
  • Other approaches include sacrificing the precision of preference query results by conducting approximate processing, or thresholding managed preferences to prune weak preferences early, to reduce the size of the preference graph.
  • a preference graph allows heterogeneous user preferences to be encoded using a unified graphical representation. In some embodiments, however, computing a ranking of query results using such a representation may require additional quantification of preference strength.
  • Preference strength may be quantified based on the semantics of first-order and second-order preferences, while preserving the preference information encoded by the preference graph. Preference strength may be represented by weights on edges of the preference graph. Given a preference graph G(V,E), the set of graph edges E may represent pairwise first-order preferences. Specifically, an edge $e_{i,j}$ may express the preference for tuple $t_i$ over tuple $t_j$ according to one or more scope comparator(s).
  • a weight $w_{i,j}$ may be associated with an edge $e_{i,j}$.
  • the weight $w_{i,j}$ may be a weight indicative of a degree of preference for the first node over the second node. Stronger preferences may be indicated by higher weights.
  • the weight may be a weight between 0 and 1, inclusive, and the sum of the weights $w_{i,j}$ and $w_{j,i}$ may equal 1. Disconnected vertices in the preference graph indicate that their corresponding tuples are indifferent with respect to each other.
  • computing the weight may comprise dividing the number of first-order preferences for item A relative to item B by the number of all first-order preferences indicating any preference (either for or not for) item A.
  • Let F be the set of all scope comparators associated with the preference graph.
  • Let A be the set of POrders of F according to the chosen semantics of second-order preferences.
  • Let $F_{i,j} \subseteq F$ be defined for each pair of tuples. That is, $F_{i,j}$ is the set of scope comparators that state a preference relationship between tuples $t_i$ and $t_j$.
  • Let $A_{i,j}$ be the multiset of nonempty projections of the POrders in A onto $F_{i,j}$.
  • the weight $w_{i,j}$ may be computed as follows: $w_{i,j} = \frac{|\{\omega \in A_{i,j} : t_i \succ_{\omega} t_j\}|}{|A_{i,j}|}$ (1)
  • $w_{i,j}$ corresponds to the proportion of POrder projections under which $t_i \succ t_j$, among the set of POrder projections computed based on comparators relevant to the edge $(t_i, t_j)$.
  • the weight $w_{j,i}$ may be similarly defined using the projections under which $t_j \succ t_i$. It follows that $w_{i,j} + w_{j,i} = 1$.
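The proportion in equation (1) can be illustrated with a deliberately simplified Python sketch. It assumes that every supporting comparator contributes exactly one projection, which ignores the second-order semantics (POrders) used in the text; the function name `edge_weights` and the comparator labels are hypothetical.

```python
def edge_weights(edges):
    """Turn supporting-comparator sets into edge weights (hypothetical helper).

    edges: dict mapping (i, j) -> set of comparators preferring i over j.
    Returns a dict mapping (i, j) -> weight in (0, 1], computed as the share
    of relevant comparators that prefer i over j.
    """
    weights = {}
    for (i, j), support in edges.items():
        opposing = edges.get((j, i), set())
        weights[(i, j)] = len(support) / (len(support) + len(opposing))
    return weights

# Hypothetical supporting sets for three edges.
edges = {
    ("t2", "t6"): {"f_a", "f_b", "f_c"},
    ("t6", "t2"): {"f_d"},
    ("t1", "t2"): {"f_a"},
}
w = edge_weights(edges)
```

Under this simplification the two weights of a 2-length cycle always sum to 1, and an edge with no opposing edge receives weight 1.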
  • Under Pareto composition, at most one of the two edges $e_{i,j}$ and $e_{j,i}$ can exist in the preference graph, since otherwise $t_i$ and $t_j$ would be incomparable. Hence, under Pareto composition, we remove any graph edge $e_{i,j}$ whenever an edge $e_{j,i}$ exists.
  • FIG. 9 shows three weighted preference graphs, corresponding to the preference graph in FIG. 8, produced under different semantics of second-order preferences.
  • the different semantics of second-order preferences result in different edge weights and/or the removal of some edges in the original preference graph:
  • Under prioritized comparators, $e_{1,6}$ is removed since, based on the shown comparator priorities, it may be determined that $t_6 \succ t_1$.
  • the graph-based preference model described in Section IV may be used to obtain a ranking (a total order) of items in a set of items. This may be done in any of numerous ways.
  • One approach described in Section V.A obtains a ranking based on authority-based ranking algorithms.
  • Another approach described in Section V.B is a probabilistic algorithm based on inducing a set of complete directed graphs called tournaments from the graph-based preference model and computing a ranking for at least one tournament from the set.
  • a total order of items may be obtained by estimating an importance measure for each tuple using the preference weights encoded by the weighted preference graph.
  • Techniques related to the PageRank importance flow model may be used to compute such importance measures.
  • scores may be assigned to Web pages based on the frequency with which they are visited by a random surfer. Pages are then ranked according to these scores. Intuitively, pages pointed to by many important pages are also important.
  • PageRank importance flow model lends itself naturally to problems that require computing a ranking based on binary relationships among items.
  • the model may be applied based on the notion that an item may be important if it is preferred over many other important items.
  • Let G = (V, E) be a dominance graph (i.e., a directed graph in which an edge $e_{i,j}$ means $v_i \succ v_j$), and consider, for each node $v_i$, the set of nodes dominated by $v_i$ and the set of nodes dominating $v_i$.
  • Let $\alpha$ be a number called a damping factor.
  • the PageRank algorithm computes the PageRank score of node $v_i$, denoted $\Gamma_i$, as follows. The PageRank score of a node v is determined by summing the PageRank scores of all nodes v' dominated by v, each normalized by the number of nodes dominating v'.
  • a pagerank-based algorithm may be used to calculate a total order of items from the weighted preference graph.
  • a pagerank-based algorithm refers to any algorithm based on calculating a value from a graph based on characteristics of a
  • the weighted preference graph has preference weights associated to edges.
  • the preference weights bias the probability of transition (flow) from one state to another, according to weight value, in contrast to the conventional case in which transitions are uniformly defined.
  • a pagerank-based algorithm may proceed as follows. Given a starting tuple $t_0$ (node) in the weighted preference graph, assume a random surfer that jumps to a next tuple $t_1$ among the set of tuples dominating $t_0$, biased by the edge weights. Intuitively, this corresponds to a process where a tuple is constantly replaced by a more desired tuple (with respect to given preferences). Note that visiting tuples takes place in the opposite direction of edges (jumps are from a dominated tuple to a dominating tuple).
  • Ranking tuples based on their visit frequency defines an ordering that corresponds to their global desirability.
  • the weighted preference graph may be represented using a square matrix M, where each tuple may correspond to one row and one column in M.
  • Let $E_j$ be the set of incoming edges to tuple $t_j$ in the weighted preference graph.
  • the entry M [i, j] may be computed as follows:
  • Let $\Gamma$ be the pagerank scores vector.
  • the pagerank scores are given by solving the equation $\Gamma = M \Gamma$, which is the same as finding the eigenvector of M corresponding to eigenvalue 1.
  • the solution that has been used in practice for computing pagerank scores is the iterative power method, where $\Gamma$ is computed by first choosing an initial vector $\Gamma^0$, and then producing a next vector $\Gamma^1 = M \Gamma^0$. The process is repeated to generate a vector $\Gamma^T$, at iteration T, using the vector $\Gamma^{T-1}$, generated at iteration $T-1$.
  • entries in $\Gamma^T$ are normalized so that they sum to 1.0.
  • the number of iterations needed for the power method to converge may be any suitable number of iterations. For instance, tens or hundreds of iterations may be used.
  • FIG. 10 illustrates the pagerank matrix for the weighted preference graph with prioritized comparators illustrated in FIG. 9.
  • $t_4$ is a sink node with no incoming edges (i.e., $t_4$ has no other dominating tuples).
  • a typical value of the damping factor $\alpha$ may be a value such as 0.15, but may be any value between 0 and 0.5.
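The random-surfer process and the power method described above can be sketched in Python as follows. The exact matrix entries are not recoverable from this text, so the transition rule used here is an assumption: a surfer at tuple j moves to a dominating tuple i with probability proportional to the weight of the edge "i preferred over j", jumps to a uniformly random tuple with probability `alpha`, and spreads its score uniformly when j has no dominating tuple. The function name `weighted_pagerank` and the sample graph are hypothetical.

```python
def weighted_pagerank(nodes, weights, alpha=0.15, iterations=100):
    """Power-method sketch of a weight-biased PageRank (assumed formulation).

    nodes: list of tuple ids.
    weights: dict mapping (i, j) -> weight of the edge "i preferred over j".
    Returns a dict mapping tuple id -> score; scores sum to 1.
    """
    n = len(nodes)
    # Total weight of the edges coming from the dominators of each tuple.
    incoming = {j: 0.0 for j in nodes}
    for (i, j), w in weights.items():
        incoming[j] += w
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: alpha / n for v in nodes}  # random jump share
        for j in nodes:
            if incoming[j] == 0.0:
                # No dominating tuple: spread the score uniformly (assumption).
                for v in nodes:
                    nxt[v] += (1 - alpha) * scores[j] / n
        for (i, j), w in weights.items():
            # Flow from the dominated tuple j to the dominating tuple i.
            nxt[i] += (1 - alpha) * scores[j] * w / incoming[j]
        total = sum(nxt.values())
        scores = {v: s / total for v, s in nxt.items()}  # normalize to 1.0
    return scores

# Hypothetical graph: a is preferred over b and c, and b is preferred over c.
nodes = ["a", "b", "c"]
weights = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0}
scores = weighted_pagerank(nodes, weights)
ranking = sorted(nodes, key=lambda v: -scores[v])
```

Sorting tuples by descending score yields the total order; in this toy graph the tuple that dominates the others is visited most often and is ranked first.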
  • a total order of items may be obtained from a complete directed graph derived from the preference model.
  • Computing a total order of items from a complete directed graph (also known as a tournament) is termed finding a tournament solution.
  • This problem may be stated as follows. Given an irreflexive, asymmetric, and complete binary relation over a set, find the set of maximal elements of this set.
  • Example methods for finding tournament solutions are computing Kendall scores, and finding a Condorcet winner.
  • the preference graph described in Section IV is not necessarily a tournament.
  • the preference graph may be symmetric and incomplete:
  • the symmetry problem implies that some pairwise preferences may go either way with possibly different weights, while incompleteness implies that some pairwise preferences may be unknown.
  • a probabilistic approach to obtaining a ranking from the preference graph may be used. Such an approach may rely on deriving one or more tournaments from the preference graph. Each tournament may be associated with a probability. As such, a weighted preference graph may be viewed as a compact representation of a space of possible tournaments, wherein each tournament is obtained by repairing the preference graph to obtain an asymmetric and complete digraph. In order to construct a tournament, two repair operations may be applied to the preference graph:
  • the value of the weight $w_{i,j}$ represents the probability of selecting a
  • repairing the weighted Preference graph generates a tournament (irreflexive, asymmetric, and complete digraph) whose probability is given by the product of the probabilities of all remaining graph edges.
  • Let c be the number of 2-length cycles in the Preference graph
  • and let d be the number of disconnected tuple pairs. Then, the number of possible tournaments is $2^{c+d}$.
  • FIG. 11 illustrates a weighted preference graph, and the corresponding set of possible tournaments $\{T_1, \ldots, T_8\}$.
  • the illustrated preference graph has two 2-length cycles
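As a sketch of the repair process, the following Python code enumerates the tournaments that a small weighted preference graph can be repaired into, together with their probabilities. The equal 0.5 probability for either direction of a disconnected pair is an assumption (the text only says an edge is added), and the function name `possible_tournaments` and the example weights are hypothetical rather than taken from FIG. 11.

```python
from itertools import combinations, product

def possible_tournaments(nodes, weights):
    """Enumerate repaired tournaments and their probabilities (sketch).

    weights: dict mapping (i, j) -> weight of the edge "i preferred over j".
    Returns a list of (edge_set, probability) pairs.
    """
    options = []  # one list of (edge, probability) choices per tuple pair
    for i, j in combinations(nodes, 2):
        w_ij = weights.get((i, j))
        w_ji = weights.get((j, i))
        if w_ij is not None and w_ji is not None:
            # 2-length cycle: keep one edge, chosen with its weight.
            options.append([((i, j), w_ij), ((j, i), w_ji)])
        elif w_ij is not None:
            options.append([((i, j), w_ij)])
        elif w_ji is not None:
            options.append([((j, i), w_ji)])
        else:
            # Disconnected pair: either direction, equally likely (assumption).
            options.append([((i, j), 0.5), ((j, i), 0.5)])
    tournaments = []
    for choice in product(*options):
        prob = 1.0
        for _, p in choice:
            prob *= p  # product of the probabilities of the remaining edges
        tournaments.append((frozenset(edge for edge, _ in choice), prob))
    return tournaments

# Hypothetical graph with one 2-length cycle (c = 1) and one disconnected pair (d = 1).
nodes = ["t1", "t2", "t3"]
weights = {("t1", "t2"): 0.7, ("t2", "t1"): 0.3, ("t1", "t3"): 1.0}
tournaments = possible_tournaments(nodes, weights)
most_probable = max(tournaments, key=lambda t: t[1])
```

With c = 1 and d = 1 the sketch produces $2^{c+d} = 4$ tournaments, and a most probable tournament keeps the heavier edge of the 2-length cycle.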
  • the problem of computing a total order of tuples with a minimum number of violations to a tournament is known to be NP-hard. Multiple heuristics have been proposed to compute a total order from a tournament. We focus on using the Kendall score for computing a total order.
  • the Kendall score of tuple t is the number of tuples dominated by t according to the tournament.
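A short Python sketch of ranking by Kendall score, assuming a tournament is given as a set of (winner, loser) edges. Breaking ties by tuple id is an arbitrary choice made here for determinism, and the function name `kendall_ranking` is hypothetical.

```python
def kendall_ranking(nodes, tournament_edges):
    """Rank tuples by Kendall score, i.e. by the number of tuples each dominates.

    tournament_edges: iterable of (winner, loser) pairs.
    Returns (ranking, scores); ties are broken by tuple id (assumption).
    """
    scores = {v: 0 for v in nodes}
    for winner, _loser in tournament_edges:
        scores[winner] += 1
    ranking = sorted(nodes, key=lambda v: (-scores[v], v))
    return ranking, scores

# Hypothetical tournament over three tuples.
nodes = ["a", "b", "c"]
edges = {("a", "b"), ("a", "c"), ("b", "c")}
ranking, kendall = kendall_ranking(nodes, edges)
```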
  • the space of possible tournaments allows computing a total order of tuples under any of numerous probabilistic ranking measures. Two specific measures are described below.
  • Expected ranking: compute a total order of tuples based on the expected ranking in the space of all the possible tournaments.
  • Finding the most probable tournament is done by maintaining the edge with the higher weight for each 2-length cycle in the preference graph, and adding an arbitrary edge for each pair of disconnected tuples. According to this method, there may be multiple tournaments with the highest probability among all possible tournaments. The computed total order under any of these tournaments is the required ranking. In the illustrative example of FIG. 11, tournaments $T_2$ and $T_6$ are the most probable tournaments, each with probability 0.21. A total order of tuples in $T_2$ using
  • Finding the expected ranking may be done by computing the expected Kendall score for each tuple using the space of possible tournaments.
  • $t_1$ dominates two tuples in the tournaments $\{T_2, T_4, T_6, T_8\}$, with probability summation 0.7, and dominates one tuple in the remaining tournaments, with probability summation 0.3.
  • the random variable $s_1$ may take the value 1 with probability 0.3, and the value 2 with probability 0.7.
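The expected Kendall score can be estimated by sampling, as in this Python sketch. It samples whole tournaments rather than per-tuple score samples, which is a simplification of the procedure in this section; treating a disconnected pair as a fair coin flip is an assumption, and the function name, sample count, and example weights are hypothetical.

```python
import random
from itertools import combinations

def expected_kendall_scores(nodes, weights, samples=2000, seed=0):
    """Monte Carlo estimate of each tuple's expected Kendall score (sketch).

    weights: dict mapping (i, j) -> weight of the edge "i preferred over j".
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    totals = {v: 0 for v in nodes}
    pairs = list(combinations(nodes, 2))
    for _ in range(samples):
        for i, j in pairs:
            w_ij = weights.get((i, j), 0.0)
            w_ji = weights.get((j, i), 0.0)
            if w_ij == 0.0 and w_ji == 0.0:
                p_i = 0.5  # disconnected pair (assumption)
            else:
                p_i = w_ij / (w_ij + w_ji)
            # Decide which tuple dominates the other in this sampled tournament.
            if rng.random() < p_i:
                totals[i] += 1
            else:
                totals[j] += 1
    # The sample mean estimates the expected score.
    return {v: totals[v] / samples for v in nodes}

nodes = ["t1", "t2", "t3"]
weights = {("t1", "t2"): 0.7, ("t2", "t1"): 0.3, ("t1", "t3"): 1.0}
estimates = expected_kendall_scores(nodes, weights)
expected_ranking = sorted(nodes, key=lambda v: -estimates[v])
```

For this toy graph the exact expected scores follow from linearity of expectation: 0.7 + 1.0 = 1.7 for t1, 0.3 + 0.5 = 0.8 for t2, and 0.5 for t3, so the estimates should land close to those values.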
  • the expected value of s is estimated as the mean of the generated score samples. It is well known that the sample mean, computed from a sufficiently large set of independent samples, is an unbiased estimate of the true distribution mean. Let n be the number of tuples in the preference graph, and let m be the number of samples drawn for each tuple. The complexity of the algorithm is $O((nm)^2)$, since we access the dominated set of each tuple m times to generate m score samples.
  • VI. Interactive Preference Specification
  • a data exploration system may help a user to specify preferences.
  • preferences may be specified interactively.
  • a system may interact with a user through a series of prompts, displays, and/or indications of the type of input a user may provide the system.
  • the system may provide the user with information that may assist the user in specifying preferences.
  • a data exploration system may assist a user to query the system. To this end, the data exploration system may assist the user to specify preferences and may output query results, to the user, ranked in accordance with the specified preferences.
  • FIG. 12 shows a flowchart of an illustrative process 1200 for assisting a user to query a data exploration system.
  • Process 1200 may be used to assist a user in specifying user preferences in conjunction with a query, and may assist a user in specifying preferences associated with attributes related to one or more keywords in a query.
  • Process 1200 begins in act 1202 when a user query may be inputted.
  • the inputted query may be any suitable query and may be a text query.
  • the inputted query may be a multimedia query, for example, received through an audio input device that may be translated into text using any appropriate speech-recognition/speech-to-text software.
  • the inputted query may comprise one or more keywords.
  • the query may be, for example, a query for an item to purchase and/or may be a query for an item comprising information desired by a user.
  • the query may be a query containing the keyword "car" and may indicate that a user may be interested in looking at items related to cars.
  • the user may input a query "television" into an Internet search engine, which may indicate that a user may be interested in looking at any webpages containing information about television.
  • a query may be any suitable query, as known in the art.
  • Attributes may be related to one or more keywords contained in the query.
  • an attribute may be a characteristic of a keyword in the query.
  • Attributes may be of any suitable type.
  • attributes may be categorical attributes or numerical attributes. For instance, if a query for a "car" were inputted in act 1202, then attributes related to car may be the attributes "Make,” “Color,” “Price,” and any other attributes of car such as the attributes illustrated in FIG. 2.
  • Attributes related to one or more keywords contained in a query may be identified in any suitable way as known in the art. They may be identified automatically by a computer or may be manually specified.
  • a user may be presented with these attributes, in act 1206.
  • the user may be shown these attributes visually using a display screen that contains these attributes.
  • the display screen may be any suitable screen containing a representation of the attributes, such as a text representation of the attributes.
  • the user may be prompted to select one or more of the presented attributes such that the system may assist the user to specify preferences associated with the selected attributes. For instance, a user may be presented with a list of previously-mentioned attributes associated with the keyword "car” and may select the attributes "Price” and "Color.”
  • attributes selected by the user may be received.
  • the user may be prompted to specify first-order preferences associated with one or more selected attributes, in act 1210.
  • the user may specify a first-order preference of any suitable type.
  • the user may specify score-based preferences, partial order preferences, skyline preferences, and/or conjoint analysis preferences as discussed with reference to Section II.
  • the user may be assisted in specifying any of the above-mentioned first-order preferences in any of numerous ways.
  • a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the first-order preferences (e.g., by drawing preferences).
  • the user may be provided with a series of prompts designed to obtain information required to specify first-order preferences.
  • the user may be prompted to specify a second-order preference among the received first-order preferences, in act 1212.
  • the user may specify a second-order preference of any suitable type. For instance, the user may specify prioritized preference composition preferences, partial order preferences, pairwise preferences, and/or Pareto preference composition preferences as discussed with reference to Section III. Similar to the case of first-order preferences, a user may be assisted in specifying any of the above-mentioned second-order preferences in any of numerous ways.
  • a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the second-order preferences.
  • the user may be provided with a series of prompts designed to obtain information required to specify second-order preferences. After first-order and second-order preferences have been specified, process 1200 completes.
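The flow of process 1200 can be sketched in Python with the user interaction injected as callbacks. Everything named here (the function `run_process_1200`, the attribute catalog, and the callback signatures) is hypothetical; the sketch only mirrors the order of the acts described above and does not model any particular user interface.

```python
def run_process_1200(query, attribute_catalog, choose_attributes,
                     ask_first_order, ask_second_order):
    """Walk through the acts of process 1200 (illustrative sketch).

    attribute_catalog: dict mapping keyword -> list of related attributes.
    choose_attributes, ask_first_order, ask_second_order: callbacks that
        stand in for prompting the user.
    """
    # Act 1202: receive the query and split it into keywords.
    keywords = query.lower().split()
    # Identify attributes related to the keywords.
    attributes = []
    for keyword in keywords:
        for attr in attribute_catalog.get(keyword, []):
            if attr not in attributes:
                attributes.append(attr)
    # Act 1206: present the attributes, then receive the user's selection.
    selected = [a for a in choose_attributes(attributes) if a in attributes]
    # Act 1210: prompt for a first-order preference per selected attribute.
    first_order = {a: ask_first_order(a) for a in selected}
    # Act 1212: prompt for a second-order preference among them.
    second_order = ask_second_order(list(first_order))
    return first_order, second_order

# Hypothetical catalog and scripted answers in place of a real user.
catalog = {"car": ["Make", "Color", "Price"]}
first, second = run_process_1200(
    "car",
    catalog,
    choose_attributes=lambda attrs: ["Price", "Color"],
    ask_first_order=lambda a: "lower is better" if a == "Price" else "red over blue",
    ask_second_order=lambda attrs: attrs,  # priority order as listed
)
```

Passing the prompts in as callbacks keeps the control flow testable without a graphical interface; a real system could supply functions that render a display screen or a series of prompts.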
  • the above-described embodiments of the present invention can be implemented in any of numerous ways.
  • the embodiments may be implemented using hardware, software or a combination thereof.
  • the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.
  • a computer may be embodied in any of numerous forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embodied in a device not generally regarded as a computer, but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, a tablet, a reader, or any other suitable portable or fixed electronic device.
  • a computer may have one or more input and output devices. These devices may be used, among other things, to present a user interface. Examples of output devices that may be used to provide a user interface include printers or display screens for visual presentation of output, and speakers or other sound generating devices for audible presentation of output. Examples of input devices that may be used for a user interface include keyboards, microphones, and pointing devices, such as mice, touch pads, and digitizing tablets.
  • Such computers may be interconnected by one or more networks in any suitable form, including networks such as a local area network (LAN) or a wide area network (WAN), such as an enterprise network, an intelligent network (IN) or the Internet.
  • networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, and/or fiber optic networks.
  • FIG. 13 shows, schematically, an illustrative computer 1300 on which various inventive aspects of the present disclosure may be implemented.
  • the computer 1300 includes a processor or processing unit 1301 and a memory 1302 that may include volatile and/or non-volatile memory.
  • the computer 1300 may also include storage 1305 (e.g., one or more disk drives) in addition to the system memory 1302.
  • the memory 1302 and/or storage 1305 may store one or more computer-executable instructions to program the processing unit 1301 to perform any of the functions described herein.
  • the storage 1305 may optionally also store one or more data sets as needed.
  • references herein to a computer can include any device having a programmed processor, including a rack-mounted computer, a desktop computer, a laptop computer, a tablet computer or any of numerous devices that may not generally be regarded as a computer, which include a programmed processor.
  • the exemplary computer 1300 may have one or more input devices and/or output devices, such as devices 1306 and 1307 illustrated in FIG. 13. These devices may be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
  • the computer 1300 may also comprise one or more network interfaces (e.g., the network interface 1310) to enable communication via various networks (e.g., the network 1320).
  • networks include a local area network or a wide area network, such as an enterprise network or the Internet.
  • Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
  • a method for querying a data exploration system managing a plurality of items comprising: querying the data exploration system with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items, and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences; calculating, with a processor, a ranking of an item in the plurality of items based at least in part on a data structure encoding a preference graph that represents the plurality of first-order user preferences and the second-order user preference; and outputting at least a subset of the plurality of items to the user, in accordance with the ranking.
  • calculating the ranking comprises: applying a pagerank-based algorithm to the data structure encoding the preference graph to calculate the ranking.
  • the preference graph comprises a plurality of nodes, wherein each node represents an item, and calculating the ranking comprises: calculating a pagerank score of a node in the plurality of nodes.
  • calculating the ranking comprises: computing a total order of nodes in a complete directed graph derived from the preference graph, wherein each node represents an item.
  • computing the total order comprises calculating a Kendall score for a node in the complete directed graph.
  • the preference graph comprises: a plurality of nodes, wherein each node corresponds to an item in the plurality of items; and a plurality of edges, wherein each edge corresponds to a first-order preference in the plurality of first-order preferences, the first-order preference indicating a user preference for one of the two items represented by nodes terminating the edge.
  • each edge is a directed edge, directed to a node associated with a preferred item as indicated by the corresponding first-order preference.
  • a weight is associated to an edge between a first node and a second node in the preference graph, the weight being indicative of a degree of preference for the first node over the second node.
  • each item in the plurality of items is represented as a tuple, the tuple comprising a plurality of attributes of the item.
  • a computer-readable storage medium article storing a data structure encoding a preference graph and a plurality of processor-executable instructions that when executed by a processor, cause the processor to perform the acts of: receiving a plurality of first-order user preferences indicative of user preferences among a plurality of items; receiving a second-order user preference indicative of user preferences among the first-order preferences in the plurality of first-order user preferences; computing a weight for an edge of the preference graph based on the plurality of first-order user preferences and the second-order user preference, wherein: the edge connects a first node associated with a first item and a second node associated with a second item, and the weight is indicative of a degree of preference for the first item over the second item; and outputting at least two of the plurality of items according to the preference graph.
  • the preference graph comprises a node for each item in the plurality of items and an edge for every pair of nodes associated with items related by a first-order preference in the plurality of first-order preferences.
  • the computing the weight comprises: computing a first number of first-order user preferences in the plurality of first-order user preferences indicating a user's preference for the first item relative to the second item; computing a second number of all first-order user preferences in the plurality of first-order user preferences indicating any preference associated with the first item; and setting the weight based on the first number divided by the second number.
  • receiving the plurality of first-order user preferences comprises receiving a first-order preference from a user.
  • each item in the plurality of data items is represented as a tuple, the tuple comprising values of a plurality of attributes; and each first-order user preference in the plurality of first-order user preferences indicates a user preference of one item over another item based at least in part on a value of an attribute of a first tuple, representing the one item, and a value of an attribute of a second tuple representing the other item.
  • the plurality of first-order user preferences comprises at least two types of first-order preferences selected from the group comprising score-based preferences, partial order preferences, skyline preferences, and conjoint analysis preferences.
  • the second-order user preference comprises a plurality of second-order user preference relations that comprises at least two types of second-order preferences selected from the group comprising prioritized preference composition, partial order preferences, pairwise preferences, and Pareto preference composition.
  • a database system comprising: a memory configured to store a plurality of tuples, a data structure encoding a preference graph to represent user preferences, wherein the user preferences comprise a plurality of first-order preferences representing user preferences among tuples and a second-order user preference representing user preferences among first-order preferences in the plurality of first-order preferences; and a processor configured to access contents of the memory and compute a ranking of a tuple in the plurality of tuples based on the data structure encoding the preference graph.
  • a system for interactive preference management comprising: a memory configured to store a plurality of tuples, each tuple comprising a value for at least one of a plurality of attributes; at least one processor configured to receive a range of values for an attribute in the plurality of attributes from a user, output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.
  • a computer-implemented method for interactive preference management comprising: receiving, with a processor, a query from a user, the query comprising a keyword; prompting the user to provide a plurality of first-order preferences associated with one or more attributes related to the keyword; and in response to receiving the plurality of first-order preferences, prompting the user to provide a second-order preference among the first-order preferences in the plurality of first-order preferences.
  • prompting the user to provide a plurality of first-order preferences comprises: presenting a list of attributes related to the keyword to the user; receiving a selection of attributes in the list of attributes from the user; and prompting the user to specify a first-order preference associated with the selected attribute.
  • a computer memory, one or more floppy discs, computer discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, etc.
  • article(s) encoded with one or more programs that, when executed on one or more computers or other processors, implement the various process embodiments of the present invention.
  • the non-transitory computer-readable medium or media may be transportable, such that the programs stored thereon may be loaded onto any suitable computer resource to implement various aspects of embodiments as discussed above.
  • The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of one or more embodiments need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of one or more embodiments. Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, items, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • data structures may be stored in non-transitory computer-readable storage media articles in any suitable form.
  • data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields.
  • any suitable mechanism may be used to establish relationships among information in fields of the data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.
  • inventive concepts may be embodied as one or more methods, of which an example has been provided.
  • the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments, or vice versa.
  • the phrase "at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
  • This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
  • "at least one of A and B" can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
  • a reference to "A and/or B", when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

There is disclosed a method and apparatus for querying a database storing a plurality of items. In an embodiment, the method comprises querying the database with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences. In another embodiment, the method further comprises calculating a ranking of the plurality of items based at least in part on a data structure encoding a preference graph representing the plurality of first-order user preferences and the second-order user preference and outputting at least a subset of the plurality of items to the user, in accordance with the ranking.

Description

METHOD AND APPARATUS FOR
PREFERENCE GUIDED DATA EXPLORATION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 61/498,899, filed on June 20, 2011, titled "Method and Apparatus for Preference Guided Data Exploration".
FIELD OF INVENTION
The present disclosure relates generally to methods and systems for querying a database storing a plurality of items.
BACKGROUND
Data exploration systems, such as search engines and database management systems, manage enormous volumes of information. As a result, locating information of interest to a user in response to a search query (e.g., in the form of a set of keywords) presents challenges.
Conventional approaches to search often shift the burden of finding the information of interest to the user. For example, all potentially-relevant results may be presented to the user in response to a search query. Subsequently, the user has to manually explore and/or rank these results in order to find the information of greatest interest. When the number of potentially-relevant results is large, which is often the case, the user may be overwhelmed and may fail to locate the information for which he is looking.
One conventional technique for addressing this problem is to integrate a user's preferences into the search process. By presenting search results in accordance with the user's preferences, the user may be helped to find the information he seeks. However, conventional approaches to specifying user preferences severely limit the ways in which user preferences may be specified.
Consider, for example, a data exploration model adopted by many search services and illustrated in FIG. 1. Query interface 12 is used to collect query predicates in the form of keywords and/or attribute values (e.g., "used Toyota" with price in [$2000-$5000]). Query results are then sorted (14) on the values of one or more attributes (e.g., order by Price then by Rating) in a major sort/minor sort fashion. The user then scans (16) through the sorted query answers to locate items of interest, refines query predicates, and repeats the exploration cycle (18). This "Query, Sort, then Scan" model limits the flexibility of preference specification and imposes rigid data exploration schemes as highlighted in the following example.
Example 1
Amy is searching online catalogs for a camera to buy. Amy is looking for a reasonably-priced camera, whose color is preferably silver and less preferably black or gray, and whose reviews contain the keywords "High Quality." Amy is a money saver, so her primary concern is satisfying her Price preferences followed by her Color and Reviews preferences.
The data exploration model of FIG. 1 allows Amy to sort results in ascending price order. Amy then needs to scan through the results comparing colors and inspecting reviews to find the camera that she wants. The path followed by Amy to explore search results is mainly dictated by her price preference, while other preferences are incorporated in the exploration task through Amy's effort, which can limit the possibility of finding items that closely match her requirements.
Conventional approaches to specifying user preferences suffer from a number of other drawbacks in addition to not simultaneously supporting different types of preferences. For example, preference specifications may be inconsistent with one another. A typical example is having cycles among first-order preferences (preferences among attributes of items, such as preferring one car to another car based on price or on brand), which implies non-transitivity of preferences. For instance, a user may indicate that a Honda is preferred to a Toyota, a Toyota is preferred to a Nissan, and a Nissan is preferred to a Honda. Even when first-order preferences are consistent, second-order preferences (preferences among the first-order preferences, such as brand preferences being more important than price preferences) can result in further problems. For example, prioritized composition of a set of partial orders does not generally maintain the transitivity property in the resulting order. Conventional systems for data exploration are unable to rank search results when preference specifications may be inconsistent.
BRIEF DESCRIPTION OF DRAWINGS
In the drawings,
FIG. 1 is a diagram of a "query, sort, then scan" data exploration model, in accordance with prior art;
FIG. 2 is a diagram illustrating a relation, in accordance with some embodiments;
FIG. 3 is a flowchart of an illustrative preference modeling process, in accordance with some embodiments;
FIG. 4 is a diagram illustrating scopes obtained from a relation, in accordance with some embodiments;
FIG. 5 is a diagram illustrating scope comparators, in accordance with some embodiments;
FIG. 6 is a diagram illustrating conjoint preferences, in accordance with some embodiments;
FIG. 7 is a diagram of an illustrative mapping of a partial order to linear extensions, in accordance with some embodiments;
FIG. 8 is a diagram of an illustrative preference graph, in accordance with some embodiments;
FIG. 9 is a diagram of an illustrative computation of edge weights for different types of second-order preferences, in accordance with some embodiments;
FIG. 10 is a diagram of an illustrative page-rank based matrix for prioritized comparators, in accordance with some embodiments;
FIG. 11 is a diagram of an illustrative weighted preference graph and tournaments derived from it, in accordance with some embodiments;
FIG. 12 is a flowchart for an illustrative process for interactively specifying preferences, in accordance with some embodiments; and
FIG. 13 is an illustrative computer system on which some embodiments of the present disclosure may be implemented.
DETAILED DESCRIPTION
Inadequate incorporation of preferences in conventional data exploration systems is due at least partly to the inability of these systems to integrate different types of preferences. For instance, in the above-described example, preferences include an ordering on all prices (a "total order" preference), an ordering between some colors (a "partial order" preference), a Boolean predicate based on the presence of the words "High Quality" in the reviews, and an indication that price is more important than the other preferences.
Another situation in which it may be useful to specify different types of preferences may be a situation in which a user may have precise preferences for information in one domain because the user may possess a large amount of knowledge about the domain. Such precise preferences may be specified, for example, in the form of one or more scoring functions. However, the same user may have imprecise preferences for information in another domain because the user may not possess a large amount of knowledge about the other domain. In this case, preferences may be specified, for example, in the form of one or more partial orders on attribute values. There are many instances in which the user may need to specify both types of preferences (i.e., using a scoring function and using a partial order) as shown in Example 2 below.
Example 2
Alice is searching for a car to buy. Alice has specific preferences regarding sport cars, and more relaxed preferences regarding SUVs. Alice supplies the data exploration system with a scoring function to rank sport cars, and a set of partial orders encoding her SUV preferences. Alice expects reported results to be ranked according to her preferences.
A data exploration system capable of integrating different preference types and ranking search results in response to a user query, in accordance with user-specified preferences, may address some of the above-discussed drawbacks of conventional approaches. However, not every embodiment addresses every one of these drawbacks, and some embodiments may not address any of them. As such, it should be appreciated that embodiments of the invention are not limited to addressing all or any of the above-discussed drawbacks of these conventional approaches.
Accordingly, in some embodiments, a preference language is provided for specifying different types of user preferences among items. A data exploration system may assist a user to specify preferences using the preference language. The specified preferences may be used to construct a general preference model that, in turn, may be used to produce a ranking of items in accordance with any user preferences.
Items may be any suitable items about which a user may express preferences. In some instances, an item may be any item that may be manufactured, sold and/or purchased. For example, an item may be a car or an airplane ticket: a user (e.g., a consumer) may have preferences for one car over another car and/or may prefer one airplane ticket to another airplane ticket. In some instances, an item may comprise information. Users may prefer one item over another item based at least in part on the information that these items contain. For example, when searching for content (e.g., movie, music, images, webpages, text, sound, etc.) a user may prefer some content to other content. For instance, a user may prefer to see a webpage that contains information related to cars over a webpage that contains information related to bicycles.
An item may comprise, or have associated with it, one or more attributes. An attribute of an item may be related to the item and may be a characteristic of the item. An attribute of an item may be a characteristic descriptive of the item. For instance, if an item is an item that may be purchased, an attribute of the item may be a price related to the item. An attribute of an item may be a characteristic that may identify the item. For example, a characteristic of an item may be an identifier (e.g., name, serial number, or model number) of the item.
Attributes may be numerical attributes and may be categorical attributes. Numerical attributes may comprise one or more values. For instance a numerical attribute may comprise a single number (e.g., 5) or a range of numbers (e.g., 1-1000). Categorical attributes may also comprise one or more values. For instance, a categorical value for the category "Color" may comprise a single color (e.g., Red) or a set of colors (e.g., {"Red", "Green"}). Though, it should be recognized that attribute values are not limited to being numbers and/or categories and may be any of numerous other types of values. For instance, values may comprise alphabetic and alphanumeric strings.
An item may be represented by one or more tuples comprising values for one or more attributes associated with the item. In some cases, a tuple representing an item may comprise a value for each attribute associated with the item. In other cases, a tuple representing an item may comprise a value for only a portion of the attributes associated with the item.
FIG. 2 shows an illustrative example of a set of items, each item being represented by a tuple comprising values for the attributes of the items. In the illustrative example of FIG. 2, each item is a car and is associated with six attributes: "ID," "Make," "Model," "Color," "Price," and "Deposit." Though in this example all items share the same attributes, this is not a limitation of the present invention as different items may have different attributes from one another and some attributes may have unknown values. Each item is represented by a tuple (i.e., a set) of attribute values. Accordingly, the first item has characteristics indicated by the first set of attribute values. For instance, the first item is represented by the tuple in the first row of the table shown in FIG. 2. As illustrated, this first item is an $1800 Red Honda Civic identified by identifier "t1". A deposit of $500 may be required to purchase this car.
A user may express preferences for one item over another item in a set of items. User preferences may be of any suitable type and may be first-order user preferences, second-order user preferences, and even further-order preferences.
First-order preferences are preferences associated with attributes of items. First-order preferences may be based on values of attributes of items. For example, a first-order preference may express a preference for an item over another item based on values of one or more attributes of the two items. For instance, a first-order preference may indicate that an item with a lower price (value of the attribute "price") is preferred to an item with a higher price. As another example, a first-order preference may indicate that a red (value of the attribute "color") item (e.g., car) is preferred to a blue item.
Second-order preferences are preferences across first-order preferences. Second-order preferences may indicate which first-order preferences are more important to a user. For example, first-order preference A may be based on values of one attribute (e.g., "price") while first-order preference B may be based on values of another attribute (e.g., "color"). A second-order preference may indicate that first-order preference B is preferred to first-order preference A (i.e., color may be more important than price).
There may be many different types of first-order and second-order preferences and these types of preferences, along with other aspects of first-order and second-order preferences are discussed in greater detail below in Sections II and III, respectively.
The data exploration system may be any system for exploring data, information or knowledge. The data exploration system may allow one or more users to query the system. For instance, a data exploration system may be a search engine such as an Internet search engine or a domain-specific search engine (e.g., a search engine created to search a particular information domain such as a company's or institution's intranet, or a specific subject-matter information repository). In another example, a data exploration system may be a database system that may allow user queries.
A query input by a user into a data exploration system may be any of numerous types of queries. For instance, a query may comprise one or more keywords indicating what the user is seeking. In some cases, a query may comprise user preferences. Though, it should be appreciated that user preferences may be specified separately and/or independently from any user query. For instance, a user may specify preferences that may apply to multiple user queries. The specified preferences may comprise preferences of any suitable type such as first-order and/or second-order user preferences.
Regardless of the types of preferences that a user may wish to specify, a data exploration system may assist a user to specify preferences. A data exploration system may assist a user to specify preferences using the preference language, for example. Some example approaches to how a data exploration system may assist a user to specify preferences are described in greater detail in Sections I and VI, below.
After user-specified preferences are obtained (e.g., from a user-specified query or any other suitable source), a preference model may be constructed from these preferences. The preference model may be constructed from different types of preferences and may be constructed from first-order preferences of different types and/or from second-order preferences of different types.
A preference model may be represented by a data structure encoding the preference model. The data structure may comprise any data necessary for representing the preference model and, for example, may comprise any parameters associated with the preference model.
A data structure encoding a preference model may be stored on any tangible computer-readable storage medium. The computer-readable storage medium may be any suitable computer-readable storage medium and may be accessed by any physical computing device that may use the preference model encoded by the data structure.
In some embodiments, the preference model may be a graph-based preference model and the data structure encoding the preference model may encode a graph, termed a preference graph, characterizing the graph-based preference model. The preference graph may comprise a set of nodes (vertices) and a set of edges connecting nodes in the set of nodes. The edges may be directed edges or may be undirected edges. Accordingly, the data structure encoding the preference graph may encode the preference graph by encoding the graph's vertices and edges. Any of numerous data structures for encoding graphs, as are known in the art, may be used to encode the preference graph, as the invention is not limited in this respect.
In some embodiments, nodes of the graph may be associated with items. For instance, a node in the graph may be associated with a tuple that, in turn, represents an item. The graph may represent items that are related to one or more keywords in a query. For instance, a set of items may be selected in response to a user-provided query. A first-order preference for one item over another item may be represented as an edge in the graph, with the edge connecting nodes associated with the tuples associated with the two items. A weight may be associated with each edge in the graph to provide an indication of a degree of preference for one of the nodes terminating the edge. The weight may be computed based on first-order and/or second-order preferences. Aspects of a graph-based preference model, including how such a preference model may be constructed from user-specified preferences, are described in greater detail in Section IV, below.
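As a rough illustration of one possible encoding, the sketch below stores a preference graph as an adjacency map keyed by tuple identifier, with one weighted directed edge per pairwise preference. The class name `PreferenceGraph`, its methods, and the sample weights are assumptions made for illustration only; the disclosure does not prescribe a particular data structure.

```python
from collections import defaultdict


class PreferenceGraph:
    """Directed, weighted graph over tuple identifiers (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        # edges[u][v] holds the weight of the preference for u over v.
        self.edges = defaultdict(dict)

    def add_preference(self, preferred, other, weight=1.0):
        """Record that `preferred` is preferred to `other` with a given weight."""
        self.nodes.update((preferred, other))
        self.edges[preferred][other] = weight

    def weight(self, preferred, other):
        """Return the edge weight, or None when no such preference is recorded."""
        return self.edges.get(preferred, {}).get(other)


graph = PreferenceGraph()
graph.add_preference("t1", "t2", weight=0.7)  # hypothetical weight
graph.add_preference("t2", "t3", weight=0.4)  # hypothetical weight
```

An adjacency map is only one option; an edge list or a matrix would serve equally well for small item sets.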
The preference model may be used to obtain a ranking of items in a set of items. For instance, a graph-based preference model may be used to construct such a ranking. A graph-based preference model may be used to construct such a ranking in any of numerous ways. For instance, a complete directed graph may be obtained from the graph-based preference model and a ranking of items may be obtained based on the complete directed graph. As another example, a Markov-chain based algorithm may be applied to the graph-based preference model to obtain a ranking of items. These and other approaches to obtaining a ranking of items in a set of items are described in greater detail in Section V, below.
It should be appreciated that though a preference graph may be a convenient abstraction, which is helpful for reasoning about user preferences, in practice, a preference graph may be implemented on a physical system via a data structure that may encode the preference graph.
Similarly, many constructs described below (e.g., relations, scopes, scope comparators, etc.) are convenient abstractions used in various fields such as computer science, but each construct may be realized, in practice, by a data structure representing data characterizing the construct and/or processor-executable instructions for carrying out functions associated with the construct.
Such data structures and processor-executable instructions may be encoded on any suitable tangible computer-readable storage medium.
Accordingly, for ease of reading, every reference to a construct (e.g., a graph, a relation, scope, scope comparator, etc.) is a reference to a data structure encoding the construct and/or processor-executable instructions that when executed by a processor perform functions associated with the construct, since explicitly referring to such data structures and processor-executable instructions for every reference to a construct is tedious.
It should also be appreciated that the above-described embodiments can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.
Software modules comprising program instructions may be provided to perform any of numerous tasks in accordance with some embodiments. For example, one or multiple software modules for constructing a preference model may be provided. As another example, software modules for obtaining a ranking for a set of items based on (a data structure representing) the preference model may be provided. As another example, software modules comprising instructions for implementing any of numerous functions associated with a data exploration system may be provided. Though, it should be recognized that the above examples are not limiting and software modules may be provided to perform any functions in addition to or instead of the above examples.
I. Design Goals
In some embodiments, a data exploration system that utilizes user preferences may reflect some or all of the following design goals:
• Guidance: The system may assist users to formulate their preferences. The system may support interactive preference management. For instance, the system may provide users with information to help users specify and/or modify preferences. As a specific example, the system may provide users with information about how to modify their preferences to widen or narrow the scope of their search. As another specific example, the system may provide users with information about how to modify their preferences such that the ranking of items presented to a user is modified. Though, these are only examples and the system may aid the user to formulate their preferences in any of numerous ways as described in greater detail below, in Section VI.
• Flexibility: Specification of different types of preferences may be supported for arbitrary subsets of items, sometimes referred to as "contexts." The system may accept natural descriptions of preferences and map these descriptions into preference constructs.
• Provenance: The system may be able to provide justification of how search results are generated and ranked by relating generated results to input preferences.
FIG. 3 illustrates a flowchart for an example process of modeling preferences that reflects the above-mentioned design goals. As illustrated in FIG. 3, the data exploration system may be a system that may receive a query from one or more users. For instance, the system may be a database system or a search engine and the query may comprise one or more keywords.
Toward the guidance goal, the system may assist a user to specify preferences. In some embodiments, such support may be based on pre-computed summaries in the form of facets that may be used for guiding data exploration. Each facet may be associated with a number that may provide the user with an estimate of the expected number of results. Accordingly, facets may allow a user to get a quick and dirty view of the underlying set of items and/or domain, and how search results may be affected by tuning preferences.
For example, the system may comprise a memory configured to store a plurality of tuples (recall that each tuple comprises one or more values for one or more attributes) and may receive a range of desired values for an attribute from a user. In response, the system may output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values. As a specific example, for a categorical attribute, a facet may comprise a possible attribute value (e.g., 'Color = Red'); while for a numerical attribute, a facet may comprise a range of possible values (e.g., 'Price in [$1000-$5000]'). Moreover, the user may be able to define custom facets as Boolean conditions over multiple attributes (e.g., 'Color=Red AND price < $5000'). The system may associate a number to each of these facets, the number indicating an expected number of tuples consistent with these facets.
Toward the flexibility goal, the system may adopt the concept of contextualized preferences, where a user can assign different preference specifications to different subsets (contexts) of items. A user may define a context by using predetermined facets or by defining custom facets. As discussed below in Sections II and III, the user has the flexibility of expressing first-order and second-order preferences within and across contexts. Contextualized preferences may also be part of a user's profile, which may be ascertained by any of the techniques disclosed herein as well as those disclosed in U.S. Non-Provisional Application Serial No. 12/555,293, filed September 08, 2009, and titled Synthesizing Messaging Using Context Provided By Consumers, which is hereby incorporated by reference. This way, they may be loaded, saved, and/or refined upon the user's request.
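The facet counts described above can be sketched as simple predicate counts over the stored tuples. In the sketch below, only tuple t1 reflects a row described in the text; the remaining rows, the helper name `facet_count`, and the variable names are assumptions made for illustration. A real system would likely pre-compute or estimate these counts rather than scan the relation.

```python
# Only t1 reflects a row described in the text; the other rows are made up.
cars = [
    {"ID": "t1", "Make": "Honda", "Color": "Red", "Price": 1800},
    {"ID": "t2", "Make": "Toyota", "Color": "Blue", "Price": 4500},
    {"ID": "t3", "Make": "Nissan", "Color": "Red", "Price": 6000},
    {"ID": "t4", "Make": "Honda", "Color": "Black", "Price": 900},
]


def facet_count(tuples, predicate):
    """Return the number of tuples that satisfy a facet's Boolean condition."""
    return sum(1 for t in tuples if predicate(t))


# Categorical facet: 'Color = Red'
red = facet_count(cars, lambda t: t["Color"] == "Red")
# Numerical facet: 'Price in [$1000-$5000]'
mid_price = facet_count(cars, lambda t: 1000 <= t["Price"] <= 5000)
# Custom facet: 'Color=Red AND price < $5000'
red_and_cheap = facet_count(
    cars, lambda t: t["Color"] == "Red" and t["Price"] < 5000
)
```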
Toward the provenance goal, the data exploration system illustrated in FIG. 3 may maintain information regarding which preferences, among the input preferences, affect the relative order of each pair of items in the final results ranking. This feature may be useful for the analysis and refinement of preferences in different scenarios. Examples include finding preference constructs that have a dominating effect on the ranking of results, decreasing/increasing the influence of some preference constructs, and understanding the effect of removing a certain preference construct. Additional ways in which a data exploration system may assist a user to input preferences are discussed below in Section VI.
II. First-Order Preferences
In some embodiments, the preference language may be based on capturing pairwise preferences on different granularity levels. An item's description may follow a relational model, where each item may be represented as a tuple. Preferences may be cast against a relation R with a known schema.
Our first construct is used to define a context for expressing first-order preferences.
Definition 1 [Scope]: A scope \(R_i\) is an arbitrary non-empty subset of tuples in R.
A scope defines a Boolean membership property that restricts the space of all possible tuples to a subset of tuples that are interesting for building preference relations. Such a membership property may be defined using a SQL query posed against R. For example, FIG. 4 shows six different scopes \(R_1, \ldots, R_6\) in the relation "Car" illustrated in FIG. 2, where scopes are defined using SQL queries. Though, it should be recognized that such a membership property may be defined in any of numerous other ways. As one example, a database query language other than SQL may be used to define such a membership property. As another example, the membership property may be defined using a set of variables and a database language may not be needed.
As shown in the illustrative diagram of FIG. 4, scopes may intersect. Thus, a tuple in the relation R may belong to zero, one or two or more scopes. Tuples that do not belong to any scopes may be non-interesting with respect to a preference specification. Thus, for clarity, all subsequent discussion is with respect to tuples that belong to at least one scope.
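A scope can be sketched as a Boolean membership predicate over tuples, without any database language. The relation rows (other than the make and price of t1), the scope names `hondas` and `under_2000`, and both helper functions are hypothetical; they do not correspond to the scopes of FIG. 4.

```python
cars = [
    {"ID": "t1", "Make": "Honda", "Price": 1800},
    {"ID": "t2", "Make": "Toyota", "Price": 4500},  # made-up row
    {"ID": "t3", "Make": "Honda", "Price": 2500},   # made-up row
]

# Each scope is a membership predicate; these two are illustrative only.
scopes = {
    "hondas": lambda t: t["Make"] == "Honda",
    "under_2000": lambda t: t["Price"] < 2000,
}


def scope_members(relation, predicate):
    """Return the tuples in a scope; a scope must be non-empty by definition."""
    members = [t for t in relation if predicate(t)]
    if not members:
        raise ValueError("a scope must be a non-empty subset of the relation")
    return members


def scopes_containing(t, scope_predicates):
    """Return the names of all scopes a tuple belongs to (possibly none)."""
    return sorted(name for name, pred in scope_predicates.items() if pred(t))
```

Here t1 falls in both scopes, which mirrors the observation that scopes may intersect, while t2 falls in none and would be ignored by the preference specification.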
Definition 2 [Scope Comparator]: Let \(R_i\) and \(R_j\) be two scopes in R. The scope comparator \(f_{i,j}\) is a function that takes a pair of distinct tuples (one from \(R_i\) and the other from \(R_j\)), and returns a first value such as 1 (e.g., if the tuple from \(R_i\) is preferred), a second value such as -1 (e.g., if the tuple from \(R_j\) is preferred), or a null value "\(\perp\)" (e.g., if there is no preference).
A scope comparator is a preference language construct for defining first-order preferences. In some instances, the scope comparator may be user-defined. Though, in other instances, a scope comparator may be defined, automatically, by a computer. Still, in other embodiments a scope comparator may be defined by a combination of manual and automatic techniques.
A generic interface to a scope comparator may accept two tuples and return either a preference of one tuple over the other or an indication that no preference can be made. Whenever a tuple \(t_i\) is preferred to a tuple \(t_j\), we say that \(t_i\) dominates \(t_j\), denoted as \(t_i \succ t_j\).
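The generic comparator interface can be sketched as a function of two tuples that returns 1, -1, or a null value. In the sketch below, Python's `None` stands in for the null value, and the factory `make_scope_comparator`, its argument-orientation behavior, and the sample comparators are assumptions made for illustration rather than the comparators of FIG. 5.

```python
from typing import Callable, Optional

Row = dict
Logic = Callable[[Row, Row], Optional[int]]


def make_scope_comparator(in_scope_i: Callable[[Row], bool],
                          in_scope_j: Callable[[Row], bool],
                          logic: Logic) -> Logic:
    """Build f_ij; the result is relative to the first argument passed in."""
    def f_ij(t_a: Row, t_b: Row) -> Optional[int]:
        if t_a["ID"] == t_b["ID"]:
            return None  # defined only on pairs of distinct tuples
        if in_scope_i(t_a) and in_scope_j(t_b):
            return logic(t_a, t_b)
        if in_scope_i(t_b) and in_scope_j(t_a):
            result = logic(t_b, t_a)  # arguments arrived in the other order
            return None if result is None else -result
        return None  # the pair is outside the two scopes
    return f_ij


is_honda = lambda t: t["Make"] == "Honda"
is_toyota = lambda t: t["Make"] == "Toyota"

# Unconditional: any Honda is preferred to any Toyota.
prefer_honda = make_scope_comparator(is_honda, is_toyota, lambda a, b: 1)
# Conditional: a Honda is preferred to a Toyota only when it is cheaper.
cheaper_honda = make_scope_comparator(
    is_honda, is_toyota, lambda a, b: 1 if a["Price"] < b["Price"] else None
)

honda = {"ID": "t1", "Make": "Honda", "Price": 1800}
toyota = {"ID": "t2", "Make": "Toyota", "Price": 1500}  # made-up row
```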
FIG. 5 illustrates five different scope comparators defined on the scopes shown in FIG. 4. In FIG. 5, the scope comparators \(f_{1,4}\) and fu are unconditional (i.e., they produce first-order preferences without testing any conditions beyond the conditions captured by scope definition). On the other hand, the scope comparators fu, \(f_{5,6}\), and \(f_{6,2}\) are conditional (i.e., they produce preference relations conditioned on some logic).
Algorithm 1 Score-based Preferences
SCORE-PREFS (\(t_i\): tuple, \(t_j\): tuple, S: scoring function)
Figure imgf000014_0001
5 else return \(\perp\)
Conditional scope comparators allow defining composite preferences that span multiple attributes given in scope definition and/or comparator logic (e.g., \(f_{6,2}\) defines a composite preference on Price and Make attributes).
The generality of scope definitions and preference comparators allows encoding different types of preferences, with different semantics. In the following we give templates for encoding different types of preferences using the above-described language constructs.
Template 1 [Score-based Preferences]. Preferences are defined using a scoring function S, where tuples achieving better scores are preferred. Without loss of generality and without limitation, assume that higher scores are better; then score-based preferences can be specified using the template given by Algorithm 1.
A total order on a scope \(R_i\) (which can be the whole relation R) may be encoded by defining a comparator \(f_{i,i}\), using the template in Algorithm 1, where \(f_{i,i}\) operates on pairs of distinct tuples belonging to \(R_i\).
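A minimal sketch of a score-based comparator follows, assuming higher scores are better as stated in Template 1 and using `None` for the null value. The function name `score_prefs`, the price-based scoring function, and the rows other than t1's price are illustrative assumptions; the body is inferred from the template description rather than copied from Algorithm 1.

```python
def score_prefs(t_i, t_j, score):
    """Return 1 if t_i scores higher, -1 if t_j scores higher, else None."""
    s_i, s_j = score(t_i), score(t_j)
    if s_i > s_j:
        return 1
    if s_j > s_i:
        return -1
    return None  # equal scores: no preference


# Hypothetical scoring function: cheaper cars score higher.
def cheaper_is_better(t):
    return -t["Price"]


t1 = {"ID": "t1", "Price": 1800}
t2 = {"ID": "t2", "Price": 2500}  # made-up row
t3 = {"ID": "t3", "Price": 1800}  # made-up row
```

Applied to every pair of distinct tuples in one scope, such a comparator induces an ordering of that scope by score, with ties left unordered.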
Template 2 [Partial Order Preferences]. For an attribute x, let \(P_x\) be a partial order defined on the domain of x. The partial order can be expressed as a set \(P_x = \{(v_i > v_j)\}\) for values \(v_i\) and \(v_j\) in the domain of x, such that \(P_x\) is:
• irreflexive (i.e., \((v_i > v_i) \notin P_x\)).
• asymmetric (i.e., \((v_i > v_j) \in P_x \Rightarrow (v_j > v_i) \notin P_x\)).
• transitive (i.e., \(\{(v_i > v_j), (v_j > v_k)\} \subseteq P_x \Rightarrow (v_i > v_k) \in P_x\)).
Partial order-based preferences may be encoded using the template given by Algorithm 2.
Template 3 [Skyline Preferences]. Given a set of attributes A, a tuple \(t_i\) is preferred to a tuple \(t_j\) if there exists a non-empty subset \(A' \subseteq A\) where, \(\forall x \in A'\), \(t_i.x\) is preferred to \(t_j.x\), while for any other attribute \(x' \in A \setminus A'\) no preference can be made between \(t_i.x'\) and \(t_j.x'\).
Skyline preferences may be encoded as shown in the template given by Algorithm 3.
Algorithm 2 Partial Order Preferences
PARTIAL ORDER-PREFS (\(t_i\): tuple, \(t_j\): tuple, \(P_x\): partial order on attribute x)
1 if \(((t_i.x > t_j.x) \in P_x)\)
2 then return 1
3 else if \(((t_j.x > t_i.x) \in P_x)\)
4 then return -1
5 else return \(\perp\)
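Algorithm 2 can be sketched by storing the partial order as a set of (preferred, less preferred) value pairs. The function names, the use of `None` for the null value, and the validity check are assumptions made for illustration; the color order follows Example 1 (silver preferred to black or gray) and the cyclic brand order follows the Honda, Toyota, Nissan example from the background.

```python
def is_strict_partial_order(pairs):
    """Check that a set of (a, b) pairs is irreflexive, asymmetric, transitive."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all(
        (a, d) in pairs for a, b in pairs for c, d in pairs if b == c
    )
    return irreflexive and asymmetric and transitive


def partial_order_prefs(t_i, t_j, attribute, order):
    """Return 1, -1, or None according to a partial order on one attribute."""
    v_i, v_j = t_i[attribute], t_j[attribute]
    if (v_i, v_j) in order:
        return 1
    if (v_j, v_i) in order:
        return -1
    return None  # the two values are incomparable


color_order = {("Silver", "Black"), ("Silver", "Gray")}
cyclic_brands = {("Honda", "Toyota"), ("Toyota", "Nissan"), ("Nissan", "Honda")}

silver = {"ID": "c1", "Color": "Silver"}
black = {"ID": "c2", "Color": "Black"}
gray = {"ID": "c3", "Color": "Gray"}
```

The cyclic brand preferences fail the check because transitivity would require Honda to be preferred to Nissan, which the cycle contradicts.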
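Algorithm 3 Skyline Preferences
SKYLINE-PREFS (\(t_i\): tuple, \(t_j\): tuple, A: subset of attributes)
1 \(p_i \leftarrow 0\)
2 \(p_j \leftarrow 0\)
3 for all \(x \in A\)
4 do
5 if (\(t_i.x\) is preferred to \(t_j.x\))
6 then \(p_i \leftarrow p_i + 1\)
7 else if (\(t_j.x\) is preferred to \(t_i.x\))
8 then \(p_j \leftarrow p_j + 1\)
Figure imgf000016_0001
11 if (\(p_i > 0\))
12 then return 1
13 else if (\(p_j > 0\))
14 then return -1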
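The skyline template can be sketched as follows, with the per-attribute preference tests supplied as functions. Part of Algorithm 3 is only available as an image, so the handling of the case where each tuple wins on some attribute (returning no preference) is an assumption derived from the wording of Template 3; the function name, the `None` null value, and the rows other than t1 are likewise illustrative.

```python
def skyline_prefs(t_i, t_j, attribute_prefs):
    """attribute_prefs maps attribute -> function(a, b) that is True if a beats b."""
    p_i = p_j = 0
    for x, prefers in attribute_prefs.items():
        if prefers(t_i[x], t_j[x]):
            p_i += 1
        elif prefers(t_j[x], t_i[x]):
            p_j += 1
    if p_i > 0 and p_j > 0:
        return None  # assumed: conflicting attributes give no preference
    if p_i > 0:
        return 1
    if p_j > 0:
        return -1
    return None  # no attribute distinguishes the two tuples


lower_is_better = lambda a, b: a < b
prefs = {"Price": lower_is_better, "Deposit": lower_is_better}

t1 = {"ID": "t1", "Price": 1800, "Deposit": 500}
t2 = {"ID": "t2", "Price": 2500, "Deposit": 500}  # made-up row
t3 = {"ID": "t3", "Price": 1500, "Deposit": 800}  # made-up row
```

Here t1 beats t2 on price and ties on deposit, so t1 is preferred, while t1 and t3 each win on one attribute and remain incomparable.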
Template 4 [Conjoint Analysis Preferences]. Given a set of attributes A, conjoint analysis encodes preferences among attribute values in A when taken conjointly. This can be expressed as a function CA that maps each combination of values in A to a unique rank. The function CA is partial on the domains of all possible combinations of values in A. Hence, there can be combinations of values in A that are not mapped to a rank under CA. Conjoint analysis preferences based on CA may be expressed using the template given by Algorithm 4.
The next example illustrates specifying and managing conjoint analysis preferences.
Example 3:
Alice's preferences regarding cars may be expressed conjointly over the attribute pairs (Make, Color), and (Make, Price), as shown in FIG. 6. The value in each cell is the rank assigned to each combination of attribute values. Conjoint analysis may be based on an additive utility model in which ranks, assigned to combinations of attribute values, may be used to derive a utility (part-worth) of each attribute value. The objective is that the utility summation of attribute values reconstructs the given ranking. In FIG. 6, for example, 'Honda' is assigned utility value 40, while 'Red' is assigned utility value 50. Hence, the score of 'Honda, Red' is 90, which matches the assigned rank 1 in the given Make-Color preferences. Utility values may be computed using regression. For instance, they may be computed using linear regression. Note the mapping between combinations of attribute values and ranks is modeled.
III. Second-Order Preferences
Our main language construct for defining second-order preferences is a preferences order (POrder), defined as follows:
Definition 3 [POrder] : given a set of scope comparators F, a POrder is a permutation of comparators in F.
A POrder represents an ordering of scope comparators based on their relative importance. A POrder may quantify the strength of different first-order preferences based on the semantics of second-order preferences, as discussed in greater detail below in Section IV.
Algorithm 4 Conjoint Analysis Preferences
CONJOINT-ANALYSIS-PREFS (t_i: tuple, t_j: tuple, A: subset of attributes, CA: conjoint analysis map)
  if (CA({t_i.x : x ∈ A}) is undefined OR CA({t_j.x : x ∈ A}) is undefined)
    then return ⊥
  else if (CA({t_i.x : x ∈ A}) < CA({t_j.x : x ∈ A}))
    then return 1
  else return -1
Definition 4 [POrder Projection]: Let $\Lambda$ be a POrder defined on the set of comparators $F$. For $F' \subseteq F$ we denote with $\Pi_{F'}(\Lambda)$ a total order of comparators in $F'$ ordered according to $\Lambda$. It follows that $\Pi_{F}(\Lambda) = \Lambda$.
For example, for a POrder $\Lambda$ over a set of comparators that includes $f_1$ and $f_3$, and the subset of comparators $F' = \{f_1, f_3\}$, the projection $\Pi_{F'}(\Lambda)$ lists $f_1$ and $f_3$ in the same relative order as $\Lambda$.
Given a POrder projection $\Lambda'$, we say that $(t_i \succ t_j)$ under $\Lambda'$ if for a scope comparator $f_a \in \Lambda'$ we have $f_a(t_i, t_j) = 1$, and there is no other scope comparator $f_b \in \Lambda'$ where $f_b \succ f_a$ according to $\Lambda'$ and $f_b(t_i, t_j) = -1$.
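The projection and the induced preference can be sketched as below. The sketch assumes a POrder is represented as a list of comparator names with the most important comparator first, and that the outcome of each comparator on a fixed pair $(t_i, t_j)$ is already available in a dictionary. The names `project`, `preferred_under`, and the comparator labels are hypothetical.

```python
def project(porder, subset):
    """Projection of a POrder (list, most important first) onto a subset of comparators."""
    return [f for f in porder if f in subset]


def preferred_under(projection, results):
    """True when t_i is preferred to t_j under the given POrder projection.

    results maps a comparator name to 1 (t_i preferred), -1 (t_j preferred),
    or None (no preference) for one fixed pair (t_i, t_j).
    """
    for f in projection:  # scan from most to least important
        outcome = results.get(f)
        if outcome == 1:
            return True   # no more important comparator returned -1
        if outcome == -1:
            return False  # any later 1 is overridden by this comparator
    return False
```

Scanning in priority order is sufficient: the first comparator that states a preference either establishes $t_i \succ t_j$ or blocks every less important comparator from doing so.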
Different types of second-order preferences may be encoded using POrders.
• Prioritized Preference Composition. In this case, second-order preferences are defined as a total order of comparators $O = \langle f_1 \succ f_2 \succ \cdots \succ f_m \rangle$, which expresses the requirement that the first-order preferences corresponding to $f_i$ are more important than the first-order preferences corresponding to $f_{i+1}$. Prioritized composition of preferences is formulated as a single POrder with the same comparators order given by $O$.
• Partially Ordered Preferences. A partial order PO on the set of scope comparators may encode partial information on the relative importance of different scope comparators. Let $\Omega$ be the set of comparator orderings consistent with PO, where an ordering $\omega$ is consistent with PO if the relative order of any two scope comparators in $\omega$ does not contradict PO. The set $\Omega$ is called the set of linear extensions of PO. For example, FIG. 7 shows a partial order defined on four comparators and the corresponding set of linear extensions. The set of linear extensions may be obtained using a simple recursive algorithm on the PO graph. Partially-ordered preferences may be formulated as the set of POrders given by $\Omega$.
• Pairwise Preferences: A set $PW = \{(f_i \succ f_j)\}$ of pairwise second-order preferences on scope comparators. The pairwise second-order preference $(f_i \succ f_j)$ expresses the requirement that the first-order preferences corresponding to $f_i$ are more important than the first-order preferences corresponding to $f_j$. Pairwise second-order preferences PW may be formulated as the set of POrders $\{\langle f_i, f_j \rangle : (f_i \succ f_j) \in PW\}$.
• Pareto Preference Composition: The importance of all scope comparators is equal. The first-order preference $(t_i \succ t_j)$ is produced if and only if at least one scope comparator states that $(t_i \succ t_j)$, and no other scope comparator states that $(t_j \succ t_i)$. Pareto preference composition is formulated as a set of singleton POrders, where each POrder is composed of a single comparator.
• Preferences Aggregation: The scope comparators act as voters on preference relations. The first-order preference $(t_i \succ t_j)$ is produced if and only if at least one scope comparator states that $(t_i \succ t_j)$. Preferences aggregation may be formulated as a set of singleton POrders, where each POrder may be composed of a single comparator.
IV. Compilation
Given a set of scopes and scope comparators, a graph-based representation of the preferences, termed a preference graph, may be obtained. In this Section, an algorithm for "compiling" the given set of scopes and scope comparators (first-order preferences) is described. A preference graph may be formally defined as follows:
Definition 5 [Preference Graph]: A directed graph (V, E), where V is the set of tuples in R and an edge $e_{i,j} \in E$ connects tuple $t_i$ to tuple $t_j$ if there exists at least one comparator applicable to $(t_i, t_j)$ and returning 1, or applicable to $(t_j, t_i)$ and returning -1. The label of edge $e_{i,j}$, denoted $l(e_{i,j})$, is the set of comparators inducing preference of $t_i$ over $t_j$.
The compilation algorithm is described in Algorithm 5. The algorithm constructs the set of vertices also termed nodes of the preference graph using the union of tuples involved in all input scopes. In other words, each node in the preference graph is associated with a tuple. Accordingly, each node in the preference graph may represent an item. For each pair of distinct tuples, the set of applicable scope comparators may be found and used to compute graph edges and their labels. Accordingly, an edge in the preference graph may correspond to a first-order preference, which may indicate a user preference for one of the two items represented by the nodes terminating the edge. Edges of the preference graph may be directed edges and may be directed to the node associated with a preferred data item as indicated by the first-order preference associated with the edge. Though, in some embodiments, edges may be undirected and an indication of which of nodes terminating the edge is preferred may be provided differently. For instance, such an indication may be provided by using a signed weight, with a negative weight indicating a preference for one node and a positive weight indicating a preference for the other node.
FIG. 8 illustrates an example of the output of the compilation algorithm. In particular, FIG. 8 shows the preference graph obtained from the set of scope comparators $\{f_{1,2}, f_{3,4}, f_{5,6}, f_{6,2}, f_{1,5}\}$ described with reference to FIG. 4. Each edge is labeled with a set of supporting comparators. For example, for the edge $e_{2,6}$, we have $l(e_{2,6}) = \{f_{1,2}, f_{6,2}\}$, since the tuple $t_2$ is preferred over the tuple $t_6$ according to the scope comparators $f_{1,2}$ and $f_{6,2}$.
Since scopes may intersect and arbitrary scope comparator logic may be allowed, the induced preference graph may be a cyclic graph. For example, in FIG. 8, a cycle between $t_1$ and $t_6$ exists since $t_1$ is preferred over $t_6$ according to $f_{6,2}$, while $t_6$ is preferred over $t_1$ according to another scope comparator. Construction of a preference graph according to Algorithm 5 does not guarantee transitivity of graph edges. For example, in FIG. 8, the existence of the edges $e_{2,6}$ and $e_{6,1}$ does not imply the existence of the edge $e_{2,1}$.
Algorithm 5 Preferences Compilation
COMPILE-PREFS (S: a set of scopes, F: a set of comparators)
  V ← ∪ {t : t ∈ S_i}   {find the union of all scopes}
  E ← { }   {initialize set of graph edges as empty}
  for all (t_i, t_j) ∈ (V × V), t_i ≠ t_j
  do
    for all f ∈ F
    do
      if (f is applicable to (t_i, t_j))
        then
          p ← f(t_i, t_j)
          if (p = 1)
            then
              e_{i,j} ← 1
              append f to l(e_{i,j})
              if (e_{i,j} ∉ E)
                then add e_{i,j} to E
          else if (p = -1)
            then
              e_{j,i} ← 1
              append f to l(e_{j,i})
              if (e_{j,i} ∉ E)
                then add e_{j,i} to E
  return (V, E)   {return Preferences Graph}
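The compilation step can be sketched in Python as follows. This is an illustrative sketch only: scopes are assumed to be sets of tuple identifiers, each comparator is assumed to be a function returning 1, -1, or `None` (with `None` also covering the case where the comparator is not applicable to the pair), and the graph is returned as a dictionary from directed edges to their label sets. The name `compile_prefs` is hypothetical.

```python
from itertools import permutations


def compile_prefs(scopes, comparators):
    """Build a preference graph (V, E) from scopes and scope comparators.

    scopes: iterable of sets of tuple ids.
    comparators: dict mapping a comparator name to a function f(ti, tj) that
        returns 1, -1, or None (not applicable or no preference).
    Returns (V, E), where E maps an edge (i, j) to the set of comparator
    names that induce a preference of t_i over t_j.
    """
    vertices = set().union(*scopes)          # union of all scopes
    edges = {}
    for ti, tj in permutations(sorted(vertices), 2):   # ordered distinct pairs
        for name, f in comparators.items():
            p = f(ti, tj)
            if p == 1:
                edges.setdefault((ti, tj), set()).add(name)
            elif p == -1:
                edges.setdefault((tj, ti), set()).add(name)
    return vertices, edges
```

Because every ordered pair is visited, the number of comparator evaluations grows quadratically with the number of tuples, which matches the complexity remark that follows.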
The computational complexity of constructing and processing a preference graph is quadratic in the number of tuples. There is a tradeoff between a preference graph's expressiveness and the scalability of its implementation. Though in some embodiments, preferences may be highly "selective" and, consequently, the preference graph may be sparse.
Scalability issues due to the size of the preference graph may be addressed in any of numerous ways. One approach is to use distributed processing in a cloud environment, where storing and managing the preference graph is distributed over multiple nodes in the cloud. For example, a ranking algorithm described below in Section V.A may be easily adapted to function in a cloud environment. Other approaches include sacrificing the precision of preference query results by conducting approximate processing, or thresholding managed preferences to prune weak preferences early, to reduce the size of the preference graph.
A preference graph allows heterogeneous user preferences to be encoded using a unified graphical representation. Though, in some embodiments, computing a ranking of query results using such representation may require additional quantification of preference strength. Preference strength may be quantified based on the semantics of first-order and second-order preferences, while preserving the preference information encoded by the preference graph. Preference strength may be represented by weights on edges of the preference graph. Given a preference graph G(V, E), the set of graph edges E may represent pairwise first-order preferences. Specifically, an edge $e_{i,j}$ may express the preference for tuple $t_i$ over tuple $t_j$ according to one or more scope comparator(s). In some instances, a weight $w_{i,j}$ may be associated with an edge $e_{i,j}$. The weight $w_{i,j}$ may be a weight indicative of a degree of preference for the first node over the second node. Stronger preferences may be indicated by higher weights. In some instances, the weight may be a weight between 0 and 1, inclusive, and the sum of the weights $w_{i,j}$ and $w_{j,i}$ may equal 1. Disconnected vertices in the preference graph indicate that their corresponding tuples are indifferent with respect to each other.
In some embodiments, computing the weight may comprise dividing the number of first-order preferences for item A relative to item B by the number of all first-order preferences indicating any preference (either for or not for) item A.
For instance, let $F$ be the set of all scope comparators associated with the preference graph. Let $\Lambda$ be the set of POrders of $F$ according to the chosen semantics of second-order preferences. Let $F_{i,j} = l(e_{i,j}) \cup l(e_{j,i})$. That is, $F_{i,j}$ is the set of scope comparators that state a preference relationship between tuples $t_i$ and $t_j$. Let $\Lambda_{i,j}$ be the multiset of nonempty projections of POrders in $\Lambda$ based on $F_{i,j}$. Let $\Lambda^{+}_{i,j} \subseteq \Lambda_{i,j}$ be the set of POrder projections under which $t_i \succ t_j$, and similarly let $\Lambda^{-}_{i,j} \subseteq \Lambda_{i,j}$ be the set of POrder projections under which $t_j \succ t_i$. It follows that $\Lambda^{+}_{i,j} \cup \Lambda^{-}_{i,j} = \Lambda_{i,j}$, and that $\Lambda^{+}_{i,j} \cap \Lambda^{-}_{i,j}$ is empty. The weight $w_{i,j}$ may be computed as follows:
$$w_{i,j} = \frac{|\Lambda^{+}_{i,j}|}{|\Lambda_{i,j}|} \qquad (1)$$
That is, $w_{i,j}$ corresponds to the proportion of POrder projections under which $t_i \succ t_j$, among the set of POrder projections computed based on comparators relevant to the edge $(t_i, t_j)$.
The weight $w_{j,i}$ may be similarly defined using the set $\Lambda^{-}_{i,j}$. It follows that $w_{i,j} + w_{j,i} = 1$. For the case of Pareto composition, at most one of the two edges $e_{i,j}$ and $e_{j,i}$ can exist in the preference graph, since otherwise $t_i$ and $t_j$ would be incomparable. Hence, under Pareto composition, we remove any graph edge $e_{i,j}$ whenever an edge $e_{j,i}$ exists.
We next give an example illustrating how to compute preference weights under different semantics of second-order preferences.
Example 4:
FIG. 9 shows three weighted preference graphs, corresponding to the preference graph in FIG. 8, produced under different semantics of second-order preferences. The different semantics of second-order preferences result in different edge weights and/or the removal of some edges in the original preference graph:
• Under prioritized comparators, $e_{1,6}$ is removed since, based on the shown comparator priorities, it may be determined that $(t_6 \succ t_1)$.
• Under partially-ordered comparators, we have that $w_{2,3} = w_{3,2} = 0.5$, since the relevant set of comparators for $(t_2, t_3)$ is $\{f_{5,6}, f_{1,5}\}$ and the given partial order induces four POrder projections $\{\langle f_{1,5}, f_{5,6} \rangle, \langle f_{1,5}, f_{5,6} \rangle, \langle f_{5,6}, f_{1,5} \rangle, \langle f_{5,6}, f_{1,5} \rangle\}$, where $(t_2 \succ t_3)$ under the two POrder projections $\langle f_{5,6}, f_{1,5} \rangle$, while $(t_3 \succ t_2)$ under the other two POrder projections $\langle f_{1,5}, f_{5,6} \rangle$.
• Under pairwise preferences, $w_{1,6} = 0.33$ since $(t_1 \succ t_6)$ based on $\langle f_{6,2} \rangle$, which is one out of three POrder projections $\{\langle f_{5,6} \rangle, \langle f_{6,2} \rangle, \langle f_{5,6} \rangle\}$.
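The weight computation illustrated in Example 4 can be sketched as follows. This is a brute-force illustration under stated assumptions: POrders are lists of comparator names (most important first), a partial order is a set of (more important, less important) pairs, and the comparator names `f1`, `f2`, `f3` in the usage note are hypothetical. The linear extensions are enumerated by filtering all permutations, which is only practical for a handful of comparators and is not the recursive algorithm mentioned in Section III.

```python
from itertools import permutations


def linear_extensions(comparators, partial_order):
    """All orderings of comparators that do not contradict partial_order."""
    extensions = []
    for perm in permutations(comparators):
        position = {f: k for k, f in enumerate(perm)}
        if all(position[a] < position[b] for a, b in partial_order):
            extensions.append(list(perm))
    return extensions


def edge_weight(porders, results):
    """Weight w_ij: fraction of nonempty POrder projections under which t_i wins.

    results maps each comparator that states a preference on the pair
    (t_i, t_j) to 1 (t_i preferred) or -1 (t_j preferred).
    """
    projections = [[f for f in po if f in results] for po in porders]
    projections = [p for p in projections if p]      # keep nonempty ones
    if not projections:
        return 0.0
    # Every comparator in a projection states a preference, so the most
    # important one decides the outcome under that projection.
    wins = sum(1 for p in projections if results[p[0]] == 1)
    return wins / len(projections)
```

For instance, with three comparators and the single constraint that `f1` is more important than `f2`, there are three linear extensions; if `f2` prefers $t_i$ and `f3` prefers $t_j$, then $t_i$ wins under one of the three projections, giving a weight of 1/3, and the reverse weight is 2/3.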
V. Ranking
The graph-based preference model described in Section IV may be used to obtain a ranking (a total order) of items in a set of items. This may be done in any of numerous ways. One approach described in Section V.A obtains a ranking based on authority-based ranking algorithms. Another approach described in Section V.B is a probabilistic algorithm based on inducing a set of complete directed graphs called tournaments from the graph-based preference model and computing a ranking for at least one tournament from the set.
A. Importance Flow Ranking
A total order of items (or, equivalently, tuples representing these items) may be obtained by estimating an importance measure for each tuple using the preference weights encoded by the weighted preference graph. Techniques related to the PageRank importance flow model may be used to compute such importance measures. Under the PageRank model, scores may be assigned to Web pages based on the frequency with which they are visited by a random surfer. Pages are then ranked according to these scores. Intuitively, pages pointed to by many important pages are also important.
The PageRank importance flow model lends itself naturally to problems that require computing a ranking based on binary relationships among items. In the context of preferences, the model may be applied based on the notion that an item may be important if it is preferred over many other important items.
Let G = (V, E) be a dominance graph (i.e., a directed graph in which an edge between two nodes indicates that one node dominates the other), and consider, for each node $v_i$, the set of nodes dominated by $v_i$ and the set of nodes dominating $v_i$. Let $\alpha$ be a real number called a damping factor. The PageRank algorithm, as known in the art, computes the PageRank score of node $v_i$, denoted $\Gamma_i$, according to:
[Equation (2): image imgf000024_0002 in the original]
The PageRank score of a node v is determined by summing PageRank scores of all nodes v' dominated by v, normalized by the number of nodes dominating v'. It is well known that when $\sum_{v \in V} \Gamma_v = 1$, Equation 2 corresponds to a stationary distribution of a Markov chain, and that a unique stationary distribution exists if the chain is irreducible (i.e., the dominance graph is strongly connected) and aperiodic. Nodes that have no incoming edges (i.e., nodes that are not dominated by any other nodes) lead to sinks in the Markov chain, which makes the chain reducible. This problem may be handled by adding self loops at sink nodes, or (uniform) transitions from sink states to all other states in the Markov chain. The damping factor $\alpha$ captures the requirement that each node is reachable from every other node. The value of $\alpha$ is the probability that we stop following the graph edges, and start the Markov chain from a new random node. This may help to avoid getting trapped in cycles between nodes that have no edges to the rest of the graph.
Accordingly, in some embodiments a pagerank-based algorithm may be used to calculate a total order of items from the weighted preference graph. Herein, a pagerank-based algorithm refers to any algorithm based on calculating a value from a graph based on characteristics of a Markov chain defined with respect to the graph. Note that a difference between the above-described weighted preference graph and the graphs to which the PageRank algorithm is conventionally applied is that the weighted preference graph has preference weights associated with edges. The preference weights bias the probability of transition (flow) from one state to another, according to weight value, in contrast to the conventional case in which transitions are uniformly defined.
A pagerank-based algorithm may proceed as follows. Given a starting tuple $t_0$ (node) in the weighted preference graph, assume a random surfer that jumps to a next tuple $t_1$ among the set of tuples dominating $t_0$, biased by the edge weights. Intuitively, this corresponds to a process where a tuple is constantly replaced by a more desired tuple (with respect to given preferences). Note that visiting tuples takes place in the opposite direction of edges (jumps are from a dominated tuple to a dominating tuple). Hence, it follows that tuples that are visited more frequently, according to this process, are more likely to be desirable than tuples that are visited less frequently. Ranking tuples based on their visit frequency (pagerank-based scores) defines an ordering that corresponds to their global desirability.
The weighted preference graph may be represented using a square matrix $M$, where each tuple corresponds to one row and one column in $M$. Let $E_j$ be the set of incoming edges to tuple $t_j$ in the weighted preference graph. The entry $M[i, j]$ may be computed as follows:
[Equation: image imgf000025_0001 in the original]
Hence, the sum of all entries in each column in $M$ is 1.0 unless the tuple corresponding to that column has no incoming edges. Matrices in which all the entries are nonnegative and the sum of the entries in every column is 1.0 are called column stochastic matrices. A stochastic matrix defines a Markov chain whose stationary distribution is the set of importance measures we need for ranking. In order to maintain the irreducibility of the chain, we need to eliminate sinks (nodes with no incoming edges in the preference graph). We handle the problem of sinks by adding a self loop, with weight 1.0, at each sink node.
Let $\Gamma$ be the pagerank scores vector. Then, based on the previous matrix representation, the pagerank scores are given by solving the equation $\Gamma = M\Gamma$, which is the same as finding the eigenvector of $M$ corresponding to eigenvalue 1. The solution that has been used in practice for computing pagerank scores is the iterative power method, where $\Gamma$ is computed by first choosing an initial vector $\Gamma^{(0)}$, and then producing a next vector $\Gamma^{(1)} = M\Gamma^{(0)}$. The process is repeated to generate a vector $\Gamma^{(T)}$, at iteration $T$, using the vector $\Gamma^{(T-1)}$ generated at iteration $T-1$. For convergence, at each iteration $T$, entries in $\Gamma^{(T)}$ are normalized so that they sum to 1.0. In practice, the number of iterations needed for the power method to converge may be any suitable number of iterations. For instance, tens or hundreds of iterations may be used.
FIG. 10 illustrates the pagerank matrix for the weighted preference graph with prioritized comparators illustrated in FIG. 9. Note that $t_4$ is a sink node with no incoming edges (i.e., $t_4$ has no other dominating tuples). Hence, we add a self loop with weight 1.0 to $t_4$, represented by the matrix entry $M[4, 4]$. A typical value of the damping factor $\alpha$ may be a value such as 0.15, but may be any value between 0 and 0.5.
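The power method described above can be sketched in Python as follows. Two points are assumptions rather than statements from the source, because the corresponding equations are not legible here: the column entries are taken to be each incoming edge weight divided by the sum of weights of all edges entering the same tuple (which is consistent with the statement that each column sums to 1.0), and the damping factor is applied in the conventional way as a uniform restart probability. The function name `rank_by_importance` is hypothetical.

```python
def rank_by_importance(weights, nodes, alpha=0.15, iterations=100):
    """Rank tuples by a pagerank-style score on a weighted preference graph.

    weights: dict mapping an edge (i, j) to w_ij, meaning t_i is preferred
        over t_j with strength w_ij.
    nodes: list of tuple ids.
    Returns (ordering, scores) with the most desirable tuple first.
    """
    incoming = {j: {} for j in nodes}
    for (i, j), w in weights.items():
        incoming[j][i] = w

    # Column j holds the probabilities of jumping from t_j to a dominating t_i.
    column = {}
    for j in nodes:
        total = sum(incoming[j].values())
        if total == 0:
            column[j] = {j: 1.0}  # sink: self loop with weight 1.0
        else:
            column[j] = {i: w / total for i, w in incoming[j].items()}

    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: alpha / n for v in nodes}          # random restart
        for j in nodes:
            for i, p in column[j].items():
                nxt[i] += (1 - alpha) * p * score[j]
        norm = sum(nxt.values())
        score = {v: x / norm for v, x in nxt.items()}  # keep the sum at 1.0
    ordering = sorted(nodes, key=lambda v: score[v], reverse=True)
    return ordering, score
```

A sparse dictionary representation is used instead of a dense matrix so that the cost per iteration is proportional to the number of edges rather than to the square of the number of tuples.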
B. Probabilistic Ranking
A total order of items (or top-ranked items) may be obtained from a complete directed graph derived from the preference model. Computing a total order of items from a complete directed graph (also known as a tournament) is termed finding a tournament solution. This problem may be stated as follows. Given an irreflexive, asymmetric, and complete binary relation over a set, find the set of maximal elements of this set. Example methods for finding tournament solutions are computing Kendall scores, and finding a Condorcet winner.
It should be appreciated, however, that the preference graph described in Section IV is not necessarily a tournament. In particular, the preference graph may be symmetric and incomplete:
• Symmetry: both edges $e_{i,j}$ and $e_{j,i}$ may exist in the preference graph.
• Incompleteness: both edges $e_{i,j}$ and $e_{j,i}$ may be missing from the preference graph.
The symmetry problem implies that some pairwise preferences may go either way with possibly different weights, while incompleteness implies that some pairwise preferences may be unknown.
In some embodiments, a probabilistic approach to obtaining a ranking from the preference graph may be used. Such an approach may rely on deriving one or more tournaments from the preference graph. Each tournament may be associated with a probability. As such, a weighted preference graph may be viewed as a compact representation of a space of possible tournaments, wherein each tournament is obtained by repairing the preference graph to obtain an asymmetric and complete digraph. In order to construct a tournament, two repair operations may be applied to the preference graph:
• Remove an edge. Applying this operation eliminates a 2-length cycle by removing one of the involved edges.
• Add an edge. Applying this operation augments the graph by adding a missing edge.
As discussed earlier, the value of the weight $w_{i,j}$ represents the probability of selecting a POrder, among the set of all POrders relevant to $(t_i, t_j)$, under which $t_i \succ t_j$. We thus interpret $w_{i,j}$ as the probability with which tuple $t_i$ is preferred to tuple $t_j$. We further assume the independence of $w_{i,j}$ values of different tuple pairs. For each tuple pair $(t_i, t_j)$, if both $w_{i,j} > 0$ and $w_{j,i} > 0$ (i.e., $t_i$ and $t_j$ are involved in a 2-length cycle), the operation remove edge removes the edge $e_{j,i}$ with probability $w_{i,j}$, and removes the edge $e_{i,j}$ otherwise. Alternatively, if $w_{i,j} = 0$ and $w_{j,i} = 0$ (i.e., $t_i$ and $t_j$ are disconnected vertices), the operation add edge adds one of the edges $e_{i,j}$ or $e_{j,i}$ with the same probability 0.5.
Based on the probabilistic process described above, repairing the weighted preference graph generates a tournament (irreflexive, asymmetric, and complete digraph) whose probability is given by the product of the probabilities of all remaining graph edges. Let c be the number of 2-length cycles in the preference graph, and d be the number of disconnected tuple pairs. Then, the number of possible tournaments is $2^{c+d}$.
FIG. 11 illustrates a weighted preference graph, and the corresponding set of possible tournaments $\{T_1, \ldots, T_8\}$. The illustrated preference graph has two 2-length cycles (one of them between $t_1$ and $t_2$) and one pair of disconnected tuples $(t_2, t_4)$, and hence the number of possible tournaments is 8. The probability of each tournament is given by the product of the probabilities associated with its edges. For example, the probability of $T_1$ is 0.09, which is the product of 0.3, 0.6, and 0.5, representing $w_{2,1}$, the weight of the edge retained from the second cycle, and $w_{4,2}$, respectively.
Given a tournament T and a total order of tuples O, we say that O violates T, with respect to the relative order of $(t_i, t_j)$, if $t_i \succ t_j$ under O, while $t_j \succ t_i$ under T. The problem of computing a total order of tuples with a minimum number of violations to a tournament is known to be NP-hard. Multiple heuristics have been proposed to compute a total order from a tournament. We focus on using the Kendall score for computing a total order. The Kendall score of tuple t is the number of tuples dominated by t according to the tournament. The space of possible tournaments allows computing a total order of tuples under any of numerous probabilistic ranking measures. Two specific measures are described below.
• Most probable tournament ranking. Compute a total order of tuples based on the tournament with the highest probability.
• Expected ranking. Compute a total order of tuples based on the expected ranking in the space of all the possible tournaments.
Finding the most probable tournament is done by maintaining the edge with the higher weight for each 2-length cycle in the preference graph, and adding an arbitrary edge for each pair of disconnected tuples. According to this method, there may be multiple tournaments with the highest probability among all possible tournaments. The computed total order under any of these tournaments is the required ranking. In the illustrative example of FIG. 11, tournaments $T_2$ and $T_6$ are the most probable tournaments, each with probability 0.21. A total order of tuples in each of $T_2$ and $T_6$ may then be computed using Kendall scores. Let n be the number of tuples in the preference graph; the complexity of the algorithm is $O(n^2)$, since we need to visit all edges of the preference graph.
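The repair process and the Kendall-score ordering can be sketched as below. The example graph is hypothetical: it is not a reconstruction of FIG. 11, and was only chosen so that it has two 2-length cycles and one disconnected pair, like the graph discussed above. The sketch also assumes that an edge with no reverse edge is kept with probability 1.0, and it enumerates every tournament explicitly, which is only feasible for very small graphs. The function names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical weighted preference graph: (i, j) -> w_ij, t_i preferred over t_j.
nodes = ["t1", "t2", "t3", "t4"]
weights = {
    ("t1", "t2"): 0.7, ("t2", "t1"): 0.3,   # 2-length cycle
    ("t3", "t4"): 0.6, ("t4", "t3"): 0.4,   # 2-length cycle
    ("t1", "t3"): 1.0, ("t1", "t4"): 1.0, ("t2", "t3"): 1.0,
}                                            # (t2, t4) is disconnected


def tournaments(nodes, weights):
    """Yield (edges, probability) for every tournament obtained by repair."""
    choices = []
    for a, b in combinations(nodes, 2):
        w_ab = weights.get((a, b), 0.0)
        w_ba = weights.get((b, a), 0.0)
        if w_ab > 0 and w_ba > 0:        # remove one edge of the cycle
            choices.append([((a, b), w_ab), ((b, a), w_ba)])
        elif w_ab == 0 and w_ba == 0:    # add a missing edge
            choices.append([((a, b), 0.5), ((b, a), 0.5)])
        elif w_ab > 0:                   # one-way edge, kept as is (assumed)
            choices.append([((a, b), 1.0)])
        else:
            choices.append([((b, a), 1.0)])
    for combo in product(*choices):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [edge for edge, _ in combo], prob


def kendall_order(nodes, edges):
    """Order tuples by Kendall score (number of tuples each one dominates)."""
    score = {v: 0 for v in nodes}
    for winner, _ in edges:
        score[winner] += 1
    return sorted(nodes, key=lambda v: score[v], reverse=True)
```

With this hypothetical graph there are eight tournaments, and the two most probable ones differ only in the direction chosen for the disconnected pair. Ties in Kendall score are broken by the input order of `nodes`, which is an arbitrary choice of this sketch.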
Finding the expected ranking may be done by computing the expected Kendall score for each tuple using the space of possible tournaments. We model the score of tuple $t_i$ as a random variable $s_i$ whose distribution is given by the space of possible tournaments. In the illustrative example of FIG. 11, $t_1$ dominates one tuple in $\{T_1, T_3, T_5, T_7\}$ with probability summation 0.3, and dominates two tuples in $\{T_2, T_4, T_6, T_8\}$ with probability summation 0.7. Hence, the random variable $s_1$ may take the value 1 with probability 0.3, and the value 2 with probability 0.7. The expected value of $s_1$ is thus 1*0.3 + 2*0.7 = 1.7.
Computing the exact expected score of each tuple requires materializing the space of possible tournaments, which is infeasible due to the exponential number of possible tournaments. We thus propose a sampling-based algorithm to approximate the expected value of $s_i$ of each tuple $t_i$, and then rank tuples based on their estimated expected scores. Let $L(t_i)$ be the set of tuples dominated by $t_i$ in the weighted preference graph. For a tuple $t_i$, a sample $Z$ is generated by adding each tuple $t_j \in L(t_i)$ to $Z$ with probability $w_{i,j}$. All samples may be generated independently. Hence, a score sample from the distribution of $s_i$ is given by $|Z|$. The expected value of $s_i$ is estimated as the mean of the generated score samples. It is well known that the sample mean, computed from a sufficiently large set of independent samples, is an unbiased estimate of the true distribution mean. Let n be the number of tuples in the preference graph, and m be the number of drawn samples for each tuple; the complexity of the algorithm is $O((nm)^2)$, since we access the dominated set of each tuple m times to generate m score samples.
VI. Interactive Preference Specification
A data exploration system may help a user to specify preferences. In some embodiments, preferences may be specified interactively. A system may interact with a user through a series of prompts, displays, and/or indications of the type of input a user may provide the system. The system may provide the user with information that may assist the user in specifying preferences. A data exploration system may assist a user to query the system. To this end, the data exploration system may assist the user to specify preferences and may output query results, to the user, ranked in accordance with the specified preferences.
FIG. 12 shows a flowchart of an illustrative process 1200 for assisting a user to query a data exploration system. Process 1200 may be used to assist a user in specifying user preferences in conjunction with a query, and may assist a user in specifying preferences associated with attributes related to one or more keywords in a query.
Process 1200 begins in act 1202 when a user query may be inputted. The inputted query may be any suitable query and may be a text query. The inputted query may be a multimedia query, for example, received through an audio input device that may be translated into text using any appropriate speech-recognition/speech-to-text software. The inputted query may comprise one or more keywords. The query may be, for example, a query for an item to purchase and/or may be a query for an item comprising information desired by a user. For instance, the query may be a query containing the keyword "car" and may indicate that a user may be interested in looking at items related to cars. As another example, the user may input a query "television" into an Internet search engine, which may indicate that a user may be interested in looking at any webpages containing information about television. More generally, the query may be any suitable query, as known in the art.
In response to receiving a user query, one or more attributes related to the query may be identified, in act 1204 of process 1200. Attributes may be related to one or more keywords contained in the query. For instance, attributes may be a characteristic of a keyword in the query. Attributes may be of any suitable type. For instance, attributes may be categorical attributes or numerical attributes. For instance, if a query for a "car" were inputted in act 1202, then attributes related to car may be the attributes "Make," "Color," "Price," and any other attributes of car such as the attributes illustrated in FIG. 2. Attributes related to one or more keywords contained in a query may be identified in any suitable way as known in the art. They may be identified automatically by a computer or may be manually specified.
Regardless of the way in which attributes are identified, in act 1204, a user may be presented with these attributes, in act 1206. The user may be shown these attributes visually using a display screen that contains these attributes. The display screen may be any suitable screen containing a representation of the attributes, such as a text representation of the attributes. The user may be prompted to select one or more of the presented attributes such that the system may assist the user to specify preferences associated with the selected attributes. For instance, a user may be presented with a list of previously-mentioned attributes associated with the keyword "car" and may select the attributes "Price" and "Color."
In act 1208, attributes selected by the user may be received. In response to receiving the selected attributes, the user may be prompted to specify first-order preferences associated with one or more selected attributes, in act 1210. For each attribute, the user may specify a first-order preference of any suitable type. For instance, the user may specify score-based preferences, partial order preferences, skyline preferences, and/or conjoint analysis preferences as discussed with reference to Section II. The user may be assisted in specifying any of the above-mentioned first-order preferences in any of numerous ways. In some embodiments, a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the first-order preferences (e.g., by drawing preferences). In some embodiments, the user may be provided with a series of prompts designed to obtain information required to specify first-order preferences.
In response to receiving first-order preferences, the user may be prompted to specify a second-order preference among the received first-order preferences, in act 1212. The user may specify a second-order preference of any suitable type. For instance, the user may specify prioritized preference composition preferences, partial order preferences, pairwise preferences, and/or Pareto preference composition preferences as discussed with reference to Section III. Similar to the case of first-order preferences, a user may be assisted in specifying any of the above-mentioned second-order preferences in any of numerous ways. In some embodiments, a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the second-order preferences. In some embodiments, the user may be provided with a series of prompts designed to obtain information required to specify second-order preferences. After first-order and second-order preferences have been specified, process 1200 completes.
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.
It should be appreciated that a computer may be embodied in any of numerous forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embodied in a device not generally regarded as a computer, but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, a tablet, a reader, or any other suitable portable or fixed electronic device.
Also, a computer may have one or more input and output devices. These devices may be used, among other things, to present a user interface. Examples of output devices that may be used to provide a user interface include printers or display screens for visual presentation of output, and speakers or other sound generating devices for audible presentation of output. Examples of input devices that may be used for a user interface include keyboards, microphones, and pointing devices, such as mice, touch pads, and digitizing tablets.
Such computers may be interconnected by one or more networks in any suitable form, including networks such as a local area network (LAN) or a wide area network (WAN), such as an enterprise network, an intelligent network (IN) or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, and/or fiber optic networks.
A computer system that may be used in connection with any of the embodiments described herein is shown in FIG. 13. FIG. 13 shows, schematically, an illustrative computer 1300 on which various inventive aspects of the present disclosure may be implemented. The computer 1300 includes a processor or processing unit 1301 and a memory 1302 that may include volatile and/or non-volatile memory. The computer 1300 may also include storage 1305 (e.g., one or more disk drives) in addition to the system memory 1302. The memory 1302 and/or storage 1305 may store one or more computer-executable instructions to program the processing unit 1301 to perform any of the functions described herein. The storage 1305 may optionally also store one or more data sets as needed. References herein to a computer can include any device having a programmed processor, including a rack-mounted computer, a desktop computer, a laptop computer, a tablet computer or any of numerous devices that may not generally be regarded as a computer, which include a programmed processor.
The exemplary computer 1300 may have one or more input devices and/or output devices, such as devices 1306 and 1307 illustrated in FIG. 13. These devices may be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
As shown in FIG. 13, the computer 1300 may also comprise one or more network interfaces (e.g., the network interface 1310) to enable communication via various networks (e.g., the network 1320). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
Thus, in an embodiment, there is provided a method for querying a data exploration system managing a plurality of items, the method comprising: querying the data exploration system with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items, and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences; calculating, with a processor, a ranking of an item in the plurality of items based at least in part on a data structure encoding a preference graph that represents the plurality of first-order user preferences and the second-order user preference; and outputting at least a subset of the plurality of items to the user, in accordance with the ranking.
In an embodiment, calculating the ranking comprises: applying a pagerank-based algorithm to the data structure encoding the preference graph to calculate the ranking.
In another embodiment, the preference graph comprises a plurality of nodes, wherein each node represents an item, and calculating the ranking comprises: calculating a pagerank score of a node in the plurality of nodes.
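As a non-limiting sketch of a pagerank-based calculation, the following applies standard power-iteration PageRank to a weighted directed graph in which each edge points to the preferred item. The function name, the damping factor of 0.85, the fixed iteration count, and the uniform handling of nodes without outgoing edges are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict, Hashable, Tuple


def pagerank(
    edges: Dict[Tuple[Hashable, Hashable], float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[Hashable, float]:
    """Power-iteration PageRank over a weighted directed graph.

    edges maps (source, target) to a positive weight; by the convention
    assumed here, an edge points to the preferred item.
    """
    nodes = sorted({n for edge in edges for n in edge}, key=str)
    count = len(nodes)
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Three items: b is preferred to a, and c is preferred to both a and b.
scores = pagerank({("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because rank flows along edges toward preferred items, an item that is preferred over many others accumulates a higher score and appears earlier in the ranking.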
In another embodiment, calculating the ranking comprises: computing a total order of nodes in a complete directed graph derived from the preference graph, wherein each node represents an item.
In another embodiment, computing the total order comprises calculating a Kendall score for a node in the complete directed graph.
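One possible reading of such a calculation is sketched below, assuming that the Kendall score of a node is the sum of the weights of the pairwise comparisons that the node wins, and that the complete directed graph is stored as a matrix of pairwise preference degrees. The matrix representation, the function name, and the tie-breaking rule are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List


def kendall_order(pref: Dict[str, Dict[str, float]]) -> List[str]:
    """Order items by row sums of a complete pairwise-preference matrix.

    pref[x][y] is the (assumed) degree to which x is preferred over y,
    with pref[x][y] + pref[y][x] == 1 for every pair of distinct items.
    """
    scores = {x: sum(row.values()) for x, row in pref.items()}
    # Ties are broken by item name so that the result is a total order.
    return sorted(scores, key=lambda x: (-scores[x], x))


pref = {
    "a": {"b": 0.2, "c": 0.1},
    "b": {"a": 0.8, "c": 0.4},
    "c": {"a": 0.9, "b": 0.6},
}
order = kendall_order(pref)
```

Here item c has the largest row sum and item a the smallest, so the computed total order places c first and a last.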
In another embodiment, the preference graph comprises: a plurality of nodes, wherein each node corresponds to an item in the plurality of items; and a plurality of edges, wherein each edge corresponds to a first-order preference in the plurality of first-order preferences, the first-order preference indicating a user preference for one of the two items represented by nodes terminating the edge.
In another embodiment, each edge is a directed edge, directed to a node associated with a preferred item as indicated by the corresponding first-order preference.
In another embodiment, a weight is associated to an edge between a first node and a second node in the preference graph, the weight being indicative of a degree of preference for the first node over the second node.
In another embodiment, each item in the plurality of items is represented as a tuple, the tuple comprising a plurality of attributes of the item.
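A data structure encoding such a preference graph may, for example, be sketched as follows. The class names, field names, and the choice of dictionaries keyed by item identifier are hypothetical; the sketch only assumes, consistent with the embodiments above, that each node is an item represented as a tuple of attribute values and that each weighted edge is directed to the preferred item.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Item:
    """An item represented as a tuple of (attribute, value) pairs."""
    item_id: str
    attributes: Tuple[Tuple[str, object], ...]


@dataclass
class PreferenceGraph:
    nodes: Dict[str, Item] = field(default_factory=dict)
    # (less preferred id, preferred id) -> weight; the edge points to the preferred item.
    edges: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_item(self, item: Item) -> None:
        self.nodes[item.item_id] = item

    def add_preference(self, preferred: str, other: str, weight: float = 1.0) -> None:
        """Record that `preferred` is preferred over `other` with the given weight."""
        if preferred not in self.nodes or other not in self.nodes:
            raise KeyError("both items must be added before relating them")
        self.edges[(other, preferred)] = weight


graph = PreferenceGraph()
graph.add_item(Item("t1", (("Price", 3000), ("Color", "red"))))
graph.add_item(Item("t2", (("Price", 4500), ("Color", "blue"))))
graph.add_preference(preferred="t1", other="t2", weight=0.75)
```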
In another aspect, there is provided a computer-readable storage medium article storing a data structure encoding a preference graph and a plurality of processor-executable instructions that when executed by a processor, cause the processor to perform the acts of: receiving a plurality of first-order user preferences indicative of user preferences among a plurality of items; receiving a second-order user preference indicative of user preferences among the first-order preferences in the plurality of first-order user preferences; computing a weight for an edge of the preference graph based on the plurality of first-order user preferences and the second-order user preference, wherein: the edge connects a first node associated with a first item and a second node associated with a second item, and the weight is indicative of a degree of preference for the first item over the second item; and outputting at least two of the plurality of items according to the preference graph.
In an embodiment, the preference graph comprises a node for each item in the plurality of items and an edge for every pair of nodes associated with items related by a first-order preference in the plurality of first-order preferences.
In another embodiment, computing the weight comprises: computing a first number of first-order user preferences in the plurality of first-order user preferences indicating a user's preference for the first item relative to the second item; computing a second number of all first-order user preferences in the plurality of first-order user preferences indicating any preference associated with the first item; and setting the weight based on the first number divided by the second number.
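The ratio described above may be illustrated with the following sketch, which assumes that each first-order preference is available as a set of (preferred item, less preferred item) pairs and which, for simplicity, does not account for the second-order preference. The function name, the pair encoding, the attribute "Mileage", and the value returned when no preference mentions the first item are all hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (preferred item, less preferred item)


def edge_weight(preferences: Iterable[Set[Pair]], first: str, second: str) -> float:
    """Fraction of the preferences mentioning `first` that favour it over `second`."""
    favouring = 0   # first number: preferences that rank `first` above `second`
    mentioning = 0  # second number: preferences that state anything about `first`
    for relation in preferences:
        if (first, second) in relation:
            favouring += 1
        if any(first in pair for pair in relation):
            mentioning += 1
    return favouring / mentioning if mentioning else 0.0


prefs = [
    {("t1", "t2"), ("t1", "t3")},  # e.g., a preference on Price
    {("t2", "t1")},                # e.g., a preference on Color
    {("t1", "t2")},                # e.g., a preference on Mileage
    {("t2", "t3")},                # states nothing about t1
]
w = edge_weight(prefs, "t1", "t2")
```

In this example two of the three first-order preferences that mention t1 favour it over t2, so the weight is two thirds.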
In another embodiment, receiving the plurality of first-order user preferences comprises receiving a first-order preference from a user.
In another embodiment, each item in the plurality of data items is represented as a tuple, the tuple comprising values of a plurality of attributes; and each first-order user preference in the plurality of first-order user preferences indicates a user preference of one item over another item based at least in part on a value of an attribute of a first tuple, representing the one item, and a value of an attribute of a second tuple representing the other item.
In another embodiment, the plurality of first-order user preferences comprises at least two types of first-order preferences selected from the group comprising score-based preferences, partial order preferences, skyline preferences, and conjoint analysis preferences.
In another embodiment, the second-order user preference comprises a plurality of second-order user preference relations that comprises at least two types of second-order preferences selected from the group comprising prioritized preference composition, partial order preferences, pairwise preferences, and Pareto preference composition.
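For illustration, prioritized and Pareto preference composition may be sketched using definitions commonly found in the preference-query literature, which may differ in detail from those discussed with reference to Section III. The type aliases, function names, and example preferences are hypothetical.

```python
from typing import Any, Callable, Dict

Car = Dict[str, Any]
Pref = Callable[[Car, Car], bool]  # pref(x, y) is True when x is preferred over y


def prioritized(p1: Pref, p2: Pref) -> Pref:
    """p1 decides; p2 is consulted only when p1 is indifferent."""
    def composed(x: Car, y: Car) -> bool:
        indifferent = not p1(x, y) and not p1(y, x)
        return p1(x, y) or (indifferent and p2(x, y))
    return composed


def pareto(p1: Pref, p2: Pref) -> Pref:
    """x wins if it is better under one preference and not worse under the other."""
    def composed(x: Car, y: Car) -> bool:
        return (p1(x, y) and not p2(y, x)) or (p2(x, y) and not p1(y, x))
    return composed


cheaper: Pref = lambda x, y: x["Price"] < y["Price"]
red_first: Pref = lambda x, y: x["Color"] == "red" and y["Color"] != "red"

t1 = {"Price": 3000, "Color": "blue"}
t2 = {"Price": 4500, "Color": "red"}

price_then_color = prioritized(cheaper, red_first)
price_and_color = pareto(cheaper, red_first)
```

With these example tuples, the prioritized composition prefers t1 because the price preference takes precedence, whereas under the Pareto composition neither tuple is preferred because each is better on one attribute and worse on the other.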
In another aspect, there is provided a database system comprising: a memory configured to store a plurality of tuples, a data structure encoding a preference graph to represent user preferences, wherein the user preferences comprise a plurality of first-order preferences representing user preferences among tuples and a second-order user preference representing user preferences among first-order preferences in the plurality of first-order preferences; and a processor configured to access contents of the memory and compute a ranking of a tuple in the plurality of tuples based on the data structure encoding the preference graph.
In another aspect, there is provided a system for interactive preference management, the system comprising: a memory configured to store a plurality of tuples, each tuple comprising a value for at least one of a plurality of attributes; and at least one processor configured to receive a range of values for an attribute in the plurality of attributes from a user, and to output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.
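A minimal sketch of such a count is given below, assuming that tuples are held in memory as dictionaries and that the range is closed at both ends. The function name, the dictionary representation, and the example values are hypothetical.

```python
from typing import Dict, List


def count_in_range(
    tuples: List[Dict[str, float]], attribute: str, low: float, high: float
) -> int:
    """Count tuples whose value for `attribute` lies in the closed range [low, high]."""
    return sum(1 for t in tuples if attribute in t and low <= t[attribute] <= high)


cars = [
    {"Price": 1800},
    {"Price": 2500},
    {"Price": 4999},
    {"Price": 5200},
    {"Mileage": 60000},  # has no value for Price, so it is never counted
]
n = count_in_range(cars, "Price", 2000, 5000)
```

Reporting the count before the preference is finalized lets a user see how many tuples a candidate range would cover.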
In another aspect, there is provided a computer-implemented method for interactive preference management, the method comprising: receiving, with a processor, a query from a user, the query comprising a keyword; prompting the user to provide a plurality of first-order preferences associated with one or more attributes related to the keyword; and in response to receiving the plurality of first-order preferences, prompting the user to provide a second-order preference among the first-order preferences in the plurality of first-order preferences.
In an embodiment, prompting the user to provide a plurality of first-order preferences comprises: presenting a list of attributes related to the keyword to the user; receiving a selection of attributes in the list of attributes from the user; and prompting the user to specify a first-order preference associated with the selected attribute.
The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of numerous suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a virtual machine or a suitable framework. In this respect, various inventive concepts may be embodied as at least one non-transitory computer-readable storage medium (e.g. a computer memory, one or more floppy discs, computer discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, etc.) article(s) encoded with one or more programs that, when executed on one or more computers or other processors, implement the various process embodiments of the present invention. The non-transitory computer-readable medium or media may be transportable, such that the programs stored thereon may be loaded onto any suitable computer resource to implement various aspects of embodiments as discussed above.
The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed performs methods of one or more embodiments need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of one or more embodiments. Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, items, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments. Also, data structures may be stored in non-transitory computer-readable storage media articles in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of the data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.
Also, various inventive concepts may be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be construed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments, or vice versa.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. The indefinite articles "a" and "an," as used herein, unless clearly indicated to the contrary, should be understood to mean "at least one."
As used herein, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The phrase "and/or," as used herein, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., "one or more" of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items.
Having described several embodiments of the invention in detail, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The invention is limited only as defined by the following claims and the equivalents thereto.

Claims

CLAIMS:
1. A method for querying a data exploration system managing a plurality of items, the method comprising: querying the data exploration system with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items, and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences; calculating, with a processor, a ranking of an item in the plurality of items based at least in part on a data structure encoding a preference graph that represents the plurality of first-order user preferences and the second-order user preference; and outputting at least a subset of the plurality of items to the user in accordance with the ranking.
2. The method of claim 1, wherein calculating the ranking comprises: applying a pagerank-based algorithm to the data structure encoding the preference graph to calculate the ranking.
3. The method of claim 2, wherein the preference graph comprises a plurality of nodes, wherein each node represents an item, and calculating the ranking comprises: calculating a pagerank score of a node in the plurality of nodes.
4. The method of claim 1, wherein calculating the ranking comprises: computing a total order of nodes in a complete directed graph derived from the preference graph, wherein each node represents an item.
5. The method of claim 4, wherein computing the total order comprises calculating a Kendall score for a node in the complete directed graph.
6. The method of claim 1, wherein the preference graph comprises: a plurality of nodes, wherein each node corresponds to an item in the plurality of items; and a plurality of edges, wherein each edge corresponds to a first-order preference in the plurality of first-order preferences, the first-order preference indicating a user preference for one of the two items represented by nodes terminating the edge.
7. The method of claim 6, wherein each edge is a directed edge, directed to a node associated with a preferred item as indicated by the corresponding first-order preference.
8. The method of claim 6, wherein a weight is associated to an edge between a first node and a second node in the preference graph, the weight being indicative of a degree of preference for the first node over the second node.
9. The method of claim 1, wherein each item in the plurality of items is represented as a tuple, the tuple comprising a plurality of attributes of the item.
10. A computer-readable storage medium article storing a data structure encoding a preference graph and a plurality of processor-executable instructions that when executed by a processor, cause the processor to perform the acts of: receiving a plurality of first-order user preferences indicative of user preferences among a plurality of items; receiving a second-order user preference indicative of user preferences among the first-order preferences in the plurality of first-order user preferences; computing a weight for an edge of the preference graph based on the plurality of first-order user preferences and the second-order user preference, wherein: the edge connects a first node associated with a first item and a second node associated with a second item, and the weight is indicative of a degree of preference for the first item over the second item; and outputting at least two of the plurality of items according to the preference graph.
11. The computer-readable storage medium of claim 10, wherein the preference graph comprises a node for each item in the plurality of items and an edge for every pair of nodes associated with items related by a first-order preference in the plurality of first-order preferences.
12. The computer-readable storage medium of claim 10, wherein computing the weight comprises: computing a first number of first-order user preferences in the plurality of first-order user preferences indicating a user's preference for the first item relative to the second item; computing a second number of all first-order user preferences in the plurality of first-order user preferences indicating any preference associated with the first item; and setting the weight based on the first number divided by the second number.
13. The computer-readable storage medium of claim 10, wherein receiving the plurality of first-order user preferences comprises receiving a first-order preference from a user.
14. The computer-readable storage medium of claim 10, wherein: each item in the plurality of data items is represented as a tuple, the tuple comprising values of a plurality of attributes; and each first-order user preference in the plurality of first-order user preferences indicates a user preference of one item over another item based at least in part on a value of an attribute of a first tuple, representing the one item, and a value of an attribute of a second tuple representing the other item.
15. The computer-readable storage medium of claim 10, wherein: the plurality of first-order user preferences comprises at least two types of first-order preferences selected from the group comprising score-based preferences, partial order preferences, skyline preferences, and conjoint analysis preferences.
16. The computer-readable storage medium of claim 10, wherein the second-order user preference comprises a plurality of second-order user preference relations that comprises at least two types of second-order preferences selected from the group comprising prioritized preference composition, partial order preferences, pairwise preferences, and Pareto preference composition.
17. A database system comprising: a memory configured to store a plurality of tuples, a data structure encoding a preference graph to represent user preferences, wherein the user preferences comprise a plurality of first-order preferences representing user preferences among tuples and a second-order user preference representing user preferences among first-order preferences in the plurality of first-order preferences; and a processor configured to access contents of the memory and compute a ranking of a tuple in the plurality of tuples based on the data structure encoding the preference graph.
18. A system for interactive preference management, the system comprising: a memory configured to store a plurality of tuples, each tuple comprising a value for at least one of a plurality of attributes; at least one processor configured to receive a range of values for an attribute in the plurality of attributes from a user, output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.
19. A computer-implemented method for interactive preference management, the method comprising: receiving, with a processor, a query from a user, the query comprising a keyword; prompting the user to provide a plurality of first-order preferences associated with one or more attributes related to the keyword; and in response to receiving the plurality of first-order preferences, prompting the user to provide a second-order preference among the first-order preferences in the plurality of first-order preferences.
20. The method of claim 19, wherein prompting the user to provide a plurality of first-order preferences comprises: presenting a list of attributes related to the keyword to the user; receiving a selection of attributes in the list of attributes from the user; and prompting the user to specify a first-order preference associated with the selected attribute.
PCT/CA2011/001382 2011-06-20 2011-12-21 Method and apparatus for preference guided data exploration WO2012174632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161498899P 2011-06-20 2011-06-20
US61/498,899 2011-06-20

Publications (1)

Publication Number Publication Date
WO2012174632A1 (en) 2012-12-27

Family

ID=47421937

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CA2011/001382 WO2012174632A1 (en) 2011-06-20 2011-12-21 Method and apparatus for preference guided data exploration
PCT/CA2012/000603 WO2012174648A1 (en) 2011-06-20 2012-06-20 Preference-guided data exploration and semantic processing

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CA2012/000603 WO2012174648A1 (en) 2011-06-20 2012-06-20 Preference-guided data exploration and semantic processing

Country Status (4)

Country Link
AU (2) AU2012272479A1 (en)
CA (1) CA2841147C (en)
IL (2) IL230065A (en)
WO (2) WO2012174632A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016003737A1 (en) * 2014-07-03 2016-01-07 Google Inc. Promoting preferred content in a search query
US10001911B2 (en) 2015-04-10 2018-06-19 International Business Machines Corporation Establishing a communication link between plural participants based on preferences

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7849090B2 (en) 2005-03-30 2010-12-07 Primal Fusion Inc. System, method and computer program for faceted classification synthesis
US11294977B2 (en) 2011-06-20 2022-04-05 Primal Fusion Inc. Techniques for presenting content to a user based on the user's preferences
US20120324367A1 (en) 2011-06-20 2012-12-20 Primal Fusion Inc. System and method for obtaining preferences with a user interface

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080126303A1 (en) * 2006-09-07 2008-05-29 Seung-Taek Park System and method for identifying media content items and related media content items
US7752199B2 (en) * 2004-03-01 2010-07-06 International Business Machines Corporation Organizing related search results
US20110040749A1 (en) * 2009-08-13 2011-02-17 Politecnico Di Milano Method for extracting, merging and ranking search engine results

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7739281B2 (en) * 2003-09-16 2010-06-15 Microsoft Corporation Systems and methods for ranking documents based upon structurally interrelated information
US7464075B2 (en) * 2004-01-05 2008-12-09 Microsoft Corporation Personalization of web page search rankings
US8078884B2 (en) * 2006-11-13 2011-12-13 Veveo, Inc. Method of and system for selecting and presenting content based on user identification
US7689624B2 (en) * 2007-03-01 2010-03-30 Microsoft Corporation Graph-based search leveraging sentiment analysis of user comments
US7853599B2 (en) * 2008-01-21 2010-12-14 Microsoft Corporation Feature selection for ranking
US8972329B2 (en) * 2008-05-02 2015-03-03 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for ranking nodes of a graph using random parameters

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016003737A1 (en) * 2014-07-03 2016-01-07 Google Inc. Promoting preferred content in a search query
US9852224B2 (en) 2014-07-03 2017-12-26 Google Llc Promoting preferred content in a search query
US10001911B2 (en) 2015-04-10 2018-06-19 International Business Machines Corporation Establishing a communication link between plural participants based on preferences

Also Published As

Publication number Publication date
AU2017221807B2 (en) 2019-07-18
AU2012272479A1 (en) 2014-01-16
IL248313A (en) 2017-07-31
CA2841147A1 (en) 2012-12-27
CA2841147C (en) 2022-03-01
WO2012174648A1 (en) 2012-12-27
IL230065A (en) 2016-10-31
AU2017221807A1 (en) 2017-09-21

Similar Documents

Publication Publication Date Title
US11960556B2 (en) Techniques for presenting content to a user based on the user's preferences
US10409880B2 (en) Techniques for presenting content to a user based on the user's preferences
AU2011269676B2 (en) Systems of computerized agents and user-directed semantic networking
Ristoski et al. Semantic Web in data mining and knowledge discovery: A comprehensive survey
Ostuni et al. Top-n recommendations from implicit feedback leveraging linked open data
US9703891B2 (en) Hybrid and iterative keyword and category search technique
AU2017221807B2 (en) Preference-guided data exploration and semantic processing
US11106719B2 (en) Heuristic dimension reduction in metadata modeling
CN110609902A (en) Text processing method and device based on fusion knowledge graph
Yang et al. Lenses: An on-demand approach to etl
WO2014093951A2 (en) Graph query processing using plurality of engines
Interdonato et al. A versatile graph-based approach to package recommendation
US11294977B2 (en) Techniques for presenting content to a user based on the user's preferences
KR101602342B1 (en) Method and system for providing information conforming to the intention of natural language query
US20240184839A1 (en) Techniques for presenting content to a user based on the user's preferences
Wang et al. Deep learning-based open api recommendation for mashup development
Vo et al. Towards scalable recommendation framework with heterogeneous data sources: preliminary results
Gopalakrishnan On unsupervised algorithms for semantically interpretative and contextually sensitive text-mining
Hsu Efficient query processing over personal process description graphs
Zhao Clustering and entity resolution for semi-structured data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11868017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11868017

Country of ref document: EP

Kind code of ref document: A1