WO2012174632A1

WO2012174632A1 - Method and apparatus for preference guided data exploration

Info

Publication number: WO2012174632A1
Application number: PCT/CA2011/001382
Authority: WO
Inventors: Ihab F. Ilyas; Mohamed A. Soliman
Original assignee: Primal Fusion Inc.
Priority date: 2011-06-20
Filing date: 2011-12-21
Publication date: 2012-12-27
Also published as: AU2017221807B2; AU2012272479A1; IL248313A; CA2841147A1; CA2841147C; WO2012174648A1; IL230065A; AU2017221807A1

Abstract

There is disclosed a method and apparatus for querying a database storing a plurality of items. In an embodiment, the method comprises querying the database with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences. In another embodiment, the method further comprises calculating a ranking of the plurality of items based at least in part on a data structure encoding a preference graph representing the plurality of first-order user preferences and the second-order user preference, and outputting at least a subset of the plurality of items to the user in accordance with the ranking.

Description

METHOD AND APPARATUS FOR PREFERENCE GUIDED DATA EXPLORATION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/498,899, filed on June 20, 2011, titled "Method and Apparatus for Preference Guided Data Exploration".

FIELD OF INVENTION

The present disclosure relates generally to methods and systems for querying a database storing a plurality of items.

BACKGROUND

Data exploration systems, such as search engines and database management systems, manage enormous volumes of information. As a result, locating information of interest to a user in response to a search query (e.g., in the form of a set of keywords) presents challenges.

Conventional approaches to search often shift the burden of finding the information of interest to the user. For example, all potentially relevant results may be presented to the user in response to a search query. Subsequently, the user has to manually explore and/or rank these results in order to find the information of greatest interest. When the number of potentially relevant results is large, which is often the case, the user may be overwhelmed and may fail to locate the information for which he is looking.

One conventional technique for addressing this problem is to integrate a user's preferences into the search process. By presenting search results in accordance with the user's preferences, the user may be helped to find the information he seeks. However, conventional approaches to specifying user preferences severely limit the ways in which user preferences may be specified.

Consider, for example, a data exploration model adopted by many search services and illustrated in FIG. 1. Query interface 12 is used to collect query predicates in the form of keywords and/or attribute values (e.g., "used Toyota" with price in [$2000-$5000]). Query results are then sorted (14) on the values of one or more attributes (e.g., order by Price then by Rating) in a major sort/minor sort fashion. The user then scans (16) through the sorted query answers to locate items of interest, refines query predicates, and repeats the exploration cycle (18). This "Query, Sort, then Scan" model limits the flexibility of preference specification and imposes rigid data exploration schemes as highlighted in the following example.

Example 1

Amy is searching online catalogs for a camera to buy. Amy is looking for a reasonably-priced camera, whose color is preferably silver and less preferably black or gray, and whose reviews contain the keywords "High Quality." Amy is a money saver, so her primary concern is satisfying her Price preferences followed by her Color and Reviews preferences.

The data exploration model of FIG. 1 allows Amy to sort results in ascending price order. Amy then needs to scan through the results comparing colors and inspecting reviews to find the camera that she wants. The path followed by Amy to explore search results is mainly dictated by her price preference, while other preferences are incorporated in the exploration task through Amy's effort, which can limit the possibility of finding items that closely match her requirements.

Conventional approaches to specifying user preferences suffer from a number of other drawbacks in addition to not simultaneously supporting different types of preferences. For example, preference specifications may be inconsistent with one another. A typical example is having cycles among first-order preferences (preferences among attributes of items, such as preferring one car to another car based on the price or on the brand), which implies non-transitivity of preferences. For instance, a user may indicate that a Honda is preferred to a Toyota, a Toyota is preferred to a Nissan, and a Nissan is preferred to a Honda. Even when first-order preferences are consistent, second-order preferences (preferences among the first-order preferences, such as brand preferences being more important than price preferences) can result in further problems. For example, prioritized composition of a set of partial orders does not generally maintain the transitivity property in the resulting order. Conventional systems for data exploration are unable to rank search results when preference specifications may be inconsistent.

BRIEF DESCRIPTION OF DRAWINGS

In the drawings,

FIG. 1 is a diagram of a "query, sort, then scan" data exploration model, in accordance with prior art;

FIG. 2 is a diagram illustrating a relation, in accordance with some embodiments;

FIG. 3 is a flowchart of an illustrative preference modeling process, in accordance with some embodiments;

FIG. 4 is a diagram illustrating scopes obtained from a relation, in accordance with some embodiments;

FIG. 5 is a diagram illustrating scope comparators, in accordance with some embodiments;

FIG. 6 is a diagram illustrating conjoint preferences, in accordance with some embodiments;

FIG. 7 is a diagram of an illustrative mapping of a partial order to linear extensions, in accordance with some embodiments;

FIG. 8 is a diagram of an illustrative preference graph, in accordance with some embodiments;

FIG. 9 is a diagram of an illustrative computation of edge weights for different types of second-order preferences, in accordance with some embodiments;

FIG. 10 is a diagram of an illustrative page-rank based matrix for prioritized comparators, in accordance with some embodiments;

FIG. 11 is a diagram of an illustrative weighted preference graph and tournaments derived from it, in accordance with some embodiments;

FIG. 12 is a flowchart for an illustrative process for interactively specifying preferences, in accordance with some embodiments; and

FIG. 13 is an illustrative computer system on which some embodiments of the present disclosure may be implemented.

DETAILED DESCRIPTION

Inadequate incorporation of preferences in conventional data exploration systems is due at least partly to the inability of these systems to integrate different types of preferences. For instance, in the above-described example, preferences include an ordering on all prices (a "total order" preference), an ordering between some colors (a "partial order" preference), a Boolean predicate from the presence of the words "High Quality" in the reviews, and an indication that price is more important than the other preferences.

Another situation in which it may be useful to specify different types of preferences may be a situation in which a user may have precise preferences for information in one domain because the user may possess a large amount of knowledge about the domain. Such precise preferences may be specified, for example, in the form of one or more scoring functions. However, the same user may have imprecise preferences for information in another domain because the user may not possess a large amount of knowledge about the other domain. In this case, preferences may be specified, for example, in the form of one or more partial orders on attribute values. There are many instances in which the user may need to specify both types of preferences (i.e., using a scoring function and using a partial order) as shown in Example 2 below.

Example 2

Alice is searching for a car to buy. Alice has specific preferences regarding sport cars, and more relaxed preferences regarding SUVs. Alice supplies the data exploration system with a scoring function to rank sport cars, and a set of partial orders encoding SUV preferences. Alice expects reported results to be ranked according to her preferences.

A data exploration system capable of integrating different preference types and ranking search results in response to a user query, in accordance with user-specified preferences, may address some of the above-discussed drawbacks of conventional approaches. However, not every embodiment addresses every one of these drawbacks, and some embodiments may not address any of them. As such, it should be appreciated that embodiments of the invention are not limited to addressing all or any of the above-discussed drawbacks of these conventional approaches.

Accordingly, in some embodiments, a preference language is provided for specifying different types of user preferences among items. A data exploration system may assist a user to specify preferences using the preference language. The specified preferences may be used to construct a general preference model that, in turn, may be used to produce a ranking of items in accordance with any user preferences.

Items may be any suitable items about which a user may express preferences. In some instances, an item may be any item that may be manufactured, sold and/or purchased. For example, an item may be a car or an airplane ticket; a user (e.g., a consumer) may have preferences for one car over another car and/or may prefer one airplane ticket to another airplane ticket. In some instances, an item may comprise information. Users may prefer one item over another item based at least in part on the information that these items contain. For example, when searching for content (e.g., movie, music, images, webpages, text, sound, etc.) a user may prefer some content to other content. For instance, a user may prefer to see a webpage that contains information related to cars over a webpage that contains information related to bicycles.

An item may comprise, or have associated with it, one or more attributes. An attribute of an item may be related to the item and may be a characteristic of the item. An attribute of an item may be a characteristic descriptive of the item. For instance, if an item is an item that may be purchased, an attribute of the item may be a price related to the item. An attribute of an item may be a characteristic that may identify the item. For example, a characteristic of an item may be an identifier (e.g., name, serial number, or model number) of the item.

Attributes may be numerical attributes and may be categorical attributes. Numerical attributes may comprise one or more values. For instance a numerical attribute may comprise a single number (e.g., 5) or a range of numbers (e.g., 1-1000). Categorical attributes may also comprise one or more values. For instance, a categorical value for the category "Color" may comprise a single color (e.g., Red) or a set of colors (e.g., {"Red", "Green"}). Though, it should be recognized that attribute values are not limited to being numbers and/or categories and may be any of numerous other types of values. For instance, values may comprise alphabetic and alphanumeric strings.

An item may be represented by one or more tuples comprising values for one or more attributes associated with the item. In some cases, a tuple representing an item may comprise a value for each attribute associated with the item. In other cases, a tuple representing an item may comprise a value for only a portion of the attributes associated with the item.
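As a minimal sketch of this representation, an item can be modeled as a mapping from attribute names to values, with unknown attributes simply omitted. The helper name `get_attr` and the second item are illustrative assumptions; only the first item's values come from the description of FIG. 2.

```python
# Hypothetical representation: a tuple is a mapping from attribute name to value.
# An attribute with an unknown value is simply absent from the mapping.

def get_attr(t, name):
    """Return the value of attribute `name`, or None if the tuple has no value for it."""
    return t.get(name)

# Values for this item follow the description of the first row of FIG. 2.
t1 = {"ID": "t1", "Make": "Honda", "Model": "Civic",
      "Color": "Red", "Price": 1800, "Deposit": 500}

# Made-up item showing a tuple that carries only a portion of the attributes.
t_partial = {"ID": "tx", "Make": "Toyota", "Price": 2500}
```

Looking up `Color` on the partial tuple yields `None`, which downstream comparators can treat as "no preference can be made".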

FIG. 2 shows an illustrative example of a set of items, each item being represented by a tuple comprising values for the attributes of the items. In the illustrative example of FIG. 2, each item is a car and is associated with six attributes: "ID," "Make," "Model," "Color," "Price," and "Deposit." Though in this example all items share the same attributes, this is not a limitation of the present invention as different items may have different attributes from one another and some attributes may have unknown values. Each item is represented by a tuple (i.e., a set) of attribute values. Accordingly, the first item has characteristics indicated by the first set of attribute values. For instance, the first item is represented by the tuple in the first row of the table shown in FIG. 2. As illustrated, this first item is an $1800 Red Honda Civic identified by identifier "t1". A deposit of $500 may be required to purchase this car.

A user may express preferences for one item over another item in a set of items. User preferences may be of any suitable type and may be first-order user preferences, second-order user preferences, and even further-order preferences.

First-order preferences are preferences associated with attributes of items. First-order preferences may be based on values of attributes of items. For example, a first-order preference may express a preference for an item over another item based on values of one or more attributes of the two items. For instance, a first-order preference may indicate that an item with a lower price (value of the attribute "price") is preferred to an item with a higher price. As another example, a first-order preference may indicate that a red (value of the attribute "color") item (e.g., car) is preferred to a blue item.

Second-order preferences are preferences across first-order preferences. Second-order preferences may indicate which first-order preferences are more important to a user. For example, first-order preference A may be based on values of one attribute (e.g., "price") while first-order preference B may be based on values of another attribute (e.g., "color"). A second-order preference may indicate that first-order preference B is preferred to first-order preference A (i.e., color may be more important than price).

There may be many different types of first-order and second-order preferences and these types of preferences, along with other aspects of first-order and second-order preferences are discussed in greater detail below in Sections II and III, respectively.

The data exploration system may be any system for exploring data, information or knowledge. The data exploration system may allow one or more users to query the system. For instance, a data exploration system may be a search engine such as an Internet search engine or a domain-specific search engine (e.g., a search engine created to search a particular information domain such as a company's or institution's intranet, or a specific subject-matter information repository). In another example, a data exploration system may be a database system that may allow user queries.

A query input by a user into a data exploration system may be any of numerous types of queries. For instance, a query may comprise one or more keywords indicating what the user is seeking. In some cases, a query may comprise user preferences. Though, it should be appreciated that user preferences may be specified separately and/or independently from any user query. For instance, a user may specify preferences that may apply to multiple user queries. The specified preferences may comprise preferences of any suitable type such as first-order and/or second-order user preferences.

Regardless of the types of preferences that a user may wish to specify, a data exploration system may assist a user to specify preferences. A data exploration system may assist a user to specify preferences using the preference language, for example. Some example approaches to how a data exploration system may assist a user to specify preferences are described in greater detail in Sections I and VI, below.

After user-specified preferences are obtained (e.g., from a user-specified query or any other suitable source), a preference model may be constructed from these preferences. The preference model may be constructed from different types of preferences and may be constructed from first-order preferences of different types and/or from second-order preferences of different types.

A preference model may be represented by a data structure encoding the preference model. The data structure may comprise any data necessary for representing the preference model and, for example, may comprise any parameters associated with the preference model.

A data structure encoding a preference model may be stored on any tangible computer-readable storage medium. The computer-readable storage medium may be any suitable computer-readable storage medium and may be accessed by any physical computing device that may use the preference model encoded by the data structure.

In some embodiments, the preference model may be a graph-based preference model and the data structure encoding the preference model may encode a graph, termed a preference graph, characterizing the graph-based preference model. The preference graph may comprise a set of nodes (vertices) and a set of edges connecting nodes in the set of nodes. The edges may be directed edges or may be undirected edges. Accordingly, the data structure encoding the preference graph may encode the preference graph by encoding the graph's vertices and edges. Any of numerous data structures for encoding graphs, as are known in the art, may be used to encode the preference graph, as the invention is not limited in this respect.

In some embodiments, nodes of the graph may be associated with items. For instance, a node in the graph may be associated with a tuple that, in turn, represents an item. The graph may represent items that are related to one or more keywords in a query. For instance, a set of items may be selected in response to a user-provided query. A first-order preference for one item over another item may be represented as an edge in the graph, with the edge connecting nodes associated with the tuples associated with the two items. A weight may be associated with each edge in the graph to provide an indication of a degree of preference for one of the nodes terminating the edge. The weight may be computed based on first-order and/or second-order preferences. Aspects of a graph-based preference model, including how such a preference model may be constructed from user-specified preferences, are described in greater detail in Section IV, below.
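One of the many possible encodings of such a preference graph is an adjacency mapping, sketched below. The class name `PreferenceGraph`, the edge direction convention (an edge from the preferred item to the other item), and the example weights are all assumptions made for illustration; the disclosure does not prescribe a particular data structure.

```python
from collections import defaultdict

class PreferenceGraph:
    """Directed, weighted graph. By the convention assumed here, an edge (u, v)
    with weight w reads: 'item u is preferred to item v with degree w'."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(dict)  # edges[u][v] = weight

    def add_preference(self, preferred, other, weight=1.0):
        """Record a first-order preference as a weighted directed edge."""
        self.nodes.update((preferred, other))
        self.edges[preferred][other] = weight

    def weight(self, u, v):
        """Return the weight of edge (u, v), or None if there is no such edge."""
        return self.edges.get(u, {}).get(v)

# Node labels are tuple identifiers; the weights are made-up values.
g = PreferenceGraph()
g.add_preference("t1", "t2", 0.7)
g.add_preference("t2", "t3", 1.0)
```

Keeping the nodes as tuple identifiers rather than full tuples keeps the graph small and lets the relation be stored separately.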

The preference model may be used to obtain a ranking of items in a set of items. For instance, a graph-based preference model may be used to construct such a ranking. A graph-based preference model may be used to construct such a ranking in any of numerous ways. For instance, a complete directed graph may be obtained from the graph-based preference model and a ranking of items may be obtained based on the complete directed graph. As another example, a Markov-chain based algorithm may be applied to the graph-based preference model to obtain a ranking of items. These and other approaches to obtaining a ranking of items in a set of items are described in greater detail in Section V, below.

It should be appreciated that though a preference graph may be a convenient abstraction, which is helpful for reasoning about user preferences, in practice, a preference graph may be implemented on a physical system via a data structure that may encode the preference graph.
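To give a feel for how a Markov-chain based ranking over a weighted preference graph might work, the sketch below runs a generic PageRank-style random walk. It is not the algorithm of Section V: the function name, the damping factor, the iteration count, and the convention that the walker moves from an item toward the items preferred to it are all assumptions chosen for illustration.

```python
def rank_by_random_walk(nodes, prefers, damping=0.85, iterations=100):
    """Rank items with a PageRank-style walk.

    prefers[(u, v)] = w means 'u is preferred to v with positive weight w'.
    The walker moves from v to u, so probability mass flows toward preferred items.
    """
    nodes = sorted(nodes)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}

    # Total weight of moves available from each node.
    out_weight = {v: 0.0 for v in nodes}
    for (u, v), w in prefers.items():
        out_weight[v] += w

    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for (u, v), w in prefers.items():
            nxt[u] += damping * score[v] * w / out_weight[v]
        # Items that nothing is preferred to have no move; spread their mass uniformly.
        dangling = sum(score[v] for v in nodes if out_weight[v] == 0.0)
        for v in nodes:
            nxt[v] += damping * dangling / n
        score = nxt

    return sorted(nodes, key=lambda v: score[v], reverse=True)

# Made-up preferences: a over b, b over c, a over c.
example_prefs = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
example_ranking = rank_by_random_walk({"a", "b", "c"}, example_prefs)
```

Because the walk only needs edge weights, it still produces a ranking when the preferences contain cycles, which is the situation a plain sort cannot handle.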

Similarly, many constructs described below (e.g., relations, scopes, scope comparators, etc.) are convenient abstractions used in various fields such as computer science, but each construct may be realized, in practice, by a data structure representing data characterizing the construct and/or processor-executable instructions for carrying out functions associated with the construct.

Such data structures and processor-executable instructions may be encoded on any suitable tangible computer-readable storage medium.

Accordingly, for ease of reading, every reference to a construct (e.g., a graph, a relation, scope, scope comparator, etc.) is a reference to a data structure encoding the construct and/or processor-executable instructions that when executed by a processor perform functions associated with the construct, since explicitly referring to such data structures and processor-executable instructions for every reference to a construct is tedious.

It should also be appreciated that the above-described embodiments can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.

Software modules comprising program instructions may be provided to perform any of numerous tasks in accordance with some embodiments. For example, one or multiple software modules for constructing a preference model may be provided. As another example, software modules for obtaining a ranking for a set of items based on (a data structure representing) the preference model may be provided. As another example, software modules comprising instructions for implementing any of numerous functions associated with a data exploration system may be provided. Though, it should be recognized that the above examples are not limiting and software modules may be provided to perform any functions in addition to or instead of the above examples.

I. Design Goals

In some embodiments, a data exploration system that utilizes user preferences may reflect some or all of the following design goals:

• Guidance: The system may assist users to formulate their preferences. The system may support interactive preference management. For instance, the system may provide users with information to help users specify and/or modify preferences. As a specific example, the system may provide users with information about how to modify their preferences to widen or narrow the scope of their search. As another specific example, the system may provide users with information about how to modify their preferences such that the ranking of items presented to a user is modified. Though, these are only examples and the system may aid the user to formulate their preferences in any of numerous ways as described in greater detail below, in Section VI.

• Flexibility: Specification of different types of preferences may be supported for arbitrary subsets of items, sometimes referred to as "contexts." The system may accept natural descriptions of preferences and map these descriptions into preference constructs.

• Provenance: The system may be able to provide justification of how search results are generated and ranked by relating generated results to input preferences.

FIG. 3 illustrates a flowchart for an example process of modeling preferences that reflects the above-mentioned design goals. As illustrated in FIG. 3, the data exploration system may be a system that may receive a query from one or more users. For instance, the system may be a database system or a search engine and the query may comprise one or more keywords. Toward the guidance goal, the system may assist a user to specify preferences. In some embodiments, such support may be based on pre-computed summaries in the form of facets that may be used for guiding data exploration. Each facet may be associated with a number that may provide the user with an estimate of the expected number of results. Accordingly, facets may allow a user to get a quick and dirty view of the underlying set of items and/or domain, and how search results may be affected by tuning preferences.

For example, the system may comprise a memory configured to store a plurality of tuples (recall that each tuple comprises one or more values for one or more attributes) and may receive a range of desired values for an attribute from a user. In response, the system may output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values. As a specific example, for a categorical attribute, a facet may comprise a possible attribute value (e.g., 'Color = Red'); while for a numerical attribute, a facet may comprise a range of possible values (e.g., 'Price in [$1000-$5000]'). Moreover, the user may be able to define custom facets as Boolean conditions over multiple attributes (e.g., 'Color=Red AND price < $5000'). The system may associate a number with each of these facets, the number indicating an expected number of tuples consistent with these facets.
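The facet counts described above can be sketched as predicates evaluated over the stored tuples. The function name `facet_count` and the three cars are made up for illustration, and this sketch computes exact counts rather than the pre-computed estimates an actual system might use.

```python
def facet_count(tuples, predicate):
    """Return the number of tuples that satisfy the facet's Boolean condition."""
    return sum(1 for t in tuples if predicate(t))

# Hypothetical relation; attribute names follow the 'Car' example.
cars = [
    {"ID": "t1", "Make": "Honda", "Color": "Red", "Price": 1800},
    {"ID": "t2", "Make": "Toyota", "Color": "Red", "Price": 6000},
    {"ID": "t3", "Make": "Nissan", "Color": "Blue", "Price": 4000},
]

def color_is_red(t):
    """Categorical facet: 'Color = Red'."""
    return t.get("Color") == "Red"

def price_in_1000_5000(t):
    """Numerical facet: 'Price in [$1000-$5000]'."""
    price = t.get("Price")
    return price is not None and 1000 <= price <= 5000

def red_and_under_5000(t):
    """Custom facet: 'Color=Red AND price < $5000'."""
    price = t.get("Price")
    return color_is_red(t) and price is not None and price < 5000

facet_counts = {
    "Color = Red": facet_count(cars, color_is_red),
    "Price in [$1000-$5000]": facet_count(cars, price_in_1000_5000),
    "Color=Red AND price < $5000": facet_count(cars, red_and_under_5000),
}
```

Showing these numbers next to each facet lets a user see in advance how much a given condition would narrow the result set.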

Toward the flexibility goal, the system may adopt the concept of contextualized preferences, where a user can assign different preference specifications to different subsets (contexts) of items. A user may define a context by using predetermined facets or by defining custom facets. As discussed below in Sections II and III, the user has the flexibility of expressing first-order and second-order preferences within and across contexts. Contextualized preferences may also be part of a user's profile, which may be ascertained by any of the techniques disclosed herein as well as those disclosed in U.S. Non-Provisional Application Serial No. 12/555,293, filed September 08, 2009, and titled Synthesizing Messaging Using Context Provided By Consumers, which is hereby incorporated by reference. This way, they may be loaded, saved, and/or refined upon the user's request.

Toward the provenance goal, the data exploration system illustrated in FIG. 3 may maintain information regarding which preferences, among the input preferences, affect the relative order of each pair of items in the final results ranking. This feature may be useful for the analysis and refinement of preferences in different scenarios. Examples include finding preference constructs that have a dominating effect on the ranking of results, decreasing/increasing the influence of some preference constructs, and understanding the effect of removing a certain preference construct. Additional ways in which a data exploration system may assist a user to input preferences are discussed below in Section VI.
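One simple way to keep such provenance information is to record, for every ordered pair of items, the names of the preference constructs that produced a preference for that pair. The class `ProvenanceLog`, its method names, and the construct names below are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict

class ProvenanceLog:
    """Tracks which preference constructs affected the relative order of each pair."""

    def __init__(self):
        self._by_pair = defaultdict(list)  # (preferred_id, other_id) -> construct names

    def record(self, preferred_id, other_id, construct_name):
        """Note that `construct_name` preferred one item over the other."""
        self._by_pair[(preferred_id, other_id)].append(construct_name)

    def explain(self, a, b):
        """Return the constructs that voted on the pair (a, b), in either direction."""
        return {
            "a_over_b": list(self._by_pair.get((a, b), [])),
            "b_over_a": list(self._by_pair.get((b, a), [])),
        }

    def influence(self):
        """Count how many pairwise preferences each construct contributed."""
        counts = defaultdict(int)
        for names in self._by_pair.values():
            for name in names:
                counts[name] += 1
        return dict(counts)

log = ProvenanceLog()
log.record("t1", "t2", "price_comparator")
log.record("t2", "t1", "color_comparator")
log.record("t1", "t3", "price_comparator")
```

The `influence` counts give a rough indication of which construct dominates the ranking, and `explain` shows conflicting constructs for a single pair.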

II. First-Order Preferences

In some embodiments, the preference language may be based on capturing pairwise preferences on different granularity levels. An item's description may follow a relational model, where each item may be represented as a tuple. Preferences may be cast against a relation R with a known schema.

Our first construct is used to define a context for expressing first-order preferences.

Definition 1 [Scope]: A scope $R_i$ is an arbitrary non-empty subset of tuples in $R$.

A scope defines a Boolean membership property that restricts the space of all possible tuples to a subset of tuples that are interesting for building preference relations. Such a membership property may be defined using a SQL query posed against $R$. For example, FIG. 4 shows six different scopes $R_1, \ldots, R_6$ in the relation "Car" illustrated in FIG. 2, where scopes are defined using SQL queries. Though, it should be recognized that such a membership property may be defined in any of numerous other ways. As one example, a database query language other than SQL may be used to define such a membership property. As another example, the membership property may be defined using a set of variables and a database language may not be needed.

As shown in the illustrative diagram of FIG. 4, scopes may intersect. Thus, a tuple in the relation R may belong to zero, one, or more scopes. Tuples that do not belong to any scope may be non-interesting with respect to a preference specification. Thus, for clarity, all subsequent discussion is with respect to tuples that belong to at least one scope.
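As a sketch of scopes defined without a database language, each scope below is a Boolean membership predicate over tuples. The two scope definitions and the three cars are made up for illustration and are not the scopes of FIG. 4; the helper names are likewise assumptions.

```python
# Hypothetical relation.
cars = [
    {"ID": "t1", "Make": "Honda", "Price": 1800},
    {"ID": "t2", "Make": "Toyota", "Price": 2500},
    {"ID": "t3", "Make": "Nissan", "Price": 9000},
]

# Hypothetical scopes, each a membership predicate over tuples of the relation.
scope_defs = {
    "R1": lambda t: t["Make"] == "Honda",
    "R2": lambda t: t["Price"] < 3000,
}

def scopes_of(t, scopes):
    """Return the names of all scopes the tuple belongs to (possibly none)."""
    return sorted(name for name, member in scopes.items() if member(t))

def tuples_in_some_scope(tuples, scopes):
    """Keep only tuples that belong to at least one scope."""
    return [t for t in tuples if scopes_of(t, scopes)]

membership = {t["ID"]: scopes_of(t, scope_defs) for t in cars}
interesting = [t["ID"] for t in tuples_in_some_scope(cars, scope_defs)]
```

Here the first car falls in two intersecting scopes, the second in one, and the third in none, so the third is dropped from further consideration.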

Definition 2 [Scope Comparator]: Let $R_i$ and $R_j$ be two scopes in $R$. The scope comparator $f_{i,j}$ is a function that takes a pair of distinct tuples (one from $R_i$ and the other from $R_j$) and returns a first value such as 1 (e.g., if the tuple from $R_i$ is preferred), a second value such as -1 (e.g., if the tuple from $R_j$ is preferred), or a null value "$\perp$" (e.g., if there is no preference).

A scope comparator is a preference language construct for defining first-order preferences. In some instances, the scope comparator may be user-defined. Though, in other instances, a scope comparator may be defined, automatically, by a computer. Still, in other embodiments a scope comparator may be defined by a combination of manual and automatic techniques.

A generic interface to a scope comparator may accept two tuples and return either a preference of one tuple over the other, or an indication that no preference can be made. Whenever a tuple $t_i$ is preferred to a tuple $t_j$, we say that $t_i$ dominates $t_j$, denoted as $t_i > t_j$.
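The generic comparator interface can be sketched as a function over two tuples that returns 1, -1, or `None` (standing in for the null value). The factory `make_scope_comparator`, the scope predicates, and the two example comparators are hypothetical; they only illustrate the difference between an unconditional comparator and one conditioned on extra logic.

```python
def make_scope_comparator(in_scope_i, in_scope_j, logic):
    """Build a comparator between scope R_i and scope R_j.

    logic(x, y) is called with x from R_i and y from R_j and returns
    1 (x preferred), -1 (y preferred), or None (no preference).
    """
    def compare(t_a, t_b):
        if t_a is t_b:
            return None  # comparators are defined on pairs of distinct tuples
        if in_scope_i(t_a) and in_scope_j(t_b):
            return logic(t_a, t_b)
        if in_scope_i(t_b) and in_scope_j(t_a):
            result = logic(t_b, t_a)
            return None if result is None else -result
        return None  # the pair is outside the comparator's scopes
    return compare

def is_honda(t):
    return t["Make"] == "Honda"

def is_toyota(t):
    return t["Make"] == "Toyota"

# Unconditional: any Honda is preferred to any Toyota.
unconditional = make_scope_comparator(is_honda, is_toyota, lambda x, y: 1)

# Conditional: a Honda is preferred to a Toyota only if the Honda is cheaper.
conditional = make_scope_comparator(
    is_honda, is_toyota, lambda x, y: 1 if x["Price"] < y["Price"] else None
)

honda = {"ID": "t1", "Make": "Honda", "Price": 1800}
toyota = {"ID": "t2", "Make": "Toyota", "Price": 1500}
nissan = {"ID": "t3", "Make": "Nissan", "Price": 2000}
```

Returning `None` for pairs outside the scopes keeps each comparator local to its context, so several comparators can be applied to the same relation without interfering.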

FIG. 5 illustrates five different scope comparators defined on the scopes shown in FIG. 4. In FIG. 5, the scope comparators f_iA and f_u are unconditional (i.e., they produce first-order preferences without testing any conditions beyond the conditions captured by the scope definition). On the other hand, the scope comparators f_u, fs.e, and f₆₂ are conditional (i.e., they produce preference relations conditioned on some logic).

Algorithm 1 Score-based Preferences

SCORE-PREFS (t_i: tuple, t_j: tuple, S: scoring function)
    if (S(t_i) > S(t_j))
        then return 1
    else if (S(t_j) > S(t_i))
        then return -1
    else return ⊥

Conditional scope comparators allow defining composite preferences that span multiple attributes given in scope definition and/or comparator logic (e.g., f₆₂ defines a composite preference on Price and Make attributes).

The generality of scope definitions and preference comparators allows encoding different types of preferences, with different semantics. In the following we give templates for encoding different types of preferences using the above-described language constructs.

Template 1 [Score-based Preferences]. Preferences are defined using a scoring function S, where tuples achieving better scores are preferred. Without loss of generality and without limitation, assume that higher scores are better; then score-based preferences can be specified using the template given by Algorithm 1.

A total order on a scope $R_i$ (which can be the whole relation $R$) may be encoded by defining a comparator $f_{i,i}$, using the template in Algorithm 1, where $f_{i,i}$ operates on pairs of distinct tuples belonging to $R_i$.
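Following the description of Template 1, a score-based comparator can be sketched as below, with `None` standing in for the null value. The function name `score_prefs` and the example scoring function (which favors lower prices by negating the price) are assumptions for illustration.

```python
def score_prefs(t_i, t_j, score):
    """Prefer the tuple with the higher score; return None on a tie."""
    s_i, s_j = score(t_i), score(t_j)
    if s_i > s_j:
        return 1
    if s_j > s_i:
        return -1
    return None

def cheaper_is_better(t):
    """Hypothetical scoring function: higher score for a lower price."""
    return -t["Price"]

car_a = {"ID": "t1", "Price": 1800}
car_b = {"ID": "t2", "Price": 2500}
car_c = {"ID": "t3", "Price": 1800}
```

If the scoring function assigns distinct scores to all tuples of a scope, the comparator never returns `None` and therefore encodes a total order on that scope.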

Template 2 [Partial Order Preferences]. For an attribute $x$, let $P_x$ be a partial order defined on the domain of $x$. The partial order can be expressed as a set $P_x = \{(v_i > v_j)\}$ for values $v_i$ and $v_j$ in the domain of $x$, such that $P_x$ is:

• irreflexive (i.e., $(v_i > v_i) \notin P_x$).

• asymmetric (i.e., $(v_i > v_j) \in P_x \Rightarrow (v_j > v_i) \notin P_x$).

• transitive (i.e., $\{(v_i > v_j), (v_j > v_k)\} \subseteq P_x \Rightarrow (v_i > v_k) \in P_x$).

Partial order-based preferences may be encoded using the template given by Algorithm 2.

Template 3 [Skyline Preferences]. Given a set of attributes $A$, a tuple $t_i$ is preferred to a tuple $t_j$ if there exists a non-empty subset $A' \subseteq A$ where, $\forall x \in A'$, $t_i.x$ is preferred to $t_j.x$, while for any other attribute $x'$ no preference can be made between $t_i.x'$ and $t_j.x'$. Skyline preferences may be encoded as shown in the template given by Algorithm 3.

Algorithm 2 Partial Order Preferences

PARTIAL-ORDER-PREFS(t_i: tuple, t_j: tuple, P_x: partial order on attribute x)
    if ((t_i.x ≻ t_j.x) ∈ P_x)
        then return 1
    else if ((t_j.x ≻ t_i.x) ∈ P_x)
        then return -1
    else return ⊥
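As a non-limiting illustration, the partial-order template may be sketched in Python as follows. The partial order is stored as a set of (better, worse) value pairs; the helper transitive_closure, the attribute "Color", and the example values are hypothetical and serve only to make the sketch self-contained. None stands in for ⊥.

```python
from typing import Any, Mapping, Optional, Set, Tuple

Pair = Tuple[Any, Any]  # (better value, worse value)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Add every pair implied by transitivity, so membership tests suffice."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def partial_order_prefs(t_i: Mapping[str, Any], t_j: Mapping[str, Any],
                        attr: str, p_x: Set[Pair]) -> Optional[int]:
    """Return 1, -1, or None following the branches of Algorithm 2."""
    if (t_i[attr], t_j[attr]) in p_x:
        return 1
    if (t_j[attr], t_i[attr]) in p_x:
        return -1
    return None


# Hypothetical partial order on colors: Red is better than Blue, Blue better than Green.
p_color = transitive_closure({("Red", "Blue"), ("Blue", "Green")})
```

Because the closure is computed up front, the comparator itself only needs two set lookups per pair of tuples.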

Algorithm 3 Skyline Preferences

SKYLINE-PREFS(t_i: tuple, t_j: tuple, A: subset of attributes)
    p_i ← 0
    p_j ← 0
    for all x ∈ A
    do
        if (t_i.x is preferred to t_j.x)
            then p_i ← p_i + 1
        else if (t_j.x is preferred to t_i.x)
            then p_j ← p_j + 1
    if (p_i > 0 AND p_j > 0)
        then return ⊥
    if (p_i > 0)
        then return 1
    else if (p_j > 0)
        then return -1
    else return ⊥
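As a non-limiting illustration, the skyline template may be sketched in Python as follows. The per-attribute preference tests are passed in as a dictionary named better, which is an assumption of this sketch, as are the attributes "Price" and "Year" and the example tuples. None stands in for ⊥.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


def skyline_prefs(t_i: Mapping[str, Any], t_j: Mapping[str, Any],
                  attrs: Iterable[str],
                  better: Mapping[str, Callable[[Any, Any], bool]]) -> Optional[int]:
    """Prefer a tuple only if it wins on some attribute and loses on none."""
    p_i = p_j = 0
    for x in attrs:
        if better[x](t_i[x], t_j[x]):
            p_i += 1
        elif better[x](t_j[x], t_i[x]):
            p_j += 1
    if p_i > 0 and p_j == 0:
        return 1
    if p_j > 0 and p_i == 0:
        return -1
    return None  # conflicting attributes, or no attribute expresses a preference


# Hypothetical attribute semantics: lower price is better, newer year is better.
better = {
    "Price": lambda a, b: a < b,
    "Year": lambda a, b: a > b,
}
car_a = {"Price": 1000, "Year": 2010}
car_b = {"Price": 1500, "Year": 2008}
car_c = {"Price": 900, "Year": 2005}
```

Here car_a dominates car_b on both attributes, whereas car_a and car_c each win on one attribute, so no preference is produced for that pair.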

Template 4 [Conjoint Analysis Preferences]. Given a set of attributes A, conjoint analysis encodes preferences among attribute values in A when taken conjointly. This can be expressed as a function C_A that maps each combination of values in A to a unique rank. The function C_A is partial on the domains of all possible combinations of values in A. Hence, there can be combinations of values in A that are not mapped to rank under C_A. Conjoint analysis preferences based on C_A may be expressed using the template given by Algorithm 4.
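As a non-limiting illustration, the conjoint analysis template may be sketched in Python as follows. The map C_A is modelled as a dictionary from value combinations to ranks, with missing keys playing the role of undefined combinations. The ranks shown are hypothetical. Returning None (standing in for ⊥) when both tuples carry the same combination is an assumption of this sketch.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple


def conjoint_prefs(t_i: Mapping[str, Any], t_j: Mapping[str, Any],
                   attrs: Sequence[str],
                   c_a: Dict[Tuple[Any, ...], int]) -> Optional[int]:
    """Compare two tuples by the rank of their value combination; lower rank wins."""
    key_i = tuple(t_i[x] for x in attrs)
    key_j = tuple(t_j[x] for x in attrs)
    if key_i not in c_a or key_j not in c_a:
        return None  # C_A is partial: at least one combination is unranked
    if key_i == key_j:
        return None  # assumption: identical combinations express no preference
    return 1 if c_a[key_i] < c_a[key_j] else -1


# Hypothetical ranks over (Make, Color) combinations.
make_color_rank = {
    ("Honda", "Red"): 1,
    ("Honda", "Blue"): 2,
    ("Toyota", "Red"): 3,
}
honda_red = {"Make": "Honda", "Color": "Red"}
toyota_red = {"Make": "Toyota", "Color": "Red"}
bmw_black = {"Make": "BMW", "Color": "Black"}
```

A tuple whose combination is not ranked, such as bmw_black here, never participates in a conjoint preference.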

The next example illustrates specifying and managing conjoint analysis preferences.

Example 3:

Alice's preferences regarding cars may be expressed conjointly over the attribute pairs (Make, Color), and (Make, Price), as shown in FIG. 6. The value in each cell is the rank assigned to each combination of attribute values. Conjoint analysis may be based on an additive utility model in which ranks, assigned to combinations of attribute values, may be used to derive a utility (part-worth) of each attribute value. The objective is that the utility summation of attribute values reconstructs the given ranking. In FIG. 6, for example, 'Honda' is assigned utility value 40, while 'Red' is assigned utility value 50. Hence, the score of 'Honda, Red' is 90, which matches the assigned rank 1 in the given Make-Color preferences. Utility values may be computed using regression. For instance, they may be computed using linear regression. Note the mapping between combinations of attribute values and ranks is modeled.

III. Second-Order Preferences

Our main language construct for defining second-order preferences is a preferences order (POrder), defined as follows:

Definition 3 [POrder] : given a set of scope comparators F, a POrder is a permutation of comparators in F.

A POrder represents an ordering of scope comparators based on their relative importance. A POrder may quantify the strength of different first-order preferences based on the semantics of second-order preferences, as discussed in greater detail below in Section IV.

Algorithm 4 Conjoint Analysis Preferences

CONJOINT-ANALYSIS-PREFS(t_i: tuple, t_j: tuple, A: subset of attributes, C_A: conjoint analysis map)
    if (C_A({t_i.x : x ∈ A}) is undefined OR C_A({t_j.x : x ∈ A}) is undefined)
        then return ⊥
    else if (C_A({t_i.x : x ∈ A}) < C_A({t_j.x : x ∈ A}))
        then return 1
    else return -1

Definition 4 [POrder Projection]: Let λ be a POrder defined on the set of comparators F. For F' ⊆ F, we denote with π_{F'}(λ) a total order of comparators in F' ordered according to λ. It follows that π_F(λ) = λ.

For example, for a POrder λ in which f_1 precedes f_3 and the subset of comparators F' = {f_1, f_3}, we have π_{F'}(λ) = ⟨f_1, f_3⟩.

Given a POrder projection λ', we say that (t_i ≻ t_j) under λ' if, for a scope comparator f_a ∈ λ', we have f_a(t_i, t_j) = 1 and there is no other scope comparator f_b ∈ λ' where f_b ≻ f_a according to λ' and f_b(t_i, t_j) = −1.
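As a non-limiting illustration, POrder projection and the test for a preference holding under a projection may be sketched in Python as follows. Comparators are identified by name, and the outcomes dictionary (comparator name mapped to the value of f(t_i, t_j)) is an assumption of this sketch. The sketch relies on the observation that t_i ≻ t_j under a projection exactly when the first comparator in the projection that returns 1 or -1 returns 1.

```python
from typing import Collection, List, Mapping, Optional, Sequence


def project(porder: Sequence[str], subset: Collection[str]) -> List[str]:
    """Keep only the comparators in subset, in the order given by the POrder."""
    return [f for f in porder if f in subset]


def prefers_under(projection: Sequence[str],
                  outcomes: Mapping[str, Optional[int]]) -> Optional[int]:
    """Return 1 if t_i is preferred under the projection, -1 if t_j is, else None."""
    for f in projection:
        result = outcomes.get(f)
        if result in (1, -1):
            return result  # the most important decisive comparator settles the pair
    return None


# Hypothetical POrder over three comparators and a pair on which f1 and f3 disagree.
porder = ["f1", "f2", "f3"]
projection = project(porder, {"f1", "f3"})
outcomes = {"f1": -1, "f3": 1}
```

In this example f1 precedes f3 and returns -1, so t_j is preferred under the projection even though f3 favors t_i.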

Different types of second-order preferences may be encoded using POrders.

• Prioritized Preference Composition. In this case, second-order preferences are defined as a total order of comparators O = ⟨f_1 ≻ f_2 ≻ ... ≻ f_m⟩, which expresses the requirement that the first-order preferences corresponding to f_i are more important than the first-order preferences corresponding to f_{i+1}. Prioritized composition of preferences is formulated as a single POrder with the same comparators order given by O.

• Partially Ordered Preferences. A partial order PO on the set of scope comparators may encode partial information on the relative importance of different scope comparators. Let Ω be the set of comparator orderings consistent with PO, where an ordering ω is consistent with PO if the relative order of any two scope comparators in ω does not contradict PO. The set Ω is called the set of linear extensions of PO. For example, FIG. 7 shows a partial order defined on four comparators and the corresponding set of linear extensions. The set of linear extensions may be obtained using a simple recursive algorithm on the PO graph. Partially-ordered preferences may be formulated as the set of POrders given by Ω.

• Pairwise Preferences: A set PW = {(f_i ≻ f_j)} of pairwise second-order preferences on scope comparators. The pairwise second-order preference (f_i ≻ f_j) expresses the requirement that the first-order preferences corresponding to f_i are more important than the first-order preferences corresponding to f_j. Pairwise second-order preferences PW may be formulated as the set of POrders {⟨f_i, f_j⟩ : (f_i ≻ f_j) ∈ PW}.

• Pareto Preference Composition: The importance of all scope comparators is equal. The first-order preference (t_i ≻ t_j) is produced if and only if at least one scope comparator states that (t_i ≻ t_j), and no other scope comparator states that (t_j ≻ t_i). Pareto preference composition is formulated as a set of singleton POrders, where each POrder is composed of a single comparator.

• Preferences Aggregation: The scope comparators act as voters on preference relations. The first-order preference (t_i ≻ t_j) is produced if and only if at least one scope comparator states that (t_i ≻ t_j). Preferences aggregation may be formulated as a set of singleton POrders, where each POrder may be composed of a single comparator.

IV. Compilation

Given a set of scopes and scope comparators, a graph-based representation of the preferences, termed a preference graph, may be obtained. In this Section, an algorithm for "compiling" the given set of scopes and scope comparators (first-order preferences) is described. A preference graph may be formally defined as follows:

Definition 5 [Preference Graph]: A directed graph (V,E), where V is the set of tuples in R and an edge e_{i,j} ∈ E connects tuple t_i to tuple t_j if there exists at least one comparator applicable to (t_i, t_j) and returning 1, or applicable to (t_j, t_i) and returning -1. The label of edge e_{i,j}, denoted l(e_{i,j}), is the set of comparators inducing preference of t_i over t_j.

The compilation algorithm is described in Algorithm 5. The algorithm constructs the set of vertices (also termed nodes) of the preference graph using the union of tuples involved in all input scopes. In other words, each node in the preference graph is associated with a tuple. Accordingly, each node in the preference graph may represent an item. For each pair of distinct tuples, the set of applicable scope comparators may be found and used to compute graph edges and their labels. Accordingly, an edge in the preference graph may correspond to a first-order preference, which may indicate a user preference for one of the two items represented by the nodes terminating the edge. Edges of the preference graph may be directed edges and may be directed to the node associated with a preferred data item as indicated by the first-order preference associated with the edge. Though, in some embodiments, edges may be undirected and an indication of which of the nodes terminating the edge is preferred may be provided differently. For instance, such an indication may be provided by using a signed weight, with a negative weight indicating a preference for one node and a positive weight indicating a preference for the other node.

FIG. 8 illustrates an example of the output of the compilation algorithm. In particular, FIG. 8 shows the preference graph obtained from the set of scope comparators {f_{1,2}, f_{3,4}, f_{5,6}, f_{6,2}, f_{1,5}} described with reference to FIG. 4. Each edge is labeled with a set of supporting comparators. For example, for the edge e_{2,6}, we have l(e_{2,6}) = {f_{1,2}, f_{6,2}}, since the tuple t_2 is preferred over the tuple t_6 according to the scope comparators f_{1,2} and f_{6,2}.

Since scopes may intersect and arbitrary scope comparator logic may be allowed, the induced preference graph may be a cyclic graph. For example, in FIG. 8, a cycle t_1 - t_6 exists since t_1 is preferred over t_6 according to f_{6,2}, while t_6 is preferred over t_1 according to f_u. Construction of a preference graph according to Algorithm 5 does not guarantee transitivity of graph edges. For example, in FIG. 8, the existence of the edges e_{2,6} and e_{6,1} does not imply the existence of the edge e_{2,1}.

Algorithm 5 Preferences Compilation

COMPILE-PREFS(S: a set of scopes, F: a set of comparators)
    V ← ∪ {t : t ∈ S_i}    {find the union of all scopes}
    E ← { }    {initialize set of graph edges as empty}
    for all (t_i, t_j) ∈ (V × V), t_i ≠ t_j
    do
        for all f ∈ F
        do
            if (f is applicable to (t_i, t_j))
            then
                p ← f(t_i, t_j)
                if (p = 1)
                then
                    e_{i,j} ← 1
                    append f to l(e_{i,j})
                    if (e_{i,j} ∉ E)
                        then add e_{i,j} to E
                else if (p = -1)
                then
                    e_{j,i} ← 1
                    append f to l(e_{j,i})
                    if (e_{j,i} ∉ E)
                        then add e_{j,i} to E
    return (V, E)    {return Preferences Graph}
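As a non-limiting illustration, the compilation step may be sketched in Python as follows. Tuples are represented by integer identifiers, each comparator is a callable returning 1, -1, or None (None covering both "not applicable" and ⊥), and the edge set is stored as a dictionary from (preferred, other) pairs to label sets. The scopes, prices, and the two comparators f_1_1 and f_1_2 are hypothetical and serve only to exercise the function.

```python
from itertools import permutations
from typing import Callable, Dict, Optional, Set, Tuple

Comparator = Callable[[int, int], Optional[int]]


def compile_prefs(scopes: Dict[str, Set[int]],
                  comparators: Dict[str, Comparator]):
    """Build (V, labels); labels[(i, j)] is the set of comparators preferring i over j."""
    vertices = set().union(*scopes.values())  # union of all scopes
    labels: Dict[Tuple[int, int], Set[str]] = {}
    for t_i, t_j in permutations(sorted(vertices), 2):  # ordered pairs, t_i != t_j
        for name, f in comparators.items():
            p = f(t_i, t_j)
            if p == 1:
                labels.setdefault((t_i, t_j), set()).add(name)
            elif p == -1:
                labels.setdefault((t_j, t_i), set()).add(name)
    return vertices, labels


# Hypothetical scopes and comparators.
prices = {1: 10, 2: 20, 3: 15}
scopes = {"R1": {1, 2}, "R2": {2, 3}}


def f_1_1(a: int, b: int) -> Optional[int]:
    """Within R1, the cheaper tuple is preferred."""
    if a in scopes["R1"] and b in scopes["R1"]:
        return 1 if prices[a] < prices[b] else -1
    return None


def f_1_2(a: int, b: int) -> Optional[int]:
    """For a in R1 and b in R2, the R2 tuple is preferred."""
    if a in scopes["R1"] and b in scopes["R2"]:
        return -1
    return None


V, labels = compile_prefs(scopes, {"f_1_1": f_1_1, "f_1_2": f_1_2})
```

In this example both (1, 2) and (2, 1) receive labels, which reproduces on a small scale the kind of 2-length cycle that intersecting scopes can induce.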

The computational complexity of constructing and processing a preference graph is quadratic in the number of tuples. There is a tradeoff between a preference graph's expressiveness and the scalability of its implementation. Though in some embodiments, preferences may be highly "selective" and, consequently, the preference graph may be sparse.

Scalability issues due to the size of the preference graph may be addressed in any of numerous ways. One approach is to use distributed processing in a cloud environment, where storing and managing the preference graph is distributed over multiple nodes in the cloud. For example, a ranking algorithm described below in Section V.A may be easily adapted to function in a cloud environment. Other approaches include sacrificing the precision of preference query results by conducting approximate processing, or thresholding managed preferences to prune weak preferences early, to reduce the size of the preference graph.

A preference graph allows heterogeneous user preferences to be encoded using a unified graphical representation. Though, in some embodiments, computing a ranking of query results using such representation may require additional quantification of preference strength. Preference strength may be quantified based on the semantics of first-order and second-order preferences, while preserving the preference information encoded by the preference graph. Preference strength may be represented by weights on edges of the preference graph.

Given a preference graph G(V,E), the set of graph edges E may represent pairwise first-order preferences. Specifically, an edge e_{i,j} may express the preference for tuple t_i over tuple t_j according to one or more scope comparator(s). In some instances, a weight w_{i,j} may be associated with an edge e_{i,j}. The weight w_{i,j} may be a weight indicative of a degree of preference for the first node over the second node. Stronger preferences may be indicated by higher weights. In some instances, the weight may be a weight between 0 and 1, inclusive, and the sum of the weights w_{i,j} and w_{j,i} may equal 1. Disconnected vertices in the preference graph indicate that their corresponding tuples are indifferent with respect to each other.

In some embodiments, computing the weight may comprise dividing the number of first-order preferences for item A relative to item B by the number of all first-order preferences indicating any preference (either for or not for) item A.

For instance, let F be the set of all scope comparators associated with the preference graph. Let Λ be the set of POrders of F according to the chosen semantics of second-order preferences. Let F_{i,j} = l(e_{i,j}) ∪ l(e_{j,i}). That is, F_{i,j} is the set of scope comparators that state a preference relationship between tuples t_i and t_j. Let Λ_{i,j} be the multiset of nonempty projections of POrders in Λ based on F_{i,j}. Let Λ^+_{i,j} ⊆ Λ_{i,j} be the set of POrder projections under which t_i ≻ t_j, and similarly let Λ^-_{i,j} ⊆ Λ_{i,j} be the set of POrder projections under which t_j ≻ t_i. It follows that Λ_{i,j} = Λ^+_{i,j} ∪ Λ^-_{i,j}, and that Λ^+_{i,j} ∩ Λ^-_{i,j} is empty. The weight w_{i,j} may be computed as follows:

$$w_{i,j} = \frac{|\Lambda^{+}_{i,j}|}{|\Lambda_{i,j}|} \qquad (1)$$

That is, w_{i,j} corresponds to the proportion of POrder projections under which t_i ≻ t_j, among the set of POrder projections computed based on comparators relevant to the edge (t_i, t_j).

The weight w_{j,i} may be similarly defined using the set Λ^-_{i,j}. It follows that w_{i,j} + w_{j,i} = 1. For the case of Pareto composition, at most one of the two edges e_{i,j} and e_{j,i} can exist in the preference graph, since otherwise t_i and t_j would be incomparable. Hence, under Pareto composition, we remove any graph edge e_{i,j} whenever an edge e_{j,i} exists.
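As a non-limiting illustration, the weight of Equation 1 may be computed along the following lines in Python. The function names, the representation of POrders as lists of comparator names, and the outcomes dictionary (each relevant comparator mapped to 1 or -1 for the pair under consideration) are assumptions of this sketch. The usage example takes three singleton projections and assumes which comparator favors which tuple.

```python
from typing import List, Mapping, Optional, Sequence


def project(porder: Sequence[str], relevant) -> List[str]:
    """Projection of a POrder onto the comparators relevant to the pair."""
    return [f for f in porder if f in relevant]


def decide(projection: Sequence[str], outcomes: Mapping[str, int]) -> Optional[int]:
    """First decisive comparator in the projection settles the pair."""
    for f in projection:
        if outcomes.get(f) in (1, -1):
            return outcomes[f]
    return None


def edge_weight(porders: Sequence[Sequence[str]],
                outcomes: Mapping[str, int]) -> Optional[float]:
    """Fraction of nonempty projections under which t_i is preferred to t_j."""
    projections = [p for p in (project(po, outcomes) for po in porders) if p]
    if not projections:
        return None  # no comparator relates the two tuples
    favourable = sum(1 for p in projections if decide(p, outcomes) == 1)
    return favourable / len(projections)


# Assumed outcomes for a pair (t_5, t_6): f62 favors t_5, f56 favors t_6.
singletons = [["f56"], ["f62"], ["f56"]]
w_56 = edge_weight(singletons, {"f56": -1, "f62": 1})
w_65 = edge_weight(singletons, {"f56": 1, "f62": -1})

# Under a single prioritized POrder, the most important comparator decides alone.
w_prioritized = edge_weight([["f1", "f2"]], {"f1": -1, "f2": 1})
```

The two weights of a pair are obtained by negating the outcomes, and they sum to 1 whenever every projection is decisive.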

We next give an example illustrating how to compute preference weights under different semantics of second-order preferences.

Example 4

FIG. 9 shows three weighted preference graphs, corresponding to the preference graph in FIG. 8, produced under different semantics of second-order preferences. The different semantics of second-order preferences result in different edge weights and/or the removal of some edges in the original preference graph:

• Under prioritized comparators, e_{1,6} is removed since, based on the shown comparator priorities, it may be determined that (t_6 ≻ t_1).

• Under partially-ordered comparators, we have that w_{2,3} = w_{3,2} = 0.5, since the relevant set of comparators for (t_2, t_3) is {f_{5,6}, f_{1,5}} and the given partial order induces four POrder projections {⟨f_{1,5}, f_{5,6}⟩, ⟨f_{1,5}, f_{5,6}⟩, ⟨f_{5,6}, f_{1,5}⟩, ⟨f_{5,6}, f_{1,5}⟩}, where (t_2 ≻ t_3) under the two POrder projections ⟨f_{5,6}, f_{1,5}⟩, ⟨f_{5,6}, f_{1,5}⟩, while (t_3 ≻ t_2) under the other two POrder projections ⟨f_{1,5}, f_{5,6}⟩, ⟨f_{1,5}, f_{5,6}⟩.

• Under pairwise preferences, w_{5,6} = 0.33 since (t_5 ≻ t_6) based on ⟨f_{6,2}⟩, which is one out of three POrder projections {⟨f_{5,6}⟩, ⟨f_{6,2}⟩, ⟨f_{5,6}⟩}.
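The partially-ordered case depends on the set of linear extensions of the partial order on comparators, which, as noted in Section III, may be obtained using a simple recursive algorithm. As a non-limiting illustration, one such recursion may be sketched in Python as follows. The function name and the partial order on the comparators a, b, c, d are hypothetical and do not reproduce FIG. 7.

```python
from typing import Iterable, List, Set, Tuple


def linear_extensions(items: Iterable[str],
                      precedes: Set[Tuple[str, str]]) -> List[List[str]]:
    """All total orders of items consistent with the pairs (x, y), x before y."""
    items = list(items)
    if not items:
        return [[]]
    result = []
    for x in items:
        # x may come first only if no remaining item is required to precede it
        if not any((y, x) in precedes for y in items if y != x):
            rest = [y for y in items if y != x]
            for tail in linear_extensions(rest, precedes):
                result.append([x] + tail)
    return result


# Hypothetical partial order: a is more important than b, and c than d.
exts = linear_extensions(["a", "b", "c", "d"], {("a", "b"), ("c", "d")})
a_before_c = sum(1 for e in exts if e.index("a") < e.index("c"))
```

For this partial order there are six linear extensions, and a precedes c in exactly half of them, so a pair of tuples on which only a and c disagree would receive weight 0.5 in each direction.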

V. Ranking

The graph-based preference model described in Section IV may be used to obtain a ranking (a total order) of items in a set of items. This may be done in any of numerous ways. One approach described in Section V.A obtains a ranking based on authority-based ranking algorithms. Another approach described in Section V.B is a probabilistic algorithm based on inducing a set of complete directed graphs called tournaments from the graph-based preference model and computing a ranking for at least one tournament from the set.

A. Importance Flow Ranking

A total order of items (or, equivalently, tuples representing these items) may be obtained by estimating an importance measure for each tuple using the preference weights encoded by the weighted preference graph. Techniques related to the PageRank importance flow model may be used to compute such importance measures. Under the PageRank model, scores may be assigned to Web pages based on the frequency with which they are visited by a random surfer. Pages are then ranked according to these scores. Intuitively, pages pointed to by many important pages are also important.

The PageRank importance flow model lends itself naturally to problems that require computing a ranking based on binary relationships among items. In the context of preferences, the model may be applied based on the notion that an item may be important if it is preferred over many other important items.

Let G = (V, E) be a dominance graph (i.e., a directed graph in which an edge e_{i,j} means v_i ≻ v_j), and let L_i and U_i be the sets of nodes dominated by and dominating v_i, respectively. Let α be a real number called a damping factor. The PageRank algorithm, as known in the art, computes the PageRank score of node v_i, denoted γ_i, according to:

$$\gamma_i = \frac{\alpha}{|V|} + (1-\alpha) \sum_{v_j \in L_i} \frac{\gamma_j}{|U_j|} \qquad (2)$$

The PageRank score of a node v is determined by summing PageRank scores of all nodes v' dominated by v, normalized by the number of nodes dominating v'. It is well known that when Σ_{v_i ∈ V} γ_i = 1, Equation 2 corresponds to a stationary distribution of a Markov chain, and that a unique stationary distribution exists if the chain is irreducible (i.e., the dominance graph is strongly connected) and aperiodic. Nodes that have no incoming edges (i.e., nodes that are not dominated by any other nodes) lead to sinks in the Markov chain, which makes the chain reducible. This problem may be handled by adding self loops at sink nodes, or (uniform) transitions from sink states to all other states in the Markov chain. The damping factor α captures the requirement that each node is reachable from every other node. The value of α is the probability that we stop following the graph edges, and start the Markov chain from a new random node. This may help to avoid getting trapped in cycles between nodes that have no edges to the rest of the graph.

Accordingly, in some embodiments a pagerank-based algorithm may be used to calculate a total order of items from the weighted preference graph. Herein, a pagerank-based algorithm refers to any algorithm based on calculating a value from a graph based on characteristics of a Markov chain defined with respect to the graph. Note that a difference between the above-described weighted preference graph and the graphs to which the PageRank algorithm is conventionally applied is that the weighted preference graph has preference weights associated with edges. The preference weights bias the probability of transition (flow) from one state to another, according to weight value, in contrast to the conventional case in which transitions are uniformly defined.

A pagerank-based algorithm may proceed as follows. Given a starting tuple t_0 (node) in the weighted preference graph, assume a random surfer that jumps to a next tuple t_1 among the set of tuples dominating t_0, biased by the edge weights. Intuitively, this corresponds to a process where a tuple is constantly replaced by a more desired tuple (with respect to given preferences). Note that visiting tuples takes place in the opposite direction of edges (jumps are from a dominated tuple to a dominating tuple). Hence, it follows that tuples that are visited more frequently, according to this process, are more likely to be desirable than tuples that are visited less frequently. Ranking tuples based on their visit frequency (pagerank-based scores) defines an ordering that corresponds to their global desirability.

The weighted preference graph may be represented using a square matrix M, where each tuple may correspond to one row and one column in M. Let E_j be the set of incoming edges to tuple t_j in the weighted preference graph. The entry M[i, j] may be computed as follows:

$$M[i,j] = \begin{cases} \dfrac{w_{i,j}}{\sum_{e_{k,j} \in E_j} w_{k,j}} & \text{if } e_{i,j} \in E_j \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Hence, the sum of all entries in each column in M is 1.0 unless the tuple corresponding to that column has no incoming edges. Matrices in which all the entries are nonnegative and the sum of the entries in every column is 1.0 are called column stochastic matrices. A stochastic matrix defines a Markov chain whose stationary distribution is the set of importance measures we need for ranking. In order to maintain the irreducibility of the chain, we need to eliminate sinks (nodes with no incoming edges in the preference graph). We handle the problem of sinks by adding a self loop, with weight 1.0, at each sink node.

Let Γ be the pagerank scores vector. Then, based on the previous matrix representation, the pagerank scores are given by solving the equation Γ = M · Γ, which is the same as finding the eigenvector of M corresponding to eigenvalue 1. The solution that has been used in practice for computing pagerank scores is the iterative power method, where Γ is computed by first choosing an initial vector Γ^0, and then producing a next vector Γ^1 = M · Γ^0. The process is repeated to generate a vector Γ^T, at iteration T, using the vector Γ^{T-1}, generated at iteration T - 1. For convergence, at each iteration T, entries in Γ^T are normalized so that they sum to 1.0. In practice, the number of iterations needed for the power method to converge may be any suitable number of iterations. For instance, tens or hundreds of iterations may be used.

FIG. 10 illustrates the pagerank matrix for the weighted preference graph with prioritized comparators illustrated in FIG. 9. Note that t_4 is a sink node with no incoming edges (i.e., t_4 has no other dominating tuples). Hence, we add a self loop with weight 1.0 to t_4, represented by the matrix entry M[4, 4]. A typical value of the damping factor α may be a value such as 0.15, but may be any value between 0 and 0.5.
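As a non-limiting illustration, the power method over the weighted matrix M may be sketched in Python as follows. The function name, the zero-based tuple indices, and the three-tuple example graph are hypothetical. The way the damping factor is combined with M at each iteration is an assumption modelled on Equation 2 and is not prescribed by the description.

```python
from typing import Dict, List, Tuple


def pagerank_scores(n: int, weights: Dict[Tuple[int, int], float],
                    alpha: float = 0.15, iterations: int = 100) -> List[float]:
    """weights[(i, j)] is w_ij for edge e_ij, meaning tuple i is preferred over tuple j."""
    # Column j spreads the score of tuple j over the tuples that dominate it.
    m = [[0.0] * n for _ in range(n)]
    for j in range(n):
        incoming = {i: w for (i, jj), w in weights.items() if jj == j and w > 0}
        total = sum(incoming.values())
        if total == 0:
            m[j][j] = 1.0  # sink (no dominating tuple): self loop with weight 1.0
        else:
            for i, w in incoming.items():
                m[i][j] = w / total
    gamma = [1.0 / n] * n  # initial vector
    for _ in range(iterations):
        nxt = [alpha / n + (1 - alpha) * sum(m[i][j] * gamma[j] for j in range(n))
               for i in range(n)]
        norm = sum(nxt)
        gamma = [v / norm for v in nxt]  # normalize so that entries sum to 1.0
    return gamma


# Hypothetical graph: tuple 0 is preferred over 1 and 2, and tuple 1 over 2.
scores = pagerank_scores(3, {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0})
ranking = sorted(range(3), key=lambda t: -scores[t])
```

In this example tuple 0 is never dominated, so it acts as a sink with a self loop and accumulates the highest score.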

B. Probabilistic Ranking

A total order of items (or top-ranked items) may be obtained from a complete directed graph derived from the preference model. Computing a total order of items from a complete directed graph (also known as a tournament) is termed finding a tournament solution. This problem may be stated as follows. Given an irreflexive, asymmetric, and complete binary relation over a set, find the set of maximal elements of this set. Example methods for finding tournament solutions are computing Kendall scores, and finding a Condorcet winner.

It should be appreciated, however, that the preference graph described in Section IV is not necessarily a tournament. In particular, the preference graph may be symmetric and incomplete:

• Symmetry: both edges e_{i,j} and e_{j,i} may exist in the preference graph.

• Incompleteness: both edges e_{i,j} and e_{j,i} may be missing from the preference graph.

The symmetry problem implies that some pairwise preferences may go either way with possibly different weights, while incompleteness implies that some pairwise preferences may be unknown.

In some embodiments, a probabilistic approach to obtaining a ranking from the preference graph may be used. Such an approach may rely on deriving one or more tournaments from the preference graph. Each tournament may be associated with a probability. As such, a weighted preference graph may be viewed as a compact representation of a space of possible tournaments, wherein each tournament is obtained by repairing the preference graph to obtain an asymmetric and complete digraph. In order to construct a tournament, two repair operations may be applied to the preference graph:

• Remove an edge. Applying this operation eliminates a 2-length cycle by removing one of the involved edges.

• Add an edge. Applying this operation augments the graph by adding a missing edge.

As discussed earlier, the value of the weight w_{i,j} represents the probability of selecting a POrder, among the set of all POrders relevant to (t_i, t_j), under which t_i ≻ t_j. We thus interpret w_{i,j} as the probability with which tuple t_i is preferred to tuple t_j. We further assume the independence of w_{i,j} values of different tuple pairs. For each tuple pair (t_i, t_j), if both w_{i,j} > 0 and w_{j,i} > 0 (i.e., t_i and t_j are involved in a 2-length cycle), the operation remove edge removes the edge e_{j,i} with probability w_{i,j}, and removes the edge e_{i,j} otherwise. Alternatively, if w_{i,j} = 0 and w_{j,i} = 0 (i.e., t_i and t_j are disconnected vertices), the operation add edge adds one of the edges e_{i,j} or e_{j,i} with the same probability 0.5.

Based on the probabilistic process described above, repairing the weighted preference graph generates a tournament (irreflexive, asymmetric, and complete digraph) whose probability is given by the product of the probabilities of all remaining graph edges. Let c be the number of 2-length cycles in the preference graph, and d be the number of disconnected tuple pairs. Then, the number of possible tournaments is 2^{c+d}.
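As a non-limiting illustration, the space of possible tournaments may be enumerated for a small graph along the following lines in Python. The function name and the weighted graph are hypothetical: the graph has two 2-length cycles and one disconnected pair (c = 2, d = 1), and its single-direction edges are assumptions of this sketch.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[int, int]  # (preferred tuple, other tuple)


def possible_tournaments(nodes: List[int],
                         w: Dict[Edge, float]) -> Iterator[Tuple[List[Edge], float]]:
    """Yield (edges, probability) for every repair of the weighted preference graph."""
    choices = []
    for a, b in combinations(nodes, 2):
        w_ab, w_ba = w.get((a, b), 0.0), w.get((b, a), 0.0)
        if w_ab > 0 and w_ba > 0:
            # 2-length cycle: keep e_ab with probability w_ab, else keep e_ba
            choices.append([((a, b), w_ab), ((b, a), w_ba)])
        elif w_ab == 0 and w_ba == 0:
            # disconnected pair: add either edge with probability 0.5
            choices.append([((a, b), 0.5), ((b, a), 0.5)])
        else:
            # single existing edge is kept as is
            choices.append([((a, b), w_ab) if w_ab > 0 else ((b, a), w_ba)])
    for combo in product(*choices):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [edge for edge, _ in combo], prob


# Hypothetical weighted preference graph on four tuples.
nodes = [1, 2, 3, 4]
w = {(1, 2): 0.7, (2, 1): 0.3, (3, 2): 0.6, (2, 3): 0.4,
     (1, 4): 1.0, (3, 1): 1.0, (3, 4): 1.0}
tournaments = list(possible_tournaments(nodes, w))
```

For this graph the enumeration yields 2^{2+1} = 8 tournaments whose probabilities sum to 1, which is practical only for small graphs because the count grows exponentially in c + d.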

FIG. 11 illustrates a weighted preference graph, and the corresponding set of possible tournaments {T_1, ..., T_8}. The illustrated preference graph has two 2-length cycles (t_1 ↔ t_2 and t_2 ↔ t_3) and one pair of disconnected tuples (t_2, t_4), and hence the number of possible tournaments is 8. The probability of each tournament is given by the product of the probabilities associated with its edges. For example, the probability of T_1 is 0.09, which is the product of 0.3, 0.6, and 0.5 representing w_{2,1}, w_{3,2}, and w_{4,2}, respectively.

Given a tournament T and a total order of tuples O, we say that O violates T, with respect to the relative order of (t_i, t_j), if t_i ≻ t_j under O, while t_j ≻ t_i under T. The problem of computing a total order of tuples with a minimum number of violations to a tournament is known to be NP-hard. Multiple heuristics have been proposed to compute a total order from a tournament. We focus on using the Kendall score for computing a total order. The Kendall score of tuple t is the number of tuples dominated by t according to the tournament. The space of possible tournaments allows computing a total order of tuples under any of numerous probabilistic ranking measures. Two specific measures are described below.

• Most probable tournament ranking. Compute a total order of tuples based on the tournament with the highest probability.

• Expected ranking. Compute a total order of tuples based on the expected ranking in the space of all the possible tournaments.

Finding the most probable tournament is done by maintaining the edge with the higher weight for each 2-length cycle in the preference graph, and adding an arbitrary edge for each pair of disconnected tuples. According to this method, there may be multiple tournaments with the highest probability among all possible tournaments. The computed total order under any of these tournaments is the required ranking. In the illustrative example of FIG. 11, tournaments T_2 and T_6 are the most probable tournaments, each with probability 0.21. A total order of tuples in T_2 using Kendall scores is ⟨…⟩, while a total order of tuples in T_6 is ⟨…⟩. Let n be the number of tuples in the preference graph. The complexity of the algorithm is O(n²), since we need to visit all edges of the preference graph.
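As a non-limiting illustration, most probable tournament ranking with Kendall scores may be sketched in Python as follows. The function names and the weighted graph are hypothetical, and the arbitrary choice made for a disconnected pair (the lower-numbered tuple is preferred) is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (preferred tuple, other tuple)


def most_probable_tournament(nodes: List[int], w: Dict[Edge, float]) -> List[Edge]:
    """Keep the heavier edge of each pair; pick arbitrarily for disconnected pairs."""
    edges = []
    for a, b in combinations(nodes, 2):
        w_ab, w_ba = w.get((a, b), 0.0), w.get((b, a), 0.0)
        if w_ab == 0 and w_ba == 0:
            edges.append((a, b))  # arbitrary edge for a disconnected pair
        elif w_ab >= w_ba:
            edges.append((a, b))
        else:
            edges.append((b, a))
    return edges


def kendall_ranking(nodes: List[int], edges: List[Edge]) -> List[int]:
    """Order tuples by the number of tuples they dominate in the tournament."""
    score = {t: 0 for t in nodes}
    for winner, _ in edges:
        score[winner] += 1
    return sorted(nodes, key=lambda t: -score[t])


# Hypothetical weighted preference graph on four tuples.
nodes = [1, 2, 3, 4]
w = {(1, 2): 0.7, (2, 1): 0.3, (3, 2): 0.6, (2, 3): 0.4,
     (1, 4): 1.0, (3, 1): 1.0, (3, 4): 1.0}
best = most_probable_tournament(nodes, w)
ranking = kendall_ranking(nodes, best)
```

Each pair of tuples is visited once, which is consistent with the quadratic cost noted above.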

Finding the expected ranking may be done by computing the expected Kendall score for each tuple using the space of possible tournaments. We model the score of tuple t_i as a random variable s_i whose distribution is given by the space of possible tournaments. In the illustrative example of FIG. 11, t_1 dominates one tuple in {T_1, T_3, T_5, T_7} with probability summation 0.3, and dominates two tuples in {T_2, T_4, T_6, T_8} with probability summation 0.7. Hence, the random variable s_1 may take the value 1 with probability 0.3, and the value 2 with probability 0.7. The expected value of s_1 is thus 1*0.3+2*0.7=1.7.
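As a non-limiting illustration, the distribution of a Kendall score and a sampling-based estimate of its expected value may be sketched in Python as follows. The function names, the weighted graph, the sample count, and the random seed are hypothetical. The exact distribution is computed pair by pair, which relies on the independence assumption stated for different tuple pairs. The sampler draws each dominated tuple independently with probability w_{i,j}.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (preferred tuple, other tuple)


def score_distribution(t: int, nodes: List[int], w: Dict[Edge, float]) -> Dict[int, float]:
    """Exact distribution of the Kendall score of t over the possible tournaments."""
    win_probs = []
    for u in nodes:
        if u == t:
            continue
        w_tu, w_ut = w.get((t, u), 0.0), w.get((u, t), 0.0)
        win_probs.append(0.5 if w_tu == 0 and w_ut == 0 else w_tu)
    dist: Dict[int, float] = {}
    for outcome in product([0, 1], repeat=len(win_probs)):
        p = 1.0
        for won, q in zip(outcome, win_probs):
            p *= q if won else 1 - q
        if p > 0:
            dist[sum(outcome)] = dist.get(sum(outcome), 0.0) + p
    return dist


def sample_expected_scores(nodes: List[int], w: Dict[Edge, float],
                           m: int = 2000, seed: int = 0) -> Dict[int, float]:
    """Estimate expected scores as the mean of m independent score samples."""
    rng = random.Random(seed)
    estimates = {}
    for t_i in nodes:
        dominated = [p for (a, _), p in w.items() if a == t_i and p > 0]
        total = 0
        for _ in range(m):
            total += sum(1 for p in dominated if rng.random() < p)
        estimates[t_i] = total / m
    return estimates


# Hypothetical weighted preference graph on four tuples.
nodes = [1, 2, 3, 4]
w = {(1, 2): 0.7, (2, 1): 0.3, (3, 2): 0.6, (2, 3): 0.4,
     (1, 4): 1.0, (3, 1): 1.0, (3, 4): 1.0}
dist_1 = score_distribution(1, nodes, w)
expected_1 = sum(score * p for score, p in dist_1.items())
estimates = sample_expected_scores(nodes, w)
```

For tuple 1 in this graph the score is 1 with probability 0.3 and 2 with probability 0.7, giving an expected value of 1.7, and the sampled estimate approaches the same value as the number of samples grows.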

Computing the exact expected score of each tuple requires materializing the space of possible tournaments, which is infeasible due to the exponential number of possible tournaments. We thus propose a sampling-based algorithm to approximate the expected value of s_i of each tuple t_i, and then rank tuples based on their estimated expected scores. Let L(t_i) be the set of tuples dominated by t_i in the weighted preference graph. For a tuple t_i, a sample Z is generated by adding each t_j ∈ L(t_i) to Z with probability w_{i,j}. All samples may be generated independently. Hence, a score sample from the distribution of s_i is given by |Z|. The expected value of s_i is estimated as the mean of the generated score samples. It is well known that the sample mean, computed from a sufficiently large set of independent samples, is an unbiased estimate of the true distribution mean. Let n be the number of tuples in the preference graph, and m be the number of drawn samples for each tuple. The complexity of the algorithm is O((nm)²), since we access the dominated set of each tuple m times to generate m score samples.

VI. Interactive Preference Specification

A data exploration system may help a user to specify preferences. In some embodiments, preferences may be specified interactively. A system may interact with a user through a series of prompts, displays, and/or indications of the type of input a user may provide the system. The system may provide the user with information that may assist the user in specifying preferences. A data exploration system may assist a user to query the system. To this end, the data exploration system may assist the user to specify preferences and may output query results, to the user, ranked in accordance with the specified preferences.

FIG. 12 shows a flowchart of an illustrative process 1200 for assisting a user to query a data exploration system. Process 1200 may be used to assist a user in specifying user preferences in conjunction with a query, and may assist a user in specifying preferences associated with attributes related to one or more keywords in a query.

Process 1200 begins in act 1202 when a user query may be inputted. The inputted query may be any suitable query and may be a text query. The inputted query may be a multimedia query, for example, received through an audio input device that may be translated into text using any appropriate speech-recognition/speech-to-text software. The inputted query may comprise one or more keywords. The query may be, for example, a query for an item to purchase and/or may be a query for an item comprising information desired by a user. For instance, the query may be a query containing the keyword "car" and may indicate that a user may be interested in looking at items related to cars. As another example, the user may input a query "television" into an Internet search engine, which may indicate that a user may be interested in looking at any webpages containing information about television. Though a query may be any suitable query, as known in the art.

In response to receiving a user query, one or more attributes related to the query may be identified, in act 1204 of process 1200. Attributes may be related to one or more keywords contained in the query. For instance, attributes may be a characteristic of a keyword in the query. Attributes may be of any suitable type. For instance, attributes may be categorical attributes or numerical attributes. For instance, if a query for a "car" were inputted in act 1202, then attributes related to car may be the attributes "Make," "Color," "Price," and any other attributes of car such as the attributes illustrated in FIG. 2. Attributes related to one or more keywords contained in a query may be identified in any suitable way as known in the art. They may be identified automatically by a computer or may be manually specified.

Regardless of the way in which attributes are identified in act 1204, a user may be presented with these attributes, in act 1206. The user may be shown these attributes visually using a display screen that contains these attributes. The display screen may be any suitable screen containing a representation of the attributes, such as a text representation of the attributes. The user may be prompted to select one or more of the presented attributes such that the system may assist the user to specify preferences associated with the selected attributes. For instance, a user may be presented with a list of previously-mentioned attributes associated with the keyword "car" and may select the attributes "Price" and "Color." In act 1208, attributes selected by the user may be received.

In response to receiving the selected attributes, the user may be prompted to specify first-order preferences associated with one or more selected attributes, in act 1210. For each attribute, the user may specify a first-order preference of any suitable type. For instance, the user may specify score-based preferences, partial order preferences, skyline preferences, and/or conjoint analysis preferences as discussed with reference to Section II. The user may be assisted in specifying any of the above-mentioned first-order preferences in any of numerous ways. In some embodiments, a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the first-order preferences (e.g., by drawing preferences). In some embodiments, the user may be provided with a series of prompts designed to obtain information required to specify first-order preferences.

In response to receiving first-order preferences, the user may be prompted to specify a second-order preference among the received first-order preferences, in act 1212. The user may specify a second-order preference of any suitable type. For instance, the user may specify prioritized preference composition preferences, partial order preferences, pairwise preferences, and/or Pareto preference composition preferences as discussed with reference to Section III. Similar to the case of first-order preferences, a user may be assisted in specifying any of the above-mentioned second-order preferences in any of numerous ways. In some embodiments, a graphical user interface may be used. The graphical user interface may allow the user to graphically represent the second-order preferences. In some embodiments, the user may be provided with a series of prompts designed to obtain information required to specify second-order preferences. After first-order and second-order preferences have been specified, process 1200 completes.
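By way of non-limiting illustration, the sequence of acts 1202 through 1212 may be sketched in Python as follows. The names `PreferenceSession` and `run_process_1200`, the four callback parameters, and the use of plain strings to stand for preferences are hypothetical and are introduced here only for illustration; a real embodiment may obtain this information through a graphical user interface or a series of prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PreferenceSession:
    """Hypothetical container for what process 1200 collects."""
    query: str
    attributes: List[str] = field(default_factory=list)
    selected: List[str] = field(default_factory=list)
    first_order: Dict[str, str] = field(default_factory=dict)
    second_order: str = ""


def run_process_1200(
    query: str,
    identify_attributes: Callable[[str], List[str]],
    choose_attributes: Callable[[List[str]], List[str]],
    ask_first_order: Callable[[str], str],
    ask_second_order: Callable[[Dict[str, str]], str],
) -> PreferenceSession:
    # Act 1202: the query is received.
    session = PreferenceSession(query=query)
    # Act 1204: attributes related to the query are identified.
    session.attributes = identify_attributes(query)
    # Acts 1206 and 1208: attributes are presented and a selection is received.
    session.selected = choose_attributes(session.attributes)
    # Act 1210: one first-order preference is requested per selected attribute.
    for attribute in session.selected:
        session.first_order[attribute] = ask_first_order(attribute)
    # Act 1212: a second-order preference among those preferences is requested.
    session.second_order = ask_second_order(session.first_order)
    return session
```

In this sketch the callbacks isolate every user interaction, so the same control flow could be driven by a graphical interface, by textual prompts, or by scripted answers.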

The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code may be embodied as stored program instructions that may be executed on any suitable processor or collection of processors (e.g., a microprocessor or microprocessors), whether provided in a single computer or distributed among multiple computers.

It should be appreciated that a computer may be embodied in any of numerous forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embodied in a device not generally regarded as a computer, but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, a tablet, a reader, or any other suitable portable or fixed electronic device.

Also, a computer may have one or more input and output devices. These devices may be used, among other things, to present a user interface. Examples of output devices that may be used to provide a user interface include printers or display screens for visual presentation of output, and speakers or other sound generating devices for audible presentation of output. Examples of input devices that may be used for a user interface include keyboards, microphones, and pointing devices, such as mice, touch pads, and digitizing tablets.

Such computers may be interconnected by one or more networks in any suitable form, including networks such as a local area network (LAN) or a wide area network (WAN), such as an enterprise network, an intelligent network (IN) or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks, and/or fiber optic networks.

A computer system that may be used in connection with any of the embodiments described herein is shown in FIG. 13. FIG. 13 shows, schematically, an illustrative computer 1300 on which various inventive aspects of the present disclosure may be implemented. The computer 1300 includes a processor or processing unit 1301 and a memory 1302 that may include volatile and/or non-volatile memory. The computer 1300 may also include storage 1305 (e.g., one or more disk drives) in addition to the system memory 1302. The memory 1302 and/or storage 1305 may store one or more computer-executable instructions to program the processing unit 1301 to perform any of the functions described herein. The storage 1305 may optionally also store one or more data sets as needed. References herein to a computer can include any device having a programmed processor, including a rack-mounted computer, a desktop computer, a laptop computer, a tablet computer or any of numerous devices that may not generally be regarded as a computer, which include a programmed processor.

The exemplary computer 1300 may have one or more input devices and/or output devices, such as devices 1306 and 1307 illustrated in FIG. 13. These devices may be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.

As shown in FIG. 13, the computer 1300 may also comprise one or more network interfaces (e.g., the network interface 1310) to enable communication via various networks (e.g., the network 1320). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.

Thus, in an embodiment, there is provided a method for querying a data exploration system managing a plurality of items, the method comprising: querying the data exploration system with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items, and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences; calculating, with a processor, a ranking of an item in the plurality of items based at least in part on a data structure encoding a preference graph that represents the plurality of first-order user preferences and the second-order user preference; and outputting at least a subset of the plurality of items to the user, in accordance with the ranking. In an embodiment, calculating the ranking comprises: applying a pagerank-based algorithm to the data structure encoding the preference graph to calculate the ranking.
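As a non-limiting illustration of a pagerank-based ranking over a preference graph, the following Python sketch runs a standard power iteration. It is a minimal sketch under stated assumptions: edges are taken to point toward the preferred item, and the damping factor of 0.85, the fixed iteration count, the uniform redistribution of score from nodes without outgoing edges, and the function names `pagerank` and `rank_items` are all hypothetical choices made here for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (less preferred item, preferred item)


def pagerank(nodes: List[Hashable], edges: Dict[Edge, float],
             damping: float = 0.85, iterations: int = 100) -> Dict[Hashable, float]:
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    n = len(nodes)
    # Total outgoing weight per node, used to normalize edge weights.
    out_weight = {u: 0.0 for u in nodes}
    for (u, _v), w in edges.items():
        out_weight[u] += w
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(score[u] for u in nodes if out_weight[u] == 0.0)
        nxt = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            if out_weight[u] > 0.0:
                nxt[v] += damping * score[u] * w / out_weight[u]
        score = nxt
    return score


def rank_items(nodes: List[Hashable], edges: Dict[Edge, float]) -> List[Hashable]:
    """Order items from highest to lowest PageRank score."""
    score = pagerank(nodes, edges)
    return sorted(nodes, key=lambda u: score[u], reverse=True)
```

Because score flows along edges toward preferred items, an item that is preferred over many others, or over other highly ranked items, tends to receive a higher score in this sketch.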

In another embodiment, the preference graph comprises a plurality of nodes, wherein each node represents an item, and calculating the ranking comprises: calculating a pagerank score of a node in the plurality of nodes.

In another embodiment, calculating the ranking comprises: computing a total order of nodes in a complete directed graph derived from the preference graph, wherein each node represents an item.

In another embodiment, computing the total order comprises calculating a Kendall score for a node in the complete directed graph.
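As a non-limiting illustration, a total order based on Kendall scores may be sketched as follows. This sketch assumes the tournament-ranking sense of the term, in which the Kendall score of a node is the number of other nodes over which it is preferred; a weighted variant, and the manner in which the complete directed graph is derived from the preference graph, are not shown. Edges are assumed to point toward the preferred item, ties are broken by node name, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (less preferred item, preferred item)


def kendall_scores(nodes: List[str], edges: Set[Edge]) -> Dict[str, int]:
    """Count, for each node, how many other nodes it is preferred over."""
    score = {u: 0 for u in nodes}
    for _loser, winner in edges:
        score[winner] += 1
    return score


def total_order(nodes: List[str], edges: Set[Edge]) -> List[str]:
    """Order nodes by descending Kendall score in a complete directed graph."""
    # A complete directed graph has exactly one edge between every pair.
    for u, v in combinations(nodes, 2):
        if ((u, v) in edges) == ((v, u) in edges):
            raise ValueError(f"pair ({u}, {v}) needs exactly one directed edge")
    score = kendall_scores(nodes, edges)
    # Ties are broken alphabetically (an assumption made for determinism).
    return sorted(nodes, key=lambda u: (-score[u], u))
```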

In another embodiment, the preference graph comprises: a plurality of nodes, wherein each node corresponds to an item in the plurality of items; and a plurality of edges, wherein each edge corresponds to a first-order preference in the plurality of first-order preferences, the first-order preference indicating a user preference for one of the two items represented by nodes terminating the edge.

In another embodiment, each edge is a directed edge, directed to a node associated with a preferred item as indicated by the corresponding first-order preference.

In another embodiment, a weight is associated to an edge between a first node and a second node in the preference graph, the weight being indicative of a degree of preference for the first node over the second node.
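As a non-limiting illustration, a data structure encoding such a preference graph may be sketched in Python as follows. The class name `PreferenceGraph`, its methods, and the default weight of 1.0 are hypothetical. The sketch stores each edge as a pair directed from the less preferred item to the preferred item, with the weight indicating the degree of preference for the preferred item.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set, Tuple


@dataclass
class PreferenceGraph:
    """Hypothetical encoding: nodes are items, edges point to the preferred item."""
    nodes: Set[Hashable] = field(default_factory=set)
    # Maps (less preferred item, preferred item) to a degree of preference.
    weights: Dict[Tuple[Hashable, Hashable], float] = field(default_factory=dict)

    def add_preference(self, preferred: Hashable, other: Hashable,
                       weight: float = 1.0) -> None:
        """Record that `preferred` is preferred over `other` with the given weight."""
        if preferred == other:
            raise ValueError("an item cannot be preferred over itself")
        self.nodes.update((preferred, other))
        self.weights[(other, preferred)] = weight

    def preferred_over(self, item: Hashable) -> Set[Hashable]:
        """Return the items that are preferred over `item`."""
        return {dst for (src, dst) in self.weights if src == item}
```

A dictionary keyed by node pairs keeps the sketch short; an adjacency list or a matrix would serve equally well for larger graphs.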

In another embodiment, each item in the plurality of items is represented as a tuple, the tuple comprising a plurality of attributes of the item.

In another aspect, there is provided a computer-readable storage medium article storing a data structure encoding a preference graph and a plurality of processor-executable instructions that when executed by a processor, cause the processor to perform the acts of: receiving a plurality of first-order user preferences indicative of user preferences among a plurality of items; receiving a second-order user preference indicative of user preferences among the first-order preferences in the plurality of first-order user preferences; computing a weight for an edge of the preference graph based on the plurality of first-order user preferences and the second-order user preference, wherein: the edge connects a first node associated with a first item and a second node associated with a second item, and the weight is indicative of a degree of preference for the first item over the second item; and outputting at least two of the plurality of items according to the preference graph.

In an embodiment, the preference graph comprises a node for each item in the plurality of items and an edge for every pair of nodes associated with items related by a first-order preference in the plurality of first-order preferences.

In another embodiment, computing the weight comprises: computing a first number of first-order user preferences in the plurality of first-order user preferences indicating a user's preference for the first item relative to the second item; computing a second number of all first-order user preferences in the plurality of first-order user preferences indicating any preference associated with the first item; and setting the weight based on the first number divided by the second number.
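As a non-limiting illustration, this weight computation may be sketched in Python as follows. The sketch assumes that each first-order preference is represented as a set of (preferred item, other item) pairs, that a preference is "associated with" the first item when that item appears on either side of any pair, and that the weight is zero when no preference involves the first item. These representation choices and the function name `edge_weight` are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

# One first-order preference, as a set of (preferred item, other item) pairs.
Relation = Set[Tuple[Hashable, Hashable]]


def edge_weight(preferences: List[Relation],
                first: Hashable, second: Hashable) -> float:
    """Weight for the preference of `first` over `second`."""
    # First number: preferences that rank `first` above `second`.
    n_first_over_second = sum(1 for rel in preferences if (first, second) in rel)
    # Second number: preferences that mention `first` at all.
    n_involving_first = sum(
        1 for rel in preferences if any(first in pair for pair in rel)
    )
    if n_involving_first == 0:
        return 0.0  # assumption: no evidence yields zero weight
    return n_first_over_second / n_involving_first
```

For example, if three first-order preferences mention item "a" and two of them prefer "a" to "b", the sketch yields a weight of 2/3 for the preference of "a" over "b".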

In another embodiment, receiving the plurality of first-order user preferences comprises receiving a first-order preference from a user. In another embodiment, each item in the plurality of data items is represented as a tuple, the tuple comprising values of a plurality of attributes; and each first-order user preference in the plurality of first-order user preferences indicates a user preference of one item over another item based at least in part on a value of an attribute of a first tuple, representing the one item, and a value of an attribute of a second tuple representing the other item. In another embodiment, the plurality of first-order user preferences comprises at least two types of first-order preferences selected from the group comprising score-based preferences, partial order preferences, skyline preferences, and conjoint analysis preferences.

In another embodiment, the second-order user preference comprises a plurality of second-order user preference relations that comprises at least two types of second-order preferences selected from the group comprising prioritized preference composition, partial order preferences, pairwise preferences, and Pareto preference composition.

In another aspect, there is provided a database system comprising: a memory configured to store a plurality of tuples, a data structure encoding a preference graph to represent user preferences, wherein the user preferences comprise a plurality of first-order preferences representing user preferences among tuples and a second-order user preference representing user preferences among first-order preferences in the plurality of first-order preferences; and a processor configured to access contents of the memory and compute a ranking of a tuple in the plurality of tuples based on the data structure encoding the preference graph.

In another aspect, there is provided a system for interactive preference management, the system comprising: a memory configured to store a plurality of tuples, each tuple comprising a value for at least one of a plurality of attributes; at least one processor configured to receive a range of values for an attribute in the plurality of attributes from a user, and output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.
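As a non-limiting illustration, the count output by such a system may be sketched in Python as follows. The sketch assumes that each tuple is represented as a dictionary from attribute names to values, that the range is inclusive at both ends, and that tuples lacking the attribute are not counted. The function name `count_in_range` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def count_in_range(tuples: List[Dict[str, Any]], attribute: str,
                   low: float, high: float) -> int:
    """Count tuples whose value for `attribute` lies in [low, high]."""
    return sum(
        1 for t in tuples
        if attribute in t and low <= t[attribute] <= high
    )


# Hypothetical sample data: one tuple has no "Price" value.
cars = [
    {"Make": "Honda", "Price": 18000},
    {"Make": "BMW", "Price": 45000},
    {"Make": "Kia", "Price": 12000},
    {"Make": "Ford"},
]
matching = count_in_range(cars, "Price", 10000, 20000)
```

Reporting the count before a preference is committed may help a user judge whether a candidate range is too narrow or too broad for the stored data.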

In another aspect, there is provided a computer-implemented method for interactive preference management, the method comprising: receiving, with a processor, a query from a user, the query comprising a keyword; prompting the user to provide a plurality of first-order preferences associated with one or more attributes related to the keyword; and in response to receiving the plurality of first-order preferences, prompting the user to provide a second-order preference among the first-order preferences in the plurality of first-order preferences. In an embodiment, prompting the user to provide a plurality of first-order preferences comprises: presenting a list of attributes related to the keyword to the user; receiving a selection of attributes in the list of attributes from the user; and prompting the user to specify a first-order preference associated with the selected attribute.

The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of numerous suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a virtual machine or a suitable framework. In this respect, various inventive concepts may be embodied as at least one non-transitory computer-readable storage medium (e.g., a computer memory, one or more floppy discs, computer discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, etc.) article(s) encoded with one or more programs that, when executed on one or more computers or other processors, implement the various process embodiments of the present invention. The non-transitory computer-readable medium or media may be transportable, such that the programs stored thereon may be loaded onto any suitable computer resource to implement various aspects of embodiments as discussed above.

The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that, when executed, perform methods of one or more embodiments need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of one or more embodiments. Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, items, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. Also, data structures may be stored in non-transitory computer-readable storage media articles in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of the data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.

Also, various inventive concepts may be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments, or vice versa.

All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. The indefinite articles "a" and "an," as used herein, unless clearly indicated to the contrary, should be understood to mean "at least one."

As used herein, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

The phrase "and/or," as used herein, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., "one or more" of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

As used herein, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items.

Having described several embodiments of the invention in detail, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The invention is limited only as defined by the following claims and the equivalents thereto.

Claims

1. A method for querying a data exploration system managing a plurality of items, the method comprising: querying the data exploration system with a query comprising a plurality of first-order user preferences indicative of a user's preferences among items in the plurality of items, and a second-order user preference indicative of the user's preferences among first-order user preferences in the plurality of first-order user preferences; calculating, with a processor, a ranking of an item in the plurality of items based at least in part on a data structure encoding a preference graph that represents the plurality of first-order user preferences and the second-order user preference; and outputting at least a subset of the plurality of items to the user in accordance with the ranking.

2. The method of claim 1, wherein calculating the ranking comprises: applying a pagerank-based algorithm to the data structure encoding the preference graph to calculate the ranking.

3. The method of claim 2, wherein the preference graph comprises a plurality of nodes, wherein each node represents an item, and calculating the ranking comprises: calculating a pagerank score of a node in the plurality of nodes.

4. The method of claim 1, wherein calculating the ranking comprises: computing a total order of nodes in a complete directed graph derived from the preference graph, wherein each node represents an item.

5. The method of claim 4, wherein computing the total order comprises calculating a Kendall score for a node in the complete directed graph.

6. The method of claim 1, wherein the preference graph comprises: a plurality of nodes, wherein each node corresponds to an item in the plurality of items; and a plurality of edges, wherein each edge corresponds to a first-order preference in the plurality of first-order preferences, the first-order preference indicating a user preference for one of the two items represented by nodes terminating the edge.

7. The method of claim 6, wherein each edge is a directed edge, directed to a node associated with a preferred item as indicated by the corresponding first-order preference.

8. The method of claim 6, wherein a weight is associated to an edge between a first node and a second node in the preference graph, the weight being indicative of a degree of preference for the first node over the second node.

9. The method of claim 1, wherein each item in the plurality of items is represented as a tuple, the tuple comprising a plurality of attributes of the item.

10. A computer-readable storage medium article storing a data structure encoding a preference graph and a plurality of processor-executable instructions that when executed by a processor, cause the processor to perform the acts of: receiving a plurality of first-order user preferences indicative of user preferences among a plurality of items; receiving a second-order user preference indicative of user preferences among the first-order preferences in the plurality of first-order user preferences; computing a weight for an edge of the preference graph based on the plurality of first-order user preferences and the second-order user preference, wherein: the edge connects a first node associated with a first item and a second node associated with a second item, and the weight is indicative of a degree of preference for the first item over the second item; and outputting at least two of the plurality of items according to the preference graph.

11. The computer-readable storage medium of claim 10, wherein the preference graph comprises a node for each item in the plurality of items and an edge for every pair of nodes associated with items related by a first-order preference in the plurality of first-order preferences.

12. The computer-readable storage medium of claim 10, wherein computing the weight comprises: computing a first number of first-order user preferences in the plurality of first-order user preferences indicating a user's preference for the first item relative to the second item; computing a second number of all first-order user preferences in the plurality of first-order user preferences indicating any preference associated with the first item; and setting the weight based on the first number divided by the second number.

13. The computer-readable storage medium of claim 10, wherein receiving the plurality of first-order user preferences comprises receiving a first-order preference from a user.

14. The computer-readable storage medium of claim 10, wherein: each item in the plurality of data items is represented as a tuple, the tuple comprising values of a plurality of attributes; and each first-order user preference in the plurality of first-order user preferences indicates a user preference of one item over another item based at least in part on a value of an attribute of a first tuple, representing the one item, and a value of an attribute of a second tuple representing the other item.

15. The computer-readable storage medium of claim 10, wherein: the plurality of first-order user preferences comprises at least two types of first-order preferences selected from the group comprising score-based preferences, partial order preferences, skyline preferences, and conjoint analysis preferences.

16. The computer-readable storage medium of claim 10, wherein the second-order user preference comprises a plurality of second-order user preference relations that comprises at least two types of second-order preferences selected from the group comprising prioritized preference composition, partial order preferences, pairwise preferences, and Pareto preference composition.

17. A database system comprising: a memory configured to store a plurality of tuples, a data structure encoding a preference graph to represent user preferences, wherein the user preferences comprise a plurality of first-order preferences representing user preferences among tuples and a second-order user preference representing user preferences among first-order preferences in the plurality of first-order preferences; and a processor configured to access contents of the memory and compute a ranking of a tuple in the plurality of tuples based on the data structure encoding the preference graph.

18. A system for interactive preference management, the system comprising: a memory configured to store a plurality of tuples, each tuple comprising a value for at least one of a plurality of attributes; at least one processor configured to receive a range of values for an attribute in the plurality of attributes from a user, output an integer indicative of a number of tuples comprising a value for the attribute such that the value is in the range of values.

19. A computer-implemented method for interactive preference management, the method comprising: receiving, with a processor, a query from a user, the query comprising a keyword; prompting the user to provide a plurality of first-order preferences associated with one or more attributes related to the keyword; and in response to receiving the plurality of first-order preferences, prompting the user to provide a second-order preference among the first-order preferences in the plurality of first-order preferences.

20. The method of claim 19, wherein prompting the user to provide a plurality of first-order preferences comprises: presenting a list of attributes related to the keyword to the user; receiving a selection of attributes in the list of attributes from the user; and prompting the user to specify a first-order preference associated with the selected attribute.