WO2010009314A2

WO2010009314A2 - System and method of using automated collaborative filtering for decision-making in the presence of data imperfections

Info

Publication number: WO2010009314A2
Application number: PCT/US2009/050848
Authority: WO
Inventors: Thanuka L. Wickramarathne; Kamal Premaratne; Miroslav Kubat; Dushyantha T. Jayaweera
Original assignee: University Of Miami
Priority date: 2008-07-16
Filing date: 2009-07-16
Publication date: 2010-01-21
Also published as: WO2010009314A3

Abstract

A system and method are provided for performing automated collaborative filtering on data gathered from a plurality of sources. The data includes contextual information and is stored in a database that includes filled data slots and empty data slots. A prediction module communicates with a client terminal device, receives the data, and generates prediction data based on the contextual information, wherein the prediction data is provided to populate the empty data slots. The invention models a wide class of data imperfections, propagates partial knowledge throughout a decision-making process, incorporates background knowledge into the automated collaborative filtering and provides reliability information for system predictions.

Description

SYSTEM AND METHOD OF USING AUTOMATED COLLABORATIVE FILTERING FOR DECISION-MAKING IN THE PRESENCE OF DATA IMPERFECTIONS

FIELD OF THE INVENTION

The invention relates generally to data pattern analysis and, in particular, to a method and system for automatically accommodating data imperfections and making decisions using imperfect data without relying on simplifying assumptions.

BACKGROUND OF THE INVENTION

Automated Collaborative Filtering ("ACF") refers to a group of algorithms used in recommender systems implemented for various applications. An example of a typical application is an e-commerce system where customers rate items and receive automated recommendations based on detected similarity patterns. One of the problems encountered by conventional ACF algorithms is data imperfection, e.g., limited statistics, subjective judgment, etc. Existing techniques are rarely capable of dealing with imperfections in user-supplied ratings.

When such imperfections, e.g., ambiguities, cannot be avoided, designers typically resort to simplifying assumptions that impair the system's performance and utility. Conventional algorithms either ignore imperfect user ratings entirely or use an imputation mechanism to remove the imperfections, e.g., by filling in the missing entries. Neither strategy produces acceptable results, especially when a large percentage of the data is imperfect and/or little is known about the causes of the imperfections and the mechanism driving them. This is one reason that existing ACF algorithms have not been widely used in applications where data imperfections are commonplace and the decisions being made are of critical importance, such as medical/healthcare, homeland security and defense applications. Simplifying assumptions made in such applications may harm the reliability of the decisions. Existing technologies cannot handle data imperfections without making assumptions that are unrealistic and/or cannot be justified. Hence, the decisions and predictions made by these methods cannot be relied upon, especially in critical and sensitive applications. Therefore, what is needed is a method and system for automatically making decisions in the presence of imperfect data without the need to make simplifying assumptions.

SUMMARY OF THE INVENTION

Various aspects of the invention overcome at least some of these and other drawbacks of existing systems. The invention provides systems and methods for modeling various imperfections, propagating partial knowledge throughout the decision-making process, incorporating background knowledge into automated collaborative filtering, and providing predictions with associated reliability information.

According to one embodiment of the invention, an automated collaborative filtering device is provided that communicates with a client terminal device and receives data from a plurality of sources. The automated collaborative filtering device includes a storage module that stores data gathered from the plurality of sources, wherein the data includes contextual information and wherein the storage module has a database that includes filled data slots and empty data slots. A prediction module is provided that communicates with the storage module and the client terminal device, the prediction module being programmed to generate prediction data based on the contextual information, wherein the prediction data is provided to populate the empty data slots.

According to another embodiment, the invention provides a method of performing automated collaborative filtering that includes providing a database that includes filled data slots and empty data slots and storing data gathered from a plurality of sources into the database. Contextual information is obtained from the stored data and prediction data is generated based on the contextual information so that the empty data slots may be populated with the prediction data.

According to yet another embodiment, the invention provides an automated collaborative filtering device that communicates with a client terminal device and receives data from a plurality of sources. The automated collaborative filtering device includes a storage module that stores data gathered from the plurality of sources, wherein the data includes contextual information and wherein the storage module has a database that includes filled data slots and empty data slots. A probability rating module is provided that communicates with the storage module and the client terminal device, wherein the probability rating module is programmed to extract predefined values from the data and transform the predefined values into a probability of obtaining the predefined values. A prediction module is also provided that communicates with the probability rating module and is programmed to generate prediction data based on the contextual information and the probability of obtaining the predefined values, wherein the prediction data is provided to populate the empty data slots.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present invention, and the attendant advantages and features thereof, will be more readily understood by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:

FIG. 1 illustrates a system diagram according to one embodiment of the invention;

FIG. 2 illustrates a table containing user ratings of a provider's items;

FIG. 3 illustrates a table containing Dempster-Shafer ("DS") theoretic notations;

FIG. 4 illustrates a set of partial probability models of user character profiles;

FIG. 5 illustrates a table containing a basic probability assignment ("BPA") corresponding to a Movielens_Rating = 2 of a ±1 tolerance user;

FIG. 6 illustrates a graph of the variation of the mean absolute error over time for different values of a dispersion factor when applying the principles of the invention to an exemplary dataset without imperfections;

FIG. 7 illustrates a graph of the variation of the mean absolute error in relation to the neighborhood size when applying the principles of the invention to a dataset without imperfections;

FIG. 8 illustrates a graph of the variation of the mean absolute error in relation to the similarity threshold when applying the principles of the invention to a dataset without imperfections;

FIG. 9 illustrates a graph of the variation of the mean absolute error in relation to the neighborhood size when applying the principles of the invention to a dataset that includes imperfections;

FIG. 10 illustrates a graph of the variation of the mean absolute error in relation to the similarity threshold when applying the principles of the invention to a dataset that includes imperfections;

FIG. 11 illustrates a table containing exemplary predictions in accordance with the principles of the invention;

FIG. 12 illustrates a table containing performance comparison data between the invention and prior methods for making hard decisions on a dataset without imperfections;

FIG. 13 illustrates a table containing performance comparison data between the invention and prior methods for making soft predictions on a dataset without imperfections;

FIG. 14 illustrates a table containing performance comparison data between the invention and prior methods for a dataset that includes imperfections; and

FIG. 15 illustrates a graph of the variation of the new DS-theoretic measure DS-PEl in relation to the neighborhood size when applying the principles of the invention to a dataset that includes imperfections.

DETAILED DESCRIPTION OF THE INVENTION

Automated collaborative filtering ("ACF") is a technique for making recommendations from user-supplied data. In practice, that data is often imperfect: ratings may be ambiguous or uncertain, and data may be missing or incomplete. Current ACF systems cannot adequately handle such imperfections. It is therefore increasingly important to have effective strategies for modeling these imperfections and propagating them throughout the decision-making process, so that prediction tasks can be completed with high reliability.

While specific embodiments of the invention are discussed herein and are illustrated in the drawings appended hereto, the invention encompasses a broader spectrum than the specific subject matter described and illustrated. As would be appreciated by those skilled in the art, the embodiments described herein provide but a few examples of the broad scope of the invention. There is no intention to limit the scope of the invention only to the embodiments described.

The ACF may be implemented using computer networks. FIG. 1 illustrates an example of the system architecture 100 according to one embodiment of the invention. Client terminal devices 105a-105n (hereinafter identified collectively as 105) may be coupled to one or more automated collaborative filtering devices 110 via a wired network, a wireless network, a combination of the foregoing and/or other networks, such as a network 107. The client terminal devices 105 may include any number of different types of client terminal devices, such as personal computers, laptops, smart terminals, personal digital assistants (PDAs), cell phones, Web TV systems, video game consoles, and devices that combine the functionality of one or more of the foregoing or other client terminal devices. The client terminal devices 105 may include processors, RAM, USB interfaces, telephone interfaces, satellite interfaces, microphones, speakers, a stylus, a computer mouse, a wide area network interface, a local area network interface, hard disks, wireless communication interfaces, DVD/CD reader/burners, a keyboard, a flat touch-screen display, and a display, among other components. The client terminal devices 105 may communicate with the automated collaborative filtering device 110, other client terminal devices 105 and/or other systems.

Users may access the client terminal devices 105 to communicate with selected sources, including other client terminal devices 105 and the automated collaborative filtering device 110. Data requests that originate from the client terminal devices 105 may be broadcast to selected sources substantially in real-time if the client terminal devices 105 are coupled to the network 107. The automated collaborative filtering device 110 may include any number of different types of automated collaborative filtering devices, such as servers, personal computers, laptops, smart terminals, video game consoles, and devices that combine the functionality of one or more of the foregoing or other automated collaborative filtering devices.

The automated collaborative filtering device 110 may be of modular construction to facilitate adding, deleting, updating and/or amending modules therein and/or features within modules. Modules may include a prediction module 112, a storage module 114, a probability rating module 116 or other modules. It should be readily understood that a greater or lesser number of modules might be used. One skilled in the art will readily appreciate that the invention may be implemented using individual modules, a single module that incorporates the features of two or more separately described modules, individual software programs, and/or a single software program.

Automated Collaborative Filtering (ACF) has spawned a whole family of techniques and algorithms employed in a myriad of e-commerce applications. The invention provides a prediction module 112 having a framework that unifies ACF with Dempster-Shafer ("DS") belief theoretic notions. The prediction module 112 provides Collaborative Filtering based on the Dempster-Shafer belief theoretic framework ("CoFiDS"). It can conveniently model a wider class of data imperfections and propagate partial knowledge throughout the entire decision-making process. It also provides a framework for incorporating background knowledge into the ACF task, and a 'soft' decision that carries information about the reliability of the prediction being made. These features are not provided in existing technologies. Indeed, the absence of effective methods for handling data imperfections has been a hurdle that prevents ACF methods from being used in more sensitive and critical problem domains, such as medical/healthcare data and battlefield situation awareness.

The automated collaborative filtering device 110 may include a storage module 114 for storing data. FIG. 2 illustrates an exemplary database structure that includes customer rating data directed to a subset of a company's products. The database structure may be stored in the storage module 114. Each row represents a customer (U₁-U₅), and each column represents a product (I₁-I₇). The field (i, j) contains user U_i's rating of item I_j. An empty field indicates that the user has not rated the corresponding item. The invention is directed to improving prediction accuracy, as well as to the behaviors of alternative approaches, the combination of ACF with other recommendation systems, and the explanation of the predictions of ACF algorithms. A central function of recommender applications is to predict, from a given ratings matrix and the known ratings of a particular user, how that user might rate other items. One example is the item indicated by the question mark in the last column of user U₅ in FIG. 2. Known recommender applications of ACF include, for example, the AMAZON.COM® book recommender and the BLOCKBUSTER® and NETFLIX® video recommenders.

The prediction module 112 is directed to providing a mechanism for accommodating imperfections in user ratings, e.g., ambiguities and uncertainties. An exemplary database that has served as a benchmark domain in many studies is Movielens, in which about 1,000 users have rated more than 1,600 movies. Since ratings are by nature subjective, the invention takes into account that different users apply varying criteria when giving a film an "excellent" rating. Additionally, the prediction module 112 takes into account that a user may choose different values depending on intangible factors. Examples include a momentary mood, or a comparison with other films that he or she has seen, and rated, at about the same time. The prediction module 112 provides a method for handling ambiguous and uncertain ratings in order to create accurate models.

Another exemplary embodiment of the invention is directed to HIV treatment by highly active antiretroviral therapy ("HAART"). Patients are administered drug combinations, referred to as drug cocktails. The concrete choice is based on the recommendations of the Department of Health and Human Services and on the results of large studies that may not reflect the idiosyncrasies of the given case. Physicians may adjust treatments based on their experience with successes and failures under given circumstances. From the perspective of the ACF scenario, the physician may generate a "ratings matrix" whose rows and columns correspond to patients and drug cocktails, respectively. The entries may quantify the effectiveness of a drug cocktail when administered to the patient.

The physician may rate the drug response using values from Θ_pref = {Excellent, Good, Fair, Poor}. However, the ratings may not be "hard" (or "crisp", or "perfect"), e.g., when they result from a team's collective decision. Suppose the ACF algorithm can accommodate probabilistic imperfections in user ratings. Even then, the probabilistic model {Excellent = 0.1, Good = 0.7, Fair = 0.1, Poor = 0.1} does not faithfully capture the rating "Good with a 70% level of confidence" allocated to a drug cocktail. The rating commits 70% confidence to Good. The model additionally commits the remaining 30% in equal shares to Excellent, Fair and Poor, and the rating does not support that allocation. Classical models are also poor at reflecting conclusions such as "the effectiveness of the drug cocktail is definitely not Poor, but more evidence is needed to discern further". The lack of mechanisms to accommodate such subjective issues often forces one to make various unwarranted "assumptions" and "interpolations".

The invention is directed to improving ACF methodology by addressing questions such as:

- How should user preferences be modeled?
- How can this model be used to extract useful knowledge and make reliable predictions that are robust against data imperfections?
- How can the prediction accuracy be improved?
- How can data sparsity and cold-start, two common problems in traditional ACF algorithms, be addressed in this setup?

Data sparsity refers to the difficulties caused by the sparse nature of the ratings matrix. Cold-start refers to the difficulties associated with making predictions for newly introduced users and/or items.

The invention exploits background knowledge that may be available in real-world applications and provides techniques for overcoming data imperfections. The prediction module 112 uses the DS theoretic framework to represent a variety of data imperfections, e.g., probabilistic uncertainties, qualitative evidence, evidence ambiguities and missing information. The prediction module 112 also may account for and represent ignorance and incomplete knowledge. Some applications depend critically on the integrity of the decision-making process and on its robustness against modeling errors caused by a lack of precise information. Battlefield target tracking and situation awareness are examples, and the prediction module 112 may use DS-based techniques in such applications.

ACF systems operate on a subset of the data. This subset represents items whose qualities are similar to those of the item whose rating is to be predicted. By contrast, non-ACF systems are deficient at least because they operate on entire item populations, which may be represented by a sparse ratings matrix. Such a sparse ratings matrix may make it difficult to identify similar items.

Referring back to the HAART scenario described above, the number of drug cocktails prescribed to each patient may be small compared to the number of available drug cocktails. In addition, several drug cocktails may not have been rated at all. Furthermore, the number of items co-rated by more than one user may be small, and the resulting lack of statistical representativeness may render predictions unreliable. According to one embodiment, the prediction module 112 may apply background knowledge about the users and/or items to mitigate data imperfections, and may fuse this background knowledge with the knowledge yielded by ACF.

The prediction module 112 may apply the DS theoretic basis of CoFiDS to address data sparsity. While it is known to replace each unrated entry of the ratings matrix by a vacuous mass structure, the prediction module 112 applies CoFiDS to narrow the uncertainty inherent in the vacuous mass structure by taking advantage of the background knowledge. For example, the prediction module 112 may fill in the values of unrated items prior to ACF, which increases the computed user-to-user similarity.

The invention further may address the cold-start problem, i.e., the difficulty of using the system when few ratings are available. In the HAART scenario, the prediction module 112 may apply the DS theoretic model and use background knowledge to populate the drug response entries corresponding to a new patient. The same concept may apply to a newly introduced drug cocktail.

The invention may apply the following underlying mathematical principles. First, in relation to ACF, let $\mathcal{U} = \{U_1, U_2, \ldots, U_M\}$ and $\mathcal{I} = \{I_1, I_2, \ldots, I_N\}$ denote exhaustive sets of $M$ users and $N$ items, respectively. Assume a user allocates a 'preference' or 'rating' to an item via a finite, rank-ordered set of $L$ user preference labels $\Theta_{pref} = \{\theta_1, \theta_2, \ldots, \theta_L\}$, where $\theta_i < \theta_j$ whenever $i < j$. If a user allocates a rating to an item, the item is identified as rated; otherwise, it is identified as not rated.

A user's rating can be identified as a mapping $f_R : \mathcal{U} \times \mathcal{I} \mapsto \Theta_{pref}$, $f_R : (U_i, I_k) \mapsto r_{ik}$, where $r_{ik} \in \Theta_{pref}$ denotes the rating that user $U_i$ allocates to item $I_k$. If $I_k$ has not been rated by $U_i$, the system uses $r_{ik} = 0$. The ratings matrix is the $M \times N$ matrix $R = [r_{ik}]$, where $r_{ik} \in \Theta_{pref}$ for a rated item and $r_{ik} = 0$ for an unrated item. For $i = 1, \ldots, M$, the sets of rated and unrated items of user $U_i$ may also be denoted separately. A user whose ratings are currently being predicted may be referred to as an active user.
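As an illustration only, the ratings matrix $R$ can be held in Python as a list of rows. This minimal sketch assumes integer labels 1 to 5 for $\theta_1, \ldots, \theta_5$ and uses 0 for an unrated entry, matching the convention in the text. The matrix values are hypothetical (they are not the data of FIG. 2), and so are the helper names `rated_items`, `unrated_items` and `co_rated_items`. Indices are 0-based.

```python
# Hypothetical 3-user x 4-item ratings matrix R = [r_ik] (not the data of FIG. 2).
# Labels 1..5 stand for theta_1..theta_5; 0 marks an unrated entry.
UNRATED = 0

R = [
    [5, 0, 3, 0],  # U_1
    [4, 2, 0, 1],  # U_2
    [0, 0, 4, 2],  # U_3
]


def rated_items(R, i):
    """Indices k of the items rated by user U_i (0-based)."""
    return [k for k, r in enumerate(R[i]) if r != UNRATED]


def unrated_items(R, i):
    """Indices k of the items not rated by user U_i (the empty data slots)."""
    return [k for k, r in enumerate(R[i]) if r == UNRATED]


def co_rated_items(R, i, j):
    """Items rated by both U_i and U_j."""
    return sorted(set(rated_items(R, i)) & set(rated_items(R, j)))


print(rated_items(R, 0))        # [0, 2]
print(unrated_items(R, 0))      # [1, 3]
print(co_rated_items(R, 0, 1))  # [0]
```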

An item that is rated by multiple users may be referred to as co-rated by those users. In ACF, co-rated items are used to determine whether two users are 'similar' to each other. The similarity between a pair of users can be identified as a mapping that assigns to users $U_i$ and $U_j$ a value $s_{ij}$ denoting their similarity. The higher the value of $s_{ij}$, the closer the similarity between $U_i$ and $U_j$. The $M \times M$ symmetric matrix $S = [s_{ij}]$ may be referred to as the (user-user) similarity matrix.
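No particular similarity measure is specified at this point. As an assumed example only, the sketch below scores two users by the fraction of their co-rated items on which their ratings differ by at most one label. This echoes the ±1 tolerance on user ratings discussed later in this description. The sketch then assembles the symmetric $M \times M$ similarity matrix. The measure, the data and the function names are hypothetical.

```python
# Hypothetical ratings matrix; 0 marks an unrated entry.
R = [
    [5, 3, 0, 1],  # U_1
    [4, 1, 2, 0],  # U_2
    [0, 3, 4, 2],  # U_3
]


def similarity(R, i, j, tol=1):
    """Assumed measure: share of co-rated items whose ratings differ by <= tol."""
    common = [k for k in range(len(R[i])) if R[i][k] != 0 and R[j][k] != 0]
    if not common:
        return 0.0  # no co-rated items: no evidence of similarity
    agree = sum(1 for k in common if abs(R[i][k] - R[j][k]) <= tol)
    return agree / len(common)


def similarity_matrix(R, tol=1):
    """M x M user-user similarity matrix S = [s_ij]; symmetric by construction."""
    M = len(R)
    return [[similarity(R, i, j, tol) for j in range(M)] for i in range(M)]


S = similarity_matrix(R)
for row in S:
    print(row)
# [1.0, 0.5, 1.0]
# [0.5, 1.0, 0.0]
# [1.0, 0.0, 1.0]
```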

The prediction module 112 may apply DS theory, in which $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ denotes a finite set of mutually exclusive and exhaustive propositions about some problem domain. $\Theta$ signifies the corresponding 'scope of expertise' and is referred to as its frame of discernment ("FoD"). A proposition $\theta_i$, referred to as a singleton, represents the lowest level of discernible information in this FoD.

Elements in $2^\Theta$, the power set of $\Theta$, form all propositions of interest. A proposition that is not a singleton is referred to as a composite. The term "proposition" denotes both singletons and composites. The cardinality of a set $A$ is denoted by $|A|$. The set $\overline{A}$ denotes all singletons in $\Theta$ that are not included in $A \subseteq \Theta$, i.e., $\overline{A} = \{\theta_i \in \Theta : \theta_i \notin A\} = \Theta \setminus A$. The mapping $m : 2^\Theta \mapsto [0, 1]$ is a basic probability assignment ("BPA") or mass structure for the FoD $\Theta$ if $m(\emptyset) = 0$ and $\sum_{A \subseteq \Theta} m(A) = 1$.

The prediction module 112 uses DS theory to model the notion of ignorance by allowing the mass of a composite proposition to move freely into its individual singletons. For example, complete lack of evidence can be conveniently captured via the vacuous BPA: $m(A) = 0$, $\forall A \subset \Theta$, and $m(\Theta) = 1.0$. A proposition that possesses a nonzero mass is referred to as a focal element. The set of focal elements is the core and is denoted by $\mathfrak{F}$. The triple $\{\Theta, \mathfrak{F}, m\}$ is referred to as the body of evidence ("BoE"). The number of focal elements in this BoE is $|\mathfrak{F}|$. Given a BoE $\{\Theta, \mathfrak{F}, m\}$ and $A \subseteq \Theta$, the belief of $A$ is given by the mapping $\mathrm{Bl} : 2^\Theta \mapsto [0, 1]$, $\mathrm{Bl}(A) = \sum_{B \subseteq A} m(B)$. The plausibility of $A$ is $\mathrm{Pl}(A) = 1 - \mathrm{Bl}(\overline{A})$.

According to one embodiment, m(A) measures the support assigned to proposition A only and the belief assigned to A takes into account the supports for all proper subsets of A. Bl(A) represents the total support that can move into A without any ambiguity. Pl(A) represents the extent to which one finds A plausible. When the core contains only singletons, the BPA, belief and plausibility all reduce to probability.
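A minimal Python sketch of these definitions follows. It assumes a BPA is stored as a dictionary that maps `frozenset` propositions to their masses, so only focal elements need be listed. The helper names `is_bpa`, `core`, `belief` and `plausibility` are hypothetical. The formulas are the standard DS ones: $m(\emptyset) = 0$ and $\sum_{A \subseteq \Theta} m(A) = 1$ for a BPA, $\mathrm{Bl}(A) = \sum_{B \subseteq A} m(B)$, and $\mathrm{Pl}(A) = 1 - \mathrm{Bl}(\overline{A})$.

```python
import math

THETA = frozenset({"a", "b", "c"})


def is_bpa(m, theta, tol=1e-9):
    """Check the BPA conditions on the FoD theta: m(empty) = 0, masses in [0, 1], total 1."""
    if any(not A <= theta for A in m):
        return False  # every proposition must be a subset of the FoD
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(v < 0.0 or v > 1.0 for v in m.values()):
        return False
    return math.isclose(sum(m.values()), 1.0, abs_tol=tol)


def core(m):
    """Focal elements: propositions with nonzero mass."""
    return {A for A, v in m.items() if v > 0.0}


def belief(m, A):
    """Bl(A): total mass of the nonempty focal elements contained in A."""
    return sum((v for B, v in m.items() if B and B <= A), 0.0)


def plausibility(m, A):
    """Pl(A): total mass of the focal elements intersecting A, i.e. 1 - Bl(complement of A)."""
    return sum((v for B, v in m.items() if B & A), 0.0)


m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, THETA: 0.2}
A = frozenset({"a"})
print(is_bpa(m, THETA), len(core(m)))                        # True 3
print(round(belief(m, A), 3), round(plausibility(m, A), 3))  # 0.5 1.0
print(math.isclose(plausibility(m, A), 1.0 - belief(m, THETA - A)))  # True

# When every focal element is a singleton, Bl and Pl coincide (a probability).
p = {frozenset({"a"}): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}
AB = frozenset({"a", "b"})
print(round(belief(p, AB), 3), round(plausibility(p, AB), 3))  # 0.5 0.5
```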

The above-referenced DS theoretic notions allow the prediction module 112 to represent a wide variety of data imperfections with ease, as shown in FIG. 2. For example, in the HAART therapy scenario, the BPAs $m(\text{Good}) = 0.7$, $m(\Theta_{pref}) = 0.3$ and $m(\{\text{Excellent}, \text{Good}, \text{Fair}\}) = 1.0$ capture the ratings "Good with a 70% level of confidence" and "definitely not Poor but more evidence is needed to discern further," respectively. An unrated item may be captured via the vacuous BPA $m(\Theta_{pref}) = 1.0$.
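The three HAART encodings can be checked numerically. This minimal sketch assumes the following BPAs (variable names hypothetical):

- $m(\text{Good}) = 0.7$, $m(\Theta_{pref}) = 0.3$ for "Good with a 70% level of confidence";
- $m(\{\text{Excellent}, \text{Good}, \text{Fair}\}) = 1.0$ for "definitely not Poor";
- $m(\Theta_{pref}) = 1.0$ for an unrated item.

Belief and plausibility are computed with the standard DS formulas.

```python
THETA = frozenset({"Excellent", "Good", "Fair", "Poor"})


def belief(m, A):
    return sum((v for B, v in m.items() if B and B <= A), 0.0)


def plausibility(m, A):
    return sum((v for B, v in m.items() if B & A), 0.0)


good_70 = {frozenset({"Good"}): 0.7, THETA: 0.3}            # "Good with 70% confidence"
not_poor = {frozenset({"Excellent", "Good", "Fair"}): 1.0}  # "definitely not Poor"
unrated = {THETA: 1.0}                                      # vacuous BPA

GOOD, POOR = frozenset({"Good"}), frozenset({"Poor"})
NOT_GOOD = THETA - GOOD

# 70% is committed to Good; the remaining 30% stays uncommitted rather than
# being spread over the other labels.
print(belief(good_70, GOOD), belief(good_70, NOT_GOOD))      # 0.7 0.0
# Poor is ruled out, yet no single remaining label receives committed support.
print(plausibility(not_poor, POOR), belief(not_poor, GOOD))  # 0.0 0.0
# Vacuous BPA: total ignorance, Bl = 0 and Pl = 1 for every label.
print(belief(unrated, GOOD), plausibility(unrated, GOOD))    # 0.0 1.0
```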

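The prediction module 112 may derive a probability distribution $\Pr(\cdot)$ such that $\mathrm{Bl}(A) \le \Pr(A) \le \mathrm{Pl}(A)$, $\forall A \subseteq \Theta$, i.e., a distribution compatible with the underlying BPA $m(\cdot)$. An example of such a probability distribution is the pignistic probability distribution $\mathrm{Bp}(\theta_i) = \sum_{\theta_i \in A \subseteq \Theta} m(A)/|A|$, $\theta_i \in \Theta$.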
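A sketch of the pignistic transformation follows. It assumes the standard form, in which each focal element's mass is shared equally among its singletons, $\mathrm{Bp}(\theta_i) = \sum_{A \ni \theta_i} m(A)/|A|$, valid when $m(\emptyset) = 0$. The example applies it to the "Good with a 70% level of confidence" BPA, $m(\text{Good}) = 0.7$, $m(\Theta_{pref}) = 0.3$. It then checks the compatibility bounds $\mathrm{Bl}(\{\theta_i\}) \le \mathrm{Bp}(\theta_i) \le \mathrm{Pl}(\{\theta_i\})$. Function names are hypothetical.

```python
THETA = frozenset({"Excellent", "Good", "Fair", "Poor"})


def pignistic(m, theta):
    """Bp(t): each focal element's mass is split equally among its singletons."""
    return {t: sum((v / len(A) for A, v in m.items() if t in A), 0.0) for t in theta}


def belief(m, A):
    return sum((v for B, v in m.items() if B and B <= A), 0.0)


def plausibility(m, A):
    return sum((v for B, v in m.items() if B & A), 0.0)


good_70 = {frozenset({"Good"}): 0.7, THETA: 0.3}  # "Good with 70% confidence"
bp = pignistic(good_70, THETA)
for t in sorted(bp):
    print(t, round(bp[t], 3))
# Excellent 0.075
# Fair 0.075
# Good 0.775
# Poor 0.075

# Bp is a probability distribution lying between Bl and Pl for every singleton.
print(abs(sum(bp.values()) - 1.0) < 1e-9)  # True
print(all(belief(good_70, frozenset({t})) <= bp[t] + 1e-12
          and bp[t] <= plausibility(good_70, frozenset({t})) + 1e-12
          for t in THETA))  # True
```

Note that the pignistic distribution is only one compatible choice. It spreads the uncommitted 30% evenly, which is reasonable for a final decision but should not be confused with the BPA itself.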

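The prediction module 112 may "pool" the evidence of two 'independent' BoEs to form a single BoE via Dempster's Rule of Combination ("DRC").

Suppose the two BoEs $\{\Theta, \mathfrak{F}_1, m_1\}$ and $\{\Theta, \mathfrak{F}_2, m_2\}$ span the same FoD $\Theta$. Then, if $K \neq 1$, the DRC generates the combined BoE $\{\Theta, \mathfrak{F}, m\}$ with

$$m(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset; \qquad m(\emptyset) = 0,$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ measures the conflict between the two BoEs. This combination operation is denoted as $m = m_1 \oplus m_2$. The operation $\oplus$ is both associative and commutative, thus enabling the combination of multiple BoEs with ease. A variation of the DRC that accounts for evidence reliability is $m = m_1^{(d_1)} \oplus m_2^{(d_2)}$, where

$$m_i^{(d_i)}(A) = (1 - d_i)\, m_i(A), \quad A \subset \Theta; \qquad m_i^{(d_i)}(\Theta) = d_i + (1 - d_i)\, m_i(\Theta).$$

Here, $d_i \in [0, 1]$ is referred to as a discounting factor.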
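A minimal Python sketch of the DRC and of discounting follows, assuming BPAs stored as dictionaries keyed by `frozenset`. The discounting convention is an assumption: it follows Shafer's form, where $d = 0$ leaves a BoE unchanged and $d = 1$ turns it into the vacuous BoE. Function names are hypothetical.

```python
import math
from collections import defaultdict


def combine(m1, m2):
    """Dempster's rule of combination m1 (+) m2 for two BPAs over the same FoD."""
    joint = defaultdict(float)
    conflict = 0.0  # K: the product mass that falls on the empty set
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                joint[A] += v1 * v2
            else:
                conflict += v1 * v2
    if math.isclose(conflict, 1.0):
        raise ValueError("K = 1: totally conflicting evidence, DRC undefined")
    return {A: v / (1.0 - conflict) for A, v in joint.items() if v > 0.0}


def discount(m, d, theta):
    """Assumed convention: move a fraction d of the mass to theta (d = 0: unchanged)."""
    out = {A: (1.0 - d) * v for A, v in m.items() if A != theta}
    out[theta] = (1.0 - d) * m.get(theta, 0.0) + d
    return out


THETA = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, THETA: 0.4}
m2 = {frozenset({"b"}): 0.5, THETA: 0.5}

m12 = combine(m1, m2)  # K = 0.6 * 0.5 = 0.3
for A in sorted(m12, key=lambda s: (len(s), sorted(s))):
    print(sorted(A), round(m12[A], 4))
# ['a'] 0.4286
# ['b'] 0.2857
# ['a', 'b', 'c'] 0.2857

md = discount(m1, 0.5, THETA)
print(round(md[frozenset({"a"})], 3), round(md[THETA], 3))  # 0.3 0.7
```

Because $\oplus$ is associative and commutative, several BoEs can be fused by folding `combine` over a list, e.g. with `functools.reduce`.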

In accordance with the principles of the invention, each user preference rating is viewed as a BoE spanning the FoD $\Theta_{pref} = \{\theta_1, \ldots, \theta_L\}$. The mapping that generates the BoE $\mathbf{r}_{ik}$ corresponding to the rating $r_{ik}$ is referred to as the DS modeling function. The $M \times N$ matrix $\mathbf{R} = [\mathbf{r}_{ik}]$ is referred to as the DS ratings matrix.

The invention is directed to selecting an appropriate DS modeling function f_R that captures the explicit and implicit user preference information, while accommodating the associated imperfections (e.g., ambiguities, uncertainties, missing values, etc.). Simple DS theoretic models may be used for this purpose and contextual information may be incorporated to aid the prediction task.

Even in domains where the user preferences are hard, the rating assignment process involves a level of uncertainty. For example, a user may have difficulty selecting, or may be unwilling to select, a single label as the proper preference rating. This can happen in a movie recommendation scenario where users must rate the movies via a '5-star' system.

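Given $r_{ik} = \theta_j \in \Theta_{pref}$, the prediction module 112 may apply a DS modeling function that captures the user uncertainty in a wide variety of scenarios using

$$m_{ik}(A) = \begin{cases} \alpha_{ik}\,(1 - \sigma_{ik}), & A = \{\theta_j\}; \\ \alpha_{ik}\,\sigma_{ik}, & A = \{\theta_{j-1}, \theta_j, \theta_{j+1}\} \cap \Theta_{pref}; \\ 1 - \alpha_{ik}, & A = \Theta_{pref}; \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

The trust factor $\alpha_{ik} \in [0, 1]$ quantifies how likely it is that the user-assigned rating reflects the user's true perception. The value $\alpha_{ik} = 0$ represents the case when the user's rating is completely untrustworthy; such a rating may be modeled via the vacuous BoE.

The dispersion factor $\sigma_{ik} \in [0, 1]$ quantifies how likely it is that the user-assigned rating would span a larger set. The value $\sigma_{ik} = 0$ represents the case when a DS theoretic mass is allocated only to the user-assigned rating $\{\theta_j\}$ itself and not to a set of neighboring labels (provided that $\alpha_{ik} \neq 0$).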

According to one embodiment, the selection of trust and dispersion factors is domain and dataset dependent. Depending on the available evidence and the complexity of the process, the prediction module 112 may use user-wide, item-wide, or system-wide constants for these parameters. These constants also may be used to capture the 'significance' of a particular rating to the overall ACF prediction process. For example, consider a scenario where most users allocate a similar rating to a particular item (e.g., most users in the Movielens dataset give the movie Titanic a high rating). That rating would play a less significant role in the ACF prediction process, so the prediction module 112 may use a smaller item-wide constant value for $\alpha_{ik}$.

The prediction module 112 combines the trust and dispersion factors to control the DS theoretic mass assigned to the user-assigned rating. The DS modeling function in Equation (3) captures a wide variety of user uncertainty. For example, consider ACF algorithms in which weighted majority voting strategies produce significant prediction performance improvements compared to correlation-based methods. These algorithms allow a ±1 tolerance on user ratings when calculating similarities, and so accommodate a certain level of uncertainty in user rating assignment. The prediction module 112 may apply the DS modeling function in Equation (3) with the parameters $\{\alpha_{ik}, \sigma_{ik}\} = \{1, 1\}$ to capture this scenario. Equation (3) is one simple model that may be used in accordance with the principles of the invention; one of ordinary skill in the art will readily appreciate that other modeling functions may be used.
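The following Python sketch shows a DS modeling function with a trust factor α and a dispersion factor σ. It assumes the following form:

- mass $\alpha(1-\sigma)$ on the rated label $\{\theta_j\}$;
- mass $\alpha\sigma$ on the neighboring labels $\{\theta_{j-1}, \theta_j, \theta_{j+1}\}$, clipped at the ends of the rating scale;
- mass $1-\alpha$ on $\Theta_{pref}$.

This assumed form reproduces the two cases the text describes. With $\{\alpha, \sigma\} = \{1, 1\}$ it gives the ±1 tolerance user, and with $\alpha = 0$ it gives the vacuous BoE. Unrated entries (0) also map to the vacuous BoE. The function names are hypothetical.

```python
from collections import defaultdict


def ds_model(r, alpha, sigma, L):
    """Hypothetical DS modeling function: rating r in 1..L (0 = unrated) -> BPA on {1..L}.

    Assumed form: alpha*(1 - sigma) on {r}, alpha*sigma on the neighbouring
    labels {r-1, r, r+1} (clipped to the scale), and 1 - alpha on the whole FoD.
    """
    theta = frozenset(range(1, L + 1))
    if r == 0:
        return {theta: 1.0}  # unrated entry -> vacuous BoE
    spread = frozenset(x for x in (r - 1, r, r + 1) if 1 <= x <= L)
    m = defaultdict(float)  # accumulate, in case two of the sets coincide
    m[frozenset({r})] += alpha * (1.0 - sigma)
    m[spread] += alpha * sigma
    m[theta] += 1.0 - alpha
    return {A: v for A, v in m.items() if v > 0.0}  # keep focal elements only


def ds_ratings_matrix(R, alpha, sigma, L):
    """DS ratings matrix: apply the modeling function entry by entry."""
    return [[ds_model(r, alpha, sigma, L) for r in row] for row in R]


# A +/-1 tolerance user ({alpha, sigma} = {1, 1}) who rates an item 2 on a 5-star scale:
print(ds_model(2, 1.0, 1.0, 5))  # {frozenset({1, 2, 3}): 1.0}
# A completely untrustworthy rating (alpha = 0) yields the vacuous BoE:
print(ds_model(4, 0.0, 0.3, 5) == {frozenset(range(1, 6)): 1.0})  # True

DS = ds_ratings_matrix([[5, 0], [2, 3]], alpha=0.9, sigma=0.2, L=5)
print(DS[0][1] == {frozenset(range(1, 6)): 1.0})  # True (unrated entry)
```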

The prediction module 112 may apply the vacuous BoE to model lack of evidence that manifests itself as an unrated entry. Although the system may simply proceed with this vacuous BoE model for an unrated entry, the prediction module 112 may incorporate contextual information to effectively reduce the uncertainty that would otherwise be introduced by a vacuous BoE representation. Embodiments of the invention exploit the power of DS theory to represent imperfections, while reducing the uncertainty of missing entries by using contextual information.

The prediction module 112 may completely populate the ratings matrix prior to the application of ACF and may combine information from multiple sources, taking into account their reliability and significance. Furthermore, the prediction module 112 provides solutions to difficulties associated with data sparsity and cold-start. According to one embodiment, empty slots may be filled in using implicit, explicit, and other contextual information before application of the ACF.

Returning to the HAART therapy scenario presented above, the patient criteria may include Drug_Compliance, Initial_Viral_Load, and Age, among other criteria. These criteria may affect the response to a drug cocktail and provide contextual domain expertise. The contextual domain expertise defines a "concept" for grouping patients. According to one embodiment, each concept may provide a criterion for grouping the patients. For example, the concept Drug_Compliance may have the following groups: Drug_Compliance.High, Drug_Compliance.Medium, and Drug_Compliance.Low. Users belonging to a group may be assumed to have similar responses to selected drugs. The groups corresponding to a selected concept need not partition the user space; for example, a user may belong to more than one group from the same concept. This grouping is directed to the HIV drug treatment context; one of ordinary skill in the art will readily appreciate that alternate groupings may be provided for different contexts.

According to one embodiment, the prediction module 112 may apply contextual information to populate an unrated entry r_ik ∈ R. The system may combine or fuse the effectiveness ratings that each group containing U_i allocates to I_k as a "whole." This fusion operation may be carried out in two stages. At the group level, the system fuses the group preferences of the groups to which U_i belongs and generates a concept preference. At the concept level, the system fuses the concept preferences of all grouping concepts and generates an overall contextual preference.

Regarding the HAART therapy discussed above, the prediction module 112 may apply item-based concepts in addition to the user-based concepts, among other concepts. For example, a physician may group the drug cocktails based on an item-based concept, such as Class_of_Drugs. One of ordinary skill in the art will readily recognize that the principles of the invention may extend equally to other concepts.

The prediction module 112 may assign "Q" as the number of groups belonging to the 'generic' concept "Concept," which may be defined as {Concept.Groupi, ..., Concept.Group_Q}. One concept is considered here for notational simplicity. For multiple concepts, a subscript/superscript / may be added to differentiate among concepts. According to one embodiment, the groups to which a user belongs may be identified via the mapping

$f_C : \mathcal{U} \mapsto \{\mathrm{Concept.Group}_1, \ldots, \mathrm{Concept.Group}_Q\}$. This mapping is referred to as the grouping function.

The prediction module 112 may apply a DS theoretic BPA to define how the members of the group Concept.Group_j would, as a whole, rate the item I_k. If information regarding the group preference for each item is available, the prediction module 112 may use this information directly in a DS theoretic setting. Otherwise, the prediction module 112 may consider the users within a given group that have already rated item I_k.

The group preference BPA may be defined as where

The corresponding BoE

is the group preference BoE. The concept preference BoE corresponding to user U_i and item I_k may be obtained by combining, or fusing, these group preference BoEs.
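The group, concept, and contextual preference BPAs are not reproduced in this text, but the two-stage fusion can be illustrated. The following Python sketch assumes that "fusing" means Dempster's rule of combination applied to BPAs stored as dicts from frozensets of ratings to masses; the group names and all mass values are hypothetical.

```python
from functools import reduce
from itertools import product

# A BPA is represented as a dict mapping focal elements (frozensets of
# ratings) to their masses; masses are assumed to sum to 1.


def dempster_combine(m1, m2):
    """Fuse two BPAs defined on the same FoD with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass that would fall on the empty set measures the conflict.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("totally conflicting evidence: Dempster's rule is undefined")
    return {focal: w / (1.0 - conflict) for focal, w in combined.items()}


def fuse(bpas):
    """Fuse a non-empty sequence of BPAs pairwise with Dempster's rule."""
    return reduce(dempster_combine, bpas)


THETA_PREF = frozenset({1, 2, 3, 4, 5})

# Stage 1 (group level): hypothetical group preference BPAs of two genre
# groups to which the user belongs, fused into the Genre concept preference.
drama = {frozenset({4, 5}): 0.6, THETA_PREF: 0.4}
thriller = {frozenset({3, 4}): 0.5, THETA_PREF: 0.5}
genre_concept = fuse([drama, thriller])

# Stage 2 (concept level): fuse the concept preferences into the
# contextual preference; the second concept BPA is hypothetical.
other_concept = {frozenset({4}): 0.4, THETA_PREF: 0.6}
contextual = fuse([genre_concept, other_concept])

for focal, mass in sorted(contextual.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 3))
```

Because Dempster's rule is commutative and associative, the order in which the group or concept BoEs are fused does not change the result.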

The concept preference BPA may be defined as

where

The corresponding BoE

is the concept preference BoE. The overall contextual preference BoE corresponding to user U_i and item I_k may be obtained by fusing all the concept preference BoEs.

The contextual preference BPA may be defined as

where

The corresponding BoE

is the contextual preference BoE.

The prediction module 112 may modify the DS ratings matrix R such that each unrated entry r_ik is replaced by its corresponding contextual preference BoE. The prediction module 112 may employ this modified ratings matrix for subsequent calculations.

In the fusion operations that generate the concept preference BoE and the contextual preference BoE, the prediction module 112 may employ a discounting factor to discount each constituent BoE prior to applying Dempster's rule of combination (DRC). This may be particularly relevant in an application such as the HAART therapy scenario, e.g., if one concept, such as Age, is known to have less of an impact on the drug response.
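The discounting operation is not spelled out here. A common choice in DS theory is Shafer's discounting, sketched below under that assumption: a reliability factor in [0, 1] scales every focal mass and moves the remainder to the whole frame Θ_pref. The concept and the numbers are hypothetical.

```python
THETA_PREF = frozenset({1, 2, 3, 4, 5})


def discount(m, reliability, theta=THETA_PREF):
    """Shafer discounting of the BPA m by a reliability factor in [0, 1].

    Every focal mass is scaled by `reliability`; the removed mass
    (1 - reliability) is moved to the whole frame `theta`, i.e. it becomes
    ignorance instead of support for any particular rating.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    out = {focal: reliability * w for focal, w in m.items() if focal != theta}
    out[theta] = reliability * m.get(theta, 0.0) + (1.0 - reliability)
    return out


# Hypothetical concept preference BPA for a concept judged less influential.
age_concept = {frozenset({1, 2}): 0.8, THETA_PREF: 0.2}
print(discount(age_concept, 0.5))
```

A reliability of 1 leaves a BoE unchanged, while a reliability of 0 turns it into the vacuous BoE, which has no effect when fused with Dempster's rule.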

The BoE r_ik may be considered an 'intra-item' BoE that captures the user preference toward a single item. To capture a user's preference toward all items as a whole, the prediction module 112 may use an appropriately constructed 'inter-item' BoE defined over the cross-product space Θ = Θ_pref × ⋯ × Θ_pref (one copy of Θ_pref for each of the N items).

A focal element may be extracted from the BoE r_ik. Its cylindrical extension to the cross-product FoD Θ is

where The mapping where

generates a valid BPA defined on the FoD Θ . The corresponding BoE is the user-BoE generated by extending the

For user U_i, consider the BoEs M_ik(·), k = 1, ..., N. Then the BPA M_i : 2^Θ ↦ [0, 1], where

is referred to as the user-BPA of user U_i. The corresponding BoE (Θ, F_i, M_i) is the user-BoE.

The following result is provided. Consider user U_i's user-BPA M_i, defined over the FoD Θ, and the ratings BPAs m_ik, k = 1, ..., N, each defined over the FoD Θ_pref. Then, the pignistic probability of the singleton is

Here, Bp_i(·) and Bp_i/k(·) refer to user U_i's pignistic probability distributions corresponding to its user-BoE and ratings BoEs, respectively.
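Equation (1) is not reproduced in this text. Assuming it is the standard pignistic transformation, $\mathrm{BetP}(\theta) = \sum_{A \ni \theta} m(A) / \big(|A|\,(1 - m(\emptyset))\big)$, a minimal sketch follows; the example BPA is hypothetical.

```python
def pignistic(m):
    """Pignistic transformation of a BPA m (dict: frozenset -> mass).

    Each focal mass is shared equally among the ratings it contains,
    after renormalizing away any mass on the empty set.
    """
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("all mass lies on the empty set")
    bet = {}
    for focal, mass in m.items():
        if not focal:
            continue  # the empty set contributes nothing
        share = mass / (len(focal) * (1.0 - empty_mass))
        for rating in focal:
            bet[rating] = bet.get(rating, 0.0) + share
    return bet


THETA_PREF = frozenset({1, 2, 3, 4, 5})
example_bpa = {frozenset({4}): 0.5, frozenset({4, 5}): 0.3, THETA_PREF: 0.2}
print(pignistic(example_bpa))
```

Selecting the rating with the largest pignistic probability gives a hard, singleton decision from a soft BoE.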

Since the user-BoE defines a user's 'joint' preference over all the items, a distance metric may be defined on the cross-product FoD Θ to calculate the 'distance' between two users. This distance may be used to identify the similarity among users. If a distance measure between two probability mass functions (p.m.f.s) is available, then, via the pignistic transformation in Equation (1), the prediction module 112 may use it as a distance measure between two BoEs. The distance measure between the two user-BPAs M_i and M_j defined over the same cross-product FoD Θ is D(M_i, M_j) = CD(Bp_i, Bp_j), where Bp_i and Bp_j denote the pignistic probability distributions corresponding to M_i and M_j, respectively, and CD(·,·) refers to the Chan-Darwiche ("CD") distance measure:

$$\mathrm{CD}(P, P') = \ln \max_{\omega} \frac{P'(\omega)}{P(\omega)} - \ln \min_{\omega} \frac{P'(\omega)}{P(\omega)}.$$

Thus, the distance measure between the two user-BPAs M_i and M_j may be expressed in terms of Bp_i/k and Bp_j/k, which refer to the pignistic probability distributions corresponding to the ratings BPAs of users U_i and U_j, respectively.

When determining the distance between two user-BoEs, the prediction module 112 may use the distances between ratings BPAs (which are defined over Θ_pref) instead of directly computing the distance between the two user-BPAs, which are defined over the cross-product FoD Θ. This reduces the associated computational overhead.
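The distance equations are not reproduced above. Assuming, as the result on user-BoEs suggests, that Bp_i over Θ is the product of the per-item distributions Bp_i/k, the CD distance over Θ equals the sum of the per-item CD distances: with positive per-item ratios, the maximum (and minimum) of a product of ratios is the product of the per-item maxima (minima). The sketch checks that identity numerically; the additive form is a reading of the text, not a formula quoted from it, and the p.m.f.s are hypothetical.

```python
import math
from itertools import product


def cd_distance(p, q):
    """Chan-Darwiche distance between two p.m.f.s over the same outcomes.

    CD(p, q) = ln max_w q(w)/p(w) - ln min_w q(w)/p(w), using 0/0 = 1;
    the distance is infinite if exactly one of p(w), q(w) is zero.
    """
    if set(p) != set(q):
        raise ValueError("p.m.f.s must be defined over the same outcomes")
    ratios = []
    for w in p:
        if p[w] == 0.0 and q[w] == 0.0:
            ratios.append(1.0)
        elif p[w] == 0.0 or q[w] == 0.0:
            return math.inf
        else:
            ratios.append(q[w] / p[w])
    return math.log(max(ratios)) - math.log(min(ratios))


def user_distance(bp_i, bp_j):
    """Sum of per-item CD distances (bp_x maps item -> pignistic p.m.f.)."""
    return sum(cd_distance(bp_i[k], bp_j[k]) for k in bp_i)


def joint_pmf(per_item):
    """Product p.m.f. over the cross-product space (used only as a check)."""
    items = sorted(per_item)
    joint = {}
    for outcome in product(*(per_item[k].items() for k in items)):
        joint[tuple(r for r, _ in outcome)] = math.prod(w for _, w in outcome)
    return joint


u1 = {"I1": {1: 0.2, 2: 0.3, 3: 0.5}, "I2": {1: 0.6, 2: 0.4}}
u2 = {"I1": {1: 0.1, 2: 0.4, 3: 0.5}, "I2": {1: 0.3, 2: 0.7}}
print(user_distance(u1, u2), cd_distance(joint_pmf(u1), joint_pmf(u2)))
```

The per-item form touches N·|Θ_pref| probabilities, whereas the joint form enumerates |Θ_pref|^N outcomes.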

A monotonically decreasing function ψ : [0, ∞] ↦ [0, 1] is provided, satisfying ψ(0) = 1 and ψ(∞) = 0. With respect to ψ, s_ij = ψ(D(M_i, M_j)) is referred to as the user-user similarity between users U_i and U_j. For ψ, the prediction module 112 may apply a function parameterized by γ, where γ is a domain-specific constant. The M × M user-user similarity matrix may then be generated as S = [s_ij].
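The exact form of the similarity function is not reproduced in this text. One function that equals 1 at distance 0, decreases monotonically, and tends to 0 is ψ(d) = e^{−γd}; the sketch below uses it purely as an assumption, with γ = 10⁻⁴ (the value reported for the experiments) as the default. The distance matrix is hypothetical.

```python
import math


def similarity(distance, gamma=1e-4):
    """Assumed psi(d) = exp(-gamma * d): 1 at d = 0, decreasing, 0 at d = inf."""
    if distance < 0:
        raise ValueError("distances must be non-negative")
    return math.exp(-gamma * distance)


def similarity_matrix(dist, gamma=1e-4):
    """Map an M x M user-user distance matrix to an M x M similarity matrix."""
    return [[similarity(d, gamma) for d in row] for row in dist]


dist = [[0.0, 2.0, math.inf],
        [2.0, 0.0, 5.0],
        [math.inf, 5.0, 0.0]]
sim = similarity_matrix(dist, gamma=0.5)
print(sim)
```

An infinite CD distance (disjoint supports) maps to zero similarity, so such users can never pass a positive similarity threshold.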

The prediction module 112 may apply the K-nearest neighbor ("KNN") strategy, the minimum similarity thresholding ("MST") strategy (where all users having a similarity higher than or equal to a specified threshold are selected), or a combination of both to perform neighborhood selection in ACF. According to one embodiment of the invention, the prediction module 112 may use K-nearest neighbors with minimum similarity thresholding because this technique mitigates prediction errors generated by dissimilar users. With KNN alone, the prediction module 112 would select K neighbors even though some of them may not be sufficiently similar to the active user. For given parameters τ and K, the largest set of users who have rated item I_k, number at most K, and each have a similarity of at least τ with user U_i is the neighborhood set Nbhd_ik of user U_i. The prediction module 112 may select Nbhd_ik by first applying MST to the user set U, selecting those users who have rated item I_k and meet the minimum similarity threshold τ with U_i, and then applying KNN to select at most K users from this set having the highest similarity with U_i. To determine the neighborhood corresponding to a new user, a corresponding condition may apply for Nbhd_ik.
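The MST-then-KNN selection can be sketched as follows. The sketch assumes users are indexed 0..M−1, the similarity matrix is a list of lists, and each user's rated items are given as a set; the function name and data are illustrative.

```python
def select_neighborhood(active, item, sim, rated_items, tau, k):
    """MST followed by KNN neighborhood for predicting `item` for `active`.

    sim: M x M user-user similarity matrix (list of lists).
    rated_items: rated_items[j] is the set of items user j has rated.
    """
    # MST step: users who rated the item and meet the similarity threshold.
    candidates = [
        j for j in range(len(sim))
        if j != active and item in rated_items[j] and sim[active][j] >= tau
    ]
    # KNN step: keep at most k of the most similar qualifying users.
    candidates.sort(key=lambda j: sim[active][j], reverse=True)
    return candidates[:k]


sim = [
    [1.00, 0.90, 0.70, 0.95, 0.85],
    [0.90, 1.00, 0.60, 0.80, 0.50],
    [0.70, 0.60, 1.00, 0.65, 0.40],
    [0.95, 0.80, 0.65, 1.00, 0.75],
    [0.85, 0.50, 0.40, 0.75, 1.00],
]
rated_items = [{"I2"}, {"I1", "I2"}, {"I1"}, {"I3"}, {"I1"}]
print(select_neighborhood(0, "I1", sim, rated_items, tau=0.8, k=2))
```

User 3 is the most similar to user 0 but is excluded because it has not rated "I1", and user 2 is excluded by the threshold, which is exactly the case KNN alone would not filter out.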

The ACF predictions are usually generated from evidence gathered from neighboring data entries that have rated the item of interest. The prediction module 112 enhances the ACF predictions by applying contextual information to populate the ratings of all the neighboring data entries for the users. Therefore, the invention is able to exploit the evidence from all the neighboring data entries rather than only from those that include rating data for the selected item. The prediction module 112 may represent the prediction of the unrated item I_k of the active user U_i as a BoE, where

Here, m̂_ik is the BPA corresponding to the neighborhood prediction.

Since the prediction module 112 captures the similarity between users via the user-user similarity, the above equation utilizes the user-user similarity as a discounting factor to 'discount' the ratings BoEs of the neighbor data entries prior to fusion. The predictions of the invention offer more flexibility to the decision-maker than what other ACF schemes may provide. The predictions provide information regarding the confidence associated with the ratings prediction and allow the system to make decisions that correspond more closely to the application domain requirements.
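The prediction equation itself is not reproduced in this text. Under the assumptions that discounting is Shafer's discounting with the user-user similarity s_ij as the reliability factor and that fusion is Dempster's rule, the neighborhood prediction can be sketched as below. Returning the vacuous BPA for an empty neighborhood is a choice made only for this sketch, and the neighbor BPAs are hypothetical.

```python
from functools import reduce
from itertools import product

THETA_PREF = frozenset({1, 2, 3, 4, 5})


def discount(m, reliability, theta=THETA_PREF):
    """Shafer discounting: scale masses, move the remainder to theta."""
    out = {f: reliability * w for f, w in m.items() if f != theta}
    out[theta] = reliability * m.get(theta, 0.0) + (1.0 - reliability)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two BPAs on the same FoD."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        if a & b:
            combined[a & b] = combined.get(a & b, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("totally conflicting evidence")
    return {f: w / (1.0 - conflict) for f, w in combined.items()}


def predict(neighbors):
    """neighbors: list of (s_ij, ratings BPA of neighbor j for item I_k)."""
    if not neighbors:
        return {THETA_PREF: 1.0}  # no evidence: vacuous BPA
    discounted = [discount(m, s) for s, m in neighbors]
    return reduce(dempster_combine, discounted)


prediction = predict([
    (0.9, {frozenset({4}): 1.0}),     # very similar neighbor rated "4"
    (0.5, {frozenset({4, 5}): 1.0}),  # less similar neighbor: "4 or 5"
])
print(prediction)
```

Less similar neighbors are discounted more heavily, so their ratings contribute more ignorance and less specific support to the fused prediction.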

For a hard decision on a singleton, the prediction module 112 may use the pignistic probability in Equation (1) and select the singleton with the highest pignistic probability as the preference label. If a single preference label, whether a singleton or a composite, is desired, the prediction module 112 may apply the maximum belief with non-overlapping interval strategy (maxBL). Under maxBL, the prediction module 112 selects the singleton preference label whose belief is greater than the plausibility of every other singleton. If no such label exists, the prediction module 112 may select the composite preference label consisting of the singleton with maximum belief together with those singletons whose plausibility exceeds that maximum belief.

According to an exemplary embodiment, the above concepts are implemented using the Movielens dataset, a movie recommendation dataset widely used by researchers for benchmarking purposes. At the time of use, Movielens included 100,000 ratings from 943 users for 1682 movies. In Movielens, the ratings are assigned integer values between 1 and 5, with 5 representing the highest possible rating (Θ_pref = {1, 2, 3, 4, 5}). In addition to ratings, Movielens includes the Genre of each movie, user-related information, and item-related information. The user-related information includes age, gender, and occupation of individual users, among other user-related information. The item-related information includes title and IMDb_URL, among other item-related information. While Movielens, with its integer ratings, is not an ideal dataset to demonstrate the full functionality and effectiveness of the invention, it is appropriate for traditional ACF algorithms. Thus, Movielens provides a domain for performance evaluation and for comparisons with the principles of the invention.
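The maxBL selection rule can be sketched using the belief and plausibility of each singleton rating. The BPAs in the example are hypothetical, and ties in maximum belief are broken in favor of the lowest rating, an arbitrary choice for this sketch.

```python
THETA_PREF = (1, 2, 3, 4, 5)


def singleton_bel_pl(m, theta=THETA_PREF):
    """Belief and plausibility of each singleton rating under the BPA m."""
    bel = {x: m.get(frozenset({x}), 0.0) for x in theta}
    pl = {x: sum(w for focal, w in m.items() if x in focal) for x in theta}
    return bel, pl


def max_bl(m, theta=THETA_PREF):
    """maxBL label: a singleton if its belief beats every other singleton's
    plausibility, otherwise a composite label."""
    bel, pl = singleton_bel_pl(m, theta)
    best = max(theta, key=lambda x: bel[x])  # ties -> lowest rating
    if all(bel[best] > pl[x] for x in theta if x != best):
        return frozenset({best})
    # Composite: the max-belief singleton plus every singleton whose
    # plausibility exceeds that maximum belief.
    return frozenset({best} | {x for x in theta if pl[x] > bel[best]})


confident = {frozenset({4}): 0.6, frozenset({4, 5}): 0.2, frozenset(THETA_PREF): 0.2}
undecided = {frozenset({4}): 0.3, frozenset({5}): 0.25, frozenset({4, 5}): 0.45}
print(sorted(max_bl(confident)), sorted(max_bl(undecided)))
```

Unlike an argmax over pignistic probabilities, this rule can return a composite label when the evidence does not separate two ratings.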

A domain with soft user ratings is provided to demonstrate the functionality of the invention. A probability rating module 116 is provided to create a DS_Movielens dataset by modifying Movielens through artificially introducing imperfections into the data. The modification process introduces imperfections while preserving existing user-user, item-item and user-item relationships that are needed by ACF algorithms.

The probability rating module 116 applies the following viewpoint regarding the Movielens ratings to generate the DS_Movielens dataset. Suppose the users had been allowed to provide "soft" ratings. Since the Movielens domain only contains hard ratings, the probability rating module 116 employs a mechanism to transform a hard rating in Movielens into a soft rating in DS_Movielens. The probability rating module 116 applies partial probability models to create different user profiles. Such partial probability models, together with the power set method, are used to convert data rife with diverse types of imperfections into DS theoretic evidence.

FIG. 4 illustrates graphical summaries of partial probability models for four user profiles based on zero tolerance, ±1 tolerance, end-weighted ±1 tolerance, and ±2 tolerance. In each of the four graphs of FIG. 4, the horizontal axis represents the user rating as it appears in the Movielens dataset. The vertical axis represents the 'true' rating that a movie received.

The probability rating module 116 employs user profiles to generate the DS_Movielens dataset. For a movie having True_Rating = 2, a ±1 tolerance user may, with equal probability, allocate either True_Rating = 2 or a rating from the set {1, 2, 3} (see FIG. 4(b)). In other words, if the Movielens dataset allowed soft ratings, a ±1 tolerance user might sometimes prefer the interval-valued rating {1, 2, 3} for this movie instead of the hard rating Movielens_Rating = 2. The system identifies these as the black and gray distributions, with the black distribution representing the hard rating Movielens_Rating = 2 and the gray distribution representing the interval-valued rating {1, 2, 3}. A zero tolerance user, for the same movie, would allocate Movielens_Rating = 2 (see FIG. 4(a)). An end-weighted ±1 tolerance user behaves similarly to a ±1 tolerance user (see FIG. 4(c)), and a ±2 tolerance user may, with equal probability, allocate either True_Rating = 2 or a rating from the set {1, 2, 3, 4} (see FIG. 4(d)). These four user profiles represent a relatively broad spectrum of users. To determine which user opinion could have generated Movielens_Rating = 2, the probability rating module 116 employs the power set approach while generating DS_Movielens from Movielens. The power set approach accounts for user rating imperfections without resorting to "assumptions" and "interpolations." The power set approach applied by the probability rating module 116 may identify the gray and black distributions as 0 and 1, respectively. The "state of nature" may then be considered to be in one of 2⁵ = 32 states, one for each assignment of a gray or black distribution to the five possible true ratings. Suppose a ±1 tolerance user allocates Movielens_Rating = 2. If the state of nature is {1, x, 1, x, x}, then the generating distributions are black for True_Rating ∈ {1, 3} and either gray or black for the other ratings. In this case, the only "feasible" true rating that could have generated Movielens_Rating = 2 is True_Rating = 2.
Alternatively, if the state of nature is {0, x, 0, x, x}, then the generating distributions are gray for True_Rating ∈ {1, 3} and either gray or black for the other ratings. In this case, the feasible true ratings are True_Rating ∈ {1, 2, 3}. In this manner, the probability rating module 116 may complete the "Feasible True_Rating" column in FIG. 5. The set of feasible true ratings of any other Movielens rating corresponding to an arbitrary user 'character' profile can be obtained similarly. FIG. 5 shows the BPA corresponding to Movielens_Rating = 2 of a ±1 tolerance user.
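The power set procedure can be sketched as follows for a symmetric tolerance profile. The sketch assumes that a gray distribution for true rating t can produce any hard rating within the tolerance of t, that a black distribution produces only t, and that all 2⁵ states of nature are equally likely. These are modelling assumptions for illustration; the end-weighted profile is not modelled, and the output is not claimed to reproduce FIG. 5.

```python
from fractions import Fraction
from itertools import product

RATINGS = (1, 2, 3, 4, 5)


def feasible_true_ratings(observed, state, tolerance):
    """True ratings that could have produced `observed` in a given state.

    state[t - 1] == 1: black (hard) distribution for true rating t, which
    can only produce t; state[t - 1] == 0: gray distribution, assumed able
    to produce any rating within `tolerance` of t.
    """
    return frozenset(
        t for t, bit in zip(RATINGS, state)
        if t == observed or (bit == 0 and abs(t - observed) <= tolerance)
    )


def power_set_bpa(observed, tolerance):
    """BPA over feasible true-rating sets, weighting all states equally."""
    states = list(product((0, 1), repeat=len(RATINGS)))
    weight = Fraction(1, len(states))
    bpa = {}
    for state in states:
        focal = feasible_true_ratings(observed, state, tolerance)
        bpa[focal] = bpa.get(focal, Fraction(0)) + weight
    return bpa


for focal, mass in sorted(power_set_bpa(2, 1).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), mass)
```

Under these assumptions only the bits for true ratings 1 and 3 matter for an observed "2", so the 32 states collapse onto four focal elements.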

According to one embodiment, the probability rating module 116 generates five DS_Movielens datasets with different values of p, the probability with which the zero tolerance user profile is selected, viz., p ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. The probability rating module 116 may create each DS_Movielens dataset in the following manner. A user-item pair that has been rated in Movielens may be selected.

Randomly, with the probabilities {p, (1 − p)/3, (1 − p)/3, (1 − p)/3}, one user profile may be selected from FIG. 4(a), 4(b), 4(c), and 4(d), respectively. The corresponding feasible true ratings and DS theoretic BPA r_ik may be obtained via the procedure described above, and the hard Movielens rating may be replaced with r_ik. The probability rating module 116 may repeat this process for all rated entries in the Movielens dataset.
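The random profile assignment step can be sketched as below. The profile labels, function names, synthetic pairs, and fixed seed are hypothetical; converting each selected profile into a BPA would use the power set procedure described above.

```python
import random

# Hypothetical labels for the four profiles of FIG. 4(a)-(d).
PROFILES = ("zero", "pm1", "pm1_end_weighted", "pm2")


def profile_weights(p):
    """Selection probabilities {p, (1-p)/3, (1-p)/3, (1-p)/3}."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rest = (1.0 - p) / 3.0
    return [p, rest, rest, rest]


def assign_profiles(rated_pairs, p, seed=0):
    """Draw one user profile independently for every rated (user, item) pair."""
    rng = random.Random(seed)
    weights = profile_weights(p)
    return {pair: rng.choices(PROFILES, weights=weights)[0] for pair in rated_pairs}


pairs = [(u, i) for u in range(100) for i in range(100)]
datasets = {p: assign_profiles(pairs, p, seed=1) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

A dedicated `random.Random` instance keeps each generated dataset reproducible for a given seed.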

The probability rating module 116 may transform the DS theoretic user ratings in the DS_Movielens dataset generated in the previous step into probabilities via the pignistic transformation in Equation (1). This is how the PR_Movielens dataset is generated. Selecting the most likely rating in each case (for FIG. 5, the rating {2}) reproduces the Movielens dataset that was initially selected.

For performance comparisons, the system generated re-implementations of the following ACF algorithms. A broadly used ACF system based on correlation analysis is identified by the acronym CORR. An algorithm proposed by Nakamura and Abe (A. Nakamura and N. Abe, "Collaborative filtering using weighted majority prediction algorithms," Proc. International Conference on Machine Learning (ICML '98), San Francisco, CA: Morgan Kaufmann, 1998, pp. 395-403) is identified by the acronym NA. The authors of NA provided three variants: one based on user-to-user similarity (u-NA), one based on item-to-item similarity (i-NA), and one combining these two (c-NA). These algorithms enable the user to accommodate the ignorance inherent in user ratings. In experiments reported by Nakamura and Abe, NA compares favorably with correlation-based methods. While neither CORR nor NA is directly applicable to the DS_Movielens datasets, the probability rating module 116 may apply CORR to the PR_Movielens dataset, using non-integer ratings obtained by weighting each rating by its corresponding probability. By contrast, NA requires integer-valued ratings. If the probability rating module 116 generated such a dataset from the DS_Movielens dataset, the information provided by the user ratings could be significantly distorted. As a result, NA is applied to the Movielens dataset.

According to one embodiment, simulations were run with the DS modeling function in Equation (3). In the absence of adequate information, the two model parameters, the trust factors and the dispersion factors, were replaced with system-wide constants: {α_ik, σ_ik} ≡ {α, σ}, ∀ i, k.

For example, the Genre information was used to generate item-based contextual information. Adopting the nomenclature detailed above, the invention defines Concept := Genre and Groups := {Genre.Group_1, ..., Genre.Group_Q}, where the concept groups may be Drama, Thrillers, and Romance, among other concept groups. When generating the group preference BoEs, the probability rating module 116 seeks to capture how movies from a given genre would, as a whole, be rated by user U_i. Since Movielens does not give users the opportunity to express their genre preferences explicitly, the probability rating module 116 estimates the users' preferences using the group preference BPA applied to movies that have already been rated by user U_i. According to one embodiment, no discounting is applied.

According to one embodiment, the probability rating module 116 does not apply discounting to the definitions of the concept preference BPA and the contextual preference BPA described above. If additional concepts are utilized, not all concepts may contribute equally to user preferences. For example, the contribution of the concept Director may differ from the contribution of the concept Cast. These differences may be accommodated through discounting. The ratings in the DS_Movielens dataset were used as the user preference BoEs without additional modeling, and the genre information was used as in the above case.

The following methodology was used, for consistency with prior methods, in conducting performance experiments that apply the principles of the invention. According to one embodiment, the system randomly selects 10% of the users and withholds five randomly selected movie ratings for each of these users. In other words, the probability rating module 116 hides five non-empty fields in the ratings matrix for each selected user and prevents these fields from being used during training. The system subsequently uses these withheld ratings as an independent testing set; the remaining ratings form the training set. This process may be repeated for ten different random splits into training and testing sets, indexed by ℓ = 1, ..., 10. The results shown below are averages obtained over the 10 splits. For user-user similarity, γ = 10⁻⁴.
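A sketch of the evaluation protocol described above (10% of users, five withheld ratings each, ten random splits). The ratings layout (a dict of dicts), the function name, and the synthetic data are assumptions.

```python
import random


def split_ratings(ratings, user_fraction=0.10, holdout_per_user=5, seed=0):
    """Withhold `holdout_per_user` ratings from a random fraction of users.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (train, test) with the same layout; `ratings` is not modified.
    """
    rng = random.Random(seed)
    users = sorted(ratings)
    n_test_users = max(1, round(user_fraction * len(users)))
    train = {u: dict(items) for u, items in ratings.items()}
    test = {}
    for u in rng.sample(users, n_test_users):
        held_out = rng.sample(sorted(train[u]), min(holdout_per_user, len(train[u])))
        test[u] = {item: train[u].pop(item) for item in held_out}
    return train, test


# Ten random splits, indexed by ell = 1, ..., 10, on a synthetic ratings table.
data = {u: {i: (u + i) % 5 + 1 for i in range(10)} for u in range(50)}
splits = [split_ratings(data, seed=ell) for ell in range(1, 11)]
```

Withheld ratings are removed from the training copy, so no test rating can leak into neighborhood selection or contextual population.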

The probability rating module 116 uses separate notation for the 'true' rating that user U_i gives to item I_k in the case of the Movielens dataset (a hard rating) and in the case of the DS_Movielens dataset (a BPA). The ratings predicted by the CORR and NA techniques and the ratings predicted by the invention are likewise denoted separately.

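Performance criteria used to evaluate ACF algorithms in environments with hard ratings include the mean absolute error (MAE); other metrics include Precision and Recall. The MAE corresponding to the rating θ ∈ Θ_pref over the testing set may be calculated as follows:

$$\mathrm{MAE}_{\theta} = \frac{1}{|\mathcal{T}_{\theta}|} \sum_{(i,k) \in \mathcal{T}_{\theta}} \left| \hat{r}_{ik} - r_{ik} \right|,$$

where 𝒯_θ identifies the user-item pairs in the testing set whose true rating is θ, r_ik is the true rating, and r̂_ik is the predicted rating. With obvious modifications, the probability rating module 116 may obtain the overall MAE measure for the ACF algorithm.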
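A sketch of the per-rating MAE and one possible overall MAE. Taking the overall MAE as the mean over all test pairs is an assumption; averaging the per-rating values is another reasonable reading of "obvious modifications". The test pairs are hypothetical.

```python
from collections import defaultdict


def mae_by_rating(pairs):
    """Per-rating and overall MAE for (true_rating, predicted_rating) pairs."""
    errors = defaultdict(list)
    for true_rating, predicted in pairs:
        errors[true_rating].append(abs(predicted - true_rating))
    if not errors:
        raise ValueError("no test pairs")
    per_rating = {r: sum(e) / len(e) for r, e in errors.items()}
    all_errors = [err for e in errors.values() for err in e]
    overall = sum(all_errors) / len(all_errors)
    return per_rating, overall


# Predictions may be real-valued, as with CORR.
test_pairs = [(1, 2), (1, 1), (5, 4), (3, 3), (3, 3.5)]
per_rating, overall = mae_by_rating(test_pairs)
print(per_rating, overall)
```

Grouping errors by true rating exposes systems that do well on common "middle" ratings but poorly on the extremes, which a single overall MAE can hide.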

The MAE requires a hard 'predicted' rating. As a result, to use the MAE to compare CoFiDS with CORR and NA, the DS theoretic predictions are converted to hard predictions through, for example, the pignistic transformation. Alternatively, the probability rating module 116 may compare soft predictions, such as those provided by CoFiDS, with hard ratings using the following DS theoretic measures:

where

Here,

refers to the pignistic probability that corresponds to the DS theoretic

. The following performance criteria are provided:

where β

. For example, measure.

In environments where the user preference ratings are soft, such as the DS_Movielens dataset, the degree to which one BPA (viz., the predicted BPA) approximates another BPA (viz., the true BPA) is determined using the following definition:

where and

and denotes the Euclidean norm. Here,

are each a size

column vector containing the masses allocated to each subset of by and respectively. matrix with

According to one embodiment, both DS-PE1 and DS-PE2 may take values in [0, 1]. For DS-PE2, the probability rating module 116 could have used the KL-divergence instead of the Euclidean norm. In that case, however, the error would not be bounded by the closed interval [0, 1]. Moreover, the KL-divergence would require the pignistic distributions corresponding to the true and predicted BPAs to have identical supports.

The behavior of CoFiDS depends on a few parameters, which motivates an examination of how their concrete settings affect CoFiDS' performance. An elementary performance measure is the MAE. FIG. 6 uses α = 0.9, which indicates 90% confidence in each user rating, and illustrates how CoFiDS' MAE varies with different values of the dispersion factor σ (for several choices of {K, τ}). The results indicate that performance is only minimally sensitive to σ as long as σ lies in the interval [0.4, 0.7], with the best overall MAE obtained when σ ≈ 2/3. The experiments summarized in FIG. 12, described below, use σ = 2/3.

FIG. 7 illustrates how CoFiDS' MAE changes with the neighborhood size K, and FIG. 8 illustrates how the MAE varies with the similarity threshold τ, in each case with the other parameters held fixed. As shown, the MAE first drops with increasing K and then stabilizes, remaining generally constant beyond about K = 70.

As for the similarity threshold τ, FIG. 8 shows a minimum MAE around τ = 0.79. The results of these experiments are used as guidance in the next set of experiments, where {K, τ} = {80, 0.79}. For a concrete domain, the values of these parameters may need to be established using a cross-validation technique. Proceeding to the more appropriate DS-based performance criteria, FIGS. 9 and 10 illustrate how the DS-MAE of CoFiDS varies with the neighborhood size K and the similarity threshold τ, respectively, with all other parameters kept constant. The nature of the DS theoretic predictions makes a direct comparison of CoFiDS' performance with that of CORR and NA subjective.

FIG. 11 illustrates a few exemplary CoFiDS predictions performed by the probability rating module 116 using Movielens, together with the single-label predictions obtained by the pignistic transformation and the maxBL strategy. The decision that corresponds to the user-item pair (72, 550) is not controversial. By contrast, the decision that corresponds to the user-item pair (2, 251) shows that it may be challenging to capture the richer information content of the DS theoretic BoE with a single-label decision: although the pignistic transformation and the maxBL strategy both favor a "4" rating, the CoFiDS prediction does not clearly discriminate between the "4" and "5" (true) ratings. For the user-item pair (116, 758), while the maxBL strategy captures the indecision that is apparent in the CoFiDS prediction, the pignistic transformation does not.

Given the different natures of the systems being compared, two strategies emerge for comparing the predictions. The first strategy converts CoFiDS' predictions into hard ones. The second strategy interprets CORR's and NA's predictions as soft predictions. Each of these strategies is addressed separately.

To convert CoFiDS' predictions into hard decisions for direct comparison with CORR and NA, the probability rating module 116 may use the pignistic transformation. This approach reduces the effectiveness of CoFiDS, whose strength is its ability to generate soft decisions, so this strategy is suited to cases where hard decisions are satisfactory.

According to one embodiment, the basic parameters for CoFiDS are set to {α, σ} = {0.9, 2/3}. To quantify the prediction performance, the MAE and other information retrieval criteria, such as Precision, Recall, and F₁, are used. A high Precision is desired in domains where a predicted rating must be reliable: if the system predicts "2," that prediction can be relied upon, even though the system may have missed many cases where the true rating was "2." By contrast, a high Recall is desired in domains where the system needs to correctly recall as many occurrences of a given rating as possible. F₁ combines the two criteria and is preferred in domains where Precision and Recall are deemed equally important.
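A sketch of per-rating Precision, Recall, and F₁ for hard predictions, treating each rating value as its own class. Returning 0 when a denominator is empty is a convention chosen for this sketch, and the pairs are hypothetical.

```python
def prf_by_rating(pairs, ratings=(1, 2, 3, 4, 5)):
    """Per-rating (precision, recall, F1) for (true, predicted) hard ratings.

    `pairs` must be a list (it is scanned once per rating value).
    """
    scores = {}
    for r in ratings:
        tp = sum(1 for t, p in pairs if t == r and p == r)
        predicted = sum(1 for _, p in pairs if p == r)
        actual = sum(1 for t, _ in pairs if t == r)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[r] = (precision, recall, f1)
    return scores


pairs = [(2, 2), (2, 3), (3, 2), (2, 2)]
print(prf_by_rating(pairs)[2])
```

Precision for rating r is conditioned on predicting r, and recall on the true rating being r, which is why the two can favor different systems.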

FIG. 12 summarizes the results of these experiments, with bold values indicating the best performance in each category. As the differences are substantial, statistical significance is not evaluated. In FIG. 12, each of the five possible ratings ("1" through "5") is given a column. The experiments show that NA-based predictions are seldom the best, which suggests that this technique is not well suited to soft ratings of this particular kind. The situation is less straightforward when CORR and CoFiDS are compared. A superficial observation shows that, on average, CoFiDS' mean error is lower. This apparent performance edge may be attributed to CoFiDS' better ability to predict the "middle" ratings of "3" and "4." By contrast, CORR more accurately predicted "1." In this example, the margin between the two systems is small. The F₁ criterion gives similar impressions, and its per-rating components may offer further insight. According to one embodiment, CoFiDS may be preferable in domains where the user emphasizes Precision, whereas CORR may be a better choice when Recall is important.

The prediction module 112 may provide the CoFiDS results even though a conversion to a hard decision may not exploit the full strength and functionality of the underlying DS-theoretic basis. According to one embodiment, the coverage, i.e., the percentage of items for which the ACF algorithm can make predictions, is lower for CORR if the ACF algorithm parameters have been tuned for lower MAE. Both NA and CoFiDS provide nearly complete coverage. Therefore, for a more balanced comparison with CORR, a configuration is used that minimizes the MAE while keeping coverage at the 90% level.

Turning now to the second analysis strategy, where CORR and NA decisions are interpreted as soft predictions, CORR does not need to produce integer-valued predictions. This simplifies the comparison of CORR with CoFiDS in terms of soft predictions. By contrast, the NA decisions cannot be readily "softened" because they are integer-valued. For this reason, only the comparison of CORR with CoFiDS is provided below.

The prediction module 112 may apply the following DS-theoretic BPA to interpret a CORR prediction r̂_ik as soft:

$$m(\{\lfloor \hat{r}_{ik} \rfloor\}) = \lceil \hat{r}_{ik} \rceil - \hat{r}_{ik}, \qquad m(\{\lceil \hat{r}_{ik} \rceil\}) = \hat{r}_{ik} - \lfloor \hat{r}_{ik} \rfloor, \qquad (7)$$

where ⌊r̂_ik⌋ and ⌈r̂_ik⌉ denote the highest integer rating that does not exceed the CORR prediction r̂_ik and the lowest integer rating that does not fall below it, respectively (if r̂_ik is itself an integer, all mass is assigned to {r̂_ik}). According to one embodiment, the CORR prediction r̂_ik = 3.3 is interpreted as the Bayesian statement, "The rating is 3 with 70% confidence, and it is 4 with 30% confidence." Equation (7) corresponds well with this typical interpretation of a CORR prediction.
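A sketch of Equation (7), assuming the masses are split linearly between the two neighboring integer ratings as in the 3.3 example. Assigning all mass to an integer-valued prediction is an added assumption for a case the text does not cover.

```python
import math


def soften_corr_prediction(r_hat):
    """Split a real-valued CORR prediction between its two nearest integers."""
    lower, upper = math.floor(r_hat), math.ceil(r_hat)
    if lower == upper:
        # Integer prediction (assumption): all mass on that rating.
        return {frozenset({lower}): 1.0}
    return {frozenset({lower}): upper - r_hat, frozenset({upper}): r_hat - lower}


print(soften_corr_prediction(3.3))
```

The resulting BPA has only singleton focal elements, so it is Bayesian and its pignistic transformation returns the same two probabilities.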

FIG. 13 summarizes the results for the configuration that yields the best overall DS-MAE being used for each ACF algorithm. The same CoFiDS parameters are used as before. Again, bold values in FIG. 13 indicate the best performance in each category.

While the average mean error is lower in the case of CoFiDS, the correlation-based approach provides better results for predicting the extreme values ("1" and "5"). By contrast, CoFiDS provides better results for predicting the "middle" values of "3" and "4." The same conclusion is reached in the case of the F₁ criterion, whose components display different behavior for each of the two systems. CoFiDS is preferred in domains where high precision is desired, while CORR is preferred in applications where high recall is desired.

According to one embodiment, the true ratings may be soft, which permits performance comparisons using the criteria DS-PE1 and DS-PE2 defined above. While the CoFiDS predictions are already in soft form, CORR predictions are converted to soft predictions using Equation (7). FIG. 14 illustrates the comparison for several different values of p, the probability with which the zero tolerance user profile was selected; the other three user profiles were selected with equal probability. Since CoFiDS consistently outperforms the CORR system by a large margin, statistical significance is not evaluated.

FIG. 15 illustrates how DS-PE1 varies with the neighborhood size K when p = 0.1. The performance is poor as long as the neighborhood is small; it peaks and then starts slowly degrading. In a realistic domain, this graceful degradation after the optimum supports the notion that the optimum value of K can be obtained by cross-validation techniques.

In the general field of recommender systems, the invention provides methods of accommodating data imperfections for domains where the user ratings are subjective or otherwise unreliable. The system applies coarse settings to system-wide parameters. According to one embodiment, the CoFiDS performance compares favorably with the performance of conventional ACF techniques. The invention uses CoFiDS to generate soft decisions where domain experts offer subjective opinions, and it propagates the uncertainties from the user-preference ratings to the output predictions. By contrast, conventional methods that "force" decisions into crisp integer values are deficient.

The invention may be realized in hardware, software, or a combination of hardware and software. Any kind of computing system or other apparatus adapted for carrying out the methods described herein is suited to perform the functions described herein.

A typical combination of hardware and software could be a specialized or general purpose computer system having one or more processing elements. A computer program may be provided and stored on a storage medium that controls the computer system when loaded and executed, such that it carries out the methods described herein. The invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computing system is able to carry out these methods. Storage medium refers to any volatile or non-volatile storage device.

Computer program or application in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

In addition, unless mention is made above to the contrary, it should be noted that all of the accompanying drawings are not to scale. Significantly, this invention can be embodied in other specific forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be had to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

Claims

What is claimed is:

1. An automated collaborative filtering device in communication with a client terminal device and receiving data from a plurality of sources, the automated collaborative filtering device comprising: a storage module that stores data gathered from the plurality of sources, wherein the data includes contextual information and wherein the storage module has a database that includes filled data slots and empty data slots; and a prediction module that communicates with the storage module and the client terminal device, the prediction module is programmed to generate prediction data based on the contextual information, wherein the prediction data is provided to populate the empty data slots.

2. The automated collaborative filtering device according to claim 1, wherein the prediction module processor applies Dempster-Shafer belief theoretic collaborative filtering to populate the empty data slots.

3. The automated collaborative filtering device according to claim 1, wherein the prediction module generates reliability information for the prediction data.

4. The automated collaborative filtering device according to claim 3, wherein the prediction module generates predictions with reliability information for each of the plurality of sources.

5. The automated collaborative filtering device according to claim 4, wherein the prediction module combines the predictions with the reliability information from the plurality of sources to provide an aggregate prediction with aggregate reliability information.

6. The automated collaborative filtering device according to claim 1, wherein the prediction module populates the empty data slots prior to performing automated collaborative filtering.

7. The automated collaborative filtering device according to claim 1, wherein the prediction module organizes the gathered data into at least one category based on criteria including at least one of user data and item data.

8. The automated collaborative filtering device according to claim 7, wherein the prediction module generates prediction data based on one category using at least one of a K-nearest neighbor selection and a minimum similarity threshold selection.

9. The automated collaborative filtering device according to claim 7, wherein the prediction module generates prediction data based on at least two categories using at least one of a K-nearest neighbor selection and a minimum similarity threshold selection.

10. A method of performing automated collaborative filtering, the method comprising: providing a database that includes filled data slots and empty data slots; storing data gathered from a plurality of sources into the database; obtaining contextual information from the stored data; generating prediction data based on the contextual information; and populating the empty data slots with the prediction data.

11. The method according to claim 10, further comprising applying Dempster-Shafer belief theoretic collaborative filtering to populate the empty data slots.

12. The method according to claim 10, further comprising generating predictions with reliability information for the prediction data.

13. The method according to claim 12, wherein the predictions with reliability information are generated for each of the plurality of sources.

14. The method according to claim 13, further comprising: combining the predictions with reliability information from the plurality of sources; and providing an aggregate prediction with aggregate reliability information.

15. The method according to claim 10, further comprising populating the empty data slots prior to performing automated collaborative filtering.

16. The method according to claim 10, further comprising organizing the gathered data into at least one category based on criteria including at least one of user data and item data.

17. The method according to claim 16, further comprising generating prediction data based on one category using at least one of a K-nearest neighbor selection and a minimum similarity threshold selection.

18. The method according to claim 16, further comprising generating prediction data based on at least two categories using at least one of a K-nearest neighbor selection and a minimum similarity threshold selection.

19. An automated collaborative filtering device in communication with a client terminal device and receiving data from a plurality of sources, the automated collaborative filtering device comprising: a storage module that stores data gathered from the plurality of sources, wherein the data includes contextual information and wherein the storage module has a database that includes filled data slots and empty data slots; a probability rating module that communicates with the storage module and the client terminal device, the probability rating module being programmed to extract predefined values from the data and transform the predefined values into a probability of obtaining the predefined values; and a prediction module that communicates with the probability rating module, the prediction module being programmed to generate prediction data based on the contextual information and the probability of obtaining the predefined values, wherein the prediction data is provided to populate the empty data slots.

20. The automated collaborative filtering device according to claim 19, wherein the probability rating module is programmed to apply a weighting factor based on the source of the data.
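Claims 2, 4, 5, 11, 13, 14 and 20 recite Dempster-Shafer belief theoretic filtering, per-source reliability, and aggregation of source predictions. The claims do not specify the operators, so the following is only a minimal sketch under stated assumptions: it uses Shafer's classical reliability discounting as one possible source weighting, Dempster's rule of combination for aggregation, belief and plausibility as the reliability interval, and the pignistic transform to obtain a point rating. The 1 to 5 rating frame, the example mass assignments, and all function names (`discount`, `combine`, `belief`, `plausibility`, `pignistic`) are hypothetical and do not represent the claimed implementation.

```python
from itertools import product

# Hypothetical frame of discernment: a 1-5 rating scale.
FRAME = frozenset({1, 2, 3, 4, 5})


def discount(m, alpha, frame=FRAME):
    """Shafer discounting: scale each focal mass by source reliability alpha
    in [0, 1] and move the removed mass to total ignorance (the whole frame)."""
    out = {A: alpha * v for A, v in m.items() if A != frame}
    out[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return out


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by 1 - K, where K is the mass assigned to empty intersections."""
    raw, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            raw[C] = raw.get(C, 0.0) + a * b
        else:
            conflict += a * b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {C: v / (1.0 - conflict) for C, v in raw.items()}


def belief(m, A):
    """Total mass committed to subsets of A (lower bound of support)."""
    return sum(v for B, v in m.items() if B <= A)


def plausibility(m, A):
    """Total mass not contradicting A (upper bound of support)."""
    return sum(v for B, v in m.items() if B & A)


def pignistic(m):
    """Spread each focal mass evenly over its elements to get a probability."""
    bet = {}
    for A, v in m.items():
        for x in A:
            bet[x] = bet.get(x, 0.0) + v / len(A)
    return bet


# Usage: two hypothetical sources predicting one empty rating slot.
m_a = {frozenset({4}): 0.6, frozenset({4, 5}): 0.3, FRAME: 0.1}
m_b = {frozenset({5}): 0.5, frozenset({4, 5}): 0.4, FRAME: 0.1}
aggregate = combine(discount(m_a, 1.0), discount(m_b, 0.8))
bet = pignistic(aggregate)
predicted_rating = max(bet, key=bet.get)
interval = (belief(aggregate, {4, 5}), plausibility(aggregate, {4, 5}))
```

The gap between belief and plausibility is one way to report how much of a prediction rests on evidence versus ignorance. Lowering a source's `alpha` widens that gap rather than silently shifting the point estimate.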
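Claims 8, 9, 17 and 18 name two neighbor selection strategies, K-nearest neighbor selection and minimum similarity threshold selection. The sketch below shows how the two differ for a single (user-based) category. It uses a conventional similarity-weighted average instead of the belief theoretic predictor, so it should be read only as an illustration of the selection step. Representing empty data slots as `None`, measuring similarity as cosine similarity over co-rated items, and the names `similarity`, `select_neighbors` and `predict` are all assumptions made for the example.

```python
import math


def similarity(u, v):
    """Cosine similarity over items both users rated (None marks an empty slot)."""
    common = [i for i in range(len(u)) if u[i] is not None and v[i] is not None]
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def select_neighbors(matrix, target, item, k=None, min_sim=None):
    """Return (similarity, user) pairs for users who rated `item`, keeping only
    those at or above `min_sim` (threshold selection) and then at most the
    `k` most similar (K-nearest neighbor selection)."""
    candidates = []
    for u, row in enumerate(matrix):
        if u == target or row[item] is None:
            continue
        s = similarity(matrix[target], row)
        if min_sim is not None and s < min_sim:
            continue
        candidates.append((s, u))
    candidates.sort(reverse=True)
    return candidates[:k] if k is not None else candidates


def predict(matrix, target, item, k=None, min_sim=None):
    """Fill one empty slot with a similarity-weighted average of neighbor ratings."""
    neighbors = [(s, u) for s, u in select_neighbors(matrix, target, item, k, min_sim) if s > 0]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors: leave the slot empty
    return sum(s * matrix[u][item] for s, u in neighbors) / total


# Hypothetical user-by-item rating matrix; None is an empty data slot.
R = [
    [5, 4, None, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
    [5, 4, 5, None],
]
by_knn = predict(R, target=0, item=2, k=2)
by_threshold = predict(R, target=0, item=2, min_sim=0.9)
```

A fixed K always uses the same number of neighbors, even weakly similar ones. A threshold admits only sufficiently similar users but may return no prediction at all. A two-category variant, as in claims 9 and 18, would apply the same selection over item similarities as well.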