Data processing apparatus for propagative correlation
Publication number: US7853588B2 (application US11858069)
Authority: US
Grant status: Grant
Legal status: Active (anticipated expiration 2028-09-02)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRICAL DIGITAL DATA PROCESSING
 G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
 G06F17/30—Information retrieval; database structures therefor; file system structures therefor
 G06F17/30943—Details of database functions independent of the retrieved data type
Description
This application claims priority of French Patent Application Number 0608861, filed on Oct. 10, 2006.
The invention relates to the bringing together of data which are a priori heterogeneous, for the purpose of drawing from them a concrete proposal, which will be referred to herein as a “propagative correlation”.
In general, the data in question associate objects from the real world and their perception or evaluation, particularly by people. As the overall field to be processed grows, the computing load increases very rapidly, as it is linked to the product of the number of objects by the number of users. The reason for this will be explained later on.
The current methods therefore use highly approximate models, and this obviously has a detrimental effect on the quality of the results obtained.
The invention sets out to improve the situation.
To this end, the invention proposes a data processing apparatus for propagative correlation which comprises:

 a memory for storing a raw matrix that crosses object identifiers and actor identifiers, each crossing being marked by the presence or absence of a quantified matrix element,
 a background module capable of converting the raw matrix into rearranged matrix blocks as a function of a criterion linked to the quantified matrix element, and
 a selection manager capable, on receiving an input actor identifier, of looking up this input actor identifier in a first table which contains links, each of which associates an actor identifier and one or more matrix blocks associated with this actor identifier, and of presenting information depending on the contents of the matrix block or blocks associated with the input actor identifier.
In this apparatus, the background module comprises a cross-classifier for converting the raw matrix into reclassified matrix blocks in accordance with a criterion that combines a presence metric of the matrix element and a value metric of the matrix element, the selection manager operating on the basis of the rearranged matrix blocks.
An apparatus of this kind enables extremely satisfactory results to be obtained, as this approach makes it possible to take account of both the actor profiles and the object profiles. The classification thus carried out enables a configuration to be obtained which is particularly well suited to the problems of “voids” which are characteristic of this type of evaluation.
Optionally, and in particular embodiments, the apparatus described hereinbefore may have the following features:

 the background module may comprise a controller for calling up the cross-classifier with a criterion linked to the presence of the matrix element, in order to collate both the actor identifiers by the matrix element presence/absence profile according to the object identifiers, and the object identifiers by the matrix element presence/absence profile according to the actor identifiers, in order to convert the raw matrix into intermediate matrix blocks;
 the controller may iteratively and selectively call up the cross-classifier with a criterion linked to the value of the matrix element over at least some of the intermediate matrix blocks which have matrix element presence/absence profiles of a selected density, to convert them into reclassified matrix blocks;
 the background module may further comprise a smoother capable of filling the emptied matrix elements of a given matrix block, as a function of a criterion linked to the density of the presence/absence profile of matrix elements of this block, to obtain a filled dense block;
 the smoother can selectively fill, by propinquity, the empty matrix elements of a matrix block which has a density of a chosen level;
 the smoother can fill the empty matrix elements of a matrix block on the basis of an iteration that selectively crosses the values of adjacent matrix elements;
 the smoother can fill, by averaging, the empty matrix elements of a matrix block which has a density of a chosen level; and
 the selection manager can present the information as a function of a second table, which contains links, each of which associates a matrix block and one or more object identifiers.
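As an illustration of the two tables the selection manager relies on, here is a minimal sketch in Python; the table contents and the helper `objects_for_actor` are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch of the selection manager's lookup tables.
# All identifiers below are invented for illustration.

# First table: actor identifier -> matrix block(s) associated with that actor.
first_table = {
    "actor_1": ["block_A"],
    "actor_2": ["block_A", "block_B"],
}

# Second table: matrix block -> object identifiers it covers.
second_table = {
    "block_A": ["obj_10", "obj_11"],
    "block_B": ["obj_12"],
}

def objects_for_actor(actor_id):
    """Collect the object identifiers reachable from an actor's blocks."""
    objs = []
    for block in first_table.get(actor_id, []):
        objs.extend(second_table[block])
    return objs
```

A request for an unknown actor simply yields an empty list, mirroring the cold-start situation discussed later in the description.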
Further features and advantages will be more readily apparent from a perusal of the following description, provided in an illustrative and nonrestrictive capacity, of embodiments illustrated in the drawings, wherein:
The drawings and description that follow essentially contain elements of a specific nature. They may therefore serve not only to help with the understanding of the present invention but also contribute to its definition, as necessary.
Moreover, the detailed description is supplemented by Annexe 1 which shows a particular method of collaborative filling, as will be explained hereinafter.
This annexe is set apart with the aim of clarification and to make it easier to refer to. It forms an integral part of the description and may therefore also contribute to the definition of the invention, if necessary.
Known data processing solutions make it possible to predict the evaluation that a person will make of an object which they do not know a priori, on the basis of evaluations of other objects by other people. These solutions conventionally consist in grouping objects or persons that have similar characteristics into classes. These groups or classes are then used to determine, jointly within such a class, an evaluation parameter which will be propagated to other objects not yet processed. This involves a sort of prediction which may be reductively referred to as a “recommendation”.
A number of difficulties arise in this field. In fact the data available are generally fairly disparate and few and far between. As a result, the propagative correlation, or, if preferred, the “recommendation”, is at best of middling reliability.
Specifically, the complete set of possible objects has to be broken up into subsets containing a limited number of objects. The larger the subsets, the more hope there is of finding (at least) two people who have expressed preferences (“evaluation”) regarding “common” objects and are therefore capable of forming the basis for a recommendation. Conversely, to obtain a process that results in reliable recommendations, the number of objects in each subset should not be too large. In fact, this presentation is simplified, as it is necessary to have both a large number of common objects and a large number of people, and this is even more unusual.
It will be perceived that it is not reasonably possible to work on the complete set of available data all at once, as this would involve a prohibitive computing workload, linked to the product of the number of objects by the number of people.
For this reason, simplified, or even random, methods are currently used. This is what various Internet sites and other data processing applications do in order to present a proposal to the user. Within the scope of a trading site, for example, this recommendation is useful for suggesting to a user a product that is thought to suit his tastes, and thus possibly achieve an additional sale.
The offline part HL comprises a source of raw data 4 and a background module 6 which accesses the source 4 to produce a matrix model 8.
The data source 4 may be constructed in various ways, e.g. by means of a database such as the MovieLens database, or another database such as an LDAP directory.
The background module 6 comprises a controller 10 which interacts with a cross-classifier 12 and a smoother 14, as will be explained hereinafter.
The online part OL comprises a selection manager 16 which interacts with a user 18 on the one hand and with a set of online matrices 20 and an updating stack 22 on the other hand, to carry out evaluations which are here termed recommendations, as will be explained more fully hereinafter.
The user 18 may be connected directly to a computer or a terminal that uses the apparatus 2, or he may be connected to the latter through an Internet server using a selection manager 16, by means of a web service, for example. Numerous other implementations are possible for the selection manager 16.
The set 20 is a copy of the matrix model 8 or at least can be directly deduced therefrom. The stack 22 may be in the form of a table, a conventional stack or any other suitable form.
When the background module 6 has finished calculating the model 8, the data from the stack 22 are integrated into the data source 4 and the matrix 20 is updated with the model 8 that has just been calculated. Then, the background module 6 reiterates the calculation of the model 8 to take account of the new data.
The matrix model 8 will now be explained further.
In a first step M1, the controller 10 calls up the cross-classifier 12 on the raw matrix. Cross-classification is a known sorting method which generally consists in simultaneously crossing the columns and rows of a data table according to a selected criterion.
One example of a cross-classification method may be found in “Information-theoretic co-clustering”, by Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha, pages 89–98 of Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
The cross-classification in step M1 is double, so as to obtain as output a classified raw matrix 30 which is made up of matrix blocks which are smaller in size, on the one hand, and transition matrices which enable the original matrix to be retrieved, on the other hand. The matrix blocks may be totally filled or may also have non-filled matrix elements.
The matrix blocks in the matrix 30 constitute a simplified view of the starting matrix. Thus, where there may have been blocks of 1000*1000, for example, there are now blocks of 10*10. The double cross-classification is accompanied by the creation of a transition matrix, enabling the original matrix to be reconstituted from the reduced blocks obtained.
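The grouping of actors and objects by their presence/absence profiles can be sketched as follows; grouping strictly identical profiles is a strong simplification of a real cross-classification (such as the co-clustering method cited above), and using `None` to mark an absent element is an assumed convention:

```python
# Hedged sketch of the double cross-classification of step M1: rows (actors)
# are grouped by their presence/absence profile over columns (objects), and
# columns by their profile over rows. Real co-clustering groups *similar*
# profiles; grouping identical profiles keeps the sketch deterministic.

def presence_profiles(matrix):
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_profiles = [tuple(matrix[i][j] is not None for j in range(n_cols))
                    for i in range(n_rows)]
    col_profiles = [tuple(matrix[i][j] is not None for i in range(n_rows))
                    for j in range(n_cols)]
    return row_profiles, col_profiles

def cross_classify(matrix):
    row_profiles, col_profiles = presence_profiles(matrix)
    # Group ids per row / per column play the role of the transition
    # matrices that allow the original matrix to be rebuilt from the blocks.
    row_groups = {p: g for g, p in enumerate(dict.fromkeys(row_profiles))}
    col_groups = {p: g for g, p in enumerate(dict.fromkeys(col_profiles))}
    row_g = [row_groups[p] for p in row_profiles]
    col_g = [col_groups[p] for p in col_profiles]
    return row_g, col_g
```

On a matrix with two clearly separated populations, this assigns matching rows (and columns) to the same group, yielding the smaller blocks described above.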
As has already been explained hereinbefore, the basic matrix is extremely hollow, meaning that a large proportion of its matrix elements are empty. The first metric used to classify the data, based on the presence or absence of the matrix element, is intended to make it possible to distinguish the groups that carry a large amount of information from the others.
In a step M2, the controller 10 processes all the blocks of the matrix 30 by calculating their density. This density is representative of the matrix element presence/absence profile in each block.
A test M3 relating to the density of each block determines the type of filling which is then applied to the empty elements of this block.
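A possible reading of the density test of step M3, assuming `None` marks an empty element; the 0.5 threshold is an illustrative choice, as the patent does not fix a numeric value:

```python
# Density of a block: the fraction of filled (non-empty) matrix elements.
# `None` marks an empty element; the 0.5 threshold is an assumption.

def block_density(block):
    cells = [v for row in block for v in row]
    return sum(v is not None for v in cells) / len(cells)

def is_dense(block, threshold=0.5):
    """Decide which filling is applied to the block's empty elements."""
    return block_density(block) >= threshold
```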
In fact, in the embodiment described here, the background module builds a model 8 which contains no empty matrix element, and thus makes it possible to have a recommendation of each object for each actor. It would nevertheless be possible to fill the model 8 only partially.
When a given block has a sufficient density, the controller 10 calls up the classifier 12 again in a step M4 and carries out a further double cross-classification on this block, this time with a criterion based on a metric reflecting the value of the matrix elements of this block. This classification results in an intermediate classified dense block 32 and the corresponding transition matrices.
To ensure that the block 32 is totally filled, the controller 10 calls up the smoother 14 in a step M5. The smoother 14 applies the algorithm described in Annexe 1 until a stability criterion has been achieved.
In the algorithm in Annexe 1, R denotes the matrix of the scores and R(u,a) is the score given by the actor u to the object a. P(u) denotes the set of the objects scored by the actor u and U(a) denotes the set of the actors who have scored the object a.
The iterative calculation is initialised with two identity matrices, called C0u (correlations between actors) and C0p (correlations between objects). At step n, the correlation matrix between actors is denoted Cnu, and the correlation matrix between objects is denoted Cnp. The matrices Cnu and Cnp are calculated by crossing on the basis of the matrices calculated in the previous step.
Thus, as shown by formula 11 in Annexe 1, the matrix Cnu calculated at step n is established on the basis of the matrix C(n−1)p from step (n−1). Similarly, as shown in formula 21, the matrix Cnp, also calculated at step n, is established on the basis of the matrix C(n−1)u from step (n−1). The formulae 12 to 14 and 22 to 24 specify the elements of formulae 11 and 21.
Specifically, the algorithm in step M5 consists in filling the empty matrix elements on the basis of selected adjacent matrix elements (the objects scored by the actors for the matrix Cnu, and the actors having scored the objects for the matrix Cnp).
The filling is carried out by correlation according to the rows of the matrix during a first stage (Cnu), then according to the columns during a second stage (Cnp). The empty matrix elements are thus filled by iteration which crosses the values of adjacent matrix elements, thus constituting filling by correlation along both the rows and the columns simultaneously.
Although this algorithm may be seen in the form of an alternation between iteration across the rows/iteration down the columns, it would be possible to use an algorithm which, in a single operation, would carry out a similar type of filling based on a correlation over both the rows and columns simultaneously.
This algorithm results in a filled dense block 34.
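Since the formulas of Annexe 1 are not reproduced in this text, the following is only a rough stand-in for the smoothing of step M5: it fills empty elements by alternating a row-wise stage and a column-wise stage until the values stabilise. The averaging rule and the stability tolerance are assumptions, not the patent's correlation formulas.

```python
# Rough stand-in for the iterative smoothing of a dense block (step M5).
# Empty elements (None) are filled by alternating row-wise and column-wise
# averaging until a stability criterion is met. The tolerance and the
# averaging rule are illustrative assumptions.

def smooth_dense_block(block, tol=1e-6, max_iter=100):
    n_rows, n_cols = len(block), len(block[0])
    known = [[v is not None for v in row] for row in block]
    # Initialise the empty elements with the overall mean of known values.
    vals = [v for row in block for v in row if v is not None]
    mean = sum(vals) / len(vals)
    cur = [[block[i][j] if known[i][j] else mean
            for j in range(n_cols)] for i in range(n_rows)]
    for _ in range(max_iter):
        delta = 0.0
        # Row stage: pull each unknown element towards its row average.
        for i in range(n_rows):
            row_avg = sum(cur[i]) / n_cols
            for j in range(n_cols):
                if not known[i][j]:
                    new = (cur[i][j] + row_avg) / 2
                    delta = max(delta, abs(new - cur[i][j]))
                    cur[i][j] = new
        # Column stage: pull each unknown element towards its column average.
        for j in range(n_cols):
            col_avg = sum(cur[i][j] for i in range(n_rows)) / n_rows
            for i in range(n_rows):
                if not known[i][j]:
                    new = (cur[i][j] + col_avg) / 2
                    delta = max(delta, abs(new - cur[i][j]))
                    cur[i][j] = new
        if delta < tol:   # stability criterion reached
            break
    return cur
```

Known scores are never modified; only the empty elements drift towards values consistent with their row and column, which is the spirit of filling "by correlation along both the rows and the columns".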
In the event that the block 30 is not dense, the controller 10 calls up the smoother 14 in a step M6, the result of which is a filled hollow block 36.
The filling in step M6 is a so-called “averaged” filling, i.e. it consists in giving the empty matrix elements the value of the average of the non-empty elements in this block, if there are any, or in filling all the empty matrix elements on the basis of an average of the matrix elements of the surrounding blocks if the block contains only empty matrix elements.
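The averaged filling of step M6 can be sketched directly; passing the surrounding blocks' mean in as an explicit argument is a simplification of this description:

```python
# "Averaged" filling (step M6): empty elements take the mean of the block's
# non-empty elements; an entirely empty block falls back to the mean of the
# surrounding blocks, supplied here as an explicit argument for simplicity.

def fill_by_average(block, surrounding_mean=None):
    vals = [v for row in block for v in row if v is not None]
    if vals:
        fill = sum(vals) / len(vals)
    else:
        fill = surrounding_mean  # block contains only empty elements
    return [[v if v is not None else fill for v in row] for row in block]
```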
In a step M7, a test is carried out to see whether all the blocks in the matrix 30 have been treated. The result obtained is a matrix of filled blocks 38 which will serve as a basis for the matrix model 8.
In a step M8, the controller establishes the set of online matrices 20.
The matrix 38 contains rows denoting groups of actors and columns denoting groups of objects. The matrix 40 contains rows denoting actor identifiers and columns designating groups of actors. The matrix 42 contains groups of objects in rows and object identifiers in columns.
The matrices 40 and 42 are obtained from transition matrices which are produced during the various double crossclassifications, and are constructed at the same time as the matrix M.
Because of the metric used for the double cross-classification in the embodiment described here, the transition matrices 40 and 42 are extremely hollow, and contain only one non-zero value per row for the matrix 40 and only one non-zero value per column for the matrix 42.
However, it would be possible to carry out a cross-classification involving a weighted metric, which would enable the users to be assigned to a number of groups in a weighted manner, and the same for the objects.
The evaluating function of the selection manager 16 is based on having previously obtained, in a step E1, a pair of identifiers (idUt, idIt) of an actor on the one hand and an object on the other hand.
In simultaneous steps E2 and E3, the manager recovers, from the tables Table_GU and Table_GI (the matrices 40 and 42), the row corresponding to idUt and the column corresponding to idIt, and stores them in two vectors $gu and $gi, respectively.
In a step E4, the evaluation is defined as the product $gu*M*$gi. This formula is based on the hypothesis that the sum of the terms in the rows of the table Table_GU and the sum of the terms in the columns of the table Table_GI are both equal to 1, as the result of a standardisation carried out during the generation of these two tables.
In the embodiment described here, as the matrices 40 and 42 are hollow, this product amounts to selecting a single element of M. The matrix formalisation is only of real interest for a classification with weighting, as mentioned above. The evaluated value is then sent back in a step E5.
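The evaluation of step E4 can be sketched as follows; the sample matrices are purely illustrative, and with one-hot rows and columns the triple product indeed reduces to reading a single element of M:

```python
# Evaluation of step E4: the product $gu * M * $gi.
# gu is a row of Table_GU (actor's weights over actor groups, summing to 1);
# gi is a column of Table_GI (object's weights over object groups, summing
# to 1); M holds the block-level scores. All sample values are illustrative.

def evaluate(gu, M, gi):
    n_ug, n_ig = len(M), len(M[0])
    return sum(gu[a] * M[a][b] * gi[b]
               for a in range(n_ug) for b in range(n_ig))

M = [[4.0, 2.0],
     [1.0, 5.0]]    # actor groups x object groups
gu = [1.0, 0.0]     # actor belongs wholly to actor group 0 (one-hot)
gi = [0.0, 1.0]     # object belongs wholly to object group 1 (one-hot)
```

With the one-hot vectors above the product simply picks out M[0][1]; with weighted memberships it becomes a genuine weighted average over the blocks.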
The evaluating function described above makes it possible to implement a large number of functions for the user 18. In fact, the user can directly designate an object for which he wishes to obtain an evaluation, but he may also designate a class of objects, which the selection manager 16 can selectively associate with object identifiers and send back their evaluations.
The user 18 may also supply a selected evaluation and obtain from the selection manager 16 a list of objects that correspond to this evaluation. It is also possible to combine a number of these requests to provide greater flexibility of use.
To ensure that the evaluation is achievable and reasonably reliable, the user designated by the identifier idUt must have votes in the data source 4 or in the stack 22. Otherwise, it is not possible to assign this user to a group in the table Table_GU, and an evaluation can only be made, for example, on the basis of the average of all the groups in the table Table_GU.
To update the data of a given user and take into account recent votes before the updating of the set 20 on the basis of the model 8, the selection manager 16 uses a Vote function, described hereinafter.
Starting with a step V1 in which a user with an actor identifier idUt provides the selection manager 16 with a vote of value $val relating to an object with the object identifier idIt, the triplet (idUt, idIt, $val) is first of all recorded in the stack 22 in a step V2, for subsequent integration in the source 4 and in the model 8.
Then, in two steps V3 and V4, which may be carried out in parallel or in series, the selection manager updates the matrices 40 and 42 on the basis of the actor identifier idUt and object identifier idIt, respectively.
The steps V3 and V4 may also be carried out selectively and at separate times, in order to take account of the profile of a user or the product to which the vote relates.
In fact, the first votes of a user (or the first votes received by a product) have a decisive influence on its classification in a given group of actors (or a given group of objects, respectively).
Each early vote thus tends to make the user (or product) change groups. After a certain number of votes, its membership of a group stabilises and there is little chance that a single vote will cause it to change.
It therefore seems more important to update the table Table_GU (or the table Table_GI, respectively) when a vote is registered for a user having few votes, than for a user who already has many votes.
When an actor (or a product, respectively) has a substantial number of votes, e.g. 100, the background module can therefore apply step V3 (or step V4, respectively) selectively and/or deferred in time, for example every ten votes, so as to lighten the computing load and thus improve the performance of the application.
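The selective update policy described above might look like the following; the 100-vote threshold and the every-tenth-vote cadence come from the example in the text, while the function name and the exact modulo rule are illustrative:

```python
# Illustrative update policy: update Table_GU (or Table_GI) on every vote
# while the actor (or object) has few votes, then only every tenth vote
# once its group membership is considered stable.

STABLE_AFTER = 100   # votes after which membership is considered stable
BATCH = 10           # once stable, update only every tenth vote

def should_update(vote_count):
    if vote_count < STABLE_AFTER:
        return True          # early votes decide the group: always update
    return vote_count % BATCH == 0   # later votes: deferred, batched update
```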
Updating is carried out using an Update function described in connection with
In a step U1, the function is called up with, as argument, the table which is to be updated and the identifier number id of the row or column to be updated. This updating could also be carried out with a single function called up with the two identifiers.
In a step U2, a test is carried out to see whether the table to be updated is that of the groups of actors (matrix 40 or Table_GU) or that of the groups of objects (matrix 42 or Table_GI).
If it is the Table_GU, in a step U3, the values of the matrix elements associated with id in the source 4 and those contained in the stack 22 are combined in a vector describing all the votes (and nonvotes) of the user linked to id.
The vector thus formed is then “projected” in a step U4 onto M*Table_GI, in order to determine the group of actors to which this user should be attached. One embodiment of step U4 will be described hereinafter.
If it is the Table_GI, in a step U5, the values of the matrix elements associated with id in the source 4 and those contained in the stack 22 are combined in a vector describing all the votes (and nonvotes) relating to the object linked to id.
The vector thus formed is then “projected” in a step U6 onto Table_GU*M, in order to determine the group of objects to which this object should be attached. One embodiment of step U6 will be described hereinafter.
In a step P1, the Project function is called up with, as arguments, the table concerned, the identifier id and the vector $vect to be projected.
In a step P2, the type of table is determined and designates the operation which will be carried out. If it is the table Table_GU (matrix 40) that is involved, then the projection is carried out in a step P3, by forming the product of M*Table_GI*$vect. The column vector $vect2 which results from this product is the “projection” described hereinbefore.
The table Table_GU is then updated in a step P4 by updating the id row of this table, in which the only nonzero term is the one whose column corresponds to the index that represents the maximum value of $vect2. Thus, the table Table_GU keeps the same form as before.
As described hereinbefore, it would also be possible to keep all the values of $vect2 in the table Table_GU, in order to obtain a weighted assignment of the user to the groups of actors.
If it is the table Table_GI that is involved, then steps P5 and P6 are carried out, with operations similar to those in steps P3 and P4, which will be readily apparent by analogy.
However, it should be noted that, in the case of step P3, the vector $vect is a row vector, whereas in the case of step P5, the vector $vect is a column vector.
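A sketch of the projection of steps P3 and P4 for the table Table_GU follows; the dimension conventions are simplified (plain nested lists, votes as a flat vector), and keeping only the argmax reproduces the one-hot update of step P4, while keeping the whole vector would give the weighted assignment mentioned above:

```python
# Sketch of steps P3/P4: the actor's vote vector is projected through the
# object transition table and the block matrix ($vect2 = M * Table_GI * $vect),
# and the actor is reassigned to the single best-matching actor group.
# Dimension handling is simplified for illustration.

def project_actor(vect, M, table_gi):
    # vect: the actor's votes over objects.
    # table_gi: object groups in rows, object identifiers in columns.
    n_ug, n_ig = len(M), len(M[0])
    n_obj = len(table_gi[0])
    # Aggregate the votes by object group: Table_GI * $vect.
    gi_v = [sum(table_gi[b][o] * vect[o] for o in range(n_obj))
            for b in range(n_ig)]
    # Project onto the actor groups: $vect2 = M * (Table_GI * $vect).
    vect2 = [sum(M[a][b] * gi_v[b] for b in range(n_ig)) for a in range(n_ug)]
    best = max(range(n_ug), key=lambda a: vect2[a])
    # Updated Table_GU row: one-hot on the best-matching actor group (step P4).
    return [1.0 if a == best else 0.0 for a in range(n_ug)]
```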
Once the projection has been carried out and the table has been updated, this function ends in a step P7 and the sequence of the preceding operations follows its course.
The invention also relates to a computer program code for operating the apparatus described above, and a data carrier for such a program.
In the embodiment described here, the model is obtained by means of two double cross-classifications with criteria based on a presence metric of the matrix element on the one hand and a value metric of the matrix element on the other hand.
However, it would also be feasible to carry out a single double cross-classification with a criterion combining these two metrics. Moreover, this specification has described methods of computation using conventional matrix computing, but it would also be possible to use slightly modified matrix operators, notably to take account of the partially empty nature of the elements involved in such a calculation.
Annexe 1
Claims (10)
Priority Applications (2)
Application Number  Priority Date  Filing Date  Title

 FR0608861A (FR2906910B1)  2006-10-10  2006-10-10  Propagative correlation computing device
 FR0608861  2006-10-10
Publications (2)
Publication Number  Publication Date

 US20080086487A1  2008-04-10
 US7853588B2  2010-12-14
Family
ID=38089207
Family Applications (1)
Application Number  Title  Priority Date  Filing Date

 US11858069 (US7853588B2, active, anticipated expiration 2028-09-02)  Data processing apparatus for propagative correlation  2006-10-10  2007-09-19

Country Status (3)
Country  Link

 US  US7853588B2
 EP  EP1912170A1
 FR  FR2906910B1
Legal Events
Code  Title  Description

 AS  Assignment  Owner name: CRITEO, FRANCE. Assignment of assignors interest; assignor: LE OUAY, FRANCK; reel/frame 019891/0861; effective date 2007-07-13
 CC  Certificate of correction
 FPAY  Fee payment  Year of fee payment: 4