MX2008010485A - Training a ranking function using propagated document relevance - Google Patents

Training a ranking function using propagated document relevance

Info

Publication number
MX2008010485A
MX2008010485A MXMX/A/2008/010485A
Authority
MX
Mexico
Prior art keywords
documents
component
relevance
algorithm
document
Prior art date
Application number
MXMX/A/2008/010485A
Other languages
Spanish (es)
Inventor
Wang Jue
Li Mingjing
Ma Weiying
Li Zhiwei
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Publication of MX2008010485A

Abstract

A method and system for propagating the relevance of labeled documents to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarity between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate labels of the documents until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.

Description

TRAINING A RANKING FUNCTION USING PROPAGATED DOCUMENT RELEVANCE

BACKGROUND

Many search engine services, such as Google and Overture, provide for searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by "crawling" the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service identifies web pages that may be related to the search request based on how well the keywords of a web page match the words of the query. The search engine service then displays to the user links to the identified web pages in an order that is based on a ranking that may be determined by their relevance to the query, popularity, importance, and/or some other measure. Three well-known techniques for ranking web pages are PageRank, HITS ("Hyperlink-Induced Topic Search"), and DirectHit. PageRank is based on the principle that important web pages will have links to them ("incoming links") from other important web pages. 
Thus, the importance of a web page is based on the number and importance of the other web pages that link to that web page. In a simple form, the links between web pages can be represented by an adjacency matrix A, where A_ij represents the number of outgoing links from web page i to web page j. The importance score w_j for web page j can be represented by the following equation:

w_j = Σ_i A_ij w_i

This equation can be solved by iterative calculations based on the following equation:

A^T w = w

where w is the vector of importance scores for the web pages and is the principal eigenvector of A^T. The HITS technique is additionally based on the principle that a web page that has many links to other important web pages may itself be important. Thus, HITS divides the "importance" of web pages into two related attributes: "hub" and "authority." A "hub" score is measured by the "authority" scores of the web pages to which a web page links, and an "authority" score is measured by the "hub" scores of the web pages that link to the web page. In contrast to PageRank, which calculates the importance of web pages independently of the query, HITS calculates importance based on the web pages of the search result and on the web pages related to those result pages by following incoming and outgoing links. HITS submits a query to a search engine service and uses the web pages of the result as the initial set of web pages. HITS adds to the set those web pages that are the destinations of the result pages' outgoing links and those web pages that are the sources of their incoming links. HITS then calculates the authority and hub scores of each web page using an iterative algorithm. The authority and hub scores can be represented by the following equations:

a(p) = Σ_{q→p} h(q) and h(p) = Σ_{p→q} a(q)

where a(p) represents the authority score for web page p and h(p) represents the hub score for web page p. 
HITS uses an adjacency matrix A to represent the links. The adjacency matrix is represented by the following equation:

A_ij = 1 if page i contains a link to page j, and 0 otherwise.

The vectors a and h correspond to the authority and hub scores, respectively, of all the web pages in the set and can be represented by the following equations:

a = A^T h and h = A a

Thus, a and h are eigenvectors of the matrices A^T A and A A^T. HITS can also be modified to factor in the popularity of a web page as measured by the number of visits. Based on an analysis of click-through data, the entry A_ij of the adjacency matrix can be incremented whenever a user travels from web page i to web page j. DirectHit ranks web pages based on the history of past users with similar query results. For example, if users who submit similar queries typically first selected the third web page of the result, then that user history is an indication that the third web page should be ranked higher. As another example, if users who submit similar queries typically spend the most time viewing the fourth web page of the result, then that user history is an indication that the fourth page should be ranked higher. DirectHit derives the user histories from an analysis of click-through data.
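Both iterative link-analysis computations above (the importance equation A^T w = w and the HITS updates a = A^T h, h = A a) can be sketched as power iterations over the adjacency matrix. The toy link matrix, iteration counts, and function names below are illustrative assumptions, not part of the patent:

```python
import numpy as np

def importance_scores(A, iterations=100):
    """PageRank-style power iteration: w converges to the principal
    eigenvector of the transposed, row-normalized link matrix."""
    out = A.sum(axis=1, keepdims=True)
    # Split each page's importance evenly over its outgoing links.
    P = A / np.where(out == 0.0, 1.0, out)
    w = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iterations):
        w_next = P.T @ w
        w_next /= w_next.sum()  # keep the scores as a distribution
        if np.allclose(w, w_next):
            break
        w = w_next
    return w

def hits(A, iterations=50):
    """HITS updates a = A^T h and h = A a, normalized each step, so a and h
    converge to the principal eigenvectors of A^T A and A A^T."""
    a = np.ones(A.shape[0])
    h = np.ones(A.shape[0])
    for _ in range(iterations):
        a = A.T @ h
        a /= np.linalg.norm(a)
        h = A @ a
        h /= np.linalg.norm(h)
    return a, h

# Toy web: page 0 links to 1; page 1 links to 0 and 2; page 2 links to 0.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)
w = importance_scores(A)
a, h = hits(A)
```

Page 0 has two incoming links, so it receives the highest authority score, while the importance scores of pages 0 and 1 converge to the stationary distribution of the random walk over the links.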
Some ranking techniques use machine learning algorithms to learn a ranking function from training data that includes queries, feature vectors representing pages, and, for each query, a ranking for each web page. A ranking function serves as a mapping from the features of a web page to its ranking for a given query. The learning of a ranking function has been considered by some as a regression problem of learning a mapping from a feature vector to a member of an ordered set of numerical rankings. Some regression-based techniques attempt to provide an absolute relevance score that can be used to rank pages. A ranking function, however, need not provide an absolute relevance score but need only provide a relative ranking of the pages. Thus, these regression-based techniques solve a problem that is more difficult than necessary. The machine learning algorithms for a ranking function use queries, feature vectors, and user-labeled relevance scores as training data. To generate the training data, the queries can be submitted to a search engine that generates the pages of the search result. The algorithms then generate the feature vectors for the pages and input from a user the relevance labels for each page. One difficulty with such an approach is that a search engine may return hundreds of pages in its search result. It can be very costly to have a user label all the pages of a search result. In addition, it can be difficult for a user to accurately assess the relevance of such a large number of pages. Although a user could label only a small portion of the pages, learning based on such a small portion may not provide an accurate ranking function.
BRIEF DESCRIPTION OF THE INVENTION

A method and system for propagating the relevance of documents labeled with respect to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarities between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate the labels of the documents until the labels converge on a solution. The training data with the propagated relevance can then be used to train a ranking function. This brief description is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This brief description is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagram illustrating a portion of a document graph. Figure 2 is a block diagram illustrating components of the propagation system in one embodiment. Figure 3 is a flow diagram illustrating the processing of the create-ranking-function component of the propagation system in one embodiment. Figure 4 is a flow diagram illustrating the processing of the propagate-relevance component of the propagation system in one embodiment. Figure 5 is a flow diagram illustrating the processing of the build-graph component of the propagation system in one embodiment. Figure 6 is a flow diagram illustrating the processing of the generate-weights-for-graph component of the propagation system in one embodiment. Figure 7 is a flow diagram illustrating the processing of the normalize-weights component of the propagation system in one embodiment. Figure 8 is a flow diagram illustrating the processing of the propagate-relevance-based-on-graph component of the propagation system in one embodiment.
DETAILED DESCRIPTION

A method and system for propagating the relevance of documents labeled with respect to a query to unlabeled documents is provided. In one embodiment, the propagation system provides training data that includes queries, documents (represented by feature vectors) labeled with their relevance to the queries, and unlabeled documents. For example, the propagation system may submit a query to a search engine and use the search result as the documents (e.g., web pages). The propagation system may then prompt a user to label some of the documents of the search result based on their relevance to the query. The propagation system then calculates the similarity between pairs of documents in the training data. For example, the propagation system may represent each document by a feature vector and may calculate the similarity between documents based on the Euclidean distance in feature space or based on a cosine similarity metric. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate the labels of the documents until the labels converge on a solution. The training data with the propagated relevance can then be used to train a ranking function. In this way, the propagation system can automatically augment the training data with additional training data based on the similarities between documents. In one embodiment, the propagation system represents the documents using a document graph, with each node representing a document and each edge representing the similarity between the documents represented by the connected nodes. The propagation system may represent the graph as a square matrix with a row and a column for each document, in which a nonzero value indicates an edge between the node of the row and the node of the column. The propagation system may define the edges of the graph using various techniques. 
For example, the propagation system may consider the graph to be fully connected, with each node having an edge to every other node. As another example, the propagation system may consider the nodes to be connected via a minimum spanning tree. In one embodiment, the propagation system considers the nodes to be connected using a k-nearest-neighbor algorithm. In particular, the propagation system identifies the k nearest neighbors of each node and adds an edge from that node to each of its k nearest neighbors. The propagation system then calculates weights for the edges based on the similarity between the documents represented by the connected nodes. The propagation system may use various techniques to determine the similarity between documents. In one embodiment, the propagation system uses a Euclidean distance metric based on the feature vectors of the documents in a feature space. The propagation system stores the similarities as the values of the square matrix, resulting in a similarity or affinity matrix. The propagation system may also normalize the similarity matrix. The propagation system may also set the diagonal values to zero to prevent self-reinforcement during the propagation of relevance. After generating the similarity matrix, the propagation system propagates the relevance of labeled documents to unlabeled documents using a manifold-ranking-based propagation algorithm. A manifold-ranking-based algorithm is described in He, J., Li, M., Zhang, H.J., et al., "Manifold-Ranking Based Image Retrieval," Proc. of the 12th Annual ACM International Conference on Multimedia, 2004. The propagation system initially sets the relevance of the labeled documents to the relevance scores provided by the user and the relevance of the unlabeled documents to 0. The propagation system then spreads the relevance of the labeled documents to their connected unlabeled documents, factoring in similarity as indicated by the similarity matrix. 
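The k-nearest-neighbor edge construction described above can be sketched as follows; the choice of k, the Euclidean metric, and the toy feature vectors are illustrative assumptions:

```python
import numpy as np

def knn_edges(X, k):
    """Return a symmetric 0/1 adjacency matrix connecting each document
    (a row of X) to its k nearest neighbors by Euclidean distance."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all feature vectors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a node is not its own neighbor
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[:k]:
            adj[i, j] = adj[j, i] = 1  # undirected edge i-j
    return adj

# Two tight clusters of documents in a 2-dimensional feature space.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
adj = knn_edges(X, k=1)
```

With k = 1 each document connects only to its single nearest neighbor, so the two clusters stay disconnected from one another, and the matrix is kept symmetric so relevance can later flow in both directions along an edge.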
The propagation system iteratively spreads the relevance scores until the relevance scores converge on a solution. The resulting relevance scores of the unlabeled documents will be in proportion to the probability that they are relevant to the same query as the labeled documents. An unlabeled document that is very similar to many labeled documents with high relevance scores will thus have a high relevance score. Conversely, an unlabeled document that is not very similar to any of the labeled documents will have a low relevance score. The propagation system may represent similarity using a Laplacian kernel, which can be represented by the following equation:

k_L(x_i, x_j) = (1/(2σ)) Π_{l=1..t} exp(−|x_il − x_jl| / σ_l)    (1)

where x_il and x_jl represent the l-th dimension of x_i and x_j, respectively, t represents the number of dimensions of the feature space, and σ_l represents a positive parameter that reflects the weight of the different dimensions in the similarity calculation. Thus, the propagation system represents the weight of the edges by the following equation:

W_ij = k_L(x_i, x_j) = Π_{l=1..t} exp(−|x_il − x_jl| / σ_l)    (2)

where W_ij represents the similarity between documents i and j. The propagation system can omit the constant coefficient 1/(2σ), since its effect on the similarity matrix W is canceled by normalizing the matrix. The propagation system normalizes the similarity matrix as represented by the following equation:

S = D^(−1/2) W D^(−1/2)    (3)

where S represents the normalized similarity matrix and D represents a diagonal matrix whose entry (i, i) is equal to the sum of the i-th row of the similarity matrix W. The normalization makes the similarities relative to the similarity of the connected documents. The propagation system can represent each document as a feature vector x of dimension t that forms a point in Euclidean space. For a query, the propagation system receives the result set of documents X = {x_l1, x_l2, ... x_lm, x_u1, x_u2, ... x_un}. 
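Equations 2 and 3 can be sketched as follows; a single shared σ (rather than a per-dimension σ_l) and the sample points are simplifying assumptions:

```python
import numpy as np

def similarity_matrix(X, sigma=1.0):
    """Laplacian-kernel weights W_ij = prod_l exp(-|x_il - x_jl| / sigma)
    with a zero diagonal, then the normalization S = D^-1/2 W D^-1/2."""
    # The product of per-dimension exponentials equals exp of the
    # negative L1 (Manhattan) distance divided by sigma.
    l1 = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    W = np.exp(-l1 / sigma)
    np.fill_diagonal(W, 0.0)  # no self-loops during propagation
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # D^-1/2 as a vector
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return S

# Documents 0 and 1 are close in feature space; document 2 is far away.
X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 3.0]])
S = similarity_matrix(X)
```

The normalized matrix stays symmetric with a zero diagonal, and nearby documents (0 and 1) retain a larger normalized similarity than distant ones, which is what drives the propagation step that follows.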
The first m points (in feature space) represent the documents labeled by the user, and the last n points represent the unlabeled documents. The propagation system also receives a corresponding label vector y = {y_l1, y_l2, ... y_lm, 0, 0, ... 0}^T. The last n labels have the value 0 to represent the unlabeled documents. The propagation system may also allow the specification of negative labels, rather than only positive labels, to represent negative examples of relevance. The propagation system represents the distance between documents in feature space as d: X × X → ℝ, which assigns to each pair of points x_i and x_j a distance d(x_i, x_j), and represents a ranking function over documents as f: X → ℝ, which assigns to each point x_i a ranking score f_i. The problem of learning the ranking function is to learn f: X → ℝ from a set of queries with the features X = {x_q} and the labels Y = {y_q}. The propagation system represents the limit of the relevance propagation by the following equation:

f* = (1 − α)(I − αS)^(−1) y    (4)

where f* represents the limiting relevance, y represents the initial labels, and α represents a decay factor. Because it is computationally difficult to calculate the inverse of the normalized similarity matrix S, the propagation system approximates f* using a Taylor series expansion. (The scalar factor (1 − α) does not affect the relative ranking and is omitted.) The propagation system can represent the Taylor series expansion by the following equation:

f* ≈ (I − αS)^(−1) y = (I + αS + α²S² + ...) y = y + αSy + αS(αSy) + ...    (5)

The propagation system iteratively computes f* until it converges on a solution or for a fixed number of iterations. Once the relevance has been propagated, the propagation system can use the labeled training data sets (queries and labeled feature vectors) to train a ranking function. A ranking function can be implemented as a support vector machine, an adaptive boosting classifier, a neural network classifier, and so on. 
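The iterative approximation of Equation 5 amounts to repeatedly applying f ← αSf + y, which sums the series y + αSy + α²S²y + … term by term. The decay factor α, the convergence tolerance, and the 3-document similarity matrix below are illustrative assumptions:

```python
import numpy as np

def propagate_relevance(S, y, alpha=0.5, tol=1e-9, max_iter=1000):
    """Approximate f* = (I - alpha*S)^-1 y by the truncated series
    y + (alpha*S)y + (alpha*S)^2 y + ..., iterating f <- alpha*S f + y."""
    f = np.zeros_like(y, dtype=float)
    for _ in range(max_iter):
        f_next = alpha * (S @ f) + y
        if np.linalg.norm(f_next - f) < tol:
            return f_next
        f = f_next
    return f

# Three documents: doc 0 labeled relevant (0.75); docs 1 and 2 unlabeled.
# Doc 1 is very similar to doc 0; doc 2 is only weakly connected.
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
y = np.array([0.75, 0.0, 0.0])
f = propagate_relevance(S, y)
```

After convergence the unlabeled document that is similar to the labeled one (doc 1) receives a substantially higher propagated relevance than the weakly connected document (doc 2), exactly the behavior the text describes.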
A support vector machine operates by finding a hypersurface in the space of possible inputs. The hypersurface attempts to split the positive examples from the negative examples by maximizing the distance between the hypersurface and the nearest of the positive and negative examples. This allows for correct classification of data that are similar to, but not identical to, the training data. Various techniques can be used to train a support vector machine. One technique uses a sequential minimal optimization algorithm that breaks the large quadratic programming problem down into a series of small quadratic programming problems that can be solved analytically. (See Sequential Minimal Optimization, at http://research.microsoft.com/~jplatt/smo.html.) Adaptive boosting is an iterative process that runs multiple tests on a collection of training data. Adaptive boosting transforms a weak learning algorithm (an algorithm that performs at a level only slightly better than chance) into a strong learning algorithm (an algorithm that displays a low error rate). The weak learning algorithm is run on different subsets of the training data. The algorithm concentrates more and more on those examples on which its predecessors tended to show errors. The algorithm corrects the errors made by earlier weak learners. The algorithm is adaptive because it adjusts to the error rates of its predecessors. Adaptive boosting combines rough and moderately inaccurate rules of thumb to create a high-performance algorithm. Adaptive boosting combines the results of each separately run test into a single, very accurate classifier. A neural network model has three major components: architecture, cost function, and search algorithm. 
The architecture defines the functional form relating the inputs to the outputs (in terms of network topology, unit connectivity, and activation functions). The search in weight space for a set of weights that minimizes the objective function is the training process. A neural network model may use a radial basis function ("RBF") network and standard gradient descent as its search technique. Figure 1 is a diagram illustrating a graph of documents returned as the search result for a query. In this example, subgraph 100 represents a portion of the documents returned in the search result. Nodes 101-112 represent 12 documents of the search result. Nodes 101 and 106 represent labeled documents. The document represented by node 101 was labeled with a relevance score of .75, and the document represented by node 106 was labeled with a relevance score of .6. The propagation system generated the edges between the nodes using a k-nearest-neighbor algorithm. In this example, nodes 102, 103, and 104 are each one of the k nearest neighbors of node 101, but nodes 105-112 are not. The propagation system then calculated the similarity between connected nodes using a similarity score algorithm. For example, node 101 is connected to node 102 by an edge with a weight of .8, which indicates the similarity between the connected nodes. Figure 2 is a block diagram illustrating components of the propagation system in one embodiment. The propagation system 230 is connected to document stores 210 (e.g., web sites) via communications link 220 (e.g., the Internet). The propagation system includes a collect-training-data component 231, a training data store 232, and a document index 233. The document index contains an index of the documents (e.g., web pages) of the document stores. The document index may be generated by a web crawler. 
The document index may include a feature vector for each document that is used to train a ranking function. The feature vectors may represent many different types of document features, such as inverse document frequency, keywords, font size, and so on. The collect-training-data component submits queries to a search engine (not shown) and receives the documents that match the queries. The search engine may be independent of the propagation system. In such a case, the propagation system may dynamically generate the feature vectors of the search results. The collect-training-data component may prompt a user to label the relevance of some of the documents that match the queries. The collect-training-data component stores the queries, search results (e.g., feature vectors), and labels in the training data store. The propagation system also includes a propagate-relevance component 235, a build-graph component 236, a generate-weights-for-graph component 237, a normalize-weights component 238, and a propagate-relevance-based-on-graph component 239. The propagate-relevance component propagates the relevance of the labeled documents to the unlabeled documents that are stored in the training data store. The propagate-relevance component invokes the build-graph component to construct a graph whose edges connect the documents of a search result. The propagate-relevance component then invokes the generate-weights-for-graph component to generate the initial weights for the edges of the graph. The propagate-relevance component invokes the normalize-weights component to normalize the generated weights. The propagate-relevance component then invokes the propagate-relevance-based-on-graph component to perform the actual propagation of the relevance of the labeled documents to the unlabeled documents. 
The propagation system also includes a create-ranking-function component 241 and a ranking function 242. The create-ranking-function component uses the training data with the propagated relevance to create a ranking function.
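As a minimal sketch of the create-ranking-function step, a linear least-squares scorer stands in here for the support vector machine, boosting, or neural network rankers named above; the feature matrix and propagated scores are made-up examples, not the patent's data:

```python
import numpy as np

def train_ranker(X, f):
    """Least-squares fit of a linear scoring function w, so that
    score(x) = x @ w approximates the propagated relevance f."""
    w, *_ = np.linalg.lstsq(X, f, rcond=None)
    return w

# Feature vectors for three documents and their propagated relevance.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
f = np.array([0.9, 0.8, 0.1])
w = train_ranker(X, f)
scores = X @ w  # relative ordering of the documents is what matters
```

Because only the relative ranking matters (as the background section notes), any scorer whose ordering matches the propagated relevance suffices; the trained weights here rank the two relevant documents above the irrelevant one.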
The computing devices on which the propagation system may be implemented may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the propagation system. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. The propagation system may provide its services to various computing systems or devices, including personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The propagation system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. Figure 3 is a flow diagram illustrating the processing of the create-ranking-function component of the propagation system in one embodiment. The create-ranking-function component collects training data, propagates the relevance of labeled documents to unlabeled documents, and then trains a ranking function. In block 301, the component collects the training data. 
In block 302, the component inputs labels for a subset of the training data. In block 303, the component invokes the propagate-relevance component to propagate the relevance of the labeled documents to the unlabeled documents. In block 304, the component trains the ranking function using the propagated relevance. Figure 4 is a flow diagram illustrating the processing of the propagate-relevance component of the propagation system in one embodiment. The component is provided with training data and propagates the relevance of the labeled documents to the unlabeled documents. In block 401, the component invokes the build-graph component to construct the initial graph and its edges. In block 402, the component invokes the generate-weights-for-graph component to generate weights indicating the similarity between the documents represented by connected nodes. In block 403, the component invokes the normalize-weights component to normalize the weights of the graph. In block 404, the component invokes the propagate-relevance-based-on-graph component to propagate the relevance. The component then returns. Figure 5 is a flow diagram illustrating the processing of the build-graph component of the propagation system in one embodiment. The component creates a square matrix with each row and column representing a document. The component then identifies and adds a connection between each node and its k nearest neighbors (e.g., k = 10). In block 501, the component selects the next document i. In decision block 502, if all the documents i have already been selected, then the component returns; otherwise the component continues at block 503. In block 503, the component selects the next document j. In decision block 504, if all the documents j for the selected document i have already been selected, then the component continues at block 506; otherwise the component continues at block 505. 
In block 505, the component calculates the distance between the selected document i and the selected document j and then loops to block 503 to select the next document j. In block 506, the component selects the 10 documents j with the smallest distance to document i (i.e., the nearest neighbors) and then loops to block 501 to select the next document i. Figure 6 is a flow diagram illustrating the processing of the generate-weights-for-graph component of the propagation system in one embodiment. The component calculates the similarity between connected documents based on a Manhattan distance metric. In block 601, the component selects the next document i. In decision block 602, if all the documents i have already been selected, then the component returns; otherwise the component continues at block 603. In block 603, the component initializes the similarity of the document to itself to 0. In block 604, the component selects the next nearest document j (i.e., a connected document) of the selected document i. In decision block 605, if all the nearest documents j for the selected document i have already been selected, then the component loops to block 601 to select the next document i; otherwise the component continues at block 606. In block 606, the component initializes the similarity between the selected document i and the selected document j to 1. In blocks 607-609, the component loops calculating the distance metric. In block 607, the component selects the next dimension l of the feature vector. In decision block 608, if all the dimensions have already been selected, then the component loops to block 604 to select the next nearest document j; otherwise the component continues at block 609. In block 609, the component sets the similarity between the selected document i and the selected document j to its current similarity multiplied by a function of the difference between the l-th feature of the selected document i and the selected document j, in accordance with Equation 2. 
The component then loops to block 607 to select the next dimension. Figure 7 is a flow diagram illustrating the processing of the normalize-weights component of the propagation system in one embodiment. The component normalizes the weights of the similarity matrix. In block 701, the component selects the next row i of the matrix. In decision block 702, if all the rows have already been selected, then the component continues at block 706; otherwise the component continues at block 703. In blocks 703-705, the component calculates the value of the diagonal matrix D for the selected row. In block 703, the component selects the next column j of the similarity matrix. In decision block 704, if all the columns have already been selected, then the component loops to block 701 to select the next row; otherwise the component continues at block 705. In block 705, the component adds the weight of the selected row i and the selected column j to the diagonal element for the selected row i. The component then loops to block 703 to select the next column j for the selected row i. In block 706, the component normalizes the similarity matrix in accordance with Equation 3. Figure 8 is a flow diagram illustrating the processing of the propagate-relevance-based-on-graph component of the propagation system in one embodiment. The component iteratively calculates the Taylor series expansion of Equation 5 until it converges on a solution. In block 801, the component initializes the index i to 0. In block 802, the component initializes the solution vector to 0. In blocks 803-805, the component loops until it converges on a solution. In block 803, the component calculates the value for the next iteration based on the value of the previous iteration plus the next factor of the Taylor series expansion. In decision block 804, if the values have converged on a solution, then the component returns; otherwise the component continues at block 805. 
In block 805, the component increments the index for the next iteration and loops back to block 803 to perform the next iteration. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as illustrative forms of implementing the claims. The propagation system can be used to augment search results. For example, a search engine may generate a search result based on a certain corpus of documents. The relevance of the documents of the search result can then be propagated to documents of a corpus different from the one the propagation system uses. The documents of the different corpus with the highest relevance can then be added to the search result. The propagation system may be used to propagate the relevance of documents labeled with their relevance to a single query to unlabeled documents (intra-query propagation) or of documents labeled with their relevance to multiple queries to unlabeled documents (inter-query propagation). The training component trains the document classification component separately for each query with intra-query propagation and simultaneously for all queries with inter-query propagation. Accordingly, the invention is not limited except by the appended claims.
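Taken together, the procedure of Figures 5 through 8 can be sketched end-to-end as follows. This is an illustrative reconstruction, not code from the patent: the exponential per-dimension kernel, the symmetric normalization S = D^(-1/2) W D^(-1/2), and the series f* = (1 - alpha) * sum_k (alpha S)^k y are assumed forms of Equations 2, 3, and 5, and all function names and parameters are hypothetical.

```python
import math

def nearest_neighbors(docs, k=10):
    # Figure 5: for each document i, keep the k documents j with the
    # smallest Manhattan distance (its nearest neighbors).
    nn = {}
    for i, di in enumerate(docs):
        dists = sorted((sum(abs(a - b) for a, b in zip(di, dj)), j)
                       for j, dj in enumerate(docs) if j != i)
        nn[i] = [j for _, j in dists[:k]]
    return nn

def edge_weights(docs, nn, sigma=1.0):
    # Figure 6: self-similarity stays 0 (block 603); for each connected
    # pair, start at 1 (block 606) and multiply in a function of each
    # per-dimension difference (an exponential kernel is assumed here).
    n = len(docs)
    W = [[0.0] * n for _ in range(n)]
    for i, neighbors in nn.items():
        for j in neighbors:
            w = 1.0
            for a, b in zip(docs[i], docs[j]):
                w *= math.exp(-abs(a - b) / sigma)
            W[i][j] = W[j][i] = w
    return W

def normalize_weights(W):
    # Figure 7: accumulate row sums into the diagonal matrix D, then
    # normalize the similarity matrix (symmetric form assumed).
    n = len(W)
    d = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def propagate_relevance(S, y, alpha=0.5, tol=1e-9, max_iter=1000):
    # Figure 8: add one Taylor-series term (1 - alpha) * (alpha S)^k y to
    # the solution vector per iteration until the newest term is negligible.
    n = len(y)
    f = [0.0] * n
    term = [(1.0 - alpha) * yi for yi in y]  # k = 0 term
    for _ in range(max_iter):
        f = [fi + ti for fi, ti in zip(f, term)]
        term = [alpha * sum(S[i][j] * term[j] for j in range(n))
                for i in range(n)]
        if max(abs(t) for t in term) < tol:  # converged (block 804)
            break
    return f
```

In this sketch, a labeled document's relevance flows most strongly to its nearest unlabeled neighbors and decays with graph distance; the resulting scores would then serve as training targets for the classification (ranking) function.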

Claims (17)

1. - A system for training a document classification component, comprising: a training data store (232) containing training data that includes representations of documents and, for each query of a plurality of queries, a labeling of some of the documents with the relevance of the documents to the query; a component to propagate relevance (235) that propagates the relevance of the labeled documents to the unlabeled documents based on the similarity between documents; and a training component (241) for training a document classification component to classify the relevance of documents to queries based on the propagated relevances of the documents of the training data.
2. - The system according to claim 1, wherein the document classification component implements a classification algorithm selected from a group consisting of a neural network algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
3. The system according to claim 1, wherein the document classification component implements an algorithm based on regression.
4. The system according to claim 1, wherein the component to propagate relevance propagates relevance separately for each query and the training component trains the document classification component using the separately propagated relevances.
5. The system according to claim 1, wherein the component for propagating relevance propagates relevance simultaneously for multiple queries and the training component trains the document classification component using the simultaneously propagated relevances.
6. The system according to claim 1, which includes a graph component that creates a graph with the documents represented as nodes connected by edges representing similarity between documents.
7. - The system according to claim 6, wherein the graph component includes: a construct-graph component that constructs a graph wherein the nodes representing similar documents are connected through edges; and a generate-weights component that generates weights for the edges based on the similarity of the documents represented by the connected nodes.
8. - The system according to claim 7, wherein the construct-graph component establishes edges between nodes using a nearest neighbor algorithm.
9. The system according to claim 1, wherein the component to propagate relevance propagates relevance using a manifold ranking based algorithm.
10. A computer-readable medium containing instructions for controlling a computer system to train a document classification component, through a method comprising: providing (232) document representations together with a labeling of some documents that indicates the relevance of a document to a query; creating (236) a graph with the documents represented as nodes that are connected through edges representing similarity between the documents represented by the connected nodes; propagating (239) the relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the created graph and based on a manifold ranking based algorithm; and training (241) a document classification component to classify the relevance of documents to queries based on the propagated relevances of the documents.
11. The computer readable medium according to claim 10, wherein the document classification component implements a classification algorithm selected from a group consisting of a Bayes network algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
12. - The computer readable medium according to claim 10, wherein the document classification component implements a regression based classification algorithm.
13. The computer readable medium according to claim 10, wherein the propagating propagates relevance separately for each query and the training trains the document classification component using the separately propagated relevances.
14. The computer readable medium according to claim 10, wherein the propagating propagates relevance simultaneously for multiple queries (inter-query propagation).
15. - The computer readable medium according to claim 10, wherein the creating of a graph includes: constructing a graph in which the nodes representing similar documents are connected through edges; and generating weights for the edges based on the similarity of the documents represented by the connected nodes.
16. A system for training a document classification component, comprising: a component (231) that provides document representations together with a labeling of some of the documents indicating the relevance of the documents to queries; a component (236) that creates a graph with the documents represented as nodes that are connected through edges which represent similarity between the documents represented by the connected nodes; a component (239) that propagates the relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the created graph; and a component that generates a document classification component to classify the relevance of documents to queries based on the propagated relevances of the documents.
17. The system according to claim 16, wherein the component that propagates relevance propagates relevance based on a manifold ranking based algorithm.

18. - The system according to claim 17, wherein the component that propagates relevance propagates relevance simultaneously for multiple queries and the component that generates the document classification component generates the component using the simultaneously propagated relevances.

19. - The system according to claim 16, wherein the component that creates a graph constructs a graph in which the nodes representing similar documents are connected through edges and generates weights for the edges based on the similarity of the documents represented by the connected nodes.

20. - The system according to claim 16, wherein the document classification component implements a regression based classification algorithm.
MXMX/A/2008/010485A 2006-02-27 2008-08-14 Training a ranking function using propagated document relevance MX2008010485A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11364576 2006-02-27

Publications (1)

Publication Number Publication Date
MX2008010485A true MX2008010485A (en) 2008-10-03
