WO2013075745A1

WO2013075745A1 - Method and system for creating user models

Info

Publication number: WO2013075745A1
Application number: PCT/EP2011/070873
Authority: WO
Inventors: Jöran BEEL
Original assignee: GENZMEHR, Marcel; Langer, Stefan
Priority date: 2011-11-23
Filing date: 2011-11-23
Publication date: 2013-05-30

Abstract

Disclosed are a method and a system for creating a user model, in particular for a recommender system, from at least one tree data structure, wherein the user model comprises information about a user, wherein the at least one tree data structure can be associated with the user. The tree data structure comprises a root node and a number of child nodes which are connected to the root node or to a child node via edges, and at least one element is associated with at least one node. The elements associated with the nodes and representing a content of the corresponding node are assessed, the assessed elements are weighted and an element weight is assigned to each element, and a user model is generated, the generated user model comprising the assessed elements and the element weight assigned to the corresponding element.

Description

Method and system for creating user models

Field of the invention

The invention relates to a method and a system for creating user models and recommendations based thereon, preferably by analyzing tree-shaped data structures.

Background of the invention and prior art

Users of computer systems differ in many ways, including their interests, their knowledge and their demographic data. Many computer systems try to cope with these differences by showing individual information or user interfaces depending, for example, on the knowledge and interests of the user. In order to adapt software systems, such as computer programs or Internet-based applications, individually to their users, the software systems require access, for example, to the interests of the users. This data can either be entered manually by the user or generated automatically by the system. In either case, the information about users is stored in so-called user models.

Often, such user models are used by recommendation services. Depending on the interests of a user, these recommendation services display individual recommendations for, for example, movies, books, music, or advertising tailored to the user. A recommendation service always faces a so-called "User-Item Matching Problem": the question is which small selection of relevant items (e.g. music, websites, books, etc.) from a large number of available items should be recommended to a user. This approach, known from the prior art, is shown in Fig. 1, which depicts a set of users (User 1 to User 3) and a set of items (Item 1 to Item 3). With appropriate procedures, the relevance of users and items to each other is calculated. Thereafter, all items whose relevance exceeds a certain threshold can be recommended to the appropriate users.

Recommendation services known in the art use two basic methods to generate user models or to make recommendations. These methods are known as "Content Based Filtering" and "Collaborative Filtering".

In Content Based Filtering (CBF), the computer system assumes that the content of the items with which a user is in contact reflects the interests and/or knowledge of the user. This approach, known from the prior art, is shown in Fig. 2. A user (User 1) is associated with a number of items (Item 1 to Item j). An item can be any object: a document (books, web pages, emails, etc.), a multimedia object (movies, music, photos), a person or a place. However, items can also be menu entries of a computer application or components of graphical user interfaces.

A user is associated with an item if any relationship exists between them. This means that if the user has, for example, read, bought or briefly looked at a book, knows a person, or has watched, downloaded or rated a film on a film portal, the user is connected with the book, the person or the film. The connection can be weighted differently depending on its type. For example, buying a book could be weighted more heavily than just looking at the book cover. A connection to an item can also be weighted more heavily the more often the item was used.

Content Based Filtering uses the content of the connected items to create a user model. As a rule, this method is applied to textual items, i.e. documents, since the content of documents (i.e. the text) can be processed well by computers, in contrast to, e.g., pictures. To use the content of the items, a model is created for each item. For documents, the so-called "Vector Space Model" is often used, a model that represents documents as vectors of their terms. The length of the vector along each term expresses how well the corresponding term describes the document. This weighting can be calculated using various methods. One common method is the so-called TF-IDF method. Here, the weighting of a term for a document is greater the more frequently the term occurs in the document and the fewer documents in the entire collection contain this term.

The user model is then generated from the models of the various connected items. This means that if a user has many books that contain the term "recommender" with a high weight, then this term is also assigned to the user model with a high weight. The different item models can be incorporated into the user model with different weightings. It is customary, for example, to weight items that were used recently more heavily than items that were last used a long time ago. Usually the user model is stored in the same format as the item models, for example as a Vector Space Model.

For the items which are later to be recommended to the user, a model is also created, for example again with the TF-IDF method and the Vector Space Model. These items need not be the same items that are associated with users. For example, it would be possible to create a user model based on web pages visited by a user and to recommend books based on that model, or to display personalized advertisements. The matching of the user models with the items to be recommended is preferably based on similarity comparisons between the user models and the item models. For text-based items, if the user model contains the same high-weight terms as the item model of an item to be recommended, then the similarity is high and the item is recommended to the user. A common similarity measure in the Vector Space Model is, for example, Cosine Similarity.
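By way of illustration only, the prior-art Content Based Filtering chain described above (TF-IDF item models, a user model built from connected items, and matching by Cosine Similarity) can be sketched in Python as follows. The function names, the particular TF-IDF variant (term frequency normalized by document length, logarithmic inverse document frequency) and the sample documents are assumptions chosen for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (count / len(doc)) * math.log(n_docs / df[t])
                        for t, count in tf.items()})
    return vectors

def user_model_from_items(item_vectors):
    """Sum the vectors of all items a user is connected with."""
    model = Counter()
    for vec in item_vectors:
        model.update(vec)  # Counter.update adds the weights term by term
    return dict(model)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = [
    ["recommender", "system", "user"],      # item the user has read
    ["recommender", "filtering", "user"],   # candidate item, related topic
    ["cooking", "pasta", "recipe"],         # candidate item, unrelated topic
]
vectors = tf_idf_vectors(docs)
user = user_model_from_items(vectors[:1])
sim_related = cosine_similarity(user, vectors[1])
sim_unrelated = cosine_similarity(user, vectors[2])
```

In this sketch the related item shares weighted terms with the user model and therefore scores higher than the unrelated item, whose similarity is zero.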

In Collaborative Filtering (CF), which is shown in Fig. 3, the content of items does not matter. Only the information about which items are related to which users, and how strongly, is used. The weighting is either specified directly by the user, by the user rating an item, or determined indirectly by the system monitoring the usage of the item.

For example, as with Content Based Filtering, the more often an item is used, the stronger the weight. Or an item that has been bought is weighted more heavily than an item downloaded for free. The similarity between user models and item models is not calculated, but only the similarity of user models to each other (user-user matching). Many methods are known for this. Essentially, all of them check which user models have as many items with similar weights in common as possible. Once similar user models have been identified, User 1 is recommended the items that are strongly connected with the similar User 2 (and that User 1 may not yet know). It is also possible to generate the user models as in Content Based Filtering and to identify similar users based on these user models. This approach is shown in Fig. 4.

At least in Content Based Filtering (CBF), the creation of the item models is a central component, since everything else, i.e. the creation of user models and the matching of users and items, is based on it. As mentioned above, a commonly used model is the Vector Space Model, which stores a textual item as a vector of its terms, where the length of the vector represents the weighting of the respective term with respect to the item. Numerous methods exist for determining the weight of the terms. The disadvantage, however, is that these methods known from the prior art only allow the modeling of "normal" textual items, i.e. documents such as emails, web pages, books, news articles, scientific articles, etc.
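The user-user matching described above for Collaborative Filtering can be illustrated with a minimal Python sketch. It is only one of the many known variants: the use of cosine similarity between user models, the similarity threshold of 0.5 and all names are assumptions made for the example.

```python
import math

def user_similarity(a, b):
    """Cosine similarity between two {item: weight} user models."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def recommend(target, others, min_similarity=0.5):
    """Suggest items that similar users are connected with but the target is not."""
    scores = {}
    for other in others:
        sim = user_similarity(target, other)
        if sim < min_similarity:
            continue  # user model is not similar enough
        for item, weight in other.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * weight
    # Strongest candidates first.
    return sorted(scores, key=scores.get, reverse=True)

user1 = {"item1": 5, "item2": 3}
user2 = {"item1": 4, "item2": 3, "item3": 5}   # similar to user1
user3 = {"item4": 5}                           # nothing in common with user1
recommendations = recommend(user1, [user2, user3])  # ["item3"]
```

Note that the content of the items plays no role here; only the item-user weights are compared.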

Object of the invention

The object of the invention is therefore to provide a method and a system which allow, in a simple manner, item models or user models to be created also for hierarchical, i.e. tree-shaped, structures, in order to build user models and recommendations based on them.

Inventive solution

This object is achieved by a method and a system according to the independent claims. Advantageous embodiments of the invention are specified in the respective dependent claims.

Accordingly, a method is provided for generating a user model, in particular for a recommendation service, from at least one tree-shaped data structure, wherein the user model comprises information about a user, wherein the at least one tree-shaped data structure is assignable to the user, wherein the tree-shaped data structure comprises a root node and a number of child nodes, which are connected via edges to the root node or to a child node, wherein at least one node is associated with at least one element, and wherein

- the elements associated with the nodes are determined, the elements representing a content of the respective node,

- the determined elements are weighted and an element weighting is assigned to each element, and

- a user model is generated, wherein the generated user model comprises the determined elements and the element weighting assigned to the respective element.

The nodes of the tree-shaped data structure can be weighted and each node can be assigned a node weighting.

In an initialization step, each element can be assigned a predetermined element weighting or the node weighting of the associated node.

The method may further include a preprocessing step in which

- nodes to which no elements are assigned are deleted, and/or

- nodes to which a predetermined element is assigned or not assigned are deleted, and/or

- nodes which have or do not have predetermined attributes are deleted, and/or

- nodes and/or elements of the nodes which are not assigned directly to the user are deleted.

The weighting of the nodes may include static node weighting and/or dynamic node weighting, where

- for the static node weighting, the number of child nodes assigned to the respective node, the number of its sibling nodes, the depth of the respective node in the tree-shaped data structure, the visibility of the node, or a combination thereof is taken into account, and

- for the dynamic node weighting, for each node the age, the time of the last change, the number of changes, the number of shifts within the tree-shaped data structure, the number of markings, the visibility of the node, an attenuation factor, or a combination thereof is taken into account.

Determining the elements associated with the nodes may include preprocessing the determined elements, wherein the preprocessing decomposes text into tokens and/or terms if the element is a text element, and/or processes references if the element is a reference element.

The element weights assigned to the elements in the initialization step may be adjusted, where, when adjusting the respective element weights, one or more of the following are taken into account: the element type, attribute values of the attributes associated with the element, the frequency of the element within the tree-shaped data structure, the number of tree-shaped data structures in a collection of tree-shaped data structures in which the element occurs, the frequency of the element within a collection of tree-shaped data structures, the size of the tree-shaped data structure relative to other tree-shaped data structures in a collection of tree-shaped data structures, the position of the element within the node, the language of the element, the number of elements within the node, the distance of the element to similar elements of other nodes, the frequency of the element in the path between the node and the root node, the age of the element, the time of the last change, the number of changes, the number of markings, the visibility of the element, an attenuation factor, or a combination thereof.
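Two of the adjustment factors listed above, the frequency of an element within the tree-shaped data structure and the number of tree-shaped data structures in the collection in which the element occurs, can be combined in a manner analogous to TF-IDF. The following Python sketch is an assumption for illustration only: the description does not prescribe a formula, and the logarithmic damping, the function name and the representation of each tree as a flat list of terms are chosen for the example.

```python
import math
from collections import Counter

def adjust_element_weights(tree_elements, collection, initial_weight=1.0):
    """Scale each element's initial weight by its frequency in the tree and by
    the inverse of the number of trees in the collection that contain it."""
    n_trees = len(collection)
    frequency = Counter(tree_elements)
    weights = {}
    for element, count in frequency.items():
        trees_with_element = sum(1 for tree in collection if element in tree)
        # max(..., 1) guards against a tree that is not part of the collection.
        inverse_bd_frequency = math.log(1 + n_trees / max(trees_with_element, 1))
        weights[element] = initial_weight * count * inverse_bd_frequency
    return weights

# Each tree is reduced to the flat list of its text terms (hypothetical data).
collection = [
    ["patent", "tree", "patent"],
    ["tree", "mind", "map"],
    ["tree", "file"],
]
weights = adjust_element_weights(collection[0], collection)
```

Here "patent" occurs twice in the first tree and in no other tree, so it receives a higher weight than "tree", which occurs in every tree of the collection.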

During static node weighting and/or dynamic node weighting, after static node weighting and/or dynamic node weighting, and/or during or after element weighting, an inheritance of the node weight or element weight may be taken into account.

The generated user model may be stored in a storage device to be provided to the recommendation service. All elements can be stored together with the respective element weightings as a user model, or for each element type a separate user model can be stored, whereby the user models of the different element types form an overall user model. In the case of a plurality of tree-shaped data structures that can be assigned to the user, a number of user models can be generated for each tree-shaped data structure, which together form an overall user model assigned to the user.

Each tree-shaped data structure can be assigned a tree weighting.
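One possible way of storing a separate user model per element type and of combining the user models of several tree-shaped data structures into an overall user model is sketched below in Python. The nested-dictionary layout, the function names and the use of the tree weighting as a simple multiplier are assumptions made for the example.

```python
from collections import defaultdict

def build_user_model(weighted_elements):
    """Group (element_type, value, weight) triples into one sub-model per type."""
    model = defaultdict(dict)
    for element_type, value, weight in weighted_elements:
        sub_model = model[element_type]
        sub_model[value] = sub_model.get(value, 0.0) + weight
    return dict(model)

def merge_user_models(models_with_tree_weights):
    """Combine per-tree models into an overall model, scaled by tree weighting."""
    overall = defaultdict(dict)
    for model, tree_weight in models_with_tree_weights:
        for element_type, sub_model in model.items():
            merged = overall[element_type]
            for value, weight in sub_model.items():
                merged[value] = merged.get(value, 0.0) + tree_weight * weight
    return dict(overall)

tree_a = build_user_model([("text", "patent", 2.0), ("reference", "doc-1", 1.0)])
tree_b = build_user_model([("text", "patent", 1.0), ("text", "tree", 0.5)])
# tree_b is assumed to be half as important as tree_a.
overall_model = merge_user_models([(tree_a, 1.0), (tree_b, 0.5)])
```

The resulting overall model keeps text terms and references in separate sub-models, so that a recommendation service can match each element type independently.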

For a new user-assignable tree-shaped data structure, the user model assigned to the user can be adapted. Elements referenced by the tree-shaped data structure can be inserted into the user model and treated as elements of the tree-shaped data structure. A generated user model can be assigned information about the user model type.

The method may further include selecting objects based on predetermined selection criteria, wherein an object comprises a user model or an item model.

The selection criteria may include:

- objects of a predetermined type, and/or

- objects having a predetermined similarity to the user model and/or item model, similarity values being determined between the generated user model and/or item model and the objects before the selection.

A promotion program can be represented by an item model, wherein the item model representing the promotion program is selected when the user model has a predetermined similarity to the item model.

Also provided is a system for generating a user model, in particular for a recommendation service, from at least one tree-shaped data structure, wherein the user model comprises information about a user, wherein the tree-shaped data structure can be assigned to the user, wherein the tree-shaped data structure comprises a root node and a number of child nodes connected via edges to the root node or to a child node, and wherein at least one node is associated with at least one element, the system comprising:

- at least one storage device for storing at least one tree-shaped data structure, and

- a processing device which is coupled to the storage device and which is adapted to carry out a method according to one of the preceding claims, in order to generate a user model, to store the generated user model in the storage device and to make it available to a recommendation service.

Furthermore, a data carrier product is provided, with a program code stored thereon, which can be loaded into a computer and / or into a computer network and is adapted to carry out a method according to the invention.

Brief description of the figures

The invention will be explained in more detail with reference to an embodiment and the drawings, in which:

Fig. 1 shows an approach known from the prior art for so-called "user-item matching";

Fig. 2 shows a "Content Based Filtering" method known from the prior art;

Fig. 3 shows a so-called "Collaborative Filtering" method known from the prior art;

Fig. 4 shows a modification of the "Collaborative Filtering" method known from the prior art;

Fig. 5 shows a tree-shaped (hierarchical) data structure;

Figs. 6a, 6b show two tree-shaped data structures which have the same meaning in the sense of the invention; and

Figs. 7a, 7b show a flow diagram of a method according to the invention.

Detailed description of the invention

Definitions

- Tree-shaped data structure - A tree-shaped data structure (hereinafter BD) is a data structure with which a monohierarchy can be mapped. In this case, nodes in the data structure are connected in a tree-shaped manner by means of edges. There is exactly one root node, which can have any number of child nodes. Each child node can in turn have any number of child nodes.

Examples of BDs within the meaning of the invention include, but are not limited to, directory structures and/or file systems on a hard disk (folders and files) and so-called mind maps. If the BD is a file system, the "leaves" (the last nodes of each path in a BD) correspond to files or file associations, and all other nodes correspond to directories or folders.

Nodes of a BD usually contain one or more elements. These elements can be of different types. Common elements or element types are: text (in the case of a file system, the node text would be the file or directory name), additional notes, tables, appointments, multimedia objects (music, film, image), icons, formulas, links (usually to external items), numbers, and/or binary code (especially if the BD is a directory structure and the node is a file). A reference can be a unique URI (Uniform Resource Identifier), e.g. a hyperlink or a local link to a file on a storage medium (e.g. a hard disk). A reference can also be a non-unique description that identifies an item (e.g. the title of a document, an author name, a photo, a BibTeX key, or the name of a place or product).

Each of the elements can have a number of attributes. Thus, text can be formatted differently, i.e. take different values in terms of size and color. Nodes themselves can also have attributes, in particular to format the display of the nodes or to assign specific functions to the node. For example, attributes may mark nodes as "collapsed" or "expanded", that is, invisible or visible to the user. Just like nodes, individual elements of a node can be visible or invisible to the user.

Edges in a BD are usually undirected and usually contain no textual information. Edges can also be directed.

- Item - Items are arbitrary objects, e.g. documents (books, web pages, scientific articles), files, advertisements (in image, text or sound), persons, pieces of music or music albums, movies, products, geographical locations, etc., or their digital representation (i.e. not necessarily a physical book, but e.g. the digital copy or representation of the book in various formats).

- User - A user is a person who applies or uses the system according to the invention. A user can also be a so-called agent, a type of electronic person or a system that simulates the behavior of a real person.

- User model - A user model includes the interests, knowledge or other information about the person, usually in machine-readable form. In the following, the terms interests of a user, knowledge of a user and information about the user are used synonymously.

- Connection between BD and user - A BD is connected with a user, or is assignable to the user, if this user has, for example, created, edited, downloaded or opened the BD, or if the BD is or was in the possession of the user (e.g. is stored on the user's hard disk).

- Collection - A collection is the set of all BDs to which the system according to the invention has access.

According to the invention, tree-shaped data structures BD which are associated with the user or can be assigned to a user are analyzed in order to obtain a model of the user, i.e. to create a user model. A user model includes, but is not limited to, information about the user's interests and knowledge.

Fig. 5 shows a tree-shaped data structure BD according to the invention. A tree-shaped data structure BD comprises a number of nodes, one special node being the root node. The other nodes are referred to as child nodes, which are connected via edges to the root node or to another child node. Nodes that have no child nodes are called "leaves". Each node may contain one or more references to external items. In Fig. 5, the node 2.i has such a reference to an item.

According to the invention, the content of a tree-shaped data structure, and possibly the items linked from it or the contents of those linked items, describes the interests of the user and can be used to generate a user model. In simple terms, if the nodes of a tree-shaped data structure frequently contain the word "patent", it can be inferred that the user of the tree-shaped data structure, or the user to whom the tree-shaped data structure can be assigned, is interested in patents or has knowledge in this field. The same conclusion can be drawn if the word does not appear in the tree-shaped data structure itself, but the tree-shaped data structure links to many documents (e.g. patents or web pages) containing the word "patent".
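A tree-shaped data structure of the kind described above can be represented, for example, as in the following minimal Python sketch. The class names `Node` and `Element`, their fields and the sample tree are assumptions chosen for illustration; the sketch merely shows nodes carrying typed elements and the simple "patent" observation from the preceding paragraph.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str            # e.g. "text" or "reference"
    value: str
    weight: float = 1.0

@dataclass
class Node:
    elements: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def is_leaf(self):
        """A leaf is a node without child nodes."""
        return not self.children

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

def count_term(root, term):
    """Count the nodes that carry a text element containing the term."""
    return sum(
        1
        for node in root.walk()
        if any(e.kind == "text" and term in e.value.lower() for e in node.elements)
    )

root = Node([Element("text", "Patent research")], [
    Node([Element("text", "Prior art")]),
    Node([Element("text", "Patent claims"),
          Element("reference", "file:///claims.pdf")]),
])
patent_nodes = count_term(root, "patent")  # 2
```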

FIGS. 7a and 7b show a flow diagram of a method according to the invention for generating a user model from at least one tree-shaped data structure.

In a first step, a preprocessing takes place in which the tree-shaped data structures are adapted or processed for further processing. The preprocessing step is an optional step and does not necessarily have to be performed, for example if the tree-shaped data structures already have the format required for further processing.

The preprocessing may include converting the tree-shaped data structures to a system-readable format. Further, preprocessing may involve deleting nodes from the tree-shaped data structures, i.e., deleting certain nodes that are not relevant to creating a user model. Deleting a node means that it is removed from the tree-shaped data structure and the child nodes of the node to be deleted are either also removed or the child nodes are assigned to the parent node of the node to be deleted. For example, a node may be dropped if the node meets one or more of the following criteria:

- The node is empty;

- The node contains, or does not contain, a specific element, such as text or a reference;

- The node has, or does not have, a certain attribute;

- The node or elements of the node are not directly connected to the user. This may be the case if a node was not created by the user. For example, if a node of a mind map links to a file and the text of the node is equal to the file name of the linked file, it can be assumed that the node was created automatically, e.g. by "drag & drop", so that the text of the node was not generated by the user and therefore has little or no significance.

Of course, when deleting a node further criteria can be taken into account.
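The two ways of deleting a node described above (removing its child nodes as well, or assigning them to the parent of the deleted node) can be sketched in Python as follows. The `Node` class, the function name `prune` and the deletion criterion "node text is empty" are assumptions for the example; the root node itself is never deleted in this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str = ""
    children: list = field(default_factory=list)

def prune(node, should_delete, keep_children=True):
    """Return the children of `node` after deleting matching descendants.

    The children of a deleted node are either re-attached to its parent
    (keep_children=True) or removed together with it.
    """
    kept = []
    for child in node.children:
        # Clean up the subtree first, then decide about the child itself.
        child.children = prune(child, should_delete, keep_children)
        if should_delete(child):
            if keep_children:
                kept.extend(child.children)
        else:
            kept.append(child)
    return kept

def is_empty(node):
    return not node.text

# Variant 1: children of the deleted (empty) node move up to the root.
root = Node("root", [Node("", [Node("a"), Node("b")]), Node("c")])
root.children = prune(root, is_empty)

# Variant 2: the deleted node's children are removed with it.
root2 = Node("root", [Node("", [Node("a")]), Node("c")])
root2.children = prune(root2, is_empty, keep_children=False)
```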

After the (optional) preprocessing, a user model is generated in a next step based on the tree-shaped data structure. In this case, the nodes of a tree-shaped data structure or the elements of the nodes are analyzed in order to identify the interests etc. of the user and store them in a user model which is assigned to the user. This happens in the following steps:

A) Weighting the nodes

The weighting of the nodes is based on the assumption that some nodes or their elements are more meaningful for describing the interests of the user than other nodes or their elements. When weighting the nodes, two partial weights can be calculated. However, it is also possible to calculate only one of the two partial weights and to use it as the node weight of a node. The two partial weights are the static node weighting and the dynamic node weighting. Of course, other partial weights not mentioned here can also be calculated. The combination of the calculated partial weights gives the node weight of a node.

For static node weighting, the following criteria can be taken into account:

- Number of child nodes: A node is weighted depending on the number of its child nodes, e.g. the more children the node has, the more weight it receives.

- Number of sibling nodes: Sibling nodes of a node are those nodes that have the same parent node as the node under consideration. Here, the node is weighted depending on the number of its sibling nodes, e.g. the more siblings a node has, the less weight it gets.

- Node depth: The higher up a node is in a tree-shaped data structure (that is, the closer to the root node), the more weight it receives. The root node thus gets a lot of weight, the leaf nodes less.

- Node visibility: Visible nodes receive more weight than invisible nodes.

- Attributes: If the node is highlighted by a particular attribute, e.g. by colored marking or underlining, it gets more weight. If it is de-emphasized by certain attributes, e.g. by graying out or strikethrough, it gets less weight.
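The static criteria listed above can be combined, for example, multiplicatively. The following Python sketch is a hypothetical illustration: the concrete factors (0.1 per child or sibling, division by depth, 0.5 for invisible nodes) are assumptions and not values from the description, and node names are assumed to be unique so that they can serve as dictionary keys.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    visible: bool = True
    children: list = field(default_factory=list)

def static_weights(root):
    """Return {node name: static weight} for every node in the tree."""
    weights = {}

    def visit(node, depth, n_siblings):
        weight = 1.0
        weight *= 1.0 + 0.1 * len(node.children)  # more children -> more weight
        weight /= 1.0 + 0.1 * n_siblings          # more siblings -> less weight
        weight /= 1.0 + depth                     # closer to the root -> more weight
        if not node.visible:
            weight *= 0.5                         # invisible nodes count less
        weights[node.name] = weight
        for child in node.children:
            visit(child, depth + 1, len(node.children) - 1)

    visit(root, 0, 0)
    return weights

tree = Node("root", children=[
    Node("a", children=[Node("a1")]),
    Node("b", visible=False),
])
weights = static_weights(tree)
```

With these assumed factors, the root receives the highest weight, the visible node "a" more than its invisible sibling "b".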

With static node weighting, the tree-shaped data structure is only considered at a certain point in time. However, according to the invention, the nodes can also be weighted dynamically, i.e. changes to and the intensity of use of the tree-shaped data structure over time can be incorporated into the node weighting. If, for example, a node is used more intensively, or has been used more intensively in the past, than other nodes, then this node can receive a higher weight than the other nodes. The weighting can result, among other things, from:

- Age of the node: Older nodes may receive more or less weight than younger nodes. Preferably, younger nodes get a higher weight. There may also be a threshold, for example of the type "nodes that are at least 12 hours and at most 5 days old". Nodes that meet the threshold criterion receive a higher weight.

- Time of last processing (for example editing): Nodes that were recently edited receive a higher weight.

- Number of edits: Preferably, a node that has been edited more often than other nodes is given a higher weight.

- Number of shifts: The more often a node has been moved (cut and reinserted), the more weight it receives.

- Number of markings: The more often a node is selected/marked, the more weight it gets.

- Visibility period: The longer a node was visible, the more weight it receives.

- Number of visibility changes: The more often a node has been made invisible and visible again, i.e. has been collapsed and expanded, the stronger its weight.

- Number of followed links: The more often a link of the node has been opened, the greater the weight of the node.

Each of the above-mentioned (dynamic) weights may be weakened or strengthened by a time parameter. For example: a node is weighted twice as heavily if it has been edited at least twice. If the last edit was more than X weeks ago, a damping time parameter reduces the weighting to only 1.5 times.
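The example just given can be written down directly. In the following Python sketch, the function name `edit_weight` and the default of four weeks for the unspecified period "X weeks" are assumptions; the factors 2 and 1.5 are taken from the example.

```python
from datetime import datetime, timedelta

def edit_weight(edit_count, last_edit, now, max_age=timedelta(weeks=4)):
    """Dynamic weight from the number of edits, damped by a time parameter."""
    if edit_count < 2:
        return 1.0   # not edited often enough: normal weight
    if now - last_edit > max_age:
        return 1.5   # damped: the last edit was too long ago
    return 2.0       # edited at least twice, and recently

now = datetime(2011, 11, 23)
recent = edit_weight(3, now - timedelta(days=2), now)      # 2.0
stale = edit_weight(3, now - timedelta(weeks=10), now)     # 1.5
untouched = edit_weight(1, now - timedelta(days=2), now)   # 1.0
```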

In addition to the static and/or dynamic node weighting mentioned above, inheritance of weights may be provided. Inheritance is preferably performed after the static or dynamic weighting has been performed. When inheriting weights, nodes can "inherit" weights from their surrounding nodes. If, for example, a parent node has a very high weight (because, for example, it was selected often), the child node can also be given a higher weight than if it were only considered individually. Preferably, all child nodes and their child nodes, all sibling nodes, and all parent nodes and their parents up to the root get a higher weight, the additional weight becoming weaker the farther the inheriting node is from the bequeathing node. In addition, it can be provided that only nodes that exceed a threshold value (e.g. a weighting five times greater than normal) can pass their weight on to surrounding nodes.

If several nodes of a tree-shaped data structure belong to a certain "group", the weighting of the individual nodes of the group can be aligned or inherited. Groups can be visually recognizable in the tree-shaped data structure or can be distinguished by specific attributes or element types. For example, a tree-shaped data structure contains some nodes that have references. All of these nodes are assigned to the "reference nodes" group. Although only 95% of these nodes have a very high weighting, the system assigns a very high weighting to all nodes of the group (including the remaining 5%). The weak nodes of a group inherit from the other nodes of their group.
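The inheritance of weights with a threshold and a bonus that weakens with distance can be sketched as follows. For brevity, this Python sketch only passes weight downwards, from a node to its descendants; inheritance to siblings, ancestors and group members is omitted. The function name, the halving of the bonus per level and the threshold of 5.0 are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    weight: float = 1.0
    children: list = field(default_factory=list)

def inherit_down(node, threshold=5.0, decay=0.5, bonus=0.0, out=None):
    """Return {name: weight after inheritance} for the subtree of `node`.

    Nodes whose own weight reaches `threshold` hand a bonus down to their
    descendants; the bonus shrinks by `decay` with every level of distance.
    """
    if out is None:
        out = {}
    out[node.name] = node.weight + bonus
    # The bonus received from further up weakens before it is passed on.
    passed_on = bonus * decay
    if node.weight >= threshold:
        passed_on += node.weight * decay
    for child in node.children:
        inherit_down(child, threshold, decay, passed_on, out)
    return out

tree = Node("root", 6.0, [Node("a", children=[Node("a1")]), Node("b")])
inherited = inherit_down(tree)
# root keeps 6.0; its children receive 1.0 + 3.0, the grandchild 1.0 + 1.5
```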

B) Identifying the elements in the nodes

In a further step, the elements in the nodes are identified and, where applicable, passed on to preprocessing. First of all, the elements contained in each node and their attributes are identified or determined.

If the determined element is a text element, this text is further decomposed into tokens or terms. Often a term is a single word, but sometimes it is also a compound such as "Mind Map". In the following, each term is considered an independent element of type text.

The terms can be processed further. For this purpose, methods known from the prior art, for example in the field of information retrieval, can be used. Examples of such methods include:

- Stemming: Words are reduced to their stems. For example, the word "trunks" would be reduced to "trunk".

- Stop word removal: Very frequently occurring words with little informative value (for example the, who, where, why, already, so, therefore, ...) are removed.

- Latent Semantic Indexing (LSI): Latent Semantic Indexing merges or takes into account synonyms of words.

- Translation: The words are translated into a reference language, e.g. English.

- Spelling correction: Spelling errors are detected and corrected, or the misspelled words are deleted.

The determined element can also be a reference. References can also be preprocessed, for example by converting the URI (Uniform Resource Identifier) and/or its special characters to a uniform format for each reference or, if the reference is not unique (e.g. only the title of a document), by finding a unique identifier (in the case of a document, for example, the ISBN).
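A minimal Python sketch of the preprocessing of text and reference elements is given below. It is deliberately crude: the stop word list, the plural-stripping "stemmer" and the function names are assumptions for illustration, and a real system would use an established stemmer and a fuller URI normalization. Only the standard-library modules `re` and `urllib.parse` are used.

```python
import re
from urllib.parse import urlsplit

STOP_WORDS = {"the", "a", "an", "of", "and", "who", "where", "why", "so"}

def simple_stem(word):
    """Strip a plural 's'; a stand-in for a real stemming algorithm."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess_text(text):
    """Tokenize, lowercase, drop stop words, and stem the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

def normalize_reference(uri):
    """Bring a URI into a uniform format: trimmed, lowercase scheme and host."""
    parts = urlsplit(uri.strip())
    return parts._replace(scheme=parts.scheme.lower(),
                          netloc=parts.netloc.lower()).geturl()

terms = preprocess_text("The trunks of the trees")
reference = normalize_reference(" HTTP://Example.COM/Doc.pdf ")
```

The path of the URI is left untouched in this sketch, since paths may be case-sensitive.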

C) Weighting the elements

Similar to the nodes, each element of a node can also be weighted. Text and reference elements in particular are important for the creation of the user model according to the invention. Preferably, each element first receives a predetermined weighting (initial weighting), such as the weighting 1 or the weighting of its associated node. This can be done, for example, in an initialization step in which all elements are provided with an initial weighting.

The initial weight of an element may be strengthened or weakened, preferably based on one or more of the following factors:

- Element Type: Elements of certain types can receive different basis weights. For example, a text element representing the general node text may receive a higher weighting than a text element representing an additional note. - Attributes: Depending on the attributes, elements can get a stronger or weaker weighting. For example, a text element that is bolded may be weighted more heavily than a text element without formatting.

- Frequency element: The more often an element occurs in the tree-shaped data structure, the stronger its weighting can be.

- BD frequency: the less tree-shaped data structures in the entire collection have an element, the more it is weighted. This is based on the assumption that an element, which occurs only a few times in all tree-shaped data structures, is more meaningful than an element that occurs in almost every tree-shaped data structure. For example, in a collection of 100 tree-shaped data structures, if only a single tree-shaped data structure contains the term "tree," then this term would be weighted more heavily with respect to the tree-shaped data structure than if 90 other tree-shaped data structures also contained that term. Collection Frequency: The less often the element appears in the total of all elements of the entire collection, the more it is weighted. This is very similar to the BD frequency, except that the BD frequency counts the number of tree-shaped data structures in which the element occurs and, at its collection frequency, the total number of elements itself.

- BD size: The larger the tree-shaped data structure, the less heavily the element is weighted. This is based on the assumption that large tree-shaped data structures tend to contain more elements but should not be favored over small tree-shaped data structures. The size of a tree-shaped data structure can be specified by its number of nodes or by its number of elements.

- Position in the node: Elements at the front of the node are weighted differently than elements further back in the node. If a node contains, for example, 100 terms, then it can be provided that only the first ten terms are taken into account. Furthermore, it can be provided that further terms (for example the next ten terms) are taken into account with less weight.

- Language (if the node contains text elements): Unlike documents such as web pages, tree-shaped data structures often include terms in different languages. The elements of a node can be weighted differently depending on the language. This also means that if, for example, the text of a node is in a particular language, the other elements of the node (for example, a reference) can be weighted less or more heavily.

- Node length: Elements are weighted depending on the node length. The fewer elements a node contains, the more heavily its elements can be weighted.

- Distance to similar elements: The fewer similar elements there are in the vicinity of the node to which the element belongs, the more weight the element gets. For example, if a node has a reference to an item and the surrounding nodes (e.g. all children, siblings and parent nodes) do not contain any references, then this reference could be given a particularly high weight, as it seems reasonable to assume that the reference also relates to the surrounding nodes. If, on the other hand, sibling nodes also have references, this reference does not receive a particularly high weight.

- Element repetition: Tree-shaped data structures can be created in a very user-specific way. For example, one user may often repeat elements in the nodes while another user does not. Here, it can be advantageous to reduce the weight of an element the more often one of the parent nodes (up to the root node) or a sibling node already contains this element. FIGS. 6a and 6b illustrate this case. They each show a tree-shaped data structure expressing the same statement by two users, although the tree-shaped data structures look different. In FIG. 6a, the term "recommending" is repeated several times, but not in FIG. 6b. Nevertheless, the term "recommender" would be equally applicable to both tree-shaped data structures or users and should be equally weighted.
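The element-repetition factor can be illustrated with a short sketch. It is only one possible reading: the `Node` class, the `damping` parameter and the rule of multiplying the weight by `damping` once for every ancestor that already contains the term are assumptions made for this example, not a formula given in the description. Only ancestors are checked here; sibling nodes could be handled in the same way.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node that holds a list of terms."""
    terms: list[str]
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)

    def add_child(self, terms: list[str]) -> "Node":
        child = Node(terms=terms, parent=self)
        self.children.append(child)
        return child


def repetition_damped_weight(node: Node, term: str,
                             initial: float = 1.0,
                             damping: float = 0.5) -> float:
    """Reduce the weight of `term` once per ancestor that already contains it."""
    weight = initial
    ancestor = node.parent
    while ancestor is not None:  # walk up to the root node
        if term in ancestor.terms:
            weight *= damping  # assumed damping scheme
        ancestor = ancestor.parent
    return weight


root = Node(terms=["recommender", "systems"])
child = root.add_child(["recommender", "evaluation"])
leaf = child.add_child(["recommender", "datasets"])

print(repetition_damped_weight(root, "recommender"))  # root has no ancestors
print(repetition_damped_weight(leaf, "recommender"))  # repeated in two ancestors
print(repetition_damped_weight(leaf, "datasets"))     # not repeated anywhere
```

With such damping, a user who repeats a term on every level and a user who states it once end up with more comparable weights for that term.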

The weighting of the elements can also take place using the same methods with which the nodes are weighted. For example, older elements can be weighted less heavily than newer ones, and inheritance can also take place in element weights.
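As an illustration of how several of the factors above could be combined, the following sketch adjusts an initial weighting by the element frequency, the BD frequency and the BD size. The multiplicative, TF-IDF-like formula and the function name `element_weights` are assumptions chosen for the example; the description does not prescribe a particular formula.

```python
import math
from collections import Counter


def element_weights(tree_terms, collection, initial=1.0):
    """Weight the terms of one tree relative to a collection of trees.

    tree_terms: list of terms of the tree being modeled.
    collection: list of term lists, one per tree (assumed to include the tree).
    """
    n_trees = len(collection)
    size = len(tree_terms)  # BD size, measured here in elements
    weights = {}
    for term, count in Counter(tree_terms).items():
        # BD frequency: number of trees in the collection containing the term
        bd_frequency = max(1, sum(1 for other in collection if term in other))
        rarity = math.log(1 + n_trees / bd_frequency)  # rarer terms score higher
        # frequent terms score higher, large trees are normalized
        weights[term] = initial * (count / size) * rarity
    return weights


collection = [
    ["tree", "model"],
    ["model", "user"],
    ["model", "graph"],
]
weights = element_weights(collection[0], collection)
print(weights)
```

In this toy collection "model" occurs in every tree and "tree" in only one, so "tree" receives the higher weight for the first tree.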

D) Save the user model

In an advantageous embodiment of the invention, the generated user model is stored so that it can be made available, for example, to a recommendation service. Alternatively, a user model can also be created on demand without being stored.

According to the invention, at least two different approaches can be used to store a user model. The two approaches shown here are type-neutral storage and type-dependent storage of a user model.

In type-neutral storage, all elements are stored with their weighting. That is, terms, links, images, etc. are all stored together in the model. In a concrete embodiment of the method according to the invention, the vector space model described at the outset can be used for this purpose. The invention extends this model such that not only terms with a weighting can be stored, but also any elements of different types together with their weighting and their type.

With type-dependent storage, a separate user model can be generated for each element type, and these models together form an overall user model. A user model then includes, for example, a text model and a reference model. For this purpose, standard methods from the fields of information retrieval or user modeling can be used. For example, a standard model for a text-based model would again be the aforementioned vector space model, in which the individual terms are weighted according to the method described above. References can also be stored in other models that, for example, also take into account the order of the elements in the tree-shaped data structure.
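The difference between the two storage approaches can be sketched as follows. The triple layout `(element type, value, weight)`, the function names and the sample values (including the file path) are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical weighted elements: (element type, value, weight)
weighted_elements = [
    ("text", "recommender", 2.0),
    ("text", "systems", 1.5),
    ("reference", "file:///papers/example.pdf", 1.0),
]


def type_neutral_model(elements):
    """One model for all elements; the type is stored as part of the key."""
    return {(etype, value): weight for etype, value, weight in elements}


def type_dependent_model(elements):
    """One submodel per element type; together they form the overall model."""
    model = defaultdict(dict)
    for etype, value, weight in elements:
        model[etype][value] = weight
    return dict(model)


neutral = type_neutral_model(weighted_elements)
dependent = type_dependent_model(weighted_elements)
print(neutral)
print(dependent)
```

The type-neutral form keeps a single weighted vector over mixed element types, while the type-dependent form allows a different model (and later a different similarity measure) per type.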

E) Further steps

The following steps can optionally be carried out in addition to the above-mentioned steps. If a user has relationships with several tree-shaped data structures, one (or more) models can be generated for each tree-shaped data structure, as described above, and the various models are finally joined together to form an overall model. Here, different tree-shaped data structures can be given different weightings. The weighting follows similar principles to the weighting of the nodes or elements. For example, a newer tree-shaped data structure, or tree-shaped data structures that are opened or edited more frequently, may be weighted more heavily.
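Joining several per-tree models into an overall model can be sketched as a weighted sum. The function `merge_models`, the summation rule and the tree weights in the example are assumptions; other ways of combining the models are equally conceivable.

```python
def merge_models(models, tree_weights):
    """Combine per-tree models (element -> weight) into one overall model.

    Each element weight is scaled by the weight of the tree it comes from
    and the scaled weights are summed (assumed combination rule).
    """
    overall = {}
    for model, tree_weight in zip(models, tree_weights):
        for element, weight in model.items():
            overall[element] = overall.get(element, 0.0) + tree_weight * weight
    return overall


older_tree = {"recommender": 1.0, "citation": 2.0}
newer_tree = {"recommender": 3.0, "mind map": 1.0}

# The newer tree is weighted more heavily than the older one.
overall = merge_models([older_tree, newer_tree], tree_weights=[0.5, 1.0])
print(overall)
```

The same function can be reused when a new tree-shaped data structure becomes associated with a user: the existing overall model is passed in as one of the models.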

If a new relationship arises between a tree-shaped data structure and a user for whom a user model already exists, the existing user model can be extended by the elements of the new tree-shaped data structure.

If a tree-shaped data structure contains references to items, these items can also be used to generate a user model. That is, the elements in the linked item are inserted into the user model in a manner similar to elements of the tree-shaped data structure itself. The elements of linked items can be given a lower weighting. If the linked item is a tree-shaped data structure, its elements are weighted using the method described above. If the linked item is, for example, a web page, then the weighting can be done with standard methods such as TF-IDF.

The above type-neutral and type-dependent models may be subdivided into submodels, for example into:

- Models for short-term interests: For example, this model would only contain data from one session or from the last edited tree-shaped data structure (or data of the tree-shaped data structure that was edited in a certain period of time).

- Models for long-term interests: This model would contain interests based on all or at least several tree-shaped data structures.

- Models for different interests: It is conceivable that users create different tree-shaped data structures for different purposes, e.g. different projects. That is, one tree-shaped data structure (or several) is used for project A and another tree-shaped data structure (or several) for another project B. According to the invention, tree-shaped data structures that are very different can be used for creating different models (which may in turn be subdivided into long-term and short-term interests). The identification of related tree-shaped data structures can be done as follows:

- Content analysis: Here, similar tree-shaped data structures are determined using the methods for weighting terms or links. If the similarity of tree-shaped data structures falls below a certain threshold, they are used for different user models.

- Temporal analysis: Tree-shaped data structures that are rarely or never used, opened, etc. at the same time are used for different user models.
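The content analysis can be sketched as a simple grouping by similarity threshold. The Jaccard overlap of term sets is used here only as a stand-in for a similarity based on weighted terms or links, and the greedy grouping strategy, the threshold value and the tree names are assumptions for the example.

```python
def jaccard(a, b):
    """Overlap of two term collections, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_trees(trees, threshold=0.3):
    """Assign each tree to the first group whose first member is similar enough.

    trees: dict mapping a tree name to its terms.
    Returns a list of groups; each group would feed its own user model.
    """
    groups = []
    for name, terms in trees.items():
        for group in groups:
            if jaccard(terms, trees[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # below the threshold for every group
    return groups


trees = {
    "thesis_outline": ["recommender", "evaluation", "dataset"],
    "thesis_notes": ["recommender", "dataset", "baseline"],
    "holiday_plan": ["flights", "hotel", "budget"],
}
groups = group_trees(trees)
print(groups)
```

A temporal analysis could be added in the same way by replacing the similarity function with a measure of how often two tree-shaped data structures are open at the same time.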

Tree-shaped data structures can be used for different types of applications, such as file management, brainstorming, document management, project planning, etc. The type of application is noted in the user model. When a user creates different tree-shaped data structures for different types of applications, different user models are in turn created. The type of application can be determined as follows:

- Via the application that created the tree-shaped data structure: For example, it can be assumed that a tree-shaped data structure created with Windows Explorer is used for file management.

- By manual indication by the user: In the application for creating the tree-shaped data structure, the user can specify the purpose for which he wants to create the tree-shaped data structure (for example brainstorming, project planning, etc.).

- Automatic analysis: The system analyzes the tree-shaped data structure and automatically infers the type of application, e.g. on the basis of its structure, its use or its source format.

For automatic analysis, the following rules can be applied:

- If tree-shaped data structures contain many references, the primary application is file management, web page management or document management.

- If very many nodes in the tree-shaped data structure are created very quickly, then moved and edited, and the tree-shaped data structure is then never or only rarely opened, it was created for brainstorming.

- If the tree-shaped data structure grows slowly and continuously, it is not a tree-shaped data structure for brainstorming.

The automatic analysis may include factors such as growth rate, size, duration of use, type of usage, the application used to create the tree-shaped data structure, and/or other factors.
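The rules above can be sketched as a small rule-based classifier. The `TreeStats` fields, all threshold values and the returned labels are hypothetical; the description names the factors but gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass
class TreeStats:
    """Hypothetical usage statistics of one tree-shaped data structure."""
    node_count: int
    reference_count: int
    creation_span_minutes: float  # time between first and last node creation
    opens_after_creation: int


def guess_application_type(stats: TreeStats,
                           reference_ratio: float = 0.5,
                           fast_nodes_per_minute: float = 1.0,
                           rare_opens: int = 2) -> str:
    """Apply the rules in order; thresholds are assumed values."""
    if stats.node_count == 0:
        return "unknown"
    # Rule 1: many references -> file, web page or document management
    if stats.reference_count / stats.node_count >= reference_ratio:
        return "file, web page or document management"
    # Rule 2: created very quickly and then rarely opened -> brainstorming
    rate = stats.node_count / max(stats.creation_span_minutes, 1e-9)
    if rate >= fast_nodes_per_minute and stats.opens_after_creation <= rare_opens:
        return "brainstorming"
    # Rule 3: slow, continuous growth -> not brainstorming
    return "other (not brainstorming)"


print(guess_application_type(TreeStats(40, 30, 600.0, 10)))
print(guess_application_type(TreeStats(60, 2, 30.0, 1)))
print(guess_application_type(TreeStats(60, 2, 6000.0, 20)))
```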

With the method according to the invention described above, one or more user models have been generated for each of a number of users. According to the invention, these user models are used to give a user recommendations for items. That is, based on the user models, items can be identified that the user is likely to consider interesting or relevant. In order to recommend items, the invention proposes a method that treats item models and user models as equal. In the following, items and users are referred to collectively as objects, whereby each object can have a type (type = user, type = website, type = email, type = scientific article, etc.).

The method according to the invention for proposing objects can comprise at least the following steps: The user models have already been created and stored with the aforementioned methods according to the invention. Now, for each item that can potentially be recommended, an item model is generated using techniques known in the art. For example, if the item is a web page, terms could be weighted using TF-IDF and stored in a vector space model. If the item is a scientific article, TF-IDF could be used, as well as other methods, such as Citation Proximity Analysis, to model the item.

The above type-neutral method can also be used to create a corresponding model of items which, like tree-shaped data structures, contain several element types. This applies, for example, to scientific articles. A scientific article usually contains text and references (to other scientific articles), comparable to a tree-shaped data structure, which also contains text and references (e.g. to files). Both can therefore be mapped relatively easily to compatible model types. According to the invention, it is important that all objects are mapped to the same model or to compatible models so that they can be compared later.

Once models of all objects have been created and stored, they are compared for similarity. If the object models contain several submodels, for example for short-term and long-term interests or for different element types, each of these submodels is compared with the other object models. The comparison can take place using standard methods, such as cosine similarity for comparisons in the vector space model, or similarity measures such as Greedy Citation Tiling for reference-based models. Ultimately, this means that the similarity of the users to one another, the similarity of the items to one another, and the similarity of items and users to one another are calculated in one step by this object-to-object matching. This has the advantage that a user can later be given very flexible recommendations.
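The object-to-object matching can be sketched with cosine similarity over sparse term vectors. The object identifiers, the `type:name` naming scheme and the weights are invented for the example; the only point is that users and items live in the same model and are compared with the same function.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical object models; users and items share one vector space.
objects = {
    "user:alice": {"recommender": 2.0, "citation": 1.0},
    "user:bob": {"recommender": 1.0, "evaluation": 1.0},
    "website:rec-survey": {"recommender": 3.0, "citation": 1.5},
    "article:gardening": {"soil": 2.0},
}


def pairwise_similarities(objects):
    """Compare every object with every other object exactly once."""
    names = sorted(objects)
    return {
        (a, b): cosine(objects[a], objects[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


similarities = pairwise_similarities(objects)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

Because all pairs are computed in one pass, user-to-user, item-to-item and user-to-item similarities are available from the same table.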

If a user is to be given a recommendation, then, based on his object or user model, the following are recommended:

- all objects that are of a certain type (e.g. website, or "item" in general) or of a different type (e.g. user) and that exceed a certain similarity value;

- all objects referenced in any of the highly weighted objects identified in the previous step. If, for example, a similar object of type "web page" was determined in the previous step, the user is recommended the web pages that are often linked from the similar web page. Or, if a similar object of type "user" was determined, the items that are frequently referenced in the user model of the similar user, or that are closely associated with that user, are recommended; and/or

- all objects that are similar to the objects referenced in the user's own user model. That is, if a user has a reference to a website X in his user model, the objects that are similar to this website X are recommended.
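The first selection rule (objects of a given type that exceed a similarity value) can be sketched as follows. The function `recommend`, the sample similarity values and the object identifiers are hypothetical; the similarity values are assumed to have been computed beforehand by the object matching.

```python
def recommend(similarities, object_types, wanted_types, min_similarity):
    """Select objects of the wanted types above a similarity threshold.

    similarities: object id -> similarity to the target user's model.
    object_types: object id -> type label.
    Returns the selected object ids, most similar first.
    """
    hits = [
        (obj, sim)
        for obj, sim in similarities.items()
        if object_types[obj] in wanted_types and sim >= min_similarity
    ]
    return [obj for obj, _ in sorted(hits, key=lambda hit: hit[1], reverse=True)]


similarities = {
    "website:a": 0.9,
    "website:b": 0.2,
    "user:bob": 0.7,
    "article:c": 0.8,
}
object_types = {
    "website:a": "website",
    "website:b": "website",
    "user:bob": "user",
    "article:c": "article",
}

items = recommend(similarities, object_types, {"website", "article"}, 0.5)
similar_users = recommend(similarities, object_types, {"user"}, 0.5)
print(items)
print(similar_users)
```

The second and third rules would build on this result, for example by collecting the references found in the models of the returned similar users.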

In addition or as an alternative to the approach described above, a method based on machine learning can be used. The references of a user model can be used with machine learning techniques to learn user preferences. Here, any reference to an item is considered a positive association, which the system learns and on the basis of which it provides recommendations for new items.

In a specific embodiment of the method according to the invention, the method can be used to make recommendations for subsidies or funding programs. For this purpose, a user model is first generated from a tree-shaped data structure, as described above. The funding program itself is considered an item. The item in turn is represented by a text describing the funding program. This text can be a website, a brochure in PDF format, social tags, etc. If, for example, a user model contains the heavily weighted term "Recommender Systems" and there is a funding program whose website also often includes this term, this funding program would be recommended to the user.

The user models and recommendations according to the invention can increase the benefit of many software programs for their users, because the users receive interesting recommendations.

Claims

1. A computer-implemented method for generating a user model, in particular for a recommendation service, from at least one tree-shaped data structure, wherein the user model comprises information about a user, wherein the at least one tree-shaped data structure can be assigned to the user, wherein the tree-shaped data structure comprises a root node and a number of child nodes, which are connected via edges to the root node or to a child node, wherein at least one node is associated with at least one element, and wherein

- the elements associated with the nodes and representing a content of the respective node are determined,

- the determined elements are weighted and an element weighting is assigned to each element, and

- a user model is generated, wherein the generated user model comprises the determined elements and the element weighting assigned to the respective element.

2. The method of claim 1, wherein the nodes of the tree-shaped data structure are weighted and each node is assigned a node weighting.

3. The method according to any one of the preceding claims, wherein in an initialization step each element is assigned a predetermined element weighting or the node weighting of the associated node.

4. Method according to one of the preceding claims, wherein the method further comprises a preprocessing step, in which

- nodes that have no elements assigned are deleted, and/or

- nodes and/or elements of the nodes that are not assigned directly to the user are deleted.

5. Method according to one of the preceding claims, wherein the weighting of the nodes comprises a static node weighting and/or a dynamic node weighting, wherein

for the dynamic node weighting, the age, the time of the last change, the number of changes, the number of shifts within the tree-shaped data structure, the number of markings, the visibility of the node, an attenuation factor, or a combination thereof are taken into account for each node.

6. Method according to one of the preceding claims, wherein the determination of the elements associated with the nodes comprises a preprocessing of the determined elements, wherein the preprocessing of the elements decomposes text into tokens and/or terms, provided that the element is a text element, and/or references, provided that the element is a reference element.

7. Method according to one of the preceding claims, wherein the element weightings assigned to the elements in the initialization step are adapted, wherein, when adapting the respective element weighting, the element type, attribute values of the attributes assigned to the element, a frequency of the element within the tree-shaped data structure, the number of tree-shaped data structures in a collection of tree-shaped data structures in which the element occurs, a frequency of the element within a collection of tree-shaped data structures, the size of the tree-shaped data structure relative to other tree-shaped data structures in a collection of tree-shaped data structures, the position of the element within the node, the language of the element, the number of elements within the node, the distance of the element to similar elements of other nodes, the frequency of the element in the path between the node and the root node, the age of the element, the time of the last change, the number of changes, the number of markings, the visibility of the element, a damping factor, or a combination thereof are taken into account.

8. Method according to one of claims 5 to 7, wherein in the case of the static node weighting and/or the dynamic node weighting and/or the element weighting, inheritance of the node weighting or of the element weighting is taken into account.

9. Method according to one of the preceding claims, wherein the generated user model is stored in a memory device in order to be provided to the recommendation service.

10. The method according to claim 9, wherein all elements are stored together with the respective element weightings as a user model, or wherein a separate user model is stored for each element type, wherein the user models of the different element types together form an overall user model.

11. The method according to any one of the preceding claims, wherein in the case of a plurality of tree-shaped data structures that can be assigned to the user, a number of user models is generated for each tree-shaped data structure, which together form an overall user model assigned to the user.

12. The method of claim 11, wherein each tree-shaped data structure is assigned a tree weighting.

13. The method according to claim 11, wherein the user model assigned to the user is adapted for a new tree-shaped data structure that can be assigned to the user.

14. The method according to any one of the preceding claims, wherein elements referenced by the tree-shaped data structure are inserted into the user model and treated as elements of the tree-shaped data structure.

15. Method according to one of the preceding claims, wherein information about the user model type is assigned to a generated user model.

16. The method of claim 1, wherein the method further comprises selecting objects based on predetermined selection criteria, wherein an object comprises a user model or an item model.

17. The method of claim 16, wherein the selection criteria include:

- objects of a predetermined type, and/or

- objects which have a predetermined similarity to the user model and/or item model, wherein similarity values are determined between the generated user model and/or item model and the objects before the selection.

18. The method according to claim 1, wherein a funding program is represented by an item model, and wherein the item model representing the funding program is selected when the user model has a predetermined similarity to the item model.

19. System for generating a user model, in particular for a recommendation service, from at least one tree-shaped data structure, the user model comprising information about a user, wherein the tree-shaped data structure can be assigned to the user, wherein the tree-shaped data structure comprises a root node and a number of child nodes, which are connected via edges to the root node or to a child node, and wherein at least one node is associated with at least one element, the system comprising

at least one memory device for storing at least one tree-shaped data structure, and

20. A data carrier product with a program code stored thereon which can be loaded into a computer and / or into a computer network and is adapted to carry out a method according to one of claims 1 to 18.