WO2022163203A1

WO2022163203A1 - Recommendation device

Info

Publication number: WO2022163203A1
Application number: PCT/JP2021/046811
Authority: WO
Inventors: 邦宏相場; 素平小野
Original assignee: 株式会社Ｎｔｔドコモ
Priority date: 2021-01-29
Filing date: 2021-12-17
Publication date: 2022-08-04
Also published as: JPWO2022163203A1

Abstract

This server 10 comprises: a diversity score calculation unit 117 that calculates, for each second content, a diversity score that becomes higher as the degree of similarity decreases between first content, selected from among a plurality of candidate contents as content highly likely to be preferred by a target user, and second content other than the first content; a recommendation degree calculation unit 118 that calculates, for each candidate content, the degree of recommendation to the target user on the basis of a user score, which is obtained from the user attribute information of the target user and indicates the degree of preference of the target user for each candidate content, and the diversity score for each second content; and a determination unit 119 that determines the content to be presented to the target user on the basis of the degree of recommendation calculated by the recommendation degree calculation unit 118.

Description

Recommendation device

One aspect of the present invention relates to a recommendation device.

In content providing services that provide content to users, a recommendation method is known that estimates content the user is likely to select and presents the estimated content to the user. Patent Document 1 describes an information processing apparatus that calculates the degree of similarity between designated content designated by a user and search target content based on the feature amount of each content, and presents search target content with high similarity to the designated content to the user as content that matches the user's preferences.

Japanese Patent Application Laid-Open No. 2006-48320

However, in the above-described recommendation method, the presented content becomes fixed to content having similar characteristics. As a result, the presented content holds no surprise for the user, and there is a risk that the user becomes bored.

Therefore, it is an object of one aspect of the present invention to provide a recommendation device capable of providing a more convenient recommendation service to users.

A recommendation device according to one aspect of the present invention is a recommendation device that determines content to be presented to a user from among a plurality of candidate contents in a content providing service that provides content to the user. The recommendation device includes: a diversity score calculation unit that calculates, for each second content, a diversity score that increases as the degree of similarity decreases between first content, selected from among the plurality of candidate contents as content highly likely to be preferred by the user, and second content other than the first content; a recommendation degree calculation unit that calculates the degree of recommendation to the user for each of the plurality of candidate contents based on a user score, which is obtained based on user attribute information related to attributes of the user and indicates the user's degree of preference for each of the plurality of candidate contents, and on the diversity score for each second content calculated by the diversity score calculation unit; and a determination unit that determines the content to be presented to the user based on the degree of recommendation calculated by the recommendation degree calculation unit.

In the recommendation device according to one aspect of the present invention, the degree of recommendation for each candidate content is calculated based on the user score, which is obtained based on the user attribute information and indicates the user's degree of preference for each candidate content, and the diversity score for each second content. The diversity score is a score that increases as the degree of similarity between the first content and the second content decreases. Content to be presented to the user is determined based on the degree of recommendation for each candidate content. With this configuration, the content to be recommended is determined by considering not only the user score, which reflects the user's preference for the content, but also the diversity score, which indicates the degree of diversity with respect to the first content. This makes it possible to present not only content that is presumed to be preferred by the user, but also content that is a new discovery for the user. In other words, recommendations can be diversified by ensuring that content that is surprising to the user has a chance of being selected as content to be recommended. Therefore, according to this recommendation device, it is possible to provide a more convenient recommendation service to the user.

According to one aspect of the present invention, it is possible to provide a recommendation device capable of providing a more convenient recommendation service to users.

FIG. 1 is a functional block diagram of a server according to this embodiment.
FIG. 2 is a schematic diagram showing an example of a method of generating the first model shown in FIG.
FIG. 3 is a schematic diagram showing a flow of acquiring item values of the genre "sports" using the first model shown in FIG.
FIG. 4 is a schematic diagram for explaining the application of Matrix Factorization.
FIG. 5 is a schematic diagram for explaining a series of flows for calculating a diversity score.
FIG. 6 is a schematic diagram showing a series of flows from calculation of the degree of recommendation to determination of content to be presented to the user.
FIG. 7 is a flowchart showing an example of processing by the server according to this embodiment.
FIG. 8 is a diagram illustrating an example of a hardware configuration of a server.

An embodiment of the present invention will be described in detail below with reference to the accompanying drawings. In the description of the drawings, the same or corresponding elements are denoted by the same reference numerals, and overlapping descriptions are omitted.

FIG. 1 is a functional block diagram of the server 10 according to this embodiment. The server 10 (recommendation device) is an information processing device that provides a conventionally known recommendation service. A recommendation service is a service that presents a list of recommended contents to a user in a content providing service that provides (sells) contents to users. Examples of content include movies, music, and electronic books. The content provided by the content providing service includes content of various genres. Examples of genres include genres related to sports (hereinafter simply referred to as "sports"), genres related to anime (hereinafter simply referred to as "animation"), and genres related to magazines (e-books) (hereinafter simply referred to as "magazines"). The server 10 may be a system different from the recommendation system, or may also serve as the recommendation system.

For example, when the user operates the user terminal 20 to access the content providing service, the display unit 21 of the user terminal 20 displays a list of content recommended to the user. The list of content includes, for example, a plurality of icons each representing content. For example, the user terminal 20 is a smartphone having a touch panel display as the display unit 21. However, the user terminal 20 is not limited to this form, and may be any terminal having a display unit 21, such as a tablet terminal, a desktop PC, or a laptop PC. The user can use (for example, view or purchase) the content by performing an operation (for example, a touch operation or a click operation) to select the icon of the content of interest.

The size of the display area of the display unit 21 of the user terminal 20 is limited. Therefore, in the present embodiment, the display unit 21 of the user terminal 20 displays, among a plurality of candidate contents, the contents with the highest recommendation degree to the user in each genre. In the recommendation service, the server 10 calculates a recommendation degree, which will be described later, and determines content to be presented to the user from among a plurality of candidate contents based on the recommendation degree.

In the recommendation service, it is possible to adopt a method of determining the content to be presented to the user using a reinforcement learning algorithm such as a bandit algorithm, which performs inference optimized to maximize the reward (that is, estimates the content that the user prefers). However, if such a bandit algorithm is simply used, content that is presumed to be preferred by the user is what gets presented to the user. More specifically, the bandit algorithm is an algorithm that balances "utilization", which actively selects the content considered best (most likely to be preferred by the user) based on the information known so far, and "search", which selects other content in order to explore which content the user prefers. Here, by making the weight of "search" larger than the weight of "utilization", the fixation of the presented content described above can presumably be prevented to some extent. That is, by increasing the proportion of "search", it is possible to increase the probability that content other than content already known to be highly likely to be preferred by the user will be presented to the user. However, the above-mentioned "search" does not actively favor the selection of content that is unexpected for the user. Therefore, simply increasing the proportion of "search" in a conventional bandit algorithm is not sufficient to actively provide users with a serendipitous experience. For this reason, the server 10 is configured to present (recommend) to the user not only content that is presumed to be preferred by the user, but also content that is a new discovery for the user. The configuration of the server 10 will be described in detail below.

As shown in FIG. 1, the server 10 includes a request receiving unit 111, a first acquisition unit 112, a first model generation unit 113, a user score calculation unit 114, a second model generation unit 115, a second acquisition unit 116, a diversity score calculation unit 117, a recommendation degree calculation unit 118, a determination unit 119, and a display control unit 120. The server 10 also includes a user attribute storage unit 10a, a content ID storage unit 10b, a user log storage unit 10c, and a content attribute storage unit 10d as elements that store various data.

The request receiving unit 111 receives request information indicating a request for access to the content providing service from the user (user terminal 20). The request receiving unit 111 outputs the request information to the first acquisition unit 112.

The first acquisition unit 112 (user preference acquisition unit) acquires the preference information of the target user to whom the content is presented based on the user attribute information. The target user is the user of the user terminal 20 that is the source of the request information received by the request receiving unit 111. The preference information is information about user preferences. The preference information includes multiple item values related to preferences. The multiple item values correspond to a user attribute vector z, which will be described later. Each item value is a numerical value indicating the degree of preference of the target user for one of a plurality of genres. The plurality of genres are genres related to content and are determined in advance. Examples of the genres include "sports", "animation", and "magazines". One item value is associated with one genre. Each item value takes, for example, a value between 0 and 1, and a larger value indicates a higher degree of preference of the target user. In this way, the preference information includes information in which the target user's preferences for each of a plurality of genres are quantified.

The first acquisition unit 112 inputs the user attribute information of the target user to a plurality of first models M1 (preference estimation models) generated by the first model generation unit 113, which will be described later, and acquires the resulting outputs as the target user's preference information.

The first model generation unit 113 executes machine learning using teacher data including user attribute information of users and information about the users' preferences, thereby generating a plurality of first models M1 that each take a user's user attribute information as input and output an estimate of that user's preference information.

In the present embodiment, the multiple first models M1 include multiple genre-specific models (first model M11, first model M12, first model M13, etc.). The first model M11 corresponds to the genre "sports". The first model M12 corresponds to the genre "animation". The first model M13 corresponds to the genre "magazines". Each first model M1 is a model configured to take the user attribute information of a user as input and output an estimated value of the user's preference information regarding the corresponding genre. The estimated value corresponds to the item value of the corresponding genre. The processing of the first model generation unit 113 will be described below, focusing on the first model M11.

As the machine learning performed by the first model generation unit 113, conventionally known methods such as gradient boosting, multiple regression analysis, and neural networks (including deep learning using multilayer neural networks) can be used. The first model M1 generated by the first model generation unit 113 is not limited to a specific aspect. An example of a method of generating the first model M1 by the first model generation unit 113 will be described below using the example shown in FIG. FIG. 2 is a schematic diagram showing an example of a method of generating the first model M11 (model corresponding to "sports") shown in FIG. The first model generation unit 113 generates the first model M11 by, for example, executing machine learning using teacher data that includes feature amounts related to the user attribute information of users and an index value (information indicating the user's preference) indicating whether each user likes a genre (here, "sports"). Here, the feature amount related to the user attribute information of the user corresponds to the input data (explanatory variable) of the first model M11, and the index value corresponds to the output data (objective variable) of the first model M11.

The user's user attribute information includes the user's basic information and usage information of one or more services used by the user. Examples of basic information include the user's gender, generation (or age), location, and occupation. Examples of usage information include the number of services to which the user subscribes, the number of services to which the user does not subscribe, the frequency of use of each service by the user, and the time the user spends using each service (for example, the average service usage time per day). Note that the feature amount related to the user attribute information may be, for example, a numerical value normalized based on the overall distribution of the many users who use each service.

The services subscribed by the user include the content providing service described above and other services provided by the service server 30 (see FIG. 1). The service server 30 is a server that provides a service (hereinafter referred to as "other service") different from the content providing service described above. Other services are, for example, services operated by carriers via mobile networks. The service server 30 stores various information including basic information of each user and usage information of other services by each user. The usage information of other services includes, for example, the usage frequency of other services by the user, the usage time of other services by the user, and the like. Although the number of service servers 30 shown in FIG. 1 is one, the number of other services may be plural. Moreover, the number of service servers 30 may be plural, and for example, a service server 30 may be provided for each other service.

The index value is "1" if the user likes sports, and "0" if the user does not like sports. The index value is obtained, for example, based on data indicating the results of a questionnaire answered by the user in advance, data indicating the "favorite genre" selected by the user when activating an application installed on the user terminal 20, and the like.

Machine learning using such teacher data yields the first model M11, which is configured to take the feature amounts related to the user attribute information of a user as input and output the user's preference information for the corresponding genre. The output value (preference information described above) of the first model M11 indicates the possibility (probability) that the user corresponding to the input user attribute information likes "sports". Note that the machine learning executed by the first model generation unit 113 is not limited to the above method. Also, the type of user attribute information input to the first model M1 is not limited to the above example.
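As an illustration of the training step described above, the following is a minimal sketch that fits one genre-specific model from feature amounts and 0/1 index values. It assumes logistic regression trained by gradient descent, which is swapped in here for brevity and is not one of the methods the text lists (gradient boosting, multiple regression analysis, neural networks). The function names, the two features, and the sample data are all hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_genre_model(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    The returned list holds one weight per feature followed by a bias.
    """
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            # zip stops at the shorter list, so the bias w[-1] is added separately.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = p - y
            for j in range(dim):
                grad[j] += err * x[j]
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability that the user likes the genre (the item value)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Hypothetical normalized features: [age, daily usage time of a sports-related service]
features = [[0.2, 0.9], [0.3, 0.8], [0.7, 0.1], [0.8, 0.2]]
labels = [1, 1, 0, 0]  # index value: 1 = likes "sports", 0 = does not

model_m11 = train_genre_model(features, labels)
item_value = predict(model_m11, [0.25, 0.85])
```

The output stays between 0 and 1, which matches the range the text gives for item values.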

The first acquisition unit 112 acquires, from the user attribute storage unit 10a, the user attribute information corresponding to the user ID of the target user as input to each first model M1 generated by the first model generation unit 113. The user attribute storage unit 10a stores user attribute information in association with a user ID for uniquely identifying each user. For example, the first acquisition unit 112 identifies the user ID of the target user based on information for identifying the user included in the request information (for example, a terminal ID associated with the user ID).

As described above, the user attribute information includes basic user information (the user's gender, age, location, occupation, etc.) and usage information of the content providing service and other services (user identification information, the frequency of service usage by each user, etc.). Each time the user uses each service, the user ID and usage information of the service used by the user are associated and stored in the user attribute storage unit 10a.

The first acquisition unit 112 acquires user attribute information of the target user (basic information of the target user, usage information of the content providing service and other services) from the user attribute storage unit 10a. Then, the first acquisition unit 112 inputs the acquired user attribute information (more specifically, the feature amounts related to each of the above pieces of information) to each first model M1, and acquires the output result from each first model M1 as the preference information of the target user.

Here, an example of a method of acquiring preference information by the first acquisition unit 112 will be described using the example shown in FIG. FIG. 3 is a schematic diagram showing a flow of acquiring item values of the genre "sports" using the first model M11 shown in FIG. The first acquisition unit 112 inputs the basic information of the target user U, the usage information of the content providing service, and the usage information of other services acquired from the user attribute storage unit 10a to the first model M11. Then, the first acquisition unit 112 acquires the item value (here, "0.1" as an example) corresponding to "sports" as the output result from the first model M11. In the same way as for the first model M11, the first acquisition unit 112 acquires the output results from the other first models M1 (first model M12, first model M13, etc.) as the item values corresponding to "animation", "magazines", and so on. In the preference information L of the target user U thus obtained, the item value "0.1" is associated with "sports", the item value "0.2" is associated with "animation", and the item value "0.7" is associated with "magazines". In this example, the preference information L indicates that the target user U prefers magazines to sports and animation. Thus, the preference information L of the target user U is acquired.
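The flow above, in which one model per genre is queried and the outputs are collected into the preference information L, can be sketched as follows. This is only an illustration: the function names are hypothetical, and the genre models are replaced by stand-in functions that return the example item values from the text instead of trained models.

```python
GENRES = ["sports", "animation", "magazines"]


def acquire_preference_info(user_features, genre_models):
    """Query each genre-specific model and collect one item value per genre."""
    return {genre: genre_models[genre](user_features) for genre in GENRES}


def to_attribute_vector(preference_info):
    """Arrange the item values in a fixed genre order to form the vector z."""
    return [preference_info[genre] for genre in GENRES]


# Stand-ins for the first models M11, M12, M13 (assumption: a real system
# would call trained models here; these return the example values in the text).
genre_models = {
    "sports": lambda features: 0.1,
    "animation": lambda features: 0.2,
    "magazines": lambda features: 0.7,
}

user_features = [0.3, 0.5]  # hypothetical normalized feature amounts
preference_l = acquire_preference_info(user_features, genre_models)
z = to_attribute_vector(preference_l)
favorite_genre = max(preference_l, key=preference_l.get)
```

Keeping a fixed genre order matters because the resulting vector is later used as the user attribute vector z, whose dimensions must mean the same thing for every user.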

The user score calculation unit 114 calculates a user score for each candidate content. The user score is a score that indicates the degree of user preference for each candidate content. In other words, a candidate content with a higher user score is more preferred by the user. The user score calculation unit 114 calculates the user score of the target user for each candidate content using the second model M2.

For example, the user score calculation unit 114 acquires the content IDs of all candidate contents by referring to the content ID storage unit 10b, which stores content IDs for uniquely identifying each content, and calculates the user score for the content corresponding to each content ID.

The second model M2 is a machine learning model generated (including the case of updating; the same applies hereinafter) by the second model generation unit 115, which will be described later. More specifically, the second model M2 is a model that takes the user attribute vector z (preference information) acquired by the first acquisition unit 112 as input and outputs the user score for each candidate content. The second model M2 is generated by reinforcement learning in which the selection of content by the user serves as the reward. In this embodiment, the preference information L (see FIG. 3) is used as the user attribute vector z. That is, the user attribute vector z is a vector containing each item value of the preference information L as an element. As an example, the second model M2 is obtained using a bandit algorithm (for example, a contextual bandit algorithm such as LinUCB), which is a type of reinforcement learning algorithm. The second model M2 can be expressed by the following (Equation 1) as an example. The second model M2 consists of parameters (θ_a and A_a) for each candidate content a. The parameters θ_a and A_a are parameters that are appropriately updated by reinforcement learning. Details of these parameters (initial values and update methods) will be described later.

In (Equation 1), "a" is an identifier indicating a content ID. "p_{t,a}" indicates the user score of candidate content a (the content corresponding to content ID "a"). "z" indicates the user attribute vector described above. "θ_a" is a vector having the same dimension as the user attribute vector z. θ_a is learned (updated) so that the value of θ_a^T z becomes larger as the probability increases that the content, when presented to the user (a user having the user attribute vector z), will be selected by that user. "θ_a^T z" is a term (utilization term) that contributes to "utilization" in the bandit algorithm.

The second term in (Equation 1) is a term (search term) that contributes to "search" in the bandit algorithm. "α" is the weight of the search term and can be set arbitrarily. By adjusting the magnitude of α, the ratio between utilization and search can be adjusted.
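The image for (Equation 1) is not reproduced in this text. Assuming the second model M2 follows the standard LinUCB formulation that the description names as an example, the equation would take the following form, with the first term as the utilization term and the second as the search term. This is a reconstruction from the surrounding definitions, not a copy of the original figure.

```latex
p_{t,a} = \theta_a^{\top} z + \alpha \sqrt{z^{\top} A_a^{-1} z}
```

Under this form, a larger α raises the score of content whose parameters are still uncertain for users with attribute vector z, which is how α shifts the balance toward search.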

The user score calculation unit 114 calculates a user score for each candidate content by inputting the user attribute vector z of the target user into the second model M2 corresponding to each candidate content (that is, into (Equation 1) corresponding to each candidate content). The user score calculation unit 114 outputs, to the recommendation degree calculation unit 118, information in which the calculated user score for each candidate content is associated with the user ID of the target user.

The second acquisition unit 116 acquires a user log indicating the user's action results in the content providing service. The user log includes the user ID of the user who used the content providing service and information indicating the content displayed on the user terminal 20 (content ID in this embodiment). When the user selects content displayed on the user terminal 20, the user log further includes information indicating the selected content (content ID in this embodiment). The user logs for a predetermined period regarding multiple users acquired by the second acquisition unit 116 are stored in the user log storage unit 10c.

The second model generation unit 115 generates the above-described second model M2 based on the user log indicating the user's action results in response to content recommendations to the user. In this embodiment, the second model generation unit 115 can acquire the user log necessary for generating the second model M2 by referring to the user log storage unit 10c. The second acquisition unit 116 may acquire the user log each time the content providing service is used by the user, and store it in the user log storage unit 10c. Then, every time a new user log is stored in the user log storage unit 10c (or every predetermined period of time), the second model generation unit 115 may update the second model M2 using the newly obtained user log. With such processing, the second model M2 can be kept appropriately up to date.

As described above, in the present embodiment, the second model M2 includes the parameter θ_a for each candidate content a (see (Equation 1)). When the candidate content a is selected by the target user, the second model generation unit 115 gives a reward value r_a to the candidate content a, and updates the parameter θ_a corresponding to the candidate content a based on the reward value r_a and the user attribute vector z of the user. With such reinforcement learning, even in a situation where there is no correct-answer data indicating which candidate content a should be presented to get the target user to select content, a model can be generated that derives an index (the user score "p_{t,a}") for determining content suitable for recommendation to the target user (content that the user is likely to prefer).

A method of updating parameters by the above reinforcement learning will be described in detail. Here, as an example, an update method using the LinUCB algorithm described above will be described. First, the second model generation unit 115 sets the parameters θ_a and A_a for each candidate content a shown in (Equation 1) based on the following (Equation 2) to (Equation 4).

The parameter b_a is a parameter included in the parameter θ_a. (Equation 2) shows the initial value of the parameter b_a. "0_{d×1}" in (Equation 2) is a zero vector of the same dimension (d) as the user attribute vector z. (Equation 3) shows the initial value of the parameter A_a. "I_d" in (Equation 3) is the d×d identity (unit) matrix. The parameter θ_a is calculated based on (Equation 4). As shown in (Equation 2) to (Equation 4), the initial value of the parameter θ_a is the zero vector.

The second model generation unit 115 updates (learns) the parameters A_a and b_a based on the following (Equation 5) and (Equation 6).

By (Equation 5), the information that "the target user having the user attribute vector z has viewed the candidate content a (including the case where the content is not selected)" is added to the parameter A_a. This information is also reflected in the parameter θ_a through (Equation 4).

By (Equation 6), the parameter b_a is updated based on the reward value r_a and the user attribute vector z of the target user. Note that "update" here includes the case where the value does not change before and after the update process. By updating the parameter b_a, the parameter θ_a is also updated based on the reward value r_a and the user attribute vector z of the target user (see (Equation 4)).
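The images for (Equation 2) to (Equation 6) are not reproduced in this text. Assuming they follow the standard LinUCB initialization and update rules, which agree with the descriptions above (a zero-vector initial value for b_a, an identity-matrix initial value for A_a, and θ_a derived from both), they would read as follows. This is a reconstruction, not a copy of the original figures.

```latex
\begin{aligned}
b_a &= 0_{d \times 1} && \text{(Equation 2)} \\
A_a &= I_d && \text{(Equation 3)} \\
\theta_a &= A_a^{-1} b_a && \text{(Equation 4)} \\
A_a &\leftarrow A_a + z z^{\top} && \text{(Equation 5)} \\
b_a &\leftarrow b_a + r_a z && \text{(Equation 6)}
\end{aligned}
```

In this form, (Equation 5) depends only on z, so it records that the content was shown whether or not it was selected, while (Equation 6) leaves b_a unchanged when the reward value r_a is zero.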

Here, the second model generation unit 115 gives a larger reward value r_a when the candidate content a is selected by the target user having the user attribute vector z than when the candidate content a is not selected by the target user. As an example, when the candidate content a is selected, the second model generation unit 115 sets the reward value r_a of the candidate content a to a value greater than zero. On the other hand, if the candidate content a is not selected, the second model generation unit 115 sets the reward value r_a of the candidate content a to zero. In other words, if the candidate content a is not selected, the second model generation unit 115 gives no reward to the candidate content a.

As a result of the processing of the second model generation unit 115 described above, the utilization term "θ_a^T z" is learned (updated) so as to become larger as the probability increases that the target user selects the candidate content a when the candidate content a is displayed.
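The scoring and update procedure described above can be sketched in code. This is a minimal pure-Python illustration that assumes the standard LinUCB rules (score θ_a^T z + α·sqrt(z^T A_a^{-1} z), updates A_a ← A_a + z z^T and b_a ← b_a + r_a z); the equation images are not reproduced in this text, so these forms are an assumption. The class name `LinUcbArm`, the helper `solve`, and the sample values are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


class LinUcbArm:
    """Parameters A_a and b_a for one candidate content a."""

    def __init__(self, dim):
        self.A = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self):
        # theta_a = A_a^{-1} b_a, computed without forming the inverse
        return solve(self.A, self.b)

    def score(self, z, alpha):
        utilization = sum(t * zi for t, zi in zip(self.theta(), z))
        a_inv_z = solve(self.A, z)
        search = alpha * math.sqrt(sum(zi * v for zi, v in zip(z, a_inv_z)))
        return utilization + search

    def update(self, z, reward):
        # A_a grows on every presentation; b_a changes only when reward > 0.
        for i in range(len(z)):
            for j in range(len(z)):
                self.A[i][j] += z[i] * z[j]
            self.b[i] += reward * z[i]


z = [0.1, 0.2, 0.7]  # user attribute vector (sports, animation, magazines)
selected, ignored = LinUcbArm(3), LinUcbArm(3)
initial_score = selected.score(z, alpha=0.5)
for _ in range(20):
    selected.update(z, reward=1.0)  # content was shown and selected
    ignored.update(z, reward=0.0)   # content was shown but not selected
```

After the loop, the content that was repeatedly selected has a larger utilization term for this z, while both contents have a smaller search term than at initialization because both were shown 20 times.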

The diversity score calculation unit 117 selects, from among the plurality of candidate contents, content that is highly likely to be preferred by the target user as the first content, and calculates a diversity score for each second content, that is, each of the candidate contents other than the first content. The diversity score is a score that increases as the degree of similarity between the first content and the second content decreases. In this embodiment, the first content is the content with the highest user score. Since the user score differs for each target user, the content with the maximum user score may differ for each target user. That is, the first content and the second content may differ for each target user.
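The selection of the first content and the relationship between similarity and diversity score can be sketched as follows. The text here only states that the diversity score increases as similarity decreases, so the formula `1 - similarity` is an assumption chosen for illustration, as are the function names, the content IDs, and the similarity values.

```python
def split_first_and_second(user_scores):
    """Pick the content with the highest user score as the first content."""
    first = max(user_scores, key=user_scores.get)
    second = [content for content in user_scores if content != first]
    return first, second


def diversity_scores(first, second, similarity):
    """Assumed form: diversity = 1 - similarity to the first content."""
    return {c: 1.0 - similarity[frozenset((first, c))] for c in second}


# Hypothetical user scores for one target user
user_scores = {"c1": 0.9, "c2": 0.6, "c3": 0.3}

# Hypothetical pairwise similarities in [0, 1], keyed by unordered pair
similarity = {
    frozenset(("c1", "c2")): 0.8,
    frozenset(("c1", "c3")): 0.1,
    frozenset(("c2", "c3")): 0.4,
}

first, second = split_first_and_second(user_scores)
scores = diversity_scores(first, second, similarity)
```

Because the first content depends on the user scores, running the same code for a different user can yield a different first content and therefore different diversity scores.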

(first example)
In this example, the diversity score calculation unit 117 acquires usage history information and calculates a diversity score for each second content with respect to the first content based on the usage history information. The usage history information is information about the target user's history of using each candidate content. In this example, the usage history information is the user log stored in the user log storage unit 10c. A specific example of a diversity score calculation method will be described below.

The diversity score calculation unit 117 first refers to the user log storage unit 10c and acquires, as usage history information, the information indicating the content selected by the target user (content ID in this embodiment) that is associated with the user ID of the target user.

Then, the diversity score calculation unit 117 calculates a feature amount of the content (hereinafter referred to as "content feature amount"). A content feature amount is a relative feature between two candidate contents. As an example, the diversity score calculation unit 117 calculates content feature amounts for all possible combinations of two candidate contents among all candidate contents. Then, the diversity score calculation unit 117 calculates the similarity of each combination based on the calculated content feature amount of each combination.

As an example, the diversity score calculation unit 117 uses Matrix Factorization to calculate content feature amounts. Specifically, the diversity score calculation unit 117 uses Matrix Factorization to factorize the selection history matrix into a content feature matrix and a user feature matrix, which are factor matrices of the selection history matrix, and calculates the content feature amount of each combination based on the obtained content feature matrix.

The diversity score calculation unit 117 uses the following (formula 7) as an example of application of matrix factorization.

In (Formula 7), "r_ui" indicates each element of the selection history matrix. The selection history matrix is a matrix showing the history of whether or not each user has selected each of the candidate contents. In the selection history matrix, the users are arranged as row vectors (horizontal vectors) and the contents are arranged as column vectors (vertical vectors). That is, each row of the selection history matrix represents, for one user, whether or not that user has selected each content.

"x" indicates a row vector of the content feature matrix. The content feature matrix is a matrix in which feature dimensions of a predetermined order f are arranged as row vectors (horizontal vectors) and a plurality of contents are arranged as column vectors (vertical vectors). Each element of the content feature matrix corresponds to a content feature amount. "y" indicates a row vector of the user feature matrix. The user feature matrix is a matrix in which a plurality of users are arranged as row vectors (horizontal vectors) and the feature dimensions of order f are arranged as column vectors (vertical vectors).

The diversity score calculation unit 117 initializes the row vectors "x" and "y" with random values, and calculates the content feature matrix and the user feature matrix that minimize (Formula 7). The first term in (Formula 7) is a term that contributes to factorization of the selection history matrix. That is, according to the first term of (Formula 7), the vectors "x" and "y" are determined so that the matrix obtained by multiplying the content feature matrix and the user feature matrix is approximately equal to the selection history matrix. The second term in (Formula 7) is a regularization term for the content feature matrix and the user feature matrix. The second term of (Formula 7) prevents overfitting (over-learning) in (Formula 7). In this way, a content feature amount, which is each element of the content feature matrix, is obtained. Matrix Factorization using (Formula 7) is described in detail in "Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu, Yehuda Koren, Chris Volinsky, Data Mining 2008."
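(Formula 7) itself is not reproduced in this text. As a reading aid only, a regularized factorization objective consistent with the two terms described above could take the following form. The regularization weight λ, the index u for users, the index i for contents, and the exact shape of both terms are assumptions and are not taken from (Formula 7).

```latex
\min_{x,\,y}\; \sum_{u,i} \left( r_{ui} - y_u^{\top} x_i \right)^2
  \;+\; \lambda \left( \sum_{i} \lVert x_i \rVert^2 + \sum_{u} \lVert y_u \rVert^2 \right)
```

Here x_i denotes the feature vector of content i and y_u denotes the feature vector of user u. The first term pulls the product of the two factor matrices toward the selection history matrix, and the second term penalizes large feature values.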

FIG. 4 is a schematic diagram for explaining the application of Matrix Factorization. In the example shown in FIG. 4, a selection history matrix r, a content feature matrix X, and a user feature matrix Y are shown. The content feature matrix X and the user feature matrix Y are factor matrices of the selection history matrix r factorized by applying Matrix Factorization using (Formula 7). In the selection history matrix r, users U1, U2, and U3 are arranged as row vectors, and candidate contents C1, C2, and C3 are arranged as column vectors. In the selection history matrix r, an element "1" indicates that the user selected the content, and an element "0" indicates that the user did not select the content. In the content feature matrix X, feature dimensions V1, V2, and V3 are arranged as row vectors, and candidate contents C1, C2, and C3 are arranged as column vectors. In the user feature matrix Y, users U1, U2, and U3 are arranged as row vectors, and feature dimensions V1, V2, and V3 are arranged as column vectors. By applying Matrix Factorization using the above (Formula 7), the content feature amount, which is each element of the content feature matrix X, is obtained.
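The factorization illustrated in FIG. 4 can be sketched as follows. This is a minimal illustration, not the method of the embodiment: it assumes a plain squared-error objective with L2 regularization minimized by gradient descent, rather than the confidence-weighted formulation of the cited paper. The function name factorize, the hyperparameters lam, lr, and n_iters, and the 3x3 example values are all hypothetical.

```python
import numpy as np

def factorize(r, n_features=3, lam=0.05, lr=0.05, n_iters=2000, seed=0):
    """Factorize r (users x contents) into Y (users x features) and
    X (features x contents) so that Y @ X approximates r.

    Assumed objective: sum((r - Y @ X)**2) + lam * (||X||^2 + ||Y||^2).
    """
    rng = np.random.default_rng(seed)
    n_users, n_contents = r.shape
    # Start from small random values, as described for vectors "x" and "y".
    Y = rng.normal(scale=0.1, size=(n_users, n_features))
    X = rng.normal(scale=0.1, size=(n_features, n_contents))
    for _ in range(n_iters):
        err = r - Y @ X                           # reconstruction error
        grad_Y = -2.0 * err @ X.T + 2.0 * lam * Y  # gradient w.r.t. Y
        grad_X = -2.0 * Y.T @ err + 2.0 * lam * X  # gradient w.r.t. X
        Y -= lr * grad_Y
        X -= lr * grad_X
    return X, Y

# Hypothetical selection history: rows U1..U3, columns C1..C3 (1 = selected).
r = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
X, Y = factorize(r)
approx = Y @ X  # should be close to r, slightly shrunk by the regularization
```

Each column of X is then the feature vector of one candidate content and can be passed to the similarity calculation.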

Subsequently, the diversity score calculation unit 117 calculates the similarity of each combination based on the calculated content feature amount of each combination, and calculates the diversity score based on the calculated similarity. FIG. 5 is a schematic diagram for explaining a series of flows for calculating a diversity score. A method for calculating the similarity of each combination and the diversity score will be described below using the example shown in FIG. 5.

First, in procedure P1 shown in FIG. 5, the diversity score calculation unit 117 calculates the cosine similarity between the content feature matrix X and the transposed matrix X^T of the content feature matrix X (cosine similarity calculation process). As a result, a similarity matrix H is obtained. In the similarity matrix H, the candidate contents C1, C2, and C3 are arranged as both row vectors and column vectors. The element in the i-th row and j-th column of the similarity matrix H corresponds to the similarity between the candidate content corresponding to the i-th row and the candidate content corresponding to the j-th column. The similarity takes a value between -1 and 1, with a higher value indicating that the two candidate contents are more similar to each other. For example, in FIG. 5, the degree of similarity between candidate content C1 and candidate content C2 is "0.37". Note that the degree of similarity between the same candidate contents, such as the degree of similarity between candidate content C1 and candidate content C1, is the maximum possible value of "1.0".
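Procedure P1 can be sketched as below, assuming the content feature matrix X is held as a NumPy array with feature dimensions as rows and contents as columns. The function name cosine_similarity_matrix and the example values are hypothetical and do not correspond to FIG. 5.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Return the contents x contents cosine similarity matrix H.

    X has shape (features, contents); column j is the feature vector
    of the j-th candidate content.
    """
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    # Guard against all-zero feature vectors to avoid division by zero.
    unit = X / np.clip(norms, 1e-12, None)
    return unit.T @ unit

# Hypothetical 2-dimensional features for three candidate contents.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
H = cosine_similarity_matrix(X)
```

H is symmetric, its diagonal is 1.0 (each content compared with itself), and every element lies between -1 and 1, matching the properties described for the similarity matrix.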

Subsequently, as shown in FIG. 5, in procedure P2, the diversity score calculation unit 117 adds "1" to each element (similarity) of the similarity matrix H and then performs a conversion that takes the reciprocal of each element to which "1" has been added.

Then, in procedure P3, the diversity score calculation unit 117 calculates a score f(x) corresponding to each combination using (Equation 8) below.

When the conversion of procedure P2 above is performed, each element after conversion takes a value between 0.5 and infinity, and the smaller the element (similarity) before conversion, the closer the element after conversion is to infinity. The diversity score calculation unit 117 inputs each element after the conversion in procedure P2 as x into (Formula 8), thereby converting each element into a value between 0 and 1. In other words, a normalized score can be obtained by performing the procedures P2 and P3 on the similarity. Note that the score for a combination of the same candidate content, such as the combination of candidate content C1 and candidate content C1, is the minimum value "0".
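Procedures P2 and P3 can be sketched as follows. (Formula 8) is not reproduced in this text, so the function normalize_score below is a hypothetical stand-in, f(x) = 1 - 1/(2x), chosen only because it maps 0.5 to 0 and approaches 1 as x goes to infinity, as the description requires. It is not claimed to be (Formula 8). The function names and example similarities are likewise assumptions.

```python
import numpy as np

def reciprocal_transform(H):
    """Procedure P2: add 1 to each similarity, then take the reciprocal.

    A similarity of exactly -1 maps to infinity.
    """
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 + H)

def normalize_score(x):
    """Hypothetical stand-in for (Formula 8): maps [0.5, inf] to [0, 1]."""
    return 1.0 - 1.0 / (2.0 * x)

# Hypothetical similarity matrix for candidate contents C1, C2, C3.
H = np.array([[1.0, 0.37, -1.0],
              [0.37, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])
T = reciprocal_transform(H)  # values in [0.5, inf]
S = normalize_score(T)       # values in [0, 1]; diagonal is 0
```

With this stand-in, a pair with similarity 1 scores 0 and a pair with similarity -1 scores 1, so less similar pairs receive higher scores.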

As described above, the diversity score calculation unit 117 calculates the score corresponding to each combination. Then, the diversity score calculation unit 117 refers to the user score for each candidate content received from the user score calculation unit 114, sets the candidate content with the highest user score as the first content, and sets the other candidate contents as the second contents. Then, the diversity score calculation unit 117 refers to the score corresponding to each combination, and obtains the score of each second content with respect to the first content as the diversity score of that second content.
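The selection of the first content and the lookup of diversity scores can be sketched as follows, assuming the pairwise scores are held in a square NumPy array indexed in the same order as the user scores. The function name diversity_scores and all numeric values are hypothetical.

```python
import numpy as np

def diversity_scores(user_scores, pair_scores):
    """Pick the first content and read off diversity scores for the rest.

    user_scores: 1-D array, one user score per candidate content.
    pair_scores: square array of pairwise scores (higher = less similar).
    Returns (index of first content, {second content index: diversity score}).
    """
    first = int(np.argmax(user_scores))  # content with the highest user score
    scores = {
        i: float(pair_scores[first, i])
        for i in range(len(user_scores))
        if i != first                    # every other content is a second content
    }
    return first, scores

# Hypothetical values for three candidate contents.
user_scores = np.array([0.2, 0.5, 0.9])
pair_scores = np.array([[0.0, 0.3, 0.8],
                        [0.3, 0.0, 0.4],
                        [0.8, 0.4, 0.0]])
first, scores = diversity_scores(user_scores, pair_scores)
```

Because the user scores differ per target user, the row of pair_scores that is read, and therefore the resulting diversity scores, also differ per target user.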

(Second example)
In this example, the diversity score calculation unit 117 acquires content attribute information and vectorizes the attributes of each candidate content. Then, the diversity score calculation unit 117 calculates a diversity score based on the distance between the two candidate contents in each possible combination. The content attribute information is information indicating attributes of content, and includes a plurality of predetermined content items. Examples of content items include content genres, content sub-genres, and content prices.

As an example, the diversity score calculation unit 117 acquires the content IDs of all candidate contents by referring to the content ID storage unit 10b, and acquires the content attribute information corresponding to each content ID from the content attribute storage unit 10d. The content attribute storage unit 10d stores, for each content ID, content attribute information including the items described above. The diversity score calculation unit 117 vectorizes the attributes of each candidate content based on the content attribute information. For example, the diversity score calculation unit 117 labels (quantifies) each item according to a predetermined rule (for example, the content genre "baseball" is set to "0.1", the content sub-genre "anime" is set to "0.3", and the content price "0-500 yen" is set to "0.7"), and thereby obtains, for content with content attributes such as (baseball, anime, 0-500 yen, ...), a numerical vector such as (0.1, 0.3, 0.7, ...).
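The labeling rule can be sketched as a lookup table. The three numeric labels below are the ones given in the text; the dictionary layout, the item keys genre, sub_genre, and price, and the function name vectorize are assumptions made for illustration.

```python
# Predetermined labeling rule: item -> attribute value -> numeric label.
# Only the three labels mentioned in the description are included here.
RULES = {
    "genre": {"baseball": 0.1},
    "sub_genre": {"anime": 0.3},
    "price": {"0-500 yen": 0.7},
}
ITEMS = ("genre", "sub_genre", "price")  # fixed order of vector components

def vectorize(attributes):
    """Convert a content attribute record into a numeric vector."""
    return [RULES[item][attributes[item]] for item in ITEMS]

content = {"genre": "baseball", "sub_genre": "anime", "price": "0-500 yen"}
vector = vectorize(content)
```

A fixed item order matters here, because the distance calculation compares vectors component by component.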

Subsequently, the diversity score calculation unit 117 calculates the distance between the vectors of the two candidate contents in each combination. Here, a method for calculating the distance between the vectors of two candidate contents will be described using the calculation of the distance between candidate content C1 and candidate content C2 as an example. As an example, the vector for the items of candidate content C1 (baseball, anime, 0-500 yen, ...) is (0.1, 0.3, 0.4, ...), and the vector for the items of candidate content C2 (..., 1000-2000 yen, ...) is (0.1, 0.5, 0.1, ...). The diversity score calculation unit 117 calculates the distance d12 between candidate content C1 and candidate content C2 using the following (Formula 9).

In (Formula 9), "C_1" indicates the vector of candidate content C1, and "C_2" indicates the vector of candidate content C2. The diversity score calculation unit 117 obtains the distance d12 by inputting the vector of candidate content C1 and the vector of candidate content C2 described above into (Formula 9).

The diversity score calculation unit 117 calculates the distance between the vectors of the two candidate contents in each combination by using (Formula 9). Then, the diversity score calculation unit 117 may normalize each calculated distance using a conventionally known formula or the like so as to take a value between 0 and 1, for example. Then, the diversity score calculator 117 selects the first content in the same manner as in the first example, and determines the distance of each second content from the first content as the diversity score of each second content.
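The distance calculation and normalization can be sketched as follows. (Formula 9) is not reproduced in this text, so the Euclidean distance is used here purely as an assumption, and dividing by the largest distance is just one possible normalization. The function names and the dictionary keys are hypothetical; the two example vectors are the first three components given above for C1 and C2.

```python
import math

def euclidean(c1, c2):
    """Assumed distance measure between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def normalize(distances):
    """Scale distances into [0, 1] by dividing by the largest one."""
    top = max(distances.values())
    if top == 0:
        return {key: 0.0 for key in distances}
    return {key: d / top for key, d in distances.items()}

c1 = (0.1, 0.3, 0.4)  # first three components of the C1 vector
c2 = (0.1, 0.5, 0.1)  # first three components of the C2 vector
d12 = euclidean(c1, c2)

# Hypothetical distances from a first content to two second contents.
normalized = normalize({"C2": d12, "C3": 2 * d12})
```

Under the Euclidean assumption, d12 for these three components is the square root of 0.13, approximately 0.36.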

The diversity score calculation unit 117 outputs the calculated diversity score to the recommendation degree calculation unit 118.

The recommendation degree calculation unit 118 calculates the recommendation degree of each candidate content based on the user score calculated by the user score calculation unit 114 and the diversity score for each second content calculated by the diversity score calculation unit 117. The recommendation degree calculation unit 118 calculates the recommendation degree so that the higher the user score of the content is, the higher the recommendation degree is, and the higher the diversity score of the content is, the higher the recommendation degree is. As an example, the recommendation degree calculation unit 118 calculates the recommendation degree of each candidate content using the following (Formula 10).

In (Formula 10), "P_t,a" indicates the recommendation degree of each candidate content, "p_t,a" indicates the user score of each candidate content, and "d'_{a_top,a}" indicates the diversity score of each second content relative to the first content. "θ" is a constant that contributes to the weight of the first term and the weight of the second term, and is arbitrarily determined. By adjusting the magnitude of θ, it is possible to adjust the ratio between the degree of contribution of the user score and the degree of contribution of the diversity score to the recommendation degree P_t,a.

The recommendation degree calculation unit 118 calculates the recommendation degree of each candidate content. Specifically, for the first content (the candidate content with the highest user score among the plurality of candidate contents), the recommendation degree calculation unit 118 calculates the recommendation degree of the first content by inputting the user score of the first content and the diversity score of the first content (= 0) into (Formula 10). On the other hand, for each second content (candidate content other than the first content), the recommendation degree calculation unit 118 calculates the recommendation degree of the second content by inputting the user score of the second content and the diversity score of the second content with respect to the first content into (Formula 10). The recommendation degree calculation unit 118 outputs information indicating the recommendation degree of each candidate content to the determination unit 119.
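The calculation above can be sketched as follows. (Formula 10) is not reproduced in this text, so a weighted sum of the form (1 - θ) × user score + θ × diversity score is assumed here only for illustration; it satisfies the stated property that the recommendation degree rises with both scores, but it is not claimed to be (Formula 10). The function name, the content IDs, and all numeric values are hypothetical.

```python
def recommendation_degrees(user_scores, diversity, theta=0.3):
    """Combine user scores and diversity scores per candidate content.

    user_scores: {content_id: user score}
    diversity:   {content_id: diversity score} for the second contents only
    theta:       assumed weight in [0, 1] on the diversity score
    """
    # The first content is the one with the highest user score.
    first = max(user_scores, key=user_scores.get)
    degrees = {}
    for cid, p in user_scores.items():
        # The diversity score of the first content itself is 0.
        d = 0.0 if cid == first else diversity[cid]
        degrees[cid] = (1.0 - theta) * p + theta * d
    return degrees

user_scores = {1: 0.4, 2: 0.6, 3: 0.9}  # content ID 3 is the first content
diversity = {1: 0.8, 2: 0.2}            # relative to content ID 3
degrees = recommendation_degrees(user_scores, diversity)
```

In this example content ID 1 has a lower user score than content ID 2 but ends up with a higher recommendation degree because of its higher diversity score, which is the effect the weighting by θ is meant to allow.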

The determination unit 119 determines content to be presented to the target user based on the recommendation degree calculated by the recommendation degree calculation unit 118. The determination unit 119 determines content with a higher recommendation degree as the content to be presented to the user. As an example, the determination unit 119 ranks the plurality of candidate contents in descending order of recommendation degree, and determines the top N (N is a predetermined number) contents as contents to be presented to the target user. The determination unit 119 outputs information indicating the contents to be presented to the target user and their order to the display control unit 120.
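The ranking step can be sketched as below, assuming the recommendation degrees are held in a dictionary keyed by content ID. The function name top_n and the values are hypothetical.

```python
def top_n(recommendation, n):
    """Return the content IDs of the n highest recommendation degrees,
    in descending order of recommendation degree."""
    ranked = sorted(recommendation, key=recommendation.get, reverse=True)
    return ranked[:n]

# Hypothetical recommendation degrees keyed by content ID.
recommendation = {1: 0.52, 2: 0.48, 3: 0.63}
presented = top_n(recommendation, 2)
```

The returned list carries both pieces of information passed to the display control unit 120: which contents to present and in what order.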

FIG. 6 is a schematic diagram showing a series of flows from calculation of the degree of recommendation to determination of content to be presented to the user. The user score calculation unit 114 calculates the user score of each candidate content. In the example shown in FIG. 6, the user score "p_t,3" is the highest. Therefore, in this case, the content indicated by the content ID "3" is the first content, and each other content is a second content. The diversity score calculation unit 117 calculates the diversity score of each second content with respect to the first content (the content indicated by the content ID "3"). Subsequently, the recommendation degree calculation unit 118 calculates the recommendation degree of each candidate content using (Formula 10). Then, the determination unit 119 ranks each candidate content based on the degree of recommendation, and determines the priority of presentation (order of presentation). It should be noted that the numbers shown in the "order of presentation" in FIG. 6 are content IDs. The higher a content appears in the order of presentation, the higher its recommendation priority. In the example shown in FIG. 6, the content indicated by the content ID "3" has the highest recommendation degree, and the content indicated by the content ID "2" has the second highest recommendation degree.

The display control unit 120 causes the display unit 21 of the user terminal 20 of the target user to display the content list according to the content and order to be presented to the target user determined by the determination unit 119.

Next, an example of the processing of the server 10 will be described with reference to the flowchart shown in FIG.

First, the first model generation unit 113 executes machine learning using the above-described teacher data to generate a plurality of first models M1 (first models M11 to M13, etc.) (step S1).

Subsequently, the request accepting unit 111 accepts a request for access to the content providing service from the user terminal 20 (step S2). Subsequently, the first acquisition unit 112 acquires the preference information of the user (target user) of the user terminal 20 who is the source of the access request by referring to the user attribute storage unit 10a (step S3).

Specifically, the first acquisition unit 112 inputs the user attribute information of the target user (more specifically, the basic information of the target user and the usage information of the content providing service and other services, which are stored in the user attribute storage unit 10a in association with the user ID of the target user) into each first model M1 generated by the first model generation unit 113. Then, the first acquisition unit 112 acquires the item value (see FIG. 3) of each genre output from each first model M1 as the target user's preference information.

Subsequently, the second model generation unit 115 generates the second model M2 (see (Formula 1)) by machine learning using the information obtained from the user log described above (step S4). Specifically, the second model generation unit 115 generates the second model M2 by reinforcement learning that takes as input the user attribute vector z acquired by the first acquisition unit 112 and uses, as a reward, the fact that content presented to the user is selected by the user.

Subsequently, the user score calculation unit 114 calculates a user score for each candidate content for the target user using the second model M2 generated by the second model generation unit 115 (step S5). Specifically, the user score calculation unit 114 inputs the user attribute vector z of the target user into the second model M2 corresponding to each candidate content (that is, (Formula 1) corresponding to each candidate content), and thereby calculates the user scores p_t,1 to p_t,a of the plurality of candidate contents indicated by content IDs "1" to "a" as shown in FIG. 6.

Subsequently, the diversity score calculation unit 117 sets the content with the highest user score as the first content, and calculates the diversity score for each other content (second content) based on the first content (step S6). For example, the diversity score calculation unit 117 acquires the usage record information of the target user by referring to the user log storage unit 10c, calculates the similarity of all combinations of candidate contents, and calculates, from the similarity of each combination, the score represented by (Formula 8) for each combination (see FIGS. 4 and 5). Then, the diversity score calculation unit 117 refers to the score corresponding to each combination, and obtains the score of each second content with respect to the first content as the diversity score of that second content. In the example shown in FIG. 6, the diversity score calculation unit 117 calculates diversity scores d'_13 to d'_a3 for each second content based on the first content (the content indicated by the content ID "3").

Next, the recommendation degree calculation unit 118 calculates, for each candidate content, the degree of recommendation for the target user based on the user score calculated by the user score calculation unit 114 and the diversity score for each second content calculated by the diversity score calculation unit 117 (the diversity score of the first content itself, which serves as the reference, is 0) (step S7). In the example shown in FIG. 6, the recommendation degrees P_t,1 to P_t,a for the target user are calculated for the candidate contents (the contents corresponding to content IDs "1" to "a") based on the user scores p_t,1 to p_t,a and the diversity scores d'_13 to d'_a3 for each second content.

The determination unit 119 determines the content and order to be presented to the target user based on the recommendation degree calculated by the recommendation degree calculation unit 118 (step S8). As an example, the determination unit 119 ranks each candidate content in descending order of the degree of recommendation, and determines the contents ranked first to N-th as contents to be presented to the target user.

Subsequently, the display control unit 120 causes the display unit 21 of the user terminal 20 of the target user to display a list of contents to be presented to the target user according to the content and the order to be presented to the target user determined by the determination unit 119 (step S9).

Next, the second acquisition unit 116 acquires a user log indicating the user's action results in the content providing service (step S10). That is, the second acquisition unit 116 acquires the user log, which is result information indicating whether or not the user has selected the candidate content displayed on the display unit 21 of the user terminal 20.

Subsequently, the second model generation unit 115 updates the second model M2 using (Formula 2) to (Formula 6) based on the user log acquired by the second acquisition unit 116 (step S11). Through the process of step S11, the second model M2 can be appropriately updated at any time. The above is an example of the processing of the server 10. Note that the process of step S4 is omitted in the second and subsequent processes of the server 10 for the same target user.

In the server 10 (recommendation device) described above, the degree of recommendation for each candidate content is calculated based on the user score, which is obtained based on preference information (user attribute information) and indicates the degree of user preference for each candidate content, and on the diversity score for each second content. The diversity score is a score that increases as the degree of similarity between the first content and the second content decreases. Content to be presented to the target user (user) is determined based on the degree of recommendation for each candidate content. According to the above configuration, the content to be recommended is determined in consideration of not only the user score, which reflects the user's preference for the content, but also the diversity score, which indicates the degree of diversity with respect to the first content. This makes it possible to present not only content that is presumed to be preferred by the user, but also content that is a new discovery for the user. In other words, it is possible to diversify recommendations by ensuring the possibility that content that is surprising to the user will be selected as content to be recommended. Therefore, according to the server 10, it is possible to provide a user with a more convenient recommendation service.

For example, in a method of determining the content to be presented to the user using a reinforcement learning algorithm such as a general bandit algorithm, optimization inference is performed to maximize the reward. For this reason, there is a risk that the content presented to the user will become fixed to contents having characteristics similar to each other. On the other hand, in the server 10, not only content that is presumed to be preferred by the user, but also content that is surprising to the user is subject to recommendation. Therefore, according to the server 10, personalized and unexpected recommendations can be made.

Further, for example, if the content estimated by the optimization inference to be preferred by the user does not match the actual taste of the user, content that is of interest to the user cannot be presented, and the presented content is likely not to be selected. In this case, it is difficult to provide a highly convenient recommendation service to the user. Providing a recommendation service that is not very convenient for the user leads to a decrease in the possibility that the user will revisit the content providing service. On the other hand, the server 10 ensures the possibility that even content with a high degree of diversity relative to the first content will be selected as content to be recommended. Therefore, even if the content estimated to be preferred by the user (the first content) is different from the user's actual preference, it is possible to increase the possibility of presenting content that the user likes. As a result, it is possible to provide a highly convenient recommendation service to the user and increase the possibility that the user will revisit the content providing service.

The diversity score calculation unit 117 selects the first content from a predetermined number of content with the highest user scores. In this embodiment, the diversity score calculator 117 selects the content with the highest user score as the first content. According to the above configuration, content that is highly likely to be preferred by the user can be appropriately and easily selected as the first content based on the user score.

The recommendation degree calculation unit 118 calculates the recommendation degree of one content so that the higher the user score of the one content is, the higher the recommendation degree is, and the higher the diversity score of the one content is, the higher the recommendation degree is. The determination unit 119 determines content with a higher recommendation degree as the content to be presented to the target user. According to the above configuration, for example, content with a low user score but a high diversity score is more likely to be selected as content to be recommended, so the possibility that content that is surprising to the user will be selected as content to be recommended can be ensured.

As described in the first example above, the diversity score calculation unit 117 may acquire a user log (usage record information) regarding the target user's record of using each candidate content, calculate, for each second content, a relative feature amount with respect to the first content based on the user log, calculate the similarity of each second content to the first content based on the calculated feature amount, and calculate a diversity score for each second content based on the similarity. According to the above configuration, it is possible to accurately obtain a diversity score indicating the degree of diversity with respect to the first content.

Alternatively, as described in the second example, the diversity score calculation unit 117 may acquire content attribute information indicating the attributes of content, vectorize the attributes of the first content and the attributes of each second content, and calculate a diversity score for each second content based on the distance between the vector for the first content and the vector for each second content. According to the above configuration, the diversity score is calculated based on the attributes associated with the content in advance, so the diversity score can be easily obtained.

The first acquisition unit 112 (user preference acquisition unit) acquires preference information about the preferences of the target user based on the user attribute information. The user score calculation unit 114 calculates a user score for each candidate content by using a second model M2 (score estimation model) that takes as input the preference information (user attribute vector z) acquired by the first acquisition unit 112 and outputs the user score for each candidate content, the second model M2 being generated by reinforcement learning that uses the user's selection of content as a reward. According to the above configuration, since the user score is calculated using the second model M2 constructed by reinforcement learning, the user score can be obtained efficiently and accurately. Further, information (preference information) obtained by processing the user attribute information into a form suitable for calculating a user score is input to the second model M2 to calculate the user score, so the user score can be obtained with even higher accuracy.

The first model generation unit 113 (model generation unit) performs machine learning using teacher data including user attribute information of users and index values (information about the users' preferences), thereby generating a first model M1 (preference estimation model) that takes user attribute information as input and outputs an estimated value of the user's preference information. The first acquisition unit 112 inputs the user attribute information into the first model M1 and acquires the output result from the first model M1 as preference information. According to the above-described configuration, the first model M1 is constructed to receive the attribute information of the user and output the estimated value of the preference information, and the preference information is acquired using the first model M1, so the preference information can be obtained efficiently and accurately.

User attribute information includes user usage information related to services other than the content providing service. According to the above configuration, it is possible to accurately estimate the content that the user prefers, even for a user who has used the content providing service only for a short time. Therefore, not only users who have used the content providing service for a long time but also users who have not yet used it for long can receive personalized and unexpected recommendations.

Note that the second model M2 may be a model that takes user attribute information as input and outputs a user score for each candidate content. In that case, as an example, the first acquisition unit 112 acquires the user attribute vector z from the user attribute storage unit 10a, and the user score calculation unit 114 obtains a user score by inputting the user attribute vector z received from the first acquisition unit 112 into the second model M2. In the user attribute storage unit 10a, a user attribute vector z obtained by digitizing and vectorizing various items such as the user's basic information is stored for each user ID. For example, by appropriately labeling (quantifying) each item based on a predetermined rule (for example, setting male to "0" and female to "1" for gender), a numerical vector such as (0.4, 1.0, 0.7, ...) is obtained. With the above configuration as well, the user score is calculated using the second model M2 constructed by reinforcement learning, so the user score can be obtained efficiently and accurately.

Also, the server 10 does not have to include the first model generation unit 113. In that case, each first model M1 may be stored in the server 10 in advance, or may be stored in a server different from the server 10, for example. Moreover, the preference information may be generated by a method different from the method using the first model M1 generated by machine learning, and may be stored in advance in a server or the like different from the server 10.

Also, the diversity score only needs to be set such that the smaller the degree of similarity with the first content is (that is, the more distant the content is from the first content), the larger the diversity score is, and its calculation is not limited to the first and second examples described above.

Also, the method of calculating the diversity score is not limited to this embodiment. For example, in the above second example, the diversity score calculation unit 117 does not need to calculate distances for all combinations of content. That is, the diversity score calculator 117 may select the first content and calculate only the distance of each second content to the selected first content as the diversity score.

Also, the first content does not necessarily have to be the content with the highest user score. As an example, the diversity score calculation unit 117 may select the first content from among a predetermined number of contents with the highest user scores (for example, contents with the first to third highest user scores). With the above configuration as well, it is possible to select content that is highly likely to be preferred by the user as the first content. Also, the first content may be selected using a criterion other than the user score described above. For example, the server 10 may acquire in advance information on typical content that the target user tends to like, based on the target user's content usage history or the like. In this case, the diversity score calculator 117 may set the representative content as the first content. Also, content other than the plurality of candidate content provided by the content providing service may be selected as the first content.

It should be noted that the block diagrams used in the description of the above embodiments show blocks for each function. These functional blocks (components) are realized by any combination of at least one of hardware and software. Also, the method of implementing each functional block is not particularly limited. That is, each functional block may be implemented using one device that is physically or logically coupled, or may be implemented using two or more devices that are physically or logically separated and are connected directly or indirectly (e.g., by wire or wirelessly). A functional block may be implemented by combining software with the one device or the plurality of devices.

Functions include, but are not limited to, judging, determining, deciding, calculating, computing, processing, deriving, investigating, searching, checking, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning.

For example, the server 10 in one embodiment of the present disclosure may function as a computer that performs the recommendation method of the present disclosure. FIG. 8 is a diagram illustrating an example of a hardware configuration of server 10 according to an embodiment of the present disclosure. The server 10 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

In the following explanation, the term "apparatus" can be read as a circuit, device, unit, or the like. The hardware configuration of the server 10 may be configured to include one or more of the devices shown in FIG. 8, or may be configured without some of the devices.

Each function in the server 10 is realized by causing the processor 1001 to perform calculations, to control communication by the communication device 1004, and to control at least one of reading and writing of data in the memory 1002 and the storage 1003.

The processor 1001, for example, operates an operating system and controls the entire computer. The processor 1001 may be configured by a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic device, registers, and the like.

Also, the processor 1001 reads programs (program codes), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes according to them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used. For example, the user score calculation unit 114, the diversity score calculation unit 117, and the like may be realized by a control program that is stored in the memory 1002 and operates on the processor 1001, and other functional blocks may be realized in the same way. Although the various processes described above have been described as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. Note that the program may be transmitted from a network via an electric communication line.

The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), and the like. The memory 1002 may also be called a register, cache, main memory (main storage device), or the like. The memory 1002 can store executable programs (program codes), software modules, and the like for implementing a communication control method according to an embodiment of the present disclosure.

The storage 1003 is a computer-readable recording medium and may be composed of at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, or a Blu-ray disk), a smart card, a flash memory (e.g., a card, stick, or key drive), a floppy disk, a magnetic strip, and the like. The storage 1003 may also be called an auxiliary storage device. The storage medium described above may be, for example, a database, a server, or another suitable medium including at least one of the memory 1002 and the storage 1003.

The communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also called a network device, a network controller, a network card, a communication module, or the like.

The input device 1005 is an input device (e.g., a keyboard, mouse, microphone, switch, button, or sensor) that receives input from the outside. The output device 1006 is an output device (e.g., a display, speaker, or LED lamp) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, as a touch panel).

Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using different buses between devices.

In addition, the server 10 may include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and part or all of each functional block may be implemented by that hardware. For example, the processor 1001 may be implemented using at least one of these pieces of hardware.

Although the present embodiment has been described in detail above, it is obvious to those skilled in the art that the present embodiment is not limited to the embodiments described herein. This embodiment can be implemented as modifications and changes without departing from the spirit and scope of the present invention defined by the description of the claims. Therefore, the description in this specification is for the purpose of illustration and explanation, and does not have any restrictive meaning with respect to the present embodiment.

The order of the processing procedures, sequences, flowcharts, etc. of each aspect/embodiment described in the present disclosure may be changed as long as there is no contradiction. For example, the methods described in this disclosure present elements of the various steps using a sample order, and are not limited to the specific order presented.

Input/output information may be stored in a specific location (for example, memory) or managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.

The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, comparison with a predetermined value).

Each aspect/embodiment described in the present disclosure may be used alone, may be used in combination, or may be switched during execution. In addition, notification of predetermined information (for example, notification of "being X") is not limited to being performed explicitly, and may be performed implicitly (for example, by not notifying the predetermined information).

Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.

In addition, software, instructions, information, and the like may be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of transmission medium.

The information, signals, and the like described in this disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any combination of these.

In addition, the information, parameters, and the like described in the present disclosure may be expressed using absolute values, may be expressed using values relative to a predetermined value, or may be expressed using other corresponding information.

The names used for the parameters described above are not restrictive names in any respect. Further, the formulas, etc., using these parameters may differ from those expressly disclosed in this disclosure. The various names assigned to these various information elements are not limiting names in any way, as the various information elements can be identified by any suitable name.

The term "based on" as used in this disclosure does not mean "based only on," unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on."

Any reference to elements using the "first," "second," etc. designations used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient method of distinguishing between two or more elements. Thus, reference to a first and second element does not imply that only two elements can be employed or that the first element must precede the second element in any way.

Where "include," "including," and variations thereof are used in this disclosure, these terms are intended to be inclusive, as is the term "comprising." Furthermore, the term "or" as used in this disclosure is not intended to be an exclusive OR.

In this disclosure, where articles such as "a," "an," and "the" in English are added by translation, the disclosure may include the case where the nouns following these articles are plural.

In the present disclosure, the term "A and B are different" may mean "A and B are different from each other." The term may also mean that "A and B are different from C". Terms such as "separate," "coupled," etc. may also be interpreted in the same manner as "different."

10 ... server (recommendation device), 112 ... first acquisition unit (user preference acquisition unit), 113 ... first model generation unit (model generation unit), 114 ... user score calculation unit, 117 ... diversity score calculation unit, 118 ... recommendation degree calculation unit, 119 ... determination unit, L ... preference information, M1, M11, M12, M13 ... first model (preference estimation model), M2 ... second model (score estimation model), U ... target user (user).

Claims

1. A recommendation device for determining content to be presented to a user from among a plurality of candidate contents in a content providing service for providing content to the user, the recommendation device comprising:
a diversity score calculation unit that calculates, for each second content, a diversity score that increases as the degree of similarity decreases between first content, which is selected as content likely to be preferred by the user, and second content, which is content other than the first content among the plurality of candidate contents;
a recommendation degree calculation unit that calculates, for each of the plurality of candidate contents, a degree of recommendation to the user based on a user score, which is obtained based on user attribute information about attributes of the user and indicates the degree of preference of the user for each of the plurality of candidate contents, and on the diversity score of each second content calculated by the diversity score calculation unit; and
a determination unit that determines content to be presented to the user based on the degree of recommendation calculated by the recommendation degree calculation unit.
2. The recommendation device according to claim 1, wherein the diversity score calculation unit selects the first content from among a predetermined number of contents with the highest user scores.
3. The recommendation device according to claim 1 or 2, wherein the recommendation degree calculation unit calculates the degree of recommendation of one content such that the degree of recommendation becomes higher as the user score of the one content becomes higher and as the diversity score of the one content becomes higher, and
the determination unit determines content having a high degree of recommendation as content to be presented to the user.
4. The recommendation device according to any one of claims 1 to 3, wherein the diversity score calculation unit:
acquires usage record information regarding the user's usage record for each of the plurality of candidate contents;
calculates, for each second content, a relative feature amount with respect to the first content based on the usage record information;
calculates, for each second content, a degree of similarity of the second content to the first content based on the calculated feature amount; and
calculates the diversity score for each second content based on the degree of similarity.
5. The recommendation device according to any one of claims 1 to 3, wherein the diversity score calculation unit:
acquires content attribute information indicating attributes of content;
vectorizes the attributes of the first content and the attributes of each second content; and
calculates the diversity score for each second content based on the distance between the vector for the first content and the vector for the second content.
6. The recommendation device according to any one of claims 1 to 5, further comprising a user score calculation unit that calculates the user score for each of the plurality of candidate contents by using a score estimation model that receives the user attribute information as input and outputs the user score for each of the plurality of candidate contents, the score estimation model being generated by reinforcement learning in which a reward is given for selection of content by the user.
7. The recommendation device according to any one of claims 1 to 5, further comprising:
a user preference acquisition unit that acquires preference information about the user's preference based on the user attribute information; and
a user score calculation unit that calculates the user score for each of the plurality of candidate contents by using a score estimation model that receives the preference information acquired by the user preference acquisition unit as input and outputs the user score for each of the plurality of candidate contents, the score estimation model being generated by reinforcement learning in which a reward is given for selection of content by the user.
8. The recommendation device according to claim 7, further comprising a model generation unit that generates, by executing machine learning using teacher data including the user attribute information of the user and information about the preference of the user, a preference estimation model that receives the user attribute information of the user as input and outputs an estimated value of the preference information of the user,
wherein the user preference acquisition unit acquires, as the preference information, an output result from the preference estimation model by inputting the user attribute information into the preference estimation model.
9. The recommendation device according to any one of claims 6 to 8, wherein said user attribute information includes usage information of said user regarding a service different from said content providing service.