KR20170079423A

KR20170079423A - Dynamic Noise Reduction Method based on Content Rating Distribution for Content Recommendation and Content Recommendation System Using Thereof

Info

Publication number: KR20170079423A
Application number: KR1020150189979A
Authority: KR
Inventors: 김베드로; 이지형; 김누리
Original assignee: 성균관대학교산학협력단
Priority date: 2015-12-30
Filing date: 2015-12-30
Publication date: 2017-07-10

Abstract

The present invention relates to a dynamic noise removal method for content recommendation and a content recommendation system using the method. The method includes applying at least one of a content evaluation time difference and the number of contents evaluated by the user to the ratings as a weight, calculating a rating distribution for each user, and removing noise data by comparing the distribution with a predetermined value.

Description

TECHNICAL FIELD The present invention relates to a dynamic noise reduction method for content recommendation and a content recommendation system using the same.

The present invention relates to a media content recommendation method and system, and more particularly, to a technique for filtering to remove noise of user rating data on media content such as movies.

A recommendation system recommends services and items to users, and is widely used not only by online retailers such as Amazon and CDNow but also by some shopping malls in Korea. Among recommendation approaches, collaborative filtering, which bases recommendations on the opinions of other users, is widely used (G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, pp. 734-749, 2005). However, if distorted data is reflected, such as ratings entered for commercial purposes or intentionally bad reviews, the accuracy of the recommendations may suffer. A typical example is the user rating data on a movie recommendation site. A movie's rating can be an important indicator because other users rely on it when choosing a movie, and it can therefore affect the movie's box-office success. To obtain more reliable and accurate recommendations, noise data must be filtered out. Conventional noise removal research, however, sometimes removes ratings without taking the user's tendency into consideration, so that information that is not noise is also removed.

It is difficult to say that the data on which a recommendation system is based reflects only the users' preferences; that is, the data may include a kind of noise. O'Mahony et al. divide the noise in recommendation systems into two categories. The first is natural noise, which arises in the process of collecting or inferring a user's preferences, for example errors a user makes when entering data. The second is malicious noise, which is caused by deliberately entering information into the data in order to influence the recommendation system. For example, artificially high ratings may be entered to promote a product on a recommendation system used for commercial purposes.

Existing collaborative filtering research has studied how to filter out the data distortion caused by such malicious noise. A representative study noted that ratings of 1 or 10 make up a high percentage of the ratings recorded by malicious evaluators, and therefore classified evaluators whose rating data contains a high proportion of 1-point and 10-point ratings as malicious evaluators (M. P. O'Mahony, N. J. Hurley, and G. C. M. Silvestre, "Detecting Noise in Recommender System Databases," Proceedings of the 11th International Conference on Intelligent User Interfaces, pp. 109-115, 2006; "Item Based Recommendation System for Filtering Noise Ratings," Proceedings of the Korea Information Science Society Conference, pp. 291-293, 2013). With such existing methods, however, a considerable number of legitimate ratings may be removed rather than noise.

SUMMARY OF THE INVENTION An object of the present invention is to improve the reliability of data by removing noise data, to analyze the distribution of ratings while reflecting changes in the degree of influence of each rating, and to provide a dynamic noise removal method for content recommendation and a content recommendation system that supply reliable data for various kinds of data analysis.

According to an aspect of the present invention, there is provided a dynamic noise removal method for content recommendation. The method includes applying at least one of a content evaluation time difference and the number of contents evaluated by the user to the ratings as a weight, calculating a rating distribution for each user, and removing noise data by comparing the distribution with a predetermined value.

The method may further include generating a content recommendation collaboration filtering model using rating data obtained through the step of removing the noise data as learning and test data.

The method may further include collecting data regarding at least one of a user, a content, and a rating, and recommending the content using the content recommendation collaboration filtering model.

The step of reflecting the weight may apply the difference between the current time and the content evaluation time to the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the original rating information, the weight based on the time difference, the time at which the movie was evaluated, and a proportionality constant).

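The step of reflecting the weight may also apply both the number of contents evaluated by the user and the evaluation time difference to the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the weight according to the number of movies viewed by the user and the weight reflecting the effect of the evaluation time difference).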

The step of calculating a rating distribution for each user may include calculating an average and a standard deviation of the ratings to which the weight has been applied, as expressed by the following equation:

[Equation]

(here, the terms of the equation denote the average, the number of movies rated by the user, the user's rating of the i-th movie, and the standard deviation).

In the step of comparing the distribution with a predetermined value to remove noise data, if a user's rating distribution average is greater than the rating distribution average of all users, more low-rating data than high-rating data is removed, and if it is smaller than the rating distribution average of all users, more high-rating data than low-rating data is removed.

The step of recommending the content may include calculating a prediction rating for the content to be recommended and generating the recommended content information.

The step of recommending the content may further include receiving a content recommendation request from the user, determining whether to create a content recommendation model, and transmitting the generated content recommendation information to the user device.

The recommended content information may include a list sorted according to a prediction rating and an outline of the content.

According to another aspect of the present invention, there is provided a content recommendation system using a dynamic noise removal method for content recommendation. The system includes a content rating weight calculation unit that applies a content evaluation time difference and the number of contents evaluated by a user as a weight, a noise removing unit that calculates a rating distribution for each user and compares the distribution with a predetermined value to remove noise data, and a collaborative filtering model generation unit that generates a content recommendation collaborative filtering model by using the rating data from which noise data has been removed as learning and test data.

The system may further include a content recommendation unit that generates recommended content using the content recommendation collaborative filtering model, and a communication interface that receives data on at least one of a user, a content, and a rating from content sources and transmits the recommended content to a user device.

The content rating weight calculation unit may apply the difference between the current time and the content evaluation time to the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the original rating information, the weight based on the time difference, the time at which the movie was evaluated, and a proportional constant),

and may apply both the number of contents evaluated by the user and the evaluation time difference to the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the weight according to the number of movies viewed by the user and the weight reflecting the effect of the evaluation time difference).

The noise removing unit calculates, for each user, an average and a standard deviation of the ratings to which the weight has been applied, as expressed by the following equation:

[Equation]

(here, the terms of the equation denote the average, the number of movies rated by the user, the user's rating of the i-th movie, and the standard deviation).

The noise removing unit thereby obtains a rating distribution for each user. If a user's rating distribution average is greater than the rating distribution average of all users, more low-rating data than high-rating data is removed, and if it is smaller than the rating distribution average of all users, more high-rating data than low-rating data is removed.

According to the dynamic noise removal method and the content recommendation system for recommending contents of the present invention, the movie data may be filtered by reflecting the attribute information of the user appearing in the movie data, and a movie satisfying the users may be recommended using the filter.

FIG. 1 is a diagram showing an example of a flowchart of a dynamic noise removal method for recommending contents of the present invention.
FIG. 2 is a view illustrating an example of a process of obtaining a rating distribution of each user and comparing it with a predetermined value to remove noise.
FIG. 3 is a diagram schematically illustrating a content recommendation service providing environment including a content recommendation system according to an embodiment of the present invention.
FIG. 4 is a content recommendation flowchart in a content recommendation unit according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It is to be understood, however, that the invention is not to be limited to the specific embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

The terms including the first, second, etc. may be used to describe various elements, but the elements are not limited to these terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term " and / or " includes any combination of a plurality of related entry items or any of a plurality of related entry items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, but other elements may be present in between. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.

The terminology used in this application is used only to describe a specific embodiment and is not intended to limit the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component, or a combination thereof described in the specification, and should not be construed to preclude the presence or addition of one or more other features, integers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. Redundant descriptions of such elements will be omitted.

The present invention relates to a system using a collaborative filtering method for media content recommendation, and in particular, to a method and system for removing noise in data for creating a content recommendation collaborative filtering model.

Collaborative filtering is a method that automatically predicts users' interests from preference information obtained from many users. There are two types of collaborative filtering: active filtering and passive filtering. Active filtering is based on the fact that people want to share information about their purchases with other people in a peer-to-peer (P2P) manner. Active filtering can generate reliable descriptions and ranks because the evaluations come from people who are interested in the product in question. However, bias can enter the evaluations, and there may be a first-rater problem and a cold-start problem. Passive filtering collects information implicitly and can remove from the analysis certain variations that appear in active filtering. For example, in passive filtering, everyone can automatically access the given data.

Another collaborative filtering method is item-based filtering. Item-based collaborative filtering is based on the observation that most people tend to like products similar to those they liked in the past and to dislike products similar to those they did not like. It predicts a customer's preference by calculating the similarity between the products for which the customer has already entered preferences and the product to be predicted. To calculate the similarity between two products, the method uses the preferences of customers who have entered preferences for both products. However, because the similarity between customers is not considered at all, if the calculation relies on evaluations by users whose preferences are not similar to those of a specific customer, the accuracy of the correlation between products may be degraded, and the prediction and recommendation capability of the recommendation system may be degraded as well. Thus, in a recommendation system, it is important to precisely predict the values of unrated items from the rating history.

In the present invention, noise-canceled data is used to generate a content recommendation collaboration filtering model using an item-based collaborative filtering method. In the present specification, movies are used as media contents for the sake of explanation. However, the scope of the present invention is not limited to movies but can be applied to all media contents.

Dynamic noise removal method for content recommendation

FIG. 1 is a diagram showing an example of a flowchart of the dynamic noise removal method for content recommendation of the present invention. Referring to FIG. 1, the method includes collecting data on users, content, and ratings (S11), reflecting a weight based on the difference in evaluation time (S12), reflecting a weight based on the number of movies evaluated by the user (S13), obtaining the distribution of each user's weighted ratings (S14), removing noise data (S15), composing learning data and test data (S16), creating a content recommendation collaboration filtering model (S17), and testing the model and predicting ratings for recommended movies (S18).

First, data for generating a content recommendation model is collected (S11). For example, user information, movie information, and rating data for a movie are collected. The user information may include the user's sex, age, number of evaluations, and the like. The movie information may include a movie title, a genre, an opening date, a synopsis, and the like. The rating information may include rating points for evaluation fields such as amusement and artistry for the movie.

After the data is collected as described above, a weight is calculated and applied to each rating using Equation (1) in order to reflect how the influence of a rating changes with the difference in evaluation time (S12).

[Equation 1]

Here, the terms of Equation (1) denote the original rating information, the weight based on the time difference, the time at which the movie was evaluated, and a proportional constant.

The evaluation time is converted into a day number using Equation (2) before being substituted.

[Equation 2]

(Unix time: seconds elapsed since 0:00:00 on January 1, 1970; converted timestamp: number of days since January 1, 1900; 9/24: the fraction of a day corresponding to the GMT+9 time zone offset)
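The day-number conversion described for Equation (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's exact formula: the function names and the constant `DAYS_1900_TO_1970` (the number of days between January 1, 1900 and January 1, 1970) are introduced here for illustration, and the original equation may use a different offset convention.

```python
from datetime import date

SECONDS_PER_DAY = 86_400
# Days between 1900-01-01 and 1970-01-01 (assumed epoch offset; hypothetical name).
DAYS_1900_TO_1970 = (date(1970, 1, 1) - date(1900, 1, 1)).days
# Nine-hour GMT+9 offset expressed as a fraction of a day.
GMT_PLUS_9_IN_DAYS = 9 / 24


def unix_to_day_number(unix_seconds: float) -> float:
    """Convert Unix time (seconds since 1970-01-01 00:00:00 UTC) into a
    fractional number of days since 1900-01-01, shifted to GMT+9."""
    return unix_seconds / SECONDS_PER_DAY + DAYS_1900_TO_1970 + GMT_PLUS_9_IN_DAYS


def days_between(unix_now: float, unix_rated: float) -> float:
    """Evaluation time difference, in days, between two Unix timestamps."""
    return unix_to_day_number(unix_now) - unix_to_day_number(unix_rated)
```

Because both timestamps receive the same offsets, the offsets cancel in `days_between`; they matter only if the day number itself is used as an absolute date.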

Equation (3) reflects both the weight according to the number of movies watched by the user and the weight based on the difference in evaluation time.

[Equation 3]

Here, one term of Equation (3) denotes the weight according to the number of movies viewed by the user, and the other reflects the effect of the difference in evaluation time.

For example, the movie-count weight can be assigned by dividing the range between the user who viewed the fewest movies and the user who viewed the most movies into five sections with weights of 0.1 to 0.5. Likewise, the time weight can be assigned by expressing the difference between the current time and the oldest evaluation time in days and dividing it into five sections with weights of 0.1 to 0.5, so that the older the evaluation, the lower the weight, and the more recent the evaluation, the higher the weight. Table 1 is an example of weighting.
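The five-section weighting described above could be sketched as follows. Several points are assumptions made only for illustration: the sections are taken to be of equal width, users who rated more movies are assumed to receive the higher weight, and the two weights are combined with the rating by multiplication. All function names are hypothetical.

```python
def five_level_weight(value: float, lowest: float, highest: float) -> float:
    """Map a value in [lowest, highest] onto one of five equal-width sections
    and return that section's weight: 0.1 for the lowest section up to 0.5."""
    if highest <= lowest:
        return 0.5  # degenerate range: treat everything as the top section (assumption)
    position = (value - lowest) / (highest - lowest)
    section = max(0, min(int(position * 5), 4))  # clamp to sections 0..4
    return round(0.1 * (section + 1), 1)


def time_weight(days_since_rating: float, oldest_days: float) -> float:
    """More recent ratings get higher weights; the oldest rating gets 0.1."""
    recency = oldest_days - days_since_rating  # 0 for the oldest rating
    return five_level_weight(recency, 0.0, oldest_days)


def count_weight(n_rated: int, min_rated: int, max_rated: int) -> float:
    """Weight by how many movies the user has rated (direction is assumed)."""
    return five_level_weight(n_rated, min_rated, max_rated)


def weighted_rating(rating, days_since_rating, oldest_days,
                    n_rated, min_rated, max_rated):
    """Combine both weights with the rating by multiplication (assumed form)."""
    return (rating
            * time_weight(days_since_rating, oldest_days)
            * count_weight(n_rated, min_rated, max_rated))
```

For instance, a rating made today by the user with the fewest ratings would receive a time weight of 0.5 and a count weight of 0.1 under these assumptions.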

When the process of applying the weights according to the evaluation time and the number of movies is completed, the average and standard deviation of the weighted rating data are calculated for each user using Equation (4), and the rating distribution of each user is obtained (S14).

[Equation 4]

Here, the terms of Equation (4) denote the average, the number of movies rated by the user, the user's rating of the i-th movie, and the standard deviation.
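The per-user average and standard deviation of step S14 could be computed along these lines. This is a sketch rather than the patent's exact Equation (4): it assumes the population standard deviation (division by the number of ratings), and the function name and the `(user_id, item_id, weighted_rating)` tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_user_distribution(ratings):
    """ratings: iterable of (user_id, item_id, weighted_rating) tuples.

    Returns {user_id: (average, standard_deviation)} computed over each
    user's weighted ratings.
    """
    by_user = defaultdict(list)
    for user_id, _item_id, value in ratings:
        by_user[user_id].append(value)
    return {user: (mean(values), pstdev(values))
            for user, values in by_user.items()}
```

If the sample standard deviation is intended instead, `statistics.stdev` could replace `pstdev`, although it requires at least two ratings per user.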

When a rating distribution has been obtained for each user, the noise data is removed (S15). FIG. 2 is a view illustrating an example of the process of obtaining the rating distribution of each user and comparing it with a predetermined value to remove noise.

Referring to FIG. 2, the distribution is compared with a predetermined value. When a user has a low rating distribution, more of the high ratings are removed, and when a user has a high rating distribution, more of the low ratings are removed. In the case of an intermediate distribution, ratings at both extremes are removed. For example, the tendency of each user's distribution is identified by comparing it with the distribution of all users, and ratings that fall outside a range of values defined around the user's average are removed. That is, for users with a high rating distribution, values below that range are removed, and for users with a low rating distribution, values exceeding that range are removed. Unlike existing extreme-value removal methods, this approach removes noise data after judging the user's rating tendency, so prediction performance can be improved.
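The tendency-dependent removal of step S15 might look like the following sketch. The exact range used in the patent is not reproduced in this text, so the band of `k` standard deviations around the user's average, the parameter names `k` and `margin`, and the function name are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def remove_noise(user_ratings, global_mean, k=1.0, margin=0.0):
    """Filter one user's weighted ratings according to the user's tendency.

    user_ratings: list of weighted ratings for a single user.
    global_mean:  average rating over all users.
    k, margin:    hypothetical tuning parameters (band width, tolerance).
    """
    mu, sigma = mean(user_ratings), pstdev(user_ratings)
    low, high = mu - k * sigma, mu + k * sigma
    if mu > global_mean + margin:
        # High-rating user: unusually low ratings are treated as noise.
        return [r for r in user_ratings if r >= low]
    if mu < global_mean - margin:
        # Low-rating user: unusually high ratings are treated as noise.
        return [r for r in user_ratings if r <= high]
    # Intermediate user: both tails are trimmed.
    return [r for r in user_ratings if low <= r <= high]
```

With `global_mean=3.0`, a user who rated `[5, 5, 5, 5, 1]` would lose only the 1, while a user who rated `[1, 1, 1, 1, 5]` would lose only the 5.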

The rating data from which the noise data has been removed is split into training data and test data (S16). A content recommendation collaboration filtering model is then generated by learning from the training data (S17). Once the model has been generated, it can be tested with the test data to evaluate its performance (S18).
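As an illustration of the item-based collaborative filtering used for the model, the following sketch predicts a rating from the similarity between items. The patent text does not state which similarity measure is used; cosine similarity over co-rating users is assumed here, and the function names and the nested-dictionary data layout are hypothetical.

```python
from math import sqrt


def item_similarity(ratings, item_a, item_b):
    """Cosine similarity between two items over users who rated both.

    ratings: {user_id: {item_id: rating}}
    """
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, user, target_item):
    """Similarity-weighted average of the user's ratings on other items.

    Returns None when no rated item is similar to the target item.
    """
    numerator = denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target_item:
            continue
        sim = item_similarity(ratings, item, target_item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None
```

In practice the similarities would be computed once from the training data and cached, and the held-out test ratings would be compared against `predict` to evaluate the model.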

When a content recommendation collaborative filtering model is generated, it is possible to recommend a content suitable for the user's tendency (S19).

Hereinafter, a content recommendation system using the dynamic noise removal method will be described.

Content recommendation system

FIG. 3 is a diagram schematically illustrating a content recommendation service providing environment including the content recommendation system 10 according to an embodiment of the present invention. Referring to FIG. 3, a content recommendation system 10 in accordance with an embodiment of the present invention is connected to a content source 20 and a user device 40 via a network.

The content source 20 includes a website or the like where a user can purchase content and record a rating for the content. The user can purchase TV programs, movies, radio programs, and the like from such content sources.

The content recommendation system 10 includes a communication interface 100, a weight calculation unit 200, a noise removing unit 300, a collaboration filtering model generation unit 400, and a content recommendation unit 500.

The communication interface 100 receives content-related data from the content sources 20 or receives a content recommendation request from the user device 40. In the case of movie content, the content-related data may include, for example, content information such as genre, release date, and synopsis, as well as existing rating information. The content recommendation request may include, for example, a request for recommendation of new movies or a request for recommendation of movies by genre. The recommended content information can also be transmitted to the user device 40 through the communication interface 100.

The weight calculation unit 200 assigns weights to the collected rating data on the basis of the evaluation time difference and the number of movies evaluated by the user. The noise removing unit 300 removes noise data according to the tendency of each user's distribution of weighted ratings. The collaboration filtering model generation unit 400 generates a content recommendation collaboration filtering model using the noise-removed rating data as training data. The content recommendation unit 500 then calculates a predicted rating for the content using the model and generates the recommended content information accordingly.

FIG. 4 is a diagram showing an example of a content recommendation flowchart in the content recommendation unit of the present invention. Referring to FIG. 4, when a content recommendation request from the user device 40 is received through the communication interface 100 (S41), the content recommendation unit 500 determines whether a content recommendation collaboration filtering model has been created (S42). If the model has not been created, it is created by the weight calculation unit 200, the noise removing unit 300, and the collaboration filtering model generation unit 400 in the order shown in FIG. 1 (S43). Once the model has been created, the predicted rating for the content is calculated using the model (S44). The recommended content information list is then sorted or categorized according to the calculated predicted ratings (S45). For example, if a user requests a recommendation of the newest movies, a list can be generated from the latest movies in descending order of predicted rating. Alternatively, when a user requests movie recommendations by genre, a list can be generated for each genre, such as action, romance, and horror, sorted in descending order of predicted rating. In addition, brief introductory information about the recommended content (for example, a synopsis of the movie, its running time, and the like) can be generated. The communication interface 100 transmits the generated content recommendation information to the user device 40 (S46).
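Steps S44 to S45 (sorting by predicted rating and attaching brief introductory information) can be sketched as below. The function name, the `genre` and `top_n` parameters, and the catalog layout with `title`, `genre`, and `synopsis` keys are hypothetical and chosen only for illustration.

```python
def build_recommendations(predicted, catalog, genre=None, top_n=10):
    """Build a recommendation list sorted by predicted rating, highest first.

    predicted: {item_id: predicted_rating}
    catalog:   {item_id: {"title": str, "genre": str, "synopsis": str}}
    genre:     optional genre filter for by-genre recommendation requests
    """
    candidates = [
        item for item in predicted
        if genre is None or catalog[item]["genre"] == genre
    ]
    candidates.sort(key=lambda item: predicted[item], reverse=True)
    return [
        {
            "title": catalog[item]["title"],
            "predicted_rating": predicted[item],
            "synopsis": catalog[item]["synopsis"],
        }
        for item in candidates[:top_n]
    ]
```

A request for the newest movies could be served the same way by filtering the candidates on release date instead of genre.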

The user device 40 includes a PC, a tablet, a smart phone, an IPTV, etc. having an interface through which a user can purchase content and input a rating and content recommendation request thereto.

Performance evaluation Experimental Example

An example of the performance of the collaborative filtering model generated using the noise data removal method of the present invention is as follows.

For the experiment, the MovieLens 100k data set was used. This data set consists of 100,000 movie ratings given to 1,682 movies by 943 users, each of whom rated at least 20 movies. Mean Absolute Error (MAE) and coverage were used as performance measures for comparison with a collaborative filtering model that uses an existing noise data removal method.

Mean Absolute Error (MAE) is a technique for evaluating the accuracy of recommendation, which is calculated from the difference between the actual preference and the predicted value as shown in Equation (5).

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| p_i - r_i \right| \qquad (5)$$

Here, $N$ is the total number of evaluation objects, $p_i$ is the predicted score, and $r_i$ is the actual score. That is, the absolute value of the difference between the predicted score and the actual score is taken, and the sum of these values is divided by the total number of evaluation objects. MAE is the average of the absolute errors, all of which have the same weight irrespective of the magnitude of the error. Therefore, the lower the value, the better the performance.
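The MAE computation described above (sum of absolute differences between predicted and actual scores, divided by the number of evaluated items) can be written directly. The function name and the list-based inputs are choices made for this sketch.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)
```

For predictions `[4.0, 3.0, 5.0]` against actual scores `[5.0, 3.0, 3.0]`, the absolute errors are 1, 0, and 2, giving an MAE of 1.0.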

Coverage is a technique for evaluating how varied the content recommended according to the user's preference can be. It is computed from the recommended content set and the set of contents whose genre-specific information is not duplicated:

[Equation]

The higher the value, the better the performance.

Table 2 below is a table for evaluating the performance of the content recommendation collaboration filtering model generated using the noise data removal method disclosed herein.

Pw1: Noise data is removed using Technique 1 (time effect * rating), and recommendation performance is measured using the collaborative filtering method.

P_MF1: All data is divided into male and female users, Technique 1 is applied to remove noise data, and the combined data is then used with the collaborative filtering method. (The data is divided by gender because male and female users may differ in their rating tendencies.)

Pw2: Noise data is removed by applying Techniques 1 and 2, and recommendation performance is measured using the collaborative filtering method.

P_MF2: All data is divided into male and female users, and Techniques 1 and 2 are applied to remove noise data.

Pw3: Noise data is removed using Techniques 1, 2, and 3, and recommendation performance is measured using the collaborative filtering method.

P_MF3: All data is divided into male and female users, and Techniques 1 and 2 are applied to remove noise data.

(The larger the value of the proportional constant, the less the influence of old ratings is reflected.)

For the collaborative filtering model using the existing noise removal method, the MAE was 0.816 and the coverage (diversity index) was 0.056. The proposed method (P_MF3) achieved an MAE of 0.474 and a coverage of 0.045. Recommendation accuracy is thus improved compared with the existing method. The diversity index is lower than that of the existing method, but this is generally expected: as recommendation accuracy increases, only the content best suited to the user is recommended, so diversity decreases.

10: Content recommendation system
100-1, 100-2: Communication interface
200: Weight calculation unit
300: Noise removing unit
400: Collaboration filtering model generating unit
500: Content recommendation unit
20: Content Sources
40: User device

Claims

1. A dynamic noise removal method for content recommendation, the method comprising:
reflecting at least one of a content evaluation time difference and a number of contents evaluated by a user as a weight;
calculating a rating distribution for each user; and
comparing the distribution with a predetermined value to remove noise data.

2. The method of claim 1, further comprising:
generating a content recommendation collaborative filtering model using the rating data obtained through the step of removing the noise data as learning and test data.

3. The method of claim 2, further comprising:
collecting data regarding at least one of a user, a content, and a rating; and
recommending content using the content recommendation collaboration filtering model.

4. The method of claim 3,
wherein the step of reflecting as a weight comprises
reflecting the difference between the current time and the content evaluation time in the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the original rating information, the weight based on the time difference, the time at which the movie was evaluated, and a proportional constant).

5. The method of claim 4,
wherein the step of reflecting as a weight comprises
reflecting the number of contents evaluated by the user and the evaluation time difference in the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the weight according to the number of movies viewed by the user and the weight reflecting the effect of the evaluation time difference).

6. The method of claim 5,
wherein the step of calculating a rating distribution for each user comprises
calculating an average and a standard deviation of the ratings in which the weight is reflected, as expressed by the following equation:

[Equation]

(here, the terms of the equation denote the average, the number of movies rated by the user, the user's rating of the i-th movie, and the standard deviation).

7. The method of claim 6,
wherein, in the step of comparing the distribution with a predetermined value to remove noise data,
if the rating distribution average of a user is larger than the rating distribution average of all users, more low-rating data than high-rating data is removed, and
if the rating distribution average of the user is smaller than the rating distribution average of all users, more high-rating data than low-rating data is removed.

8. The method of claim 7,
wherein the step of recommending the content comprises:
calculating a predicted rating for the content subject to the recommendation; and
generating recommended content information.

9. The method of claim 8,
wherein the step of recommending the content further comprises:
receiving a content recommendation request from the user;
determining whether a content recommendation model has been generated; and
transmitting the generated content recommendation information to the user device.

10. The method of claim 9,
wherein the recommended content information includes a list sorted according to predicted rating and an overview of the content.

11. A content recommendation system using a dynamic noise removal method for content recommendation, the system comprising:
a content rating weight calculation unit that reflects a content evaluation time difference and a number of contents evaluated by a user as a weight;
a noise removing unit that calculates a rating distribution for each user and compares the distribution with a predetermined value to remove noise data; and
a collaborative filtering model generation unit that generates a content recommendation collaborative filtering model by using rating data from which noise data has been removed as learning and test data.

12. The system of claim 11, further comprising:
a content recommendation unit that generates recommended content using the content recommendation collaboration filtering model; and
a communication interface that receives data relating to at least one of a user, a content, and a rating from content sources and transmits the recommended content to a user device.

13. The system of claim 12,
wherein the content rating weight calculation unit
reflects the difference between the current time and the content evaluation time in the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the original rating information, the weight based on the time difference, the time at which the movie was evaluated, and a proportional constant), and
reflects the number of contents evaluated by the user and the evaluation time difference in the content rating using the following equation:

[Equation]

(here, the terms of the equation denote the weight according to the number of movies viewed by the user and the weight reflecting the effect of the evaluation time difference).

14. The system of claim 13,
wherein the noise removing unit
calculates, for each user, an average and a standard deviation of the ratings in which the weight is reflected, as expressed by the following equation:

[Equation]

(here, the terms of the equation denote the average, the number of movies rated by the user, the user's rating of the i-th movie, and the standard deviation),
to obtain a rating distribution for each user, and
if the rating distribution average of a user is larger than the rating distribution average of all users, removes more low-rating data than high-rating data, and
if the rating distribution average of the user is smaller than the rating distribution average of all users, removes more high-rating data than low-rating data.