TWI639092B - Music recommendation method and computer program product thereof - Google Patents

Music recommendation method and computer program product thereof

Info

Publication number
TWI639092B
Authority
TW
Taiwan
Prior art keywords
matrix
user
item
rating
items
Prior art date
Application number
TW106114498A
Other languages
Chinese (zh)
Other versions
TW201843601A (en)
Inventor
蘇家輝
曾新穆
張為義
Original Assignee
正修學校財團法人正修科技大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 正修學校財團法人正修科技大學
Priority to TW106114498A
Application granted
Publication of TWI639092B
Publication of TW201843601A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A music recommendation method comprising: using a processor of a computer to convert an item-user playcount matrix and merge the result with an item-tag frequency matrix into an item-user-tag matrix; computing an NMF-based matrix from the item-user-tag matrix by non-negative matrix factorization, from which an optimized social-content-based similarity matrix is computed; generating an artist-tag-driven similarity matrix from an artist-tag frequency matrix; and, via the processor, generating a plurality of target items from a user's un-purchased items, determining a plurality of related items using the optimized social-content-based similarity matrix and the artist-tag-driven similarity matrix, predicting a rating for each un-purchased item from the related items with a rating prediction algorithm, and sorting the un-purchased items by rating to produce a ranking list.

Description

Music recommendation method and computer program product thereof

The present invention relates to a music recommendation method, and in particular to a music recommendation method based on social information.

With the steady advance of multimedia technology, the amount of available music data is growing explosively, making it difficult to retrieve preferred pieces of music (i.e., items) from this mass of data efficiently and effectively. To address this problem, music recommendation (prediction) content can be provided to give users a basis for selection. Music recommendation refers to a set of prediction algorithms that learn musical preferences from users' usage logs. This is a challenging problem because user preferences are usually expressed through explicit votes and implicit votes. The former include, for example, a user's numerical ratings of music, whereas the latter include play counts, tags, comments, listening history, and other types of usage information.

For example, the conventional collaborative filtering (CF) method can be used to predict a user's music preferences. Its execution can be divided into a rating prediction stage and an item selection stage. The item selection stage is easy to complete, but the rating prediction stage still suffers from rating diversity, rating sparsity, and lack of rating information. Taking rating diversity as an example, Table 1 shows a rating matrix of four users for four items; because users rate the same item inconsistently, there is often a large gap between a user's actual preferences and the predicted results.

To address these problems, an improved CF prediction method was developed that uses implicitly expressed user preferences, such as social media hits. A growing body of research uses information from social media, rather than traditional ratings, to identify users' music preferences, and also takes into account information relating users to pieces of music. Notably, although social media information has been shown to correlate with user preferences, the methods for predicting the relevance of a piece of music to a user are still not robust enough to determine users' true music preferences.

In view of this, it is necessary to provide a music recommendation method that solves the problems of the prior art.

The main object of the present invention is to provide a music recommendation method and a computer program product thereof that combine tag information and rating information as the basis for recommendation, thereby mitigating the poor recommendation performance caused by insufficient information and improving the computational effectiveness of music recommendation.

To achieve the above object, the present invention provides a music recommendation method executable on a music-recommending computer. The method may comprise the steps of: storing an item-user playcount matrix, an item-tag frequency matrix, and an artist-tag frequency matrix in a memory of the computer; using a processor of the computer to convert the item-user playcount matrix into an item-user rating matrix, merge the item-user rating matrix with the item-tag frequency matrix into an item-user-tag matrix, compute an NMF-based matrix from the item-user-tag matrix by non-negative matrix factorization, compute an optimized social-content-based similarity matrix from the NMF-based matrix with a similarity algorithm, and compute an artist-tag-driven similarity matrix from the artist-tag frequency matrix with the same similarity algorithm; in response to a user's visit, using the processor to generate a plurality of target items from the user's un-purchased items, determine a plurality of related items using the optimized social-content-based similarity matrix and the artist-tag-driven similarity matrix, predict a rating for each un-purchased item from the related items with a rating prediction algorithm, and sort the un-purchased items by rating to produce a ranking list; and outputting the ranking list on a display of the computer to recommend a plurality of music items in the ranking list.

In an embodiment of the present invention, the related items are determined as the top k% most related items in the optimized social-content-based similarity matrix and the artist-tag-driven similarity matrix, where k is a positive number.

In an embodiment of the present invention, the processor may convert the elements of the item-user playcount matrix into the elements of the item-user rating matrix according to a threshold defined as T = μ - τ*σ, where T is the threshold, μ is the mean of the user's play counts, σ is the standard deviation of the user's play counts, and τ is a weight parameter.

In an embodiment of the present invention, the range delimited by the threshold is divided into two equivalent sub-ranges, each containing a set of rating values; the two sets may contain different numbers of rating values. The weight parameter may be 0.5.

In another aspect, to achieve the above object, the present invention further provides a computer program product which, when loaded into and executed by a computer, enables the computer to perform the music recommendation method described above.

S1: offline preprocessing step

S2: online prediction step

T: threshold for converting play counts into ratings

μ: mean of the play counts

σ: standard deviation of the play counts

τ: weight parameter

Fig. 1: Flow block diagram of the music recommendation method according to an embodiment of the present invention.

Fig. 2: Example of converting play counts into ratings in the music recommendation method according to an embodiment of the present invention.

Fig. 3a: Example of the item-tag frequency matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 3b: Example of the item-user rating matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 3c: Example of the item-user-tag matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 4a: Example of the IF matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 4b: Example of the UTF matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 5a: Example of the original item-tag frequency matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 5b: Example of the transformed artist-tag frequency matrix of the music recommendation method according to an embodiment of the present invention.

Fig. 6: Experimental results of the OSCP, ATP, and MMR embodiments of the present invention in terms of RMSE.

Fig. 7: Top-k% experimental results comparing the MMR embodiment with conventional recommendation systems in terms of RMSE.

Fig. 8: Overall experimental results comparing the MMR embodiment with conventional recommendation systems in terms of RMSE.

Fig. 9: Experimental results for the top N test items in the ranking list, comparing the MMR embodiment with conventional recommendation systems in terms of NDCG.

Fig. 10: Experimental results on data set 1 evaluating the effect of different training-set proportions on RMSE.

Fig. 11: Experimental results on data set 2 comparing the MMR embodiment with the conventional IB method in terms of RMSE.

Fig. 12: Experimental results on data set 2 comparing the MMR embodiment with the conventional RWR method in terms of NDCG.

Fig. 13: Experimental results comparing the NMFN embodiment with the conventional IB method in terms of RMSE.

To make the above and other objects, features, and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings. Directional terms used herein, such as up, down, top, bottom, front, rear, left, right, inner, outer, side, surrounding, central, horizontal, transverse, vertical, longitudinal, axial, radial, uppermost, or lowermost, refer only to directions in the accompanying drawings. They are used to describe and aid understanding of the invention, not to limit it.

Referring to Fig. 1, the embodiment of the music recommendation method can be executed on a music-recommending computer. The computer may be any electronic device with data processing capability, such as a cloud computing platform, a computer of any kind, or a portable computing apparatus. In this embodiment, the computer system may include a processor, a memory, and a display. The memory, for example a hard disk, is electrically connected to the processor; its architecture will be understood by those of ordinary skill in the art and is not described further here. The memory can store the database required during execution of the method, including pre-stored and temporary data. The processor, for example a central processing unit (CPU), can run software that carries out the method, including but not limited to data format conversion and data feature extraction. The display, for example an LCD screen, is electrically connected to the processor; its architecture will likewise be understood by those of ordinary skill in the art. The display can output the results produced by the processor, for example to recommend a plurality of music items in the ranking list.

Still referring to Fig. 1, the method embodiment may include an offline preprocessing step S1 and an online prediction step S2. The offline preprocessing step S1 serves the need to represent user preferences by ratings and speeds up the online prediction process. To represent user preferences by ratings, an item-user playcount matrix is converted into an item-user rating matrix by statistical calculation, for use in accelerating the online prediction step S2. In an embodiment, the offline preprocessing step S1 first merges the converted item-user rating matrix and the item-tag frequency matrix into a hybrid matrix. The hybrid matrix is then refined into an NMF-based matrix by running, for example, a conventional non-negative matrix factorization (NMF) algorithm. Finally, item similarities are computed from the NMF-based matrix and from an artist-tag frequency matrix, yielding two similarity matrices: an optimized social-content-based similarity matrix and an artist-tag-driven similarity matrix.

Still referring to Fig. 1, in an embodiment the inputs to the online prediction step S2 may include an un-purchased item, the optimized social-content-based similarity matrix, the artist-tag-driven similarity matrix, and the item-user rating matrix; the output is a ranking list. The process starts with an active user's visit, for example when the user is using a website or running a computer program to purchase music. Each of the user's un-purchased items is then treated as a target item. For each target item, the top k% most related items are determined using the optimized social-content-based similarity matrix and the artist-tag-driven similarity matrix, where k is a positive number such as 20, 21.5, 25, 30, 32.5, or 35. The rating of each un-purchased item can then be predicted from these related items. Finally, all un-purchased items are sorted by predicted rating to produce the ranking list, which can serve as a reference for related applications, for example as the basis for recommending items for the user to purchase. The offline preprocessing stage and the online prediction stage of the above method embodiment are illustrated below by way of example, without limitation.
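The online prediction flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's algorithm: this section does not specify the rating prediction algorithm or how the two similarity matrices are combined, so the sketch takes a single similarity matrix and predicts a rating as the similarity-weighted average of the user's known ratings over the top k% most similar items. The function names, the convention that a rating of 0 means "un-purchased", and the sample data are all hypothetical.

```python
import math
import numpy as np

def predict_rating(target, user_ratings, similarity, k_percent=20.0):
    """Predict a rating for item `target` from the user's most similar rated items.

    user_ratings: 1-D sequence, 0 meaning "not rated / un-purchased" (assumption).
    similarity:   square item-item similarity matrix.
    """
    user_ratings = np.asarray(user_ratings, dtype=float)
    sims = np.asarray(similarity, dtype=float)[target].copy()
    sims[target] = -np.inf                           # never use the target itself
    n_keep = max(1, math.ceil((len(sims) - 1) * k_percent / 100.0))
    neighbours = np.argsort(sims)[::-1][:n_keep]     # top k% most similar items
    rated = [j for j in neighbours if user_ratings[j] > 0]
    weight = sum(sims[j] for j in rated)
    if not rated or weight <= 0:
        return 0.0                                   # no usable evidence
    # Weighted average stands in for the unspecified rating prediction algorithm.
    return sum(sims[j] * user_ratings[j] for j in rated) / weight

def rank_unpurchased(user_ratings, similarity, k_percent=20.0):
    """Return (item index, predicted rating) pairs, highest prediction first."""
    targets = [i for i, r in enumerate(user_ratings) if r == 0]
    scored = [(i, predict_rating(i, user_ratings, similarity, k_percent))
              for i in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: 4 items, the user has rated items 0 and 2 only.
similarity = [[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.4],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.4, 0.8, 1.0]]
ranking = rank_unpurchased([5, 0, 1, 0], similarity, k_percent=100.0)
```

In a fuller implementation, the neighbour selection would draw on both the optimized social-content-based and the artist-tag-driven similarity matrices.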

For example, the main tasks in the offline preprocessing stage are (1) rating conversion and (2) construction of the optimized social-content-based similarity matrix and the artist-tag-driven similarity matrix, as detailed below.

Regarding rating conversion: in a music recommendation system, user preferences can generally be expressed in two ways, explicit votes and implicit votes. From a usability standpoint, implicit votes are more convenient for users than explicit ones: in a rating-free recommendation system users need not supply ratings as explicit votes, yet their usage leaves behind information that serves as implicit votes. Consequently, for a recommendation system based on social information, how to describe user preferences using implicit votes is an important question; a user's play counts, for example, can be regarded as implicit votes. Accordingly, the embodiments of the present invention propose a formula that maps play counts into the rating space, meeting the need to represent user preferences by ratings.

Fig. 2 shows an example of converting play counts into ratings. The process comprises three sub-steps. In the first sub-step, each user's play counts are divided into two ranges by a threshold T, defined by equation (1):

$$T = \mu - \tau \cdot \sigma \qquad (1)$$

where T is the threshold, μ is the mean of the play counts, σ is the standard deviation of the play counts, and τ is a weight parameter. Equation (1) follows what is termed traditional votes: the more often an item is played, the greater the user's interest in it. Specifically, a user's mean play count represents his or her average preference. Play counts that fall below the threshold T, that is, more than τ standard deviations below the mean, are therefore taken to indicate negative preference, and those above it positive preference.

In the second sub-step, the range below T is further divided into two equal sub-ranges, corresponding to the range numbers {1, 2}, and the range above T is divided into sub-ranges corresponding to the range numbers {3, 4, 5}. In the third sub-step, a play count that falls within a given sub-range is converted into the corresponding range number. The value of τ is determined from actual rating data collected by the rating system disclosed herein; in an embodiment, τ is set to 0.5 with reference to the actual data distribution. Finally, the item-user rating matrix is constructed as one of the inputs to the online prediction stage.
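The three sub-steps can be sketched in Python as below. The sketch makes several assumptions that the text does not confirm: the sub-ranges are taken to be of equal width between the user's minimum play count and T, and between T and the user's maximum play count; the population standard deviation is used; and the function name `playcounts_to_ratings` and the sample data are hypothetical.

```python
import statistics

def playcounts_to_ratings(playcounts, tau=0.5):
    """Map one user's play counts to ratings 1..5 using T = mu - tau * sigma."""
    mu = statistics.mean(playcounts)
    sigma = statistics.pstdev(playcounts)   # population std dev (assumption)
    threshold = mu - tau * sigma            # equation (1)
    lowest, highest = min(playcounts), max(playcounts)

    def bucket(value, start, end, labels):
        # Split [start, end] into len(labels) equal-width sub-ranges (assumption).
        if end <= start:
            return labels[0]
        width = (end - start) / len(labels)
        index = min(int((value - start) / width), len(labels) - 1)
        return labels[index]

    ratings = []
    for count in playcounts:
        if count < threshold:               # negative-preference side
            ratings.append(bucket(count, lowest, threshold, [1, 2]))
        else:                               # positive-preference side
            ratings.append(bucket(count, threshold, highest, [3, 4, 5]))
    return threshold, ratings

# Hypothetical play counts of one user for six items.
threshold, ratings = playcounts_to_ratings([1, 2, 3, 10, 20, 40])
```

Applying the function to every user's play counts yields the columns of the item-user rating matrix.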

The construction of the optimized social-content-based similarity matrix is illustrated below. In an embodiment, this matrix is built by integrating play counts and tag information, so that tag information supplements the rating (play count) information. The first operation of this stage therefore combines the play count and tag information. The tag information is represented by the item-tag frequency matrix and the play count information by the item-user playcount matrix. After the item-user playcount matrix has been converted into the item-user rating matrix, the item-tag frequency matrix and the item-user rating matrix are merged into a hybrid information matrix called the item-user-tag (IUT) matrix, defined in Definition 1 below. Note that the element values of the item-tag frequency matrix and the item-user rating matrix are normalized item-tag frequencies and normalized item-user ratings.

Definition 1:

Suppose the database contains a set of unique users $U = \{u_1, u_2, \ldots, u_{|U|}\}$; a set of unique items $I = \{itm_1, itm_2, \ldots, itm_i, \ldots, itm_{|I|}\}$, where i is an index ranging from 1 to |I|; and a set of unique tags $T = \{tag_1, tag_2, \ldots, tag_j, \ldots, tag_{|T|}\}$, where j is an index ranging from 1 to |T|. Given an item-user rating matrix $IUR_{I \to U}$ and an item-tag frequency matrix $ITF_{I \to T}$, where $I \to U$ denotes the vector of each item in I over all users in U, the item-user-tag matrix can be defined as:

$$IUT_{I \to UT} = [v_{i,m}]$$

where UT is the set formed by combining U and T, $I \to UT$ denotes the arrangement of all items in I against all elements of UT, $IUT_{I \to UT}$ is the item-user-tag (IUT) matrix, $v_{i,m}$ is any element of $IUT_{I \to UT}$ (such as $v_{1,1}$), v is a normalized value ranging from 0 to 1, $|UT| = |U| + |T|$, and $0 < m \le |UT|$. For example, Figs. 3a, 3b, and 3c show examples of an item-tag frequency matrix, an item-user rating matrix, and an item-user-tag matrix, respectively.
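Definition 1 amounts to placing the normalized rating block and the normalized tag block side by side, one row per item. The sketch below illustrates this with NumPy. The normalization rule (dividing each block by its maximum entry) is an assumption, since the text only states that values lie between 0 and 1; the function names and sample values are hypothetical.

```python
import numpy as np

def normalize(matrix):
    """Scale a non-negative matrix into [0, 1] by its maximum entry (assumption)."""
    matrix = np.asarray(matrix, dtype=float)
    peak = matrix.max()
    return matrix / peak if peak > 0 else matrix

def build_iut(iur, itf):
    """Join the |I| x |U| rating block and the |I| x |T| tag block column-wise."""
    iur_n, itf_n = normalize(iur), normalize(itf)
    if iur_n.shape[0] != itf_n.shape[0]:
        raise ValueError("both matrices need one row per item")
    return np.hstack([iur_n, itf_n])

iur = [[5, 0, 3], [1, 4, 0]]   # 2 items x 3 users (hypothetical ratings)
itf = [[10, 0], [2, 8]]        # 2 items x 2 tags (hypothetical tag counts)
iut = build_iut(iur, itf)
print(iut.shape)               # (2, 5), i.e. |UT| = |U| + |T|
```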

After the item-user-tag matrix has been generated, the second operation of this stage runs the NMF algorithm to approximate a better information matrix, called the NMF-based matrix. NMF is a well-known matrix factorization method that will be understood by those of ordinary skill in the art; it is mainly used for factor analysis of matrices with non-negative elements. This operation therefore reduces the dimensionality of the data and uncovers latent factors, making the individual predictions more effective. Based on Definition 1, the item-user-tag matrix is approximated in this operation as the product of two sub-matrices:

$$IUT_{I \to UT}[v_{i,m}] \approx IF_{I \to F}[iv_{i,f}] \cdot UTF_{UT \to F}[utv_{m,f}]^{T}$$

where IF and UTF are the two factor matrices, each item i and each element m of UT is modeled by a set of factors F with $0 < f \le |F|$, and all elements of both matrices are non-negative. As above, $v_{i,m}$ is any element of the item-user-tag matrix $IUT_{I \to UT}$; likewise, $iv_{i,f}$ is any element of the factor matrix $IF_{I \to F}$ and $utv_{m,f}$ is any element of the factor matrix $UTF_{UT \to F}$. The remaining parameters are as defined above.

In this process, the objective function is given by equation (2) (equation not reproduced in this text), where the parameters are as defined above. Equation (2) can be minimized using two iterative update rules, which will be understood by those of ordinary skill in the art and are not described further here; the resulting updates (not reproduced in this text) use the same parameters as defined above.
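Equation (2) and its update rules do not survive in the text above. For orientation only, a widely used NMF formulation minimizes the squared Frobenius reconstruction error and applies the multiplicative update rules of Lee and Seung. Whether the patent uses exactly this objective is an assumption; the symbols follow the definitions above.

```latex
\min_{IF \ge 0,\; UTF \ge 0}\;
  \bigl\lVert IUT - IF \cdot UTF^{T} \bigr\rVert_F^{2}
  = \sum_{i=1}^{|I|} \sum_{m=1}^{|UT|}
    \Bigl( v_{i,m} - \sum_{f=1}^{|F|} iv_{i,f}\, utv_{m,f} \Bigr)^{2}

iv_{i,f} \leftarrow iv_{i,f}\,
  \frac{(IUT \cdot UTF)_{i,f}}{(IF \cdot UTF^{T} \cdot UTF)_{i,f}},
\qquad
utv_{m,f} \leftarrow utv_{m,f}\,
  \frac{(IUT^{T} \cdot IF)_{m,f}}{(UTF \cdot IF^{T} \cdot IF)_{m,f}}
```

Each update keeps the factors non-negative because it only multiplies non-negative entries by non-negative ratios.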

According to the NMF algorithm, the IUT matrix is decomposed into two factor matrices, IF and UTF. IF is the NMF-based matrix used in the next operation. In an embodiment, |F| may be set to 1000. Figs. 4a and 4b show example factor matrices based on the examples in Figs. 3a, 3b, and 3c.
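The decomposition of the IUT matrix into IF and UTF can be sketched with a few lines of NumPy. The sketch assumes the common squared-error objective with Lee-Seung multiplicative updates, which may differ from the patent's exact formulation; the function name `nmf`, the iteration count, the small constant `eps` guarding against division by zero, and the toy matrix are all hypothetical.

```python
import numpy as np

def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix v (items x columns) as IF @ UTF.T."""
    v = np.asarray(v, dtype=float)
    rng = np.random.default_rng(seed)
    n_items, n_cols = v.shape
    item_factors = rng.random((n_items, rank)) + eps   # plays the role of IF
    col_factors = rng.random((n_cols, rank)) + eps     # plays the role of UTF
    for _ in range(iterations):
        # Multiplicative updates keep every entry non-negative.
        item_factors *= (v @ col_factors) / (
            item_factors @ col_factors.T @ col_factors + eps)
        col_factors *= (v.T @ item_factors) / (
            col_factors @ item_factors.T @ item_factors + eps)
    return item_factors, col_factors

def reconstruction_error(v, item_factors, col_factors):
    """Squared Frobenius norm of the residual v - IF @ UTF.T."""
    residual = np.asarray(v, dtype=float) - item_factors @ col_factors.T
    return float((residual ** 2).sum())

# Toy IUT matrix: 3 items x 4 user/tag columns, values in [0, 1].
toy_iut = np.array([[1.0, 0.0, 0.6, 0.0],
                    [0.2, 0.8, 0.0, 1.0],
                    [0.9, 0.1, 0.5, 0.1]])
item_f, col_f = nmf(toy_iut, rank=2)
```

The text sets |F| to 1000 for real data; the toy example uses a rank of 2 only because the matrix is tiny.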

Based on the NMF-based matrix, the final operation of this sub-stage computes the similarity between items to produce the optimized social-content-based similarity matrix, which can be used to reduce the cost of online prediction. Item similarity is defined in Definition 2 below.

Definition 2:

Given the |I| unique items defined above and the |F| factors in the NMF-based matrix, the feature vectors of items x and y can be written as $\{iv_{x,1}, iv_{x,2}, \ldots, iv_{x,|F|}\}$ and $\{iv_{y,1}, iv_{y,2}, \ldots, iv_{y,|F|}\}$, respectively. The optimized social-content-based similarity between $itm_x$ and $itm_y$ is then defined by equation (3) (equation not reproduced in this text), where $OSCsim(itm_x, itm_y)$ is the optimized social-content-based similarity between item x ($itm_x$) and item y ($itm_y$), $iv_{x,f}$ is the element $iv_{i,f}$ of the factor matrix $IF_{I \to F}$ with i = x, $iv_{y,f}$ is the element $iv_{i,f}$ of $IF_{I \to F}$ with i = y, and the remaining parameters are as defined above.

According to Definition 2, the similarities between items can be computed as the elements of the social-content-optimized similarity matrix, which is defined in Definition 3.

Definition 3:

Given the |I| unique items of Definitions 1 and 2, the social-content-optimized similarity matrix is defined as $OSC_{I \to I} = [OSCsim_{x,y}]$, where OSCsim_{x,y} is the optimized social-content-based similarity between items x and y as defined in formula (3), and is an element of the matrix OSC_{I→I}. Table 2 shows an example of the social-content-optimized similarity matrix for the above embodiment, where itm_1, itm_2, itm_3 and itm_4 are different item numbers.
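As an illustration of how the matrix OSC_{I→I} might be built from the rows of IF, the sketch below assumes that the item similarity is the cosine of the two factor vectors. This is an assumption: formula (3) is not reproduced in this text, and the patent's measure may differ. The factor values and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(item_factors):
    """Pairwise similarity of every item row against every other item row."""
    return [[cosine(x, y) for y in item_factors] for x in item_factors]


# Hypothetical IF rows for three items with |F| = 2.
item_factors = [
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 3.0],
]
osc = similarity_matrix(item_factors)
```

Because the matrix is computed once offline, online prediction only needs lookups into it, which is the cost reduction the text refers to.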

The following example illustrates how the artist-tag-driven similarity matrix is constructed. Another value of the item tag information is that it can improve music recommendation by serving as information about artists. Observation shows that most users' music preferences are fairly stable with respect to certain artists; that is, mining a user's music preferences from the user's preferred artists is effective. The artists' tags can therefore be used as useful information that hints at user preferences. The first operation of this sub-stage generates the artist tag frequency matrix, which is defined in Definition 4 below.

Definition 4:

Based on Definition 1, suppose the database contains n unique artists AT = {art_1, art_2, ..., art_a, ..., art_n}, representing the set {artist 1, artist 2, ..., artist n}, where a is an index ranging from 1 to the positive integer n, and a single artist art_a comprises a set of performable items itm_z, that is, $art_a = \bigcup itm_z$ with $itm_z \in I$. Given the item-tag frequency (ITF) matrix $ITF_{I \to T} = [itf_{i,j}]$, the artist tag frequency matrix is defined as $ATX_{AT \to T} = [atf_{a,j}]$

where AT→T denotes the pairing of every artist in the set AT with every tag in the set T; atf_{a,j} denotes an element of the artist tag frequency matrix ATX_{AT→T} (for example atf_{1,1}); itf_{z,j} denotes the element itf_{i,j} of the item tag frequency matrix with i replaced by itm_z; and the remaining parameters are as defined above.

Figures 5a and 5b illustrate how the artist tag frequency matrix is generated. Figure 5a lists the artists and tags of the original item tag frequency matrix, and Figure 5b lists the artists and tags after conversion. In the example of Figures 3a, 3b and 3c, suppose the database contains two unique artists and four unique tags, where art_1 = {itm_1, itm_2} and art_2 = {itm_3, itm_4}. According to Definition 4, the tag feature vector of art_1 is {(1+1)/2, (0.8+1)/2, (0+0.2)/2, (0.6+1)/2} = {1, 0.9, 0.1, 0.8}, that is, the per-tag mean over the artist's items. Likewise, the tag feature vector of art_2 is {1, 0.5, 0.3, 0}.
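The averaging in this worked example can be sketched as below. The item tag frequencies for itm_1 and itm_2 are read off the arithmetic above (which value belongs to which of the two items is an assumption, but it does not affect the mean); the function and variable names are illustrative.

```python
def artist_tag_frequency(item_tag_freq, artist_items):
    """Average the tag-frequency vectors of each artist's items, tag by tag."""
    result = {}
    for artist, items in artist_items.items():
        vectors = [item_tag_freq[item] for item in items]
        # zip(*vectors) groups the values of one tag across the artist's items.
        result[artist] = [sum(column) / len(column) for column in zip(*vectors)]
    return result


# Tag frequencies taken from the art_1 example (four tags per item).
item_tag_freq = {
    "itm1": [1.0, 0.8, 0.0, 0.6],
    "itm2": [1.0, 1.0, 0.2, 1.0],
}
atf = artist_tag_frequency(item_tag_freq, {"art1": ["itm1", "itm2"]})
print(atf["art1"])  # approximately [1.0, 0.9, 0.1, 0.8]
```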

After the artist tag frequency matrix is generated, the artist-tag-driven similarity matrix can be derived by computing the similarity between items, as defined in Definition 5 below.

Definition 5:

Suppose art_a and art_b contain itm_x and itm_y, respectively. Based on the artist tag frequency matrix, the artist-tag-driven item similarity between itm_x and itm_y is defined by formula (4), in which ATsim(itm_x, itm_y) denotes the artist-tag-driven similarity between item x (itm_x) and item y (itm_y); ATsim(art_a, art_b) denotes the artist-tag-driven similarity between artists art_a and art_b; p is a positive integer index running over the tags of T; atf_{a,p} denotes the element atf_{a,j} of the artist tag frequency matrix ATX_{AT→T} with j replaced by p; atf_{b,p} denotes the element atf_{a,j} of ATX_{AT→T} with a and j replaced by b and p; and the remaining parameters are as defined above.

The idea behind Definition 5 is that the similarity between items can be expressed through the similarity between their artists; that is, if two users like the same artists, their music preferences are likely to be the same. The artist-tag-driven similarity matrix can therefore be derived from the similarities between items, as defined in Definition 6 below.

Definition 6:

Based on Definitions 1, 4 and 5, suppose the database contains n unique artists AT = {art_1, art_2, ..., art_a, ..., art_n}. The artist-tag-driven similarity matrix is defined as $ATS_{I \to I} = [ATsim_{x,y}]$, where ATsim_{x,y} is the artist-tag-driven similarity between items x and y and is an element of the matrix ATS_{I→I}; the remaining parameters are as defined above. Table 3 shows an example of the artist-tag-driven similarity matrix based on Figures 5a and 5b and Definition 6, where itm_1, itm_2, itm_3 and itm_4 are different item numbers.

The following example illustrates how the fused similarity matrix is constructed. Given the social-content-optimized similarity and the artist-tag-driven similarity, the two can be combined into the fusion similarity shown in Definition 7.

Definition 7:

Suppose art_a and art_b contain itm_x and itm_y, respectively. Given the corresponding social-content-based and artist-tag-driven similarities OSCsim(itm_x, itm_y) and ATsim(itm_x, itm_y), the fusion similarity (FU) between itm_x and itm_y is defined as $FUsim(itm_x, itm_y) = OSCsim(itm_x, itm_y) \times ATsim(itm_x, itm_y)$ (5), where FUsim(itm_x, itm_y) denotes the fusion similarity between item x (itm_x) and item y (itm_y), and the remaining parameters are as defined above.

Given the contents of Tables 2 and 3, an example of the fused similarity matrix is shown in Table 4, where itm_1, itm_2, itm_3 and itm_4 are different item numbers.
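Formula (5) amounts to an element-wise product of the two item-by-item matrices, which can be sketched as follows. The similarity values below are hypothetical and are not the contents of Tables 2 to 4; the function name is illustrative.

```python
def fuse(osc, ats):
    """Element-wise product of two equally sized similarity matrices (formula (5))."""
    if len(osc) != len(ats) or any(len(a) != len(b) for a, b in zip(osc, ats)):
        raise ValueError("similarity matrices must have the same shape")
    return [[o * a for o, a in zip(osc_row, ats_row)]
            for osc_row, ats_row in zip(osc, ats)]


# Hypothetical 2 x 2 matrices for two items.
osc = [[1.0, 0.5],
       [0.5, 1.0]]
ats = [[1.0, 0.8],
       [0.8, 1.0]]
fused = fuse(osc, ats)
```

One consequence of using a product is that a pair of items scores highly only when both similarities are high; a zero in either matrix removes the pair from consideration.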

The following example illustrates the online prediction described above. Because the item similarities have already been generated in the offline preprocessing stage, the main purpose of online prediction is to use them to rate the unpurchased items. The whole online prediction process is triggered by a visit from an active user. The unknown ratings are then predicted one by one, and once all of them have been predicted, the mechanism sorts the unpurchased items to produce a ranking list. Specifically, the prediction first uses the item similarities to determine the top k% of items most relevant to the target item. An unknown rating is then computed according to Definition 8.

Definition 8:

According to the above definitions, a user-item rating matrix $UIR_{U \to I} = [r_{s,i}]$ is given, which is the transpose of the item-user rating matrix described earlier, and r_{s,i} denotes an element of UIR_{U→I}. Suppose that, for a user u_s, the set of items most relevant to a target item itm_x is $RI_s = \bigcup itm_c$, where $itm_x \in I$ and $itm_c \in I$. The rating of the unpurchased item itm_x is then defined by a prediction formula in which \hat{r}_{s,x} denotes the predicted rating of user u_s for the unpurchased item itm_x; r_{s,c} denotes the rating given by user u_s to the related item itm_c; and the remaining parameters are as defined above.

Note that, based on the three similarity matrices, the prediction model can be defined as follows:

Continuing the above embodiment, suppose the user-to-item rating matrix is as shown in Table 1. If the target item for user u_3 (user 3) is itm_1 (item 1), the target rating can be expressed as follows:
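The online step can be sketched as follows. The prediction formula itself is not reproduced in this text, so the sketch assumes a common neighborhood scheme: keep the top k% most similar items among those the user has rated, then take the similarity-weighted average of their ratings. Both the weighting and the order of the two steps are assumptions, and all names and values are hypothetical rather than taken from Table 1.

```python
import math


def related_items(similarity_row, rated_items, k_percent):
    """Top k% of the user's rated items, ordered by similarity to the target."""
    candidates = sorted(
        (item for item in rated_items if item in similarity_row),
        key=lambda item: similarity_row[item],
        reverse=True,
    )
    if not candidates:
        return []
    keep = max(1, math.ceil(len(candidates) * k_percent / 100))
    return candidates[:keep]


def predict_rating(user_ratings, similarity_row, k_percent=100):
    """Similarity-weighted mean of the user's ratings on the related items.

    Returns None when no related item carries a positive weight.
    """
    related = related_items(similarity_row, user_ratings, k_percent)
    weight = sum(similarity_row[c] for c in related)
    if weight <= 0:
        return None
    return sum(similarity_row[c] * user_ratings[c] for c in related) / weight


# Hypothetical similarities between the target itm1 and the user's rated items.
sim_to_itm1 = {"itm2": 0.8, "itm3": 0.2, "itm4": 0.5}
ratings_of_user = {"itm2": 4.0, "itm3": 1.0, "itm4": 2.0}

all_neighbors = predict_rating(ratings_of_user, sim_to_itm1, k_percent=100)
top_half = predict_rating(ratings_of_user, sim_to_itm1, k_percent=50)
```

Swapping the similarity row between the OSC, ATS and fused matrices would give one predictor per similarity matrix under the same assumed scheme.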

The following describes experiments that verify the effectiveness of the above method embodiments.

The experimental data were collected from the user listening records provided by Last.fm. Last.fm is a popular social music website that offers users online listening and tagging services. Through this platform, 30 million users can tag the music they have listened to in order to describe their musical tastes, so data from Last.fm are widely used as experimental data. Our experimental data comprise two sets, dataset 1 and dataset 2 (see Table 5). Dataset 1 contains 912 users, 27303 items and 3937 tags, whereas dataset 2 contains 319 users, 40992 items and 235921 tags. The main difference is that dataset 1 contains logs from 2010 and dataset 2 contains logs from 2013. In our experiments, for each user, about 20% of the items were randomly selected as test data and the remaining items were used as training data. Table 5 gives a detailed description of the experimental data. Note that the dimension of the optimized matrix IF was reduced to 1000 in this work, and τ for the ratio conversion described earlier was set to 0.5.

The following describes the evaluation measures. As noted earlier, the recommendation process can be divided into two stages: rating prediction and item selection. Conventionally, these two stages are evaluated with two types of measures, rating error and precision. In general, rating error is used as the measure for the rating prediction stage and indicates the difference between the predicted ratings and the ground truth, while precision is used as the measure for the item selection stage and indicates the proportion of test items among the top N results of the ranking list (the N highest recommendations).

However, conventional precision is not well suited to evaluating a recommender based on Top-N Recommendation (TNR), because the ranking list contains both unpurchased items and test (rated) items. Under conventional precision, a TNR-based recommender treats the test items as the ground truth. A prediction therefore counts as successful only if the returned item is one of the test items, and any item that is not a test item is counted as an incorrect result. This measure seems inappropriate, because there is no evidence that all unpurchased items are negative for the user.

A simple example illustrates this point. Suppose the database contains six items {item 1, item 2, item 3, item 4, item 5, item 6}, and for a given user the set of test items, that is, the items the user has rated, is {item 1, item 2}. Suppose further that the TNR-based recommender produces the ranking list {item 3, item 2, item 4, item 1, item 6, item 5}. For a TNR-based recommender, precision is defined as (number correct / N), where the number correct is the number of positive items and N is the number of returned items. If N is 2, the returned items are {item 3, item 2}; only item 2 is correct, so the precision is 1/2. Overall, in this example the precision is 0/1, 1/2, 1/3, 2/4, 2/5 and 2/6 for N = 1, 2, 3, 4, 5 and 6, respectively. The example shows that treating unpurchased items as false predictions is not sound, because item 3, item 4 and item 5 may in fact be positive items for the user.
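The precision figures in this example can be reproduced with a short sketch; the function name is illustrative, and the ranking and test set are those given above.

```python
from fractions import Fraction


def precision_at_n(ranking, test_items, n):
    """Share of the first n ranked items that belong to the test (rated) set."""
    hits = sum(1 for item in ranking[:n] if item in test_items)
    return Fraction(hits, n)


ranking = [3, 2, 4, 1, 6, 5]   # ranking list from the example
test_items = {1, 2}            # items the user has rated

precisions = [precision_at_n(ranking, test_items, n) for n in range(1, 7)]
# Fractions are stored in lowest terms, so 2/4 appears as 1/2 and 2/6 as 1/3.
```

Items 3, 4, 6 and 5 lower the score wherever they appear, even though nothing is known about whether the user would like them, which is the weakness the text describes.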

Therefore, in this example, a measure called NDCG (Normalized Discounted Cumulative Gain) is used as the evaluation metric. In this evaluation, all test items of each user are first sorted by normalized play count, where the play count is treated as the relevance judgment or score. Note that the rated items in the test set are treated as unrated items, so the unrated items consist of test items and unpurchased items. After ratings have been predicted for all unrated items, a ranking list consisting of test items and unpurchased items is produced, from which the top N test items are selected. NDCG is defined by formula (7), in which IDCG (ideal normalized discounted cumulative gain) is a normalization factor, e denotes the ranking position, and rel denotes the relevance judgment or score based on the normalized play count. From this definition, the difference between conventional precision and NDCG is that NDCG considers only the test items, whereas conventional precision considers both test items and unpurchased items.

Tables 6 and 7 illustrate how NDCG is computed. Suppose the evaluation involves six unrated items, consisting of a set of test items {itm_1, itm_2, itm_3, itm_4, itm_5} and one unpurchased item {itm_6}. In Table 6, the ranked list is {itm_4, itm_6, itm_1, itm_3, itm_5, itm_2}, with predicted ratings {4.7, 4.2, 3.8, 3.5, 3.3, 3.0} and corresponding relevance scores {0.9, NULL, 0.4, 0.7, 0.1, 0.2}. Because no relevance score is assigned to itm_6, it cannot be taken into account in the evaluation. The DCG, computed without the unpurchased item itm_6, is therefore 1.468. Likewise, the IDCG derived from Table 7 is 1.511, so the NDCG is 1.468/1.511 = 0.97.
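The arithmetic of this example can be checked with the sketch below. Formula (7) is not reproduced in this text, so the gain (2^rel - 1) and the discount log2(e + 1) are assumptions; they are chosen because, with the unpurchased item removed before positions are assigned, they give a DCG of about 1.468, an IDCG of about 1.5116 (the quoted 1.511 up to rounding) and an NDCG of about 0.97. Function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and discount log2(e + 1).

    Assumption: this gain/discount pair is inferred from the worked example,
    not taken from formula (7).
    """
    return sum((2 ** rel - 1) / math.log2(position + 1)
               for position, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """NDCG over the test items only; items without a relevance score are skipped."""
    rated = [rel for rel in ranked_relevances if rel is not None]
    ideal = dcg(sorted(rated, reverse=True))
    return dcg(rated) / ideal if ideal > 0 else 0.0


# Relevance scores in ranked order: itm4, itm6 (unpurchased), itm1, itm3, itm5, itm2.
ranked_relevances = [0.9, None, 0.4, 0.7, 0.1, 0.2]

test_only = [rel for rel in ranked_relevances if rel is not None]
dcg_value = dcg(test_only)
idcg_value = dcg(sorted(test_only, reverse=True))
ndcg_value = ndcg(ranked_relevances)
```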

Besides NDCG, the other evaluation measure is RMSE (root mean square error), defined as $RMSE = \sqrt{\frac{\sum_{test} (r - \hat{r})^2}{|test|}}$, where r denotes the ground truth, \hat{r} denotes the predicted rating, and |test| denotes the number of ratings in the test dataset (test). Overall, NDCG indicates how the predicted ranking compares with the ideal ranking, while RMSE indicates the error between the predicted ratings and the ground truth. That is, the lower the RMSE, the lower the error, the higher the accuracy and the better the recommendation. By contrast, the higher the NDCG, the better the recommendation. To ensure sound experimental evaluation, both NDCG and RMSE are used to validate the above method.
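For reference, RMSE over a test set can be computed as in the sketch below, which uses the standard definition of root mean square error; the rating values are hypothetical.

```python
import math


def rmse(truth, predicted):
    """Root mean square error between ground-truth and predicted ratings."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((r - p) ** 2 for r, p in zip(truth, predicted))
    return math.sqrt(squared_error / len(truth))


# Hypothetical ground-truth and predicted ratings for three test items.
truth = [4.0, 2.0, 5.0]
predicted = [3.5, 2.5, 4.0]
error = rmse(truth, predicted)  # sqrt((0.25 + 0.25 + 1.0) / 3)
```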

The following describes the experimental results. The experiments address three questions: (1) an evaluation of the individual models proposed by the invention, namely OSCP, ATP and MMR; (2) a comparison, in terms of RMSE and NDCG, between the proposed fusion model and existing well-known recommender systems; and (3) a performance evaluation of all compared methods.

The following evaluates the effectiveness of the individual models proposed by the invention on dataset 1. Before evaluating the proposed multimodal recommender, the effectiveness of its individual models must be established. In this evaluation, the top k% related items described above are used as the basis for prediction. Figure 6 shows the experimental results and highlights several points. First, RMSE increases as the number of related items increases: more related items bring more noise, which degrades performance. Second, OSCP outperforms ATP, and MMR performs best. This indicates that, without OSCP or ATP, the fusion model could not achieve high-quality music recommendation. In short, OSCP and ATP both play key roles in the fusion model, and because MMR is the best prediction model, it is adopted in this example as the main proposed method for comparison with other well-known recommender systems.

The following compares the effectiveness of the proposed fusion model with existing well-known recommender systems. To make the experiments solid, we selected 15 strong conventional recommender systems for comparison, including memory-based, content-based and model-based systems. Table 8 describes the compared methods. Most of these recommender systems are widely known. Because some of the compared methods produce only ranking lists, those methods are evaluated by NDCG.

Table 8: List of compared methods

The following presents an experimental analysis of rating sparsity on dataset 1, as shown in Table 9. In this experiment, a total of 40866 item ratings were selected as unknown values to be predicted. However, because the conventional IB method suffers from rating sparsity, only 34870 item ratings could be predicted successfully by it; that is, 5996 item ratings could not be predicted by the conventional IB method. By contrast, only 66 item ratings could not be predicted by the MMR disclosed by the invention. Clearly, the rating prediction coverage of MMR is far greater than that of the conventional IB method, which indicates that the MMR method can substantially alleviate the rating sparsity problem. Based on this analysis, the number of test item ratings is set to 34870 in the comparative experiments that follow.

The following describes the RMSE evaluation on dataset 1. Because rating sparsity and rating diversity may lead to unsatisfactory predictions, they need to be investigated through detailed experimental evaluation. In the subsequent experiments, the compared methods are therefore evaluated with RMSE and NDCG to show how well they cope with rating sparsity and rating diversity. This section first presents the RMSE comparison of the different methods. Note that predictions based on the top k% related items or users are used as the basis of this experiment.

Figure 7 presents several important RMSE results. First, the MMR method of the invention performs best and the conventional UB method performs worst. Second, the conventional user-based methods perform worse than the conventional item-based methods. Third, the results of the conventional item-based method and the conventional similarity fusion (SF) method are quite close. Fourth, although the conventional UPTR method is content-based, it does not outperform the conventional item-based and SF methods; a possible reason is that its recommendations are generated from a user-based perspective. Fifth, the RMSE of the conventional UB and UPTR methods decreases as the number of related users increases. This behavior is very different from that of the conventional IB and SF methods and the MMR method of the invention. It suggests that more users lead to better results for user-based methods, whereas more items do not improve the predictions of item-based methods, possibly because users with similar listening and tagging behavior are highly correlated while items lack this property.

Furthermore, the conventional model-based methods, such as SVM, DT, Bayes, SVD++, MF, NMF and NIMF, do not consider the top k% related items when predicting ratings. An overall evaluation therefore compares the best RMSE of the conventional model-based methods, the memory-based methods and the MMR method of the invention. Figure 8 shows the following. First, the conventional Bayes method performs worst. Second, the conventional SVD++, IB, MF, NMF and SF methods perform quite similarly and slightly better than the conventional UPTR method. Third, overall, the conventional item-based and SF methods outperform the conventional user-based and model-based methods. Fourth, although the conventional SF method fuses the conventional user-based and item-based CF methods, it still does not yield clearly better results than the conventional item-based CF method. Overall, the results show that the method of the invention outperforms the other conventional methods in terms of RMSE; that is, from the RMSE perspective, the proposed method effectively fuses the social-content-optimized and artist-tag-driven similarities to predict users' ratings.

The following describes the NDCG evaluation on dataset 1, in which the MMR of the invention is compared with the conventional RWR, PureSVD, NIMF, SVD++, IB, TAGA and UB methods. Figure 9 shows the NDCG results, which are explained as follows. First, the conventional UB method performs worst and the MMR method of the invention performs best; with 10% related items considered in this experiment, the NDCG of MMR reaches about 0.8. Second, the results of the conventional content-based and model-based methods are very close, and both are better than the conventional memory-based methods; that is, RMSE results alone do not reveal a recommender's true ability to predict user preferences. Overall, the proposed method delivers better music recommendation than the many compared methods.

The following describes the scalability evaluation on dataset 1 using RMSE, which aims to show the effectiveness of the method under different training set sizes. Figure 10 shows that the more training data, the higher the effectiveness. The effectiveness at training sizes of 80% and 90% is quite close, while that at 70% and 80% differs slightly, which indicates that the proposed method produces stable results across training sizes.

The following describes the RMSE and NDCG evaluation on dataset 2. The evaluation on dataset 1 shows that the proposed method far outperforms all the compared methods in music recommendation in terms of RMSE, NDCG and sparsity. To make the results more robust, a further evaluation was carried out on dataset 2. As shown in Table 5, the two datasets exhibit different listening and tagging behaviors. Our aim is therefore to investigate whether effectiveness varies with changes in listening and tagging behavior. Because the best compared methods on dataset 1 were the conventional IB method for RMSE and the conventional RWR method for NDCG, IB and RWR are again chosen as the compared methods on dataset 2. Figures 11 and 12 show that, for both RMSE and NDCG, the proposed MMR method still outperforms the conventional IB and RWR methods on dataset 2. In other words, the method of the invention achieves high-quality music recommendation even when listening and tagging behaviors change.

The following describes the efficiency evaluation. Besides effectiveness, the per-user efficiency is another concern in practical applications. A comparative experiment was therefore conducted to show the time each compared method takes to produce predictions for one user. Table 10 shows the results. Overall, almost all methods can produce a user's predictions within 1 second; only the conventional SVM and RWR methods need more than 1 second. From a usability standpoint, a result returned in about one second is acceptable. Accordingly, only the conventional RWR method fails to meet the time requirement. Moreover, although the conventional RWR method is highly effective, its execution time is too long for users to accept.

The following discusses the experimental results above.

(1) In the experiments, OSCP combines rating and tag information, whereas the conventional IB method uses only rating information. Figures 5 to 7 compare the conventional OSCP and IB methods, and the OSCP method outperforms the IB method. This result indicates that, when dealing with rating diversity and rating sparsity, the combined information is more robust than rating information alone. It also indicates that tag information does help a recommender system predict user preferences.

(2) Figures 6 to 8 show that the method proposed by the present invention is the best method in terms of both RMSE and NDCG. This is an important finding, because these results show that performing well and stably on both RMSE and NDCG is not easy. For example, the conventional IB method is good on RMSE but poor on NDCG: according to observation, its predictions can approach the true ratings, yet the resulting ranking does not approach the true ranking. In addition, the conventional IB and SVD++ methods both perform well and stably on RMSE and NDCG. Therefore, a method that is good on RMSE is not necessarily poor on NDCG. Rather, only a method that performs well on both RMSE and NDCG is truly reliable for music recommendation.

(3) The compared methods are in fact fusion-based methods that take several information sources and combine different CF techniques into a fusion mechanism, such as MMR, SF, NIMF, SVD++, TagA, RWR, and UPTR. However, the experimental results show that not all fusion-based methods benefit both RMSE and NDCG. On average, the method proposed by the present invention yields stable results on both RMSE and NDCG.

(4) Beyond the experimental analysis above, the need for the NMF-based optimization also has to be justified. For this reason, the present invention implemented a further method, NMF-based neighborhood prediction (NMFN), which uses an optimized rating matrix (the user-item rating matrix). Figure 13 shows the corresponding experimental results: the NMFN method, which uses the optimized rating matrix, outperforms the IB method, which uses the unoptimized rating matrix. This result indicates that the optimization step does improve rating prediction. Moreover, although NMFN uses an optimized rating matrix, its performance still falls short of the MMR method of the present invention, which uses integrated and optimized information. Compared with the other methods, the MMR method of the present invention is thus a truly reliable method for music recommendation.
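
The NMF-based optimization discussed in point (4) can be illustrated with a small sketch. This is not the patent's implementation: the multiplicative update rule (Lee and Seung), the cosine similarity measure, the rank, the iteration count, and every function and variable name below are assumptions chosen only to show the general idea of factorizing a non-negative item matrix and then comparing items through their latent factors.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h, used to check the fit."""
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factorize non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


# Hypothetical item-by-(user and tag) matrix: rows are items, columns mix
# user ratings and tag frequencies. The values are made up for illustration.
item_features = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
item_factors, _ = nmf(item_features, rank=2)
item_similarity = [[cosine(a, b) for b in item_factors] for a in item_factors]
```

In this sketch, item-to-item similarity is computed on the low-rank item factors rather than on the raw sparse rows, which is one plausible reading of why an optimized matrix helps neighborhood prediction under sparsity. Whether the patent compares factor rows or rows of the reconstructed matrix is not stated in this section.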

In addition, according to other embodiments of the present invention, the above music recommendation method may be implemented as a computer program product written in a programming language (such as C or Java). The computer program product may be stored in a computer-readable storage device, such as an optical disc. When a computer loads the computer program product from the storage device, it can execute the music recommendation method according to that computer program product.

Although the present invention has been disclosed by way of preferred embodiments, these are not intended to limit the invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

Claims (5)

1. A music recommendation method, executed by a computer that recommends music, the method comprising the steps of: storing, in a storage of the computer, an item-user play-count matrix, an item-tag frequency matrix, and an artist-tag frequency matrix; performing operations with a processor of the computer to convert the item-user play-count matrix into an item-user rating matrix, merge the item-user rating matrix and the item-tag frequency matrix into an item-user-tag matrix, compute an NMF-based matrix from the item-user-tag matrix according to a non-negative matrix factorization method, compute a social-content-optimized similarity matrix from the NMF-based matrix according to a similarity algorithm, and compute an artist-tag-driven similarity matrix from the artist-tag frequency matrix according to the similarity algorithm; responding, via the processor, to a query from a user by generating several target items from several unpurchased items of the user, determining several related items using the social-content-optimized similarity matrix and the artist-tag-driven similarity matrix, predicting a rating of each unpurchased item from the related items with a rating prediction algorithm, and sorting the unpurchased items by the rating to generate a ranking list; and outputting the ranking list on a display of the computer to recommend several music items in the ranking list; wherein the processor converts elements of the item-user play-count matrix into elements of the item-user rating matrix according to a threshold defined as T = μ - τ * σ, where T is the threshold, μ is the mean of the user's play counts, σ is the standard deviation of the user's play counts, and τ is a weight parameter.

2. The music recommendation method of claim 1, wherein the related items are determined as the top k% of related items in the social-content-optimized similarity matrix and the artist-tag-driven similarity matrix, where k is a positive number.

3. The music recommendation method of claim 1, wherein the range of the threshold is divided into two equivalent sub-ranges, each of the two sub-ranges contains a set of rating values, and the sets contain different numbers of rating values.

4. The music recommendation method of claim 1, wherein the weight parameter is 0.5.

5. A computer program product which, after being loaded and executed by a computer, enables the computer to perform the music recommendation method of any one of claims 1 to 4.
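
The threshold in claim 1 and the top-k% selection in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the helper names are hypothetical, the population standard deviation is assumed (the claims do not say whether population or sample deviation is meant), and rounding the k% cutoff up to a whole item is likewise an assumption. The mapping from the threshold to concrete rating values (claim 3) is not detailed here and is not modeled.

```python
import math
import statistics


def play_count_threshold(play_counts, tau=0.5):
    """T = mu - tau * sigma over one user's play counts (tau = 0.5 per claim 4)."""
    mu = statistics.mean(play_counts)
    sigma = statistics.pstdev(play_counts)  # assumption: population std dev
    return mu - tau * sigma


def top_k_percent(similarities, k):
    """Hypothetical helper: names of the k% most similar items.

    similarities maps an item name to its similarity score; the cutoff is
    rounded up and at least one item is kept (both are assumptions).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    keep = max(1, math.ceil(len(similarities) * k / 100))
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:keep]


# Made-up play counts for one user: mean 5, population std dev 2.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
threshold = play_count_threshold(counts)           # 5 - 0.5 * 2
above = [count >= threshold for count in counts]   # which items clear T
```

Because μ and σ are computed per user, the same raw play count can fall on different sides of T for a heavy listener and a casual listener, which is the point of normalizing play counts before treating them as ratings.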

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106114498A TWI639092B (en) 2017-05-02 2017-05-02 Music recommendation method and computer program product thereof

Publications (2)

Publication Number Publication Date
TWI639092B true TWI639092B (en) 2018-10-21
TW201843601A TW201843601A (en) 2018-12-16

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6704433B2 (en) 1999-12-27 2004-03-09 Matsushita Electric Industrial Co., Ltd. Human tracking device, human tracking method and recording medium recording program thereof
TW200606681A (en) 2004-08-13 2006-02-16 Parasoft Corp Music recommendation system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Basu, C., Haym, H., and Cohen, W.W., "Recommendation as Classification: Using Social and Content-Based Information in Recommendation," In Proceedings of National Conference on Artificial Intelligence, pp. 11-15 (1998), https://www.aaai.org/Papers/Workshops/1998/WS-98-08/WS98-08-002.pdf
