CN113268629A - Multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference

Publication number
CN113268629A
Authority
CN
China
Prior art keywords
song
list
node
singing
song list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110477214.8A
Other languages
Chinese (zh)
Other versions
CN113268629B (en)
Inventor
王晨旭
郭晨野
杨煜
索凯强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202110477214.8A
Publication of CN113268629A
Application granted
Publication of CN113268629B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/64 - Browsing; Visualisation therefor
    • G06F16/65 - Clustering; Classification
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference comprises the following steps: constructing a song list heterogeneous graph from the heterogeneous data of the song list training set; performing neighbor sampling that fuses node preference on each song list through the song list heterogeneous graph, to obtain song list information containing song neighbor features and song list information containing singer neighbor features; using the word2vec technique to produce a continuous feature representation of the song lists from the song list information containing song neighbor features and the song list information containing singer neighbor features; clustering the continuous feature representations of the song lists with a spectral clustering algorithm to obtain a song list clustering result; and calculating the weight of each navigation class label in each cluster category according to the song list clustering result, then completing label recommendation for the target song list using locality-sensitive hashing. The invention has a simple structure and high recommendation efficiency. Compared with the traditional collaborative filtering method, the song list label recommendation method achieves higher accuracy and faster recommendation.

Description

Multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference
Technical Field
The invention belongs to the field of music recommendation systems, and particularly relates to a multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference.
Background
In recent years, the song list feature offered by NetEase Cloud Music has broken away from the traditional mode of organizing songs by singer and album. A playback mode centered on song lists meets users' need to discover and search for music through social sharing and personalized recommendation. Song list labels play an important role in improving the listening experience of online music users and in encouraging users to create personalized song lists. With the rapid development of big data, the characteristics of the songs in a song list can be inferred implicitly from a large number of song lists carrying expert labels, which makes song list label recommendation possible. Collaborative filtering is widely used in music recommendation and generally consists of three steps: data collection, similarity calculation, and generation of recommendation results. However, in the big data era, because song list data is high-dimensional and sparse and because collaborative filtering focuses excessively on interaction relationships, the traditional algorithm tends to recommend popular labels and takes a long time to produce recommendations, which makes it difficult to apply in practice.
Disclosure of Invention
In order to overcome the problems in the prior art, the invention aims to provide a multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference.
In order to achieve this purpose, the invention adopts the following technical scheme:
A multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference comprises the following steps:
Step 1: constructing a song list heterogeneous graph from the heterogeneous data of the song list training set;
Step 2: performing neighbor sampling that fuses node preference on each song list through the song list heterogeneous graph, using a song-based meta-path and a singer-based meta-path, to obtain song list information containing song neighbor features and song list information containing singer neighbor features;
Step 3: using the word2vec technique to produce a continuous feature representation of the song lists from the song list information containing song neighbor features and the song list information containing singer neighbor features;
Step 4: clustering the continuous feature representations of the song lists with a spectral clustering algorithm to obtain a song list clustering result;
Step 5: calculating the weight of each navigation class label in each cluster category according to the song list clustering result, and then completing label recommendation for the target song list using locality-sensitive hashing.
Further, the specific process of step 1 is as follows:
Step 1.1: connecting song list nodes and song nodes as follows: if the song list of the $i$-th song list $L_i$ does not contain song $m_j$, there is no edge between the song list node and the song node; otherwise, there is a song-song list edge between the song list node and the song node.
Step 1.2: connecting song list nodes and singer nodes as follows: if the singer list of the $i$-th song list $L_i$ does not contain singer $s_j$, there is no edge between the song list node and the singer node; otherwise, there is a singer-song list edge between the song list node and the singer node.
Step 1.3: connecting song list nodes and user nodes as follows: if the user of the $i$-th song list $L_i$ is not $u_j$, there is no edge between the song list node and the user node; otherwise, there is a user-song list edge between the song list node and the user node.
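The graph construction in steps 1.1 to 1.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The input layout (`song_lists` with `songs`, `singers` and `user` keys), the function name `build_song_list_graph`, and the `(type, id)` node keys are hypothetical. Because the edge formulas are only available as images, the song and singer edge weights are assumed here to be the occurrence count divided by the number of distinct songs or singers, and the user edge weight is assumed to be 1.

```python
from collections import Counter, defaultdict


def build_song_list_graph(song_lists):
    """Build an undirected, weighted heterogeneous graph as an adjacency dict.

    Nodes are (type, id) tuples with type in {"L", "m", "s", "u"}:
    song list, song, singer and user.
    """
    # Number of song lists in which each song / singer occurs.
    song_count = Counter(m for sl in song_lists.values() for m in set(sl["songs"]))
    singer_count = Counter(s for sl in song_lists.values() for s in set(sl["singers"]))
    m_num = len(song_count)    # assumed meaning of "total number of songs"
    s_num = len(singer_count)  # assumed meaning of "total number of singers"

    graph = defaultdict(dict)

    def add_edge(a, b, weight):
        graph[a][b] = weight
        graph[b][a] = weight

    for list_id, sl in song_lists.items():
        l_node = ("L", list_id)
        # An edge exists only if the song list contains the song / singer.
        for m in set(sl["songs"]):
            add_edge(l_node, ("m", m), song_count[m] / m_num)      # assumed weight
        for s in set(sl["singers"]):
            add_edge(l_node, ("s", s), singer_count[s] / s_num)    # assumed weight
        add_edge(l_node, ("u", sl["user"]), 1.0)                   # assumed weight
    return graph


song_lists = {
    "L1": {"songs": ["m1", "m2"], "singers": ["s1"], "user": "u1"},
    "L2": {"songs": ["m2", "m3"], "singers": ["s1", "s2"], "user": "u2"},
}
graph = build_song_list_graph(song_lists)
```

Storing both directions of every edge keeps neighbor lookups symmetric, which matches the undirected graph that the sampling steps walk over.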
Further, in step 1.1, the song-song list edge is given by the following formula:
[Equation: Figure BDA00030475225900000224]
where the count term is the number of occurrences of song $m_j$, $M_{num}$ is the total number of songs, $G_L$ is the song list heterogeneous graph, and the feature term is the song feature. In step 1.2, the singer-song list edge is given by the following formula:
[Equation: Figure BDA00030475225900000228]
where the count term is the number of occurrences of singer $s_j$, $S_{num}$ is the total number of singers, and the feature term is the singer feature.
In step 1.3, the user-song list edge is given by the following formula:
[Equation: Figure BDA0003047522590000034]
where the feature term is the user feature.
Further, the specific process of step 2 is as follows:
Step 2.1: if the number $N_m$ of first-order neighbor song nodes of a song list node is greater than the configured number $N_{ms}$ of first-order song neighbors per song list node, the node preferences of all first-order neighbor song nodes of the song list node are used as relative selection weights, and $N_{ms}$ first-order neighbor song nodes are randomly selected according to these weights as the candidate song neighbors.
Step 2.2: if the number $N_m$ of first-order neighbor song nodes of the song list node is smaller than the configured number $N_{ms}$, the first-order neighbor song node preferences of the song list node are used as relative weights, $(N_{ms} - N_m)$ first-order neighbor song nodes are randomly selected according to these weights, and together with the first-order neighbor song nodes of the song list node they form a total of $N_{ms}$ candidate song neighbors.
Step 2.3: starting from the candidate song neighbors and following the song-song list edges, the first-order song list neighbor set of the target song list node and the corresponding song list node preferences are calculated.
Step 2.4: using the song list node preferences as weights, $N_L$ first-order song list neighbors of the song list node are randomly selected to form its first-order song list neighbor set.
Step 2.5: using the song list neighbor set, the node preferences of the second-order song nodes of the song list node are calculated according to the following formula, and $2 N_L$ second-order neighbor songs are then randomly selected according to these node preferences:
[Equation: Figure BDA00030475225900000329]
Step 2.6: the song list sequence of the song list node is assembled by the following formula:
[Equation: Figure BDA00030475225900000332]
Step 2.7: steps 2.1 to 2.6 are repeated for all song list nodes to sample the song list information containing song neighbor features.
Further, in step 2.3, the song list node preference is calculated by the following formula:
[Equation: Figure BDA0003047522590000041]
where the intersection term is the number of songs shared by the song neighbors of the $i$-th song list $L_i$ and the song neighbors of the $j$-th song list $L_j$;
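The overlap-based preference of step 2.3 can be illustrated with a short Python sketch. The exact formula is only available as an image, so the normalization used here (each shared-song count divided by the sum of all shared-song counts) is an assumption, and the function name `song_list_preferences` and its inputs are hypothetical.

```python
def song_list_preferences(target_songs, other_lists):
    """Return {list_id: preference} derived from shared-song counts.

    Assumed rule: preference is the shared-song count of a neighbor,
    normalized so that all preferences sum to 1. Song lists that share
    no song with the target are not neighbors and are left out.
    """
    target = set(target_songs)
    overlap = {lid: len(target & set(songs)) for lid, songs in other_lists.items()}
    overlap = {lid: n for lid, n in overlap.items() if n > 0}
    total = sum(overlap.values())
    return {lid: n / total for lid, n in overlap.items()} if total else {}


prefs = song_list_preferences(
    ["m1", "m2", "m3"],
    {"L2": ["m2", "m3", "m9"], "L3": ["m1"], "L4": ["m7"]},
)
```

In this example `L2` shares two songs with the target and `L3` shares one, so `L2` receives twice the preference of `L3`, while `L4` is not a neighbor at all.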
Further, in step 2.6, the song list sequence has a length of $N_{ms} + 2 N_L$, where $N_L$ denotes the number of first-order song list neighbors.
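The preference-weighted sampling of steps 2.1 to 2.6 can be sketched as below. This is an illustrative sketch only: the helper names are hypothetical, the neighbor sets are assumed to be non-empty dictionaries mapping song ids to positive preferences, the case $N_m = N_{ms}$ (not covered by the text) is treated as keeping all neighbors, and the $2 N_L$ second-order songs are assumed to be drawn with replacement.

```python
import random


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k distinct items; each draw is proportional to the remaining weights."""
    pool = list(zip(items, weights))
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool])[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


def candidate_song_neighbors(neighbors, n_ms, rng):
    """neighbors: {song_id: preference}. Returns exactly n_ms song ids."""
    songs = list(neighbors)
    weights = [neighbors[s] for s in songs]
    if len(songs) >= n_ms:
        # Step 2.1: too many neighbors, keep n_ms of them by preference.
        return weighted_sample_without_replacement(songs, weights, n_ms, rng)
    # Step 2.2: too few neighbors, pad with (n_ms - n_m) preference-weighted draws.
    padding = rng.choices(songs, weights=weights, k=n_ms - len(songs))
    return songs + padding


def song_list_sequence(first_order, second_order, n_ms, n_l, rng):
    """Assemble a sequence of length n_ms + 2 * n_l (steps 2.5 and 2.6)."""
    first = candidate_song_neighbors(first_order, n_ms, rng)
    songs = list(second_order)
    second = rng.choices(songs, weights=[second_order[s] for s in songs], k=2 * n_l)
    return first + second


rng = random.Random(0)
first_order = {"m1": 0.5, "m2": 0.3, "m3": 0.2}
second_order = {"m4": 0.6, "m5": 0.4}
seq = song_list_sequence(first_order, second_order, n_ms=4, n_l=2, rng=rng)
```

Padding short neighbor lists up to $N_{ms}$ gives every song list a sequence of the same length, which is convenient when the sequences are later fed to word2vec as sentences.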
Further, the specific process of step 3 is as follows:
Step 3.1: initializing a word2vec language model $Model_m$ for the song features of the song lists and a word2vec language model $Model_s$ for the singer features, where the output vector length and the number of training iterations are input parameters of $Model_m$ and $Model_s$;
Step 3.2: inputting the song list information containing song neighbor features as sentences into $Model_m$, and inputting the song list information containing singer neighbor features as sentences into $Model_s$;
Step 3.3: after the song feature model $Model_m$ and the singer feature model $Model_s$ have been trained, outputting the song feature vectorized representation set $vec_m$ and the singer feature vectorized representation set $vec_s$ of the song list node set, as shown in the following formula:
[Equation: Figure BDA0003047522590000043]
where the elements of $vec_m$ are the song feature vectorized representations of the individual song list nodes, and the elements of $vec_s$ are their singer feature vectorized representations;
Step 3.4: the vectorized representation of a song list node is expressed by the following formula:
[Equation: Figure BDA0003047522590000049]
Further, the specific process of step 4 is as follows:
Step 4.1: initializing a clustering model $Model_{cluster}$ according to the number of cluster categories n_cluster;
Step 4.2: inputting the training set song list data in continuous feature representation into $Model_{cluster}$ for cluster learning, to obtain the clustering result of the song list training set;
Step 4.3: based on the labels of the song list training set and the clustering result of the song list training set, calculating the navigation class weights of each category, as shown in the following formula:
[Equation: Figure BDA0003047522590000051]
where the label terms are the label combinations of song lists $L_1$, $L_2$, ..., $L_n$; $w_{label}$ denotes the weight of each navigation class in each category; $c_i$ denotes the $i$-th cluster category, of which there are five in total; $w_i$ denotes the navigation class weights under the $i$-th category, consisting of the weights of the language, style (genre), emotion, scene and theme navigation classes; the count term is the number of song lists of the language class in the $i$-th category; and $n_i$ is the total number of song lists in the $i$-th category;
Step 4.4: inputting the feature set of the song list test set into the clustering model $Model_{cluster}$ and calculating the clustering result of the test set.
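The navigation class weights of step 4.3 can be illustrated as follows. The text defines the language weight of a category as the number of language-class song lists in that category divided by the total number of song lists $n_i$ in it; this sketch assumes the other navigation classes are computed the same way. The function name, the input dictionaries and the example labels (`Mandarin`, `Rock`, `Happy`) are hypothetical.

```python
from collections import defaultdict


def navigation_class_weights(cluster_of, labels_of, class_of_label):
    """Return {cluster: {navigation_class: weight}}.

    weight = (song lists in the cluster carrying at least one label of
    that navigation class) / (all song lists in the cluster).
    """
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for list_id, cluster in cluster_of.items():
        sizes[cluster] += 1
        # Count each navigation class at most once per song list.
        for nav in {class_of_label[t] for t in labels_of[list_id]}:
            counts[cluster][nav] += 1
    return {
        c: {nav: n / sizes[c] for nav, n in counts[c].items()}
        for c in sizes
    }


cluster_of = {"L1": 0, "L2": 0, "L3": 1}
labels_of = {"L1": ["Mandarin", "Rock"], "L2": ["Mandarin"], "L3": ["Happy"]}
class_of_label = {"Mandarin": "language", "Rock": "style", "Happy": "emotion"}
weights = navigation_class_weights(cluster_of, labels_of, class_of_label)
```

A navigation class that never occurs in a category is simply absent from that category's dictionary, which corresponds to a weight of zero in step 5.4.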
Further, the specific process of step 5 is as follows:
Step 5.1: grouping the song lists of the song list training set by the navigation class to which their label combinations belong, to obtain grouped song list sets;
Step 5.2: performing LSH/MinHash bucketing on the grouped song list sets to obtain the language-class hash bucket $B_{yz}$, the theme-class hash bucket $B_{zt}$, the scene-class hash bucket $B_{cj}$, the style-class hash bucket $B_{fg}$ and the emotion-class hash bucket $B_{qg}$, where for each of $B_{yz}$, $B_{zt}$, $B_{cj}$, $B_{fg}$ and $B_{qg}$ two data hash buckets are generated, one from the song sets of the song lists and one from the singer sets of the song lists, as shown in the following formula:
[Equation: Figure BDA00030475225900000514]
where the first term denotes the song list-song hash bucket of a navigation class, and the second term denotes the song list-singer hash bucket;
Step 5.3: taking a target song list $L_{rec}$ from the test set, finding the category to which it belongs in the clustering result of the test set, and obtaining the weight of each navigation class according to that category.
Step 5.4: mapping the song set and the singer set of the target song list into the song list-song hash buckets and song list-singer hash buckets of those navigation classes whose weight $w_i$ in the category is not zero, and then searching for the set $Sim_{ij}$ of song lists similar to the target song list.
Step 5.5: according to the set $Sim_{ij}$ of song lists similar to the target song list and the weight $w_{ij}$, updating the recommendation index of each label; the update formula is as follows:
[Equation: Figure BDA0003047522590000062]
where $w_{ij}$ is the weight (Figure BDA0003047522590000063), Figure BDA0003047522590000064 is the recommendation index of the $i$-th label, and $t_i$ is the $i$-th label.
Step 5.6: finally, sorting the label set $R_{Tag}$ to be recommended in descending order of the labels' recommendation indices, and selecting the top N labels with the highest recommendation indices as the final recommendation result.
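Steps 5.2 to 5.6 can be sketched in plain Python with a small MinHash/LSH banding scheme. This is a hedged illustration, not the patented implementation: the constants `NUM_HASHES` and `BANDS`, all function names, and the example labels are hypothetical, and only the song-set buckets are shown (the singer-set buckets would be built the same way). The update formula is only available as an image, so the rule used here (add the navigation class weight to a label's index once for every similar song list carrying that label) is an assumption. Item sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict

NUM_HASHES, BANDS = 32, 16        # hypothetical settings
ROWS = NUM_HASHES // BANDS


def minhash_signature(items):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        )
        for seed in range(NUM_HASHES)
    ]


def band_keys(items):
    """Split the signature into bands; each band is one bucket key."""
    sig = minhash_signature(items)
    return [(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)]


def build_buckets(sets_by_id):
    buckets = defaultdict(set)
    for list_id, items in sets_by_id.items():
        for key in band_keys(items):
            buckets[key].add(list_id)
    return buckets


def query(buckets, items):
    """Song lists sharing at least one band bucket with the query set."""
    found = set()
    for key in band_keys(items):
        found |= buckets.get(key, set())
    return found


def recommend(target_songs, buckets_by_class, class_weights, labels_of, class_of_label, top_n):
    index = defaultdict(float)
    for nav, buckets in buckets_by_class.items():
        weight = class_weights.get(nav, 0.0)
        if weight == 0.0:          # step 5.4: skip zero-weight navigation classes
            continue
        for list_id in query(buckets, target_songs):
            for label in labels_of[list_id]:
                if class_of_label[label] == nav:
                    index[label] += weight   # assumed update rule
    ranked = sorted(index.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_n]]


labels_of = {"L1": ["Mandarin", "Rock"], "L2": ["Mandarin"], "L3": ["Happy"]}
class_of_label = {"Mandarin": "language", "Rock": "style", "Happy": "emotion"}
songs_of = {lid: {"m1", "m2", "m3"} for lid in labels_of}
groups = {"language": ["L1", "L2"], "style": ["L1"], "emotion": ["L3"]}
buckets_by_class = {
    nav: build_buckets({lid: songs_of[lid] for lid in ids}) for nav, ids in groups.items()
}
class_weights = {"language": 1.0, "style": 0.5, "emotion": 0.0}
top = recommend({"m1", "m2", "m3"}, buckets_by_class, class_weights,
                labels_of, class_of_label, top_n=2)
```

The example uses identical song sets so that every lookup is guaranteed to hit; with real data, banding makes a match likely only when the Jaccard similarity between two song sets is high, which is what lets the lookup avoid comparing the target against every training song list.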
Compared with the prior art, the invention has the following beneficial effects. To address the low recommendation efficiency and low recommendation accuracy of the traditional collaborative filtering algorithm, and its inability to take the relevance among song lists into account, the invention provides a multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference. The song list label recommendation process mainly comprises four steps. First, a song list heterogeneous graph is built from the heterogeneous data of the song lists, and the neighbor information of each song list is extracted with a meta-path-based song list feature sampling method that fuses node preference on the heterogeneous graph, yielding song list information containing song neighbor features and song list information containing singer neighbor features. Second, the word2vec technique is used to produce a continuous representation of the song list features. Third, a spectral clustering algorithm computes the cluster category of each song list feature representation, and the weight of each navigation class label in each category is calculated from the clustered categories of the song lists. Finally, locality-sensitive hashing combines the song list set features containing song list neighbors with the category group features of the song list set to complete the final song list multi-label recommendation task, which improves the accuracy of the recommendation method. The invention has a simple structure and high recommendation efficiency. Compared with the traditional collaborative filtering method, the song list label recommendation method achieves higher accuracy and faster recommendation.
Drawings
FIG. 1 is a diagram of the proposed model architecture.
FIG. 2 is an example of a song list heterogeneous graph.
FIG. 3 is a schematic diagram of the song neighbor feature sampling method for song list nodes.
FIG. 4 is a diagram of the vectorized feature representation of song list nodes.
FIG. 5 is a schematic diagram of the song list node cluster analysis method.
FIG. 6 shows the navigation class proportions of each category in the song list clustering.
FIG. 7 is a hash bucket diagram of the training set song list-song sets.
FIG. 8 is a schematic diagram of the multi-label recommendation method based on the song list clustering results.
FIG. 9 shows the execution time and recommendation accuracy of the conventional method (TRAD) and the present invention (GRAC).
FIG. 10 shows the results of ablation experiments with different optimization modes of the present invention (GRAC) at two data set scales.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention discloses a multi-label recommendation method for song lists based on a heterogeneous graph fusing node preference. It uses crawled NetEase Cloud Music song list data on the scale of one hundred thousand song lists, and divides the data in a ratio of 1:9 into test set song list heterogeneous data $L_{test}$ and training set song list heterogeneous data $L_{train}$.
First, the training set song list heterogeneous data $L_{train}$ of NetEase Cloud Music is used to create the song list heterogeneous graph $G_L$, and the neighbor information of each song list is extracted with the meta-path-based song list feature sampling method that fuses node preference on the heterogeneous graph, to obtain the song heterogeneous feature information and the singer heterogeneous feature information of the song list.
Second, the skip-gram algorithm of the word2vec technique is used to produce a continuous representation of the song list features.
Then, a spectral clustering algorithm is used to compute the cluster category of each song list feature representation, and the weight $w_{label}$ of each navigation class label in each category is calculated from the clustered categories of the training set song lists.
Finally, the song list set features containing song list neighbors are combined with the category group features of the song list set, and locality-sensitive hashing is used to complete the final song list multi-label recommendation task.
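For reference, the standard skip-gram training objective of word2vec is given below. Mapping it onto this method is an interpretation rather than a statement from the source: a sentence $w_1, \dots, w_T$ is assumed to be one sampled song list sequence, each $w_t$ one song (or singer) token, $c$ the context window size, and $v$, $v'$ the input and output embeddings over the token vocabulary $W$.

```latex
\max \; \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right),
\qquad
p\left(w_O \mid w_I\right) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w \in W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```

Under this reading, tokens that frequently appear in the same sampled sequences end up with similar embeddings, which is what makes the vectors usable for clustering.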
The method specifically comprises the following steps:
Step 1: constructing the song list heterogeneous graph $G_L$ from the heterogeneous data of the song list training set. The specific process is as follows:
The NetEase Cloud Music song list data set comprises a set $L = \{L_1, L_2, \ldots, L_n\}$ of $n$ song lists. Each song list in the set contains 3 types of feature information, and each type is itself a set of the corresponding features, namely the song features, the singer features and the user features. A song list $L_i$ in the song list set can be expressed as $FV(L_i)$. The song list heterogeneous graph is described as follows:
The song list heterogeneous graph is an undirected graph, denoted $G_L = \{V, E\}$. The graph contains four types of nodes, denoted $V = \{V_L, V_m, V_s, V_u\}$, and each type consists of the set of nodes of that type: song list nodes, song nodes, singer nodes and user nodes. The graph also contains three types of edges, denoted $E = \{E_{lm}, E_{ls}, E_{lu}\}$, and each type consists of the set of edges of that type: song list-song edges, song list-singer edges and song list-user edges. Each edge also carries the preference information between its nodes, which is used as the weight of the edge.
Step 1.1: given a song list node and a song node, they are connected according to the following formula. If the song list of the $i$-th song list $L_i$ does not contain song $m_j$, there is no edge between the song list node and the song node; otherwise there is an edge between them, namely the song-song list edge. In the formula, the count term is the number of occurrences of song $m_j$ and $M_{num}$ is the total number of songs:
[Equation: Figure BDA00030475225900000823]
Step 1.2: given a song list node and a singer node, they are connected according to the following formula. If the singer list of the $i$-th song list $L_i$ does not contain singer $s_j$, there is no edge between the song list node and the singer node; otherwise there is an edge between them, namely the singer-song list edge. In the formula, the count term is the number of occurrences of singer $s_j$ and $S_{num}$ is the total number of singers:
[Equation: Figure BDA00030475225900000834]
Step 1.3: given a song list node and a user node, they are connected according to the following formula. If the user of the $i$-th song list $L_i$ is not $u_j$, there is no edge between the song list node and the user node; otherwise there is an edge between them, namely the user-song list edge. Since most users create only one to three song lists, the preference weight of user nodes in the heterogeneous graph is set to 1:
[Equation: Figure BDA0003047522590000093]
Step 2: through the song list heterogeneous graph $G_L$, using the song-based meta-path $R_m$ and the singer-based meta-path $R_s$, neighbor sampling that fuses node preference is performed on each song list to obtain song list information containing song neighbor features and song list information containing singer neighbor features;
[Equation: Figure BDA0003047522590000094]
[Equation: Figure BDA00030475225900000927]
The specific process is as follows:
Step 2.1: given a song list node, if the number $N_m$ of its first-order neighbor song nodes is greater than the configured number $N_{ms}$ of first-order song neighbors per song list node, the node preferences of all first-order neighbor song nodes of the song list node are used as relative selection weights, and $N_{ms}$ first-order neighbor song nodes are randomly selected according to these weights as the candidate song neighbors.
Step 2.2: if the number $N_m$ of first-order neighbor song nodes of the song list node is smaller than the configured number $N_{ms}$, the first-order neighbor song node preferences of the song list node are used as relative weights, $(N_{ms} - N_m)$ first-order neighbor song nodes are randomly selected according to these weights, and together with the first-order neighbor song nodes of the song list node they form a first-order song neighbor set whose total size is still $N_{ms}$, which serves as the candidate song neighbors.
Step 2.3: starting from the candidate song neighbors and following the song-song list edges in the heterogeneous graph, the first-order song list neighbor set of the target song list node and the corresponding song list node preferences are calculated. The calculation formula is as follows:
[Equation: Figure BDA00030475225900000921]
where the intersection term is the number of songs shared by the song neighbors of the $i$-th song list $L_i$ and the song neighbors of the $j$-th song list $L_j$, that is, the number of songs the two song lists have in common;
Step 2.4: using the song list node preferences as weights, $N_L$ first-order song list neighbors of the song list node are randomly selected to form its first-order song list neighbor set.
Step 2.5: neighbor set by song list
Figure BDA0003047522590000101
Calculating the first-order neighbor song node preference of each neighbor song list, namely the song list node
Figure BDA0003047522590000102
Node preference of second order song node
Figure BDA0003047522590000103
Finally, as shown in the following formula, the node preference
Figure BDA0003047522590000104
Randomly selecting 2 x NLA second-order neighbor song
Figure BDA0003047522590000105
Figure BDA0003047522590000106
Step 2.6: finally, for song list node
Figure BDA0003047522590000107
the song list
Figure BDA0003047522590000108
is assembled by the following formula, where
Figure BDA0003047522590000109
has length N_ms + 2*N_L:
Figure BDA00030475225900001010
Step 2.7: repeat steps 2.1-2.6 for all song list nodes
Figure BDA00030475225900001011
to sample the song list information containing song neighbor features.
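Steps 2.1 and 2.2 can be sketched as a small weighted-sampling routine. This is an illustrative sketch only, not the patent's implementation: the function name `sample_first_order` is hypothetical, drawing without replacement in step 2.1 is an assumption, and the case where N_m equals N_ms (not covered by the text) is handled like step 2.1.

```python
import random

def sample_first_order(neighbors, preferences, n_ms, rng=None):
    """Return exactly n_ms candidate song neighbors for one song list node.

    neighbors   -- first-order neighbor song ids of the song list
    preferences -- positive node preferences, same order as neighbors
    n_ms        -- required number of first-order song neighbors (N_ms)
    """
    rng = rng or random.Random()
    if not neighbors:
        return []
    if len(neighbors) >= n_ms:
        # Step 2.1: enough neighbors, draw n_ms distinct ones by weight
        # (sampling without replacement is an assumption of this sketch).
        pool = list(zip(neighbors, preferences))
        chosen = []
        for _ in range(n_ms):
            weights = [w for _, w in pool]
            pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            chosen.append(pool.pop(pick)[0])
        return chosen
    # Step 2.2: too few neighbors, keep them all and pad with
    # (N_ms - N_m) weighted re-draws from the same neighbors.
    extra = rng.choices(neighbors, weights=preferences,
                        k=n_ms - len(neighbors))
    return list(neighbors) + extra

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_first_order(["a", "b", "c", "d"], [0.4, 0.3, 0.2, 0.1], 2, rng))
    print(sample_first_order(["a", "b"], [0.5, 0.5], 5, rng))
```

Either branch returns a list of length N_ms, so every song list contributes a first-order part of the same size to the later "sentence".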
Step 3: represent the song list information containing neighbor features as continuous features using the word2vec technique; the specific process is as follows:
Step 3.1: initialize the word2vec language model Model_m for the song features of the song lists and the word2vec language model Model_s for the singer features; the output vector length and the number of training iterations of Model_m and Model_s are input parameters;
Step 3.2: input the song list information containing song neighbor features into Model_m as sentences, and input the song list information containing singer neighbor features into Model_s as sentences;
Step 3.3: after Model_m and Model_s have been trained, output the song feature vector set vec_m and the singer feature vector set vec_s of the song list nodes, as shown in the following formula:
Figure BDA00030475225900001012
where
Figure BDA00030475225900001013
is, for song list node
Figure BDA00030475225900001014
the vector representation of its song features, and
Figure BDA00030475225900001015
is the vector representation of its singer features.
Step 3.4: song list node
Figure BDA00030475225900001016
has the vector representation
Figure BDA00030475225900001017
which is defined as the average of the song feature vector
Figure BDA00030475225900001018
and the singer feature vector
Figure BDA00030475225900001019
as shown in the following formula:
Figure BDA00030475225900001020
where
Figure BDA00030475225900001021
is the song feature vector and
Figure BDA00030475225900001022
is the singer feature vector.
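The averaging in step 3.4 can be illustrated with plain Python lists. This is a minimal sketch under assumptions: the dictionaries `vec_m` and `vec_s` stand in for the outputs of the two trained word2vec models, and their values are made-up numbers rather than real embeddings.

```python
def average_vectors(song_vec, singer_vec):
    """Element-wise mean of a song-feature vector and a singer-feature vector."""
    if len(song_vec) != len(singer_vec):
        raise ValueError("both vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(song_vec, singer_vec)]

# Hypothetical per-song-list vectors from Model_m and Model_s.
vec_m = {"L1": [0.5, 1.0, -1.0], "L2": [0.0, 2.0, 4.0]}
vec_s = {"L1": [1.5, 0.0, 1.0], "L2": [2.0, 2.0, 0.0]}

# Final song list representation: mean of the two views.
vec_l = {name: average_vectors(vec_m[name], vec_s[name]) for name in vec_m}

if __name__ == "__main__":
    print(vec_l)
```

Both models must use the same output vector length for the average to be defined, which is why the sketch rejects vectors of different lengths.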
Step 4: cluster the continuous feature representations of the song lists with a spectral clustering algorithm to obtain the song list clustering result; the specific process is as follows:
Step 4.1: initialize the clustering model Model_cluster with the number of clusters n_cluster;
Step 4.2: input the song list training set data in continuous feature form
Figure BDA0003047522590000111
into Model_cluster for cluster learning, obtaining the clustering result of the song list training set
Figure BDA0003047522590000112
Step 4.3: from the song list training set tags
Figure BDA00030475225900001119
and the clustering result of the song list training set
Figure BDA0003047522590000113
calculate the navigation class weights of each category, as shown in the following formula:
Figure BDA0003047522590000114
where
Figure BDA0003047522590000115
is the tag combination of song list L_1,
Figure BDA0003047522590000116
is the tag combination of song list L_2,
Figure BDA0003047522590000117
is the tag combination of song list L_n, w_label denotes the weights of the navigation classes in each category, c_i denotes the i-th cluster category (five categories in total, i = 1, ..., 5), and w_i is the navigation class weight under the i-th category. The components of w_i are, under the i-th cluster category, the weight of the language navigation class
Figure BDA0003047522590000118
the weight of the style navigation class
Figure BDA0003047522590000119
the weight of the emotion navigation class
Figure BDA00030475225900001110
the weight of the scene navigation class
Figure BDA00030475225900001111
and the weight of the theme navigation class
Figure BDA00030475225900001112
The weight is calculated in the same way for every navigation class;
Figure BDA00030475225900001113
is calculated as shown in the above formula, where
Figure BDA00030475225900001114
is the number of song lists in the i-th category that belong to the language class and n_i is the total number of song lists in the i-th category.
Step 4.4: input the feature set of the song list test set
Figure BDA00030475225900001115
into the clustering model Model_cluster and compute the clustering result of the test set
Figure BDA00030475225900001116
Step 4.5: output the test set song list categories
Figure BDA00030475225900001117
and the weights w_label of the navigation classes in each category.
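The weight calculation of step 4.3 amounts to a per-cluster frequency count. The sketch below is an illustration under assumptions: it reads the formula as "number of song lists in cluster i that carry at least one tag of a navigation class, divided by the number of song lists in cluster i", and the names `navigation_weights`, `cluster_ids` and `nav_sets` are hypothetical.

```python
from collections import defaultdict

NAV_CLASSES = ("language", "style", "emotion", "scene", "theme")

def navigation_weights(cluster_ids, nav_sets):
    """Compute w_label: for each cluster, the share of its song lists
    that belong to each navigation class.

    cluster_ids -- cluster id of every training song list
    nav_sets    -- for every song list, the set of navigation classes
                   covered by its tag combination (subset of NAV_CLASSES)
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: dict.fromkeys(NAV_CLASSES, 0))
    for cid, navs in zip(cluster_ids, nav_sets):
        totals[cid] += 1
        for nav in navs:
            hits[cid][nav] += 1
    return {cid: {nav: hits[cid][nav] / totals[cid] for nav in NAV_CLASSES}
            for cid in totals}

if __name__ == "__main__":
    clusters = [0, 0, 1, 0]
    navs = [{"emotion"}, {"style", "language", "emotion"}, {"scene"}, {"style"}]
    print(navigation_weights(clusters, navs))
```

Because a song list can cover several navigation classes, the weights within one cluster do not have to sum to 1.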
Step 5: calculate the weight of each navigation class tag in each category from the song list clustering result, then complete tag recommendation for the target song list using locality-sensitive hashing. The specific process is as follows:
Step 5.1: according to the song list training set
Figure BDA00030475225900001118
group the song lists in the training set by the navigation class to which their tag combinations belong, obtaining the grouped song list sets;
Step 5.2: perform LSH/MinHash bucketing on the grouped song list sets to obtain the language hash bucket B_yz, the theme hash bucket B_zt, the scene hash bucket B_cj, the style hash bucket B_fg and the emotion hash bucket B_qg; for each of these navigation classes, two kinds of hash buckets are generated, one from the song sets of the song lists and one from their singer sets, as shown in the following formula:
Figure BDA0003047522590000121
Step 5.3: from the test set
Figure BDA0003047522590000122
take a target song list L_rec; in the test set song list categories
Figure BDA0003047522590000123
find the category it belongs to
Figure BDA0003047522590000124
and, according to this category
Figure BDA0003047522590000125
obtain the weight of each navigation class
Figure BDA0003047522590000126
Step 5.4: map the song set of the target song list
Figure BDA0003047522590000127
and its singer set
Figure BDA0003047522590000128
respectively into the buckets of those navigation classes whose weight w_i in the category
Figure BDA0003047522590000129
is not zero, namely the song list-song hash buckets
Figure BDA00030475225900001210
and the song list-singer hash buckets
Figure BDA00030475225900001211
Then retrieve the set Sim_ij of song lists similar to the target song list, expressed as follows:
Figure BDA00030475225900001212
Step 5.5: according to the set Sim_ij of song lists similar to the target song list and the weight w_ij, update the recommendation index of each tag
Figure BDA00030475225900001213
The update formula is as follows:
Figure BDA00030475225900001214
where the weight w_ij is
Figure BDA00030475225900001215
or
Figure BDA00030475225900001216
is the recommendation index of the i-th tag, and t_i is the i-th tag.
Step 5.6: finally, sort the candidate tag set R_Tag by the recommendation indexes of the tags
Figure BDA00030475225900001217
in descending order and select the top N tags with the highest recommendation index as the final recommendation result; the final output is shown in the following formula:
Figure BDA00030475225900001218
where N is a parameter that can be set according to actual needs, Rec_T is the recommendation result,
Figure BDA00030475225900001219
is the recommendation index of the first tag,
Figure BDA00030475225900001220
is the recommendation index of the second tag, and
Figure BDA00030475225900001221
is the recommendation index of the N-th tag.
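Steps 5.5 and 5.6 can be sketched as a weighted vote followed by a top-N cut. The exact update formula is only available as an image in the source, so the rule used here is an assumption: every similar song list retrieved from the bucket of navigation class j adds the weight of class j to each of its tags in that class. The names `recommend_tags`, `similar_by_nav`, `nav_weights` and `tags_of` are hypothetical.

```python
from collections import Counter

def recommend_tags(similar_by_nav, nav_weights, tags_of, top_n):
    """Rank tags by an accumulated recommendation index and return the top N.

    similar_by_nav -- navigation class -> ids of similar song lists
    nav_weights    -- navigation class -> weight in the target's cluster
    tags_of        -- song list id -> list of (tag, navigation class) pairs
    """
    index = Counter()
    for nav, similar in similar_by_nav.items():
        weight = nav_weights.get(nav, 0.0)
        if weight == 0:
            continue  # classes with zero weight are skipped, as in step 5.4
        for list_id in similar:
            for tag, tag_nav in tags_of[list_id]:
                if tag_nav == nav:
                    index[tag] += weight  # assumed update rule
    # Sort by index, descending; ties are broken alphabetically.
    ranked = sorted(index.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]

if __name__ == "__main__":
    similar = {"emotion": ["A", "B"], "style": ["A"]}
    weights = {"emotion": 0.6, "style": 0.3, "scene": 0.0}
    tags = {
        "A": [("sad", "emotion"), ("rock", "style")],
        "B": [("sad", "emotion"), ("lonely", "emotion")],
    }
    print(recommend_tags(similar, weights, tags, top_n=2))
```

In this toy input "sad" is voted for twice with weight 0.6, "lonely" once with 0.6 and "rock" once with 0.3, so the two emotion tags are ranked first.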
First, the crawled NetEase Cloud Music song list data is divided at a ratio of 1:9 into the test set song lists L_test and the training set song lists L_train. L_train is used to build the song list heterogeneous graph G_L, and the neighbor information of the song lists is extracted with the node-preference-based meta-path feature sampling method on the heterogeneous graph, yielding the song heterogeneous feature information of the song lists
Figure BDA0003047522590000131
and the singer heterogeneous feature information
Figure BDA0003047522590000132
The skip-gram algorithm of the word2vec technique is then used to obtain the continuous feature representation of the song lists
Figure BDA0003047522590000135
Next, a spectral clustering algorithm computes the cluster categories of the song list feature representations, and from the cluster categories of the training set song lists
Figure BDA0003047522590000133
the weight w_label of each navigation class tag in each category is calculated. Finally, the song list set features containing song list neighbors are combined with the category group features of the song list set, and the song list multi-label recommendation task is completed with locality-sensitive hashing, achieving accurate recommendation of song list tags.
The invention provides a heterogeneous graph song list multi-label recommendation model fusing node preference. Starting from the characteristics of song list data, which is sparse and unevenly associated, the model exploits the associations among the multi-dimensional information of the song list data. It represents song list information with a continuous feature representation that contains neighbor features, based on the heterogeneous graph and the word2vec technique, analyzes the cluster characteristics of the song lists with a spectral clustering algorithm, and applies the song list category weights w_label. Finally, it combines the song list set features containing song list neighbors with the category group features of the song list set and completes the song list multi-label recommendation task with locality-sensitive hashing. The recommendation accuracy of the proposed tag recommendation model is greatly improved, and because the computational complexity of the model is low, the recommendation efficiency is also improved to a certain extent. Compared with recommendation models based on deep learning, the proposed method is more compatible with the collaborative-filtering-based recommendation methods used by current online music platforms, so the cost and risk of upgrading a system's recommendation algorithm are lower.
A node preference fused heteromorphic graph song list multi-label recommendation system comprises
The following are specific examples.
Example 1
TABLE 1 Experimental data set
Figure BDA0003047522590000134
Referring to Table 1, the invention first crawls 100,000 song lists from the NetEase Cloud Music platform, containing 73 distinct tags Tag and 16449 distinct song list tag combinations L_Tag, and divides the song list data at a ratio of 1:9 into the test set song lists L_test and the training set song lists L_train.
Referring to FIG. 1, FIG. 1 is the model architecture diagram of the heterogeneous graph song list multi-label recommendation method fusing node preference provided by the invention. It mainly comprises the heterogeneous graph construction and node-preference-fused neighbor sampling method, the song list node feature representation and cluster analysis method, and the multi-label recommendation method based on the song list clustering result.
Referring to FIG. 2, an example of the song list heterogeneous graph, the invention takes the song information, singer information and user information in the song list data as the heterogeneous feature information of a song list and builds the song list heterogeneous graph from it. The NetEase Cloud Music song list data set contains a set of n song lists L = {L_1, L_2, ..., L_n}. Each song list in the set contains 3 types of feature information
Figure BDA0003047522590000141
Each type of feature in turn has a corresponding feature set, namely the song features
Figure BDA0003047522590000142
the singer features
Figure BDA0003047522590000143
and the user features
Figure BDA0003047522590000144
A song list L_i in the song list set can be expressed as FV(L_i). The heterogeneous graph is defined as follows:
The song list heterogeneous graph is an undirected graph, denoted G_L = {V, E}. The graph has four types of nodes, V = {V_L, V_m, V_s, V_u}, and each type consists of a set of nodes of that type: the song list nodes
Figure BDA0003047522590000145
the song nodes
Figure BDA0003047522590000146
the singer nodes
Figure BDA0003047522590000147
and the user nodes
Figure BDA0003047522590000148
The heterogeneous graph also has three types of edges, E = {E_lm, E_ls, E_lu}, and each type consists of a set of edges of that type: the song list-song edges
Figure BDA0003047522590000149
the song list-singer edges
Figure BDA00030475225900001410
and the song list-user edges
Figure BDA00030475225900001411
Each edge also carries the preference information between its two nodes, which serves as the edge weight.
The construction steps of the song list heterogeneous graph are as follows:
Step 1: given a song list node
Figure BDA00030475225900001412
and a song node
Figure BDA00030475225900001413
connect the two heterogeneous nodes according to the following formula: if the songs of song list L_i do not include m_j, there is no edge between the two nodes; otherwise there is an edge between them, whose preference is denoted
Figure BDA00030475225900001414
where
Figure BDA00030475225900001415
is the number of occurrences of song m_j and M_num is the total number of songs:
Figure BDA00030475225900001416
Step 2: given a song list node
Figure BDA00030475225900001417
and a singer node
Figure BDA00030475225900001418
connect the two heterogeneous nodes according to the following formula: if the singers of song list L_i do not include singer s_j, there is no edge between the two nodes; otherwise there is an edge between them, whose preference is denoted
Figure BDA00030475225900001419
where
Figure BDA0003047522590000151
is the number of occurrences of singer s_j and S_num is the total number of singers:
Figure BDA0003047522590000152
Step 3: given a song list node
Figure BDA0003047522590000153
and a user node
Figure BDA0003047522590000154
connect the two heterogeneous nodes according to the following formula: if song list L_i was not created by user u_j, there is no edge between the two nodes; otherwise there is an edge between them, whose preference is denoted
Figure BDA0003047522590000155
Because most users create only one to three song lists, the preference of user nodes in the heterogeneous graph is uniformly set to 1:
Figure BDA0003047522590000156
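The song list-song edges of construction step 1 can be sketched as a dictionary of weighted edges. This is an illustration under assumptions: the preference is read as "occurrences of the song across all song lists divided by M_num", and M_num is interpreted as the total number of song occurrences, which is only one possible reading of the source. The name `build_song_edges` is hypothetical.

```python
from collections import Counter

def build_song_edges(song_lists):
    """Build weighted song list-song edges.

    song_lists -- song list id -> list of song ids
    Returns {(song_list_id, song_id): preference}; a pair that is not
    in the result has no edge.
    """
    # Count in how many song lists each song occurs.
    occurrences = Counter(
        song for songs in song_lists.values() for song in set(songs)
    )
    total = sum(occurrences.values())  # assumed meaning of M_num
    return {
        (list_id, song): occurrences[song] / total
        for list_id, songs in song_lists.items()
        for song in set(songs)
    }

if __name__ == "__main__":
    edges = build_song_edges({"L1": ["m1", "m2"], "L2": ["m1"]})
    for edge, preference in sorted(edges.items()):
        print(edge, round(preference, 3))
```

The song list-singer edges of step 2 would follow the same pattern with singer ids in place of song ids, while step 3 simply assigns the constant preference 1.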
FIG. 3 is a schematic diagram of the heterogeneous-graph-based method for sampling the song neighbor features of song list nodes. Referring to FIG. 3, in the song list multi-label recommendation task, the better the heterogeneous information of the song lists is used, the more accurately the association between song lists can be represented. The song list neighbor sampling method therefore takes the association between nodes fully into account through node preference, which makes the sampled neighbors more meaningful. In the NetEase Cloud Music song list heterogeneous graph G_L, sampling is performed along the song-based meta-path R_m and the singer-based meta-path R_s, which are defined as follows:
Figure BDA0003047522590000157
Figure BDA0003047522590000158
Sampling along the song-based meta-path R_m and along the singer-based meta-path R_s works the same way, so the song list information containing singer neighbor features is obtained in the same way as the song list information containing song neighbor features. The method takes two input parameters, the first-order song neighbor number N_ms of a song list node and the first-order song list neighbor number N_L, and proceeds as follows:
Step 1: select a song list node
Figure BDA0003047522590000159
If, for the current song list node, the first-order neighbor song nodes
Figure BDA00030475225900001510
have a number N_m greater than N_ms, use the preferences of all first-order neighbor song nodes of the current song list node
Figure BDA00030475225900001511
as relative selection weights and randomly select N_ms first-order neighbor song nodes according to these weights as the candidate song neighbors
Figure BDA00030475225900001512
Step 2: if, for song list node
Figure BDA00030475225900001513
the first-order neighbor song nodes
Figure BDA00030475225900001514
have a number N_m less than N_ms, use its first-order neighbor song node preferences
Figure BDA00030475225900001515
as relative weights and randomly select (N_ms - N_m) first-order neighbor song nodes according to these weights; together with the existing song neighbor set they form the candidate song neighbors, N_ms in total
Figure BDA0003047522590000161
Step 3: starting from the candidate song neighbors
Figure BDA0003047522590000162
and following the song list-song edges of the heterogeneous graph
Figure BDA0003047522590000163
compute, for the target song list node
Figure BDA0003047522590000164
its first-order song list neighbor set
Figure BDA0003047522590000165
The song list node preference is calculated by the following formula, where
Figure BDA00030475225900001622
is the size of the intersection between the song neighbors of L_i and the song neighbors of L_j, that is, the number of songs the two song lists have in common:
Figure BDA0003047522590000166
Step 4: using the song list node preferences
Figure BDA0003047522590000167
as weights, randomly select for song list node
Figure BDA0003047522590000168
N_L first-order song list neighbors, which form the neighbor set
Figure BDA0003047522590000169
Step 5: from the song list neighbor set
Figure BDA00030475225900001610
compute the first-order neighbor song node preferences of each neighbor song list, which are, for song list node
Figure BDA00030475225900001611
the node preferences of its second-order song nodes. Finally, as shown in the following formula, the node preferences
Figure BDA00030475225900001612
are used as weights to randomly select 2*N_L second-order neighbor songs
Figure BDA00030475225900001613
Figure BDA00030475225900001614
Step 6: finally, for song list node
Figure BDA00030475225900001615
the song list
Figure BDA00030475225900001616
is assembled as shown in the following formula, where
Figure BDA00030475225900001617
has length N_ms + 2*N_L:
Figure BDA00030475225900001618
Step 7: repeat the above steps for all song list nodes
Figure BDA00030475225900001619
to sample the song lists containing song neighbor features.
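Steps 3 to 6 can be sketched as follows. This is a simplified illustration, not the patent's procedure: neighbor song lists are weighted by the number of shared songs as in step 3, but the second-order songs are drawn uniformly from the pooled songs of the chosen neighbor lists instead of by their own node preferences, and sampling is done with replacement. The name `sample_with_second_order` and its arguments are hypothetical.

```python
import random

def sample_with_second_order(target, song_lists, candidates, n_l, rng=None):
    """Extend the candidate song neighbors with 2 * n_l second-order songs.

    target     -- id of the target song list
    song_lists -- song list id -> set of song ids
    candidates -- first-order candidate song neighbors (length N_ms)
    n_l        -- number of first-order song list neighbors (N_L)
    """
    rng = rng or random.Random()
    own = song_lists[target]
    reachable = set(candidates) & own
    # Step 3: preference of a neighbor list = number of shared songs.
    overlap = {
        list_id: len(own & songs)
        for list_id, songs in song_lists.items()
        if list_id != target and reachable & songs
    }
    if not overlap:
        return list(candidates)  # no neighbor song list can be reached
    # Step 4: draw N_L neighbor song lists, weighted by overlap.
    ids = sorted(overlap)
    neighbor_lists = rng.choices(ids, weights=[overlap[i] for i in ids], k=n_l)
    # Step 5 (simplified): pool the neighbors' songs and draw 2 * N_L of them.
    pool = sorted(s for list_id in set(neighbor_lists) for s in song_lists[list_id])
    second_order = rng.choices(pool, k=2 * n_l)
    # Step 6: the sampled "sentence" has length N_ms + 2 * N_L.
    return list(candidates) + second_order

if __name__ == "__main__":
    lists = {"T": {"a", "b", "c"}, "X": {"a", "b", "x"}, "Y": {"c", "y"}, "Z": {"z"}}
    print(sample_with_second_order("T", lists, ["a", "b", "c"], 2, random.Random(1)))
```

Song list "Z" shares no song with the target, so it can never be chosen as a neighbor and its songs never enter the sample.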
FIG. 4 is a schematic diagram of the vector representation of song list node features. Referring to FIG. 4, to further characterize the song list data from the perspective of node features, the invention performs cluster analysis on the song list nodes containing neighbor information to obtain five groups of song list data. Because the song list features are discrete, they must first be given a continuous feature representation. In the song list heterogeneous graph G_L, the node degrees follow a power-law distribution and so do the node preferences, so the song list features obtained by preference-based neighbor sampling also follow a power law. This distribution resembles the word frequency distribution of text in the word2vec model, so the skip-gram model of word2vec is used to learn the embedded representation of the song list nodes. That is, for a song list node, the song list containing neighbors obtained by walk sampling
Figure BDA00030475225900001620
and the singer list
Figure BDA00030475225900001621
are treated as sentences of the word2vec model, and the songs m_i and singers s_i in the lists are treated as words and fed into the skip-gram model to learn the vector representation of the song list nodes. The method takes two parameters, the output vector length size and the number of model iterations iter_num, and proceeds as follows:
Step 1: initialize the word2vec language model Model_m for the song features of the song lists and the word2vec language model Model_s for the singer features; the output vector length and the number of model iterations are input parameters;
Step 2: input the song feature set of each song list node into Model_m as a sentence, and input the singer feature set of each song list node into Model_s as a sentence;
Step 3: after the two models have been trained, output the song feature vector set vec_m and the singer feature vector set vec_s of the song list nodes, as shown in the following formula, where
Figure BDA0003047522590000171
is, for song list node
Figure BDA0003047522590000172
the vector representation of its song features, and
Figure BDA0003047522590000173
is the vector representation of its singer features:
Figure BDA0003047522590000174
Step 4: for song list node
Figure BDA0003047522590000175
the vector representation is given by the following formula, which takes the average of the song feature vector
Figure BDA0003047522590000176
and the singer feature vector
Figure BDA0003047522590000177
:
Figure BDA0003047522590000178
TABLE 2 NetEase Cloud Music song list tag categories
Figure BDA0003047522590000179
The song list multi-label recommendation task can be regarded as a multi-class classification task. As shown in Table 2, the tags of the NetEase Cloud Music song list data fall into five tag navigation classes, and each navigation class in turn contains multiple tag categories.
FIG. 5 is a schematic diagram of song list feature cluster analysis, and FIG. 6 shows the proportion of navigation classes in each category of the song list clustering. Referring to FIGS. 5 and 6, cluster analysis is a data mining tool from machine learning that can be used for data classification: similar data is grouped into the same class, dissimilar data into different classes, and the differences between classes can reveal implicit associations among them. The invention applies cluster analysis to the song list data, but it does not stop once a song list has been assigned to a class; instead it calculates, from the cluster categories of the song lists, the weight of each navigation class tag in each category for use in the subsequent recommendation. The reason is as follows. If the tag combination of a song list is ["lonely", "hurt", "thoughts"], its navigation class is ["emotion"], and assigning the song list to an "emotion" cluster is appropriate. If the tag combination is ["rock", "Europe and America", "excitement"], its navigation classes are ["style", "language", "emotion"]; the song list can be assigned to only one cluster, and no single assignment is appropriate. The invention therefore clusters only the training set features, calculates the navigation class weights of each category, and then applies the same clustering model to the song list test set data. The clustering model takes the number of clusters n_cluster as its input parameter. The song list feature clustering steps are as follows:
Step 1: initialize the clustering model Model_cluster with the number of clusters n_cluster;
Step 2: input the song list training set data
Figure BDA0003047522590000188
into Model_cluster for cluster learning.
Step 3: from the song list training set tags
Figure BDA0003047522590000189
and the clustering result of the song list training set
Figure BDA0003047522590000181
calculate the navigation class weights of each category, as shown in the following formula. Here w_label denotes the weights of the navigation classes in each category, c_i denotes the i-th cluster category (five categories in total), and w_i is the navigation class weight under the i-th category; its components are the weights of the language, style, emotion, scene and theme navigation classes under the i-th cluster category. The weight is calculated in the same way for every navigation class;
Figure BDA0003047522590000182
is calculated as shown in the following formula, where
Figure BDA0003047522590000183
is the number of song lists in the i-th category that belong to the language class and n_i is the total number of song lists in the i-th category:
Figure BDA0003047522590000184
Step 4: input the feature set of the song list test set
Figure BDA0003047522590000185
into the clustering model Model_cluster and compute the clustering result of the test set
Figure BDA0003047522590000186
Step 5: output the test set song list categories
Figure BDA0003047522590000187
and the weights w_label of the navigation classes in each category.
FIG. 7 is a schematic diagram of the hash bucket division of the song list-song sets in the training set, and FIG. 8 is a schematic diagram of the multi-label recommendation method based on the song list clustering result. Starting from the five navigation class tags of song lists, ["language", "scene", "theme", "style", "emotion"], the invention divides the training set song lists into five groups by navigation class, computes the song list hash buckets of each navigation class with the LSH/MinHash algorithm, and finally completes the tag recommendation task in combination with the song list cluster analysis result. The method takes four parameters: the song list training set data containing neighbor information
Figure BDA0003047522590000191
the test set data
Figure BDA0003047522590000192
the clustering result of the test set data
Figure BDA0003047522590000193
and the navigation class weights w_label. The algorithm steps are as follows:
Step 1: according to the song list training set
Figure BDA0003047522590000194
group the song lists in the training set by the navigation class to which their tag combinations belong;
Step 2: perform LSH/MinHash bucketing on the grouped song list sets to obtain the language hash bucket B_yz, the theme hash bucket B_zt, the scene hash bucket B_cj, the style hash bucket B_fg and the emotion hash bucket B_qg; for each navigation class, two kinds of hash buckets are generated, one from the song sets of the song lists and one from their singer sets, as shown in the following formula:
Figure BDA0003047522590000195
Step 3: from the test set
Figure BDA0003047522590000196
take a target song list L_rec; in the clustering result of the test set
Figure BDA0003047522590000197
find the category it belongs to
Figure BDA0003047522590000198
and, according to this category, obtain the weight of each navigation class
Figure BDA0003047522590000199
Step 4: map the song set of the target song list
Figure BDA00030475225900001910
and its singer set
Figure BDA00030475225900001911
respectively into the buckets of those navigation classes which, in the category
Figure BDA00030475225900001912
have a weight
Figure BDA00030475225900001913
that is not zero, namely the song list-song hash buckets
Figure BDA00030475225900001914
and the song list-singer hash buckets
Figure BDA00030475225900001915
In each hash bucket, retrieve the set Sim_ij of song lists similar to the target song list, expressed as follows:
Figure BDA00030475225900001916
Step 5: according to the set Sim_ij of similar song lists retrieved for the target song list and the weight w_ij, update the recommendation index of each tag
Figure BDA00030475225900001917
The update formula is as follows:
Figure BDA00030475225900001918
Step 6: finally, sort the candidate tag set R_Tag by the recommendation indexes of the tags
Figure BDA00030475225900001919
in descending order and select the top N tags with the highest recommendation index as the final recommendation result; the final output is shown in the following formula:
Figure BDA0003047522590000201
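The LSH/MinHash bucketing and lookup of steps 2 and 4 can be sketched with the standard library alone. This is a generic MinHash-with-banding sketch under assumptions, not the patent's exact scheme: the number of hash functions, the band layout, the MD5-based hashing and the function names are all illustrative choices, and every input set is assumed to be non-empty.

```python
import hashlib
from collections import defaultdict

def minhash_signature(items, num_perm=16):
    """MinHash signature: for each seeded hash, the minimum over the set."""
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.md5(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        ))
    return tuple(signature)

def build_buckets(sets_by_id, num_perm=16, bands=4):
    """Hash every song list into one bucket per band of its signature."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for list_id, items in sets_by_id.items():
        sig = minhash_signature(items, num_perm)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(list_id)
    return buckets

def query(buckets, items, num_perm=16, bands=4):
    """Return ids of song lists sharing at least one band with the query set."""
    rows = num_perm // bands
    sig = minhash_signature(items, num_perm)
    found = set()
    for b in range(bands):
        found |= buckets.get((b, sig[b * rows:(b + 1) * rows]), set())
    return found

if __name__ == "__main__":
    # One bucket table per navigation class and per feature type would be
    # built in practice; a single song-set table is shown here.
    song_sets = {"A": {"s1", "s2", "s3"}, "B": {"s9"}}
    table = build_buckets(song_sets)
    print(query(table, {"s1", "s2", "s3"}))
```

Two sets with identical contents always produce the same signature and therefore always collide, while sets with higher Jaccard similarity are more likely to share a band than dissimilar ones.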
The advantages of the invention are as follows:
Compared with the traditional collaborative filtering recommendation method, the algorithm achieves high recommendation accuracy and high recommendation speed, and the method is simple and effective.
FIG. 9 compares the heterogeneous graph song list multi-label recommendation method fusing node preference of the invention (GRAC) with the traditional collaborative filtering recommendation method (TRAD) in terms of recommendation accuracy and recommendation efficiency. To evaluate a recommendation method, the F1 value is calculated from the precision Rec_P and the recall Rec_R and used as the evaluation index of the recommendation. The definitions are as follows:
the precision ratio is as follows:
Figure BDA0003047522590000202
where R_T is the number of correctly recommended tags and N_R is the number of recommended tags. The higher the precision, the larger the share of correctly recommended tags among all recommended tags.
The recall ratio is as follows:
Figure BDA0003047522590000203
where N_T is the number of actually correct tags. The higher the recall, the larger the share of correctly recommended tags among all actually correct tags.
F1 value:
Figure BDA0003047522590000204
The F1 value is the weighted harmonic mean of precision and recall and accounts for the influence of both on the evaluation of model accuracy. The higher the F1 value, the more stable the overall accuracy of the recommendation model, and vice versa.
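The three metrics can be computed for a single song list as follows. This is a minimal sketch that assumes the standard definitions (precision = correct / recommended, recall = correct / actual, F1 = 2PR / (P + R)); the formula images of the source are not available, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(recommended, actual):
    """Precision, recall and F1 of one tag recommendation."""
    recommended, actual = set(recommended), set(actual)
    correct = len(recommended & actual)  # R_T: correctly recommended tags
    precision = correct / len(recommended) if recommended else 0.0
    recall = correct / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    p, r, f1 = precision_recall_f1(
        ["rock", "sad", "english"],
        ["rock", "sad", "lonely", "night"],
    )
    print(round(p, 3), round(r, 3), round(f1, 3))
```

Here two of the three recommended tags are correct and two of the four true tags are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.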
Table 3 description of the experimental methods
Figure BDA0003047522590000205
Table 4 description of ablation Experimental methods
Figure BDA0003047522590000211
Table 3 and FIG. 9 show the results of comparing the invention with the traditional collaborative filtering method for song list multi-label recommendation on real song list data at different data set scales. The comparison shows that the invention improves the song list tag recommendation task at every data scale tested, with better recommendation accuracy and better real-time performance. Table 4 and FIG. 10 verify the effectiveness of each optimization proposed by the invention at different data scales; the results show that each improvement contributes to the recommendation effect.
The invention studies the application of heterogeneous graph information in recommendation scenarios and proposes a heterogeneous graph multi-label recommendation method based on node preference. The method improves the expressive power of the target song list node through a song list heterogeneous information network and strengthens the association between the different types of song list data with an information aggregation technique. Its main work covers the following three aspects. 1) To make full use of the heterogeneous relations in the song list data, the method builds the song list heterogeneous graph from the relations among song lists, songs and singers; during construction, edge weights are established by calculating the preference between different nodes, so that the multi-dimensional preference information of each node pair is fully used. For the different feature types of the target song list, the method proposes a second-order neighbor random sampling method based on node preference, which builds the node features of the target song list from the relations between the neighbor song lists and the target song list. 2) To improve the accuracy of the computed set of similar song lists, the method performs cluster analysis on the song list data based on the neighbor feature representation and a spectral clustering algorithm, divides the song lists into five navigation classes, obtains the accuracy of the clustering result of each class, and applies it as a class weight in the downstream recommendation.
3) To optimize recommendation efficiency, on top of building song list node features with the heterogeneous graph and clustering the nodes with the spectral clustering algorithm, the method further integrates a collaborative filtering algorithm based on locality-sensitive hashing for the final tag recommendation task, which improves both the recommendation accuracy and the recommendation efficiency of the algorithm. Experimental results on real NetEase Cloud Music song list data show that the recommendation effect of the invention is greatly improved over the traditional collaborative filtering method, and the average running time is likewise greatly improved.

Claims (9)

1. A heterogeneous graph song list multi-label recommendation method fusing node preference, characterized by comprising the following steps:
step 1: constructing a song list heterogeneous graph from the heterogeneous data of the song list training set;
step 2: performing neighbor sampling that fuses node preference on each song list through the song list heterogeneous graph, using a song-based meta-path and a singer-based meta-path, to obtain song list information containing song neighbor features and song list information containing singer neighbor features;
step 3: using word2vec to produce a continuous feature representation of each song list from the song list information containing song neighbor features and the song list information containing singer neighbor features;
step 4: performing cluster analysis on the continuous feature representations of the song lists with a spectral clustering algorithm to obtain a song list clustering result;
step 5: calculating, from the song list clustering result, the weight value of each navigation class label in each category, and then completing label recommendation for the target song list using locality-sensitive hashing.
2. The heterogeneous graph song list multi-label recommendation method fusing node preference according to claim 1, wherein the specific process of step 1 is as follows:
step 1.1: connecting the song list node [Figure FDA0003047522580000011] and the song node [Figure FDA0003047522580000012] as follows: if the song list of the i-th song list L_i does not include song m_j, there is no edge between the song list node [Figure FDA0003047522580000013] and the song node [Figure FDA0003047522580000014]; otherwise, there is a song-song list edge [Figure FDA0003047522580000017] between the song list node [Figure FDA0003047522580000015] and the song node [Figure FDA0003047522580000016];
step 1.2: connecting the song list node [Figure FDA0003047522580000018] and the singer node [Figure FDA0003047522580000019] as follows: if the singer list of the i-th song list L_i does not contain singer s_j, there is no edge between the song list node [Figure FDA00030475225800000110] and the singer node [Figure FDA00030475225800000111]; otherwise, there is a singer-song list edge [Figure FDA00030475225800000114] between the song list node [Figure FDA00030475225800000112] and the singer node [Figure FDA00030475225800000113];
step 1.3: connecting the song list node [Figure FDA00030475225800000115] and the user node [Figure FDA00030475225800000116] as follows: if the user of the i-th song list L_i is not u_j, there is no edge between the song list node [Figure FDA00030475225800000117] and the user node [Figure FDA00030475225800000118]; otherwise, there is a user-song list edge [Figure FDA00030475225800000121] between the song list node [Figure FDA00030475225800000119] and the user node [Figure FDA00030475225800000120].
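For illustration only, the edge rules of steps 1.1 to 1.3 might be sketched in Python as follows. This is a minimal sketch, not the claimed implementation: the function name `build_song_list_graph` and the field names `songs`, `singers` and `user` are hypothetical, and the preference-based edge weights are left out because their formulas appear only in the figures.

```python
from collections import defaultdict


def build_song_list_graph(song_lists):
    """Build an undirected heterogeneous graph as an adjacency map.

    `song_lists` maps a song list id to a dict with the (assumed) keys
    "songs", "singers" and "user". Nodes are (type, id) tuples so that
    ids of different node types can never collide.
    """
    graph = defaultdict(set)

    def add_edge(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for list_id, info in song_lists.items():
        list_node = ("list", list_id)
        graph[list_node]  # register the node even if it ends up isolated
        # Step 1.1: an edge exists only if the song is in the song list.
        for song in info.get("songs", ()):
            add_edge(list_node, ("song", song))
        # Step 1.2: an edge exists only if the singer is in the singer list.
        for singer in info.get("singers", ()):
            add_edge(list_node, ("singer", singer))
        # Step 1.3: an edge exists only to the user of the song list.
        if info.get("user") is not None:
            add_edge(list_node, ("user", info["user"]))
    return graph
```

Typing each node as a `(type, id)` tuple keeps the song, singer and user id spaces separate, which is the property a heterogeneous graph needs before any meta-path can be walked.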
3. The method according to claim 2, wherein in step 1.1 the song-song list edge [Figure FDA00030475225800000122] is given by the following formula:
[Figure FDA0003047522580000021]
wherein [Figure FDA0003047522580000022] is the number of occurrences of song m_j, M_num is the total number of songs, G_L is the song list heterogeneous graph, and [Figure FDA0003047522580000023] is the song feature;
in step 1.2, the singer-song list edge [Figure FDA0003047522580000024] is given by the following formula:
[Figure FDA0003047522580000025]
wherein [Figure FDA0003047522580000026] is the number of occurrences of singer s_j, S_num is the total number of singers, and [Figure FDA0003047522580000027] is the singer feature;
in step 1.3, the user-song list edge [Figure FDA0003047522580000028] is given by the following formula:
[Figure FDA0003047522580000029]
wherein [Figure FDA00030475225800000210] is the user feature.
4. The heterogeneous graph song list multi-label recommendation method fusing node preference according to claim 2, wherein the specific process of step 2 is as follows:
step 2.1: if the number N_m of first-order neighbor song nodes owned by the song list node [Figure FDA00030475225800000211] is greater than the set number N_ms of first-order song neighbors of a song list node, using the node preference [Figure FDA00030475225800000213] of all first-order neighbor song nodes of the song list node [Figure FDA00030475225800000212] as the relative selection weight, and randomly selecting N_ms first-order neighbor song nodes according to that weight as the candidate song neighbors [Figure FDA00030475225800000214];
step 2.2: if the number N_m of first-order neighbor song nodes of the song list node [Figure FDA00030475225800000215] is smaller than the set number N_ms of first-order song neighbors of a song list node, using the first-order neighbor song node preference [Figure FDA00030475225800000217] of the song list node [Figure FDA00030475225800000216] as the relative weight, randomly selecting (N_ms - N_m) first-order neighbor song nodes according to that weight, and combining them with the existing first-order neighbor song nodes of the song list node [Figure FDA00030475225800000218] to form a total of N_ms candidate song neighbors [Figure FDA00030475225800000219];
step 2.3: using the candidate song neighbors [Figure FDA00030475225800000220], calculating, according to the song-song list edges [Figure FDA00030475225800000221], the song list node preference [Figure FDA00030475225800000224] of the first-order song list neighbor set [Figure FDA00030475225800000223] of the target song list node [Figure FDA00030475225800000222];
step 2.4: using the song list node preference [Figure FDA00030475225800000225] as the weight, randomly selecting N_L first-order song list neighbors of the song list node [Figure FDA00030475225800000226] to form the first-order song list neighbor set [Figure FDA00030475225800000227];
step 2.5: using the song list neighbor set [Figure FDA0003047522580000031], calculating the node preference [Figure FDA0003047522580000033] of the second-order song nodes of the song list node [Figure FDA0003047522580000032] according to the following formula, and then randomly selecting 2*N_L second-order neighbor songs [Figure FDA0003047522580000035] according to the node preference [Figure FDA0003047522580000034]:
[Figure FDA0003047522580000036]
step 2.6: integrating the song list [Figure FDA0003047522580000038] of the song list node [Figure FDA0003047522580000037] by the following formula:
[Figure FDA0003047522580000039]
step 2.7: repeating steps 2.1 to 2.6 to sample, for all song list nodes [Figure FDA00030475225800000310], the song list information containing song neighbor features.
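As an illustration of steps 2.1 and 2.2, the preference-weighted selection of candidate song neighbors might look like the Python sketch below. It rests on assumptions not stated in the claim: the name `sample_song_neighbors` is hypothetical, the preference values are taken as given positive numbers, step 2.1 is read as sampling without replacement, and step 2.2 is read as keeping all existing neighbors and padding with weighted repeats.

```python
import random


def sample_song_neighbors(neighbors, preference, n_ms, rng=None):
    """Return exactly n_ms candidate song neighbors for one song list node.

    neighbors  -- first-order neighbor song ids of the song list node
    preference -- dict: song id -> positive preference (edge weight)
    n_ms       -- configured number of first-order song neighbors
    """
    rng = rng or random.Random()
    neighbors = list(neighbors)
    if not neighbors:
        return []
    weights = [preference[m] for m in neighbors]

    if len(neighbors) > n_ms:
        # Step 2.1 (assumed without replacement): draw n_ms distinct
        # neighbors, each draw proportional to the remaining preferences.
        pool, pool_w, chosen = neighbors[:], weights[:], []
        for _ in range(n_ms):
            idx = rng.choices(range(len(pool)), weights=pool_w, k=1)[0]
            chosen.append(pool.pop(idx))
            pool_w.pop(idx)
        return chosen

    # Step 2.2: keep all N_m neighbors and add (N_ms - N_m) weighted draws,
    # so the result always has length n_ms.
    extra = rng.choices(neighbors, weights=weights, k=n_ms - len(neighbors))
    return neighbors + extra
```

Fixing the output length at `n_ms` is what lets every song list contribute a sequence of the same size to the later word2vec step.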
5. The method according to claim 4, wherein in step 2.3 the song list node preference [Figure FDA00030475225800000311] is calculated by the following formula:
[Figure FDA00030475225800000312]
wherein [Figure FDA00030475225800000313] is the size of the intersection between the song neighbors of the i-th song list L_i and the song neighbors of the j-th song list L_j.
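The quantity defined in this claim, the number of song neighbors two song lists have in common, can be illustrated with a short Python sketch. The exact preference formula is only available as a figure, so the normalization used here (overlap divided by the total overlap over all candidates) is an assumption chosen to yield sampling weights, and the name `song_list_preferences` is hypothetical.

```python
def song_list_preferences(target_songs, candidate_lists):
    """Weight candidate song lists by song neighbors shared with the target.

    target_songs    -- song neighbors of the target song list L_i
    candidate_lists -- dict: song list id -> its song neighbors (L_j)
    Returns a dict of weights that sum to 1, or all zeros when no
    candidate shares a song with the target.
    """
    target = set(target_songs)
    # Intersection size between the song neighbors of L_i and each L_j.
    overlap = {
        list_id: len(target & set(songs))
        for list_id, songs in candidate_lists.items()
    }
    total = sum(overlap.values())
    if total == 0:
        return {list_id: 0.0 for list_id in overlap}
    # Assumed normalization: share of the total overlap.
    return {list_id: count / total for list_id, count in overlap.items()}
```

With these weights, a song list sharing two songs with the target is twice as likely to be drawn in step 2.4 as one sharing a single song.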
6. The method according to claim 5, wherein in step 2.6 the song list [Figure FDA00030475225800000314] has a length of N_ms + 2*N_L, where N_L is the number of first-order song list neighbors.
7. The heterogeneous graph song list multi-label recommendation method fusing node preference according to claim 1, wherein the specific process of step 3 is as follows:
step 3.1: initializing a word2vec language model Model_m for the song features of the song lists and a word2vec language model Model_s for the singer features, wherein the output vector length and the number of iterations of Model_m and Model_s are input parameter values of the models;
step 3.2: inputting the song list information containing song neighbor features into Model_m as sentences, and inputting the song list information containing singer neighbor features into Model_s as sentences;
step 3.3: after Model_m and Model_s have been trained, outputting the song feature vectorized representation set vec_m and the singer feature vectorized representation set vec_s of the song list node set, as shown in the following formula:
[Figure FDA00030475225800000315]
wherein [Figure FDA0003047522580000041] is the song feature vectorized representation of the song list node [Figure FDA0003047522580000042], and [Figure FDA0003047522580000043] is its singer feature vectorized representation;
step 3.4: the vectorized representation [Figure FDA0003047522580000045] of the song list node [Figure FDA0003047522580000044] is expressed by the following formula:
[Figure FDA0003047522580000046]
8. The heterogeneous graph song list multi-label recommendation method fusing node preference according to claim 1, wherein the specific process of step 4 is as follows:
step 4.1: initializing a clustering model Model_cluster according to the number of cluster categories n_cluster;
step 4.2: inputting the song list training set data represented by continuous features [Figure FDA0003047522580000047] into Model_cluster for cluster learning to obtain the clustering result of the song list training set [Figure FDA0003047522580000048];
step 4.3: calculating the navigation class weights of each category from the song list training set labels [Figure FDA0003047522580000049] and the clustering result of the song list training set [Figure FDA00030475225800000410], as shown in the following formula:
[Figure FDA00030475225800000411]
wherein [Figure FDA00030475225800000412] is the label combination of song list L_1, [Figure FDA00030475225800000413] is the label combination of song list L_2, [Figure FDA00030475225800000414] is the label combination of song list L_n, w_label represents the weight of each navigation class in each category, c_i represents the i-th cluster category, of which there are five in total, w_i is the navigation class weights under the i-th category, [Figure FDA00030475225800000415] is the weight value of the language navigation class, [Figure FDA00030475225800000416] is the weight value of the genre navigation class, [Figure FDA00030475225800000417] is the weight value of the emotion navigation class, [Figure FDA00030475225800000418] is the weight value of the scene navigation class, [Figure FDA00030475225800000419] is the weight value of the theme navigation class, [Figure FDA00030475225800000420] is the number of song lists of the language class in the i-th category, and n_i is the total number of song lists in the i-th category;
step 4.4: inputting the feature set of the song list test set [Figure FDA00030475225800000421] into the clustering model Model_cluster and calculating the clustering result of the test set [Figure FDA00030475225800000422].
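The navigation class weights of step 4.3 can be illustrated with the Python sketch below. The definitions in the claim (the number of song lists of a navigation class in the i-th category, and n_i, the total number of song lists in that category) suggest a simple ratio, and the sketch assumes exactly that; the actual formula is only available as a figure. The names `navigation_class_weights` and `NAV_CLASSES` are hypothetical, and the cluster ids are taken as given, for example from a spectral clustering run.

```python
from collections import Counter, defaultdict

# The five navigation classes named in the claim.
NAV_CLASSES = ("language", "genre", "emotion", "scene", "theme")


def navigation_class_weights(cluster_ids, label_classes):
    """Per-category weight of each navigation class.

    cluster_ids   -- cluster_ids[k] is the cluster category of song list k
    label_classes -- label_classes[k] is the set of navigation classes
                     covered by the labels of song list k
    Assumed formula: weight = (song lists in category i carrying a label of
    that navigation class) / (all song lists in category i).
    """
    sizes = Counter(cluster_ids)
    counts = defaultdict(Counter)
    for cid, classes in zip(cluster_ids, label_classes):
        for nav in set(classes):  # count each song list once per class
            counts[cid][nav] += 1
    return {
        cid: {nav: counts[cid][nav] / n for nav in NAV_CLASSES}
        for cid, n in sizes.items()
    }
```

A weight of zero for a navigation class in some category means no training song list in that category carried such a label, which is the condition step 5.4 uses to skip the corresponding hash buckets.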
9. The heterogeneous graph song list multi-label recommendation method fusing node preference according to claim 1, wherein the specific process of step 5 is as follows:
step 5.1: according to the song list training set [Figure FDA00030475225800000423], grouping the song lists in the training set by the navigation class to which their label combinations belong, to obtain grouped song list sets;
step 5.2: performing LSH/MinHash bucketing on the grouped song list sets to obtain a language hash bucket B_yz, a theme hash bucket B_zt, a scene hash bucket B_cj, a style hash bucket B_fg and an emotion hash bucket B_qg, wherein, according to the song set and the singer set of each song list, two kinds of data hash buckets are generated for each of B_yz, B_zt, B_cj, B_fg and B_qg, as shown in the following formula:
[Figure FDA0003047522580000051]
wherein [Figure FDA0003047522580000052] represents the song list-song hash bucket of a navigation class, and [Figure FDA0003047522580000053] represents the song list-singer hash bucket;
step 5.3: taking a target song list L_rec from the test set [Figure FDA0003047522580000054], finding its category [Figure FDA0003047522580000056] in the test set clustering result [Figure FDA0003047522580000055], and obtaining the weight value [Figure FDA0003047522580000058] of each navigation class according to that category [Figure FDA0003047522580000057];
step 5.4: mapping the song set [Figure FDA0003047522580000059] and the singer set [Figure FDA00030475225800000510] of the target song list respectively into the song list-song hash buckets [Figure FDA00030475225800000512] and the song list-singer hash buckets [Figure FDA00030475225800000513] of those navigation classes whose weight value w_i in the category [Figure FDA00030475225800000511] is not equal to zero, and then searching for the set Sim_ij of song lists similar to the target song list;
step 5.5: based on the set Sim_ij of song lists similar to the target song list and the weight value w_ij, updating the recommendation index [Figure FDA00030475225800000514] of each label, with the following update formula:
[Figure FDA00030475225800000515]
wherein the weight value wijIs composed of
Figure FDA00030475225800000516
Or
Figure FDA00030475225800000517
Is a recommendation index, t, for the ith labeliIs the ith label;
step 5.6: finally, sorting the label set R_Tag to be recommended by the recommendation index [Figure FDA00030475225800000518] of each label from largest to smallest, and selecting the top N labels with the highest recommendation indexes as the final recommendation result.
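Steps 5.2 to 5.6 can be illustrated with a small, self-contained Python sketch of MinHash bucketing followed by weighted label scoring. It is a sketch under stated assumptions rather than the claimed implementation: all function names are hypothetical, salted MD5 stands in for the unspecified hash family, a single bucket table is shown (the claim keeps separate song and singer buckets per navigation class), and the scoring rule, which adds the navigation class weight once per similar song list carrying the label, is assumed because the update formula is only available as a figure.

```python
import hashlib
from collections import defaultdict


def minhash_signature(items, num_perm=32):
    """MinHash signature: the minimum of each salted hash over the set."""
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()[:8],
                "big",
            )
            for item in items
        ))
    return tuple(signature)


def band_keys(items, bands=8, num_perm=32):
    """Split the signature into bands; each band acts as one bucket key."""
    rows = num_perm // bands
    sig = minhash_signature(items, num_perm)
    return [(b, sig[b * rows:(b + 1) * rows]) for b in range(bands)]


def build_buckets(song_lists, bands=8, num_perm=32):
    """Hash every non-empty training song list into its band buckets."""
    buckets = defaultdict(set)
    for list_id, items in song_lists.items():
        if items:
            for key in band_keys(items, bands, num_perm):
                buckets[key].add(list_id)
    return buckets


def similar_lists(items, buckets, bands=8, num_perm=32):
    """Training song lists sharing at least one bucket with the query set."""
    found = set()
    if items:
        for key in band_keys(items, bands, num_perm):
            found |= buckets.get(key, set())
    return found


def recommend_labels(similar, labels_of, class_weight, top_n=3):
    """Score labels of similar song lists and return the top N labels.

    labels_of    -- dict: song list id -> list of (label, navigation class)
    class_weight -- dict: navigation class -> weight in the target's category
    """
    scores = defaultdict(float)
    for list_id in similar:
        for tag, nav_class in labels_of[list_id]:
            scores[tag] += class_weight.get(nav_class, 0.0)  # assumed update
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]
```

Banding trades precision for recall: with more bands of fewer rows, song lists with lower Jaccard similarity start to collide, so the candidate set grows while each lookup stays a constant number of dictionary probes.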
CN202110477214.8A 2021-04-29 2021-04-29 Heterogeneous picture singing list multi-label recommendation method fusing node preference Active CN113268629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110477214.8A CN113268629B (en) 2021-04-29 2021-04-29 Heterogeneous picture singing list multi-label recommendation method fusing node preference

Publications (2)

Publication Number Publication Date
CN113268629A true CN113268629A (en) 2021-08-17
CN113268629B CN113268629B (en) 2023-01-03

Family

ID=77230020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110477214.8A Active CN113268629B (en) 2021-04-29 2021-04-29 Heterogeneous picture singing list multi-label recommendation method fusing node preference

Country Status (1)

Country Link
CN (1) CN113268629B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853470A (en) * 2010-05-28 2010-10-06 浙江大学 Collaborative filtering method based on socialized label
JP2016139229A (en) * 2015-01-27 2016-08-04 日本放送協会 Device and program for generating personal profile, and content recommendation device
CN106354862A (en) * 2016-09-06 2017-01-25 山东大学 Multidimensional individualized recommendation method in heterogeneous network
CN108021568A (en) * 2016-10-31 2018-05-11 北京酷我科技有限公司 One kind song is single to recommend method and device
CN108763362A (en) * 2018-05-17 2018-11-06 浙江工业大学 Method is recommended to the partial model Weighted Fusion Top-N films of selection based on random anchor point
CN110083767A (en) * 2019-04-28 2019-08-02 广东工业大学 A kind of point of interest recommended method and relevant apparatus based on first path

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EUGENIA KOBLENTS ET AL.: ""Evidence recommendation in forensics based on cyclic meta-paths in heterogeneous information networks"", 《8TH INTERNATIONAL CONFERENCE ON IMAGING FOR CRIME DETECTION AND PREVENTION》 *
胡斌斌: ""基于异质信息网络表示学习的推荐算法研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN113268629B (en) 2023-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant