WO2015081915A1 - File recommendation method and apparatus - Google Patents
- Publication number
- WO2015081915A1 (PCT/CN2015/072275)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- identifier
- file
- user
- matrix
- user identifier
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4826—End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/251—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/252—Processing of multiple end-users' preferences to derive collaborative data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
- H04N21/25891—Management of end-user data being end-user preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/42206—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
- H04N21/42212—Specific keyboard arrangements
- H04N21/42218—Specific keyboard arrangements for mapping a matrix of displayed objects on the screen to the numerical key-matrix of the remote control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44222—Analytics of user selections, e.g. selection of programs or purchase activity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/4508—Management of client data or end-user data
- H04N21/4532—Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
Definitions
- the present invention relates to the field of network technologies, and in particular, to a file recommendation method and apparatus.
- the server may recommend information that may be of interest to the user according to the user's browsing history, interests, and the like.
- the terminal recommends to the current user the videos most often clicked by other users who watched the same video; that is, it recommends videos for the current user by comparing the preferences of other users with the preferences of the current user.
- this method does not take into account the fact that the current user has different preferences from other users. Therefore, when the current user has different preferences from other users, the recommendation success rate is very low.
- an embodiment of the present invention provides a file recommendation method and apparatus.
- the technical solution is as follows:
- An embodiment of the present invention provides a file recommendation method, where the method includes:
- the user identifier is used as the first dimension of the matrix, and the file identifier is used as the second dimension of the matrix to construct a two-dimensional matrix;
- File recommendation is performed based on the at least one user group.
- Another embodiment of the present invention provides a file recommendation apparatus, where the apparatus includes:
- a matrix construction module configured to construct a two-dimensional matrix by using a user identifier as a first dimension of the matrix and a file identifier as a second dimension of the matrix according to the user identifier and the file identifier included in the history play record;
- a filling module configured to fill elements into the element positions of the two-dimensional matrix that correspond to the correspondences between user identifiers and file identifiers in the historical play record;
- a matrix decomposition module configured to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix;
- a vector dividing module configured to divide the specified matrix according to the first dimension, to obtain a feature vector corresponding to each user identifier;
- a clustering module configured to perform clustering processing on each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier;
- a recommendation module configured to perform file recommendation based on the at least one user group.
- with the method and apparatus provided by the embodiments of the present invention, user groups can be obtained according to the correspondence between user identifiers and file identifiers included in the historical play record, so that user identifiers with similar preferences are divided into the same user group.
- the recommendation can be made based on the user group to which the current user identifier belongs, without recommending based on all the user identifiers. Since the preferences of the current user identifier and of the other user identifiers are taken into consideration, the recommendation efficiency and the recommendation success rate are improved.
- FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention.
- FIG. 2 is a flowchart of a file recommendation method according to another embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
- FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention.
- the execution body of the embodiment of the present invention is a server. Referring to FIG. 1, the method includes:
- according to the user identifier and the file identifier included in the historical play record, the user identifier is used as the first dimension of the matrix and the file identifier is used as the second dimension of the matrix to construct a two-dimensional matrix.
- the specified matrix is divided according to the first dimension, and a feature vector corresponding to each user identifier is obtained.
- the user groups can be obtained according to the correspondence between the user identifier and the file identifier included in the historical play record, and user identifiers with similar preferences can be divided into the same user group.
- the recommendation can be made based on the user group to which the current user identifier belongs, without recommending based on all the user identifiers. Since the preferences of the current user identifier and of the other user identifiers are taken into consideration, the recommendation efficiency and the recommendation success rate are improved.
- performing file recommendation includes:
- using the user identifier as the first dimension of the matrix and the file identifier as the second dimension of the matrix to construct the two-dimensional matrix includes:
- the user identifier is obtained as a sample user identifier
- the sample user identifier is used as the first dimension of the matrix
- the file identifier is used as the second dimension of the matrix to construct the two-dimensional matrix.
- filling elements into the element positions of the two-dimensional matrix that correspond to the correspondence includes:
- when a correspondence between a sample user identifier and a file identifier is saved in the historical play record, filling a first preset threshold into the element position of the two-dimensional matrix that corresponds to the sample user identifier and the file identifier;
- randomly selecting, from the remaining element positions of the two-dimensional matrix, the same number of element positions as the number of element positions that have been filled with the first preset threshold, and filling the selected element positions with a second preset threshold.
- performing matrix decomposition on the filled two-dimensional matrix to obtain the specified matrix includes:
- This U matrix is taken as the specified matrix.
- after the singular value decomposition (SVD) is performed on the two-dimensional matrix by using a stochastic gradient descent (SGD) algorithm according to the first weight and the second weight to obtain the U matrix, the method further includes:
- the second dimension of the U matrix is dimension-reduced according to the preset retention dimension, and the dimension-reduced U matrix is used as the specified matrix.
- determining, according to the correspondence between the user identifier and the file identifier and according to the specified user group, the file identifier to be recommended includes:
- a predetermined number of file identifiers are determined in descending order of the determined number.
- determining, according to the correspondence between the user identifier and the file identifier and according to the specified user group, the file identifier to be recommended includes:
- the file identifier corresponding to the user identifier with the highest similarity is determined according to the correspondence between the user identifier and the file identifier.
- the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.
- FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention.
- the execution body of the embodiment of the present invention is a server. Referring to FIG. 2, the method includes:
- the server obtains, from the historical play record, the number of file identifiers corresponding to each user identifier, where the historical play record includes the correspondence between user identifiers and file identifiers.
- the embodiment of the present invention is applied to a scenario in which the server recommends files based on user groups obtained by grouping user identifiers according to the historical play record.
- the server may be a server associated with the current file identifier, or a function module in the server associated with the current file identifier, which is not limited in this embodiment of the present invention.
- the server records the files opened by each user identifier, and when a certain user identifier opens a file, the server establishes a correspondence between the user identifier and the identifier of the opened file in the historical play record.
- the opening of the file by a certain user identifier means that the user corresponding to the user identifier opens the file through the terminal device used.
- the historical play record may be a historical play record within a preset duration saved by the server; that is, a correspondence in the historical play record is retained while its duration has not exceeded the preset duration, and is deleted once its duration has exceeded the preset duration.
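The retention rule above can be sketched as follows. This is a minimal illustration, assuming each correspondence is stored with a numeric timestamp; the record layout and the names `prune_history`, `now` and `preset_duration` are hypothetical and do not come from the patent text.

```python
def prune_history(records, now, preset_duration):
    """Keep only correspondences whose age does not exceed the preset duration.

    records: list of (user_id, file_id, timestamp) tuples (hypothetical layout).
    """
    return [(user_id, file_id, timestamp)
            for user_id, file_id, timestamp in records
            if now - timestamp <= preset_duration]


records = [("A", "file1", 100), ("A", "file2", 900), ("B", "file1", 1000)]
kept = prune_history(records, now=1000, preset_duration=200)
print(kept)  # [('A', 'file2', 900), ('B', 'file1', 1000)]
```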
- the file may be a video file, an audio file, or a text file provided by the server, such as a network video file provided by a video website server, an audio file provided by an audio website, or a network document provided by a document sharing server, which is not limited by the embodiment of the present invention.
- the user identifier may be a user account or a terminal identifier.
- the file identifier may be a file name or a file number.
- the file indicated by the file identifier may be a video file, an audio file, a text file, or the like.
- the historical play record may include a correspondence between the user identifier and the plurality of types of file identifiers, which is not limited by the embodiment of the present invention.
- in the case that multiple types of files can be provided on the server, the server can also maintain a separate historical play record for each type of file, for example recording the correspondence between user identifiers and video file identifiers and, separately, the correspondence between user identifiers and audio file identifiers. For a specified type, the server groups the user identifiers according to the historical play record corresponding to the specified type and obtains multiple user groups. When the current user identifier opens a file of the specified type, the server can recommend a file of the specified type based on the plurality of user groups.
- the plurality of user groups are obtained according to the historical play record of files of the specified type. Compared with user groups divided according to the historical play records of all types of files, the user groups corresponding to the specified type better reflect the users' preferences for files of the specified type, which can further increase the recommendation success rate when recommending a file of the specified type.
- the server may select sample user identifiers according to the number of file identifiers corresponding to each user identifier and perform grouping according to the selected sample user identifiers. To do so, the server first obtains the number of file identifiers corresponding to each user identifier in the historical play record.
- when the number of file identifiers corresponding to a user identifier exceeds a preset number, the server selects the user identifier as a sample user identifier.
- after obtaining the number of file identifiers corresponding to each user identifier in the historical play record, the server determines whether that number exceeds a preset number. When the number of file identifiers corresponding to a user identifier exceeds the preset number, the user identifier is used as a sample user identifier. When the number of file identifiers corresponding to a user identifier is less than or equal to the preset number, the server may discard the user identifier, or may temporarily ignore it and use it as a sample user identifier once the number of file identifiers corresponding to it exceeds the preset number.
- the number of file identifiers corresponding to a user identifier refers to the number of files that have been opened by the user identifier. When the number exceeds the preset number, the correspondence between the user identifier and the file identifiers can be considered to reflect the user's preferences and can be used for grouping.
- for example, the number of file identifiers corresponding to the user identifier A is 3, and the number of file identifiers corresponding to the user identifier B is 25. If the preset number is 10, the server uses the user identifier B as a sample user identifier and does not consider the user identifier A.
- the preset number may be preset by a technician, or may be determined by the server according to the number of file identifiers corresponding to each user identifier in the historical play record, which is not limited by the embodiment of the present invention.
- step 201 and step 202 are optional steps. The server may also use all the user identifiers included in the historical play record as sample user identifiers and directly perform the subsequent step 203, which is not limited by the embodiment of the present invention.
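The sample selection described in steps 201 and 202 can be sketched as follows, using the A/B example from the text. This is only an illustration: the historical play record is assumed to be a list of (user identifier, file identifier) pairs, and the function name `select_sample_users` is hypothetical.

```python
from collections import defaultdict


def select_sample_users(history, preset_number):
    """Return the user identifiers whose number of opened files exceeds preset_number."""
    files_by_user = defaultdict(set)
    for user_id, file_id in history:
        files_by_user[user_id].add(file_id)  # a set counts each file only once
    return sorted(user_id for user_id, files in files_by_user.items()
                  if len(files) > preset_number)


# User identifier A has opened 3 files, user identifier B has opened 25.
history = ([("A", f"file{i}") for i in range(3)]
           + [("B", f"file{i}") for i in range(25)])
samples = select_sample_users(history, preset_number=10)
print(samples)  # ['B']
```

A user identifier whose count equals the preset number is not selected, matching the "less than or equal to" case in the text.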
- the server uses the sample user identifier as the first dimension of the matrix and the file identifier as the second dimension of the matrix to construct a two-dimensional matrix, and fills a first preset threshold into the element positions of the two-dimensional matrix that correspond to a sample user identifier and a file identifier.
- the two-dimensional matrix to be generated has the user identifier as its first dimension and the file identifier as its second dimension, and each element in the two-dimensional matrix is determined according to the correspondence between the user identifier and the file identifier.
- when a correspondence between a sample user identifier and a file identifier is saved in the historical play record, this indicates that the user identifier has opened the file, and the first preset threshold is filled into the element position of the two-dimensional matrix that corresponds to the user identifier and the file identifier.
- when no correspondence between the sample user identifier and the file identifier is saved in the historical play record, the element position corresponding to the user identifier and the file identifier is left unfilled for the time being.
- the first dimension may be the row and the second dimension the column, that is, the rows of the two-dimensional matrix are indexed by user identifier and the columns by file identifier; or the first dimension may be the column and the second dimension the row, that is, the columns of the two-dimensional matrix are indexed by user identifier and the rows by file identifier.
- the first dimension and the second dimension are not limited in the embodiment of the present invention.
- the first preset threshold may be 1.
- the correspondence between the user identifier and the file identifier in the historical play record is as shown in Table 1, where a mark in a cell indicates that the user identifier has a corresponding relationship with the file identifier.
- taking the first preset threshold being 1 as an example, with the sample user identifier A, the sample user identifier B, and the sample user identifier C each corresponding to a row of the two-dimensional matrix, the two-dimensional matrix is: (where X represents an element position that has not been filled).
- the server randomly selects, from the remaining element positions of the two-dimensional matrix, the same number of element positions as the number of element positions that have been filled with the first preset threshold, and fills the selected element positions with the second preset threshold.
- after the server has filled the element positions for all the correspondences between sample user identifiers and file identifiers in the historical play record, it obtains the number of element positions filled with the first preset threshold. From the remaining unfilled element positions of the two-dimensional matrix, it randomly selects that same number of element positions and fills the selected element positions with the second preset threshold.
- the second preset threshold may be 0.
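The two filling passes can be sketched as follows. This is a minimal illustration with rows as sample user identifiers and columns as file identifiers; `None` stands for an unfilled element position, and the function name, the `seed` parameter, and the cap applied when fewer unfilled positions remain than filled ones are assumptions that do not come from the patent text.

```python
import random


def build_filled_matrix(sample_users, file_ids, history, first=1, second=0, seed=0):
    """Fill `first` where a correspondence exists, then `second` at as many random positions."""
    rows = {user_id: i for i, user_id in enumerate(sample_users)}
    cols = {file_id: j for j, file_id in enumerate(file_ids)}
    matrix = [[None] * len(file_ids) for _ in sample_users]

    # Pass 1: first preset threshold for every saved correspondence.
    for user_id, file_id in history:
        if user_id in rows and file_id in cols:
            matrix[rows[user_id]][cols[file_id]] = first

    # Pass 2: second preset threshold at an equal number of randomly chosen unfilled positions.
    unfilled = [(i, j) for i in range(len(sample_users))
                for j in range(len(file_ids)) if matrix[i][j] is None]
    filled_count = len(sample_users) * len(file_ids) - len(unfilled)
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    for i, j in rng.sample(unfilled, min(filled_count, len(unfilled))):
        matrix[i][j] = second
    return matrix


users = ["A", "B", "C"]
files = ["f1", "f2", "f3", "f4"]
history = [("A", "f1"), ("B", "f2"), ("B", "f3"), ("C", "f4")]
matrix = build_filled_matrix(users, files, history)
for row in matrix:
    print(row)
```

With this input, four positions hold the first preset threshold, four hold the second, and four remain unfilled; which four receive the second threshold depends on the random selection.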
- the two-dimensional matrix after the first preset threshold and the second preset threshold are filled may be:
- the server determines a weight of an element position that has been filled with the first preset threshold as a first weight, and determines a weight of an element position that has been filled with the second preset threshold as a second weight.
- a file that the user identifier has opened may be a file that the user likes, whereas a file that the user identifier has not opened may be either a file that the user likes or a file that the user does not like.
- the server may therefore assign a weight to each filled element position in the two-dimensional matrix, so that the subsequent machine learning takes into account the influence of the first weight and the second weight.
- the first weight is greater than the second weight. For example, the first weight is 0.7 and the second weight is 0.3.
- the server performs SVD (Singular Value Decomposition) on the two-dimensional matrix by using an SGD (Stochastic Gradient Descent) algorithm according to the first weight and the second weight to obtain a U matrix.
- SVD decomposes the matrix as $A = U S V^{T}$, where A is the two-dimensional matrix, U and V are orthogonal matrices, and S is a diagonal matrix.
- the two-dimensional matrix may include a plurality of element positions that have not been filled, that is, the two-dimensional matrix is a sparse matrix. The SGD algorithm may therefore be used for learning while the SVD is performed on the two-dimensional matrix, and the U matrix is obtained by predicting the missing elements of the two-dimensional matrix.
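One common way to realise an "SVD learned by SGD" on a sparse matrix is a weighted low-rank factorization, sketched below. This is an assumption about how the step could be implemented, not the patent's stated procedure: the factors U and V are not constrained to be orthogonal, the diagonal matrix S is absorbed into them, and the learning rate, regularization strength, epoch count and function names are hypothetical. Only filled positions contribute, each weighted by the first or second weight.

```python
import random


def predict(U, V, i, j):
    """Predicted value for row i, column j: dot product of the two factor rows."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def weighted_loss(matrix, U, V, first=1, w_first=0.7, w_second=0.3):
    """Weighted squared error over the filled positions only."""
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                continue
            weight = w_first if value == first else w_second
            total += weight * (value - predict(U, V, i, j)) ** 2
    return total


def weighted_sgd_factorize(matrix, k=2, first=1, w_first=0.7, w_second=0.3,
                           lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn U (rows x k) and V (columns x k) so that U[i] . V[j] approximates matrix[i][j]."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in matrix]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in matrix[0]]
    filled = [(i, j, value) for i, row in enumerate(matrix)
              for j, value in enumerate(row) if value is not None]
    for _ in range(epochs):
        rng.shuffle(filled)
        for i, j, value in filled:
            weight = w_first if value == first else w_second
            err = value - predict(U, V, i, j)
            for f in range(k):
                u, v = U[i][f], V[j][f]
                # Gradient step on the weighted squared error with L2 regularization.
                U[i][f] += lr * (weight * err * v - reg * u)
                V[j][f] += lr * (weight * err * u - reg * v)
    return U, V


A = [[1, None, 0, 1],
     [1, 1, None, 0],
     [0, None, 1, 1]]
U0, V0 = weighted_sgd_factorize(A, epochs=0)  # untrained starting point
U, V = weighted_sgd_factorize(A)
print(weighted_loss(A, U0, V0), weighted_loss(A, U, V))
```

Each row of the learned U then plays the role of a per-user feature vector in the later steps.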
- step 206 is an optional step, and the server may also perform matrix decomposition and machine learning on the two-dimensional matrix by using other algorithms, which is not limited by the embodiment of the present invention.
- the server uses the U matrix as the specified matrix, so that the user groups can subsequently be divided according to the specified matrix.
- step 207 may be replaced by the following step: the server reduces the second dimension of the U matrix according to the preset retention dimension, and uses the dimension-reduced U matrix as the specified matrix.
- the server may set a preset retention dimension K, and perform dimension reduction on the second dimension of the U matrix according to the preset retention dimension K, to obtain the specified matrix, so that the dimension of the second dimension of the specified matrix is decreased.
- the preset retention dimension K may be preset by a technician, or may be determined by the server according to the prediction precision obtained by performing multiple experiments on different preset retention dimensions, which is not limited by the embodiment of the present invention.
- by reducing the dimension, the sample data can be generalized effectively, the calculation amount is reduced, and over-fitting is prevented.
- for example, the server retains the first 8 columns of the U matrix, deletes the other columns, and uses the retained matrix as the specified matrix.
- the server divides the specified matrix according to the first dimension, and obtains a feature vector corresponding to each sample user identifier.
- the first dimension of the specified matrix represents the sample user identifiers. The server divides the specified matrix according to the first dimension to obtain multiple vectors, and each obtained vector is used as the feature vector corresponding to one sample user identifier.
- for example, when the first dimension is the row, the server divides the specified matrix by row to obtain a plurality of row vectors, and each row vector of the specified matrix is used as the feature vector corresponding to one sample user identifier.
- for example, the feature vector corresponding to the sample user identifier A is [-0.4472, -0.5373, -0.0064]
- the feature vector corresponding to the sample user identifier B is [-0.3586, 0.2461, 0.8622]
- the feature vector corresponding to the sample user identifier C is [- 0.2925, -0.4033, -0.2275].
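The dimension reduction and the division into feature vectors can be sketched together. The sketch assumes that the three example vectors above are the rows of the specified matrix for the sample user identifiers A, B and C, and that the rows are ordered like the list of sample user identifiers; the function names and the retention dimension of 2 are hypothetical.

```python
def reduce_second_dimension(u_matrix, retention_dimension):
    """Keep only the first `retention_dimension` columns of every row."""
    return [row[:retention_dimension] for row in u_matrix]


def split_into_feature_vectors(specified_matrix, sample_users):
    """Map each sample user identifier to its row of the specified matrix."""
    return dict(zip(sample_users, specified_matrix))


u_matrix = [[-0.4472, -0.5373, -0.0064],
            [-0.3586, 0.2461, 0.8622],
            [-0.2925, -0.4033, -0.2275]]
sample_users = ["A", "B", "C"]

feature_vectors = split_into_feature_vectors(u_matrix, sample_users)
print(feature_vectors["B"])  # [-0.3586, 0.2461, 0.8622]

reduced = split_into_feature_vectors(reduce_second_dimension(u_matrix, 2), sample_users)
print(reduced["A"])  # [-0.4472, -0.5373]
```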
- the server performs clustering processing on each sample user identifier based on the feature vector corresponding to each sample user identifier, to obtain at least one user group, where each user group includes at least one sample user identifier.
- the server uses a clustering algorithm to perform clustering calculation on the feature vector corresponding to each sample user identifier, to obtain at least one user group, and each user group includes at least one sample user identifier.
- the clustering algorithm may be a partitioning method, a hierarchical method, or the like, which is not limited by the embodiment of the present invention.
- the server may preset the number of clusters C, and perform clustering processing on each sample user identifier according to the number of clusters to obtain C user groups.
- the number of clusters C may be determined by the server according to the requirement of the grouping accuracy or the number of sample user identifiers, which is not limited by the embodiment of the present invention.
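The text leaves the clustering algorithm open. As one possible partitioning method, the sketch below uses k-means with C clusters; the choice of k-means, the deterministic farthest-point initialization, and the names `cluster_users` and `squared_distance` are assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_users(feature_vectors, c, max_iterations=100):
    """Group user identifiers into at most c groups with k-means on their feature vectors."""
    ids = sorted(feature_vectors)
    # Farthest-point initialization: start from the first user, then repeatedly
    # add the user whose vector is farthest from the centroids chosen so far.
    centroids = [list(feature_vectors[ids[0]])]
    while len(centroids) < c:
        farthest = max(ids, key=lambda u: min(squared_distance(feature_vectors[u], m)
                                              for m in centroids))
        centroids.append(list(feature_vectors[farthest]))

    assignment = {}
    for _ in range(max_iterations):
        updated = {u: min(range(c),
                          key=lambda g: squared_distance(feature_vectors[u], centroids[g]))
                   for u in ids}
        if updated == assignment:  # converged: no user changed group
            break
        assignment = updated
        for g in range(c):
            members = [feature_vectors[u] for u in ids if assignment[u] == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]

    groups = {}
    for u in ids:
        groups.setdefault(assignment[u], []).append(u)
    return [groups[g] for g in sorted(groups)]


vectors = {"A": [0.0, 0.1], "B": [0.1, 0.0], "C": [5.0, 5.1], "D": [5.1, 5.0]}
user_groups = cluster_users(vectors, c=2)
print(user_groups)  # [['A', 'B'], ['C', 'D']]
```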
- the embodiment of the present invention is described by taking step 210 being performed after step 209 as an example.
- steps 201-209 can be performed in real time or periodically, and there is no necessary timing relationship between step 210 and steps 201-209.
- when the server receives the instruction to open the file, the server only needs to perform recommendation based on the plurality of user groups currently obtained.
- the server updates the historical play record and re-executes steps 201-209 to group the user identifiers, obtaining a plurality of updated user groups.
- when the server then receives an instruction to open a file, the server performs recommendation based on the current plurality of user groups.
- the server receives an instruction to open a file, where the instruction for opening the file carries a current user identifier and a current file identifier.
- when the server receives the instruction to open the file, the file indicated by the current file identifier is opened based on the current user identifier. At this time, the server may make recommendations according to the current user identifier and the current file identifier.
- the server determines, according to the user identifier included in each user group, a specified user group to which the current user identifier belongs.
- the server has divided the plurality of user identifiers into multiple user groups, each user group includes at least one user identifier, and the user identifiers in the same user group may be considered to have similar preferences. When recommending, the server may therefore recommend files according to the file identifiers corresponding to the user identifiers similar to the current user identifier, rather than according to the file identifiers corresponding to all the user identifiers.
- the server determines, according to the user identifiers included in each user group, the user group to which the current user identifier belongs, and uses that user group as the specified user group, so as to perform recommendation based on the specified user group.
- the server determines, according to the historical play record, the number of user identifiers corresponding to each file identifier in the specified user group.
- the file identifier may correspond to a user identifier included in the specified user group, or may correspond to a user identifier not included in the specified user group.
- for each file identifier, the server counts the user identifiers in the specified user group that correspond to the file identifier; this number indicates how many user identifiers of the specified user group have opened the file indicated by the file identifier.
- the server determines a preset number of file identifiers according to the determined number from the largest to the smallest.
- the preset number may be preset by the server, or may be determined by the server according to the number of files that can be displayed in the recommended area of the current display interface, which is not limited by the embodiment of the present invention.
- the server sorts the file identifiers by the determined number from the largest to the smallest, and determines the file identifiers ranked within the top preset number.
- in the specified user group, the more user identifiers have opened a certain file, the more likely the file is to be of interest to the current user, and the more the server should recommend it. Therefore, the server can improve the recommendation success rate by determining the file identifiers ranked within the top preset number.
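Steps 211 to 213 can be sketched as follows. The sketch assumes that the user groups are lists of user identifiers and that the historical play record is a list of (user identifier, file identifier) pairs; the function names and the alphabetical tie-break between files with equal counts are assumptions that do not come from the patent text.

```python
from collections import Counter


def find_specified_group(user_groups, current_user):
    """Return the user group that contains the current user identifier, or None."""
    for group in user_groups:
        if current_user in group:
            return group
    return None


def top_files_in_group(history, group, preset_number):
    """Rank file identifiers by how many users of the group opened them, largest first."""
    members = set(group)
    pairs = {(u, f) for u, f in history if u in members}  # count each user once per file
    counts = Counter(file_id for _, file_id in pairs)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [file_id for file_id, _ in ranked[:preset_number]]


user_groups = [["A", "B", "C"], ["D"]]
history = [("A", "f1"), ("B", "f1"), ("C", "f1"), ("B", "f2"),
           ("C", "f2"), ("A", "f3"), ("D", "f4"), ("D", "f1")]
group = find_specified_group(user_groups, "A")
recommended = top_files_in_group(history, group, preset_number=2)
print(recommended)  # ['f1', 'f2']
```

Files opened only by user identifiers outside the specified user group, such as f4 here, are never counted.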
- step 212 and the step 213 are optional steps, and the server may determine the file identifier to be recommended in other manners, which is not limited by the embodiment of the present invention.
- step 212 and the step 213 may be replaced by the following steps:
- the server calculates, according to the feature vector of each user identifier in the specified user group, the similarity between each user identifier of the specified user group except the current user identifier and the current user identifier.
- the server may calculate the cosine of the angle, or the Pearson correlation coefficient, between the feature vector of each user identifier and the feature vector of the current user identifier, to indicate the similarity of each user identifier other than the current user identifier to the current user identifier.
- the specific manner of calculating the similarity is not limited in the embodiment of the present invention.
- the server determines the user identity that is most similar to the current user identity.
- the server may determine the user identifier with the largest cosine angle between the feature vector and the feature vector of the current user identifier as the user identifier with the highest similarity to the current user identifier, or the feature vector and the current user identifier.
- the user identifier with the largest absolute value of the Pearson correlation coefficient between the feature vectors is determined to be the user identifier with the highest similarity to the current user identity.
- the user identifier with the highest similarity to the current user identifier may be regarded as the user identifier most similar to the current user identifier preference, and the server may perform the file identifier corresponding to the user identifier with the highest similarity. recommend.
- the server determines the file identifier corresponding to the user identifier with the highest similarity according to the correspondence between the user identifier and the file identifier, and performs step 214.
- the server determines, according to the correspondence between the user identifier and the file identifier, each file identifier corresponding to the user identifier with the highest similarity and each file identifier corresponding to the current user identifier, and compares and determines The file identifier corresponding to the user ID with the highest similarity and not corresponding to the current user identifier.
- the server determines that the user identifier B has opened the file identifier 1 according to the correspondence between the user identifier and the file identifier, and the current user If the identifier A does not open the file identifier 1, the file indicated by the file identifier 1 is recommended.
- in yet another embodiment, the historical play record includes a correspondence among a user identifier, a first file identifier, and a second file identifier, where the second file identifier is the file identifier that the user identifier opened after opening the first file identifier. In this case, steps 212 and 213 may instead be replaced by the following step: the server determines, according to this correspondence, the second file identifiers corresponding to each user identifier in the specified user group and the current file identifier, counts the occurrences of each second file identifier, and determines a preset number of second file identifiers in descending order of the counts.
- the server uses an AR (Association Rules) algorithm or a CF (Collaborative Filtering) algorithm to determine the file identifiers to be recommended, so as to recommend the files they indicate.
- the server recommends the file indicated by the determined file identifier.
- when recommending, the server may provide, on the display interface of the currently opened file, a link address for each determined file identifier; the link address is used to jump to the file indicated by that file identifier.
- the server may also display a thumbnail generated from the file indicated by the determined file identifier, or display related information such as the publisher and the publishing time, which is not limited in this embodiment of the present invention.
- when there are multiple determined file identifiers, the files may be recommended in order of the number of user identifiers in the specified user group that correspond to each file identifier, or in order of the publishing time of the files, which is not limited in this embodiment of the present invention.
- with the method provided by the embodiment of the present invention, user groups can be obtained according to the correspondence between the user identifier and the file identifier included in the historical play record, and user identifiers with similar preferences can be placed in the same user group.
- when recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. Because the preferences of the current user identifier and of other user identifiers are taken into account, the recommendation efficiency and the recommendation success rate are improved.
- the server selects sample user identifiers according to the number of file identifiers corresponding to each user identifier, and assigns weights to the first preset threshold and the second preset threshold filled into the two-dimensional matrix, which improves the accuracy of dividing user groups.
- by reducing the dimension of the U matrix, the sample data can be generalized effectively and the amount of calculation is reduced, which prevents over-fitting.
- FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention.
- referring to FIG. 3, the apparatus includes: a matrix construction module 301, a filling module 302, a matrix decomposition module 303, a vector division module 304, a clustering module 305, and a recommendation module 306.
- the matrix construction module 301 is configured to construct a two-dimensional matrix according to the user identifiers and file identifiers contained in the historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix; the filling module 302 is connected to the matrix construction module 301 and is configured to fill elements, according to the correspondence between the user identifier and the file identifier in the historical play record, into the element positions of the two-dimensional matrix that correspond to the correspondence; the matrix decomposition module 303 is connected to the filling module 302 and is configured to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix.
- the vector division module 304 is connected to the matrix decomposition module 303 and is configured to divide the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier;
- the clustering module 305 is connected to the vector division module 304 and is configured to cluster each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, where each user group includes at least one user identifier;
- the recommendation module 306 is connected to the clustering module 305 and is configured to perform file recommendation based on the at least one user group.
- the recommendation module 306 includes:
- An instruction receiving unit configured to receive an instruction to open a file, where the instruction for opening the file carries a current user identifier and a current file identifier;
- a specified group determining unit configured to determine, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs, where each user group is obtained according to the correspondence between the user identifier and the file identifier included in the historical play record;
- a file identifier determining unit configured to determine the file identifiers to be recommended according to the correspondence between the user identifier and the file identifier and the specified user group;
- a recommendation unit configured to recommend the files indicated by the determined file identifiers.
- the matrix construction module 301 includes:
- a number obtaining unit configured to obtain, for each user identifier in the historical play record, a number of file identifiers corresponding to the user identifier
- a sample obtaining unit configured to acquire the user identifier as a sample user identifier when the number of file identifiers corresponding to the user identifier exceeds a preset number
- the matrix construction unit is configured to construct a two-dimensional matrix by using a sample user identifier as a first dimension of the matrix and a file identifier as a second dimension of the matrix according to the sample user identifier and the file identifier included in the historical play record.
- the filling module 302 includes:
- a first filling unit configured to, for a sample user identifier and a file identifier, when the correspondence between the sample user identifier and the file identifier is stored in the historical play record, fill the first preset threshold into the element position of the two-dimensional matrix corresponding to the sample user identifier and the file identifier, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix;
- a second filling unit configured to, when the element positions corresponding to all sample user identifiers and file identifiers stored in the historical play record have been filled, randomly select, from the remaining element positions of the two-dimensional matrix, as many element positions as have been filled with the first preset threshold, and fill the selected element positions with the second preset threshold.
- the matrix decomposition module 303 includes:
- a weight determining unit configured to determine the weight of the element positions filled with the first preset threshold as a first weight, and to determine the weight of the element positions filled with the second preset threshold as a second weight;
- a decomposing unit configured to perform singular value decomposition (SVD) on the two-dimensional matrix by using a stochastic gradient descent (SGD) algorithm according to the first weight and the second weight, to obtain a U matrix;
- a specified matrix unit configured to use the U matrix as the specified matrix.
- the device further includes:
- the dimension reduction module is configured to reduce the dimension of the second dimension of the U matrix according to the preset retention dimension, and use the dimension reduction U matrix as the specified matrix.
- the file identifier determining unit includes:
- a user number determining subunit configured to determine, according to the correspondence between the user identifier and the file identifier, the number of user identifiers corresponding to each file identifier in the specified user group;
- the first identifier determining subunit is configured to determine a preset number of file identifiers in descending order of the determined number.
- the file identifier determining unit includes:
- a similarity calculation sub-unit configured to calculate, according to a feature vector of each user identifier in the specified user group, a similarity between each user identifier of the specified user group except the current user identifier and the current user identifier;
- a user identifier determining subunit configured to determine a user identifier that has the highest similarity with the current user identifier
- the second identifier determining subunit is configured to determine, according to the correspondence between the user identifier and the file identifier, a file identifier corresponding to the user identifier with the highest similarity.
- the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.
- the apparatus provided by the embodiment of the present invention obtains user groups according to the correspondence between the user identifier and the file identifier included in the historical play record, and can place user identifiers with similar preferences in the same user group, so that when recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. The preferences of the current user identifier and of other user identifiers are taken into account, which improves the recommendation efficiency and the recommendation success rate.
- it should be noted that, when the file recommendation apparatus provided by the foregoing embodiment recommends a file, the division into the above functional modules is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or part of the functions described above.
- in addition, the file recommendation apparatus and the file recommendation method provided by the foregoing embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.
- FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
- the server 400 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store applications 442 or data 444.
- the memory 432 and the storage medium 430 may be short-term storage or persistent storage.
- the program stored on storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
- central processor 422 can be configured to communicate with storage medium 430, executing a series of instruction operations in storage medium 430 on server 400.
- Server 400 may also include one or more power sources 426, one or more wired or wireless network interfaces 450, one or more input and output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the steps performed by the server described in the above embodiments may be based on the server structure shown in FIG. 4.
- a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
- the storage medium 430 includes a matrix construction instruction, a padding instruction, a matrix decomposition instruction, a vector division instruction, a cluster instruction, and a recommendation instruction executable by the central processing unit 422.
- the storage medium 430 may be a non-volatile computer readable storage medium, and the matrix construction instructions, the padding instructions, the matrix decomposition instructions, the vector division instructions, the clustering instructions, and the recommendation instructions may be machine readable instructions stored in the storage medium 430.
- Central processor 422 can execute machine readable instructions stored in storage medium 430 to implement the method steps and apparatus functions described in the above-described embodiments.
- the central processing unit 422 executes a matrix construction instruction for using the user identifier and the file identifier included in the history play record, using the user identifier as the first dimension of the matrix, and the file identifier as the second dimension of the matrix, Construct a two-dimensional matrix.
- the central processing unit 422 executes a padding instruction for filling an element with an element position corresponding to the corresponding relationship in the two-dimensional matrix according to a correspondence between a user identifier and a file identifier in the history play record.
- the central processing unit 422 performs a matrix decomposition instruction for performing matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix.
- the central processor 422 performs a vector partitioning instruction for dividing the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier.
- the central processing unit 422 executes a clustering instruction for clustering each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier.
- the central processor 422 executes a recommendation instruction for performing file recommendation based on the at least one user group.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Social Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
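An embodiment of the present invention discloses a file recommendation method and apparatus, belonging to the field of network technologies. The method includes: constructing a two-dimensional matrix according to the user identifiers and file identifiers contained in a historical play record, with user identifiers as the first dimension and file identifiers as the second dimension; filling elements into element positions of the two-dimensional matrix according to the correspondence between user identifiers and file identifiers, and performing matrix decomposition to obtain a specified matrix; clustering each user identifier based on the feature vectors obtained by dividing according to the first dimension, to obtain at least one user group; and performing file recommendation based on the at least one user group. In the embodiments of the present invention, user groups can be obtained from the correspondence between user identifiers and file identifiers included in the historical play record, user identifiers with similar preferences are placed in the same user group, and recommendation can be made based on the specified user group to which the current user identifier belongs, which improves the recommendation efficiency and the recommendation success rate.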
Description
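This application claims priority to Chinese Patent Application No. 201310653411.6, entitled "File Recommendation Method and Apparatus" and filed with the Chinese Patent Office on December 5, 2013, which is incorporated herein by reference in its entirety.
The present invention relates to the field of network technologies, and in particular to a file recommendation method and apparatus.
Background of the Invention
In daily online activities, users constantly face all kinds of information, yet find it hard to pick out the information that truly interests them. To make this selection easier, a server can recommend information that a user may be interested in according to the user's browsing history, interests, and so on.
Taking video as an example, when a terminal plays a video, the videos most clicked by other users while watching that video are recommended to the current user; that is, videos are recommended to the current user by treating the preferences of other users as analogous to those of the current user.
However, this approach does not consider the case where the preferences of the current user differ from those of other users, so when they differ, the recommendation success rate is low.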
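Summary of the Invention
To solve the problems in the prior art, embodiments of the present invention provide a file recommendation method and apparatus. The technical solutions are as follows:
An embodiment of the present invention provides a file recommendation method, the method including:
constructing a two-dimensional matrix according to the user identifiers and file identifiers contained in a historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix;
filling elements, according to the correspondence between user identifiers and file identifiers in the historical play record, into the element positions of the two-dimensional matrix that correspond to the correspondence;
performing matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix;
dividing the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier;
clustering each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier;
performing file recommendation based on the at least one user group.
Another embodiment of the present invention provides a file recommendation apparatus, the apparatus including:
a matrix construction module, configured to construct a two-dimensional matrix according to the user identifiers and file identifiers contained in a historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix;
a filling module, configured to fill elements, according to the correspondence between user identifiers and file identifiers in the historical play record, into the element positions of the two-dimensional matrix that correspond to the correspondence;
a matrix decomposition module, configured to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix;
a vector division module, configured to divide the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier;
a clustering module, configured to cluster each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier;
a recommendation module, configured to perform file recommendation based on the at least one user group.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
With the method and apparatus provided by the embodiments of the present invention, user groups can be obtained according to the correspondence between user identifiers and file identifiers included in the historical play record, and user identifiers with similar preferences can be placed in the same user group. When recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. Because the preferences of the current user identifier and of other user identifiers are taken into account, the recommendation efficiency and the recommendation success rate are improved.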
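Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a file recommendation method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.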
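Mode for Carrying Out the Invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method is performed by a server. Referring to FIG. 1, the method includes:
101. Construct a two-dimensional matrix according to the user identifiers and file identifiers contained in a historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix.
102. According to the correspondence between user identifiers and file identifiers in the historical play record, fill elements into the element positions of the two-dimensional matrix that correspond to the correspondence.
103. Perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix.
104. Divide the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier.
105. Cluster each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier.
106. Perform file recommendation based on the at least one user group.
With the method provided by the embodiment of the present invention, user groups can be obtained according to the correspondence between user identifiers and file identifiers included in the historical play record, and user identifiers with similar preferences can be placed in the same user group. When recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. Because the preferences of the current user identifier and of other user identifiers are taken into account, the recommendation efficiency and the recommendation success rate are improved.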
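In an embodiment, performing file recommendation based on the at least one user group includes:
receiving an instruction to open a file, the instruction carrying a current user identifier and a current file identifier;
determining, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs, each user group being obtained according to the correspondence between user identifiers and file identifiers included in the historical play record;
determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group;
recommending the files indicated by the determined file identifiers.
In an embodiment, constructing a two-dimensional matrix according to the user identifiers and file identifiers contained in the historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix, includes:
for each user identifier in the historical play record, obtaining the number of file identifiers corresponding to the user identifier;
when the number of file identifiers corresponding to the user identifier exceeds a preset number, taking the user identifier as a sample user identifier;
constructing the two-dimensional matrix according to the sample user identifiers and file identifiers contained in the historical play record, with sample user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix.
In an embodiment, filling elements, according to the correspondence between user identifiers and file identifiers in the historical play record, into the element positions of the two-dimensional matrix that correspond to the correspondence includes:
for a sample user identifier and a file identifier, when the correspondence between the sample user identifier and the file identifier is stored in the historical play record, filling a first preset threshold into the element position of the two-dimensional matrix corresponding to the sample user identifier and the file identifier, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix;
when the element positions corresponding to all sample user identifiers and file identifiers stored in the historical play record have been filled, randomly selecting, from the remaining element positions of the two-dimensional matrix, as many element positions as have been filled with the first preset threshold, and filling the selected element positions with a second preset threshold.
In an embodiment, performing matrix decomposition on the filled two-dimensional matrix to obtain the specified matrix includes:
determining the weight of the element positions filled with the first preset threshold as a first weight, and determining the weight of the element positions filled with the second preset threshold as a second weight;
performing singular value decomposition (SVD) on the two-dimensional matrix by using a stochastic gradient descent (SGD) algorithm according to the first weight and the second weight, to obtain a U matrix;
using the U matrix as the specified matrix.
In an embodiment, after the U matrix is obtained by performing SVD on the two-dimensional matrix with the SGD algorithm according to the first weight and the second weight, the method further includes:
reducing the dimension of the second dimension of the U matrix according to a preset retained dimension, and using the reduced U matrix as the specified matrix.
In an embodiment, determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group includes:
determining, according to the correspondence between user identifiers and file identifiers, the number of user identifiers in the specified user group that correspond to each file identifier;
determining a preset number of file identifiers in descending order of the determined numbers.
In an embodiment, determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group includes:
calculating, according to the feature vector of each user identifier in the specified user group, the similarity between the current user identifier and each other user identifier in the specified user group;
determining the user identifier with the highest similarity to the current user identifier;
determining, according to the correspondence between user identifiers and file identifiers, the file identifiers corresponding to the user identifier with the highest similarity.
In an embodiment, the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.
The technical solutions provided by all of the above embodiments may be combined in any manner to form embodiments of the present invention, which are not described one by one here.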
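FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method is performed by a server. Referring to FIG. 2, the method includes:
201. For each user identifier in the historical play record, the server obtains the number of file identifiers corresponding to the user identifier, where the historical play record includes correspondences between user identifiers and file identifiers.
This embodiment applies to a scenario in which the server recommends files based on user groups obtained by grouping user identifiers according to the historical play record. The server may be a server associated with the current file identifier, or a functional module in such a server, which is not limited in this embodiment of the present invention.
In this embodiment, the server records the files opened by each user identifier: as soon as a user identifier opens a file, the server establishes in the historical play record a correspondence between the user identifier and the identifier of the opened file. Here, a user identifier opening a file means that the user corresponding to the user identifier opens the file through the terminal device in use. Further, the historical play record may be a record kept by the server for a preset duration; that is, when any correspondence in the historical play record has been retained for longer than the preset duration, that correspondence is deleted. The file may be a video file, an audio file, a text file, or the like provided by a server, such as an online video file provided by a video website server, an audio file provided by an audio website, or an online document provided by a document sharing server, which is not limited in this embodiment of the present invention.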
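The retention rule described above (a correspondence is deleted once it has been kept longer than the preset duration) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout `(user_id, file_id, opened_at)` and the names `prune_history` and `to_correspondence` are hypothetical and do not come from the patent text.

```python
def prune_history(records, now, preset_duration):
    """Keep only correspondences retained for no longer than preset_duration.

    records: list of (user_id, file_id, opened_at) tuples (hypothetical layout);
    opened_at and now use the same time unit, for example seconds.
    """
    return [r for r in records if now - r[2] <= preset_duration]


def to_correspondence(records):
    """Collapse the records into a mapping user_id -> set of opened file ids."""
    history = {}
    for user_id, file_id, _ in records:
        history.setdefault(user_id, set()).add(file_id)
    return history
```

The resulting mapping from user identifier to a set of file identifiers is one convenient in-memory form of the "correspondence" that the later steps consume.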
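The user identifier may be a user account, a terminal identifier, or the like; the file identifier may be a file name, a file number, or the like; and the file indicated by the file identifier may be of various types, such as a video file, an audio file, or a text file. Correspondingly, the historical play record may include correspondences between user identifiers and file identifiers of multiple types, none of which is limited in this embodiment of the present invention.
In this embodiment, where the server can provide multiple types of files, the server may also maintain a separate historical play record for each file type, for example recording the correspondence between user identifiers and video file identifiers, and separately recording the correspondence between user identifiers and audio file identifiers. For a given specified type, the server groups user identifiers according to the historical play record of that type to obtain multiple user groups; when the current user identifier opens a file of the specified type, the server can recommend files of that type based on those user groups. Because these user groups are derived from the historical play record of the specified file type, they reflect users' preferences for that type better than user groups derived from the records of all file types, which further improves the recommendation success rate when recommending files of the specified type.
In this embodiment, if a user identifier in the historical play record has opened only a small number of file identifiers, the correspondence between that user identifier and file identifiers cannot accurately reflect the user's preferences, and the user identifier would reduce the accuracy of dividing user groups. To keep the grouping accurate, the server may select sample user identifiers according to the number of file identifiers corresponding to each user identifier and group only the selected sample user identifiers; the server therefore first obtains the number of file identifiers corresponding to each user identifier in the historical play record.
202. When the number of file identifiers corresponding to the user identifier exceeds a preset number, the server selects the user identifier as a sample user identifier.
Specifically, after obtaining the number of file identifiers corresponding to each user identifier in the historical play record, the server determines whether each number exceeds the preset number. A user identifier whose number of corresponding file identifiers exceeds the preset number is taken as a sample user identifier. When the number of file identifiers corresponding to a user identifier is less than or equal to the preset number, the server may discard the user identifier, or temporarily ignore it and take it as a sample user identifier once its number of corresponding files exceeds the preset number.
The number of file identifiers corresponding to a user identifier is the number of files that the user identifier has opened. When this number exceeds the preset number, the correspondence between the user identifier and file identifiers can be considered to reflect the user's preferences and can be used for grouping.
For example, in the historical play record, user identifier A corresponds to 3 file identifiers and user identifier B corresponds to 25 file identifiers. Assuming the preset number is 10, the server takes user identifier B as a sample user identifier and does not consider user identifier A.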
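Steps 201 and 202 amount to a simple filter over the historical play record. The sketch below assumes the record is held as a hypothetical mapping from user identifier to the set of file identifiers that user has opened; the function name `select_sample_users` is illustrative only.

```python
def select_sample_users(history, preset_number):
    """Return the users whose number of opened files exceeds preset_number.

    history: dict mapping user_id -> set of file ids (hypothetical layout).
    Users at or below the threshold are simply left out; they can be
    reconsidered the next time the selection is run.
    """
    return {user: files for user, files in history.items()
            if len(files) > preset_number}
```

With the figures from the example above (3 files for user A, 25 for user B, preset number 10), only user B would be kept.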
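In this embodiment, the preset number may be set in advance by a technician, or determined by the server according to the number of file identifiers corresponding to each user identifier in the historical play record, which is not limited in this embodiment of the present invention.
It should be noted that steps 201 and 202 are optional. The server may also take all user identifiers included in the historical play record as sample user identifiers and proceed directly to step 203, which is not limited in this embodiment of the present invention.
203. For a sample user identifier and a file identifier, when the correspondence between the sample user identifier and the file identifier is stored in the historical play record, the server constructs a two-dimensional matrix with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix, and fills a first preset threshold into the element position of the two-dimensional matrix corresponding to the sample user identifier and the file identifier.
Specifically, the two-dimensional matrix to be generated has user identifiers as its first dimension and file identifiers as its second dimension, and each of its elements is determined by the correspondence between user identifiers and file identifiers. For a sample user identifier and a file identifier in the historical play record, when the correspondence between them is stored in the historical play record, the user identifier has opened the file indicated by the file identifier, and the first preset threshold is filled into the element position corresponding to that user identifier and file identifier. When the correspondence is not stored in the historical play record, the user identifier has not opened the file indicated by the file identifier, and no element is filled into the corresponding element position for the time being.
In this embodiment, the first dimension is rows and the second dimension is columns, that is, the two-dimensional matrix has user identifiers as rows and file identifiers as columns; or the first dimension is columns and the second dimension is rows, that is, the two-dimensional matrix has user identifiers as columns and file identifiers as rows. The first dimension and the second dimension are not limited in this embodiment of the present invention.
In this embodiment, the first preset threshold may be 1.
For example, the correspondence between user identifiers and file identifiers in the historical play record is shown in Table 1, where "√" indicates that a correspondence has been established between the user identifier and the file identifier.
Table 1
File identifier 1 | File identifier 2 | File identifier 3 | |
Sample user identifier A | √ | ||
Sample user identifier B | √ | √ | |
Sample user identifier C | √ |
Taking the first dimension as rows, the second dimension as columns, and the first preset threshold as 1 as an example, sample user identifiers A, B, and C correspond to the first, second, and third rows of the two-dimensional matrix, and file identifiers 1, 2, and 3 correspond to its first, second, and third columns. The two-dimensional matrix after the first preset threshold is filled in is: (where X denotes an element position that has not been filled).
204. When the element positions corresponding to all sample user identifiers and file identifiers stored in the historical play record have been filled, the server randomly selects, from the remaining element positions of the two-dimensional matrix, as many element positions as have been filled with the first preset threshold, and fills the selected element positions with a second preset threshold.
Specifically, once the element positions corresponding to all sample user identifiers and file identifiers stored in the historical play record have been filled, the server obtains the number of element positions filled with the first preset threshold, randomly selects the same number of element positions from the remaining unfilled element positions of the two-dimensional matrix, and fills the selected element positions with the second preset threshold.
In this embodiment, the second preset threshold may be 0.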
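Steps 203 and 204 can be sketched as below. This is an illustrative sketch rather than the patented implementation: it assumes rows are users and columns are files, represents an unfilled position as `None`, and caps the number of randomly chosen positions at the number of positions still empty, a detail the text does not address. The names `fill_matrix`, `first_value`, and `second_value` are hypothetical.

```python
import random


def fill_matrix(users, files, history, first_value=1, second_value=0, rng=None):
    """Build the user-by-file matrix and fill it as in steps 203 and 204."""
    rng = rng or random.Random()
    matrix = [[None] * len(files) for _ in users]

    # Step 203: fill the first preset threshold where a correspondence exists.
    filled = 0
    for i, user in enumerate(users):
        for j, file_id in enumerate(files):
            if file_id in history.get(user, ()):
                matrix[i][j] = first_value
                filled += 1

    # Step 204: pick an equal number of still-empty positions at random
    # and fill them with the second preset threshold.
    empty = [(i, j) for i in range(len(users)) for j in range(len(files))
             if matrix[i][j] is None]
    for i, j in rng.sample(empty, min(filled, len(empty))):
        matrix[i][j] = second_value
    return matrix
```

Passing a seeded `random.Random` makes the random selection reproducible, which is convenient when comparing groupings across runs.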
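205. The server determines the weight of the element positions filled with the first preset threshold as a first weight, and determines the weight of the element positions filled with the second preset threshold as a second weight.
In this embodiment, a file that the user identifier has opened can be regarded as a file the user likes, whereas a file that the user identifier has not opened may be either liked or disliked by the user. The server can therefore assign weights to the element positions in the two-dimensional matrix, so that subsequent machine learning takes the first weight and the second weight into account according to the weight of each element position. In this embodiment, the first weight is greater than the second weight; for example, the first weight is 0.7 and the second weight is 0.3.
206. The server performs SVD (Singular Value Decomposition) on the two-dimensional matrix by using the SGD (Stochastic Gradient Descent) algorithm according to the first weight and the second weight, to obtain a U matrix.
The SVD is $A = U S V^{T}$, where A is the two-dimensional matrix, U and V are orthogonal matrices, and S is a diagonal matrix.
In this embodiment, the two-dimensional matrix may contain many unfilled element positions, that is, it is a sparse matrix. When performing SVD on the two-dimensional matrix, the SGD algorithm can therefore be used for machine learning, and the U matrix is obtained by predicting the missing elements of the two-dimensional matrix.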
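The text does not spell out the update rule, so the following is only one plausible reading of "SVD by SGD with weights": a weighted low-rank factorization A ≈ U Vᵀ fitted by stochastic gradient descent over the filled positions, in the style commonly used for recommender systems. It is not an exact SVD (the diagonal matrix S is absorbed into the factors and orthogonality is not enforced). The learning rate, regularization, rank, and all function names are assumptions made for illustration.

```python
import random


def predict(U, V, i, j):
    """Predicted value for user row i and file column j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def weighted_loss(matrix, U, V, w_first=0.7, w_second=0.3):
    """Weighted squared error over the filled positions only."""
    total = 0.0
    for i, row in enumerate(matrix):
        for j, a in enumerate(row):
            if a is not None:
                w = w_first if a == 1 else w_second
                total += w * (a - predict(U, V, i, j)) ** 2
    return total


def factorize(matrix, rank=2, w_first=0.7, w_second=0.3,
              lr=0.05, reg=0.01, epochs=300, seed=0):
    """Fit U (users x rank) and V (files x rank) by weighted SGD."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in matrix]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in matrix[0]]
    cells = [(i, j, a) for i, row in enumerate(matrix)
             for j, a in enumerate(row) if a is not None]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j, a in cells:
            w = w_first if a == 1 else w_second  # first weight for opened files
            err = a - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (w * err * v - reg * u)
                V[j][k] += lr * (w * err * u - reg * v)
    return U, V
```

Unfilled positions (`None`) contribute nothing to the updates, which is how the sparse matrix is handled here; each row of the returned `U` then plays the role of one user's feature vector.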
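It should be noted that step 206 is optional. The server may also use other algorithms to perform matrix decomposition and machine learning on the two-dimensional matrix, which is not limited in this embodiment of the present invention.
207. The server uses the U matrix as the specified matrix.
In this embodiment, the server uses the U matrix as the specified matrix so that user groups can subsequently be divided according to the specified matrix.
In another embodiment of the present invention, step 207 may be replaced by the following step: the server reduces the dimension of the second dimension of the U matrix according to a preset retained dimension, and uses the reduced U matrix as the specified matrix.
The server may set a preset retained dimension K and reduce the second dimension of the U matrix according to K to obtain the specified matrix, so that the second dimension of the specified matrix has K dimensions. The preset retained dimension K may be set in advance by a technician, or determined by the server from the prediction accuracy obtained in repeated experiments with different preset retained dimensions, which is not limited in this embodiment of the present invention.
By reducing the dimension of the U matrix, this embodiment can effectively generalize the sample data, reduce the amount of calculation, and prevent over-fitting.
For example, when the second dimension is columns and the preset retained dimension K = 8, the server keeps the first 8 columns of the U matrix, deletes the other columns, and uses the retained matrix as the specified matrix.
208. The server divides the specified matrix according to the first dimension to obtain a feature vector corresponding to each sample user identifier.
In this embodiment, the first dimension of the specified matrix represents sample user identifiers. The server divides the specified matrix according to the first dimension to obtain multiple vectors, and takes each vector as the feature vector of the corresponding sample user identifier.
For example, when the first dimension is rows, the server divides the specified matrix by rows to obtain its row vectors, and takes each row vector as the feature vector of the corresponding sample user identifier.
Based on the example in step 203, assume the specified matrix is $\begin{bmatrix} -0.4472 & -0.5373 & -0.0064 \\ -0.3586 & 0.2461 & 0.8622 \\ -0.2925 & -0.4033 & -0.2275 \end{bmatrix}$. Then the feature vector of sample user identifier A is [-0.4472, -0.5373, -0.0064], the feature vector of sample user identifier B is [-0.3586, 0.2461, 0.8622], and the feature vector of sample user identifier C is [-0.2925, -0.4033, -0.2275].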
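The column truncation and the row split described above can be sketched together. The sketch assumes the first dimension is rows (one row per sample user) and uses the three feature vectors quoted in the example; the function name `split_feature_vectors` and the parameter `keep_dims` (standing in for the preset retained dimension K) are hypothetical.

```python
def split_feature_vectors(specified_matrix, sample_users, keep_dims=None):
    """Map each sample user to its row of the matrix.

    If keep_dims is given, only the first keep_dims columns of each row are
    kept, which corresponds to the optional dimension reduction.
    """
    return {user: list(row[:keep_dims])
            for user, row in zip(sample_users, specified_matrix)}


specified = [
    [-0.4472, -0.5373, -0.0064],
    [-0.3586, 0.2461, 0.8622],
    [-0.2925, -0.4033, -0.2275],
]
vectors = split_feature_vectors(specified, ["A", "B", "C"])
reduced = split_feature_vectors(specified, ["A", "B", "C"], keep_dims=2)
```

Here `vectors["B"]` is the full three-component row for sample user B, while `reduced["A"]` keeps only the first two components of the row for sample user A.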
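209. The server clusters each sample user identifier based on the feature vector corresponding to each sample user identifier, to obtain at least one user group, each user group including at least one sample user identifier.
Specifically, the server applies a clustering algorithm to the feature vectors of the sample user identifiers to obtain at least one user group, each user group including at least one sample user identifier.
In this embodiment, the clustering algorithm may be a partitioning method, a hierarchical method, or the like, which is not limited in this embodiment of the present invention.
Further, the server may set the number of clusters C in advance and cluster the sample user identifiers according to that number to obtain C user groups. The number of clusters C may be determined by the server according to the required grouping accuracy or the number of sample user identifiers, which is not limited in this embodiment of the present invention.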
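The text leaves the clustering algorithm open. As one example of a partitioning method, the sketch below uses plain k-means with a preset number of clusters C; the choice of k-means, the fixed iteration count, the random initialization, and the name `kmeans_groups` are all assumptions made for illustration.

```python
import random


def kmeans_groups(vectors, c, iterations=20, seed=0):
    """Cluster users into at most c groups by k-means on their feature vectors.

    vectors: dict mapping user_id -> feature vector (list of floats).
    Returns a list of user-id lists; empty clusters are dropped.
    """
    rng = random.Random(seed)
    users = sorted(vectors)
    centers = [list(vectors[u]) for u in rng.sample(users, c)]
    assign = {}
    for _ in range(iterations):
        # Assignment step: attach each user to the nearest center.
        for u in users:
            assign[u] = min(
                range(c),
                key=lambda k: sum((a - b) ** 2
                                  for a, b in zip(vectors[u], centers[k])))
        # Update step: move each center to the mean of its members.
        for k in range(c):
            members = [vectors[u] for u in users if assign[u] == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[u for u in users if assign[u] == k] for k in range(c)]
    return [g for g in groups if g]
```

A hierarchical method would be an equally valid reading of the text; k-means is used here only because it takes the cluster count C directly as input.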
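It should be noted that this embodiment is described with step 210 performed after step 209 as an example. In fact, steps 201 to 209 may be performed in real time or periodically, and there is no required ordering between step 210 and steps 201 to 209. When the server receives an instruction to open a file, it only needs to make recommendations according to the user groups currently available.
Taking real-time execution of steps 201 to 209 as an example, whenever any user identifier opens any file, the server updates the historical play record and performs steps 201 to 209 again to regroup the user identifiers and obtain updated user groups. When the server receives an instruction to open a file, it makes recommendations according to the current user groups.
210. The server receives an instruction to open a file, the instruction carrying a current user identifier and a current file identifier.
In this embodiment, when the server receives the instruction to open a file, it opens the file indicated by the current file identifier on behalf of the current user identifier. At this point, the server can make recommendations according to the current user identifier and the current file identifier.
211. The server determines, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs.
In this embodiment, the server has already divided the user identifiers into multiple user groups, each user group includes at least one user identifier, and user identifiers in the same user group can be considered to have similar preferences. When recommending files, the server can therefore base the recommendation on the file identifiers corresponding to user identifiers whose preferences are similar to those of the current user identifier, rather than on the file identifiers corresponding to all user identifiers.
Specifically, the server determines, according to the user identifiers included in each user group, the user group to which the current user identifier belongs, and takes that user group as the specified user group so that recommendations can be made based on it.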
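Step 211 is a membership lookup. A minimal sketch, assuming the user groups are held as a list of lists of user identifiers (a hypothetical layout) and that a user who was never sampled simply has no specified group:

```python
def build_group_index(user_groups):
    """Invert the groups into a mapping user_id -> position of its group."""
    index = {}
    for group_id, members in enumerate(user_groups):
        for user in members:
            index[user] = group_id
    return index


def specified_group(user_groups, current_user):
    """Return the group containing current_user, or None if there is none."""
    group_id = build_group_index(user_groups).get(current_user)
    return None if group_id is None else user_groups[group_id]
```

In a long-running server the index would normally be built once per regrouping rather than on every request; it is rebuilt here only to keep the sketch short. How to handle a user with no group is not covered by the text.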
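212. The server determines, according to the historical play record, the number of user identifiers in the specified user group that correspond to each file identifier.
In this embodiment, a file identifier in the historical play record may correspond to user identifiers included in the specified user group and also to user identifiers not included in it. The server calculates the number of user identifiers in the specified user group that correspond to the file identifier; this number indicates how many user identifiers in the specified user group have opened the file indicated by the file identifier.
213. The server determines a preset number of file identifiers in descending order of the determined numbers.
The preset number may be set in advance by the server, or determined by the server according to the number of files that can be displayed in the recommendation area of the current display interface, which is not limited in this embodiment of the present invention.
Specifically, the server sorts the file identifiers in descending order of the determined numbers and determines the file identifiers ranked in the top preset number of positions.
In this embodiment, the more user identifiers in the specified user group have opened a given file, the more likely that file is to interest the current user, so the server should recommend it. By recommending the file identifiers ranked in the top preset number of positions, the server can therefore improve the recommendation success rate.
It should be noted that steps 212 and 213 are optional. The server may also determine the file identifiers to be recommended in other manners, which is not limited in this embodiment of the present invention.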
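Steps 212 and 213 can be sketched as a count followed by a sort. The history layout (user identifier mapped to a set of file identifiers) and the function name are hypothetical, and breaking ties by file identifier is an added assumption so that the output is deterministic; the text does not say how ties are ordered.

```python
from collections import Counter


def top_files_in_group(history, group, preset_number):
    """Rank files by how many users in the group opened them (steps 212-213)."""
    counts = Counter()
    for user in group:
        # Using a set ensures each user is counted at most once per file.
        counts.update(set(history.get(user, ())))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [file_id for file_id, _ in ranked[:preset_number]]
```

Users outside the specified group never enter the count, which is the point of restricting the recommendation to the group.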
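In another embodiment of the present invention, steps 212 and 213 may be replaced by the following steps:
(I) The server calculates, according to the feature vector of each user identifier in the specified user group, the similarity between the current user identifier and each other user identifier in the specified user group.
The server may calculate the cosine of the angle between the feature vector of each user identifier and that of the current user identifier, or their Pearson correlation coefficient, to represent the similarity between the current user identifier and each other user identifier. The specific manner of calculating the similarity is not limited in this embodiment of the present invention.
(II) The server determines the user identifier with the highest similarity to the current user identifier.
The server may take, as the user identifier with the highest similarity to the current user identifier, the user identifier whose feature vector has the largest cosine value with the feature vector of the current user identifier, or the user identifier whose feature vector has the largest absolute Pearson correlation coefficient with the feature vector of the current user identifier.
In this embodiment, the user identifier with the highest similarity to the current user identifier can be regarded as the one whose preferences are most similar to those of the current user identifier, and the server can make recommendations according to the file identifiers corresponding to it.
(III) The server determines, according to the correspondence between user identifiers and file identifiers, the file identifiers corresponding to the user identifier with the highest similarity, and performs step 214.
Specifically, the server determines, according to the correspondence between user identifiers and file identifiers, each file identifier corresponding to the user identifier with the highest similarity and each file identifier corresponding to the current user identifier, compares them, and determines the file identifiers that correspond to the most similar user identifier but not to the current user identifier.
For example, in the specified user group, user identifier B has the highest similarity to current user identifier A. The server determines, according to the correspondence between user identifiers and file identifiers, that user identifier B has opened file identifier 1 while current user identifier A has not, and therefore recommends the file indicated by file identifier 1.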
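Steps (I) to (III) can be sketched as follows. The sketch offers both similarity measures named in the text; reading "largest cosine" as the largest cosine value (that is, the smallest angle) is an interpretation, and the data layout and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def recommend_from_most_similar(features, history, group, current_user,
                                use_pearson=False):
    """Files opened by the most similar group member but not by current_user."""
    others = [u for u in group if u != current_user]
    if not others:
        return []
    if use_pearson:
        def score(u):
            return abs(pearson(features[u], features[current_user]))
    else:
        def score(u):
            return cosine(features[u], features[current_user])
    best = max(others, key=score)
    return sorted(history.get(best, set()) - history.get(current_user, set()))
```

Pearson correlation is computed here as the cosine of the mean-centered vectors, which is mathematically equivalent to the usual covariance-over-standard-deviations form.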
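In yet another embodiment of the present invention, the historical play record includes a correspondence among a user identifier, a first file identifier, and a second file identifier, where the second file identifier is the file identifier that the user identifier opened after opening the first file identifier. In this case, steps 212 and 213 may instead be replaced by the following step: the server determines, according to the correspondence among user identifiers, first file identifiers, and second file identifiers, the second file identifiers corresponding to each user identifier in the specified user group and the current file identifier, counts the occurrences of each second file identifier, and determines a preset number of second file identifiers in descending order of the counts.
In this way, the server determines the second file identifiers that each user identifier in the specified user group opened after opening the current file identifier. The more often the file indicated by a second file identifier was opened after the current file identifier, the more closely that file can be considered related to the current file identifier and the more likely it is to interest the current user, so that file is recommended.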
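The "opened next" variant can be sketched with the same count-and-sort pattern. The record layout `(user_id, first_file_id, second_file_id)` and the function name are hypothetical, and ties are again broken by file identifier as an added assumption.

```python
from collections import Counter


def top_followup_files(records, group, current_file, preset_number):
    """Rank the files that group members opened right after current_file.

    records: iterable of (user_id, first_file_id, second_file_id) tuples.
    """
    members = set(group)
    counts = Counter(second for user, first, second in records
                     if user in members and first == current_file)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [file_id for file_id, _ in ranked[:preset_number]]
```

Records from users outside the specified group, or records whose first file is not the current file, are ignored.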
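In this embodiment, the server uses an AR (Association Rules) mining algorithm or a CF (Collaborative Filtering) algorithm to determine the file identifiers to be recommended, so as to recommend the files they indicate.
214. The server recommends the files indicated by the determined file identifiers.
In this embodiment, when recommending the file indicated by a determined file identifier, the server may provide a link address for that file identifier on the display interface of the currently opened file; the link address is used to jump to the file indicated by the determined file identifier. In addition, the server may display a thumbnail generated from the file indicated by the determined file identifier, or display related information such as the publisher and the publishing time, which is not limited in this embodiment of the present invention.
Further, when there are multiple determined file identifiers, the files may be recommended in order of the number of user identifiers in the specified user group that correspond to each file identifier, or in order of the publishing time of the files, neither of which is limited in this embodiment of the present invention.
With the method provided by the embodiment of the present invention, user groups can be obtained according to the correspondence between user identifiers and file identifiers included in the historical play record, and user identifiers with similar preferences can be placed in the same user group. When recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. Because the preferences of the current user identifier and of other user identifiers are taken into account, the recommendation efficiency and the recommendation success rate are improved. Further, the server selects sample user identifiers according to the number of file identifiers corresponding to each user identifier, and assigns weights to the first preset threshold and the second preset threshold filled into the two-dimensional matrix, which improves the accuracy of dividing user groups. In addition, reducing the dimension of the U matrix effectively generalizes the sample data, reduces the amount of calculation, and prevents over-fitting.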
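FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes: a matrix construction module 301, a filling module 302, a matrix decomposition module 303, a vector division module 304, a clustering module 305, and a recommendation module 306.
The matrix construction module 301 is configured to construct a two-dimensional matrix according to the user identifiers and file identifiers contained in a historical play record, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix. The filling module 302 is connected to the matrix construction module 301 and is configured to fill elements, according to the correspondence between user identifiers and file identifiers in the historical play record, into the element positions of the two-dimensional matrix that correspond to the correspondence. The matrix decomposition module 303 is connected to the filling module 302 and is configured to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix. The vector division module 304 is connected to the matrix decomposition module 303 and is configured to divide the specified matrix according to the first dimension to obtain a feature vector corresponding to each user identifier. The clustering module 305 is connected to the vector division module 304 and is configured to cluster each user identifier based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier. The recommendation module 306 is connected to the clustering module 305 and is configured to perform file recommendation based on the at least one user group.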
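In this embodiment, the recommendation module 306 includes:
an instruction receiving unit, configured to receive an instruction to open a file, the instruction carrying a current user identifier and a current file identifier;
a specified group determining unit, configured to determine, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs, each user group being obtained according to the correspondence between user identifiers and file identifiers included in the historical play record;
a file identifier determining unit, configured to determine the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group;
a recommendation unit, configured to recommend the files indicated by the determined file identifiers.
In this embodiment, the matrix construction module 301 includes:
a number obtaining unit, configured to obtain, for each user identifier in the historical play record, the number of file identifiers corresponding to the user identifier;
a sample obtaining unit, configured to take the user identifier as a sample user identifier when the number of file identifiers corresponding to the user identifier exceeds a preset number;
a matrix construction unit, configured to construct the two-dimensional matrix according to the sample user identifiers and file identifiers contained in the historical play record, with sample user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix.
In this embodiment, the filling module 302 includes:
a first filling unit, configured to, for a sample user identifier and a file identifier, when the correspondence between the sample user identifier and the file identifier is stored in the historical play record, fill a first preset threshold into the element position of the two-dimensional matrix corresponding to the sample user identifier and the file identifier, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension of the matrix;
a second filling unit, configured to, when the element positions corresponding to all sample user identifiers and file identifiers stored in the historical play record have been filled, randomly select, from the remaining element positions of the two-dimensional matrix, as many element positions as have been filled with the first preset threshold, and fill the selected element positions with a second preset threshold.
In this embodiment, the matrix decomposition module 303 includes:
a weight determining unit, configured to determine the weight of the element positions filled with the first preset threshold as a first weight, and to determine the weight of the element positions filled with the second preset threshold as a second weight;
a decomposing unit, configured to perform singular value decomposition (SVD) on the two-dimensional matrix by using a stochastic gradient descent (SGD) algorithm according to the first weight and the second weight, to obtain a U matrix;
a specified matrix unit, configured to use the U matrix as the specified matrix.
In this embodiment, the apparatus further includes:
a dimension reduction module, configured to reduce the dimension of the second dimension of the U matrix according to a preset retained dimension, and to use the reduced U matrix as the specified matrix.
In this embodiment, the file identifier determining unit includes:
a user number determining subunit, configured to determine, according to the correspondence between user identifiers and file identifiers, the number of user identifiers in the specified user group that correspond to each file identifier;
a first identifier determining subunit, configured to determine a preset number of file identifiers in descending order of the determined numbers.
In this embodiment, the file identifier determining unit includes:
a similarity calculation subunit, configured to calculate, according to the feature vector of each user identifier in the specified user group, the similarity between the current user identifier and each other user identifier in the specified user group;
a user identifier determining subunit, configured to determine the user identifier with the highest similarity to the current user identifier;
a second identifier determining subunit, configured to determine, according to the correspondence between user identifiers and file identifiers, the file identifiers corresponding to the user identifier with the highest similarity.
In this embodiment, the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.
The apparatus provided by the embodiment of the present invention obtains user groups according to the correspondence between user identifiers and file identifiers included in the historical play record and can place user identifiers with similar preferences in the same user group, so that when recommending files for the current user identifier, the recommendation can be made based on the user group to which the current user identifier belongs, without having to be based on all user identifiers. The preferences of the current user identifier and of other user identifiers are taken into account, which improves the recommendation efficiency and the recommendation success rate.
It should be noted that, when the file recommendation apparatus provided by the above embodiment recommends a file, the division into the above functional modules is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or part of the functions described above. In addition, the file recommendation apparatus and the file recommendation method provided by the above embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.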
图4是本发明实施例提供的一种服务器的结构示意图,该服务器400可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)422(例如,一个或一个以上处理器)和存储器432,一个或一个以上存储应用程序442或数据444的存储介质430(例如一个或一个以上海量存储设备)。其中,存储器432和存储介质430可以是短暂存储或持久存储。存储在存储介质430的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器422可以设置为与存储介质430通信,在服务器400上执行存储介质430中的一系列指令操作。
服务器400还可以包括一个或一个以上电源426,一个或一个以上有线或无线网络接口450,一个或一个以上输入输出接口458,和/或,一个或一个以上操作系统441,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中所述的由服务器所执行的步骤可以基于该图4所示的
服务器结构。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
在本发明实施例中,该存储介质430中包括可由中央处理器422执行的矩阵构造指令、填充指令、矩阵分解指令、向量划分指令、聚类指令和推荐指令。该存储介质430可以是非易失计算机可读存储介质,矩阵构造指令、填充指令、矩阵分解指令、向量划分指令、聚类指令和推荐指令可以是存储在存储介质430中的机器可读指令。中央处理器422可以执行存储在存储介质430中的机器可读指令以实现上述实施例所述的方法步骤和装置功能。
例如,中央处理器422执行矩阵构造指令,该矩阵构造指令用于根据历史播放记录所包含的用户标识与文件标识,以用户标识作为矩阵的第一维度,以文件标识作为矩阵的第二维度,构造二维矩阵。
中央处理器422执行填充指令,该填充指令用于根据所述历史播放记录中用户标识与文件标识之间的对应关系,向所述二维矩阵中与所述对应关系相应的元素位置填充元素。
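The central processing unit 422 executes the matrix decomposition instruction, which is used to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix.
The central processing unit 422 executes the vector division instruction, which is used to divide the specified matrix along the first dimension to obtain the feature vector corresponding to each user identifier.
The central processing unit 422 executes the clustering instruction, which is used to cluster the user identifiers based on the feature vector corresponding to each user identifier, to obtain at least one user group, each user group including at least one user identifier.
The central processing unit 422 executes the recommendation instruction, which is used to recommend files based on the at least one user group.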
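The clustering step can be sketched as follows. This passage does not name a clustering algorithm, so k-means is used here purely as an assumption; the function names, iteration count, and seed are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_user_groups(user_vectors, k, iterations=20, seed=0):
    """Cluster {user: feature vector} into at most k groups.

    Returns {user: group index}. Requires k <= number of users.
    """
    users = sorted(user_vectors)
    rng = random.Random(seed)
    # Start from k distinct users' vectors as the initial centers.
    centers = [list(user_vectors[u]) for u in rng.sample(users, k)]
    assignment = {}
    for _ in range(iterations):
        # Assign each user to the nearest center.
        assignment = {
            u: min(range(k),
                   key=lambda c: squared_distance(user_vectors[u], centers[c]))
            for u in users
        }
        # Move each center to the mean of its members.
        for c in range(k):
            members = [user_vectors[u] for u in users if assignment[u] == c]
            if members:  # keep the old center if a group is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Looking up the current user identifier in the returned mapping gives the specified user group on which the recommendation step then operates.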
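The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.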
Claims (18)
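- A file recommendation method, characterized in that the method comprises: constructing a two-dimensional matrix according to the user identifiers and file identifiers contained in historical playback records, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension; filling elements, according to the correspondence between user identifiers and file identifiers in the historical playback records, into the element positions of the two-dimensional matrix that correspond to that correspondence; performing matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix; dividing the specified matrix along the first dimension to obtain a feature vector corresponding to each user identifier; clustering the user identifiers based on the feature vector corresponding to each user identifier to obtain at least one user group, each user group comprising at least one user identifier; and recommending files based on the at least one user group.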
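- The method according to claim 1, characterized in that recommending files based on the at least one user group comprises: receiving an instruction to open a file, the instruction carrying a current user identifier and a current file identifier; determining, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs, each user group being obtained from the correspondence between user identifiers and file identifiers included in the historical playback records; determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group; and recommending the files indicated by the determined file identifiers.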
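- The method according to claim 1, characterized in that constructing the two-dimensional matrix according to the user identifiers and file identifiers contained in the historical playback records, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension, comprises: for each user identifier in the historical playback records, obtaining the number of file identifiers corresponding to the user identifier; when the number of file identifiers corresponding to the user identifier exceeds a preset number, taking the user identifier as a sample user identifier; and constructing the two-dimensional matrix according to the sample user identifiers and file identifiers contained in the historical playback records, with sample user identifiers as the first dimension of the matrix and file identifiers as the second dimension.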
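- The method according to claim 3, characterized in that filling elements, according to the correspondence between user identifiers and file identifiers in the historical playback records, into the element positions of the two-dimensional matrix that correspond to that correspondence comprises: for a sample user identifier and a file identifier, when the historical playback records hold a correspondence between the sample user identifier and the file identifier, filling a first preset threshold into the element position of the two-dimensional matrix that corresponds to the sample user identifier and the file identifier, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension; and when the element positions corresponding to all sample user identifiers and file identifiers held in the historical playback records have been filled, randomly selecting, from the remaining element positions of the two-dimensional matrix, a number of element positions equal to the number of element positions filled with the first preset threshold, and filling a second preset threshold into the selected element positions.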
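- The method according to claim 4, characterized in that performing matrix decomposition on the filled two-dimensional matrix to obtain the specified matrix comprises: determining the weight of the element positions filled with the first preset threshold as a first weight, and determining the weight of the element positions filled with the second preset threshold as a second weight; performing singular value decomposition (SVD) on the two-dimensional matrix using the stochastic gradient descent (SGD) algorithm, according to the first weight and the second weight, to obtain a U matrix; and using the U matrix as the specified matrix.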
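- The method according to claim 5, characterized in that, after performing singular value decomposition (SVD) on the two-dimensional matrix using the stochastic gradient descent (SGD) algorithm, according to the first weight and the second weight, to obtain the U matrix, the method further comprises: reducing the second dimension of the U matrix according to a preset number of retained dimensions, and using the reduced U matrix as the specified matrix.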
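- The method according to claim 2, characterized in that determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group comprises: determining, according to the correspondence between user identifiers and file identifiers, the number of user identifiers in the specified user group that correspond to each file identifier; and determining a preset number of file identifiers in descending order of the determined counts.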
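- The method according to claim 2, characterized in that determining the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group comprises: calculating, according to the feature vector of each user identifier in the specified user group, the similarity between the current user identifier and each other user identifier in the specified user group; determining the user identifier with the highest similarity to the current user identifier; and determining, according to the correspondence between user identifiers and file identifiers, the file identifiers corresponding to the user identifier with the highest similarity.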
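- The method according to any one of claims 1 to 8, characterized in that the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.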
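- A file recommendation apparatus, characterized in that the apparatus comprises: a matrix construction module, configured to construct a two-dimensional matrix according to the user identifiers and file identifiers contained in historical playback records, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension; a filling module, configured to fill elements, according to the correspondence between user identifiers and file identifiers in the historical playback records, into the element positions of the two-dimensional matrix that correspond to that correspondence; a matrix decomposition module, configured to perform matrix decomposition on the filled two-dimensional matrix to obtain a specified matrix; a vector division module, configured to divide the specified matrix along the first dimension to obtain a feature vector corresponding to each user identifier; a clustering module, configured to cluster the user identifiers based on the feature vector corresponding to each user identifier to obtain at least one user group, each user group comprising at least one user identifier; and a recommendation module, configured to recommend files based on the at least one user group.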
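- The apparatus according to claim 10, characterized in that the recommendation module comprises: an instruction receiving unit, configured to receive an instruction to open a file, the instruction carrying a current user identifier and a current file identifier; a specified group determining unit, configured to determine, according to the user identifiers included in each user group, the specified user group to which the current user identifier belongs, each user group being obtained from the correspondence between user identifiers and file identifiers included in the historical playback records; a file identifier determining unit, configured to determine the file identifiers to be recommended according to the correspondence between user identifiers and file identifiers and the specified user group; and a recommending unit, configured to recommend the files indicated by the determined file identifiers.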
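- The apparatus according to claim 10, characterized in that the matrix construction module comprises: a count obtaining unit, configured to obtain, for each user identifier in the historical playback records, the number of file identifiers corresponding to the user identifier; a sample obtaining unit, configured to take the user identifier as a sample user identifier when the number of file identifiers corresponding to the user identifier exceeds a preset number; and a matrix construction unit, configured to construct the two-dimensional matrix according to the sample user identifiers and file identifiers contained in the historical playback records, with sample user identifiers as the first dimension of the matrix and file identifiers as the second dimension.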
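- The apparatus according to claim 12, characterized in that the filling module comprises: a first filling unit, configured to, for a sample user identifier and a file identifier, when the historical playback records hold a correspondence between the sample user identifier and the file identifier, fill a first preset threshold into the element position of the two-dimensional matrix that corresponds to the sample user identifier and the file identifier, with user identifiers as the first dimension of the matrix and file identifiers as the second dimension; and a second filling unit, configured to, when the element positions corresponding to all sample user identifiers and file identifiers held in the historical playback records have been filled, randomly select, from the remaining element positions of the two-dimensional matrix, a number of element positions equal to the number of element positions filled with the first preset threshold, and fill a second preset threshold into the selected element positions.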
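- The apparatus according to claim 13, characterized in that the matrix decomposition module comprises: a weight determining unit, configured to determine the weight of the element positions filled with the first preset threshold as a first weight, and to determine the weight of the element positions filled with the second preset threshold as a second weight; a decomposition unit, configured to perform singular value decomposition (SVD) on the two-dimensional matrix using the stochastic gradient descent (SGD) algorithm, according to the first weight and the second weight, to obtain a U matrix; and a specified-matrix unit, configured to use the U matrix as the specified matrix.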
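- The apparatus according to claim 14, characterized in that the apparatus further comprises: a dimension reduction module, configured to reduce the second dimension of the U matrix according to a preset number of retained dimensions, and to use the reduced U matrix as the specified matrix.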
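- The apparatus according to claim 11, characterized in that the file identifier determining unit comprises: a user count determining subunit, configured to determine, according to the correspondence between user identifiers and file identifiers, the number of user identifiers in the specified user group that correspond to each file identifier; and a first identifier determining subunit, configured to determine a preset number of file identifiers in descending order of the determined counts.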
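- The apparatus according to claim 11, characterized in that the file identifier determining unit comprises: a similarity calculating subunit, configured to calculate, according to the feature vector of each user identifier in the specified user group, the similarity between the current user identifier and each other user identifier in the specified user group; a user identifier determining subunit, configured to determine the user identifier with the highest similarity to the current user identifier; and a second identifier determining subunit, configured to determine, according to the correspondence between user identifiers and file identifiers, the file identifiers corresponding to the user identifier with the highest similarity.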
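- The apparatus according to any one of claims 10 to 17, characterized in that the first preset threshold is 1, the second preset threshold is 0, and the first weight is greater than the second weight.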
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/172,207 US9930419B2 (en) | 2013-12-05 | 2016-06-03 | File recommendation method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310653411.6A CN104079960B (zh) | 2013-12-05 | 2013-12-05 | File recommendation method and device |
CN201310653411.6 | 2013-12-05 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/172,207 Continuation US9930419B2 (en) | 2013-12-05 | 2016-06-03 | File recommendation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015081915A1 (zh) | 2015-06-11 |
Family
ID=51600967
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/072275 WO2015081915A1 (zh) | 2013-12-05 | 2015-02-05 | File recommendation method and device |
Country Status (3)
Country | Link |
---|---|
US (1) | US9930419B2 (zh) |
CN (1) | CN104079960B (zh) |
WO (1) | WO2015081915A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11754157B2 (en) | 2020-05-20 | 2023-09-12 | Tolomatic, Inc. | Integrated motor linear actuator |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104079960B (zh) | 2013-12-05 | 2015-10-07 | 深圳市腾讯计算机系统有限公司 | File recommendation method and device |
CN105550240B (zh) * | 2015-11-30 | 2018-11-27 | 浪潮通用软件有限公司 | Recommendation method and device |
CN107087235B (zh) * | 2017-04-21 | 2021-09-10 | 腾讯科技(深圳)有限公司 | Media content recommendation method, server, and client |
CN106982128B (zh) * | 2017-05-25 | 2019-02-12 | 安徽智柜科技发展有限公司 | Network-based community construction method |
CN107563841B (zh) * | 2017-08-03 | 2021-02-05 | 电子科技大学 | Recommendation system based on user rating decomposition |
CN110309293A (zh) * | 2018-02-13 | 2019-10-08 | 北京京东尚科信息技术有限公司 | Text recommendation method and device |
CN108737491B (zh) * | 2018-03-23 | 2020-09-01 | 腾讯科技(深圳)有限公司 | Information push method and device, storage medium, and electronic device |
CN109240991B (zh) * | 2018-09-26 | 2021-07-30 | Oppo广东移动通信有限公司 | File recommendation method and device, storage medium, and intelligent terminal |
WO2020181479A1 (en) * | 2019-03-12 | 2020-09-17 | Citrix Systems, Inc. | Intelligent file recommendation engine |
CN112073817B (zh) * | 2019-06-10 | 2022-08-30 | 腾讯科技(深圳)有限公司 | Media file playback control method and device, electronic device, and storage medium |
CN112184401B (zh) * | 2020-09-22 | 2021-05-14 | 筑客网络技术(上海)有限公司 | Intelligent matching method for a building materials bidding platform |
CN114139055A (zh) * | 2021-11-30 | 2022-03-04 | 中国建设银行股份有限公司 | Method, apparatus, device, and storage medium for determining feature information |
CN118194250B (zh) * | 2024-05-16 | 2024-08-09 | 国网山东省电力公司东营供电公司 | Method and system for intelligent distribution and combination of document materials |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2330516A1 (en) * | 2009-12-02 | 2011-06-08 | NBCUniversal Media, LLC | Method and systems for online recommendation |
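CN102426686A (zh) * | 2011-09-29 | 2012-04-25 | 南京大学 | Matrix-factorization-based recommendation method for Internet information products |
CN103152618A (zh) * | 2011-12-07 | 2013-06-12 | 北京四达时代软件技术股份有限公司 | Method and device for recommending digital television value-added service content |
CN103209342A (zh) * | 2013-04-01 | 2013-07-17 | 电子科技大学 | Collaborative filtering recommendation method incorporating video popularity and changes in user interest |
CN103389966A (zh) * | 2012-05-09 | 2013-11-13 | 阿里巴巴集团控股有限公司 | Method and device for processing, searching, and recommending massive data |
CN104079960A (zh) * | 2013-12-05 | 2014-10-01 | 深圳市腾讯计算机系统有限公司 | File recommendation method and device |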
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8843965B1 (en) * | 2000-09-20 | 2014-09-23 | Kaushal Kurapati | Method and apparatus for generating recommendation scores using implicit and explicit viewing preferences |
US7721310B2 (en) * | 2000-12-05 | 2010-05-18 | Koninklijke Philips Electronics N.V. | Method and apparatus for selective updating of a user profile |
US20020075320A1 (en) * | 2000-12-14 | 2002-06-20 | Philips Electronics North America Corp. | Method and apparatus for generating recommendations based on consistency of selection |
US7739231B2 (en) * | 2006-08-28 | 2010-06-15 | Manyworlds, Inc. | Mutual commit people matching process |
JP4591794B2 (ja) * | 2008-04-22 | 2010-12-01 | ソニー株式会社 | Information processing device and method, and program |
CN102044009A (zh) * | 2009-10-23 | 2011-05-04 | 华为技术有限公司 | Group recommendation method and system |
US20110153663A1 (en) * | 2009-12-21 | 2011-06-23 | At&T Intellectual Property I, L.P. | Recommendation engine using implicit feedback observations |
WO2013010024A1 (en) * | 2011-07-12 | 2013-01-17 | Thomas Pinckney | Recommendations in a computing advice facility |
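CN103246672B (zh) * | 2012-02-09 | 2016-06-08 | 中国科学技术大学 | Method and device for making personalized recommendations to users |
JP5209129B1 (ja) * | 2012-04-26 | 2013-06-12 | 株式会社東芝 | Information processing device, broadcast receiving device, and information processing method |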
US20160165277A1 (en) * | 2013-03-15 | 2016-06-09 | Google Inc. | Media metrics estimation from large population data |
US9552055B2 (en) * | 2013-07-15 | 2017-01-24 | Facebook, Inc. | Large scale page recommendations on online social networks |
US20150143390A1 (en) * | 2013-11-21 | 2015-05-21 | Sony Corporation | Fillable form for providing and receiving customized audio video content |
US9767102B2 (en) * | 2014-12-01 | 2017-09-19 | Comcast Cable Communications, Llc | Content recommendation system |
- 2013-12-05: CN application CN201310653411.6A, patent CN104079960B (zh), status Active
- 2015-02-05: WO application PCT/CN2015/072275, publication WO2015081915A1 (zh), status Application Filing
- 2016-06-03: US application US15/172,207, patent US9930419B2 (en), status Active
Also Published As
Publication number | Publication date |
---|---|
US9930419B2 (en) | 2018-03-27 |
CN104079960B (zh) | 2015-10-07 |
CN104079960A (zh) | 2014-10-01 |
US20160286277A1 (en) | 2016-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015081915A1 (zh) | File recommendation method and device | |
TWI702844B (zh) | Method, apparatus, device, and storage medium for generating user features | |
US9595053B1 (en) | Product recommendation using sentiment and semantic analysis | |
KR102233805B1 (ko) | Improved user experience for unrecognized and new users | |
US20190197416A1 (en) | Information recommendation method, apparatus, and server based on user data in an online forum | |
US10789620B2 (en) | User segment identification based on similarity in content consumption | |
US10380649B2 (en) | System and method for logistic matrix factorization of implicit feedback data, and application to media environments | |
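TWI636416B (zh) | Multi-phase ranking method and system for content personalization | |
WO2017084362A1 (zh) | Model generation method, recommendation method, and corresponding apparatus, device, and storage medium | |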
US9251292B2 (en) | Search result ranking using query clustering | |
US11487769B2 (en) | Arranging stories on newsfeeds based on expected value scoring on a social networking system | |
CN109511015B (zh) | Multimedia resource recommendation method, apparatus, storage medium, and device | |
WO2014143018A1 (en) | Efficient and fault-tolerant distributed algorithm for learning latent factor models through matrix factorization | |
US10698944B2 (en) | Searches and recommendations using distance metric on space of media titles | |
US11676507B2 (en) | Food description processing methods and apparatuses | |
WO2015185020A1 (en) | Information category obtaining method and apparatus | |
US11875377B2 (en) | Generating and distributing digital surveys based on predicting survey responses to digital survey questions | |
US20110179013A1 (en) | Search Log Online Analytic Processing | |
US20170177739A1 (en) | Prediction using a data structure | |
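CN106464682A (zh) | Using the state of being logged in to an online service for content item recommendation | |
CN104809165B (zh) | Method and device for determining the relevance of multimedia files | |
CN106557469A (zh) | Method and apparatus for processing data in a data warehouse | |
CN106503044B (zh) | Method and apparatus for obtaining an interest feature distribution | |
CN113378065B (zh) | Method for determining content diversity based on sliding spectral decomposition, and method for selecting content | |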
US20240152512A1 (en) | Machine learning for dynamic information retrieval in a cold start setting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15725979; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 24/10/2016) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 15725979; Country of ref document: EP; Kind code of ref document: A1 |