CN108804492B - Method and device for recommending multimedia objects - Google Patents

Method and device for recommending multimedia objects

Info

Publication number
CN108804492B
CN108804492B CN201810259572.XA CN201810259572A CN108804492B CN 108804492 B CN108804492 B CN 108804492B CN 201810259572 A CN201810259572 A CN 201810259572A CN 108804492 B CN108804492 B CN 108804492B
Authority
CN
China
Prior art keywords
multimedia object
multimedia
cluster
objects
recommended
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810259572.XA
Other languages
Chinese (zh)
Other versions
CN108804492A (en)
Inventor
郑海洪
高理恩
邹红才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN201810259572.XA
Publication of CN108804492A
Application granted
Publication of CN108804492B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a multimedia object recommendation method comprising: acquiring, from a multimedia object vector matrix, the multimedia object vectors corresponding to a recommended user (the user who is to receive recommendations); calculating the degree of correlation between each multimedia object vector in the multimedia object vector matrix and the multimedia object vectors of the multimedia objects accessed by the recommended user; determining a recommended multimedia object based on the calculated degree of correlation; and recommending the determined multimedia object to the recommended user. The method generates the multimedia object vector matrix from the multimedia objects accessed by the respective users, uses that matrix to calculate the degrees of correlation between the multimedia objects accessed by the recommended user and all collected multimedia objects accessed by the respective users, and recommends multimedia objects on that basis, so that multimedia objects the user likes can be recommended quickly and accurately.

Description

Method and device for recommending multimedia objects
Technical Field
The present application relates generally to the field of the internet, and more particularly, to a method and apparatus for multimedia object recommendation.
Background
With the development of internet technology, multimedia objects (multimedia information) such as short videos, movies, music, dramas, and pictures are increasingly presented to users over the internet. Given the massive number of multimedia objects available online, users often spend considerable time and data traffic finding the most recently updated multimedia objects they like, so their time and traffic costs are high.
To let users conveniently view the multimedia objects they like, information recommendation methods have been proposed that actively recommend such objects to the user. In existing methods, the multimedia platform generally collects statistics on the click-through rate or attention across all users and then, based on those statistics, selects multimedia objects with higher click-through rates to recommend on the home page.
However, the statistics underlying such recommendations are aggregated over all users who look for multimedia objects. For an individual user, the multimedia objects recommended by the platform therefore still include objects that the user is not interested in, so the user still cannot directly and quickly find the multimedia objects he or she likes, which results in a poor user experience.
Disclosure of Invention
In view of the foregoing problems, the present application provides a method and apparatus for multimedia object recommendation. Multimedia objects accessed by the respective users are collected and used to generate a multimedia object vector matrix; the correlations between the multimedia objects accessed by the recommended user and all collected multimedia objects accessed by the respective users are calculated based on that matrix; and multimedia objects are then recommended to the recommended user based on the calculated correlations. In this way, multimedia objects that the user likes can be recommended quickly and accurately.
According to an aspect of the present application, there is provided a method for multimedia object recommendation, comprising: acquiring multimedia object vectors corresponding to recommended users in a multimedia object vector matrix, wherein the multimedia object vector matrix consists of all corresponding multimedia object vectors generated based on multimedia objects accessed by all users; calculating the correlation degree between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user; determining a recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and recommending the determined recommended multimedia object to the recommended user.
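The correlation step above can be illustrated with a minimal sketch. The text does not state which correlation measure is used, so cosine similarity is an assumption here, as are the names `cosine`, `correlation_scores`, and `vector_matrix`, the toy vectors, and the choice to skip objects the user has already accessed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def correlation_scores(vector_matrix, accessed_ids):
    """Score every not-yet-accessed object by its best similarity to an accessed one.

    vector_matrix: dict mapping object ID -> vector (hypothetical layout).
    accessed_ids:  set of object IDs accessed by the recommended user.
    """
    accessed = [vector_matrix[i] for i in accessed_ids if i in vector_matrix]
    scores = {}
    for obj_id, vec in vector_matrix.items():
        # Assumption: already-accessed objects are not scored as candidates.
        if obj_id in accessed_ids or not accessed:
            continue
        scores[obj_id] = max(cosine(vec, a) for a in accessed)
    return scores

# Hypothetical toy data.
vector_matrix = {"v1": [1.0, 0.0], "v2": [0.9, 0.1], "v3": [0.0, 1.0]}
scores = correlation_scores(vector_matrix, {"v1"})
```

In this toy data, `v2` points in nearly the same direction as the accessed object `v1` and therefore scores higher than `v3`.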
Preferably, in one example of the above aspect, the multimedia object vector is trained based on a unique identification of the corresponding multimedia object using a multimedia object vector training algorithm.
Preferably, in one example of the above aspect, the multimedia object vector training algorithm comprises a word2vec algorithm.
Preferably, in one example of the above aspect, the multimedia objects accessed by the user comprise multimedia objects accessed by the user within a specified time window.
Preferably, in one example of the above aspect, the size of the prescribed time window is determined based on a service scene, a user amount, and/or a user access frequency of the multimedia object.
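One plausible way to prepare training input for a word2vec-style algorithm from unique identifiers is sketched below: each user's time-ordered sequence of object IDs within the time window is treated like a sentence, and each ID like a word. This reading, the log layout `(user_id, object_id, timestamp)`, and the function name `build_id_sequences` are assumptions for illustration; the training call itself is omitted.

```python
from collections import defaultdict

def build_id_sequences(access_log, window_start, window_end):
    """Group object IDs by user in access order, keeping only records in the window.

    access_log: iterable of (user_id, object_id, timestamp) tuples (hypothetical layout).
    The window is half-open: window_start <= timestamp < window_end.
    """
    per_user = defaultdict(list)
    for user_id, object_id, timestamp in sorted(access_log, key=lambda r: r[2]):
        if window_start <= timestamp < window_end:
            per_user[user_id].append(object_id)
    return dict(per_user)

# Hypothetical access records.
access_log = [
    ("u1", "v1", 10),
    ("u1", "v2", 20),
    ("u2", "v3", 15),
    ("u1", "v9", 99),  # falls outside the window used below
]
sequences = build_id_sequences(access_log, window_start=0, window_end=50)
```

The resulting per-user ID lists could then be passed to a word2vec implementation to obtain one vector per unique identifier; a shorter or longer window changes how much history each sequence carries.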
Preferably, in one example of the above aspect, the method may further comprise: determining, from the multimedia objects accessed by the recommended user, the multimedia objects that the recommended user is interested in. In that case, calculating the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user comprises: calculating the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object that the recommended user is interested in.
Preferably, in an example of the above aspect, determining a multimedia object that the recommended user is interested in from among the multimedia objects accessed by the recommended user may include: determining the multimedia objects that the recommended user is interested in according to the recommended user's playback behavior on the accessed multimedia objects.
Preferably, in an example of the above aspect, determining a multimedia object that the recommended user is interested in from among the multimedia objects accessed by the recommended user may include: determining the multimedia objects that the recommended user is interested in according to the recommended user's playback behavior and like (thumbs-up) behavior on the accessed multimedia objects.
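As an illustration of interest determination from playback and like behavior, the sketch below marks an object as interesting when the user played most of it or liked it. The 0.8 playback-ratio threshold, the event field names, and the function name `interested_objects` are all assumptions; the text does not specify the actual criteria.

```python
def interested_objects(events, min_play_ratio=0.8):
    """Return IDs of objects the user played mostly through or explicitly liked.

    events: list of dicts with hypothetical keys
            'object_id', 'played_seconds', 'duration_seconds', optional 'liked'.
    """
    interested = []
    for event in events:
        duration = event["duration_seconds"]
        ratio = event["played_seconds"] / duration if duration else 0.0
        # Assumed rule: either signal alone is enough to count as interest.
        if ratio >= min_play_ratio or event.get("liked", False):
            interested.append(event["object_id"])
    return interested

# Hypothetical playback events for one recommended user.
events = [
    {"object_id": "v1", "played_seconds": 58, "duration_seconds": 60},
    {"object_id": "v2", "played_seconds": 5, "duration_seconds": 60, "liked": True},
    {"object_id": "v3", "played_seconds": 3, "duration_seconds": 60},
]
liked_or_watched = interested_objects(events)
```

Only the vectors of the returned objects would then be used in the correlation calculation, rather than the vectors of everything the user accessed.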
Preferably, in an example of the above aspect, determining a recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation degree may include: based on the calculated correlation degree, sequencing the multimedia objects corresponding to the multimedia object vector matrix; and selecting the recommended multimedia object from the sorted multimedia objects.
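The sort-then-select step can be sketched as follows. The `top_k` and `min_score` parameters and the function name `select_recommendations` are hypothetical; the text only says that objects are sorted by correlation and that recommended objects are selected from the sorted result.

```python
def select_recommendations(scores, top_k=2, min_score=0.0):
    """Sort candidates by correlation (high to low) and keep the first top_k.

    scores: dict mapping object ID -> correlation score.
    Candidates below min_score are dropped (an assumed cut-off).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [obj_id for obj_id, score in ranked if score >= min_score][:top_k]

# Hypothetical correlation scores.
scores = {"v2": 0.99, "v3": 0.10, "v4": 0.75}
picked = select_recommendations(scores, top_k=2, min_score=0.5)
```

An empty result, for example when every score falls below the cut-off, corresponds to the case in which no recommended multimedia object is found from the vector matrix.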
Preferably, in one example of the above aspect, the multimedia object vector of the multimedia object is generated offline based on a unique identification of the multimedia object.
Preferably, in one example of the above aspect, the method may further comprise: determining a multimedia object that the recommended user is not interested in according to the recommended user's dislike (thumbs-down) behavior on the accessed multimedia objects; calculating the correlation between the multimedia object vectors in the multimedia object vector matrix and the multimedia object vector of the multimedia object that the recommended user is not interested in; determining, based on that calculated correlation, the multimedia objects prohibited from being recommended to the recommended user; and removing the determined prohibited multimedia objects from the recommended multimedia objects.
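A minimal sketch of removing prohibited objects is given below. It assumes cosine similarity as the correlation measure and a fixed threshold of 0.9 above which a candidate is treated as too close to a disliked object; both choices, and the names `remove_prohibited` and `vector_matrix`, are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def remove_prohibited(candidates, vector_matrix, disliked_ids, threshold=0.9):
    """Drop candidates whose vector is highly correlated with any disliked object."""
    disliked = [vector_matrix[i] for i in disliked_ids if i in vector_matrix]
    kept = []
    for obj_id in candidates:
        vec = vector_matrix[obj_id]
        # Assumed rule: correlation >= threshold with a disliked object prohibits it.
        if any(cosine(vec, d) >= threshold for d in disliked):
            continue
        kept.append(obj_id)
    return kept

# Hypothetical vectors: v4 lies close to the disliked object v5, v2 does not.
vector_matrix = {"v2": [0.9, 0.1], "v4": [0.1, 0.9], "v5": [0.0, 1.0]}
kept = remove_prohibited(["v2", "v4"], vector_matrix, {"v5"})
```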
Preferably, in one example of the above aspect, the method may further comprise: when a recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation, acquiring one or more tags/titles contained in the multimedia objects accessed by the recommended user; selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and determining the multimedia objects in the selected multimedia object clusters as recommended multimedia objects, wherein the multimedia object clusters are obtained by clustering the multimedia objects accessed by the respective users based on the multimedia object vector matrix, and the label/title weight list is generated by using the weights of the labels/titles in the respective multimedia object clusters to reflect the weights of the labels/titles in the respective multimedia object clusters.
Preferably, in one example of the above aspect, the method may further comprise: when a recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation, acquiring one or more tags/titles contained in the multimedia objects accessed by the recommended user; clustering multimedia objects accessed by respective users based on the multimedia object vector matrix to obtain one or more multimedia object clusters; calculating weights of labels/titles contained in the one or more multimedia object clusters in the multimedia clusters; generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster; selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and determining the multimedia objects in the selected multimedia object cluster as recommended multimedia objects.
Preferably, in an example of the above aspect, the weight of a tag/title in each multimedia object cluster is calculated from the number of times the tag/title appears in that multimedia object cluster and the total number of tags/titles included in that multimedia object cluster.
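The weight calculation can be read as a simple frequency ratio, sketched below. Taking the weight as occurrences divided by the total number of tag occurrences in the cluster is an assumed interpretation of "the total number of tags/titles"; the function name `tag_weights` and the sample tags are hypothetical.

```python
from collections import Counter

def tag_weights(cluster_tags):
    """Weight of each tag in one cluster: its occurrences / all tag occurrences.

    cluster_tags: list of tag lists, one list per multimedia object in the cluster.
    """
    counts = Counter(tag for tags in cluster_tags for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

# Hypothetical cluster with three objects.
weights = tag_weights([["comedy", "cat"], ["comedy"], ["cat", "comedy"]])
```

Here "comedy" appears 3 times out of 5 tag occurrences and "cat" 2 times, giving weights of 0.6 and 0.4.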
Preferably, in an example of the above aspect, the tag/title satisfying the predetermined condition may include: the tag/title with the highest weight; or the tags/titles that rank within a predetermined top number or a predetermined top proportion when the tags/titles are sorted by weight from high to low.
Preferably, in an example of the above aspect, before selecting, from among the multimedia object clusters, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user, the method may include: calculating a pornographic index for each multimedia object cluster; and performing pornographic filtering on the multimedia object clusters based on the calculated pornographic indexes. In that case, the selecting may include: selecting, from the multimedia object clusters that have undergone pornographic filtering, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
Preferably, in one example of the above aspect, calculating the pornographic index for each multimedia object cluster may comprise, for each multimedia object cluster: extracting the high-frequency words appearing in all multimedia objects contained in the cluster; identifying negative words and positive words among the extracted high-frequency words; and calculating the pornographic index of the cluster according to the number of identified negative words and their pornographic index weight, the number of identified positive words and their pornographic index weight, and the number of multimedia objects in the cluster.
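One possible reading of the cluster selection is sketched below: for each tag of the recommended user, find the cluster in which that tag has its maximum weight, rank the user's tags by that maximum weight, and keep the clusters belonging to the top-ranked tags. The nested-dict layout of the weight list, the `top_n` parameter, and the function name `select_clusters` are assumptions for illustration only.

```python
def select_clusters(weight_list, user_tags, top_n=1):
    """Pick clusters for the user's top_n tags, ranked by highest in-cluster weight.

    weight_list: dict mapping cluster ID -> {tag: weight} (hypothetical layout).
    user_tags:   tags/titles of the objects the recommended user accessed.
    """
    best = []  # one (weight, tag, cluster_id) entry per user tag found in a cluster
    for tag in set(user_tags):
        hits = [(w[tag], cid) for cid, w in weight_list.items() if tag in w]
        if hits:
            weight, cid = max(hits)
            best.append((weight, tag, cid))
    best.sort(reverse=True)  # highest weight first
    clusters = []
    for _, _, cid in best[:top_n]:
        if cid not in clusters:
            clusters.append(cid)
    return clusters

# Hypothetical weight list for two clusters.
weight_list = {
    "c1": {"comedy": 0.6, "cat": 0.4},
    "c2": {"cat": 0.7, "music": 0.3},
}
chosen = select_clusters(weight_list, ["cat", "comedy"], top_n=1)
```

With `top_n=1` this corresponds to the "highest weight" condition; a larger `top_n` corresponds to the "predetermined top number" condition.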
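The text names the inputs to the pornographic index but not the formula, so the sketch below uses an invented combination purely for illustration: weighted negative-word count minus weighted positive-word count, divided by the number of objects in the cluster. The formula, the weights, the word lists, the `top_n` cut-off for "high-frequency" words, and the function name are all assumptions.

```python
from collections import Counter

def pornographic_index(titles, negative_words, positive_words,
                       neg_weight=1.0, pos_weight=0.5, top_n=20):
    """Hypothetical index for one cluster; higher means more likely to be filtered.

    titles: one title string per multimedia object in the cluster.
    """
    if not titles:
        return 0.0
    word_counts = Counter(w for title in titles for w in title.lower().split())
    # Treat the top_n most common words as the cluster's high-frequency words.
    frequent = [word for word, _ in word_counts.most_common(top_n)]
    neg = sum(1 for w in frequent if w in negative_words)
    pos = sum(1 for w in frequent if w in positive_words)
    # Assumed formula, not taken from the source text.
    return (neg * neg_weight - pos * pos_weight) / len(titles)

# Hypothetical cluster titles and word lists.
titles = ["nsfw clip", "family picnic", "nsfw scene"]
index = pornographic_index(titles, negative_words={"nsfw"}, positive_words={"family"})
```

Clusters whose index exceeds some chosen threshold would be excluded before the tag/title-based cluster selection.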
Preferably, in one example of the above aspect, the multimedia object may include at least one of: short videos, movies, music, drama and pictures.
According to another aspect of the present application, there is provided an apparatus for multimedia object recommendation, comprising: a multimedia object vector acquisition unit for acquiring multimedia object vectors corresponding to recommended users in a multimedia object vector matrix composed of respective corresponding multimedia object vectors generated based on multimedia objects accessed by all users; a first correlation calculation unit, configured to calculate a correlation between a multimedia object vector in the multimedia object vector matrix and a multimedia object vector of a multimedia object accessed by the recommended user; a recommended multimedia object determining unit, configured to determine a recommended multimedia object from multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and the recommending unit is used for recommending the determined recommended multimedia object to the recommended user.
Preferably, in one example of the above aspect, the apparatus may further include: a collecting unit for collecting unique identification of multimedia objects accessed by each user; and the training unit is used for training corresponding multimedia object vectors based on the unique identifiers of the multimedia objects accessed by the users by utilizing a multimedia object vector training algorithm so as to form the multimedia object vector matrix.
Preferably, in one example of the above aspect, the multimedia object vector training algorithm may comprise a word2vec algorithm.
Preferably, in one example of the above aspect, the apparatus may further include: a time window size determining unit, configured to determine the size of a specified time window based on the service scene, the user amount, and/or the user access frequency of the multimedia object, wherein the multimedia objects accessed by the user include the multimedia objects accessed by the user within the specified time window.
Preferably, in one example of the above aspect, the apparatus may further include: an interested multimedia object determining unit, configured to determine a multimedia object that is interested by the recommended user from the multimedia objects accessed by the recommended user, and the first correlation calculating unit is configured to: and calculating the correlation degree between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object which is interested by the recommended user.
Preferably, in an example of the above aspect, the multimedia object of interest determination unit is configured to: and determining the multimedia objects which are interested by the recommended user according to the playing behavior of the recommended user on the accessed multimedia objects.
Preferably, in an example of the above aspect, the multimedia object of interest determination unit is configured to: determine the multimedia objects that the recommended user is interested in according to the recommended user's playback behavior and like (thumbs-up) behavior on the accessed multimedia objects.
Preferably, in one example of the above aspect, the recommended multimedia object determining unit may include: the sorting module is used for sorting the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and a selection module for selecting the recommended multimedia object from the sorted multimedia objects.
Preferably, in one example of the above aspect, the apparatus may further include: an uninterested multimedia object determining unit, configured to determine a multimedia object that the recommended user is not interested in according to the recommended user's dislike (thumbs-down) behavior on the accessed multimedia objects; a second correlation calculating unit, configured to calculate the correlation between the multimedia object vectors in the multimedia object vector matrix and the multimedia object vector of the multimedia object that the recommended user is not interested in; a recommendation-prohibited multimedia object determining unit, configured to determine, based on that calculated correlation, the multimedia objects prohibited from being recommended to the recommended user; and a removing unit, configured to remove the determined prohibited multimedia objects from the recommended multimedia objects.
Preferably, in one example of the above aspect, the apparatus may further include: a tag/title obtaining unit, configured to obtain one or more tags/titles included in the multimedia objects accessed by the recommended user when the recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation; a selecting unit for selecting a multimedia object cluster corresponding to a maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user from among the multimedia object clusters according to the tag/title weight list, and the recommending unit is used for recommending the multimedia objects in the selected clusters to the recommended users, wherein the multimedia object clusters are obtained by clustering multimedia objects accessed by respective users based on the multimedia object vector matrix, and the label/title weight list is generated by utilizing the weight of the label/title in each multimedia object cluster and is used for reflecting the weight of the label/title in each multimedia object cluster.
Preferably, in one example of the above aspect, the apparatus may further include: a tag/title obtaining unit, configured to obtain one or more tags/titles included in the multimedia objects accessed by the recommended user when the recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation; a clustering unit, configured to cluster multimedia objects accessed by respective users based on the multimedia object vector matrix to obtain one or more multimedia object clusters; a weight calculation unit for calculating weights of labels/titles contained in the one or more multimedia object clusters in the multimedia cluster; a generating unit for generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster; a selecting unit, configured to select, from the multimedia object clusters according to the tag/title weight list, a multimedia object cluster corresponding to a maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user, and the recommending unit is configured to recommend the multimedia object in the selected cluster to the recommended user.
Preferably, in an example of the above aspect, the tag/title satisfying the predetermined condition may include: the tag/title with the highest weight; or the tags/titles that rank within a predetermined top number or a predetermined top proportion when the tags/titles are sorted by weight from high to low.
Preferably, in one example of the above aspect, the apparatus may further include: a pornographic index calculating unit, configured to calculate a pornographic index for each multimedia object cluster; and a filtering unit, configured to perform pornographic filtering on the multimedia object clusters based on the calculated pornographic index of each multimedia object cluster, wherein the selecting unit is configured to select, from the multimedia object clusters that have undergone pornographic filtering, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
Preferably, in an example of the above aspect, the pornographic index calculating unit may further include: the extraction module is used for extracting high-frequency words appearing in all multimedia objects contained in each multimedia object cluster; the recognition module is used for recognizing negative words and positive words from the extracted high-frequency words; and the calculating module is used for calculating the pornographic indexes of the multimedia object clusters according to the number of the identified negative words and the pornographic index weight thereof, the number of the positive words and the pornographic index weight thereof and the number of the multimedia objects in the multimedia object clusters.
According to another aspect of the present application, there is provided a computing device comprising: one or more processors, and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform a method for multimedia object recommendation as described above.
According to another aspect of the present application, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method for multimedia object recommendation as described above.
According to another aspect of the present application, there is provided a method for multimedia object recommendation, comprising: clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, the multimedia object vector matrix being composed of respective multimedia object vectors generated based on multimedia objects accessed by all users; calculating weights of labels/titles contained in the one or more multimedia object clusters in the multimedia clusters; generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster; selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and recommending the multimedia objects in the selected multimedia object cluster to the recommended user.
Preferably, in an example of the above aspect, calculating the weights of the tags/titles contained in the one or more multimedia object clusters may include: calculating the weight of a tag/title in each multimedia object cluster from the number of times the tag/title appears in that multimedia object cluster and the total number of tags/titles included in that multimedia object cluster.
Preferably, in an example of the above aspect, the tag/title satisfying the predetermined condition may include: the tag/title with the highest weight; or the tags/titles that rank within a predetermined top number or a predetermined top proportion when the tags/titles are sorted by weight from high to low.
Preferably, in an example of the above aspect, before selecting, from among the multimedia object clusters, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user, the method may include: calculating a pornographic index for each multimedia object cluster; and performing pornographic filtering on the multimedia object clusters based on the calculated pornographic indexes. In that case, the selecting may include: selecting, from the multimedia object clusters that have undergone pornographic filtering, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
Preferably, in one example of the above aspect, calculating the pornographic index for each multimedia object cluster may comprise, for each multimedia object cluster: extracting the high-frequency words appearing in all multimedia objects contained in the cluster; identifying negative words and positive words among the extracted high-frequency words; and calculating the pornographic index of the cluster according to the number of identified negative words and their pornographic index weight, the number of identified positive words and their pornographic index weight, and the number of multimedia objects in the cluster.
Preferably, in one example of the above aspect, the multimedia object vector is trained based on a unique identification of the corresponding multimedia object using a multimedia object vector training algorithm.
Preferably, in one example of the above aspect, the multimedia object vector training algorithm comprises a word2vec algorithm.
Preferably, in one example of the above aspect, the multimedia objects accessed by the user comprise multimedia objects accessed by the user within a specified time window.
Preferably, in one example of the above aspect, the size of the prescribed time window is determined based on a service scene, a user amount, and/or a user access frequency of the multimedia object.
Preferably, in one example of the above aspect, the multimedia object may include at least one of: short videos, movies, music, drama and pictures.
According to another aspect of the present application, there is provided an apparatus for multimedia object recommendation, comprising: a clustering unit for clustering multimedia objects accessed by respective users based on a multimedia object vector matrix composed of respective multimedia object vectors generated based on the multimedia objects accessed by all the users to obtain one or more multimedia object clusters; a weight calculation unit for calculating weights of labels/titles contained in the one or more multimedia object clusters in the multimedia cluster; a generating unit for generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster; a selecting unit, configured to select, from the multimedia object clusters according to the tag/title weight list, a multimedia object cluster corresponding to a maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user; and the recommending unit is used for recommending the multimedia objects in the selected clusters to the recommended users.
Preferably, in one example of the above aspect, the weight calculation unit is configured to: the weight of the label/title in each multimedia object cluster is calculated according to the number of times the label/title appears in the multimedia cluster and the total number of labels/titles included in the multimedia cluster.
Preferably, in an example of the above aspect, the tag/title satisfying a predetermined condition may include: the label/title with the highest weight; or the labels/titles in the top predetermined number of bits or a predetermined ratio are sorted in the label/title arrangement from high to low by weight.
Preferably, in one example of the above aspect, the apparatus may further include: a pornographic index calculating unit for calculating a pornographic index of each multimedia object cluster; and a filtering unit for performing pornographic filtering processing on the multimedia object clusters based on the calculated pornographic index of each multimedia object cluster, wherein the selecting unit is configured to select, from the multimedia object clusters that have undergone the pornographic filtering processing, the multimedia object cluster corresponding to the maximum weight of a tag/title that satisfies the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
Preferably, in one example of the above aspect, the pornographic index calculating unit may include: an extraction module for extracting the high-frequency words appearing in all multimedia objects contained in each multimedia object cluster; a recognition module for recognizing negative words and positive words among the extracted high-frequency words; and a calculating module for calculating the pornographic index of each multimedia object cluster according to the number of recognized negative words and their pornographic index weight, the number of recognized positive words and their pornographic index weight, and the number of multimedia objects in the multimedia object cluster.
Preferably, in one example of the above aspect, the apparatus may further include: a collecting unit for collecting the unique identifiers of the multimedia objects accessed by each user; and a training unit for training, using a multimedia object vector training algorithm, corresponding multimedia object vectors based on the unique identifiers of the multimedia objects accessed by the users, so as to form the multimedia object vector matrix.
Preferably, in one example of the above aspect, the apparatus may further include: a time window size determining unit for determining the size of a specified time window based on the service scene, the user amount, and/or the user access frequency of the multimedia object, wherein the multimedia objects accessed by a user are the multimedia objects accessed by the user within the specified time window.
According to another aspect of the present application, there is provided a computing device comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform a method for multimedia object recommendation as described above.
According to another aspect of the present application, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause one or more processors to perform a method for multimedia object recommendation as described above.
According to another aspect of the present application, there is provided a method for multimedia object recommendation, comprising: clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, the multimedia object vector matrix being composed of respective multimedia object vectors generated based on the multimedia objects accessed by all users; calculating a pornographic index of each multimedia object cluster; and performing pornographic filtering processing on the multimedia object clusters based on the calculated pornographic indexes, so that the filtered clusters can be used for multimedia object recommendation.
Preferably, in one example of the above aspect, calculating the pornographic index of each multimedia object cluster may comprise, for each multimedia object cluster: extracting the high-frequency words appearing in all multimedia objects contained in the multimedia object cluster; identifying negative words and positive words among the extracted high-frequency words; and calculating the pornographic index of the multimedia object cluster according to the number of identified negative words and their pornographic index weight, the number of identified positive words and their pornographic index weight, and the number of multimedia objects in the multimedia object cluster.
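The three steps above can be sketched as follows. The passage names the inputs of the index but not how they are combined, so the formula used here (weighted negative count minus weighted positive count, divided by the number of objects) is purely an assumption, as are the word lists, the weights, the filtering threshold, and all function names.

```python
from collections import Counter

def cluster_index(titles, negative_words, positive_words,
                  neg_weight=1.0, pos_weight=0.5, top_n=5):
    """Assumed index: (n_neg * neg_weight - n_pos * pos_weight) / number of objects."""
    if not titles:
        return 0.0
    # Step 1: extract the most frequent words over all titles in the cluster.
    counts = Counter(word for title in titles for word in title.lower().split())
    frequent = [word for word, _ in counts.most_common(top_n)]
    # Step 2: identify negative and positive words among the frequent words.
    n_neg = sum(1 for word in frequent if word in negative_words)
    n_pos = sum(1 for word in frequent if word in positive_words)
    # Step 3: combine counts, weights, and cluster size (assumed formula).
    return (n_neg * neg_weight - n_pos * pos_weight) / len(titles)

def filter_clusters(clusters, negative_words, positive_words, threshold):
    """Keep only clusters whose index does not exceed the (assumed) threshold."""
    return {
        cid: titles for cid, titles in clusters.items()
        if cluster_index(titles, negative_words, positive_words) <= threshold
    }

# Hypothetical clusters, each given as the list of titles of its multimedia objects.
clusters = {
    "c1": ["explicit clip", "explicit scene", "family picnic"],
    "c2": ["family picnic", "family trip"],
}
negative = {"explicit"}
positive = {"family"}
kept = filter_clusters(clusters, negative, positive, threshold=0.1)
```

With these sample values, cluster c1 scores (1 × 1.0 − 1 × 0.5) / 3 and is filtered out, while cluster c2 scores −0.25 and is kept.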
According to another aspect of the present application, there is provided an apparatus for multimedia object recommendation, comprising: a clustering unit for clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, the multimedia object vector matrix being composed of respective multimedia object vectors generated based on the multimedia objects accessed by all users; a pornographic index calculating unit for calculating a pornographic index of each multimedia object cluster; and a filtering unit for performing pornographic filtering processing on the multimedia object clusters based on the calculated pornographic indexes, so that the filtered clusters can be used for multimedia object recommendation.
Preferably, in one example of the above aspect, the pornographic index calculating unit may include: an extraction module for extracting the high-frequency words appearing in all multimedia objects contained in each multimedia object cluster; a recognition module for recognizing negative words and positive words among the extracted high-frequency words; and a calculating module for calculating the pornographic index of each multimedia object cluster according to the number of recognized negative words and their pornographic index weight, the number of recognized positive words and their pornographic index weight, and the number of multimedia objects in the multimedia object cluster.
According to another aspect of the present application, there is provided a computing device comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform a method for multimedia object recommendation as described above.
According to another aspect of the present application, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause one or more processors to perform a method for multimedia object recommendation as described above.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 shows a flow diagram of a method for generating a multimedia object vector matrix according to the present application;
FIG. 2 shows a flow diagram of a method for multimedia object recommendation according to an embodiment of the present application;
FIG. 3 illustrates a flow diagram for one implementation example of the process of FIG. 2 for determining candidate recommended multimedia objects based on relevance;
FIG. 4 shows a block diagram of an apparatus for multimedia object recommendation according to an embodiment of the present application;
FIG. 5 is a block diagram illustrating an example of one implementation of a recommended multimedia object determination unit in the apparatus of FIG. 4;
FIG. 6 shows a flow diagram of a method for multimedia object recommendation according to another embodiment of the present application;
FIG. 7 shows a block diagram of an apparatus for multimedia object recommendation according to another embodiment of the present application;
FIG. 8 shows a flow diagram of a method for multimedia object recommendation according to another embodiment of the present application;
FIG. 9 shows a block diagram of an apparatus for multimedia object recommendation according to another embodiment of the present application;
FIG. 10 illustrates a block diagram of a computing device for implementing multimedia object recommendation in accordance with an embodiment of the present application;
FIG. 11 shows a flow diagram of a method for multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user, according to another embodiment of the present application;
FIG. 12 illustrates a block diagram of an apparatus for multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user according to another embodiment of the present application;
FIG. 13 illustrates a block diagram of a computing device for implementing multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user, according to another embodiment of the present application;
FIG. 14 shows a flow diagram of a method for pornography filtering of multimedia objects in accordance with another embodiment of the present application;
FIG. 15 shows a flowchart of the process in FIG. 14 for calculating the pornographic index of a cluster;
FIG. 16 shows a block diagram of an apparatus for pornography filtering of multimedia objects according to another embodiment of the present application;
FIG. 17 shows a block diagram of one implementation example of the pornographic index calculating unit of FIG. 16; and
FIG. 18 illustrates a block diagram of a computing device for implementing pornographic filtering processing for multimedia objects according to another embodiment of the present application.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
Embodiments of the method and apparatus for multimedia object recommendation of the present application are now described with reference to the drawings.
Before proceeding with a multimedia object recommendation scheme according to the present application, a multimedia object vector matrix needs to be created (or generated) on the multimedia information platform (i.e., server) side based on the unique identification of the multimedia object accessed by the user. Fig. 1 shows a flow chart of a method for generating a multimedia object vector matrix according to the present application.
As shown in fig. 1, at block S110, the unique identifiers of all multimedia objects accessed by users are collected on the server side. Here, the multimedia objects include at least one of short videos, movies, music, series, pictures, and the like. A unique identifier is identification information that uniquely identifies a multimedia object, for example, the ID of a multimedia object that a user clicks or plays. The ID may be, for example, a mixture of letters and numbers, such as AA337UC 566. The access data of all users may be collected actively by the server, or may be collected by each user's client and then sent to the server by wired or wireless communication. "All users" refers to all historical accessing users, i.e., users who have accessed multimedia objects in the past, including the recommended user. The unique identifiers may be collected within a prescribed time window, in which case the multimedia objects accessed by a user are the multimedia objects that the user accessed within the prescribed time window. In one example, the prescribed time window may have a predetermined size, such as a day, a week, or a month. Preferably, in another example, the size of the prescribed time window may be determined based on the service scene of the multimedia object, the user amount, and/or the user access frequency. For example, for a short video recommendation scene, the prescribed time window is set to one day for a scene with more than ten million users, to 3 to 7 days for a scene on the order of one million users, and to 10 days for a scene on the order of one hundred thousand users, and so on. In other examples, the size of the prescribed time window may be adjusted according to actual circumstances.
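A minimal sketch of the collection step, assuming the access log is a list of (user, object ID, timestamp) tuples. The function names, the log format, the exact thresholds, the choice of 7 days within the 3-to-7-day range, and the sample IDs are illustrative assumptions.

```python
from datetime import datetime, timedelta

def choose_window_days(user_count):
    """Pick a window size in days from the user scale (thresholds are assumptions)."""
    if user_count >= 10_000_000:
        return 1
    if user_count >= 1_000_000:
        return 7  # the text gives 3 to 7 days; the upper bound is chosen here
    return 10

def collect_ids(access_log, now, window_days):
    """Return {user: [object IDs in access order]} for accesses inside the window."""
    start = now - timedelta(days=window_days)
    per_user = {}
    for user, object_id, accessed_at in sorted(access_log, key=lambda rec: rec[2]):
        if start <= accessed_at <= now:
            per_user.setdefault(user, []).append(object_id)
    return per_user

# Hypothetical access log with made-up IDs.
now = datetime(2018, 3, 27)
access_log = [
    ("U1", "AA337", datetime(2018, 3, 26, 12)),
    ("U1", "BB102", datetime(2018, 3, 21)),
    ("U2", "AA337", datetime(2018, 3, 26, 18)),
]
one_day = collect_ids(access_log, now, choose_window_days(20_000_000))
one_week = collect_ids(access_log, now, choose_window_days(2_000_000))
```

With the one-day window only the accesses of 26 March are kept; with the seven-day window the earlier access by U1 is included as well.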
After collecting the unique identifications of the multimedia objects accessed by all users as above, at block S120, on the server side, a multimedia object vector training algorithm is utilized to generate corresponding multimedia object vectors based on the collected multimedia objects accessed by each user to compose the multimedia object vector matrix. For example, a multimedia object vector training algorithm may be utilized to generate corresponding multimedia object vectors based on the collected unique identifications of the respective user-accessed multimedia objects, and then compose the multimedia object vector matrix from the generated respective multimedia object vectors. Here, the multimedia object vector training algorithm may include a word2vec algorithm.
The word2vec algorithm is an NLP tool introduced by Google in 2013. It vectorizes the words in the input information and then uses the vector distance between word vectors to quantitatively measure the relationship between words. The word2vec algorithm is a machine learning algorithm whose input is text information and whose output is word vectors, i.e., feature vectors of the words in the text information. In the present application, the unique identifiers (for example, the IDs) of the multimedia objects accessed by each user are input into a word2vec model as training input, and a corresponding multimedia object vector, for example an N-dimensional vector, is obtained for each multimedia object. For example, for a multimedia object $A_i$, its multimedia object vector is $\{a_{i1}, a_{i2}, \ldots, a_{iN}\}$, where each element $a_{i1}, a_{i2}, \ldots, a_{iN}$ is a floating-point number in the interval $[-1.0, 1.0]$ whose value is obtained by training with the word2vec algorithm. Here, the dimension N of the multimedia object vector may be set in advance, for example, typically to 200. After the respective multimedia object vectors are obtained as described above, they are combined into a multimedia object vector matrix. Assuming there are M multimedia objects, the multimedia object vector matrix is

$$|A_{MN}| = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{pmatrix}$$

The multimedia object vector matrix comprises M multimedia object vectors, i.e., $\{a_{11}, \ldots, a_{1N}\}$ to $\{a_{M1}, \ldots, a_{MN}\}$. Each multimedia object vector comprises N elements. In addition, each multimedia object may be recorded in correspondence with the users who access it.
It is noted here that only one multimedia object vector is obtained for a multimedia object, even if that multimedia object is accessed by a plurality of different users. The multimedia object is, however, recorded as corresponding to each of those accessing users. For example, if multimedia object $A_1$ is accessed by users U1 and U2, the resulting multimedia object vector is $\{a_{11}, a_{12}, \ldots, a_{1N}\}$, and the multimedia object $A_1$ is recorded as corresponding to U1 and U2.
Furthermore, it is noted that the above multimedia object vectors and multimedia object vector matrix may be generated offline based on the multimedia objects (e.g., the unique identifiers of the multimedia objects) accessed by the respective users.
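The bookkeeping around the training step can be sketched as follows: one sequence of IDs per user serves as a "sentence" for a word2vec-style trainer, each object gets exactly one vector, and each object is recorded against all users who accessed it. The training itself is not shown; the vectors below are hand-written placeholders with N = 3, and all names and IDs are hypothetical.

```python
def build_corpus(per_user_ids):
    """One ID sequence per user, the input form a word2vec-style trainer expects."""
    return [ids for ids in per_user_ids.values() if ids]

def object_users(per_user_ids):
    """Record, for each multimedia object, every user who accessed it."""
    owners = {}
    for user, ids in per_user_ids.items():
        for object_id in ids:
            owners.setdefault(object_id, set()).add(user)
    return owners

def compose_matrix(vectors):
    """Stack per-object vectors into an M x N matrix with a fixed row order."""
    row_ids = sorted(vectors)
    return row_ids, [list(vectors[object_id]) for object_id in row_ids]

# Hypothetical access data: A1 is accessed by both U1 and U2.
per_user_ids = {"U1": ["A1", "A2"], "U2": ["A1", "A3"]}
corpus = build_corpus(per_user_ids)
owners = object_users(per_user_ids)

# Placeholder vectors standing in for trained output (values are made up).
vectors = {
    "A1": (0.12, -0.40, 0.88),
    "A2": (0.10, -0.35, 0.90),
    "A3": (-0.70, 0.22, 0.05),
}
row_ids, matrix = compose_matrix(vectors)
```

Although A1 appears in two users' sequences, it contributes a single row to the matrix, and `owners["A1"]` holds both U1 and U2.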
After obtaining the multimedia object vector matrix as above, the apparatus (or system) for multimedia object recommendation may make a multimedia object recommendation to a recommended user based on the obtained multimedia object vector matrix. FIG. 2 shows a flow diagram of a method for multimedia object recommendation according to an embodiment of the present application.
As shown in fig. 2, in block S210, the multimedia object vectors corresponding to the recommended user in the multimedia object vector matrix are obtained. For example, for the user U1, the multimedia objects accessed by U1 may be obtained, and then the corresponding one or more multimedia object vectors may be obtained from the matrix. It is noted here that any particular multimedia object vector named for U1 is by way of example only; other multimedia object vectors may also be included.
Next, at block S220, the degree of correlation is calculated between each of the multimedia object vectors $\{a_{11}, \ldots, a_{1N}\}$ to $\{a_{M1}, \ldots, a_{MN}\}$ in the multimedia object vector matrix $|A_{MN}|$ and the multimedia object vector(s) of the multimedia object(s) accessed by the recommended user U1. Here, the degree of correlation between two multimedia object vectors can be characterized by the vector distance between the two multimedia object vectors. Thus, the vector distances between the multimedia object vectors in the matrix $|A_{MN}|$ and the multimedia object vector(s) of U1 can be obtained by vector computation on the matrix. How to calculate the vector distance between two vectors based on a vector matrix is well known in the art and is not described here.
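As an illustration of the correlation step, the sketch below uses cosine similarity as the vector-distance measure. The text does not name a specific metric, so cosine similarity is an assumption, as is taking the maximum over the user's vectors when the user has accessed several objects. Function names and sample vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def correlations(matrix, user_vectors):
    """For each matrix row, the best similarity to any of the user's vectors."""
    return [
        max(cosine_similarity(row, user_vec) for user_vec in user_vectors)
        for row in matrix
    ]

# Toy 3 x 2 matrix and a single vector for the recommended user.
matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
user_vectors = [[1.0, 0.0]]
scores = correlations(matrix, user_vectors)
```

The first row points in the same direction as the user's vector, the second is orthogonal to it, and the third points the opposite way, which gives the highest, a zero, and the lowest score respectively.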
Then, at block S230, a recommended multimedia object is determined, based on the calculated correlation (i.e., vector distance), from the multimedia objects $A_1$ to $A_M$ corresponding to the multimedia object vector matrix $|A_{MN}|$. Determining the recommended multimedia object based on the correlation may be accomplished in any suitable manner known in the art. Fig. 3 shows a flowchart of one implementation example of the process in fig. 2 for determining candidate recommended multimedia objects based on relevance.
As shown in fig. 3, after the correlation is calculated as described above, at block S231 the multimedia objects $A_1$ to $A_M$ are sorted based on the calculated correlation, for example from high to low or from low to high. Then, at block S235, a recommended multimedia object is selected from the sorted multimedia objects. For example, when sorting from high to low by correlation, the multimedia objects ranked within a top predetermined number of positions or within a top predetermined proportion are selected as the recommended multimedia objects, for example the top 5 multimedia objects or the top 10% of the multimedia objects.
Further, in one example, determining the recommended multimedia object based on the relevance may further include determining a multimedia object whose calculated correlation is greater than a predetermined threshold as a recommended multimedia object.
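The three selection rules described above (top predetermined number, top predetermined proportion, and threshold) can be sketched as follows. The function names and the sample scores are hypothetical, and rounding the proportion up to at least one object is an assumption.

```python
import math

def rank_objects(scores):
    """Object IDs sorted by correlation from high to low."""
    return sorted(scores, key=scores.get, reverse=True)

def top_k(scores, k):
    """The k objects with the highest correlation."""
    return rank_objects(scores)[:k]

def top_fraction(scores, fraction):
    """The top share of objects, rounded up, and at least one object."""
    ranked = rank_objects(scores)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

def above_threshold(scores, threshold):
    """All objects whose correlation is greater than the threshold."""
    return [oid for oid in rank_objects(scores) if scores[oid] > threshold]

# Hypothetical correlations for five multimedia objects.
scores = {"A1": 0.91, "A2": 0.15, "A3": 0.78, "A4": 0.40, "A5": 0.66}
by_count = top_k(scores, 2)
by_share = top_fraction(scores, 0.4)
by_threshold = above_threshold(scores, 0.6)
```

A production system would probably also exclude objects the recommended user has already accessed; that step is not part of the description above and is omitted here.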
After the recommended multimedia object is determined as above, the determined recommended multimedia object is recommended to the recommended user at block S240.
A method for multimedia object recommendation according to an embodiment of the present application is described above with reference to fig. 1 to 3, and an apparatus for multimedia object recommendation according to an embodiment of the present application is described below with reference to fig. 4 and 5.
Fig. 4 shows a block diagram of an apparatus 400 for multimedia object recommendation (hereinafter simply referred to as multimedia object recommendation apparatus 400) according to an embodiment of the present application.
As shown in fig. 4, the multimedia object recommending apparatus 400 may include a multimedia object vector acquiring unit 430, a first correlation calculating unit 440, a recommended multimedia object determining unit 450, and a recommending unit 460.
The multimedia object vector obtaining unit 430 is configured to obtain a multimedia object vector corresponding to the recommended user in a multimedia object vector matrix, where the multimedia object vector matrix is generated by using the method described with reference to fig. 1.
After obtaining the multimedia object vector corresponding to the recommended user, the first correlation calculation unit 440 calculates the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user.
After the first correlation calculation unit 440 calculates the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user, the recommended multimedia object determination unit 450 determines the recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation.
As described above with reference to fig. 2, the recommended multimedia object determining unit 450 may determine the recommended multimedia object based on the degree of correlation in any suitable manner in the art. For example, the recommended multimedia object determining unit 450 may determine a multimedia object whose calculated degree of correlation is greater than a predetermined threshold as a recommended multimedia object. In other examples, the recommended multimedia object determining unit 450 may also determine the recommended multimedia object based on the calculated correlation in other manners.
Fig. 5 is a block diagram illustrating an implementation example of the recommended multimedia object determination unit in fig. 4. As shown in fig. 5, the recommended multimedia object determination unit 450 may include a ranking module 451 and a selection module 455.
The ranking module 451 is configured to sort the multimedia objects $A_1$ to $A_M$ based on the calculated correlation, for example from high to low or from low to high. The selection module 455 then selects a recommended multimedia object from the sorted multimedia objects. For example, when sorting from high to low by correlation, the multimedia objects ranked within a top predetermined number of positions or within a top predetermined proportion are selected as the recommended multimedia objects, for example the top 5 multimedia objects or the top 10% of the multimedia objects.
Then, the recommending unit 460 recommends the determined recommended multimedia object to the recommended user.
Furthermore, the multimedia object recommending apparatus 400 may further preferably include a collecting unit 410 and a training unit 420.
The collecting unit 410 is used to collect the unique identifiers of the multimedia objects accessed by the respective users. Here, the multimedia objects include at least one of short videos, movies, music, series, pictures, and the like. A unique identifier is identification information that uniquely identifies a multimedia object, for example, the ID of a multimedia object that a user clicks or plays. The access data of all users may be collected actively by the server, or may be collected by each user's client and then sent to the server by wired or wireless communication. "All users" refers to all historical accessing users, i.e., users who have accessed multimedia objects in the past, including the recommended user.
The unique identifiers may be collected within a prescribed time window, in which case the multimedia objects accessed by a user are the multimedia objects that the user accessed within the prescribed time window. In one example, the prescribed time window may have a predetermined size, such as a day, a week, or a month. In some examples, the size of the prescribed time window may be adjusted according to actual circumstances. Preferably, in another example, the size of the prescribed time window may be determined based on the service scene of the multimedia object, the user amount, and/or the user access frequency. For example, for a short video recommendation scene, the prescribed time window is set to one day for a scene with more than ten million users, to 3 to 7 days for a scene on the order of one million users, and to 10 days for a scene on the order of one hundred thousand users, and so on. In this other example, the multimedia object recommending apparatus 400 may further include a time window size determining unit 470 for determining the size of the prescribed time window based on the service scene, the user amount, and/or the user access frequency of the multimedia object.
The training unit 420 is configured to train a corresponding multimedia object vector based on the unique identifier of the multimedia object accessed by each user by using a multimedia object vector training algorithm to form the multimedia object vector matrix. The operations performed by the training unit 420 may refer to the description of the generation of the multimedia object vector matrix made above with reference to fig. 1.
With the multimedia object recommendation apparatus and method described with reference to figs. 1 to 5, the multimedia objects accessed by respective users are collected and used to generate a multimedia object vector matrix; the correlations between the multimedia objects accessed by the recommended user and all collected multimedia objects are calculated based on that matrix; and multimedia objects are then recommended to the recommended user based on the calculated correlations. In this way, multimedia objects that the user likes can be recommended to the user quickly and accurately.
In addition, by only collecting the unique identification of the multimedia object accessed by each user in the specified time window, the calculation amount required by the multimedia object recommendation process can be greatly reduced, thereby improving the multimedia object recommendation efficiency. In addition, the size of the specified time window is determined according to the service scene, the user amount and the user access frequency of the multimedia object, so that the accuracy of recommending the multimedia object liked by the user to the user can be ensured while the calculation amount required by the multimedia object recommending process is reduced.
FIG. 6 shows a flow diagram of a method for multimedia object recommendation according to another embodiment of the present application. The method shown in the embodiment in fig. 6 is an improvement over the method for multimedia object recommendation shown in fig. 2.
As can be seen by comparing fig. 6 with fig. 2, the method in fig. 6 differs from fig. 2 only in that the operation of block S215 is added, while the operation of block S220 is adapted. For simplicity of description, only the flow different from that of fig. 2 is described with respect to fig. 6, and the description of the same flow is omitted here.
As shown in fig. 6, after the multimedia object vector corresponding to the recommended user is obtained in block S210 as described above with reference to fig. 2, in block S215, the multimedia object that the recommended user is interested in is determined from the multimedia objects accessed by the recommended user.
In one example of the present application, the multimedia object in which the recommended user is interested may be determined according to a play behavior of the recommended user on the accessed multimedia object. For example, for a multimedia object accessed by the recommended user, once the multimedia object is played, the multimedia object is considered to be a multimedia object of interest to the recommended user.
In addition, preferably, in another example of the present application, the multimedia object in which the recommended user is interested may also be determined according to both the play behavior and the like behavior of the recommended user on the accessed multimedia object. In this case, whether the recommended user is interested in a multimedia object may be determined according to whether the accessed multimedia object was played, and the degree of interest may be determined according to the recommended user's like behavior for the multimedia object, which may be measured by the number of likes or by another measure of like behavior. For example, the interest level of a multimedia object for which only play behavior exists may be defined as 1, and the interest level may be increased by a predetermined value, such as 0.5, each time a like occurs, so that after one like the interest level becomes 1.5. If K likes occur, the interest level of the multimedia object becomes 1 + 0.5K.
After the multimedia object of interest to the recommended user is determined as above, at block S220', the correlation between the multimedia object vectors in the multimedia object vector matrix and the multimedia object vector of the multimedia object of interest to the recommended user is calculated. In one example of the present application, the correlation may be calculated in block S220' in a manner similar to that of block S220 in fig. 2. In another example of the present application, where the multimedia object of interest is determined according to both the play behavior and the like behavior of the recommended user on the accessed multimedia object, the correlation calculation in block S220' further applies a weighting based on the interest level. For example, the correlation calculated in the manner of block S220 in fig. 2 may be multiplied by the corresponding interest level to obtain the final correlation. In other examples of the present application, the calculated correlation may be weighted, based on the interest level of the recommended user in the multimedia object, using other weighting processes applicable in the art.
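The interest level 1 + 0.5K and the multiplicative weighting can be sketched as follows. The base value 1 and the step 0.5 are the example values given in the text; returning an interest level of 0 for an object that was never played, and the function names, are assumptions.

```python
def interest_level(played, like_count, base=1.0, step=0.5):
    """Interest is `base` once played, plus `step` per like; 0 if never played."""
    if not played:
        return 0.0
    return base + step * like_count

def weighted_correlation(correlation, played, like_count):
    """Final correlation: raw correlation multiplied by the interest level."""
    return correlation * interest_level(played, like_count)

played_only = interest_level(True, 0)       # play behavior only
one_like = interest_level(True, 1)          # played and liked once
final = weighted_correlation(0.5, True, 1)  # raw correlation 0.5, one like
```

An object that was played and liked once therefore counts 1.5 times as much as one that was only played, so its raw correlation of 0.5 becomes 0.75.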
After the correlation degree is calculated as above, a recommended multimedia object is determined in block S230, and the determined recommended multimedia object is recommended to the recommended user in block S240.
Fig. 7 shows a block diagram of an apparatus 400' for multimedia object recommendation (hereinafter simply referred to as multimedia object recommendation apparatus 400') according to another embodiment of the present application.
As shown in fig. 7, the multimedia object recommending apparatus 400' may include a multimedia object vector acquiring unit 430, a multimedia object of interest determining unit 435, a first correlation calculating unit 440', a recommended multimedia object determining unit 450, and a recommending unit 460.
The operations and functions of the multimedia object vector obtaining unit 430, the recommended multimedia object determining unit 450, and the recommending unit 460 are completely the same as those of the multimedia object vector obtaining unit 430, the recommended multimedia object determining unit 450, and the recommending unit 460 in the multimedia object recommending apparatus 400 described with reference to fig. 4, and thus will not be described again.
The multimedia object of interest determining unit 435 is configured to determine a multimedia object that is of interest to the recommended user from the multimedia objects accessed by the recommended user. In one example of the present application, the multimedia object in which the recommended user is interested may be determined according to a play behavior of the recommended user on the accessed multimedia object. For example, for a multimedia object accessed by the recommended user, once the multimedia object is played, the multimedia object is considered to be a multimedia object of interest to the recommended user. In addition, preferably, in another example of the present application, the multimedia object in which the recommended user is interested may also be determined according to both the play behavior and the like behavior of the recommended user on the accessed multimedia object. In this case, whether the recommended user is interested in an accessed multimedia object may be determined according to whether that multimedia object has been played, and the degree of interest of the recommended user in the multimedia object may be determined according to the like behavior of the recommended user for the multimedia object, which may be measured by the number of likes or by another measure of like behavior. For example, in one example, the interest level corresponding to a multimedia object for which only the play behavior exists may be defined as 1, and the interest level may be increased by a predetermined value, such as 0.5, each time a like action occurs, so that after one like action the interest level of the multimedia object becomes 1.5. If K like actions occur, the interest level of the multimedia object becomes 1 + 0.5K.
After determining the multimedia object of interest to the recommended user as above, the first correlation calculation unit 440' calculates the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object of interest to the recommended user. In an example of the present application, the calculation method of the first correlation calculation unit 440' may be similar to the correlation calculation manner of the first correlation calculation unit 440 in fig. 4.
In addition, in another example of the present application, in the case that the multimedia object of interest to the recommended user is determined according to the play behavior and the like behavior of the recommended user for the accessed multimedia object, the first correlation calculation unit 440' also needs to apply weighting by the interest level when calculating the correlation. For example, the correlation calculated in the manner of the first correlation calculation unit 440 in fig. 4 may be multiplied by the corresponding interest level to obtain the final correlation. In other examples of the present application, the calculated correlation may be weighted using other weighting methods applicable in the art, based on the interest level of the recommended user in the multimedia object.
Further, similar to fig. 4, in a preferred example, the multimedia object recommending apparatus 400' may further include a collecting unit 410 and a training unit 420. The operation and function of the collection unit 410 and the training unit 420 in fig. 7 are identical to those of the collection unit 410 and the training unit 420 described with reference to fig. 4, and are not described again.
Furthermore, the multimedia object recommending apparatus 400' may further preferably include a time window size determining unit 470. The operation and function of the time window size determining unit 470 in fig. 7 are completely the same as those of the time window size determining unit 470 described with reference to fig. 4, and are not described again here.
With the multimedia object recommendation apparatus and method described with reference to fig. 6 to 7, by calculating the correlation between the multimedia object of interest to the recommended user and all the collected multimedia objects accessed by the respective users based on the multimedia object vector matrix, and then recommending the multimedia object to the recommended user based on the calculated correlation, the accuracy of recommending the multimedia object liked by the user to the user can be improved.
FIG. 8 shows a flow diagram of a method for multimedia object recommendation according to another embodiment of the present application. The method shown in the embodiment in fig. 8 is an improvement over the method for multimedia object recommendation shown in fig. 2.
As can be seen by comparing fig. 8 with fig. 2, the method in fig. 8 differs from fig. 2 in that the operations of blocks S231-S237 are added, while the operation of block S240 is adaptively modified. For simplicity of description, only the flow different from that of fig. 2 is described with respect to fig. 8, and the description of the same flow is omitted here.
As shown in fig. 8, after determining the recommended multimedia object in block S230 as described above with reference to fig. 2, multimedia objects that are not of interest to the recommended user are determined in block S231 according to a dislike (thumbs-down) behavior of the recommended user on the accessed multimedia objects. Then, in block S233, a correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object that is not of interest to the recommended user is calculated. The correlation calculation here is similar to the correlation calculation described with reference to block S220 in fig. 2.
Next, in block S235, a multimedia object prohibited from being recommended to the recommended user is determined based on a correlation between the multimedia object vector in the calculated multimedia object vector matrix and the multimedia object vector of the multimedia object that is not of interest to the recommended user. Here, the determination manner in which the recommendation of the multimedia object is prohibited may be similar to the determination manner described with reference to block S230 in fig. 2.
After determining the multimedia objects prohibited from being recommended to the recommended user, at block S237, the multimedia objects prohibited from being recommended to the recommended user, as determined at block S235, are removed from the recommended multimedia objects determined at block S230. Then, in block S240', the recommended multimedia objects remaining after the above removal processing are recommended to the recommended user.
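The removal step of block S237 amounts to filtering one list against another. A minimal sketch, with the hypothetical function name `remove_prohibited` and multimedia objects represented by string identifiers for illustration:

```python
def remove_prohibited(recommended, prohibited):
    """Drop every prohibited object from the recommended list, preserving order."""
    blocked = set(prohibited)  # set lookup keeps the filter linear in list size
    return [obj for obj in recommended if obj not in blocked]


# Objects determined at block S230, and objects prohibited at block S235.
remaining = remove_prohibited(["v1", "v2", "v3"], ["v2", "v9"])
```

The order of the recommended list is preserved so that any ranking established when the recommended objects were determined carries over to the filtered result.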
It is noted here that the order of the blocks S231-S237 shown in fig. 8 is merely an example, and the operations of the blocks S231-S237 may occur anywhere between the blocks S210 and S240', as long as the precedence order between the operations of the blocks S231-S237 is maintained.
Fig. 9 shows a block diagram of an apparatus 400" for multimedia object recommendation (hereinafter referred to as a multimedia object recommendation apparatus 400") according to another embodiment of the present application.
As shown in fig. 9, the multimedia object recommending apparatus 400" may include a multimedia object vector acquiring unit 430, a first degree of correlation calculating unit 440, a recommended multimedia object determining unit 450, a non-interested multimedia object determining unit 451, a second degree of correlation calculating unit 453, a recommendation prohibition multimedia object determining unit 455, a removing unit 457, and a recommending unit 460'.
The operations and functions of the multimedia object vector obtaining unit 430, the first correlation degree calculating unit 440 and the recommended multimedia object determining unit 450 are completely the same as those of the multimedia object vector obtaining unit 430, the first correlation degree calculating unit 440 and the recommended multimedia object determining unit 450 in the multimedia object recommending apparatus 400 described with reference to fig. 4, and are not described again here.
The non-interested multimedia object determining unit 451 is configured to determine a multimedia object that is not of interest to the recommended user according to a dislike (thumbs-down) behavior of the recommended user for the accessed multimedia object. The second correlation degree calculation unit 453 is used to calculate the correlation degree between the multimedia object vectors in the multimedia object vector matrix and the multimedia object vectors of the multimedia objects that are not of interest to the recommended user. The correlation calculation manner here is similar to the correlation calculation manner described with reference to the first correlation calculation unit 440 in fig. 4.
The prohibited recommended multimedia object determination unit 455 is configured to determine a multimedia object prohibited from being recommended to the recommended user based on a correlation between the multimedia object vector in the calculated multimedia object vector matrix and the multimedia object vector of the multimedia object that is not of interest to the recommended user. Here, the determination manner of prohibiting the recommendation of the multimedia object may be similar to the determination manner described with reference to the recommended multimedia object determination unit 450 in fig. 4.
The removing unit 457 is configured to remove the multimedia objects prohibited from being recommended to the recommended user, as determined by the recommendation prohibition multimedia object determining unit 455, from the recommended multimedia objects determined by the recommended multimedia object determining unit 450. Then, the recommending unit 460' recommends the recommended multimedia objects remaining after the above removal processing to the recommended user.
Further, similar to fig. 4, in a preferred example, the multimedia object recommendation apparatus 400" may further include a collection unit 410 and a training unit 420. The operation and function of the collection unit 410 and the training unit 420 in fig. 9 are identical to those of the collection unit 410 and the training unit 420 described with reference to fig. 4, and are not described again.
Furthermore, the multimedia object recommendation device 400" may further preferably include a time window size determining unit 470. The operation and function of the time window size determining unit 470 in fig. 9 are completely the same as those of the time window size determining unit 470 described with reference to fig. 4, and are not described again here.
With the multimedia object recommendation apparatus and method described with reference to fig. 8 to 9, the accuracy of recommending a multimedia object that is liked by a user to the user can be further improved by determining a multimedia object that is not interested by the recommended user, determining a recommendation prohibition multimedia object based on the correlation between the multimedia object that is not interested by the recommended user and all the collected multimedia objects accessed by the respective users, and then removing the determined recommendation prohibition multimedia object from the recommended multimedia objects obtained in the flow of fig. 2 to recommend to the recommended user.
Furthermore, fig. 8 shows a modification of fig. 2. In other examples of the present application, similar improvement may be made on the basis of fig. 6, that is, a recommendation prohibition multimedia object is removed from recommended multimedia objects determined based on the correlation between the multimedia object in which the recommended user is interested and all the collected multimedia objects accessed by the respective users, and then the recommended multimedia object after the removal processing is recommended to the recommended user.
It should be understood that the modules and corresponding functions described with reference to fig. 1 to 9 are for illustration and not for limitation, and that specific functions may be implemented in different modules or in a single module.
Embodiments of a method and apparatus for multimedia object recommendation according to the present application are described above with reference to fig. 1 to 9. The multimedia object recommendation apparatus described above may be implemented by hardware, software, or a combination of hardware and software.
In the present application, the multimedia object recommending apparatus 400 (as well as 400' and 400") may be implemented by using a computing device. FIG. 10 illustrates a block diagram of a computing device for implementing multimedia object recommendation in accordance with an embodiment of the present application. According to one embodiment, computing device 1000 may include one or more processors 1002, with processors 1002 executing one or more computer-readable instructions (i.e., the elements described above as being implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1004).
In one embodiment, computer-executable instructions are stored in the memory 1004 that, when executed, cause the one or more processors 1002 to: acquiring multimedia object vectors corresponding to recommended users in a multimedia object vector matrix, wherein the multimedia object vector matrix consists of all corresponding multimedia object vectors generated based on multimedia objects accessed by all users; calculating the correlation degree between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user; determining a recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and recommending the determined recommended multimedia object to the recommended user.
It should be appreciated that the computer-executable instructions stored in the memory 1004, when executed, cause the one or more processors 1002 to perform the various operations and functions described above in connection with fig. 1-9 in the various embodiments of the present application.
According to one embodiment, a program product, such as a non-transitory machine-readable medium, is provided. The non-transitory machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-9 in various embodiments of the present application.
FIG. 11 shows a flow diagram of a method for multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user, according to another embodiment of the present application. Similar to the method illustrated in fig. 2, prior to performing the method in fig. 11, a multimedia object vector matrix needs to be created (or generated) on the multimedia information platform (i.e., server) side based on the unique identification of the multimedia object accessed by the user. The creation process of the vector matrix of multimedia objects can refer to the description of fig. 1 above.
As shown in fig. 11, in block S1110, one or more multimedia object clusters are obtained by clustering multimedia objects accessed by respective users based on a multimedia object vector matrix. Here, any clustering algorithm applicable in the art, such as the k-means algorithm, may be employed for the clustering of multimedia objects.
Next, at block S1120, the weight of each tag/title contained in the obtained one or more multimedia object clusters is calculated within its multimedia object cluster. In one example, calculating the weights of the tags/titles contained in the one or more multimedia object clusters may include: calculating the weight of a tag/title in a multimedia object cluster according to the number of times the tag/title appears in the multimedia object cluster and the total number of tags/titles included in the multimedia object cluster. Here, the total number of tags/titles included in a multimedia object cluster is the number of distinct tags/titles it includes. For example, assume that a multimedia object cluster comprises 20 multimedia objects, each comprising 50 tags/titles. If the tags/titles contained in each multimedia object are all different from the tags/titles contained in the other multimedia objects, the total number of tags/titles contained in the multimedia object cluster is 1000. If some tags/titles are identical, the duplicates need to be removed so that each tag/title is counted only once.
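The weight calculation can be sketched as below. The text states only that the weight depends on the occurrence count and on the total number of distinct tags/titles; taking the simple ratio of the two is an assumption made for this illustration, as are the function and variable names.

```python
from collections import Counter


def tag_weights(cluster_tags):
    """Weight of each tag within one multimedia object cluster.

    cluster_tags: one list of tags/titles per multimedia object in the cluster.
    Assumed formula: occurrences of the tag / number of distinct tags in the cluster.
    """
    counts = Counter(tag for tags in cluster_tags for tag in tags)
    total_distinct = len(counts)  # identical tags are counted only once
    if total_distinct == 0:
        return {}
    return {tag: n / total_distinct for tag, n in counts.items()}


# Two objects sharing the tag "a": three distinct tags in total.
weights = tag_weights([["a", "b"], ["a", "c"]])
```

Other normalizations (for example, dividing by the total number of tag occurrences) would fit the same description equally well.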
Then, at block S1130, a tag/title weight list reflecting the weights of the tags/titles in the respective multimedia object clusters is generated using the weights of the tags/titles in the respective multimedia object clusters. For example, an inverted index of each tag/title may be constructed based on its weight in the respective multimedia object clusters, resulting in a list of the weights of that tag/title in different clusters. For example, suppose one video has three tags tagA, tagB and tagC, and each tag corresponds to multiple clusters. Assume that the weight of tagA is 0.1 in cluster clusterX and 0.2 in cluster clusterY; the weight of tagB is 0.3 in cluster clusterZ and 0.2 in cluster clusterX; and the weight of tagC is 0.5 in cluster clusterU, 0.2 in cluster clusterV, and 0.1 in cluster clusterX. This yields the following inverted index:
tagA:{clusterY:0.2,clusterX:0.1}
tagB:{clusterZ:0.3,clusterX:0.2}
tagC:{clusterU:0.5,clusterV:0.2,clusterX:0.1}
The cluster weights corresponding to each tag can be quickly retrieved through the inverted index.
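A minimal sketch of building such an inverted index from per-cluster tag weights follows, using the weights of the tagA/tagB/tagC example. The function name `build_inverted_index` and the input layout (a mapping from cluster to tag weights) are assumptions for illustration.

```python
def build_inverted_index(cluster_weights):
    """Turn {cluster: {tag: weight}} into {tag: {cluster: weight}}.

    The clusters for each tag are ordered from highest to lowest weight.
    """
    index = {}
    for cluster_id, weights in cluster_weights.items():
        for tag, weight in weights.items():
            index.setdefault(tag, {})[cluster_id] = weight
    return {
        tag: dict(sorted(clusters.items(), key=lambda kv: kv[1], reverse=True))
        for tag, clusters in index.items()
    }


cluster_weights = {
    "clusterX": {"tagA": 0.1, "tagB": 0.2, "tagC": 0.1},
    "clusterY": {"tagA": 0.2},
    "clusterZ": {"tagB": 0.3},
    "clusterU": {"tagC": 0.5},
    "clusterV": {"tagC": 0.2},
}
index = build_inverted_index(cluster_weights)
```

Sorting each tag's clusters by weight places the maximum-weight cluster first, which is the entry needed when selecting a cluster for a tag.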
After the tag/title weight list is obtained as above, in block S1140, a multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition, among the one or more tags/titles included in the multimedia objects accessed by the recommended user, is selected from the multimedia object clusters according to the tag/title weight list. The tags/titles satisfying the predetermined condition include: the tag/title with the highest weight; or the tags/titles ranked within a predetermined number of top positions, or within a predetermined top proportion, when the tags/titles are sorted from high to low by weight.
For example, assume that the multimedia objects accessed by the recommended user include $A_1$, $A_2$ and $A_3$, where the tags of $A_1$ are $LA_1$, $LB_1$ and $LC_1$, the tags of $A_2$ are $LA_2$, $LB_2$ and $LC_2$, and the tags of $A_3$ are $LA_3$, $LB_3$ and $LC_3$. The multimedia objects are clustered into N clusters $K_1, \ldots, K_N$. The weight values of $LA_1$ in clusters $K_1, \ldots, K_N$ are $WA_{11}, \ldots, WA_{1N}$ respectively (assume $WA_{11}$ is the maximum); the weight values of $LB_1$ are $WB_{11}, \ldots, WB_{1N}$ (assume $WB_{14}$ is the maximum); the weight values of $LC_1$ are $WC_{11}, \ldots, WC_{1N}$ (assume $WC_{15}$ is the maximum); the weight values of $LA_2$ are $WA_{21}, \ldots, WA_{2N}$ (assume $WA_{22}$ is the maximum); the weight values of $LB_2$ are $WB_{21}, \ldots, WB_{2N}$ (assume $WB_{26}$ is the maximum); the weight values of $LC_2$ are $WC_{21}, \ldots, WC_{2N}$ (assume $WC_{27}$ is the maximum); the weight values of $LA_3$ are $WA_{31}, \ldots, WA_{3N}$ (assume $WA_{33}$ is the maximum); the weight values of $LB_3$ are $WB_{31}, \ldots, WB_{3N}$ (assume $WB_{38}$ is the maximum); and the weight values of $LC_3$ are $WC_{31}, \ldots, WC_{3N}$ (assume $WC_{39}$ is the maximum).
In the above manner, first, the maximum weight value of each tag over the N clusters is taken as the weight value of that tag, i.e., the weight value of tag $LA_1$ is $WA_{11}$, the weight value of tag $LB_1$ is $WB_{14}$, the weight value of tag $LC_1$ is $WC_{15}$, the weight value of tag $LA_2$ is $WA_{22}$, the weight value of tag $LB_2$ is $WB_{26}$, the weight value of tag $LC_2$ is $WC_{27}$, the weight value of tag $LA_3$ is $WA_{33}$, the weight value of tag $LB_3$ is $WB_{38}$, and the weight value of tag $LC_3$ is $WC_{39}$.
Next, based on the weight value of each tag/title, the tags/titles satisfying the predetermined condition are determined. For example, the tag/title having the highest weight value is determined, or, after the tags/titles are sorted from high to low by weight value, the tags/titles ranked within a predetermined number of top positions or within a predetermined top proportion are determined. For example, assume the ordering from high to low is $WA_{11} > WB_{14} > WC_{15} > WA_{22} > WB_{26} > WC_{27} > WA_{33} > WB_{38} > WC_{39}$. If the predetermined condition is the top three tags/titles, then the determined tags/titles are $LA_1$, $LB_1$ and $LC_1$.
After the tags/titles are determined as above, the multimedia object clusters corresponding to the maximum weights of the determined tags/titles, i.e., $K_1$, $K_4$ and $K_5$, are selected.
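The selection procedure of block S1140 can be sketched as follows, assuming the "top predetermined number" form of the predetermined condition. The function name `select_clusters`, the index layout (tag to cluster weights), and the numeric weights are hypothetical values chosen to mirror the example, in which the three highest-weighted tags reach their maxima in the first, fourth and fifth clusters.

```python
def select_clusters(user_tags, index, top_n=3):
    """Select the clusters in which the user's top-weighted tags reach their maximum.

    index: {tag: {cluster_id: weight}} as in a tag/title weight list.
    """
    best = []
    for tag in user_tags:
        clusters = index.get(tag)
        if not clusters:
            continue  # tag does not occur in any cluster
        # The tag's weight is its maximum weight over all clusters.
        cluster_id, weight = max(clusters.items(), key=lambda kv: kv[1])
        best.append((weight, tag, cluster_id))
    best.sort(key=lambda item: item[0], reverse=True)
    selected = []
    for _, _, cluster_id in best[:top_n]:
        if cluster_id not in selected:  # avoid recommending a cluster twice
            selected.append(cluster_id)
    return selected


index = {
    "LA1": {"K1": 0.9, "K2": 0.1},
    "LB1": {"K4": 0.8},
    "LC1": {"K5": 0.7, "K1": 0.2},
    "LA2": {"K2": 0.6},
}
chosen = select_clusters(["LA1", "LB1", "LC1", "LA2"], index, top_n=3)
```

Deduplicating the selected clusters is a design choice of the sketch: two top-ranked tags may reach their maximum weight in the same cluster.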
Then, in block S1150, the multimedia objects in the selected multimedia object cluster are recommended to the recommended user.
Further, before block S1120, the method may further include removing long-tail tags/titles from each multimedia object cluster. Specifically, for each multimedia object cluster, the number of times each tag/title in the multimedia object cluster appears in the cluster is obtained. Then, the top-ranked tags/titles by number of occurrences, such as the Top 100 tags/titles, are selected from the tags/titles of the multimedia object cluster.
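Long-tail removal of this kind can be sketched with a frequency count followed by a top-N cut. The function name `keep_top_tags` and the input layout are assumptions for illustration; the default of 100 follows the Top 100 example.

```python
from collections import Counter


def keep_top_tags(cluster_tags, top_n=100):
    """Return the top_n most frequent tags/titles of one multimedia object cluster.

    cluster_tags: one list of tags/titles per multimedia object in the cluster.
    """
    counts = Counter(tag for tags in cluster_tags for tag in tags)
    return {tag for tag, _ in counts.most_common(top_n)}


# "a" appears 3 times, "b" twice, "c" once; with top_n=2 the long-tail tag "c" is dropped.
kept = keep_top_tags([["a", "b"], ["a", "c"], ["a", "b"]], top_n=2)
```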
It is noted here that the multimedia object recommendation method described in fig. 11 may be combined with the methods shown in fig. 2, 6 and 8. Specifically, in block S230 in fig. 2, 6 and 8, if no recommended multimedia object is found among the multimedia objects in the multimedia object vector matrix based on the calculated correlation degrees, for example, the calculated correlation degrees are all less than a predetermined threshold, the flow shown in fig. 11 is initiated to determine a multimedia object to be recommended to the recommended user.
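The combination described above is a fallback: the tag/title-based flow is used only when the correlation-based flow finds nothing. A minimal sketch, in which the function name `recommend`, the threshold comparison, and the callable `tag_based_fallback` (standing in for the flow of fig. 11) are assumptions for illustration:

```python
def recommend(correlations, threshold, tag_based_fallback):
    """Recommend by correlation, or fall back to tag/title-based recommendation.

    correlations: {object_id: correlation score}.
    tag_based_fallback: callable returning objects from the tag/title-based flow.
    """
    ranked = sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)
    hits = [obj for obj, score in ranked if score >= threshold]
    if hits:
        return hits
    # All correlations are below the threshold: start the tag/title-based flow.
    return tag_based_fallback()


by_correlation = recommend({"v1": 0.9, "v2": 0.2}, 0.5, lambda: ["t1"])
by_tags = recommend({"v1": 0.1, "v2": 0.2}, 0.5, lambda: ["t1"])
```

Passing the fallback as a callable means the clustering and weighting work is done only when it is actually needed.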
Fig. 12 shows a block diagram of an apparatus 1200 for multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user (hereinafter simply referred to as a multimedia object recommendation apparatus 1200) according to the present application.
As shown in fig. 12, the multimedia object recommending apparatus 1200 includes a clustering unit 1230, a weight calculating unit 1240, a generating unit 1250, a selecting unit 1260, and a recommending unit 1270.
The clustering unit 1230 is configured to cluster the multimedia objects accessed by the respective users based on the multimedia object vector matrix to obtain one or more multimedia object clusters. The multimedia object vector matrix is created (or generated) on the multimedia information platform (i.e., server) side based on the unique identification of the multimedia object accessed by the user. The creation process of the vector matrix of multimedia objects can refer to the description of fig. 1 above. Furthermore, any clustering algorithm applicable in the art, such as the k-means algorithm, may be employed for clustering of multimedia objects.
The weight calculation unit 1240 is used for calculating the weight of each tag/title contained in the one or more multimedia object clusters within its multimedia object cluster. The operation of the weight calculation unit 1240 may refer to the description above for the operation of block S1120 in fig. 11.
The generating unit 1250 is configured to generate a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster. The operations of the generation unit 1250 may refer to the description above for the operations of block S1130 in fig. 11.
The selecting unit 1260 is configured to select, from the multimedia object clusters according to the tag/title weight list, a multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user. The operation of the selection unit 1260 may be as described above with respect to the operation of block S1140 in fig. 11.
The recommending unit 1270 is used for recommending the multimedia objects in the selected multimedia object cluster to the recommended users.
Furthermore, preferably, the multimedia object recommendation apparatus 1200 may further include a long-tail tag removing unit, configured to remove long-tail tags/titles from each multimedia object cluster before the weight calculation unit 1240 performs weight calculation. Specifically, for each multimedia object cluster, the number of times each tag/title in the multimedia object cluster appears in the cluster is obtained. Then, the top-ranked tags/titles by number of occurrences, such as the Top 100 tags/titles, are selected from the tags/titles of the multimedia object cluster.
Further, similar to fig. 4, in a preferred example, the multimedia object recommending apparatus 1200 may further include a collecting unit 1210 and a training unit 1220. The operations and functions of the collection unit 1210 and the training unit 1220 in fig. 12 are identical to those of the collection unit 410 and the training unit 420 described with reference to fig. 4, and thus, are not described again.
Furthermore, preferably, the multimedia object recommendation apparatus 1200 may further include a time window size determining unit 1280. The operation and function of the time window size determining unit 1280 in fig. 12 are completely the same as those of the time window size determining unit 470 described with reference to fig. 4, and are not described again here.
Further, the multimedia object recommending apparatus described in fig. 12 may be combined with the apparatuses shown in fig. 4, 7, and 9. Specifically, the clustering unit 1230, the weight calculating unit 1240, the generating unit 1250, and the selecting unit 1260 in fig. 12 are respectively incorporated in the multimedia object recommending apparatuses of fig. 4, 7, and 9.
With the multimedia object recommendation apparatus and method described with reference to fig. 11 to 12, the multimedia objects accessed by the users are clustered based on the multimedia object vector matrix, the weights of the tags/titles of the multimedia objects accessed by the recommended user are calculated in each multimedia object cluster, and a multimedia object cluster to be recommended to the recommended user is determined based on the calculated weights, so that the multimedia objects in the determined multimedia object cluster are recommended to the recommended user. In this way, multimedia objects liked by the user can be recommended to the user based on the tags/titles of the multimedia objects accessed by the recommended user.
Furthermore, when the multimedia object recommendation apparatus and method described in fig. 11 to 12 are combined with the methods and apparatuses shown in fig. 2 to 9, it is possible to implement multimedia object recommendation based on tags/titles, i.e., multimedia object recall based on tags/titles, in the case where recommendation cannot be made based on the degree of correlation.
It should be understood that the modules and corresponding functions described with reference to fig. 11 to 12 are for illustration and not for limitation, and that specific functions may be implemented in different modules or in a single module.
Embodiments of a method and apparatus for multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user according to the present application are described above with reference to fig. 11 to 12. The multimedia object recommendation apparatus described above may be implemented by hardware, software, or a combination of hardware and software.
In the present application, the multimedia object recommendation apparatus 1200 may be implemented by a computing device. FIG. 13 illustrates a block diagram of a computing device 1300 that enables multimedia object recommendation based on tags/titles contained in multimedia objects accessed by a recommended user according to one embodiment of the present application. According to one embodiment, computing device 1300 may include one or more processors 1302, the processors 1302 executing one or more computer readable instructions (i.e., the elements described above as being implemented in software) stored or encoded in a computer readable storage medium (i.e., memory 1304).
In one embodiment, computer-executable instructions are stored in the memory 1304 that, when executed, cause the one or more processors 1302 to: clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, the multimedia object vector matrix being composed of respective multimedia object vectors generated based on multimedia objects accessed by all users; calculating weights of labels/titles contained in the one or more multimedia object clusters in the multimedia clusters; generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster; selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and recommending the multimedia objects in the selected multimedia object cluster to the recommended user.
It should be appreciated that the computer-executable instructions stored in the memory 1304, when executed, cause the one or more processors 1302 to perform the various operations and functions described above in connection with fig. 11-12 in the various embodiments of the present application.
According to one embodiment, a program product, such as a non-transitory machine-readable medium, is provided. The non-transitory machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 11-12 in various embodiments of the present application.
Fig. 14 shows a flow diagram of a method for pornographic filtering of multimedia objects according to another embodiment of the present application. Similar to the method illustrated in fig. 2, prior to performing the method in fig. 14, a multimedia object vector matrix needs to be created (or generated) on the multimedia information platform (i.e., server) side based on the unique identification of the multimedia object accessed by the user. The creation process of the vector matrix of multimedia objects can refer to the description of fig. 1 above.
As shown in fig. 14, at block 1410, multimedia objects accessed by respective users are clustered based on a multimedia object vector matrix to obtain one or more multimedia object clusters. Here, any clustering algorithm applicable in the art, such as the k-means algorithm, may be employed for the clustering of multimedia objects.
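As an illustration of block 1410, the following is a minimal k-means sketch over a toy vector matrix. It is not the implementation of the application: the two-dimensional vectors, the object identifiers, and the choice of the first k vectors as initial centroids are all assumptions made so that the example is deterministic.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical multimedia object vectors, one row per multimedia object.
object_ids = ["v1", "v2", "v3", "v4"]
vector_matrix = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]

labels = kmeans(vector_matrix, k=2)
clusters = {}
for oid, label in zip(object_ids, labels):
    clusters.setdefault(label, []).append(oid)
print(clusters)
```

In practice the vectors would have many more dimensions and a library implementation with a better initialization strategy would normally be preferred.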
Next, at block 1420, a pornographic index for each multimedia object cluster is calculated. The pornographic index for a multimedia object cluster may be calculated in any suitable manner known in the art. FIG. 15 shows a flow diagram of one example of a process for calculating pornographic indexes for clusters according to the present application.
As shown in fig. 15, at block 1421, for a current multimedia object cluster, high-frequency words that appear in all multimedia objects included in the current multimedia object cluster are extracted. After the high-frequency words are extracted, at block 1423, negative words and positive words are identified from the extracted high-frequency words. Here, negative words are words having pornographic meanings, and positive words are words that are more normative and do not have pornographic meanings. In general, a negative word list and a positive word list may be set in advance, and negative words and positive words may then be identified from the extracted high-frequency words based on these lists. The identification described herein may be exact identification or fuzzy identification. Exact identification means that the constituents of the words are identical or that the words are identical in meaning. Fuzzy identification means that the meanings of the words are very similar. In addition, a pornographic index weight can be set for each negative word and each positive word. The pornographic index weights may be set based on experience.
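Blocks 1421 and 1423 can be sketched as follows. The sketch assumes that words are taken from the titles of the multimedia objects, that titles can be split on whitespace, that "high frequency" means at least `min_count` occurrences in the cluster, and that identification is exact string matching only. The word lists, weights, and function names are made up for illustration.

```python
from collections import Counter

# Hypothetical preset word lists: word -> pornographic index weight.
NEGATIVE_WORDS = {"explicit": 2.0, "nsfw": 1.5}
POSITIVE_WORDS = {"tutorial": 1.0, "news": 1.0}

def high_frequency_words(titles, min_count=2):
    """Count words across all titles in a cluster and keep the frequent ones."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return {word: n for word, n in counts.items() if n >= min_count}

def identify(words):
    """Split frequent words into negative and positive ones (exact matching only)."""
    negative = {w: n for w, n in words.items() if w in NEGATIVE_WORDS}
    positive = {w: n for w, n in words.items() if w in POSITIVE_WORDS}
    return negative, positive

titles = [
    "explicit clip one",
    "explicit clip two",
    "cooking tutorial",
    "baking tutorial",
]
frequent = high_frequency_words(titles)
negative, positive = identify(frequent)
print(frequent, negative, positive)
```

Fuzzy identification would replace the membership tests in `identify` with a similarity measure; text in languages without whitespace word boundaries would need a word segmenter in place of `split()`.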
After the negative words and positive words are identified, at block 1425, a pornographic index for the current multimedia object cluster is calculated based on the number of occurrences of each identified negative word and its pornographic index weight, the number of occurrences of each identified positive word and its pornographic index weight, and the number of multimedia objects in the multimedia object cluster. For example, the pornographic index may be calculated based on the following formula: $\text{pornographic index} = \left(\sum_{i \in \text{negative}} w_i n_i - \sum_{j \in \text{positive}} w_j n_j\right) \times 100 / N$, where $w$ is the pornographic index weight of a word, $n$ is the number of occurrences of that word, and $N$ is the number of multimedia objects in the cluster. Then, at block 1427, it is determined whether the pornographic index calculation is complete for all multimedia object clusters. If so, flow proceeds to block 1430. Otherwise, the flow returns to block 1421 to perform the pornographic index calculation for the next multimedia object cluster.
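A short worked example of the calculation at block 1425 is given below. It assumes the per-word terms are summed as weight times occurrence count, and all numbers are invented for illustration; the function name `pornographic_index` is hypothetical.

```python
def pornographic_index(negative, positive, neg_weights, pos_weights, num_objects):
    """(sum of negative weight*count - sum of positive weight*count) * 100 / N."""
    if num_objects <= 0:
        raise ValueError("a cluster must contain at least one multimedia object")
    neg_score = sum(neg_weights[w] * n for w, n in negative.items())
    pos_score = sum(pos_weights[w] * n for w, n in positive.items())
    return (neg_score - pos_score) * 100 / num_objects

# One negative word seen twice (weight 2.0), one positive word seen twice
# (weight 1.0), in a cluster of four multimedia objects.
index = pornographic_index(
    negative={"explicit": 2},
    positive={"tutorial": 2},
    neg_weights={"explicit": 2.0},
    pos_weights={"tutorial": 1.0},
    num_objects=4,
)
print(index)  # (2.0*2 - 1.0*2) * 100 / 4 = 50.0
```

Because positive words are subtracted, a cluster dominated by positive words yields a negative index, which falls below any positive threshold.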
After the pornographic index calculation is completed for all multimedia object clusters, at block 1430, pornographic filtering processing is performed on the multimedia object clusters, based on the calculated pornographic index of each multimedia object cluster, for use in multimedia object recommendation. For example, the calculated pornographic index of each multimedia object cluster is compared to a predetermined pornographic index threshold (e.g., 30); if the index is greater than the threshold, the multimedia object cluster is filtered out. Otherwise, the multimedia object cluster is retained.
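The threshold comparison at block 1430 can be sketched as below, using the example threshold of 30 from the text. The cluster identifiers, index values, and the function name `filter_clusters` are hypothetical.

```python
PORNOGRAPHIC_INDEX_THRESHOLD = 30  # example value from the description

def filter_clusters(indexes, threshold=PORNOGRAPHIC_INDEX_THRESHOLD):
    """Split clusters into retained and filtered-out lists.

    A cluster is filtered out only when its index is strictly greater
    than the threshold; an index equal to the threshold is retained.
    """
    retained = [cid for cid, idx in indexes.items() if idx <= threshold]
    filtered = [cid for cid, idx in indexes.items() if idx > threshold]
    return retained, filtered

retained, filtered = filter_clusters({"c0": 50.0, "c1": 12.5, "c2": 30.0})
print(retained, filtered)
```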
Further, preferably, after block 1430, the method illustrated in fig. 14 may further include a whitelist recall process and a blacklist filtering process. Here, the whitelist and the blacklist have their general meaning in the art, and may be specifically defined by a user according to the specific case. Specifically, the whitelist recall process includes: for each filtered multimedia object cluster, judging whether the titles of the multimedia objects contained in the multimedia object cluster match the whitelist. If there is a match, the multimedia object cluster is recalled (or restored). If not, it remains filtered. The blacklist filtering process includes: for each retained multimedia object cluster, judging whether the titles of the multimedia objects contained in the multimedia object cluster match the blacklist. If there is a match, the multimedia object cluster is filtered out. If not, the multimedia object cluster is retained.
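The whitelist recall and blacklist filtering steps can be sketched as follows. The matching rule is an assumption: a cluster is taken to match a list when any list entry occurs as a substring of any title in the cluster. The titles, list entries, and function names are hypothetical.

```python
def matches(titles, terms):
    """True if any list entry occurs in any title (assumed matching rule)."""
    return any(term in title.lower() for title in titles for term in terms)

def apply_lists(retained, filtered, cluster_titles, whitelist, blacklist):
    """Recall filtered clusters that match the whitelist and drop retained
    clusters that match the blacklist; return the clusters left for recommendation."""
    recalled = [c for c in filtered if matches(cluster_titles[c], whitelist)]
    kept = [c for c in retained if not matches(cluster_titles[c], blacklist)]
    return kept + recalled

cluster_titles = {
    "c0": ["anatomy lecture explicit diagrams"],
    "c1": ["cooking tutorial"],
    "c2": ["gambling stream highlights"],
}
final = apply_lists(
    retained=["c1", "c2"],
    filtered=["c0"],
    cluster_titles=cluster_titles,
    whitelist=["anatomy lecture"],
    blacklist=["gambling"],
)
print(final)
```

In this sketch the blacklist is applied only to clusters retained by the threshold step, as described above; whether recalled clusters should also be checked against the blacklist is left open.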
Further, preferably, the pornographic cluster filtering process shown in fig. 14 and 15 may be combined with fig. 11. Specifically, after the multimedia object cluster to be recommended is selected in block 1140 in FIG. 11, the pornographic cluster filtering process shown in FIGS. 14 and 15 is performed to filter out pornographic clusters from the selected multimedia object clusters to be recommended. Then, in block 1150, the multimedia objects in the multimedia object clusters that remain after the pornographic filtering processing are recommended to the recommended user. Also, the operations of fig. 11 combined with fig. 14 and 15 may be combined with the multimedia object recommendation methods of fig. 2, 6 and 8 as described above.
Fig. 16 shows a block diagram of an apparatus for pornographic filtering processing on multimedia objects (hereinafter referred to as pornographic cluster filtering apparatus 1600) according to another embodiment of the present application.
As shown in fig. 16, the pornographic cluster filtering apparatus 1600 includes a clustering unit 1630, a pornographic index calculating unit 1640, and a filtering unit 1650.
The clustering unit 1630 is configured to cluster the multimedia objects accessed by each user based on the multimedia object vector matrix to obtain one or more multimedia object clusters. The multimedia object vector matrix is created (or generated) on the multimedia information platform (i.e., server) side based on the unique identification of the multimedia object accessed by the user. The creation process of the vector matrix of multimedia objects can refer to the description of fig. 1 above. Furthermore, any clustering algorithm applicable in the art, such as the k-means algorithm, may be employed for clustering of multimedia objects.
The pornographic index calculating unit 1640 is used for calculating the pornographic index of each multimedia object cluster. The pornographic index of a multimedia object cluster may be calculated in any suitable manner known in the art.
Fig. 17 shows a block diagram of one example of the pornographic index calculating unit 1640 in fig. 16. As shown in fig. 17, the pornographic index calculating unit 1640 includes an extracting module 1641, an identifying module 1643, and a calculating module 1645.
The extracting module 1641 is configured to extract, for each multimedia object cluster, high-frequency words appearing in all multimedia objects included in the multimedia object cluster. The operation of the extracting module 1641 may be as described above with respect to the operation of block 1421 in fig. 15.
The identifying module 1643 is used for identifying the negative words and the positive words from the extracted high-frequency words. The operation of the identifying module 1643 may be as described above with respect to the operation of block 1423 in fig. 15.
The calculating module 1645 is configured to calculate the pornographic index of the current multimedia object cluster according to the number of the identified negative words and their pornographic index weights, the number of the positive words and their pornographic index weights, and the number of the multimedia objects in the multimedia object cluster. The operation of the calculating module 1645 may refer to the description above for the operation of block 1425 in fig. 15.
After the pornographic index calculation is completed for all multimedia object clusters, the filtering unit 1650 performs pornographic filtering processing on the multimedia object clusters, based on the calculated pornographic index of each multimedia object cluster, for use in multimedia object recommendation. For example, the calculated pornographic index of each multimedia object cluster is compared to a predetermined pornographic index threshold (e.g., 30); if the index is greater than the threshold, the multimedia object cluster is filtered out. Otherwise, the multimedia object cluster is retained.
In addition, preferably, the pornographic cluster filtering apparatus 1600 may further include a whitelist recall unit and a blacklist filtering unit. Specifically, the whitelist recall unit is configured to: for each filtered multimedia object cluster, judge whether the titles of the multimedia objects contained in the multimedia object cluster match the whitelist. If there is a match, the multimedia object cluster is recalled (or restored). If not, it remains filtered. The blacklist filtering unit is configured to: for each retained multimedia object cluster, judge whether the titles of the multimedia objects contained in the multimedia object cluster match the blacklist. If there is a match, the multimedia object cluster is filtered out. If not, the multimedia object cluster is retained.
Further, similar to fig. 4, in a preferred example, the pornographic cluster filtering apparatus 1600 may further include a collecting unit 1610 and a training unit 1620. The operation and function of the collection unit 1610 and the training unit 1620 in fig. 16 are identical to those of the collection unit 410 and the training unit 420 described with reference to fig. 4, and thus will not be described again.
Furthermore, preferably, the pornographic cluster filtering apparatus 1600 may further include a time window size determining unit 1660. The operation and function of the time window size determining unit 1660 in fig. 16 are identical to those of the time window size determining unit 470 described with reference to fig. 4, and are not described again here.
In addition, the pornographic cluster filtering apparatus depicted in fig. 16 may be combined with the apparatus shown in fig. 12. Specifically, the pornography index calculating unit 1640 and the filtering unit 1650 in fig. 16 are respectively incorporated in the multimedia object recommending apparatus of fig. 12.
In the present application, the pornographic cluster filtering apparatus 1600 may be implemented by a computing device. FIG. 18 illustrates a block diagram of a computing device 1800 for implementing pornographic filtering processing for multimedia objects according to one embodiment of the present application. According to one embodiment, the computing device 1800 may include one or more processors 1802, the processors 1802 executing one or more computer readable instructions (i.e., the elements described above as being implemented in software) stored or encoded in a computer readable storage medium (i.e., memory 1804).
In one embodiment, computer-executable instructions are stored in the memory 1804 that, when executed, cause the one or more processors 1802 to: clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, the multimedia object vector matrix being composed of respective multimedia object vectors generated based on multimedia objects accessed by all users; calculating the pornographic index of each multimedia object cluster; and based on the calculated pornographic indexes of the multimedia object clusters, carrying out pornographic filtering processing on the multimedia object clusters so as to be used for recommending the multimedia objects.
It should be appreciated that the computer-executable instructions stored in the memory 1804, when executed, cause the one or more processors 1802 to perform the various operations and functions described above in connection with fig. 14-17 in the various embodiments of the present application.
According to one embodiment, a program product, such as a non-transitory machine-readable medium, is provided. The non-transitory machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 14-17 in various embodiments of the present application.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (54)

1. A method for multimedia object recommendation, comprising:
acquiring multimedia object vectors corresponding to recommended users in a multimedia object vector matrix, wherein the multimedia object vector matrix is composed of corresponding multimedia object vectors generated based on multimedia objects accessed by all users, and the multimedia object vectors are trained based on unique identifiers of corresponding multimedia objects by using a word2vec algorithm;
calculating the correlation degree between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object accessed by the recommended user;
determining a recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and
recommending the determined recommended multimedia object to the recommended user.
2. The method of claim 1, wherein the multimedia objects accessed by the user comprise multimedia objects accessed by the user within a specified time window.
3. The method of claim 2, wherein the size of the prescribed time window is determined based on a service scenario, a number of users, and/or a user access frequency of the multimedia object.
4. The method of claim 1, further comprising:
determining multimedia objects of interest to the recommended user from the multimedia objects accessed by the recommended user, and
calculating a correlation between a multimedia object vector in the multimedia object vector matrix and a multimedia object vector of a multimedia object accessed by the recommended user comprises:
calculating the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object of interest to the recommended user.
5. The method of claim 4, wherein determining the multimedia objects of interest to the recommended user from the multimedia objects accessed by the recommended user comprises:
determining the multimedia objects of interest to the recommended user according to the playing behavior of the recommended user on the accessed multimedia objects.
6. The method of claim 4, wherein determining the multimedia objects of interest to the recommended user from the multimedia objects accessed by the recommended user comprises:
determining the multimedia objects of interest to the recommended user according to the playing behavior and the liking (thumbs-up) behavior of the recommended user on the accessed multimedia objects.
7. The method of claim 1, wherein determining a recommended multimedia object from the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlations comprises:
sorting the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and
selecting the recommended multimedia object from the sorted multimedia objects.
8. The method of claim 1, wherein the multimedia object vector of the multimedia object is generated offline based on a unique identification of the multimedia object.
9. The method of any of claims 1 to 8, further comprising:
determining a multimedia object that is not of interest to the recommended user according to the disliking (thumbs-down) behavior of the recommended user on the accessed multimedia objects;
calculating the correlation degree between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object which is not interested by the recommended user;
determining a multimedia object prohibited from being recommended to the recommended user based on a correlation between a multimedia object vector in the calculated multimedia object vector matrix and a multimedia object vector of a multimedia object that is not of interest to the recommended user; and
removing the determined multimedia objects prohibited from being recommended to the recommended user from the recommended multimedia objects.
10. The method of claim 1, further comprising:
when a recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation, acquiring one or more tags/titles contained in the multimedia objects accessed by the recommended user;
selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and
determining a multimedia object of the selected multimedia object cluster as the recommended multimedia object,
wherein the multimedia object clusters are obtained by clustering multimedia objects accessed by respective users based on the multimedia object vector matrix, and the label/title weight list is generated by using the weights of the labels/titles in the respective multimedia object clusters to reflect the weights of the labels/titles in the respective multimedia object clusters.
11. The method of claim 1, further comprising:
when a recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation, acquiring one or more tags/titles contained in the multimedia objects accessed by the recommended user;
clustering multimedia objects accessed by respective users based on the multimedia object vector matrix to obtain one or more multimedia object clusters;
calculating the weight of the label/title contained in the one or more multimedia object clusters in the multimedia object cluster;
generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster;
selecting a multimedia object cluster corresponding to the maximum weight of the label/title meeting a preset condition in one or more labels/titles included in the multimedia objects accessed by the recommended user from the multimedia object clusters according to the label/title weight list; and
determining the multimedia objects in the selected multimedia object cluster as the recommended multimedia objects.
12. The method of claim 10 or 11, wherein the weight of a tag/title in each multimedia object cluster is calculated according to the number of times the tag/title appears in the multimedia object cluster and the total number of tags/titles included in the multimedia object cluster.
13. The method of claim 10 or 11, wherein the tag/title satisfying a predetermined condition comprises:
the label/title with the highest weight; or
the tags/titles ranked within a predetermined number of top positions, or within a predetermined top proportion, when the tags/titles are sorted by weight from high to low.
14. The method of claim 10 or 11, wherein, before selecting from the clusters of multimedia objects, the cluster of multimedia objects corresponding to the maximum weight of the tag/title satisfying a predetermined condition among the one or more tags/titles contained in the multimedia object corresponding to the recommended user, the method comprises:
calculating the pornographic index of each multimedia object cluster;
performing pornographic filtering processing on the multimedia object clusters based on the calculated pornographic indexes of the multimedia object clusters, and
selecting a multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user from among the multimedia object clusters comprises:
selecting, from the multimedia object clusters subjected to the pornographic filtering processing, the multimedia object cluster corresponding to the maximum weight of the tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
15. The method of claim 14, wherein calculating the pornography index for each cluster of multimedia objects comprises:
for each cluster of multimedia objects,
extracting high-frequency words appearing in all multimedia objects contained in the multimedia object cluster;
identifying negative words and positive words from the extracted high-frequency words; and
calculating the pornographic index of the multimedia object cluster according to the number of the identified negative words and their pornographic index weights, the number of the positive words and their pornographic index weights, and the number of the multimedia objects in the multimedia object cluster.
16. The method of any of claims 1 to 8 and 10 to 11, wherein the multimedia object comprises at least one of: short videos, movies, music, drama and pictures.
17. An apparatus for multimedia object recommendation, comprising:
a multimedia object vector obtaining unit, configured to obtain a multimedia object vector corresponding to a recommended user in a multimedia object vector matrix, where the multimedia object vector matrix is composed of corresponding multimedia object vectors generated based on multimedia objects accessed by all users, and the multimedia object vector is trained based on a unique identifier of a corresponding multimedia object by using a word2vec algorithm;
a first correlation calculation unit, configured to calculate a correlation between a multimedia object vector in the multimedia object vector matrix and a multimedia object vector of a multimedia object accessed by the recommended user;
a recommended multimedia object determining unit, configured to determine a recommended multimedia object from multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and
a recommending unit, configured to recommend the determined recommended multimedia object to the recommended user.
18. The apparatus of claim 17, further comprising:
a collecting unit, configured to collect unique identifiers of multimedia objects accessed by each user; and
a training unit, configured to train, by using a word2vec algorithm, corresponding multimedia object vectors based on the unique identifiers of the multimedia objects accessed by the users, so as to form the multimedia object vector matrix.
19. The apparatus of claim 17, further comprising:
a time window size determining unit, configured to determine the size of a prescribed time window based on the service scenario, the number of users and/or the user access frequency of the multimedia object, and
the multimedia objects accessed by the user include multimedia objects accessed by the user within the prescribed time window.
20. The apparatus of claim 17, further comprising:
an interested multimedia object determining unit for determining the multimedia objects interested by the recommended user from the multimedia objects accessed by the recommended user, and
the first correlation calculation unit is configured to:
calculate the correlation between the multimedia object vector in the multimedia object vector matrix and the multimedia object vector of the multimedia object of interest to the recommended user.
21. The apparatus of claim 20, wherein the multimedia object of interest determination unit is configured to:
determine the multimedia objects of interest to the recommended user according to the playing behavior of the recommended user on the accessed multimedia objects.
22. The apparatus of claim 20, wherein the multimedia object of interest determination unit is configured to:
determine the multimedia objects of interest to the recommended user according to the playing behavior and the liking (thumbs-up) behavior of the recommended user on the accessed multimedia objects.
23. The apparatus of claim 17, wherein the recommended multimedia object determining unit comprises:
a sorting module, configured to sort the multimedia objects corresponding to the multimedia object vector matrix based on the calculated correlation; and
a selection module, configured to select the recommended multimedia objects from the sorted multimedia objects.
24. The apparatus of any of claims 17 to 23, further comprising:
an uninteresting multimedia object determining unit, configured to determine a multimedia object that is not of interest to the recommended user according to the disliking (thumbs-down) behavior of the recommended user on the accessed multimedia objects;
a second correlation degree calculating unit, configured to calculate a correlation degree between a multimedia object vector in the multimedia object vector matrix and a multimedia object vector of a multimedia object that is not interested by the recommended user;
a recommended prohibited multimedia object determination unit, configured to determine a multimedia object prohibited from being recommended to the recommended user based on a correlation between a multimedia object vector in the calculated multimedia object vector matrix and a multimedia object vector of a multimedia object that is not of interest to the recommended user; and
a removing unit, configured to remove the determined multimedia object prohibited from being recommended to the recommended user from the recommended multimedia objects.
25. The apparatus of claim 17, further comprising:
a tag/title obtaining unit, configured to obtain one or more tags/titles included in the multimedia objects accessed by the recommended user when the recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation;
a selecting unit for selecting a multimedia object cluster corresponding to a maximum weight of tags/titles satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user from among the multimedia object clusters according to the tag/title weight list, and
the recommending unit is used for recommending the multimedia objects in the selected clusters to the recommended users,
wherein the multimedia object clusters are obtained by clustering multimedia objects accessed by respective users based on the multimedia object vector matrix, and the label/title weight list is generated by using the weights of the labels/titles in the respective multimedia object clusters to reflect the weights of the labels/titles in the respective multimedia object clusters.
26. The apparatus of claim 17, further comprising:
a tag/title obtaining unit, configured to obtain one or more tags/titles included in the multimedia objects accessed by the recommended user when the recommended multimedia object is not found in the multimedia objects of the multimedia object vector matrix based on the calculated correlation;
a clustering unit, configured to cluster multimedia objects accessed by respective users based on the multimedia object vector matrix to obtain one or more multimedia object clusters;
a weight calculation unit for calculating the weight of the label/title contained in the one or more multimedia object clusters in the multimedia object cluster;
a generating unit for generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster;
a selecting unit for selecting a multimedia object cluster corresponding to a maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user from among the multimedia object clusters according to the tag/title weight list, and
the recommending unit is used for recommending the multimedia objects in the selected clusters to the recommended users.
27. The apparatus of claim 25 or 26, wherein the tag/title satisfying a predetermined condition comprises:
the label/title with the highest weight; or
the tags/titles ranked within a predetermined number of top positions, or within a predetermined top proportion, when the tags/titles are sorted by weight from high to low.
28. The apparatus of claim 25 or 26, further comprising:
a pornographic index calculating unit, configured to calculate the pornographic index of each multimedia object cluster; and
a filtering unit, configured to perform pornographic filtering processing on the multimedia object clusters based on the calculated pornographic index of each multimedia object cluster, and
the selecting unit is configured to: select, from the multimedia object clusters subjected to the pornographic filtering processing, the multimedia object cluster corresponding to the maximum weight of the tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
29. The apparatus of claim 28, wherein the pornography index calculating unit includes:
an extracting module, configured to extract, for each multimedia object cluster, high-frequency words appearing in all multimedia objects contained in the multimedia object cluster;
an identifying module, configured to identify negative words and positive words from the extracted high-frequency words; and
a calculating module, configured to calculate the pornographic index of each multimedia object cluster according to the number of the identified negative words and their pornographic index weights, the number of the positive words and their pornographic index weights, and the number of the multimedia objects in the multimedia object cluster.
30. A computing device, comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 16.
31. A non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any of claims 1-16.
32. A method for multimedia object recommendation, comprising:
clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, wherein the multimedia object vector matrix is composed of respective multimedia object vectors generated based on the multimedia objects accessed by all the users, and the multimedia object vectors are trained based on unique identifications of corresponding multimedia objects by using a word2vec algorithm;
calculating the weight, in each multimedia object cluster, of each tag/title contained in the one or more multimedia object clusters;
generating, from the weights of the tags/titles in each multimedia object cluster, a tag/title weight list reflecting the weight of each tag/title in each multimedia object cluster;
selecting, from the multimedia object clusters according to the tag/title weight list, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user; and
recommending the multimedia objects in the selected multimedia object cluster to the recommended user.
33. The method of claim 32, wherein calculating the weights, in the multimedia object clusters, of the tags/titles contained in the one or more multimedia object clusters comprises:
calculating the weight of a tag/title in a multimedia object cluster according to the number of times the tag/title appears in the multimedia object cluster and the total number of tags/titles included in the multimedia object cluster.
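As an illustration only, the weight in claim 33 can be read as a relative frequency. The claim names the two inputs (occurrence count and total tag/title count) but not how they are combined, so the ratio below is an assumption, as are the function name `tag_weights` and the flat-list input format.

```python
from collections import Counter


def tag_weights(cluster_tags):
    """Return {tag: weight} for one multimedia object cluster.

    cluster_tags is a flat list holding every tag/title occurrence found on
    the objects of the cluster (hypothetical input format).
    Assumption: weight = occurrences of the tag / total tag occurrences.
    """
    total = len(cluster_tags)
    if total == 0:
        return {}
    counts = Counter(cluster_tags)
    return {tag: n / total for tag, n in counts.items()}


# Example: "comedy" appears twice among four tag occurrences.
weights = tag_weights(["comedy", "comedy", "cat", "music"])
```

Computing this once per cluster and collecting the results by tag would give one possible form of the tag/title weight list referred to in claim 32.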
34. The method of claim 32, wherein the tag/title satisfying a predetermined condition comprises:
the tag/title with the highest weight; or
the tags/titles ranked within a predetermined number or a predetermined proportion of top positions when the tags/titles are sorted by weight from high to low.
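The two alternatives in claim 34 (the single highest-weighted tag/title, or a top-N / top-proportion slice of the weight-sorted list) might be sketched as follows. The function name `top_tags`, its parameters, the alphabetical tie-break, and rounding the proportion up are all assumptions made for this example; the claim does not specify them.

```python
import math


def top_tags(weights, top_n=None, top_ratio=None):
    """Pick the tags/titles that satisfy the 'predetermined condition'.

    weights maps tag -> weight. With no option set, only the highest-weighted
    tag is returned; top_n keeps a fixed number, top_ratio a proportion.
    """
    # Sort by weight, high to low; ties are broken alphabetically (assumption).
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    if top_n is not None:
        return ranked[:top_n]
    if top_ratio is not None:
        keep = max(1, math.ceil(len(ranked) * top_ratio))
        return ranked[:keep]
    return ranked[:1]


user_tag_weights = {"comedy": 0.5, "cat": 0.3, "music": 0.2}
best = top_tags(user_tag_weights)
top_two = top_tags(user_tag_weights, top_n=2)
top_half = top_tags(user_tag_weights, top_ratio=0.5)
```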
35. The method of claim 32, wherein, before selecting, from among the multimedia object clusters, the multimedia object cluster corresponding to a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects corresponding to the recommended user, the method further comprises:
calculating a pornographic index of each multimedia object cluster; and
performing pornographic filtering on the multimedia object clusters based on the calculated pornographic index of each multimedia object cluster,
and wherein selecting, from among the multimedia object clusters, the multimedia object cluster corresponding to the maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user comprises:
selecting, from the multimedia object clusters that have undergone the pornographic filtering, the multimedia object cluster corresponding to the maximum weight of the tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
36. The method of claim 35, wherein calculating the pornography index for each cluster of multimedia objects comprises:
for each cluster of multimedia objects,
extracting high-frequency words appearing in all multimedia objects contained in the multimedia object cluster;
identifying negative words and positive words from the extracted high-frequency words; and
calculating the pornographic index of the multimedia object cluster according to the number of identified negative words and their pornographic index weights, the number of positive words and their pornographic index weights, and the number of multimedia objects in the multimedia object cluster.
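Claim 36 lists the inputs to the pornographic index but the claims do not state the formula. The sketch below is therefore a hypothetical combination, not the patented calculation: weighted negative-word counts minus weighted positive-word counts, normalised by the number of objects in the cluster. The name `cluster_index` and the dictionary-based inputs are likewise assumptions.

```python
def cluster_index(word_counts, negative_weights, positive_weights, num_objects):
    """Hypothetical pornographic index for one multimedia object cluster.

    word_counts:      {high_frequency_word: occurrences in the cluster}
    negative_weights: {negative_word: index weight}
    positive_weights: {positive_word: index weight}
    num_objects:      number of multimedia objects in the cluster
    Assumed formula: (weighted negatives - weighted positives) / num_objects.
    """
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    negative = sum(count * negative_weights[word]
                   for word, count in word_counts.items()
                   if word in negative_weights)
    positive = sum(count * positive_weights[word]
                   for word, count in word_counts.items()
                   if word in positive_weights)
    return (negative - positive) / num_objects


# Placeholder words "x", "y", "z" stand in for extracted high-frequency words.
index = cluster_index(
    word_counts={"x": 4, "y": 2, "z": 10},
    negative_weights={"x": 1.0},
    positive_weights={"y": 0.5},
    num_objects=10,
)
```

Dividing by the object count keeps large clusters from scoring high merely because they contain more text; whether the patent normalises this way is not stated in the claims.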
37. The method of claim 32, wherein the multimedia objects accessed by the user comprise multimedia objects accessed by the user within a specified time window.
38. The method of claim 37, wherein the size of the specified time window is determined based on the service scenario, the number of users, and/or the user access frequency of the multimedia objects.
39. The method of any of claims 32 to 38, wherein the multimedia object comprises at least one of: short videos, movies, music, drama and pictures.
40. An apparatus for multimedia object recommendation, comprising:
a clustering unit, configured to cluster multimedia objects accessed by each user based on a multimedia object vector matrix to obtain one or more multimedia object clusters, where the multimedia object vector matrix is composed of multimedia object vectors generated based on multimedia objects accessed by all users, and the multimedia object vectors are trained based on unique identifiers of corresponding multimedia objects by using word2vec algorithm;
a weight calculation unit for calculating the weight, in each multimedia object cluster, of each tag/title contained in the one or more multimedia object clusters;
a generating unit for generating a tag/title weight list reflecting the weight of the tag/title in each multimedia object cluster by using the weight of the tag/title in each multimedia object cluster;
a selecting unit, configured to select, from the multimedia object clusters according to the tag/title weight list, a multimedia object cluster corresponding to a maximum weight of a tag/title satisfying a predetermined condition among one or more tags/titles included in the multimedia objects accessed by the recommended user; and
a recommending unit for recommending the multimedia objects in the selected multimedia object cluster to the recommended user.
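The selecting and recommending units of claim 40 could be sketched as two small functions. The layout of the tag/title weight list as a nested dictionary (tag to cluster id to weight), the function names, and the optional exclusion of objects the user has already accessed are assumptions for illustration; the claim only requires picking the cluster where the chosen tag/title has its maximum weight and recommending objects from it.

```python
def select_cluster(tag_weight_list, tag):
    """Return the id of the cluster in which `tag` has its maximum weight.

    tag_weight_list: {tag: {cluster_id: weight}} (assumed layout).
    Returns None when the tag does not occur in any cluster.
    """
    per_cluster = tag_weight_list.get(tag, {})
    if not per_cluster:
        return None
    return max(per_cluster, key=per_cluster.get)


def recommend(clusters, tag_weight_list, tag, already_seen=()):
    """Recommend the objects of the selected cluster.

    clusters: {cluster_id: [object_id, ...]}.
    Skipping already-accessed objects is an extra assumption, not in the claim.
    """
    cluster_id = select_cluster(tag_weight_list, tag)
    if cluster_id is None:
        return []
    seen = set(already_seen)
    return [obj for obj in clusters[cluster_id] if obj not in seen]


clusters = {"c1": ["v1", "v2", "v3"], "c2": ["v4", "v5"]}
tag_weight_list = {"comedy": {"c1": 0.4, "c2": 0.1}, "music": {"c2": 0.6}}
picks = recommend(clusters, tag_weight_list, "comedy", already_seen=["v2"])
```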
41. The apparatus of claim 40, wherein the weight calculation unit is configured to:
calculate the weight of a tag/title in each multimedia object cluster according to the number of times the tag/title appears in the multimedia object cluster and the total number of tags/titles included in the multimedia object cluster.
42. The apparatus of claim 40, wherein the tag/title satisfying a predetermined condition comprises:
the tag/title with the highest weight; or
the tags/titles ranked within a predetermined number or a predetermined proportion of top positions when the tags/titles are sorted by weight from high to low.
43. The apparatus of claim 40, further comprising:
a pornographic index calculating unit for calculating a pornographic index of each multimedia object cluster; and
a filtering unit for performing pornographic filtering on the multimedia object clusters based on the calculated pornographic index of each multimedia object cluster,
wherein the selecting unit is configured to: select, from the multimedia object clusters that have undergone the pornographic filtering, the multimedia object cluster corresponding to the maximum weight of the tag/title satisfying the predetermined condition among the one or more tags/titles included in the multimedia objects accessed by the recommended user.
44. The apparatus of claim 43, wherein the pornography index calculating unit includes:
an extraction module for extracting high-frequency words appearing in all multimedia objects contained in each multimedia object cluster;
a recognition module for recognizing negative words and positive words among the extracted high-frequency words; and
a calculating module for calculating the pornographic index of each multimedia object cluster according to the number of recognized negative words and their pornographic index weights, the number of positive words and their pornographic index weights, and the number of multimedia objects in the multimedia object cluster.
45. The apparatus of claim 40, further comprising:
a collecting unit for collecting unique identification of multimedia objects accessed by each user; and
a training unit for training, by using a word2vec algorithm, the corresponding multimedia object vectors based on the unique identifiers of the multimedia objects accessed by each user, so as to form the multimedia object vector matrix.
46. The apparatus of claim 40, further comprising:
a time window size determining unit for determining the size of a specified time window based on the service scenario, the number of users, and/or the user access frequency of the multimedia objects,
wherein the multimedia objects accessed by the user include multimedia objects accessed by the user within the specified time window.
47. A computing device, comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 32 to 39.
48. A non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any of claims 32 to 39.
49. A method for multimedia object recommendation, comprising:
clustering multimedia objects accessed by respective users based on a multimedia object vector matrix to obtain one or more multimedia object clusters, wherein the multimedia object vector matrix is composed of respective multimedia object vectors generated based on the multimedia objects accessed by all the users, and the multimedia object vectors are trained based on unique identifications of corresponding multimedia objects by using a word2vec algorithm;
calculating the pornographic index of each multimedia object cluster; and
performing pornographic filtering on the one or more multimedia object clusters based on the calculated pornographic index of each multimedia object cluster, for use in multimedia object recommendation.
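The filtering step of claim 49 can be illustrated with a simple threshold rule. The claim says only that filtering is based on the calculated index; comparing against a fixed threshold, the name `filter_clusters`, and the example values are assumptions.

```python
def filter_clusters(clusters, indexes, threshold):
    """Keep only clusters whose index does not exceed the threshold.

    clusters:  {cluster_id: [object_id, ...]}
    indexes:   {cluster_id: precomputed index}, one entry per cluster
    threshold: cut-off value (hypothetical; would be tuned per service)
    """
    return {
        cluster_id: objects
        for cluster_id, objects in clusters.items()
        if indexes[cluster_id] <= threshold
    }


clusters = {"c1": ["v1", "v2"], "c2": ["v3"], "c3": ["v4", "v5"]}
indexes = {"c1": 0.05, "c2": 0.80, "c3": 0.30}
safe_clusters = filter_clusters(clusters, indexes, threshold=0.30)
```

Only the clusters that survive this step would then be passed on to cluster selection and recommendation.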
50. The method of claim 49, wherein calculating the pornography index for each cluster of multimedia objects comprises:
for each cluster of multimedia objects,
extracting high-frequency words appearing in all multimedia objects contained in the multimedia object cluster;
identifying negative words and positive words from the extracted high-frequency words; and
calculating the pornographic index of the multimedia object cluster according to the number of identified negative words and their pornographic index weights, the number of positive words and their pornographic index weights, and the number of multimedia objects in the multimedia object cluster.
51. An apparatus for multimedia object recommendation, comprising:
a clustering unit, configured to cluster multimedia objects accessed by each user based on a multimedia object vector matrix to obtain one or more multimedia object clusters, where the multimedia object vector matrix is composed of multimedia object vectors generated based on multimedia objects accessed by all users, and the multimedia object vectors are trained based on unique identifiers of corresponding multimedia objects by using word2vec algorithm;
a pornographic index calculating unit for calculating a pornographic index of each multimedia object cluster; and
a filtering unit for performing pornographic filtering on the one or more multimedia object clusters based on the calculated pornographic index of each multimedia object cluster, for use in multimedia object recommendation.
52. The apparatus of claim 51, wherein the pornography index calculating unit includes:
an extraction module for extracting high-frequency words appearing in all multimedia objects contained in each multimedia object cluster;
a recognition module for recognizing negative words and positive words among the extracted high-frequency words; and
a calculating module for calculating the pornographic index of each multimedia object cluster according to the number of recognized negative words and their pornographic index weights, the number of positive words and their pornographic index weights, and the number of multimedia objects in the multimedia object cluster.
53. A computing device, comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of claim 49 or 50.
54. A non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of claim 49 or 50.
CN201810259572.XA 2018-03-27 2018-03-27 Method and device for recommending multimedia objects Active CN108804492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810259572.XA CN108804492B (en) 2018-03-27 2018-03-27 Method and device for recommending multimedia objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810259572.XA CN108804492B (en) 2018-03-27 2018-03-27 Method and device for recommending multimedia objects

Publications (2)

Publication Number Publication Date
CN108804492A (en) 2018-11-13
CN108804492B (en) 2022-04-29

Family

ID=64095390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810259572.XA Active CN108804492B (en) 2018-03-27 2018-03-27 Method and device for recommending multimedia objects

Country Status (1)

Country Link
CN (1) CN108804492B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487276B (en) * 2019-09-11 2023-10-17 腾讯科技(深圳)有限公司 Object acquisition method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105426548A (en) * 2015-12-29 2016-03-23 海信集团有限公司 Video recommendation method and device based on multiple users
CN105512252A (en) * 2015-12-01 2016-04-20 海信集团有限公司 Method and device obtaining multimedia data correlation
CN106055617A (en) * 2016-05-26 2016-10-26 乐视控股(北京)有限公司 Data pushing method and device
CN106407420A (en) * 2016-09-23 2017-02-15 广州视源电子科技股份有限公司 A multimedia resource recommendation method and system

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US8027977B2 (en) * 2007-06-20 2011-09-27 Microsoft Corporation Recommending content using discriminatively trained document similarity
US8635241B2 (en) * 2009-02-18 2014-01-21 Hitachi, Ltd. Method of recommending information, system thereof, and server
US20160125028A1 (en) * 2014-11-05 2016-05-05 Yahoo! Inc. Systems and methods for query rewriting
CN104486461B (en) * 2014-12-29 2019-04-19 北京奇安信科技有限公司 Domain name classification method and device, domain name recognition methods and system
US9864951B1 (en) * 2015-03-30 2018-01-09 Amazon Technologies, Inc. Randomized latent feature learning
CN105069041A (en) * 2015-07-23 2015-11-18 合一信息技术(北京)有限公司 Video user gender classification based advertisement putting method
CN105677695B (en) * 2015-09-28 2019-03-08 杭州圆橙科技有限公司 A method of the calculating mobile application similitude based on content
CN105701155B (en) * 2015-12-30 2019-05-31 百度在线网络技术(北京)有限公司 Information-pushing method and device
US10748118B2 (en) * 2016-04-05 2020-08-18 Facebook, Inc. Systems and methods to develop training set of data based on resume corpus
CN106649561B (en) * 2016-11-10 2020-05-26 复旦大学 Intelligent question-answering system for tax consultation service
CN106776906A (en) * 2016-11-30 2017-05-31 努比亚技术有限公司 One kind application clustering method and device
CN107122351A (en) * 2017-05-02 2017-09-01 灯塔财经信息有限公司 A kind of attitude trend analysis method and system applied to stock news field
CN107818334A (en) * 2017-09-29 2018-03-20 北京邮电大学 A kind of mobile Internet user access pattern characterizes and clustering method

Non-Patent Citations (1)

Title
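Video recommendation algorithm based on a clustering hierarchy model; Jin Liang et al.; Journal of Computer Applications (《计算机应用》); 2017-10-10 (No. 10); pp. 100-105 *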

Also Published As

Publication number Publication date
CN108804492A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
WO2017181612A1 (en) Personalized video recommendation method and device
CN110909205B (en) Video cover determination method and device, electronic equipment and readable storage medium
CN108304429B (en) Information recommendation method and device and computer equipment
US20120323725A1 (en) Systems and methods for supplementing content-based attributes with collaborative rating attributes for recommending or filtering items
CN108388591B (en) Book recommendation method, device and system and readable storage medium
CN110019794B (en) Text resource classification method and device, storage medium and electronic device
CN107924401A (en) Video recommendations based on video title
CN110019943B (en) Video recommendation method and device, electronic equipment and storage medium
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
CN110348362B (en) Label generation method, video processing method, device, electronic equipment and storage medium
CN107590232B (en) Resource recommendation system and method based on network learning environment
CN105430505A (en) IPTV program recommending method based on combined strategy
CN112100504B (en) Content recommendation method and device, electronic equipment and storage medium
CN112765484B (en) Short video pushing method and device, electronic equipment and storage medium
KR101605654B1 (en) Method and apparatus for estimating multiple ranking using pairwise comparisons
CN111241381A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
US9020863B2 (en) Information processing device, information processing method, and program
CN110162769B (en) Text theme output method and device, storage medium and electronic device
CN110598126A (en) Cross-social network user identity recognition method based on behavior habits
CN108804492B (en) Method and device for recommending multimedia objects
Li et al. A rank aggregation framework for video multimodal geocoding
CN108024148B (en) Behavior feature-based multimedia file identification method, processing method and device
CN106446696B (en) Information processing method and electronic equipment
EP2741507A1 (en) Video processing system, method of determining viewer preference, video processing apparatus, and control method and control program therefor
CN111581435A (en) Video cover image generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210104

Address after: Room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 10 Collyer Quay, #10-01 Ocean Financial Centre, Singapore (049315)

Applicant before: YOUSHI TECHNOLOGY SINGAPORE Co.,Ltd.

GR01 Patent grant