CN115391663A - Video recommendation method and device, computer equipment and storage medium


Info

Publication number: CN115391663A
Application number: CN202211160276.7A
Authority: CN (China)
Prior art keywords: video, vector, recommended, sequence, vectors
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 段勇 (Duan Yong), 郑聪 (Zheng Cong), 姚倩媛 (Yao Qianyuan)
Current assignee: Beijing QIYI Century Science and Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211160276.7A
Publication of CN115391663A

Classifications

    All classifications fall under G (Physics), G06 (Computing; calculating or counting), within G06F (Electric digital data processing) and G06V (Image or video recognition or understanding):
    • G06F16/9535: Search customisation based on user profiles and personalisation (G06F16/95 Retrieval from the web; G06F16/953 Querying, e.g. by the use of web search engines)
    • G06F16/735: Filtering based on additional data, e.g. user or group profiles (G06F16/70 Information retrieval of video data; G06F16/73 Querying)
    • G06F16/7847: Retrieval characterised by metadata automatically derived from the content, using low-level visual features of the video content
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V10/82: Image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a video recommendation method and apparatus, a computer device, and a storage medium. The method comprises the following steps: obtaining a video watching set corresponding to a target user identifier, the set comprising at least two video watching sequences of different video durations; extracting a corresponding feature vector from each video watching sequence and fusing the feature vectors to form a fusion vector; determining a recommendation value for each video vector to be recommended according to the inner product of that vector and the fusion vector; and pushing the video data corresponding to the video vectors to be recommended to the terminal corresponding to the target user identifier in descending order of recommendation value.

Description

Video recommendation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video recommendation method and apparatus, a computer device, and a storage medium.
Background
With the development of video software, a user can choose to watch long or short videos according to his or her preferences, and the video software infers the types of video the user prefers from the user's historical watching behavior in order to push video content the user may be interested in. A user's watching history, however, is divided into a long video watching sequence, a medium video watching sequence, and a short video watching sequence, and existing video pushing methods model and analyze each type of watching sequence separately. This ignores the relations among the different types of watching sequences, so the user's interest preferences cannot be captured accurately and the push results are poor.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a video recommendation method, a video recommendation apparatus, a computer device, and a storage medium.
In a first aspect, the present application provides a video recommendation method, including:
acquiring a video watching set corresponding to a target user identifier, wherein the video watching set comprises at least two video watching sequences with different video durations;
respectively extracting corresponding feature vectors based on the video watching sequences, and fusing the feature vectors to form a fusion vector;
determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended;
and pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
In a second aspect, the present application provides a video recommendation apparatus, including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a video watching set corresponding to a target user identifier, and the video watching set comprises at least two video watching sequences with different video durations;
the fusion module is used for respectively extracting corresponding feature vectors based on the video watching sequences and fusing the feature vectors to form fusion vectors;
the determining module is used for determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended;
and the pushing module is used for pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
In a third aspect, the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a video watching set corresponding to a target user identifier, wherein the video watching set comprises at least two video watching sequences with different video durations;
respectively extracting corresponding feature vectors based on the video watching sequences, and fusing the feature vectors to form a fusion vector;
determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended;
and pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a video watching set corresponding to a target user identifier, wherein the video watching set comprises at least two video watching sequences with different video durations;
respectively extracting corresponding feature vectors based on the video watching sequences, and fusing the feature vectors to form a fusion vector;
determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended;
and pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
With the above video recommendation method, a video watching set corresponding to a target user identifier is obtained, the set comprising at least two video watching sequences of different video durations, and corresponding feature vectors are extracted from the video watching sequences and fused to form a fusion vector; that is, user preferences are mined from the different video watching sequences and the relations among them are established. The recommendation value of each video vector to be recommended is determined according to the inner product of that vector and the fusion vector, where the video vector to be recommended indicates attribute information of a video to be recommended; predicting with a fusion vector that merges the information of the different video watching sequences yields the degree of preference of the user corresponding to the target user identifier for the video content corresponding to each video vector to be recommended, i.e. its recommendation value. The video data corresponding to the video vectors to be recommended are then pushed to the terminal corresponding to the target user identifier in descending order of recommendation value. By extracting feature vectors from the different video watching sequences to capture the user's preferences, and by also exploiting the relations among those sequences, the method captures the user's interest in video content fully and accurately, so the pushed video content better matches the user's preferences.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; other drawings can obviously be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a diagram of an application environment of a video recommendation method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for video recommendation in one embodiment;
FIG. 3 is a block diagram of a video recommendation device in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a diagram of an application environment of a video recommendation method in one embodiment. Referring to fig. 1, the video recommendation method is applied to a video recommendation system. The video recommendation system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network, video playing software is installed in the terminal 110, and a user can log in the video playing software through a user identifier to watch video content. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
In an embodiment, fig. 2 is a flowchart illustrating a video recommendation method according to an embodiment, and referring to fig. 2, a video recommendation method is provided. The present embodiment is mainly illustrated by applying the method to the server 120 in fig. 1, and the video recommendation method specifically includes the following steps:
step S210, obtaining a video watching set corresponding to the target user identifier, where the video watching set includes at least two video watching sequences with different video durations.
Specifically, the target user identifier is any user identifier used to log in to the video playing software. A user identifier comprises user identity information and custom characters: the identity information may be a telephone number, a mailbox, or a third-party application account, and the custom characters may be one of, or a combination of, numbers, letters, symbols, and the like. Different user identifiers indicate different users, and different users have different preferences for video content, so the video watching sets corresponding to different user identifiers differ. Each video watching set comprises at least two video watching sequences of different video durations, and video watching sequences are divided by video duration into a long video watching sequence, a medium video watching sequence, and a short video watching sequence: a video whose duration exceeds 20 minutes is generally defined as a long video, a video whose duration is less than 1 minute as a short video, and a video whose duration is between 1 and 20 minutes as a medium video, and these duration thresholds can also be set in a user-defined manner. The video watching set in this embodiment comprises a long video watching sequence and a short video watching sequence.
Each video watching sequence contains the video information corresponding to its video duration: the long video watching sequence comprises video description vectors corresponding to a plurality of long videos, the medium video watching sequence comprises video description vectors corresponding to a plurality of medium videos, and the short video watching sequence comprises video description vectors corresponding to a plurality of short videos. A video description vector contains the attribute information of the video, including video ID (identity), video album ID, title, genre, channel ID, broadcast tag, actors, director, number of works by the actors, number of works by the director, screenwriter, awards won by the drama, name of the lead, film score, viewing-restriction rating and its description, global release time, release time in China, time the page first went online, and so on; that is, the attribute information contains description information of the video along multiple dimensions.
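As a concrete illustration, splitting a user's watch history into duration-based watching sequences might look like the following minimal Python sketch. The 20-minute and 1-minute thresholds follow the definitions above; the record fields, function name, and dictionary layout are illustrative assumptions rather than the patent's actual implementation.

```python
# A minimal sketch of building the video watching set described above.
from collections import defaultdict

LONG_MIN_SECONDS = 20 * 60   # videos longer than 20 minutes count as "long"
SHORT_MAX_SECONDS = 60       # videos shorter than 1 minute count as "short"

def build_viewing_set(watch_history):
    """Split a user's watch history into duration-based watching sequences.

    watch_history: iterable of dicts such as
        {"video_id": "v123", "duration": 1500, "title": "...", "genre": "..."}
    Returns a dict mapping "long" / "medium" / "short" to ordered sequences.
    """
    viewing_set = defaultdict(list)
    for video in watch_history:
        if video["duration"] > LONG_MIN_SECONDS:
            viewing_set["long"].append(video)
        elif video["duration"] < SHORT_MAX_SECONDS:
            viewing_set["short"].append(video)
        else:
            viewing_set["medium"].append(video)
    return viewing_set
```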
Step S220, extracting corresponding feature vectors based on the video watching sequences respectively, and fusing the feature vectors to form a fusion vector.
Specifically, a feature vector is extracted from each video watching sequence; the feature vector indicates the user's preferences for the videos in that sequence. The feature vectors corresponding to the video watching sequences are then fused to form a fusion vector; that is, the user's preference information is mined from the different video watching sequences and the relations among them are established, so that the video watching sequences of different video durations fully exchange information.
Step S230, determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, where the video vector to be recommended is used to indicate attribute information of a video to be recommended.
Specifically, a video vector to be recommended is the description vector of a video to be recommended, i.e. a video in the video database whose watching state relative to the target user identifier is unwatched. The video database contains videos of different video durations, i.e. a plurality of long videos, medium videos, and short videos. The recommendation value of each video vector to be recommended is obtained from the inner product of that vector and the fusion vector, i.e. by dot-product processing of the video vector to be recommended and the fusion vector, which effectively combines the attribute information of the video to be recommended with the user's interests. Sorting in descending order of recommendation value then yields a coarsely ranked video recommendation list.
Step S240, pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
Specifically, the higher the recommendation value, the more likely the corresponding video to be recommended matches the user's video watching preferences; the lower the value, the lower the user's preference for that video. The video data corresponding to the first N video vectors to be recommended in the video recommendation list may be pushed to the terminal corresponding to the target user identifier, where N is a positive integer greater than zero and no greater than the number of video vectors to be recommended in the list.
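A minimal sketch of this push step, assuming the recommendation values have already been computed, might look as follows; the function names and the terminal-delivery callback are hypothetical.

```python
# A minimal sketch of step S240: rank candidates by recommendation value and
# push the top-N video data; names are illustrative, not the patent's API.
def push_top_videos(candidates, recommendation_values, n, send_to_terminal):
    """Push the video data of the top-n candidates in descending value order."""
    ranked = sorted(zip(candidates, recommendation_values),
                    key=lambda pair: pair[1], reverse=True)
    for video, value in ranked[:n]:
        send_to_terminal(video)   # deliver the video data to the user's terminal
```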
The above steps not only extract feature vectors from the different video watching sequences to capture the user's preferences, but also exploit the relations among those sequences, so the user's interest in video content is captured fully and accurately, the coarse ranking of videos to be recommended is improved, and the push results are optimized.
In one embodiment, the obtaining of the video viewing set corresponding to the target user identification includes:
acquiring a historical video set corresponding to the target user identifier, wherein the historical video set comprises at least two historical video sequences with different video durations, and the historical video sequences comprise a plurality of video attribute vectors;
respectively carrying out one-hot coding processing on each historical video sequence to obtain a corresponding video coding sequence, wherein the video coding sequence comprises one-hot code vectors corresponding to a plurality of video attribute vectors;
and performing dimensionality reduction on each video coding sequence to obtain a corresponding video watching sequence, wherein the video watching sequence comprises a plurality of video description vectors.
Specifically, the historical video set comprises historical video sequences of different video durations, divided by video duration into a historical long video sequence, a historical medium video sequence, and a historical short video sequence. In this embodiment, the historical video set comprises a historical long video sequence and a historical short video sequence. The historical long video sequence is denoted L = (l1, l2, …, ln), where l1 to ln are n video attribute vectors, each indicating the attribute information of one long video; the historical short video sequence is denoted S = (s1, s2, …, sn), where s1 to sn are n video attribute vectors, each indicating the attribute information of one short video.
One-hot encoding is applied to the historical long video sequence and the historical short video sequence separately, i.e. each video attribute vector is converted into a one-hot code vector, giving the corresponding video coding sequences: encoding the historical long video sequence yields a long video coding sequence, and encoding the historical short video sequence yields a short video coding sequence.
Dimension reduction is then applied to each video coding sequence to obtain the corresponding video watching sequence; that is, the video watching set in step S210 is obtained by encoding and dimension reduction of the historical video set. Reducing the long video coding sequence yields the long video watching sequence, and reducing the short video coding sequence yields the short video watching sequence.
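For illustration, one-hot encoding a historical sequence might look like the following sketch, assuming each video is identified by a categorical index in a fixed vocabulary; the sizes and index values are placeholders.

```python
# A minimal sketch of the one-hot encoding step.
import numpy as np

def one_hot(indices, vocab_size):
    """Convert a sequence of video indices into one-hot code vectors."""
    codes = np.zeros((len(indices), vocab_size))
    codes[np.arange(len(indices)), indices] = 1.0
    return codes

history = [37, 512, 4096]                              # indices of watched videos
coding_sequence = one_hot(history, vocab_size=10000)   # shape (3, 10000)
```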
In one embodiment, the video viewing sequence includes a first dimension-reduced sequence and a second dimension-reduced sequence, and the dimension-reduction processing is performed on each video encoding sequence to obtain the corresponding video viewing sequence, where the process includes at least one of:
multiplying each video coding sequence by a first mapping matrix respectively to obtain a corresponding first dimension reduction sequence, wherein the first mapping matrix comprises matrix parameters corresponding to video attributes;
and multiplying each video coding sequence by a second mapping matrix respectively to obtain the corresponding second dimension reduction sequence, wherein the second mapping matrix comprises matrix parameters corresponding to preference attributes.
Specifically, the video watching sequence includes a first dimension-reduction sequence and a second dimension-reduction sequence, which are the different sequences obtained by applying two dimension-reduction modes to the same video coding sequence. In the first mode, the long video coding sequence and the short video coding sequence are each multiplied by a first mapping matrix, implementing a weight embedding mapping (embedding), to obtain the corresponding first dimension-reduction sequences: multiplying the long video coding sequence by the first mapping matrix gives the first dimension-reduction sequence of the long videos, and multiplying the short video coding sequence by the first mapping matrix gives the first dimension-reduction sequence of the short videos. The first long video dimension-reduction sequence is denoted C = (c1, c2, …, cn), where c1 to cn are the video description vectors of the n long videos, and the first short video dimension-reduction sequence is denoted I = (i1, i2, …, in), where i1 to in are the video description vectors of the n short videos. Each video description vector contains not only the attribute information of the corresponding video but also the association between that video and the other videos in the same sequence.
In the second mode, the long video coding sequence and the short video coding sequence are each multiplied by a second mapping matrix, again implementing a weight embedding mapping (embedding), to obtain the corresponding second dimension-reduction sequences: multiplying the long video coding sequence by the second mapping matrix gives the second dimension-reduction sequence of the long videos, and multiplying the short video coding sequence by the second mapping matrix gives the second dimension-reduction sequence of the short videos. The first and second mapping matrices may contain the same and/or different matrix parameters: the first mapping matrix mostly holds matrix parameters related to video attributes, for example video type, video ID, and channel ID, while the second mapping matrix mostly holds matrix parameters related to preference attributes, for example features the user may care about such as the title of a drama, the leading actors, the director, and the name of the lead. The second long video dimension-reduction sequence is denoted D = (d1, d2, …, dn), where d1 to dn are the video description vectors of the n long videos, and the second short video dimension-reduction sequence is denoted K = (k1, k2, …, kn), where k1 to kn are the video description vectors of the n short videos.
The dimension-reduction sequences obtained in the first mode reflect the user's degree of preference for each long or short video, while those obtained in the second mode reflect the user's degree of preference for each preference feature, thereby mining the user's interest features.
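Continuing the previous sketch, the two dimension-reduction modes apply two different mapping matrices to the same coding sequence; in practice W1 and W2 would be learned embedding weights rather than the random placeholders used here.

```python
# A minimal sketch of the two dimension-reduction modes.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10000, 64

coding_sequence = np.zeros((3, vocab_size))
coding_sequence[np.arange(3), [37, 512, 4096]] = 1.0   # one-hot code vectors

W1 = rng.normal(size=(vocab_size, embed_dim))  # first mapping matrix: video-attribute parameters
W2 = rng.normal(size=(vocab_size, embed_dim))  # second mapping matrix: preference-attribute parameters

C = coding_sequence @ W1   # first dimension-reduction sequence (e.g. C or I)
D = coding_sequence @ W2   # second dimension-reduction sequence (e.g. D or K)
```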
In one embodiment, the feature vectors include a characterization vector, the fusion vector includes a characterization fusion vector, and the extracting the corresponding feature vectors based on the video watching sequences respectively and fusing the feature vectors to form the fusion vector includes:
respectively carrying out mean pooling on each first dimensionality reduction sequence to obtain corresponding characterization vectors;
determining a primary fusion vector according to the dot product result among the characterization vectors;
performing fusion processing on the primary fusion vector and all the first dimensionality reduction sequences to obtain a secondary fusion vector;
adding the primary fusion vector and the secondary fusion vector to form the characterization fusion vector.
In particular, the feature vectors include a characterization vector that indicates the user's preferences across the different video watching sequences. Mean pooling (MeanPooling) is applied to the first long video dimension-reduction sequence and the first short video dimension-reduction sequence separately: averaging all video description vectors in the first long video dimension-reduction sequence gives the corresponding long video characterization vector Ec1, and averaging all video description vectors in the first short video dimension-reduction sequence gives the corresponding short video characterization vector Ei1. The characterization vectors thus comprise a long video characterization vector and a short video characterization vector, reflecting the user's degree of preference for long or short videos.
The dot product of the long video characterization vector Ec1 and the short video characterization vector Ei1 is taken as the primary fusion vector Ef1, which already fuses part of the information of the first long video and first short video dimension-reduction sequences. The primary fusion vector Ef1, the first long video dimension-reduction sequence C, and the first short video dimension-reduction sequence I are then fed as input parameters into an attention network model (Attention), which outputs the fused secondary fusion vector Ef2; that is, C and I are further fused on the basis of the primary fusion vector to generate a higher-level representation Ef2. Finally, the primary fusion vector Ef1 and the secondary fusion vector Ef2 are added to form the characterization fusion vector F of the dual sequence, which contains the relations and interaction information between the different video watching sequences and can be used to accurately capture the many dimensions along which the user is interested in videos.
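The following PyTorch sketch illustrates this characterization-fusion branch under stated assumptions: the "dot product" of Ec1 and Ei1 is read as an element-wise product so that Ef1 remains a vector, and the attention step is approximated with nn.MultiheadAttention using Ef1 as the query over the concatenated sequences. Shapes and module choices are illustrative, not the patent's exact network.

```python
# A minimal PyTorch sketch of the characterization-fusion branch.
import torch
import torch.nn as nn

embed_dim = 64
C = torch.randn(1, 5, embed_dim)   # first long video dimension-reduction sequence (batch, n, d)
I = torch.randn(1, 5, embed_dim)   # first short video dimension-reduction sequence

Ec1 = C.mean(dim=1)                # mean pooling -> long video characterization vector
Ei1 = I.mean(dim=1)                # mean pooling -> short video characterization vector
Ef1 = Ec1 * Ei1                    # primary fusion vector (element-wise product assumed)

attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
memory = torch.cat([C, I], dim=1)                  # both sequences as keys/values
Ef2, _ = attn(Ef1.unsqueeze(1), memory, memory)    # secondary fusion vector
F = Ef1 + Ef2.squeeze(1)                           # characterization fusion vector
```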
In one embodiment, the feature vectors further include preference vectors, the fusion vectors include preference fusion vectors, and the extracting the corresponding feature vectors based on the video viewing sequences respectively and fusing the feature vectors to form the fusion vector includes:
determining a relation learning vector corresponding to each video description vector based on an association relation between different video description vectors in each second dimension-reduced sequence, wherein the relation learning vector comprises a target description vector and an association relation between the target description vector and each video description vector in the second dimension-reduced sequence, and the target description vector is any one of the video description vectors in the second dimension-reduced sequence;
determining preference vectors corresponding to the second dimension reduction sequences according to the confidence degrees that a plurality of relationship learning vectors corresponding to the same second dimension reduction sequence belong to the preference features;
and fusing the preference vectors corresponding to the second dimensionality reduction sequences to form the preference fusion vector.
Specifically, the feature vectors further include a preference vector, which indicates the user's degree of preference for different preference features; that is, the preference vector reflects how interested the user is in different aspects of a video. The preference features include all or part of the attribute information mentioned above, such as the leading actors, the director, and the video category.
The second long video dimension-reduction sequence D and the second short video dimension-reduction sequence K are each fed as input parameters into a multi-head self-attention network model. After learning the associations between the different video description vectors in each second dimension-reduction sequence, the model outputs a relation learning vector for each video description vector; a relation learning vector contains both the attribute information of the corresponding video and the associations between that video and the other videos in the same sequence, i.e. the video description vectors within one sequence undergo fusion learning. The relation learning vectors output by the multi-head self-attention network model are then fed as input parameters into a fully connected network layer, which classifies each relation learning vector and computes the confidence that it belongs to each preference feature. From these confidences, the user's preference for each preference feature is determined, and the preferences over all preference features form the preference vector of that second dimension-reduction sequence: a long video preference vector for the sequence D and a short video preference vector for the sequence K.
The long video preference vector is denoted P = (p1, p2, …, pm), where p1 to pm indicate the user's degrees of preference for m preference features when watching long videos, and the short video preference vector is denoted Q = (q1, q2, …, qm), where q1 to qm indicate the user's degrees of preference for the m preference features when watching short videos. Different users have different degrees of preference for the same preference feature when facing long or short videos. For example, if the preference feature is the leading actor, young viewers may prefer movies or television shows starring young actors when facing long videos, i.e. their preference for long videos starring young actors is higher, while older viewers may prefer movies or television shows in which senior actors play, i.e. their preference for long videos starring senior actors is higher. In this way, the preference features of different user identifiers for different videos, and the degree of preference for each feature, can be mined.
The long video preference vector and the short video preference vector are fed into an MLP network and fused to form the preference fusion vector H, which reflects the user's preference features for different videos and the degree of preference for each feature, so as to accurately capture the user's video preferences.
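A minimal PyTorch sketch of this preference branch follows; the choice of modules (one shared multi-head self-attention layer, a linear-plus-softmax classifier over m preference features, and a concatenating MLP for the fusion) and all dimensions are assumptions made for illustration, and in practice the parameters would be learned end to end.

```python
# A minimal PyTorch sketch of the preference-fusion branch.
import torch
import torch.nn as nn

embed_dim, n, m = 64, 5, 16
D = torch.randn(1, n, embed_dim)   # second long video dimension-reduction sequence
K = torch.randn(1, n, embed_dim)   # second short video dimension-reduction sequence

self_attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
to_features = nn.Sequential(nn.Linear(embed_dim, m), nn.Softmax(dim=-1))

def preference_vector(seq):
    # relation learning vectors: each position attends to every other position
    relation, _ = self_attn(seq, seq, seq)
    # confidence of each relation learning vector over the m preference features,
    # averaged over the sequence to give one preference vector
    return to_features(relation).mean(dim=1)

P = preference_vector(D)                       # long video preference vector
Q = preference_vector(K)                       # short video preference vector
fuse_mlp = nn.Sequential(nn.Linear(2 * m, embed_dim), nn.ReLU(),
                         nn.Linear(embed_dim, embed_dim))
H = fuse_mlp(torch.cat([P, Q], dim=-1))        # preference fusion vector
```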
In one embodiment, the determining the recommendation value of each video vector to be recommended according to the obtained inner product result of each video vector to be recommended and the fusion vector includes:
converting the acquired attribute information of each video to be recommended into corresponding video vectors to be recommended;
and determining a recommendation value of each video vector to be recommended according to an inner product result between the characterization fusion vector or the preference fusion vector and each video vector to be recommended, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended.
Specifically, the video attribute information of each video to be recommended is mapped into a vector representation; in this embodiment, the mapping conversion is performed by an MLP network model, which outputs the corresponding video vector V to be recommended. With double-tower modeling, the recommendation value of a video to be recommended is computed from the dot product of the characterization fusion vector or the preference fusion vector with each video vector to be recommended. Using the dot product F · V as the recommendation value fuses the cross information of the long video watching sequence and the short video watching sequence with the attribute information of the video to be recommended; using the dot product H · V as the recommendation value fuses the user's preference information for the long video and short video watching sequences with the attribute information of the video to be recommended. The coarse ranking of the videos to be recommended is determined from these recommendation values.
In an embodiment, after converting the obtained attribute information of each video to be recommended into a corresponding video vector to be recommended, the method further includes:
and adding the video vector to be recommended with the dot product results of the representation fusion vector and the preference fusion vector respectively to obtain a recommendation value of the video vector to be recommended.
Specifically, the dot products of the video vector to be recommended with the characterization fusion vector and with the preference fusion vector are added, i.e. F · V + H · V, realizing a three-tower computation of the recommendation value. This fuses the cross information of the long video and short video watching sequences, the user's preference information for those sequences, and the attribute information of the video to be recommended when determining the coarse ranking, so the user's video preference information is captured more fully and comprehensively and the coarse ranking better conforms to the user's preferences.
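A minimal sketch of the double-tower and three-tower scoring, assuming F, H, and each candidate vector V share one dimension; the MLP that maps raw attribute information to V is elided and the tensors below are placeholders.

```python
# A minimal sketch of double-tower (F·V or H·V) and three-tower (F·V + H·V) scoring.
import torch

embed_dim, num_candidates = 64, 100
F = torch.randn(embed_dim)                     # characterization fusion vector
H = torch.randn(embed_dim)                     # preference fusion vector
V = torch.randn(num_candidates, embed_dim)     # video vectors to be recommended

double_tower_scores = V @ F                    # or V @ H: one fusion vector per tower
three_tower_scores = V @ F + V @ H             # F·V + H·V for every candidate

top_values, top_idx = torch.topk(three_tower_scores, k=10)  # coarse top-10 ranking
```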
Whether double-tower or three-tower modeling is adopted, the modeling process also learns and trains an initial model according to the model's usage process described above, i.e. cross learning training is performed on the sample sequences in a sample data set according to the above process to obtain a neural network model that implements it.
Fig. 2 is a flowchart illustrating a video recommendation method according to an embodiment. It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a video recommendation apparatus including:
an obtaining module 310, configured to obtain a video watching set corresponding to a target user identifier, where the video watching set includes at least two video watching sequences with different video durations;
a fusion module 320, configured to extract corresponding feature vectors based on the video watching sequences, and fuse the feature vectors to form a fusion vector;
a determining module 330, configured to determine a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, where the video vector to be recommended is used to indicate attribute information of a video to be recommended;
and the pushing module 340 is configured to push the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommendation values.
In an embodiment, the obtaining module 310 is specifically configured to:
acquiring a historical video set corresponding to the target user identifier, wherein the historical video set comprises at least two historical video sequences with different video durations, and the historical video sequences comprise a plurality of video attribute vectors;
respectively carrying out one-hot coding processing on each historical video sequence to obtain a corresponding video coding sequence, wherein the video coding sequence comprises one-hot code vectors corresponding to a plurality of video attribute vectors;
and performing dimensionality reduction on each video coding sequence to obtain a corresponding video watching sequence, wherein the video watching sequence comprises a plurality of video description vectors.
In an embodiment, the obtaining module 310 is specifically configured to:
multiplying each video coding sequence by a first mapping matrix respectively to obtain a corresponding first dimension reduction sequence, wherein the first mapping matrix comprises matrix parameters corresponding to video attributes;
and multiplying each video coding sequence by a second mapping matrix respectively to obtain the corresponding second dimension reduction sequence, wherein the second mapping matrix comprises matrix parameters corresponding to preference attributes.
In one embodiment, the fusion module 320 is specifically configured to:
performing mean pooling on each first dimensionality reduction sequence to obtain corresponding characterization vectors;
determining a primary fusion vector according to the dot product result among the characterization vectors;
performing fusion processing on the primary fusion vector and all the first dimensionality reduction sequences to obtain a secondary fusion vector;
adding the primary fusion vector and the secondary fusion vector to form the characterization fusion vector.
In one embodiment, the fusion module 320 is specifically configured to:
determining a relation learning vector corresponding to each video description vector based on an association relation between different video description vectors in each second dimension-reduced sequence, wherein the relation learning vector comprises a target description vector and an association relation between the target description vector and each video description vector in the second dimension-reduced sequence, and the target description vector is any one of the video description vectors in the second dimension-reduced sequence;
determining preference vectors corresponding to the second dimension reduction sequences according to the confidence degrees that a plurality of relationship learning vectors corresponding to the same second dimension reduction sequence belong to the preference features;
and fusing the preference vectors corresponding to the second dimensionality reduction sequences to form the preference fusion vector.
In one embodiment, the determining module 330 is specifically configured to:
converting the acquired attribute information of each video to be recommended into corresponding video vectors to be recommended;
and determining a recommendation value of each video vector to be recommended according to an inner product result between the characterization fusion vector or the preference fusion vector and each video vector to be recommended, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended.
In an embodiment, the determining module 330 is specifically configured to:
and adding together the dot products of the video vector to be recommended with the characterization fusion vector and with the preference fusion vector, respectively, to obtain the recommendation value of the video vector to be recommended.
FIG. 4 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the server 120 in fig. 1. As shown in fig. 4, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the video recommendation method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform the video recommendation method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the video recommendation apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 4. The memory of the computer device may store various program modules constituting the video recommendation apparatus, such as the obtaining module 310, the fusing module 320, the determining module 330, and the pushing module 340 shown in fig. 3. The computer program constituted by the respective program modules causes the processor to execute the steps in the video recommendation method of the embodiments of the present application described in the present specification.
The computer device shown in fig. 4 may perform the step of obtaining a video viewing set corresponding to the target user identification through an obtaining module 310 in the video recommending apparatus shown in fig. 3, where the video viewing set includes at least two video viewing sequences with different video durations. The computer device may perform, through the fusion module 320, extracting corresponding feature vectors based on the video viewing sequences, respectively, and fuse the feature vectors to form a fusion vector. The computer device may determine, by the determining module 330, a recommendation value of each to-be-recommended video vector according to an obtained inner product result of each to-be-recommended video vector and the fusion vector, where the to-be-recommended video vector is used to indicate attribute information of a to-be-recommended video. The computer device can push the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value through the pushing module 340.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the above embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for video recommendation, the method comprising:
acquiring a video watching set corresponding to a target user identifier, wherein the video watching set comprises at least two video watching sequences with different video durations;
respectively extracting corresponding feature vectors based on the video watching sequences, and fusing the feature vectors to form a fusion vector;
determining a recommendation value of each video vector to be recommended according to an obtained inner product result of each video vector to be recommended and the fusion vector, wherein the video vector to be recommended is used for indicating attribute information of a video to be recommended;
and pushing the video data corresponding to the video vector to be recommended to the terminal corresponding to the target user identifier according to the descending order of the recommended value.
2. The method of claim 1, wherein the acquiring of the video watching set corresponding to the target user identifier comprises:
acquiring a historical video set corresponding to the target user identifier, wherein the historical video set comprises at least two historical video sequences with different video durations, and the historical video sequences comprise a plurality of video attribute vectors;
respectively carrying out one-hot coding processing on each historical video sequence to obtain a corresponding video coding sequence, wherein the video coding sequence comprises one-hot code vectors corresponding to a plurality of video attribute vectors;
and performing dimensionality reduction on each video coding sequence to obtain a corresponding video watching sequence, wherein the video watching sequence comprises a plurality of video description vectors.
3. The method of claim 2, wherein the video viewing sequences comprise first dimension-reduction sequences and second dimension-reduction sequences, and wherein performing dimensionality reduction on each video encoding sequence to obtain the corresponding video viewing sequence comprises at least one of:
multiplying each video encoding sequence by a first mapping matrix to obtain a corresponding first dimension-reduction sequence, wherein the first mapping matrix comprises matrix parameters corresponding to video attributes;
and multiplying each video encoding sequence by a second mapping matrix to obtain a corresponding second dimension-reduction sequence, wherein the second mapping matrix comprises matrix parameters corresponding to preference attributes.
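(Illustrative note, not part of the claims.) Claim 3's two reductions, sketched with randomly initialized mapping matrices standing in for learned parameters; the sizes are illustrative. Because the rows are one-hot, `codes @ W` selects rows of `W`, i.e. the matrix product amounts to an embedding lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
W_video = rng.normal(size=(1000, 64))  # first mapping matrix (video attributes)
W_pref = rng.normal(size=(1000, 64))   # second mapping matrix (preference attributes)

codes = np.eye(1000)[[3, 17, 42]]      # a video encoding sequence
first_seq = codes @ W_video            # first dimension-reduction sequence
second_seq = codes @ W_pref            # second dimension-reduction sequence
# Equivalent lookup without materializing the one-hot matrix:
assert np.allclose(first_seq, W_video[[3, 17, 42]])
```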
4. The method according to claim 3, wherein the feature vectors include characterization vectors, the fusion vector includes a characterization fusion vector, and extracting the corresponding feature vectors based on the video viewing sequences and fusing the feature vectors to form the fusion vector comprises:
performing mean pooling on each first dimension-reduction sequence to obtain a corresponding characterization vector;
determining a primary fusion vector according to a dot product result among the characterization vectors;
performing fusion processing on the primary fusion vector and all of the first dimension-reduction sequences to obtain a secondary fusion vector;
and adding the primary fusion vector and the secondary fusion vector to form the characterization fusion vector.
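(Illustrative note, not part of the claims.) Claim 4 leaves its two fusion operators open; the sketch below reads the "dot product result among the characterization vectors" as an element-wise (Hadamard) product, and the secondary fusion as a gated mean over all sequence rows — both are assumptions:

```python
import numpy as np

def characterization_fusion(first_sequences):
    # Mean-pool each first dimension-reduction sequence into a
    # characterization vector.
    char_vectors = [seq.mean(axis=0) for seq in first_sequences]
    # Assumed primary fusion: element-wise product of the
    # characterization vectors.
    primary = char_vectors[0].copy()
    for v in char_vectors[1:]:
        primary = primary * v
    # Assumed secondary fusion: gate the mean of all sequence rows
    # by the primary fusion vector.
    all_rows = np.concatenate(first_sequences, axis=0)
    secondary = primary * all_rows.mean(axis=0)
    # Characterization fusion vector = primary + secondary.
    return primary + secondary
```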
5. The method according to claim 4, wherein the feature vectors further include preference vectors, the fusion vector further includes a preference fusion vector, and extracting the corresponding feature vectors based on the video viewing sequences and fusing the feature vectors to form the fusion vector further comprises:
determining a relation learning vector corresponding to each video description vector based on the association relations between different video description vectors in each second dimension-reduction sequence, wherein the relation learning vector comprises a target description vector and the association relations between the target description vector and each video description vector in the second dimension-reduction sequence, and the target description vector is any one of the video description vectors in the second dimension-reduction sequence;
determining a preference vector corresponding to each second dimension-reduction sequence according to the confidence that each of the plurality of relation learning vectors corresponding to the same second dimension-reduction sequence belongs to a preference feature;
and fusing the preference vectors corresponding to the second dimension-reduction sequences to form the preference fusion vector.
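(Illustrative note, not part of the claims.) Claim 5 reads like a self-attention step over each second dimension-reduction sequence; the scaled dot-product form and the single-vector confidence head below are assumptions, and `w_conf` is a hypothetical learned parameter:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def preference_vector(seq, w_conf):
    # Pairwise association between all video description vectors.
    attn = softmax(seq @ seq.T / np.sqrt(seq.shape[1]))
    # One relation learning vector per target description vector.
    relation_vectors = attn @ seq
    # Confidence that each relation learning vector belongs to a
    # preference feature, then a confidence-weighted preference vector.
    conf = softmax(relation_vectors @ w_conf)
    return conf @ relation_vectors
```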
6. The method according to claim 5, wherein determining the recommendation value of each video vector to be recommended according to the inner product result of that video vector to be recommended and the fusion vector comprises:
converting the acquired attribute information of each video to be recommended into a corresponding video vector to be recommended;
and determining the recommendation value of each video vector to be recommended according to an inner product result between the characterization fusion vector or the preference fusion vector and that video vector to be recommended, wherein the video vector to be recommended indicates attribute information of the video to be recommended.
7. The method according to claim 6, wherein, after the acquired attribute information of each video to be recommended is converted into the corresponding video vector to be recommended, the method further comprises:
summing the dot products of the video vector to be recommended with the characterization fusion vector and with the preference fusion vector, respectively, to obtain the recommendation value of that video vector to be recommended.
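(Illustrative note, not part of the claims.) Claims 6 and 7 read together give a concrete scoring rule: the recommendation value of a candidate is the sum of its dot products with the characterization fusion vector and the preference fusion vector:

```python
import numpy as np

def recommendation_values(candidates, char_fusion, pref_fusion):
    # Sum of the two dot products per candidate video vector.
    return candidates @ char_fusion + candidates @ pref_fusion

# Usage with hypothetical shapes: 100 candidates of dimension 64.
rng = np.random.default_rng(0)
cands = rng.normal(size=(100, 64))
scores = recommendation_values(cands, rng.normal(size=64), rng.normal(size=64))
ranked = np.argsort(-scores)  # push in descending order of recommendation value
```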
8. A video recommendation apparatus, the apparatus comprising:
an acquisition module, configured to acquire a video viewing set corresponding to a target user identifier, wherein the video viewing set comprises at least two video viewing sequences with different video durations;
a fusion module, configured to extract a corresponding feature vector from each of the video viewing sequences and to fuse the feature vectors to form a fusion vector;
a determining module, configured to determine a recommendation value of each video vector to be recommended according to an inner product result of that video vector to be recommended and the fusion vector, wherein each video vector to be recommended indicates attribute information of a video to be recommended;
and a pushing module, configured to push video data corresponding to the video vectors to be recommended to a terminal corresponding to the target user identifier in descending order of recommendation value.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202211160276.7A 2022-09-22 2022-09-22 Video recommendation method and device, computer equipment and storage medium Pending CN115391663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211160276.7A CN115391663A (en) 2022-09-22 2022-09-22 Video recommendation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211160276.7A CN115391663A (en) 2022-09-22 2022-09-22 Video recommendation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115391663A 2022-11-25

Family

ID=84126154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211160276.7A Pending CN115391663A (en) 2022-09-22 2022-09-22 Video recommendation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115391663A (en)

Similar Documents

Publication Publication Date Title
CN110263243B (en) Media information recommendation method, device, storage medium and computer equipment
CN110162701B (en) Content pushing method, device, computer equipment and storage medium
CN109002488B (en) Recommendation model training method and device based on meta-path context
KR101944469B1 (en) Estimating and displaying social interest in time-based media
CN106326391B (en) Multimedia resource recommendation method and device
CN112417207B (en) Video recommendation method, device, equipment and storage medium
CN105430505B (en) A kind of IPTV program commending methods based on combined strategy
CN112364204B (en) Video searching method, device, computer equipment and storage medium
CN112507163B (en) Duration prediction model training method, recommendation method, device, equipment and medium
CN110909182A (en) Multimedia resource searching method and device, computer equipment and storage medium
CN108197336B (en) Video searching method and device
CN112100513A (en) Knowledge graph-based recommendation method, device, equipment and computer readable medium
CN111046230A (en) Content recommendation method and device, electronic equipment and storable medium
CN115168744A (en) Radio and television technology knowledge recommendation method based on user portrait and knowledge graph
CN113515696A (en) Recommendation method and device, electronic equipment and storage medium
CN113220974B (en) Click rate prediction model training and search recall method, device, equipment and medium
CN109063080B (en) Video recommendation method and device
CN110162689A (en) Information-pushing method, device, computer equipment and storage medium
US20100205041A1 (en) Determining the interest of individual entities based on a general interest
CN117061796A (en) Film recommendation method, device, equipment and readable storage medium
CN111008667A (en) Feature extraction method and device and electronic equipment
CN115391663A (en) Video recommendation method and device, computer equipment and storage medium
EP3314903B1 (en) Digital content provision
CN111061913B (en) Video pushing method, device, system, computer readable storage medium and equipment
Sotelo et al. A comparison of audiovisual content recommender systems performance: Collaborative vs. semantic approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination