CN113742567B - Recommendation method and device for multimedia resources, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113742567B
CN113742567B (application CN202010478317.1A)
Authority
CN
China
Prior art keywords
account
feature extraction
sample
feature
multimedia resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010478317.1A
Other languages
Chinese (zh)
Other versions
CN113742567A (en)
Inventor
唐新春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010478317.1A
Publication of CN113742567A
Application granted
Publication of CN113742567B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a multimedia resource recommendation method and device, an electronic device, and a storage medium, which address the problem that the current manner of recommending advertisement short videos to users leaves the types of recommended advertisement short videos overly homogeneous. According to embodiments of the disclosure, feature extraction is performed on account data of a target account to which multimedia resources are to be recommended, to obtain account feature information of the target account; feature extraction is performed on attribute data of candidate multimedia resources, to obtain attribute feature information; the account feature information and the attribute feature information are fused to obtain screening parameters corresponding to the candidate multimedia resources; and target multimedia resources are screened from the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources. The multimedia resource recommendation method provided by the embodiments of the disclosure moves beyond the current practice of screening based only on the single type of multimedia resource the account has historically clicked, providing multiple screening signals and thereby making resource screening more accurate.

Description

Recommendation method and device for multimedia resources, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of computers, and in particular relates to a recommendation method and device for multimedia resources, electronic equipment and a storage medium.
Background
With the gradual rise of short videos on the mobile Internet, watching short videos has gradually become part of daily life. The short video content a typical short video platform recommends to users includes not only organic works but also advertising works.
Currently, when recommending advertisement-type short videos to a user, platforms recommend short advertisement videos of the same type as those the user has historically clicked; for example, after the user clicks a "music toy" advertisement short video, other advertisement short videos related to "music toy" are recommended to the user. However, this manner of recommending advertisement short videos may leave the types of advertisement short videos recommended to the user overly homogeneous.
Disclosure of Invention
The disclosure provides a multimedia resource recommendation method and device, an electronic device, and a storage medium, to solve the problem that the current manner of recommending advertisement short videos to a user leaves the types of recommended advertisement short videos overly homogeneous. The technical scheme of the present disclosure is as follows:
According to a first aspect of an embodiment of the present disclosure, there is provided a recommendation method for a multimedia resource, including:
performing feature extraction on account data of a target account to which multimedia resources are to be recommended, to obtain account feature information of the target account; and performing feature extraction on attribute data of candidate multimedia resources in a candidate multimedia resource set corresponding to the target account, to obtain attribute feature information of the candidate multimedia resources;
performing fusion processing on the account feature information and the attribute feature information to obtain screening parameters corresponding to the candidate multimedia resources, each screening parameter being a probability value that the duration for which the target account plays the candidate multimedia resource is not less than a preset threshold; and
screening, from the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources, target multimedia resources to be recommended to the target account.
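The final screening step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the top-k strategy and the function name are assumptions, since the text only requires screening "according to the screening parameters".

```python
# Hypothetical sketch of the screening step: pick the candidates whose
# screening parameter (predicted probability of a sufficiently long play
# by the target account) is highest. Top-k selection is an assumption.
def screen_resources(candidates, screening_params, k=2):
    """Return the k candidate resources with the highest screening parameters."""
    ranked = sorted(zip(candidates, screening_params),
                    key=lambda pair: pair[1], reverse=True)
    return [resource for resource, _ in ranked[:k]]

# Example: four candidate resources with predicted probabilities.
recommended = screen_resources(["a", "b", "c", "d"], [0.10, 0.85, 0.40, 0.72])
```

Here "b" and "d" would be recommended, as they carry the two highest screening parameters.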
In one possible implementation manner, the feature extraction of the account data of the target account to obtain account feature information of the target account includes:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
In one possible implementation manner, the feature extraction of the account data of the target account to obtain account feature information of the target account includes:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
The feature extraction is performed on the attribute data of the candidate multimedia resources in the candidate multimedia resource set corresponding to the target account, so as to obtain attribute feature information of the candidate multimedia resources, including:
and based on the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and carrying out transformation processing on the embedded vectors to obtain the attribute feature information of the candidate multimedia resources.
In one possible implementation, the trained second feature extraction network includes an embedded layer and at least one hidden layer;
the embedding processing is performed on the attribute data of the candidate multimedia resources to obtain embedding vectors containing semantic information corresponding to the candidate multimedia resources, including:
Based on the embedded layer of the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources;
the transforming the embedded vector to obtain attribute characteristic information of the candidate multimedia resource includes:
and based on at least one hidden layer of the trained second feature extraction network, carrying out transformation processing on the embedded vector according to a second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource.
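The structure described above (an embedding layer followed by at least one hidden layer with a weight matrix per layer) applies equally to both feature extraction networks, and can be sketched as below. The embedding table, the averaging of per-id embeddings, the layer sizes, and the ReLU activation are all illustrative assumptions not specified by the text.

```python
# Minimal sketch of one feature-extraction tower (first network over
# account data, or second network over attribute data). All concrete
# values and the activation choice are assumptions for illustration.
EMBEDDING = {"music_toy": [1.0, 0.0], "car": [0.0, 1.0]}  # id -> vector

def embed(ids):
    """Embedding layer: map sparse ids to one dense semantic vector
    (here, by averaging the per-id embedding vectors)."""
    vectors = [EMBEDDING[i] for i in ids]
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def hidden_layer(vector, weights):
    """One hidden layer: multiply by the layer's weight matrix, then ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector)))
            for row in weights]

def extract_features(ids, weight_matrices):
    """Embedding layer followed by one transformation per hidden layer."""
    v = embed(ids)
    for w in weight_matrices:
        v = hidden_layer(v, w)
    return v

# One hidden layer with a 2x2 weight matrix.
features = extract_features(["music_toy", "car"], [[[1.0, 1.0], [1.0, -1.0]]])
```

The embedded vector [0.5, 0.5] is transformed by the hidden layer's weight matrix into the feature vector [1.0, 0.0].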
In a possible implementation manner, the fusing the account feature information and the attribute feature information to obtain the filtering parameters corresponding to the candidate multimedia resources includes:
and based on the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information to obtain a feature vector to be output, and carrying out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
In one possible implementation, the trained fully-connected neural network includes at least one hidden layer and an output layer;
The fusing processing is performed on the account feature information and the attribute feature information to obtain a feature vector to be output, which comprises the following steps:
based on at least one hidden layer of the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information according to a third weight matrix corresponding to each hidden layer to obtain a feature vector to be output;
and the performing of linear regression processing on the feature vector to be output to obtain the screening parameters corresponding to the candidate multimedia resources includes:
and carrying out linear regression processing on the feature vector to be output based on the output layer of the trained fully-connected neural network to obtain the screening parameters corresponding to the candidate multimedia resources.
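The fusion network described above can be sketched as follows. The text calls the output step "linear regression processing" yet defines the screening parameter as a probability; squashing the linear output with a sigmoid is therefore our assumption, as are all concrete weights.

```python
import math

# Sketch of the fully-connected fusion network: concatenate the two
# feature vectors, apply the hidden layers' weight matrices, then the
# output layer. The sigmoid squashing is an assumed reading of the
# "linear regression" step, since the result must be a probability.
def fuse(account_features, attribute_features, hidden_weights, output_weights):
    v = account_features + attribute_features          # concatenation
    for weights in hidden_weights:                     # hidden layers
        v = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in weights]
    linear = sum(w * x for w, x in zip(output_weights, v))  # output layer
    return 1.0 / (1.0 + math.exp(-linear))             # probability in (0, 1)

score = fuse([1.0, 0.0], [0.0, 1.0],
             hidden_weights=[[[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]],
             output_weights=[2.0, -2.0])
```

The returned score is the screening parameter: the predicted probability that the target account plays the candidate resource for at least the preset threshold duration.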
In one possible implementation, the first feature extraction network, the second feature extraction network, and the fully-connected neural network are trained according to the following manner:
performing feature extraction on account data of a sample account in a training sample based on an initial first feature extraction network to obtain account feature information of the sample account; performing feature extraction on attribute data of the sample multimedia resources in the training sample based on an initial second feature extraction network to obtain attribute feature information of the sample multimedia resources;
Based on an initial fully-connected neural network, carrying out fusion processing on account characteristic information of the sample account and attribute characteristic information of the sample multimedia resource to obtain screening parameters corresponding to the sample multimedia resource;
and adjusting the first feature extraction network, the second feature extraction network and the fully-connected neural network according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples until the determined loss value is within a preset range, so as to obtain the trained first feature extraction network, second feature extraction network and fully-connected neural network.
In one possible implementation, the first feature extraction network, the second feature extraction network, and the fully-connected neural network are adjusted according to the following manner:
determining a loss value according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples;
and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
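The loss computation driving these adjustments can be sketched as below. Binary cross-entropy between the predicted screening parameter and the training label is an assumed choice; the text only requires "a loss value" that drives weight updates until it falls within a preset range.

```python
import math

# Assumed loss for the adjustment step: binary cross-entropy between the
# predicted screening parameter (a probability) and the training label.
def loss_value(screening_param, label):
    p = min(max(screening_param, 1e-7), 1 - 1e-7)  # clamp for numerical stability
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def converged(loss, preset_range=0.1):
    """Stop adjusting the three weight matrices once the loss is within
    the preset range (the range value here is illustrative)."""
    return loss <= preset_range

l_good = loss_value(0.99, 1)  # confident, correct prediction -> small loss
l_bad = loss_value(0.01, 1)   # confident, wrong prediction -> large loss
```

In training, the loss gradient would be backpropagated to update the first, second, and third weight matrices; that update step is omitted here for brevity.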
In one possible implementation manner, the sample multimedia resource in the training sample is a multimedia resource played by the sample account in a preset history duration;
labeling training labels of training samples according to the following manner:
if the duration for which the sample account plays the sample multimedia resource within the preset historical duration is not less than a preset threshold, labeling the training label of the training sample as a positive sample label; or
if the duration for which the sample account plays the sample multimedia resource within the preset historical duration is less than the preset threshold, labeling the training label of the training sample as a negative sample label.
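The labeling rule above amounts to a simple threshold check on play duration. The concrete threshold value below is an illustrative assumption; the text leaves the "preset threshold" unspecified.

```python
PLAY_THRESHOLD_SECONDS = 5.0  # assumed value of the "preset threshold"

def training_label(play_duration_seconds):
    """Label a training sample from the sample account's play duration
    within the preset history window: 1 = positive sample label
    (duration not less than the threshold), 0 = negative sample label."""
    return 1 if play_duration_seconds >= PLAY_THRESHOLD_SECONDS else 0
```

Note the boundary case: a play duration exactly equal to the threshold is "not less than" it, so it yields a positive label.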
According to a second aspect of the embodiments of the present disclosure, there is provided a recommendation device for a multimedia resource, including:
the feature extraction unit is configured to perform feature extraction on account data of a target account to which multimedia resources are to be recommended, to obtain account feature information of the target account, and to perform feature extraction on attribute data of candidate multimedia resources in a candidate multimedia resource set corresponding to the target account, to obtain attribute feature information of the candidate multimedia resources;
the fusion processing unit is configured to perform fusion processing on the account feature information and the attribute feature information to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the duration of playing the candidate multimedia resources by the target account is not less than a preset threshold value;
and the screening unit is configured to screen, from the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources, target multimedia resources to be recommended to the target account.
In a possible implementation manner, the feature extraction unit is specifically configured to perform:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
In one possible implementation, the trained first feature extraction network includes an embedded layer and at least one hidden layer;
the feature extraction unit is specifically configured to perform:
based on the embedding layer of the trained first feature extraction network, embedding the account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account;
and based on at least one hidden layer of the trained first feature extraction network, carrying out transformation processing on the embedded vector according to a first weight matrix corresponding to each hidden layer to obtain account feature information of the target account.
In a possible implementation manner, the feature extraction unit is specifically configured to perform:
and based on the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and carrying out transformation processing on the embedded vectors to obtain the attribute feature information of the candidate multimedia resources.
In one possible implementation, the trained second feature extraction network includes an embedded layer and at least one hidden layer;
the feature extraction unit is specifically configured to perform:
based on the trained embedding layer of the second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedding vectors containing semantic information corresponding to the candidate multimedia resources;
and based on at least one hidden layer of the trained second feature extraction network, carrying out transformation processing on the embedded vector according to a second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource.
In one possible implementation, the fusion processing unit is specifically configured to perform:
And based on the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information to obtain a feature vector to be output, and carrying out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
In one possible implementation, the trained fully-connected neural network includes at least one hidden layer and an output layer;
the fusion processing unit is specifically configured to perform:
based on at least one hidden layer of the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information according to a third weight matrix corresponding to each hidden layer to obtain a feature vector to be output;
and carrying out linear regression processing on the feature vector to be output based on the output layer of the trained fully-connected neural network to obtain the screening parameters corresponding to the candidate multimedia resources.
In one possible implementation, the apparatus further comprises a training unit;
the training unit is configured to train the first feature extraction network, the second feature extraction network, and the fully-connected neural network according to the following manner:
Performing feature extraction on account data of a sample account in a training sample based on an initial first feature extraction network to obtain account feature information of the sample account; performing feature extraction on attribute data of the sample multimedia resources in the training sample based on an initial second feature extraction network to obtain attribute feature information of the sample multimedia resources;
based on an initial fully-connected neural network, carrying out fusion processing on account characteristic information of the sample account and attribute characteristic information of the sample multimedia resource to obtain screening parameters corresponding to the sample multimedia resource;
and adjusting the first feature extraction network, the second feature extraction network and the fully-connected neural network according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples until the determined loss value is within a preset range, so as to obtain the trained first feature extraction network, second feature extraction network and fully-connected neural network.
In a possible implementation manner, the training unit is specifically configured to adjust the first feature extraction network, the second feature extraction network, and the fully-connected neural network according to the following manner:
Determining a loss value according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples;
and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
In one possible implementation manner, the sample multimedia resource in the training sample is a multimedia resource played by the sample account in a preset history duration;
the training unit is specifically configured to label the training labels of training samples according to the following manner:
if the duration for which the sample account plays the sample multimedia resource within the preset history duration is not less than a preset threshold, label the training label of the training sample as a positive sample label; or
if the duration for which the sample account plays the sample multimedia resource within the preset history duration is less than the preset threshold, label the training label of the training sample as a negative sample label.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a memory for storing executable instructions;
And the processor is used for reading and executing the executable instructions stored in the memory to realize the recommendation method of the multimedia resource according to any one of the first aspect of the embodiment of the disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-volatile storage medium storing instructions which, when executed by a processor of a multimedia resource recommendation device, enable the multimedia resource recommendation device to perform the multimedia resource recommendation method described in the first aspect of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the multimedia resource recommendation method provided by the embodiments of the disclosure, account feature information of a target account is extracted from the acquired account data of the target account, and attribute features of candidate multimedia resources are extracted from the attribute data of the candidate multimedia resources corresponding to the target account; screening parameters corresponding to the candidate multimedia resources are then determined from the account feature information of the target account and the attribute features of the candidate multimedia resources. Because each screening parameter is the predicted probability that the duration for which the target account plays the candidate multimedia resource is not less than a preset threshold, a larger screening parameter means a greater probability of such a sufficiently long play. Therefore, after the target multimedia resources screened according to these parameters are recommended to the target account, the target account tends to play them for longer, improving the experience of the user corresponding to the target account. In addition, the multimedia resource recommendation method provided by the embodiments of the disclosure moves beyond the current practice of screening based only on the single type of multimedia resource the account has historically clicked, providing multiple screening signals and making resource screening more accurate.
Drawings
FIG. 1 is a schematic diagram of an application scenario illustrated in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram of a recommendation system for multimedia assets, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method for recommending multimedia assets, according to an exemplary embodiment;
FIG. 4 is a schematic diagram of a two-tower (dual-column) DNN model, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating a method of training a first feature extraction network, a second feature extraction network, and a fully connected neural network, according to an example embodiment;
FIG. 6 is a flowchart illustrating a complete multimedia asset recommendation method, according to an exemplary embodiment;
FIG. 7 is a block diagram of a recommendation device for multimedia assets, according to an exemplary embodiment;
FIG. 8 is a block diagram of another recommendation device for multimedia assets, according to an exemplary embodiment;
fig. 9 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In the following, some terms in the embodiments of the present disclosure are explained for easy understanding by those skilled in the art.
(1) The term "and/or" in the embodiments of the present disclosure describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
(2) The term "electronic device" in embodiments of the present disclosure may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
(3) The term "short video" in the embodiments of the present disclosure refers to video content pushed at high frequency and played on various new media platforms, suitable for viewing while mobile or during short breaks, with lengths ranging from a few seconds to a few minutes. The content spans topics such as skill sharing, humor, fashion trends, social hotspots, street interviews, public-service education, advertising creative, and business customization. Because the content is short, it can stand alone as a single clip or form part of a serialized column.
(4) The term "classification model" in the embodiments of the present disclosure refers to a machine learning model for solving classification problems, covering algorithms such as support vector machines and decision trees; it is a model that explores the relationship between a response variable and independent variables so as to fit, to some extent, nonlinear relationships.
(5) The term "multimedia resource" in the embodiments of the present disclosure may be a resource for digital transmission, specifically may be a video resource, a text resource, or a picture resource; for example, the video asset may be a short video, live video, etc., where the short video may be an advertising short video.
(6) The term "DNN" in the embodiments of the present disclosure stands for Deep Neural Network, the foundation of deep learning. Divided by position, the layers inside a DNN fall into three types: the input layer, hidden layers, and the output layer, with adjacent layers fully connected. A DNN must be trained before use; once trained, input data can be fed into the DNN to obtain the corresponding output data.
(7) The term "client" in the embodiments of the present disclosure refers to a program that corresponds to a server and provides local services for the user. Apart from some applications that run only locally, clients are generally installed on ordinary user terminals and need to operate in cooperation with a server.
Multimedia resource recommendation can be applied in an application program that displays multimedia resources, such as a short video application or a player application: after a user triggers a page display request, the application screens multimedia resources and displays the screened multimedia resources to the user.
The recommendation flow is introduced below taking an advertisement short video as an example of the multimedia resource.
In a large-scale digital advertising system, the number of advertisements available for delivery under each traffic request (i.e., each online user) is enormous, and the most suitable advertisement cannot be selected in a single pass of computation when recommending advertisements for users. The industry therefore generally screens for the advertisements best suited for recommendation with a multi-level funnel, for example the three main stages of recall, rough selection, and careful selection, ultimately maximizing the benefits of users, advertisers, and the platform and achieving a win-win outcome.
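The multi-level funnel above can be sketched as follows. Stage names follow the text (recall, rough selection, careful selection); the per-stage candidate counts and the toy scoring functions are illustrative assumptions.

```python
# Hypothetical sketch of a three-stage advertising funnel: each stage
# scores the surviving candidates with a progressively more expensive
# model and keeps a progressively smaller set.
def funnel(ads, recall, rough_rank, fine_rank, sizes=(1000, 100, 10)):
    candidates = recall(ads)[:sizes[0]]                              # cheap retrieval
    candidates = sorted(candidates, key=rough_rank, reverse=True)[:sizes[1]]  # rough selection
    candidates = sorted(candidates, key=fine_rank, reverse=True)[:sizes[2]]   # careful selection
    return candidates

# Toy usage: ads are integers, each stage scores them differently.
picked = funnel(list(range(10000)),
                recall=lambda ads: [a for a in ads if a % 2 == 0],
                rough_rank=lambda a: a % 100,
                fine_rank=lambda a: a)
```

Each stage narrows the pool (here 10,000 → 1,000 → 100 → 10), so the expensive careful-selection model only ever scores a small candidate set.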
To make the principles and advantages of the disclosure clearer, embodiments are described below with reference to the drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the disclosure. Based on the embodiments in this disclosure, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of protection of this disclosure.
It should be noted that, the multimedia resource in the following description may be the advertisement short video in the above description, but may also be other multimedia resources besides the advertisement short video.
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
As shown in fig. 1, at least one server 20 and a plurality of terminal devices 30 may be included in the application scenario. The terminal device 30 may be any suitable electronic device that may be used for network access, including but not limited to a computer, a notebook, a smart phone, a tablet, or other type of terminal. Server 20 is any server that is capable of providing information needed for interactive services through a network access. The terminal device 30 can transmit and receive information to and from the server 20 via the network 40. The server 20 may obtain content required by the terminal device 30, such as model parameters, advertisement content, index files, etc., by accessing the database 50. The terminal devices (e.g., between 30_1 and 30_2 or 30_n) may also communicate with each other via the network 40. Network 40 may be a broad network for information transfer and may include one or more communication networks such as a wireless communication network, the internet, a private network, a local area network, a metropolitan area network, a wide area network, or a cellular data network.
In the following description, only a single server or terminal device is detailed, but it will be understood by those skilled in the art that the single server 20, terminal device 30, and database 50 are shown to illustrate that the technical solution of the present application involves the operation of terminal devices, servers, and databases. Detailing a single terminal device, server, and database is for ease of illustration and does not imply limitations on the types or locations of terminal devices and servers. It should be noted that the underlying concepts of the exemplary embodiments of this application are not altered if additional modules are added to or individual modules are removed from the illustrated environment. In addition, although a bi-directional arrow from the database 50 to the server 20 is shown for ease of illustration, it will be understood by those skilled in the art that the above-described data transmission and reception may also be implemented through the network 40.
At present, when advertising short videos are screened, videos of the same type as those the user has historically clicked are screened out and recommended based on the user's click history, so that the advertising short videos recommended to the user are all of the same type and the recommended advertising videos are too uniform.
Based on the above-mentioned problems, as shown in fig. 2, an embodiment of the present disclosure provides a recommendation system for multimedia resources, including a client 21, a server 22, and a user 23. The client 21 is an application client installed on the electronic device, and cooperates with the server 22 to provide services to the user 23, and the user 23 can view the content displayed by the client 21 or trigger the operations supported on the client 21.
In the embodiment of the disclosure, in response to a page display operation triggered by the user 23 on the client 21, the client 21 sends a page display request to the server 22; the server 22 obtains account data of a target account for which multimedia resources are to be recommended, and may determine a candidate multimedia resource set corresponding to the target account according to part or all of the account data; the server 22 also needs to obtain attribute data of the candidate multimedia resources in the candidate multimedia resource set. The target account is the account used by the user 23 to log into the client 21.
The server 22 performs feature extraction on account data of the target account to obtain account feature information of the target account; and the server 22 performs feature extraction on the attribute data of the candidate multimedia resources in the candidate multimedia resource set to obtain attribute feature information of the candidate multimedia resources. Then, the server 22 performs fusion processing on account feature information of the target account and attribute feature information of the candidate multimedia resources to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the predicted time length of playing the candidate multimedia resources by the target account is not less than a preset threshold value. The server 22 screens out the target multimedia resources to be recommended to the target account from the multimedia resource set according to the screening parameters corresponding to each multimedia resource in the candidate multimedia resource set, and returns the screened target multimedia resources to the client 21. After receiving the target multimedia resource returned by the server 22, the client 21 generates a presentation page containing the target multimedia resource, and presents the generated presentation page to the user 23 on the client 21.
The following describes a multimedia resource recommendation method provided in an embodiment of the present disclosure.
As shown in fig. 3, a flowchart of a method for recommending multimedia resources according to an embodiment of the disclosure includes the following steps:
in step S301, features are extracted from account data of a target account of multimedia resources to be recommended to obtain account feature information of the target account; and features are extracted from attribute data of candidate multimedia resources in a candidate multimedia resource set corresponding to the target account to obtain attribute feature information of the candidate multimedia resources;
in step S302, the account feature information and the attribute feature information are fused to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the duration of playing the candidate multimedia resources by the target account is not less than a preset threshold value;
in step S303, a target multimedia resource for recommending to the target account is selected from the candidate multimedia resource set according to the screening parameter corresponding to the candidate multimedia resource.
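The three steps above can be sketched end to end as follows; the feature extractors, fusion function, and toy data here are hypothetical stand-ins for the trained networks described later:

```python
import numpy as np

def recommend(account_data, candidates, extract_account, extract_attr, fuse, top_k=10):
    """Sketch of steps S301-S303: extract account and attribute features,
    fuse them into a screening parameter per candidate (the predicted
    probability that play duration >= the threshold), keep the top scorers."""
    account_feat = extract_account(account_data)            # step S301
    scored = []
    for res in candidates:
        attr_feat = extract_attr(res["attributes"])         # step S301
        p = fuse(account_feat, attr_feat)                   # step S302
        scored.append((p, res["id"]))
    scored.sort(reverse=True)                               # step S303
    return [rid for _, rid in scored[:top_k]]

# Toy stand-ins for the trained networks (illustrative only).
extract_account = lambda d: np.asarray(d, dtype=float)
extract_attr = lambda d: np.asarray(d, dtype=float)
fuse = lambda a, b: 1.0 / (1.0 + np.exp(-float(a @ b)))     # value in (0, 1)

candidates = [{"id": "a", "attributes": [1.0, 0.0]},
              {"id": "b", "attributes": [0.0, 1.0]}]
picked = recommend([2.0, -1.0], candidates, extract_account, extract_attr, fuse, top_k=1)
print(picked)  # ['a']
```

Candidate "a" aligns with the account vector (sigmoid(2) ≈ 0.88 versus sigmoid(-1) ≈ 0.27 for "b"), so it is screened out for recommendation.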
According to the multimedia resource recommendation method provided by the embodiments of the disclosure, account feature information of a target account is extracted from the acquired account data of the target account, and attribute features of candidate multimedia resources are extracted from attribute data of the candidate multimedia resources corresponding to the target account; screening parameters corresponding to the candidate multimedia resources are then determined from the account feature information and the attribute features. Because each screening parameter is the predicted probability that the duration for which the target account plays the candidate multimedia resource is not less than a preset threshold, the larger the screening parameter, the greater that probability. Therefore, after the target multimedia resources screened out according to the screening parameters are recommended to the target account, the target account is likely to play them for a longer time, which improves the experience of the user corresponding to the target account. In addition, the method changes the current practice of screening based only on the single type of multimedia resource the account has historically clicked, and provides multiple screening dimensions, so that resource screening is more accurate.
The account data of the target account in the embodiment of the disclosure comprises user portrait and/or historical behavior data of the target account;
user portraits for a target account include, but are not limited to:
personal information such as gender, age, region, and occupation filled in by the user when registering the account; the registration time; the type of operating system of the terminal used by the user; and the user's level of consumer spending;
the historical behavior data of the target account is the behavior data of the target account within the preset historical time period, and the historical behavior data of the target account comprises but is not limited to:
the duration of use of the short video platform; the number of short videos historically viewed by the target account; short videos clicked by the target account within the historical period; short videos liked by the target account within the historical period; and short videos the target account has marked as "dislike";
if the short advertisement videos need to be recommended to the target account, the historical behavior data of the target account are all data related to the short advertisement videos.
The attribute data of the candidate multimedia asset includes, but is not limited to:
the type of the multimedia resource, the industry to which the multimedia resource belongs, the number of times the multimedia resource has been played, whether the multimedia resource has been reported, the background music of the multimedia resource, whether the multimedia resource has been commented on, the ID of the multimedia resource, and the uploading author of the multimedia resource.
It should be noted that, if the advertisement short video needs to be recommended for the target account, the candidate multimedia resource is the advertisement resource.
According to the embodiment of the disclosure, after the account data of the target account and the attribute data of the multimedia resource are obtained, the probability that the duration for which the target account plays the multimedia resource is not less than a preset threshold needs to be predicted;
this is because a larger predicted probability that the play duration is not less than the preset threshold indicates that the user is more interested in the multimedia resource, which may therefore be recommended to the user preferentially.
When the screening parameters corresponding to the candidate multimedia resources are determined, the screening parameters corresponding to the candidate multimedia resources are determined according to account feature information of the target account and attribute features of the candidate multimedia resources;
in some embodiments of the present disclosure, the account characteristic information for the target account may be determined according to the following manner:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
Wherein the first feature extraction network comprises an embedded layer and at least one hidden layer;
the embedding layer of the first feature extraction network is used for carrying out embedding processing on the acquired account data of the target account to obtain an embedding vector containing semantic information corresponding to the target account.
The embedded layer performs feature extraction on account data of the target account to obtain discrete features and continuous features;
the discrete feature is an integer feature value extracted from account data of the target account; for example, for the gender data of the target account, the extracted feature value is an integer-type feature value of 0 or 1;
the continuous feature is a continuous feature value extracted from account data of the target account; for example, continuous features are extracted from the target account's usage duration data for the short video platform;
after extracting discrete features and continuous features from account data of a target account, an embedded vector is constructed for the discrete features by adopting the embedding method; the continuous features are first discretized, and an embedded vector is then constructed by adopting the embedding method after discretization.
It should be noted that the embedded vector constructed by the embedding method is a vector containing semantic information: a low-dimensional vector that maps the feature values extracted from the account data of the target account into a semantic space.
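A hedged sketch of the embedding construction described above: a discrete feature indexes an embedding table directly, while a continuous feature is discretized into a bucket first. The table sizes, bucket boundaries, and feature names are invented for illustration; in practice the tables would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8  # illustrative embedding dimension

# One randomly initialized table per feature (learned during training in practice).
gender_table = rng.normal(size=(2, EMB_DIM))     # discrete feature: 0 or 1
duration_table = rng.normal(size=(4, EMB_DIM))   # 4 buckets after discretization

def embed_discrete(value, table):
    """Integer feature value -> low-dimensional embedding vector."""
    return table[value]

def embed_continuous(value, boundaries, table):
    """Continuous feature value -> bucket index (discretization) -> embedding."""
    bucket = int(np.searchsorted(boundaries, value))
    return table[bucket]

gender_vec = embed_discrete(1, gender_table)
# e.g. platform usage duration in minutes, bucketed at 10 / 60 / 180 (hypothetical).
usage_vec = embed_continuous(95.0, [10.0, 60.0, 180.0], duration_table)
print(gender_vec.shape, usage_vec.shape)  # (8,) (8,)
```

Here 95 minutes falls into the third bucket (between 60 and 180), so the third row of the table is used as its embedding.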
At least one hidden layer of the first feature extraction network performs transformation processing on the embedded vector containing semantic information corresponding to the target account according to the first weight matrix corresponding to each hidden layer to obtain account feature information of the target account;
each hidden layer in the first feature extraction network corresponds to a first weight matrix;
determining the times of transformation processing on the embedded vector corresponding to the target account according to the number of hidden layers in the first feature extraction network; for example, the first feature extraction network includes two hidden layers, and then the first transformation processing is performed on the embedded vector corresponding to the target account through the first weight matrix corresponding to the first hidden layer, and the second transformation processing is performed on the result of the first transformation processing through the second weight matrix corresponding to the second hidden layer, so as to obtain the account feature information of the target account.
It should be noted that, in the embodiment of the present disclosure, the first weight matrix corresponding to each hidden layer of the first feature extraction network is determined in the training process of the first feature extraction network.
In some embodiments of the present disclosure, the attribute characteristic information of the candidate multimedia asset may be determined according to the following manner:
Based on the trained second feature extraction network, the attribute data of the candidate multimedia resources is subjected to embedding processing to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and the embedded vectors are subjected to transformation processing to obtain the attribute feature information of the candidate multimedia resources.
Wherein the second feature extraction network comprises an embedded layer and at least one hidden layer;
and the embedding layer of the second feature extraction network is used for carrying out embedding processing on the acquired attribute data of the candidate multimedia resources to obtain embedding vectors containing semantic information corresponding to the candidate multimedia resources.
The embedded layer performs feature extraction on attribute data of the candidate multimedia resources to obtain discrete features and continuous features;
the discrete feature is an integer type feature value extracted from attribute data of the candidate multimedia resource; the continuous feature is a continuous feature value extracted from attribute data of the candidate multimedia resource;
after extracting discrete features and continuous features from attribute data of candidate multimedia resources, an embedded vector is constructed for the discrete features by adopting the embedding method; the continuous features are first discretized, and an embedded vector is then constructed by adopting the embedding method after discretization.
It should be noted that the embedded vector constructed by the embedding method is a vector containing semantic information: a low-dimensional vector that maps the feature values extracted from the attribute data of the candidate multimedia resource into a semantic space.
At least one hidden layer of the second feature extraction network performs transformation processing on the embedded vector containing semantic information corresponding to the candidate multimedia resource according to the second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource;
each hidden layer in the second feature extraction network corresponds to a second weight matrix;
determining the times of transformation processing on the embedded vectors corresponding to the candidate multimedia resources according to the number of hidden layers in the second feature extraction network; for example, the second feature extraction network includes two hidden layers, and the first transformation processing is performed on the embedded vector corresponding to the candidate multimedia resource through the second weight matrix corresponding to the first hidden layer, and the second transformation processing is performed on the result of the first transformation processing through the second weight matrix corresponding to the second hidden layer, so as to obtain the attribute feature information of the candidate multimedia resource.
It should be noted that, in the embodiment of the present disclosure, the second weight matrix corresponding to each hidden layer of the second feature extraction network is determined in the training process of the second feature extraction network.
According to the embodiment of the disclosure, after the account characteristic information corresponding to the target account and the attribute characteristic information of the candidate multimedia resource are determined, the account characteristic information corresponding to the target account and the attribute characteristic information of the candidate multimedia resource are required to be fused, so that the screening parameters corresponding to the candidate multimedia resource are obtained.
In some embodiments of the present disclosure, the screening parameters corresponding to the candidate multimedia resources may be determined according to the following manner:
based on the trained fully-connected neural network, the account feature information and the attribute feature information are fused to obtain feature vectors to be output, and the feature vectors to be output are subjected to linear regression to obtain screening parameters corresponding to candidate multimedia resources.
Wherein the fully connected neural network comprises at least one hidden layer and an output layer;
at least one hidden layer of the fully-connected neural network is used for carrying out nonlinear fusion processing on account characteristic information of a target account and attribute characteristic information of candidate multimedia resources according to a third weight matrix corresponding to each hidden layer to obtain a characteristic vector to be output;
and the output layer of the fully-connected neural network carries out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
According to the embodiment of the disclosure, account feature information of a target account and attribute feature information of candidate multimedia resources form a feature matrix, and the feature matrix formed by the account feature information of the target account and the attribute feature information of the candidate multimedia resources is subjected to transformation processing according to a third weight matrix corresponding to each hidden layer to obtain screening parameters corresponding to the candidate multimedia resources;
each hidden layer in the fully-connected neural network corresponds to a third weight matrix;
determining the times of transformation processing on a feature matrix formed by account feature information of a target account and attribute feature information of candidate multimedia resources according to the number of hidden layers in the fully-connected neural network; for example, the fully-connected neural network includes two hidden layers, and the feature matrix formed by the account feature information of the target account and the attribute feature information of the candidate multimedia resource is subjected to a first transformation process through a third weight matrix corresponding to the first hidden layer, and the result of the first transformation process is subjected to a second transformation process through a third weight matrix corresponding to the second hidden layer, so as to obtain the feature vector to be output.
The output layer of the fully-connected neural network carries out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources;
It should be noted that, the screening parameter corresponding to the candidate multimedia resource in the embodiment of the present disclosure may be a value in the range of 0-1; therefore, the output layer of the fully-connected neural network carries out linear regression processing on the feature vector to be output obtained through the at least one hidden layer to obtain screening parameters in the range of 0-1.
The first feature extraction network, the second feature extraction network and the fully-connected neural network used in the process of determining the screening parameters corresponding to the candidate multimedia resources can be integrated in one network model;
taking a dual-tower separated DNN model integrating a first feature extraction network, a second feature extraction network, and a fully-connected neural network as an example, a process for determining screening parameters corresponding to candidate multimedia resources according to embodiments of the present disclosure is described below.
The dual-tower split DNN model shown in fig. 4 comprises an account-side DNN network, a multimedia-resource-side DNN network, and a fully connected neural network;
the account-side DNN network comprises an embedding layer and two hidden layers, taking as an example a first hidden layer of 256 neurons and a second hidden layer of 128 neurons;
the DNN network at the multimedia resource side comprises an embedded layer and two hidden layers, wherein the first hidden layer comprises 256 neurons, and the second hidden layer comprises 128 neurons;
The fully connected neural network comprises two hidden layers and an output layer, taking as an example a first hidden layer of 512 neurons and a second hidden layer of 128 neurons.
DNN network for account side:
inputting account data of a target account into an embedding layer of a DNN network at an account side, and performing feature extraction and embedding processing on the account data of the target account by the embedding layer to obtain an embedding vector corresponding to the account data of the target account;
for example, data such as the user's gender, age, and occupation are input into the embedding layer of the account-side DNN network; the embedding layer performs feature extraction and embedding processing on these data to obtain an embedding vector corresponding to each item of data.
Assuming that 128 embedded vectors corresponding to account data of a target account are obtained, transforming the 128 embedded vectors through a first weight matrix corresponding to a first hidden layer of a DNN network at the account side;
specifically, the 128 embedded vectors corresponding to the account data of the target account are formed into a row matrix, assumed to be [A0, A1, A2, A3, ..., A127];
The first weight matrix corresponding to the first hidden layer of the DNN network at the account side is a matrix of 128×256, and it is assumed that the first weight matrix corresponding to the first hidden layer is:
Then the row matrix composed of the embedded vectors corresponding to the target account is matrix-multiplied with the first weight matrix corresponding to the first hidden layer of the account-side DNN network to obtain [B0, B1, B2, B3, ..., B255];
The first weight matrix corresponding to the second hidden layer of the DNN network at the account side is a matrix of 256×128, and it is assumed that the first weight matrix corresponding to the second hidden layer is:
the [B0, B1, B2, B3, ..., B255] obtained from the first hidden layer of the account-side DNN network is matrix-multiplied with the first weight matrix corresponding to the second hidden layer of the account-side DNN network to obtain [C0, C1, C2, C3, ..., C127] as the account feature information of the target account.
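The two matrix multiplications of the account-side tower can be checked with a small numpy sketch; the values are random stand-ins, and since the text describes plain matrix products, no activation function is applied here:

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(1, 128))     # row matrix [A0 ... A127] of embedding values
W1 = rng.normal(size=(128, 256))  # first weight matrix: first hidden layer (128 x 256)
W2 = rng.normal(size=(256, 128))  # first weight matrix: second hidden layer (256 x 128)

B = A @ W1   # [B0 ... B255], output of the first hidden layer
C = B @ W2   # [C0 ... C127], the account feature information
print(B.shape, C.shape)  # (1, 256) (1, 128)
```

The resource-side tower follows the same pattern with its own weight matrices, producing [F0 ... F127].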
DNN network for multimedia resource side:
inputting the attribute data of the candidate multimedia resources into the embedding layer of the multimedia-resource-side DNN network, where the embedding layer performs feature extraction and embedding processing on the attribute data to obtain an embedding vector corresponding to the attribute data of the candidate multimedia resources;
for example, data such as the type of the multimedia resource, the number of times the multimedia resource has been played, and the ID of the multimedia resource are input into the embedding layer of the multimedia-resource-side DNN network; the embedding layer performs feature extraction and embedding processing on these data to obtain an embedding vector corresponding to each item of data.
Assuming 128 embedded vectors corresponding to the attribute data of the candidate multimedia resource are obtained, transforming the 128 embedded vectors through a second weight matrix corresponding to a first hidden layer of the DNN network at the multimedia resource side;
specifically, the 128 embedded vectors corresponding to the attribute data of the candidate multimedia resource are formed into a row matrix, assumed to be [D0, D1, D2, D3, ..., D127];
The second weight matrix corresponding to the first hidden layer of the DNN network at the multimedia resource side is a matrix of 128×256, and it is assumed that the second weight matrix corresponding to the first hidden layer is:
then the row matrix composed of the embedded vectors corresponding to the candidate multimedia resource is matrix-multiplied with the second weight matrix corresponding to the first hidden layer of the multimedia-resource-side DNN network to obtain [E0, E1, E2, E3, ..., E255];
The second weight matrix corresponding to the second hidden layer of the DNN network at the multimedia resource side is a matrix of 256×128, and it is assumed that the second weight matrix corresponding to the second hidden layer is:
the [E0, E1, E2, E3, ..., E255] obtained from the first hidden layer of the multimedia-resource-side DNN network is matrix-multiplied with the second weight matrix corresponding to the second hidden layer of the multimedia-resource-side DNN network to obtain [F0, F1, F2, F3, ..., F127] as the attribute feature information of the candidate multimedia resource.
The account feature information of the target account obtained by the account-side DNN network and the attribute feature information of the candidate multimedia resource obtained by the multimedia-resource-side DNN network are concatenated into a higher-dimensional matrix, which serves as the input matrix of the fully connected neural network.
For example, the account feature information of the target account, a 1×128 row matrix [C0, C1, C2, C3, ..., C127], and the attribute feature information of the candidate multimedia resource, a 1×128 row matrix [F0, F1, F2, F3, ..., F127], together compose the input matrix [C0, C1, C2, C3, ..., C127, F0, F1, F2, F3, ..., F127].
For fully connected neural networks:
processing an input matrix of the full-connection neural network through a first hidden layer of the full-connection neural network;
the third weight matrix corresponding to the first hidden layer of the fully connected neural network is 256×512, and it is assumed that the third weight matrix corresponding to the first hidden layer of the fully connected neural network is:
then the input matrix of the fully connected neural network is matrix-multiplied with the third weight matrix corresponding to the first hidden layer to obtain [G0, G1, G2, G3, ..., G511] (512 values, one per neuron of the first hidden layer);
The third weight matrix corresponding to the second hidden layer of the fully connected neural network is a matrix of 512×128, and it is assumed that the third weight matrix corresponding to the second hidden layer of the fully connected neural network is:
the [G0, G1, G2, G3, ..., G511] obtained from the first hidden layer of the fully connected neural network is matrix-multiplied with the third weight matrix corresponding to the second hidden layer of the fully connected neural network to obtain [H0, H1, H2, H3, ..., H127] as the feature vector to be output.
The output layer of the fully connected neural network performs regression processing on the feature vector to be output; in implementation, the fully connected neural network may perform logistic regression on the feature vector to be output through a softmax function to obtain the screening parameter of the candidate multimedia resource;
the output layer of the fully-connected neural network performs transformation processing on the matrix of the feature vector to be output through a transformation matrix of 128 x 1;
assume that the transformation matrix of the output layer of the fully connected neural network is:
[W0, W1, W2, W3, ..., W127]^T
in implementation, the matrix formed by the feature vector to be output is matrix-multiplied with the transformation matrix of the output layer to obtain a 1×1 value; the softmax function then normalizes the value obtained by the matrix multiplication to obtain a screening parameter in the range of 0-1.
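The fully connected part can be put together in one sketch. The weights are scaled random stand-ins, and a sigmoid is used here for the final normalization step, since a softmax over a single logit is degenerate; note that the first hidden layer of 512 neurons yields 512 intermediate values.

```python
import numpy as np

rng = np.random.default_rng(0)

C = rng.normal(size=128)    # account feature information (account-side tower)
F = rng.normal(size=128)    # attribute feature information (resource-side tower)
x = np.concatenate([C, F])  # the 256-value input [C0..C127, F0..F127]

def init(rows, cols):
    # Scale by fan-in so intermediate values stay moderate (illustrative init).
    return rng.normal(size=(rows, cols)) / np.sqrt(rows)

W3a = init(256, 512)   # third weight matrix, first hidden layer (256 x 512)
W3b = init(512, 128)   # third weight matrix, second hidden layer (512 x 128)
Wout = init(128, 1)    # output transformation [W0 ... W127]^T (128 x 1)

G = x @ W3a                           # 512 intermediate values
H = G @ W3b                           # [H0 ... H127], feature vector to be output
logit = (H @ Wout).item()             # the 1 x 1 value
score = 1.0 / (1.0 + np.exp(-logit))  # screening parameter in the range 0-1
print(G.shape, H.shape, 0.0 < score < 1.0)  # (512,) (128,) True
```

The resulting score is the screening parameter: the predicted probability that the target account plays the candidate resource for no less than the preset threshold.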
The embodiment of the disclosure provides a method for training a first feature extraction network, a second feature extraction network and a fully-connected neural network; the training process of the first feature extraction network, the second feature extraction network and the fully-connected neural network is described in detail below.
1. Collecting a training sample set:
log data of multimedia resources historically clicked by all accounts is acquired from the data system; account data of sample accounts, and attribute data of the multimedia resources clicked by those sample accounts, are separated from the acquired log data.
For example, account A historically clicked multimedia resource a, multimedia resource b and multimedia resource c; account B historically clicked multimedia resource d; account C historically clicked multimedia resource e, multimedia resource f, and so on.
When the training sample set is collected, each training sample is formed from the account data of a sample account, the attribute data of a sample multimedia resource played by that sample account within a preset historical duration, and the training label of the training sample.
The training labels of the training samples are labeled in advance; the training sample set comprises a positive training sample and a negative training sample;
in the implementation, determining a training label of a training sample according to the time length of playing the multimedia resource by the sample account;
if the time length of the sample account for playing the sample multimedia resource in the preset history time length is not less than the preset threshold value, marking the training label of the training sample as a positive sample label;
If the time length of playing the sample multimedia resource in the preset historical time length of the sample account is smaller than the preset threshold value, marking the training label of the training sample as a negative sample label.
For example, assuming that the preset threshold is 3s: after account A historically clicked multimedia resource a, the duration for which it watched multimedia resource a was longer than 3s, so the training label of the training sample formed by account A and multimedia resource a is marked as a positive sample label;
for another example, assuming that the preset threshold is 3s: after account B historically clicked multimedia resource a, the duration for which it watched multimedia resource a was less than 3s, so the training label of the training sample formed by account B and multimedia resource a is marked as a negative sample label.
An alternative embodiment is to label the positive sample label of the training sample with 1 and the negative sample label of the training sample with 0.
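The labeling rule can be written directly; the 3s threshold is the example value used above:

```python
PLAY_THRESHOLD_S = 3.0  # example preset threshold from the text

def label_sample(play_duration_s: float) -> int:
    """Positive sample label (1) if the play duration is not less than the
    threshold, negative sample label (0) otherwise."""
    return 1 if play_duration_s >= PLAY_THRESHOLD_S else 0

assert label_sample(5.0) == 1  # e.g. account A watched resource a for more than 3s
assert label_sample(1.5) == 0  # e.g. account B watched resource a for less than 3s
```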
2. Training the first feature extraction network, the second feature extraction network and the fully connected neural network:
as shown in fig. 5, a flowchart of a method for training a first feature extraction network, a second feature extraction network, and a fully-connected neural network according to an embodiment of the disclosure includes the following steps:
in step S501, feature extraction is performed on account data of a sample account in a training sample based on an initial first feature extraction network, so as to obtain account feature information of the sample account; and
In step S502, feature extraction is performed on attribute data of the sample multimedia resources in the training sample based on the initial second feature extraction network, so as to obtain attribute feature information of the sample multimedia resources;
in step S503, the account feature information of the sample account and the attribute feature information of the sample multimedia resource are fused based on the initial fully connected neural network, so as to obtain screening parameters corresponding to the sample multimedia resource;
in step S504, the first feature extraction network, the second feature extraction network and the fully-connected neural network are adjusted according to the screening parameters corresponding to the sample multimedia resources and the training labels of the labeled training samples until the determined loss value is within the preset range, so as to obtain the trained first feature extraction network, second feature extraction network and fully-connected neural network.
In practice, embodiments of the present disclosure may adjust the first feature extraction network, the second feature extraction network, and the fully-connected neural network according to the following manner:
determining a loss value according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples; and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
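A minimal sketch of this adjustment loop, assuming binary cross-entropy as the loss and plain gradient descent, with a single weight matrix standing in for the three networks (the real model adjusts the first, second and third weight matrices jointly):

```python
import numpy as np

rng = np.random.default_rng(1)

# Single stand-in weight matrix for illustration only.
W = rng.standard_normal((8, 1)) * 0.1

def forward(X):
    return 1.0 / (1.0 + np.exp(-(X @ W)))  # screening parameters in 0-1

# Synthetic training samples: fused features and 0/1 training labels.
X = rng.standard_normal((64, 8))
y = (X @ rng.standard_normal((8, 1)) > 0).astype(float)

lr = 0.5
for step in range(500):
    p = forward(X)
    # Binary cross-entropy between screening parameters and training labels.
    loss = -np.mean(y * np.log(p + 1e-7) + (1 - y) * np.log(1 - p + 1e-7))
    if loss < 0.1:                       # loss value within the preset range
        break
    W -= lr * (X.T @ (p - y) / len(X))   # gradient step on the weight matrix
```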
After the screening parameter of each candidate multimedia resource in the candidate multimedia resource set is determined in the above manner, the embodiment of the disclosure screens target multimedia resources out of the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources.
Embodiments of the present disclosure may screen target multimedia resources from the candidate multimedia resource set in either of the following manners:
Mode 1: according to the screening parameters corresponding to the candidate multimedia resources, the candidate multimedia resources in the candidate multimedia resource set are sorted in descending order of screening parameter, and the top-ranked candidate multimedia resources are taken as target multimedia resources.
Mode 2: for each candidate multimedia resource in the candidate multimedia resource set, the click rate, praise rate and attention rate of the target account for the candidate multimedia resource are predicted; a ranking parameter of the candidate multimedia resource is determined comprehensively from the predicted click rate, praise rate and attention rate together with the screening parameter corresponding to the candidate multimedia resource; the candidate multimedia resources in the candidate multimedia resource set are then sorted in descending order of ranking parameter, and the top-ranked candidate multimedia resources are taken as target multimedia resources.
In mode 2, the embodiment of the disclosure may take a weighted average of the target account's click rate, praise rate and attention rate for the candidate multimedia resource together with the screening parameter corresponding to the candidate multimedia resource, so as to determine the ranking parameter of the candidate multimedia resource.
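The mode-2 weighted average can be sketched as follows; the weight values are hypothetical (the text does not fix them), chosen here to sum to 1:

```python
def ranking_parameter(click_rate, praise_rate, attention_rate, screening_param,
                      weights=(0.4, 0.2, 0.1, 0.3)):
    """Weighted average of the predicted rates and the screening parameter;
    the weights are hypothetical example values."""
    values = (click_rate, praise_rate, attention_rate, screening_param)
    return sum(w * v for w, v in zip(weights, values))

candidates = {
    "asset_a": ranking_parameter(0.30, 0.10, 0.02, 0.80),
    "asset_b": ranking_parameter(0.50, 0.05, 0.01, 0.40),
}
# Sort in descending order of ranking parameter; the top-ranked candidates
# become the target multimedia resources.
ranked = sorted(candidates, key=candidates.get, reverse=True)
```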
According to the embodiment of the disclosure, after the target multimedia resource is determined, a presentation page containing the target multimedia resource is generated, and the presentation page is displayed on the client so as to recommend the target multimedia resource to the user.
As shown in fig. 6, a complete flowchart of a multimedia resource recommendation method according to an embodiment of the disclosure includes the following steps:
in step S601, account data of a target account triggering a request is acquired in response to a request for displaying a page;
in step S602, determining a candidate multimedia resource set corresponding to the target account;
in step S603, attribute data of candidate multimedia resources in the candidate multimedia resource set corresponding to the target account is obtained;
in step S604, based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account;
In step S605, based on the trained second feature extraction network, the attribute data of the candidate multimedia resources are subjected to embedding processing to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and the embedded vectors are subjected to transformation processing to obtain attribute feature information of the candidate multimedia resources;
in step S606, based on the trained fully connected neural network, the account feature information and the attribute feature information are fused to obtain a feature vector to be output, and the feature vector to be output is subjected to linear regression to obtain a screening parameter corresponding to the candidate multimedia resource;
wherein the screening parameter is a probability value that the duration of playing the candidate multimedia resource by the target account is not less than a preset threshold;
in step S607, a target multimedia resource for recommending to the target account is selected from the candidate multimedia resource set according to the screening parameter corresponding to the candidate multimedia resource;
in step S608, a presentation page containing the target multimedia resource is generated, and the generated presentation page is presented to the user.
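Steps S601-S608 can be condensed into a short control-flow sketch; the three `*_net` functions below are toy, hypothetical stand-ins for the trained networks and serve only to illustrate the flow:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for the trained networks; shapes and scoring are hypothetical.
def first_net(account):                   # account feature information (S604)
    return rng.standard_normal(4)

def second_net(asset):                    # attribute feature information (S605)
    return np.full(4, (len(asset) % 5) / 10.0)

def fused_net(acc_feat, attr_feat):       # screening parameter (S606)
    return 1.0 / (1.0 + np.exp(-float(acc_feat @ attr_feat)))

def recommend(account, candidates, k=2):
    """Score every candidate multimedia resource, sort in descending order
    of screening parameter, and keep the top-k as targets (S607)."""
    acc_feat = first_net(account)
    scored = [(c, fused_net(acc_feat, second_net(c))) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [c for c, _ in scored[:k]]

page = recommend("account_A", ["asset_a", "asset_b", "asset_cc"])
```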
Fig. 7 is a block diagram of a multimedia resource recommendation device, according to an exemplary embodiment. Referring to fig. 7, the apparatus includes a feature extraction unit 701, a fusion processing unit 702, and a screening unit 703;
A feature extraction unit 701, configured to perform feature extraction on account data of a target account of a multimedia resource to be recommended to obtain account feature information of the target account; extracting the characteristics of the attribute data of the candidate multimedia resources in the candidate multimedia resource set corresponding to the target account to obtain the attribute characteristic information of the candidate multimedia resources;
the fusion processing unit 702 is configured to perform fusion processing on the account feature information and the attribute feature information to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the duration of playing the candidate multimedia resources by the target account is not less than a preset threshold value;
and a screening unit 703 configured to perform screening, according to the screening parameters corresponding to the candidate multimedia resources, the target multimedia resources for recommendation to the target account from the candidate multimedia resource set.
In a possible implementation manner, the feature extraction unit 701 is specifically configured to perform:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
In one possible implementation, the trained first feature extraction network includes an embedded layer and at least one hidden layer; the feature extraction unit 701 is specifically configured to perform:
based on the embedding layer of the trained first feature extraction network, embedding the account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account;
and based on at least one hidden layer of the trained first feature extraction network, carrying out transformation processing on the embedded vector according to a first weight matrix corresponding to each hidden layer to obtain account feature information of the target account.
In a possible implementation manner, the feature extraction unit 701 is specifically configured to perform:
and based on the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and carrying out transformation processing on the embedded vectors to obtain the attribute feature information of the candidate multimedia resources.
In one possible implementation, the trained second feature extraction network includes an embedded layer and at least one hidden layer; the feature extraction unit 701 is specifically configured to perform:
Based on the embedding layer of the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources;
and based on at least one hidden layer of the trained second feature extraction network, carrying out transformation processing on the embedded vector according to a second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource.
In one possible implementation, the fusion processing unit 702 is specifically configured to perform:
and based on the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information to obtain a feature vector to be output, and carrying out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
In one possible implementation, the trained fully-connected neural network includes at least one hidden layer and an output layer; the fusion processing unit 702 is specifically configured to perform:
based on at least one hidden layer of the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information according to a third weight matrix corresponding to each hidden layer to obtain a feature vector to be output;
And carrying out linear regression processing on the feature vector to be output based on the output layer of the trained fully-connected neural network to obtain the screening parameters corresponding to the candidate multimedia resources.
In one possible implementation, fig. 8 is a block diagram of a multimedia resource recommendation device according to an exemplary embodiment, the device further comprising a training unit 704;
the training unit 704 is configured to perform training of the first feature extraction network, the second feature extraction network, the fully connected neural network according to the following manner:
performing feature extraction on account data of a sample account in a training sample based on an initial first feature extraction network to obtain account feature information of the sample account; performing feature extraction on attribute data of the sample multimedia resources in the training sample based on an initial second feature extraction network to obtain attribute feature information of the sample multimedia resources;
based on an initial fully-connected neural network, carrying out fusion processing on account characteristic information of the sample account and attribute characteristic information of the sample multimedia resource to obtain screening parameters corresponding to the sample multimedia resource;
And adjusting the first feature extraction network, the second feature extraction network and the fully-connected neural network according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples until the determined loss value is within a preset range, so as to obtain the trained first feature extraction network, second feature extraction network and fully-connected neural network.
In a possible implementation manner, the training unit 704 is specifically configured to perform training of the first feature extraction network, the second feature extraction network, and the fully-connected neural network according to the following manner:
determining a loss value according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples;
and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
In one possible implementation manner, the sample multimedia resource in the training sample is a multimedia resource played by the sample account in a preset history duration;
The training unit 704 is specifically configured to perform labeling of training labels of training samples according to the following manner:
if the duration for which the sample account plays the sample multimedia resource within the preset historical duration is not less than a preset threshold, marking the training label of the training sample as a positive sample label; or
if the duration for which the sample account plays the sample multimedia resource within the preset historical duration is less than the preset threshold, marking the training label of the training sample as a negative sample label.
The specific manner in which the respective units of the apparatus in the above embodiment perform their operations has been described in detail in the embodiments of the method, and will not be elaborated here.
Fig. 9 is a block diagram of an electronic device 900, shown in accordance with an exemplary embodiment, comprising:
a processor 910;
a memory 920 for storing instructions executable by the processor 910;
wherein the processor 910 is configured to execute the instructions to implement the multimedia asset recommendation method in the embodiments of the present disclosure.
In an exemplary embodiment, a non-volatile storage medium is also provided, such as the memory 920, including instructions executable by the processor 910 of the electronic device 900 to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The disclosed embodiments also provide a computer program product which, when run on an electronic device, causes the electronic device to perform any of the multimedia resource recommendation methods described above in the embodiments of the disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (22)

1. A method for recommending multimedia resources, comprising:
Carrying out feature extraction on account data of a target account of a multimedia resource to be recommended to obtain account feature information of the target account; extracting the characteristic of the attribute data of the candidate multimedia resources in the candidate multimedia resource set corresponding to the target account to obtain the attribute characteristic information of the candidate multimedia resources;
carrying out fusion processing on the account characteristic information and the attribute characteristic information to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the duration of playing the candidate multimedia resources by the target account is not less than a preset threshold value;
and screening target multimedia resources which are recommended to the target account from the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources.
2. The method of claim 1, wherein the feature extracting the account data of the target account to obtain the account feature information of the target account comprises:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
3. The method of claim 2, wherein the trained first feature extraction network comprises an embedded layer and at least one hidden layer;
the embedding processing of the account data of the target account to obtain an embedding vector containing semantic information corresponding to the target account comprises the following steps:
based on the embedded layer of the trained first feature extraction network, carrying out embedding processing on account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account;
the transforming the embedded vector to obtain account feature information of the target account includes:
and based on at least one hidden layer of the trained first feature extraction network, carrying out transformation processing on the embedded vector according to a first weight matrix corresponding to each hidden layer to obtain account feature information of the target account.
4. The method of claim 2, wherein the feature extracting the attribute data of the candidate multimedia resources in the candidate multimedia resource set corresponding to the target account to obtain attribute feature information of the candidate multimedia resources includes:
And based on the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and carrying out transformation processing on the embedded vectors to obtain the attribute feature information of the candidate multimedia resources.
5. The method of claim 4, wherein the trained second feature extraction network comprises an embedded layer and at least one hidden layer;
the embedding processing is performed on the attribute data of the candidate multimedia resources to obtain embedding vectors containing semantic information corresponding to the candidate multimedia resources, including:
based on the embedded layer of the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources;
the transforming the embedded vector to obtain attribute characteristic information of the candidate multimedia resource includes:
and based on at least one hidden layer of the trained second feature extraction network, carrying out transformation processing on the embedded vector according to a second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource.
6. The method of claim 4, wherein the fusing the account feature information and the attribute feature information to obtain the filtering parameters corresponding to the candidate multimedia resources comprises:
and based on the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information to obtain a feature vector to be output, and carrying out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
7. The method of claim 6, wherein the trained fully connected neural network comprises at least one hidden layer and an output layer;
the fusing processing is performed on the account feature information and the attribute feature information to obtain a feature vector to be output, which comprises the following steps:
based on at least one hidden layer of the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information according to a third weight matrix corresponding to each hidden layer to obtain a feature vector to be output;
and performing linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources, wherein the screening parameters comprise:
And carrying out linear regression processing on the feature vector to be output based on the output layer of the trained fully-connected neural network to obtain the screening parameters corresponding to the candidate multimedia resources.
8. The method of claim 6, wherein the first feature extraction network, the second feature extraction network, the fully connected neural network are trained according to the following:
performing feature extraction on account data of a sample account in a training sample based on an initial first feature extraction network to obtain account feature information of the sample account; performing feature extraction on attribute data of the sample multimedia resources in the training sample based on an initial second feature extraction network to obtain attribute feature information of the sample multimedia resources;
based on an initial fully-connected neural network, carrying out fusion processing on account characteristic information of the sample account and attribute characteristic information of the sample multimedia resource to obtain screening parameters corresponding to the sample multimedia resource;
and adjusting the first feature extraction network, the second feature extraction network and the fully-connected neural network according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples until the determined loss value is within a preset range, so as to obtain the trained first feature extraction network, second feature extraction network and fully-connected neural network.
9. The method of claim 8, wherein the first feature extraction network, the second feature extraction network, and the fully-connected neural network are adjusted according to the following:
determining a loss value according to the screening parameters corresponding to the sample multimedia resources and the training labels of the marked training samples;
and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
10. The method of claim 8, wherein the sample multimedia assets in the training sample are multimedia assets played by the sample account for a preset historical duration;
labeling training labels of training samples according to the following manner:
if the time length of the sample account for playing the sample multimedia resource in the preset historical time length is not less than a preset threshold value, marking the training label of the training sample as a positive sample label; or
If the time length of the sample account for playing the sample multimedia resource in the preset history time length is smaller than a preset threshold value, marking the training label of the training sample as a negative sample label.
11. A recommendation device for multimedia resources, comprising:
the feature extraction unit is configured to perform feature extraction on account data of a target account of the multimedia resource to be recommended to obtain account feature information of the target account; extracting the characteristic of the attribute data of the candidate multimedia resources in the candidate multimedia resource set corresponding to the target account to obtain the attribute characteristic information of the candidate multimedia resources;
the fusion processing unit is configured to perform fusion processing on the account feature information and the attribute feature information to obtain screening parameters corresponding to the candidate multimedia resources; the screening parameter is a probability value that the duration of playing the candidate multimedia resources by the target account is not less than a preset threshold value;
and the screening unit is configured to perform screening of target multimedia resources which are recommended to the target account from the candidate multimedia resource set according to the screening parameters corresponding to the candidate multimedia resources.
12. The apparatus of claim 11, wherein the feature extraction unit is specifically configured to perform:
based on the trained first feature extraction network, embedding account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account, and transforming the embedded vector to obtain account feature information of the target account.
13. The apparatus of claim 12, wherein the trained first feature extraction network comprises an embedded layer and at least one hidden layer;
the feature extraction unit is specifically configured to perform:
based on the embedding layer of the trained first feature extraction network, embedding the account data of the target account to obtain an embedded vector containing semantic information corresponding to the target account;
and based on at least one hidden layer of the trained first feature extraction network, carrying out transformation processing on the embedded vector according to a first weight matrix corresponding to each hidden layer to obtain account feature information of the target account.
14. The apparatus of claim 12, wherein the feature extraction unit is specifically configured to perform:
and based on the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources, and carrying out transformation processing on the embedded vectors to obtain the attribute feature information of the candidate multimedia resources.
15. The apparatus of claim 14, wherein the trained second feature extraction network comprises an embedded layer and at least one hidden layer;
The feature extraction unit is specifically configured to perform:
based on the embedding layer of the trained second feature extraction network, carrying out embedding processing on the attribute data of the candidate multimedia resources to obtain embedded vectors containing semantic information corresponding to the candidate multimedia resources;
and based on at least one hidden layer of the trained second feature extraction network, carrying out transformation processing on the embedded vector according to a second weight matrix corresponding to each hidden layer to obtain attribute feature information of the candidate multimedia resource.
16. The apparatus of claim 14, wherein the fusion processing unit is specifically configured to perform:
and based on the trained fully-connected neural network, carrying out fusion processing on the account feature information and the attribute feature information to obtain a feature vector to be output, and carrying out linear regression processing on the feature vector to be output to obtain screening parameters corresponding to the candidate multimedia resources.
17. The apparatus of claim 16, wherein the trained fully connected neural network comprises at least one hidden layer and an output layer;
the fusion processing unit is specifically configured to perform:
based on the at least one hidden layer of the trained fully connected neural network, fusing the account feature information and the attribute feature information according to a third weight matrix corresponding to each hidden layer, to obtain a feature vector to be output;
and based on the output layer of the trained fully connected neural network, performing linear regression processing on the feature vector to be output, to obtain the screening parameters corresponding to the candidate multimedia resources.
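The fusion and output stages of claims 16–17 amount to concatenating the two feature vectors, passing them through a hidden layer, and regressing to a scalar. A minimal sketch, with assumed sizes, an assumed ReLU, and a sigmoid squashing the output into [0, 1] (the claims say only "linear regression processing"):

```python
import numpy as np

rng = np.random.default_rng(1)

acct = rng.normal(size=8)             # account feature information (tower 1)
attr = rng.normal(size=8)             # attribute feature information (tower 2)

W3 = rng.normal(size=(16, 4)) * 0.1   # third weight matrix (hidden layer)
w_out = rng.normal(size=4)            # output-layer weights
b_out = 0.0                           # output-layer bias

# Hidden layer: fuse the two feature vectors by concatenating them and
# transforming with the third weight matrix.
fused = np.maximum(np.concatenate([acct, attr]) @ W3, 0.0)

# Output layer: regress the fused vector to a scalar screening parameter
# (sigmoid assumed so the parameter can rank candidates on [0, 1]).
score = 1.0 / (1.0 + np.exp(-(fused @ w_out + b_out)))
print(0.0 <= score <= 1.0)  # True
```

Candidates would then be ranked or filtered by this screening parameter.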
18. The apparatus of claim 16, further comprising a training unit;
the training unit is configured to perform training of the first feature extraction network, the second feature extraction network, and the fully connected neural network in the following manner:
performing feature extraction on account data of a sample account in a training sample based on an initial first feature extraction network, to obtain account feature information of the sample account; and performing feature extraction on attribute data of a sample multimedia resource in the training sample based on an initial second feature extraction network, to obtain attribute feature information of the sample multimedia resource;
based on an initial fully connected neural network, fusing the account feature information of the sample account and the attribute feature information of the sample multimedia resource, to obtain screening parameters corresponding to the sample multimedia resource;
and adjusting the first feature extraction network, the second feature extraction network, and the fully connected neural network according to the screening parameters corresponding to the sample multimedia resource and the training labels marked for the training samples, until a determined loss value is within a preset range, to obtain the trained first feature extraction network, second feature extraction network, and fully connected neural network.
19. The apparatus of claim 18, wherein the training unit is specifically configured to perform training of the first feature extraction network, the second feature extraction network, and the fully connected neural network in the following manner:
determining a loss value according to the screening parameters corresponding to the sample multimedia resource and the training labels marked for the training samples;
and adjusting a first weight matrix of the first feature extraction network, a second weight matrix of the second feature extraction network and a third weight matrix of the fully-connected neural network according to the determined loss value.
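The training loop of claims 18–19 — compute a screening parameter, measure its loss against the marked label, adjust the weight matrices, and stop once the loss falls within a preset range — can be illustrated on a single weight vector. Everything concrete here (logistic loss, learning rate, the 0.05 stopping threshold) is an assumption for the sketch; the claims name none of these.

```python
import numpy as np

def training_step(w, x, label, lr=0.5):
    # Screening parameter for the sample (sigmoid output assumed).
    score = 1.0 / (1.0 + np.exp(-(x @ w)))
    # Loss between the screening parameter and the marked training label
    # (binary cross-entropy assumed).
    loss = -(label * np.log(score) + (1.0 - label) * np.log(1.0 - score))
    # Gradient of the logistic loss w.r.t. w; adjust the weights with it.
    grad = (score - label) * x
    return w - lr * grad, float(loss)

w = np.zeros(4)                        # stands in for the weight matrices
x = np.array([1.0, 0.5, -0.5, 2.0])    # fused features of one sample
_, first_loss = training_step(w, x, label=1.0)

loss = first_loss
# "until the determined loss value is within a preset range"
while loss > 0.05:
    w, loss = training_step(w, x, label=1.0)
print(loss < first_loss)  # True
```

In a real two-tower model the same gradient signal would flow back through the fully connected network into both feature extraction networks, updating the first, second, and third weight matrices jointly.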
20. The apparatus of claim 18, wherein the sample multimedia resource in the training sample is a multimedia resource played by the sample account within a preset historical duration;
the training unit is specifically configured to mark the training labels of the training samples in the following manner:
if the sample account played the sample multimedia resource within the preset historical duration, marking the training label of the training sample as a positive sample label; or
if the play duration of the sample multimedia resource played by the sample account within the preset historical duration is less than a preset threshold, marking the training label of the training sample as a negative sample label.
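Claim 20's labeling rule reduces to a small predicate. A hedged sketch — the 0.5 completion-ratio threshold and the exact "play" metric (duration vs. completion ratio) are illustrative assumptions, since the claim only says "a preset threshold":

```python
PLAY_THRESHOLD = 0.5  # hypothetical preset threshold (completion ratio assumed)

def label_training_sample(played_in_window: bool, play_ratio: float) -> int:
    # Positive sample label: the account played the resource within the
    # preset history window and the play met the threshold.
    # Negative sample label: the play fell below the preset threshold
    # (or the resource was not played in the window at all).
    if played_in_window and play_ratio >= PLAY_THRESHOLD:
        return 1
    return 0

print(label_training_sample(True, 0.9))   # 1
print(label_training_sample(True, 0.1))   # 0
```

These labels are exactly what the loss in claim 19 is computed against.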
21. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method for recommending multimedia resources according to any one of claims 1 to 10.
22. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for recommending multimedia resources according to any one of claims 1 to 10.
CN202010478317.1A 2020-05-29 2020-05-29 Recommendation method and device for multimedia resources, electronic equipment and storage medium Active CN113742567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010478317.1A CN113742567B (en) 2020-05-29 2020-05-29 Recommendation method and device for multimedia resources, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113742567A CN113742567A (en) 2021-12-03
CN113742567B true CN113742567B (en) 2023-08-22

Family

ID=78724881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010478317.1A Active CN113742567B (en) 2020-05-29 2020-05-29 Recommendation method and device for multimedia resources, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113742567B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363671B (en) * 2021-12-31 2024-03-19 北京达佳互联信息技术有限公司 Multimedia resource pushing method, model training method, device and storage medium
CN114611517B (en) * 2022-03-15 2023-07-25 平安科技(深圳)有限公司 Named entity recognition method, device, equipment and medium based on deep learning
CN114579869B (en) * 2022-05-05 2022-07-22 腾讯科技(深圳)有限公司 Model training method and related product
CN115809334B (en) * 2022-11-22 2023-11-10 北京百度网讯科技有限公司 Training method of event relevance classification model, text processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886090A (en) * 2014-03-31 2014-06-25 北京搜狗科技发展有限公司 Content recommendation method and device based on user favorites
CN106445922A (en) * 2016-10-09 2017-02-22 合网络技术(北京)有限公司 Method and device for determining title of multimedia resource
CN109408724A (en) * 2018-11-06 2019-03-01 北京达佳互联信息技术有限公司 Multimedia resource estimates the determination method, apparatus and server of clicking rate
CN109639786A (en) * 2018-12-04 2019-04-16 北京达佳互联信息技术有限公司 Distribution method, device, server and the storage medium of multimedia resource
CN110197386A (en) * 2018-04-12 2019-09-03 腾讯科技(深圳)有限公司 Media resource method for pushing and device, storage medium and electronic device
CN110909184A (en) * 2019-11-19 2020-03-24 北京达佳互联信息技术有限公司 Multimedia resource display method, device, equipment and medium


Similar Documents

Publication Publication Date Title
CN113742567B (en) Recommendation method and device for multimedia resources, electronic equipment and storage medium
CN109543111B (en) Recommendation information screening method and device, storage medium and server
CN106326391B (en) Multimedia resource recommendation method and device
CN107924401A (en) Video recommendations based on video title
EP3893181A1 (en) Method and system for matching location-based content
CN113382301A (en) Video processing method, storage medium and processor
CN111597446B (en) Content pushing method and device based on artificial intelligence, server and storage medium
CN113535991B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN102693252A (en) System and method for effectively providing entertainment recommendations to device users
US20140068515A1 (en) System and method for classifying media
CN105184708A (en) Overseas study application matching method and system
CN107103028A (en) A kind of information processing method and device
CN111552835B (en) File recommendation method, device and server
CN115618024A (en) Multimedia recommendation method and device and electronic equipment
US20230316106A1 (en) Method and apparatus for training content recommendation model, device, and storage medium
KR101486924B1 (en) Method for recommanding media contents using social network service
CN112464106A (en) Object recommendation method and device
CN113946753B (en) Service recommendation method, device, equipment and storage medium based on location fence
CN116401420A (en) Searching method, device, medium and equipment based on multi-mode feature fusion
CN115964520A (en) Metadata tag identification
CN115374348A (en) Information recommendation method, information recommendation device and readable storage medium
CN114996553A (en) Dynamic video cover generation method
CN113538030B (en) Content pushing method and device and computer storage medium
CN113761364B (en) Multimedia data pushing method and device
CN112379919B (en) Service customization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant