CN113032589B - Multimedia file recommendation method and device, electronic equipment and readable storage medium

Info

Publication number: CN113032589B
Application number: CN202110336117.7A
Authority: CN
Inventors: 查强
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2021-03-29
Filing date: 2021-03-29
Publication date: 2024-01-23
Anticipated expiration: 2041-03-29
Also published as: CN113032589A

Abstract

Embodiments of the invention provide a multimedia file recommendation method and device, electronic equipment and a readable storage medium, relate to the field of computer technology, and can recommend to a user the multimedia files that the user is interested in. The method comprises the following steps: acquiring a historical access record of a user to be recommended, wherein the historical access record comprises the multimedia files accessed by the user to be recommended in a specified time period; generating a historical access tag set of the user to be recommended based on the multimedia files included in the historical access record; acquiring a tag set of each multimedia file to be recommended; determining, based on a preset deep semantic matching model, the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended; and selecting, according to these similarities, the files to be recommended to the user to be recommended.

Description

Multimedia file recommendation method and device, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for recommending multimedia files, an electronic device, and a readable storage medium.

Background

With the development of network technology, users access various websites more and more frequently to browse multimedia files of interest to themselves through the websites. For example, users often view their favorite videos by accessing a video website.

However, a website typically recommends the same content to every user, whereas the multimedia files of interest differ from user to user. As a result, the content recommended by the website cannot meet users' personalized requirements.

Disclosure of Invention

The embodiment of the invention aims to provide a multimedia file recommending method, a device, electronic equipment and a readable storage medium, so as to recommend multimedia files interested by a user to the user, thereby realizing personalized recommendation. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides a method for recommending multimedia files, which may include:

acquiring a history access record of a user to be recommended, wherein the history access record comprises the multimedia files accessed by the user to be recommended in a specified time period;

generating a history access tag set of the user to be recommended based on the multimedia files included in the history access record;

acquiring a tag set of each multimedia file to be recommended;

determining, based on a preset deep semantic matching model, the similarity between the semantic vector corresponding to the history access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended; the preset deep semantic matching model is a model obtained by training an initial deep semantic matching model based on a sample training set, wherein the sample training set comprises a sample history access tag set, a positive sample tag set and a negative sample tag set of each sample user; for each sample user, the sample history access tag set of that sample user includes: tags corresponding to the multimedia files accessed by the sample user in the historical time period; the positive sample tag set of the sample user includes: tags corresponding to the multimedia file accessed by the sample user last time; the negative sample tag set of the sample user includes: tags corresponding to the multimedia files accessed in the historical time period by a preset number of other sample users;

and selecting, according to the similarity between the semantic vector corresponding to the history access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended, the file to be recommended to the user to be recommended.

In one possible implementation manner, the preset deep semantic matching model is obtained through training by the following steps:

constructing the sample training set;

inputting the sample history access tag set, positive sample tag set and negative sample tag set of one sample user included in the sample training set into the initial deep semantic matching model, and obtaining a first similarity and a second similarity output by the initial deep semantic matching model, wherein the first similarity is the semantic similarity between the sample history access tag set and the positive sample tag set of the sample user, and the second similarity is the semantic similarity between the sample history access tag set and the negative sample tag set of the sample user;

calculating a loss function value based on the first similarity and the second similarity, and judging whether the initial deep semantic matching model has converged according to the loss function value;

if the initial deep semantic matching model has not converged, adjusting network parameters of the initial deep semantic matching model according to the loss function value, and returning to the step of inputting the sample history access tag set, positive sample tag set and negative sample tag set of one sample user included in the sample training set into the initial deep semantic matching model;

and if the initial deep semantic matching model has converged, taking the current initial deep semantic matching model as the preset deep semantic matching model.

In one possible implementation, the constructing the sample training set includes:

for each sample user, acquiring the multimedia files accessed by the sample user in the historical time period;

the acquired attribute information of the multimedia file is segmented to obtain a sample history access tag set of the sample user;

the attribute information of the multimedia file accessed by the sample user at the last time is segmented to obtain a positive sample label set of the sample user;

carrying out random negative sampling on the sample users other than the sample user, to obtain the sample history access tag sets of a preset number of other sample users, which are used as the negative sample tag set of the sample user;

and constructing the sample training set from the sample history access tag sets, positive sample tag sets and negative sample tag sets of a plurality of sample users.

In a possible implementation manner, the determining, based on a preset deep semantic matching model, of the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended includes:

for each multimedia file to be recommended, inputting the tag set of the multimedia file to be recommended and the history access tag set of the user to be recommended into the preset deep semantic matching model, and obtaining the similarity, output by the preset deep semantic matching model, between the semantic vector corresponding to the tag set of the multimedia file to be recommended and the semantic vector corresponding to the history access tag set;

or, for each multimedia file to be recommended, inputting the tag set of the multimedia file and the history access tag set of the user to be recommended into the preset deep semantic matching model, and acquiring a first semantic vector corresponding to the tag set of the multimedia file to be recommended and a second semantic vector corresponding to the history access tag set, both output by the preset deep semantic matching model; and calculating the similarity between the first semantic vector and the second semantic vector based on a preset similarity algorithm;

or, for each multimedia file to be recommended, acquiring the word vector of each tag in the tag set of the multimedia file from a cache file, and calculating the semantic vector corresponding to the multimedia file according to the acquired word vectors; the cache file caches the word vectors of a plurality of tags, namely the word vector of each tag that is output by the word vector expression layer of the preset deep semantic matching model after the tag set of each multimedia file is input into the model;

acquiring the word vector of each tag included in the history access tag set from the cache file, and calculating the semantic vector corresponding to the history access tag set according to these word vectors;

and calculating the similarity between the semantic vector corresponding to the multimedia file and the semantic vector corresponding to the historical access tag set based on a preset similarity algorithm.
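The cache-based variant can be illustrated with a minimal sketch. The text does not say how a semantic vector is calculated from the word vectors, so the averaging used here is an assumption, and the function name `semantic_vector` and the in-memory dictionary standing in for the cache file are hypothetical.

```python
def semantic_vector(tags, word_vector_cache):
    """Combine cached word vectors of the given tags into one semantic vector.

    Assumption: the vectors are averaged; the source only says the semantic
    vector is calculated "according to" the word vectors.
    """
    # Tags without a cached word vector are skipped.
    vectors = [word_vector_cache[t] for t in tags if t in word_vector_cache]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical cache: tag -> word vector produced offline by the model.
cache = {"drama": [1.0, 0.0], "comedy": [0.0, 1.0]}
file_vector = semantic_vector(["drama", "comedy", "unknown"], cache)
```

Reading word vectors from a cache avoids running the full model for every candidate file at recommendation time.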

In a second aspect, an embodiment of the present invention provides a multimedia file recommendation apparatus, including:

the acquisition module is used for acquiring a history access record of a user to be recommended, wherein the history access record is a multimedia file accessed by the user to be recommended in a specified time period;

the generation module is used for generating a history access tag set of the user to be recommended based on the multimedia file included in the history access record acquired by the acquisition module;

the acquisition module is further used for acquiring a tag set of each multimedia file to be recommended;

the determining module is used for determining, based on a preset deep semantic matching model, the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended; the preset deep semantic matching model is a model obtained by training an initial deep semantic matching model based on a sample training set, wherein the sample training set comprises a sample history access tag set, a positive sample tag set and a negative sample tag set of each sample user; for each sample user, the sample history access tag set of that sample user includes: tags corresponding to the multimedia files accessed by the sample user in the historical time period; the positive sample tag set of the sample user includes: tags corresponding to the multimedia file accessed by the sample user last time; the negative sample tag set of the sample user includes: tags corresponding to the multimedia files accessed in the historical time period by a preset number of other sample users;

and the selection module is used for selecting the file to be recommended to the user to be recommended according to the similarity, determined by the determining module, between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended.

In one possible implementation manner, the apparatus further includes a training module, where the training module is configured to:

constructing the sample training set;

In one possible implementation manner, the training module is specifically configured to:

In one possible implementation manner, the determining module is specifically configured to:

In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

A memory for storing a computer program;

a processor for implementing the method steps of any of the first aspects when executing a program stored on a memory.

In a fourth aspect, embodiments of the present invention also provide a readable storage medium having stored thereon a computer program which, when executed by a processor of an electronic device, implements the method steps of any of the first aspects.

In a fifth aspect, embodiments of the present invention also provide a computer program product which, when run on an electronic device, causes a processor of the electronic device to carry out the method steps of any of the first aspects.

With the multimedia file recommendation method and device, electronic equipment and readable storage medium provided by the embodiments of the invention, a history access tag set can be generated from the multimedia files that the user has historically accessed, and the files recommended to the user are then selected according to the similarity between the semantic vector corresponding to the history access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended. In other words, the embodiments recommend files that are similar to the multimedia files the user has historically accessed. Because the user is interested in the files accessed historically, the user is more likely to be interested in files similar to them. The embodiments can therefore recommend multimedia files of interest to the user and realize personalized recommendation.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a flowchart of a multimedia file recommendation method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a training method for a preset depth semantic matching model according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for constructing a training sample set according to an embodiment of the present invention;

FIG. 4 is an exemplary schematic diagram of a training process of a preset depth semantic matching model according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a multimedia file recommendation device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.

In the related art, a website recommends the same content to every user, whereas the multimedia files of interest differ from user to user. The content recommended by the website therefore cannot meet users' personalized needs.

In order to solve the technical problems, the embodiment of the invention provides a multimedia file recommending method, a device, electronic equipment and a readable storage medium.

The method for recommending the multimedia file provided by the embodiment of the invention is explained first.

It can be understood that the multimedia file recommendation method provided by the embodiment of the invention is applied to electronic equipment, including but not limited to desktop computers, notebook computers, mobile phones, and servers. FIG. 1 is a flowchart of a multimedia file recommendation method according to an embodiment of the present invention. Referring to FIG. 1, the method may include the following steps:

S101, acquiring a history access record of a user to be recommended. The history access record comprises the multimedia files accessed by the user to be recommended in a specified time period.

S102, generating a historical access tag set of the user to be recommended based on the multimedia files included in the historical access record.

S103, acquiring a tag set of each multimedia file to be recommended.

S104, determining, based on a preset deep semantic matching model, the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended.

The preset deep semantic matching model is a model obtained by training an initial deep semantic matching model based on a sample training set, wherein the sample training set comprises a sample history access tag set, a positive sample tag set and a negative sample tag set of each sample user; for each sample user, the sample history access tag set of that sample user includes: tags corresponding to the multimedia files accessed by the sample user in the historical time period; the positive sample tag set of the sample user includes: tags corresponding to the multimedia file accessed by the sample user last time; the negative sample tag set of the sample user includes: tags corresponding to the multimedia files accessed in the historical time period by a preset number of other sample users.

S105, selecting, according to the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended, the file to be recommended to the user to be recommended.

With the multimedia file recommendation method provided by the embodiment of the invention, a history access tag set can be generated from the multimedia files that the user has historically accessed, and the files recommended to the user are then selected according to the similarity between the semantic vector corresponding to the history access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended. In other words, the embodiment recommends files that are similar to the multimedia files the user has historically accessed. Because the user is interested in the files accessed historically, the user is more likely to be interested in files similar to them. The embodiment can therefore recommend multimedia files of interest to the user and realize personalized recommendation.

The method for recommending the multimedia file according to the embodiment of the invention is described below with reference to a specific example.

For S101 described above, a history access record of the user to be recommended is obtained. The history access record comprises the multimedia files accessed by the user to be recommended in a specified time period.

It is understood that multimedia files include, but are not limited to, video, audio, and pictures. For example, the multimedia files that the user to be recommended has historically accessed are: a drama video 1, a drama video 2, a movie video 3, an advertisement video 4, and an advertisement video 5.

The specified time period may be the last week, or the last month, or from the start of the user to be recommended accessing the first multimedia file to the current moment.

Optionally, the multimedia files accessed by the user to be recommended in the specified time period satisfy a preset condition of interest. Setting a condition of interest screens out the historically accessed multimedia files that the user is actually interested in, and reduces the influence of historically accessed but uninteresting files on the choice of recommended multimedia files.

For example, conditions of interest include, but are not limited to, any one or more of the following: the duration of the user's access to the multimedia file exceeds a preset duration; the number of times the user has accessed the multimedia file exceeds a preset number of times; and the user has set an interest mark on the multimedia file.

The preset duration may be set to 10 seconds, and the preset number of times may be set to 2. It will be appreciated that when a user's access to a multimedia file lasts longer than 10 seconds, the access can be regarded as an effective viewing action, and the user can be regarded as interested in the viewed multimedia file.
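The screening by condition of interest can be sketched as follows. This is a minimal illustration: the `AccessRecord` structure and its field names are hypothetical, the thresholds are the example values from the text, and treating the three conditions as alternatives (any one suffices) is one of the combinations the text allows.

```python
from dataclasses import dataclass

# Example presets taken from the text; both are configurable.
PRESET_DURATION_SECONDS = 10
PRESET_ACCESS_COUNT = 2


@dataclass
class AccessRecord:
    """Hypothetical record of one user's access to one multimedia file."""
    file_id: str
    watch_seconds: float     # total access duration
    access_count: int        # number of times the file was accessed
    marked_interested: bool  # whether the user set an interest mark


def meets_interest_condition(record):
    """Return True if at least one of the three conditions of interest holds."""
    return (
        record.watch_seconds > PRESET_DURATION_SECONDS
        or record.access_count > PRESET_ACCESS_COUNT
        or record.marked_interested
    )


def filter_history(records):
    """Keep only the accessed files that satisfy the condition of interest."""
    return [r for r in records if meets_interest_condition(r)]


history = [
    AccessRecord("video1", 35.0, 1, False),  # watched long enough
    AccessRecord("video2", 3.0, 1, False),   # no condition satisfied
    AccessRecord("video3", 2.0, 3, False),   # accessed often enough
    AccessRecord("video4", 1.0, 1, True),    # explicitly marked
]
kept_ids = [r.file_id for r in filter_history(history)]
```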

For the above S102, a set of history access tags of the user to be recommended is generated based on the multimedia file included in the history access record.

In one embodiment, for each multimedia file included in the history access record, the attribute information of the multimedia file may be segmented based on a dictionary word segmentation algorithm, thereby obtaining the historical access tag set of the user to be recommended.

The dictionary word segmentation algorithm can segment the attribute information of the multimedia file according to a preset word segmentation strategy, and a plurality of Chinese character strings are obtained after word segmentation. And matching the Chinese character strings with entries in a preset dictionary. If the entry identical to the character string is found in the dictionary, the matching is successful. And constructing a history access tag set by taking each successfully matched character string as a tag.

When matching the Chinese character string with the vocabulary entry in the preset dictionary, the forward matching algorithm, the reverse matching algorithm, the maximum matching algorithm or the minimum matching algorithm can be utilized. Of course, the method of word segmentation of the attribute information of the multimedia file is not limited thereto.
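As an illustration of dictionary-based segmentation, the following minimal sketch applies forward maximum matching, one of the matching strategies mentioned above. The dictionary contents, the window size `max_len` and the function name are hypothetical; a production segmenter would use a much larger dictionary.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` by forward maximum matching; dictionary hits become tags."""
    tags = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                matched = candidate
                break
        if matched is None:
            i += 1  # no dictionary entry starts here; skip one character
        else:
            tags.append(matched)
            i += len(matched)
    return tags


# Hypothetical preset dictionary and title string.
preset_dictionary = {"北京", "大学", "北京大学", "生活"}
title_tags = forward_max_match("北京大学生活", preset_dictionary)
```

With this dictionary the longest entry wins at each position, so the title yields the tags 北京大学 and 生活 rather than 北京, 大学 and 生活.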

Attribute information of multimedia files includes, but is not limited to: any one or more of title information, profile information, publisher information, format information, and duration information.

Taking a multimedia file as an example of video, attribute information of the multimedia file may include: any one or more of title information, profile information, publisher information, format information, and duration information.

For example, when the user to be recommended has accessed video A in the specified time period and the attribute information of video A includes title information, the title information may be segmented based on a dictionary word segmentation algorithm, and the segmentation result forms the historical access tag set of the user to be recommended.

For the above S103, a tag set of each multimedia file to be recommended is acquired.

In one embodiment, for each multimedia file to be recommended, the attribute information of the multimedia file to be recommended may be segmented based on a dictionary word segmentation algorithm, thereby obtaining the tag set of each multimedia file to be recommended.

The word segmentation method for the multimedia file to be recommended may refer to the word segmentation method in S102, which is not described herein.

For S104, a similarity between the semantic vector corresponding to the history access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended is determined based on the preset deep semantic matching model (Deep Structured Semantic Models, DSSM).

In the embodiment of the invention, the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended can be calculated based on the DSSM. The semantic vector corresponding to the history access tag set may be denoted as the embedding vector of the user to be recommended, and the semantic vector corresponding to the tag set of each multimedia file to be recommended may be denoted as the embedding vector of that multimedia file. The similarity between the embedding vector of the user to be recommended and the embedding vector of each multimedia file to be recommended is then calculated.

For S105, the file to be recommended to the user to be recommended is selected according to the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended.

In one embodiment, the embedding vectors of the multimedia files to be recommended whose similarity to the embedding vector of the user to be recommended is greater than a preset similarity can be determined, and the multimedia files corresponding to the determined embedding vectors are used as the files recommended to the user to be recommended.
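The selection step can be sketched as follows. The text only refers to a preset similarity algorithm and a preset similarity, so cosine similarity, the threshold value 0.8, the ordering of the result and all names here are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_files(user_embedding, file_embeddings, preset_similarity=0.8):
    """Return ids of candidate files whose similarity exceeds the preset value.

    The result is ordered from most to least similar (an added convenience,
    not something the source requires).
    """
    scored = [
        (cosine_similarity(user_embedding, emb), file_id)
        for file_id, emb in file_embeddings.items()
    ]
    kept = [(score, file_id) for score, file_id in scored if score > preset_similarity]
    kept.sort(reverse=True)
    return [file_id for _, file_id in kept]


# Hypothetical embedding vectors of the user and of three candidate files.
user = [1.0, 0.0]
candidates = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [2.0, 0.0]}
recommended = select_files(user, candidates)
```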

According to the multimedia file recommending method provided by the embodiment of the invention, based on the multimedia files accessed by the user in history, the multimedia files which are similar to the multimedia files accessed by the user in history are recommended to the user, and the multimedia files interested by the user can be recommended to the user, so that personalized recommendation is realized. Compared with the mode of recommending the same multimedia file to each user, the embodiment of the invention can realize more accurate recommendation and improve the success rate of recommendation.

Referring to fig. 2, the deep semantic matching model in S104 may be obtained through training:

S201, constructing a sample training set.

S202, inputting the sample history access tag set, positive sample tag set and negative sample tag set of one sample user included in the sample training set into the initial deep semantic matching model, and obtaining a first similarity and a second similarity output by the initial deep semantic matching model.

The first similarity is the semantic similarity between the sample history access tag set and the positive sample tag set of the sample user, and the second similarity is the semantic similarity between the sample history access tag set and the negative sample tag set of the sample user.

In an embodiment of the present invention, the initial deep semantic matching model may be a DSSM model.

S203, calculating a loss function value based on the first similarity and the second similarity, and judging whether the initial deep semantic matching model has converged according to the loss function value. If the initial deep semantic matching model has not converged, S204 is performed; if it has converged, S205 is performed.

In an embodiment of the present invention, the first similarity is inversely related to the loss function value and the second similarity is positively related to the loss function value. The smaller the loss function value, the more accurate the recognition result of the initial deep semantic matching model. Thus, when the loss function value reaches its minimum, the initial deep semantic matching model is determined to have converged.

In one embodiment, a difference between the current calculated loss function value and the last calculated loss function value in the iterative process may be calculated, and it may be determined whether the difference is less than a preset difference. If yes, determining that the initial depth semantic matching model converges; if not, determining that the initial depth semantic matching model is not converged.

In another embodiment, it may be determined whether the loss function value calculated this time is smaller than a preset threshold. If yes, determining that the initial depth semantic matching model converges; if not, determining that the initial depth semantic matching model is not converged.
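The two convergence criteria described above can be sketched as follows. The function names and the default values of the preset difference and preset threshold are hypothetical; the text does not give concrete values.

```python
def converged_by_difference(current_loss, previous_loss, preset_difference=1e-4):
    """First criterion: the loss changed by less than a preset difference."""
    if previous_loss is None:  # first iteration: nothing to compare with
        return False
    return abs(current_loss - previous_loss) < preset_difference


def converged_by_threshold(current_loss, preset_threshold=0.05):
    """Second criterion: the loss itself fell below a preset threshold."""
    return current_loss < preset_threshold
```

The first criterion stops training when the loss has stopped changing, whatever its value; the second stops only once the loss is small enough, so it presumes that a suitable threshold is known in advance.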

S204, adjusting network parameters of the initial depth semantic matching model according to the loss function value, and returning to S202.

In one embodiment, the network parameters of the initial depth semantic matching model may be adjusted in a gradient descent manner according to the loss function value, and S202 is performed based on the adjusted initial depth semantic matching model.

S205, taking the current initial depth semantic matching model as a preset depth semantic matching model.

In the embodiment of the invention, because the first similarity is inversely related to the loss function value and the second similarity is positively related to it, minimizing the loss function makes the first similarity larger and the second similarity smaller. That is, under the initial deep semantic matching model, the sample history access tag set of a sample user obtains a higher semantic similarity with the positive sample tag set and a lower semantic similarity with the negative sample tag set, which improves the recognition accuracy of the preset deep semantic matching model.
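The text does not give the loss formula. The following sketch assumes a softmax-style loss of the kind commonly used with DSSM-type models, only to show a function with the stated properties: it decreases as the first similarity grows and increases as any second similarity grows. The name `softmax_loss` and the smoothing factor `gamma` are hypothetical.

```python
import math


def softmax_loss(first_similarity, second_similarities, gamma=1.0):
    """Negative log-probability of the positive sample under a softmax.

    first_similarity:    similarity between history tag set and positive tag set
    second_similarities: similarities between history tag set and each negative
    gamma:               assumed smoothing factor applied before the softmax
    """
    positive = math.exp(gamma * first_similarity)
    negatives = sum(math.exp(gamma * s) for s in second_similarities)
    return -math.log(positive / (positive + negatives))


# A higher first similarity lowers the loss; a higher second similarity raises it.
base = softmax_loss(0.5, [0.1, 0.2])
better_positive = softmax_loss(0.9, [0.1, 0.2])
worse_negative = softmax_loss(0.5, [0.6, 0.2])
```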

For S201 described above, referring to fig. 3, the manner of constructing the sample training set includes the following steps:

S2011, for each sample user, acquiring the multimedia files accessed by the sample user in the historical time period.

The manner of acquiring the multimedia file in S2011 is the same as that of acquiring the multimedia file in S101, and reference may be made to the description in S101, which is not repeated here.

S2012, the acquired attribute information of the multimedia file is segmented to obtain a sample history access tag set of the sample user.

For S2011 and S2012, suppose for example that the sample users include sample user 1, sample user 2, …, sample user 10000, and that sample user 1 has accessed video 1, video 2, and video 3 during the historical time period. The titles of video 1, video 2, and video 3 are each segmented, and the segmentation results together form the sample history access tag set of sample user 1.

The method of segmenting the attribute information of the multimedia file in S2012 is the same as the method of segmenting in S102, and reference is made to the description in S102, which is not repeated here.

S2013, the attribute information of the multimedia file accessed by the sample user last time is segmented to obtain a positive sample label set of the sample user.

It will be appreciated that the multimedia file that the user has last accessed is typically the multimedia file of most interest to the user at the present time. And therefore, the word segmentation result of the attribute information of the multimedia file accessed by the sample user last time is used as a positive sample label set of the sample user.

Optionally, the multimedia file that the sample user accessed last time satisfies the preset condition of interest. For example, conditions of interest include, but are not limited to: any one or more of the time length of the access of the user to be recommended to the multimedia file exceeds the preset time length, the time number of the access of the user to be recommended to the multimedia file exceeds the preset time number, and the interest marks are set on the multimedia file by the user to be recommended.

The manner of segmenting the attribute information of the multimedia file in S2013 is the same as the manner of segmenting in S102; reference may be made to the description of S102, which is not repeated here.

S2014, random negative sampling is performed on the sample users other than the sample user, and the sample history access tag sets of a preset number of other sample users are obtained as the negative sample tag sets of the sample user.

It will be appreciated that the interests of each user are mostly focused on a certain class of multimedia files, such as game videos. For a given sample user, the multimedia files that other sample users are interested in are therefore, with high probability, not of interest to that sample user. Thus, the sample history access tag sets of a preset number of other sample users are taken as the negative sample tag sets of the sample user.

For example, the preset number is 4.

S2015, a sample history access tag set, a positive sample tag set and a negative sample tag set of a plurality of sample users are constructed into a sample training set.

Because the interests of each user are mostly concentrated in one type of multimedia file, and the user is generally not interested in other types of multimedia files, the embodiment of the invention can set a larger preset number to obtain negative sample tag sets with a larger data volume, thereby improving the accuracy of training the initial depth semantic matching model.
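Steps S2011 to S2015 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `segment` and `build_training_set`, the whitespace split that stands in for the word segmentation of S102, and the shape of `access_logs` are all hypothetical and are not taken from the embodiment.

```python
import random


def segment(title):
    # Hypothetical stand-in for the word segmentation of S102:
    # a plain whitespace split, used only to keep the sketch runnable.
    return title.split()


def build_training_set(access_logs, num_negatives=4, seed=0):
    """Build one training sample per sample user.

    access_logs maps a sample user id to the titles that user accessed
    in the historical time period, in chronological order (assumed layout).
    """
    rng = random.Random(seed)
    # S2011-S2012: segment every accessed title into a history access tag set.
    history = {
        user: [tag for title in titles for tag in segment(title)]
        for user, titles in access_logs.items()
    }
    samples = []
    for user, titles in access_logs.items():
        # S2014: random negative sampling among the other sample users.
        others = [u for u in access_logs if u != user]
        picked = rng.sample(others, min(num_negatives, len(others)))
        samples.append({
            "user": user,
            "history_tags": history[user],
            # S2013: tags of the file the sample user accessed last.
            "positive_tags": segment(titles[-1]),
            "negative_tag_sets": [history[u] for u in picked],
        })
    # S2015: the collected samples form the sample training set.
    return samples
```

The default of four negatives mirrors the example preset number given above; a larger value simply yields more negative sample tag sets per sample user.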

The following describes the overall flow of training a preset depth semantic matching model according to the embodiment of the present invention with reference to fig. 4:

Referring to FIG. 4, in one embodiment, the overall structure formed by the three dashed boxes in FIG. 4 represents the initial depth semantic matching model.

For a sample user, the left dashed box in FIG. 4 represents the processing of the sample user's positive sample tag set and may be referred to as the positive item tower. T1 to Tn below the positive item tower represent the tag set input into the positive item tower. The middle dashed box represents the processing of the sample history access tag set of the sample user and may be referred to as the user tower. T1 to Tm below the user tower represent the tag set input into the user tower. The right dashed box represents the processing of the negative sample tag set of the sample user and may be referred to as the negative item tower. T1 to Tl below the negative item tower represent the tag set input into the negative item tower. The initial depth semantic matching model processes each tag set in the same way: from bottom to top, the input tag set passes in sequence through an embedding layer, a stacking layer, a hidden layer, and an expression layer (representation layer).

Here, the processing of the embedding layer includes word embedding. The processing of the stacking layer includes a bag-of-words (BOW) model and the inverse document frequency (IDF). The processing of the hidden layer includes a rectified linear unit (ReLU). The processing of the expression layer also includes a ReLU.
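The bottom-to-top pass through one tower can be sketched as follows. This is a hedged illustration rather than the model of FIG. 4: it assumes that the bag-of-words step sums the tag word vectors weighted by their IDF values and that the hidden and expression layers are single fully connected layers followed by ReLU. The names `tower_forward`, `dense`, and `relu`, and all parameter shapes, are hypothetical.

```python
def relu(vec):
    # Rectified linear unit applied element-wise.
    return [max(0.0, x) for x in vec]


def dense(vec, weights, bias):
    # Fully connected layer; weights holds one row per output unit.
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def tower_forward(tags, embeddings, idf, hidden, representation):
    """Map a tag set to a semantic vector.

    embeddings: tag -> word vector (embedding layer)
    idf: tag -> IDF weight (assumed weighting for the bag-of-words sum)
    hidden, representation: (weights, bias) pairs for the two upper layers
    """
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    for tag in tags:
        if tag not in embeddings:
            continue  # assumption: tags without a word vector are skipped
        weight = idf.get(tag, 1.0)
        for i, x in enumerate(embeddings[tag]):
            pooled[i] += weight * x  # IDF-weighted bag-of-words sum
    h = relu(dense(pooled, *hidden))           # hidden layer
    return relu(dense(h, *representation))     # expression layer
```

Because the same function serves the user tower and both item towers, the three towers differ only in the tag set they receive and, if the weights are not shared, in their parameters.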

For a sample user, the positive sample tag set of the sample user is processed through the left dashed box to obtain a positive sample semantic vector; the sample history access tag set of the sample user is processed through the middle dashed box to obtain a history access semantic vector; and the negative sample tag set of the sample user is processed through the right dashed box to obtain a negative sample semantic vector. Then, at the similarity calculation layer, the cosine similarity between the positive sample semantic vector and the history access semantic vector is calculated to obtain R⁺, and the cosine similarity between the history access semantic vector and the negative sample semantic vector is calculated to obtain R⁻. Then, at the hinge loss layer, the loss function value is calculated from R⁺ and R⁻ using a ranking loss method.
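The similarity and loss calculation can be sketched in Python as below. The exact form of the loss is not given in the text, so the sketch assumes the common pairwise hinge form max(0, margin − R⁺ + R⁻), averaged over the negative samples; the `margin` value and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hinge_ranking_loss(history_vec, positive_vec, negative_vecs, margin=0.1):
    """Pairwise hinge loss over one sample user (assumed form)."""
    if not negative_vecs:
        return 0.0
    r_pos = cosine_similarity(history_vec, positive_vec)      # R+
    losses = []
    for negative_vec in negative_vecs:
        r_neg = cosine_similarity(history_vec, negative_vec)  # R-
        # The loss is zero once R+ exceeds R- by at least the margin.
        losses.append(max(0.0, margin - r_pos + r_neg))
    return sum(losses) / len(losses)
```

Under this assumed form, the loss pushes the history access semantic vector toward the positive sample semantic vector and away from the negative sample semantic vectors until they are separated by the margin.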

It is determined whether the initial depth semantic matching model converges based on the loss function value. If the initial depth semantic matching model is not converged, network parameters of the initial depth semantic matching model are adjusted according to the loss function value, and a sample history access tag set, a positive sample tag set and a negative sample tag set of another sample user are input into the initial depth semantic matching model to continue training. If the initial depth semantic matching model converges, a user tower and a positive item tower of the current initial depth semantic matching model are used as preset depth semantic matching models, or a user tower and a negative item tower of the current initial depth semantic matching model are used as preset depth semantic matching models.

In this case, S104 described above may be implemented as follows: for each multimedia file to be recommended, the electronic device inputs the tag set of the multimedia file and the history access tag set of the user to be recommended into the preset depth semantic matching model, and obtains a first semantic vector corresponding to the tag set of the multimedia file to be recommended and a second semantic vector corresponding to the history access tag set, both output by the preset depth semantic matching model. The electronic device then calculates the similarity between the first semantic vector and the second semantic vector based on a preset similarity algorithm.

For example, the preset similarity algorithm may be a cosine similarity algorithm, a Jaccard similarity coefficient algorithm, a Pearson correlation coefficient algorithm, or the like, which is not specifically limited in the embodiment of the present invention.
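A minimal sketch of a selectable similarity algorithm is given below for two of the named options, cosine similarity and the Pearson correlation coefficient. The registry `SIMILARITY` and the function `similarity` are hypothetical names; the Jaccard coefficient is omitted because it is defined on sets and the text does not say how it would be applied to semantic vectors.

```python
import math


def cosine(a, b):
    # Cosine similarity; 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    # Pearson correlation equals the cosine similarity of the
    # mean-centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical registry that lets the preset algorithm be chosen by name.
SIMILARITY = {"cosine": cosine, "pearson": pearson}


def similarity(first_vec, second_vec, algorithm="cosine"):
    return SIMILARITY[algorithm](first_vec, second_vec)
```

For instance, `similarity(first_vec, second_vec, "pearson")` would compare the first and second semantic vectors after removing each vector's mean.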

In the embodiment of the invention, the tag set of the multimedia file to be recommended is input into the item tower of the preset depth semantic matching model, and the history access tag set of the user to be recommended is input into the user tower of the preset depth semantic matching model.

Referring to fig. 4, in another implementation of the embodiment of the present invention, the overall structure composed of the three dotted boxes and the similarity calculation layer in fig. 4 represents an initial depth semantic matching model. After the initial depth semantic matching model is trained, a user tower, a positive item tower and a similarity calculation layer in the current initial depth semantic matching model are used as preset depth semantic matching models, or a user tower, a negative item tower and a similarity calculation layer are used as preset depth semantic matching models.

In this case, S104 described above may be implemented as: the electronic equipment inputs a label set of the multimedia file to be recommended and a history access label set of a user to be recommended into a preset depth semantic matching model aiming at each multimedia file to be recommended, and obtains the similarity between semantic vectors corresponding to the label set of the multimedia file to be recommended and semantic vectors corresponding to the history access label set, which are output by the preset depth semantic matching model.

Referring to FIG. 4, in another implementation of the embodiment of the present invention, the overall structure formed by the three embedding layers in FIG. 4 represents the initial depth semantic matching model. In this case, the positive item tower includes the left embedding layer, the user tower includes the middle embedding layer, and the negative item tower includes the right embedding layer. After the initial depth semantic matching model is trained, the user tower and the positive item tower in the current initial depth semantic matching model are used as the preset depth semantic matching model, or the user tower and the negative item tower are used as the preset depth semantic matching model.

In this case, the above S104 may be implemented as the following three steps:

Step one, for each multimedia file to be recommended, the word vector of each tag in the tag set of the multimedia file is obtained from a cache file, and the semantic vector corresponding to the multimedia file is calculated from the obtained word vectors.

The cache file caches the word vectors of a plurality of tags. These word vectors are the word vectors of the individual tags output by the word vector expression layer of the preset depth semantic matching model after the tag set of each multimedia file is input into the preset depth semantic matching model. Referring to FIG. 4, the word vector expression layer of the preset depth semantic matching model is the embedding layer.

It can be understood that, since the attribute information of a multimedia file generally does not change, the tag set obtained by segmenting that attribute information, and the word vector corresponding to each tag, do not change either. To reduce the amount of computation needed to determine the recommended multimedia files and to speed up recommendation, the tag set of each multimedia file can be input into the preset depth semantic matching model in advance to obtain the word vector of each tag output by the word vector expression layer of the preset depth semantic matching model, and the obtained word vectors are cached in the cache file.

The electronic device then calculates the semantic vector corresponding to the multimedia file from the obtained word vectors of the tags through the stacking layer, the hidden layer, and the expression layer shown in FIG. 4.

Step two, word vectors of all the tags included in the history access tag set are obtained from the cache file, and semantic vectors corresponding to the history access tag set are calculated according to the word vectors of all the tags included in the history access tag set.

The method for calculating the semantic vector corresponding to the history access tag set in the second step is the same as the method for calculating the semantic vector corresponding to the multimedia file to be recommended in the first step, and the description in the first step may be referred to, and will not be repeated here.

Step three, the similarity between the semantic vector corresponding to the multimedia file and the semantic vector corresponding to the history access tag set is calculated based on a preset similarity algorithm.

The electronic device calculates the similarity between the semantic vector corresponding to the multimedia file and the semantic vector corresponding to the historical access tag set based on a preset similarity algorithm through a similarity calculation layer shown in fig. 4.
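The three steps can be sketched end to end in Python. This is only an illustration: the cache is modeled as an in-memory dictionary, mean pooling stands in for the stacking, hidden, and expression layers, cosine similarity is chosen as the preset similarity algorithm, and the names `pooled_vector` and `score_candidates` are hypothetical.

```python
import math


def pooled_vector(tags, cache):
    # Look up cached word vectors and average them. Mean pooling is an
    # assumed simplification of the upper layers of the tower.
    vectors = [cache[tag] for tag in tags if tag in cache]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_candidates(history_tags, candidates, cache):
    """Return the similarity of each candidate file to the user's history.

    candidates maps a file id to the tag set of that file (assumed layout).
    """
    user_vec = pooled_vector(history_tags, cache)      # step two
    scores = {}
    for file_id, tags in candidates.items():
        file_vec = pooled_vector(tags, cache)          # step one
        if user_vec is None or file_vec is None:
            scores[file_id] = 0.0  # assumption: no cached tags means no match
        else:
            scores[file_id] = cosine(file_vec, user_vec)  # step three
    return scores
```

Since the word vectors are read from the cache rather than recomputed, only the pooling and the similarity calculation remain at recommendation time.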

In the field of natural language processing, the DSSM model is used to calculate the correlation between a query and a clicked document (doc). In the embodiment of the invention, the same correlation calculation is used to calculate the similarity between the multimedia files historically accessed by the user and the multimedia files to be recommended, so that multimedia files to be recommended that are highly similar to the files the user has historically accessed are recommended to the user.

Based on the same inventive concept, corresponding to the method embodiment, the embodiment of the present invention further provides a multimedia file recommendation device, referring to fig. 5, where the device includes: an acquisition module 501, a generation module 502, a determination module 503, and a selection module 504;

the obtaining module 501 is configured to obtain a history access record of a user to be recommended, where the history access record is a multimedia file that the user to be recommended has accessed in a specified time period;

a generating module 502, configured to generate a history access tag set of a user to be recommended based on the multimedia file included in the history access record acquired by the acquiring module 501;

the obtaining module 501 is further configured to obtain a tag set of each multimedia file to be recommended;

a determining module 503, configured to determine, based on a preset deep semantic matching model, a similarity between a semantic vector corresponding to the historical access tag set and a semantic vector corresponding to the tag set of each multimedia file to be recommended; the preset depth semantic matching model is a model obtained by training an initial depth semantic matching model based on a sample training set, wherein the sample training set comprises a sample history access tag set, a positive sample tag set and a negative sample tag set of each sample user; for each sample user, the sample history access tag set of the sample user includes: tags corresponding to the multimedia files accessed by the sample user in the historical time period; the positive sample tag set of the sample user includes: tags corresponding to the multimedia file accessed by the sample user last time; the negative sample tag set of the sample user includes: tags corresponding to the multimedia files accessed in the historical time period by a preset number of other sample users;

The selecting module 504 is configured to select the file to be recommended that is recommended to the user to be recommended, according to the similarity, determined by the determining module 503, between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended.
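The selection performed by the selecting module 504 can be sketched as a ranking over the similarities produced by the determining module 503. The selection rule shown here, keeping the `top_k` most similar files that reach a `min_similarity` threshold, is an assumption made for illustration, as are the function and parameter names.

```python
def select_recommendations(similarities, top_k=2, min_similarity=0.0):
    """Pick the files to recommend from a file id -> similarity mapping.

    Both the cut-off count and the threshold are hypothetical parameters.
    """
    # Rank candidate files from most to least similar.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    # Keep files that reach the threshold, then truncate to top_k.
    return [file_id for file_id, score in ranked
            if score >= min_similarity][:top_k]
```

With `{"v1": 0.9, "v2": 0.2, "v3": 0.7}` as input and the default parameters, the sketch keeps the two files with the highest similarity.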

Optionally, the device further includes a training module, and the training module is used for:

constructing a sample training set;

inputting a sample history access tag set, a positive sample tag set and a negative sample tag set of a sample user, which are included in the sample training set, into the initial depth semantic matching model, and obtaining a first similarity and a second similarity output by the initial depth semantic matching model, wherein the first similarity is the semantic similarity between the sample history access tag set and the positive sample tag set of the sample user, and the second similarity is the semantic similarity between the sample history access tag set and the negative sample tag set of the sample user;

if the initial depth semantic matching model is not converged, network parameters of the initial depth semantic matching model are adjusted according to the loss function value, and a step of inputting a sample history access tag set, a positive sample tag set and a negative sample tag set of a sample user included in the sample training set into the initial depth semantic matching model is returned;

And if the initial depth semantic matching model converges, taking the current initial depth semantic matching model as a preset depth semantic matching model.

Optionally, the training module is specifically configured to:

for each sample user, acquiring a multimedia file accessed by the sample user in a historical time period;

and constructing a sample history access tag set, a positive sample tag set and a negative sample tag set of a plurality of sample users as a sample training set.

Optionally, the determining module 503 is specifically configured to:

inputting a label set of the multimedia file to be recommended and a history access label set of a user to be recommended into a preset depth semantic matching model aiming at each multimedia file to be recommended, and obtaining the similarity between semantic vectors corresponding to the label set of the multimedia file to be recommended and semantic vectors corresponding to the history access label set, which are output by the preset depth semantic matching model.

Optionally, the determining module 503 is specifically configured to:

inputting a tag set of the multimedia file and a history access tag set of a user to be recommended into a preset depth semantic model aiming at each multimedia file to be recommended, and acquiring a first semantic vector corresponding to the tag set of the multimedia file to be recommended and a second semantic vector corresponding to the history access tag set, which are output by the preset depth semantic matching model; and calculating the similarity between the first semantic vector and the second semantic vector based on a preset similarity algorithm.

Optionally, the determining module 503 is specifically configured to:

for each multimedia file to be recommended, acquiring word vectors of each tag in a tag set of the multimedia file from a cache file, and calculating semantic vectors corresponding to the multimedia file according to the acquired word vectors of each tag; the word vectors of a plurality of labels are cached in the cache file, and the word vectors of the plurality of labels are as follows: after the label set of each multimedia file is input into a preset depth semantic matching model, a word vector corresponding to each label output by a word vector expression layer of the preset depth semantic model;

Corresponding to the above method embodiment, the embodiment of the present invention further provides an electronic device. Referring to FIG. 6, the electronic device may include a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604;

a memory 603 for storing a computer program;

the processor 601 is configured to implement the method steps in the above-described method embodiment when executing the program stored in the memory 603.

Corresponding to the above method embodiment, the embodiment of the present invention further provides a readable storage medium, where the readable storage medium is a computer readable storage medium, and a computer program is stored in the readable storage medium, and when the computer program is executed by a processor of an electronic device, the method steps of any one of the above multimedia file recommendation methods are implemented.

Corresponding to the above method embodiments, the present invention further provides a computer program product, which when run on an electronic device, causes a processor of the electronic device to execute the method steps of the method for recommending multimedia files according to any of the above.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is shown in the figure with only one thick line, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, and readable storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. A multimedia file recommendation method, comprising:

acquiring a label set of each multimedia file to be recommended;

selecting the file to be recommended that is recommended to the user to be recommended according to the similarity between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended;

the generating the historical access tag set of the user to be recommended based on the multimedia file included in the historical access record includes:

aiming at each multimedia file included in the history access record, performing word segmentation on attribute information of the multimedia file based on a dictionary word segmentation algorithm to obtain a history access tag set of the user to be recommended;

the acquiring the tag set of each multimedia file to be recommended includes:

aiming at each multimedia file to be recommended, performing word segmentation on attribute information of the multimedia file to be recommended based on a dictionary word segmentation algorithm to obtain a tag set of each multimedia file to be recommended;

the dictionary word segmentation algorithm comprises the following steps: according to a preset word segmentation strategy, segmenting the attribute information of the multimedia file to obtain a plurality of Chinese character strings, matching each Chinese character string with entries in a preset dictionary, and constructing a label set by taking each successfully matched character string as a label; wherein, the attribute information of the multimedia file includes: any one or more of title information, profile information, publisher information, format information, and duration information.

2. The method according to claim 1, wherein the preset depth semantic matching model is obtained by training the following steps:

constructing the sample training set;

3. The method of claim 2, wherein said constructing said sample training set comprises:

4. A method according to any one of claims 1-3, wherein determining, based on a preset deep semantic matching model, a similarity between a semantic vector corresponding to the set of history access tags and a semantic vector corresponding to a set of tags for each multimedia file to be recommended comprises:

5. A method according to any one of claims 1-3, wherein determining, based on a preset deep semantic matching model, a similarity between a semantic vector corresponding to the set of history access tags and a semantic vector corresponding to a set of tags for each multimedia file to be recommended comprises:

6. A method according to any one of claims 1-3, wherein determining, based on a preset deep semantic matching model, a similarity between a semantic vector corresponding to the set of history access tags and a semantic vector corresponding to a set of tags for each multimedia file to be recommended comprises:

7. A multimedia file recommendation apparatus, comprising:

the selection module is used for selecting the file to be recommended that is recommended to the user to be recommended according to the similarity, determined by the determination module, between the semantic vector corresponding to the historical access tag set and the semantic vector corresponding to the tag set of each multimedia file to be recommended;

the generating module is specifically configured to:

the acquisition module is specifically configured to:

8. The apparatus of claim 7, further comprising a training module to:

constructing the sample training set;

9. The device according to claim 8, wherein the training module is specifically configured to:

10. The apparatus according to any one of claims 7-9, wherein the determining module is specifically configured to:

11. The apparatus according to any one of claims 7-9, wherein the determining module is specifically configured to:

12. The apparatus according to any one of claims 7-9, wherein the determining module is specifically configured to:

13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

The memory is used for storing a computer program;

the processor is configured to implement the method steps of any one of claims 1-6 when executing a program stored on the memory.

14. A readable storage medium, characterized in that it has stored therein a computer program which, when executed by a processor of an electronic device, implements the method steps of any of claims 1-6.