CN111353052B

CN111353052B - Multimedia object recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN111353052B
Application number: CN202010096847.XA
Authority: CN
Inventors: 张水发
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-02-17
Filing date: 2020-02-17
Publication date: 2023-11-21
Anticipated expiration: 2040-02-17
Also published as: CN111353052A

Abstract

The disclosure relates to a multimedia object recommendation method and device, an electronic device, and a storage medium, in the technical field of information processing. The method comprises: obtaining user features of a user and multimedia features of a reference multimedia object; performing category prediction on the reference multimedia object according to the user features and the multimedia features; when a first preset number of classifications is greater than 1, performing category prediction on the reference multimedia object again a second preset number of times; and selecting, from a multimedia object library, multimedia objects that belong to the predicted categories and are determined, based on feature similarity, to be similar to the reference multimedia object, and making recommendations using the selected multimedia objects. Applying the scheme provided by the embodiments of the disclosure can improve the accuracy of the recommended multimedia objects.

Description

Multimedia object recommendation method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of information processing, and in particular relates to a multimedia object recommendation method, a device, electronic equipment and a storage medium.

Background

In order to provide users with a better quality of service, various types of application software typically recommend to a user multimedia objects of the same category as the multimedia objects the user has browsed. The multimedia object may be a video, a song, an article, or the like. For example, video software typically determines the video category to which a video browsed by the user belongs, and then recommends videos in that category to the user.

Taking video as an example of the multimedia object, in the related art a worker typically classifies each video stored in a video database manually, for example into martial arts, emotion, idol, and the like. When determining the video category to which a video browsed by the user belongs, the category is then generally taken from the result of this manual classification.

However, manual classification is highly subjective, and staff generally consider only the attributes of the videos themselves when classifying them, so the video categories determined in this way for the videos browsed by the user have low accuracy. Consequently, video recommendation based on those categories also has low accuracy.

Disclosure of Invention

The disclosure provides a multimedia object recommendation method, a device, electronic equipment and a storage medium, so as to improve the accuracy of recommended multimedia objects. The technical scheme of the present disclosure is as follows:

according to a first aspect of embodiments of the present disclosure, there is provided a multimedia object recommendation method, the method comprising:

obtaining user characteristics of a user and obtaining multimedia characteristics of a reference multimedia object, wherein the reference multimedia object is a multimedia object requested by the user;

performing category prediction on the reference multimedia object according to the user characteristics and the multimedia characteristics;

when a first preset number of classifications is greater than 1, performing category prediction on the reference multimedia object again a second preset number of times, wherein the information on which each such category prediction is based comprises: the category obtained by the previous category prediction and the multimedia features, and wherein the second preset number of classifications is equal to the first preset number of classifications minus 1;

selecting, from a multimedia object library, multimedia objects that belong to the predicted categories and are determined, based on feature similarity, to be similar to the reference multimedia object, and making recommendations using the selected multimedia objects.

In one embodiment of the present disclosure, the performing category prediction on the reference multimedia object according to the user feature and the multimedia feature includes:

inputting the user features and the multimedia features into a pre-trained category prediction model, and performing category prediction on the reference multimedia object;

the performing category prediction on the reference multimedia object again a second preset number of times includes:

performing category prediction on the reference multimedia object again a second preset number of times by using the category prediction model, wherein the input information of the category prediction model for each such category prediction comprises: the category output by the category prediction model in the previous category prediction, and the multimedia features;

wherein the category prediction model is a model for predicting the category to which a multimedia object belongs, obtained by training an initial model of the category prediction model with sample user features of a sample user and sample multimedia features as model input information, and with a first preset number of annotation categories of different priorities, to which the sample multimedia object belongs, as training supervision information; the sample multimedia features are the features of the sample multimedia object requested by the sample user; and the value of the first preset number is not smaller than the value of the first preset number of classifications.

In one embodiment of the present disclosure, the class prediction model is trained by:

obtaining characteristics of a sample user as sample user characteristics, and obtaining characteristics of a sample multimedia object requested by the sample user as sample multimedia characteristics;

obtaining the first preset number of annotation categories to which the sample multimedia object belongs, wherein each annotation category has different priorities;

inputting the sample user characteristics and the sample multimedia characteristics into the initial model, and carrying out category prediction on the sample multimedia objects;

when the first preset number is greater than 1, performing category prediction on the sample multimedia object a second preset number of times by using the initial model, wherein the input information of the initial model for each such category prediction comprises: the category output by the initial model in the previous category prediction, and the sample multimedia features, and wherein the second preset number is equal to the first preset number minus 1;

and adjusting the parameters of the initial model based on the differences between the sequentially predicted categories and the obtained annotation categories, thereby training the model to obtain the category prediction model.

In one embodiment of the present disclosure, the obtaining the first preset number of annotation categories to which the sample multimedia object belongs includes:

obtaining the category to which the sample multimedia object belongs;

counting, among the users having the sample user features, the number of users corresponding to each category, wherein the number of users corresponding to a category is the number of users having the sample user features who consider the sample multimedia object to belong to that category;

selecting the first preset number of categories with the largest corresponding numbers of users as the annotation categories, wherein the priorities of the annotation categories decrease in order of decreasing number of corresponding users.
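The selection of annotation categories described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_annotation_categories` and the flat list of votes are assumptions introduced here, with each vote standing for one sample user who considers the sample multimedia object to belong to a category.

```python
from collections import Counter


def select_annotation_categories(votes, first_preset_number):
    """Return the most-voted categories, highest priority first.

    votes: one category name per user (hypothetical input format).
    first_preset_number: how many annotation categories to keep.
    """
    counts = Counter(votes)
    # most_common sorts by user count, from most to fewest.
    return [category for category, _ in counts.most_common(first_preset_number)]


votes = ["river", "river", "building", "river", "pedestrian", "building"]
labels = select_annotation_categories(votes, 2)
print(labels)
```

The position in the returned list stands in for the priority: the category chosen by the most users comes first.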

In one embodiment of the present disclosure, the obtaining the user characteristic of the user includes:

and obtaining registration information of the user, and determining the category of the user according to the registration information as the user characteristic of the user.

According to a second aspect of embodiments of the present disclosure, there is provided a multimedia object recommendation apparatus, the apparatus comprising:

the feature acquisition module is used for acquiring user features of a user and acquiring multimedia features of a reference multimedia object, wherein the reference multimedia object is a multimedia object requested by the user;

a first category prediction module, configured to perform category prediction on the reference multimedia object according to the user feature and the multimedia feature;

a second category prediction module, configured to perform category prediction on the reference multimedia object again a second preset number of times when the first preset number of classifications is greater than 1, wherein the information on which each such category prediction is based comprises: the category obtained by the previous category prediction and the multimedia features, and wherein the second preset number of classifications is equal to the first preset number of classifications minus 1;

and the object recommendation module is used for selecting the multimedia objects which belong to the predicted category and are similar to the reference multimedia based on the feature similarity from the multimedia object library, and recommending the multimedia objects by utilizing the selected multimedia objects.

In one embodiment of the disclosure, the first class prediction module is specifically configured to:

the second class prediction module is specifically configured to:

In one embodiment of the disclosure, the apparatus further includes a model training module for training to obtain the class prediction model, the model training module including:

the sample feature obtaining unit is used for obtaining the features of a sample user as sample user features and obtaining the features of the sample multimedia objects requested by the sample user as sample multimedia features;

the annotation category obtaining unit is used for obtaining the first preset number of annotation categories to which the sample multimedia object belongs, wherein each annotation category has different priorities;

the first model training unit is used for inputting the sample user characteristics and the sample multimedia characteristics into the initial model and carrying out category prediction on the sample multimedia objects;

the second model training unit is configured to perform category prediction on the sample multimedia object a second preset number of times by using the initial model when the first preset number is greater than 1, wherein the input information of the initial model for each such category prediction comprises: the category output by the initial model in the previous category prediction, and the sample multimedia features, and wherein the second preset number is equal to the first preset number minus 1;

and the parameter adjustment unit is used for carrying out parameter adjustment on the initial model based on the difference of the category obtained by sequential prediction relative to the obtained labeling category, so as to realize model training and obtain the category prediction model.

In one embodiment of the present disclosure, the labeling category obtaining unit is specifically configured to:

obtaining the category to which the sample multimedia object belongs;

In one embodiment of the disclosure, the feature obtaining module is specifically configured to:

obtaining registration information of a user, determining a category to which the user belongs according to the registration information, taking the category as a user characteristic of the user, and obtaining a multimedia characteristic of a reference multimedia object, wherein the reference multimedia object is a multimedia object requested by the user.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the multimedia object recommendation method according to any of the first aspects.

According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia object recommendation method according to any one of the first aspects.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

As can be seen from the above technical solutions, when the solution provided by the embodiments of the present disclosure is applied to recommend multimedia objects, user features of a user and multimedia features of a reference multimedia object are first obtained; category prediction is performed on the reference multimedia object according to the user features and the multimedia features; when the first preset number of classifications is greater than 1, category prediction is performed on the reference multimedia object again a second preset number of times; and multimedia objects that belong to a predicted category and are similar to the reference multimedia object are selected from a multimedia object library and used for recommendation. Because the solution considers both the user features of the user and the multimedia features of the multimedia object selected by the user when obtaining the category to which the reference multimedia object belongs, the category is predicted from two kinds of features rather than from a single kind of information, which can improve the accuracy of the obtained category. In turn, recommending multimedia objects according to the obtained category can improve the accuracy of the recommended multimedia objects.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

Fig. 1 is a flow chart illustrating a multimedia object recommendation method according to an exemplary embodiment.

Fig. 2 is a schematic diagram of a video cover image, according to an exemplary embodiment.

Fig. 3 is a flow chart illustrating another multimedia object recommendation method according to an exemplary embodiment.

Fig. 4 is a schematic diagram illustrating a multimedia object category obtaining process according to an exemplary embodiment.

Fig. 5 is a flowchart of a class prediction model training method according to an embodiment of the disclosure.

Fig. 6 is a flowchart of another class prediction model training method according to an embodiment of the disclosure.

Fig. 7 is a schematic structural diagram of a multimedia object recommendation apparatus according to an embodiment of the present disclosure.

Fig. 8 is a schematic diagram of an electronic device according to an exemplary embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

Fig. 1 is a flowchart illustrating a multimedia object recommendation method according to an exemplary embodiment. The method may be applied to a client, such as video software providing videos, audio software providing audio, or novel-reading software providing novels, and may also be applied to an electronic device running the client, such as a desktop computer, a notebook computer, or a smart phone. As shown in Fig. 1, the multimedia object recommendation method includes the following steps 101-104.

Step 101, obtaining user characteristics of a user and obtaining multimedia characteristics of a reference multimedia object.

The user features may be categories to which the user belongs, the categories may be categories classified according to age groups, such as teenagers, young people, middle-aged people, etc., and the categories may also be categories classified according to work, such as teachers, doctors, composers, etc.

The reference multimedia object is a multimedia object requested by the user. The multimedia objects may be video, audio, novels, etc. For video, the multimedia features may be cover images of video, production companies, distribution time, etc.; for novels, the multimedia features described above may be titles, authors, publishers, etc. The multimedia object requested by the user may be a multimedia object clicked by the user on a browsing page of the media object, or may be a multimedia object searched by the user on a searching page, etc.

In one embodiment of the present disclosure, the user features and the multimedia features may be obtained only after the user's authorization has been obtained. Specifically, an authorization request window may pop up in the interactive interface, and if the user confirms the authorization, the user features of the user and the multimedia features of the reference multimedia object may be obtained. Authorization may be requested each time a multimedia object recommendation is made, or only the first time a recommendation is made. The latter avoids repeating the authorization request for every recommendation, which saves the resources consumed by requesting authorization and improves recommendation efficiency.

In one embodiment of the present disclosure, registration information of a user may be obtained, and a category to which the user belongs may be determined as a user feature of the user according to the registration information.

Specifically, in the case of obtaining the authorization of the user, the registration information of the user may be obtained. The registration information may include information of age, sex, work, etc. For example, according to the age of the user, it may be determined that the category to which the user belongs is teenagers, young, middle-aged, elderly, or the like; according to the work of the user, the category to which the user belongs can be determined as students, teachers, staff, workers and the like. And taking the determined category of the user as the user characteristic. The method for obtaining the user characteristics is simple, convenient and quick, and the obtained user characteristics are high in accuracy.
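As an illustration of deriving a user feature from registration information, the following minimal sketch maps an age or an occupation to a user category. The age boundaries, the dictionary layout of `registration`, and the function names are assumptions made for this example; the disclosure does not specify them.

```python
def category_from_age(age):
    """Map an age to an age-group category (boundaries are illustrative assumptions)."""
    if age < 18:
        return "teenager"
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "elderly"


def user_feature(registration, by="age"):
    """Derive the user feature from registration info (hypothetical dict layout)."""
    if by == "age":
        return category_from_age(registration["age"])
    # Otherwise classify by occupation, used directly as the category.
    return registration["work"]


registration = {"age": 15, "sex": "female", "work": "student"}
print(user_feature(registration))             # category by age group
print(user_feature(registration, by="work"))  # category by occupation
```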

In one embodiment of the present disclosure, the category to which the user belongs may also be determined, as the user feature, according to the multimedia objects browsed by the user. Specifically, the categories of the multimedia objects browsed by the user can be obtained, and the category of multimedia objects in which the user is interested is determined according to those categories and the proportion each category occupies, so that the category to which the user belongs is determined and used as the user feature.

The browsed multimedia objects may be the multimedia objects browsed by the user within a preset period, and the preset period may be the week or the month before the current time. The browsed multimedia objects may also be the third preset number of multimedia objects most recently browsed by the user, where the third preset number may be 50, 100, 200, etc. The proportion of each category may be the ratio of the number of times multimedia objects of that category were browsed to the total number of browsing times.

Taking video as an example of the multimedia object, the categories of all videos browsed by the user in the three months before the current time can be obtained. Suppose the obtained categories are martial arts and emotion, where videos belonging to martial arts were browsed 10 times and videos belonging to emotion were browsed 3 times. The category to which the user belongs can then be determined to be martial arts fan, and this is used as the user feature of the user.
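The browsing-history approach can be sketched as below. This is a minimal example that assumes the history is available as a list with one category name per browsing event, and that the user category is simply named after the category with the largest proportion; both are assumptions for illustration.

```python
from collections import Counter


def user_category_from_history(browsed_categories):
    """Return (user category, per-category proportions) from browsing events."""
    if not browsed_categories:
        return None, {}
    counts = Counter(browsed_categories)
    total = sum(counts.values())
    # Proportion of each category among all browsing events.
    proportions = {category: n / total for category, n in counts.items()}
    dominant = max(proportions, key=proportions.get)
    return f"{dominant} fan", proportions


history = ["martial arts"] * 10 + ["emotion"] * 3
category, proportions = user_category_from_history(history)
print(category, proportions)
```

With 10 martial arts views and 3 emotion views, martial arts accounts for 10/13 of the browsing events, so the user is labeled a martial arts fan.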

In one embodiment of the present disclosure, when obtaining the multimedia features of the reference multimedia object, the reference multimedia object requested by the user may be determined according to the browsing record of the client, so as to obtain the multimedia features of the reference multimedia object.

Specifically, after the reference multimedia object is determined, information such as the distribution time, distribution area, cover image, multimedia object name, etc. of the reference multimedia can be obtained as the multimedia feature.

For example, in the video software, a history of the video browsed by the user is stored, and the video clicked by the user can be determined according to the history, and further, the cover image of the video can be determined as the multimedia feature of the video.

In one embodiment of the present disclosure, the reference multimedia object requested by the user may be determined according to the browsing record stored in the server, so as to obtain the multimedia feature of the reference multimedia object. The multimedia features of the reference multimedia object may also be determined based on the reference multimedia object currently clicked by the user.

Step 102, category prediction is performed on the reference multimedia object according to the user characteristics and the multimedia characteristics.

Specifically, the category prediction is performed on the reference multimedia object by combining the user characteristics and the multimedia characteristics, so that the category to which the reference multimedia object belongs can be obtained. The same reference multimedia object may belong to multiple categories. The categories to which the reference multimedia object belongs may be different for different categories of users, in which case the predicted categories to which the reference multimedia object belongs are also different when the user characteristics are different.

Referring to Fig. 2, assume that the reference multimedia object is a video and that Fig. 2 is the cover image of the video. The cover image includes pedestrians, a river, European-style buildings, and the like. Classified purely by the attributes of the cover image, the video could therefore belong to the categories pedestrian, river, European-style building, and so on. Different categories of users, however, may focus on different things: for example, teenagers may focus on the river, middle-aged people on the European-style buildings, and elderly people on the pedestrians. That is, when the user feature is teenager and the multimedia feature is the cover image of the video, the predicted category of the video is river; when the user feature is middle-aged and the multimedia feature is the cover image of the video, the predicted category of the video is European-style building.

Step 103, when the first preset number of classifications is greater than 1, performing category prediction on the reference multimedia object again a second preset number of times.

The information on which each such category prediction is based includes: the category obtained by the previous category prediction and the multimedia features. The second preset number of classifications is equal to the first preset number of classifications minus 1. The first preset number of classifications may be, for example, 1, 3, 6, or 8.

Specifically, when the first preset number of classifications is greater than 1, category prediction is first performed on the reference multimedia object according to the user features and the multimedia features to obtain a first category. Category prediction is then performed on the reference multimedia object according to the first category and the multimedia features to obtain a second category. The first category is then updated to the second category, and the step of predicting from the first category and the multimedia features is repeated until the number of category predictions performed on the reference multimedia object reaches the first preset number of classifications.

For example, assume that the first preset number of classifications is 3, in which case the reference multimedia object is subjected to 2 more classification predictions. In the category prediction of the 1 st time, the category prediction is carried out on the reference multimedia object according to the user characteristics and the multimedia characteristics to obtain a category 1; in the category prediction of the 2 nd time, performing category prediction on the reference multimedia object according to the category 1 and the multimedia features to obtain a category 2; and in the 3 rd class prediction, carrying out class prediction on the reference multimedia object according to the class 2 and the multimedia features to obtain a class 3.
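The iteration in this example can be sketched as a loop that feeds each predicted category back in as the context for the next prediction. The sketch assumes a generic `predict(context, multimedia_feature)` callable; `toy_predict` and its lookup table are placeholders invented here so the loop can run, and are not the trained category prediction model.

```python
def predict_categories(predict, user_feature, multimedia_feature, first_preset_count):
    """Run category prediction first_preset_count times in total.

    The first call uses the user feature; each later call uses the
    category returned by the previous call, plus the same multimedia feature.
    """
    if first_preset_count < 1:
        raise ValueError("first_preset_count must be at least 1")
    categories = []
    context = user_feature
    for _ in range(first_preset_count):
        category = predict(context, multimedia_feature)
        categories.append(category)
        context = category  # the previous category drives the next prediction
    return categories


def toy_predict(context, multimedia_feature):
    """Placeholder predictor: a fixed lookup restricted to what the cover shows."""
    table = {
        "teenager": "river",
        "river": "European-style building",
        "European-style building": "pedestrian",
    }
    candidate = table[context]
    return candidate if candidate in multimedia_feature else "unknown"


cover = {"river", "European-style building", "pedestrian"}
result = predict_categories(toy_predict, "teenager", cover, 3)
print(result)
```

With a first preset number of classifications of 3, the loop makes one prediction from the user feature and two further predictions from the preceding categories, and returns the categories in the order they were obtained.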

In one embodiment of the present disclosure, the priority of each category, as a category to which the reference multimedia object belongs, may be determined by the order in which the categories are obtained. Specifically, the earlier a category is obtained, the higher its priority; the later a category is obtained, the lower its priority.

The priority of each category as the category to which the reference multimedia object belongs can be understood as the probability that the reference multimedia object is considered to belong to each category. For a category, the higher the probability that a reference multimedia object is considered to belong to that category, the higher the priority of that category.

For example, if category 1, category 2, and category 3 are obtained in order, the category with the highest priority is category 1, and the category with the lowest priority is category 3.

Step 104, selecting the multimedia object which belongs to the predicted category and is similar to the reference multimedia based on the feature similarity from the multimedia object library, and recommending by using the selected multimedia object.

In one embodiment of the present disclosure, candidate multimedia objects belonging to a predicted category may be first selected from a multimedia object library, then multimedia objects similar to a reference multimedia object may be determined from the candidate multimedia objects based on the feature similarity, and recommendation may be made according to the determined multimedia objects.

In one embodiment of the present disclosure, candidate multimedia objects similar to the reference multimedia object may also be selected from the multimedia object library based on the feature similarity first, then multimedia objects belonging to the predicted category may be determined from the candidate multimedia objects, and recommendation may be performed according to the determined multimedia objects.

In one embodiment of the present disclosure, when determining multimedia objects similar to the reference multimedia object based on feature similarity, the feature similarity between each multimedia object and the reference multimedia object may be calculated. The multimedia objects whose feature similarity is greater than a preset similarity threshold may then be selected as the multimedia objects similar to the reference multimedia object; alternatively, a preset number of multimedia objects with the highest feature similarity may be selected.

Wherein the feature similarity between the multimedia object and the reference multimedia object can be determined by calculating euclidean distance, hamming distance, cosine similarity, etc. between the features of the multimedia object and the multimedia features of the reference multimedia object.
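The three measures named above, and the two selection rules (similarity threshold or a fixed number of top matches), can be sketched as follows. The sketch assumes that features are already available as equal-length numeric vectors and uses cosine similarity for the selection; the function names and the dictionary-based library are illustrative assumptions.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    # Number of positions at which the two feature vectors differ.
    return sum(x != y for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar(reference, library, threshold=None, top_k=None):
    """Pick objects by cosine similarity, using a threshold and/or a top-k cut."""
    scored = sorted(
        ((cosine_similarity(reference, feature), name) for name, feature in library.items()),
        reverse=True,
    )
    if threshold is not None:
        scored = [(score, name) for score, name in scored if score > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [name for _, name in scored]


library = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
print(select_similar([1, 0], library, threshold=0.5))
print(select_similar([1, 0], library, top_k=1))
```

Note that Euclidean and Hamming distances decrease as objects become more alike, so a selection built on them would keep the smallest values rather than the largest.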

In one embodiment of the present disclosure, when the obtained categories have different priorities, the multimedia objects to be recommended may be selected according to the priorities. Specifically, a preset recommendation number of multimedia objects may be set in advance for the category of each priority, and, for each category, that preset recommendation number of multimedia objects that belong to the category and are determined, based on feature similarity, to be similar to the reference multimedia object are selected from the multimedia object library.

For example, assuming that the preset recommendation number of multimedia objects recommended for the first category having the highest priority is 5 and the preset recommendation number of multimedia objects recommended for the second category having the lower priority is 2, 5 multimedia objects belonging to the first category and determined to be similar to the reference multimedia based on the feature similarity and 2 multimedia objects belonging to the second category and determined to be similar to the reference multimedia based on the feature similarity are selected from the multimedia object library.
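A minimal sketch of this per-priority selection is given below. It assumes each library entry is a dictionary with `id`, `category`, and a scalar `feature`, and uses negative absolute difference as a stand-in similarity; these structures and names are assumptions for illustration only.

```python
def recommend_by_priority(categories, quotas, library, reference_feature, similarity):
    """Select, per category in priority order, the most similar objects up to its quota."""
    recommended = []
    for category, quota in zip(categories, quotas):
        candidates = [obj for obj in library if obj["category"] == category]
        # Most similar to the reference object first.
        candidates.sort(key=lambda obj: similarity(reference_feature, obj["feature"]), reverse=True)
        recommended.extend(candidates[:quota])
    return recommended


def scalar_similarity(a, b):
    """Stand-in similarity for scalar features: closer values score higher."""
    return -abs(a - b)


library = [
    {"id": "v1", "category": "river", "feature": 0.9},
    {"id": "v2", "category": "river", "feature": 0.2},
    {"id": "v3", "category": "river", "feature": 0.8},
    {"id": "v4", "category": "building", "feature": 0.7},
    {"id": "v5", "category": "building", "feature": 0.1},
]
picked = recommend_by_priority(["river", "building"], [2, 1], library, 1.0, scalar_similarity)
print([obj["id"] for obj in picked])
```

Because categories are processed from highest to lowest priority, the returned list is already ordered by category priority.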

In one embodiment of the present disclosure, multimedia objects belonging to a higher priority category may be preferentially recommended to a user when making a multimedia object recommendation to the user. Specifically, when the recommended multimedia objects are displayed to the user, the display position of each multimedia object on the display page can be determined according to the priority of the category to which each multimedia object to be displayed belongs.

For example, the multimedia objects to be displayed may be sequentially displayed on the display page from left to right according to the order of the category priorities from high to low.
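A minimal sketch of this left-to-right ordering is shown below, assuming each object is a hypothetical `(object_id, category)` pair and each category has a numeric priority rank where 0 is the highest.

```python
def display_order(objects, category_priority):
    """Order object ids for display, highest-priority category first.

    objects: list of (object_id, category) pairs.
    category_priority: dict mapping category to rank (0 = highest priority).
    Python's sort is stable, so objects of the same category keep
    their original relative order.
    """
    ordered = sorted(objects, key=lambda pair: category_priority[pair[1]])
    return [oid for oid, _ in ordered]
```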

In this way, multimedia objects can be recommended in a targeted manner according to categories of different priorities, so the recommended multimedia objects span a variety of categories rather than a single one, which gives the user a better experience.

Referring to fig. 3, in one embodiment of the present disclosure, for steps 102 and 103 described above, a category prediction model may be utilized to perform category prediction on the reference multimedia object.

The category prediction model is a model for predicting the category of a multimedia object, obtained by training an initial model with the sample user features of a sample user and the sample multimedia features as model input information, and with a first preset number of labeling categories of the sample multimedia object, each having a different priority, as training supervision information. The sample multimedia features are the features of the sample multimedia object requested by the sample user, and the value of the first preset number is not smaller than the value of the first preset classification number.

The initial model may be an LSTM (Long Short-Term Memory) model, an RNN (Recurrent Neural Network) model, or the like, which is not limited in this disclosure.

See specifically steps 1021 and 1031 below:

step 1021, inputting the user characteristics and the multimedia characteristics into a pre-trained category prediction model, and performing category prediction on the reference multimedia object.

Specifically, the user characteristics and the multimedia characteristics are input into a category prediction model to obtain an output result, and the output result is used as a category to which the reference multimedia object belongs.

For example, when the reference multimedia object is a video and the multimedia feature is a cover image of the video, the user feature and the cover image are input into a category prediction model, and the cover image is classified to obtain a category to which the cover image belongs as the category to which the video belongs.

Step 1031, in the case that the first preset classification number is greater than 1, performing category prediction on the reference multimedia object again, a second preset classification number of times, by using the category prediction model.

The input information of the category prediction model in each of these category predictions comprises: the category output by the category prediction model in the previous category prediction, and the multimedia features.

Specifically, referring to fig. 4, in the case that the first preset classification number is greater than 1, the user features and the multimedia features are first input into the category prediction model to obtain a first category output by the model. The first category and the multimedia features are then input into the category prediction model to obtain a second category output by the model. The first category is updated to the second category, and the step of inputting the first category and the multimedia features into the category prediction model to obtain the second category output by the model is executed again, until the number of category predictions performed on the reference multimedia object reaches the first preset classification number. Each time the category prediction model performs a prediction, one category to which the reference multimedia object belongs is obtained, so performing the prediction the first preset classification number of times yields that many categories.

In one embodiment of the present disclosure, a priority may be assigned to each category according to the order in which the categories were output, and the categories with assigned priorities are determined as the categories to which the reference multimedia object belongs.

For example, in the case that the reference multimedia object is a video and the multimedia feature is a cover image of the video, the user feature and the cover image are input into the category prediction model, and category prediction is performed on the cover image to obtain a first category, denoted P1. The first category and the cover image are then input into the category prediction model again, and category prediction is performed on the cover image again to obtain a second category, denoted P2. The first category is then updated to the second category, and the first category and the cover image are input into the category prediction model again to obtain a new second category, denoted P3, and so on until the number of classifications reaches the first preset classification number. At this point, the classification results P1, P2, ..., Pn are obtained, where n is the first preset classification number. The priority of each classification result is set according to its output order, that is, P1 has the highest priority, P2 has the next highest priority after P1, and so on. Finally, the classification results with their priorities are used as the categories to which the reference multimedia object belongs.
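The iterative prediction in steps 1021 and 1031 can be sketched as the loop below. The `model` argument stands in for the trained category prediction model and is assumed to be any callable that takes a conditioning input (the user features on the first call, the previous category afterwards) together with the multimedia features. The `toy_model` is a purely hypothetical stand-in used to make the example runnable; it is not the LSTM or RNN model of the disclosure.

```python
def predict_categories(model, user_features, multimedia_features,
                       first_preset_count):
    """Return categories P1..Pn in output order (index 0 = highest priority)."""
    categories = []
    # Step 1021: the first prediction uses user features and multimedia features.
    previous = model(user_features, multimedia_features)
    categories.append(previous)
    # Step 1031: the remaining (first_preset_count - 1) predictions use the
    # previously output category together with the multimedia features.
    for _ in range(first_preset_count - 1):
        previous = model(previous, multimedia_features)
        categories.append(previous)
    return categories


def toy_model(condition, multimedia_features):
    """Hypothetical stand-in model.

    multimedia_features is a dict of category -> score. If the condition is
    one of those categories, return the next best category after it;
    otherwise (for example, on the first call) return the best one.
    """
    ranked = sorted(multimedia_features, key=multimedia_features.get,
                    reverse=True)
    if condition in ranked:
        idx = ranked.index(condition)
        return ranked[min(idx + 1, len(ranked) - 1)]
    return ranked[0]
```

The loop runs the second preset classification number of times, which equals the first preset classification number minus 1, so the total number of predictions is the first preset classification number.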

The scheme provided by this embodiment can obtain a plurality of categories to which the reference multimedia object belongs, with different priorities among the categories, so the obtained categories are richer and no longer limited to a single category. Based on this, when recommending multimedia objects to the user according to categories, different objects can be recommended according to the priority order of the different categories.

Referring to fig. 5, fig. 5 is a flowchart of a category prediction model training method according to an embodiment of the disclosure, where the method includes the following steps 501 to 505:

Step 501, obtaining user features of a sample user as sample user features, and obtaining features of a sample multimedia object requested by the sample user as sample multimedia features.

The method for obtaining the sample user characteristics and the sample multimedia characteristics of the sample user and the sample multimedia object is the same as the method for obtaining the user characteristics and the multimedia characteristics in the step 101, and will not be described herein.

Step 502, obtaining a first preset number of annotation categories to which the sample multimedia object belongs.

Each labeling category has a different priority. A labeling category is a category, labeled on the basis of the sample multimedia features, to which the sample multimedia object belongs. For example, when the sample multimedia object is a video and the sample multimedia feature is a cover image of the video, taking fig. 2 above as an example, the category to which the cover image belongs may be a pedestrian, a river, or a European building.

Because the number of labeling categories to which each sample multimedia object belongs in the training stage is the first preset number, the category prediction model obtained by training can, in the use stage, output up to the first preset number of categories to which a multimedia object belongs. In the scheme provided by the embodiments of the disclosure, category prediction must be performed on the reference multimedia object the first preset classification number of times in the actual use stage, so to ensure the accuracy of the obtained categories, the value of the first preset number is greater than or equal to the value of the first preset classification number. For example, assuming that the first preset classification number is 3, the first preset number may be 3, 4, 5, etc.

In one embodiment of the disclosure, when the labeling categories are obtained, a worker may classify the sample multimedia object, and divide the sample multimedia object into a first preset number of categories, thereby obtaining the first preset number of labeling categories.

In one embodiment of the present disclosure, when the labeling category is obtained, the sample multimedia features may be further input into a feature classification model that is trained in advance, to obtain a first preset number of categories, which are labeling categories to which the sample multimedia object belongs.

The feature classification model may be an RNN model or a CNN (Convolutional Neural Network) model. The feature classification model can be obtained by training an initial model with the sample multimedia features as input and the labels of the categories to which the sample multimedia objects belong as supervision.

In one embodiment of the present disclosure, for the first preset number of annotation categories obtained, the priority may be manually set for each annotation category.

Step 503, inputting the sample user features and the sample multimedia features into an initial model, and performing category prediction on the sample multimedia objects.

Specifically, the sample user characteristics and the sample multimedia characteristics are taken as input information, an initial model is input, the initial model processes the input information, and a first sample category for carrying out category prediction on the sample multimedia object is output.

Step 504, in the case that the first preset number is greater than 1, performing category prediction on the sample multimedia object a second preset number of times by using the initial model.

The input information of the initial model in each of these category predictions comprises: the category output by the initial model in the previous category prediction, and the sample multimedia features. The second preset number is equal to the first preset number minus 1.

Specifically, in the case that the first preset number is greater than 1, the sample user features and the sample multimedia features are first input into the initial model as input information to obtain a first sample category output by the model. The first sample category and the sample multimedia features are then input into the initial model as input information to obtain a second sample category output by the model. The first sample category is updated to the second sample category, and the step of inputting the first sample category and the sample multimedia features into the initial model to obtain the second sample category is executed again, until category prediction has been performed on the sample multimedia object the first preset number of times.

Each time the initial model performs category prediction, one sample category to which the sample multimedia object belongs is obtained, so performing category prediction on the sample multimedia object the first preset number of times yields the first preset number of categories.

Step 505, adjusting the parameters of the initial model based on the difference between the sequentially predicted categories and the obtained labeling categories, thereby training the model to obtain the category prediction model.

Specifically, the difference between the first preset number of sequentially obtained categories and the first preset number of labeling categories can be determined, and the parameters of the initial model are adjusted according to the difference. The above steps are repeated with the adjusted model, so that the model is trained multiple times. When the preset number of training iterations is reached, or the difference between the categories output by the model and the labeling categories is smaller than a preset difference threshold, the model is considered to have converged and training ends. The trained model is used as the category prediction model. In one embodiment of the present disclosure, the above difference may be determined by calculating the Euclidean distance, cosine similarity, contrastive loss, etc. between the output categories and the labeling categories.

For example, assume that the first preset number is 3, and the 3 sequentially obtained output categories are [S1, S2, S3], wherein S1 represents the category of the first output of the model, S2 represents the category of the second output, and S3 represents the category of the third output. The 3 labeling categories are [Z1, Z2, Z3], wherein Z1 represents the labeling category with the highest priority, Z2 represents the labeling category with the second priority, and Z3 represents the labeling category with the lowest priority. The difference of [S1, S2, S3] relative to [Z1, Z2, Z3] can be calculated, and the parameters of the model are adjusted based on the calculated difference, thereby training the model.
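One possible way to compute the difference between [S1, S2, S3] and [Z1, Z2, Z3] is sketched below. It assumes, for illustration only, that each model output is a probability distribution over a fixed category vocabulary and that each labeling category is encoded as a one-hot vector; the disclosure does not fix an encoding, and the names `one_hot`, `sequence_difference`, and `vocabulary` are hypothetical. The Euclidean distance is used here, but any of the measures mentioned above could be substituted.

```python
import math


def one_hot(category, vocabulary):
    """Encode a category as a one-hot vector over the vocabulary."""
    return [1.0 if c == category else 0.0 for c in vocabulary]


def sequence_difference(predicted_distributions, label_categories, vocabulary):
    """Mean Euclidean distance between each output and its label.

    predicted_distributions[i] is the model output of the (i+1)-th prediction;
    label_categories[i] is the labeling category with the (i+1)-th priority.
    """
    total = 0.0
    for probs, label in zip(predicted_distributions, label_categories):
        target = one_hot(label, vocabulary)
        total += math.sqrt(sum((p - t) ** 2 for p, t in zip(probs, target)))
    return total / len(label_categories)
```

Training would stop once this value falls below the preset difference threshold or once the preset number of training iterations is reached.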

Referring to fig. 6, in one embodiment of the present disclosure, for the step 502 described above, a first preset number of labeling categories with different priorities may be obtained through the following steps 5021 to 5023:

Step 5021, obtaining the categories to which the sample multimedia object belongs.

Specifically, a third preset number of categories to which the sample multimedia object belongs may be obtained. Because the first preset number of labeling categories are to be selected from these third preset number of categories, the value of the third preset number is not smaller than the value of the first preset number.

Step 5022, counting, among the users having the sample user features, the number of users corresponding to each category.

The number of users corresponding to each category is the number of users, among the users having the sample user features, who consider the sample multimedia object to belong to that category. A user having the sample user features is a user whose user features are the same as the sample user features described above. Specifically, when the category to which a user belongs is used as the user feature, the users having the sample user features are the users belonging to the same category as the sample user.

For example, assuming that the sample multimedia object is a video, the category to which the sample multimedia object belongs is martial arts, and the sample user is characterized as teenagers, the number of users corresponding to the category is: the number of users in teenagers who consider the video as martial arts video.

The number of users corresponding to each labeling category can be obtained by means of a survey questionnaire. Specifically, after the labeling categories are obtained for a sample multimedia object, a fourth preset number of users vote on each labeling category to obtain the number of votes for each labeling category, which gives the number of users corresponding to each labeling category. The fourth preset number may be 100, 500, 1000, etc.

The number of users corresponding to each labeling category can also be obtained by mining the browsing behavior of users. Specifically, for one labeling category of a sample multimedia object, the sample multimedia object is displayed to a user together with a fifth preset number of sample multimedia objects belonging to the same labeling category. If the user selects the sample multimedia object while browsing the displayed sample multimedia objects, it is confirmed that the user considers the sample multimedia object to belong to the labeling category. The users who consider the sample multimedia object to belong to each labeling category are then counted to obtain the number of users corresponding to each labeling category. The fifth preset number may be 10, 50, 100, etc.
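Whether the data comes from a questionnaire or from browsing behavior, the counting in step 5022 reduces to tallying distinct users per category. A minimal sketch follows, assuming the collected data is available as hypothetical `(user_id, category)` records; the record format and function name are illustrative assumptions.

```python
from collections import Counter


def count_users_per_category(records):
    """Count distinct users who consider the object to belong to each category.

    records: iterable of (user_id, category) pairs, one per vote or selection.
    A user is counted at most once per category, even if recorded repeatedly.
    """
    seen = set()
    counts = Counter()
    for user_id, category in records:
        if (user_id, category) in seen:
            continue
        seen.add((user_id, category))
        counts[category] += 1
    return dict(counts)
```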

Step 5023, selecting a first preset number of categories with the largest number of users as labeling categories.

The priorities of the labeling categories decrease in order of the number of users corresponding to each labeling category, from most to least.

Specifically, according to the sequence from high to low of the number of users corresponding to each labeling category, the priority of the sample multimedia object belonging to each labeling category from high to low is obtained. For example, if the number of users corresponding to the labeling category 1 is 100 and the number of users corresponding to the labeling category 2 is 150, the priority of the labeling category 2 is higher than that of the labeling category 1.

In one embodiment of the disclosure, when the number of users corresponding to each labeling category is counted, some counts may result from user misoperation and therefore have no reference value, or the number of users corresponding to a certain labeling category may be so small that the probability of the sample multimedia object belonging to that labeling category is considered too low. Therefore, when the labeling categories are obtained, the first preset number of labeling categories with the highest priority are selected, and labeling categories whose priority is too low are discarded.
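Step 5023 and the pruning described above can be sketched together as follows. The `min_users` cutoff is a hypothetical parameter introduced here to model the removal of categories with too few users; the disclosure does not specify how the cutoff is chosen.

```python
def prune_and_rank(user_counts, first_preset_number, min_users=0):
    """Return labeling categories ordered from highest to lowest priority.

    user_counts: dict mapping category to its number of users.
    Categories with fewer than min_users users are discarded, and at most
    first_preset_number categories with the most users are kept.
    """
    kept = {c: n for c, n in user_counts.items() if n >= min_users}
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked[:first_preset_number]
```

With the counts from the example above (100 users for labeling category 1 and 150 for labeling category 2), labeling category 2 is ranked first.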

In one embodiment of the present disclosure, when only one category to which the reference multimedia object belongs needs to be obtained, the reference multimedia object may be subjected to the category prediction only once using the category prediction model.

In this case, when the category to which the reference multimedia object belongs is obtained, the user feature and the multimedia feature may be input into a pre-trained category prediction model, and the category prediction may be performed on the reference multimedia object, so as to obtain a category to which the reference multimedia object belongs.

Assuming that the reference multimedia object is a video, the multimedia feature is a cover image of the video, and the user feature and the cover image can be input into a category prediction model to obtain an output result, wherein the output result is the category to which the cover image belongs, and the category is used as the category to which the video belongs.

In one embodiment of the present disclosure, the above-described class prediction model may be trained by:

obtaining user characteristics of a sample user as sample user characteristics; the characteristics of the sample multimedia object requested by the sample user are obtained and used as the sample multimedia characteristics; obtaining 1 annotation category to which a sample multimedia object belongs; and training the initial model by taking the sample user characteristics and the sample multimedia characteristics as input information and the labeling type as training supervision information to obtain a type prediction model.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a multimedia object recommendation apparatus according to an embodiment of the present disclosure, where the apparatus includes:

a feature obtaining module 701, configured to obtain a user feature of a user and obtain a multimedia feature of a reference multimedia object, where the reference multimedia object is a multimedia object requested by the user;

a first category prediction module 702, configured to perform category prediction on the reference multimedia object according to the user feature and the multimedia feature;

the second class prediction module 703 is configured to re-perform class prediction on the reference multimedia object for a second preset classification number of times when the first preset classification number of times is greater than 1, where the information according to which class prediction is performed each time includes: the category obtained by the last category prediction and the multimedia feature, wherein the second preset classification times are equal to the first preset classification times minus 1;

an object recommendation module 704, configured to select, from a multimedia object library, multimedia objects that belong to a predicted category and are determined to be similar to the reference multimedia object based on feature similarity, and to make a recommendation using the selected multimedia objects.

In one embodiment of the disclosure, the first class prediction module 702 is specifically configured to:

the second class prediction module 703 is specifically configured to:

obtaining the category to which the sample multimedia object belongs;

In one embodiment of the disclosure, the feature obtaining module 701 is specifically configured to:

Fig. 8 is a block diagram of an electronic device, applied to application software, according to an exemplary embodiment. Referring to fig. 8, the electronic device comprises a processor 801, a communication interface 802, a memory 803 and a communication bus 804, wherein the processor 801, the communication interface 802, the memory 803 complete communication with each other through the communication bus 804,

a memory 803 for storing a computer program;

the processor 801 is configured to implement the multimedia object recommendation method provided in the present disclosure when executing the program stored in the memory 803.

The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present disclosure, there is also provided a computer readable storage medium having a computer program stored therein, the computer program implementing the steps of any of the above-described multimedia object recommendation methods when executed by a processor.

In yet another embodiment provided by the present disclosure, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the multimedia object recommendation method of any of the above embodiments.

Compared with the prior art, the multimedia object recommendation method provided by the embodiments of the present disclosure relies on richer information rather than on a single type of information, so the accuracy of the obtained object categories can be improved.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, the electronic device embodiments, the computer-readable storage medium embodiments, and the computer program product embodiments, the description is relatively simple, and reference should be made to the description of method embodiments in part, since they are substantially similar to the method embodiments.

The foregoing description is only of the preferred embodiments of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure are included in the protection scope of the present disclosure.

Claims

1. A method of multimedia object recommendation, the method comprising:

and determining each category as the priority of the category to which the reference multimedia object belongs according to the acquisition sequence of each category, selecting the multimedia object which belongs to the predicted category from a multimedia object library according to the priority of each category, determining the multimedia object similar to the reference multimedia based on the feature similarity, and recommending by utilizing the selected multimedia object.

2. The method of claim 1, wherein

said performing category prediction on said reference multimedia object based on said user characteristics and multimedia characteristics, comprising:

3. The method of claim 2, wherein the class prediction model is trained by:

4. A method according to claim 3, wherein said obtaining said first predetermined number of annotation categories to which said sample multimedia object belongs comprises:

obtaining the category to which the sample multimedia object belongs;

5. The method according to any of claims 1-4, wherein said obtaining user characteristics of a user comprises:

6. A multimedia object recommendation apparatus, the apparatus comprising:

and the object recommendation module is used for determining each category as the priority of the category to which the reference multimedia object belongs according to the acquisition sequence of each category, selecting the multimedia object which belongs to the predicted category and is similar to the reference multimedia based on the feature similarity from the multimedia object library according to the priority of each category, and recommending by utilizing the selected multimedia object.

7. The apparatus of claim 6, wherein

the first class prediction module is specifically configured to:

The second class prediction module is specifically configured to:

8. The apparatus of claim 7, further comprising a model training module for training the class prediction model, the model training module comprising:

9. The apparatus according to claim 8, wherein the annotation class obtaining unit is specifically configured to:

obtaining the category to which the sample multimedia object belongs;

10. The apparatus according to any one of claims 6-9, wherein the feature acquisition module is specifically configured to:

11. An electronic device, comprising:

A processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the multimedia object recommendation method of any one of claims 1 to 5.

12. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia object recommendation method of any one of claims 1 to 5.