CN110737783B

CN110737783B - Method and device for recommending multimedia content and computing equipment

Info

Publication number: CN110737783B
Application number: CN201910950867.6A
Authority: CN
Inventors: 张恒
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2023-01-17
Anticipated expiration: 2039-10-08
Also published as: CN110737783A

Abstract

The application discloses a method, an apparatus, and a computing device for recommending multimedia content, which improve the accuracy and effectiveness of multimedia content recommendation and thereby the recommendation performance of a recommendation system. The method comprises: obtaining a recommendation request; in response to the recommendation request, determining, from a multimedia content recommendation pool, candidate multimedia content whose comprehensive attraction degree meets a preset attraction degree condition, wherein the comprehensive attraction degree of multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree corresponding to that multimedia content, the title attraction degree indicates the degree to which a text title of the multimedia content to be recommended attracts a user, and the picture attraction degree indicates the degree to which at least one abstract picture of the multimedia content to be recommended attracts the user; and recommending multimedia content according to the candidate multimedia content.

Description

Method and device for recommending multimedia content and computing equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a computing device for recommending multimedia content.

Background

At present, when a recommendation system such as an information recommendation system or a short video recommendation system recommends multimedia content to a user, it generally displays a plurality of recommended multimedia contents in one page at the same time in order to improve the display efficiency of the page. The titles and some of the pictures of the multimedia contents are therefore generally displayed together on the display page, so that the user can view rough information about the plurality of multimedia contents in one page and then click some of them to view according to his or her own viewing preferences.

Based on the current display mode, the recommendation performance of the recommendation system needs to be optimized in order to improve the click rate and the view rate of users, so how to improve the recommendation performance of the recommendation system is a problem worth considering.

Disclosure of Invention

The embodiment of the application provides a method, a device and a computing device for recommending multimedia content, which are used for improving the accuracy and effectiveness of a recommendation system for recommending the multimedia content and further improving the recommendation performance of the recommendation system.

In one aspect, a method of recommending multimedia content is provided, the method comprising:

obtaining a recommendation request;

in response to the recommendation request, determining, from a multimedia content recommendation pool, candidate multimedia content whose comprehensive attraction degree meets a preset attraction degree condition; the comprehensive attraction degree of the multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree corresponding to the multimedia content to be recommended, the title attraction degree indicates the degree to which a text title of the multimedia content to be recommended attracts a user, and the picture attraction degree indicates the degree to which at least one abstract picture of the multimedia content to be recommended attracts the user;

and recommending the multimedia content according to the candidate multimedia content.
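As a non-limiting illustration, the three steps above may be sketched as follows. The class `Content`, the functions `comprehensive_attraction` and `select_candidates`, the equal-weight average, and the "score at least equal to a threshold" condition are all assumptions made for illustration; the embodiments do not prescribe them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Content:
    content_id: str
    title_attraction: float    # attraction degree of the text title (assumed 0-100 scale)
    picture_attraction: float  # attraction degree of the abstract picture(s)


def comprehensive_attraction(item: Content, title_weight: float = 0.5) -> float:
    """Assumed combination rule: a weighted average of the two scores."""
    return title_weight * item.title_attraction + (1.0 - title_weight) * item.picture_attraction


def select_candidates(pool: List[Content], min_attraction: float) -> List[Content]:
    """Keep items whose comprehensive attraction degree meets the assumed
    threshold condition, ordered from most to least attractive."""
    kept = [c for c in pool if comprehensive_attraction(c) >= min_attraction]
    return sorted(kept, key=comprehensive_attraction, reverse=True)


# A hypothetical recommendation pool and a hypothetical threshold of 60.
pool = [
    Content("a", 80.0, 60.0),
    Content("b", 30.0, 40.0),
    Content("c", 90.0, 90.0),
]
candidates = select_candidates(pool, min_attraction=60.0)
candidate_ids = [c.content_id for c in candidates]
```

In this sketch the recommendation step would simply return the candidates in order; a real system could equally apply further ranking or filtering to them.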

In one aspect, an apparatus for recommending multimedia content is provided, the apparatus comprising:

an obtaining module, configured to obtain a recommendation request;

a determining module, configured to, in response to the recommendation request, determine, from a multimedia content recommendation pool, candidate multimedia content whose comprehensive attraction degree meets a preset attraction degree condition; the comprehensive attraction degree of the multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree corresponding to the multimedia content to be recommended, the title attraction degree indicates the degree to which a text title of the multimedia content to be recommended attracts a user, and the picture attraction degree indicates the degree to which at least one abstract picture of the multimedia content to be recommended attracts the user;

and the recommending module is used for recommending the multimedia content according to the candidate multimedia content.

Optionally, the determining module is configured to:

performing word segmentation processing on the text title to obtain a plurality of words, and determining a word vector corresponding to each word;

determining semantic features corresponding to the text titles according to all the obtained word vectors and a pre-trained attraction degree prediction model, and determining semantic attraction degrees corresponding to the text titles according to the semantic features; the attraction degree prediction model is obtained by training according to a plurality of text training samples marked with semantic attraction degrees;

and determining the title attraction degree according to the semantic attraction degree corresponding to the text title.

Optionally, the determining module is configured to:

determining preset strong-attraction content included in the text title; the preset strong-attraction content comprises at least one of a preset hot keyword, a plurality of mutually contradictory preset keywords, a specific sentence pattern structure, and a specific punctuation mark combination;

determining a literal attraction degree corresponding to the text title according to the included preset strong-attraction content;

and determining the title attraction degree according to the semantic attraction degree and the literal attraction degree.
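The rule-based part of the above steps may be sketched as follows, purely as an illustration. The keyword lists, the regular expression, the "fraction of rule types hit" scoring, and the 0.7/0.3 weighting are hypothetical choices; the embodiments only state that the literal attraction degree depends on which preset strong-attraction content the title includes.

```python
import re

# Hypothetical presets; a deployed system would maintain its own lists.
HOT_KEYWORDS = {"flagship", "must try"}
CONTRADICTORY_PAIRS = [("cheap", "luxury")]
SENTENCE_PATTERNS = [re.compile(r"\breally\b.*\?")]  # e.g. "is X really Y?"
PUNCTUATION_COMBOS = ["?!", "!!", "..."]


def literal_attraction(title: str) -> float:
    """Score 0-100 by the fraction of the four rule types the title triggers."""
    text = title.lower()
    hits = 0
    hits += any(k in text for k in HOT_KEYWORDS)
    hits += any(a in text and b in text for a, b in CONTRADICTORY_PAIRS)
    hits += any(p.search(text) is not None for p in SENTENCE_PATTERNS)
    hits += any(c in text for c in PUNCTUATION_COMBOS)
    return 100.0 * hits / 4


def title_attraction(semantic: float, literal: float, semantic_weight: float = 0.7) -> float:
    """Assumed combination: weighted sum of semantic and literal scores."""
    return semantic_weight * semantic + (1.0 - semantic_weight) * literal


strong = literal_attraction("Is this cheap phone really a luxury flagship?!")
plain = literal_attraction("Weather report for Monday")
```

Here the semantic attraction degree is taken as a given number; in the embodiments it comes from the pre-trained attraction degree prediction model.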

Optionally, the determining module is configured to:

determining a target abstract picture from the at least one abstract picture;

extracting the features of the target abstract picture to obtain image feature information corresponding to the target abstract picture;

inputting image characteristic information corresponding to the target abstract picture into a plurality of preset image classification models to obtain a plurality of classification probabilities corresponding to the target abstract picture; the image classification models are used for classifying and describing the pictures from a plurality of image description dimensions respectively;

determining an image-level attraction degree corresponding to the target abstract picture according to the plurality of classification probabilities corresponding to the target abstract picture and a preset classification fusion model;

and determining the picture attraction degree according to the image-level attraction degree corresponding to the target abstract picture.
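By way of example only, the fusion of several classification probabilities into one image-level attraction degree may be sketched as below. The logistic form of the fusion model, the dimension names, and the weights are assumptions; the embodiments do not specify the structure of the classification fusion model.

```python
import math
from typing import Dict

# Hypothetical fusion parameters, one weight per image description dimension.
FUSION_WEIGHTS = {"has_face": 1.5, "is_food": 1.0, "is_blurry": -2.0}
FUSION_BIAS = -0.5


def image_level_attraction(probabilities: Dict[str, float]) -> float:
    """Fuse per-dimension classification probabilities into a 0-100 score
    using an assumed logistic (sigmoid) fusion model."""
    z = FUSION_BIAS + sum(FUSION_WEIGHTS[name] * p for name, p in probabilities.items())
    return 100.0 / (1.0 + math.exp(-z))


sharp_portrait = image_level_attraction({"has_face": 0.9, "is_food": 0.1, "is_blurry": 0.05})
blurry_portrait = image_level_attraction({"has_face": 0.9, "is_food": 0.1, "is_blurry": 0.95})
```

With these illustrative weights, a picture that the blur classifier flags strongly receives a lower score than an otherwise identical sharp picture.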

Optionally, the determining module is configured to:

determining the picture type of the target abstract picture;

determining an image classification model group corresponding to the picture type of the target abstract picture according to the corresponding relation between the preset picture type and the image classification model group;

and inputting the image characteristic information corresponding to the target abstract picture into each image classification model in the corresponding image classification model group.
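The correspondence between picture types and image classification model groups may, for instance, be held in a lookup table, as in the following sketch. The picture type names, the two toy classifiers, and the feature layout are hypothetical and stand in for trained image classification models.

```python
from typing import Callable, Dict, List

Classifier = Callable[[List[float]], float]


def brightness_model(features: List[float]) -> float:
    """Toy classifier: reads an assumed brightness feature at index 0."""
    return min(1.0, max(0.0, features[0]))


def motion_model(features: List[float]) -> float:
    """Toy classifier: reads an assumed motion feature at index 1."""
    return min(1.0, max(0.0, features[1]))


# Assumed correspondence between picture type and model group.
MODEL_GROUPS: Dict[str, List[Classifier]] = {
    "static_picture": [brightness_model],
    "video_thumbnail": [brightness_model, motion_model],
}


def classify(picture_type: str, features: List[float]) -> List[float]:
    """Feed the image feature information to every model in the matching group."""
    return [model(features) for model in MODEL_GROUPS[picture_type]]


static_probs = classify("static_picture", [0.8, 0.3])
thumbnail_probs = classify("video_thumbnail", [0.8, 0.3])
```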

Optionally, the determining module is configured to:

determining, according to historical viewing information of the user viewing other multimedia content, a ratio between the user's viewing degree for the text titles of the other multimedia content and the user's viewing degree for the corresponding abstract pictures;

and weighting the title attraction degree and the picture attraction degree in proportions positively correlated with the ratio to obtain the comprehensive attraction degree.
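One possible reading of this ratio-based weighting is sketched below. Measuring the "viewing degree" by counts of title-driven and picture-driven views, and turning the ratio into normalized weights, are assumptions made for illustration only.

```python
def ratio_weighted_attraction(
    title_score: float,
    picture_score: float,
    title_views: int,
    picture_views: int,
) -> float:
    """Weight the two scores by the user's historical preference.

    title_views / picture_views are hypothetical counts of past views
    attributed to the title and to the abstract picture, respectively.
    """
    total = title_views + picture_views
    # With no history, fall back to equal weights (an assumed default).
    title_share = 0.5 if total == 0 else title_views / total
    return title_share * title_score + (1.0 - title_share) * picture_score


title_lover = ratio_weighted_attraction(80.0, 40.0, title_views=3, picture_views=1)
no_history = ratio_weighted_attraction(80.0, 40.0, title_views=0, picture_views=0)
```

A user whose history is dominated by title-driven views thus sees the title attraction degree contribute more to the comprehensive attraction degree.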

Optionally, the determining module is configured to:

comparing the title attraction degree with a first attraction degree threshold, and comparing the picture attraction degree with a second attraction degree threshold;

if at least one of the title attraction degree and the picture attraction degree is greater than or equal to its corresponding attraction degree threshold, determining the comprehensive attraction degree from the title attraction degree and the picture attraction degree according to a preset calculation strategy;

if the title attraction degree and the picture attraction degree are both smaller than their corresponding attraction degree thresholds, inputting the title attraction degree and the picture attraction degree into a pre-trained attraction degree fusion model to obtain the comprehensive attraction degree; the attraction degree fusion model is obtained by training on a plurality of multimedia content training samples labeled with title attraction degrees and picture attraction degrees.
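The threshold branching described above may be sketched as follows. Using `max` as the preset calculation strategy and a fixed linear function as a stand-in for the trained fusion model are assumptions; neither is specified by the embodiments.

```python
from typing import Callable

Combiner = Callable[[float, float], float]


def stand_in_fusion_model(title: float, picture: float) -> float:
    """Placeholder for the pre-trained fusion model (assumed linear here)."""
    return 0.6 * title + 0.4 * picture


def comprehensive_attraction(
    title: float,
    picture: float,
    title_threshold: float,
    picture_threshold: float,
    fusion_model: Combiner = stand_in_fusion_model,
    preset_strategy: Combiner = max,  # assumed strategy: take the stronger signal
) -> float:
    # Branch 1: at least one score reaches its threshold.
    if title >= title_threshold or picture >= picture_threshold:
        return preset_strategy(title, picture)
    # Branch 2: both scores are below their thresholds.
    return fusion_model(title, picture)


strong_title = comprehensive_attraction(85.0, 20.0, 70.0, 70.0)
both_weak = comprehensive_attraction(40.0, 50.0, 70.0, 70.0)
```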

In one aspect, a computing device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the steps included in the method for recommending multimedia content as described above.

In one aspect, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform the steps included in the method for recommending multimedia content described above.

In one aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the method of recommending multimedia content described in the various possible implementations described above.

In the embodiments of the present application, an index called "attraction degree" is introduced to measure the recommendation performance of the recommendation system; it refers to the degree to which content attracts a user. Specifically, for multimedia content, a title attraction degree and a picture attraction degree are calculated from the two dimensions of the text title and the abstract picture, respectively. This takes into account that some users are more interested in the text title and others in the abstract picture, so the comprehensive attraction degree calculated by combining the two dimensions reflects the viewing preferences of most users as a whole. For example, multimedia content with a high comprehensive attraction degree is preferentially recommended to users: the higher the comprehensive attraction degree, the greater the attraction to users and the greater the probability that they click to view the multimedia content. The accuracy and effectiveness of multimedia content recommendation can therefore be improved to a certain extent, with better generalization, thereby improving the recommendation performance of the recommendation system.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a schematic diagram of information presentation in an embodiment of the present application;

FIG. 2 is a schematic diagram of short video presentation in an embodiment of the present application;

FIG. 3 is a schematic diagram of an application scenario to which an embodiment of the present application is applied;

FIG. 4 is a flowchart of calculating the comprehensive attraction degree of multimedia content in an embodiment of the present application;

FIG. 5 is another flowchart of calculating the comprehensive attraction degree of multimedia content in an embodiment of the present application;

FIG. 6 is a schematic diagram of calculating the title attraction degree using an attraction degree prediction model in an embodiment of the present application;

FIG. 7 is a flowchart of calculating the picture attraction degree in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a network for extracting image feature information of an abstract picture in an embodiment of the present application;

FIG. 9 is a flowchart of a method for recommending multimedia content in an embodiment of the present application;

FIG. 10 is a schematic diagram of an apparatus for recommending multimedia content in an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a computing device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more clearly understood, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments given herein without making any creative effort, shall fall within the scope of the claimed protection. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.

The terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish between different objects and not to describe a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may optionally include other steps or elements not listed, or inherent to such process, method, article, or apparatus. "Plurality" in the present application means at least two, for example two, three, or more, which is not limited in the embodiments of the present application.

In addition, the term "and/or" herein describes only an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects, unless otherwise specified.

Some terms referred to herein are explained below to facilitate understanding by those skilled in the art.

1. Multimedia content refers to content carried by a human-computer interactive medium for information exchange and transmission that combines two or more media. The media used may include text, pictures, photos, sounds (including music, voice, and special sound effects), animation, film, and so on. Specifically, the multimedia content may include short videos, information, articles, or other types of content.

A short video is a form of internet content distribution, generally video content with a duration within 5 minutes (for example, several seconds to several minutes) that is distributed on new internet media, is suitable for watching while on the move or during short breaks, and is pushed at high frequency. As the pace of life accelerates, short video has become a focus of competition for users' fragmented time.

Information refers to content that expresses a specific event; it can be understood as content that brings value to a user within a relatively short time because the user obtains and uses it in a timely manner.

2. The text titles are generally expressed in words, and in the embodiment of the present application, the titles expressed in words of the multimedia content are referred to as text titles, and the text titles may substantially represent the content of the multimedia content.

Referring to the information display interface shown in fig. 1, three pieces of information are displayed from top to bottom. Taking the 1st piece of information as an example, its text title is "MM is really ugly when star is small?", while the text title of the 3rd piece of information is "flagship model YYYY release by XX corporation, and there is a stabbing pain after the back is flawed: is the Chinese manufacturer really cheated?". The text title of a piece of information roughly reflects its content, so a user can estimate the approximate content from the text title, judge whether it is of interest, and click to view it if so. The text title is therefore, to a certain extent, an important factor in attracting the user to click and view the information.

Referring again to the short video presentation interface shown in fig. 2, two short videos are displayed, and the text title of each short video is overlaid on its thumbnail. The text title of the upper short video is "what kind of style is you updated in a 20 year old mother street, what kind of style is you happy?", and the text title of the lower short video is "the net red beef grazing that must be tried". Through the text title of a short video, the user can roughly learn the corresponding video content, so the text title also serves as a summary introduction to the short video. In a specific implementation, the text titles of short videos may also be presented in other forms, which is not limited in the embodiments of the present application.

3. The abstract picture refers to a picture that summarizes and introduces the multimedia content; it can be understood as an image that gives a condensed introduction to the multimedia content. The abstract picture may be, for example, one of a plurality of pictures included in the information content, or a video thumbnail of a short video, and multimedia content may have one or more abstract pictures.

As shown in fig. 1, the 1 st piece of information includes only one abstract picture located at the right side of the text title, the 2 nd piece of information includes three abstract pictures located below the text title, the abstract pictures of the 1 st piece of information and the 2 nd piece of information are both still pictures, and the abstract pictures of the 1 st piece of information and the 2 nd piece of information are both partial pictures in the information content. The abstract picture of the 3 rd piece of information is a video thumbnail, and it can be understood that the content of the 3 rd piece of information at least includes a video, which is the core content of the piece of information. As can be seen from fig. 1, the text titles and the abstract pictures of the information may have various display layouts, and different information may include one or more abstract pictures, and the abstract pictures of the information may be still pictures or video thumbnails.

Referring to fig. 2 again, the abstract picture of the short video is a thumbnail of the short video, a video playing control may also be displayed on the abstract picture, and when the user needs to view the video, the user may click the video playing control to control the short video to be played.

4. The attraction degree in the embodiments of the present application may be understood as the degree to which multimedia content attracts a user, for example the degree to which information or a short video attracts a user; specifically, it may be understood as the probability or likelihood that a user clicks to view the information or video. The attraction degree may be expressed as a specific numerical value, for example a value in the range of 1 to 100, that is, an attraction degree score. A higher attraction degree score indicates greater attraction to the user and a greater probability that the user clicks to view the corresponding information or short video.

As mentioned above, when the recommendation system recommends multimedia content, in order to improve the conversion of the recommended content, that is, the click rate and the view rate, the recommended content should be as attractive to the user as possible, because a user is generally interested only in content with a certain attraction degree and only then clicks to view it.

In the related art, when resource content recommendation is performed, the following methods are generally adopted:

1) Based on the title. Titles are mostly labeled using the concept of the "title party" (clickbait), and whether a title is attractive is judged through rules or simple classification models. Because the definition of a "title party" is too broad, the practical significance of this approach is limited.

2) Based on the picture. For general information articles, classification tasks (such as entertainment or sports) are mostly performed on the pictures through models, and whether a picture attracts users is judged from the resulting category information. Because this approach uses only the category information of a picture (such as food or basketball) as the measure of attraction degree, it is not flexible enough when recommending to users.

3) Based on the article as a whole. In existing schemes, each dimension of an article (format, number of characters, number of pictures, picture quality, and the like) is scored, the scores determine a quality score that serves as the attraction degree of the article, and personalized recommendation is performed for users according to that score. This approach performs poorly in terms of accuracy and generalization.

In summary, the related art either performs only rough recommendation by extracting keywords from titles, or performs only simple tag classification on pictures and then recommends according to the classification and the tags of users.

5. Artificial intelligence is a theory, method, technology and application system for simulating, extending and expanding human intelligence by using a digital computer or a machine controlled by the digital computer, sensing environment, acquiring knowledge and obtaining the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence to produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

6. Machine Learning (ML) is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

7. The Long Short-Term Memory (LSTM) model modifies the recurrent layer unit so that the hidden layer state value is no longer calculated directly by a formula. LSTM is a time-recurrent neural network suitable for processing and predicting important events separated by relatively long intervals and delays in a time series. LSTM can be understood as a kind of recurrent neural network, and bidirectional LSTM is abbreviated as BLSTM, BiLSTM, or Bi-LSTM.

8. The attention mechanism is an attention model used in the field of artificial neural networks. It derives from the human visual attention mechanism: when perceiving things, people generally do not take in an entire scene from beginning to end every time, but tend to observe a specific part as needed; and when people find that what they want to observe often appears in a certain part of a scene, they learn to focus on that part when similar scenes reappear. The attention mechanism is therefore essentially a means of screening high-value information out of a large amount of information in which different pieces of information have different importance to the result. This importance can be reflected by assigning attention weights of different sizes; in other words, the attention mechanism can be understood as a rule for assigning weights when synthesizing multiple sources.
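The idea of "assigning weights when synthesizing multiple sources" can be illustrated with a minimal sketch in which relevance scores are normalized by a softmax and used to form a weighted sum of vectors. The softmax normalization is a common choice and is an assumption here; the embodiments do not fix a particular attention formula.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary relevance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors: List[List[float]], scores: List[float]) -> List[float]:
    """Weighted sum of the input vectors, using softmax(scores) as attention weights."""
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


weights = softmax([2.0, 0.0])
pooled = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Equal scores give equal weights, so `pooled` is the plain average of the two vectors; a higher score shifts the result toward the corresponding vector.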

9. A word vector, also called an embedding feature or word vector feature, describes the semantic relationships between the words included in text data. The idea is to convert words expressed in natural language into a dense vector or matrix form that a computer can process, so that the word vector is the numerical representation of the text in the machine. Word vector features may be extracted by a deep learning model, for example a Convolutional Neural Network (CNN) model, an LSTM model, a Recurrent Neural Network (RNN) model, or a Word to Vector (Word2Vec) model; other possible deep learning models may of course also be used.

10. The Word2Vec model is an open-source word vector tool from Google. It converts words into word vectors by exploiting the semantic relationships among words, and words can then be identified using the semantic distance relationships among the word vectors.
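The notion of semantic distance between word vectors may be illustrated with cosine similarity, as in the sketch below. The three-dimensional vectors are made-up values for illustration and are not the output of any trained Word2Vec model; cosine similarity is one common distance measure and is assumed here.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (real word vectors typically have 100+ dimensions).
EMBEDDINGS: Dict[str, List[float]] = {
    "video": [0.9, 0.1, 0.0],
    "clip": [0.8, 0.2, 0.1],
    "beef": [0.0, 0.1, 0.9],
}

related = cosine_similarity(EMBEDDINGS["video"], EMBEDDINGS["clip"])
unrelated = cosine_similarity(EMBEDDINGS["video"], EMBEDDINGS["beef"])
```

With these toy values, "video" is closer to "clip" than to "beef", which is the kind of relationship a trained word vector model is expected to capture.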

As described above, how to improve the recommendation performance of a recommendation system is a problem that currently needs to be considered. In view of this, an embodiment of the present application provides a method for recommending multimedia content, for example for recommending information or short videos. In the recommendation process, a title attraction degree and a picture attraction degree are calculated from the text title and the abstract picture of the multimedia content to be recommended, respectively, and the comprehensive attraction degree of the multimedia content to be recommended is then determined from the title attraction degree and the picture attraction degree. Next, according to a preset attraction degree condition and the comprehensive attraction degree of each multimedia content to be recommended, the multimedia content to be recommended that meets the condition is screened out as candidate multimedia content. Finally, recommendation is performed for the user according to the candidate multimedia content.

That is to say, when resource recommendation is performed, the embodiments of the present application introduce the measure of "attraction degree" and comprehensively measure the degree to which multimedia content attracts the user by combining its text title and its abstract picture. In practice, some users are more interested in titles, some are more interested in abstract pictures, and some are interested in both, and both the text title and the abstract picture indicate the approximate content of the multimedia content. Reflecting the overall attraction of the multimedia content through the two dimensions of the title attraction degree and the picture attraction degree therefore captures the actual viewing needs of users more accurately. Recommendation indexes such as click rate, browsing duration, and browsing depth can thus be improved to a certain extent, which improves the accuracy and effectiveness of multimedia content recommendation and the recommendation performance of the recommendation system.

In order to better understand the technical solutions provided in the embodiments of the present application, some simple descriptions are provided below for application scenarios to which the technical solutions provided in the embodiments of the present application are applicable, and it should be noted that the application scenarios described below are only used for describing the embodiments of the present application and are not limited. In specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.

Referring to fig. 3, fig. 3 shows an application scenario to which the scheme for recommending multimedia content in the embodiment of the present application is applicable. The application scenario includes a plurality of terminal devices (e.g., a first terminal device 301, a second terminal device 302, and a third terminal device 303) and a server 304, where the server 304 may be a server serving a multimedia recommendation platform, such as an information recommendation server or a short video recommendation server. Each terminal device may correspond to one user. Taking the second terminal device 302 and the short video recommendation scenario as an example, the second terminal device 302 corresponds to user 2; user 2 may operate the second terminal device 302 to send a short video recommendation request to the server 304, and the server 304 may then recommend short videos to the second terminal device 302 by using the method for recommending multimedia content in the embodiment of the present application, so that user 2 can watch them. It should be noted that each terminal device in fig. 3 can act as a multimedia content requesting end that requests multimedia content (e.g., information or short videos) from the server 304, and can also upload multimedia content to be distributed through the server 304, that is, it can act as a multimedia content publishing end at the same time.

Each terminal device in fig. 3 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a notebook computer, a smart wearable device (such as a smart watch or a smart helmet), a personal computer, and so on. The server 304 in fig. 3 may be a personal computer, a midrange computer, a computer cluster, and so forth.

To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, the methods may include more or fewer operation steps based on conventional or non-inventive effort. For steps that have no logically necessary causal relationship, the order of execution is not limited to the order provided by the embodiments of the present application. In an actual process or when executed by a device, the method may be executed sequentially or in parallel according to the method shown in the embodiments or figures.

As described above, in the embodiment of the present application, the comprehensive attraction degree of multimedia content is calculated from two dimensions, namely the text title and the abstract picture, and resource recommendation is then performed with the comprehensive attraction degree as the recommendation basis, so as to improve the recommendation performance of the recommendation system. A specific implementation of calculating the comprehensive attraction degree of multimedia content in the embodiment of the present application is described below with reference to fig. 4 and fig. 5.

Referring to fig. 4, fig. 4 shows the general flow of calculating the comprehensive attraction degree of multimedia content. For the text title, the title attraction degree can be calculated through two parts: fast strategy-based identification and model-based semantic calculation. For the abstract picture, image features are first extracted, several sub-models then calculate attraction from several dimensions of the picture, and the results of the sub-models are finally fused to obtain the picture attraction degree. The attraction results of the two dimensions, title and picture, are then fused to obtain the final comprehensive attraction degree of the multimedia content, which can serve as the basis on which a recommendation system recommends multimedia content to a user.

Referring again to fig. 5, fig. 5 is a detailed embodiment based on fig. 4, and the flow shown in fig. 5 is described below.

Step 501: Determine a text title and at least one abstract picture of the multimedia content to be recommended.

It should be noted that the multimedia content recommendation pool includes various types of multimedia content to be recommended, for example content including only a text title, content including only an abstract picture (e.g., a video thumbnail), content including a text title and one abstract picture, and content including a text title and a plurality of abstract pictures. The embodiment of the present application is mainly described with respect to multimedia content that includes a text title and one or more abstract pictures, that is, the method for calculating the comprehensive attraction degree in the embodiment of the present application applies to multimedia content that includes both a text title and an abstract picture. For multimedia content including only a text title, the title attraction degree can be calculated in the manner introduced later for text titles and used directly as the overall attraction degree of the multimedia content. Likewise, for multimedia content including only abstract pictures, the picture attraction degree can be calculated in the manner introduced later for abstract pictures and used directly as the overall attraction degree of the multimedia content.
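The handling of the different content types described above can be sketched as a small dispatch function. This is only an illustrative sketch: the function name `overall_attraction` and the `fuse` callback are assumptions introduced here, and `fuse` stands in for whichever title/picture fusion method an implementation actually uses.

```python
from typing import Callable, Optional


def overall_attraction(
    title_score: Optional[float],
    picture_score: Optional[float],
    fuse: Callable[[float, float], float],
) -> float:
    """Pick the overall attraction degree depending on which parts the content has.

    `fuse` is a hypothetical stand-in for the title/picture fusion step.
    """
    if title_score is not None and picture_score is not None:
        # Content with both a text title and abstract picture(s): fuse the two.
        return fuse(title_score, picture_score)
    if title_score is not None:
        # Title-only content: the title attraction degree is used directly.
        return title_score
    if picture_score is not None:
        # Picture-only content: the picture attraction degree is used directly.
        return picture_score
    raise ValueError("content has neither a text title nor an abstract picture")


def mean_fuse(title: float, picture: float) -> float:
    """Illustrative fusion only: the plain average of the two scores."""
    return (title + picture) / 2
```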

Therefore, for multimedia content to be recommended that includes both a text title and an abstract picture, the text title and at least one abstract picture may be determined first. For example, the first piece of information in fig. 1 has the text title "MM is really ugly when stars are small? The truth, however, is…" and a corresponding abstract picture, and the second short video in fig. 2 has the text title "the net red beef grazing which must be tried" and a corresponding abstract picture (video thumbnail).

Step 502: Determine the title attraction degree corresponding to the text title of the multimedia content to be recommended.

In the embodiment of the present application, an attraction prediction model may be used to determine the title attraction degree corresponding to a text title, that is, for a given text title, a machine learning model may be used to calculate the corresponding title attraction degree. The training process of the attraction prediction model is described first.

In a first step, training samples for training the attraction prediction model are obtained; these are referred to as text training samples. Given that the text titles of multimedia content are what the model will score, the text training samples are, for example, sentences. In a specific implementation, to improve generalization, a large number of text training samples can be collected and labeled by different users according to their own degree of interest. One text training sample may also be labeled by several users, so that it carries several semantic attraction degrees at the same time; each pairing of the text with one of its semantic attraction degrees can then serve as an independent training sample. For example, if three users score the attraction of a certain text and label it with the semantic attraction degrees 93, 88, and 90, three training samples with different semantic attraction degrees can be derived from that text. Each semantic attraction degree indicates how interested a user is in the content expressed by the semantics of the text training sample. The more interested the user is, that is, the more attractive the text training sample is to the user, the more likely the user is to click on it, that is, the greater the click probability and viewing probability and the longer the viewing duration (that is, the viewing depth).
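The derivation of several training samples from one multiply-labeled text can be sketched as follows. The function name and the placeholder title are assumptions for illustration; the three scores mirror the 93, 88, and 90 example above.

```python
from typing import Dict, List, Tuple


def expand_labeled_samples(
    labels_by_text: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Emit one (text, semantic attraction degree) sample per label."""
    samples: List[Tuple[str, float]] = []
    for text, scores in labels_by_text.items():
        for score in scores:
            # Each annotator's score becomes its own training sample.
            samples.append((text, score))
    return samples


# Hypothetical title text labeled by three users.
annotated = {"example title": [93, 88, 90]}
training_samples = expand_labeled_samples(annotated)
```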

In a second step, after a large number of text training samples labeled with semantic attraction degrees are obtained, model training can be performed with these samples and an initial prediction model. Each text training sample is first preprocessed, for example by removing special symbols such as "·" and removing modal particles. Word segmentation is then performed to obtain word vector representations of the segmented words, and the word vectors and the corresponding semantic attraction degrees are finally input into the initial prediction model for training, yielding the trained attraction prediction model. The attraction prediction model in the embodiment of the present application is, for example, a deep learning model such as a BLSTM model, TEXT-CNN, or BERT (Bidirectional Encoder Representations from Transformers), and the embodiment of the present application is not limited in this respect.

After the trained attraction prediction model is obtained as described above, it can be used to predict, for each text title (i.e., one text sequence) whose attraction degree is to be calculated, how attractive the text title is to the user, that is, to predict its semantic attraction degree. The text title may first be preprocessed as described above and then segmented into words using NLP techniques to obtain the word vector of each word; the trained attraction prediction model then calculates the semantic attraction degree of the text title from the obtained word vectors. Taking the first text title in fig. 1 as an example, suppose the calculated semantic attraction degree is 89 on a range of 0 to 100, where a larger value indicates a higher attraction to the user. A semantic attraction degree of 89 is relatively high, which indicates that the first text title in fig. 1 is likely to be clicked by a large number of users, so that the corresponding multimedia content is likely to be viewed.

To improve the model's understanding of semantics, an attention mechanism (Attention) can be incorporated into the attraction prediction model, so that the context semantics of longer titles are understood more accurately and the true semantics of the text title are identified more accurately.

Referring to fig. 6, which is a schematic diagram of calculating the title attraction degree using the attraction prediction model, the word vector model Word2Vec is first pre-trained on large-scale corpus samples to learn semantic vectors of words. A text title whose title attraction degree is to be calculated is first segmented into the words x₁, x₂, x₃, x₄, …, xₙ; the trained Word2Vec model then provides the word vector of each word; the word vectors are input into the BLSTM model for semantic feature extraction; and the extracted semantic features are input into a Softmax classifier to obtain the final attraction degree result. The semantic feature extraction module is a BLSTM, whose bidirectional LSTM units can capture context features over longer distances, yielding a more accurate semantic representation and improving the accuracy of semantic recognition.

In the embodiment of the present application, the semantic attraction degree calculated by the attraction prediction model measures how attractive a text title is to the user at the level of the title's actual semantics, which can improve the accuracy of the title attraction degree to a certain extent.

In one possible embodiment, after the semantic attraction degree of the text title is obtained, it can be used directly as the title attraction degree corresponding to the text title, that is, the attraction of the text title to the user is reflected directly at the semantic level of the title.

In another possible embodiment, in addition to the semantic attraction degree of the text title, a literal attraction degree corresponding to the text title may be calculated. The literal attraction degree refers to how attractive the text title is to the user through its literal form, for example through keywords in the title that attract the user or a sentence structure that attracts the user. The literal attraction degree reflects the direct, surface attraction of the text title, while the semantic attraction degree reflects the indirect attraction of its meaning. Together, the two dimensions reflect more accurately how attractive the whole text title objectively is to different users, which improves the accuracy of the title attraction degree, makes the scheme suitable for more users, and enhances its generalization and universality. After both the semantic attraction degree and the literal attraction degree of the text title are obtained, the title attraction degree can be derived from the two. In a specific implementation, a specific calculation strategy may be adopted: for example, the average of the two may be used as the title attraction degree, the larger of the two may be used directly, or a weight may be set for each of the semantic attraction degree and the literal attraction degree and the sum of the two weighted values used as the final title attraction degree. The embodiments of the present application do not limit the specific calculation strategy.
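The three calculation strategies just listed (average, larger value, weighted sum) can be sketched as below. The function name, the strategy labels, and the default weights are assumptions made for illustration only.

```python
def title_attraction(
    semantic: float,
    literal: float,
    strategy: str = "mean",
    w_semantic: float = 0.5,
    w_literal: float = 0.5,
) -> float:
    """Combine semantic and literal attraction degrees into a title attraction degree."""
    if strategy == "mean":
        # Average of the two attraction degrees.
        return (semantic + literal) / 2
    if strategy == "max":
        # The larger of the two is used directly.
        return max(semantic, literal)
    if strategy == "weighted":
        # Sum of each attraction degree multiplied by its (assumed) weight.
        return w_semantic * semantic + w_literal * literal
    raise ValueError(f"unknown strategy: {strategy}")
```

For instance, with a semantic attraction degree of 89 and a hypothetical literal attraction degree of 75, the "mean" strategy gives 82 and the "max" strategy gives 89.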

In the embodiment of the present application, content with a strong attraction is referred to as strong-attraction content. The strong-attraction content may include, for example, at least one of a preset hot keyword, a preset plurality of mutually contradictory keywords, a specific sentence structure, and a specific punctuation combination. Hot keywords are keywords that currently attract the public. For example, if a company popular with the public has newly released a flagship model with very high cost performance and many users are interested in it, the name of the flagship model can be understood as a hot keyword, and most users are likely to be interested in text titles containing it. The preset plurality of mutually contradictory keywords are keywords whose semantics contradict each other; out of curiosity, a user who sees a self-contradictory text title is quite likely to be interested in it. Specific sentence structures, such as "… really …", "… really very …", "… the truth, however, is …", or "… self-exposes … inside story", are sentence structures that draw the user's attention and, again out of curiosity, are more attractive to the user. As for specific punctuation combinations, punctuation can indirectly convey the general semantics of a text title. For example, a sentence that includes both a question mark "?" and an exclamation mark "!" generally expresses a relatively strong mood, such as distrust, shock, surprise, or incredulity, which is also attractive to the general user, so a text title including the specific punctuation combination is generally attractive to the user as well.

Based on the strong-attraction content it includes, a text title of this type can to some extent be understood as a "title party" (i.e., clickbait) title, because it attracts the user through the surface wording of the title itself.

After the strong-attraction content included in a text title is extracted, the literal attraction degree of the text title can be determined comprehensively from the specific types and number of strong-attraction content items it includes. For example, the literal attraction degree of a title that includes both contradictory keywords and a specific punctuation combination is generally greater than that of a title that includes only a specific punctuation combination. In other words, the more types and the greater the number of strong-attraction content items a text title hits, the more aspects from which it can attract the user's attention, and the greater its literal attraction degree can be considered to be.
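A rule-based check for the four kinds of strong-attraction content, followed by a score that grows with the number of kinds hit, might look like the sketch below. All keyword lists, patterns, and the scoring scale (25 points per kind, capped at 100) are hypothetical values chosen for illustration; the source does not specify a scoring formula, and this sketch counts only the kinds hit, not the number of individual matches.

```python
import re
from typing import Iterable, List, Sequence, Tuple


def strong_attraction_hits(
    title: str,
    hot_keywords: Iterable[str],
    contradictory_pairs: Iterable[Tuple[str, str]],
    sentence_patterns: Iterable[str],
    punctuation_combos: Iterable[Sequence[str]],
) -> List[str]:
    """Return the kinds of strong-attraction content found in the title."""
    hits: List[str] = []
    if any(keyword in title for keyword in hot_keywords):
        hits.append("hot_keyword")
    if any(a in title and b in title for a, b in contradictory_pairs):
        hits.append("contradictory_keywords")
    if any(re.search(pattern, title) for pattern in sentence_patterns):
        hits.append("sentence_structure")
    if any(all(symbol in title for symbol in combo) for combo in punctuation_combos):
        hits.append("punctuation_combination")
    return hits


def literal_attraction(hits: List[str], points_per_kind: float = 25.0) -> float:
    """Assumed scoring rule: more kinds hit gives a higher score, capped at 100."""
    return min(100.0, points_per_kind * len(hits))


# Hypothetical rule lists and title, for illustration only.
example_hits = strong_attraction_hits(
    "cheap yet luxurious?! the truth, however, is ...",
    hot_keywords=["flagship x"],
    contradictory_pairs=[("cheap", "luxurious")],
    sentence_patterns=[r"the truth, however, is"],
    punctuation_combos=[("?", "!")],
)
example_score = literal_attraction(example_hits)
```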

Step 503: Determine the picture attraction degree corresponding to the target abstract picture of the multimedia content to be recommended.

As mentioned above, a piece of multimedia content may include one or more abstract pictures. The picture attraction degree in the embodiment of the present application may be an overall picture attraction degree corresponding to all abstract pictures of the multimedia content: the attraction degree of each abstract picture is calculated first, and the overall picture attraction degree of the multimedia content is then calculated from the attraction degrees of all abstract pictures. Alternatively, the picture attraction degree may correspond to one or some of the abstract pictures of the multimedia content. Therefore, a target abstract picture whose attraction degree needs to be calculated may be determined from all the abstract pictures of the multimedia content to be recommended. The target abstract picture may be all of the abstract pictures; one of them, for example the one that includes the largest number of colors, that is, the richest image content; or several, but not all, of them. The attraction degree of all the abstract pictures can then be represented by the attraction degree of the target abstract picture.

In the embodiment of the present application, when a plurality of target abstract pictures are selected, the same method may be used to calculate the attraction degree of each of them. The procedure for calculating the attraction degree of one target abstract picture is described below with reference to fig. 7; the calculation for the other target abstract pictures can be understood in the same way.

Step 701: Perform feature extraction on the target abstract picture to obtain image feature information of the target abstract picture.

An image feature extraction tool may be used to obtain the image feature information of the target abstract picture. For example, the target abstract picture may be input into an Inception-v3 network (or another network capable of extracting image feature information) for image feature extraction, so as to obtain the content features of the target abstract picture.

Inception-v3 is a deep convolutional neural network that can extract abstract semantic representations of picture content. Its specific network structure is shown in fig. 8, where "A" is the overall model structure diagram and the model includes three parts, denoted Inception B (part "B" on the right side of fig. 8), Inception C (part "C" on the right side of fig. 8), and Inception D (part "D" on the right side of fig. 8). A general deep convolutional neural network includes the following parts:

1) An input layer: the image size is width × height × channel, where width is the image width, height is the image height, and channel is the number of image channels. A color image is a three-channel image with R, G, and B channels, and the R, G, and B pixels of the image are flattened into one dimension for input.

2) A convolutional layer: it consists of the weights and bias terms of convolution kernels. In a convolutional layer, the output of the previous layer (also called a feature map) is convolved with a convolution kernel and passed through an activation function to obtain the output feature map, where each output feature map combines the convolutions of a plurality of input feature maps.

3) A pooling layer: the convolved feature maps are downsampled; typical pooling includes max pooling, sum pooling, and average pooling.

4) A fully connected layer: every network node of the previous layer is connected to every network node of the next layer by a weight.

5) Finally, multi-class classification is performed using softmax to obtain image classification information, namely classification probability values, from which an attraction score is further calculated.
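Two of the parts listed above, pooling and the final softmax, are simple enough to sketch in plain Python. This is a minimal illustration on tiny hand-written inputs, not the Inception-v3 implementation; the function names and the fixed 2×2 non-overlapping window are assumptions.

```python
import math
from typing import List


def pool2x2(feature_map: List[List[float]], mode: str = "max") -> List[List[float]]:
    """Downsample a 2-D feature map with non-overlapping 2x2 windows."""
    pooled: List[List[float]] = []
    for r in range(0, len(feature_map) - 1, 2):
        row: List[float] = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            if mode == "max":
                row.append(max(window))
            elif mode == "sum":
                row.append(sum(window))
            elif mode == "average":
                row.append(sum(window) / 4)
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(row)
    return pooled


def softmax(logits: List[float]) -> List[float]:
    """Turn raw class scores into classification probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
```

Here `pool2x2(feature_map, "max")` keeps the largest value of each 2×2 window, so the 4×4 map above shrinks to a 2×2 map.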

Step 702: Input the extracted image feature information into a plurality of preset image classification models to obtain a plurality of classification probabilities corresponding to the target abstract picture.

Each image classification model classifies and describes pictures from one image description dimension; for example, the sub-models may cover dimensions such as gender, aesthetic appeal, sexiness, expression, color, and action. The image classification models are trained in advance. Taking the "action" dimension as an example, the corresponding image classification model can be trained on a large number of pictures showing various actions of people, animals, or other objects, and the trained model can then recognize the action of the object (a person, an animal, and so on) in a picture. For example, for a certain picture, the probability of "lying flat" is determined to be 20%, the probability of "running" to be 93%, and the probability of "walking" to be 78%; taking the maximum of these probabilities, the classification probability of the picture under the "action" sub-model is determined to be 93%.
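The maximum-value rule in the "action" example can be written in a few lines. The function name is an assumption; the three probabilities are the ones from the example above.

```python
from typing import Dict, Tuple


def submodel_classification_probability(
    class_probs: Dict[str, float],
) -> Tuple[str, float]:
    """Return the most probable class of one sub-model and its probability."""
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


# Probabilities from the "action" example in the text.
action_probs = {"lying flat": 0.20, "running": 0.93, "walking": 0.78}
action_label, action_prob = submodel_classification_probability(action_probs)
```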

Pictures may show various subjects, such as people, landscapes, food, pets, and the like, and the image description dimensions used to measure how attractive a picture is to the user generally differ between picture types. For people, for example, the dimensions users care about may include who the person is (i.e., person identification), gender, beauty, muscularity, sexiness, clothing, accessories, and so on. In other words, each picture type may have a matching group of image classification models, so that when a picture is described with image classification models, the group matching that picture's type can be selected for the calculation. This reduces the amount of calculation to some extent and improves calculation efficiency, and describing and measuring each picture in a way targeted to its picture type also improves effectiveness.

Step 703: Fuse the plurality of classification probabilities using a preset classification fusion model to obtain the image-level attraction degree of the target abstract picture.

After a plurality of classification probabilities of the target abstract picture under the plurality of image classification models are obtained as described in step 702, a preset classification fusion model further fuses them to finally obtain the image-level attraction degree of the target abstract picture. The preset classification fusion model evaluates the overall image-level attraction degree based on the combination of a plurality of image description dimensions; for example, the image-level attraction degree of a picture classified as female + high aesthetic appeal + sexy is high.
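The source does not specify the form of the classification fusion model. Purely as an assumed illustration, the sketch below fuses per-dimension classification probabilities with a logistic-style weighted sum scaled to a 0 to 100 range; the dimension names, weights, and bias are hypothetical and would in practice be learned or configured.

```python
import math
from typing import Dict


def fuse_classification_probabilities(
    probs: Dict[str, float],
    weights: Dict[str, float],
    bias: float = 0.0,
) -> float:
    """Assumed fusion rule: sigmoid of a weighted sum, scaled to 0..100."""
    z = bias + sum(weights.get(dim, 0.0) * p for dim, p in probs.items())
    return 100.0 / (1.0 + math.exp(-z))


# Hypothetical dimensions and weights for illustration only.
weights = {"female": 1.0, "aesthetic": 2.0, "sexy": 1.5}
high = fuse_classification_probabilities(
    {"female": 0.9, "aesthetic": 0.8, "sexy": 0.7}, weights, bias=-2.0
)
low = fuse_classification_probabilities(
    {"female": 0.1, "aesthetic": 0.2, "sexy": 0.1}, weights, bias=-2.0
)
```

With positive weights, higher classification probabilities across the dimensions yield a higher image-level attraction degree, which matches the qualitative behavior described above.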

If the target abstract picture is a single abstract picture, its image-level attraction degree can be used as the overall attraction degree of all abstract pictures, that is, as the picture attraction degree corresponding to all abstract pictures. If the target abstract picture includes a plurality of abstract pictures, the highest image-level attraction degree among them can be used as the picture attraction degree corresponding to all abstract pictures, because in general a user will click to view the content as long as one abstract picture attracts the user, that is, one highly attractive picture is enough. Alternatively, the average of the image-level attraction degrees of the plurality of target abstract pictures can be used as the picture attraction degree corresponding to all abstract pictures, and so on.
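The two aggregation options described above (highest value or average) can be sketched as follows; the function name, strategy labels, and sample scores are assumptions for illustration.

```python
from typing import List


def picture_attraction(image_level_scores: List[float], strategy: str = "max") -> float:
    """Aggregate image-level attraction degrees into one picture attraction degree."""
    if not image_level_scores:
        raise ValueError("at least one target abstract picture is required")
    if strategy == "max":
        # One highly attractive picture is enough to draw a click.
        return max(image_level_scores)
    if strategy == "mean":
        return sum(image_level_scores) / len(image_level_scores)
    raise ValueError(f"unknown strategy: {strategy}")
```

With a single target abstract picture, both strategies return that picture's image-level attraction degree unchanged.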

After the title attraction degree of the text title and the picture attraction degree of the abstract picture of the multimedia content to be recommended are obtained through steps 502 and 503, respectively, the comprehensive attraction degree of the multimedia content to be recommended can be further obtained from them. The comprehensive attraction degree can broadly be calculated in two ways, by strategy or by model fusion, which are illustrated below.

First mode

A first attraction degree threshold for text titles and a second attraction degree threshold for abstract pictures are preset. After the title attraction degree and the picture attraction degree of a piece of multimedia content to be recommended are obtained, the title attraction degree can be compared with the first attraction degree threshold as shown in step 504, and the picture attraction degree can be compared with the second attraction degree threshold as shown in step 505. It should be noted that no particular magnitude relationship is required between the first attraction degree threshold and the second attraction degree threshold; in other words, the embodiments of the present application do not limit the magnitude relationship between them.

If both the title attraction degree and the picture attraction degree are greater than or equal to their corresponding thresholds, that is, the title attraction degree is greater than or equal to the first attraction degree threshold and the picture attraction degree is greater than or equal to the second attraction degree threshold, or if either one of them is greater than or equal to its corresponding threshold, for example the title attraction degree is greater than or equal to the first attraction degree threshold while the picture attraction degree is less than the second, or the title attraction degree is less than the first attraction degree threshold while the picture attraction degree is greater than or equal to the second, then the comprehensive attraction degree of the multimedia content to be recommended may be determined in the manner shown in step 506.

Step 506: Determine the comprehensive attraction degree of the multimedia content to be recommended from the title attraction degree and the picture attraction degree according to a preset calculation strategy.

That is, when at least one of the title attraction degree and the picture attraction degree is greater than or equal to its corresponding threshold, the final comprehensive attraction degree can be calculated according to the preset calculation strategy. Several possible implementations of the preset calculation strategy are listed below.

For example, when both the title attraction degree and the picture attraction degree are greater than their corresponding thresholds, either one of the two may be randomly selected as the comprehensive attraction degree, the larger of the two may be selected, or the average of the two may be taken. Alternatively, the average may be taken as the comprehensive attraction degree when the difference between the two is small (e.g., smaller than a further threshold), and the larger of the two may be selected when the difference is large (e.g., greater than that further threshold), and so on.
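One of the strategies above, averaging when the two attraction degrees are close and taking the larger when they are far apart, can be sketched together with the threshold check of the first mode. The function name, the threshold values used in the examples, and the choice to apply the same rule when only one attraction degree reaches its threshold are assumptions for illustration.

```python
from typing import Optional


def strategy_attraction(
    title: float,
    picture: float,
    title_threshold: float,
    picture_threshold: float,
    diff_threshold: float = 10.0,
) -> Optional[float]:
    """First-mode sketch: returns None when both values are below their thresholds."""
    if title < title_threshold and picture < picture_threshold:
        # Neither reaches its threshold: the strategy of the first mode does not apply.
        return None
    if abs(title - picture) < diff_threshold:
        # Small difference: use the average of the two.
        return (title + picture) / 2
    # Large difference: use the larger of the two.
    return max(title, picture)
```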

Second mode

If the comparisons with the first attraction degree threshold and the second attraction degree threshold show that both the title attraction degree and the picture attraction degree are smaller than their corresponding thresholds, both the text title and the abstract picture can be considered to have a low attraction to the user. In this case, the model fusion method shown in step 507 can be used to calculate the comprehensive attraction degree from the text title and the abstract picture.

Step 507: Input the title attraction degree and the picture attraction degree into a pre-trained attraction degree fusion model to obtain the comprehensive attraction degree of the multimedia content to be recommended.

The attraction degree fusion model is obtained by training on a plurality of multimedia content training samples labeled with title attraction degrees and picture attraction degrees. Through machine learning, the attraction degree fusion model learns the dependence of the comprehensive attraction degree of a multimedia content training sample on its title attraction degree and picture attraction degree. Because this dependence is learned from a large number of training samples, the comprehensive attraction degree can be calculated more accurately when the model is applied to other multimedia content.

In the second mode, model fusion uses the information of both the title and the picture to jointly decide the comprehensive attraction degree. The machine learning approach helps ensure the accuracy of the comprehensive attraction degree and gives results that generalize better, and because the model is efficient to compute, the final attraction evaluation of the multimedia content can be determined more efficiently.
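The source does not name the model family used for the attraction degree fusion model. As a minimal sketch under that caveat, the example below assumes a linear model fitted by ordinary least squares, and assumes each training sample also carries a comprehensive attraction degree label; the function names and the four training samples are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_fusion_model(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Least-squares fit of comprehensive = w_t * title + w_p * picture + b."""
    rows = [[t, p, 1.0] for t, p, _ in samples]
    targets = [y for _, _, y in samples]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(model: Sequence[float], title: float, picture: float) -> float:
    """Apply the fitted linear fusion model to one title/picture score pair."""
    w_t, w_p, b = model
    return w_t * title + w_p * picture + b


# Hypothetical samples: (title attraction, picture attraction, comprehensive label).
train = [(80, 60, 68), (50, 90, 74), (30, 40, 36), (70, 20, 40)]
fusion_model = fit_fusion_model(train)
```

The hypothetical labels here were generated from 0.4 × title + 0.6 × picture, so the fitted model recovers approximately those weights; a real fusion model could be any regressor trained on labeled samples.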

Third mode

If the comparisons with the first attraction degree threshold and the second attraction degree threshold show that both the title attraction degree and the picture attraction degree are smaller than their corresponding thresholds, both the text title and the abstract picture can be considered to have a low attraction to the user. In this case, another calculation strategy, shown in step 508, can be used to calculate the comprehensive attraction degree from the text title and the abstract picture.

Step 508: Determine, from historical viewing information on other multimedia content viewed by the user, the ratio between the user's attention to other text titles and to the corresponding other abstract pictures, and combine the calculated title attraction degree and picture attraction degree in a proportion positively correlated with that ratio to obtain the comprehensive attraction degree of the multimedia content to be recommended.

For example, for user A, the historical viewing information may show that, when viewing multimedia content, user A clicks to view the pictures in the multimedia content every time or many times, and each viewing lasts a long time, while the overall reading time for the page content of the entire multimedia content (that is, mostly text) is short. This indicates that user A pays more attention to pictures than to text. The ratio between user A's viewing degrees of text and pictures can be roughly determined from the respective reading durations, and this ratio is taken as an approximation of the ratio between user A's viewing degrees of the text title and the abstract picture. For example, a calculated ratio of 50% indicates that user A's attention to the abstract picture is about twice the attention to the text title, i.e., user A pays more attention to pictures.
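The duration-based estimate described above can be sketched as follows. The function name `viewing_ratio` and the representation of the viewing history as (text seconds, picture seconds) pairs are assumptions for illustration; the embodiment only says that the ratio is roughly determined from the respective reading durations.

```python
from typing import Iterable, Tuple


def viewing_ratio(history: Iterable[Tuple[float, float]]) -> float:
    """Return the text-to-picture viewing ratio from (text_seconds, picture_seconds) records."""
    text_total = 0.0
    picture_total = 0.0
    for text_seconds, picture_seconds in history:
        text_total += text_seconds
        picture_total += picture_seconds
    if picture_total <= 0:
        raise ValueError("no picture viewing time recorded; ratio is undefined")
    return text_total / picture_total


if __name__ == "__main__":
    # Made-up history: the user spends twice as long on pictures as on text.
    print(viewing_ratio([(30.0, 60.0), (20.0, 40.0)]))
```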

Further, a positively correlated proportional calculation corresponding to the above ratio (i.e., 50%) is performed on the calculated title allure and picture allure. Here, the positively correlated proportional calculation may refer to a proportional calculation performed directly according to the above ratio, or to a positively correlated calculation based on the size relationship that the ratio reflects. For example, 50% indicates that the attention paid to the text title is less than the attention paid to the abstract picture, so when the comprehensive allure is calculated from the title allure and the picture allure, a smaller weight may be set for the title allure and a larger weight may be set for the picture allure; for example, the weight ratio of the title allure to the picture allure is 1, or may be 1 3, or may be 2, and so on. The final comprehensive allure is then calculated from the respectively set weights and the corresponding two allures. For example, the determined title allure is 80, the determined picture allure is 88, and the weight ratio of the set title allure to the picture allure is 1.

It should be noted that the implementation principle of the third mode is described above only by a simple example. In a specific implementation, other calculation methods based on the above principle may be used to calculate the comprehensive attraction degree, and the embodiment of the present application is not limited in this respect.
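One way to read the weighted calculation is as a normalized weighted average, sketched below. The function name `weighted_comprehensive_allure`, the normalization by the weight sum, and the 1:2 weight ratio used in the usage lines are assumptions for illustration (1:2 is simply the 50% ratio applied directly); the title allure of 80 and picture allure of 88 are the values from the example above.

```python
def weighted_comprehensive_allure(
    title_allure: float,
    picture_allure: float,
    title_weight: float,
    picture_weight: float,
) -> float:
    """Combine the two allures using weights given as a ratio title_weight : picture_weight."""
    total = title_weight + picture_weight
    if title_weight < 0 or picture_weight < 0 or total == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (title_allure * title_weight + picture_allure * picture_weight) / total


if __name__ == "__main__":
    # Assumed weight ratio 1:2, so the picture allure counts twice as much as the title allure.
    print(weighted_comprehensive_allure(80.0, 88.0, 1.0, 2.0))
```

With equal weights the result reduces to the plain average of the two allures, which makes the effect of the viewing ratio easy to compare.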

Further, the above manner of calculating the comprehensive attraction degree of multimedia content may be applied to various recommendation scenarios, such as recommending information with a higher comprehensive attraction degree to the user, or recommending short videos with a higher comprehensive attraction degree to the user, and so on. The method for recommending multimedia content in the embodiment of the present application is described below with reference to fig. 9.

Step 901: Obtain a recommendation request of a user.

Step 902: In response to the recommendation request, determine, from the multimedia contents to be recommended included in the multimedia content recommendation pool, candidate multimedia contents whose comprehensive attraction degree meets a preset attraction degree condition.

Step 903: Recommend multimedia content to the user based on the candidate multimedia contents and according to a preset recommendation strategy.

For each multimedia content to be recommended in the multimedia content recommendation pool, the title allure corresponding to the text title and the picture allure corresponding to the abstract picture can be calculated separately in the manner introduced above, that is, from the two dimensions of the text title and the abstract picture. The comprehensive allure of the multimedia content to be recommended is then calculated from the allures of the two dimensions, so that the comprehensive allure of each multimedia content to be recommended is obtained.

The preset allure condition is, for example, that the comprehensive allure needs to be greater than a certain threshold, for example, greater than 80, or that a predetermined number (for example, 50) of multimedia contents are selected in descending order of allure, and so on. The multimedia contents to be recommended that meet the preset allure condition are taken as candidate multimedia contents, and recommendation is then performed based on the candidate multimedia contents, for example, in batches of a certain number, or in batches ordered by allure. Recommendation of multimedia content is thereby based on allure, so that the recommended multimedia content can attract most users to a certain degree. This improves the accuracy and effectiveness of recommendation, increases users' click rate and viewing, and further improves the recommendation performance of the recommendation system.
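The two example conditions above (a threshold such as 80, or a fixed number such as 50 of the highest-scoring items) can be sketched as two small selection functions. The `Content` record and the function names are hypothetical, and the comprehensive allure is assumed to have been calculated beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Content:
    content_id: str
    comprehensive_allure: float


def select_above_threshold(pool: List[Content], threshold: float) -> List[Content]:
    """Keep contents whose comprehensive allure is strictly greater than the threshold."""
    kept = [c for c in pool if c.comprehensive_allure > threshold]
    return sorted(kept, key=lambda c: c.comprehensive_allure, reverse=True)


def select_top_n(pool: List[Content], n: int) -> List[Content]:
    """Keep the n contents with the highest comprehensive allure."""
    ranked = sorted(pool, key=lambda c: c.comprehensive_allure, reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    pool = [Content("a", 91.0), Content("b", 62.5), Content("c", 84.0)]
    print([c.content_id for c in select_above_threshold(pool, 80.0)])
    print([c.content_id for c in select_top_n(pool, 2)])
```

Both functions return the candidates in descending order of allure, so a caller can recommend them in batches without sorting again.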

Compared with the prior art, the allure evaluation algorithm provided by the embodiment of the present application can identify the allure of multimedia content more flexibly, and has better accuracy and generalization capability. The allure score of the whole multimedia content is obtained by modeling the title and the picture independently: the title is judged in a strategy-plus-model manner; the picture is judged in a multi-dimensional, multi-model manner; and for the whole multimedia content, the features of the text dimension and the picture dimension are fused by a strategy and a model, so as to obtain the comprehensive allure score of the whole multimedia content.

When the method for calculating the comprehensive allure of multimedia content provided by the embodiment of the present application is applied to various recommendation systems, the accuracy and recall rate of multimedia content recommendation are significantly improved, the click rate and browsing duration of users are increased, and the user experience is enhanced.

Based on the same inventive concept, the embodiment of the present application provides a device for recommending multimedia content, which may be a hardware structure, a software module, or a hardware structure plus a software module. The device for recommending multimedia content is, for example, the server 304 in fig. 3, or may be a functional device disposed in the server 304, and the device for recommending multimedia content may be implemented by a chip system, and the chip system may be formed by a chip, and may also include a chip and other discrete devices. Referring to fig. 10, an apparatus for recommending multimedia content in the embodiment of the present application includes an obtaining module 1001, a determining module 1002, and a recommending module 1003, where:

an obtaining module 1001 configured to obtain a recommendation request;

a determining module 1002, configured to determine, in response to the recommendation request, candidate multimedia content whose comprehensive attraction degree meets a preset attraction degree condition from a multimedia content recommendation pool; the comprehensive attraction degree of the multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree which correspond to the multimedia content to be recommended, the title attraction degree indicates the attraction degree of a text title of the multimedia content to be recommended to a user, and the picture attraction degree indicates the attraction degree of at least one abstract picture of the multimedia content to be recommended to the user;

a recommending module 1003, configured to recommend the multimedia content according to the candidate multimedia content.

In one possible implementation, the determining module 1002 is configured to:

performing word segmentation processing on the text title to obtain a plurality of segmented words, and determining a word vector corresponding to each segmented word;

determining semantic features corresponding to the text titles according to all the obtained word vectors and a pre-trained temptation degree prediction model, and determining semantic temptation degrees corresponding to the text titles according to the semantic features; the temptation degree prediction model is obtained by training according to a plurality of text training samples marked with semantic temptation degrees;

and determining title allure according to the semantic allure corresponding to the text titles.

In one possible implementation, the determining module 1002 is configured to:

determining preset strong-allure content included in the text title; the preset strong-allure content comprises at least one of a preset hot keyword, a plurality of preset keywords which are mutually contradictory, a specific sentence pattern structure and a specific punctuation mark combination;

and determining title temptation according to the semantic temptation and the literal temptation.
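As a rough sketch of how the four kinds of preset strong-allure content might be detected in a title, the following uses keyword sets and regular expressions. The keyword lists, the sentence pattern, the punctuation rule, the function names, and the score (the fraction of categories found) are all illustrative assumptions; the embodiment does not fix any of them, and the sketch uses English titles for readability.

```python
import re
from typing import Set

# All presets below are made-up examples, not values from the embodiment.
HOT_KEYWORDS = {"shocking", "secret"}
CONTRADICTORY_PAIRS = [("free", "expensive"), ("young", "old")]
SENTENCE_PATTERNS = [re.compile(r"^you won't believe\b")]
PUNCTUATION_COMBINATION = re.compile(r"[!?]{2,}")


def find_strong_allure_content(title: str) -> Set[str]:
    """Return the categories of preset strong-allure content found in the title."""
    lowered = title.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    found: Set[str] = set()
    if words & HOT_KEYWORDS:
        found.add("hot_keyword")
    if any(a in words and b in words for a, b in CONTRADICTORY_PAIRS):
        found.add("contradictory_keywords")
    if any(p.search(lowered) for p in SENTENCE_PATTERNS):
        found.add("sentence_pattern")
    if PUNCTUATION_COMBINATION.search(title):
        found.add("punctuation_combination")
    return found


def literal_allure(title: str) -> float:
    """Assumed scoring rule: fraction of the four categories present, in [0, 1]."""
    return len(find_strong_allure_content(title)) / 4


if __name__ == "__main__":
    print(find_strong_allure_content("You won't believe this shocking secret?!"))
    print(literal_allure("Quarterly report published"))
```

For Chinese titles, the word extraction step would need a word segmentation tool rather than a regular expression over Latin letters.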

In one possible implementation, the determining module 1002 is configured to:

determining a target abstract picture from at least one abstract picture;

performing feature extraction on the target abstract picture to obtain image feature information corresponding to the target abstract picture;

and determining the picture allure according to the image-level allure corresponding to the target abstract picture.

In one possible implementation, the determining module 1002 is configured to:

determining the picture type of the target abstract picture;

In one possible implementation, the determining module 1002 is configured to:

and carrying out proportional processing of positive correlation corresponding to the ratio on the title allure and the picture allure to obtain comprehensive allure.

In one possible implementation, the determining module 1002 is configured to:

if at least one of the title attraction degree and the picture attraction degree is greater than or equal to the corresponding attraction degree threshold value, determining a comprehensive attraction degree according to the title attraction degree and the picture attraction degree according to a preset calculation strategy;

if the title attraction degree and the picture attraction degree are both smaller than the corresponding attraction degree threshold values, inputting the title attraction degree and the picture attraction degree into a pre-trained attraction degree fusion model to obtain a comprehensive attraction degree; the attraction fusion model is obtained by training according to a plurality of multimedia content training samples marked with title attraction and picture attraction.
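The branching between the preset calculation strategy and the fusion model can be sketched as a small dispatch function. The function name, the use of plain callables for the strategy and the model, and the stand-ins used in the usage lines (`max` as the strategy, an average as the model) are assumptions for illustration only.

```python
from typing import Callable

Combiner = Callable[[float, float], float]


def comprehensive_attraction(
    title: float,
    picture: float,
    title_threshold: float,
    picture_threshold: float,
    preset_strategy: Combiner,
    fusion_model: Combiner,
) -> float:
    """Use the preset strategy if either score reaches its threshold, else the fusion model."""
    if title >= title_threshold or picture >= picture_threshold:
        return preset_strategy(title, picture)
    return fusion_model(title, picture)


if __name__ == "__main__":
    # Stand-ins only: max() as the strategy and a plain average as the "model".
    average = lambda t, p: (t + p) / 2
    print(comprehensive_attraction(90.0, 40.0, 80.0, 80.0, max, average))
    print(comprehensive_attraction(60.0, 40.0, 80.0, 80.0, max, average))
```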

For the functions of the functional modules of the apparatus for recommending multimedia content in the embodiment of the present application, reference may be made to the related description of the corresponding steps in the foregoing embodiments of the method for recommending multimedia content, and details are not repeated here.

The division of the modules in the embodiments of the present application is schematic and is only a division by logical function; in actual implementation, there may be other division manners. In addition, the functional modules in the embodiments of the present application may be integrated in one processor, may exist alone physically, or two or more modules may be integrated in one module. The integrated module can be implemented in the form of hardware, or in the form of a software functional module.

Based on the same inventive concept, an embodiment of the present application provides a computing device, for example, the server 304 in fig. 3. As shown in fig. 11, the computing device in the embodiment of the present application includes at least one processor 1101, and a memory 1102 and a communication interface 1103 connected to the at least one processor 1101. The specific connection medium between the processor 1101 and the memory 1102 is not limited in the embodiment of the present application. In fig. 11, as an example, the processor 1101 and the memory 1102 are connected by a bus 1100, which is represented by a thick line; the connection manner between other components is merely illustrative and is not limiting. The bus 1100 may be divided into an address bus, a data bus, a control bus, etc.; for ease of illustration it is shown in fig. 11 with only one thick line, but this does not mean that there is only one bus or one type of bus.

In the embodiment of the present application, the memory 1102 stores instructions executable by the at least one processor 1101, and the at least one processor 1101 may execute the steps included in the foregoing method for recommending multimedia content by executing the instructions stored in the memory 1102.

The processor 1101 is the control center of the computing device. It may connect the various parts of the entire computing device by using various interfaces and lines, and performs the various functions of the computing device and processes data by running or executing instructions stored in the memory 1102 and calling data stored in the memory 1102, thereby monitoring the computing device as a whole. Optionally, the processor 1101 may include one or more processing modules, and the processor 1101 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1101. In some embodiments, the processor 1101 and the memory 1102 may be implemented on the same chip; in other embodiments, they may be implemented separately on separate chips.

The processor 1101 may be a general purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.

The memory 1102, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1102 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 1102 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1102 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.

The communication interface 1103 is a transmission interface capable of performing communication, and may receive data or transmit data through the communication interface 1103, for example, data interaction may be performed with other devices through the communication interface 1103 to achieve the purpose of communication.

Further, the computing device includes a basic input/output system (I/O system) 1104 that facilitates the transfer of information between the various devices within the computing device, and a mass storage device 1108 for storing an operating system 1105, application programs 1106, and other program modules 1107.

The basic input/output system 1104 includes a display 1109 for displaying information and an input device 1110, such as a mouse or a keyboard, for a user to input information. The display 1109 and the input device 1110 are both connected to the processor 1101 through the basic input/output system 1104, which is connected to the system bus 1100. The basic input/output system 1104 may also include an input/output controller for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller may also provide output to a display screen, a printer, or another type of output device.

The mass storage device 1108 is connected to the processor 1101 through a mass storage controller (not shown) connected to the system bus 1100. The mass storage device 1108 and its associated computer-readable media provide non-volatile storage for the computing device. That is, the mass storage device 1108 may include a computer-readable medium (not shown), such as a hard disk or a CD-ROM drive.

According to various embodiments of the present application, the computing device may also be operated by a remote computer connected to it through a network, such as the Internet. That is, the computing device may be connected to the network 1111 via the communication interface 1103 coupled to the system bus 1100, or may be connected to another type of network or a remote computer system (not shown) using the communication interface 1103.

Based on the same inventive concept, the present application also provides a storage medium, which may be a computer-readable storage medium, having stored therein computer instructions, which, when executed on a computer, cause the computer to execute the steps of the method for recommending multimedia content as described above.

Based on the same inventive concept, the embodiment of the present application further provides a chip system, which includes a processor and may further include a memory, and is configured to implement the steps of the method for recommending multimedia content as described above. The chip system may be formed by a chip, and may also include a chip and other discrete devices.

In some possible embodiments, the various aspects of the method for recommending multimedia content provided in the embodiments of the present application may also be implemented in the form of a program product including program code for causing a computer to perform the steps of the method for recommending multimedia content according to various exemplary embodiments of the present application described above when the program product runs on the computer.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of recommending multimedia content, the method comprising:

obtaining a recommendation request;

responding to the recommendation request, and determining candidate multimedia contents whose comprehensive attraction degree meets a preset attraction degree condition from a multimedia content recommendation pool; the comprehensive attraction degree of the multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree corresponding to the multimedia content to be recommended, the title attraction degree indicates the attraction degree of a text title of the multimedia content to be recommended to a user, the title attraction degree is determined based on a semantic attraction degree and a literal attraction degree corresponding to the text title of the multimedia content to be recommended, the picture attraction degree indicates the attraction degree of at least one abstract picture of the multimedia content to be recommended to the user, and the picture attraction degree is determined based on an image-level attraction degree corresponding to a target abstract picture in the at least one abstract picture; the image-level attraction degree is determined based on a fusion of a plurality of classification probabilities corresponding to the target abstract picture;

recommending the multimedia content according to the candidate multimedia content;

wherein determining the picture allure corresponding to the at least one abstract picture of the multimedia content to be recommended comprises:

determining a target abstract picture from the at least one abstract picture;

and determining the picture temptation according to the image-level temptation corresponding to the target abstract picture.

2. The method of claim 1, wherein determining title temptation corresponding to text titles of the multimedia content to be recommended comprises:

3. The method of claim 2, wherein the method further comprises:

determining preset strong-allure content included in the text title; wherein the preset strong-allure content comprises at least one of a preset hot keyword, a plurality of preset keywords which are mutually contradictory, a specific sentence pattern structure and a specific punctuation mark combination;

determining the literal allure corresponding to the text title according to the included preset strong-allure content;

wherein determining the title allure according to the semantic allure corresponding to the text title comprises:

and determining the title allure according to the semantic allure and the literal allure.

4. The method of claim 1, wherein inputting image feature information corresponding to the target abstract picture into a plurality of preset image classification models comprises:

determining the picture type of the target abstract picture;

5. The method according to any one of claims 1 to 4, wherein determining the comprehensive attraction of the multimedia content to be recommended according to the title attraction and the picture attraction comprises:

determining, according to historical viewing information of other multimedia contents viewed by the user, the ratio between the user's viewing degree of the text titles of the other multimedia contents and the user's viewing degree of the corresponding abstract pictures;

and carrying out proportional processing of positive correlation corresponding to the ratio on the title allure and the picture allure to obtain the comprehensive allure.

6. The method according to any one of claims 1 to 4, wherein determining a comprehensive attraction of the content to be recommended according to the title attraction and the picture attraction comprises:

7. An apparatus for recommending multimedia contents, the apparatus comprising:

an obtaining module, configured to obtain a recommendation request;

the determining module is used for responding to the recommendation request and determining candidate multimedia contents of which the comprehensive attraction meets a preset attraction condition from a multimedia content recommendation pool;

the comprehensive attraction degree of the multimedia content to be recommended is determined according to a title attraction degree and a picture attraction degree corresponding to the multimedia content to be recommended, the title attraction degree indicates the attraction degree of a text title of the multimedia content to be recommended to a user, the title attraction degree is determined based on the semantic attraction degree and the literal attraction degree corresponding to the text title of the multimedia content to be recommended, the picture attraction degree indicates the attraction degree of at least one abstract picture of the multimedia content to be recommended to the user, and the picture attraction degree is determined based on the image-level attraction degree corresponding to a target abstract picture in the at least one abstract picture; the image-level attraction degree is determined based on a fusion of a plurality of classification probabilities corresponding to the target abstract picture;

the recommending module is used for recommending the multimedia content according to the candidate multimedia content;

wherein the determining module is specifically configured to:

determining a target abstract picture from the at least one abstract picture;

8. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps comprised by the method according to any one of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the steps comprised by the method according to any one of claims 1 to 6.