CN112131411A

CN112131411A - Multimedia resource recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN112131411A
Application number: CN202010994385.3A
Authority: CN
Inventors: 刘刚
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Yayue Technology Co ltd
Priority date: 2020-09-21
Filing date: 2020-09-21
Publication date: 2020-12-25

Abstract

The application relates to the technical field of the Internet, and in particular to a multimedia resource recommendation method and device, electronic equipment, and a storage medium, which are used to improve the efficiency of recommending multimedia resources to a new user during cold start. The method comprises: obtaining an avatar image of a target account to which multimedia resources are to be recommended; determining category information corresponding to the avatar image according to the content of the avatar region in the obtained avatar image; screening at least one target multimedia resource from a candidate multimedia resource pool according to the category information corresponding to the avatar image; and recommending the screened at least one target multimedia resource to the target account. In the embodiments of the application, the avatar image serves as a new dimension that reflects the characteristics of the target account, so that target multimedia resources of interest to the user can be screened out more efficiently during recall.

Description

Multimedia resource recommendation method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium.

Background

In the self-media age, users can produce, accumulate, share, and disseminate content anytime and anywhere. Self-media content includes articles, pictures, videos and the like; it has no established core and no fixed format, and it reflects the individual habits and interests of its creators.

A self-media application can recommend multimedia content to users to increase the number of views the content receives. Conventionally, when recommending multimedia content to a user, a self-media application obtains the user's behavior log through big data analysis, derives the user's historical behavior from that log, and recommends multimedia content according to the historical behavior. However, when the user's historical behavior cannot be obtained, multimedia content cannot be accurately recommended to the user.

Disclosure of Invention

The application provides a multimedia resource recommendation method, a multimedia resource recommendation device, electronic equipment and a storage medium, which are used for improving the accuracy of recommending contents for a user. The technical scheme of the application is as follows:

in a first aspect, an embodiment of the present application provides a multimedia resource recommendation method, including:

acquiring an avatar image of a target account of a multimedia resource to be recommended;

determining category information corresponding to the avatar image according to the content of the avatar area in the acquired avatar image;

screening at least one target multimedia resource from a candidate multimedia resource pool according to the category information corresponding to the avatar image;

recommending the screened at least one target multimedia resource to the target account.

In a second aspect, an embodiment of the present application provides a multimedia resource recommendation device, including:

the acquisition module is used for acquiring an avatar image of a target account of the multimedia resource to be recommended;

the determining module is used for determining the category information corresponding to the avatar image according to the content of the avatar region in the obtained avatar image;

the screening module is used for screening at least one target multimedia resource from the candidate multimedia resource pool according to the category information corresponding to the avatar image;

and the recommending module is used for recommending the screened at least one target multimedia resource to the target account.

In a third aspect, an embodiment of the present application provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multimedia resource recommendation methods provided herein.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for executing the multimedia resource recommendation method provided in the present application.

The technical scheme provided by the embodiment of the application at least has the following beneficial effects:

According to the multimedia resource recommendation method provided by the embodiments of the application, the avatar image of a target account to which multimedia resources are to be recommended is acquired, the category information corresponding to the avatar image is determined, and at least one target multimedia resource to be recommended to the target account is screened from a candidate multimedia resource pool according to that category information. In this process, the avatar image of the target account serves as a new dimension reflecting the characteristics of the target account, so that target multimedia resources of interest to the user can be screened out more efficiently during recall and recommended to the target account more accurately. In addition, for a newly registered target account, multimedia resources can be accurately recommended according to the avatar image of the account, so that multimedia resources are accurately recommended to a new user during the cold start process.

Drawings

FIG. 1 is a schematic diagram of an application scenario shown in accordance with an exemplary embodiment;

FIG. 2 is a flowchart of a multimedia resource recommendation method according to an embodiment of the present application;

FIGS. 3a and 3b are diagrams of a multi-classification model structure provided in an embodiment of the present application;

FIG. 3c is an avatar image of a target account provided in an embodiment of the present application;

FIG. 3d is a multimedia resource presentation interface for providing recommendations for a target account according to an embodiment of the present application;

FIG. 4 is a complete flowchart of a multimedia resource recommendation method according to an embodiment of the present application;

FIG. 5 is a block diagram of a multimedia resource recommendation system according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a multimedia resource recommendation device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

Hereinafter, some terms in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.

(1) In the embodiments of the present application, the term "and/or" describes an association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.

(2) The term "electronic device" in the embodiments of the present application may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, etc.

(3) In the embodiments of the present application, the term "client", also referred to as a terminal device, refers to a device that corresponds to a server and provides a local service program for a user (e.g., a mobile phone running iOS or Android, a tablet, etc.). Apart from some applications that run only locally, an application (such as a self-media application that can receive information streams) is generally installed on an ordinary client and must operate together with a server. The server is a server program deployed on multiple groups of servers and dedicated to providing remote network services for application programs on the client.

(4) The term "server" in the embodiments of the present application is a server program deployed on multiple groups of servers and dedicated to providing remote network services for terminal applications.

(5) The term "content" in the embodiments of the present application refers to multimedia resources, such as articles, videos, and/or pictures, recommended to a user by a self-media application. Articles are usually actively edited and published by a self-media creator after opening an official account, and may contain pictures; videos are actively published by users as Professionally-generated Content (PGC) or User-generated Content (UGC). PGC, also known as PPC (Professionally-produced Content), is an Internet term that broadly refers to content characterized by personalization, diversified viewpoints, democratized dissemination and virtualized social relationships. UGC is an Internet term that covers both professionally produced and non-professionally produced content. In the embodiments of the present application, content is also referred to as multimedia resources.

(6) The term "Feeds" in the embodiments of the present application is variously translated as message source, feed, information feed, summary, source, news feed, or web feed (also syndicated feed). A feed is a data format through which a website propagates up-to-date information to users, usually arranged as a timeline, which is the most intuitive and basic presentation form of a feed. A prerequisite for a user to be able to subscribe to a website is that the website provides a message source. Bringing feeds together is called aggregation, and the software used for aggregation is called an aggregator. Aggregators are software dedicated to subscribing to websites on behalf of end users, and are also commonly referred to as RSS readers, feed readers, news readers, etc.

(7) In the embodiments of the present application, the term "User Profile" (user portrait) refers to labeling user information: a complete picture of the user is abstracted by collecting and analyzing key data such as the social attributes, living habits, and consumption behaviors of a user (also referred to as a content consumer). The user portrait, also called a user label, is a cognitive representation of the user obtained through user behavior analysis, and can serve as a basis for subsequent data processing. Labels mirror the way people understand the world: a concept is itself a label, which simplifies the understanding of things by conceptualizing them. The user portrait includes user labels, which describe an individual, and user perspectives, which describe the distribution of labels across all users. In the embodiments of the application, the user portrait mainly refers to user labels.

(8) In the embodiments of the present application, the term "Convolutional Neural Network (CNN)" refers to a class of feed-forward neural networks that involve convolution computation and have a deep structure. It is one of the representative algorithms of deep learning, has representation learning capability, and can perform shift-invariant classification of input information according to its hierarchical structure.

(9) The term "ImageNet" in the embodiments of the present application refers to a large visual database for visual object recognition software research. More than 14 million image URLs have been manually annotated by ImageNet to indicate the objects in each picture, and bounding boxes are provided for at least one million images.

In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the self-media age, many users engage in online social activities by publishing multimedia resources such as pictures, texts and videos on self-media applications. After a user triggers a page display request, the self-media application screens the multimedia resources and recommends them to the user in the form of an information stream.

The following introduces a recommendation process of multimedia resources (content) by taking short videos as an example, and mainly includes three processes of content uploading, content auditing and content recommendation.

First, a user shoots a short video using a client and uploads it. During uploading, the short video is re-transcoded into a standard video file, which improves the compatibility of the short video when played on various platforms, and the meta-information of the short video, such as its upload time, size and format, is stored.

Then, the uploaded short videos are reviewed manually; meanwhile, the server processes the short videos, for example by classifying and labeling them, extracting cover images and scoring video quality, to obtain the screened short videos.

Finally, a recommendation algorithm is used to recommend short videos to the user. Recommendation algorithms include Collaborative Filtering (CF), Matrix Factorization (MF), Factorization Machines (FM), supervised learning algorithms, Gradient Boosting Decision Trees (GBDT), and the like.

It should be noted that image-text content also needs manual review, for example correcting wrongly written characters in the title and adjusting awkward sentences. In addition, the image-text content is processed with Natural Language Processing (NLP) techniques, such as classification and label extraction, to obtain the user portrait.

The user portrait is a key factor affecting the effect of content recommendation. Currently, user portraits are extracted mainly from users' interaction behaviors with content (such as clicks, likes, comments and the like). The user portrait is built up from the labels of the content and comprises static information and dynamic information. Static information can be provided when the user first registers, and includes demographic and social attributes such as gender, age, usual location, native place, height, academic qualifications, marital status, education level, assets, income and occupation. Dynamic information can be mined from user behavior: interests such as photography, sports, food, beauty, clothing, travel and education can be obtained from content logs or third-party data, as can the user's mindset, including psychology, motivation, values, attitude toward life and personality. However, during the cold start of a new user, little user portrait information is available, so content of interest cannot be accurately recommended to the new user, and a long time is needed to converge on the user's interests, resulting in a low retention rate for new users.

A new user typically selects an image, or authorizes the use of one, as the account avatar when first registering the account. Avatar images span a wide variety of fields and types, such as cartoon avatars, game avatars, landscape avatars, selfie avatars, family group photos, couple photos, children's avatars, pet avatars, industry avatars, system default avatars, and the like, and users also differ in how often they change their avatars. The type of avatar image used and the frequency with which it is replaced can reflect information such as the personality, interests and persona of the account's user. For example, a user whose avatar is a pleasant image such as scenery or a pet may be more friendly, while a user whose avatar is a cool-toned image such as a black background or a special symbol may be more independent, distinctive in thinking and conservative in outlook; a user whose avatar is a family portrait tends to be more mature and is mostly over 40 years old, whereas a user whose avatar is a meme, a funny image, a personal image or the like tends to be younger at heart, generally between 15 and 25 years old; a user whose avatar is a concrete image may have a simpler and more straightforward way of thinking and interpersonal relationships and value efficiency in getting things done, while a user whose avatar is an abstract image may have a broader and more complex way of thinking and interpersonal relationships. Accordingly, the avatar image of the target account can be used to enrich the user portrait of the target account, helping the self-media application better identify the user's status and needs.

Based on the above analysis, embodiments of the present application provide a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium. The method obtains the avatar image of a target account, determines the category information corresponding to the avatar image, and recommends target multimedia resources of interest to the target account according to that category information. The method makes full use of the avatar image of the target account: it determines the category of the avatar image from the content of the avatar region in the image and recommends multimedia resources to the target account according to that category. In the recommendation process, the avatar image serves as a new dimension that enriches the user portrait of the target account, thereby improving the accuracy of recommending multimedia resources to the target account. For a newly registered target account, whose behavior characteristics cannot yet be acquired, the method provides the avatar image as a new dimension reflecting user characteristics, so that multimedia resources can be accurately recommended to the newly registered account according to its avatar image and the new user receives content of greater interest, thereby improving the user experience.

After introducing the design concept of the embodiments of the present application, application scenarios to which the technical solution can be applied are briefly described below. It should be noted that the application scenarios described below only serve to illustrate the embodiments of the present application and are not limiting. In a specific implementation, the technical solutions provided by the embodiments of the application can be flexibly applied according to actual needs.

Fig. 1 is an application scenario of a multimedia resource recommendation method according to an embodiment of the present application.

As shown in FIG. 1, the application scenario may include at least one server 20 and a plurality of terminal devices 30. A self-media application is installed on each terminal device (30_1 to 30_N). Terminal device 30 may be any suitable electronic device that may be used for network access, including, but not limited to, a computer, laptop, smartphone, tablet, or other type of terminal. The server 20 is any server capable of providing information required for an interactive service through a network. The terminal device 30 can send content to and receive content from the server 20 via the network 40. The server 20 can acquire contents required by the terminal device 30, such as statistics, content indexes, and the like, by accessing the database 50. Terminal devices (e.g., 30_1 and 30_2 or 30_N) may also communicate with each other via network 40. Network 40 may be a network for information transfer in a broad sense and may include one or more communication networks such as a wireless communication network, the Internet, a private network, a local area network, a metropolitan area network, a wide area network, or a cellular data network, among others.

In the following description, only a single server or terminal device is described in detail, but it should be understood by those skilled in the art that the single server 20, terminal device 30 and database 50 shown are intended to represent that the technical solution of the present application relates to the operation of the terminal device, server and database. The detailed description of a single terminal device and a single server and database is for convenience of description at least and does not imply limitations on the type or location of terminal devices and servers. It should be noted that the underlying concepts of the exemplary embodiments of the present invention are not altered if additional modules are added or removed from the illustrated environments. In addition, although a bidirectional arrow from the database 50 to the server 20 is shown in the figure for convenience of explanation, it will be understood by those skilled in the art that the above-described transmission and reception may be realized through the network 40.

It should be noted that the self-media application of the embodiment of the present application may be a client installed in a terminal device, or the self-media application may also be a sub-application in the client (for example, an applet in a certain client).

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning and the like.

Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. The embodiments of the application involve artificial intelligence and machine learning, which are used to determine the category information corresponding to the avatar image, screen target multimedia resources from the multimedia resource pool, and rank the screened target multimedia resources.

The technical solution of the embodiment of the present application is further described below. It should be noted that the technical solutions described below are only exemplary.

Fig. 2 is a flowchart of a multimedia resource recommendation method provided in an embodiment of the present application, where the specific implementation flow of the method is as follows:

Step S201, acquiring an avatar image of a target account to which multimedia resources are to be recommended;

Step S202, determining the category information corresponding to the avatar image according to the content of the avatar region in the acquired avatar image;

Step S203, screening at least one target multimedia resource from the candidate multimedia resource pool according to the category information corresponding to the avatar image;

Step S204, recommending the screened at least one target multimedia resource to the target account.
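The four steps above can be sketched end to end as follows. This is only an illustrative outline, not the implementation described in the application: the `Resource` structure, the `CATEGORY_TO_TAGS` mapping, the tag-overlap screening rule and the `classify` callable are all hypothetical names introduced here, and the classifier is stubbed out with a lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resource:
    """A candidate multimedia resource (hypothetical structure)."""
    resource_id: str
    tags: List[str]


# Hypothetical mapping from avatar category to content tags of interest.
CATEGORY_TO_TAGS: Dict[str, List[str]] = {
    "pet": ["pets", "animals"],
    "landscape": ["travel", "photography"],
    "anime": ["animation", "games"],
}


def recommend(avatar: bytes,
              classify: Callable[[bytes], str],
              pool: List[Resource],
              top_k: int = 10) -> List[Resource]:
    """Steps S201 to S204: classify the avatar, screen the pool, return targets."""
    category = classify(avatar)                           # S202: avatar -> category
    wanted = set(CATEGORY_TO_TAGS.get(category, []))
    targets = [r for r in pool if wanted & set(r.tags)]   # S203: screen the pool
    return targets[:top_k]                                # S204: recommend


pool = [
    Resource("v1", ["pets"]),
    Resource("v2", ["finance"]),
    Resource("v3", ["animals", "travel"]),
]
# The classifier is stubbed; a real system would run the trained model here.
result = recommend(b"fake-image-bytes", lambda _img: "pet", pool)
print([r.resource_id for r in result])
```

In this sketch an unknown category yields an empty result; a production system would presumably fall back to some other recall strategy, which the text does not specify.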

It should be noted that the multimedia resources in the embodiments of the present application include image-text content and video content. Image-text content generally contains no more than 1000 words and is suited to fast reading and consumption in the mobile era; video content includes short videos and long videos, and the playing duration of a short video can be kept within 5 minutes.

It should be noted that the category information in the embodiments of the present application includes the category of the avatar: anime (two-dimensional) avatar, landscape avatar, selfie avatar, full-body avatar, family group photo avatar, couple photo avatar, child avatar, pet avatar, celebrity photo, industry-related avatar, system default avatar, and others, where "others" covers avatars that do not belong to any of the categories from anime avatar through system default avatar. The category information also includes the user's operation behavior on the avatar image, such as frequent replacement, basically no replacement, or occasional replacement, where the replacement frequency can be defined according to the actual business scenario.
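One possible way to represent this category information in code is sketched below. The enum member names are English renderings chosen for illustration, and the thresholds in `replacement_frequency` are purely hypothetical, since the text leaves the definition of replacement frequency to the actual business scenario.

```python
from dataclasses import dataclass
from enum import Enum


class AvatarCategory(Enum):
    """Avatar categories listed in the text (English names are illustrative)."""
    ANIME = "anime avatar"
    LANDSCAPE = "landscape avatar"
    SELFIE = "selfie avatar"
    FULL_BODY = "full-body avatar"
    FAMILY_GROUP = "family group photo avatar"
    COUPLE = "couple photo avatar"
    CHILD = "child avatar"
    PET = "pet avatar"
    CELEBRITY = "celebrity photo"
    INDUSTRY = "industry-related avatar"
    SYSTEM_DEFAULT = "system default avatar"
    OTHER = "others"


class ReplacementFrequency(Enum):
    RARELY = "basically no replacement"
    OCCASIONALLY = "occasional replacement"
    FREQUENTLY = "frequent replacement"


def replacement_frequency(changes_per_year: int,
                          occasional_min: int = 1,
                          frequent_min: int = 6) -> ReplacementFrequency:
    """Bucket a change count; both thresholds are hypothetical defaults."""
    if changes_per_year >= frequent_min:
        return ReplacementFrequency.FREQUENTLY
    if changes_per_year >= occasional_min:
        return ReplacementFrequency.OCCASIONALLY
    return ReplacementFrequency.RARELY


@dataclass
class CategoryInfo:
    """Category information: avatar type plus the user's replacement behavior."""
    category: AvatarCategory
    frequency: ReplacementFrequency


info = CategoryInfo(AvatarCategory.PET, replacement_frequency(3))
print(info.category.value, "/", info.frequency.value)
```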

According to the multimedia resource recommendation method provided by the embodiments of the application, after the avatar image of the target account is acquired, image features expressing the content of the avatar region are extracted from the avatar image; the extracted image features are then fused to determine the category information corresponding to the avatar image.

In an alternative embodiment, the category information corresponding to the avatar image may be determined by a trained multi-classification model, wherein the multi-classification model includes an input layer, at least one convolutional layer, and at least one fully-connected layer.

In a specific implementation, the input layer of the multi-classification model preprocesses the acquired avatar image of the target account, and the at least one convolutional layer of the trained multi-classification model extracts shallow image features and deep image features from the preprocessed avatar image. The at least one fully-connected layer of the trained multi-classification model fuses the extracted shallow and deep image features to obtain a classification result representing the probabilities that the avatar image of the target account belongs to each preset category. The classification result is normalized and output, and the preset category with the maximum probability value in the normalized classification result is determined as the category information corresponding to the avatar image of the target account.

In an alternative embodiment, the multi-classification model may be a Convolutional Neural Network (CNN). Fig. 3a and fig. 3b are structural diagrams of a multi-classification model provided in an embodiment of the present application. Taking the multi-classification model structure shown in fig. 3a and 3b as an example, a process of determining category information corresponding to an avatar image is described.

As shown in FIG. 3a, the CNN includes an input layer, at least one convolutional layer, at least one sampling layer, and at least one fully-connected layer. The convolutional layers and sampling layers are cascaded in multiple stages and are together called the hidden layer. The acquired avatar image of the target account, shown in FIG. 3c, is denoted image X. The input layer of the CNN normalizes image X to a size of 32 × 32 × 3. The first convolutional layer of the CNN, with a convolution kernel size of 5 × 5 and 64 filters, produces feature maps of size 28 × 28 × 64, and the image features extracted from them are denoted shallow image features. The first sampling layer of the CNN reduces the dimensionality using a 2 × 2 kernel, giving a size of 14 × 14 × 64. The second convolutional layer of the CNN then produces feature maps of size 10 × 10 × 64, and the image features extracted from them are denoted deep image features. Finally, the fully-connected layer of the CNN fuses the shallow and deep image features to obtain and output the vector [0,1,0 … 0] of image X.
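The spatial sizes quoted above (32, 28, 14, 10) follow from the standard output-size formula for convolution and sampling layers. The sketch below checks the arithmetic under assumptions that the text does not state explicitly: stride 1 and no padding for both convolutional layers, a 5 × 5 kernel for the second convolutional layer as well, and stride 2 for the 2 × 2 sampling layer.

```python
def output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or sampling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


input_size = 32                                                   # 32 x 32 x 3 input
after_conv1 = output_size(input_size, kernel=5)                   # first convolutional layer
after_pool1 = output_size(after_conv1, kernel=2, stride=2)        # first sampling layer
after_conv2 = output_size(after_pool1, kernel=5)                  # second convolutional layer

print(after_conv1, after_pool1, after_conv2)  # prints "28 14 10"
```

The channel dimension (64) is set by the number of filters and is unaffected by this calculation.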

As shown in fig. 3b, the input layer includes 5 neurons, the hidden layer includes 4 neurons, and the output layer includes 3 neurons. The input layer is fully connected with the hidden layer, and the hidden layer is fully connected with the output layer; W1 and W2 denote the weight matrices of these two connections, and b1 and b2 the corresponding bias terms. The output layer uses the softmax function as its activation function. The softmax function is generally used at the output of a neural network as the activation function for multi-classification tasks; it normalizes the output components corresponding to each category so that they sum to 1.

The calculation formula of the softmax function is as follows:

$$ s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} $$

where x_i denotes the i-th input (the output component for the i-th category), n is the total number of categories, and s(x_i) is the probability that the input belongs to the i-th category, corresponding to P0, P1 and P2 in fig. 3b; s(x_i) lies in the range 0 to 1. The 1 and 0 in fig. 3b indicate whether or not a category is the corresponding category.

Specifically, the output vector [0,1,0 … 0] of image X produced by the output layer (the last fully-connected layer in fig. 3a) is normalized using the softmax function. The probability corresponding to the first component of the output vector is P0, the probability corresponding to the second component is P1, the probability corresponding to the third component is P2, and so on; the probabilities of all components sum to 1. Since P1 is the largest, the category label corresponding to P1, "two-dimensional (anime-style) avatar", is determined as the category information of image X.
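The normalization and maximum-probability selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `softmax` and `predict_category`, the example scores and the shortened label list are assumptions made here for demonstration.

```python
import math

def softmax(scores):
    """Normalize raw output scores so that they are positive and sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(scores, labels):
    """Return the label with the largest softmax probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical raw scores for three preset categories (P0, P1, P2).
labels = ["landscape", "two-dimensional", "pet"]
category, p = predict_category([1.0, 3.0, 0.5], labels)
print(category, round(p, 3))
```

Subtracting the maximum score before exponentiation does not change the result of the formula; it only avoids overflow for large scores.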

It should be noted that fig. 3a and 3b are only schematic diagrams of the network structure of CNN, and the size of the input image, the size of the convolution kernel, and the number of neurons are not limited.
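The feature-map sizes quoted for the example network (32, 28, 14, 10) follow from the usual output-size rule for convolution and sampling windows. The sketch below reproduces that arithmetic under assumptions that the text does not state explicitly: stride 1 and no padding for the convolutions, stride 2 for the 2 × 2 sampling layer, and a 5 × 5 kernel with 64 filters for the second convolution. The helper names are hypothetical.

```python
def window_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or sampling window."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_shapes(input_size=32):
    """List (layer, spatial size, channels) for the example network."""
    shapes = [("input", input_size, 3)]
    s = window_out(input_size, kernel=5)         # first convolution, 64 filters
    shapes.append(("conv1", s, 64))
    s = window_out(s, kernel=2, stride=2)        # 2 x 2 sampling layer (assumed stride 2)
    shapes.append(("sample1", s, 64))
    s = window_out(s, kernel=5)                  # second convolution (assumed 5 x 5, 64 filters)
    shapes.append(("conv2", s, 64))
    return shapes

for name, size, channels in trace_shapes():
    print(f"{name}: {size} x {size} x {channels}")
```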

In an alternative embodiment, the multi-classification model may be trained by:

the input layer of the multi-classification model acquires a plurality of avatar training samples from an avatar database, and the acquired training samples are standardized so as to improve the convergence speed of the multi-classification model and reduce the training time;

the avatar database contains a large number of avatar training samples, and corresponding category labels need to be annotated for the acquired avatar training samples in advance; specifically, the acquired avatar training samples can be divided into 12 category labels: two-dimensional (anime-style) avatars, landscape avatars, selfie (close-up) avatars, full-body avatars, family group photo avatars, couple photo avatars, child avatars, pet avatars, celebrity photos, industry-related avatars, system default avatars, and others; the "others" category can include user behavior, such as frequent replacement, essentially no replacement, occasional replacement, etc., and the replacement frequency can be defined according to the actual service scenario;

in a specific implementation, category labels can first be annotated for part of the acquired avatar training samples, and the acquired samples are then adjusted according to the sample proportion of each category label among the annotated samples, so that sample balance is ensured and the multi-classification model learns better; for example, if 200,000 avatar training samples are acquired, 5,000 samples covering all categories are selected for category labeling, the proportion of samples corresponding to each category label is counted, the number of samples for each category label among the 200,000 samples is adjusted according to the counted proportions, and category labels are then annotated for all the avatar training samples;

it should be noted that, when the categories of the avatar training samples are annotated, finer-grained classification can be performed on top of these categories; for example, with the two-dimensional (anime-style) avatar as a first-level category, it can be further divided into game characters, cartoon characters and the like according to the character depicted; with the industry-related avatar as a first-level category, it can be further divided into finance, catering, sports, dance and the like according to the industry; with the pet avatar as a first-level category, it can be further divided into cats, dogs, alpacas and the like according to the animal.

Inputting a plurality of avatar training samples and pre-labeled class labels corresponding to each avatar training sample into an initial multi-classification model; performing fusion processing on the image characteristics of the avatar training samples through an initial multi-classification model to obtain a prediction classification result corresponding to each avatar training sample and used for representing the probability that the avatar training samples belong to each preset class respectively;

in a specific implementation, the parameter weights of a network pre-trained on ImageNet can be used as the initial weights of the initial multi-classification model; each avatar training sample is normalized, multi-scale avatar training samples are obtained through transformations (sample augmentation techniques such as filtering, rotation and stretching), shallow image features and deep image features of the multi-scale samples are extracted, and the extracted shallow and deep image features are fused to obtain a prediction classification result representing the probability that each avatar training sample belongs to each preset category; the pre-trained network can be Inception V4, a model pre-trained on ImageNet data; GoogLeNet (also known as Inception V1), the founding work of the Inception family of networks, uses 1x1 convolutions to reduce dimensionality while performing convolution at multiple scales followed by re-aggregation, so that more features can be extracted for the same amount of computation, thereby improving the training result;

determining a classification loss value according to a prediction classification result corresponding to each head portrait training sample and a pre-labeled class label;

in specific implementation, for each avatar training sample, determining a class label corresponding to the maximum probability in the prediction classification result as the class of the avatar training sample, and determining a classification loss value according to the determined class and the pre-labeled class label of the avatar training sample;

and adjusting model parameters of the initial multi-classification network according to the classification loss value until the determined classification loss value is within a preset range, so as to obtain the trained multi-classification model.
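The training loop above (predict, compute a classification loss, adjust parameters until the loss falls within a preset range) can be illustrated with a deliberately small stand-in. The sketch below trains a linear softmax classifier on toy feature vectors rather than a CNN on avatar images, and it uses cross-entropy as the classification loss; the text does not name a specific loss function, so that choice, along with the learning rate, threshold and function names, is an assumption.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(probs, label):
    """Loss for one sample: negative log-probability of the labeled class."""
    return -math.log(max(probs[label], 1e-12))

def train(samples, labels, n_classes, lr=0.5, loss_threshold=0.1, max_epochs=1000):
    """Gradient descent until the mean loss is within the preset range."""
    n_features = len(samples[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    loss = float("inf")
    for _ in range(max_epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        loss = 0.0
        for x, y in zip(samples, labels):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
            loss += cross_entropy(probs, y)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    grad[c][j] += err * x[j]
        loss /= len(samples)
        if loss <= loss_threshold:               # stop once the loss is in the preset range
            break
        for c in range(n_classes):
            for j in range(n_features):
                W[c][j] -= lr * grad[c][j] / len(samples)
    return W, loss

# Toy data: two linearly separable "feature vectors", one per class.
weights, final_loss = train([[1.0, 0.0], [0.0, 1.0]], [0, 1], n_classes=2)
print(round(final_loss, 4))
```

In a real system the linear model would be replaced by the pre-trained convolutional network, with the same stopping rule applied to its loss.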

In the multimedia resource recommendation method provided by the embodiment of the application, the avatar image of each category, together with the behavior associated with it, can reflect characteristics of the user such as personality and emotion, and the avatar image can therefore be used to enrich the user portrait. For example, users who use cartoon characters or personalized pictures as avatars tend to be more outgoing, and users who use characters from a recent movie as avatars tend to like movies. In some optional embodiments, the correspondence between the category information of the avatar image and the user portrait may be established in advance.

For example, in the embodiment of the present application, the avatar image is divided into 12 categories, and the user portrait corresponding to each category is as follows:

Two-dimensional (anime-style) avatar: this category differs from the system default avatar. The system default avatar is usually a cartoon-style image but carries no anime culture, whereas a two-dimensional avatar is a typical anime-style character with a storyline or animation elements. In self-media applications, a large part of user avatars belong to the two-dimensional animation category, mostly from Japanese animation, and the avatars are characters from animations or games. Users who use a two-dimensional avatar, regardless of the character selected, generally have strong imagination or a strong sense of fun, are younger, and prefer to read or consume related animation and comic products. Based on the specific animation IP character and its era, information such as the user's gender, age and occupation can be inferred and used as a basis for recommending animation content during cold start, shortening the time needed to converge on content the user is interested in.

System default avatar: users who use the system default avatar, for example users who keep the avatar built into a self-media application, may not use the application frequently and may not be target users. Such users generally tend to be steady, casual, older and less familiar with the product, and they are very likely to be new users.

Landscape avatar: users with this kind of avatar tend to be steady in character, and landscapes can be divided into various types. Volcanoes or waterfalls are fierce, while lakes or mountains are stable, and the two represent different user psychology; the latter predominates, revealing a calm, easygoing state of mind. It is worth mentioning that the landscape avatar, as a first-level category, can be further divided into second-level categories according to the scenery.

Selfie (close-up) avatar: users in this category are generally confident in their appearance. The larger the proportion of the face and the more natural the avatar, the more confident, self-aware and independent the user is, the less they care about the evaluation of others, and the less susceptible they are to others' influence. It is worth mentioning that face proportion detection can be added on top of the selfie category, thereby providing more accurate user portrait information.

Full-body avatar: users in this category differ from users who use a selfie avatar. Their psychology can be summarized as follows: they are confident, have strong self-esteem, dress well, have a strong sense of fashion, and hope to communicate with or be appreciated by others. It is worth mentioning that the full-body avatar, as a first-level category, can be further divided into more detailed second-level categories according to clothing, so that more user portrait information can be mined.

Family group photo avatar: users in this category have a strong sense of family and responsibility, are kind-hearted and young at heart, and have a positive outlook; content on serious subjects can be recommended to them during cold start.

Couple photo avatar: most users in this category are in a period of romance or are newly married. They dare to take responsibility, cherish their partner, want to give their partner a strong sense of protection and security, are willing to give, and dare to declare their love; emotional story content may be recommended to them during cold start.

Child avatar: users in this category are generally parents or are childlike at heart, with a pure and natural disposition and a liking for a carefree life.

Pet avatar: users in this category like small animals, are caring, are patient with the trivia of daily life, and pay attention to hygiene. It should be noted that the pet avatar, as a first-level category, may be further divided into more detailed second-level categories according to pet type, such as cat, dog, etc., and content related to a specific pet may be recommended during cold start.

Celebrity photo: a user who uses a photo of a celebrity or a great figure as an avatar usually admires and looks up to that person and identifies with the person's behavior, values, ideas and the like. The personality of users in this category may be inferred in reverse from the personality traits of the person depicted. It is worth mentioning that the celebrity photo, as a first-level category, can be further divided into more detailed second-level categories according to the person, so that more user portrait information can be mined.

Industry-related avatar: users in this category are generally career-driven, bold and enterprising, and calm when facing difficulties. It should be noted that the industry-related avatar, as a first-level category, can be further divided into more detailed second-level categories according to industry, such as finance, IT and the like, and content related to a specific industry can be recommended during cold start.

Others: avatars that do not belong to the above 11 categories are classified as others. This category may also cover avatar behavior, such as frequently changing the avatar, occasionally changing it, or essentially never changing it. Generally, the more mature the user's psychological age, the richer their life experience and the stronger their sense of responsibility, the less frequently they change their avatar, and vice versa. Taking frequent avatar changes as an example, a user in this category has recently been in an unsettled state of mind for one of two reasons: a. issues of the user's own character, such as an immature and changeable personality; b. a change in the user's circumstances, for example the user is in a key transition period such as entering a higher school, starting a job or a romantic relationship, and is emotionally unstable. The user's current state can be determined with the aid of other information. What counts as frequent replacement may be defined per week or per month, depending on the actual service scenario; such users are more active on the network, while users with a lower replacement frequency may be considered more stable and older.

After the category information corresponding to the head portrait image of the target account is determined through the multi-classification model, the user portrait corresponding to the category information is determined according to the corresponding relation between the category and the user portrait; and screening out at least one target multimedia resource from the candidate multimedia resource pool according to the user portrait corresponding to the category information.

An alternative embodiment is: extracting keywords in the user portrait to obtain a portrait label of a target account; determining similarity between the portrait label of the target account and resource labels of all candidate multimedia resources in the candidate multimedia resource pool; and screening at least one target multimedia resource from the candidate multimedia resource pool according to the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool.

In a specific implementation, a Natural Language Processing (NLP) algorithm may be used to extract keywords from the user portrait corresponding to the category information to obtain one or more portrait labels; for example, the keywords extracted from the user portrait corresponding to the selfie avatar category are "self-confident", "independent" and "self-aware", giving the portrait labels "self-confident", "independent" and "self-aware"; applicable NLP algorithms include the Term Frequency-Inverse Document Frequency algorithm (TF-IDF), the Latent Dirichlet Allocation (LDA) document topic model, the TextRank algorithm, and the like, and the embodiments of the present application place no restriction on the algorithm used for extracting keywords;

before the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool is determined, the resource label of each candidate multimedia resource in the candidate multimedia resource pool can be extracted in advance, the resource label can be a keyword extracted from titles, comments and description contents of the multimedia resources, and the algorithm for extracting the keyword is consistent with the algorithm for extracting the keyword from the portrait of the user and is not repeated here;

if the similarity between the portrait label of the target account and the resource label of a candidate multimedia resource is higher than a set threshold, the candidate multimedia resource is taken as a target multimedia resource; the method for determining the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool is not limited, and a collaborative filtering algorithm, a matrix factorization algorithm, a factorization algorithm, a supervised learning algorithm, a gradient boosting decision tree algorithm and the like can be used.
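As a rough illustration of the keyword extraction and threshold screening described above, the sketch below ranks the terms of a user portrait with a smoothed TF-IDF score and then keeps candidate resources whose tag overlap with the portrait labels exceeds a threshold. Several things here are assumptions: the text is taken to be tokenized already, Jaccard overlap stands in for the unspecified similarity measure, and the function names, threshold and sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Return the top_k terms of doc_tokens ranked by a smoothed TF-IDF score."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus_tokens if term in doc)   # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0         # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def jaccard(a, b):
    """Tag-set overlap, used here as a stand-in similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def screen(portrait_tags, resource_tags_by_id, threshold=0.2):
    """Keep resources whose similarity to the portrait labels exceeds the threshold."""
    hits = [(rid, jaccard(portrait_tags, tags))
            for rid, tags in resource_tags_by_id.items()]
    hits = [(rid, sim) for rid, sim in hits if sim > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical tokenized user portraits, one per avatar category.
portrait_corpus = [
    ["users", "confident", "independent"],
    ["users", "steady", "casual"],
    ["users", "caring", "patient"],
]
portrait_tags = tfidf_keywords(portrait_corpus[0], portrait_corpus, top_k=2)

# Hypothetical resource labels extracted from titles, comments and descriptions.
pool = {
    "v1": ["confident", "fashion"],
    "v2": ["finance"],
    "v3": ["independent", "confident", "travel"],
}
targets = screen(portrait_tags, pool)
print(portrait_tags, [rid for rid, _ in targets])
```

The word "users" occurs in every portrait, so its inverse document frequency is low and it is not selected as a label; this is the behavior TF-IDF is meant to provide.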

It should be noted that, in other embodiments of the present application, each candidate multimedia resource in the candidate multimedia resource pool may be classified to obtain a category label of the candidate multimedia resource, a similarity between the portrait label of the target account and the category label of the candidate multimedia resource is determined, a resource label is extracted from the candidate multimedia resource corresponding to the category label of the candidate multimedia resource whose similarity is higher than a set threshold, a similarity between the portrait label of the target account and the resource label of the candidate multimedia resource corresponding to the category label of the candidate multimedia resource whose similarity is higher than the set threshold is determined, and at least one target multimedia resource is screened from the candidate multimedia resource pool according to the similarity. Compared with the method for determining the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool, the method is small in calculation amount and improves recommendation efficiency.

In the multimedia resource recommendation method provided by the embodiment of the application, after at least one target multimedia resource for recommending to a target account is screened out, the at least one target multimedia resource can be sequenced, and the target multimedia resource with higher user interest degree is preferentially recommended to a user.

According to the attribute information of each target multimedia resource and the account characteristics of the target account, determining a sorting parameter corresponding to each target multimedia resource and used for predicting the probability of the target account clicking to view the target multimedia resource; according to the sequencing parameters corresponding to the target multimedia resources, at least one target multimedia resource is recommended to a target account after being sequenced, so that the retention time of the target account is prolonged;

the probability of clicking and checking the target multimedia resources by the target account reflects the interest degree of the target account in the multimedia resources, and can be used as a sorting parameter of each target multimedia resource, wherein the higher the probability value is, the more interested the user in the target multimedia resources is represented, the longer the time retained by the user is, the smaller the probability value is, the less interested the user in the target multimedia resources is represented, and the shorter the time retained by the user is;

in specific implementation, the attribute information of the target multimedia resource includes consumption duration, click times, comment number, praise number and the like of the target multimedia resource, and the account characteristics of the target account include a network environment (such as 3G, 4G, WIFI and the like) where the target account is located, a motion state (such as yoga, jogging, public transportation and the like) of the target account, internet surfing time (working day, holiday, morning, evening and the like) of the target account, a geographic location of the target account and the like; according to the attribute information of each target multimedia resource and the account characteristics of the target account, determining a sorting parameter corresponding to each target multimedia resource and used for predicting the probability of the target account clicking to view the target multimedia resource, and sorting at least one target multimedia resource according to the sorting parameter and recommending the target multimedia resource to the target account;

for example, if the current time for the target account is 8 a.m. on a working day, the user of the target account may be on the way to work and pressed for time; a target multimedia resource with a consumption duration of less than 5 minutes is then more likely to arouse the interest of the target account than one with a consumption duration of more than 10 minutes, the probability that the target account clicks the shorter resource is greater than the probability of clicking the longer one, and the target multimedia resource with a consumption duration of less than 5 minutes is preferentially recommended to the target account;

for another example, when a user sits on a subway, the network environments generally used by the user are 3G and 4G, the network signals are poor, a large amount of data traffic needs to be consumed for consuming the video type target multimedia resources relative to the image-text type target multimedia resources, the quality requirement on the network signals is high, the probability that the image-text type target multimedia resources are clicked and viewed by the target account is high, and the image-text type target multimedia resources can be preferentially recommended to the target account.
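The ranking step can be sketched as a scoring function that maps resource attributes and account context to a predicted click probability, followed by a sort. The logistic form, the three hand-picked features and their weights below are purely illustrative assumptions; the text does not specify the prediction model, and in practice the weights would be learned from behavior data.

```python
import math

def click_probability(resource, context, weights, bias=0.0):
    """Toy logistic model of the probability that the account clicks the resource."""
    features = {
        # short content while commuting (the 8 a.m. working-day example)
        "short_and_commute": 1.0 if resource["duration_min"] < 5 and context["commuting"] else 0.0,
        # video on a weak cellular network (the subway example)
        "video_on_cellular": 1.0 if resource["kind"] == "video" and context["network"] in ("3G", "4G") else 0.0,
        # popularity signal derived from the number of likes
        "log_likes": math.log1p(resource["likes"]),
    }
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(resources, context, weights):
    """Order target resources by descending predicted click probability."""
    return sorted(resources,
                  key=lambda r: click_probability(r, context, weights),
                  reverse=True)

# Hypothetical weights, context and resources.
weights = {"short_and_commute": 1.5, "video_on_cellular": -1.0, "log_likes": 0.1}
context = {"commuting": True, "network": "4G"}
resources = [
    {"id": "long_video", "duration_min": 12, "kind": "video", "likes": 1000},
    {"id": "short_article", "duration_min": 3, "kind": "article", "likes": 1000},
]
print([r["id"] for r in rank(resources, context, weights)])
```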

Fig. 3d is a diagram of the target multimedia resources recommended for a target account, as shown in the user interface provided in the embodiment of the present application. Fig. 3d shows the multimedia resources recommended for the target account whose avatar image is image X: the first multimedia resource is image-text content with 100,000 likes, and the second multimedia resource is short video content with a consumption duration of 2 minutes and 11 seconds and 1879 views.

Fig. 4 is a complete flowchart of a multimedia resource recommendation method provided in an embodiment of the present application, where the method includes the following specific implementation flows:

in step S401, acquiring an avatar image of a target account of a multimedia resource to be recommended;

in step S402, the acquired avatar image is input into the CNN model, and category information corresponding to the avatar image is determined;

in step S403, a user portrait corresponding to the category information is determined based on the category information corresponding to the avatar image;

in step S404, extracting keywords in the user portrait to obtain portrait label of the target account;

in step S405, at least one target multimedia resource is screened out from the candidate multimedia resource pool according to the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool;

in step S406, determining a ranking parameter corresponding to each target multimedia resource and used for predicting the probability of the target account clicking to view the target multimedia resource according to the attribute information of each target multimedia resource and the account characteristics of the target account;

in step S407, at least one target multimedia resource is ranked and then recommended to a target account according to the ranking parameter corresponding to each target multimedia resource.
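Steps S401 to S407 can be tied together in a short end-to-end sketch. Everything concrete in it is a placeholder: the `PORTRAITS` table, the `classify` and `score` callables, the whitespace split used in place of keyword extraction, and the tag-overlap similarity are all assumptions chosen only to make the control flow visible.

```python
# Hypothetical correspondence between avatar categories and user portraits.
PORTRAITS = {
    "selfie": "self-confident independent",
    "pet": "caring patient",
}

def tag_similarity(a, b):
    """Tag-set overlap, a stand-in for the unspecified similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def recommend(avatar, classify, pool, score, threshold=0.0):
    category = classify(avatar)                          # S402: avatar -> category
    portrait = PORTRAITS.get(category, "")               # S403: category -> user portrait
    tags = portrait.split()                              # S404: keyword extraction stand-in
    candidates = [r for r in pool                        # S405: screen by label similarity
                  if tag_similarity(tags, r["tags"]) > threshold]
    return sorted(candidates, key=score, reverse=True)   # S406-S407: rank and recommend

# Hypothetical candidate pool with precomputed click-probability estimates.
pool = [
    {"id": 1, "tags": ["independent"], "ctr": 0.2},
    {"id": 2, "tags": ["finance"], "ctr": 0.9},
    {"id": 3, "tags": ["self-confident"], "ctr": 0.5},
]
result = recommend(
    avatar=b"...image bytes...",                         # S401: acquired avatar image
    classify=lambda image: "selfie",                     # stands in for the trained CNN
    pool=pool,
    score=lambda r: r["ctr"],
)
print([r["id"] for r in result])
```

Resource 2 has the highest click estimate but is screened out in S405 because its label shares nothing with the portrait labels, which shows that ranking only reorders what recall has already selected.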

In implementation, the embodiment of the application makes full use of the avatar image of the target account to enrich the user portrait of the target account. On the one hand, characteristics of the target account such as personality and preferences can be better understood, which improves the accuracy of recommending content to a new user during cold start; on the other hand, the richer the user portrait, the more efficiently the content of interest to the new user can be converged on, which in turn improves the new user's experience. After the user portrait of the target account is obtained, portrait labels of the target account are obtained from the user portrait, and at least one target multimedia resource is screened out according to the similarity between the portrait labels and the resource labels of the candidate multimedia resources in the candidate multimedia resource pool and then recommended to the target account; this increases the probability that the user clicks to view the screened target multimedia resources and improves the retention rate of new users.

Fig. 5 is a framework diagram of a multimedia resource recommendation system according to an embodiment of the present application, including: a content production end, a content consumption end, avatar image crawling data, an uplink and downlink content interface server, a content distribution export service, a content database, a scheduling center service, a manual auditing system, a deduplication service, a statistical reporting interface server, a statistical database, an avatar classification and mining model, an avatar mining model and service, an avatar database, a recommendation recall system, a recommendation sorting service and a user portrait service. The functions of the various modules in the system are described below:

1. Content production end and content consumption end

Content producers such as PGC, UGC, PUGC or MCN producers provide image-text content or video content, either edited locally or created with a web publishing system, through a mobile terminal or a backend interface (API) system; the video content includes short videos and small videos and is the main content source for recommendation and distribution;

the content production end communicates with the uplink and downlink content interface server to obtain its interface address and uploads image-text content or video content through that address. Image-text content usually comes from a lightweight publishing end and a content editing entry. Video content is usually published from an image acquisition device, and during shooting the user can choose music, filter templates, video beautification functions and the like to go with the local video content.

And the content consumption end communicates with the uplink and downlink content interface server to acquire index information of the recommended content, and the index information is displayed in a Feeds stream mode. When the content consumption end sends a specific image-text content or video content request message, the content consumption end communicates with the content distribution export service to acquire the image-text content or video content corresponding to the index information.

In addition, the content production end and the content consumption end report the behavior data played by the user in the uploading and downloading processes to the statistical reporting interface server for statistical analysis, such as pause, loading time, playing click and the like.

In the process that a user logs in by using a product, for example, the user logs in by using a self-media application account, an authorized account ID and an avatar image are crawled and stored in an avatar database; and the head portrait images set by the user can be actively crawled as marked samples, and the head portrait image training samples acquired offline are used as the input of the head portrait classification and mining model.

2. Uplink and downlink content interface server and content distribution export service

The uplink and downlink content interface server is directly communicated with the content production end, stores the meta information of the content submitted by the content production end into the content database, the meta information of the content generally comprises the title, the publisher, the abstract, the cover picture, the publishing time, the file size and the like of the content, and stores the source file of the content into the content database.

The uplink and downlink content interface server reports the message sending flow information of each account to the statistical reporting interface server, wherein the message sending flow information comprises message sending time and content types, and simultaneously, content marking information provided by each account, such as classification, labels and selected cover pictures, is stored in a content database.

In addition, the contents submitted by the content production end are synchronized to the dispatching center service through the uplink and downlink content interface server, and the dispatching center service processes and transfers the contents.

The content distribution export service communicates with the recommendation sorting service, acquires the recommended and sorted content, and sends it to the content consumption end in the form of Feeds; the content distribution export service is usually a group of access services deployed by region, close to the users.

3. Content database

The content database stores the meta information of the content generated by the content production end, which generally includes the title, publisher, abstract, cover image, publishing time, file size, cover image link, bit rate, file format, whether the content is marked as original or first-published, the classification assigned to the content during manual review, and the like. The content classification in the manual review process comprises first-level, second-level and third-level classifications plus label information; for example, for an article about a Huawei mobile phone, the first-level classification is science and technology, the second-level classification is smartphones, the third-level classification is domestic mobile phones, and the label information is Huawei, Mate 30. During review, the content meta information is read from the content database, and the review result and status are written back to the content database.

Content processing mainly comprises machine processing and manual auditing, and the content database is divided into different content pools according to different content marks. The recommendation ranking service, the deduplication service, and the recommendation recall system all obtain content from the content database. For example, the deduplication service loads, according to business requirements, the content that was stored during a past period (for example, one week); content that is stored again as a duplicate is given a filter mark and is not provided to the content distribution export service for display to users.

4. Dispatch center service

The dispatch center service is responsible for scheduling the entire content flow. It controls the uplink and downlink content interface server to receive uploaded content and obtains the meta information of the content from the content database. It schedules the deduplication service to mark and filter content that is stored repeatedly, and at the same time synchronizes the deduplication stream information to the avatar classification and mining model. In addition, for content that cannot be handled by machine, such as content involving political or security issues, the dispatch center service dispatches the manual auditing system to audit it manually.

5. Manual auditing system

The manual auditing system is the carrier of manual service capability. It is mainly used to audit and filter content that machines cannot reliably judge, such as politically sensitive, pornographic, or legally prohibited content, and to label and re-confirm content. The audited content includes content published by media applications and content acquired from the public network.

The result of the manual audit is written into the content database through the dispatch center service.

6. Deduplication service

The deduplication service communicates with the dispatch center service and performs title deduplication, cover image deduplication, content text deduplication, video fingerprint and audio fingerprint deduplication, and the like. Text vectors and picture vectors can be deduplicated using simhash or BERT; for video content, video fingerprints and audio fingerprints can be extracted to construct vectors. The distance between vectors (such as the Euclidean distance) is then calculated to determine whether the content is a duplicate. The detailed deduplication method is not described further in the embodiments of the present application.
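The fingerprint-and-distance idea above can be illustrated with a minimal sketch. It is not the deduplication method of this application, which is left undescribed; it assumes whitespace-tokenized titles, a 64-bit SimHash fingerprint built from MD5 token hashes, and the Hamming distance (the usual distance for SimHash fingerprints) in place of the Euclidean distance. The function names and the threshold of 3 bits are hypothetical.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint over whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.lower().split():
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()[:8]
        token_hash = int.from_bytes(digest, "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_duplicate(title_a: str, title_b: str, threshold: int = 3) -> bool:
    """Treat two titles as duplicates when their fingerprints are close.

    The threshold is a hypothetical value chosen for illustration.
    """
    return hamming_distance(simhash(title_a), simhash(title_b)) <= threshold
```

Content flagged by `is_duplicate` would then receive the filter mark instead of being passed on for distribution.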

7. Statistical reporting interface server

The statistical reporting interface server receives the data reported by the content consumption end, including the user's current network environment, the user's click operations on feeds, the exposure data of feeds, and the like, and writes the reported data into a statistical database.

In addition, the statistical reporting interface server receives the raw posting stream of each account reported by the uplink and downlink content interface server.

8. Avatar classification and mining model

The avatar classification and mining model reads avatar image data from the avatar database and is trained as the multi-classification model provided in the embodiments of the present application.
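A minimal training sketch follows. It assumes each avatar has already been reduced to a numeric feature vector and substitutes a linear softmax classifier for the convolutional multi-classification model, so that it runs with the standard library alone. It shows only the loop structure: predict category probabilities, compute the classification loss, and adjust the parameters until the loss falls within a preset range. The function names, learning rate, and loss threshold are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, features):
    """Linear layer plus softmax: probability of each preset category."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train(samples, labels, num_classes, lr=0.5, loss_threshold=0.05,
          max_epochs=5000, seed=0):
    """Gradient descent on cross-entropy loss until it is within the preset range."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(samples)
    loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        loss = 0.0
        for features, label in zip(samples, labels):
            probs = forward(weights, biases, features)
            loss -= math.log(probs[label] + 1e-12) / n
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err / n
                for j in range(dim):
                    grad_w[c][j] += err * features[j] / n
        if loss <= loss_threshold:
            break  # classification loss is within the preset range
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, biases, loss
```

In practice the feature vectors would come from the convolutional layers of the model rather than being supplied by hand.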

9. Avatar mining model and service

The avatar mining model and service extracts and fuses the features of an avatar image that the target account has changed or uploaded, predicts the category information corresponding to the avatar image, and takes the category with the maximum probability as the category of the avatar image.

In addition, the avatar mining model and service communicates with the user portrait service and sends the avatar image classification result to the user portrait service.

10. Avatar database

The avatar database communicates with the avatar classification and mining model and with the avatar mining model and service. It provides sample avatar image data and newly added avatar image data, and stores avatar image data crawled from the Internet.

11. Recommendation recall system

The recommendation recall system performs content recall. Its recall algorithms include collaborative recall, category recall, topic recall, recall based on user historical behavior, recall based on the user's long-term and short-term interest points, and the like.

In addition, the recommendation recall system communicates with the user portrait service to recall candidate multimedia resources using the user portrait mining model and its corresponding service.

12. Recommendation ranking service

The recommendation ranking service comprises coarse ranking and fine ranking and ranks the results of the recommendation recall system. Based on the attribute information of each target multimedia resource (such as its click rate and consumption duration), combined with the account features of the target account, it predicts a ranking parameter representing the probability that the target account clicks the target multimedia resource. The target multimedia resources are then ranked according to their ranking parameters, combined with the rule strategies of the particular service, and recommended to the target account.
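As an illustration of ranking by a predicted click probability, the following sketch scores each target multimedia resource with a hand-weighted logistic function and sorts by that score. The application does not specify the prediction model. The `Resource` fields, the per-category account affinity, and every coefficient are hypothetical values chosen for the example; a real system would learn them from data.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    resource_id: str
    click_rate: float         # historical click rate in [0, 1]
    avg_consumption_s: float  # average consumption duration in seconds
    category: str


def ranking_parameter(resource: Resource, account_affinity: Dict[str, float]) -> float:
    """Estimate the click probability with a logistic score.

    account_affinity maps a category to the account's affinity in [0, 1].
    All coefficients below are illustrative assumptions.
    """
    affinity = account_affinity.get(resource.category, 0.0)
    z = (2.0 * resource.click_rate
         + 0.01 * resource.avg_consumption_s
         + 1.5 * affinity
         - 2.0)
    return 1.0 / (1.0 + math.exp(-z))


def rank(resources: List[Resource], account_affinity: Dict[str, float]) -> List[Resource]:
    """Sort resources by descending ranking parameter."""
    return sorted(resources,
                  key=lambda r: ranking_parameter(r, account_affinity),
                  reverse=True)


candidates = [
    Resource("a", click_rate=0.1, avg_consumption_s=30.0, category="sports"),
    Resource("b", click_rate=0.3, avg_consumption_s=60.0, category="tech"),
]
ordered = rank(candidates, {"tech": 0.9})
```

Service-specific rule strategies, such as limiting how many items of one category appear in a row, would be applied to `ordered` as a separate step.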

13. User portrait service

The user portrait service stores the user portrait results mined by the avatar mining model and service, together with static user portrait information mined through other channels.

In addition, the user portrait service serves the recommendation recall system and is used to recall content for the target account during the cold start process.

It should be noted that the above application scenarios are only examples and are not to be construed as limiting the scope of the present application.

Based on the same inventive concept, the embodiment of the application also provides a multimedia resource recommendation device, and as the principle of solving the problems of the device is similar to that of the multimedia resource recommendation method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.

As shown in fig. 6, a schematic structural diagram of a multimedia resource recommendation device provided in an embodiment of the present application includes:

an obtaining module 601, configured to obtain an avatar image of a target account of a multimedia resource to be recommended;

a determining module 602, configured to determine, according to the content of the avatar region in the acquired avatar image, category information corresponding to the avatar image;

the screening module 603 is configured to screen at least one target multimedia resource from the candidate multimedia resource pool according to the category information corresponding to the avatar image;

and a recommending module 604, configured to recommend the screened at least one target multimedia resource to a target account.
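The way the four modules chain together can be sketched as a single function that takes each module as a callable. This is only an assumption about how the modules might be wired together; the function and parameter names, the stub implementations, and the sample pool are all hypothetical.

```python
from typing import Callable, Dict, List


def recommend(account_id: str,
              obtain_avatar: Callable[[str], bytes],
              determine_category: Callable[[bytes], str],
              screen: Callable[[str], List[str]],
              rank: Callable[[str, List[str]], List[str]]) -> List[str]:
    """Run the four stages in order and return the recommended resources."""
    avatar = obtain_avatar(account_id)      # obtaining module 601
    category = determine_category(avatar)   # determining module 602
    targets = screen(category)              # screening module 603
    return rank(account_id, targets)        # recommending module 604


# Stub implementations, for illustration only.
pool: Dict[str, List[str]] = {
    "pets": ["dog_article", "cat_video"],
    "cars": ["suv_review"],
}
result = recommend(
    "new_user_1",
    obtain_avatar=lambda account: b"fake-image-bytes",
    determine_category=lambda image: "pets",
    screen=lambda category: pool.get(category, []),
    rank=lambda account, items: sorted(items),
)
```

Passing the stages as callables keeps the sketch independent of how any single module is actually implemented.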

An optional implementation manner is that the determining module 602 is specifically configured to:

extracting image features for representing the content of the head portrait region from the head portrait image;

and performing fusion processing on the extracted image features, and determining the category information corresponding to the head portrait image.

extracting image features from the avatar image based on the trained convolutional layer of the multi-classification model;

performing fusion processing on the extracted image features through a full connection layer of the trained multi-classification model to obtain a classification result for representing the probability that the head portrait image belongs to each preset class respectively; and determining the category information corresponding to the head portrait image according to the classification result.
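The convolution-then-fully-connected flow described above can be sketched in plain Python on a tiny grayscale image. The sketch assumes a single convolutional layer with ReLU activation, no pooling, and hand-set weights; the class names and weight values are hypothetical, and a trained model would supply its own.

```python
import math


def conv2d_valid(image, kernel):
    """Valid 2-D convolution (cross-correlation) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[max(0.0, sum(image[r + i][c + j] * kernel[i][j]
                          for i in range(kh) for j in range(kw)))
             for c in range(out_w)]
            for r in range(out_h)]


def fully_connected(features, weights, biases):
    """Fuse the flattened features into one score per preset category."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image, kernels, fc_weights, fc_biases, class_names):
    """Return the most probable category and the full probability list."""
    features = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        features.extend(v for row in feature_map for v in row)
    probs = softmax(fully_connected(features, fc_weights, fc_biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# A 3x3 image with a bright main diagonal and one diagonal-detecting kernel.
image = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
kernels = [[[1, 0],
            [0, 1]]]
fc_weights = [[1, 0, 0, 1],   # hypothetical class "diagonal"
              [0, 1, 1, 0]]   # hypothetical class "other"
label, probs = classify(image, kernels, fc_weights, [0.0, 0.0],
                        ["diagonal", "other"])
```

The category information is taken from the entry of `probs` with the highest value, which corresponds to determining the category according to the classification result.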

An alternative embodiment is that the multi-classification model is trained according to the following way:

acquiring a plurality of avatar training samples from an avatar database;

An optional implementation manner is that the screening module 603 is specifically configured to:

determining a user portrait corresponding to the category information according to the category information corresponding to the head portrait image;

and screening out at least one target multimedia resource from the candidate multimedia resource pool according to the user portrait corresponding to the category information.

extracting keywords in the user portrait to obtain a portrait label of a target account;

determining similarity between the portrait label of the target account and resource labels of all candidate multimedia resources in the candidate multimedia resource pool;

and screening at least one target multimedia resource from the candidate multimedia resource pool according to the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool.
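The label-matching steps above can be sketched with a set-based similarity. The application does not fix a similarity measure, so the Jaccard similarity is used here as an assumption. The function names, the similarity threshold, and the sample pool are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity between two label sets (0.0 when both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def screen_resources(portrait_labels: Set[str],
                     candidate_pool: Dict[str, Set[str]],
                     min_similarity: float = 0.2,
                     top_k: int = 10) -> List[str]:
    """Keep candidates whose resource labels are similar enough to the portrait labels."""
    scored = [(jaccard(portrait_labels, labels), resource_id)
              for resource_id, labels in candidate_pool.items()]
    kept = [(score, resource_id) for score, resource_id in scored
            if score >= min_similarity]
    # Highest similarity first; ties broken by resource id for determinism.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [resource_id for _, resource_id in kept[:top_k]]


candidate_pool = {
    "v1": {"anime", "games", "cosplay"},
    "v2": {"finance"},
    "v3": {"games", "esports"},
}
selected = screen_resources({"anime", "games"}, candidate_pool)
```

Here `v2` shares no label with the portrait and is filtered out, while `v1` ranks ahead of `v3` because it shares more labels.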

In an optional implementation manner, the recommending module 604 is specifically configured to:

determining a sorting parameter corresponding to each target multimedia resource and used for predicting the probability of the target account clicking to view the target multimedia resource according to the attribute information of each target multimedia resource and the account characteristics of the target account;

and sequencing at least one target multimedia resource according to the sequencing parameters corresponding to the target multimedia resources, and recommending the target multimedia resource to a target account.

For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.

In addition, an embodiment of the present application further provides an electronic device 700. As shown in fig. 7, the electronic device 700 includes: at least one processor 701; and a memory 702 communicatively coupled to the at least one processor 701; wherein

the memory 702 stores instructions executable by the at least one processor 701 to cause the at least one processor 701 to perform the method for multimedia resource recommendation described above.

Having described the multimedia recommendation method and apparatus of an exemplary embodiment of the present application, a computing apparatus according to another exemplary embodiment of the present application is next described.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."

In some possible embodiments, a computing device according to the present application may include at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the multimedia resource recommendation method according to the various exemplary embodiments of the present application described above in this specification. For example, the processing unit may execute the flow of the multimedia resource recommendation method shown in fig. 2 or fig. 4.

A computing device 800 according to this embodiment of the present application is described below with reference to fig. 8. The computing device 800 shown in fig. 8 is only one example and should not bring any limitations to the functionality or scope of use of the embodiments of the present application.

As shown in fig. 8, the computing device 800 is embodied in the form of a general purpose computing device. Components of the computing device 800 may include, but are not limited to: at least one processing unit 801, at least one storage unit 802, and a bus 803 that couples the various system components, including the storage unit 802 and the processing unit 801.

Bus 803 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.

The storage unit 802 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 821 and/or cache memory 822, and may further include Read Only Memory (ROM) 823.

The storage unit 802 may also include a program/utility 825 having a set (at least one) of program modules 824, such program modules 824 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The computing apparatus 800 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the computing apparatus 800, and/or with any devices (e.g., router, modem, etc.) that enable the computing apparatus 800 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interfaces 805. Moreover, the computing device 800 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 806. As shown, the network adapter 806 communicates with other modules for the computing device 800 over the bus 803. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computing device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Embodiments of the present application also provide a computer-readable medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described multimedia resource recommendation method.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A multimedia resource recommendation method is characterized by comprising the following steps:

2. The method according to claim 1, wherein the determining, according to the content of the acquired avatar image, the category information corresponding to the avatar image specifically includes:

extracting image features for representing the content of the avatar area from the avatar image;

and performing fusion processing on the extracted image features, and determining the category information corresponding to the avatar image.

3. The method according to claim 2, wherein the extracting of the image feature representing the content of the avatar region from the avatar image specifically includes:

extracting image features from the avatar image based on a convolutional layer of the trained multi-classification model;

the fusing the extracted image features to determine category information corresponding to the avatar image specifically includes:

4. The method of claim 3, wherein the multi-classification model is trained according to the following:

acquiring a plurality of avatar training samples from an avatar database;

inputting the plurality of avatar training samples and the pre-labeled class labels corresponding to each avatar training sample into an initial multi-classification model; performing fusion processing on the image characteristics of the avatar training samples through the initial multi-classification model to obtain a prediction classification result corresponding to each avatar training sample and used for representing the probability that the avatar training samples belong to each preset class respectively;

and adjusting the model parameters of the initial multi-classification model according to the classification loss value until the determined classification loss value is within a preset range, to obtain the trained multi-classification model.

5. The method of claim 1, wherein the screening of at least one target multimedia resource from the candidate multimedia resource pool according to the category information corresponding to the avatar image comprises:

determining a user portrait corresponding to the category information according to the category information corresponding to the avatar image;

6. The method of claim 5, wherein the screening of the at least one target multimedia asset from the pool of candidate multimedia assets based on the user representation corresponding to the category information comprises:

extracting key words in the user portrait to obtain a portrait label of the target account;

determining similarity between the portrait label of the target account and resource labels of each candidate multimedia resource in the candidate multimedia resource pool;

and screening out at least one target multimedia resource from the candidate multimedia resource pool according to the similarity between the portrait label of the target account and the resource label of each candidate multimedia resource in the candidate multimedia resource pool.

7. The method of claim 1, wherein the recommending the screened out at least one target multimedia resource to the target account comprises:

and according to the sequencing parameters corresponding to the target multimedia resources, sequencing the at least one target multimedia resource and recommending the target multimedia resource to the target account.

8. A multimedia resource recommendation apparatus, comprising:

9. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method of any one of claims 1 to 7.