CN115114461A

CN115114461A - Method and apparatus for recommending multimedia data, and computer-readable storage medium

Info

Publication number: CN115114461A
Application number: CN202210422793.0A
Authority: CN
Inventors: 赵光耀; 何新昇; 赵忠; 傅妍玫; 梁瀚明; 马骊; 户维波
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2022-09-27

Abstract

The embodiments of the application disclose a method and a device for recommending multimedia data, and a computer-readable storage medium. The method comprises: acquiring multimedia data features, wherein the multimedia data features comprise a first multimedia data feature and a second multimedia data feature; performing feature extraction on the first multimedia data feature through a plurality of first feature extraction networks to obtain a plurality of recommended features, and obtaining a plurality of weighted recommended features through a plurality of first gating networks; acquiring a plurality of splicing vectors obtained by splicing each of the weighted recommended features with the second multimedia data feature, and acquiring a plurality of splicing features based on second feature extraction networks; and inputting the splicing vectors and the splicing features of the splicing vectors into a service target prediction model for obtaining recommended media, so as to obtain a plurality of target recommended media. The application can improve the effectiveness of selecting the multimedia data to be recommended, enhances the personalized recommendation experience for multimedia data, and has high applicability.

Description

Method and apparatus for recommending multimedia data, and computer-readable storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for recommending multimedia data, and a computer-readable storage medium.

Background

As a product of the development of the internet and Artificial Intelligence (AI), personalized recommendation is a technology that builds on massive data mining to provide customers with personalized information services and decision support. In practice, the personalized recommendation flow can generally be divided into stages such as recall, coarse ranking and fine ranking. The recall stage generally includes a plurality of models or policies; it quickly screens out part (e.g., thousands) of the media contents to be pushed from a large number (e.g., millions) of media contents to be recommended and provides them to the coarse ranking stage for unified pre-ranking. The coarse ranking stage screens out a smaller part (e.g., hundreds) of the recalled media contents and outputs them to the fine ranking stage for more precise ranking, and the fine ranking result is pushed to the user. However, in the prior art, the coarse ranking stage generally filters the media content to be pushed and then outputs it to the fine ranking stage, and the filtered media content suffers a serious loss of original feature signals. This limits the ability of the personalized recommendation model to learn differentiated representations, so the personalized recommendation effect is poor and the applicability is low.

Disclosure of Invention

The embodiment of the application provides a method and equipment for recommending multimedia data and a computer-readable storage medium, which can improve the selection effectiveness of the multimedia data to be recommended, enhance the personalized recommendation experience of the multimedia data and have high applicability.

In a first aspect, an embodiment of the present application provides a method for recommending multimedia data, where the method includes:

acquiring multimedia data characteristics corresponding to multimedia data, wherein the multimedia data characteristics comprise a first multimedia data characteristic and a second multimedia data characteristic, the first multimedia data characteristic is a data characteristic of a recommendation object and the second multimedia data characteristic is a multimedia data characteristic to be recommended, or the first multimedia data characteristic is a multimedia data characteristic to be recommended and the second multimedia data characteristic is a data characteristic of a recommendation object;

performing feature extraction on the first multimedia data features through a plurality of first feature extraction networks to obtain a plurality of recommended features corresponding to the first multimedia data features, and obtaining a plurality of weighted recommended features corresponding to the plurality of recommended features through a plurality of first gating networks, wherein one first feature extraction network is used for obtaining one recommended feature corresponding to the first multimedia data feature, and one first gating network is used for obtaining one weighted recommended feature corresponding to the plurality of recommended features;

acquiring a plurality of splicing vectors obtained by splicing each weighted recommendation feature in the plurality of weighted recommendation features with the second multimedia data feature, and acquiring the splicing features of each splicing vector based on a second feature extraction network corresponding to each first gating network for acquiring each weighted recommendation feature to obtain a plurality of splicing features;

and inputting the splicing vectors and the splicing characteristics of the splicing vectors into a service target prediction model for obtaining recommended media so as to obtain a plurality of target recommended media based on the service target prediction model.
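The steps of the first aspect can be illustrated with a minimal NumPy sketch. It is not the implementation of the application: the layer sizes, the single-layer ReLU networks, the softmax gates and all names (`dense`, `first_extractors`, `gate_params`, and so on) are assumptions chosen only to show how the tensors flow from the first multimedia data feature to the splicing features.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Hypothetical stand-in for a feature extraction network: one ReLU layer."""
    w = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: np.maximum(x @ w + b, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Assumed sizes, for illustration only.
batch, d_first, d_second, d_feat, d_splice = 4, 16, 12, 8, 6
n_extractors, n_gates = 3, 2

first_feat = rng.normal(size=(batch, d_first))    # first multimedia data feature
second_feat = rng.normal(size=(batch, d_second))  # second multimedia data feature

# One first feature extraction network -> one recommended feature.
first_extractors = [dense(d_first, d_feat) for _ in range(n_extractors)]
recommended = np.stack([f(first_feat) for f in first_extractors], axis=1)  # (batch, n_extractors, d_feat)

# One first gating network -> one weighted recommended feature.
gate_params = [rng.normal(scale=0.1, size=(d_first, n_extractors)) for _ in range(n_gates)]
weighted = [np.einsum("be,bed->bd", softmax(first_feat @ g), recommended) for g in gate_params]

# Splice each weighted recommended feature with the second feature, then apply
# the second feature extraction network that corresponds to the same gate.
splice_vectors = [np.concatenate([w, second_feat], axis=1) for w in weighted]
second_extractors = [dense(d_feat + d_second, d_splice) for _ in range(n_gates)]
splice_features = [net(v) for net, v in zip(second_extractors, splice_vectors)]
```

The splicing vectors and splicing features produced here are what would be passed to the service target prediction model in the last step.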

In a possible implementation manner, the obtaining, by the plurality of first gating networks, a plurality of weighted recommendation features corresponding to the plurality of recommendation features includes:

obtaining feature merging weights corresponding to recommended features adopted by the weighted recommended features obtained by the first gating networks based on the first multimedia data features;

performing weighted summation on the plurality of recommended features through any one of the first gating networks based on feature merging weights corresponding to the recommended features adopted by the any one of the first gating networks to obtain one of the weighted recommended features obtained by the any one of the first gating networks;

obtaining each weighted recommendation feature obtained by each first gating network to obtain the plurality of weighted recommendation features;

wherein the first gating network comprises at least one of a gating network based on linear variation or a gating network based on normalized weighting.
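The weighted summation performed by a first gating network can be sketched as follows. The two gate functions are one possible reading of "linear variation" and "normalized weighting" (a plain linear map and a softmax over that linear map); this reading, the function names and the toy numbers are assumptions, not definitions taken from the application.

```python
import numpy as np

def linear_gate(first_feat, gate_w):
    """Assumed 'linear' gate: merging weights are a linear map of the input feature."""
    return first_feat @ gate_w

def normalized_gate(first_feat, gate_w):
    """Assumed 'normalized weighting' gate: softmax over the linear gate output."""
    logits = first_feat @ gate_w
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def merge(recommended, merge_weights):
    """Weighted sum over recommended features.

    recommended:   (batch, n_features, dim)
    merge_weights: (batch, n_features)
    """
    return (merge_weights[:, :, None] * recommended).sum(axis=1)

# Toy example: one sample, two recommended features of dimension 2.
first_feat = np.array([[1.0, 2.0]])
recommended = np.array([[[1.0, 0.0], [0.0, 1.0]]])

uniform = merge(recommended, normalized_gate(first_feat, np.zeros((2, 2))))
scaled = merge(recommended, linear_gate(first_feat, np.eye(2)))
```

With an all-zero gate matrix the normalized gate assigns equal weights, so `uniform` is the average of the two recommended features; with an identity gate matrix the linear gate reuses the input values as weights.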

In a possible implementation manner, after obtaining a plurality of splicing vectors obtained by splicing each of the weighted recommended features with the second multimedia data feature, the method further includes:

and acquiring cross features, and splicing the cross features with each splicing vector in the splicing vectors respectively to obtain a plurality of updated splicing vectors.
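Updating the splicing vectors with cross features amounts to one more concatenation per vector. A minimal sketch, assuming the cross features are already available as a dense array with one row per sample (the helper name `append_cross_features` is hypothetical):

```python
import numpy as np

def append_cross_features(splice_vectors, cross_features):
    """Concatenate the same cross features onto every splicing vector."""
    return [np.concatenate([v, cross_features], axis=1) for v in splice_vectors]

# Two splicing vectors for a batch of 2 samples, plus 3 cross features per sample.
splice_vectors = [np.ones((2, 5)), np.zeros((2, 5))]
cross_features = np.full((2, 3), 7.0)
updated = append_cross_features(splice_vectors, cross_features)
```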

In a possible implementation manner, the service target prediction model includes a plurality of second gating networks and prediction networks corresponding to the second gating networks;

after the splicing vectors and the splicing characteristics of the splicing vectors are input into the service target prediction model for obtaining recommended media, the method further comprises the following steps:

obtaining feature merging weights corresponding to all the splicing features adopted by all the second gating networks to obtain target recommended features based on the plurality of splicing vectors, performing weighted summation on the plurality of splicing features through any one of the second gating networks based on the feature merging weights corresponding to all the splicing features adopted by any one of the second gating networks to obtain one target recommended feature obtained by any one of the second gating networks, and obtaining all the target recommended features obtained by all the second gating networks to obtain the plurality of target recommended features;

based on the prediction network corresponding to each second gating network for obtaining each target recommendation characteristic, obtaining a service target prediction value corresponding to each target recommendation characteristic to obtain a plurality of service target prediction values, and determining a plurality of target recommendation media from the multimedia data based on the plurality of service target prediction values;

wherein the second gating network comprises at least one of a gating network based on linear variation or a gating network based on normalized weighting.
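The service target prediction model described above can be sketched in NumPy as a set of second gating networks, each paired with a prediction network. Several points are assumptions made only for illustration: each second gating network reads the concatenation of all splicing vectors, each prediction network is a single sigmoid unit, there is one gate per service target, and the target recommended media are chosen by ranking the mean predicted value.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_items, n_splice, d_vec, d_feat, n_targets = 10, 2, 20, 6, 3

# Stand-ins for the splicing vectors and splicing features of the earlier steps.
splice_vectors = [rng.normal(size=(n_items, d_vec)) for _ in range(n_splice)]
splice_features = rng.normal(size=(n_items, n_splice, d_feat))

# Assumption: every second gating network sees all splicing vectors, concatenated.
gate_input = np.concatenate(splice_vectors, axis=1)  # (n_items, n_splice * d_vec)
gate_params = [rng.normal(scale=0.1, size=(gate_input.shape[1], n_splice))
               for _ in range(n_targets)]
tower_params = [rng.normal(scale=0.1, size=d_feat) for _ in range(n_targets)]

predictions = []
for gate_w, tower_w in zip(gate_params, tower_params):
    merge_weights = softmax(gate_input @ gate_w)                        # (n_items, n_splice)
    target_feat = np.einsum("bs,bsd->bd", merge_weights, splice_features)
    predictions.append(sigmoid(target_feat @ tower_w))                  # (n_items,)
predictions = np.stack(predictions)                                     # (n_targets, n_items)

# Assumption: keep the 3 items with the highest mean predicted value.
top_n = 3
selected = np.argsort(-predictions.mean(axis=0))[:top_n]
```

Each row of `predictions` plays the role of one service target prediction value (for example click rate or conversion rate), and `selected` holds the indices of the items kept as target recommended media under the assumed ranking rule.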

In a possible implementation manner, the first multimedia data feature is a data feature of a recommendation object and the second multimedia data feature is a multimedia data feature to be recommended;

the performing feature extraction on the first multimedia data features through a plurality of first feature extraction networks to obtain a plurality of recommended features corresponding to the first multimedia data features, and obtaining a plurality of weighted recommended features corresponding to the plurality of recommended features through a plurality of first gating networks includes:

performing feature extraction on the recommended object data features through a plurality of first feature extraction networks to obtain a plurality of recommended object features corresponding to the recommended object data features, and obtaining a plurality of weighted recommended object features corresponding to the plurality of recommended object features through a plurality of first gating networks as a plurality of weighted recommended features;

the obtaining of the plurality of splicing vectors obtained by splicing each weighted recommendation feature in the plurality of weighted recommendation features with the second multimedia data feature includes:

and acquiring a plurality of splicing vectors obtained by splicing each weighted recommendation object characteristic in the plurality of weighted recommendation object characteristics with the to-be-recommended multimedia data characteristic.

In a possible implementation manner, the first multimedia data feature is a multimedia data feature to be recommended and the second multimedia data feature is a data feature of a recommendation object;

performing feature extraction on the multimedia data features to be recommended through a plurality of first feature extraction networks to obtain a plurality of recommended multimedia features corresponding to the multimedia data features to be recommended, and obtaining a plurality of weighted recommended multimedia features corresponding to the recommended multimedia features through a plurality of first gating networks to serve as a plurality of weighted recommended features;

the obtaining of the plurality of splicing vectors obtained by splicing each of the weighted recommended features with the second multimedia data feature includes:

and acquiring a plurality of splicing vectors obtained by splicing each weighted recommended multimedia feature in the plurality of weighted recommended multimedia features with the recommended object data feature.

In a possible implementation manner, the acquiring the multimedia data feature corresponding to the multimedia data includes:

acquiring multimedia data, wherein the multimedia data comprises recommendation object data and multimedia data to be recommended, and the multimedia data to be recommended comprises at least one of image-text media data, audio data and video data;

and acquiring recommendation object data characteristics corresponding to the recommendation object data in the multimedia data through vectorization processing and embedded compression processing, and acquiring to-be-recommended multimedia data characteristics corresponding to the to-be-recommended multimedia data.
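Vectorization followed by embedded compression can be pictured as mapping raw categorical values to integer ids and then looking up a dense, low-dimensional vector for each id. The vocabulary, the embedding size and the handling of unknown values below are hypothetical and serve only as a sketch of the idea.

```python
import numpy as np

rng = np.random.default_rng(2)

def vectorize(values, vocab):
    """Map raw categorical values to integer ids; unknown values map to id 0."""
    return np.array([vocab.get(v, 0) for v in values])

def embed(ids, table):
    """Compress ids into dense vectors by looking up rows of an embedding table."""
    return table[ids]

# Hypothetical vocabulary of media types and a 4-dimensional embedding table.
vocab = {"<unk>": 0, "video": 1, "audio": 2, "image_text": 3}
table = rng.normal(size=(len(vocab), 4))

ids = vectorize(["video", "podcast", "audio"], vocab)
features = embed(ids, table)
```

In a trained model the embedding table would be learned jointly with the rest of the network rather than sampled at random as it is here.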

In a second aspect, an embodiment of the present application provides an apparatus for recommending multimedia data, where the apparatus includes:

an acquisition module, configured to acquire multimedia data characteristics corresponding to multimedia data, wherein the multimedia data characteristics comprise a first multimedia data characteristic and a second multimedia data characteristic, the first multimedia data characteristic is a data characteristic of a recommendation object and the second multimedia data characteristic is a multimedia data characteristic to be recommended, or the first multimedia data characteristic is a multimedia data characteristic to be recommended and the second multimedia data characteristic is a data characteristic of the recommendation object;

a weighted recommendation feature generation module, configured to perform feature extraction on the first multimedia data features through a plurality of first feature extraction networks to obtain a plurality of recommendation features corresponding to the first multimedia data features, and obtain a plurality of weighted recommendation features corresponding to the plurality of recommendation features through a plurality of first gating networks, where one first feature extraction network is used to obtain one recommendation feature corresponding to the first multimedia data feature, and one first gating network is used to obtain one weighted recommendation feature corresponding to the plurality of recommendation features;

the splicing feature generation module is used for acquiring a plurality of splicing vectors obtained by splicing each weighted recommended feature in the plurality of weighted recommended features with the second multimedia data feature, and acquiring the splicing features of each splicing vector based on a second feature extraction network corresponding to each first gating network for acquiring each weighted recommended feature so as to acquire a plurality of splicing features;

and the target recommended media generation module is used for inputting the splicing vectors and the splicing characteristics of the splicing vectors into a service target prediction model for obtaining recommended media so as to obtain a plurality of target recommended media based on the service target prediction model.

In a possible implementation manner, the weighted recommended feature generation module is configured to:

In a possible implementation manner, after obtaining a plurality of splicing vectors obtained by splicing each of the weighted recommended features with the second multimedia data feature, the splicing feature generating module is configured to:

after the splicing vectors and the splicing characteristics of the splicing vectors are input into a service target prediction model for obtaining recommended media, the target recommended media generation module is configured to:

based on the prediction network corresponding to each second gating network for obtaining each target recommendation characteristic, obtaining a service target prediction value corresponding to each target recommendation characteristic to obtain a plurality of service target prediction values, and determining a plurality of target recommendation media from the multimedia data based on the plurality of service target prediction values;

In a possible implementation manner, the obtaining module is configured to:

In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes: a processor, a memory, and a network interface;

the processor is connected to a memory and a network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing program codes, and the processor is used for calling the program codes to execute the method according to the first aspect of the embodiment of the present application.

In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, where the computer program includes program instructions, and when the processor executes the program instructions, the method according to the first aspect of the present application is performed.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a multi-gated hybrid expert network architecture;

FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a scenario of a method for recommending multimedia data according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating a method for recommending multimedia data according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a personalized recommendation model provided in an embodiment of the present application;

FIG. 6 is a schematic diagram comparing the experimental effects of a method for recommending multimedia data according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a multimedia data recommendation apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning and the like.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.

Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to give computers intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The scheme provided by the embodiment of the application relates to natural language processing and machine learning technology in the field of artificial intelligence, and is specifically explained by the following embodiment:

The multimedia data recommendation method provided by the embodiments of the application (or, for short, the method provided by the embodiments of the application) is suitable for screening, recommending and outputting various multimedia data (such as video and image-text content) in applications that support viewing and/or uploading video and image-text content (such as video applications, news applications, short video applications, social applications and the like). Specifically, the method is suitable for the coarse ranking stage of personalized recommendation: it screens the multimedia data selected in the recall stage, obtains part of that data (for convenience of description, referred to below as target recommended media), and sends it to the fine ranking stage for further screening. The video and image-text content may be Professionally Generated Content (PGC) published by media outlets and organizations, or User Generated Content (UGC) created by users; this may be determined according to the actual application scenario and is not limited herein.

To achieve a better personalized recommendation effect, a network model based on multi-target learning is usually used to screen multimedia data at each stage of personalized recommendation. For example, a personalized recommendation model based on a Multi-gate Mixture-of-Experts (MMoE) network, referred to herein as a multi-gated hybrid expert network, can be used to screen multimedia data in the coarse ranking stage. The multi-gated hybrid expert network is a common network structure for multi-target learning. Referring to FIG. 1, FIG. 1 is a schematic diagram of a multi-gated hybrid expert network architecture. As shown in FIG. 1, the multi-gated hybrid expert network includes a plurality of expert networks (expert network 1, expert network 2, ..., expert network N) for extracting different features and a plurality of gating networks (gating network 1, ..., gating network K) for assigning a weight to each expert network. Based on the input multimedia data, the multi-gated hybrid expert network can perform multi-target estimation to obtain a plurality of service target prediction values for each piece of multimedia data (service targets may be click rate, conversion rate, browsing duration and the like), so that the multimedia data can be screened based on these prediction values.

In addition, to make the screening more personalized, the multimedia data to be recommended can be screened in combination with the recommendation object data (which may be various attribute information of the recommendation object). The coarse ranking stage processes a certain amount of multimedia data obtained from the recall stage (usually thousands or tens of thousands of items; for convenience of description, multimedia data to be recommended is taken as an example). For the sake of screening efficiency, the input recommendation object data and the input multimedia data to be recommended may first undergo dimension reduction and then be screened through the multi-gated hybrid expert network. For example, in FIG. 1 the recommendation object data and the multimedia data to be recommended are reduced in dimension by tower structure 1 and tower structure 2; the dimension-reduced data are then spliced with cross data (which introduces internal association information between the recommendation object and the multimedia data) and input to each expert network for feature extraction.

However, compared with directly inputting the multimedia data to be recommended and the recommendation object data, the dimension-reduced recommendation object data and multimedia data to be recommended lose original feature signals. This limits the ability of the different expert networks in the multi-gated hybrid expert network to learn differentiated representations end to end, so the multi-target learning effect is poor, the screening of the multimedia data to be recommended is less effective, and the applicability is low. Therefore, before the multimedia data is screened through a network model based on multi-target learning (such as a multi-gated hybrid expert network), differentiated representations can be extracted from the original data (such as the multimedia data to be recommended and the recommendation object data). This strengthens the ability of the expert networks in the multi-gated hybrid expert network to learn differentiated representations, improves the effect of multi-target estimation through the multi-gated hybrid expert network, improves the screening effectiveness for the multimedia data to be recommended in the coarse ranking stage, and has high applicability.

In the method provided by the embodiments of the application, multimedia data features corresponding to multimedia data can be obtained during the recommendation of multimedia data. The recommendation object data features or the to-be-recommended multimedia data features among them (either of which may serve as the first multimedia data feature) are input into the first feature extraction networks of a personalized recommendation model, and each first feature extraction network performs feature extraction on the first multimedia data feature to obtain a plurality of differentiated recommended features, which provide more diverse recommended features for subsequent multimedia data recommendation by the personalized recommendation model. The plurality of recommended features output by the first feature extraction networks are then weighted and summed through a plurality of first gating networks to obtain a plurality of weighted recommended features.

Because different first gating networks assign different feature merging weights to the recommended features, the weighted recommended features that the first gating networks generate from the recommended features output by the first feature extraction networks differ markedly from one another. The first gating networks can therefore obtain a plurality of differentiated weighted recommended features, which provide more diverse weighted recommended features for subsequent multimedia data recommendation by the personalized recommendation model. Extracting differentiated representations (from the first multimedia data feature) of the original data (such as the multimedia data to be recommended and the recommendation object data) based on the first gating networks strengthens the ability of the personalized recommendation model to learn differentiated representations, and thus improves the effect of multi-target estimation through the personalized recommendation model. A plurality of splicing vectors, obtained by splicing the weighted recommended features with the second multimedia data feature (whichever of the recommendation object data features and the to-be-recommended multimedia data features is not the first multimedia data feature), are input together with the splicing features of the splicing vectors into the service target prediction model (such as a multi-gated hybrid expert network) that the personalized recommendation model uses to obtain recommended media, so as to obtain a plurality of target recommended media based on the service target prediction model, and the applicability is high.

Referring to FIG. 2, FIG. 2 is a schematic diagram of a system architecture provided in an embodiment of the present application. As shown in FIG. 2, the system architecture may include a service server 100 and a terminal cluster, where the terminal cluster may include terminal devices such as terminal device 200a, terminal device 200b, terminal device 200c, ..., and terminal device 200n. The service server 100 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud database, a cloud service, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. Each terminal device (including terminal device 200a, terminal device 200b, terminal device 200c, ..., and terminal device 200n) may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a palm computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart bracelet, etc.), a smart computer, a smart in-vehicle terminal, or another smart terminal. The service server 100 may establish a communication connection with each terminal device in the terminal cluster, and communication connections may also be established between the terminal devices in the terminal cluster. In other words, the service server 100 may establish a communication connection with each of terminal device 200a, terminal device 200b, terminal device 200c, ..., and terminal device 200n; for example, a communication connection may be established between terminal device 200a and the service server 100. A communication connection may be established between terminal device 200a and terminal device 200b, and a communication connection may also be established between terminal device 200a and terminal device 200c. The manner of the communication connection is not limited: it may be a direct or indirect connection through wired communication, or a direct or indirect connection through wireless communication, and may be determined according to the actual application scenario, which is not limited herein.

It should be understood that each terminal device in the terminal cluster shown in fig. 2 may be installed with an application client, and when the application client runs in a terminal device, it may perform data interaction with the service server 100 shown in fig. 2, so that the service server 100 may receive service data (such as multimedia data to be recommended uploaded by users or official accounts through the terminal devices) from each terminal device. The application client may be an application client having a function of displaying data information such as text, images and videos, such as a social application, an instant messaging application, a live streaming application, a news application, a short video application, a music application, a shopping application, a novel application, or a payment application, which may be determined according to requirements of an actual application scenario and is not limited herein. The application client may be an independent client, or may be an embedded sub-client integrated in a certain client (e.g., an instant messaging client, a social client, etc.), which may be determined according to an actual application scenario and is not limited herein. Taking a news application as an example, when a user uses the news application through a terminal device, the user may upload user original content (for example, a news text written by the user, a news video made by the user, etc.) through the terminal device. The service server 100, as a server for the news application, may be a set of servers including a background server and a data processing server corresponding to the application client. The service server 100 may receive multimedia data to be recommended from the terminal device (for example, user original content uploaded by the user through the terminal device), and may additionally receive recommendation object data in order to perform personalized recommendation for a specific object. The recommendation object data may include attribute information of the recommendation object; for example, the attribute information may include portrait information of the recommendation object (such as an object level) and behavior information of the recommendation object (such as the click rate and download rate of the recommendation object for media data). The service server 100 may select part of the multimedia data to be recommended as target recommended media based on the recommendation object data. Alternatively, the multimedia data to be recommended may be multimedia data screened out in a recall stage, that is, a part of the multimedia data to be recommended obtained by quickly screening a plurality of multimedia data to be recommended that are input or produced by users, official accounts and organizations through an application client (such as a news application) loaded in a terminal device, which may be determined according to an actual application scenario and is not limited herein. After the service server 100 screens the multimedia data to be recommended to obtain the target recommended media, the service server 100 may return the target recommended media to the terminal device, so that they are recommended and displayed to the user through the installed news application; or the service server 100 may screen the target recommended media again and return the screened target recommended media to the terminal device, so that they are recommended and displayed to the user through the installed news application.

Referring to fig. 3, fig. 3 is a scene schematic diagram of a multimedia data recommendation method according to an embodiment of the present application. As shown in fig. 3, the interface presented to a user of the news application described above may be as shown in interface 1 in fig. 3. Specifically, a plurality of target recommended media recommended and presented to the user may be displayed in the interface 1, and may include the target recommended media 10, the target recommended media 20, and the target recommended media 30. For example, the target recommended media 10 may include a cover image 10a, a title text 10b (which may include a multimedia data title and account information, not shown in the figure) and a summary text 10c, the target recommended media 20 may include a cover image 20a, a title text 20b and a summary text 20c, and the target recommended media 30 may include a cover image 30a, a title text 30b and a summary text 30c. The user may browse the visual elements of each presented target recommended media and select a favorite target recommended media (for example, by clicking the corresponding visual element) to further view the corresponding text, video, etc.

The method provided in the embodiment of the present application may be executed by the service server 100 shown in fig. 2, or may be executed by a terminal device (any one of the terminal device 200a, the terminal devices 200b, … …, and the terminal device 200n shown in fig. 2), or may be executed by both the terminal device and the service server, which may be determined according to an actual application scenario, and is not limited herein.

In some possible embodiments, the terminal device 200a may be used as a provider of multimedia data to be recommended, and the service server 100 may screen a part of the multimedia data to be recommended (which may be multimedia data to be recommended generated during a recall phase) acquired by the terminal device 200a as a target recommended medium. The service server 100 may be deployed with a personalized recommendation model, and the service server 100 may obtain multimedia data features (including recommendation object data features and multimedia data features to be recommended) corresponding to multimedia data, and input recommendation object data features or multimedia data features to be recommended (which may be first multimedia data features) in the multimedia data features into first feature extraction networks in the personalized recommendation model, so as to perform feature extraction on the first multimedia data features through each first feature extraction network, respectively, to obtain a plurality of recommendation features. The network configurations of the plurality of first feature extraction networks may be the same (the network parameters of the first feature extraction networks are different) or different. And performing feature extraction on the basis of the first multimedia data features through each first feature extraction network to obtain a plurality of differentiated recommendation features, and providing recommendation features with better diversity for multimedia data recommendation of a subsequent personalized recommendation model. 
The service server 100 may input the first multimedia data feature into each first gating network, and each first gating network may determine, according to the currently input first multimedia data feature, a feature combining weight corresponding to each recommended feature input to the first gating network, so that each first gating network may respectively perform weighted summation on the plurality of recommended features based on the feature combining weights corresponding to each recommended feature corresponding to the first gating network, to obtain a plurality of weighted recommended features. Because the feature merging weights distributed to the recommended features by different first gating networks are different, the weighted recommended features generated by the first gating networks based on the recommended features are obviously different from the weighted recommended features corresponding to the first feature extraction networks, so that a plurality of differentiated weighted recommended features can be obtained through the first gating networks, and the weighted recommended features with better diversity are provided for multimedia data recommendation of a subsequent personalized recommendation model. 
The service server 100 may further splice each of the weighted recommendation features with a second multimedia data feature to obtain a plurality of spliced vectors (here, cross features may also be obtained, and the cross features and each of the spliced vectors in the spliced vectors are spliced to obtain updated spliced vectors), and input each spliced vector obtained by splicing each of the weighted recommendation features and the second multimedia data feature to a second feature extraction network corresponding to a first gating network that outputs the weighted recommendation feature to perform feature extraction, so as to obtain a plurality of differentiated spliced features, thereby providing a better diversity of spliced features for a subsequent personalized recommendation model to perform multimedia data recommendation. The service server 100 may input the splicing vectors into each second gated network, and each second gated network may determine, according to the currently input splicing vector, a feature merging weight of each splicing feature input by the second gated network to the second gated network, so that each second gated network may respectively perform weighted summation on the splicing features based on the feature merging weights corresponding to the splicing features corresponding to the second gated network, so as to obtain a plurality of target recommended features. The service server 100 may input the target recommendation features output by each second gated network into the prediction network corresponding to each second gated network, thereby obtaining the service target prediction values corresponding to each target recommendation feature to obtain a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. 
The target recommended media are obtained by screening a plurality of multimedia data to be recommended by integrating a plurality of service targets (such as duration, click rate and the like), so that the problem that only a single service target is optimized and screened to obtain the target recommended media excessively biased to part of the service targets is solved, and the personalized recommended experience of the multimedia data is enhanced.

In some possible embodiments, the terminal device 200a may obtain a plurality of multimedia data to be recommended through an application client (e.g., a news application) loaded in the terminal device 200a. The terminal device 200a may be deployed with a personalized recommendation model, and the terminal device 200a may obtain multimedia data features (including recommendation object data features and multimedia data features to be recommended) corresponding to the multimedia data, and input the recommendation object data features or the multimedia data features to be recommended (which may be first multimedia data features) in the multimedia data features into the first feature extraction networks in the personalized recommendation model, so as to perform feature extraction on the first multimedia data features through each first feature extraction network to obtain a plurality of recommendation features. The network structures of the plurality of first feature extraction networks may be the same (with different network parameters) or different. Performing feature extraction on the first multimedia data features through each first feature extraction network yields a plurality of differentiated recommendation features, which provides recommendation features with better diversity for multimedia data recommendation by the subsequent personalized recommendation model. 
The terminal device 200a may input the first multimedia data characteristics into each first gating network, and each first gating network may determine, according to the currently input first multimedia data characteristics, a characteristic combining weight corresponding to each recommended characteristic input by the first gating network, so that each first gating network may perform weighted summation on the plurality of recommended characteristics based on the characteristic combining weights corresponding to each recommended characteristic corresponding to the first gating network, to obtain a plurality of weighted recommended characteristics. Because the feature merging weights distributed to the recommended features by different first gating networks are different, the weighted recommended features generated by the first gating networks based on the recommended features are obviously different from the weighted recommended features corresponding to the first feature extraction networks, so that a plurality of differentiated weighted recommended features can be obtained through the first gating networks, and the weighted recommended features with better diversity are provided for multimedia data recommendation of a subsequent personalized recommendation model. 
The terminal device 200a may further splice each weighted recommendation feature of the weighted recommendation features with a second multimedia data feature to obtain a plurality of spliced vectors (here, cross features may also be obtained, and the cross features and each spliced vector of the spliced vectors are spliced to obtain a plurality of updated spliced vectors), and each spliced vector obtained by splicing each weighted recommendation feature and the second multimedia data feature is input to a second feature extraction network corresponding to a first gating network that outputs the weighted recommendation feature to perform feature extraction, so as to obtain a plurality of differentiated spliced features, thereby providing a better diversity of spliced features for a subsequent personalized recommendation model to perform multimedia data recommendation. The terminal device 200a may input the splicing vectors into each second gated network, and each second gated network may determine, according to the currently input splicing vector, a feature merging weight of each splicing feature input to the second gated network, so that each second gated network may respectively perform weighted summation on the splicing features based on the feature merging weights corresponding to the splicing features, so as to obtain a plurality of target recommended features. The terminal device 200a may input the target recommendation feature output by each second gating network into the prediction network corresponding to each second gating network, thereby obtaining the service target prediction value corresponding to each target recommendation feature to obtain a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. 
The target recommended media are obtained by screening a plurality of multimedia data to be recommended by integrating a plurality of service targets (such as duration, click rate and the like), so that the problem that only single service target is optimized and screened to obtain the target recommended media excessively biased to part of the service targets is solved, and the personalized recommendation experience of the multimedia data is enhanced.

For convenience of description, a terminal device is taken as an execution subject of the method provided by the embodiment of the present application, and a manner of recommending multimedia data by the terminal device is specifically described by an embodiment.

Referring to fig. 4, fig. 4 is a flowchart illustrating a method for recommending multimedia data according to an embodiment of the present application. As shown in fig. 4, the method includes the steps of:

S101, acquiring multimedia data characteristics corresponding to the multimedia data.

In some possible embodiments, a terminal device (such as the terminal device 200a) may obtain multimedia data, where the multimedia data may include recommendation object data and multimedia data to be recommended. Specifically, the recommendation object data may include attribute information of the recommendation object; for example, the attribute information may include portrait information of the recommendation object (for example, an object level) and behavior information of the recommendation object (for example, the click rate and download rate of the recommendation object for media data). The multimedia data to be recommended may include a plurality of multimedia data to be recommended (which may include image-text media data, audio data, video data, and the like) input or produced by users, official accounts and organizations through an application client (such as a news application) loaded in the terminal device, together with attribute information of the multimedia data (such as media type, subject, duration, and the like). The terminal device may obtain the multimedia data to be recommended through the news application. The news application may be an independent client, an embedded sub-client integrated in a certain client (e.g., an instant messaging client, a social client, etc.), or a web application accessed through a browser, which may be determined according to an actual application scenario and is not limited herein. Alternatively, the multimedia data to be recommended may be multimedia data screened out in a recall stage, that is, a part of the multimedia data to be recommended obtained by quickly screening a plurality of multimedia data to be recommended that are input or produced by users, official accounts and organizations through an application client (for example, a news application) loaded in a terminal device. Taking the case where the multimedia data to be recommended is the multimedia data screened out in the recall stage as an example, the terminal device may receive the multimedia data to be recommended screened out in the recall stage.

Optionally, the recommendation object data and the multimedia data to be recommended may be vectorized to obtain a plurality of feature vectors, where the plurality of feature vectors may include discrete feature vectors (for example, feature vectors obtained by vectorizing the media types in the multimedia data to be recommended) and continuous feature vectors (for example, feature vectors obtained by vectorizing the durations in the multimedia data to be recommended). The dimensions of discrete feature vectors are often very large. If the discrete feature vectors were input directly into the personalized recommendation model, the number of network parameters of the entire model would be very large and the model would converge very slowly. It is therefore usually necessary to perform Embedding compression on the discrete feature vectors to densify them (eliminating useless features), so as to generate low-dimensional dense feature vectors, and to input the low-dimensional dense feature vectors into the personalized recommendation model for processing. Specifically, a continuous feature vector (which may be obtained by vectorizing the recommendation object data or the multimedia data to be recommended) may first be discretized to obtain the corresponding discrete feature vector, and Embedding processing may then be performed on that discrete feature vector to obtain the coding feature of the continuous feature vector (which may be a recommendation object data feature or a to-be-recommended multimedia data feature). A discrete feature vector (which may be obtained by vectorizing the recommendation object data or the multimedia data to be recommended) is directly subjected to Embedding processing to obtain the coding feature of the discrete feature vector (which may be a recommendation object data feature or a to-be-recommended multimedia data feature). The recommendation object data features and the to-be-recommended multimedia data features constitute the multimedia data features (which may include a first multimedia data feature and a second multimedia data feature), where the first multimedia data feature may be the recommendation object data feature and the second multimedia data feature may be the to-be-recommended multimedia data feature, or the first multimedia data feature may be the to-be-recommended multimedia data feature and the second multimedia data feature may be the recommendation object data feature. The following description takes the case where the first multimedia data feature is the recommendation object data feature and the second multimedia data feature is the to-be-recommended multimedia data feature as an example, and this will not be repeated below.
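
The two encoding paths described above (discretize and then embed a continuous value, and directly embed a discrete value) can be illustrated with a minimal Python sketch. The media type vocabulary, the duration bucket boundaries, the embedding dimension, and all function names here are assumptions chosen for illustration and are not specified by the embodiment; the embedding tables are randomly initialized, whereas in a trained model their rows would be learned.

```python
import bisect
import random

def bucketize(value, boundaries):
    """Discretize a continuous value into a bucket index."""
    return bisect.bisect_right(boundaries, value)

def make_embedding_table(num_ids, dim, seed=0):
    """Randomly initialized table; in a real model these rows are learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]

def embed(index, table):
    """Embedding lookup: selects one dense row instead of using a wide one-hot vector."""
    return table[index]

# Hypothetical vocabulary and bucket boundaries (assumptions for illustration).
MEDIA_TYPES = {"article": 0, "video": 1, "audio": 2}
DURATION_BOUNDARIES = [30.0, 120.0, 600.0]  # seconds, giving 4 buckets

media_type_table = make_embedding_table(len(MEDIA_TYPES), dim=4, seed=1)
duration_table = make_embedding_table(len(DURATION_BOUNDARIES) + 1, dim=4, seed=2)

def encode_item(media_type, duration_seconds):
    """Build a dense to-be-recommended multimedia data feature."""
    # Discrete feature: embed directly.
    type_vec = embed(MEDIA_TYPES[media_type], media_type_table)
    # Continuous feature: discretize first, then embed the bucket index.
    bucket = bucketize(duration_seconds, DURATION_BOUNDARIES)
    duration_vec = embed(bucket, duration_table)
    return type_vec + duration_vec  # list concatenation, length 8

item_feature = encode_item("video", 95.0)
```

In this sketch a 95-second video falls into duration bucket 1, and the resulting dense feature has 8 dimensions regardless of how many media types or duration buckets exist.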

In some possible embodiments, the recommendation object data may include recommendation object data corresponding to a plurality of objects (for example, may include portrait information and behavior information corresponding to each of the plurality of objects), and the multimedia data to be recommended may include multimedia data to be recommended corresponding to a plurality of multimedia data (for example, may include a plurality of multimedia data and attribute information corresponding to each of the plurality of multimedia data). The recommended object data feature may include recommended object data features corresponding to a plurality of objects (e.g., recommended object data feature of object 1, recommended object data feature of object 2, recommended object data feature of object 3, etc.), the multimedia data to be recommended features may include multimedia data to be recommended features corresponding to a plurality of multimedia data (e.g., multimedia data to be recommended features of multimedia data 1, multimedia data to be recommended features of multimedia data 2, multimedia data to be recommended features of multimedia data 3, etc.), that is, the terminal device may determine, for one object, a plurality of target recommended media corresponding to the object, and the terminal device may also determine, for a plurality of objects, a plurality of target recommended media corresponding to each object in the plurality of objects at the same time, which may be specifically determined according to an actual application scenario, and the embodiments of the present application are not limited herein.

S102, performing feature extraction on the first multimedia data features through a plurality of first feature extraction networks to obtain a plurality of recommendation features corresponding to the first multimedia data features, and obtaining a plurality of weighted recommendation features corresponding to the recommendation features through a plurality of first gating networks.

In some possible embodiments, the terminal device may input a first multimedia data feature of the multimedia data features into a plurality of expert networks (for convenience of description, a first feature extraction network may be taken as an example for illustration), so as to perform feature extraction on the first multimedia data feature through each first feature extraction network respectively to obtain a plurality of recommended features. Specifically, each of the first feature extraction networks is configured to obtain a recommended feature corresponding to a first multimedia data feature, where the first multimedia data feature may be a recommended object data feature, and the recommended object data feature may include one or more recommended object data features corresponding to objects. It can be understood that, when the recommendation object data feature includes recommendation object data features corresponding to a plurality of objects (that is, when the terminal device determines a plurality of target recommended media corresponding to each of the plurality of objects for the plurality of objects at the same time), the recommendation feature corresponding to the first multimedia data feature may include a recommendation feature corresponding to each recommendation object data feature obtained through the first feature extraction network based on each recommendation object data feature in the plurality of recommendation object data features. The plurality of first feature extraction networks may have the same network configuration (the network parameters of the first feature extraction networks are different). For example, the Network structures of the plurality of first feature extraction Networks may all be Deep Neural Networks (DNN) structures, may all be Factorization Machines (FM) structures, and may all be Deep Cross Networks (DCN) structures. 
Or, the network structure of the plurality of first feature extraction networks may be a combination of at least two network structures of a deep neural network structure, a factorization machine structure, and a deep cross network structure, and may be determined according to an actual application scenario, and the embodiment of the present application is not limited herein. And performing feature extraction on the basis of the first multimedia data features through each first feature extraction network to obtain a plurality of differentiated recommendation features, and providing recommendation features with better diversity for multimedia data recommendation of a subsequent personalized recommendation model.

Further, the terminal device may obtain, through a plurality of first gating networks, a plurality of weighted recommendation features corresponding to the plurality of recommendation features. Specifically, the terminal device may input the first multimedia data feature into each first gating network, and each first gating network may determine, according to the currently input first multimedia data feature, a feature merging weight corresponding to each recommendation feature input to the first gating network, so that each first gating network may perform a weighted summation on the plurality of recommendation features based on the feature merging weights corresponding to the respective recommendation features, to obtain a plurality of weighted recommendation features. Because different first gating networks assign different feature merging weights to the recommendation features, the weighted recommendation features generated by different first gating networks differ markedly in how heavily they weight the output of each first feature extraction network (for example, the first gating network 1 assigns more weight to the recommendation features output by the first feature extraction network 1, and the first gating network 2 assigns more weight to the recommendation features output by the first feature extraction network 2). A plurality of differentiated weighted recommendation features can therefore be obtained through the first gating networks, which provides weighted recommendation features with better diversity for multimedia data recommendation by the subsequent personalized recommendation model. Optionally, the first gating network may include at least one of a linear-transformation-based gating network or a normalized-weighting-based gating network, which may be determined according to an actual application scenario, and the embodiment of the present application is not limited herein. 
For example, if the first multimedia data feature is U (whose dimension may be L), the number of first gating networks is N, and the number of first feature extraction networks is M, the feature merging weights corresponding to the recommended features determined by the n-th first gating network may be represented as:

$$G^{n}(U) = \mathrm{softmax}\left(W_{g}^{n} U\right)$$

wherein $W_{g}^{n}$ is a parameter matrix of dimension M × L, i.e., the dimension of $G^{n}(U)$ may be M, and the value of each dimension represents the feature merging weight corresponding to the recommended feature generated by the corresponding first feature extraction network.
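
As a rough illustration of step S102, the following Python sketch builds M toy first feature extraction networks and N first gating networks and computes the N weighted recommendation features. It assumes that each gate is a linear transformation followed by softmax normalization and that each expert is a single ReLU layer; the sizes L, M, N, D, the random parameters, and all function names are illustrative assumptions rather than the configuration of the embodiment.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def expert(weights, u):
    """Toy first feature extraction network: one linear layer plus ReLU."""
    return [max(0.0, h) for h in matvec(weights, u)]

def gate_weights(gate_matrix, u):
    """Assumed gate form: softmax of an (M x L) matrix times U, giving M weights."""
    return softmax(matvec(gate_matrix, u))

def weighted_sum(weights, features):
    """Weighted summation of M feature vectors of equal length."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

L, M, N, D = 6, 3, 2, 4  # input dim, experts, gates, expert output dim
rng = random.Random(0)
experts = [random_matrix(D, L, rng) for _ in range(M)]
gates = [random_matrix(M, L, rng) for _ in range(N)]

U = [rng.uniform(-1.0, 1.0) for _ in range(L)]  # first multimedia data feature
recommended = [expert(w, U) for w in experts]    # M recommended features
weighted_recommended = [
    weighted_sum(gate_weights(g, U), recommended) for g in gates
]  # N weighted recommendation features, one per first gating network
```

Each gate sees the same M recommended features but mixes them with its own weights, which is what makes the N outputs differ from one another.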

S103, obtaining a plurality of splicing vectors obtained by splicing each weighted recommended feature in the plurality of weighted recommended features with the second multimedia data feature, and obtaining the splicing features of each splicing vector obtained based on each weighted recommended feature based on the second feature extraction network corresponding to each first gating network for obtaining each weighted recommended feature so as to obtain a plurality of splicing features.

In some possible embodiments, the terminal device may splice each of the weighted recommendation features with the second multimedia data feature to obtain a plurality of splicing vectors, and input each splicing vector obtained by splicing each of the weighted recommendation features and the second multimedia data feature to the second feature extraction network corresponding to the first gating network that outputs each of the weighted recommendation features. Specifically, one first gating network may correspond to one second feature extraction network one to one, that is, the number of the first gating networks may be the same as that of the second feature extraction networks, and the weighted recommended features output by any one first gating network and the features of the second multimedia data are spliced to obtain a splicing vector, which may be input to the second feature extraction network corresponding to the first gating network to perform feature extraction to obtain a splicing feature corresponding to the splicing vector. Here, the network structures of the plurality of second feature extraction networks may be all deep neural network structures, may be all factorization machine structures, and may be all deep cross network structures. Or, the network structure of the plurality of second feature extraction networks may be a combination of at least two network structures of a deep neural network structure, a factorization machine structure, and a deep cross network structure, and may be determined according to an actual application scenario, and the embodiment of the present application is not limited herein. And performing feature extraction on the basis of the splicing vectors through each second feature extraction network to obtain a plurality of differentiated splicing features, and providing better-diversity splicing features for multimedia data recommendation of a subsequent personalized recommendation model.
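
The one-to-one routing described above can be sketched as follows: the n-th weighted recommendation feature is spliced (concatenated) with the second multimedia data feature and passed only to the n-th second feature extraction network. The input values, the single-layer ReLU network, and the output dimension are assumptions made for illustration.

```python
import random

def concat(a, b):
    """Splice two feature vectors end to end."""
    return list(a) + list(b)

def dense_relu(weights, x):
    """Toy second feature extraction network: one linear layer plus ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

# Assumed inputs: N = 2 weighted recommendation features and one
# second multimedia data feature (values are arbitrary).
weighted_recommended = [[0.2, 0.1, 0.4], [0.5, 0.0, 0.3]]
second_feature = [1.0, 0.0, 0.5, 0.25]

# One splicing vector per first gating network.
splice_vectors = [concat(w, second_feature) for w in weighted_recommended]

# One second feature extraction network per first gating network.
rng = random.Random(0)
in_dim, out_dim = len(splice_vectors[0]), 5
second_experts = [
    [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    for _ in splice_vectors
]

# The n-th splicing vector goes only to the n-th second network.
splice_features = [
    dense_relu(weights, vector)
    for weights, vector in zip(second_experts, splice_vectors)
]
```

Because `zip` pairs networks and splicing vectors by position, the number of second feature extraction networks necessarily equals the number of first gating networks, matching the correspondence stated in the text.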

In some possible embodiments, after the terminal device obtains the plurality of splicing vectors by splicing each weighted recommended feature in the plurality of weighted recommended features with the second multimedia data feature, the terminal device may further obtain a cross feature, and splice the cross feature with each splicing vector in the plurality of splicing vectors to obtain a plurality of updated splicing vectors. Specifically, the cross feature may include a cross feature obtained from cross data generated by crossing the portrait information of sample recommended objects with sample recommended multimedia data, and may further include a cross feature obtained from cross data generated by crossing the behavior information of sample recommended objects with sample recommended multimedia data. Taking the cross data generated by crossing the portrait information of the sample recommended objects with the sample recommended multimedia data as an example, the portrait information of a sample recommended object may be an object level, and the sample recommended multimedia data (for example, attribute information of the multimedia data to be recommended) may be a subject. Distribution data over different object levels and different subjects may then be generated as the cross data based on the object level and the subject (for example, the statistical distribution of sample recommended objects of different levels under each of subject A and subject B). Adding the cross features to obtain the plurality of updated splicing vectors introduces the internal association information between the recommended object and the multimedia data into the splicing vectors, so as to further enhance the personalized recommendation effect of the personalized recommendation model.
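
The object-level-by-subject example can be made concrete with a small Python sketch. It assumes the cross data is built from logged (object level, subject) pairs and that the cross feature for a subject is the normalized distribution of object levels under that subject; the sample log, the level set, and the function name are hypothetical and serve only to illustrate one way such a statistic could be computed.

```python
from collections import Counter, defaultdict

def level_distribution_by_subject(samples, levels):
    """For each subject, return the share of sample objects at each level."""
    counts = defaultdict(Counter)
    for level, subject in samples:
        counts[subject][level] += 1
    distribution = {}
    for subject, counter in counts.items():
        total = sum(counter.values())
        distribution[subject] = [counter[level] / total for level in levels]
    return distribution

# Hypothetical sample log: (object level, subject of the recommended media).
LEVELS = [1, 2, 3]
samples = [(1, "A"), (1, "A"), (2, "A"), (3, "A"), (2, "B"), (3, "B")]

cross = level_distribution_by_subject(samples, LEVELS)
# cross["A"] is [0.5, 0.25, 0.25] and cross["B"] is [0.0, 0.5, 0.5]

# Splice the cross feature of the item's subject onto an existing splicing
# vector (arbitrary values) to obtain the updated splicing vector.
splice_vector = [0.2, 0.1, 0.4, 1.0]
updated_splice_vector = splice_vector + cross["A"]
```

The updated splicing vector carries, alongside the object and item features, a summary of how objects of each level have historically been associated with the item's subject.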

It can be understood that the plurality of first feature extraction networks and the plurality of first gating networks may form a multi-gated hybrid expert network (which may be referred to as a lower-layer multi-gated hybrid expert network). When the first multimedia data feature is the recommendation object data feature, the lower-layer multi-gated hybrid expert network only processes (including feature extraction, weighted summation, and the like) the recommendation object data feature of the recommendation object (such as a user). Therefore, when the personalized recommendation model performs personalized recommendation for a recommendation object over a large amount of multimedia data (such as thousands or tens of thousands of items), the small amount of recommendation object data features may be processed by the lower-layer multi-gated hybrid expert network to obtain the corresponding weighted recommendation features, and the weighted recommendation features may then be copied so that their number equals the number of to-be-recommended multimedia data features (that is, the second multimedia data features). The copied weighted recommendation features are spliced with the second multimedia data features to obtain a plurality of splicing vectors, and the personalized recommendation model performs multi-target estimation based on the splicing vectors to generate the target recommended media (the multi-target estimation may be performed by an upper-layer multi-gated hybrid expert network). This improves the efficiency of multi-target estimation and enhances multi-target estimation performance.
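The copy-then-splice step can be sketched as follows. This is a minimal illustration under assumed sizes (1000 candidate items, hypothetical feature widths, random values); it only shows that the recommendation-object side is computed for one object and then replicated across all candidates.

```python
import numpy as np

rng = np.random.default_rng(1)

num_items, d_u, d_i = 1000, 4, 5            # hypothetical candidate count and feature widths

user_weighted = rng.normal(size=d_u)        # weighted recommendation feature from the lower layer
item_feats = rng.normal(size=(num_items, d_i))  # one second multimedia data feature per candidate

# Copy the single weighted recommendation feature so there is one copy per candidate item.
user_copies = np.broadcast_to(user_weighted, (num_items, d_u))

# Splice each copy with the corresponding item feature: one splicing vector per candidate.
splice_vectors = np.concatenate([user_copies, item_feats], axis=1)
```

`np.broadcast_to` returns a read-only view, so the copies cost no extra memory until they are concatenated with the item features.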

S104, inputting the splicing vectors and the splicing features of the splicing vectors into a service target prediction model for obtaining recommended media, so as to obtain a plurality of target recommended media based on the service target prediction model.

In some possible embodiments, the service target prediction model may include a plurality of second gating networks and a plurality of prediction networks, where each prediction network corresponds to one second gating network. The terminal device may input the plurality of splicing vectors into each second gating network, and each second gating network may determine, according to the currently input splicing vectors, the feature merging weight of each splicing feature input to that second gating network, so that each second gating network may perform weighted summation on the plurality of splicing features based on the corresponding feature merging weights to obtain a plurality of target recommendation features. Because different second gating networks assign different feature merging weights to the splicing features, the target recommendation features generated by the second gating networks differ markedly in the weight they place on each second feature extraction network (for example, second gating network 1 assigns more weight to the splicing feature output by second feature extraction network 1, and second gating network 2 assigns more weight to the splicing feature output by second feature extraction network 2). A plurality of differentiated weighted splicing features can therefore be obtained through the second gating networks, which provides more diverse weighted splicing features for the subsequent multimedia data recommendation by the personalized recommendation model. Optionally, the second gating network may include at least one of a gating network based on linear transformation or a gating network based on normalized weighting, which may be determined according to the actual application scenario and is not limited herein.
For example, if the number of first gating networks is N, the number of second feature extraction networks is also N, the number of first feature extraction networks is M, and the output of the m-th first feature extraction network is represented as $u_m(U)$, the weighted recommendation feature output by the n-th first gating network can be expressed as:

$$u^{n} = \sum_{m=1}^{M} g^{n}_{m}\, u_{m}(U)$$

where $g^{n}_{m}$ represents the feature merging weight determined by the n-th first gating network for the recommendation feature output by the m-th first feature extraction network. Each of the N weighted recommendation features is spliced with the second multimedia data feature (which may be represented as $i$) and the cross feature (which may be represented as $c$) to obtain N splicing vectors, where the n-th splicing vector may be represented as:

$$v_{n} = \mathrm{concat}(u^{n}, i, c)$$

If the number of second gating networks is K, the feature merging weights determined by the k-th second gating network for the splicing features may be represented as:

$$g^{k} = \mathrm{softmax}(W^{k} V)$$

where $W^{k}$ is a parameter matrix of dimension $N \times D$, and $V$ represents a D-dimensional vector formed by jointly splicing the outputs of all the first gating networks, the second multimedia data feature, and the cross feature, namely:

$$V = \mathrm{concat}(u^{1}, \ldots, u^{N}, i, c)$$

The target recommendation feature output by the k-th second gating network may then be expressed as:

$$f^{k} = \sum_{n=1}^{N} g^{k}_{n}\, f_{n}(v_{n})$$

where $g^{k}_{n}$ represents the feature merging weight determined by the k-th second gating network for the splicing feature output by the n-th second feature extraction network, and $f_{n}(v_{n})$ represents the splicing feature output by the n-th second feature extraction network. The terminal device may input the target recommendation feature output by each second gating network into the prediction network corresponding to that second gating network, thereby obtaining the service target prediction value corresponding to each target recommendation feature, so as to obtain a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. Specifically, each of the plurality of prediction networks corresponds to one second gating network, and each prediction network may be responsible for generating the service target prediction value of a different service target.
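The upper-layer gating described above can be sketched in NumPy. The sizes and random values are hypothetical, and the softmax normalization of the gate output is an assumption consistent with a gating network based on normalized weighting; the text only fixes that the parameter matrix has dimension N×D and that V is the D-dimensional concatenation of the first gating network outputs, the second multimedia data feature, and the cross feature.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)

N, K = 3, 2                       # second feature extraction networks, second gating networks
d_u, d_i, d_c, d_f = 4, 5, 2, 6   # hypothetical feature widths

u = rng.normal(size=(N, d_u))             # outputs of the N first gating networks
i_feat = rng.normal(size=d_i)             # second multimedia data feature
c_feat = rng.normal(size=d_c)             # cross feature
splice_feats = rng.normal(size=(N, d_f))  # stand-ins for the N splicing features

# V = concat(u^1, ..., u^N, i, c), a D-dimensional vector.
V = np.concatenate([u.reshape(-1), i_feat, c_feat])
D = V.shape[0]

# One N x D parameter matrix per second gating network.
W = rng.normal(size=(K, N, D))

# Feature merging weights of each second gating network (assumed softmax-normalized).
gate_weights = np.stack([softmax(W[k] @ V) for k in range(K)])   # shape (K, N)

# Target recommendation feature of gate k: weighted sum of the N splicing features.
target_feats = gate_weights @ splice_feats                        # shape (K, d_f)
```

Each row of `gate_weights` sums to 1, so every target recommendation feature is a convex combination of the splicing features, with a different mixture per service target.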

In some possible embodiments, the service target may include a click rate, a conversion rate, a click conversion rate, a duration, and the like, which may be determined according to the actual application scenario and is not limited herein. The click rate may be the ratio of the number of clicks to the number of exposures, and may be used to indicate the popularity of the multimedia data to be recommended. The conversion rate is the ratio of the number of conversions to the number of clicks, and may be used to indicate the popularity of the multimedia data to be recommended. The click conversion rate is the ratio of the number of conversions to the number of exposures, that is, the product of the click rate and the conversion rate, and may be used to indicate the popularity of the multimedia data to be recommended. The duration is the browsing duration of the recommendation object for the multimedia data to be recommended, such as a reading duration or a dwell duration, and may be used to indicate the degree of interest of the recommendation object in the multimedia data to be recommended.
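The three ratio-based service targets can be checked with a short worked example. The counts below are invented purely for illustration.

```python
import math

def click_rate(clicks, exposures):
    # Ratio of the number of clicks to the number of exposures.
    return clicks / exposures

def conversion_rate(conversions, clicks):
    # Ratio of the number of conversions to the number of clicks.
    return conversions / clicks

def click_conversion_rate(conversions, exposures):
    # Ratio of the number of conversions to the number of exposures.
    return conversions / exposures

# Hypothetical counts for one piece of multimedia data.
exposures, clicks, conversions = 1000, 50, 5

ctr = click_rate(clicks, exposures)
cvr = conversion_rate(conversions, clicks)
ctcvr = click_conversion_rate(conversions, exposures)

# The click conversion rate equals the product of the click rate and the conversion rate.
assert math.isclose(ctcvr, ctr * cvr)
```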

In some possible embodiments, the service target prediction value is used to predict the operation behavior of the recommendation object with respect to the multimedia data to be recommended. Optionally, the service target prediction value may be a predicted probability value of the service target, or may be a predicted numerical value of the service target, which may be determined according to the actual application scenario and is not limited herein. Taking the case where the service target prediction value is a predicted probability value as an example, if the service target is the duration, a prediction result of 1 may indicate that the browsing duration of the recommendation object for the multimedia data to be recommended exceeds a threshold duration, a prediction result of 0 may indicate that the browsing duration does not exceed the threshold duration, and a prediction result between 0 and 1 (such as 0.5) may indicate that there is a 50% probability that the browsing duration exceeds the threshold duration.
For example, suppose the prediction networks include prediction network 1 and prediction network 2, whose service targets are the duration and the click rate respectively. Taking the case where the terminal device determines a plurality of target recommended media for one object as an example, the terminal device may obtain, through the prediction networks, the service target prediction values of each of the plurality of to-be-recommended multimedia data, that is, a duration prediction result obtained through prediction network 1 and a click rate prediction result obtained through prediction network 2, so that each to-be-recommended multimedia data has a plurality of corresponding service target prediction values. Assuming that the number of prediction networks is K, the scoring hidden layer of the k-th prediction network is $h^{k}$, and the target recommendation feature output by the k-th second gating network is $f^{k}$, the output of the k-th prediction network can be expressed as:

$$p_{k} = h^{k}(f^{k})$$

The terminal device may determine a plurality of target recommended media from the plurality of to-be-recommended multimedia data based on the plurality of service target prediction values corresponding to each to-be-recommended multimedia data. Because the target recommended media are obtained by screening the to-be-recommended multimedia data with a plurality of service targets (such as the duration and the click rate) taken together, the problem that optimizing and screening on only a single service target yields target recommended media excessively biased toward some of the service targets is avoided, and the personalized recommendation experience of the multimedia data is enhanced.
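The prediction and screening step can be sketched as below. The text does not specify how the service target prediction values are combined when selecting the target recommended media, so the weighted sum and the top-n cut-off here are assumptions, as are the single sigmoid layer used for each scoring hidden layer, the sizes, and the random inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)

num_items, K, d_f = 8, 2, 6     # hypothetical: 8 candidates, 2 service targets (e.g. duration, click rate)

# Target recommendation feature f^k for every candidate and every service target.
target_feats = rng.normal(size=(num_items, K, d_f))

# Assumption: each prediction network h^k is one linear layer followed by a sigmoid.
head_w = rng.normal(size=(K, d_f))
head_b = np.zeros(K)

# p_k = h^k(f^k), evaluated for all candidates at once; shape (num_items, K).
preds = sigmoid(np.einsum("ikd,kd->ik", target_feats, head_w) + head_b)

# Assumption: combine the K predictions per candidate with fixed weights, then keep the top n.
objective_weights = np.array([0.5, 0.5])
scores = preds @ objective_weights
top_n = 3
top_idx = np.argsort(-scores)[:top_n]   # indices of the selected target recommended media
```

Other combination rules (for example, a product of predictions or per-target thresholds) would fit the same description; the point is only that every candidate is ranked using all K prediction values rather than one.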

Referring to fig. 5, fig. 5 is a schematic structural diagram of a personalized recommendation model provided in an embodiment of the present application. In the method provided by the embodiment of the application, the terminal device may obtain multimedia data features corresponding to the multimedia data, and input recommendation object data features or multimedia data features to be recommended (which may be first multimedia data features) in the multimedia data features into first feature extraction networks in the personalized recommendation model, so as to perform feature extraction on the first multimedia data features through each first feature extraction network respectively to obtain a plurality of recommendation features. As shown in fig. 5, the first multimedia data feature is input into the first feature extraction network 1, the first feature extraction network 2, … …, and the first feature extraction network M, and the recommended features are output through the first feature extraction network 1, the first feature extraction network 2, … …, and the first feature extraction network M, respectively. The plurality of first feature extraction networks may have the same network configuration (different network parameters of the first feature extraction networks) or different network configurations. And performing feature extraction on the basis of the first multimedia data features through each first feature extraction network to obtain a plurality of differentiated recommendation features, and providing recommendation features with better diversity for multimedia data recommendation of a subsequent personalized recommendation model. The terminal device may input the first multimedia data characteristic into each first gating network, such as first gating network 1, first gating network 2, … …, first gating network N in fig. 5. 
Each first gating network may determine, according to the currently input first multimedia data feature, the feature merging weight corresponding to each recommendation feature input to that first gating network, so that each first gating network may perform weighted summation on the plurality of recommendation features based on the corresponding feature merging weights to obtain a plurality of weighted recommendation features. Because different first gating networks assign different feature merging weights to the recommendation features, the weighted recommendation features generated by the first gating networks differ markedly in the weight they place on each first feature extraction network, so that a plurality of differentiated weighted recommendation features can be obtained through the first gating networks, which provides more diverse weighted recommendation features for the subsequent multimedia data recommendation by the personalized recommendation model. The first gating network may include at least one of a gating network based on linear transformation or a gating network based on normalized weighting. 
The terminal device may further splice each of the weighted recommendation features with the second multimedia data feature to obtain a plurality of splicing vectors (here, a cross feature may also be obtained and spliced with each of the splicing vectors to obtain a plurality of updated splicing vectors), and input each splicing vector into the second feature extraction network corresponding to the first gating network that output the weighted recommendation feature contained in that splicing vector, such as the second feature extraction network 1, the second feature extraction network 2, … …, and the second feature extraction network N in fig. 5. Feature extraction is performed on the splicing vectors through the second feature extraction network 1, the second feature extraction network 2, … …, and the second feature extraction network N, respectively, to obtain a plurality of differentiated splicing features, which provides more diverse splicing features for the subsequent multimedia data recommendation by the personalized recommendation model. The terminal device may input the plurality of splicing vectors into each second gating network, such as second gating network 1, … …, and second gating network K in fig. 5. Each second gating network may determine, according to the currently input splicing vectors, the feature merging weight of each splicing feature input to that second gating network, so that each second gating network may perform weighted summation on the plurality of splicing features based on the corresponding feature merging weights to obtain a plurality of target recommendation features. 
The terminal device may input the target recommendation features output by each second gated network into the prediction networks corresponding to each second gated network, such as prediction networks 1, … … and prediction network K in fig. 5 (the second gated networks 1, … … and the second gated networks K are respectively in one-to-one correspondence with the prediction networks 1, … … and the prediction networks K), so as to obtain the service target prediction values corresponding to each target recommendation feature to obtain a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. The target recommended media are obtained by screening a plurality of multimedia data to be recommended by integrating a plurality of service targets (such as duration, click rate and the like), so that the problem that only a single service target is optimized and screened to obtain the target recommended media excessively biased to part of the service targets is solved, and the personalized recommended experience of the multimedia data is enhanced.
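The data flow of fig. 5 described above can be put together in one compact forward pass. This is a minimal sketch and not the model of the application: single-layer networks, softmax gates, sigmoid outputs, the omission of the cross feature, and every size and parameter name (`expert1`, `gate1`, `expert2`, `gate2`, `head`) are assumptions chosen to keep the example short.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

def init_params(rng, d_user, d_item, M, N, K, d_h):
    d_v = d_h + d_item   # width of one splicing vector
    return {
        "expert1": rng.normal(size=(M, d_h, d_user)) * 0.1,            # M first feature extraction networks
        "gate1": rng.normal(size=(N, M, d_user)) * 0.1,                # N first gating networks
        "expert2": rng.normal(size=(N, d_h, d_v)) * 0.1,               # N second feature extraction networks
        "gate2": rng.normal(size=(K, N, N * d_h + d_item)) * 0.1,      # K second gating networks
        "head": rng.normal(size=(K, d_h)) * 0.1,                       # K prediction networks
    }

def forward(user_feat, item_feat, p):
    # Lower layer: recommendation features, then one weighted recommendation feature per first gate.
    rec = relu(p["expert1"] @ user_feat)                                # (M, d_h)
    g1 = np.stack([softmax(Wg @ user_feat) for Wg in p["gate1"]])       # (N, M)
    weighted = g1 @ rec                                                 # (N, d_h)
    # Splice each weighted recommendation feature with the second multimedia data feature.
    splice_vecs = np.stack([np.concatenate([w, item_feat]) for w in weighted])
    # Upper layer: the n-th splicing vector goes to the n-th second feature extraction network.
    splice_feats = np.stack([relu(W @ v) for W, v in zip(p["expert2"], splice_vecs)])
    gate_in = np.concatenate([weighted.reshape(-1), item_feat])
    g2 = np.stack([softmax(Wg @ gate_in) for Wg in p["gate2"]])         # (K, N)
    targets = g2 @ splice_feats                                         # (K, d_h)
    # One prediction network per second gating network.
    logits = np.einsum("kd,kd->k", p["head"], targets)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(4)
params = init_params(rng, d_user=6, d_item=5, M=4, N=3, K=2, d_h=8)
preds = forward(rng.normal(size=6), rng.normal(size=5), params)   # K service target prediction values
```

Note that `rec`, `g1`, and `weighted` depend only on the recommendation object feature, so when many candidate items are scored for the same object they only need to be computed a single time.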

In some possible embodiments, the plurality of first feature extraction networks and the plurality of first gating networks may form a multi-gated hybrid expert network (which may be referred to as a lower-layer multi-gated hybrid expert network), and the plurality of second feature extraction networks and the plurality of second gating networks may form a multi-gated hybrid expert network (which may be referred to as an upper-layer multi-gated hybrid expert network). It can be understood that the personalized recommendation model provided in the embodiment of the present application may be obtained by improving a multi-gated hybrid expert network (for example, refer to the multi-gated hybrid expert network in fig. 1). Specifically, the tower structure (for example, tower structure 1 and tower structure 2 in fig. 1) in the multi-gated hybrid expert network may be replaced with another multi-gated hybrid expert network including a plurality of feature extraction networks (also referred to as expert networks) and a plurality of gating networks, that is, the tower structure in the upper-layer multi-gated hybrid expert network is replaced with the lower-layer multi-gated hybrid expert network to obtain the personalized recommendation model provided in the embodiment of the present application. This avoids the problem that, after the tower structure performs dimension reduction on the recommendation object data and the multimedia data to be recommended, the original feature signals contained in both are lost, which limits the ability of the different multi-gated hybrid expert networks to learn differentiated representations end to end. 
By improving the tower structure into the lower-layer multi-gated hybrid expert network associated with the upper-layer multi-gated hybrid expert network, the lower-layer network can provide differentiated low-dimensional user vector representations (such as differentiated recommendation features) to the upper-layer network. This guarantees the multi-target prediction performance of the multi-gated hybrid expert network, improves the multi-target prediction effect, improves the screening effectiveness for the multimedia data to be recommended in the coarse ranking stage, and offers high applicability.

Referring to fig. 6, fig. 6 is a schematic diagram comparing the experimental effects of a recommendation method for multimedia data according to an embodiment of the present application. As shown in fig. 6, graph 601 compares the results of an experimental group (using the technical solution of the present application) and a control group (using a technical solution provided by other related technologies) during an idle running period for the duration service target (such as the browsing duration of text content), graph 602 compares the results of the experimental group and the control group during the experimental period for the duration service target, and graph 603 is a histogram of the difference between the results of the experimental group and the control group during the idle running period and the experimental period for the duration service target (a positive value indicates that the experimental group is better than the control group, and a negative value indicates that the experimental group is worse than the control group). It can be seen that, during the experimental period, the experimental group using the multimedia data recommendation method provided by the present application obtains better duration results than the control group (the duration result of the experimental group is always better than that of the control group during the experimental period, with leads of 0.64%, 0.67%, 0.84%, 1.05%, 1.17%, and 0.99% respectively). That is, with the multimedia data recommendation method provided by the present application, target recommended media that better meet user needs can be screened from the multimedia data to be recommended, and the personalized recommendation effect for the multimedia data is good.

In the method provided by the embodiment of the application, the terminal device may obtain multimedia data features (including recommendation object data features and to-be-recommended multimedia data features) corresponding to the multimedia data, and input the recommendation object data features or the to-be-recommended multimedia data features (which may be first multimedia data features) in the multimedia data features into the first feature extraction networks in the personalized recommendation model, so as to perform feature extraction on the first multimedia data features through each first feature extraction network respectively to obtain a plurality of recommendation features. And performing feature extraction on the basis of the first multimedia data features through each first feature extraction network to obtain a plurality of differentiated recommendation features, and providing recommendation features with better diversity for multimedia data recommendation of a subsequent personalized recommendation model. The terminal device may input the first multimedia data feature into each first gating network, and each first gating network may determine, according to the currently input first multimedia data feature, a feature combining weight corresponding to each recommended feature input to the first gating network, so that each first gating network may weight and sum the plurality of recommended features based on the feature combining weight corresponding to each recommended feature corresponding to the first gating network, to obtain a plurality of weighted recommended features. 
Because the feature merging weights distributed by different first gating networks for the recommended features are different, the weighted recommended features generated by the first gating networks based on the recommended features are obviously different from the weighted recommended features corresponding to the first feature extraction networks, so that a plurality of differentiated weighted recommended features can be obtained through the first gating networks, and the weighted recommended features with better diversity are provided for multimedia data recommendation of a subsequent personalized recommendation model. The terminal device can also splice each weighted recommendation feature in the weighted recommendation features with a second multimedia data feature to obtain a plurality of spliced vectors (here, cross features can also be obtained, and the cross features and each spliced vector in the spliced vectors are spliced to obtain a plurality of updated spliced vectors), and each spliced vector obtained by splicing each weighted recommendation feature and the second multimedia data feature is input to a second feature extraction network corresponding to a first gating network outputting each weighted recommendation feature to perform feature extraction to obtain a plurality of differentiated spliced features, so that better-diversity spliced features are provided for multimedia data recommendation of a subsequent personalized recommendation model. 
The terminal device may input the splicing vectors into each second gated network, and each second gated network may determine, according to the currently input splicing vector, a feature merging weight of each splicing feature input to the second gated network, so that each second gated network may respectively perform weighted summation on the splicing features based on the feature merging weights corresponding to the splicing features corresponding to the second gated network, to obtain a plurality of target recommended features. The terminal device may input the target recommendation features output by each second gated network into the prediction network corresponding to each second gated network, thereby obtaining the service target prediction values corresponding to each target recommendation feature to obtain a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. The target recommended media are obtained by screening a plurality of multimedia data to be recommended by integrating a plurality of service targets (such as duration, click rate and the like), so that the problem that only single service target is optimized and screened to obtain the target recommended media excessively biased to part of the service targets is solved, and the personalized recommendation experience of the multimedia data is enhanced.

Based on the description of the embodiment of the recommendation method for multimedia data, the embodiment of the application also discloses a recommendation device for multimedia data. The apparatus for recommending multimedia data may be applied to the method for recommending multimedia data of the embodiments shown in fig. 4 to 5, for performing the steps in the method for recommending multimedia data. Here, the multimedia data recommendation apparatus may be the service server or the terminal device in the embodiments shown in fig. 4 to 5, that is, the multimedia data recommendation apparatus may be an execution subject of the multimedia data recommendation method in the embodiments shown in fig. 4 to 5. Referring to fig. 7, fig. 7 is a schematic structural diagram of a multimedia data recommendation device according to an embodiment of the present application. In the embodiment of the application, the device can operate the following modules:

an obtaining module 41, configured to obtain multimedia data features corresponding to multimedia data, where the multimedia data features include a first multimedia data feature and a second multimedia data feature, the first multimedia data feature is a data feature of a recommendation object and the second multimedia data feature is a multimedia data feature to be recommended, or the first multimedia data feature is a multimedia data feature to be recommended and the second multimedia data feature is a data feature of a recommendation object;

a weighted recommendation feature generation module 42, configured to perform feature extraction on the first multimedia data features through a plurality of first feature extraction networks to obtain a plurality of recommendation features corresponding to the first multimedia data features, and obtain a plurality of weighted recommendation features corresponding to the plurality of recommendation features through a plurality of first gating networks, where one first feature extraction network is used to obtain one recommendation feature corresponding to the first multimedia data feature, and one first gating network is used to obtain one weighted recommendation feature corresponding to the plurality of recommendation features;

a splicing feature generation module 43, configured to obtain a plurality of splicing vectors obtained by splicing each weighted recommended feature of the plurality of weighted recommended features with the second multimedia data feature, and obtain a plurality of splicing features based on the splicing features of each splicing vector obtained by the second feature extraction network corresponding to each first gating network that obtains each weighted recommended feature;

a target recommended media generating module 44, configured to input the splicing vectors and the splicing features of the splicing vectors into a service target prediction model for obtaining recommended media, so as to obtain a plurality of target recommended media based on the service target prediction model.

In some possible embodiments, the weighted recommended features generation module 42 is further configured to:

In some possible embodiments, after obtaining a plurality of splicing vectors obtained by splicing each of the weighted recommended features and the second multimedia data feature, the splicing feature generating module 43 is further configured to:

In some possible embodiments, the service objective prediction model includes a plurality of second gating networks and a prediction network corresponding to each of the second gating networks;

after the splicing vectors and the splicing characteristics of the splicing vectors are input into a business target prediction model for obtaining recommended media, the target recommended media generating module 44 is further configured to:

obtaining feature merging weights corresponding to all the splicing features adopted by all the second gate-controlled networks to obtain target recommended features based on the splicing vectors, performing weighted summation on the splicing features through any one second gate-controlled network based on the feature merging weights corresponding to all the splicing features adopted by any one second gate-controlled network to obtain one target recommended feature obtained by any one second gate-controlled network, and obtaining all the target recommended features obtained by all the second gate-controlled networks to obtain the target recommended features;

In some possible embodiments, the first multimedia data feature is a data feature of a recommendation object and the second multimedia data feature is a multimedia data feature to be recommended;

the above-mentioned splicing feature generation module 43 is further configured to:

performing feature extraction on the recommended object data features through a plurality of first feature extraction networks to obtain a plurality of recommended object features corresponding to the recommended object data features, and obtaining a plurality of weighted recommended object features corresponding to the recommended object features through a plurality of first gating networks as a plurality of weighted recommended features;

In some possible embodiments, the first multimedia data feature is a multimedia data feature to be recommended and the second multimedia data feature is a data feature of a recommendation object;

In some possible embodiments, the obtaining module 41 is further configured to:

According to the embodiment corresponding to fig. 4, the implementation manner described in steps S101 to S104 in the method for recommending multimedia data shown in fig. 4 can be executed by each module of the apparatus shown in fig. 7. For example, in the method for recommending multimedia data shown in fig. 4, the implementation described in step S101 may be performed by the acquisition module 41 in the apparatus shown in fig. 7, the implementation described in step S102 may be performed by the weighted recommended feature generation module 42, the implementation described in step S103 may be performed by the splicing feature generation module 43, and the implementation described in step S104 may be performed by the target recommended media generation module 44. The implementation manners executed by the obtaining module 41, the weighted recommendation feature generating module 42, the splicing feature generating module 43, and the target recommended media generating module 44 may refer to the implementation manners provided in each step in the embodiment corresponding to fig. 4, and are not described herein again.

In the embodiment of the application, the apparatus for recommending multimedia data can acquire multimedia data features (including recommendation object data features and to-be-recommended multimedia data features) corresponding to multimedia data, and input either the recommendation object data features or the to-be-recommended multimedia data features (which may serve as the first multimedia data features) into the first feature extraction networks of a personalized recommendation model, so that each first feature extraction network performs feature extraction on the first multimedia data features and a plurality of recommended features are obtained. The network structures of the plurality of first feature extraction networks may be the same (in which case their network parameters differ) or different. Because each first feature extraction network extracts features from the first multimedia data features separately, a plurality of differentiated recommended features is obtained, which provides more diverse recommended features for subsequent multimedia data recommendation by the personalized recommendation model. The apparatus may also input the first multimedia data features into each first gating network; each first gating network determines, from the currently input first multimedia data features, the feature merging weight of each recommended feature input to that gating network, and then performs a weighted summation of the plurality of recommended features based on those feature merging weights, so that a plurality of weighted recommended features is obtained, one from each first gating network.
Because different first gating networks assign different feature merging weights to the recommended features, the weighted recommended features that the different first gating networks generate from the same recommended features differ markedly from one another, so that a plurality of differentiated weighted recommended features can be obtained through the first gating networks, which provides more diverse weighted recommended features for subsequent multimedia data recommendation by the personalized recommendation model. The apparatus for recommending multimedia data can also splice each of the weighted recommended features with the second multimedia data feature to obtain a plurality of splicing vectors (cross features may also be obtained here and spliced with each of the splicing vectors to obtain a plurality of updated splicing vectors). Each splicing vector is input to the second feature extraction network corresponding to the first gating network that output the weighted recommended feature contained in that splicing vector, and feature extraction yields a plurality of differentiated splicing features, which provides more diverse splicing features for subsequent multimedia data recommendation by the personalized recommendation model.
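
The first stage described above can be illustrated with a minimal NumPy sketch. It is not the implementation of the application: the single-layer ReLU extraction networks, the softmax (normalized weighting) gates, the random parameters, and all names and dimensions (`expert`, `gate`, `d_hidden`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def expert(x, w):
    # One feature extraction network, assumed here to be a single ReLU layer.
    return np.maximum(x @ w, 0.0)

def gate(x, w):
    # One gating network, assumed here to be a linear map followed by softmax,
    # giving one feature merging weight per input feature.
    return softmax(x @ w)

# Hypothetical sizes: 3 first feature extraction networks, 2 first gating networks.
d_first, d_second, d_hidden, n_experts, n_gates = 8, 6, 4, 3, 2
first_feature = rng.normal(size=d_first)    # first multimedia data feature
second_feature = rng.normal(size=d_second)  # second multimedia data feature

first_experts = [rng.normal(size=(d_first, d_hidden)) for _ in range(n_experts)]
first_gates = [rng.normal(size=(d_first, n_experts)) for _ in range(n_gates)]
# One second feature extraction network per first gating network.
second_experts = [rng.normal(size=(d_hidden + d_second, d_hidden))
                  for _ in range(n_gates)]

# Each first feature extraction network yields one recommended feature.
recommended = np.stack([expert(first_feature, w) for w in first_experts])

# Each first gating network yields one weighted recommended feature
# as a weighted sum of all recommended features.
merge_weights = [gate(first_feature, g) for g in first_gates]
weighted = [w @ recommended for w in merge_weights]

# Splice each weighted recommended feature with the second feature, then pass
# each splicing vector through its corresponding second extraction network.
splicing_vectors = [np.concatenate([w, second_feature]) for w in weighted]
splicing_features = [expert(v, w) for v, w in zip(splicing_vectors, second_experts)]
```

In this sketch the number of splicing vectors and splicing features equals the number of first gating networks, which matches the one-to-one correspondence between first gating networks and second feature extraction networks described above.
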
The apparatus for recommending multimedia data may input the plurality of splicing vectors into each second gating network; each second gating network determines, from the currently input splicing vectors, the feature merging weight of each splicing feature input to that gating network, and performs a weighted summation of the plurality of splicing features based on those feature merging weights, so that a plurality of target recommendation features is obtained, one from each second gating network. The apparatus may then input the target recommendation feature output by each second gating network into the prediction network corresponding to that second gating network to obtain the service target prediction value corresponding to each target recommendation feature, thereby obtaining a plurality of service target prediction values, and determine a plurality of target recommended media from the multimedia data based on the plurality of service target prediction values. Because the target recommended media are selected from the multimedia data to be recommended by jointly considering a plurality of service targets (such as duration and click-through rate), the problem that optimizing and screening for only a single service target yields target recommended media excessively biased toward some of the service targets is avoided, and the personalized recommendation experience of the multimedia data is enhanced.
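
The second-stage gating, prediction, and selection steps can be sketched in the same way. This is a hedged illustration only: the splicing vectors and splicing features are random placeholders, the gate input is assumed to be the concatenation of all splicing vectors, each prediction network is assumed to be a single sigmoid unit, and combining the service target prediction values by a fixed weighted sum (`target_importance`) followed by a top-k cut is a hypothetical choice that the application does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 5 candidate media, 2 splicing vectors each,
# 2 service targets (for example duration and click-through rate).
n_media, n_splice, d_vec, d_feat, n_targets = 5, 2, 10, 4, 2

# Placeholder inputs standing in for the outputs of the first stage.
splicing_vectors = rng.normal(size=(n_media, n_splice, d_vec))
splicing_features = rng.normal(size=(n_media, n_splice, d_feat))

# Assumption: every second gating network sees all splicing vectors, concatenated.
gate_input = splicing_vectors.reshape(n_media, -1)

# One second gating network and one prediction network per service target.
second_gates = [rng.normal(size=(n_splice * d_vec, n_splice))
                for _ in range(n_targets)]
towers = [rng.normal(size=d_feat) for _ in range(n_targets)]

predictions = []
for g, t in zip(second_gates, towers):
    # Feature merging weight of each splicing feature, per candidate.
    weights = softmax(gate_input @ g)
    # Weighted sum of splicing features gives the target recommendation feature.
    target_feature = np.einsum("ms,msd->md", weights, splicing_features)
    # Prediction network maps it to a service target prediction value in (0, 1).
    predictions.append(sigmoid(target_feature @ t))
predictions = np.stack(predictions, axis=1)  # shape (n_media, n_targets)

# Hypothetical fusion rule: weighted sum over service targets, then keep top-k.
target_importance = np.array([0.5, 0.5])
scores = predictions @ target_importance
top_k = 2
target_media = np.argsort(-scores)[:top_k]  # indices of target recommended media
```

Because every candidate is scored on all service targets before ranking, a candidate that is strong on only one target does not automatically win, which is the effect the paragraph above attributes to integrating several service targets.
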

In the embodiment of the present application, the modules in the apparatus shown in fig. 7 may be combined, individually or entirely, into one or several other modules, or some of the modules may be further split into a plurality of functionally smaller modules; either arrangement implements the same operations without affecting the technical effects of the embodiment of the present application. The modules are divided based on logical functions; in practical applications, the function of one module may be implemented by a plurality of modules, or the functions of a plurality of modules may be implemented by one module. In other possible implementations of the present application, the apparatus may also include other modules, and in practical applications these functions may be implemented with the assistance of other modules or through the cooperation of a plurality of modules, which is not limited herein.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 1000 may be the terminal device in the embodiments corresponding to fig. 4 to fig. 7. The computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication among these components. The user interface 1003 may include a display and a keyboard, and may optionally further include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 8, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

The network interface 1004 in the computer device 1000 may also be connected through a network to the terminal 200a in the embodiment corresponding to fig. 4, and the user interface 1003 may optionally further include a display and a keyboard. In the computer device 1000 shown in fig. 8, the network interface 1004 may provide a network communication function; the user interface 1003 mainly provides an input interface for a user (or developer); and the processor 1001 may be configured to call the device control application program stored in the memory 1005, so as to implement the method for recommending multimedia data in the embodiment corresponding to fig. 4.

It should be understood that the computer device 1000 described in this embodiment of the application can perform the method for recommending multimedia data described in the embodiment corresponding to fig. 4, which is not described herein again. The beneficial effects, being the same as those of the method, are likewise not repeated.

Moreover, it should be noted that an embodiment of the present application further provides a computer-readable storage medium that stores a computer program executed by the aforementioned apparatus for recommending multimedia data. The computer program includes program instructions, and when a processor executes the program instructions, the method for recommending multimedia data described in the embodiment corresponding to fig. 4 can be performed, which is therefore not described herein again. The beneficial effects, being the same as those of the method, are likewise not repeated. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, reference is made to the description of the method embodiments of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure merely illustrates preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; equivalent variations and modifications made in accordance with the present application therefore still fall within its scope.

Claims

1. A method for recommending multimedia data, the method comprising:

acquiring a plurality of splicing vectors obtained by splicing each weighted recommendation feature in the plurality of weighted recommendation features with the second multimedia data feature, and acquiring the splicing feature of each splicing vector in the plurality of splicing vectors based on a second feature extraction network corresponding to each first gating network for acquiring each weighted recommendation feature to obtain a plurality of splicing features;

inputting the plurality of splicing vectors and the splicing feature of each splicing vector into a service target prediction model for obtaining recommended media, so as to obtain a plurality of target recommended media based on the service target prediction model.

2. The method of claim 1, wherein obtaining, by the plurality of first gating networks, a plurality of weighted recommendation features corresponding to the plurality of recommendation features comprises:

weighting and summing the plurality of recommended features through any one first gating network based on the feature merging weight corresponding to each recommended feature adopted by any one first gating network so as to obtain one weighted recommended feature obtained by any one first gating network;

obtaining each weighted recommendation feature obtained by each first gating network to obtain the plurality of weighted recommendation features;

wherein the first gating network comprises at least one of a linear variation-based gating network or a normalized weighting-based gating network.

3. The method of claim 2, wherein after obtaining a plurality of splicing vectors obtained by splicing each of the plurality of weighted recommended features with the second multimedia data feature, the method further comprises:

and acquiring cross features, and splicing the cross features with each splicing vector in the splicing vectors to obtain a plurality of updated splicing vectors.

4. The method of claim 3, wherein the service target prediction model comprises a plurality of second gating networks and a prediction network corresponding to each of the second gating networks;

obtaining feature merging weights corresponding to all the splicing features adopted by all the second gating networks to obtain target recommendation features based on the plurality of splicing vectors, performing weighted summation on the plurality of splicing features through any one of the second gating networks based on the feature merging weights corresponding to all the splicing features adopted by any one of the second gating networks to obtain one target recommendation feature obtained by any one of the second gating networks, and obtaining all the target recommendation features obtained by all the second gating networks to obtain the plurality of target recommendation features;

wherein the second gating network comprises at least one of a linear variation-based gating network or a normalized weighting-based gating network.

5. The method of claim 4, wherein the first multimedia data feature is a recommendation object data feature and the second multimedia data feature is a to-be-recommended multimedia data feature;

the obtaining of the plurality of splicing vectors obtained by splicing each weighted recommended feature of the plurality of weighted recommended features with the second multimedia data feature includes:

and acquiring a plurality of splicing vectors obtained by splicing each weighted recommendation object feature in the plurality of weighted recommendation object features with the to-be-recommended multimedia data feature.

6. The method of claim 4, wherein the first multimedia data feature is a multimedia data feature to be recommended and the second multimedia data feature is a recommendation object data feature;

performing feature extraction on the multimedia data features to be recommended through a plurality of first feature extraction networks to obtain a plurality of recommended multimedia features corresponding to the multimedia data features to be recommended, and obtaining a plurality of weighted recommended multimedia features corresponding to the plurality of recommended multimedia features through a plurality of first gating networks to serve as a plurality of weighted recommended features;

7. The method according to any one of claims 1-6, wherein the obtaining of the multimedia data characteristics corresponding to the multimedia data comprises:

8. An apparatus for recommending multimedia data, comprising:

the multimedia data features comprise a first multimedia data feature and a second multimedia data feature, wherein the first multimedia data feature is a recommendation object data feature and the second multimedia data feature is a to-be-recommended multimedia data feature, or the first multimedia data feature is a to-be-recommended multimedia data feature and the second multimedia data feature is a recommendation object data feature;

a weighted recommendation feature generation module, configured to perform feature extraction on the first multimedia data features through multiple first feature extraction networks to obtain multiple recommendation features corresponding to the first multimedia data features, and obtain multiple weighted recommendation features corresponding to the multiple recommendation features through multiple first gating networks, where one first feature extraction network is used to obtain one recommendation feature corresponding to the first multimedia data feature, and one first gating network is used to obtain one weighted recommendation feature corresponding to the multiple recommendation features;

the splicing feature generation module is used for acquiring a plurality of splicing vectors obtained by splicing each weighted recommended feature in the plurality of weighted recommended features with the second multimedia data feature, and acquiring a plurality of splicing features based on the splicing features of each splicing vector obtained by the second feature extraction network corresponding to each first gating network for acquiring each weighted recommended feature;

9. A computer device, comprising: a processor, a memory, and a network interface;

the processor is connected to the memory and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store program code, and the processor is configured to call the program code to perform the method of any one of claims 1 to 5.

10. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded by a processor and to carry out the method of any one of claims 1 to 5.