CN115482021A

CN115482021A - Multimedia information recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN115482021A
Application number: CN202110605178.9A
Authority: CN
Inventors: 丘志杰; 王良栋; 钟云; 张博; 林乐宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2022-12-16

Abstract

The invention provides a multimedia information recommendation method, a multimedia information recommendation device and electronic equipment, wherein the method comprises the following steps: determining a first input characteristic of a multimedia information recommendation model; obtaining a user playing characteristic vector and a user sharing characteristic vector which are matched with the target object; determining a second input characteristic of the multimedia information recommendation model; obtaining a multimedia information playing characteristic vector and a multimedia information sharing characteristic vector which are matched with the multimedia information to be recommended; and adjusting the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector. Therefore, the multimedia information recommendation model can recommend the multimedia information in the use environment to different users, the accuracy and relevance of multimedia information recommendation are enhanced, the quality of multimedia information recommendation is effectively improved, and the use experience of the users is improved.

Description

Multimedia information recommendation method and device, electronic equipment and storage medium

Technical Field

The present invention relates to information processing technologies, and in particular, to a multimedia information recommendation method, apparatus, and electronic device.

Background

Artificial Intelligence (AI) is a comprehensive discipline of computer science that studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence technology covers a wide range of fields, for example natural language processing and machine learning/deep learning, and as the technology develops it is expected to be applied in more fields and to deliver increasingly important value.

In the conventional technology, when historical data are sparse, it is difficult to model different targets accurately. Because a single recommendation network is shared by all tasks, more complex relationships among the tasks cannot be captured, noise is introduced into some of the tasks, and hidden features in the input data are lost. As a result, multimedia information cannot be recommended accurately: on the one hand, high-quality multimedia information that meets user requirements cannot be spread well; on the other hand, users receive multimedia information of uneven quality, which degrades the user experience.

Disclosure of Invention

In view of this, an embodiment of the present invention provides a multimedia information recommendation method, an apparatus, an electronic device, and a storage medium, and a technical solution of the embodiment of the present invention is implemented as follows:

the embodiment of the invention provides a multimedia information recommendation method, which comprises the following steps:

acquiring historical data of a target object in a multimedia information recommendation environment;

determining a first input characteristic of a multimedia information recommendation model based on the historical data of the target object;

performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object;

acquiring multimedia information to be recommended in a multimedia information data source;

determining a second input characteristic of the multimedia information recommendation model based on the multimedia information to be recommended;

performing feature fusion processing on the second input features through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended;

and adjusting the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector.

The embodiment of the invention also provides a multimedia information recommendation device, which comprises:

the information transmission module is used for acquiring historical data of a target object in a multimedia information recommendation environment;

the information processing module is used for determining a first input characteristic of a multimedia information recommendation model based on the historical data of the target object;

the information processing module is used for performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object;

the information transmission module is used for acquiring multimedia information to be recommended from a multimedia information data source;

the information processing module is used for determining a second input characteristic of the multimedia information recommendation model based on the multimedia information to be recommended;

the information processing module is used for performing feature fusion processing on the second input features through the multimedia information recommendation model to obtain multimedia information playing feature vectors and multimedia information sharing feature vectors which are matched with the multimedia information to be recommended;

the information processing module is used for adjusting the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector.

In the above scheme,

the information processing module is used for extracting the play type sub-information included in the historical data of the target object, and determining a first play identification feature, a first play label feature and a first play category feature which are matched with the target object;

the information processing module is used for extracting the sharing type sub-information included in the historical data of the target object, and determining a first sharing identification feature, a first sharing label feature and a first sharing category feature which are matched with the target object.
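
The extraction of play type and sharing type sub-information described above can be illustrated with a minimal sketch. The record layout (`behavior`, `item_id`, `tags`, `category`) and the function name `extract_behavior_features` are assumptions made for illustration; the source does not specify a data format.

```python
from collections import Counter

def extract_behavior_features(history, behavior):
    """Collect ID, tag and category features for one behavior type ('play' or 'share')."""
    records = [r for r in history if r["behavior"] == behavior]
    return {
        "ids": [r["item_id"] for r in records],                    # identification features
        "tags": sorted({t for r in records for t in r["tags"]}),   # label (tag) features
        "categories": Counter(r["category"] for r in records),     # category features
    }

# Hypothetical historical data of one target object.
history = [
    {"behavior": "play", "item_id": "v1", "tags": ["cat", "funny"], "category": "pets"},
    {"behavior": "play", "item_id": "v2", "tags": ["dog"], "category": "pets"},
    {"behavior": "share", "item_id": "v2", "tags": ["dog"], "category": "pets"},
]
play_features = extract_behavior_features(history, "play")
share_features = extract_behavior_features(history, "share")
```

Keeping the play and sharing features separate in this way lets each target be modeled from the behavior that is relevant to it.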

In the above scheme,

the information processing module is used for performing feature fusion processing on the first input feature through a hybrid expert network in the multimedia information recommendation model to obtain a user playing high-order feature vector and a user sharing high-order feature vector;

the information processing module is used for performing weighting processing on the first input feature through a feature weight adjusting network in the multimedia information recommendation model to obtain a user playing low-order feature vector and a user sharing low-order feature vector corresponding to the target object;

the information processing module is used for splicing the user playing high-order feature vector and the user playing low-order feature vector to obtain a user playing feature vector;

and the information processing module is used for splicing the user sharing high-order feature vector and the user sharing low-order feature vector to obtain the user sharing feature vector.
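
The following is a minimal sketch of the fusion step described above, under several assumptions that are not stated in the source: the experts are single linear layers with ReLU, each task ("play", "share") has its own softmax gate over the shared experts, and the feature weight adjusting network is reduced to one weight per input feature. All dimensions, random weights and function names are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def user_tower(x, experts, gates, feature_weights):
    """Return one concatenated (high-order + low-order) vector per task."""
    # Shared experts with ReLU activation (assumed architecture).
    expert_outputs = [[max(0.0, h) for h in linear(e, x)] for e in experts]
    towers = {}
    for task, gate in gates.items():
        mix = softmax(linear(gate, x))  # one mixing weight per expert
        high = [sum(g * out[i] for g, out in zip(mix, expert_outputs))
                for i in range(len(expert_outputs[0]))]
        low = [w * v for w, v in zip(feature_weights[task], x)]  # per-feature weighting
        towers[task] = high + low  # splicing = list concatenation
    return towers

rng = random.Random(0)
input_dim, hidden_dim, num_experts = 4, 3, 2
experts = [random_matrix(hidden_dim, input_dim, rng) for _ in range(num_experts)]
gates = {t: random_matrix(num_experts, input_dim, rng) for t in ("play", "share")}
feature_weights = {"play": [0.5, 1.0, 0.2, 0.8], "share": [1.0, 0.1, 0.9, 0.3]}
vectors = user_tower([1.0, 0.0, 2.0, 1.0], experts, gates, feature_weights)
```

Each resulting vector has `hidden_dim + input_dim` entries: the first part carries the high-order interaction learned by the experts, and the second part preserves the weighted raw features.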

In the above scheme,

the information processing module is used for determining the context characteristics matched with the target object according to the historical data of the target object;

the information processing module is used for adjusting the user playing feature vector by utilizing the context feature through a hybrid expert network in the multimedia information recommendation model;

and the information processing module is used for adjusting the user sharing feature vector by utilizing the context feature through a hybrid expert network in the multimedia information recommendation model.

In the above scheme,

the information processing module is used for determining the number of the hybrid expert networks according to the multimedia information recommendation environment;

and the information processing module is used for adjusting the structure of the hybrid expert network in the multimedia information recommendation model according to the number of the hybrid expert networks.

In the above scheme,

the information processing module is used for extracting the playing type sub-information included in the multimedia information to be recommended, and determining a second playing identification characteristic, a second playing label characteristic and a second playing category characteristic which are matched with the multimedia information to be recommended;

the information processing module is used for extracting the sharing type sub-information included in the historical data of the target object, and determining a second sharing identification feature, a second sharing label feature and a second sharing category feature which are matched with the target object.

In the above scheme,

the information processing module is used for performing feature fusion processing on the second input feature through a hybrid expert network in the multimedia information recommendation model to obtain a multimedia information playing high-order feature vector and a multimedia information sharing high-order feature vector;

the information processing module is configured to perform weighting processing on the second input feature through a feature weight adjustment network in the multimedia information recommendation model to obtain a multimedia information playing low-order feature vector and a multimedia information sharing low-order feature vector corresponding to the multimedia information, where all weights in the feature weight adjustment network are fixed values;

the information processing module is used for splicing the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector to obtain a multimedia information playing feature vector;

the information processing module is used for splicing the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector to obtain a multimedia information sharing feature vector.

In the above scheme,

the information processing module is used for determining a first dot product value based on the user playing high-order feature vector and the user playing low-order feature vector in the user playing feature vector, and on the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector in the multimedia information playing feature vector;

the information processing module is used for determining a second dot product value based on the user sharing high-order feature vector and the user sharing low-order feature vector in the user sharing feature vector, and on the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector in the multimedia information sharing feature vector;

and the information processing module is used for determining multimedia information to be recalled according to the sum of the first dot product value and the second dot product value.
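
A minimal sketch of this recall rule follows. It assumes that each user and each item is represented by one "play" vector and one "share" vector of matching dimensions, and that recall keeps the `k` highest-scoring items; the vectors, the item names and the top-`k` cut-off are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recall_top_k(user, items, k):
    """Rank items by (play dot product + share dot product) and keep the top k."""
    scored = []
    for item_id, item in items.items():
        first = dot(user["play"], item["play"])     # first dot product value
        second = dot(user["share"], item["share"])  # second dot product value
        scored.append((first + second, item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

user = {"play": [1.0, 0.0], "share": [0.0, 1.0]}
items = {
    "a": {"play": [0.9, 0.1], "share": [0.1, 0.2]},
    "b": {"play": [0.2, 0.5], "share": [0.3, 0.1]},
    "c": {"play": [0.5, 0.5], "share": [0.0, 0.9]},
}
recalled = recall_top_k(user, items, k=2)
```

In this example item "c" ranks first because its sharing score is high even though its playing score is lower than that of item "a".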

In the above scheme,

the information processing module is configured to determine a third dot product value based on the user playing high-order feature vector, the user playing low-order feature vector, the multimedia information playing high-order feature vector, the multimedia information playing low-order feature vector, the user sharing high-order feature vector, the user sharing low-order feature vector, the multimedia information sharing high-order feature vector, and the multimedia information sharing low-order feature vector;

and the information processing module is used for determining the multimedia information to be recalled according to the third dot product value.
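
One possible reading of the third dot product value, offered here only as an assumption, is a single dot product over the concatenation of the playing and sharing vectors on each side. The sketch below shows that, under this reading, the single dot product equals the sum of the separate playing and sharing dot products. The vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def third_dot_product(user, item):
    """Dot product over the concatenated play and share vectors (assumed reading)."""
    user_vec = user["play"] + user["share"]
    item_vec = item["play"] + item["share"]
    return dot(user_vec, item_vec)

user = {"play": [1.0, 2.0], "share": [0.5, 0.0]}
item = {"play": [3.0, 1.0], "share": [2.0, 4.0]}
single = third_dot_product(user, item)
summed = dot(user["play"], item["play"]) + dot(user["share"], item["share"])
```

A practical consequence of this form is that each item can be stored as one concatenated vector, so a single vector index can serve both targets at recall time.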

In the above scheme,

the information processing module is used for carrying out data screening processing on the multimedia information to be recommended and analyzing to obtain a title and a label of the multimedia information to be recommended;

the information processing module is used for triggering a target word segmentation library and performing word segmentation processing on the title and the label of the multimedia information to be recommended through the target word segmentation library to obtain word-level multimedia information to be recommended;

the information processing module is used for vectorizing the word-level multimedia information to be recommended through a text information processing network in the multimedia information recommendation model to form a multi-dimensional word-level title characteristic vector and a multi-dimensional word-level label characteristic vector of the multimedia information to be recommended.
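
The segmentation and vectorization steps can be sketched as follows. This is a simplified stand-in rather than the method of the source: whitespace splitting replaces the target word segmentation library, and a hashed bag-of-words count replaces the text information processing network. The function names and the vector dimension are hypothetical.

```python
import hashlib

def segment(text):
    """Stand-in for a word segmentation library: lowercase and split on whitespace."""
    return text.lower().split()

def vectorize(words, dim=8):
    """Hash each word into one of `dim` buckets and count occurrences."""
    vec = [0.0] * dim
    for word in words:
        # md5 gives a hash that is stable across runs, unlike the built-in hash().
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

title_words = segment("Funny cat compilation")
tag_words = segment("cat pets")
title_vector = vectorize(title_words)  # word-level title feature vector
tag_vector = vectorize(tag_words)      # word-level label feature vector
```

A learned embedding network would replace `vectorize` in practice; the sketch only shows how word-level tokens become fixed-length vectors.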

In the above scheme,

the information processing module is used for acquiring historical browsing information of a target user;

the information processing module is used for determining the exposure history of the multimedia information corresponding to the historical browsing information based on the historical browsing information of the target user;

and the information processing module is used for dynamically adjusting the playing strategy of the multimedia information based on the multimedia information exposure history corresponding to the history browsing information.
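
As an illustration of adjusting the playing strategy from the exposure history, the sketch below drops candidates that have already been exposed to the user a given number of times. The limit `max_exposures` and the representation of the browsing history as a list of item identifiers are assumptions.

```python
def filter_exposed(candidates, browsing_history, max_exposures=1):
    """Drop candidates that have already been shown to the user too often."""
    exposure_counts = {}
    for item_id in browsing_history:
        exposure_counts[item_id] = exposure_counts.get(item_id, 0) + 1
    return [c for c in candidates if exposure_counts.get(c, 0) < max_exposures]

kept = filter_exposed(["v1", "v2", "v3"], browsing_history=["v2", "v2", "v4"])
```

Raising `max_exposures` allows an item to be shown again, which may suit environments where repeated exposure is acceptable.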

In the above scheme,

the information processing module is used for determining the type of the multimedia information recommendation environment;

the information processing module is used for determining the category of the multimedia information to be played according to the type of the multimedia information recommendation environment;

the information processing module is used for responding to the category of the multimedia information to be played and triggering the matched multimedia information data source so as to adjust the multimedia information to be played through the multimedia information data source matched with the category of the multimedia information to be played.

In the above scheme,

the information processing module is used for determining historical parameters of the multimedia information to be recommended according to the type of a multimedia information recommendation environment in which the multimedia information to be recommended is located;

the information processing module is used for determining a training sample set matched with the multimedia information recommendation model based on the historical parameters of the multimedia information to be recommended, wherein the training sample set comprises at least one group of training samples;

the information processing module is used for extracting a training sample set matched with the training samples through the noise threshold matched with the multimedia information recommendation model;

and the information processing module is used for training the multimedia information recommendation model according to the training sample set matched with the training samples.

In the above scheme,

the information processing module is used for determining a multitask loss function matched with the multimedia information recommendation model;

the information processing module is used for adjusting parameters of a hybrid expert network and network parameters of a characteristic weight adjusting network in the multimedia information recommendation model based on the multitask loss function until loss functions of different dimensions corresponding to the multimedia information recommendation model reach corresponding convergence conditions; and the parameters of the multimedia information recommendation model are matched with the multimedia information recommendation environment.
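
The source does not give the form of the multitask loss function. The sketch below assumes a weighted sum of per-task binary cross entropy losses for the playing and sharing targets, and treats a per-task loss threshold as the convergence condition for each dimension. The task weights and thresholds are hypothetical.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    losses = [-(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
              for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)

def multitask_loss(labels, probs, task_weights):
    """Return the weighted total loss and the individual per-task losses."""
    per_task = {t: binary_cross_entropy(labels[t], probs[t]) for t in task_weights}
    total = sum(task_weights[t] * per_task[t] for t in task_weights)
    return total, per_task

def converged(per_task, thresholds):
    """Every task must reach its own convergence condition."""
    return all(per_task[t] <= thresholds[t] for t in thresholds)

labels = {"play": [1, 0, 1], "share": [0, 0, 1]}
probs = {"play": [0.9, 0.2, 0.8], "share": [0.1, 0.3, 0.6]}
total, per_task = multitask_loss(labels, probs, {"play": 1.0, "share": 0.5})
done = converged(per_task, {"play": 0.3, "share": 0.3})
```

Here the playing loss is already below its threshold while the sharing loss is not, so training would continue until both dimensions converge.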

In the above scheme,

the information processing module is used for determining a dynamic noise threshold value matched with the use environment of the multimedia information recommendation model when the multimedia information recommendation environment of the multimedia information to be recommended is short video recommendation;

the information processing module is used for carrying out noise removal processing on the first training sample set according to the dynamic noise threshold value so as to form a second training sample set matched with the dynamic noise threshold value;

the information processing module is used for determining a fixed noise threshold corresponding to a multimedia information recommendation model when the multimedia information recommendation environment in which the multimedia information to be recommended is located is played in an instant messaging client, and performing noise removal processing on the first training sample set according to the fixed noise threshold to form a second training sample set matched with the fixed noise threshold.
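
The choice between a dynamic and a fixed noise threshold can be sketched as below. Several details are assumptions made only for illustration: noise is measured by a hypothetical `watch_ratio` field, the dynamic threshold is taken as the median of the current sample set, and the fixed threshold is an arbitrary constant. The source does not define how either threshold is computed.

```python
import statistics

FIXED_THRESHOLD = 0.3  # hypothetical constant for the instant messaging client

def noise_threshold(samples, environment):
    if environment == "short_video":
        # Dynamic threshold: here, the median watch ratio of the current sample set.
        return statistics.median(s["watch_ratio"] for s in samples)
    if environment == "instant_messaging":
        return FIXED_THRESHOLD
    raise ValueError(f"unknown environment: {environment}")

def denoise(samples, environment):
    """Build the second training sample set from the first one."""
    threshold = noise_threshold(samples, environment)
    return [s for s in samples if s["watch_ratio"] >= threshold]

first_set = [
    {"item_id": "v1", "watch_ratio": 0.05},
    {"item_id": "v2", "watch_ratio": 0.20},
    {"item_id": "v3", "watch_ratio": 0.90},
]
short_video_set = denoise(first_set, "short_video")
messaging_set = denoise(first_set, "instant_messaging")
```

With these sample values, the dynamic threshold adapts to the data and keeps two samples, while the fixed threshold keeps only one.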

An embodiment of the present invention further provides an electronic device, where the electronic device includes:

a memory for storing executable instructions;

and the processor is used for realizing the multimedia information recommendation method when the executable instructions stored in the memory are operated.

The embodiment of the present invention further provides a computer-readable storage medium, which stores executable instructions, and when the executable instructions are executed by a processor, the method for recommending multimedia information is implemented.

The embodiment of the invention has the following beneficial effects:

the method comprises the steps of obtaining historical data of a target object in a multimedia information recommendation environment; determining a first input characteristic of a multimedia information recommendation model based on the historical data of the target object; performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object; acquiring multimedia information to be recommended in a multimedia information data source; determining a second input characteristic of the multimedia information recommendation model based on the multimedia information to be recommended; performing feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended; and adjusting the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector. Therefore, the multimedia information recommendation model can recommend the multimedia information in the use environment to different users, the accuracy and relevance of the multimedia information recommendation are enhanced, the quality of the recommendation of the multimedia information is effectively improved, and the use experience of the users is improved.

Drawings

Fig. 1 is a schematic view of a usage scenario of a multimedia information recommendation method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a multimedia information recommendation device according to an embodiment of the present invention;

Fig. 3 is an optional schematic flow chart of the multimedia information recommendation method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a hybrid expert network according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a feature weight adjustment network according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the processing procedure for the user playing feature vector and the user sharing feature vector according to an embodiment of the present invention;

Fig. 7 is an optional schematic flow chart of the multimedia information recommendation method according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of an optional multimedia information recommendation according to an embodiment of the present invention;

Fig. 9 is an optional schematic flow chart of the multimedia information recommendation method according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of an application environment of a training method for a multimedia information recommendation model according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of data processing performed by the multimedia information recommendation model applied to an instant messaging client according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments that can be obtained by a person skilled in the art without making creative efforts fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.

1) In response to: indicates the condition or state on which a performed operation depends. When the dependent condition or state is satisfied, the one or more operations may be performed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which the operations are performed.

2) Based on: indicates the condition or state on which a performed operation depends. When the dependent condition or state is satisfied, the one or more operations may be performed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which the operations are performed.

3) Model training: performing multi-class learning on an image data set. The model can be built with a deep learning framework such as TensorFlow or Torch, and a multi-class model is formed by stacking multiple neural network layers such as CNN layers. The input of the model is a three-channel or original-channel matrix obtained by reading an image with a tool such as OpenCV; the output of the model is a multi-class probability, and the webpage category is finally output through an algorithm such as softmax. During training, the model is driven toward the correct result by an objective function such as cross entropy.

4) Neural Network (NN): an Artificial Neural Network (ANN), called a neural network for short, is a mathematical or computational model in the fields of machine learning and cognitive science that imitates the structure and function of a biological neural network (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions.

5) Multi-task learning: in the field of machine learning, several related tasks are learned and optimized jointly to achieve better model accuracy than a single task, and the tasks help one another by sharing a representation layer. This training method is called multi-task learning, also known as joint learning.

6) Recommendation accuracy: the effect that the recommended multimedia information content achieves within a period of time, measured by the user's degree of interest in the video content. Accuracy plays an important role in online metrics on the terminal side such as user retention, clicks and click-through rate (CTR).

7) Context information: time, location, network status, etc. of the user's access to the recommendation system.

8) MMoE (Multi-gate Mixture-of-Experts): a model that applies the Mixture-of-Experts (MoE) structure to multi-task learning and explicitly models the relationships between tasks. It learns multiple gating networks to balance how the shared experts are used by different tasks, and automatically allocates parameters to capture both the information common to the tasks and the differences between them, without adding a large number of new parameters.

9) ESMM (Entire Space Multi-task Model): a model that borrows the idea of multi-task learning (transfer learning). It introduces two auxiliary tasks that fit pCTR and pCTCVR respectively and treats pCVR as an intermediate variable, thereby alleviating the problems of sample selection bias (SSB), sparse training data and delayed feedback.

10) Attention: a mechanism that helps the model assign different weights to each part of the input and extract the more critical and important information, so that the model makes more accurate judgments without adding significant computation or storage overhead.

11) Multi-target recall: multiple targets are considered in a single recall model. A recommendation system often needs to optimize several business targets at the same time in order to obtain greater business benefit. For example, in an e-commerce scenario, the click rate and the conversion rate are expected to be optimized together so that the platform performs better against its targets; in an information flow scenario, in addition to raising the user click rate, it is desirable to increase behaviors such as following, liking and commenting and to build a better community atmosphere, thereby improving retention.

12) Softmax: a very common and important function in machine learning, widely used especially in multi-class scenarios. It maps a set of inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1.

13) Tag: a keyword marker, that is, a group of words representing the core content of a document, extracted from the body text and the title of an article.
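
Two of the terms above can be made concrete with a short sketch: the softmax function, and the ESMM relation in which the probability of a click followed by a conversion (pCTCVR) is the product of pCTR and pCVR. The function names and input values are chosen for illustration only.

```python
import math

def softmax(scores):
    """Map scores to values in (0, 1) that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def esmm_pctcvr(pctr, pcvr):
    """ESMM relation: pCTCVR = pCTR * pCVR, with pCVR as the intermediate variable."""
    return pctr * pcvr

probs = softmax([2.0, 1.0, 0.1])
pctcvr = esmm_pctcvr(pctr=0.2, pcvr=0.1)
```

Because pCTR and pCTCVR are both defined over all exposed samples, pCVR can be learned indirectly without training only on clicked samples.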

The embodiment of the present invention may be implemented in combination with cloud technology. Cloud technology is a hosting technology that unifies resources such as hardware, software and networks within a wide area network or a local area network to implement the calculation, storage, processing and sharing of data. It can also be understood as a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied on the basis of the cloud computing business model. The background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources, so cloud technology needs to be supported by cloud computing.

It should be noted that cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services as required. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear to be infinitely expandable, available at any time, used on demand and paid for according to use. A basic capability provider of cloud computing establishes a cloud computing resource pool platform, referred to as Infrastructure as a Service (IaaS) for short, and deploys multiple types of virtual resources in the resource pool for external clients to select and use. The cloud computing resource pool mainly comprises computing devices (which may be virtualized machines, including an operating system), storage devices and network devices.

Fig. 1 is a schematic view of a usage scenario of a multimedia information recommendation method according to an embodiment of the present invention. Referring to Fig. 1, the terminals (including the terminal 10-1 and the terminal 10-2) are provided with corresponding clients capable of playing embedded multimedia information. The terminals are connected to a server 200 through a network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and data transmission is implemented using wireless links. The multimedia information includes, but is not limited to, video, pictures, GIF animation, and advertisement information. The types of multimedia information that the terminals obtain from the corresponding server 200 through the network 300 may be the same or different. For example, a terminal may obtain a video advertisement delivered by an advertiser from the corresponding server 200 through the network 300, or may obtain an image advertisement delivered by an advertiser; the specific type is not limited in this application. Different multimedia information may be stored in the server 200, and multimedia information serving as an advertisement may be content in different dynamic formats, such as gif, mp4 and mov.

While the terminal (terminal 10-1 and/or terminal 10-2) acquires, from the server 200 through the network 300, the corresponding service with embedded multimedia information and displays it, the user may perform different operations on the multimedia information presented in the multimedia information playing window of the terminal, generating different user behaviors. For example, when the multimedia information is a video advertisement, the user may share and/or like the exposed short video while viewing the information, or may click on the short video. When the multimedia information is a dynamic GIF advertisement, while the advertisement is exposed through the terminal (terminal 10-1 and/or terminal 10-2), the user may forward and/or comment on the advertisement, and may jump from the GIF advertisement to a corresponding product purchase link page.

As an example, when determining which multimedia information to recommend to the user's terminal 10-1 or 10-2 for playing, the server 200 needs to adjust the multimedia information to be played in time, for example by replacing any item in the set of multimedia information to be played, so as to meet the viewing requirements of different target users. Taking short video information as an example, the multimedia information recommendation model provided by the present invention may be applied to short video playing. Short video playing usually processes short video information from different data sources, and finally presents, on a user interface (UI), the information corresponding to the short videos and the videos to be recommended produced by the short video recommendation process; the accuracy and timeliness of the features of this information directly affect the user experience. A background database for video playing receives a large amount of video data from different sources every day, and the information obtained for recommendation to a target user can also be called by other application programs (for example, a recommendation result of a short video recommendation process may be migrated to a long video recommendation process or a news recommendation process). Likewise, a multimedia information recommendation model matched with a given target user can be migrated to different video recommendation processes (for example, a web video recommendation process, an applet video recommendation process, or a video recommendation process of a long video client).

As an example, the server 200 is configured to deploy a corresponding multimedia information recommendation model to implement the multimedia information recommendation method provided by the present invention, or to deploy a multimedia information recommendation apparatus to implement the method. Specifically, this is done by: obtaining historical data of a target object in a multimedia information recommendation environment; determining a first input feature of a multimedia information recommendation model based on the historical data of the target object; performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object; acquiring multimedia information to be recommended from a multimedia information data source; determining a second input feature of the multimedia information recommendation model based on the multimedia information to be recommended; performing feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended; and, based on the user playing feature vector, the user sharing feature vector, the multimedia information playing feature vector and the multimedia information sharing feature vector, adjusting a recall strategy of multimedia information, recommending the multimedia information to be recommended to the target user, and displaying the multimedia information to be recommended matched with the target user through a terminal (terminal 10-1 and/or terminal 10-2).
Taking short multimedia information as an example, the multimedia information recommendation model provided by the invention can be applied to short video playing. Short video playing usually processes short multimedia information from different data sources, and finally presents, on a user interface (UI), the corresponding multimedia information and the multimedia information to be recommended produced by the short video recommendation process; the accuracy and timeliness of the features of this multimedia information directly affect the user experience. A background database for video playing receives a large amount of multimedia information data from different sources every day, and the multimedia information obtained for recommendation to a target user can also be called by other application programs (for example, a recommendation result of a short video recommendation process may be migrated to a recommendation process in an instant messaging client or to a news recommendation process). Likewise, a multimedia information recommendation model matched with a given target user can be migrated to different video recommendation processes (for example, a web video recommendation process, an applet video recommendation process, or a video recommendation process in an instant messaging client).

The multimedia information recommendation method provided by the embodiment of the application is implemented based on Artificial Intelligence (AI). Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

In the embodiment of the present application, the artificial intelligence software technologies mainly involved include the above-mentioned speech processing technology and machine learning. For example, Automatic Speech Recognition (ASR) within speech technology may be involved, including speech signal preprocessing, speech signal frequency domain analysis, speech signal feature extraction, speech signal feature matching/recognition, speech training, and the like.

For example, Machine Learning (ML) may be involved. Machine learning is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and so on. It specifically studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach to making computers intelligent, and is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as Deep Learning, which includes artificial neural networks such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Deep Neural Networks (DNN), and the like.

It can be understood that the multimedia information recommendation method and the speech processing provided by the present application can be applied to an intelligent device. The intelligent device can be any device with an information display function, for example, an intelligent terminal, a smart home device (such as a smart speaker or a smart washing machine), a smart wearable device (such as a smart watch), a vehicle-mounted intelligent central control system (which displays multimedia information to a user through applets executing different tasks), or an AI medical device (which presents treatment cases by displaying multimedia information), and the like.

As will be described in detail below, the multimedia information recommendation apparatus according to an embodiment of the present invention may be implemented in various forms, such as a dedicated terminal with a multimedia information recommendation processing function, or a server with a processing function of the multimedia information recommendation apparatus, such as the server 200 in fig. 1. Fig. 2 is a schematic diagram of a composition structure of a multimedia information recommendation device according to an embodiment of the present invention, and it can be understood that fig. 2 only shows an exemplary structure of the multimedia information recommendation device, and not a whole structure thereof, and a part of or the whole structure shown in fig. 2 may be implemented as needed.

The multimedia information recommendation device provided by the embodiment of the invention comprises: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The various components of the multimedia information recommendation device are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable connection and communication among these components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as bus system 205 in fig. 2.

The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.

It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.

In some embodiments, the multimedia information recommendation apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software. As an example, the multimedia information recommendation apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the training method of the multimedia information recommendation model provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

As an example of the multimedia information recommendation apparatus provided by the embodiment of the present invention implemented by combining software and hardware, the multimedia information recommendation apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201, where the software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads executable instructions included in the software modules in the memory 202, and completes the training method of the multimedia information recommendation model provided by the embodiment of the present invention in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).

By way of example, the processor 201 may be an integrated circuit chip having signal processing capabilities, such as a general purpose processor, a Digital Signal Processor (DSP), or another programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose processor may be a microprocessor or any conventional processor.

As an example of the multimedia information recommendation apparatus provided in the embodiment of the present invention being implemented by hardware, the apparatus may be implemented directly using a processor 201 in the form of a hardware decoding processor; for example, the training method of the multimedia information recommendation model provided in the embodiment of the present invention may be executed by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the multimedia information recommendation apparatus. Examples of such data include any executable instructions for operating on the multimedia information recommendation apparatus; a program implementing the training method of the multimedia information recommendation model according to the embodiments of the present invention may be included in these executable instructions.

In other embodiments, the multimedia information recommendation apparatus provided in the embodiments of the present invention may be implemented in software, and fig. 2 illustrates the multimedia information recommendation apparatus stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, and includes a series of modules, as examples of the programs stored in the memory 202, the multimedia information recommendation apparatus may include the following software modules:

an information transmission module 2081 and an information processing module 2082. When the software modules in the multimedia information recommendation device are read into the RAM by the processor 201 and executed, the method for training the multimedia information recommendation model provided by the embodiment of the invention is implemented, wherein the functions of each software module in the multimedia information recommendation device include:

the information transmission module 2081 is used for acquiring historical data of a target object in a multimedia information recommendation environment.

The information processing module 2082 is configured to determine a first input feature of the multimedia information recommendation model based on the historical data of the target object.

The information processing module 2082 is configured to perform feature fusion processing on the first input feature through the multimedia information recommendation model, so as to obtain a user playing feature vector and a user sharing feature vector that are matched with the target object.

The information transmission module 2081 is used for acquiring multimedia information to be recommended from a multimedia information data source.

The information processing module 2082 is configured to determine a second input feature of the multimedia information recommendation model based on the multimedia information to be recommended.

The information processing module 2082 is configured to perform feature fusion processing on the second input feature through the multimedia information recommendation model, so as to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector that are matched with the multimedia information to be recommended.

The information processing module 2082 is configured to adjust a recall policy of the multimedia information based on the user playing feature vector, the user sharing feature vector, the multimedia information playing feature vector, and the multimedia information sharing feature vector.

The information processing module 2082 is configured to determine a target user matched with to-be-recommended multimedia information in a multimedia information data source based on a feature classification processing result, and recommend the to-be-recommended multimedia information to the target user.

According to the electronic device shown in fig. 2, in one aspect of the present application, the present application further provides a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes different embodiments and combinations of embodiments provided in various alternative implementations of the multimedia information recommendation method.

Referring to fig. 3, fig. 3 is an optional flowchart of the multimedia information recommendation method provided by the embodiment of the present invention, and it can be understood that the steps shown in fig. 3 may be executed by various electronic devices operating the multimedia information recommendation apparatus, such as a dedicated terminal with a multimedia information recommendation apparatus, a server, or a server cluster, where the dedicated terminal with a multimedia information recommendation apparatus may be the electronic device with a multimedia information recommendation apparatus in the embodiment shown in the foregoing fig. 2. The following is a description of the steps shown in fig. 3.

Step 301: the multimedia information recommendation device receives a multimedia information recommendation request sent by a terminal.

Step 302: the multimedia information recommendation device responds to the multimedia information recommendation request, and obtains historical data of the target object in the multimedia information recommendation environment.

In some embodiments of the present invention, the various types of behaviors of the user matched with the corresponding client may be collected through different program components, and the useful fields may be extracted from the original log of user behavior data, for example the device number (or user account) of the user, the type of the multimedia information, the browsing duration of the multimedia information, and a browsing completeness parameter of the multimedia information. The historical click behaviors of users and the browsing durations of the corresponding information are recorded through a subscription service and stored in Redis; when a user request arrives, the online recommendation system pulls the historical click behaviors of the corresponding user and determines the historical data of the target object.
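The extraction and lookup described above can be sketched as follows. This is a minimal illustration only: the log field names, the `HistoryStore` class and the completeness ratio are assumptions introduced here, and an in-memory dictionary stands in for Redis so that the example runs without a server.

```python
from collections import defaultdict

# Hypothetical raw log records; every field name here is an illustrative assumption.
RAW_LOGS = [
    {"device_id": "u1", "item_id": "v10", "media_type": "video", "behavior": "play",
     "browse_seconds": 42.0, "item_seconds": 60.0},
    {"device_id": "u1", "item_id": "v11", "media_type": "gif", "behavior": "share",
     "browse_seconds": 5.0, "item_seconds": 5.0},
    {"device_id": "u2", "item_id": "v10", "media_type": "video", "behavior": "play",
     "browse_seconds": 6.0, "item_seconds": 60.0},
]


def extract_record(log):
    """Keep the fields named in the text and derive a browsing completeness ratio."""
    completeness = min(1.0, log["browse_seconds"] / log["item_seconds"])
    return {
        "item_id": log["item_id"],
        "media_type": log["media_type"],
        "behavior": log["behavior"],
        "browse_seconds": log["browse_seconds"],
        "completeness": completeness,
    }


class HistoryStore:
    """In-memory stand-in for the key-value store (Redis in the text)."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def append(self, log):
        # Records are keyed by device number, in arrival order.
        self._by_user[log["device_id"]].append(extract_record(log))

    def pull(self, device_id, limit=50):
        """Return at most the `limit` most recent records for one user."""
        return self._by_user.get(device_id, [])[-limit:]


store = HistoryStore()
for log in RAW_LOGS:
    store.append(log)

# What the online system would pull when a request from user "u1" arrives.
history = store.pull("u1")
```

In a deployment the `append` and `pull` calls would be backed by the actual store; the cap on history length (`limit`) is likewise an assumption.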

Step 303: the multimedia information recommendation device determines a first input feature of a multimedia information recommendation model based on the historical data of the target object.

In some embodiments of the present invention, determining the first input feature of the multimedia information recommendation model based on the historical data of the target object may be implemented by:

extracting the sub-information of the playing type included in the historical data of the target object, and determining a first playing identification feature, a first playing label feature and a first playing category feature which are matched with the target object; and extracting the sharing type sub-information included in the historical data of the target object, and determining a first sharing identification feature, a first sharing label feature and a first sharing category feature which are matched with the target object. When in use, the multimedia information recommendation model can make recommendations based on implicit feedback from the user, and the user's satisfaction with a recommendation result generally depends on several indexes. For example, satisfaction with item recommendation in an online mall is evaluated through indexes based on behaviors such as clicking, browsing depth (dwell time), collecting, purchasing, repeat purchasing, and positive reviews. When the play rate and the share rate are taken as the targets of the multimedia information recommendation model, the play ID, the play tag, the play category, the share ID, the share tag and the share category can be taken as the input of the network, and the feature vectors are adjusted using context information.
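As a rough illustration of how the six input fields could be assembled, the sketch below groups history records by behavior type. The record layout (`behavior`, `item_id`, `tags`, `category`) and the function name `build_user_input` are assumptions made for this example, not part of the described method.

```python
def build_user_input(history):
    """Group a user's history into the six feature fields named in the text."""
    fields = {
        "play_id": [], "play_tag": [], "play_category": [],
        "share_id": [], "share_tag": [], "share_category": [],
    }
    for rec in history:
        prefix = rec["behavior"]
        if prefix not in ("play", "share"):
            continue  # other behaviors are ignored in this sketch
        fields[prefix + "_id"].append(rec["item_id"])
        fields[prefix + "_tag"].extend(rec["tags"])
        fields[prefix + "_category"].append(rec["category"])
    return fields


# Hypothetical history records; "tags" and "category" are assumed fields.
history = [
    {"behavior": "play", "item_id": "v10", "tags": ["football", "goal"], "category": "sports"},
    {"behavior": "play", "item_id": "v12", "tags": ["recipe"], "category": "food"},
    {"behavior": "share", "item_id": "v10", "tags": ["football", "goal"], "category": "sports"},
    {"behavior": "like", "item_id": "v13", "tags": ["cat"], "category": "pets"},
]

first_input = build_user_input(history)
```

Each field would then typically be mapped to embeddings and pooled before entering the network; that step is omitted here.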

Step 304: the multimedia information recommendation device performs feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object.

In some embodiments of the present invention, the multimedia information recommendation model is used to perform feature fusion processing on the first input feature to obtain a user playing feature vector and a user sharing feature vector that are matched with the target object, and the method may be implemented in the following manner:

performing feature fusion processing on the first input feature through a hybrid expert network in the multimedia information recommendation model to obtain a user playing high-order feature vector and a user sharing high-order feature vector; weighting the first input feature through a feature weight adjusting network in the multimedia information recommendation model to obtain a user playing low-order feature vector and a user sharing low-order feature vector corresponding to the target object; splicing the user playing high-order feature vector and the user playing low-order feature vector to obtain the user playing feature vector; and splicing the user sharing high-order feature vector and the user sharing low-order feature vector to obtain the user sharing feature vector. Referring to fig. 4, fig. 4 is a schematic structural diagram of the hybrid expert network in an embodiment of the present invention. According to usage requirements, the hybrid expert network in the multimedia information recommendation model may set up a plurality of different expert subnetworks on top of the bottom pooled embedding layer, and different targets may select expert networks as required. The input of each target (each target being a separate MLP tower) is obtained by a weighted summation of the expert network outputs, with the weights produced by a gated recurrent unit network. The gated recurrent unit network has the same input as the expert networks. When multimedia information is processed, in order to make full use of the context information of the target user, additional context features can be added to the input of the gated recurrent unit network, and the expert weights corresponding to each task are obtained through a linear transformation followed by softmax.
An input that integrates the target and the context features strengthens the network's learning of each target and helps relieve conflicts among different targets. Specifically, the context features matched with the target object can be determined according to the historical data of the target object; the user playing feature vector is adjusted using the context features through the hybrid expert network in the multimedia information recommendation model; and the user sharing feature vector is adjusted using the context features through the hybrid expert network in the multimedia information recommendation model.
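The weighted combination of expert outputs can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: each expert is a single linear layer with ReLU, the gate is a linear map over the concatenated input and context followed by softmax (the text describes the gate as a gated recurrent unit network, which is not reproduced here), and all dimensions and random weights are arbitrary.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


# Assumed sizes: input, context, expert output (d1), number of experts.
IN_DIM, CTX_DIM, D1, N_EXPERTS = 8, 4, 6, 3
TASKS = ("play", "share")

experts = [rand_matrix(D1, IN_DIM) for _ in range(N_EXPERTS)]
# One gate per task; its input is the shared input plus the context features.
gates = {t: rand_matrix(N_EXPERTS, IN_DIM + CTX_DIM) for t in TASKS}


def mmoe_forward(x, context):
    """Return one d1-dimensional high-order vector per task."""
    expert_out = [relu(matvec(w, x)) for w in experts]
    result = {}
    for task in TASKS:
        weights = softmax(matvec(gates[task], x + context))
        result[task] = [
            sum(weights[e] * expert_out[e][d] for e in range(N_EXPERTS))
            for d in range(D1)
        ]
    return result


x = [random.random() for _ in range(IN_DIM)]
ctx = [random.random() for _ in range(CTX_DIM)]
high_order = mmoe_forward(x, ctx)
```

Because each task has its own gate, the play and share targets can weight the same experts differently, which is the mechanism the text relies on to reduce conflicts between targets.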

Referring to fig. 5, fig. 5 is a schematic structural diagram of the feature weight adjusting network in an embodiment of the present invention. The feature weight adjusting network addresses the problem of feature combination under sparse data; its prediction complexity is linear, and it applies well to both continuous and discrete features.

An ordinary linear model considers each feature independently and does not take the correlation between features into account. In practice, however, features in the multimedia information recommendation environment may be correlated. Taking news recommendation as an example, male users generally watch more military news while female users tend to prefer emotional news; similarly, male users browse more sports goods while female users prefer to browse clothing. It can be seen that gender has a certain correlation with the news channel, and that the goods category is also related to the gender of the target user.

In order to optimize the process of multimedia information recommendation, only second-order feature crossing may be considered in the present application. The specific model is given by formula 1:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij}\, x_i x_j \tag{1}$$

where $n$ represents the number of features of a sample, $x_i$ is the value of the $i$-th feature, and $w_0$, $w_i$ and $w_{ij}$ are model parameters. A cross term is meaningful only when $x_i$ and $x_j$ are both non-zero. Under sparse data, however, very few samples have non-zero cross terms; when training samples are insufficient, the parameters $w_{ij}$ are easily trained inadequately and inaccurately, which ultimately degrades the model.

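The training problem of the cross-term parameters can then be solved approximately by matrix factorization; refer to formula 2:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j \tag{2}$$

where the parameters of the model are $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^{n}$ and $V \in \mathbb{R}^{n \times k}$, $v_i$ is the $i$-th row of $V$, and $\langle \cdot, \cdot \rangle$ is the inner product of two $k$-dimensional vectors; see formula 3:

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f} \tag{3}$$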

For any positive definite matrix $W$, as long as $k$ is large enough, there exists a matrix $V$ such that $W = VV^{T}$. In the case of sparse data, however, a smaller $k$ should be chosen, because there is not enough data to estimate $w_{ij}$. Limiting the size of $k$ improves the generalization capability of the model. The time complexity of computing formula 2 directly is $O(kn^{2})$, since all cross features need to be calculated. It can, however, be reduced to linear complexity $O(kn)$ by rearranging the cross term; refer to formula 4:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f}\, x_i \right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2} \right] \tag{4}$$
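The equivalence between the direct pairwise computation and the linear-time form referred to as formula 4 can be checked numerically. The sketch below is illustrative only; the sizes `n` and `k` and the random values are arbitrary assumptions.

```python
import random

random.seed(1)
n, k = 6, 3
# Sparse-looking input: many entries are zero.
x = [random.choice([0.0, 1.0, 2.0]) for _ in range(n)]
V = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(n)]


def pairwise_naive(x, V):
    """Sum over all pairs i < j of <v_i, v_j> x_i x_j, which costs O(k n^2)."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def pairwise_linear(x, V):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2], O(k n)."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - sq
    return 0.5 * total


naive = pairwise_naive(x, V)
fast = pairwise_linear(x, V)
```

The two results agree up to floating-point rounding; only the second form scales linearly in the number of features.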

As shown in fig. 4, the target context is the embedding of a context feature. An inner product is taken between this embedding and the embedding of each user-side feature; its mathematical expression is given by formula 5:

$$a_i = \langle e_{ctx}, e_i \rangle = \sum_{f=1}^{k} e_{ctx,f}\, e_{i,f} \tag{5}$$

where $e_{ctx}$ is the context embedding, $e_i$ is the embedding of the $i$-th user-side feature, and $a_i$ is the resulting weight. Under the current context, this gives the importance of each feature for the current target: the larger the inner product value, the more important the feature. Introducing the context feature therefore adds a strong prior to the training of the target.
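A small sketch of the context weighting described above follows. The feature names, the embedding values, and the choice to scale each feature embedding by its raw inner product (without any normalization) are assumptions made for illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def context_weights(context_emb, feature_embs):
    """Importance of each feature under the current context: one inner product each."""
    return {name: dot(context_emb, emb) for name, emb in feature_embs.items()}


def apply_weights(feature_embs, weights):
    """Scale every feature embedding by its context-derived weight."""
    return {name: [weights[name] * v for v in emb] for name, emb in feature_embs.items()}


# Hypothetical 2-dimensional embeddings.
context_emb = [1.0, 0.0]
feature_embs = {
    "play_tag": [2.0, 1.0],
    "share_tag": [0.5, 3.0],
}

weights = context_weights(context_emb, feature_embs)
weighted = apply_weights(feature_embs, weights)
```

With these values `play_tag` receives weight 2.0 and `share_tag` weight 0.5, so the first feature counts as more important under this context.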

Referring to fig. 6, fig. 6 is a schematic diagram of the process of obtaining the user playing feature vector and the user sharing feature vector in the embodiment of the present invention. The user feature is composed of two parts: 1) high-order features obtained from the hybrid expert network (MMoE): the high-order features output by the different expert networks are combined into the features of the different targets using the weights output by the gated recurrent unit network combined with the context features, and have dimension d1; 2) low-order features obtained from the feature weight adjusting network (FM): the original FM features are weighted by the context features to obtain features of dimension d2. Directly splicing the high-order and low-order features keeps information loss to a minimum, and finally yields a user sharing feature vector and a user playing feature vector, each of dimension d1 + d2.
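The splicing step amounts to a plain concatenation. In the sketch below the vectors are hypothetical placeholders with d1 = 3 and d2 = 2; in practice they would come from the hybrid expert network and the feature weight adjusting network respectively.

```python
def splice(high_order, low_order):
    """Concatenate a d1-dimensional and a d2-dimensional vector into d1 + d2 dimensions."""
    return list(high_order) + list(low_order)


# Placeholder outputs of the two sub-networks (assumed values).
user_play_high, user_play_low = [0.3, 0.8, 0.1], [1.2, -0.4]
user_share_high, user_share_low = [0.6, 0.2, 0.5], [0.9, 0.7]

user_play_vec = splice(user_play_high, user_play_low)
user_share_vec = splice(user_share_high, user_share_low)
```

Concatenation leaves both parts untouched, so neither the high-order nor the low-order signal is averaged away before the vectors are used.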

Step 305: the multimedia information recommending device acquires multimedia information to be recommended from a multimedia information data source.

Step 306: the multimedia information recommendation device determines a second input feature of the multimedia information recommendation model based on the multimedia information to be recommended.

In some embodiments of the present invention, determining the second input feature of the multimedia information recommendation model based on the multimedia information to be recommended may be implemented by:

extracting the sub-information of the playing type included in the multimedia information to be recommended, and determining a second playing identification feature, a second playing label feature and a second playing category feature which are matched with the multimedia information to be recommended; and extracting the sharing type sub-information included in the multimedia information to be recommended, and determining a second sharing identification feature, a second sharing label feature and a second sharing category feature which are matched with the multimedia information to be recommended.

Step 307: the multimedia information recommendation device performs feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended.

Referring to fig. 7, fig. 7 is an optional flowchart of the multimedia information recommendation method provided by the embodiment of the present invention, and it can be understood that the steps shown in fig. 7 may be executed by various electronic devices operating the multimedia information recommendation apparatus, for example, a dedicated terminal, a server, or a server cluster with the multimedia information recommendation apparatus, where the dedicated terminal with the multimedia information recommendation apparatus may be the electronic device with the multimedia information recommendation apparatus in the embodiment shown in the foregoing fig. 2. The following is a description of the steps shown in fig. 7.

Step 701: perform feature fusion processing on the second input features through the hybrid expert network in the multimedia information recommendation model to obtain a multimedia information playing high-order feature vector and a multimedia information sharing high-order feature vector.

Step 702: perform weighting processing on the second input features through the feature weight adjusting network in the multimedia information recommendation model to obtain a multimedia information playing low-order feature vector and a multimedia information sharing low-order feature vector corresponding to the multimedia information, wherein all weights in the feature weight adjusting network are fixed values.

Step 703: splice the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector to obtain the multimedia information playing feature vector.

Step 704: splice the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector to obtain the multimedia information sharing feature vector.
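The splicing in steps 703 and 704 can be illustrated with a minimal sketch. It assumes that the high-order vector (dimension d1) has already been produced by the hybrid expert network, and it replaces the feature weight adjusting network with a plain weighted sum of per-field embeddings (dimension d2) whose weights are all fixed to 1. The function names `low_order_vector` and `splice`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def low_order_vector(field_embeddings, weights=None):
    """Weighted sum of per-field embeddings; a simplified stand-in (assumption)
    for the feature weight adjusting network."""
    field_embeddings = np.asarray(field_embeddings, dtype=float)  # (num_fields, d2)
    if weights is None:
        # Item side: all feature weights are fixed values (here 1).
        weights = np.ones(field_embeddings.shape[0])
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * field_embeddings).sum(axis=0)

def splice(high_order, low_order):
    """Concatenate the high-order and low-order vectors into one feature vector."""
    return np.concatenate([high_order, low_order])

# Hypothetical values: d1 = 3 (high-order), d2 = 2 (low-order), three feature fields.
play_high = np.array([0.2, -0.1, 0.4])
fields = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])

play_low = low_order_vector(fields)
item_play_vector = splice(play_high, play_low)  # dimension d1 + d2 = 5
```

The same two calls, applied to the sharing-side inputs, would give the multimedia information sharing feature vector.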

It should be noted that the process of acquiring the feature vectors corresponding to the multimedia information (Item) differs from the process of acquiring the feature vectors corresponding to the target user in two respects. First, because the Item has no context feature, the extra input shown in fig. 2 is omitted from the gated recurrent unit network of the hybrid expert network MMoE. Second, when the feature weights of the FM are adjusted, all feature weights are set to 1. An Item sharing feature and an Item playing feature, each of dimension d1 + d2, are then obtained in the same way. In this process, the gated recurrent unit (GRU) network is a model that has fewer parameters than an LSTM and handles sequence information well; the fused features are then input into a feedforward neural network so that the useful information in the other features is also processed. The prediction of the plug-in behavior is treated as a problem of predicting an occurrence probability: a sigmoid function (logistic function) is used as the output layer, and the loss function is the standard cross-entropy loss, as given by formula 6.
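The image of formula 6 does not survive in this text. Assuming it takes the standard binary cross-entropy form for a sigmoid output layer, it can be written as below. The symbols N (number of training samples), y_i (observed label), z_i (output of the feedforward network before the sigmoid) and \hat{y}_i (predicted probability) are notation chosen here and are not taken from the original formula.

```latex
\hat{y}_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}, \qquad
L = -\frac{1}{N} \sum_{i=1}^{N} \Bigl[ y_i \log \hat{y}_i + (1 - y_i) \log\bigl(1 - \hat{y}_i\bigr) \Bigr]
```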

The GRU layer is used to extract deep features. It can also be omitted and replaced by several stacked feedforward neural network layers, which can likewise process and fuse the features effectively.

Step 308: the multimedia information recommendation device adjusts the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector, and carries out multimedia information recommendation through the adjusted recall strategy.

In some embodiments of the present invention, the adjustment of the recall policy of the multimedia information may be implemented by:

determining a first dot product value based on the user playing high-order feature vector and the user playing low-order feature vector in the user playing feature vector, and the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector in the multimedia information playing feature vector; determining a second dot product value based on the user sharing high-order feature vector and the user sharing low-order feature vector in the user sharing feature vector, and the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector in the multimedia information sharing feature vector; and determining the multimedia information to be recalled according to the sum of the first dot product value and the second dot product value. Specifically, the dot product of <user playing high-order feature vector, user playing low-order feature vector> and <multimedia information playing high-order feature vector, multimedia information playing low-order feature vector> is computed, and the top k1 pieces of multimedia information are taken as recall candidates. The dot product of <user sharing high-order feature vector, user sharing low-order feature vector> and <multimedia information sharing high-order feature vector, multimedia information sharing low-order feature vector> is computed, and the top k2 items are taken as recall candidates. The two inner product values of each recalled candidate are then added, and the top k candidates with the highest scores are taken as the final recall result.
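The two-channel recall described above can be sketched as follows. The sketch assumes that the spliced user and item vectors are already available as NumPy arrays and that the candidates of the two channels are merged by set union before the summed score is ranked; the merge rule, the function names and the sample values are assumptions made for illustration.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return np.argsort(-scores, kind="stable")[:k]

def two_channel_recall(user_play, user_share, item_play, item_share, k1, k2, k):
    play_scores = item_play @ user_play      # first dot product value, one per item
    share_scores = item_share @ user_share   # second dot product value, one per item
    # Candidates from the play channel (top k1) and the share channel (top k2).
    candidates = np.union1d(top_k(play_scores, k1), top_k(share_scores, k2))
    # Final score of each candidate is the sum of its two dot product values.
    total = play_scores[candidates] + share_scores[candidates]
    order = np.argsort(-total, kind="stable")[:k]
    return candidates[order], total[order]

# Hypothetical two-dimensional vectors for one user and four items.
user_play = np.array([1.0, 0.0])
user_share = np.array([0.0, 1.0])
item_play = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.0]])
item_share = np.array([[0.0, 0.1], [0.3, 0.9], [0.2, 0.4], [0.5, 0.7]])

recalled_ids, recalled_scores = two_channel_recall(
    user_play, user_share, item_play, item_share, k1=2, k2=2, k=2
)
```

With these values the play channel proposes items 0 and 2, the share channel proposes items 1 and 3, and the summed score ranks item 1 ahead of item 0.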

In some embodiments of the present invention, adjusting the recall policy of the multimedia information may be further implemented by:

determining a third dot product value based on the user playing high-order feature vector and the user playing low-order feature vector in the user playing feature vector, the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector in the multimedia information playing feature vector, the user sharing high-order feature vector and the user sharing low-order feature vector in the user sharing feature vector, and the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector in the multimedia information sharing feature vector; and taking the top k items according to the third dot product value as recall candidates to determine the multimedia information to be recalled.
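A minimal sketch of this single-score variant is given below, assuming that the third dot product value is computed between the concatenation of all user-side vectors and the concatenation of all item-side vectors. The function name and the sample values are hypothetical.

```python
import numpy as np

def single_score_recall(user_play, user_share, item_play, item_share, k):
    # Concatenate the play and share vectors on each side (assumed reading of the text).
    user_vec = np.concatenate([user_play, user_share])
    item_mat = np.concatenate([item_play, item_share], axis=1)
    scores = item_mat @ user_vec  # third dot product value, one per item
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]

# Hypothetical two-dimensional vectors for one user and four items.
user_play = np.array([1.0, 0.0])
user_share = np.array([0.0, 1.0])
item_play = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.0]])
item_share = np.array([[0.0, 0.1], [0.3, 0.9], [0.2, 0.4], [0.5, 0.7]])

top_ids, top_scores = single_score_recall(
    user_play, user_share, item_play, item_share, k=2
)
```

Because the dot product of two concatenated vectors equals the sum of the dot products of their parts, this score is numerically the sum of the play-side and share-side dot products. Under this reading, the difference from the two-channel variant is that no per-channel top k1 or top k2 pre-selection takes place.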

In some embodiments of the present invention, referring to fig. 8, fig. 8 is a schematic diagram of an optional multimedia information recommendation in an embodiment of the present invention, in which all tasks share one network structure, but the features extracted from different network layers correspond to different tasks. Generally speaking, the bottom layers of the model structure correspond to NLP tasks of lower complexity. In use, when a target resource includes different advertisements of the same advertiser, the time-sensitive short video advertisement information included in the different resource groups can be played in sequence in the time-sensitive short video playing window. When all time-sensitive short video playing areas in the display interface have been contracted by that advertiser, the advertisement information of the same advertiser can be presented cyclically in the time-sensitive short video playing window of the advertisement information display interface once playback of the advertisement information finishes. Likewise, when the advertiser's time-sensitive short video is a video advertisement, the video advertisement information of the same advertiser can be presented cyclically, and the audio volume carried by the video is adjusted step by step to the maximum to prompt the user to watch the video advertisement being played. Advertisement A is replaced by advertisement B, so that more playing traffic is allocated to advertisement B and the user obtains a better viewing experience. Specifically, when the playing strategy of the advertisement information is dynamically adjusted based on the traffic parameters and the iterative experiment parameters matched with that playing strategy, the exposure rate of the advertisement can be increased.

Taking fig. 8 as an example, when it is determined from the historical browsing information of male target user 1 that an advertisement sharing feature B has been used, the current advertisement may be replaced, by dynamically adjusting the playing policy, with other advertisement information (for example an advertisement or a short video) that includes feature B. When it is determined from the historical browsing information of female target user 2 that an advertisement for purchasing feature X has been clicked, advertisement A may be replaced, by dynamically adjusting the playing policy, with other advertisement information (for example an advertisement or a short video advertisement link) that includes feature X, so as to match the usage habits of the target user and give the user a better usage experience.

Referring to fig. 9, which is an optional flowchart of the multimedia information recommendation method provided by the embodiment of the present invention, it can be understood that the steps shown in fig. 9 may be executed by various electronic devices running the multimedia information recommendation apparatus, for example a dedicated terminal, a server, or a server cluster equipped with the multimedia information recommendation apparatus, where the dedicated terminal may be the electronic device with the multimedia information recommendation apparatus in the embodiment shown in fig. 2. The steps shown in fig. 9 are described below.

Step 901: determine the historical parameters of the multimedia information to be recommended according to the type of the multimedia information recommendation environment in which the multimedia information to be recommended is located.

Step 902: determine a training sample set matched with the multimedia information recommendation model based on the historical parameters of the multimedia information to be recommended.

Step 903: extract, from the training sample set, the training samples matched with the noise threshold of the multimedia information recommendation model.

Step 904: determine a multitask loss function matched with the multimedia information recommendation model.

Step 905: adjust the parameters of the hybrid expert network and the feature weights in the multimedia information recommendation model based on the multitask loss function, so as to adjust the network parameters of the model.
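The multitask loss of step 904 can be sketched as follows, assuming it is a weighted sum of one binary cross-entropy term for the playing task and one for the sharing task, each computed on a sigmoid output. The task weights `w_play` and `w_share`, the function names and the sample values are assumptions made for illustration; the text does not specify how the per-task losses are combined.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(labels, logits, eps=1e-12):
    """Mean binary cross-entropy of sigmoid(logits) against 0/1 labels."""
    labels = np.asarray(labels, dtype=float)
    p = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

def multitask_loss(play_labels, play_logits, share_labels, share_logits,
                   w_play=1.0, w_share=1.0):
    """Weighted sum of the per-task losses (the weighting is an assumption)."""
    return (w_play * binary_cross_entropy(play_labels, play_logits)
            + w_share * binary_cross_entropy(share_labels, share_logits))

# Hypothetical batch of three samples with uninformative (zero) logits.
loss = multitask_loss(
    play_labels=[1, 0, 1], play_logits=[0.0, 0.0, 0.0],
    share_labels=[0, 0, 1], share_logits=[0.0, 0.0, 0.0],
)
```

With zero logits every predicted probability is 0.5, so each task contributes ln 2 and the total is 2 ln 2. In step 905 a gradient-based optimizer would reduce this value by updating the hybrid expert network parameters and the feature weights.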

In this way, training continues until the loss functions of the different dimensions corresponding to the multimedia information recommendation model reach their respective convergence conditions, so that the parameters of the multimedia information recommendation model are adapted to the multimedia information recommendation environment. For example, when the usage environment of the multimedia information recommendation model is short video recommendation and different short videos are recommended to a user during short video playback, the short video playing interface may be displayed in a corresponding APP or may be triggered by an instant messaging client applet (after training, the multimedia information recommendation model may be packaged in the corresponding APP or stored in the instant messaging client applet in plug-in form). As short video application products continue to develop and multiply, and because the information-carrying capacity of multimedia information is far greater than that of text information, different types of short videos in the short video server can be recommended to the user without interruption through the corresponding application program. During this training process, in the usage environment where short video recommendation is triggered by the instant messaging client applet, the dynamic noise threshold matched with the usage environment of the multimedia information recommendation model needs to be smaller than the dynamic noise threshold used when recommendations are made directly to the user in a short video playing client.

In some embodiments of the present invention, when the multimedia information recommendation model is applied to a news information recommendation process, a fixed noise threshold corresponding to the news information recommendation process is determined, and a first training sample set is denoised according to the fixed noise threshold to form a second training sample set matched with the fixed noise threshold. When the multimedia information recommendation model is embedded in a corresponding hardware device (such as a news reading terminal, an electronic book terminal or a financial news terminal) and the usage environment is pushing different news information to a user through the news reading terminal or the electronic book terminal, using a fixed noise threshold for the multimedia information recommendation model can effectively increase the training speed of the model and reduce the waiting time of the user. In a usage environment with fixed noise, the training sample set can come from the historical data of a target user. The historical recommended multimedia information browsing data can be the viewing behavior data generated when multimedia information was previously recommended to the target user, and can be extracted from a historical browsing log. Here, the historical recommended multimedia information browsing data may be all such data; alternatively, the timeliness of the behavior data may be taken into account, and only the data within a preset time period may be included, for example the historical recommended multimedia information browsing data of the past week, or other different historical data.
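The selection of a second training sample set can be sketched as below. The text does not state how the noise of a sample is measured, so the sketch assumes that each log record carries a precomputed `noise_score` field and that records above the fixed threshold are dropped. The field names, the threshold value and the seven-day window are hypothetical.

```python
from datetime import datetime, timedelta

def select_training_samples(records, now, window_days=7, noise_threshold=0.5):
    """Keep records inside the preset time period whose (hypothetical)
    noise score does not exceed the fixed noise threshold."""
    cutoff = now - timedelta(days=window_days)
    return [
        r for r in records
        if r["timestamp"] >= cutoff and r["noise_score"] <= noise_threshold
    ]

# Hypothetical historical browsing log (first training sample set).
now = datetime(2021, 5, 31)
browsing_log = [
    {"item_id": "a", "timestamp": datetime(2021, 5, 30), "noise_score": 0.1},
    {"item_id": "b", "timestamp": datetime(2021, 5, 1), "noise_score": 0.1},   # outside the window
    {"item_id": "c", "timestamp": datetime(2021, 5, 29), "noise_score": 0.9},  # above the threshold
]

second_sample_set = select_training_samples(browsing_log, now)
```

Only record "a" survives both filters here. A dynamic threshold would correspond to choosing `noise_threshold` per usage environment rather than keeping it constant.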

The multimedia information recommendation method provided in the embodiment of the present invention is described below by taking a commodity spot advertisement information recommendation scene in a short video playing interface as an example. Fig. 10 is a schematic diagram of an application environment of the training method of the multimedia information recommendation model in the embodiment of the present invention. As shown in fig. 10, the commodity spot advertisement information playing interface may be displayed in a corresponding APP or triggered by an instant messaging client applet (after training, the multimedia information recommendation model may be packaged in the corresponding APP or stored in the instant messaging client applet in plug-in form, and the use environment is the recommendation of news information). Examples include a "see-one" portal included in the discovery page of an instant messaging client application, an audio recommendation portal of an audio application, a video recommendation portal of a video application, or a live recommendation portal of a live-streaming application. When the target terminal runs the target application according to a user operation and controls the target application to display an application page that includes a trigger entry for opening the recommended content display page, a trigger operation on the trigger entry can be detected. When the trigger operation on the trigger entry occurs, a recommendation request is sent to the server; after the recommended content fed back by the server in response to the recommendation request is received, the recommended content is displayed in the recommended content display page in the recommendation order.

Referring to fig. 11, fig. 11 is a schematic diagram of data processing of the multimedia information recommendation model applied to the instant messaging client according to the embodiment of the present invention, which specifically includes the following steps:

Step 1101: acquire historical data of a target object of the instant messaging client.

Step 1102: acquire the multimedia information to be recommended from a multimedia information data source of the instant messaging client.

Step 1103: obtain, through the multimedia information recommendation model, a user playing feature vector and a user sharing feature vector which are matched with the target object.

Step 1104: obtain, through the multimedia information recommendation model, a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended.

Step 1105: adjust the multimedia information recall strategy based on the user playing feature vector, the user sharing feature vector, the multimedia information playing feature vector and the multimedia information sharing feature vector.

Step 1106: recommend the multimedia information through the adjusted recall strategy.

Multimedia information recommendation mainly includes: 1) Recall logic: the context-aware multimedia information recommendation model is used to help recall candidate items for each user who has reading click behavior. The user's latest playing tag, playing ID, playing category, sharing tag, sharing ID, sharing category and context features are taken, and the candidate items with the highest scores are then computed according to the strategy described in the principle section and recalled for recommendation. 2) Coarse ranking logic: the context-aware multimedia information recommendation model can be used to help screen candidates and can also be used in the ranking stage. The embedding vector of each user and the embedding vectors of the candidates have already been computed in the recall stage; in the coarse ranking stage, the cosine similarity between each candidate's embedding vector and the user's embedding vector is computed, and the top-ranked items are taken as the recall result.
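The coarse ranking step can be sketched as follows, assuming that the user embedding and the candidate embeddings computed in the recall stage are available as NumPy arrays. The function name `coarse_rank`, the parameter `top_n` and the sample values are hypothetical.

```python
import numpy as np

def coarse_rank(user_embedding, candidate_embeddings, top_n):
    """Rank candidates by cosine similarity to the user embedding."""
    user = np.asarray(user_embedding, dtype=float)
    cands = np.asarray(candidate_embeddings, dtype=float)
    norms = np.linalg.norm(cands, axis=1) * np.linalg.norm(user)
    # Guard against division by zero for all-zero embeddings.
    sims = (cands @ user) / np.maximum(norms, 1e-12)
    order = np.argsort(-sims, kind="stable")[:top_n]
    return order, sims[order]

# Hypothetical embeddings: one user and three recalled candidates.
user_embedding = np.array([1.0, 0.0])
candidate_embeddings = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])

ranked_ids, ranked_sims = coarse_rank(user_embedding, candidate_embeddings, top_n=2)
```

Unlike the dot product used in the recall stage, cosine similarity ignores vector length, so candidate 0 scores 1.0 here even though its embedding is twice as long as the user's.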

The beneficial technical effects are as follows:

the method comprises the steps of obtaining historical data of a target object in a multimedia information recommendation environment; determining a first input feature of a multimedia information recommendation model based on the historical data of the target object; performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector which are matched with the target object; acquiring multimedia information to be recommended in a multimedia information data source; determining a second input characteristic of the multimedia information recommendation model based on the multimedia information to be recommended; performing feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended; and adjusting the multimedia information recall strategy based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector. Therefore, the multimedia information recommendation model can recommend the multimedia information in the use environment to different users, the accuracy and relevance of the multimedia information recommendation are enhanced, the quality of the recommendation of the multimedia information is effectively improved, and the use experience of the users is improved.

The above description is intended to be illustrative only, and should not be taken as limiting the scope of the invention, which is intended to include all such modifications, equivalents, and improvements as fall within the true spirit and scope of the invention.

Claims

1. A method for recommending multimedia information, the method comprising:

performing feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector which are matched with the multimedia information to be recommended;

and adjusting the recall strategy of the multimedia information based on the user playing characteristic vector, the user sharing characteristic vector, the multimedia information playing characteristic vector and the multimedia information sharing characteristic vector, and recommending the multimedia information through the adjusted recall strategy.

2. The method of claim 1, wherein the first input feature of the multimedia information recommendation model comprises: a first play identification feature, a first play label feature, a first play category feature, a first share identification feature, a first share label feature, a first share category feature;

the determining of the first input feature of the multimedia information recommendation model based on the historical data of the target object comprises:

extracting the sub-information of the playing type included in the historical data of the target object, and determining a first playing identification characteristic, a first playing label characteristic and a first playing category characteristic which are matched with the target object;

and extracting the sharing type sub-information included in the historical data of the target object, and determining a first sharing identification characteristic, a first sharing label characteristic and a first sharing category characteristic which are matched with the target object.

3. The method according to claim 1, wherein the performing feature fusion processing on the first input feature through the multimedia information recommendation model to obtain a user playing feature vector and a user sharing feature vector that are matched with the target object includes:

performing feature fusion processing on the first input feature through a hybrid expert network in the multimedia information recommendation model to obtain a user playing high-order feature vector and a user sharing high-order feature vector;

weighting the first input features through a feature weight adjusting network in the multimedia information recommendation model to obtain a user playing low-order feature vector and a user sharing low-order feature vector corresponding to the target object;

splicing the user playing high-order feature vector and the user playing low-order feature vector to obtain a user playing feature vector;

and splicing the user sharing high-order characteristic vector and the user sharing low-order characteristic vector to obtain a user sharing characteristic vector.

4. The method of claim 3, further comprising:

determining context characteristics matched with the target object according to the historical data of the target object;

adjusting the user playing feature vector by utilizing the context feature through a hybrid expert network in the multimedia information recommendation model;

and adjusting the user sharing feature vector by utilizing the context feature through a hybrid expert network in the multimedia information recommendation model.

5. The method of claim 3, further comprising:

determining the number of the hybrid expert networks according to the multimedia information recommendation environment;

and adjusting the structure of the hybrid expert network in the multimedia information recommendation model according to the number of the hybrid expert networks.

6. The method of claim 1, wherein the second input feature of the multimedia information recommendation model comprises: a second playing identification feature, a second playing label feature, a second playing category feature, a second sharing identification feature, a second sharing label feature and a second sharing category feature;

the determining of the second input feature of the multimedia information recommendation model based on the historical data of the target object comprises:

extracting the sub-information of the playing type included in the multimedia information to be recommended, and determining a second playing identification characteristic, a second playing label characteristic and a second playing category characteristic which are matched with the multimedia information to be recommended;

and extracting the sharing type sub-information included in the historical data of the target object, and determining a second sharing identification characteristic, a second sharing label characteristic and a second sharing category characteristic which are matched with the target object.

7. The method according to claim 1, wherein the performing feature fusion processing on the second input feature through the multimedia information recommendation model to obtain a multimedia information playing feature vector and a multimedia information sharing feature vector that are matched with the multimedia information to be recommended comprises:

performing feature fusion processing on the second input feature through a hybrid expert network in the multimedia information recommendation model to obtain a multimedia information playing high-order feature vector and a multimedia information sharing high-order feature vector;

performing weighting processing on the second input features through a feature weight adjusting network in the multimedia information recommendation model to obtain a multimedia information playing low-order feature vector and a multimedia information sharing low-order feature vector corresponding to the multimedia information, wherein all weights in the feature weight adjusting network are fixed values;

splicing the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector to obtain a multimedia information playing feature vector;

and splicing the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector to obtain a multimedia information sharing feature vector.

8. The method of claim 1, wherein the adjusting the multimedia information recall policy based on the user play feature vector, the user sharing feature vector, the multimedia information play feature vector, and a multimedia information sharing feature vector comprises:

determining a first dot product value based on the user playing high-order feature vector and the user playing low-order feature vector in the user playing feature vector, and the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector in the multimedia information playing feature vector;

determining a second dot product value based on the user sharing high-order feature vector and the user sharing low-order feature vector in the user sharing feature vector, and the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector in the multimedia information sharing feature vector;

and determining multimedia information to be recalled according to the sum of the first dot product value and the second dot product value.

9. The method of claim 1, wherein the adjusting the multimedia information recall strategy based on the user play feature vector, the user sharing feature vector, the multimedia information play feature vector, and a multimedia information sharing feature vector comprises:

determining a third dot product value based on the user playing high-order feature vector and the user playing low-order feature vector in the user playing feature vector, the multimedia information playing high-order feature vector and the multimedia information playing low-order feature vector in the multimedia information playing feature vector, the user sharing high-order feature vector and the user sharing low-order feature vector in the user sharing feature vector, and the multimedia information sharing high-order feature vector and the multimedia information sharing low-order feature vector in the multimedia information sharing feature vector;

and determining multimedia information to be recalled according to the third dot product value.

10. The method of claim 1, further comprising:

performing data screening processing on the multimedia information to be recommended, and analyzing to obtain a title and a label of the multimedia information to be recommended;

triggering a target word segmentation library, and performing word segmentation processing on the title and the label of the multimedia information to be recommended through the target word segmentation library to obtain word-level multimedia information to be recommended;

vectorizing the word-level multimedia information to be recommended through a text information processing network in the multimedia information recommendation model to form a multi-dimensional word-level title characteristic vector and a multi-dimensional word-level label characteristic vector of the multimedia information to be recommended.

11. The method of claim 1, further comprising:

acquiring historical browsing information of a target user;

determining a multimedia information exposure history corresponding to the historical browsing information based on the historical browsing information of the target user;

and dynamically adjusting the playing strategy of the multimedia information based on the multimedia information exposure history corresponding to the history browsing information.

12. The method of claim 1, further comprising:

determining the type of a multimedia information recommendation environment;

determining the category of the multimedia information to be played according to the type of the multimedia information recommendation environment;

and responding to the category of the multimedia information to be played, and triggering the matched multimedia information data source so as to adjust the multimedia information to be played through the multimedia information data source matched with the category of the multimedia information to be played.

13. An apparatus for recommending multimedia information, the apparatus comprising:

14. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing the method of multimedia information recommendation of any of claims 1 to 12 when executing the executable instructions stored in the memory.

15. A computer-readable storage medium storing executable instructions, wherein the executable instructions when executed by a processor implement the method for multimedia information recommendation according to any one of claims 1-12.