CN113806568B - Multimedia resource recommendation method and device, electronic equipment and storage medium - Google Patents

Multimedia resource recommendation method and device, electronic equipment and storage medium

Info

Publication number
CN113806568B
CN113806568B
Authority
CN
China
Prior art keywords
information
interest
network
resource
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110915601.5A
Other languages
Chinese (zh)
Other versions
CN113806568A (en)
Inventor
赵鑫
王辉
冯翔
毛景树
王珵
江鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Renmin University of China
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China and Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110915601.5A
Publication of CN113806568A
Application granted
Publication of CN113806568B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The method includes: in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources; inputting the historical behavior sequence information and the first resource identification information into an interest recognition network for interest recognition to obtain a target interest index of the target object for the at least one multimedia resource; and recommending a target multimedia resource among the at least one multimedia resource to the target object based on the target interest index. With the embodiments of the present disclosure, recommendation accuracy and effect can be improved.

Description

Multimedia resource recommendation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, numerous network platforms are continuously being upgraded. Besides publishing text and image content, individual users can share daily short videos and the like at any time, and accurately capturing user interests has become a challenge for many recommendation systems.
In the related art, a behavior sequence containing a large number of long-term historical behavior records of the user is often used directly as the input data of a neural network for learning the user's interest preferences. However, because the large number of historical behavior records are mixed together, fine-grained characteristics often cannot be learned and the user's interest preferences cannot be learned effectively, resulting in poor recommendation accuracy and effect in the recommendation system.
Disclosure of Invention
The present disclosure provides a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium, which at least solve the problems in the related art that the real interests of users cannot be effectively represented and that recommendation accuracy and effect in a recommendation system are poor. The technical solution of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a multimedia resource recommendation method, including:
in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, the time sequence identification information being an identification determined based on the preset distribution time period in which the behavior time corresponding to each of the plurality of multimedia resources falls;
Inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object on the at least one multimedia resource;
and recommending the target multimedia resource in the at least one multimedia resource to the target object based on the target interest index.
Optionally, the interest recognition network includes a coding network, the first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network and an interest perception network, where the second number is a number corresponding to a preset resource type;
the step of inputting the historical behavior sequence information and the first resource identification information into an interest identification network to perform interest identification, and the step of obtaining the target interest index of the target object on the at least one multimedia resource comprises the following steps:
performing coding processing on the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding characteristic information corresponding to the second resource identification information and first number of time sequence coding characteristic information corresponding to the first number of time sequence identification information;
Determining basic capsule characteristic information in the first number of basic capsule networks according to the first number of time sequence coding characteristic information;
transmitting the basic capsule characteristic information to the second number of interest capsule networks for interest identification based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, and obtaining initial interest characteristic information corresponding to the second number of interest capsule networks;
inputting the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network to perform feature fusion, so as to obtain target interest feature information;
and inputting the resource coding characteristic information, the target interest characteristic information and the resource characteristic information corresponding to the first resource identification information into the interest perception network to perform interest perception processing, so as to obtain the target interest index.
Optionally, the coding network includes: the system comprises a feature extraction network, a position coding network and at least one sub-coding network connected in sequence, wherein any sub-coding network comprises a self-attention network and a feedforward neural network, the historical behavior sequence information is coded based on the coding network, and behavior coding information is obtained, and the method comprises the following steps:
Extracting behavior feature information of the historical behavior sequence information based on the feature extraction network;
performing position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
traversing the at least one sub-coding network connected in sequence, and under the condition of traversing any sub-coding network, inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network to perform self-attention learning to obtain current initial coding information;
inputting the current initial coding information into a feedforward neural network in the sub-coding network traversed currently to perform nonlinear processing to obtain current coding behavior information;
taking current coding behavior information at the end of traversal as the behavior coding information;
the current behavior characteristic information of the first sub-coding network in the at least one sub-coding network connected in sequence is the target behavior characteristic information, and the current behavior characteristic information of the non-first sub-coding network in the at least one sub-coding network connected in sequence is the current coding behavior information output by the previous sub-coding network.
Optionally, the method further comprises:
Generating target mask information, wherein the target mask information comprises first mask information, second mask information, third mask information and fourth mask information, and the first mask information characterizes association relations among the first number of behavior sequence fragments corresponding to the first number of time sequence identification information; the second mask information characterizes the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to the plurality of multimedia resources; the fourth mask information characterizes the association information of the plurality of multimedia resources to the first number of behavior sequence segments;
the step of inputting the current behavior characteristic information into the self-attention network in the sub-coding network traversed currently to perform self-attention learning, and the step of obtaining the current initial coding information comprises the following steps:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the sub-coding network traversed currently to perform self-attention learning to obtain the current initial coding information.
Optionally, the method further comprises:
inputting target resource coding feature information, the target interest feature information and the first resource identification information into the interest perception network to perform interest perception processing to obtain the target interest index;
the target resource coding characteristic information is resource coding characteristic information of which the time difference between the corresponding behavior time and the current time in the resource coding characteristic information corresponding to the plurality of multimedia resources meets the preset condition.
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the inputting the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain the target interest feature information includes:
inputting the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer for weight learning to obtain interest weight information, where the interest weight information characterizes the degree of influence of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object;
And inputting the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing to obtain the target interest feature information.
Optionally, the method further comprises:
acquiring sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and a labeled interest index, where the sample behavior sequence information includes fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample time period and the first number of pieces of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to the plurality of historical multimedia resources;
determining a plurality of resource types corresponding to the plurality of historical multimedia resources;
determining the sample occurrence probability of the corresponding resource identification information of each resource type in the sample behavior sequence information;
inputting the sample behavior sequence information and the third resource identification information into an interest identification network to be trained for interest identification, and obtaining a predicted interest index of the sample object on the at least one sample multimedia resource and a plurality of predicted occurrence probabilities corresponding to the resource types;
Determining target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability and the labeled interest index;
and training the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
Optionally, the determining the target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index includes:
determining initial interest loss information according to the predicted interest index and the marked interest index;
determining interest guide loss information according to the predicted occurrence probability and the sample occurrence probability;
the target interest loss is determined based on the initial interest loss information and the interest guidance loss information.
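As a hedged sketch only: the following assumes the initial interest loss is a binary cross-entropy between the predicted and labeled interest indexes, and the interest guidance loss is a cross-entropy between the predicted and sample occurrence probabilities of the resource types; the specific loss forms, the weighting factor `guide_weight`, and all names are illustrative assumptions, not fixed by the disclosure.

```python
import torch
import torch.nn.functional as F

def target_interest_loss(pred_index, labeled_index,
                         pred_type_prob, sample_type_prob, guide_weight=0.1):
    """pred_index / labeled_index: (batch,) predicted and labeled interest indexes in [0, 1].
    pred_type_prob / sample_type_prob: (num_types,) predicted and sample occurrence
    probabilities of each resource type in the sample behavior sequence."""
    # Initial interest loss: predicted interest index vs. labeled interest index
    initial_loss = F.binary_cross_entropy(pred_index, labeled_index.float())
    # Interest guidance loss: predicted occurrence probability vs. sample occurrence probability
    guidance_loss = -(sample_type_prob * torch.log(pred_type_prob + 1e-8)).sum()
    # Target interest loss combines the two
    return initial_loss + guide_weight * guidance_loss
```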
According to a second aspect of the embodiments of the present disclosure, there is provided a multimedia resource recommendation apparatus, including:
a first information acquisition module configured to perform, in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, the time sequence identification information being an identification determined based on the preset distribution time period in which the behavior time corresponding to each of the plurality of multimedia resources falls;
The first interest recognition module is configured to perform interest recognition by inputting the historical behavior sequence information and the first resource identification information into an interest recognition network, so as to obtain a target interest index of the target object on the at least one multimedia resource;
and a multimedia resource recommendation module configured to perform recommendation of a target multimedia resource of the at least one multimedia resource to the target object based on the target interest index.
Optionally, the interest recognition network includes a coding network, the first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network and an interest perception network, where the second number is a number corresponding to a preset resource type;
the first interest identification module includes:
the coding processing unit is configured to perform coding processing on the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding characteristic information corresponding to the second resource identification information and first number of time sequence coding characteristic information corresponding to the first number of time sequence identification information;
A base capsule characteristic information determination unit configured to perform determination of base capsule characteristic information in the first number of base capsule networks from the first number of time-series coded characteristic information;
the interest identification unit is configured to execute interest identification by transmitting the basic capsule characteristic information to the second number of interest capsule networks based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, so as to obtain initial interest characteristic information corresponding to the second number of interest capsule networks;
the feature fusion unit is configured to input initial interest feature information corresponding to the second number of interest capsule networks and resource feature information corresponding to the first resource identification information into the feature fusion network to perform feature fusion, so as to obtain target interest feature information;
the first interest perception processing unit is configured to input the resource coding feature information, the target interest feature information and the resource feature information corresponding to the first resource identification information into the interest perception network to perform interest perception processing, so as to obtain the target interest index.
Optionally, the coding network includes: a feature extraction network, a position encoding network, and at least one sub-encoding network connected in sequence, any sub-encoding network comprising a self-attention network and a feed-forward neural network, the encoding processing unit comprising:
a behavior feature information extraction unit configured to perform behavior feature information of extracting the historical behavior sequence information based on the feature extraction network;
the position coding processing unit is configured to perform position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
a traversing unit configured to perform traversing the at least one sequentially connected sub-coding network;
the self-attention learning unit is configured to perform self-attention learning by inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network under the condition of traversing any sub-coding network, so as to obtain the current initial coding information;
the nonlinear processing unit is configured to perform inputting the current initial coding information into the feedforward neural network in the currently traversed sub-coding network for nonlinear processing to obtain current coding behavior information;
a behavior coding information determination unit configured to perform taking the current coding behavior information at the end of the traversal as the behavior coding information;
the current behavior characteristic information of the first sub-coding network in the at least one sub-coding network connected in sequence is the target behavior characteristic information, and the current behavior characteristic information of the non-first sub-coding network in the at least one sub-coding network connected in sequence is the current coding behavior information output by the previous sub-coding network.
Optionally, the apparatus further includes:
a target mask information generating module configured to perform generation of target mask information, the target mask information including first mask information, second mask information, third mask information, and fourth mask information, the first mask information characterizing an association relationship between the first number of behavior sequence pieces corresponding to the first number of timing identification information; the second mask information characterizes the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to the plurality of multimedia resources; the fourth mask information characterizes the association information of the plurality of multimedia resources to the first number of behavior sequence segments;
The self-attention learning unit is further configured to perform self-attention learning by inputting current behavior feature information and the target mask information into a self-attention network in the sub-coding network currently traversed to obtain the current initial coding information.
Optionally, the apparatus further includes:
the second interest perception processing unit is configured to input target resource coding feature information, the target interest feature information and the first resource identification information into the interest perception network to perform interest perception processing, so as to obtain the target interest index;
the target resource coding characteristic information is resource coding characteristic information of which the time difference between the corresponding behavior time and the current time in the resource coding characteristic information corresponding to the plurality of multimedia resources meets the preset condition.
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the feature fusion unit includes:
the weight learning unit is configured to perform inputting the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer for weight learning, so as to obtain interest weight information, where the interest weight information characterizes the degree of influence of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object;
And the weighting processing unit is configured to input the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing, so as to obtain the target interest feature information.
Optionally, the apparatus further includes:
a second information acquisition module configured to perform acquiring sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and a labeled interest index, where the sample behavior sequence information includes fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample time period and the first number of pieces of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to the plurality of historical multimedia resources;
a resource type determining module configured to perform determining a plurality of resource types corresponding to the plurality of historical multimedia resources;
a sample occurrence probability determining module configured to perform determining a sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information;
the second interest recognition module is configured to input the sample behavior sequence information and the third resource identification information into an interest recognition network to be trained for interest recognition, so as to obtain a predicted interest index of the sample object on the at least one sample multimedia resource and the corresponding predicted occurrence probabilities of a plurality of resource types;
A target interest loss determination module configured to perform determining a target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index;
and the interest recognition network training module is configured to execute training of the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
Optionally, the target interest loss determination module includes:
an initial interest loss information determination unit configured to perform determination of initial interest loss information according to the predicted interest index and the labeled interest index;
an interest guidance loss information determination unit configured to perform determination of interest guidance loss information based on the predicted occurrence probability and the sample occurrence probability;
a target interest loss determination unit configured to perform determination of the target interest loss based on the initial interest loss information and the interest guidance loss information.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the method of any one of the first aspects of embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the first aspects of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in the process of recommending multimedia resources, in addition to the second resource identification information of the plurality of multimedia resources on which the target object has acted within the preset time period, the historical behavior sequence also introduces the first number of pieces of time sequence identification information determined based on the preset distribution time periods in which the behavior times corresponding to the plurality of multimedia resources fall. Therefore, in the process of interest recognition, the behavior data can be grasped comprehensively, and interest preferences and finer-grained characteristics in different time periods can be learned by combining the time sequence identification information, so that the real interests of users can be effectively represented, the efficiency and accuracy of interest recognition are improved, and the accuracy and effect of information recommendation in the recommendation system are greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of an application environment shown in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of multimedia asset recommendation, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method for inputting historical behavior sequence information and first resource identification information into an interest recognition network for interest recognition to obtain a target interest indicator for a target object for at least one multimedia resource, according to an example embodiment;
FIG. 4 is a flowchart illustrating a process for encoding historical behavior sequence information based on an encoding network to obtain behavior encoded information according to an example embodiment;
FIG. 5 is a diagram illustrating target mask information according to an example embodiment;
FIG. 6 is a flowchart illustrating another method for inputting historical behavior sequence information and first resource identification information into an interest recognition network for interest recognition to obtain a target interest indicator for a target object for at least one multimedia resource, according to an example embodiment;
FIG. 7 is a flowchart illustrating a pre-training interest recognition network, according to an example embodiment;
FIG. 8 is a flowchart illustrating a method for determining a target interest loss based on a predicted interest metric, a predicted occurrence probability, a sample occurrence probability, and a labeled interest metric, according to an example embodiment;
FIG. 9 is a schematic diagram of an interest identification network, shown in accordance with an exemplary embodiment;
FIG. 10 is a block diagram of a multimedia asset recommendation device, according to an example embodiment;
FIG. 11 is a block diagram of an electronic device for multimedia asset recommendation, shown in accordance with an exemplary embodiment;
FIG. 12 is a block diagram of an electronic device for multimedia asset recommendation, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment, which may include a server 100 and a terminal 200, as shown in fig. 1, according to an exemplary embodiment.
In an alternative embodiment, the server 100 may be used to train the interest recognition network. Specifically, the server 100 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms.
In an alternative embodiment, the terminal 200 may perform the multimedia resource recommendation process in conjunction with the interest recognition network trained by the server 100. In particular, the terminal 200 may include, but is not limited to, smart phones, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (augmented reality, AR)/Virtual Reality (VR) devices, smart wearable devices, and other types of electronic devices. Alternatively, the operating system running on the electronic device may include, but is not limited to, an android system, an IOS system, linux, windows, and the like.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure; in practical applications, other application environments may also be included, and, for example, the multimedia resource recommendation processing may also be implemented on the server 100.
In the embodiment of the present disclosure, the server 100 and the terminal 200 may be directly or indirectly connected through a wired or wireless communication manner, which is not limited herein.
Fig. 2 is a flowchart illustrating a multimedia resource recommendation method according to an exemplary embodiment. The multimedia resource recommendation method is used in an electronic device such as the terminal or the server shown in fig. 1, and includes the following steps.
In step S201, in response to a multimedia resource acquisition request of a target object, historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended are acquired.
In an alternative embodiment, the target object is a user, and may also be a user account of the user in the recommendation system. In practical application, the user can actively trigger the multimedia resource acquisition request based on the corresponding terminal, or the recommendation system can actively trigger the multimedia resource acquisition request of the target object after the target object enters the system.
In a specific embodiment, the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, where the time sequence identification information may be an identification determined based on the preset distribution time period in which the behavior time corresponding to each of the plurality of multimedia resources falls.
In a specific embodiment, the resource identification information of the multimedia resource may be identification information for distinguishing between different multimedia resources. Accordingly, the first resource identification information may include resource identification information of at least one multimedia resource to be recommended, and the second resource identification information may include resource identification information of the plurality of multimedia resources. Specifically, the at least one multimedia resource to be recommended may be a multimedia resource to be recommended in the recommendation system. Alternatively, the multimedia resources may include static resources such as text and images, and may also include dynamic resources such as short videos.
In a specific embodiment, the plurality of multimedia resources that the target object behaves on may be multimedia resources that the target object performs a preset operation within a preset time period. The preset operations may include, but are not limited to, browsing, clicking, converting (e.g., purchasing related products based on the multimedia asset, or downloading related applications based on the multimedia asset, etc.), etc. Specifically, the preset time period may be a preset collection period of the historical behavior sequence information, for example, the preset time period is half a month before the current time, and correspondingly, the historical behavior sequence information may include resource identification information of the multimedia resource in which the target object performs the preset operation in half a month before the current time.
In a specific embodiment, the first number of pieces of time sequence identification information may be used to characterize the behavior distribution of the target object over the preset time period from the dimension of time. The behavior time corresponding to each of the plurality of multimedia resources may be the time at which the target object performed the preset operation on that multimedia resource. In a specific embodiment, a preset unit division time may be determined; the preset time period is divided into the first number of preset distribution time periods according to the preset unit division time; identifications are generated for the first number of preset distribution time periods; and the identification of the preset distribution time period in which the behavior time corresponding to each multimedia resource falls is used as the time sequence identification information corresponding to that multimedia resource.
In a specific embodiment, the preset unit division time may be preset according to actual application requirements, for example one day (from 0:00 to 24:00). Assuming that the preset time period is the 15 days before the current time, the preset time period may accordingly be divided into 15 (the first number) daily preset distribution time periods. In a specific embodiment, it is assumed that identifiers are generated for the 15 preset distribution time periods in chronological order: g1, g2, g3, ..., g15. Accordingly, the identifier of the preset distribution time period in which the behavior time corresponding to each multimedia resource falls may be used as the time sequence identification information corresponding to that multimedia resource. Specifically, if the behavior times corresponding to any two or more multimedia resources fall on the same day (i.e., in the same preset distribution time period), the time sequence identification information corresponding to those multimedia resources is the same.
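As an illustrative sketch only (not part of the original disclosure), the following Python snippet shows one way the time sequence identifications could be assigned under the example above, assuming a 15-day preset time period, a one-day unit division time, and string identifiers g1 through g15; all names and the clamping behavior are hypothetical.

```python
from datetime import datetime, timedelta

def assign_timing_ids(behavior_times, num_periods=15, period=timedelta(days=1)):
    """Map each behavior time to the identifier of the distribution period it
    falls into (g1 = oldest day, g15 = most recent day).

    behavior_times: list of datetime objects, one per multimedia resource on
    which the target object has acted within the preset time period.
    """
    now = datetime.now()
    start = now - num_periods * period           # beginning of the preset time period
    timing_ids = []
    for t in behavior_times:
        index = int((t - start) / period)        # 0-based distribution period index
        index = min(max(index, 0), num_periods - 1)
        timing_ids.append(f"g{index + 1}")       # resources acted on the same day share an id
    return timing_ids
```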
In step S203, the historical behavior sequence information and the first resource identification information are input into an interest identification network to perform interest identification, so as to obtain a target interest index of a target object on at least one multimedia resource;
in a specific embodiment, the target interest indicator may characterize a probability that the target object performs a predictive operation on the at least one multimedia asset.
In an alternative embodiment, the interest recognition network may include a coding network, a first number of base capsule networks, a second number of interest capsule networks, a feature fusion network, and an interest perception network.
In practical applications, each multimedia resource in the recommendation system is labeled with a corresponding resource type according to its resource characteristics, and in a specific embodiment, each multimedia resource may correspond to one or more resource types. In a particular embodiment, the resource types may include, but are not limited to, sports, food, games, travel, and the like. Optionally, the second number may be the number of preset resource types; the preset resource types may be the resource types of the multimedia resources in the recommendation system. Correspondingly, each of the second number of interest capsule networks corresponds to one of the second number of preset resource types.
Optionally, as shown in fig. 3, the step of inputting the historical behavior sequence information and the first resource identification information into the interest identification network to perform interest identification to obtain the target interest index of the target object on at least one multimedia resource may include the following steps:
in step S2031, the historical behavior sequence information is encoded based on the encoding network, to obtain behavior encoding information.
In a specific embodiment, the behavior coding information may include resource coding feature information corresponding to the second resource identification information and a first number of time sequence coding feature information corresponding to the first number of time sequence identification information;
in an alternative embodiment, the coding network may include: the feature extraction network, the position coding network and at least one sub-coding network connected in sequence, specifically, any sub-coding network comprises a self-attention network and a feedforward neural network, and correspondingly, as shown in fig. 4, the coding processing is performed on the historical behavior sequence information based on the coding network to obtain behavior coding information, which may include the following steps:
in step S401, behavior feature information of the historical behavior sequence information is extracted based on the feature extraction network;
In step S403, performing a position coding process on the behavior feature information based on the position coding network to obtain target behavior feature information;
in step S405, at least one sub-coding network connected in sequence is traversed, and under the condition of traversing any sub-coding network, the current behavior characteristic information is input into a self-attention network in the currently traversed sub-coding network to perform self-attention learning, so as to obtain current initial coding information;
in step S407, inputting the current initial coding information into a feedforward neural network in the sub-coding network traversed currently to perform nonlinear processing to obtain current coding behavior information;
in step S409, current encoding behavior information at the end of the traversal is taken as behavior encoding information.
In a specific embodiment, the plurality of multimedia resources on which the target object has acted are often arranged in order of their corresponding behavior times. During the position coding process, feature information that characterizes the ordering position of each multimedia resource within the plurality of multimedia resources can be added to the behavior feature information, which facilitates subsequent learning of the associations between the multimedia resources.
In a specific embodiment, the current behavior feature information of the first sub-coding network in the at least one sub-coding network connected in sequence is target behavior feature information, and the current behavior feature information of the non-first sub-coding network in the at least one sub-coding network connected in sequence is current coding behavior information output by the previous sub-coding network.
In the above embodiment, at least one sub-coding network is set in the coding network, and the current coding behavior information output by the previous sub-coding network is continuously used as the input of the current sub-coding network, so that behavior characteristic information in the historical behavior sequence can be better learned, and further, the characterization accuracy of interest preference of the user is improved.
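A minimal PyTorch-style sketch of such a coding network is given below, assuming embedding-based feature extraction, additive position coding, and a stack of sequentially connected sub-coding networks each made of a self-attention network and a feedforward neural network; the module names, dimensions, and residual/normalization details are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class SubCodingNetwork(nn.Module):
    """One sub-coding network: self-attention followed by a feedforward neural network."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, attn_mask=None):
        out, _ = self.attn(x, x, x, attn_mask=attn_mask)   # self-attention learning
        x = self.norm1(x + out)                            # current initial coding information
        return self.norm2(x + self.ffn(x))                 # nonlinear processing

class CodingNetwork(nn.Module):
    """Feature extraction + position coding + sequentially connected sub-coding networks."""
    def __init__(self, vocab_size, dim, max_len, num_blocks=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)           # feature extraction network
        self.pos = nn.Embedding(max_len, dim)                # position coding network
        self.blocks = nn.ModuleList([SubCodingNetwork(dim) for _ in range(num_blocks)])

    def forward(self, ids, attn_mask=None):
        positions = torch.arange(ids.size(1), device=ids.device)
        x = self.embed(ids) + self.pos(positions)            # target behavior feature information
        for block in self.blocks:                             # output of one block feeds the next
            x = block(x, attn_mask)
        return x                                              # behavior coding information
```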
In an alternative embodiment, the method may further include:
target mask information is generated.
In practical applications, because the input of the network may have a variable length, the input is often padded with 0s to a specified input size; further, in order to mask the 0s introduced by padding, mask information can be introduced in the self-attention learning process. In the embodiments of the present specification, in addition to setting mask information according to the padding that needs to be masked, the target mask information may also be set in combination with the following four dimensions: 1. the dimension of interest-related attention between multimedia resources; 2. the dimension of interest-related attention between behavior sequence segments; 3. the dimension in which a behavior sequence segment attends to its corresponding multimedia resources; 4. the dimension in which a multimedia resource attends to its corresponding behavior sequence segment.
Correspondingly, the target mask information comprises first mask information, second mask information, third mask information and fourth mask information, wherein the first mask information characterizes the association relation between the first number of behavior sequence fragments corresponding to the first number of time sequence identification information; the second mask information represents the association relationship between second resource identification information of any one of the plurality of multimedia resources and second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to a plurality of multimedia resources; the fourth mask information characterizes association information of a plurality of multimedia assets to the first number of behavior sequence segments.
In a specific embodiment, the behavior sequence segment corresponding to each time sequence identification information may be resource identification information of a multimedia resource corresponding to the time sequence identification information in the plurality of multimedia resources that have been acted by the target object. Correspondingly, the association relationship between the behavior sequence segments can be that the behavior sequence segments pay attention to each other in the process of learning the characteristic information of the behavior sequence segments.
In a specific embodiment, taking the preset range as 2 as an example, the association relationship between the second resource identification information of a certain multimedia resource and the second resource identification information of the multimedia resource within the preset range may be that the second resource identification information of the multimedia resource and the second resource identification information of the two adjacent multimedia resources in front and back are focused on each other in the process of learning the characteristic information of the second resource identification information of the multimedia resource.
In a specific embodiment, any behavior sequence segment corresponds to a part of multimedia resources in the plurality of multimedia resources which are acted by the target object, and correspondingly, the association information of the behavior sequence segment to the multimedia resources may be that the corresponding part of multimedia resources is concerned in the process of learning the feature information of the behavior sequence segment; the association information of the plurality of multimedia resources to the first number of behavior sequence segments may be a corresponding behavior sequence segment that is focused in a process of learning the feature information of the resource identification information of the partial multimedia resources.
In an alternative embodiment, as shown in fig. 5, fig. 5 is a schematic diagram of target mask information according to an exemplary embodiment. Assuming that the plurality of multimedia resources includes i1, i2, ... and the first number of pieces of time sequence identification information includes g1, g2, g3, ..., the first mask information is M_{g2g}, the second mask information is M_{l2l}, the third mask information is M_{g2l}, and the fourth mask information is M_{l2g}. Mask information corresponding to the white parts is 0, indicating that there is no attention relationship between the corresponding pieces of information; mask information corresponding to the black parts is 1, indicating that the corresponding pieces of information attend to each other.
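The following sketch is hypothetical and only illustrates how the four mask blocks of fig. 5 could be assembled, assuming the self-attention input is the concatenation of the resource identification tokens and the first number of segment (time sequence) tokens, and using boolean masks where True marks positions allowed to attend to each other; the function name and ordering convention are assumptions.

```python
import torch

def build_target_mask(segment_ids, local_window=2):
    """Build the four mask blocks of the target mask information.

    segment_ids: LongTensor of length n; segment_ids[k] is the index of the
    behavior-sequence segment (time sequence identification) that resource k belongs to.
    Returns an (n + m) x (n + m) boolean mask over [resource tokens, segment tokens],
    where m is the number of segments; True = attention allowed.
    (If used with nn.MultiheadAttention, pass the logical NOT, since there True means masked.)
    """
    n = segment_ids.size(0)
    m = int(segment_ids.max().item()) + 1

    # M_l2l: each resource attends only to resources within the local window
    idx = torch.arange(n)
    m_l2l = (idx[None, :] - idx[:, None]).abs() <= local_window

    # M_g2g: behavior-sequence segments all attend to one another
    m_g2g = torch.ones(m, m, dtype=torch.bool)

    # M_g2l: each segment attends to the resources belonging to it
    m_g2l = torch.arange(m)[:, None] == segment_ids[None, :]

    # M_l2g: each resource attends to its own segment (transpose of M_g2l)
    m_l2g = m_g2l.T

    top = torch.cat([m_l2l, m_l2g], dim=1)
    bottom = torch.cat([m_g2l, m_g2g], dim=1)
    return torch.cat([top, bottom], dim=0)
```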
Correspondingly, the inputting the current behavior feature information into the self-attention network in the sub-coding network traversed currently to perform self-attention learning, and obtaining the current initial coding information may include:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the sub-coding network traversed currently to perform self-attention learning, so as to obtain the current initial coding information.
In the above embodiment, the target mask information is introduced in the self-attention learning process, so that the related attention among the multimedia resources of the behaviors in the behavior sequence of the target object can be limited within the preset range, and the processing efficiency is greatly improved; the whole behavior sequence can be connected through the related attention among the behavior sequence fragments corresponding to the time sequence identification information, so that the information loss caused by the limitation of the local range is compensated; meanwhile, mutual attention between the behavior sequence fragments and the resource identification information of the corresponding multimedia resources can be realized, and the accuracy of the learned interest preference can be greatly improved on the basis of improving the processing efficiency.
In step S2033, base capsule characteristic information in a first number of base capsule networks is determined from the first number of time series coded characteristic information.
In a specific embodiment, the first number of pieces of time sequence coding feature information may be used as the capsule feature information in the first number of base capsule networks, respectively. In one particular embodiment, assume that the first number is 10, the 10 pieces of time sequence coding feature information include the 1st piece, the 2nd piece, ..., and the 10th piece, and the 10 base capsule networks include the 1st base capsule network, the 2nd base capsule network, ..., and the 10th base capsule network. Optionally, the 1st piece of time sequence coding feature information may be used as the capsule feature information in the 1st base capsule network, the 2nd piece of time sequence coding feature information may be used as the capsule feature information in the 2nd base capsule network, and so on.
In step S2035, based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, the basic capsule feature information is transmitted to the second number of interest capsule networks to perform interest identification, so as to obtain initial interest feature information corresponding to the second number of interest capsule networks.
In a particular embodiment, the transmission weights may characterize the proportion in which the capsule feature information in each base capsule network is transmitted to each interest capsule network. In a specific embodiment, the initial interest feature information corresponding to any interest capsule network may be the capsule feature information in that interest capsule network; specifically, the capsule feature information in a given interest capsule network may be the interest characterization information of the target object for the resource type corresponding to that interest capsule network.
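By way of illustration only, the sketch below implements one common form of capsule routing in which the transmission weights are obtained by a softmax over iteratively updated agreement logits; the patent does not fix a particular routing algorithm here, so the squashing function, the routing iterations, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def squash(v, dim=-1, eps=1e-8):
    """Standard capsule squashing nonlinearity."""
    norm_sq = (v ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * v / torch.sqrt(norm_sq + eps)

def route_to_interest_capsules(base_caps, W, num_iters=3):
    """base_caps: (first_number, d_in) base capsule feature information.
    W: (first_number, second_number, d_in, d_out) per-pair transformation matrices.
    Returns (second_number, d_out) initial interest feature information."""
    first_number, second_number = W.shape[0], W.shape[1]
    # Prediction of each base capsule for each interest capsule
    u_hat = torch.einsum('id,ijde->ije', base_caps, W)          # (first, second, d_out)
    logits = torch.zeros(first_number, second_number)
    for _ in range(num_iters):
        c = F.softmax(logits, dim=1)                             # transmission weights
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)                 # weighted sum per interest capsule
        v = squash(s)                                            # (second, d_out)
        logits = logits + torch.einsum('ije,je->ij', u_hat, v)   # agreement update
    return v
```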
In step S2037, the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information are input into the feature fusion network to perform feature fusion, so as to obtain target interest feature information.
In a specific embodiment, the feature fusion network may be used to fuse the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information, and optionally, the feature fusion network may be, but is not limited to, a max-pooling network, an attention network, an average pooling network, and the like.
In a specific embodiment, taking the feature fusion network as an example, the attention network including the weight learning layer and the weighting processing layer, optionally, inputting the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network to perform feature fusion, and obtaining the target interest feature information may include:
Inputting the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer for weight learning to obtain interest weight information; and inputting the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing to obtain the target interest feature information.
In a specific embodiment, the interest weight information may represent a degree of influence of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object.
In a specific embodiment, the weight learning at the weight learning layer may be implemented in combination with the following formula:
α_z = exp(Att(p_z, i)) / Σ_{z'=1}^{C} exp(Att(p_{z'}, i))

wherein α_z represents the influence degree of the initial interest feature information corresponding to the z-th interest capsule network on the interest of the target object, C is the second number (the number of interest capsule networks), i represents the resource identification information of a certain multimedia resource to be recommended, p_z represents the capsule feature information in the z-th interest capsule network, Att is an attention function, and exp is the exponential function with the natural constant e as its base.
In a specific embodiment, the weighting at the weighting layer may be implemented in conjunction with the following formula:
u = Σ_{z=1}^{C} α_z · p_z

where u represents the target interest feature information.
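For illustration only, a minimal Python sketch of the weight learning layer and weighting processing layer described above is given below; the bilinear attention scoring function and the tensor shapes are assumptions of this sketch (the embodiment only requires some attention function Att).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Hypothetical feature fusion network: weight learning layer + weighting processing layer."""

    def __init__(self, dim):
        super().__init__()
        # Weight learning layer: scores each interest capsule against the candidate
        # resource's feature information (plays the role of Att(p_z, i)).
        self.att = nn.Bilinear(dim, dim, 1)

    def forward(self, interest_caps, resource_feat):
        # interest_caps: (C, d) initial interest feature information
        # resource_feat: (d,)   feature information of the candidate multimedia resource
        C = interest_caps.size(0)
        scores = self.att(interest_caps, resource_feat.expand(C, -1)).squeeze(-1)  # (C,)
        alpha = F.softmax(scores, dim=-1)  # interest weight information alpha_z
        # Weighting processing layer: weighted sum yields the target interest feature u.
        u = (alpha.unsqueeze(-1) * interest_caps).sum(dim=0)
        return u, alpha
```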
In the above embodiment, when the interest characterization information corresponding to the target object on different resource types is obtained, the interest capsule network corresponding to different resource types is combined with the attention network to learn the influence degree of the capsule feature information on the interest of the target object, so that the accuracy of the interest feature characterization of the target object can be greatly improved.
In step S2039, the resource coding feature information, the target interest feature information, and the resource feature information corresponding to the first resource identification information are input into the interest perception network to perform interest perception processing, so as to obtain a target interest index.
In an optional embodiment, the resource coding feature information, the target interest feature information and the first resource identification information may be input into the interest perception network for interest perception processing to obtain the target interest index. Specifically, the interest perception network may be configured to perform interest perception for the target object in combination with the resource coding feature information, the target interest feature information, and the first resource identification information. Optionally, the interest perception network may be a multilayer perceptron (MLP) followed by an activation layer.
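For illustration only, the following minimal Python sketch shows an interest perception network built as a multilayer perceptron followed by an activation layer, as described above; the hidden size, the ReLU activation, and the sigmoid output are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class InterestPerceptionNet(nn.Module):
    """Hypothetical MLP plus activation layer that outputs a target interest index."""

    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, resource_coding_feat, target_interest_feat, candidate_feat):
        # Concatenate the three inputs named in the embodiment; in_dim must equal
        # the sum of their feature dimensions.
        x = torch.cat([resource_coding_feat, target_interest_feat, candidate_feat], dim=-1)
        # The activation layer maps the MLP output to a target interest index in (0, 1).
        return torch.sigmoid(self.mlp(x))
```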
In the above embodiment, the historical behavior sequence further includes the first number of time sequence identification information determined based on the preset distribution time periods in which the behavior times corresponding to the plurality of multimedia resources fall. As a result, the historical behavior sequence, which includes the second resource identification information of the plurality of multimedia resources on which the target object has behaved in the preset time period, can be divided into a plurality of behavior sequence segments during encoding, so that behavior preferences in different time periods can be learned on top of the overall behavior preference. By combining the behavior preferences in different time periods, each interest capsule network can be made to correspond to the resource type of a different interest preference, which improves the accuracy and interpretability with which the identified target interest index characterizes different resource types, further improves the target interest feature information used for interest perception after feature fusion, and improves the accuracy with which the interest preference of the target object is depicted.
In an alternative embodiment, as shown in fig. 6, the method may further include the steps of:
in step S20311, inputting the target resource coding feature information, the target interest feature information and the first resource identification information into an interest perception network to perform interest perception processing, so as to obtain a target interest index;
In a specific embodiment, the target resource coding feature information may be the resource coding feature information, among the resource coding feature information corresponding to the plurality of multimedia resources, whose corresponding behavior time differs from the current time by an amount satisfying a preset condition. Optionally, in order to better predict the current interest preference of the target object, the resource coding feature information of the target object over the most recent period of time may be incorporated into the interest perception processing. Optionally, the resource coding feature information whose time difference satisfies the preset condition may be the resource coding feature information whose time difference is less than or equal to a preset time threshold.
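For illustration only, a minimal Python sketch of selecting target resource coding feature information by the time-difference condition is given below; the data layout and the concrete threshold (seven days) are assumptions of this sketch.

```python
import time

def select_recent_coding_features(coding_feats, behavior_times, time_threshold=7 * 24 * 3600):
    """Hypothetical helper: keep resource coding feature information whose behavior
    time is within `time_threshold` seconds of the current time."""
    now = time.time()
    return [feat for feat, t in zip(coding_feats, behavior_times) if now - t <= time_threshold]
```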
In the above embodiment, when the interest perception processing is performed, the current interest preference of the target object can be better represented by incorporating the target resource coding feature information whose time difference satisfies the preset condition, thereby further improving recommendation accuracy.
In an alternative embodiment, the method further comprises: the step of pre-training the interest recognition network, specifically, as shown in fig. 7, the step of pre-training the interest recognition network may include:
in step S701, sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and a labeling interest index are acquired.
In a specific embodiment, the sample behavior sequence information may include fourth resource identification information of a plurality of historical multimedia resources on which the sample object has behaved within the preset sample period, and the first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to the plurality of historical multimedia resources. Specifically, the labeling interest index may represent the probability that the sample object performs a preset operation on the at least one sample multimedia resource; specifically, the labeling interest index may be 1 or 0: if the sample object performs a preset operation on a certain sample multimedia resource, the corresponding labeling interest index may be 1; otherwise, if the sample object does not perform a preset operation on that sample multimedia resource, the corresponding labeling interest index may be 0.
In a specific embodiment, the duration corresponding to the preset sample period is the same as the duration corresponding to the preset time period. Specifically, for the detailed step of determining the first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources based on the behavior times corresponding to the plurality of historical multimedia resources, reference may be made to the above detailed step of determining the first number of time sequence identification information corresponding to the plurality of multimedia resources based on the behavior times corresponding to the plurality of multimedia resources, which is not repeated here.
In step S703, determining a plurality of resource types corresponding to the plurality of historical multimedia resources;
in practical applications, each multimedia resource in the recommendation system is usually labeled with a corresponding resource type in advance; accordingly, the resource types corresponding to the plurality of historical multimedia resources can be obtained from the recommendation system.
In step S705, the sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information is determined.
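For illustration only, the sample occurrence probability in step S705 could be computed as the fraction of the sample behavior sequence occupied by each resource type, as in the following minimal Python sketch; the input format is an assumption.

```python
from collections import Counter

def sample_occurrence_probability(behavior_resource_types):
    """Hypothetical helper: behavior_resource_types lists the resource type of each
    resource identification information in the sample behavior sequence."""
    total = len(behavior_resource_types)
    counts = Counter(behavior_resource_types)
    return {rtype: n / total for rtype, n in counts.items()}
```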
In step S707, the sample behavior sequence information and the third resource identification information are input into the interest recognition network to be trained to perform interest recognition, so as to obtain a predicted interest index of the sample object on at least one sample multimedia resource and a predicted occurrence probability corresponding to a plurality of resource types.
In a specific embodiment, for the detailed step of inputting the sample behavior sequence information and the third resource identification information into the interest recognition network to be trained for interest recognition to obtain the predicted interest index of the sample object on the at least one sample multimedia resource, reference may be made to the above step of inputting the historical behavior sequence information and the first resource identification information into the interest recognition network for interest recognition to obtain the target interest index of the target object on the at least one multimedia resource.
In an optional embodiment, the predicted occurrence probability corresponding to any resource type may be the size (norm) of the capsule feature information in the interest capsule network to be trained that corresponds to the resource type. Optionally, during training, the size of the capsule feature information in an interest capsule network to be trained may be determined and constrained between 0 and 1, and it may represent the probability that the sample object is interested in the resource type corresponding to that interest capsule network. Correspondingly, the predicted occurrence probability can be determined from the size of the capsule feature information in the interest capsule network during training.
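For illustration only, the following minimal Python sketch constrains the size (norm) of a capsule's feature information to lie between 0 and 1 so that it can be read as a predicted occurrence probability; the squash-style mapping is an assumption borrowed from common capsule network practice and is not prescribed by the embodiment.

```python
import torch

def capsule_probability(capsule_feat, eps=1e-8):
    """Map a capsule feature vector to a scalar in [0, 1) via a squash-style norm."""
    sq_norm = (capsule_feat ** 2).sum(dim=-1)
    # The bounded length of the squashed capsule serves as the predicted
    # occurrence probability of the corresponding resource type.
    return sq_norm / (1.0 + sq_norm + eps)
```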
In step S709, a target interest loss is determined based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index;
in an alternative embodiment, as shown in fig. 8, the determining the target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index may include the following steps:
in step S7091, initial interest loss information is determined according to the predicted interest index and the labeled interest index;
in step S7093, interest guidance loss information is determined from the predicted occurrence probability and the sample occurrence probability;
In step S7095, a target interest loss is determined based on the initial interest loss information and the interest guidance loss information.
In a specific embodiment, determining the initial interest loss information based on the predicted interest index and the labeled interest index may include determining the initial interest loss information between the predicted interest index and the labeled interest index based on a preset loss function. Specifically, the initial interest loss information may characterize a difference between the predicted interest index and the labeled interest index.
In a specific embodiment, determining the interest guide loss information based on the predicted occurrence probability and the sample occurrence probability may include determining the interest guide loss information between the predicted occurrence probability and the sample occurrence probability based on a preset loss function. In particular, the interest guide loss information may characterize a difference between the predicted occurrence probability and the sample occurrence probability.
In a particular embodiment, the preset loss function may include, but is not limited to, a cross entropy loss function, a mean square error loss function, a logistic loss function, an exponential loss function, and the like.
In an alternative embodiment, the initial interest loss information and the interest guidance loss information may be added to obtain the target interest loss. Optionally, in combination with actual application requirements, weight information corresponding to the interest guidance loss information may be preset; the interest guidance loss information is multiplied by the corresponding weight and then added to the initial interest loss information to obtain the target interest loss.
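For illustration only, a minimal Python sketch of combining the two losses as described above is given below; the binary cross-entropy and mean squared error choices and the guidance weight value are assumptions (the embodiment allows any of the preset loss functions listed earlier).

```python
import torch.nn.functional as F

def target_interest_loss(pred_index, label_index, pred_prob, sample_prob, guide_weight=0.1):
    """Target interest loss = initial interest loss + weighted interest guidance loss."""
    # Initial interest loss: difference between predicted and labeled interest indexes.
    initial_loss = F.binary_cross_entropy(pred_index, label_index.float())
    # Interest guidance loss: difference between predicted and sample occurrence probabilities.
    guide_loss = F.mse_loss(pred_prob, sample_prob)
    return initial_loss + guide_weight * guide_loss
```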
In the above embodiment, during training of the interest recognition network, interest guidance loss information reflecting the difference in the sample object's interest probability over the corresponding resource types is introduced, so that the trained interest recognition network can represent the object's interest more accurately.
In step S711, the interest recognition network to be trained is trained based on the target interest loss, resulting in an interest recognition network.
Training the interest recognition network to be trained based on the target interest loss, and obtaining the interest recognition network may include: under the condition that the target interest loss does not meet the preset condition, adjusting network parameters of the interest recognition network to be trained based on the target interest loss; repeating the steps S707 and S709 on the basis of the to-be-trained interest recognition network after the network parameters are adjusted until the target interest loss meets the preset conditions, and taking the corresponding to-be-trained interest recognition network as the interest recognition network when the target interest loss meets the preset conditions.
In a specific embodiment, the target interest loss meeting the preset condition may be that the target interest loss is less than or equal to a specified threshold, or that a difference between corresponding target interest losses in two training processes is less than a certain threshold. In the embodiment of the present disclosure, the specified threshold and the certain threshold may be set in combination with the actual training requirement.
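For illustration only, the following minimal Python training-loop sketch mirrors steps S707 to S711, stopping once the target interest loss satisfies the preset condition (here, falling below a specified threshold); the optimizer interface, batch format, and threshold are assumptions of this sketch.

```python
def train_interest_network(model, optimizer, batches, loss_fn, loss_threshold=1e-3, max_epochs=100):
    """Repeat interest recognition, loss computation, and parameter adjustment
    until the target interest loss meets the preset condition."""
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for sample_seq, third_ids, labels, sample_probs in batches:
            pred_index, pred_probs = model(sample_seq, third_ids)
            loss = loss_fn(pred_index, labels, pred_probs, sample_probs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / max(len(batches), 1) <= loss_threshold:
            break
    return model
```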
In the above embodiment, during the training of the interest recognition network, the historical behavior sequence, which includes the fourth resource identification information of the plurality of historical multimedia resources on which the sample object has behaved in the preset sample time period, also introduces the first number of sample time sequence identification information determined based on the preset distribution time periods in which the behavior times corresponding to the plurality of historical multimedia resources fall. The behavior data of the sample object can therefore be learned comprehensively during training, and interest preference characterizations in different time periods are learned in combination with the sample time sequence identification information, which effectively improves the recognition accuracy of the interest recognition network and greatly improves the information recommendation accuracy and recommendation effect of the subsequent recommendation system.
In a specific embodiment, as shown in fig. 9, the interest identification network may include a coding network, m (the first number) base capsule networks, C (the second number) interest capsule networks, a feature fusion network, and an interest perception network. Here, g1, g2, ..., gm may be the first number of time sequence identification information in the historical behavior sequence information, and i1, i2, ..., in may be the second resource identification information of the plurality of multimedia resources in the historical behavior sequence information. Accordingly, the coding process of the coding network yields the first number of time sequence coding feature information G1, G2, ..., Gm and the resource coding feature information I1, I2, ..., In. Further, the m pieces of time sequence coding feature information may be used as the capsule feature information in the m base capsule networks respectively, and then, based on the transmission weight of each base capsule network relative to the C interest capsule networks, the corresponding base capsule feature information is transmitted to the C interest capsule networks for interest recognition, so as to obtain the initial interest feature information corresponding to the C interest capsule networks. In addition, during training, the sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information is used to guide the interest recognition, so that, in the process of transmitting the corresponding base capsule feature information to the C interest capsule networks based on the transmission weights, the interest characterization information on the resource types corresponding to the C interest capsule networks can be learned accurately. Then, the initial interest feature information corresponding to the C interest capsule networks and the resource feature information corresponding to the first resource identification information are input into the feature fusion network for feature fusion to obtain the target interest feature information. Finally, the resource coding feature information (I1, I2, ..., In), the target interest feature information, and the resource feature information corresponding to the first resource identification information (which may be the resource feature information corresponding to the resource identification information of the t-th multimedia resource among the at least one multimedia resource to be recommended) are input into the interest perception network for interest perception processing to obtain the target interest index.
In step S205, a target multimedia asset of the at least one multimedia asset is recommended to the target object based on the target interest index.
In a specific embodiment, recommending the target multimedia asset of the at least one multimedia asset to the target object based on the target interest indicator may include determining the target multimedia asset from the at least one multimedia asset according to the target interest indicator; and recommending the target multimedia resource to the target object.
In an alternative embodiment, a confidence threshold may be set in advance in combination with the information recommendation accuracy requirement (the higher the confidence threshold, the more accurate the recommended information), and accordingly, when the value of the target interest index of a multimedia resource is greater than or equal to the confidence threshold, that multimedia resource may be used as a target multimedia resource. Alternatively, the multimedia resources may be sorted in descending order of their target interest indexes, and a preset number of them may be selected as target multimedia resources.
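For illustration only, the two selection strategies described above (confidence threshold, or descending sort with a preset number) could be implemented as in the following minimal Python sketch; the parameter names are assumptions.

```python
def select_target_resources(candidates, interest_indexes, confidence_threshold=None, top_k=None):
    """Pick target multimedia resources by confidence threshold or by top-k ranking."""
    scored = list(zip(candidates, interest_indexes))
    if confidence_threshold is not None:
        return [c for c, s in scored if s >= confidence_threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending target interest index
    return [c for c, _ in scored[:top_k]]
```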
As can be seen from the technical solutions provided in the embodiments of the present disclosure, the historical behavior sequence, which includes the second resource identification information of the plurality of multimedia resources on which the target object has behaved in the preset time period, also introduces the first number of time sequence identification information determined based on the preset distribution time periods in which the behavior times corresponding to the plurality of multimedia resources fall. Therefore, in the interest identification process, the behavior data can be grasped comprehensively, and interest preferences in different time periods as well as finer-grained characteristics can be learned in combination with the time sequence identification information, so that the real interests of users can be effectively represented, interest identification efficiency and identification accuracy are improved, and the information recommendation accuracy and recommendation effect of the recommendation system are greatly improved.
Fig. 10 is a block diagram illustrating a multimedia asset recommendation device according to an exemplary embodiment. Referring to fig. 10, the apparatus includes:
a first information obtaining module 1010 configured to perform obtaining, in response to a multimedia resource obtaining request of a target object, historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, the historical behavior sequence information including second resource identification information of a plurality of multimedia resources on which the target object has behaved in a preset time period and a first number of time sequence identification information corresponding to the plurality of multimedia resources, the time sequence identification information being an identification determined based on a preset distribution time period in which the behavior times corresponding to the plurality of multimedia resources are located;
the first interest recognition module 1020 is configured to perform interest recognition by inputting the historical behavior sequence information and the first resource identification information into an interest recognition network, so as to obtain a target interest index of the target object on at least one multimedia resource;
the multimedia asset recommendation module 1030 is configured to perform recommending a target multimedia asset of the at least one multimedia asset to the target object based on the target interest indicator.
Optionally, the interest recognition network includes a coding network, a first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network and an interest perception network, where the second number is a number corresponding to a preset resource type;
the first interest identification module 1020 includes:
the coding processing unit is configured to perform coding processing on the historical behavior sequence information based on a coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding characteristic information corresponding to second resource identification information and first number of time sequence coding characteristic information corresponding to first number of time sequence identification information;
a base capsule characteristic information determination unit configured to perform determination of base capsule characteristic information in a first number of base capsule networks from the first number of time-series coded characteristic information;
the interest identification unit is configured to perform, based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, transmitting the basic capsule characteristic information to the second number of interest capsule networks for interest identification, so as to obtain initial interest characteristic information corresponding to the second number of interest capsule networks;
The feature fusion unit is configured to perform feature fusion by inputting initial interest feature information corresponding to the second number of interest capsule networks and resource feature information corresponding to the first resource identification information into a feature fusion network, so as to obtain target interest feature information;
the first interest perception processing unit is configured to input the resource coding feature information, the target interest feature information and the resource feature information corresponding to the first resource identification information into the interest perception network to perform interest perception processing, so as to obtain the target interest index.
Optionally, the coding network includes: a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, any sub-coding network including a self-attention network and a feed-forward neural network, and the coding processing unit includes:
a behavior feature information extraction unit configured to perform extracting behavior feature information of the historical behavior sequence information based on the feature extraction network;
the position coding processing unit is configured to perform position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
a traversing unit configured to perform traversing at least one sequentially connected sub-coding network;
The self-attention learning unit is configured to perform self-attention learning by inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network under the condition of traversing any sub-coding network, so as to obtain the current initial coding information;
the nonlinear processing unit is configured to perform inputting the current initial coding information into the feedforward neural network in the currently traversed sub-coding network for nonlinear processing to obtain current coding behavior information;
a behavior coding information determining unit configured to perform taking the current coding behavior information at the end of the traversal as the behavior coding information;
the current behavior characteristic information of a first sub-coding network in at least one sub-coding network connected in sequence is target behavior characteristic information, and the current behavior characteristic information of a non-first sub-coding network in at least one sub-coding network connected in sequence is current coding behavior information output by a previous sub-coding network.
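For illustration only, one sub-coding network of the kind described above (a self-attention network followed by a feed-forward neural network) might be sketched in Python as follows; the residual connections, layer normalization, head count, and hidden size are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class SubCodingNetwork(nn.Module):
    """Hypothetical sub-coding network: self-attention network + feed-forward neural network."""

    def __init__(self, dim, num_heads=4, ff_dim=256):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, behavior_feat, attn_mask=None):
        # Self-attention learning over the current behavior characteristic information,
        # optionally restricted by target mask information (for nn.MultiheadAttention a
        # boolean attn_mask marks blocked positions with True).
        attn_out, _ = self.self_attn(behavior_feat, behavior_feat, behavior_feat,
                                     attn_mask=attn_mask)
        initial_coding = self.norm1(behavior_feat + attn_out)  # current initial coding information
        # Nonlinear processing by the feed-forward neural network.
        return self.norm2(initial_coding + self.ffn(initial_coding))
```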
Optionally, the apparatus further includes:
a target mask information generating module configured to perform generation of target mask information including first mask information, second mask information, third mask information, and fourth mask information, the first mask information characterizing an association relationship between the first number of behavior sequence pieces corresponding to the first number of timing identification information; the second mask information characterizes the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to the plurality of multimedia resources; the fourth mask information characterizes the association information of the plurality of multimedia resources to the first number of behavior sequence segments;
The self-attention learning unit is further configured to perform self-attention learning by inputting the current behavior feature information and the target mask information into a self-attention network of the sub-coding network currently traversed to obtain current initial coding information.
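For illustration only, the four mask blocks described above could be assembled into a single attention mask as in the following Python sketch; the token ordering (segment tokens followed by resource tokens), the interpretation of the third and fourth masks as attention between each segment and its own resources, and the meaning of the boolean values are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def build_target_mask(num_segments, num_resources, local_window, resource_segment_ids):
    """Hypothetical (m+n) x (m+n) allowed-attention matrix; True means attention allowed.

    num_segments:         m, number of behavior sequence segments (time sequence tokens)
    num_resources:        n, number of resource identification tokens
    local_window:         preset range for resource-to-resource attention
    resource_segment_ids: length-n LongTensor giving the segment each resource belongs to
    """
    m, n = num_segments, num_resources
    allowed = torch.zeros(m + n, m + n, dtype=torch.bool)
    # First mask: segments attend to all segments, linking the whole behavior sequence.
    allowed[:m, :m] = True
    # Second mask: each resource attends to resources within the preset local range.
    idx = torch.arange(n)
    allowed[m:, m:] = (idx[:, None] - idx[None, :]).abs() <= local_window
    # Third and fourth masks: mutual attention between each segment and its resources.
    seg_onehot = F.one_hot(resource_segment_ids, m).bool()  # (n, m)
    allowed[:m, m:] = seg_onehot.t()
    allowed[m:, :m] = seg_onehot
    # Invert (~allowed) before passing to nn.MultiheadAttention, which expects True = blocked.
    return allowed
```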
Optionally, the apparatus further includes:
the second interest perception processing unit is configured to input the target resource coding feature information, the target interest feature information and the first resource identification information into an interest perception network to perform interest perception processing to obtain a target interest index;
the target resource coding characteristic information is resource coding characteristic information of which the time difference between the corresponding behavior time and the current time in the resource coding characteristic information corresponding to the plurality of multimedia resources meets the preset condition.
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the feature fusion unit includes:
the weight learning unit is configured to perform weight learning by inputting the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer, so as to obtain interest weight information, wherein the interest weight information characterizes the influence degree of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object;
And the weighting processing unit is configured to input the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing to obtain target interest feature information.
Optionally, the apparatus further includes:
the second information acquisition module is configured to acquire sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource and a marked interest index, wherein the sample behavior sequence information comprises fourth resource identification information of a plurality of historical multimedia resources of which the sample object has acted in a preset sample time period and first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources, which are determined based on behavior time corresponding to the plurality of historical multimedia resources;
a resource type determining module configured to perform determining a plurality of resource types corresponding to a plurality of historical multimedia resources;
a sample occurrence probability determining module configured to perform determining a sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information;
the second interest recognition module is configured to input sample behavior sequence information and third resource identification information into an interest recognition network to be trained for interest recognition, and a predicted interest index of the sample object on at least one sample multimedia resource and the corresponding predicted occurrence probability of a plurality of resource types are obtained;
A target interest loss determination module configured to perform determining a target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index;
and the interest recognition network training module is configured to perform training on the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
Optionally, the target interest loss determination module includes:
an initial interest loss information determination unit configured to perform determination of initial interest loss information based on the predicted interest index and the labeled interest index;
an interest guidance loss information determination unit configured to perform determination of interest guidance loss information based on the predicted occurrence probability and the sample occurrence probability;
and a target interest loss determination unit configured to perform determination of a target interest loss based on the initial interest loss information and the interest guidance loss information.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and will not be described in detail here.
Fig. 11 is a block diagram illustrating an electronic device for multimedia asset recommendation, which may be a terminal, according to an exemplary embodiment, and an internal structure diagram thereof may be as shown in fig. 11. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a multimedia asset recommendation method. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
Fig. 12 is a block diagram illustrating an electronic device for multimedia asset recommendation, which may be a server, and an internal structure diagram thereof may be as shown in fig. 12, according to an exemplary embodiment. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a multimedia asset recommendation method.
It will be appreciated by those skilled in the art that the structures shown in fig. 11 or 12 are merely block diagrams of partial structures related to the present disclosure and do not constitute limitations of the electronic device to which the present disclosure is applied, and that a particular electronic device may include more or fewer components than shown in the drawings, or may combine certain components, or have different arrangements of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement a multimedia asset recommendation method as in the embodiments of the present disclosure.
In an exemplary embodiment, a computer readable storage medium is also provided, which when executed by a processor of an electronic device, enables the electronic device to perform the multimedia asset recommendation method in the embodiments of the present disclosure.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the multimedia asset recommendation method in the embodiments of the present disclosure.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A multimedia asset recommendation method, comprising:
responding to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, wherein the historical behavior sequence information comprises second resource identification information of a plurality of multimedia resources of which the target object has acted in a preset time period and first number of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information is an identification determined based on a preset distribution time period where the behavior time corresponding to the plurality of multimedia resources is located;
Inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object on the at least one multimedia resource; the interest identification network comprises a coding network, a first quantity of basic capsule networks, a second quantity of interest capsule networks, a feature fusion network and an interest perception network, wherein the second quantity is a quantity corresponding to a preset resource type; the step of inputting the historical behavior sequence information and the first resource identification information into an interest identification network to perform interest identification, and the step of obtaining the target interest index of the target object on the at least one multimedia resource comprises the following steps: performing coding processing on the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding characteristic information corresponding to the second resource identification information and first number of time sequence coding characteristic information corresponding to the first number of time sequence identification information; determining basic capsule characteristic information in the first number of basic capsule networks according to the first number of time sequence coding characteristic information; transmitting the basic capsule characteristic information to the second number of interest capsule networks for interest identification based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, and obtaining initial interest characteristic information corresponding to the second number of interest capsule networks; inputting the initial interest feature information corresponding to the second plurality of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network to perform feature fusion, so as to obtain target interest feature information; inputting the resource coding feature information, the target interest feature information and the resource feature information corresponding to the first resource identification information into the interest perception network to perform interest perception processing to obtain the target interest index;
And recommending the target multimedia resource in the at least one multimedia resource to the target object based on the target interest index.
2. The multimedia asset recommendation method according to claim 1, wherein said coding network comprises: a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, any sub-coding network comprising a self-attention network and a feedforward neural network, and wherein the coding processing of the historical behavior sequence information based on the coding network to obtain the behavior coding information comprises:
extracting behavior feature information of the historical behavior sequence information based on the feature extraction network;
performing position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
traversing the at least one sub-coding network connected in sequence, and under the condition of traversing any sub-coding network, inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network to perform self-attention learning to obtain current initial coding information;
inputting the current initial coding information into a feedforward neural network in the sub-coding network traversed currently to perform nonlinear processing to obtain current coding behavior information;
Taking current coding behavior information at the end of traversal as the behavior coding information;
the current behavior characteristic information of the first sub-coding network in the at least one sub-coding network connected in sequence is the target behavior characteristic information, and the current behavior characteristic information of the non-first sub-coding network in the at least one sub-coding network connected in sequence is the current coding behavior information output by the previous sub-coding network.
3. The method of claim 2, further comprising:
generating target mask information, wherein the target mask information comprises first mask information, second mask information, third mask information and fourth mask information, and the first mask information characterizes association relations among a first number of behavior sequence fragments corresponding to the first number of time sequence identification information; the second mask information characterizes the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to the plurality of multimedia resources; the fourth mask information characterizes the association information of the plurality of multimedia resources to the first number of behavior sequence segments;
The step of inputting the current behavior characteristic information into the self-attention network in the sub-coding network traversed currently to perform self-attention learning, and the step of obtaining the current initial coding information comprises the following steps:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the sub-coding network traversed currently to perform self-attention learning to obtain the current initial coding information.
4. The method of claim 1, further comprising:
inputting target resource coding feature information, the target interest feature information and the first resource identification information into the interest perception network to perform interest perception processing to obtain the target interest index;
the target resource coding characteristic information is resource coding characteristic information of which the time difference between the corresponding behavior time and the current time in the resource coding characteristic information corresponding to the plurality of multimedia resources meets the preset condition.
5. The method of claim 1, wherein the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the inputting the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain the target interest feature information includes:
Inputting the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer for weight learning to obtain interest weight information, wherein the interest weight information characterizes the influence degree of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object;
and inputting the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing to obtain the target interest feature information.
6. The method of multimedia resource recommendation according to any one of claims 1 to 5, further comprising:
acquiring sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource and a marked interest index, wherein the sample behavior sequence information comprises fourth resource identification information of a plurality of historical multimedia resources of the sample object, which have been performed in a preset sample time period, and the first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources, which are determined based on behavior time corresponding to the plurality of historical multimedia resources;
Determining a plurality of resource types corresponding to the plurality of historical multimedia resources;
determining the sample occurrence probability of the corresponding resource identification information of each resource type in the sample behavior sequence information;
inputting the sample behavior sequence information and the third resource identification information into an interest identification network to be trained for interest identification, and obtaining a predicted interest index of the sample object on the at least one sample multimedia resource and a plurality of predicted occurrence probabilities corresponding to the resource types;
determining target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability and the labeled interest index;
and training the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
7. The method of claim 6, wherein determining the target interest loss based on the predicted interest indicators, the predicted occurrence probabilities, the sample occurrence probabilities, and the labeled interest indicators comprises:
determining initial interest loss information according to the predicted interest index and the marked interest index;
determining interest guide loss information according to the predicted occurrence probability and the sample occurrence probability;
The target interest loss is determined based on the initial interest loss information and the interest guidance loss information.
8. A multimedia asset recommendation device, comprising:
a first information acquisition module configured to perform a response to a multimedia resource acquisition request of a target object, and acquire historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources in which the target object has performed in a preset time period and first number of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information is an identification determined based on a preset distribution time period in which behavior times corresponding to the plurality of multimedia resources are located;
the first interest recognition module is configured to perform interest recognition by inputting the historical behavior sequence information and the first resource identification information into an interest recognition network, so as to obtain a target interest index of the target object on the at least one multimedia resource; the interest identification network comprises a coding network, a first quantity of basic capsule networks, a second quantity of interest capsule networks, a feature fusion network and an interest perception network, wherein the second quantity is a quantity corresponding to a preset resource type;
The first interest identification module includes: the coding processing unit is configured to perform coding processing on the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding characteristic information corresponding to the second resource identification information and first number of time sequence coding characteristic information corresponding to the first number of time sequence identification information; a base capsule characteristic information determination unit configured to perform determination of base capsule characteristic information in the first number of base capsule networks from the first number of time-series coded characteristic information; the interest identification unit is configured to execute interest identification by transmitting the basic capsule characteristic information to the second number of interest capsule networks based on the transmission weight of each basic capsule network relative to the second number of interest capsule networks, so as to obtain initial interest characteristic information corresponding to the second number of interest capsule networks; the feature fusion unit is configured to input initial interest feature information corresponding to the second number of interest capsule networks and resource feature information corresponding to the first resource identification information into the feature fusion network to perform feature fusion, so as to obtain target interest feature information; the first interest perception processing unit is configured to input the resource coding feature information, the target interest feature information and the resource feature information corresponding to the first resource identification information into the interest perception network to perform interest perception processing, so as to obtain the target interest index;
And a multimedia resource recommendation module configured to perform recommendation of a target multimedia resource of the at least one multimedia resource to the target object based on the target interest index.
9. The multimedia asset recommendation device of claim 8, wherein said coding network comprises: a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, any sub-coding network comprising a self-attention network and a feed-forward neural network, the coding processing unit comprising:
a behavior feature information extraction unit configured to perform extracting behavior feature information of the historical behavior sequence information based on the feature extraction network;
the position coding processing unit is configured to perform position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
a traversing unit configured to perform traversing the at least one sequentially connected sub-coding network;
the self-attention learning unit is configured to perform self-attention learning by inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network under the condition of traversing any sub-coding network, so as to obtain the current initial coding information;
The nonlinear processing unit is configured to perform inputting the current initial coding information into the feedforward neural network in the currently traversed sub-coding network for nonlinear processing to obtain current coding behavior information;
a behavior encoding information determination unit configured to perform current encoding behavior information at the end of traversal as the behavior encoding information;
the current behavior characteristic information of the first sub-coding network in the at least one sub-coding network connected in sequence is the target behavior characteristic information, and the current behavior characteristic information of the non-first sub-coding network in the at least one sub-coding network connected in sequence is the current coding behavior information output by the previous sub-coding network.
10. The multimedia asset recommendation device of claim 9, wherein said device further comprises:
a target mask information generating module configured to perform generation of target mask information, the target mask information including first mask information, second mask information, third mask information, and fourth mask information, the first mask information characterizing an association relationship between a first number of behavior sequence pieces corresponding to the first number of timing identification information; the second mask information characterizes the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information characterizes the association information of the first number of behavior sequence segments to the plurality of multimedia resources; the fourth mask information characterizes the association information of the plurality of multimedia resources to the first number of behavior sequence segments;
The self-attention learning unit is further configured to perform self-attention learning by inputting current behavior feature information and the target mask information into a self-attention network in the sub-coding network currently traversed to obtain the current initial coding information.
11. The multimedia resource recommendation device of claim 8, wherein said device further comprises:
a second interest perception processing unit configured to input target resource coding feature information, the target interest feature information, and the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index;
wherein the target resource coding feature information is the resource coding feature information, among the resource coding feature information corresponding to the plurality of multimedia resources, whose time difference between the corresponding behavior time and the current time satisfies the preset condition.
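Claim 11 restricts the interest perception step to the resource coding features whose behavior time is close to the current time. The following sketch illustrates one way this could look, assuming the preset condition is a simple maximum age and that the interest perception network is a small MLP scorer; both choices are hypothetical and not taken from the patent.

```python
import torch
import torch.nn as nn

def select_recent_resource_features(resource_features: torch.Tensor,
                                    behavior_times: torch.Tensor,
                                    current_time: float,
                                    max_age: float) -> torch.Tensor:
    """Keep only resource coding feature vectors whose behavior time lies within
    max_age of the current time (a hypothetical reading of the preset condition)."""
    recent = (current_time - behavior_times) <= max_age
    return resource_features[recent]

class InterestPerceptionNetwork(nn.Module):
    """Illustrative scorer: pools the recent resource features, concatenates them with
    the target interest features and the candidate resource embedding, and outputs an index."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, recent_resource_features, target_interest_features, candidate_embedding):
        pooled = recent_resource_features.mean(dim=0)  # simple mean pooling (an assumption)
        x = torch.cat([pooled, target_interest_features, candidate_embedding], dim=-1)
        return torch.sigmoid(self.scorer(x))           # target interest index in [0, 1]
```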
12. The multimedia resource recommendation device of claim 8, wherein the feature fusion network is an attention learning network comprising a weight learning layer and a weighting processing layer, and the feature fusion unit comprises:
a weight learning unit configured to input the initial interest feature information corresponding to the second number of interest capsule networks and the first resource identification information into the weight learning layer for weight learning to obtain interest weight information, wherein the interest weight information characterizes the degree of influence of the initial interest feature information corresponding to the second number of interest capsule networks on the interest of the target object;
and a weighting processing unit configured to input the interest weight information and the initial interest feature information into the weighting processing layer for weighting processing to obtain the target interest feature information.
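Claim 12 describes attention-style fusion: a weight learning layer scores each capsule's initial interest features against the candidate resource, and a weighting processing layer forms the weighted combination. A minimal sketch, assuming a single linear scoring layer and softmax-normalized weights (neither detail is specified in the claim):

```python
import torch
import torch.nn as nn

class InterestFeatureFusion(nn.Module):
    """Attention learning network: a weight learning layer scores each capsule's interest
    features against the candidate resource, and a weighting layer forms the weighted sum."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight_learning = nn.Linear(2 * dim, 1)  # weight learning layer

    def forward(self, capsule_interests: torch.Tensor, candidate_embedding: torch.Tensor):
        # capsule_interests: (num_capsules, dim); candidate_embedding: (dim,)
        expanded = candidate_embedding.expand(capsule_interests.size(0), -1)
        scores = self.weight_learning(torch.cat([capsule_interests, expanded], dim=-1))
        interest_weights = torch.softmax(scores, dim=0)   # influence of each capsule's interests
        # Weighting processing layer: weighted sum -> target interest feature information.
        return (interest_weights * capsule_interests).sum(dim=0)
```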
13. The multimedia resource recommendation device of any one of claims 8 to 12, further comprising:
a second information acquisition module configured to acquire sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and a labeled interest index, the sample behavior sequence information including fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample period, and the first number of sample timing identification information corresponding to the plurality of historical multimedia resources determined based on behavior times corresponding to the plurality of historical multimedia resources;
a resource type determining module configured to determine a plurality of resource types corresponding to the plurality of historical multimedia resources;
a sample occurrence probability determining module configured to determine the sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information;
a second interest recognition module configured to input the sample behavior sequence information and the third resource identification information into an interest recognition network to be trained for interest recognition to obtain a predicted interest index of the sample object for the at least one sample multimedia resource and predicted occurrence probabilities corresponding to the plurality of resource types;
a target interest loss determination module configured to determine a target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the labeled interest index;
and an interest recognition network training module configured to train the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
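Claim 13 outlines the training procedure: compute per-type sample occurrence probabilities from the sample behavior sequence, run the interest recognition network to be trained, and optimize a target interest loss. The snippet below is a hypothetical training step under assumed conventions; the batch keys, the network's output tuple, and the helper names are assumptions, not the patent's implementation.

```python
import torch
from collections import Counter

def sample_occurrence_probabilities(resource_types, type_vocab):
    """Empirical probability of each resource type within one sample behavior sequence."""
    counts = Counter(resource_types)
    total = max(len(resource_types), 1)
    return torch.tensor([counts[t] / total for t in type_vocab])

def training_step(interest_recognition_network, optimizer, batch, loss_fn):
    """One optimization step over the interest recognition network to be trained."""
    predicted_index, predicted_type_probs = interest_recognition_network(
        batch["sample_behavior_sequence"], batch["candidate_resource_ids"])
    loss = loss_fn(predicted_index, batch["labeled_interest_index"],
                   predicted_type_probs, batch["sample_occurrence_probs"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```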
14. The multimedia resource recommendation device of claim 13, wherein said target interest loss determination module comprises:
an initial interest loss information determination unit configured to determine initial interest loss information according to the predicted interest index and the labeled interest index;
an interest guidance loss information determination unit configured to determine interest guidance loss information based on the predicted occurrence probability and the sample occurrence probability;
and a target interest loss determination unit configured to determine the target interest loss based on the initial interest loss information and the interest guidance loss information.
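Claim 14 decomposes the target interest loss into an initial interest loss (predicted vs. labeled interest index) and an interest guidance loss (predicted vs. sample occurrence probabilities). The sketch below uses binary cross-entropy and KL divergence as stand-ins; the claim does not name specific loss functions, so these choices and the combination weight are assumptions. A function like this could serve as the loss_fn in the training_step sketch above.

```python
import torch.nn.functional as F

def target_interest_loss(predicted_index, labeled_index,
                         predicted_type_probs, sample_type_probs,
                         guidance_weight: float = 0.5):
    # Initial interest loss information: predicted vs. labeled interest index.
    initial_loss = F.binary_cross_entropy(predicted_index, labeled_index)
    # Interest guidance loss information: predicted type distribution vs. the
    # empirical sample occurrence probabilities.
    guidance_loss = F.kl_div(predicted_type_probs.clamp_min(1e-8).log(),
                             sample_type_probs, reduction="batchmean")
    # Target interest loss: a weighted combination (the weight is an assumption).
    return initial_loss + guidance_weight * guidance_loss
```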
15. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the multimedia resource recommendation method of any one of claims 1 to 7.
16. A computer-readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia resource recommendation method of any one of claims 1 to 7.
CN202110915601.5A 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium Active CN113806568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110915601.5A CN113806568B (en) 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113806568A CN113806568A (en) 2021-12-17
CN113806568B (en) 2023-11-03

Family

ID=78943012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110915601.5A Active CN113806568B (en) 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113806568B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519112A (en) * 2022-01-28 2022-05-20 北京卓越乐享网络科技有限公司 Method, apparatus, device, medium and program product for predicting multimedia object
CN116541608B (en) * 2023-07-04 2023-10-03 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11449061B2 (en) * 2016-02-29 2022-09-20 AI Incorporated Obstacle recognition method for autonomous robots
US11455527B2 (en) * 2019-06-14 2022-09-27 International Business Machines Corporation Classification of sparsely labeled text documents while preserving semantics

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102648A (en) * 2013-04-07 2014-10-15 腾讯科技(深圳)有限公司 User behavior data based interest recommending method and device
CN112601487A (en) * 2018-08-14 2021-04-02 佳能株式会社 Medical image processing apparatus, medical image processing method, and program
CN109189988A (en) * 2018-09-18 2019-01-11 北京邮电大学 A kind of video recommendation method
CN111353094A (en) * 2018-12-20 2020-06-30 北京嘀嘀无限科技发展有限公司 Information pushing method and device
CN110688855A (en) * 2019-09-29 2020-01-14 山东师范大学 Chinese medical entity identification method and system based on machine learning
CN111147871A (en) * 2019-12-04 2020-05-12 北京达佳互联信息技术有限公司 Singing recognition method and device in live broadcast room, server and storage medium
CN111667308A (en) * 2020-05-29 2020-09-15 中国工商银行股份有限公司 Advertisement recommendation prediction system and method
CN111914178A (en) * 2020-08-19 2020-11-10 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN112232058A (en) * 2020-10-15 2021-01-15 济南大学 False news identification method and system based on deep learning three-layer semantic extraction framework
CN112380426A (en) * 2020-10-23 2021-02-19 南京邮电大学 Interest point recommendation method and system based on graph embedding and user long-term and short-term interest fusion
CN112883258A (en) * 2021-01-11 2021-06-01 北京达佳互联信息技术有限公司 Information recommendation method and device, electronic equipment and storage medium
CN112765461A (en) * 2021-01-12 2021-05-07 中国计量大学 Session recommendation method based on multi-interest capsule network
CN113158057A (en) * 2021-04-28 2021-07-23 平安科技(深圳)有限公司 Buddha meridian recommendation processing device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Text Classification Based on A New Joint Network; Ming Chen et al.; 2020 5th International Conference on Control, Robotics and Cybernetics (CRC); pp. 13-18 *
Short-Text Sentiment Analysis Method Based on the BERT Model and Dual-Channel Attention; 金华涛; 《信息与电脑(理论版)》; Vol. 33, No. 05; pp. 41-43 *
Research on a Knowledge Association Mining Model Oriented to User Interest; 应璇, 孙济庆; 信息系统学报; No. 01; pp. 38-48 *

Similar Documents

Publication Publication Date Title
CN111191791B (en) Picture classification method, device and equipment based on machine learning model
CN113806568B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
US20190392258A1 (en) Method and apparatus for generating information
CN112883257B (en) Behavior sequence data processing method and device, electronic equipment and storage medium
CN113204660B (en) Multimedia data processing method, tag identification device and electronic equipment
CN113918738B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113420203B (en) Object recommendation method and device, electronic equipment and storage medium
CN114996486A (en) Data recommendation method and device, server and storage medium
CN112989179B (en) Model training and multimedia content recommendation method and device
CN114461869B (en) Service characteristic data processing method and device, electronic equipment and storage medium
CN113704511B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113641835B (en) Multimedia resource recommendation method and device, electronic equipment and medium
CN114491093B (en) Multimedia resource recommendation and object representation network generation method and device
CN113868516A (en) Object recommendation method and device, electronic equipment and storage medium
CN116469111A (en) Character generation model training method and target character generation method
CN115186173A (en) Multimedia resource pushing and intelligent agent network generating method and device
CN113610215B (en) Task processing network generation method, task processing device and electronic equipment
CN115756821A (en) Online task processing model training and task processing method and device
CN115688002A (en) Classification method and device, method and device for training classification model and classification model
CN114139045A (en) Object recommendation method and device, electronic equipment and storage medium
CN111078984B (en) Network model issuing method, device, computer equipment and storage medium
CN113947185A (en) Task processing network generation method, task processing device, electronic equipment and storage medium
CN114048392B (en) Multimedia resource pushing method and device, electronic equipment and storage medium
CN108388614B (en) News data crawling processing method and device, computer equipment and storage medium
CN114021739B (en) Business processing method, business processing model training device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant