CN113806568A - Multimedia resource recommendation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113806568A
CN113806568A (application CN202110915601.5A)
Authority
CN
China
Prior art keywords
information
interest
resource
network
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110915601.5A
Other languages
Chinese (zh)
Other versions
CN113806568B (en)
Inventor
赵鑫
王辉
冯翔
毛景树
王珵
江鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Renmin University of China
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China and Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110915601.5A
Publication of CN113806568A
Application granted
Publication of CN113806568B
Active legal status
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 - Querying
    • G06F 16/435 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method includes: in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period, and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources; inputting the historical behavior sequence information and the first resource identification information into an interest recognition network for interest recognition to obtain a target interest index of the target object for the at least one multimedia resource; and recommending a target multimedia resource among the at least one multimedia resource to the target object based on the target interest index. The method and device can improve recommendation accuracy and recommendation effect.

Description

Multimedia resource recommendation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, a large number of network platforms have been continuously upgraded, so that in addition to publishing image-text information, individual users can share short videos of daily life at any time. How to accurately capture user interest has therefore become a challenge faced by many recommendation systems.
In the related art, a behavior sequence containing a large number of historical behavior records of a user over a long period of time is often used directly as input data of a neural network for learning user interest preferences. However, because the large number of historical behavior records are mixed together, fine-grained characteristics cannot be learned, so user interest preferences cannot be learned effectively, resulting in poor recommendation accuracy and effect in the recommendation system.
Disclosure of Invention
The disclosure provides a multimedia resource recommendation method and apparatus, an electronic device, and a storage medium, to at least solve the problem in the related art that the real interest of a user cannot be effectively represented, resulting in poor recommendation accuracy and effect in a recommendation system. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a multimedia resource recommendation method is provided, including:
responding to a multimedia resource acquisition request of a target object, and acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information is an identification determined based on the preset distribution time period in which the behavior time corresponding to the plurality of multimedia resources is located;
inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object to the at least one multimedia resource;
recommending a target multimedia resource of the at least one multimedia resource to the target object based on the target interest indicator.
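For illustration only (not part of the claimed subject matter), the three steps above can be sketched as follows; `recommend`, `stub_interest_net`, and the id-modulo-3 "type" are hypothetical names and logic standing in for the trained interest recognition network:

```python
import numpy as np

def recommend(history_ids, timeslot_ids, candidate_ids, interest_net, top_k=2):
    """Score every candidate resource with the interest network and return
    the top_k resource ids (highest target interest index first)."""
    scores = [interest_net(history_ids, timeslot_ids, c) for c in candidate_ids]
    order = np.argsort(scores)[::-1]
    return [candidate_ids[i] for i in order[:top_k]]

def stub_interest_net(history_ids, timeslot_ids, candidate_id):
    # Hypothetical stand-in score: how often the candidate's "type"
    # (id modulo 3 here) appears in the historical behavior sequence.
    return sum(1 for h in history_ids if h % 3 == candidate_id % 3)

picked = recommend([3, 6, 9, 4], [0, 0, 1, 1], [12, 7, 5], stub_interest_net)
```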
Optionally, the interest identification network includes a coding network, the first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network, and an interest perception network, where the second number is a number corresponding to preset resource types;
the step of inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object to the at least one multimedia resource comprises:
coding the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding feature information corresponding to the second resource identification information and a first number of time sequence coding feature information corresponding to the first number of time sequence identification information;
determining basic capsule characteristic information in the first number of basic capsule networks according to the first number of time sequence coding characteristic information;
based on the transmission weight of each basic capsule network relative to the second quantity of interest capsule networks, transmitting the basic capsule characteristic information to the second quantity of interest capsule networks for interest identification to obtain initial interest characteristic information corresponding to the second quantity of interest capsule networks;
inputting the initial interest feature information corresponding to the second quantity of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain target interest feature information;
and inputting the resource coding characteristic information, the target interest characteristic information and the resource characteristic information corresponding to the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index.
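The capsule stages above can be sketched with a routing-by-agreement loop in the spirit of capsule networks; the shapes, the `squash` non-linearity, and the three routing iterations are illustrative assumptions rather than details taken from this disclosure:

```python
import numpy as np

def squash(v):
    """Capsule non-linearity: shrink short vectors toward zero, keep direction."""
    n2 = np.sum(v * v, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def route_to_interest_capsules(basic_caps, w, n_iter=3):
    """basic_caps: (first_number, dim); w: (first_number, second_number, dim, dim)
    transform matrices. Returns (second_number, dim) interest capsule features."""
    u_hat = np.einsum('id,ijde->ije', basic_caps, w)  # per-pair predictions
    b = np.zeros(u_hat.shape[:2])                     # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # transmission weights
        s = np.einsum('ij,ije->je', c, u_hat)
        v = squash(s)
        b += np.einsum('ije,je->ij', u_hat, v)        # agreement update
    return v

rng = np.random.default_rng(0)
interest = route_to_interest_capsules(rng.normal(size=(4, 8)),
                                      rng.normal(size=(4, 3, 8, 8)))
```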
Optionally, the coding network includes a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, where each sub-coding network includes a self-attention network and a feed-forward neural network, and the encoding processing performed on the historical behavior sequence information based on the coding network to obtain the behavior coding information includes the following steps:
extracting behavior characteristic information of the historical behavior sequence information based on the characteristic extraction network;
performing position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
traversing the at least one sequentially connected sub-coding network, and under the condition of traversing any sub-coding network, inputting the current behavior characteristic information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain current initial coding information;
inputting the current initial coding information into a feedforward neural network in the currently traversed sub-coding network to perform nonlinear processing to obtain current coding behavior information;
taking the current coding behavior information at the end of traversal as the behavior coding information;
the current behavior characteristic information of the first sub-coding network in the at least one sequentially connected sub-coding network is the target behavior characteristic information, and the current behavior characteristic information of each non-first sub-coding network in the at least one sequentially connected sub-coding network is the current coding behavior information output by the previous sub-coding network.
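A minimal sketch of one pass through such sequentially connected sub-coding networks is shown below; residual connections, layer normalization, and the position coding step are omitted, and all weight shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the behavior sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def feed_forward(x, w1, b1, w2, b2):
    """Feed-forward neural network applying the non-linear processing step."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def encode(seq_emb, layers):
    """Traverse the sub-coding networks: each applies self-attention,
    then the feed-forward network; the last output is the behavior coding."""
    h = seq_emb
    for (wq, wk, wv, w1, b1, w2, b2) in layers:
        h = feed_forward(self_attention(h, wq, wk, wv), w1, b1, w2, b2)
    return h

rng = np.random.default_rng(1)
d = 8
layer = (rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
         rng.normal(size=(d, 2 * d)), np.zeros(2 * d),
         rng.normal(size=(2 * d, d)), np.zeros(d))
out = encode(rng.normal(size=(5, d)), [layer, layer])
```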
Optionally, the method further includes:
generating target mask information, where the target mask information includes first mask information, second mask information, third mask information, and fourth mask information; the first mask information represents the association relationship among the first number of behavior sequence segments corresponding to the first number of time sequence identification information; the second mask information represents the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of multimedia resources within a preset range; the third mask information represents association information of the first number of behavior sequence segments with respect to the plurality of multimedia resources; and the fourth mask information represents association information of the plurality of multimedia resources with respect to the first number of behavior sequence segments;
the inputting the current behavior feature information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain the current initial coding information includes:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain the current initial coding information.
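As a simplified illustration of masked self-attention, the first two mask roles (within-segment association and a preset local range around each resource) can be combined into one boolean mask; the union rule, the window size, and the omission of the two cross masks are simplifying assumptions:

```python
import numpy as np

def build_target_mask(segment_ids, window=1):
    """mask[i, j] is True when position i may attend to position j: either
    both positions belong to the same behavior sequence segment (first mask)
    or j lies within a preset range around i (second mask)."""
    seg = np.asarray(segment_ids)
    idx = np.arange(len(seg))
    same_segment = seg[:, None] == seg[None, :]
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    return same_segment | local

def masked_attention_scores(q, k, mask):
    """Self-attention scores with masked positions suppressed before softmax."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(mask, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

mask = build_target_mask([0, 0, 1, 1, 2], window=1)
```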
Optionally, the method further includes:
inputting target resource coding characteristic information, the target interest characteristic information and the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index;
where the target resource coding feature information is resource coding feature information, among the resource coding feature information corresponding to the plurality of multimedia resources, whose time difference between the corresponding behavior time and the current time meets a preset condition.
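The recency filtering described above might look like the following sketch, where the `max_age` threshold is a hypothetical stand-in for the unspecified preset condition:

```python
def recent_resource_features(coded_features, behavior_times, now, max_age):
    """Keep only the resource coding features whose behavior time lies
    within max_age of the current time (a stand-in preset condition)."""
    return [f for f, t in zip(coded_features, behavior_times)
            if now - t <= max_age]

kept = recent_resource_features(['f1', 'f2', 'f3'], [100, 50, 95],
                                now=100, max_age=10)
```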
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the inputting of the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain the target interest feature information includes:
inputting the initial interest characteristic information and the first resource identification information corresponding to the second quantity of interest capsule networks into the weight learning layer for weight learning to obtain interest weight information, wherein the interest weight information represents the influence degree of the initial interest characteristic information corresponding to the second quantity of interest capsule networks on the interest of the target object;
and inputting the interest weight information and the initial interest characteristic information into the weighting processing layer for weighting processing to obtain the target interest characteristic information.
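A minimal sketch of the weight learning and weighting layers, assuming a dot-product score against the candidate resource feature followed by a softmax (the actual scoring function is not specified here):

```python
import numpy as np

def fuse_interests(interest_feats, candidate_feat):
    """Weight learning layer: score each interest capsule feature against
    the candidate resource feature, softmax the scores into interest weight
    information, then weight-sum into the target interest feature."""
    logits = interest_feats @ candidate_feat          # (second_number,)
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                   # interest weight information
    return w @ interest_feats                         # target interest feature

rng = np.random.default_rng(2)
fused = fuse_interests(rng.normal(size=(3, 8)), rng.normal(size=(8,)))
```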
Optionally, the method further includes:
acquiring sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and an annotated interest index, where the sample behavior sequence information includes fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample time period and a first number of pieces of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to the plurality of historical multimedia resources;
determining a plurality of resource types corresponding to a plurality of historical multimedia resources;
determining the sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information;
inputting the sample behavior sequence information and the third resource identification information into an interest recognition network to be trained for interest recognition to obtain a predicted interest index of the sample object for the at least one sample multimedia resource and predicted occurrence probabilities corresponding to the plurality of resource types;
determining a target interest loss based on the predicted interest indicator, the predicted occurrence probability, the sample occurrence probability, and the annotated interest indicator;
and training the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
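The sample occurrence probability of each resource type can be computed as a simple empirical frequency; the resource-type labels below are made up for illustration:

```python
from collections import Counter

def sample_occurrence_probability(history_types):
    """Empirical probability of each resource type within the sample
    behavior sequence (used as the guidance target during training)."""
    counts = Counter(history_types)
    total = len(history_types)
    return {t: c / total for t, c in counts.items()}

p = sample_occurrence_probability(['video', 'video', 'image', 'text'])
```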
Optionally, the determining a target interest loss based on the predicted interest indicator, the predicted occurrence probability, the sample occurrence probability, and the labeled interest indicator includes:
determining initial interest loss information according to the predicted interest index and the annotated interest index;
determining interest guide loss information according to the predicted occurrence probability and the sample occurrence probability;
determining the target interest loss based on the initial interest loss information and the interest guidance loss information.
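The loss combination above can be sketched as follows, assuming binary cross-entropy for the initial interest loss, cross-entropy between the two occurrence distributions for the guidance loss, and a hypothetical weighting factor `alpha`:

```python
import numpy as np

def target_interest_loss(pred_index, label_index, pred_prob, sample_prob,
                         alpha=0.5):
    """Initial interest loss (binary cross-entropy on the interest index)
    plus interest guidance loss (cross-entropy between predicted and sample
    occurrence distributions over resource types), combined by alpha."""
    eps = 1e-9
    initial = -(label_index * np.log(pred_index + eps)
                + (1 - label_index) * np.log(1 - pred_index + eps)).mean()
    guidance = -(np.asarray(sample_prob)
                 * np.log(np.asarray(pred_prob) + eps)).sum()
    return initial + alpha * guidance

loss = target_interest_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0]),
                            [0.5, 0.3, 0.2], [0.6, 0.3, 0.1])
```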
According to a second aspect of the embodiments of the present disclosure, there is provided a multimedia resource recommendation apparatus, including:
a first information acquisition module configured to perform, in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information is an identification determined based on the preset distribution time period in which the behavior time corresponding to the plurality of multimedia resources is located;
a first interest identification module configured to perform interest identification by inputting the historical behavior sequence information and the first resource identification information into an interest identification network, so as to obtain a target interest index of the target object on the at least one multimedia resource;
a multimedia resource recommending module configured to recommend a target multimedia resource of the at least one multimedia resource to the target object based on the target interest indicator.
Optionally, the interest identification network includes a coding network, the first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network, and an interest perception network, where the second number is a number corresponding to preset resource types;
the first interest identification module comprises:
the encoding processing unit is configured to perform encoding processing on the historical behavior sequence information based on the encoding network to obtain behavior encoding information, and the behavior encoding information comprises resource encoding characteristic information corresponding to the second resource identification information and a first number of time sequence encoding characteristic information corresponding to the first number of time sequence identification information;
a basic capsule characteristic information determining unit configured to perform determining basic capsule characteristic information in the first number of basic capsule networks according to the first number of time-series coded characteristic information;
an interest identification unit configured to perform interest identification by transmitting the basic capsule feature information to the second quantity of interest capsule networks based on a transmission weight of each basic capsule network relative to the second quantity of interest capsule networks, so as to obtain initial interest feature information corresponding to the second quantity of interest capsule networks;
the feature fusion unit is configured to input initial interest feature information corresponding to the second quantity of interest capsule networks and resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain target interest feature information;
and the first interest perception processing unit is configured to execute the step of inputting the resource coding feature information, the target interest feature information and the resource feature information corresponding to the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index.
Optionally, the coding network includes a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, where each sub-coding network includes a self-attention network and a feed-forward neural network, and the coding processing unit includes:
a behavior feature information extraction unit configured to perform extraction of behavior feature information of the historical behavior sequence information based on the feature extraction network;
a position coding processing unit configured to perform position coding processing on the behavior feature information based on the position coding network to obtain target behavior feature information;
a traversal unit configured to perform traversal of the at least one sequentially connected sub-coding network;
the self-attention learning unit is configured to input the current behavior feature information into a self-attention network in the currently traversed sub-coding network for self-attention learning under the condition of traversing any sub-coding network to obtain current initial coding information;
the nonlinear processing unit is configured to input the current initial coding information into a feedforward neural network in the currently traversed sub-coding network for nonlinear processing to obtain current coding behavior information;
a behavior coding information determination unit configured to take the current coding behavior information at the end of traversal as the behavior coding information;
the current behavior characteristic information of the first sub-coding network in the at least one sequentially connected sub-coding network is the target behavior characteristic information, and the current behavior characteristic information of each non-first sub-coding network in the at least one sequentially connected sub-coding network is the current coding behavior information output by the previous sub-coding network.
Optionally, the apparatus further comprises:
a target mask information generating module configured to perform generating target mask information, where the target mask information includes first mask information, second mask information, third mask information, and fourth mask information; the first mask information represents the association relationship among the first number of behavior sequence segments corresponding to the first number of time sequence identification information; the second mask information represents the association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of multimedia resources within a preset range; the third mask information represents association information of the first number of behavior sequence segments with respect to the plurality of multimedia resources; and the fourth mask information represents association information of the plurality of multimedia resources with respect to the first number of behavior sequence segments;
the self-attention learning unit is further configured to perform self-attention learning by inputting the current behavior feature information and the target mask information into the self-attention network in the currently traversed sub-coding network, so as to obtain the current initial coding information.
Optionally, the apparatus further comprises:
a second interest perception processing unit, configured to perform interest perception processing by inputting target resource coding feature information, the target interest feature information, and the first resource identification information into the interest perception network, so as to obtain the target interest indicator;
where the target resource coding feature information is resource coding feature information, among the resource coding feature information corresponding to the plurality of multimedia resources, whose time difference between the corresponding behavior time and the current time meets a preset condition.
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the feature fusion unit includes:
a weight learning unit configured to perform weight learning by inputting initial interest feature information and the first resource identification information corresponding to the second quantity of interest capsule networks into the weight learning layer, so as to obtain interest weight information, where the interest weight information represents a degree of influence of the initial interest feature information corresponding to the second quantity of interest capsule networks on interest of the target object;
and the weighting processing unit is configured to input the interest weight information and the initial interest characteristic information into the weighting processing layer for weighting processing to obtain the target interest characteristic information.
Optionally, the apparatus further comprises:
a second information acquisition module configured to perform acquiring sample behavior sequence information of a sample object, third resource identification information of at least one sample multimedia resource, and an annotated interest index, where the sample behavior sequence information includes fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample time period and the first number of pieces of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to the plurality of historical multimedia resources;
a resource type determination module configured to perform determining a plurality of resource types corresponding to a plurality of the historical multimedia resources;
a sample occurrence probability determination module configured to perform determination of a sample occurrence probability of resource identification information corresponding to each resource type in the sample behavior sequence information;
a second interest identification module configured to perform interest identification by inputting the sample behavior sequence information and the third resource identification information into an interest identification network to be trained, so as to obtain a predicted interest index of the sample object for the at least one sample multimedia resource and predicted occurrence probabilities corresponding to the plurality of resource types;
a target interest loss determination module configured to perform determining a target interest loss based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the annotated interest index;
an interest recognition network training module configured to perform training of the interest recognition network to be trained based on the target interest loss, resulting in the interest recognition network.
Optionally, the target interest loss determining module includes:
an initial interest loss information determination unit configured to perform determining initial interest loss information according to the prediction interest index and the annotation interest index;
an interest guidance loss information determination unit configured to perform determination of interest guidance loss information according to the predicted occurrence probability and the sample occurrence probability;
a target interest loss determination unit configured to perform determining the target interest loss based on the initial interest loss information and the interest guidance loss information.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first aspects of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the first aspects of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in the multimedia resource recommendation process, a first number of pieces of time sequence identification information, determined based on the preset distribution time periods in which the behavior times corresponding to a plurality of multimedia resources are located, is introduced into the historical behavior sequence of second resource identification information of the plurality of multimedia resources on which the target object has acted within a preset time period. In this way, the behavior data can be comprehensively grasped during interest recognition, and interest preferences in different time periods as well as more fine-grained characteristics can be learned by combining the time sequence identification information, so that the real interest of the user can be effectively represented, interest recognition efficiency and accuracy are improved, and the accuracy and effect of information recommendation in the recommendation system are greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an illustrative embodiment;
FIG. 2 is a flow diagram illustrating a method for multimedia resource recommendation in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating inputting historical behavior sequence information and first resource identification information into an interest recognition network for interest recognition to obtain a target interest indicator of a target object for at least one multimedia resource according to an exemplary embodiment;
FIG. 4 is a flow diagram illustrating an encoding process for historical behavior sequence information based on an encoding network to obtain behavior encoding information according to an example embodiment;
FIG. 5 is a diagram illustrating target mask information in accordance with an illustrative embodiment;
FIG. 6 is another flow chart illustrating inputting historical behavior sequence information and first resource identification information into an interest recognition network for interest recognition to obtain a target interest indicator of a target object for at least one multimedia resource according to an exemplary embodiment;
FIG. 7 is a flow diagram illustrating a pre-trained interest recognition network in accordance with an exemplary embodiment;
FIG. 8 is a flow diagram illustrating a determination of a target interest loss based on a predicted interest indicator, a predicted occurrence probability, a sample occurrence probability, and a labeling interest indicator in accordance with an illustrative embodiment;
FIG. 9 is a schematic diagram illustrating an interest recognition network in accordance with an illustrative embodiment;
FIG. 10 is a block diagram illustrating a multimedia resource recommendation apparatus in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating an electronic device for multimedia resource recommendation in accordance with an exemplary embodiment;
FIG. 12 is a block diagram illustrating an electronic device for multimedia resource recommendation, according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment according to an exemplary embodiment, which may include a server 100 and a terminal 200, as shown in fig. 1.
In an alternative embodiment, the server 100 may be used to train an interest recognition network. Specifically, the server 100 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
In an alternative embodiment, the terminal 200 may perform the multimedia resource recommendation process in conjunction with the interest recognition network trained by the server 100. Specifically, the terminal 200 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices. Optionally, the operating system running on the electronic device may include, but is not limited to, an android system, an IOS system, linux, windows, and the like.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure, and in practical applications, other application environments, such as multimedia resource recommendation processing, may also be included and implemented on the server 100.
In the embodiment of the present specification, the server 100 and the terminal 200 may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited herein.
Fig. 2 is a flowchart illustrating a multimedia resource recommendation method according to an exemplary embodiment. The method may be applied to an electronic device such as the terminal or the server shown in fig. 1, and includes the following steps.
In step S201, in response to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended.
In an optional embodiment, the target object is a user, or the user's account in the recommendation system. In practical applications, the user may actively trigger the multimedia resource acquisition request from a corresponding terminal, or the recommendation system may trigger the multimedia resource acquisition request of the target object automatically after the target object enters the system.
In a specific embodiment, the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources that the target object has performed in a preset time period and a first number of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information may be an identification determined based on a preset distribution time period in which behavior time corresponding to the plurality of multimedia resources is located.
In a specific embodiment, the resource identification information of the multimedia resource may be identification information for distinguishing different multimedia resources. Correspondingly, the first resource identification information may include resource identification information of at least one multimedia resource to be recommended, and the second resource identification information may include resource identification information of the plurality of multimedia resources. Specifically, the at least one multimedia resource to be recommended may be a multimedia resource to be recommended in the recommendation system. Optionally, the multimedia resource may include static resources such as text and images, and may also include dynamic resources such as short video.
In a specific embodiment, the plurality of multimedia resources on which the target object has acted may be multimedia resources on which the target object has performed a preset operation within the preset time period. The preset operations may include, but are not limited to, browsing, clicking, converting (e.g., purchasing a related product based on the multimedia resource, or downloading a related application based on the multimedia resource, etc.), and the like. Specifically, the preset time period may be a preset collection period of the historical behavior sequence information; for example, if the preset time period is the half month before the current time, the historical behavior sequence information may correspondingly include resource identification information of multimedia resources on which the target object has performed a preset operation in the half month before the current time.
In a specific embodiment, the first number of time sequence identification information may be used to characterize behavior distribution of the target object within a preset time period from a time sequence dimension. The behavior time corresponding to the plurality of multimedia resources may be time information for the target object to perform a preset operation on each multimedia resource. In a specific embodiment, a preset unit division time may be determined; dividing the preset time period into a first number of preset distribution time periods according to preset unit division time; generating identifiers of the first number of preset distribution time periods; and taking the identifier of the preset distribution time period in which the behavior time corresponding to each multimedia resource is positioned as the time sequence identifier information corresponding to the multimedia resource.
In a specific embodiment, the preset unit dividing time may be preset in combination with actual application requirements, for example, 0 to 24 o'clock of each day. Correspondingly, assuming the preset time period is the 15 days before the current time, the preset time period may be divided into 15 (the first number) periods of 0 to 24 o'clock (the preset distribution time periods). In a specific embodiment, it is assumed that respective identifiers are generated for the 15 preset distribution time periods in chronological order from earliest to latest: g1, g2, g3, ..., g15. Correspondingly, the identifier of the preset distribution time period in which the behavior time corresponding to each multimedia resource falls may be used as the time sequence identification information corresponding to that multimedia resource; specifically, if the behavior times corresponding to any two or more multimedia resources fall on the same day (i.e., in the same preset distribution time period), the time sequence identification information corresponding to those multimedia resources is the same.
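The bucketing described above can be sketched as follows. This is a minimal illustration only; the function name and the one-day unit are assumptions for demonstration, not part of the disclosed method:

```python
from datetime import datetime, timedelta

def assign_period_ids(behavior_times, window_start, unit=timedelta(days=1), first_number=15):
    """Map each behavior timestamp to the identifier (g1..gN) of the
    preset distribution time period (here: one calendar day) it falls in."""
    ids = []
    for t in behavior_times:
        index = int((t - window_start) / unit)        # 0-based day offset
        index = min(max(index, 0), first_number - 1)  # clamp into the window
        ids.append(f"g{index + 1}")
    return ids

start = datetime(2021, 8, 1)
times = [datetime(2021, 8, 1, 9), datetime(2021, 8, 1, 21), datetime(2021, 8, 3, 12)]
print(assign_period_ids(times, start))  # ['g1', 'g1', 'g3']
```

Note that the two behaviors on the same day receive the same identifier, matching the rule stated above.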
In step S203, inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification, so as to obtain a target interest index of the target object for at least one multimedia resource;
in a particular embodiment, the target interest indicator may characterize the probability that the target object performs a preset operation on the at least one multimedia resource.
In an alternative embodiment, the interest identification network may include an encoding network, a first number of base capsule networks, a second number of interest capsule networks, a feature fusion network, and an interest awareness network.
In practical applications, a corresponding resource type is often labeled in advance for each multimedia resource in the recommendation system according to the resource characteristics of the multimedia resource; in a specific embodiment, each multimedia resource may correspond to one or more resource types. In a particular embodiment, the resource types may include, but are not limited to, sports, gourmet, game, travel, and the like. Optionally, the second number may be the number of preset resource types, where a preset resource type may be a resource type of a multimedia resource in the recommendation system. Correspondingly, each of the second number of interest capsule networks corresponds to one of the second number of resource types.
Optionally, as shown in fig. 3, the step of inputting the historical behavior sequence information and the first resource identification information into an interest recognition network for interest recognition to obtain a target interest indicator of the target object for the at least one multimedia resource may include the following steps:
in step S2031, the historical behavior sequence information is encoded based on the encoding network, and behavior encoding information is obtained.
In a specific embodiment, the behavior encoding information may include resource encoding feature information corresponding to the second resource identification information and a first number of time sequence encoding feature information corresponding to the first number of time sequence identification information;
in an alternative embodiment, the encoding network may include a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network. Specifically, any sub-coding network includes a self-attention network and a feed-forward neural network. Accordingly, as shown in fig. 4, the above-mentioned encoding processing of the historical behavior sequence information based on the coding network to obtain the behavior coding information may include the following steps:
in step S401, behavior feature information of the historical behavior sequence information is extracted based on the feature extraction network;
in step S403, performing position coding processing on the behavior feature information based on a position coding network to obtain target behavior feature information;
in step S405, traversing at least one sequentially connected sub-coding network, and inputting the current behavior feature information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain current initial coding information under the condition of traversing any sub-coding network;
in step S407, inputting the current initial coding information into a feedforward neural network in the currently traversed sub-coding network to perform nonlinear processing, so as to obtain current coding behavior information;
in step S409, the current encoding behavior information at the end of traversal is taken as behavior encoding information.
In a specific embodiment, the plurality of multimedia resources that have been acted by the target object are often arranged in sequence according to the corresponding action time, and in the position coding processing, the characteristic information that can represent the sequencing position of the multimedia resources in the plurality of multimedia resources can be added to the action characteristic information, so as to facilitate the subsequent learning of the relevant association among the multimedia resources.
In a specific embodiment, the current behavior feature information of the first sub-coding network among the at least one sequentially connected sub-coding networks is the target behavior feature information, and the current behavior feature information of each non-first sub-coding network is the current coding behavior information output by the previous sub-coding network.
In the above embodiment, at least one sub-coding network is set in the coding network, and the current coding behavior information output by the previous sub-coding network is continuously used as the input of the current sub-coding network, so that the behavior feature information in the historical behavior sequence can be better learned, and the characterization accuracy of the interest and the preference of the user is further improved.
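The traversal in steps S405 to S409 can be sketched as below. This is a simplified single-head NumPy sketch under stated assumptions: the parameter shapes, the absence of residual connections and layer normalization, and all function names are illustrative, not the disclosed implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sub_encoder(x, w_q, w_k, w_v, w1, b1, w2, b2):
    # Self-attention network: each behavior attends to the others (step S405)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    initial = attn @ v                     # "current initial coding information"
    # Feed-forward neural network: non-linear processing (step S407)
    return np.maximum(0.0, initial @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(10, d))               # 10 behaviors, d-dimensional features
params = [rng.normal(size=(d, d)) for _ in range(3)] + \
         [rng.normal(size=(d, 4 * d)), np.zeros(4 * d),
          rng.normal(size=(4 * d, d)), np.zeros(d)]
out = x
for _ in range(2):                         # two sequentially connected sub-encoders
    out = sub_encoder(out, *params)        # previous output feeds the next (step S409)
```

The loop makes explicit how the output of each sub-coding network becomes the input of the next, with the final output taken as the behavior encoding information.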
In an optional embodiment, the method may further include:
target mask information is generated.
In practical application, because the input of the network has an indefinite length, it is often padded with zeros to a specified input size; to mask the zeros introduced by padding, mask information can be introduced into the self-attention learning process. In the embodiment of the present specification, in addition to setting mask information for the padding to be masked, target mask information may also be set in combination with the following four dimensions: 1. the attention dimension among the multimedia resources; 2. the attention dimension among the behavior sequence segments; 3. the dimension in which a behavior sequence segment attends to its corresponding multimedia resources; 4. the dimension in which a multimedia resource attends to its corresponding behavior sequence segment.
Correspondingly, the target mask information includes first mask information, second mask information, third mask information, and fourth mask information, where the first mask information represents an association relationship between the first number of behavior sequence segments corresponding to the first number of timing sequence identification information; the second mask information represents the incidence relation between the second resource identification information of any one multimedia resource in the plurality of multimedia resources and the second resource identification information of the multimedia resource in a preset range; the third mask information characterizes association information of the first number of behavior sequence segments to a plurality of the multimedia resources; the fourth mask information characterizes association information of a plurality of the multimedia resources to the first number of behavior sequence segments.
In a specific embodiment, each time sequence identification information corresponds to a behavior sequence segment, and the behavior sequence segment may be resource identification information of a multimedia resource corresponding to the time sequence identification information in a plurality of multimedia resources that have been acted by the target object. Correspondingly, the association relationship between the behavior sequence segments can be that the behavior sequence segments concern each other in the process of learning the characteristic information of the behavior sequence segments.
In a specific embodiment, the preset range may be the number of resource identification information combining intervals between a plurality of sequentially arranged second resource identification information, and in a specific embodiment, taking the preset range as 2 as an example, the association relationship between the second resource identification information of a certain multimedia resource and the second resource identification information of the multimedia resource in the preset range may be that the second resource identification information of the multimedia resource and the second resource identification information of two adjacent multimedia resources in the front and back concern each other in the process of learning the feature information of the second resource identification information of the multimedia resource.
In a specific embodiment, any behavior sequence segment corresponds to a part of multimedia resources in the plurality of multimedia resources that have been acted by the target object, and accordingly, the correlation information of the behavior sequence segment to the multimedia resources may be that the corresponding part of multimedia resources will be concerned in the process of learning the feature information of the behavior sequence segment; the associated information of the plurality of multimedia resources to the first number of behavior sequence segments may be that, in the process of learning the feature information of the resource identification information of the partial multimedia resources, the corresponding behavior sequence segment is concerned.
In an alternative embodiment, as shown in fig. 5, fig. 5 is a schematic diagram illustrating target mask information according to an exemplary embodiment. Assume the plurality of multimedia resources includes i1, i2, ..., i7, and the first number of time sequence identification information includes g1, g2 and g3. The first mask information is M_g2g, the second mask information is M_l2l, the third mask information is M_g2l, and the fourth mask information is M_l2g. The mask information corresponding to the white parts is 0, representing that no attention relation exists between the corresponding information; the mask information corresponding to the black parts is 1, representing an attention relation between the corresponding information.
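A possible construction of the four mask matrices of fig. 5 (1 = attention allowed, 0 = masked) is sketched below. The segment assignment and the window size of the preset range are illustrative assumptions:

```python
import numpy as np

def build_masks(seg_of_item, num_segments, local_range=2):
    # seg_of_item[j]: 0-based segment index of behavior item j
    n = len(seg_of_item)
    m_l2l = np.zeros((n, n), dtype=int)    # item-to-item attention, local window only
    for a in range(n):
        for b in range(max(0, a - local_range), min(n, a + local_range + 1)):
            m_l2l[a, b] = 1
    m_g2g = np.ones((num_segments, num_segments), dtype=int)  # segments attend to each other
    m_g2l = np.zeros((num_segments, n), dtype=int)            # segment -> its own items
    for j, s in enumerate(seg_of_item):
        m_g2l[s, j] = 1
    m_l2g = m_g2l.T.copy()                 # item -> its own segment
    return m_g2g, m_l2l, m_g2l, m_l2g

# i1..i7 assigned to segments g1 (i1-i3), g2 (i4-i5), g3 (i6-i7)
m_g2g, m_l2l, m_g2l, m_l2g = build_masks([0, 0, 0, 1, 1, 2, 2], 3)
```

In practice these four matrices would be assembled into one block attention mask over the concatenated segment and item positions.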
Correspondingly, the above inputting the current behavior feature information into the self-attention network in the currently traversed sub-coding network for self-attention learning, and obtaining the current initial coding information may include:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the currently traversed sub-coding network to carry out self-attention learning, so as to obtain the current initial coding information.
In the embodiment, in the self-attention learning process, the target mask information is introduced, so that the related attention among the multimedia resources of the behaviors in the behavior sequence of the target object can be limited within a preset range, and the processing efficiency is further greatly improved; the whole behavior sequence can be connected through the related attention among the behavior sequence segments corresponding to the time sequence identification information, and the information loss caused by the limitation of a local range is made up; meanwhile, mutual attention between the behavior sequence segment and the resource identification information of the corresponding multimedia resource can be realized, and the accuracy of the learned interest and hobby can be greatly improved on the basis of improving the processing efficiency.
In step S2033, basic capsule characteristic information in a first number of basic capsule networks is determined according to the first number of time-series coded characteristic information.
In a specific embodiment, the first number of time-series coded characteristic information may be respectively used as the capsule characteristic information in the first number of basic capsule networks. For example, assume the first number is 10, the 10 pieces of time-series coding characteristic information include the 1st and 2nd time-series coding characteristic information, and the 10 basic capsule networks include the 1st and 2nd basic capsule networks. Then the 1st time-series coding characteristic information is taken as the capsule characteristic information in the 1st basic capsule network, the 2nd time-series coding characteristic information as the capsule characteristic information in the 2nd basic capsule network, and so on.
In step S2035, based on the transmission weight of each basic capsule network relative to each of the second number of interest capsule networks, the basic capsule feature information is transmitted to the second number of interest capsule networks for interest identification, obtaining initial interest feature information corresponding to each of the second number of interest capsule networks.
In a particular embodiment, the transmission weights may characterize the proportion of the capsule feature information in each basic capsule network that is transmitted to each interest capsule network. In a specific embodiment, the initial interest feature information corresponding to any interest capsule network may be the capsule feature information in that interest capsule network; specifically, the capsule feature information in a given interest capsule network may be the interest characterization information of the target object for the resource type corresponding to that interest capsule network.
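One way to realize the transmission in step S2035 is to treat the transmission weights as a softmax over routing logits, as sketched below; the use of a single matrix product instead of iterative dynamic routing is a simplifying assumption for illustration:

```python
import numpy as np

def route_to_interest_capsules(base_caps, logits):
    # Transmission weights: softmax over interest capsules for each basic capsule
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)   # (first_number, second_number)
    # Weighted transmission of basic capsule features to each interest capsule
    return weights.T @ base_caps                 # (second_number, d)

rng = np.random.default_rng(1)
base = rng.normal(size=(10, 8))                  # 10 time-period basic capsules
logits = rng.normal(size=(10, 4))                # 4 resource-type interest capsules
interest = route_to_interest_capsules(base, logits)
```

Each row of the result plays the role of the initial interest feature information of one resource-type interest capsule network.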
In step S2037, the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information are input to the feature fusion network for feature fusion, so as to obtain the target interest feature information.
In a specific embodiment, the feature fusion network may be configured to fuse the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information, and optionally, the feature fusion network may be, but is not limited to, a maximum pooling network, an attention network, an average pooling network, and the like.
In a specific embodiment, taking the feature fusion network as an attention network including a weight learning layer and a weighting processing layer as an example, optionally, the inputting the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain the target interest feature information may include:
inputting initial interest characteristic information and first resource identification information corresponding to a second quantity of interest capsule networks into a weight learning layer for weight learning to obtain interest weight information; and inputting the interest weight information and the initial interest characteristic information into a weighting processing layer for weighting processing to obtain target interest characteristic information.
In a specific embodiment, the interest weight information may characterize a degree of influence of initial interest feature information corresponding to the second quantity of interest capsule networks on the interest of the target object.
In a specific embodiment, the weight learning at the weight learning layer can be implemented by combining the following formula:
α_z = exp(Att(i, p_z)) / Σ_{c=1}^{C} exp(Att(i, p_c))

wherein α_z represents the degree of influence of the initial interest feature information corresponding to the z-th interest capsule network on the interest of the target object, C is the second number (the number of interest capsule networks), i represents the resource identification information of a multimedia resource to be recommended, p_z represents the capsule feature information in the z-th interest capsule network, Att is an attention function, and exp is the exponential function with the natural constant e as its base.
In a specific embodiment, the weighting process performed at the weighting process layer may be implemented by combining the following equations:
u = Σ_{z=1}^{C} α_z · p_z

wherein u represents the target interest feature information.
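The weight learning and weighting can be combined into a small numeric sketch; the dot-product form of Att is an illustrative assumption, since the attention function is left unspecified here:

```python
import numpy as np

def fuse_interests(i_vec, capsules):
    """Compute alpha_z = exp(Att(i, p_z)) / sum_c exp(Att(i, p_c)) with a
    dot-product Att, then u = sum_z alpha_z * p_z."""
    scores = capsules @ i_vec                    # Att(i, p_z), here a dot product
    alpha = np.exp(scores - scores.max())        # numerically stable softmax
    alpha = alpha / alpha.sum()
    return alpha, alpha @ capsules               # weights alpha_z and fused feature u

rng = np.random.default_rng(2)
p = rng.normal(size=(4, 8))                      # C = 4 interest capsule features p_z
i_emb = rng.normal(size=8)                       # embedding of candidate resource i
alpha, u = fuse_interests(i_emb, p)
```

The weights α_z sum to 1, so u is a convex combination of the interest capsule features, weighted by their relevance to the candidate resource.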
In the above embodiment, after the interest representation information of the target object corresponding to different resource types is obtained, the influence degree of the capsule characteristic information in the interest capsule network corresponding to different resource types on the interest of the target object is learned by combining the attention network, so that the accuracy of the interest characteristic representation of the target object can be greatly improved.
In step S2039, the resource coding feature information, the target interest feature information, and the resource feature information corresponding to the first resource identification information are input to the interest perception network for interest perception processing, so as to obtain a target interest index.
In an optional embodiment, the resource encoding characteristic information, the target interest characteristic information, and the first resource identification information may be input to an interest-aware network for interest-aware processing, so as to obtain the target interest indicator. Specifically, the interest-aware network may be configured to perform interest perception for the target object by combining the resource encoding characteristic information, the target interest characteristic information, and the first resource identification information. Optionally, the interest-aware network may be a multi-layer perceptron (MLP) followed by an activation layer.
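A minimal sketch of such an interest-aware head follows, assuming a single hidden layer and a sigmoid activation (both unspecified in the disclosure):

```python
import numpy as np

def interest_perception(enc_feat, interest_feat, res_feat, w1, b1, w2, b2):
    # Concatenate resource coding, target interest, and candidate-resource features
    h = np.concatenate([enc_feat, interest_feat, res_feat])
    h = np.maximum(0.0, h @ w1 + b1)             # hidden MLP layer with ReLU
    logit = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid activation -> probability

rng = np.random.default_rng(3)
d = 8
w1, b1 = rng.normal(size=(3 * d, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0
score = interest_perception(rng.normal(size=d), rng.normal(size=d),
                            rng.normal(size=d), w1, b1, w2, b2)
```

The output lies in (0, 1) and plays the role of the target interest indicator for one candidate multimedia resource.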
In the above embodiment, a first number of time sequence identification information items, determined based on the preset distribution time periods in which the behavior times of the plurality of multimedia resources fall, is further introduced into the historical behavior sequence containing the second resource identification information of the plurality of multimedia resources on which the target object has acted within the preset time period. During encoding, the historical behavior sequence can thereby be divided into a plurality of behavior sequence segments, so that behavior preferences within different time periods can be learned on the basis of learning the overall behavior preference. Furthermore, by combining the behavior preferences in different time periods, the interest capsule networks can be made to correspond to the resource types associated with different interests, which improves the accuracy with which the identified target interest index characterizes different resource types, improves interpretability, improves the target interest feature information used for interest perception after feature fusion, and thereby improves the accuracy of describing the interests of the target object.
In an alternative embodiment, as shown in fig. 6, the method may further include the following steps:
in step S20311, the target resource coding feature information, the target interest feature information, and the first resource identification information are input to an interest perception network for interest perception processing, so as to obtain a target interest index;
in a specific embodiment, the target resource encoding characteristic information may be the resource encoding characteristic information, among that corresponding to the plurality of multimedia resources, whose corresponding behavior time differs from the current time by an amount satisfying a preset condition. Optionally, in order to better predict the current interests of the target object, the resource encoding characteristic information of the target object over a recent period of time may be combined in the interest perception processing. Optionally, the resource encoding characteristic information whose time difference satisfies the preset condition may be that whose time difference is smaller than or equal to a preset time threshold.
In the above embodiment, when interest perception processing is performed, the current interest and hobby of the target object can be better represented by combining target resource coding feature information with the time difference meeting the preset condition, and the recommendation accuracy is further better improved.
In an optional embodiment, the method further includes pre-training the interest recognition network. Specifically, as shown in fig. 7, the step of pre-training the interest recognition network may include:
in step S701, sample behavior sequence information of the sample object, third resource identification information of the at least one sample multimedia resource, and a mark interest indicator are obtained.
In a specific embodiment, the sample behavior sequence information may include fourth resource identification information of a plurality of historical multimedia resources on which the sample object has acted within a preset sample time period, and a first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources, determined based on the behavior times corresponding to those resources. The annotation interest indicator may represent the probability that the sample object performs a preset operation on the at least one sample multimedia resource, and may take the value 1 or 0: if the sample object performed the preset operation on a sample multimedia resource, the corresponding annotation interest indicator may be 1; otherwise, it may be 0.
In a specific embodiment, the duration corresponding to the preset sample time period is the same as the duration corresponding to the preset time period. The specific details of determining the first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources based on their behavior times may refer to the details, described above, of determining the first number of time sequence identification information corresponding to the plurality of multimedia resources based on their behavior times, and are not repeated here.
In step S703, a plurality of resource types corresponding to the plurality of historical multimedia resources are determined;
in practical applications, each multimedia resource in the recommendation system is often pre-labeled with a corresponding resource type; accordingly, the resource types corresponding to the historical multimedia resources can be obtained from the recommendation system.
In step S705, a sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information is determined.
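As an illustrative sketch of step S705 (the function name and the representation of the sequence as a list of per-resource types are assumptions), the sample occurrence probability of each resource type can be read off as its empirical frequency in the sample behavior sequence:

```python
from collections import Counter

# Hypothetical sketch: empirical occurrence probability of each resource
# type among the resource identification information in a sample sequence.
def sample_occurrence_probabilities(sequence_types):
    counts = Counter(sequence_types)
    total = len(sequence_types)
    return {rtype: count / total for rtype, count in counts.items()}

probs = sample_occurrence_probabilities(["sports", "music", "sports", "news"])
# sports appears twice out of four entries -> 0.5; music and news -> 0.25 each
```

These per-type probabilities later serve as the supervision target for the interest guidance loss in step S7093.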
In step S707, the sample behavior sequence information and the third resource identification information are input into an interest recognition network to be trained for interest recognition, so as to obtain a predicted interest indicator of the sample object for the at least one sample multimedia resource and predicted occurrence probabilities corresponding to the plurality of resource types.
In a specific embodiment, the specific details of inputting the sample behavior sequence information and the third resource identification information into the interest recognition network to be trained for interest recognition to obtain the predicted interest indicator of the sample object for the at least one sample multimedia resource may refer to the details, described above, of inputting the historical behavior sequence information and the first resource identification information into the interest recognition network for interest recognition to obtain the target interest indicator of the target object for the at least one multimedia resource.
In an optional embodiment, the predicted occurrence probability corresponding to any resource type may be the magnitude (vector length) of the capsule feature information in the to-be-trained interest capsule network corresponding to that resource type. Optionally, during training, the magnitude of the capsule feature information in an interest capsule network may be determined and constrained to lie between 0 and 1, so that it can represent the probability that the sample object is interested in the resource type corresponding to that interest capsule network. Accordingly, the predicted occurrence probability can be determined from the magnitude of the capsule feature information in the interest capsule network during training.
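One common way to constrain a capsule feature vector's length to between 0 and 1, sketched here for illustration only (the patent does not specify this exact nonlinearity), is the "squash" rescaling used in capsule networks:

```python
import numpy as np

# Hypothetical sketch of a capsule "squash" nonlinearity: it rescales a
# capsule feature vector so its length lies in [0, 1) and can therefore be
# read as the probability that the corresponding interest is present.
def squash(v, eps=1e-9):
    sq_norm = np.sum(v * v)
    scale = sq_norm / (1.0 + sq_norm)       # maps any norm into [0, 1)
    return scale * v / np.sqrt(sq_norm + eps)

v = np.array([3.0, 4.0])                    # raw capsule vector, length 5
length = np.linalg.norm(squash(v))          # squashed length 25/26, below 1
```

The squashed length of each interest capsule would then play the role of the predicted occurrence probability for its resource type.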
In step S709, a target interest loss is determined based on the predicted interest index, the predicted occurrence probability, the sample occurrence probability, and the annotation interest index;
in an alternative embodiment, as shown in fig. 8, the above-mentioned determining the target interest loss based on the predicted interest indicator, the predicted occurrence probability, the sample occurrence probability and the labeled interest indicator may include the following steps:
in step S7091, determining initial interest loss information according to the prediction interest index and the annotation interest index;
in step S7093, the interest guidance loss information is determined according to the predicted occurrence probability and the sample occurrence probability;
in step S7095, a target interest loss is determined based on the initial interest loss information and the interest guidance loss information.
In a specific embodiment, determining the initial interest loss information according to the prediction interest index and the annotation interest index may include determining the initial interest loss information between the prediction interest index and the annotation interest index based on a preset loss function. Specifically, the initial interest loss information may characterize a difference between the predicted interest indicator and the annotated interest indicator.
In a specific embodiment, determining the interest guidance loss information according to the predicted occurrence probability and the sample occurrence probability may include determining the interest guidance loss information between the predicted occurrence probability and the sample occurrence probability based on a preset loss function. In particular, the information of interest guide loss may characterize a difference between the predicted occurrence probability and the sample occurrence probability.
In a particular embodiment, the preset loss function may include, but is not limited to, a cross-entropy loss function, a mean square error loss function, a logistic loss function, an exponential loss function, and the like.
In an alternative embodiment, the initial interest loss information and the interest guidance loss information may be added to obtain the target interest loss. Optionally, weight information corresponding to the interest guidance loss information may be set in advance according to actual application requirements, and the product of the interest guidance loss information and its corresponding weight may be added to the initial interest loss information to obtain the target interest loss.
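Putting steps S7091 to S7095 together, a minimal sketch might pair a cross-entropy initial loss with a mean-square-error guidance loss and a preconfigured guidance weight (the specific loss choices, function names, and the weight value are assumptions; the patent allows several loss functions):

```python
import math

# Hypothetical sketch of the target interest loss. The binary cross-entropy
# term measures the gap between predicted and annotated interest indicators;
# the MSE term measures the gap between predicted and sample occurrence
# probabilities; a preset weight scales the guidance term.
def initial_interest_loss(pred, label, eps=1e-9):
    return -(label * math.log(pred + eps) + (1 - label) * math.log(1 - pred + eps))

def guidance_loss(pred_probs, sample_probs):
    return sum((p - q) ** 2 for p, q in zip(pred_probs, sample_probs)) / len(pred_probs)

def target_interest_loss(pred, label, pred_probs, sample_probs, w=0.5):
    return initial_interest_loss(pred, label) + w * guidance_loss(pred_probs, sample_probs)
```

When the predicted occurrence probabilities match the sample occurrence probabilities exactly, the guidance term vanishes and only the initial interest loss remains.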
In the above embodiment, introducing, during interest recognition network training, interest guidance loss information that reflects the difference in the sample object's interest probabilities for the corresponding resource types allows the trained interest recognition network to represent object interests more accurately.
In step S711, the interest recognition network to be trained is trained based on the target interest loss, and an interest recognition network is obtained.
The training of the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network may include: under the condition that the target interest loss does not meet the preset condition, adjusting network parameters of the interest recognition network to be trained based on the target interest loss; and repeating the steps S707 and S709 based on the interest recognition network to be trained after the network parameters are adjusted until the target interest loss meets the preset condition, and taking the corresponding interest recognition network to be trained when the preset condition is met as the interest recognition network.
In a specific embodiment, the target interest loss meeting the preset condition may be that the target interest loss is less than or equal to a specified threshold, or that a difference between corresponding target interest losses in two training processes is less than a certain threshold. In the embodiment of the present specification, the specified threshold and a certain threshold may be set in combination with actual training requirements.
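The two stopping conditions just described can be sketched as a small helper (the threshold values and the function name are illustrative assumptions):

```python
# Hypothetical sketch of the training stop test: stop when the target
# interest loss drops to a specified threshold, or when the difference
# between the losses of two consecutive training passes is small enough.
def should_stop(loss, prev_loss, loss_threshold=0.01, delta_threshold=1e-4):
    if loss <= loss_threshold:
        return True  # loss itself satisfies the preset condition
    # otherwise, stop only if training has effectively converged
    return prev_loss is not None and abs(prev_loss - loss) < delta_threshold
```

In the training loop of step S711, this test would be evaluated after each repetition of steps S707 and S709, with the network parameters adjusted whenever it returns False.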
In the above embodiment, during interest recognition network training, the sample behavior sequence including the fourth resource identification information of the plurality of historical multimedia resources on which the sample object has acted within the preset sample time period is supplemented with the first number of sample time sequence identification information determined based on the preset distribution time periods in which the behavior times of those resources fall. This ensures that the sample object's behavior data is learned comprehensively during training, while interest preference representations for different time periods are learned from the sample time sequence identification information, effectively improving the recognition accuracy of the interest recognition network and, in turn, greatly improving the accuracy and effect of information recommendation in the subsequent recommendation system.
In a specific embodiment, as shown in fig. 9, the interest recognition network may include an encoding network, m (the first number) basic capsule networks, C (the second number) interest capsule networks, a feature fusion network, and an interest perception network. Here, g1, g2...gm may be the first number of time sequence identification information in the historical behavior sequence information, and i1, i2...in may be the second resource identification information of the plurality of multimedia resources in the historical behavior sequence information. Correspondingly, based on the encoding processing of the encoding network, the first number of time sequence coding feature information G1, G2...Gm and the resource coding feature information I1, I2...In corresponding to the second resource identification information can be obtained. Furthermore, the m time sequence coding feature information can respectively serve as the capsule feature information in the m basic capsule networks; then, based on the transmission weight of each basic capsule network relative to the C interest capsule networks, the corresponding basic capsule feature information is transmitted to the C interest capsule networks for interest recognition, so as to obtain the initial interest feature information corresponding to the C interest capsule networks. In addition, during training, interest guidance is performed by combining the sample occurrence probability, in the sample behavior sequence information, of the resource identification information corresponding to each of the resource types Ci1, Ci2, Ci3...Cin corresponding to the plurality of historical multimedia resources; thus, in the process of transmitting the basic capsule feature information to the C interest capsule networks based on the transmission weights, the interest feature information for the resource types corresponding to the C interest capsule networks can be learned accurately. Next, the initial interest feature information corresponding to the C interest capsule networks and the resource feature information corresponding to the first resource identification information are input into the feature fusion network for feature fusion, so as to obtain the target interest feature information. Finally, the resource coding feature information (I1, I2...In), the target interest feature information, and the resource feature information It corresponding to the first resource identification information (It may be the resource feature information corresponding to the resource identification information of the t-th multimedia resource in the at least one multimedia resource to be recommended) are input into the interest perception network for interest perception processing, so as to obtain the target interest indicator.
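The transmission of m basic capsule features to C interest capsules via transmission weights can be sketched as a single weighted-aggregation pass (a simplification: a full capsule network would typically iterate this routing and apply a squash nonlinearity; all names here are assumptions):

```python
import numpy as np

# Hypothetical sketch: transmit m basic-capsule feature vectors to C
# interest capsules using softmax-normalized transmission weights.
def route(basic_caps, logits):
    # basic_caps: (m, d) capsule feature vectors
    # logits:     (m, C) raw transmission weights per basic/interest pair
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # each interest capsule is a weighted sum of the basic capsule features
    return weights.T @ basic_caps            # shape (C, d)

m, C, d = 4, 3, 8
out = route(np.ones((m, d)), np.zeros((m, C)))
# with all-zero logits every basic capsule spreads uniformly (1/C) across
# the C interest capsules, so each output entry equals m / C
```

During training, the guidance loss over the predicted occurrence probabilities would steer these transmission weights toward the resource types actually present in the sample sequence.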
In step S205, a target multimedia resource of the at least one multimedia resource is recommended to the target object based on the target interest indicator.
In a specific embodiment, recommending a target multimedia resource of the at least one multimedia resource to the target object based on the target interest indicator may include determining the target multimedia resource from the at least one multimedia resource according to the target interest indicator; and recommending the target multimedia resource to the target object.
In an optional embodiment, a confidence threshold may be set in advance according to the accuracy requirement for information recommendation (generally, the higher the confidence threshold, the more accurate the recommended information); accordingly, when the value corresponding to the target interest indicator of a multimedia resource is greater than or equal to the confidence threshold, that multimedia resource may be taken as a target multimedia resource. Optionally, the target interest indicators may instead be sorted in descending order of their values, and a preset number of multimedia resources may be selected as the target multimedia resources accordingly.
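The two selection strategies above can be sketched side by side (function names and the score pairs are illustrative assumptions):

```python
# Hypothetical sketch of the two selection strategies: a confidence
# threshold on the target interest indicator, and a top-k pick after
# sorting indicators in descending order.
def select_by_threshold(indexed_scores, threshold):
    return [rid for rid, score in indexed_scores if score >= threshold]

def select_top_k(indexed_scores, k):
    ranked = sorted(indexed_scores, key=lambda pair: -pair[1])
    return [rid for rid, score in ranked[:k]]

scores = [("res_a", 0.9), ("res_b", 0.4), ("res_c", 0.7)]
by_threshold = select_by_threshold(scores, 0.6)  # res_a and res_c pass 0.6
top_two = select_top_k(scores, 2)                # the two highest scorers
```

Either list would then be recommended to the target object as the target multimedia resources.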
As can be seen from the technical solutions provided by the embodiments of the present specification, in the multimedia resource recommendation process, the historical behavior sequence including the second resource identification information of the plurality of multimedia resources on which the target object has acted within the preset time period is supplemented with the first number of time sequence identification information determined based on the preset distribution time periods in which the behavior times of those resources fall. During interest recognition, the behavior data can thus be grasped comprehensively, while interest preferences within different time periods and finer-grained features are learned from the time sequence identification information. This effectively represents the user's real interests, improves interest recognition efficiency and accuracy, and in turn greatly improves the accuracy and effect of information recommendation in the recommendation system.
FIG. 10 is a block diagram illustrating a multimedia resource recommendation apparatus according to an example embodiment. Referring to fig. 10, the apparatus includes:
a first information obtaining module 1010 configured to respond to a multimedia resource acquisition request of a target object and acquire historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, where the historical behavior sequence information includes second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of time sequence identification information corresponding to the plurality of multimedia resources, the time sequence identification information being identification determined based on the preset distribution time periods in which the behavior times corresponding to the plurality of multimedia resources fall;
a first interest identification module 1020 configured to perform interest identification by inputting the historical behavior sequence information and the first resource identification information into an interest identification network, so as to obtain a target interest index of the target object for the at least one multimedia resource;
and a multimedia resource recommending module 1030 configured to recommend a target multimedia resource of the at least one multimedia resource to the target object based on the target interest indicator.
Optionally, the interest identification network includes a coding network, a first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network, and an interest perception network, where the second number is a number corresponding to a preset resource type;
the first interest identification module 1020 includes:
the encoding processing unit is configured to perform encoding processing on the historical behavior sequence information based on an encoding network to obtain behavior encoding information, and the behavior encoding information comprises resource encoding characteristic information corresponding to second resource identification information and a first number of time sequence encoding characteristic information corresponding to the first number of time sequence identification information;
a basic capsule characteristic information determining unit configured to perform determining basic capsule characteristic information in a first number of basic capsule networks according to a first number of time-series coded characteristic information;
the interest identification unit is configured to transmit the basic capsule characteristic information to the second quantity of interest capsule networks for interest identification based on the transmission weight of each basic capsule network relative to the second quantity of interest capsule networks, and obtain initial interest characteristic information corresponding to the second quantity of interest capsule networks;
the characteristic fusion unit is configured to input the initial interest characteristic information corresponding to the second quantity of interest capsule networks and the resource characteristic information corresponding to the first resource identification information into the characteristic fusion network for characteristic fusion to obtain target interest characteristic information;
and the first interest perception processing unit is configured to execute the step of inputting the resource coding characteristic information, the target interest characteristic information and the resource characteristic information corresponding to the first resource identification information into an interest perception network for interest perception processing to obtain a target interest index.
Optionally, the encoding network includes: the system comprises a feature extraction network, a position coding network and at least one sequentially connected sub-coding network, wherein any sub-coding network comprises a self-attention network and a feedforward neural network, and a coding processing unit comprises:
a behavior feature information extraction unit configured to perform extracting behavior feature information of the historical behavior sequence information based on the feature extraction network;
the position coding processing unit is configured to execute position coding processing on the behavior characteristic information based on a position coding network to obtain target behavior characteristic information;
a traversal unit configured to perform traversal of at least one sequentially connected sub-coding network;
the self-attention learning unit is configured to input the current behavior feature information into a self-attention network in the currently traversed sub-coding network for self-attention learning under the condition of traversing any sub-coding network to obtain current initial coding information;
the nonlinear processing unit is configured to input the current initial coding information into a feedforward neural network in the currently traversed sub-coding network for nonlinear processing to obtain current coding behavior information;
a behavior encoding information determination unit configured to take the current encoding behavior information at the end of the traversal as the behavior encoding information;
wherein the current behavior feature information of the first sub-coding network in the at least one sequentially connected sub-coding network is the target behavior feature information, and the current behavior feature information of each non-first sub-coding network is the current encoding behavior information output by the previous sub-coding network.
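The traversal of sequentially connected sub-coding networks, each pairing self-attention with a feedforward network, can be sketched minimally as follows (residual connections, layer normalization, and learned attention projections, which a production encoder would include, are omitted; all names are assumptions):

```python
import numpy as np

# Hypothetical minimal sketch of one "sub-coding network": single-head
# self-attention followed by a two-layer feedforward network. Blocks are
# traversed in order, each consuming the previous block's output.
def self_attention(x):
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ x                                 # current initial coding info

def feed_forward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2                # ReLU supplies nonlinearity

def encode(x, layers):
    for w1, w2 in layers:                              # traverse sub-coding networks
        x = feed_forward(self_attention(x), w1, w2)    # output feeds next block
    return x                                           # behavior encoding information

out = encode(np.ones((3, 4)), [(np.eye(4), np.eye(4))])
```

With identical input rows and identity feedforward weights, attention averages identical rows and the output equals the input, which makes the sketch easy to check.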
Optionally, the apparatus further comprises:
a target mask information generating module configured to perform generation of target mask information, where the target mask information includes first mask information, second mask information, third mask information, and fourth mask information, and the first mask information represents an association relationship between the first number of behavior sequence segments corresponding to the first number of timing identification information; the second mask information represents the incidence relation between the second resource identification information of any one multimedia resource in the plurality of multimedia resources and the second resource identification information of the multimedia resource in a preset range; the third mask information characterizes association information of the first number of behavior sequence segments to a plurality of the multimedia resources; the fourth mask information characterizes correlation information of a plurality of the multimedia resources to the first number of behavior sequence segments;
the self-attention learning module is further configured to perform self-attention learning by inputting the current behavior feature information and the target mask information into a self-attention network in the currently traversed sub-coding network, and obtain current initial coding information.
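Inside self-attention, mask information is typically applied by pushing disallowed attention scores toward negative infinity before the softmax, so their weights vanish. A minimal sketch (the mask convention of 1 = "may attend" is an assumption):

```python
import numpy as np

# Hypothetical sketch of applying target mask information within
# self-attention: positions the mask disallows receive a very negative
# score, so their softmax weight is effectively zero.
def masked_attention_weights(scores, mask):
    # mask[i, j] == 1 means position i may attend to position j
    masked = np.where(mask == 1, scores, -1e9)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

scores = np.zeros((2, 2))
mask = np.array([[1, 0],    # position 0 may attend only to itself
                 [1, 1]])   # position 1 may attend everywhere
w = masked_attention_weights(scores, mask)
```

A composite mask built from the first through fourth mask information would be passed in the same way, restricting attention between behavior sequence segments and resource positions.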
Optionally, the apparatus further comprises:
the second interest perception processing unit is configured to input the target resource coding feature information, the target interest feature information and the first resource identification information into an interest perception network for interest perception processing to obtain a target interest index;
the target resource coding feature information is resource coding feature information, wherein the time difference between corresponding behavior time and current time in the resource coding feature information corresponding to the multiple multimedia resources meets preset conditions.
Optionally, the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the feature fusion unit includes:
the weight learning unit is configured to input the initial interest feature information and the first resource identification information corresponding to the second quantity of interest capsule networks into the weight learning layer for weight learning to obtain interest weight information, and the interest weight information represents the influence degree of the initial interest feature information corresponding to the second quantity of interest capsule networks on the interest of the target object;
and the weighting processing unit is configured to input the interest weight information and the initial interest characteristic information into the weighting processing layer for weighting processing to obtain target interest characteristic information.
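The weight-learning and weighting layers of the feature fusion network can be sketched as a simple attention pooling (the dot-product scoring rule and all names are assumptions; the actual weight learning layer may be parameterized differently):

```python
import numpy as np

# Hypothetical sketch of the feature fusion network: score each interest
# capsule's feature against the candidate resource feature (weight learning
# layer), softmax the scores into interest weight information, then take
# the weighted sum (weighting processing layer).
def fuse(interest_feats, resource_feat):
    scores = interest_feats @ resource_feat     # (C,) relevance per capsule
    w = np.exp(scores - scores.max())
    w /= w.sum()                                # interest weight information
    return w @ interest_feats                   # target interest feature info

feats = np.array([[1.0, 0.0],                   # initial interest features
                  [0.0, 1.0]])
target = fuse(feats, np.array([10.0, 0.0]))     # candidate strongly matches capsule 0
```

Because the first capsule aligns far better with the candidate resource feature, its weight dominates and the fused target interest feature stays close to that capsule's feature.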
Optionally, the apparatus further comprises:
the second information acquisition module is configured to execute acquisition of sample behavior sequence information of the sample object, third resource identification information of at least one sample multimedia resource and an annotation interest index, wherein the sample behavior sequence information comprises fourth resource identification information of a plurality of historical multimedia resources of the sample object, which have acted in a preset sample time period, and a first number of sample time sequence identification information corresponding to the plurality of historical multimedia resources, which is determined based on behavior time corresponding to the plurality of historical multimedia resources;
a resource type determination module configured to perform determining a plurality of resource types corresponding to a plurality of historical multimedia resources;
the sample occurrence probability determining module is configured to determine the sample occurrence probability of the resource identification information corresponding to each resource type in the sample behavior sequence information;
the second interest recognition module is configured to input the historical behavior sequence information and the third resource identification information into an interest recognition network to be trained for interest recognition to obtain a predicted interest index of the sample object to at least one sample multimedia resource and a corresponding predicted occurrence probability of a plurality of resource types;
a target interest loss determination module configured to determine a target interest loss based on the predicted interest indicator, the predicted occurrence probability, the sample occurrence probability, and the annotated interest indicator;
and the interest recognition network training module is configured to execute training of the interest recognition network to be trained based on the target interest loss to obtain the interest recognition network.
Optionally, the target interest loss determining module includes:
an initial interest loss information determination unit configured to perform determining initial interest loss information according to the prediction interest index and the annotation interest index;
an interest guidance loss information determination unit configured to perform determination of interest guidance loss information according to the predicted occurrence probability and the sample occurrence probability;
a target interest loss determination unit configured to perform determining a target interest loss based on the initial interest loss information and the interest guidance loss information.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram illustrating an electronic device for multimedia resource recommendation, which may be a terminal according to an exemplary embodiment, and an internal structure thereof may be as shown in fig. 11. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of multimedia asset recommendation. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Fig. 12 is a block diagram illustrating an electronic device for multimedia resource recommendation, which may be a server, according to an exemplary embodiment, and an internal structure diagram thereof may be as shown in fig. 12. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of multimedia asset recommendation.
It will be understood by those skilled in the art that the configurations shown in fig. 11 or fig. 12 are only block diagrams of some configurations relevant to the present disclosure and do not limit the electronic device to which the present disclosure is applied; a particular electronic device may include more or fewer components than those shown in the drawings, combine certain components, or have a different arrangement of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the multimedia resource recommendation method as in the embodiments of the present disclosure.
In an exemplary embodiment, there is also provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a multimedia resource recommendation method in an embodiment of the present disclosure.
In an exemplary embodiment, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the multimedia asset recommendation method in the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A multimedia resource recommendation method is characterized by comprising the following steps:
responding to a multimedia resource acquisition request of a target object, acquiring historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, wherein the historical behavior sequence information comprises second resource identification information of a plurality of multimedia resources which are acted by the target object in a preset time period and a first number of time sequence identification information corresponding to the plurality of multimedia resources, and the time sequence identification information is an identification determined based on a preset distribution time period in which behavior time corresponding to the plurality of multimedia resources is located;
inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object to the at least one multimedia resource;
recommending a target multimedia resource of the at least one multimedia resource to the target object based on the target interest index.
2. The method for recommending multimedia resources of claim 1, wherein said interest recognition network comprises a coding network, said first number of basic capsule networks, a second number of interest capsule networks, a feature fusion network and an interest perception network, and said second number is a number corresponding to a preset resource type;
the step of inputting the historical behavior sequence information and the first resource identification information into an interest identification network for interest identification to obtain a target interest index of the target object to the at least one multimedia resource comprises:
coding the historical behavior sequence information based on the coding network to obtain behavior coding information, wherein the behavior coding information comprises resource coding feature information corresponding to the second resource identification information and a first number of time sequence coding feature information corresponding to the first number of time sequence identification information;
determining basic capsule characteristic information in the first number of basic capsule networks according to the first number of time sequence coding characteristic information;
based on the transmission weight of each basic capsule network relative to the second quantity of interest capsule networks, transmitting the basic capsule characteristic information to the second quantity of interest capsule networks for interest identification to obtain initial interest characteristic information corresponding to the second quantity of interest capsule networks;
inputting the initial interest feature information corresponding to the second quantity of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain target interest feature information;
and inputting the resource coding characteristic information, the target interest characteristic information and the resource characteristic information corresponding to the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index.
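The routing step recited in claim 2 (transmitting basic capsule feature information to the interest capsule networks via transmission weights) follows the general pattern of dynamic routing between capsules. The sketch below is a minimal illustration of that pattern only, not the claimed implementation; the squash nonlinearity, the number of routing iterations, and all names (`route_capsules`, `transfer_w`) are assumptions.

```python
import numpy as np

def route_capsules(basic_caps, transfer_w, n_iters=3):
    """Route basic-capsule features to interest capsules.

    basic_caps: (n_basic, d)  feature vector per basic capsule
    transfer_w: (n_basic, n_interest, d, d) per-pair transformation
    Returns initial interest feature information, shape (n_interest, d).
    """
    n_basic, n_interest, d, _ = transfer_w.shape
    # prediction of each basic capsule for each interest capsule
    u_hat = np.einsum('bijk,bk->bij', transfer_w, basic_caps)
    b = np.zeros((n_basic, n_interest))  # routing logits
    for _ in range(n_iters):
        # transmission weights: softmax over the interest capsules
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('bi,bij->ij', c, u_hat)  # weighted sum per interest capsule
        # "squash" nonlinearity keeps each vector's length in (0, 1)
        norm = np.linalg.norm(s, axis=1, keepdims=True)
        v = (norm**2 / (1 + norm**2)) * s / (norm + 1e-9)
        b = b + np.einsum('bij,ij->bi', u_hat, v)  # agreement update
    return v
```

The agreement update increases the weight of a basic capsule toward interest capsules whose output it predicts well, which is what makes the routing iterative rather than a fixed projection.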
3. The method of claim 2, wherein the encoding network comprises a feature extraction network, a position coding network, and at least one sequentially connected sub-coding network, any one of the sub-coding networks comprising a self-attention network and a feed-forward neural network, and wherein the encoding the historical behavior sequence information based on the coding network to obtain behavior coding information comprises:
extracting behavior characteristic information of the historical behavior sequence information based on the characteristic extraction network;
performing position coding processing on the behavior characteristic information based on the position coding network to obtain target behavior characteristic information;
traversing the at least one sequentially connected sub-coding network, and when any sub-coding network is traversed, inputting current behavior feature information into the self-attention network in the currently traversed sub-coding network for self-attention learning to obtain current initial coding information;
inputting the current initial coding information into a feedforward neural network in the currently traversed sub-coding network to perform nonlinear processing to obtain current coding behavior information;
taking the current coding behavior information at the end of traversal as the behavior coding information;
wherein the current behavior feature information of the first sub-coding network in the at least one sequentially connected sub-coding network is the target behavior feature information, and the current behavior feature information of a non-first sub-coding network in the at least one sequentially connected sub-coding network is the current coding behavior information output by the previous sub-coding network.
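A sub-coding network as described in claim 3 (a self-attention network followed by a feed-forward neural network, with several such networks connected sequentially) matches the familiar Transformer encoder block. A minimal single-head sketch, assuming residual connections and a ReLU feed-forward net, neither of which is specified in the claim; all names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sub_coding_network(x, wq, wk, wv, w1, b1, w2, b2):
    """One sub-coding block: self-attention, then a feed-forward net.

    x: (seq_len, d) current behavior feature information
    """
    d = x.shape[1]
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d))  # self-attention learning
    coded = attn @ v + x                  # current initial coding info (+ assumed residual)
    h = np.maximum(0.0, coded @ w1 + b1)  # nonlinear feed-forward step
    return h @ w2 + b2 + coded            # current coding behavior information

def encode(x, blocks):
    """Traverse the sequentially connected sub-coding networks: each block's
    output becomes the next block's current behavior feature information."""
    for params in blocks:
        x = sub_coding_network(x, *params)
    return x  # behavior coding information at the end of traversal
```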
4. The method of claim 3, further comprising:
generating target mask information, wherein the target mask information comprises first mask information, second mask information, third mask information, and fourth mask information, the first mask information representing an association relationship among the first number of behavior sequence segments corresponding to the first number of pieces of time sequence identification information; the second mask information representing an association relationship between the second resource identification information of any one of the plurality of multimedia resources and the second resource identification information of the multimedia resources within a preset range; the third mask information representing association information of the first number of behavior sequence segments with respect to the plurality of multimedia resources; and the fourth mask information representing association information of the plurality of multimedia resources with respect to the first number of behavior sequence segments;
the inputting the current behavior feature information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain the current initial coding information includes:
and inputting the current behavior characteristic information and the target mask information into a self-attention network in the currently traversed sub-coding network for self-attention learning to obtain the current initial coding information.
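One plausible reading of the four mask relations in claim 4 is an attention mask over a sequence that mixes behavior-segment tokens and resource tokens. The sketch below combines the four relations with a logical OR; the combination rule, the tokenization, and every name here are assumptions, not the claimed construction:

```python
import numpy as np

def build_target_mask(is_segment, segment_of, window):
    """Build one (n, n) boolean attention mask; True = attention allowed.

    is_segment: (n,) True where the position is a behavior-segment token
    segment_of: (n,) index of the segment each position belongs to
    window:     preset range for resource-to-resource attention
    """
    pos = np.arange(len(is_segment))
    seg = np.asarray(is_segment)
    item = ~seg
    same = np.asarray(segment_of)[:, None] == np.asarray(segment_of)[None, :]
    near = np.abs(pos[:, None] - pos[None, :]) <= window
    m1 = seg[:, None] & seg[None, :]          # segment <-> segment (first mask)
    m2 = item[:, None] & item[None, :] & near  # resource <-> nearby resource (second mask)
    m3 = seg[:, None] & item[None, :] & same   # segment -> its resources (third mask)
    m4 = item[:, None] & seg[None, :] & same   # resource -> its segment (fourth mask)
    return m1 | m2 | m3 | m4

def masked_attention_scores(scores, mask):
    """Apply the target mask before softmax: disallowed pairs get -inf."""
    return np.where(mask, scores, -np.inf)
```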
5. The method of claim 2, further comprising:
inputting target resource coding characteristic information, the target interest characteristic information and the first resource identification information into the interest perception network for interest perception processing to obtain the target interest index;
wherein the target resource coding feature information is resource coding feature information, among the resource coding feature information corresponding to the plurality of multimedia resources, for which the time difference between the corresponding behavior time and the current time meets a preset condition.
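The selection in claim 5 can be read as a recency filter over the resource coding features. A minimal sketch under that assumption, taking the "preset condition" to be a maximum age (`max_age` and `select_recent` are invented names):

```python
import numpy as np

def select_recent(coding_feats, behavior_times, now, max_age):
    """Keep resource coding features whose behavior time is within
    max_age of the current time (one reading of the preset condition)."""
    keep = (now - np.asarray(behavior_times)) <= max_age
    return np.asarray(coding_feats)[keep]
```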
6. The method as claimed in claim 2, wherein the feature fusion network is an attention learning network including a weight learning layer and a weighting processing layer, and the inputting of the initial interest feature information corresponding to the second number of interest capsule networks and the resource feature information corresponding to the first resource identification information into the feature fusion network for feature fusion to obtain the target interest feature information includes:
inputting the initial interest characteristic information and the first resource identification information corresponding to the second quantity of interest capsule networks into the weight learning layer for weight learning to obtain interest weight information, wherein the interest weight information represents the influence degree of the initial interest characteristic information corresponding to the second quantity of interest capsule networks on the interest of the target object;
and inputting the interest weight information and the initial interest characteristic information into the weighting processing layer for weighting processing to obtain the target interest characteristic information.
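The weight learning layer and weighting processing layer of claim 6 amount to attention pooling over the interest capsules, conditioned on the candidate resource. A sketch assuming a bilinear scoring form for the weight learning layer (the actual scoring function is not specified in the claim):

```python
import numpy as np

def feature_fusion(interest_feats, resource_feat, w):
    """Fuse per-interest features into one target interest feature.

    interest_feats: (n_interest, d) initial interest feature information
    resource_feat:  (d,) embedding of the candidate resource
    w:              (d, d) learned bilinear weight (an assumed scoring form)
    """
    # weight learning layer: relevance of each interest to the candidate
    logits = interest_feats @ w @ resource_feat
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()     # interest weight information
    # weighting processing layer: convex combination of interest features
    return weights @ interest_feats       # target interest feature information
```

Because the weights sum to one, the fused vector lies inside the span of the per-interest features, so interests irrelevant to the candidate resource are smoothly down-weighted rather than discarded.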
7. A multimedia resource recommendation apparatus, comprising:
a first information acquisition module configured to, in response to a multimedia resource acquisition request of a target object, acquire historical behavior sequence information of the target object and first resource identification information of at least one multimedia resource to be recommended, wherein the historical behavior sequence information comprises second resource identification information of a plurality of multimedia resources on which the target object has acted within a preset time period and a first number of pieces of time sequence identification information corresponding to the plurality of multimedia resources, the time sequence identification information being an identifier determined based on the preset distribution time period in which the behavior time corresponding to the plurality of multimedia resources falls;
a first interest identification module configured to perform interest identification by inputting the historical behavior sequence information and the first resource identification information into an interest identification network, so as to obtain a target interest index of the target object on the at least one multimedia resource;
a multimedia resource recommending module configured to recommend a target multimedia resource of the at least one multimedia resource to the target object based on the target interest index.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the multimedia resource recommendation method of any of claims 1-6.
9. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia resource recommendation method of any one of claims 1-6.
10. A computer program product comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the multimedia resource recommendation method of any one of claims 1-6.
CN202110915601.5A 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium Active CN113806568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110915601.5A CN113806568B (en) 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113806568A true CN113806568A (en) 2021-12-17
CN113806568B CN113806568B (en) 2023-11-03

Family

ID=78943012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110915601.5A Active CN113806568B (en) 2021-08-10 2021-08-10 Multimedia resource recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113806568B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102648A (en) * 2013-04-07 2014-10-15 腾讯科技(深圳)有限公司 User behavior data based interest recommending method and device
CN109189988A (en) * 2018-09-18 2019-01-11 北京邮电大学 A kind of video recommendation method
CN110688855A (en) * 2019-09-29 2020-01-14 山东师范大学 Chinese medical entity identification method and system based on machine learning
CN111147871A (en) * 2019-12-04 2020-05-12 北京达佳互联信息技术有限公司 Singing recognition method and device in live broadcast room, server and storage medium
CN111353094A (en) * 2018-12-20 2020-06-30 北京嘀嘀无限科技发展有限公司 Information pushing method and device
CN111667308A (en) * 2020-05-29 2020-09-15 中国工商银行股份有限公司 Advertisement recommendation prediction system and method
CN111914178A (en) * 2020-08-19 2020-11-10 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
US20200394509A1 (en) * 2019-06-14 2020-12-17 International Business Machines Corporation Classification Of Sparsely Labeled Text Documents While Preserving Semantics
CN112232058A (en) * 2020-10-15 2021-01-15 济南大学 False news identification method and system based on deep learning three-layer semantic extraction framework
CN112380426A (en) * 2020-10-23 2021-02-19 南京邮电大学 Interest point recommendation method and system based on graph embedding and user long-term and short-term interest fusion
US20210089040A1 (en) * 2016-02-29 2021-03-25 AI Incorporated Obstacle recognition method for autonomous robots
CN112601487A (en) * 2018-08-14 2021-04-02 佳能株式会社 Medical image processing apparatus, medical image processing method, and program
CN112765461A (en) * 2021-01-12 2021-05-07 中国计量大学 Session recommendation method based on multi-interest capsule network
CN112883258A (en) * 2021-01-11 2021-06-01 北京达佳互联信息技术有限公司 Information recommendation method and device, electronic equipment and storage medium
CN113158057A (en) * 2021-04-28 2021-07-23 平安科技(深圳)有限公司 Buddha meridian recommendation processing device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MING CHEN et al.: "Text Classification Based on A New Joint Network", 2020 5th International Conference on Control, Robotics and Cybernetics (CRC), pages 13-18 *
YING Xuan; SUN Jiqing: "Research on a Knowledge Association Mining Model Oriented to User Interests", Journal of Information Systems, no. 01, pages 38-48 *
JIN Huatao: "A Short-Text Sentiment Analysis Method Based on the BERT Model and Dual-Channel Attention", Information & Computer (Theoretical Edition), vol. 33, no. 05, pages 41-43 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519112A (en) * 2022-01-28 2022-05-20 北京卓越乐享网络科技有限公司 Method, apparatus, device, medium and program product for predicting multimedia object
CN116541608A (en) * 2023-07-04 2023-08-04 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium
CN116541608B (en) * 2023-07-04 2023-10-03 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111191791B (en) Picture classification method, device and equipment based on machine learning model
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN112270686B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN113806568B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113641835B (en) Multimedia resource recommendation method and device, electronic equipment and medium
CN112883257B (en) Behavior sequence data processing method and device, electronic equipment and storage medium
CN113204660B (en) Multimedia data processing method, tag identification device and electronic equipment
CN113918738B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113704511B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN114461869B (en) Service characteristic data processing method and device, electronic equipment and storage medium
CN112818995A (en) Image classification method and device, electronic equipment and storage medium
CN116956906A (en) Text generation method and device and electronic equipment
CN114201626A (en) Multimedia recommendation method and device, electronic equipment and storage medium
CN113343024A (en) Object recommendation method and device, electronic equipment and storage medium
CN113704509A (en) Multimedia recommendation method and device, electronic equipment and storage medium
CN114491093B (en) Multimedia resource recommendation and object representation network generation method and device
CN113868516A (en) Object recommendation method and device, electronic equipment and storage medium
CN115756821A (en) Online task processing model training and task processing method and device
CN113610215B (en) Task processing network generation method, task processing device and electronic equipment
CN114139046B (en) Object recommendation method and device, electronic equipment and storage medium
CN114139045A (en) Object recommendation method and device, electronic equipment and storage medium
CN111078984B (en) Network model issuing method, device, computer equipment and storage medium
CN113947185A (en) Task processing network generation method, task processing device, electronic equipment and storage medium
CN114048392B (en) Multimedia resource pushing method and device, electronic equipment and storage medium
CN114021739B (en) Business processing method, business processing model training device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant