CN114328987A

CN114328987A - Media content recall method, apparatus, device, storage medium and product

Info

Publication number: CN114328987A
Application number: CN202111111460.8A
Authority: CN
Inventors: 刘孟洋
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-09-18
Filing date: 2021-09-18
Publication date: 2022-04-12

Abstract

The embodiments of the application disclose a media content recall method, apparatus, device, storage medium and product. The method obtains target media content and a graph network, wherein the graph network comprises at least one node corresponding to media content, and the node has content features of the media content in at least one content dimension; performs an associated node search in the graph network for the target media content to determine an associated node sequence of the target media content; performs feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain a fused feature of the associated node sequence; determines, according to the fused feature, a target similar node in the graph network that is similar to the target media content; and performs content recall processing based on the media content corresponding to the target similar node. The scheme can improve the efficiency of content recall for media content.

Description

Media content recall method, apparatus, device, storage medium and product

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a product for recalling media content.

Background

Media is a medium for transmitting information, and media content is content that can be delivered through media, for example, media content may include text content, image content, video content, audio content, link content, and the like; as another example, the media content may be derived from a combination of content, such as page content, advertising content, and the like.

In the course of research and practice on the related art, the inventors of the present application found that a recall process can select desired media content from a set of media content, but that current methods of processing media content are inefficient, so the recall efficiency of media content needs to be improved.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment, a storage medium and a product for recalling media content, which can improve the efficiency of content recall for the media content.

The embodiment of the application provides a media content recall method, which comprises the following steps:

acquiring target media content and a graph network, wherein the graph network comprises at least one node corresponding to the media content, and the node has content characteristics of the media content in at least one content dimension;

performing association node search in the graph network aiming at the target media content to determine an association node sequence of the target media content, wherein the association node sequence comprises at least one association node of the association media content, and the association media content is the media content which has an association relation with the target media content;

performing feature fusion on the content features of the associated nodes in the associated node sequence on at least one fusion dimension to obtain fused features of the associated node sequence;

according to the fused features, determining target similar nodes similar to the target media content in the graph network;

and performing content recall processing based on the media content corresponding to the target similar node.

Correspondingly, an embodiment of the present application further provides a media content recall apparatus, including:

an acquisition unit, configured to acquire target media content and a graph network, where the graph network includes at least one node corresponding to the media content, and the node has content characteristics of the media content in at least one content dimension;

a searching unit, configured to perform association node searching in the graph network for the target media content to determine an association node sequence of the target media content, where the association node sequence includes at least one association node of an association media content, and the association media content is a media content having an association relationship with the target media content;

the fusion unit is used for performing feature fusion on the content features of the associated nodes in the associated node sequence on at least one fusion dimension to obtain fused features of the associated node sequence;

a determining unit, configured to determine, according to the fused feature, a target similar node similar to the target media content in the graph network;

and the recalling unit is used for recalling the content based on the media content corresponding to the target similar node.

In one embodiment, the fusion unit includes:

the first fusion subunit is used for performing feature fusion on the content features of the associated nodes to obtain intra-node fusion features of the associated nodes;

the second fusion subunit is used for performing feature fusion on the intra-node fusion features based on the node types of the associated nodes to obtain intra-class fusion features corresponding to the node types;

and the third fusion subunit is used for performing feature fusion on the intra-class fusion features to obtain fused features of the associated node sequence.

In an embodiment, the associated nodes have content characteristics in at least one content dimension; the first fusion subunit is configured to:

performing feature extraction on the content features to obtain extracted features corresponding to each content dimension; and performing feature fusion on the extracted features to obtain intra-node fusion features of the associated nodes.

In an embodiment, the content dimension comprises a target content dimension; the first fusion subunit is specifically configured to:

performing feature extraction on the content features from at least one information reading direction based on context information in the content features to obtain target extraction features corresponding to the content features in each information reading direction; and carrying out feature fusion on the target extraction features to obtain extracted features corresponding to the target content dimensions.

In an embodiment, the node type comprises a target node type; the second fusion subunit is configured to:

determining a target associated node of the target node type; extracting the characteristics of the intra-node fusion characteristics of the target associated nodes to obtain extracted characteristics corresponding to the target node type; and performing feature fusion on the extracted features to obtain intra-class fusion features corresponding to the target node type.

In an embodiment, the association node sequence takes a target media content node as a starting node, and the target media content node is a node corresponding to the target media content; the third fusion subunit is configured to:

acquiring the target intra-node fusion feature, namely the intra-node fusion feature of the target media content node in the associated node sequence; determining fusion weights for the target intra-node fusion feature and for the intra-class fusion features respectively; and performing feature fusion on the target intra-node fusion feature and the intra-class fusion features based on the fusion weights to obtain the fused features of the associated node sequence.
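The three fusion levels described for the fusion unit (within a node, within a node type, and across the sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: mean pooling stands in for the unspecified fusion operators, the fusion weights are obtained by a softmax over hypothetical scores, the start node is excluded from the intra-class step, all feature vectors share one length, and the names `mean_pool`, `softmax` and `fuse_sequence` are invented for the example.

```python
import math
from collections import defaultdict


def mean_pool(vectors):
    """Element-wise mean of equal-length vectors (assumed stand-in for a learned fusion)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Turn unnormalised scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_sequence(nodes, start_id, scores):
    """nodes: list of {"id", "type", "features": {content dimension: vector}}.
    scores: hypothetical unnormalised score for "target" and for each node type."""
    # Level 1: fuse the content features of each node across its content dimensions.
    intra_node = {n["id"]: mean_pool(list(n["features"].values())) for n in nodes}
    # Level 2: fuse intra-node features of associated nodes that share a node type
    # (assumption: the start node itself is left out of this step).
    by_type = defaultdict(list)
    for n in nodes:
        if n["id"] != start_id:
            by_type[n["type"]].append(intra_node[n["id"]])
    intra_class = {t: mean_pool(vs) for t, vs in by_type.items()}
    # Level 3: weighted sum of the start node's feature and the intra-class features.
    types = sorted(intra_class)
    vectors = [intra_node[start_id]] + [intra_class[t] for t in types]
    weights = softmax([scores["target"]] + [scores[t] for t in types])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
```

In a trained system the pooling operators and weights would presumably be learned; the sketch only shows how the three levels compose.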

In one embodiment, the search unit includes:

an updating subunit, configured to update the graph network based on the target media content to obtain an updated graph network, where the updated graph network includes a target node corresponding to the target media content;

the searching subunit is used for searching the associated node of the target node in the updated graph network;

and the combination subunit is used for combining the associated nodes to obtain an associated node sequence of the target media content.

In one embodiment, the target node comprises a target media content node for characterizing the target media content; the search subunit is configured to:

determining a node access path with the target media content node as an initial node in the updated graph network; and searching the associated nodes of the target media content nodes in the updated graph network according to the node access paths.

In one embodiment, the recall unit includes:

a network determination subunit, configured to determine an updated graph network, where the updated graph network is generated based on the target media content and the graph network, and the updated graph network includes a candidate media content node corresponding to at least one candidate media content;

an obtaining subunit, configured to obtain candidate post-fusion features corresponding to the candidate media content nodes;

a calculating subunit, configured to calculate, according to the fused feature, a similarity between the target media content and the candidate media content;

and the node determining subunit is used for determining a target similar node similar to the target media content from the candidate media content nodes based on the calculation result.
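The work of the calculating and node determining subunits can be illustrated with a short Python sketch. It assumes cosine similarity and a top-k cut-off, neither of which is specified in the text above; `cosine`, `recall_similar` and `top_k` are hypothetical names introduced only for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_similar(target_feature, candidates, top_k=2):
    """candidates: {candidate node id: candidate fused feature}.
    Returns the top_k (node id, similarity) pairs, most similar first."""
    scored = [(node_id, cosine(target_feature, vec)) for node_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The returned node ids would then identify the media content to recall; a threshold on the similarity could be used instead of a fixed `top_k`.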

Accordingly, an embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for recalling media content according to the embodiment of the present application.

Accordingly, the present application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the recall method of media content as shown in the present application.

Accordingly, embodiments of the present application also provide a computer program product, which includes a computer program/instruction, and the computer program/instruction, when executed by a processor, implement the steps of the recall method of media content as shown in the embodiments of the present application.

In the embodiments of the application, target media content and a graph network can be obtained, wherein the graph network comprises at least one node corresponding to the media content, and the node has content characteristics of the media content under at least one content dimension; an association node search is performed in the graph network for the target media content to determine an association node sequence of the target media content, wherein the association node sequence comprises at least one association node of the association media content, and the association media content is the media content which has an association relation with the target media content; feature fusion is performed on the content features of the associated nodes in the associated node sequence on at least one fusion dimension to obtain fused features of the associated node sequence; target similar nodes similar to the target media content are determined in the graph network according to the fused features; and content recall processing is performed based on the media content corresponding to the target similar node.

According to the scheme, the fused features corresponding to the target media content can be generated by using the graph network, so that the content recall processing can be performed on the target media content based on the fused features. Because the fused features are generated based on the graph network, the fused features can represent the association relationship of the media content on the topological structure, and the nodes in the graph network have the content features of the media content under at least one content dimension, so the fused features can also represent the association relationship of the media content under different content dimensions. In addition, because the scheme can search the associated nodes of the target media content and perform operations such as feature fusion from at least one fusion dimension, the fused features generated by the scheme can unify various types of data, multi-modal features and the topological structure of the graph network, and thus, when the scheme recalls the content based on the fused features, the fused features can measure and represent the target media content from multiple aspects, so that the scheme can improve the efficiency and accuracy of the recall.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic view of a scenario of a media content recall method provided in an embodiment of the present application;

FIG. 2 is a flow chart of a method for recalling media content provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a graph network of a recall method for media content according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an updated graph network of a recall method for media content according to an embodiment of the present application;

FIG. 5 is a schematic flow chart of another method for recalling media content provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a recall device for media content provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of another exemplary embodiment of a recall apparatus for media content;

FIG. 8 is a schematic diagram of another exemplary embodiment of a recall apparatus for media content;

FIG. 9 is a schematic diagram of another exemplary embodiment of a recall apparatus for media content;

FIG. 10 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a media content recall method, a device, equipment, a storage medium and a product. Specifically, the embodiment of the application provides a recall method and device of media content suitable for a computer device. The computer device may be a server or a terminal. Specifically, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, etc., but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

The embodiment of the present application will take a computer device as an example to describe a method for recalling media content.

Referring to fig. 1, a server 10 may obtain target media content and a graph network, wherein the graph network includes at least one node corresponding to the media content, and the node has content characteristics of the media content in at least one content dimension. For example, the server 10 may store the graph network, and thus, the server may acquire the graph network by extracting the stored data; for another example, the server 10 may obtain the graph network through another server, for example, receive the graph network sent by another server; as another example, the server 10 may obtain the graph network through the terminal 20, for example, receive the graph network transmitted by the terminal 20. Similarly, the manner in which the server obtains the target media content may refer to the manner in which the server obtains the graph network, which is not described herein again.

Further, the server 10 may perform association node lookup in the graph network for the target media content to determine an association node sequence of the target media content, where the association node sequence includes at least one association node of the association media content, and the association media content is a media content having an association relationship with the target media content; performing feature fusion on the content features of the associated nodes in the associated node sequence on at least one fusion dimension to obtain fused features of the associated node sequence; determining a target similar node similar to the target media content in the graph network according to the fused features; and performing content recall processing based on the media content corresponding to the target similar node.

In practice, the server 10 may transmit the recalled media contents to the terminal 20 to present the recalled media contents to the user through the terminal 20.

Each of these is described in detail below. It should be noted that the order in which the embodiments are described is not intended to limit the preferred order of the embodiments.

The method for recalling the media content can be executed by a server or a terminal, and can also be executed by the server and the terminal together; the embodiment of the present application is described by taking an example in which a method for recalling media content is executed by a server.

As shown in fig. 2, a specific flow of the media content recall method may be as follows:

101. Obtain the target media content and a graph network, wherein the graph network comprises at least one node corresponding to the media content, and the node has content characteristics of the media content in at least one content dimension.

Media is a medium through which information is propagated. It refers to the tools, channels, carriers, intermediaries or technical means by which information is transmitted and obtained, and also to the means of transmitting information such as text and voice. Media can also be viewed as all the technical means that enable the transfer of information from an information source to a recipient. For example, the media may include cell phones, internet of things, television, radio, periodicals, newspapers, and the like.

The media content is content that can be delivered through media, for example, the media content may include text content, image content, video content, audio content, link content, and the like; as another example, the media content may be derived from a combination of content, such as page content, advertising content, and the like.

The content dimension refers to a dimension for describing content information included in the media content. For example, when the media content is specifically a video, the content dimensions of the media content may include a picture dimension, an audio dimension, and a text dimension. Specifically, the video may include picture information in the picture dimension, such as video frames and the like; may include audio information in the audio dimension, such as background music, dubbing information, etc.; and may include textual information in the text dimension, such as video titles, video profiles, video captions, video categories, video tags, etc.

The content feature is a feature that characterizes content information included in the media content, and for example, the content feature may be specifically in the form of a feature vector.

As an example, when the media content is specifically a video, the media content may include picture information in a picture dimension, and the content feature corresponding to the picture information may be obtained by performing feature extraction on the picture information. For example, the content feature corresponding to the picture information may be acquired by a video embedding technique. Specifically, the video embedding technique may convert visual information in a video into a corresponding feature vector, such as an embedding vector, through a multi-layer Convolutional Neural Network (CNN); a fully connected layer is attached after the embedding layer, and the overall model is trained through a multi-classification task. In the multi-classification task, a series of labels of each video, such as the category, scene and action form to which it belongs, can be used as supervision information, so that the network can capture the features in the video data that are highly related to the labels. After training is completed, the embedding vectors that the model extracts for visually similar videos are close to each other in Euclidean distance in the vector space. The embedding vector is the feature vector of the video in the picture dimension, namely the content feature of the video in the picture dimension.
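The statement that visually similar videos lie close together in Euclidean distance can be illustrated with a small Python sketch. The three-dimensional vectors and the video names below are made-up toy values, not outputs of any trained model, and `nearest` is a hypothetical helper.

```python
import math

# Hypothetical toy picture-dimension embeddings (a real embedding would be
# produced by the trained network and have many more dimensions).
embeddings = {
    "cat_clip_a": [0.9, 0.1, 0.0],
    "cat_clip_b": [0.8, 0.2, 0.1],
    "car_clip": [0.0, 0.1, 0.9],
}


def nearest(query_id, table):
    """Return (video id, Euclidean distance) of the closest other video."""
    query = table[query_id]
    others = (
        (video_id, math.dist(query, vec))
        for video_id, vec in table.items()
        if video_id != query_id
    )
    return min(others, key=lambda pair: pair[1])
```

With these toy values, the nearest neighbour of `cat_clip_a` is `cat_clip_b`, mirroring the idea that similar pictures map to nearby vectors.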

Similarly, for content information in an audio form, content features corresponding to the content information may be obtained through an audio embedding technique, where the content features are content features of media content in the corresponding content dimension; for content information in a text form, content features corresponding to the content information, that is, content features of media content in the corresponding content dimension, can be obtained through a text embedding technique.

A graph network, also called a graph, is a discrete structure composed of nodes and edges connecting the nodes. In computer science, a graph is one of the most flexible data structures, and many problems can be solved by modeling them with a graph model.

For example, graph networks may include homogeneous graphs and heterogeneous graphs. Specifically, a homogeneous graph has only one type of node and only one type of connection relationship between nodes; for example, a social network may include only user nodes, with only a "know" connection relationship between user nodes, that is, two users either know each other or do not. A heterogeneous graph may include multiple types of nodes, and there may also be multiple types of connection relationships between the nodes.

It should be noted that, in the present application, a node corresponding to a media content refers to a node related to the media content, and therefore, the number of nodes corresponding to the media content may be multiple. For example, the nodes corresponding to the media content may include a media content node and a content attribute node, wherein the media content node may be used for characterizing the media content, and the content attribute node may be used for characterizing the content attribute of the media content. For example, the media content may be specifically a video, and nodes corresponding to a video may include a video node, a classification node, and a label node, where the video node may be used to represent the video, the classification node may be used to represent a video classification to which the video belongs, and the label node may be used to represent a video label of the video, in this example, the video node is a media content node, and the classification node and the label node are content attribute nodes.

In an embodiment, the media content may specifically be a video, the graph network may specifically be a heterogeneous graph, and the graph network may include at least one node corresponding to the video, where the node corresponding to each video may specifically include a video node, a classification node, and a label node. Specifically, since each video may have multiple tags, and each tag may be shared by multiple videos, there may be many-to-many dependencies between video nodes and tag nodes. Because each video can be set to have only one video classification, each video node in the graph network may depend on exactly one classification node, namely each video node has one and only one classification node neighbor. Different videos may also be the same as each other, for example, whether two videos are the same may be determined by a video fingerprint technique, so a video node in the graph network may be connected to another video node by the same relationship.

As a specific example of the above embodiment, refer to fig. 3, wherein video1 to video5 respectively represent video node 1 to video node 5, Tag1 to Tag2 respectively represent Tag node 1 and Tag node 2, and CLS1 and CLS2 respectively represent classification node 1 and classification node 2. Moreover, if the two videos have the same relationship, the video nodes corresponding to the two videos may have a connection line therebetween; if a certain video belongs to a certain video classification, a connection line can be arranged between a video node corresponding to the video and a classification node corresponding to the video classification; if a video has a video tag, a connection line may be provided between the video node corresponding to the video and the tag node corresponding to the video tag.

In this application, a graph network may include nodes corresponding to at least one media content, where each node may have content characteristics of the media content in at least one content dimension. In the above example, the graph network may include at least one video-corresponding node, and each video-corresponding node may specifically include a video node, a classification node, and a label node. The video node may have content features of the video in at least one content dimension, for example, the video node may have feature vectors of the video in a picture dimension, an audio dimension, and a text dimension, respectively; the classification node may have a feature vector of the video in a text dimension, and this feature vector characterizes the video classification to which the video belongs; the label node may have a feature vector of the video in the text dimension that may characterize the video label the video has.
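One possible in-memory representation of such a heterogeneous graph is sketched below in Python. The class and method names (`Node`, `HeteroGraph`, `add_node`, `add_edge`, `neighbors`), the node type strings, the feature values and the two edges are all hypothetical and chosen only for illustration; the text does not prescribe a storage format or enumerate the edges of fig. 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    node_type: str  # e.g. "video", "classification" or "tag"
    features: dict = field(default_factory=dict)  # content dimension -> feature vector


class HeteroGraph:
    """Undirected graph whose nodes carry a type and per-dimension features."""

    def __init__(self):
        self.nodes = {}
        self.adj = {}

    def add_node(self, node_id, node_type, features=None):
        # An existing node is kept as-is, so shared tag or classification nodes are not duplicated.
        if node_id not in self.nodes:
            self.nodes[node_id] = Node(node_id, node_type, features or {})
            self.adj[node_id] = set()
        return self.nodes[node_id]

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node_id, node_type=None):
        """Sorted neighbour ids, optionally restricted to one node type."""
        return sorted(
            n for n in self.adj[node_id]
            if node_type is None or self.nodes[n].node_type == node_type
        )


g = HeteroGraph()
g.add_node("video1", "video", {"picture": [0.1, 0.9], "audio": [0.3, 0.3], "text": [0.5, 0.2]})
g.add_node("CLS1", "classification", {"text": [1.0, 0.0]})
g.add_node("Tag1", "tag", {"text": [0.0, 1.0]})
g.add_edge("video1", "CLS1")  # illustrative edge: video1 belongs to classification 1
g.add_edge("video1", "Tag1")  # illustrative edge: video1 carries tag 1
```

Keeping the features in a dictionary keyed by content dimension lets a video node hold picture, audio and text vectors while classification and tag nodes hold only a text vector.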

In the present application, there are various ways for the server to obtain the graph network, for example, the server may store the graph network, so that the server may obtain the graph network by extracting the stored data; for another example, the server may obtain the graph network through another server, e.g., receive the graph network sent by another server; as another example, the server may obtain the graph network through the terminal, for example, receive the graph network transmitted by the terminal. Similarly, the manner in which the server obtains the target media content may refer to the manner in which the server obtains the graph network, which is not described in detail herein.

102. Perform an associated node search in the graph network for the target media content to determine an associated node sequence of the target media content, wherein the associated node sequence comprises at least one associated node of associated media content, and the associated media content is media content having an association relation with the target media content.

The associated node search is used for searching an associated node in the graph network, the associated node is a node corresponding to the associated media content in the graph network, and the associated media content is the media content having an association relation with the target media content.

For example, when the media contents are specifically videos, the association relationships between the media contents may be the same relationship, for example, it may be considered that the videos judged to be the same by the video fingerprint technology have the same relationship; for another example, when the media content is specifically an image, the association relationship between the media content may be a homogeneous relationship, for example, it may be considered that images belonging to the same category are judged to have a homogeneous relationship through the image classification model; for another example, when the media content is specifically an advertisement, the association relationship between the media content may be a similar relationship, for example, it may be considered that similar advertisements are judged to have a similar relationship by the advertisement analysis model; and so on.

The associated node sequence may include at least one associated node, where the associated node is a node corresponding to associated media content in the graph network, and the associated media content is media content having an association relationship with the target media content.

Since the graph network may be composed of nodes and edges connecting the nodes, the search for associated nodes of the target media content in the graph network may be implemented based on the topology of the graph network. In one embodiment, a node associated with the target media content may be selected as a start node in the graph network, for example, the media content node most similar to the target media content may be selected as the start node, and the associated node search may be performed based on the connections between the start node and other nodes in the graph network.

In another embodiment, the graph network may first be updated based on the target media content, so that the associated node search is performed in the updated graph network. Specifically, the step of "performing the associated node search in the graph network for the target media content to determine the associated node sequence of the target media content" may include:

updating the graph network based on the target media content to obtain an updated graph network, wherein the updated graph network comprises a target node corresponding to the target media content;

searching the associated node of the target node in the updated graph network;

and combining the associated nodes to obtain an associated node sequence of the target media content.
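The search and combination steps listed above can be sketched in Python as follows. The sketch assumes a breadth-first traversal limited to a fixed number of hops, which is only one possible search strategy and is not stated in the text; the function name `associated_node_sequence`, the `max_hops` parameter and the toy adjacency are hypothetical.

```python
from collections import deque


def associated_node_sequence(adj, start, max_hops=2):
    """Breadth-first search from the target node.

    adj: {node id: set of neighbouring node ids} for the updated graph network.
    Returns the visited nodes in visit order, beginning with the start node.
    """
    order, seen = [start], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(adj.get(node, ())):  # sorted for a deterministic order
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, depth + 1))
    return order


# Toy adjacency for illustration only.
toy_adj = {
    "target": {"clsA", "tagX", "videoP"},
    "clsA": {"target"},
    "tagX": {"target", "videoP", "videoQ"},
    "videoP": {"target", "tagX"},
    "videoQ": {"tagX"},
}
sequence = associated_node_sequence(toy_adj, "target", max_hops=2)
```

A random walk or a type-guided walk could replace the breadth-first traversal without changing the shape of the result, an ordered node sequence that starts at the target node.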

In this application, updating the graph network based on the target media content may be implemented by updating nodes in the graph network and edges connecting the nodes based on the target media content, so that the updated graph network includes target nodes corresponding to the target media content, for example, the target nodes may include target media content nodes, where the target media content nodes are used to represent the target media content.

For example, in a graph network, nodes corresponding to media content may include a media content node and a content attribute node, and then the graph network is updated based on target media content, specifically, a target media content node and a target content attribute node may be determined in the graph network, where the target media content node may be used to represent the target media content, and the target content attribute node may be used to represent a content attribute of the target media content; and determining the connection relation between the target media content node, the target content attribute node and the initial node in the graph network, namely determining the edges between the nodes, so as to update the graph network. In this way, the updated graph network may include the target node corresponding to the target media content, and in this example, the target node includes the target media content node and the target content attribute node.

As an example, the media content may be a video, the graph network may specifically be as shown in fig. 3, and the target media content may be video 6 (i.e., video6), where video6 is classified into category 1 (i.e., CLS1) and the video label of video6 is label 3 (i.e., Tag3). Referring to fig. 4, a video node video6 and a label node Tag3 may be added to the graph network (since the graph network already has a classification node CLS1 that characterizes category 1, no classification node is added again). A connecting line between the video6 node and the CLS1 node and a connecting line between the video6 node and the Tag3 node are also added to the graph network. In addition, since video6 and video3 are determined through video fingerprint technology to have a "same" relation, a connecting line between the video6 node and the video3 node may be newly added; and since it is determined that the video tags of video3 and video4 both include Tag3, a connecting line between the video3 node and the Tag3 node and a connecting line between the video4 node and the Tag3 node may be added. In this example, the graph network shown in fig. 3 is updated based on video6, and the updated graph network shown in fig. 4 is obtained, where the updated graph network includes the target nodes corresponding to video6, and the target nodes include the video6 node, the CLS1 node, and the Tag3 node.
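The update described in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the adjacency-set representation, the helper name `add_edge`, and the contents of the starting graph are assumptions, since the actual structure of fig. 3 is not reproduced in this text.

```python
from collections import defaultdict

def add_edge(graph, a, b):
    """Add an undirected edge between a and b, creating the nodes if needed."""
    graph[a].add(b)
    graph[b].add(a)

# Hypothetical starting graph standing in for fig. 3 (contents assumed).
graph = defaultdict(set)
add_edge(graph, "video3", "CLS1")
add_edge(graph, "video4", "CLS1")

# Update the graph based on the target media content video6.
add_edge(graph, "video6", "CLS1")    # CLS1 already exists and is reused
add_edge(graph, "video6", "Tag3")    # Tag3 is newly created
add_edge(graph, "video6", "video3")  # "same" relation from video fingerprinting
add_edge(graph, "video3", "Tag3")    # video3 also carries Tag3
add_edge(graph, "video4", "Tag3")    # video4 also carries Tag3

# Target nodes corresponding to video6 in the updated graph.
target_nodes = ["video6", "CLS1", "Tag3"]
```

Reusing the existing CLS1 node rather than adding a duplicate mirrors the remark in the example that no classification node is added again.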

After the graph network is updated and the updated graph network is obtained, the associated nodes of the target nodes can be further searched in the updated graph network, so that the searched associated nodes can be combined to obtain the associated node sequence of the target media content.

In one embodiment, the target node may comprise a target media content node, and the target media content node may be used to characterize the target media content, for example, when the target media content is video6, the target media content node may be the video6 node in fig. 4. Therefore, searching for the associated node of the target node in the updated graph network may be implemented by searching for the associated node of the target media content node, and specifically, the step "searching for the associated node of the target node in the updated graph network" may include:

in the updated graph network, determining a node access path taking a target media content node as an initial node;

and searching the associated nodes of the target media content nodes in the updated graph network according to the node access paths.

The node access path is data obtained by recording a process of node access in the updated graph network starting from the start node, and for example, the node access path may be formed by nodes which are accessed in sequence in a node access process.

In the present application, there may be various ways to determine the node access path. For example, the target media content node may be taken as the starting node, a Random Walk (RW) may be performed in the updated graph network, and the correspondingly generated random walk sequence RW(v) may be taken as the node access path, where v is an abbreviation of vertex and represents a node vector in the updated graph network. Random walk is a common approach in graph learning. Its basic logic is to assume that an agent starts from a certain starting point on a graph, randomly selects a neighbor of the current node with a certain probability, and then moves to that neighbor. This process is repeated at each newly reached node, so that the agent walks randomly through the graph data, and the path it travels is a random walk sequence. The concrete representation form is a sequence of nodes, namely the nodes accessed by the agent. An agent refers to any entity with intelligence, and may include, for example, a human, intelligent hardware, intelligent software, and the like.
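As a minimal sketch of the random walk just described, assuming a uniform choice among neighbors and a fixed walk length (neither is specified in the text), the node access path could be generated as follows. The graph contents and the function name `random_walk` are hypothetical.

```python
import random

def random_walk(graph, start, length, rng=None):
    """Return a node access path of up to `length` nodes starting at `start`.

    At every step one neighbor of the current node is chosen uniformly at
    random (an assumption; the text only says "with a certain probability").
    """
    rng = rng or random.Random()
    path = [start]
    current = start
    for _ in range(length - 1):
        neighbors = graph.get(current, [])
        if not neighbors:  # dead end: stop the walk early
            break
        current = rng.choice(neighbors)
        path.append(current)
    return path

# Hypothetical updated graph network as adjacency lists.
graph = {
    "video6": ["CLS1", "Tag3", "video3"],
    "video3": ["CLS1", "Tag3", "video6"],
    "video4": ["CLS1", "Tag3"],
    "CLS1": ["video3", "video4", "video6"],
    "Tag3": ["video3", "video4", "video6"],
}

walk = random_walk(graph, "video6", 8, random.Random(0))
```

The nodes in `walk` would then serve as the associated nodes of the target media content node.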

For another example, the target media content node may be taken as the starting node, a Random Walk with Restart (RWR) may be performed in the updated graph network, and the correspondingly generated random walk sequence RWR(v) may be taken as the node access path, where v is an abbreviation of vertex and represents a node vector in the updated graph network. Random walk with restart adds a restart mechanism on top of the random walk: at each step of the walk, there is a certain probability of returning to the original starting point. Such a mechanism increases the probability that nodes around the starting point appear in the sequence, while making nodes further away from the starting point less likely to appear.
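The restart mechanism can be illustrated with the following sketch. The parameter name `restart_prob`, the uniform neighbor choice, and the decision to record the start node in the path on every restart are assumptions made for illustration only.

```python
import random

def random_walk_with_restart(graph, start, length, restart_prob, rng=None):
    """Return a path of exactly `length` nodes starting at `start`.

    At each step the walker jumps back to `start` with probability
    `restart_prob`; otherwise it moves to a uniformly chosen neighbor.
    """
    rng = rng or random.Random()
    path = [start]
    current = start
    while len(path) < length:
        neighbors = graph.get(current, [])
        if rng.random() < restart_prob or not neighbors:
            current = start  # restart (also used when stuck at a dead end)
        else:
            current = rng.choice(neighbors)
        path.append(current)
    return path

# Hypothetical updated graph network as adjacency lists.
graph = {
    "video6": ["CLS1", "Tag3", "video3"],
    "video3": ["CLS1", "Tag3", "video6"],
    "video4": ["CLS1", "Tag3"],
    "CLS1": ["video3", "video4", "video6"],
    "Tag3": ["video3", "video4", "video6"],
}

walk = random_walk_with_restart(graph, "video6", 10, 0.3, random.Random(0))
```

A larger `restart_prob` keeps the walk closer to the starting node, which matches the stated purpose of emphasizing nearby nodes.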

After the node access path is obtained, the associated node of the target media content node can be searched in the updated graph network according to the node access path. Specifically, since the concrete expression form of the node access path may be a sequence formed by nodes, the nodes included in the node access path may be used as associated nodes of the target media content node in the updated graph network, and the associated nodes are searched in the updated graph network.

103. And performing feature fusion on the content features of the associated nodes in the associated node sequence on at least one fusion dimension to obtain fused features of the associated node sequence.

The fusion dimension refers to a dimension for describing feature fusion of content features of the associated nodes. For example, the fused dimensions may include intra-node fused dimensions, intra-class fused dimensions, inter-class fused dimensions, and the like.

As an example, in the present application, the node corresponding to the media content has the content features of the media content in at least one content dimension, and since the associated node is a node in the graph network, the associated node also has at least one content feature of the media content. In this way, the fusion dimension for feature fusion of the content features of the associated node may include an intra-node fusion dimension; specifically, the fused feature of the associated node may be obtained by fusing the content features of the associated node, so that the fused features of the associated node sequence can be further determined based on the fused features of the associated nodes.

As another example, the nodes corresponding to the media content may specifically include multiple types of nodes; for example, the nodes corresponding to the media content may include a media content node and a content attribute node (for example, when the media content is a video, the nodes corresponding to the video may include three types of nodes, namely a video node, a classification node, and a tag node, where the video node is the media content node, and the classification node and the tag node are both content attribute nodes). Therefore, the associated node sequence may include different nodes belonging to the same node type. In this way, the fusion dimension for performing feature fusion on the content features of the associated nodes in the associated node sequence may include an intra-class fusion dimension; specifically, the fused feature of a node type may be obtained by fusing the content features corresponding to the different associated nodes belonging to that node type, so that the fused features of the associated node sequence can be further determined based on the fused features of the node types.

As another example, since the associated nodes in the associated node sequence may belong to at least one node type, the fusion dimension for performing feature fusion on the content features of the associated nodes in the associated node sequence may include an inter-class fusion dimension; specifically, the content features corresponding to different node types may be fused to obtain a fused feature that comprehensively considers the plurality of node types involved in the associated node sequence, so that the fused feature of the associated node sequence can be further determined based on this fused feature.

In an embodiment, intra-node fusion dimensions, intra-class fusion dimensions, and inter-class fusion dimensions may be taken into consideration to implement feature fusion of content features of associated nodes in an associated node sequence in at least one fusion dimension, so as to obtain post-fusion features of the associated node sequence. Specifically, the step of performing feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence may include:

performing feature fusion on the content features of the associated nodes to obtain intra-node fusion features of the associated nodes;

performing feature fusion on the intra-node fusion features based on the node types of the associated nodes to obtain intra-class fusion features corresponding to each node type;

and performing feature fusion on the intra-class fusion features to obtain fused features of the associated node sequences.
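The three steps above can be sketched as a small pipeline. This is a deliberately simplified illustration: a plain element-wise mean stands in for every fusion stage (the embodiments in this description use LSTM-based extraction and weighted fusion instead), and the function names and sample data are hypothetical.

```python
from collections import defaultdict

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fuse_sequence(nodes):
    """nodes: list of (node_type, [content feature vectors]) pairs."""
    # Step 1: intra-node fusion, one fused vector per associated node.
    intra_node = [(node_type, mean_vectors(features))
                  for node_type, features in nodes]
    # Step 2: intra-class fusion, one fused vector per node type.
    by_type = defaultdict(list)
    for node_type, fused in intra_node:
        by_type[node_type].append(fused)
    intra_class = {t: mean_vectors(vs) for t, vs in by_type.items()}
    # Step 3: inter-class fusion, one fused vector for the whole sequence.
    fused_sequence = mean_vectors(list(intra_class.values()))
    return intra_node, intra_class, fused_sequence

# Hypothetical associated node sequence with two-dimensional content features.
nodes = [
    ("video", [[1.0, 2.0], [3.0, 4.0]]),
    ("video", [[5.0, 6.0]]),
    ("tag", [[0.0, 2.0]]),
]
intra_node, intra_class, fused_sequence = fuse_sequence(nodes)
```

The point of the sketch is the order of aggregation (node, then node type, then sequence), not the particular fusion operator used at each stage.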

Hereinafter, the step "performing feature fusion on the content features of the associated nodes to obtain intra-node fusion features of the associated nodes" may be explained first.

In the application, the content features of the associated nodes are subjected to feature fusion from intra-node fusion dimensions, and the purpose is to fuse different content features of each associated node.

The feature fusion refers to a process of integrating input features and generating fused features. The way of implementing feature fusion can be various, for example, it can be implemented by feature concatenation (concatenate); as another example, this may be achieved by feature addition (add); and so on.

As an example, feature concatenation may concatenate two or more input features into one output feature, for example, if the dimensions of the two input features x and y are p and q, respectively, the dimension of the output feature obtained by feature concatenation of x and y may be p + q.

As another example, feature addition may be performed by performing an addition operation on two or more input features to obtain an output feature. For example, if the dimensions of the two input features x and y are both p, an output feature may be obtained by adding the features of x and y, and the dimension of the output feature may be p. It is noted that the feature addition may be implemented in various ways, and may include, for example, unweighted addition and weighted addition.
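The difference between the two fusion operators can be seen in a short sketch with plain Python lists as feature vectors. The function names and the weight parameters `wx` and `wy` are illustrative assumptions.

```python
def concatenate(x, y):
    """Feature concatenation: output dimension is len(x) + len(y)."""
    return list(x) + list(y)

def add(x, y, wx=1.0, wy=1.0):
    """Feature addition: inputs must share a dimension, which is preserved.

    With the default weights this is unweighted addition; other values of
    wx and wy give weighted addition.
    """
    if len(x) != len(y):
        raise ValueError("feature addition needs inputs of equal dimension")
    return [wx * a + wy * b for a, b in zip(x, y)]

spliced = concatenate([1.0, 2.0, 3.0], [4.0, 5.0])   # dimension 3 + 2 = 5
summed = add([1.0, 2.0], [3.0, 4.0])                 # dimension stays 2
averaged = add([1.0, 2.0], [3.0, 4.0], 0.5, 0.5)     # weighted addition
```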

In an embodiment, considering that the associated node has content features in at least one content dimension, feature fusion may be performed on the content features of the associated nodes in the associated node sequence in the intra-node fusion dimension; that is, the fused feature of the associated node in the intra-node fusion dimension may be obtained by performing feature fusion on at least one content feature of the associated node. Specifically, the step of performing feature fusion on the content features of the associated node to obtain intra-node fusion features of the associated node may include:

performing feature extraction on the content features to obtain extracted features corresponding to each content dimension;

and performing feature fusion on the extracted features to obtain intra-node fusion features of the associated nodes.

Feature extraction refers to a method of transforming input data so as to highlight its representative features; in particular, feature extraction can be used to extract the information in the input data that constitutes features.

In the present application, there are various ways of performing feature extraction on content features. For example, feature extraction may be implemented by a neural network model, such as a neural network model based on a Recurrent Neural Network (RNN) or a neural network model based on an attention mechanism; as another example, it may be implemented by a neural network model based on a Long Short-Term Memory network (LSTM); and so on.

In this application, for each associated node, since the associated node may have a content feature in at least one content dimension, feature extraction may be performed on each content feature of the associated node, so as to obtain an extracted feature corresponding to the associated node in each content dimension.

Specifically, a target content dimension may be selected from the at least one content dimension, and the feature extraction is described below by taking the target content dimension as an example. The content features of the associated node in each content dimension may be extracted in the same way to obtain the extracted features corresponding to each content dimension of the associated node; that is, reference may be made to the implementation of the step "performing feature extraction on the content feature to obtain an extracted feature corresponding to the target content dimension", which is not repeated for each content dimension herein.

In an embodiment, the content dimension may include a target content dimension, and specifically, the step "performing feature extraction on the content feature to obtain an extracted feature corresponding to the target content dimension" may include:

performing feature extraction on the content features from at least one information reading direction based on context information in the content features to obtain target extraction features corresponding to the content features in each information reading direction;

and carrying out feature fusion on the target extraction features to obtain extracted features corresponding to the target content dimensions.

The content feature may be composed of at least one feature element, for example, when the content feature is embodied in the form of a feature vector, at least one number may be included in the feature vector. When processing the content feature, the feature elements that make up the content feature may be taken as input, and the relationship between the "context" of the input may be taken into account so that when processing the content feature, it is implemented based on the context information in the content feature. Therefore, the context information in the content feature is information representing the context relationship between the feature elements.

In the present application, there are various ways of performing feature extraction on content features based on the context information in the content features. For example, since the RNN is a model that can take the relationship between input contexts into account, feature extraction may be performed on the content features based on the RNN. For another example, although the RNN can take the context information in the content features into account, it is not well suited to processing information with a long-term history, so feature extraction of the content features may instead be implemented based on the LSTM network.

The information reading direction is the direction in which the feature elements of the content feature are read when the content feature is taken as the input information. For example, the information reading direction for the content feature may include a forward direction and a backward direction. For example, the specific form of the content feature may be a feature vector, and the feature vector may be input into the feature extraction model to implement feature extraction on the feature vector from the forward direction; the feature vector may also be inverted, and the inverted feature vector may be input into the feature extraction model to implement feature extraction on the feature vector from the backward direction.
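The idea of reading the same feature vector in two directions can be sketched as follows. The function `toy_extractor` is a hypothetical, order-sensitive stand-in for the feature extraction model (it is not an RNN or LSTM); it is used only to show that inverting the input changes what the model sees at each step.

```python
def toy_extractor(elements):
    """Hypothetical stand-in for a feature extraction model.

    Keeps a running state in which earlier elements are decayed by half,
    so the output depends on the order in which elements are read.
    """
    state = 0.0
    outputs = []
    for element in elements:
        state = 0.5 * state + element
        outputs.append(state)
    return outputs

def extract_both_directions(feature):
    """Run the extractor on the feature as given and on its inversion."""
    forward = toy_extractor(feature)
    backward = toy_extractor(list(reversed(feature)))
    return forward, backward

forward, backward = extract_both_directions([1.0, 2.0, 4.0])
```

Here the forward pass weights the last element most heavily and the backward pass weights the first element most heavily, which is why the two directions yield complementary target extraction features.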

In an embodiment, the feature extraction model in the intra-node fusion dimension may be a neural network model implemented based on LSTM. Based on the context information of the content feature, feature extraction is performed on the content feature from the forward direction to obtain the target extraction feature corresponding to the content feature in the forward direction, for which reference may be made to formula (1); based on the context information of the content feature, feature extraction is performed on the content feature from the backward direction to obtain the target extraction feature corresponding to the content feature in the backward direction, for which reference may be made to formula (2):

where $x_i$ is the content feature of the target associated node in content dimension $i$, $FC(\cdot)$ is a fully connected layer, $\overrightarrow{\mathrm{LSTM}}$ is a forward long short-term memory network, and $\overleftarrow{\mathrm{LSTM}}$ is a backward long short-term memory network.

Further, after the target extraction features corresponding to the content features in each information reading direction are obtained, feature fusion can be performed on the target extraction features, and the extracted features corresponding to the target content dimensions are obtained.

For example, the feature fusion may be implemented by feature splicing, and then the feature splicing may be performed on the target extraction features corresponding to the target content dimensions in each information reading direction, and the spliced features are used as the extracted features corresponding to the target content dimensions.

In an embodiment, the feature extraction model in the intra-node fusion dimension may be a neural network model implemented based on LSTM, and feature extraction may be performed on the content feature in the target content dimension from two information reading directions, i.e., from the forward direction and the backward direction, to obtain target extraction features corresponding to the content feature in the two information reading directions, respectively. Further, feature fusion can be performed on the two target extraction features by performing feature splicing on the two target extraction features, and the spliced features are used as extracted features corresponding to the target content dimensions. Reference may be made specifically to formula (3):

where $\oplus$ is the feature concatenation operator.

In the application, after the extracted features corresponding to each content dimension of the associated node are obtained, feature fusion can be further performed on the extracted features corresponding to each content dimension to obtain intra-node fusion features of the associated node.

As an example, feature fusion of the extracted features corresponding to each content dimension may specifically be implemented by feature addition. For example, a mean calculation may be performed on the extracted features corresponding to each content dimension of the associated node, and the calculated mean result is used as the intra-node fusion feature of the associated node; for another example, a weighted calculation may be performed on the extracted features corresponding to each content dimension of the associated node, and the weighting result obtained by the calculation may be used as the intra-node fusion feature of the associated node; and so on.

In an embodiment, the feature extraction model in the intra-node fusion dimension may be a neural network model implemented based on LSTM, and feature extraction may be performed on the content feature in each content dimension of the associated node from the forward direction and the backward direction to obtain target extraction features of the content feature in the two information reading directions, respectively. And the two target extraction features can be subjected to feature splicing to realize feature fusion of the two target extraction features, and the spliced features are used as the extracted features corresponding to the content dimension. Further, feature fusion of the extracted features corresponding to the content dimensions of the associated node can be realized by performing mean calculation on the extracted features corresponding to the content dimensions of the associated node, and the calculated mean result is used as the intra-node fusion feature of the associated node, which can be referred to as formula (4)

where $C_v$ denotes the content dimensions of the associated node, $|C_v|$ denotes the number of content dimensions of the associated node, and $f_1(v)$ denotes the intra-node fusion feature of the associated node.
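The formula images for formulas (1) to (4) are not reproduced in this text. Based only on the prose and the symbol definitions given for them, one plausible reconstruction is sketched below. The arrow notation for the two LSTM networks, the $\oplus$ symbol, and the intermediate names $\overrightarrow{h}_i$, $\overleftarrow{h}_i$ and $h_i$ are assumptions introduced for readability and may differ from the original drawings.

```latex
\begin{align}
\overrightarrow{h}_i &= \overrightarrow{\mathrm{LSTM}}\{FC(x_i)\} \tag{1}\\
\overleftarrow{h}_i  &= \overleftarrow{\mathrm{LSTM}}\{FC(x_i)\} \tag{2}\\
h_i &= \overrightarrow{h}_i \oplus \overleftarrow{h}_i \tag{3}\\
f_1(v) &= \frac{1}{|C_v|} \sum_{i \in C_v} h_i \tag{4}
\end{align}
```

Read this way, formula (4) is the mean, over all content dimensions of the associated node, of the spliced forward and backward extraction results.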

Further, the following may explain the step "perform feature fusion on intra-node fusion features based on the node type of the associated node to obtain intra-class fusion features corresponding to each node type".

In the application, the content characteristics of the associated nodes in the associated node sequence are subjected to characteristic fusion from the intra-class fusion dimension, and the purpose is to perform characteristic fusion on the associated nodes of the same node type in the associated node sequence. Because each type of associated node can contain content features with the same attribute, feature fusion is performed from the intra-class fusion dimension, and the improvement of the overall stability of the model is facilitated.

For each node type of the associated nodes in the associated node sequence, feature fusion may be performed on the intra-node fusion features of the associated nodes belonging to that node type, so as to obtain the intra-class fusion feature corresponding to the node type. Similarly, for the manner of performing feature fusion on the intra-node fusion features, reference may be made to the implementation of performing feature fusion on the content features of the associated nodes from the intra-node fusion dimension, which is not described herein again.

Specifically, a target node type may be selected from the node types of the associated node sequence, and the feature fusion is described below by taking the target node type as an example. The intra-node fusion features corresponding to each node type of the associated nodes may be fused in the same way to obtain the intra-class fusion feature corresponding to that node type; that is, reference may be made to the step "performing feature fusion on the intra-node fusion features based on the node type of the associated node to obtain the intra-class fusion features corresponding to the target node type", which is not repeated for each node type herein.

In an embodiment, the node type may include a target node type, and specifically, the step "performing feature fusion on the intra-node fusion feature based on the node type of the associated node to obtain an intra-class fusion feature corresponding to the target node type" may include:

determining the target associated nodes belonging to the target node type;

performing feature extraction on the intra-node fusion features of the target associated nodes to obtain extracted features corresponding to the types of the target nodes;

and performing feature fusion on the extracted features to obtain intra-class fusion features corresponding to the target node type.

In the present application, the associated node belonging to the target node type in the associated node sequence may be taken as the target associated node. Moreover, the intra-node fusion characteristics of each target associated node may be determined, and specifically, the manner of determining the intra-node fusion characteristics of the target associated nodes may refer to the foregoing, which is not repeated herein.

Further, feature extraction can be performed on the intra-node fusion features of each target associated node to obtain extracted features corresponding to the target node type.

Similarly, the step of "performing feature extraction on the intra-node fusion feature of the target associated node to obtain the extracted feature corresponding to the target node type" may include:

based on the context information of the intra-node fusion features, performing feature extraction on the intra-node fusion features from at least one information reading direction to obtain target extraction features corresponding to the intra-node fusion features in each information reading direction; and performing feature fusion on the target extraction features to obtain the extracted features corresponding to the target node type. For the explanation of the terms and steps involved here, reference may be made to the aforementioned steps "performing feature extraction on the content features from at least one information reading direction based on context information in the content features to obtain target extraction features corresponding to the content features in each information reading direction" and "carrying out feature fusion on the target extraction features to obtain extracted features corresponding to the target content dimensions", which are not described in detail herein.

In an embodiment, the feature extraction model in the intra-class fusion dimension may be a neural network model implemented based on LSTM, and feature extraction is performed on the intra-node fusion features from the forward direction based on context information of the intra-node fusion features to obtain target extraction features corresponding to the intra-node fusion features in the forward direction, which may specifically refer to formula (5); based on the context information of the intra-node fusion features, feature extraction is performed on the intra-node fusion features from the backward direction to obtain target extraction features corresponding to the intra-node fusion features from the backward direction, and the formula (6) can be specifically referred to:

where $v'$ is a target associated node, $f_1(v')$ is the intra-node fusion feature of the target associated node, $\overrightarrow{\mathrm{LSTM}}$ is a forward long short-term memory network, and $\overleftarrow{\mathrm{LSTM}}$ is a backward long short-term memory network.

Further, after the target extraction features corresponding to the fusion features in the nodes in each information reading direction are obtained, feature fusion can be performed on the target extraction features, and the extracted features corresponding to the types of the target nodes are obtained.

For example, the feature fusion may be implemented by feature splicing; in that case, the target extraction features corresponding to the intra-node fusion features in the respective information reading directions may be spliced, and the spliced features may be used as the extracted features corresponding to the target node type.

In an embodiment, the feature extraction model in the intra-class fusion dimension may be a neural network model implemented based on LSTM, and feature extraction may be performed on the intra-node fusion features of the target associated node from the forward direction and the backward direction to obtain target extraction features corresponding to the intra-node fusion features in the two information reading directions, respectively. Further, feature fusion can be performed on the two target extraction features by performing feature splicing on the two target extraction features, and the spliced features are used as extracted features corresponding to the target node types. Reference may be made specifically to formula (7):

where $\oplus$ is the feature concatenation operator.

In the application, after the extracted features corresponding to each node type of the associated node sequence are obtained, feature fusion can be further performed on the extracted features corresponding to each node type to obtain the intra-class fusion features of the associated node sequence.

As an example, feature fusion of the extracted features corresponding to each node type may specifically be implemented by feature addition. For example, a mean calculation may be performed on the extracted features corresponding to each node type of the associated node sequence, and the calculated mean result is used as the intra-class fusion feature of the associated node sequence; for another example, a weighted calculation may be performed on the extracted features corresponding to each node type of the associated node sequence, and the weighting result obtained by the calculation may be used as the intra-class fusion feature of the associated node sequence; and so on.

In an embodiment, the feature extraction model in the intra-class fusion dimension may be a neural network model implemented based on LSTM, and feature extraction may be performed on the intra-node fusion feature of each associated node of the associated node sequence from the forward direction and the backward direction to obtain target extraction features corresponding to the intra-node fusion feature in the two information reading directions, respectively. And the two target extraction features can be subjected to feature splicing to realize feature fusion of the two target extraction features, and the spliced features are used as the extracted features corresponding to the node type. Further, feature fusion of the extracted features corresponding to each node type of the associated node sequence can be realized by performing mean calculation operation on the extracted features corresponding to each node type of the associated node sequence, and the calculated mean result is used as the intra-class fusion feature of the associated node sequence, which can be referred to as formula (8)

where $N_t(v)$ denotes the node type of the associated node sequence, $|N_t(v)|$ denotes the number of node types of the associated node sequence, and $f_2^t(v)$ denotes the intra-class fusion feature of the associated node sequence.
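The formula images for formulas (5) to (8) are likewise not reproduced in this text. A reconstruction that parallels the intra-node case and agrees with the prose is sketched below. Several points are assumptions: the arrow notation and $\oplus$, the intermediate names $\overrightarrow{g}_{v'}$, $\overleftarrow{g}_{v'}$ and $g_{v'}$, the symbol $f_2^t(v)$ for the intra-class fusion feature of node type $t$, and the reading of $N_t(v)$ as the set of associated nodes of type $t$ over which the mean is taken.

```latex
\begin{align}
\overrightarrow{g}_{v'} &= \overrightarrow{\mathrm{LSTM}}\{f_1(v')\} \tag{5}\\
\overleftarrow{g}_{v'}  &= \overleftarrow{\mathrm{LSTM}}\{f_1(v')\} \tag{6}\\
g_{v'} &= \overrightarrow{g}_{v'} \oplus \overleftarrow{g}_{v'} \tag{7}\\
f_2^t(v) &= \frac{1}{|N_t(v)|} \sum_{v' \in N_t(v)} g_{v'} \tag{8}
\end{align}
```

Under this reading, the intra-class stage repeats the structure of formulas (1) to (4), with intra-node fusion features taking the place of raw content features.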

Further, feature fusion may be performed on the content features in the associated node sequence from the inter-class fusion dimension, and specifically, the following step "performing feature fusion on intra-class fusion features to obtain post-fusion features of the associated node sequence" may be explained.

In the application, after the foregoing steps, corresponding intra-class fusion features may be generated for each node type of the associated node sequence, and feature fusion may be further performed on the intra-class fusion features to obtain the fused features of the associated node sequence.

In an embodiment, the associated node sequence may use the target media content node as a start node, where the target media content node is the node corresponding to the target media content. Specifically, the step "performing feature fusion on the intra-class fusion features to obtain post-fusion features of the associated node sequence" may include:

acquiring the target intra-node fusion feature corresponding to the target media content node in the associated node sequence;

respectively determining fusion weights of the target intra-node fusion feature and the intra-class fusion features;

and performing feature fusion on the target intra-node fusion feature and the intra-class fusion features based on the fusion weights to obtain the fused features of the associated node sequence.

Specifically, with reference to the foregoing description, the intra-node fusion feature of the target media content node in the associated node sequence is determined and used as the target intra-node fusion feature.

The fusion weights are the weights applied, during feature fusion, to the target intra-node fusion feature and to the intra-class fusion feature corresponding to each node type of the associated node sequence.

In this application, the fused feature of the associated node sequence may be obtained by weighted calculation. Specifically, for an associated node sequence that takes the target media content node as its start node, a weighted calculation is performed on the target intra-node fusion feature corresponding to the target media content node and on the intra-class fusion feature corresponding to each node type of the associated node sequence, and the weighted result is used as the fused feature of the associated node sequence. The fused feature is thus the inter-class fusion feature of the associated node sequence.

As an example, for the process of performing feature fusion on the intra-class fusion features to obtain the fused features of the associated node sequence, reference may be made to equation (9):

where f_1(v) denotes the target intra-node fusion feature corresponding to the target media content node in the associated node sequence, O_v denotes the node types of the associated node sequence, a^{v,v} denotes the fusion weight of the target intra-node fusion feature, a^{v,t} denotes the fusion weight of the intra-class fusion feature of the associated node sequence, and ε_v denotes the fused feature of the associated node sequence, that is, the inter-class fusion feature.
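The weighted calculation can be illustrated with a minimal sketch: the fused feature is the weight a^{v,v} times the target intra-node fusion feature plus, for each node type t in O_v, the weight a^{v,t} times the intra-class fusion feature of that type. The text does not say how the weights are obtained, so the sketch takes them as inputs and, as an assumption, normalizes them to sum to 1. It also assumes all features share one dimension. The function name `inter_class_fusion` is hypothetical.

```python
def inter_class_fusion(target_feature, class_features, target_weight, class_weights):
    """Weighted fusion of the target intra-node feature and per-type intra-class features.

    target_feature: list of floats (target intra-node fusion feature)
    class_features: {node_type: list of floats} (intra-class fusion features)
    target_weight:  raw weight for the target feature
    class_weights:  {node_type: raw weight}
    Weights are normalized to sum to 1 (an assumption of this sketch).
    """
    total = target_weight + sum(class_weights[t] for t in class_features)
    if total <= 0:
        raise ValueError("fusion weights must have a positive sum")

    fused = [target_weight / total * x for x in target_feature]
    for node_type, feature in class_features.items():
        weight = class_weights[node_type] / total
        fused = [acc + weight * x for acc, x in zip(fused, feature)]
    return fused
```

In a trained model the weights would typically be learned rather than supplied by hand; passing them in keeps the sketch independent of any particular weighting scheme.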

104. Determining, according to the fused features, a target similar node similar to the target media content in the graph network.

In this application, a fused feature may be generated for each media content node in the graph network in the same manner as the fused feature corresponding to the target media content. For distinction, the fused feature corresponding to the target media content may be referred to as the target fused feature, the media content nodes in the graph network other than the target media content node may be referred to as candidate media content nodes, and the fused feature corresponding to a candidate media content node may be referred to as a candidate fused feature. A target similar node similar to the target media content may then be determined in the graph network according to the target fused feature and the candidate fused features. It should be noted that the graph network here may be either the updated graph network or the graph network before the update.

In an embodiment, the graph network may specifically be the updated graph network. Candidate media content nodes similar to the target media content may be determined from the candidate media content nodes in the updated graph network by calculating the distance between the target fused feature and each candidate fused feature, and the determined candidate media content nodes are used as the target similar nodes similar to the target media content. Specifically, the step "determining a target similar node similar to the target media content in the graph network according to the fused features" may include:

determining an updated graph network, wherein the updated graph network is generated based on the target media content and the graph network, and the updated graph network comprises candidate media content nodes corresponding to at least one candidate media content;

acquiring candidate fused features corresponding to the candidate media content nodes;

calculating the similarity between the target media content and the candidate media content according to the fused features;

and determining a target similar node similar to the target media content from the candidate media content nodes based on the calculation result.

Since the updated graph network and the manner of generating it have been described above, reference may be made to the foregoing for the step "determining the updated graph network"; details are not repeated here.

The candidate media content nodes are the media content nodes other than the target media content node in the updated graph network.

The candidate fused feature corresponding to a candidate media content node is the fused feature generated for that node by following the steps described above in this application. Specifically, the media content characterized by the candidate media content node may be taken as the target media content, and the following steps are performed:

acquiring the target media content and a graph network, wherein the graph network comprises at least one node corresponding to media content, and the node has content features of the media content in at least one content dimension; searching for associated nodes in the graph network for the target media content to determine an associated node sequence of the target media content, wherein the associated node sequence comprises at least one associated node of associated media content, and the associated media content is media content that has an association relation with the target media content; and performing feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain the fused feature of the associated node sequence, and taking the fused feature as the fused feature corresponding to the target media content.

After the target fused feature corresponding to the target media content node and the candidate fused features corresponding to the candidate media content nodes in the updated graph network are determined, the similarity between the target media content and the candidate media content can be calculated based on the target fused feature and the candidate fused features.

Specifically, since the target media content node characterizes the target media content and each candidate media content node characterizes candidate media content, the feature similarity between the target fused feature and a candidate fused feature may be determined by calculating the distance between them, and the similarity between the target media content and the candidate media content may then be determined from that feature similarity. For example, the features may be in the form of vectors, in which case the feature similarity may be determined by calculating a vector space distance between the target fused feature and the candidate fused feature; common vector space distances include the Euclidean distance, cosine distance, Manhattan distance, Mahalanobis distance, Hamming distance, and the like.
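As an illustration, three of the vector space distances named above can be computed as follows. This is a minimal sketch over plain Python lists. The mapping from distance to similarity in `distance_to_similarity` is an assumption made here for illustration, since the text does not prescribe one.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def distance_to_similarity(distance):
    """Assumed mapping: a smaller distance gives a similarity closer to 1."""
    return 1.0 / (1.0 + distance)
```

The choice of distance affects the ranking: cosine distance ignores vector length, whereas the Euclidean and Manhattan distances do not.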

After the similarity between the target media content and the candidate media content is calculated, a target similar node similar to the target media content node may be determined from the candidate media content nodes based on the calculation result. For example, if the similarity between a candidate media content and the target media content falls within a preset interval, the candidate media content node corresponding to that candidate media content is taken as a target similar node similar to the target media content. As another example, the candidate media content nodes may be ranked by the similarity between their candidate media content and the target media content, and the candidate media content nodes within a preset proportion or a preset rank may be taken as target similar nodes similar to the target media content; and so on.
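The two selection strategies just described (a preset interval, and a preset rank or proportion) can be sketched as follows. The input is assumed to be a mapping from candidate node identifier to similarity score; the function names and parameters are hypothetical.

```python
def select_by_interval(similarities, low, high):
    """Keep candidate nodes whose similarity lies within the preset interval [low, high]."""
    return [node for node, score in similarities.items() if low <= score <= high]


def select_top_ranked(similarities, top_k=None, proportion=None):
    """Rank candidates by similarity (highest first) and keep a preset rank or proportion."""
    ranked = sorted(similarities, key=lambda node: similarities[node], reverse=True)
    if top_k is None:
        if proportion is None:
            raise ValueError("either top_k or proportion must be given")
        # Keep at least one node when a proportion is requested.
        top_k = max(1, int(len(ranked) * proportion))
    return ranked[:top_k]
```

A usage example with made-up scores:

```python
scores = {"a": 0.9, "b": 0.4, "c": 0.75, "d": 0.1}
in_interval = select_by_interval(scores, 0.7, 1.0)
best_two = select_top_ranked(scores, top_k=2)
```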

105. Performing content recall processing based on the media content corresponding to the target similar node.

In this application, content recall refers to the process of selecting desired content from a collection of content; for example, content similar or related to the target media content may be selected through content recall.

Content recall processing may be performed in various ways based on the media content corresponding to the target similar node. For example, the media content corresponding to the target similar node may itself be used as the target recall media content. As another example, the media content corresponding to the target similar node may be used as reference media content for determining, from a content set, the media content to be recalled; for instance, media content in the content set that is similar or related to the reference media content may be used as the target recall media content; and so on.
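The two recall modes can be sketched as follows. This is an illustrative sketch only: `node_to_content` is an assumed mapping from node identifier to media content, and `is_related` is a hypothetical caller-supplied predicate, since the text does not define how relatedness within the content set is judged.

```python
def recall_directly(similar_nodes, node_to_content):
    """Mode 1: the media content of the target similar nodes is itself recalled."""
    return [node_to_content[node] for node in similar_nodes]


def recall_by_reference(similar_nodes, node_to_content, content_set, is_related):
    """Mode 2: the media content of the target similar nodes serves as reference;
    content in content_set that is related to any reference is recalled."""
    references = [node_to_content[node] for node in similar_nodes]
    return [
        candidate
        for candidate in content_set
        if any(is_related(candidate, ref) for ref in references)
    ]
```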

As can be seen from the above, the present embodiment may obtain the target media content and the graph network, where the graph network includes at least one node corresponding to media content, and the node has content features of the media content in at least one content dimension; search for associated nodes in the graph network for the target media content to determine an associated node sequence of the target media content, where the associated node sequence includes at least one associated node of associated media content, and the associated media content is media content having an association relation with the target media content; perform feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence; determine a target similar node similar to the target media content in the graph network according to the fused features; and perform content recall processing based on the media content corresponding to the target similar node.

According to the scheme, the fused features corresponding to the target media content can be generated by using the graph network, so that the content recall processing can be performed on the target media content based on the fused features. Because the fused features are generated based on the graph network, the fused features can represent the association relationship of the media content on the topological structure, and the nodes in the graph network have the content features of the media content under at least one content dimension, so the fused features can also represent the association relationship of the media content under different content dimensions.

In addition, because the scheme can search the associated nodes of the target media content and perform operations such as feature fusion from at least one fusion dimension, the fused features generated by the scheme can unify various types of data, multi-modal features and the topological structure of the graph network. Therefore, when the content is recalled based on the fused features, the fused features can measure and represent the target media content from multiple aspects, and the efficiency and the accuracy of recall can be improved.

The method described in the above examples is further described in detail below by way of example.

In this embodiment, a media content recall device integrated in a server and a terminal is taken as an example for explanation, and as shown in fig. 5, a media content recall method specifically includes the following processes:

201. The server receives a media content recall request sent by the terminal, wherein the media content recall request comprises content identification information of target media content.

The content identification information is related information for identifying media content, and for example, the content identification information may include text information, image information, audio information, link information, and the like.

In an embodiment, the media content may specifically be a video. A video client may run on the terminal, and the video client may recommend videos to the user based on the videos the user has historically viewed. In particular, the video client may generate a video recall request, which may include video identification information of a historically viewed video. The terminal may send the video recall request to the server to trigger the server to perform video recall for the user's historically viewed video, thereby implementing video recommendation for the user.

202. The server obtains target media content and a graph network, wherein the graph network comprises at least one node corresponding to the media content, and the node has content characteristics of the media content in at least one content dimension.

In an embodiment, the server may obtain the target media content according to the content identification information in the media content recall request; for example, the server may search a database for the video according to the video identification information to obtain the user's historically viewed video.

The server may obtain the graph network in various ways, for example, the server may obtain the graph network by extracting the stored data; for another example, the server may obtain the graph network through another server, such as receiving the graph network sent by another server; for another example, the server may obtain the graph network through the terminal, such as receiving the graph network sent by the terminal; and so on.

203. The server searches for associated nodes in the graph network for the target media content to determine an associated node sequence of the target media content, wherein the associated node sequence comprises at least one associated node of associated media content, and the associated media content is media content having an association relation with the target media content.

In an embodiment, the server may update the graph network based on the target media content to obtain an updated graph network, where the updated graph network includes a target node corresponding to the target media content; search for associated nodes of the target node in the updated graph network; and combine the associated nodes to obtain the associated node sequence of the target media content. For example, the associated node sequence may specifically be a random walk sequence.
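The three sub-steps (update the graph, search for associated nodes, combine them into a sequence) can be sketched with a plain random walk. This is a minimal sketch under stated assumptions: the graph is an undirected adjacency mapping, the walk picks a uniformly random neighbor at each step, and the names `add_target_node` and `random_walk_sequence` are hypothetical. The walk strategy of an actual implementation may differ.

```python
import random


def add_target_node(graph, target, linked_nodes):
    """Return an updated copy of the graph that contains the target node,
    connected (undirected) to the given existing nodes."""
    updated = {node: list(neighbors) for node, neighbors in graph.items()}
    updated.setdefault(target, [])
    for node in linked_nodes:
        updated.setdefault(node, [])
        if node not in updated[target]:
            updated[target].append(node)
        if target not in updated[node]:
            updated[node].append(target)
    return updated


def random_walk_sequence(graph, start, length, seed=None):
    """Walk from the start node, choosing a uniformly random neighbor each step,
    and return the visited nodes as the associated node sequence."""
    rng = random.Random(seed)
    sequence = [start]
    current = start
    while len(sequence) < length:
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: stop early
        current = rng.choice(neighbors)
        sequence.append(current)
    return sequence
```

The sequence starts at the target node, which matches the case where the target media content node is the start node of the associated node sequence.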

204. The server performs feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence.

In an embodiment, the server may perform feature fusion on the content features of the associated nodes to obtain intra-node fusion features of the associated nodes; performing feature fusion on the intra-node fusion features based on the node types of the associated nodes to obtain intra-class fusion features corresponding to each node type; and performing feature fusion on the intra-class fusion features to obtain fused features of the associated node sequences.

As an example, in practical applications, the associated node sequence of the target media content, for example a random walk sequence, may be used as the input of a first neural network model, which performs feature fusion on the content features of the associated nodes to obtain the intra-node fusion features of the associated nodes. The obtained intra-node fusion features may then be used as the input of a second neural network model, which performs feature fusion on the intra-node fusion features based on the node types of the associated nodes to obtain the intra-class fusion feature corresponding to each node type. Finally, the obtained intra-class fusion features may be used as the input of a third neural network model, which performs feature fusion on the intra-class fusion features to obtain the fused feature of the associated node sequence.
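The data flow through the three stages can be sketched as follows. To keep the example self-contained, each neural network model is replaced by a plain element-wise mean, which is an assumption of this sketch and not the models described here; only the chaining of inputs and outputs is meant to be illustrative. The node layout (`type`, `content_features`) and all function names are hypothetical, and all features are assumed to share one dimension.

```python
def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def first_model(sequence):
    """Stand-in for the first model: fuse each node's content features across content dimensions."""
    return [mean_vectors(list(node["content_features"].values())) for node in sequence]


def second_model(sequence, intra_node):
    """Stand-in for the second model: fuse intra-node features per node type."""
    groups = {}
    for node, feature in zip(sequence, intra_node):
        groups.setdefault(node["type"], []).append(feature)
    return {node_type: mean_vectors(features) for node_type, features in groups.items()}


def third_model(target_feature, intra_class):
    """Stand-in for the third model: fuse the start node's feature with the intra-class features."""
    return mean_vectors([target_feature] + list(intra_class.values()))


def fused_feature(sequence):
    """Chain the three stages; the first node of the sequence is taken as the target node."""
    intra_node = first_model(sequence)
    intra_class = second_model(sequence, intra_node)
    return third_model(intra_node[0], intra_class)
```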

205. The server determines a target similar node similar to the target media content in the graph network according to the fused features.

In an embodiment, the graph network may specifically be an updated graph network, and the server may determine candidate media content nodes similar to the target media content from the candidate media content nodes in the updated graph network by calculating a distance between the target fused feature and the candidate fused feature, and take the determined candidate media content nodes as target similar nodes similar to the target media content.

206. The server performs content recall processing based on the media content corresponding to the target similar node.

For example, the media content corresponding to the target similar node may itself be used as the target recall media content. As another example, the media content corresponding to the target similar node may be used as reference media content for determining, from a content set, the media content to be recalled; for instance, media content in the content set that is similar or related to the reference media content may be used as the target recall media content; and so on.

207. The server generates content recall response information according to the content recall processing result and sends the content recall response information to the terminal.

For example, the server may determine the recalled target recall media content according to the content recall processing result, generate, based on the target recall media content, content recall response information as a response to the media content recall request, and transmit the content recall response information to the terminal.

In an embodiment, the server may determine the recalled target recall video according to the video recall processing result for the historically viewed video, generate, based on the target recall video, video recall response information as a response to the video recall request sent by the terminal, and send the video recall response information to the terminal, so that the terminal may recommend videos to the user based on the video recall response information.
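The request and response exchange of steps 201 to 207 can be sketched from the server's side. The message layout (`content_id`, `recalled_ids`) and the function name `handle_recall_request` are hypothetical, and `recall_fn` stands for whatever recall processing (steps 203 to 206) the server applies; none of these names come from the application itself.

```python
def handle_recall_request(request, content_store, recall_fn):
    """Look up the target media content named in the request, run recall processing,
    and build the response that is sent back to the terminal.

    request:       dict with a "content_id" key (content identification information)
    content_store: dict mapping content id -> media content
    recall_fn:     callable taking the target media content and returning recalled ids
    """
    content_id = request["content_id"]
    target = content_store.get(content_id)
    if target is None:
        # Unknown content: respond with an empty recall result.
        return {"content_id": content_id, "recalled_ids": []}
    return {"content_id": content_id, "recalled_ids": list(recall_fn(target))}
```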

As can be seen from the above, in the embodiment of the present application, the graph network may be used to generate the fused features corresponding to the target media content, so that content recall processing can be performed for the target media content based on the fused features. Because the fused features are generated based on the graph network, they can represent the association relations of the media content in the topological structure; and because the nodes in the graph network have the content features of the media content in at least one content dimension, the fused features can also represent the association relations of the media content in different content dimensions. In addition, because the scheme searches for the associated nodes of the target media content and performs feature fusion in at least one fusion dimension, the fused features it generates can unify various types of data, multi-modal features, and the topological structure of the graph network. The fused features therefore measure and characterize the target media content from multiple aspects, which can improve the efficiency and accuracy of recall when content is recalled based on them.

In order to better implement the method, correspondingly, the embodiment of the application also provides a recall device of the media content, wherein the recall device of the media content can be integrated in a server or a terminal. The server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and can also be a cloud server for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platform and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like.

For example, as shown in fig. 6, the recall apparatus for media content may include an obtaining unit 301, a searching unit 302, a fusion unit 303, a determining unit 304, and a recall unit 305, as follows:

an obtaining unit 301, configured to obtain target media content and a graph network, where the graph network includes at least one node corresponding to the media content, and the node has a content feature of the media content in at least one content dimension;

a searching unit 302, configured to search for associated nodes in the graph network for the target media content to determine an associated node sequence of the target media content, where the associated node sequence includes at least one associated node of associated media content, and the associated media content is media content having an association relation with the target media content;

the fusion unit 303 may be configured to perform feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence;

a determining unit 304, configured to determine, according to the fused feature, a target similar node similar to the target media content in the graph network;

the recalling unit 305 may be configured to perform content recall processing based on the media content corresponding to the target similar node.

In an embodiment, referring to fig. 7, the fusing unit 303 may include:

the first fusion subunit 3031 may be configured to perform feature fusion on the content features of the associated node to obtain an intra-node fusion feature of the associated node;

the second fusion subunit 3032 may be configured to perform feature fusion on the intra-node fusion features based on the node type of the associated node, to obtain intra-class fusion features corresponding to each node type;

the third fusion subunit 3033 may be configured to perform feature fusion on the intra-class fusion features to obtain fused features of the association node sequence.

In an embodiment, the associated nodes have content characteristics in at least one content dimension; the first fusion subunit 3031 may be configured to:

In an embodiment, the content dimension comprises a target content dimension; the first fusion subunit 3031 may be specifically configured to:

In an embodiment, the node type comprises a target node type; the second fusion subunit 3032 may be configured to:

In an embodiment, the association node sequence takes a target media content node as a starting node, and the target media content node is a node corresponding to the target media content; the third fusion subunit 3033 may be configured to:

In an embodiment, referring to fig. 8, the searching unit 302 may include:

an updating subunit 3021, configured to update the graph network based on the target media content, to obtain an updated graph network, where the updated graph network includes a target node corresponding to the target media content;

a searching subunit 3022, configured to search for associated nodes of the target node in the updated graph network;

the combining subunit 3023 may be configured to combine the associated nodes to obtain an associated node sequence of the target media content.

In one embodiment, the target node comprises a target media content node for characterizing the target media content; the searching subunit 3022 may be configured to:

In an embodiment, referring to fig. 9, the recall unit 305 may include:

a network determination subunit 3051, configured to determine an updated graph network, where the updated graph network is generated based on the target media content and the graph network, and the updated graph network includes a candidate media content node corresponding to at least one candidate media content;

an obtaining subunit 3052, configured to obtain candidate post-fusion features corresponding to the candidate media content nodes;

a calculating subunit 3053, configured to calculate, according to the fused feature, a similarity between the target media content and the candidate media content;

a node determining subunit 3054, configured to determine, based on the calculation result, a target similar node similar to the target media content from the candidate media content nodes.

In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.

As can be seen from the above, in the recall apparatus for media content of this embodiment, the obtaining unit 301 is configured to obtain the target media content and a graph network, where the graph network includes at least one node corresponding to media content, and the node has content features of the media content in at least one content dimension; the searching unit 302 is configured to search for associated nodes in the graph network for the target media content to determine an associated node sequence of the target media content, where the associated node sequence includes at least one associated node of associated media content, and the associated media content is media content having an association relation with the target media content; the fusion unit 303 is configured to perform feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence; the determining unit 304 is configured to determine a target similar node similar to the target media content in the graph network according to the fused features; and the recall unit 305 is configured to perform content recall processing based on the media content corresponding to the target similar node.

According to the scheme, the graph network may be used to generate the fused features corresponding to the target media content, so that content recall processing can be performed for the target media content based on the fused features. Because the fused features are generated based on the graph network, they can represent the association relations of the media content in the topological structure; and because the nodes in the graph network have the content features of the media content in at least one content dimension, the fused features can also represent the association relations of the media content in different content dimensions. In addition, because the scheme searches for the associated nodes of the target media content and performs feature fusion in at least one fusion dimension, the fused features it generates can unify various types of data, multi-modal features, and the topological structure of the graph network. The fused features therefore measure and characterize the target media content from multiple aspects, which can improve the efficiency and accuracy of recall when content is recalled based on them.

In addition, the embodiment of the present application further provides a computer device, which may be a server or a terminal, where the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. Fig. 10 is a schematic diagram showing a structure of a computer device according to an embodiment of the present application, specifically:

the computer device may include components such as a memory 401 including one or more computer-readable storage media, an input unit 402, a processor 403 including one or more processing cores, and a power supply 404. Those skilled in the art will appreciate that the computer device architecture illustrated in FIG. 10 does not limit the computer device, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently. Wherein:

the memory 401 may be used to store software programs and modules, and the processor 403 executes various functional applications and data processing by operating the software programs and modules stored in the memory 401. The memory 401 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the computer device, and the like. Further, the memory 401 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 401 may further include a memory controller to provide the processor 403 and the input unit 402 with access to the memory 401.

The input unit 402 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In one particular embodiment, the input unit 402 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near it (e.g., operations performed by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) and drive the corresponding connection device according to a predetermined program. Optionally, the touch-sensitive surface may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 403, and can receive and execute commands sent by the processor 403. In addition, the touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. The input unit 402 may include other input devices in addition to the touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The processor 403 is the control center of the computer device. It connects the various parts of the entire computer device by using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the computer device as a whole. Optionally, the processor 403 may include one or more processing cores; preferably, the processor 403 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 403.

The computer device also includes a power supply 404 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically coupled to the processor 403 via a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 404 may also include any component such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, or a power status indicator.

Although not shown, the computer device may further include a camera, a Bluetooth module, and the like, which are not described herein. Specifically, in this embodiment, the processor 403 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 401 according to the following instructions, and the processor 403 runs the application program stored in the memory 401, so as to implement the following functions:

acquiring target media content and a graph network, wherein the graph network comprises at least one node corresponding to media content, and the node has content features of the media content in at least one content dimension; performing an associated node search in the graph network for the target media content to determine an associated node sequence of the target media content, wherein the associated node sequence comprises at least one associated node corresponding to associated media content, and the associated media content is media content that has an association relationship with the target media content; performing feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence; determining, according to the fused features, a target similar node in the graph network that is similar to the target media content; and performing content recall processing based on the media content corresponding to the target similar node.

For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
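The functions listed above can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the associated node search is approximated by a short random walk, feature fusion by a plain average, and similarity by cosine similarity, all of which are assumptions made for illustration. The names `random_walk`, `mean_fuse`, `cosine`, and `recall`, as well as the sample graph and feature vectors, are hypothetical.

```python
import math
import random


def random_walk(graph, start, length, rng):
    """Assumed associated node search: a walk of at most `length` nodes from `start`."""
    path = [start]
    while len(path) < length:
        neighbors = graph.get(path[-1], [])
        if not neighbors:
            break
        path.append(rng.choice(neighbors))
    return path


def mean_fuse(vectors):
    """Assumed fusion step: element-wise average of the content feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(graph, features, target, k=2, walk_length=4, seed=0):
    """Return the k nodes whose fused features are most similar to the target's."""
    rng = random.Random(seed)
    # One associated node sequence per node, fused into a single vector.
    fused = {
        node: mean_fuse([features[n] for n in random_walk(graph, node, walk_length, rng)])
        for node in graph
    }
    scored = sorted(
        ((cosine(fused[target], vec), node) for node, vec in fused.items() if node != target),
        key=lambda item: (-item[0], item[1]),
    )
    return [node for _, node in scored[:k]]


# Hypothetical graph: three connected video nodes and two unrelated article nodes.
graph = {
    "v1": ["v2", "v3"], "v2": ["v1", "v3"], "v3": ["v1", "v2"],
    "a1": ["a2"], "a2": ["a1"],
}
features = {
    "v1": [1.0, 0.0], "v2": [0.9, 0.1], "v3": [0.8, 0.2],
    "a1": [0.0, 1.0], "a2": [0.1, 0.9],
}
print(recall(graph, features, "v1", k=2))
```

Because each walk stays inside its connected component, the nodes recalled for `v1` in this toy graph are the other two video nodes; their order depends on the random seed.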

As can be seen from the above, the computer device of this embodiment may use the graph network to generate the fused features corresponding to the target media content, and then perform content recall processing for the target media content based on the fused features. Because the fused features are generated from the graph network, they can represent the association relationships of the media content in the topological structure; and because the nodes in the graph network have the content features of the media content in at least one content dimension, the fused features can also represent the association relationships of the media content across different content dimensions. In addition, because the computer device searches for the associated nodes of the target media content and performs feature fusion in at least one fusion dimension, the fused features unify multiple types of data, multi-modal features, and the topological structure of the graph network. The fused features therefore measure and represent the target media content from multiple aspects, which can improve both the efficiency and the accuracy of recall when the computer device recalls media content based on them.
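The fusion levels named in the claims (intra-node fusion across content dimensions, intra-class fusion per node type, and a final weighted fusion with the target node) can be sketched as follows. This is an illustrative sketch only: the averaging at each level stands in for the feature extraction of the embodiments, the fusion weights are supplied by hand, and excluding the first node of the sequence from the intra-class step is an assumption. The function names and the dictionary layout of a node are hypothetical.

```python
from collections import defaultdict


def mean(vectors):
    """Element-wise average of equally sized vectors."""
    vectors = list(vectors)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def intra_node_fusion(node):
    """Fuse one node's features across its content dimensions (e.g. text, image)."""
    return mean(node["dims"].values())


def intra_class_fusion(nodes):
    """Group nodes by node type and fuse the intra-node features within each type."""
    by_type = defaultdict(list)
    for node in nodes:
        by_type[node["type"]].append(intra_node_fusion(node))
    return {node_type: mean(vectors) for node_type, vectors in by_type.items()}


def fuse_sequence(sequence, weights):
    """Weighted fusion of the target node (first in the sequence) with each class feature."""
    target_vector = intra_node_fusion(sequence[0])
    class_vectors = intra_class_fusion(sequence[1:])
    parts = [(weights["target"], target_vector)]
    parts += [(weights[node_type], vec) for node_type, vec in class_vectors.items()]
    total = sum(weight for weight, _ in parts)
    return [
        sum(weight * vec[i] for weight, vec in parts) / total
        for i in range(len(target_vector))
    ]


# Hypothetical associated node sequence that starts with the target media content node.
sequence = [
    {"type": "video", "dims": {"text": [1.0, 0.0], "image": [0.0, 1.0]}},
    {"type": "video", "dims": {"text": [1.0, 1.0]}},
    {"type": "tag", "dims": {"text": [0.0, 2.0]}},
]
weights = {"target": 2.0, "video": 1.0, "tag": 1.0}
fused = fuse_sequence(sequence, weights)
```

In a trained model the weights and the per-level extraction would be learned rather than fixed; the sketch only shows how the three levels compose.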

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

To this end, embodiments of the present application provide a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the media content recall methods provided in the embodiments of the present application. For example, the instructions may perform the steps of:

The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Since the instructions stored in the computer-readable storage medium can execute the steps of any media content recall method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method. For details, see the foregoing embodiments, which are not repeated here.

According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the media content recall aspect described above.

The method, apparatus, device, storage medium, and product for recalling media content provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, following the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A method for recalling media content, comprising:

2. The method for recalling media content according to claim 1, wherein performing feature fusion on the content features of the associated nodes in the associated node sequence in at least one fusion dimension to obtain fused features of the associated node sequence comprises:

and performing feature fusion on the intra-class fusion features to obtain fused features of the associated node sequence.

3. The method for recalling media content according to claim 2, wherein the associated node has content features in at least one content dimension;

performing feature fusion on the content features of the associated nodes to obtain intra-node fusion features of the associated nodes, including:

performing feature extraction on the content features to obtain extracted features corresponding to each content dimension;

4. The method of claim 3, wherein the content dimension comprises a target content dimension;

performing feature extraction on the content features to obtain extracted features corresponding to the target content dimensions, including:

5. The method for recalling media content according to claim 2, wherein the node type comprises a target node type;

based on the node type of the associated node, performing feature fusion on the intra-node fusion features to obtain intra-class fusion features corresponding to the target node type, including:

determining a target associated node of the target node type;

performing feature extraction on the intra-node fusion features of the target associated nodes to obtain extracted features corresponding to the target node type;

6. The method for recalling media content according to claim 2, wherein the associated node sequence starts with a target media content node, the target media content node being the node corresponding to the target media content;

performing feature fusion on the intra-class fusion features to obtain fused features of the associated node sequence, including:

acquiring the target intra-node fusion features corresponding to the target media content node in the associated node sequence;

and performing feature fusion on the target intra-node fusion features and the intra-class fusion features based on the fusion weight to obtain fused features of the associated node sequence.

7. The method for recalling media content according to claim 1, wherein performing an associated node search in the graph network for the target media content to determine an associated node sequence of the target media content comprises:

searching the associated node of the target node in the updated graph network;

8. The method of claim 7, wherein the target node comprises a target media content node, the target media content node being configured to characterize the target media content;

searching the associated node of the target node in the updated graph network, including:

determining a node access path with the target media content node as an initial node in the updated graph network;

9. The method for recalling media content according to claim 1, wherein determining a target similar node similar to the target media content in the graph network according to the fused feature comprises:

10. An apparatus for recalling media content, comprising:

11. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations of the method for recalling media content according to any one of claims 1 to 9.

12. A computer-readable storage medium storing instructions adapted to be loaded by a processor to perform the steps of the method for recalling media content according to any one of claims 1 to 9.

13. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps in the method for recalling media content according to any of claims 1 to 9.