CN111984803B

CN111984803B - Multimedia resource processing method and device, computer equipment and storage medium

Info

Publication number: CN111984803B
Application number: CN202010845865.3A
Authority: CN
Inventors: 张志伟; 李岩; 吴丽军
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2021-08-17
Anticipated expiration: 2040-08-20
Also published as: CN111984803A

Abstract

The disclosure relates to a multimedia resource processing method, a multimedia resource processing device, computer equipment and a storage medium, and belongs to the technical field of computers. Considering that a user tends to give feedback on the same or similar multimedia resources, the multimedia resource processing method provided by the embodiments of the disclosure introduces behavior data as an information source, so that it is not limited to existing sample data and breaks through the data bottleneck. The behavior data is converted into graph data, and accounts and multimedia resources are mapped to nodes in a graph, so that the connection relations between the nodes clearly represent the feedback relations between a plurality of accounts and a plurality of multimedia resources.

Description

Multimedia resource processing method and device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a multimedia resource processing method and apparatus, a computer device, and a storage medium.

Background

With the development of computer technology, deep learning has made a breakthrough in the fields of natural language processing, text translation and other related content understanding.

In the related art, a multimedia resource processing method generally includes the following steps: after the multimedia resources are obtained, they are input into a multimedia resource processing model; the model performs feature extraction on the content of the multimedia resources and classifies them based on the extracted features to determine their classification results.

However, these developments depend heavily on the scale of the training data. With little training data, it is difficult to train a multimedia resource processing model with high accuracy, and if training samples are added by manual labeling, a great deal of manpower is required and efficiency is low. Data has therefore become the most important bottleneck in applying these techniques to actual production environments.

Disclosure of Invention

The present disclosure provides a multimedia resource processing method and apparatus, a computer device and a storage medium, offering a novel way of processing multimedia resources that breaks through the data bottleneck and has high classification efficiency. The technical scheme of the disclosure is as follows:

According to a first aspect of the embodiments of the present disclosure, there is provided a multimedia resource processing method, including:

acquiring target behavior data, wherein the target behavior data is used for expressing feedback relations between a plurality of accounts and a plurality of multimedia resources, and a first multimedia resource in the plurality of multimedia resources corresponds to a target classification result;

generating graph data according to the target behavior data, wherein an account node corresponding to any account in the graph data is connected with a multimedia resource node corresponding to a target multimedia resource, and the target multimedia resource and the any account have the feedback relationship;

and obtaining a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relation between the multimedia resource node and the account node in the graph data, wherein the second multimedia resource is a multimedia resource except the first multimedia resource in the plurality of multimedia resources.

Optionally, the generating graph data according to the target behavior data includes:

mapping a plurality of accounts represented by the target behavior data into a plurality of account nodes in the graph data;

mapping a plurality of multimedia resources represented by the target behavior data into a plurality of multimedia resource nodes in the graph data;

and connecting the account node corresponding to the account with the feedback relationship with the multimedia resource node corresponding to the multimedia resource to obtain the edge in the graph data.
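As an illustration only, the mapping described above can be sketched in Python. The input layout (a list of account identifier and resource identifier pairs that have a feedback relationship) and the function name build_graph are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import defaultdict

def build_graph(feedback_pairs):
    """Build a bipartite graph from (account_id, resource_id) feedback pairs.

    Returns the account node set, the resource node set, and an
    adjacency mapping in which every edge joins one account node
    to one multimedia resource node.
    """
    account_nodes = set()
    resource_nodes = set()
    adjacency = defaultdict(set)
    for account_id, resource_id in feedback_pairs:
        # Prefix node keys so an account and a resource that share
        # the same raw identifier remain distinct nodes.
        a = ("account", account_id)
        r = ("resource", resource_id)
        account_nodes.add(a)
        resource_nodes.add(r)
        adjacency[a].add(r)
        adjacency[r].add(a)
    return account_nodes, resource_nodes, dict(adjacency)

# Hypothetical feedback pairs: u1 gave feedback on p1 and p2, u2 on p2.
pairs = [("u1", "p1"), ("u1", "p2"), ("u2", "p2")]
accounts, resources, adj = build_graph(pairs)
```

In this sketch, two resources that received feedback from the same account (here p1 and p2 through u1) end up two edges apart, which is the structure the later classification step relies on.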

Optionally, the obtaining a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and an account node in the graph data includes:

inputting the graph data into a graph neural network, and training the graph neural network based on a target classification result of the first multimedia resource node in the graph data;

and based on the trained graph neural network, performing feature extraction on the second multimedia resource node in the graph data according to target data corresponding to the connection relation, classifying based on the extracted features, and outputting a classification result of the second multimedia resource node.

Optionally, the training the graph neural network based on the target classification result of the first multimedia resource node in the graph data includes:

based on the graph neural network, performing feature extraction on the first multimedia resource node according to first data corresponding to the connection relation, and classifying based on the extracted first feature to obtain a prediction classification result of the first multimedia resource node;

obtaining prediction accuracy according to the prediction classification result and the target classification result of the first multimedia resource node;

and adjusting the network parameters of the graph neural network according to the prediction accuracy until the network parameters meet the target conditions.
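The predict, compare and adjust loop above can be illustrated with a deliberately small NumPy sketch. It is not the network of the disclosure: it swaps in a simplified linear graph convolution (two propagation steps over a normalized adjacency matrix followed by a softmax classifier), uses one-hot node features, adjusts parameters by gradient descent on a cross-entropy loss, and takes "all labeled nodes predicted correctly" as the target condition. The toy graph, learning rate and stopping rule are all assumptions.

```python
import numpy as np

# Toy graph: nodes 0-1 are accounts, nodes 2-5 are multimedia resources.
edges = [(0, 2), (0, 3), (1, 4), (1, 5)]
n_nodes, n_classes = 6, 2
labeled = np.array([2, 4])   # first multimedia resource nodes
targets = np.array([0, 1])   # their target classification results

A = np.eye(n_nodes)          # adjacency matrix with self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
S = A_hat @ A_hat            # two hops: resource -> account -> resource

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One-hot node features (X = I), so S @ X @ W reduces to S @ W.
W = np.zeros((n_nodes, n_classes))
Y = np.eye(n_classes)[targets]
for step in range(200):
    probs = softmax(S[labeled] @ W)
    accuracy = float((probs.argmax(axis=1) == targets).mean())
    if accuracy >= 1.0:      # assumed target condition
        break
    # Gradient of the mean cross-entropy loss with respect to W.
    grad = S[labeled].T @ (probs - Y) / len(labeled)
    W -= 0.5 * grad

# Classify every node, including the unlabeled resource nodes 3 and 5.
predicted = softmax(S @ W).argmax(axis=1)
```

Because resources 2 and 3 share an account, and resources 4 and 5 share another, the unlabeled nodes 3 and 5 inherit the classes of their labeled neighbors in this toy setting.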

Optionally, the performing, based on the trained graph neural network, feature extraction on the second multimedia resource node in the graph data according to target data corresponding to the connection relationship includes:

and performing feature extraction on data on at least one target path taking the second multimedia resource node as a starting point in the graph data based on the trained graph neural network, wherein the number of nodes on the target path is a first target threshold value, and the number of edges on the target path is a second target threshold value.
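One possible reading of a "target path" is a path that starts at a given resource node and has a fixed number of edges, and hence a fixed number of nodes. The sketch below enumerates such paths under that reading; the function name paths_from, the restriction to simple paths (no repeated node) and the toy adjacency mapping are assumptions, not details from the disclosure.

```python
def paths_from(adjacency, start, num_edges):
    """Return every simple path that starts at `start` and has exactly
    `num_edges` edges (and therefore num_edges + 1 nodes)."""
    paths = [[start]]
    for _ in range(num_edges):
        extended = []
        for path in paths:
            for neighbor in sorted(adjacency.get(path[-1], ())):
                if neighbor not in path:   # keep the path simple
                    extended.append(path + [neighbor])
        paths = extended
    return paths

# Hypothetical bipartite graph: p* are resources, u* are accounts.
adjacency = {
    "p1": {"u1"},
    "p2": {"u1", "u2"},
    "p3": {"u2"},
    "u1": {"p1", "p2"},
    "u2": {"p2", "p3"},
}
two_hop_p1 = paths_from(adjacency, "p1", 2)
two_hop_p2 = paths_from(adjacency, "p2", 2)
```

With two edges, each path runs resource, account, resource, so it links the starting resource to another resource that received feedback from the same account.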

Optionally, the obtaining target behavior data includes:

acquiring a behavior log, wherein the behavior log comprises behavior data of a plurality of account numbers, and the behavior data of each account number is used for indicating whether the account number has a feedback relationship with a plurality of multimedia resources;

and determining the behavior data corresponding to any account and any multimedia resource as first target behavior data in response to the fact that the account and the multimedia resource have a feedback relationship.

Optionally, the first target behavior data includes a plurality of account identifiers, a plurality of multimedia resource identifiers, and a feedback relationship between an account corresponding to any account identifier and a multimedia resource corresponding to any multimedia resource identifier;

the acquiring of the target behavior data further includes:

acquiring a plurality of multimedia resources corresponding to the multimedia resource identifications according to the multimedia resource identifications;

extracting features of the multimedia resources, classifying the multimedia resources based on the extracted features to obtain a classification result of each multimedia resource, wherein the classification result is used for indicating a predicted multimedia resource label corresponding to each multimedia resource and a probability value of the predicted multimedia resource label of each multimedia resource;

determining the multimedia resource with the probability value larger than the threshold value of the probability value in the classification result as a first multimedia resource, determining the classification result of the first multimedia resource as a target classification result corresponding to the first multimedia resource, and determining the target classification result corresponding to the first multimedia resource as second target behavior data.
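The threshold-based selection of first multimedia resources can be sketched as follows. The dictionary layout, the label strings and the threshold value 0.9 are hypothetical and only serve the illustration.

```python
def select_first_resources(classification_results, probability_threshold):
    """Keep resources whose predicted-label probability exceeds the threshold.

    `classification_results` maps a resource identifier to a
    (predicted_label, probability) pair; the return value maps each
    selected resource identifier to its target classification result.
    """
    return {
        resource_id: label
        for resource_id, (label, probability) in classification_results.items()
        if probability > probability_threshold
    }

# Hypothetical classifier output for three resources.
results = {
    "p1": ("pets", 0.97),
    "p2": ("food", 0.55),
    "p3": ("sports", 0.91),
}
target_results = select_first_resources(results, probability_threshold=0.9)
```

Resources that fall at or below the threshold (p2 here) are left without a target classification result and are the ones to be classified later through the graph.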

Optionally, the probability value threshold is set by a relevant technician according to requirements, or the probability value threshold is determined by statistics of probability values in the classification results of the plurality of multimedia resources, or the probability value threshold is determined by statistics of probability values in the classification results of other plurality of multimedia resources.

Optionally, the behavior log is a behavior log in a target time period.

According to a second aspect of the embodiments of the present disclosure, there is provided a multimedia resource processing apparatus, including:

an acquisition unit configured to acquire target behavior data, wherein the target behavior data is used for representing feedback relations between a plurality of accounts and a plurality of multimedia resources, and a first multimedia resource in the plurality of multimedia resources corresponds to a target classification result;

a generating unit configured to generate graph data according to the target behavior data, wherein an account node corresponding to any account in the graph data is connected with a multimedia resource node corresponding to a target multimedia resource, and the target multimedia resource and the any account have the feedback relationship;

and a processing unit configured to obtain a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and the account node in the graph data, wherein the second multimedia resource is a multimedia resource other than the first multimedia resource in the plurality of multimedia resources.

Optionally, the generating unit is configured to perform:

Optionally, the processing unit comprises a training subunit and a classification subunit;

the training subunit is configured to input the graph data into a graph neural network, and train the graph neural network based on a target classification result of the first multimedia resource node in the graph data;

the classification subunit is configured to perform feature extraction on the second multimedia resource node in the graph data according to target data corresponding to the connection relation based on the trained graph neural network, perform classification based on the extracted features, and output a classification result of the second multimedia resource node.

Optionally, the training subunit is configured to perform:

Optionally, the classification subunit is configured to perform, based on the trained graph neural network, feature extraction on data on at least one target path in the graph data that takes the second multimedia resource node as a starting point, wherein the number of nodes on the target path is a first target threshold, and the number of edges on the target path is a second target threshold.

Optionally, the acquisition unit is configured to perform:

the acquisition unit is further configured to perform:

Optionally, the behavior log is a behavior log in a target time period.

According to a third aspect of embodiments of the present disclosure, there is provided a computer device comprising:

one or more processors;

one or more memories for storing the one or more processor-executable program codes;

wherein the one or more processors are configured to execute the program code to implement any of the multimedia resource processing methods described above.

According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having program code embodied therein, which when executed by one or more processors of a computer device, enables the computer device to perform any one of the above-described multimedia resource processing methods.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising one or more program codes which, when executed by one or more processors of a computer device, enable the computer device to perform any of the multimedia resource processing methods described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

The embodiments of the disclosure provide a novel way of processing multimedia resources. Considering that a user tends to give feedback on the same or similar multimedia resources, behavior data is introduced as an information source, so that the method is not limited to existing sample data and breaks through the data bottleneck. The behavior data can represent the feedback relationship between each account and each multimedia resource, and some of the multimedia resources in the behavior data have known classification results. By organizing the behavior data, the behavior data is converted into graph data in which the accounts and the multimedia resources are mapped to nodes, so that the connection relationships between the nodes clearly represent the feedback relationships between a plurality of accounts and a plurality of multimedia resources. These connection relationships provide an overall understanding of user behavior, so that multimedia resources with unknown classification results can be accurately classified according to the multimedia resources with known classification results. Training samples do not need to be added through manual labeling, all the multimedia resources can be classified using existing data, and the classification efficiency is high.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a schematic diagram illustrating the structure of a graph-convolutional neural network, according to an exemplary embodiment;

FIG. 2 is a diagram illustrating an implementation environment of a multimedia resource processing method, according to an exemplary embodiment;

FIG. 3 is a flow diagram illustrating a multimedia resource processing method according to an exemplary embodiment;

FIG. 4 is a flow diagram illustrating a multimedia resource processing method according to an exemplary embodiment;

FIG. 5 is a block diagram illustrating a multimedia resource processing device according to an exemplary embodiment;

FIG. 6 is a block diagram illustrating the structure of a terminal according to one exemplary embodiment;

FIG. 7 is a schematic diagram illustrating a configuration of a server according to an exemplary embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," "third," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The user information to which the present disclosure relates may be information authorized by the user or sufficiently authorized by each party.

The following terms related to the present disclosure are explained as follows.

(1) Graph data

Graph data refers to data stored in the form of a graph, and is also referred to as a graph model, graph representation, or graph-structured data. The graph data includes at least one node and at least one edge; each node has corresponding features, and each edge represents a connection relationship between different nodes. Optionally, the graph data is defined as G = (V, E), where G represents the graph data, V represents the set of nodes in the graph data, and E represents the set of edges in the graph data. Optionally, the edges of the graph data have weights, and the weight of an edge represents an attribute of the connection relationship.

(2) Graph neural network

A graph neural network (GNN) is a general term for neural network models that operate on graphs, and includes the graph convolutional network (GCN), the graph attention network, and the like. The graph neural network is used for predicting the category of graph data according to the structural features of the graph data. In particular, the graph neural network may include one or more feature extraction layers, for example graph convolution layers (GCL), which extract structural features of the graph data. If two graphs are isomorphic, their graph data will have similar structural features after passing through the feature extraction layers. If two graphs are not isomorphic, their graph data will have different structural features after passing through the feature extraction layers. Thus, the graph neural network is able to map graph structures with isomorphic properties into the same representation domain and output the same class. For example, FIG. 1 is a schematic structural diagram illustrating a graph convolutional neural network according to an exemplary embodiment. As shown in FIG. 1, in a specific example, the data input to the graph convolutional neural network (GCN) is graph data, which includes node data and edge data of a graph; the node data is X, and the edge data is an adjacency matrix A. The data is processed by a ReLU function after each layer of the graph convolutional neural network, and the outputs (Outputs) are obtained at the last layer. Optionally, in this embodiment of the present disclosure, the node data X is included in the adjacency matrix A, and the rows and columns of the adjacency matrix A are indexed by the node data X.

(3) Graph convolution network

Graph convolution networks are a type of graph neural network that employs graph convolution. The graph convolution network includes at least one graph convolution layer. The graph convolution layer functions similarly to a feature extractor, where the object of feature extraction is graph data and the extracted features are structural features contained in the graph data. Specifically, the graph convolution layer includes a plurality of convolution operators, the convolution operators are also called convolution kernels, the convolution kernels can be essentially a weight matrix, weight values in the weight matrix are obtained through a model training stage, and each weight matrix formed by the trained weight values can be used for extracting features from input graph data, so that the graph convolution network can perform correct prediction in an application stage.

Optionally, the graph data is input to the graph convolution network in the form of matrices, and nodes, edges and weights in the graph data are represented by values in the matrices. For example, the features of the nodes in the graph data are represented by a matrix X of dimension N × D, where N represents the number of nodes in the graph data, i.e., the number of nodes included in the set V described in (1) above, and D represents the dimension of the feature vector of each node. A feature value in the feature vector of a node is, for example, a value of an attribute of the node. For example, if a node has three attributes, the feature vector of the node includes three attribute values corresponding to the three attributes, and D is 3. The features of the edges in the graph data are represented by an adjacency matrix A of dimension N × N. For example, if there is an edge between two nodes, the entry of the adjacency matrix A corresponding to the two nodes is 1, and if there is no edge between the two nodes, the entry is 0; the connection relationship between any two of the N nodes can thus be specified by the adjacency matrix A. N is a positive integer.
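As a small illustration of this matrix representation, the following NumPy sketch builds an N × N adjacency matrix A and an N × D feature matrix X. The node names, edge list and placeholder attribute values are hypothetical.

```python
import numpy as np

# Hypothetical node order: two accounts followed by three resources.
nodes = ["u1", "u2", "p1", "p2", "p3"]
index = {name: i for i, name in enumerate(nodes)}
edges = [("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u2", "p3")]

N = len(nodes)
A = np.zeros((N, N), dtype=int)
for a, b in edges:
    A[index[a], index[b]] = 1
    A[index[b], index[a]] = 1   # undirected edge, so A is symmetric

# Each node carries D = 3 attribute values; the values are placeholders.
D = 3
X = np.zeros((N, D))
X[index["u1"]] = [1.0, 0.0, 0.0]
```

Row i of X is the feature vector of node i, and A[i, j] is 1 exactly when nodes i and j are joined by an edge.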

(4) Graph convolution processing

The feature extraction function of a graph convolution layer is realized through graph convolution processing. Graph convolution processing is an operation of performing a nonlinear transformation on input data. For the first graph convolution layer of the graph convolution network, the input data of the graph convolution processing is the graph data; for each of the second through last graph convolution layers, the input data is the output of the previous graph convolution layer.
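The layer-by-layer chaining described above can be sketched in NumPy as follows. The propagation rule used here (symmetric normalization of the adjacency matrix with self-loops, multiplication by a weight matrix, then ReLU) is the commonly used graph convolution form and is an assumption; the disclosure does not spell out the exact formula. Matrix sizes and random weights are placeholders.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops and apply symmetric degree normalization."""
    A_loop = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_loop.sum(axis=1))
    return A_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_convolution(A_hat, H, W):
    """One layer: aggregate neighbor features, apply weights, then ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))    # node features, N = 3 and D = 4
W1 = rng.normal(size=(4, 8))   # untrained placeholder weights
W2 = rng.normal(size=(8, 2))

A_hat = normalize_adjacency(A)
H1 = graph_convolution(A_hat, X, W1)    # first layer takes the graph data
H2 = graph_convolution(A_hat, H1, W2)   # next layer takes the previous output
```

In a trained network the weight matrices W1 and W2 would be the convolution kernels learned in the training stage rather than random values.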

Fig. 2 is a schematic diagram of an implementation environment of a multimedia resource processing method according to an exemplary embodiment, and as shown in fig. 2, the implementation environment includes at least one terminal 101 and a multimedia resource processing platform 110. At least one terminal 101 is connected to the multimedia resource processing platform 110 through a wireless network or a wired network.

The multimedia resource processing platform 110 is, for example, at least one of a terminal, one or more servers, a cloud computing platform, and a virtualization center.

The terminal 101 is, for example, at least one of a smartphone, a game console, a desktop computer, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. A multimedia resource application is installed and runs on the terminal. The application may be a client application or a browser application.

The multimedia resource processing platform 110 is used for providing multimedia resource services for the terminal 101. In particular, the terminal 101 can obtain a multimedia resource from the multimedia resource processing platform 110 for presentation. The terminal 101 is also capable of publishing multimedia resources to the multimedia resource processing platform 110. The multimedia resource processing platform 110 can obtain the behavior data of accounts from at least one terminal 101, and process multimedia resources based on the behavior data. For example, the multimedia resource processing platform 110 can obtain click behavior data of users on multimedia resources, and determine corresponding tags for multimedia resources that lack a multimedia resource tag according to the click behavior data and the multimedia resources that already have multimedia resource tags.

Generally, a user's preferences are relatively clear: the user tends to view the same or similar multimedia resources and to perform feedback operations on the same or similar multimedia resources. If a user performs feedback operations on two multimedia resources, the contents of the two multimedia resources may be the same or similar, and thus their classification results may be the same. In consideration of this characteristic, the embodiments of the present disclosure introduce behavior data as a data basis for multimedia resource classification. A star-shaped network is formed with the user at the center, so that when one multimedia resource in the behavior data has a multimedia resource label, other similar multimedia resources can be given the same multimedia resource label.

The following describes the flow of the multimedia resource processing method through the embodiments shown in fig. 3 and fig. 4.

Fig. 3 is a flowchart illustrating a multimedia resource processing method according to an exemplary embodiment, where the multimedia resource processing method is used in a computer device, such as a terminal or a server, as shown in fig. 3, and includes the following steps.

In step S31, target behavior data is obtained, where the target behavior data is used to represent feedback relationships between multiple accounts and multiple multimedia resources, and a first multimedia resource in the multiple multimedia resources corresponds to a target classification result.

In step S32, graph data is generated according to the target behavior data, where an account node corresponding to any account in the graph data is connected to a multimedia resource node corresponding to a target multimedia resource, and the target multimedia resource has the feedback relationship with the any account.

In step S33, a classification result of a second multimedia resource node corresponding to a second multimedia resource is obtained based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and the account node in the graph data, where the second multimedia resource is a multimedia resource other than the first multimedia resource in the plurality of multimedia resources.

Optionally, generating graph data according to the target behavior data includes:

mapping a plurality of multimedia resources represented by the target behavior data into a plurality of multimedia resource nodes in the graph data;

Optionally, the obtaining a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and the account node in the graph data includes:

obtaining the prediction accuracy according to the prediction classification result and the target classification result of the first multimedia resource node;

and adjusting the network parameters of the graph neural network according to the prediction accuracy until the network parameters meet the target conditions.

Optionally, the performing, based on the trained graph neural network, feature extraction on the second multimedia resource node in the graph data according to the target data corresponding to the connection relationship includes:

and based on the trained graph neural network, performing feature extraction on data on at least one target path which takes the second multimedia resource node as a starting point in the graph data, wherein the number of nodes on the target path is a first target threshold value, and the number of edges on the target path is a second target threshold value.

Optionally, the obtaining target behavior data includes:

the acquiring of the target behavior data further includes:

extracting the characteristics of the multimedia resources, classifying the multimedia resources based on the extracted characteristics to obtain a classification result of each multimedia resource, wherein the classification result is used for indicating a predicted multimedia resource label corresponding to each multimedia resource and a probability value of the predicted multimedia resource label of each multimedia resource;

and determining the multimedia resource with the probability value larger than the threshold value of the probability value in the classification result as a first multimedia resource, determining the classification result of the first multimedia resource as a target classification result corresponding to the first multimedia resource, and determining the target classification result corresponding to the first multimedia resource as second target behavior data.

Optionally, the behavior log is a behavior log in a target time period.

Fig. 4 is a flowchart illustrating a multimedia resource processing method according to an exemplary embodiment, where the multimedia resource processing method is used in a computer device, as shown in fig. 4, and includes the following steps.

In step S41, the computer device obtains a behavior log, where the behavior log includes behavior data of a plurality of accounts, and the behavior data of each account is used to indicate whether the account has a feedback relationship with a plurality of multimedia resources.

For the behavior log, the user can log in an account in a multimedia resource application of the terminal, and one or more multimedia resources are displayed in the multimedia resource application. The terminal can generate a behavior log of the account according to behavior data such as the condition of displaying the multimedia resource on the terminal and any operation of the multimedia resource by the user, wherein the behavior log of the account comprises the behavior data of the account. The terminal can also send the behavior logs of the account to the server, and the server merges the received behavior logs of the plurality of accounts into a total behavior log which comprises the behavior data of the plurality of accounts.

Optionally, the process of generating the behavior log can also be executed by the server, that is, the terminal can synchronize behavior data, such as a condition that the multimedia resource is displayed on the terminal and any operation performed on the multimedia resource by the user, to the server, and the server generates the behavior log according to the behavior data of the plurality of accounts sent by the plurality of terminals.

For the operation of the user on the multimedia resource, the user can perform a feedback operation on the multimedia resource. If the user performs a feedback operation on a certain multimedia resource, it is determined that a feedback relationship exists between the account of the user and the multimedia resource. The terminal can record the feedback relationship in the behavior log, or the terminal sends the feedback relationship to the server and the server records it. Optionally, the feedback operation is at least one of a comment operation, a reply operation, and a share operation.

In some embodiments, in the behavior log each account is uniquely identified by an account identifier and each multimedia resource is uniquely identified by a multimedia resource identifier. Accordingly, the behavior log includes the account identifiers of a plurality of accounts, the multimedia resource identifiers of the multimedia resources displayed to those accounts, and whether each account performed a feedback operation on each multimedia resource, that is, whether each account has a feedback relationship with each multimedia resource.

For the multimedia resource, the multimedia resource is any one of text, an image, a video, and a short video; the embodiments of the present disclosure do not limit the type of the multimedia resource.

For example, in one specific example, the multimedia resource is a video that includes a plurality of image frames and is denoted photo in the records. The behavior log may record behavior data in the form <user, photo, click>, where user is the account identifier, photo is the multimedia resource identifier, and click is the feedback relationship. Optionally, click = 1 indicates that a feedback relationship exists between the account corresponding to the account identifier and the multimedia resource corresponding to the multimedia resource identifier, that is, the account performed a feedback operation on the multimedia resource. click = 0 indicates that no such feedback relationship exists, that is, the account did not perform a feedback operation on the multimedia resource.
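
As a minimal sketch of the record format above, the following Python snippet models one <user, photo, click> entry. The class name `BehaviorRecord`, the helper `parse_record`, the field types, and the comma-separated text layout are assumptions made for illustration; the disclosure does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRecord:
    """One <user, photo, click> entry of the behavior log (hypothetical layout)."""
    user: str   # account identifier
    photo: str  # multimedia resource identifier
    click: int  # 1 = feedback relationship exists, 0 = it does not


def parse_record(line: str) -> BehaviorRecord:
    """Parse a line such as 'u1,p7,1' (assumed comma-separated format)."""
    user, photo, click = (part.strip() for part in line.split(","))
    if click not in ("0", "1"):
        raise ValueError(f"click must be 0 or 1, got {click!r}")
    return BehaviorRecord(user, photo, int(click))


records = [parse_record(s) for s in ("u1,p7,1", "u1,p9,0", "u2,p7,1")]
```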

For the computer device, the computer device is a terminal, or the computer device is a server, which is not limited in this disclosure.

In some embodiments, if the computer device is a server, a behavior log can be generated by the server, which can retrieve the behavior log from a local store. The behavior log can also be stored in other servers or databases from which the server can retrieve the behavior log.

In other embodiments, if the computer device is a terminal, the terminal can obtain the behavior log from the server and continue to perform the subsequent multimedia resource classification step.

In one possible implementation, the behavior log is a full behavior log, or the behavior log is a partial behavior log. In some embodiments, the behavior log is a behavior log over a target time period. In this step S41, the computer device acquires a behavior log in the target time period.

Optionally, the target time period is set by the relevant technician as required. Optionally, the target time period is determined by the server. For example, a period is set in the server, and the server obtains the behavior log generated in the most recent period, which is then the target time period. When the current period ends, the server enters the next period and obtains the behavior log generated in the period that has just ended. After acquiring the behavior log of each period, the server executes the subsequent steps and determines classification results for the multimedia resources involved in that period.

By setting the target time period and processing the multimedia resources at intervals, the data volume and the calculation volume of the multimedia resource processing can be reduced, the calculation complexity is reduced, and the processing efficiency is improved on the basis of ensuring the timeliness of the multimedia resource processing.
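
The periodic acquisition described above might be sketched as follows. This is an illustration under assumptions: log entries are taken to be tuples whose first element is a timestamp, periods are aligned to a fixed epoch, and the function names `last_period` and `select_log` are hypothetical.

```python
from datetime import datetime, timedelta


def last_period(now, period):
    """Return the [start, end) bounds of the most recently completed period."""
    epoch = datetime(1970, 1, 1)
    elapsed = (now - epoch) // period  # number of fully completed periods
    end = epoch + elapsed * period
    return end - period, end


def select_log(entries, start, end):
    """Keep entries whose timestamp (first field) lies in the target time period."""
    return [e for e in entries if start <= e[0] < end]
```

With a one-hour period and a current time of 10:30, the target time period is 09:00 to 10:00, so only entries stamped within that hour are processed in this round.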

In step S42, in response to any account having a feedback relationship with any multimedia resource, the computer device determines the behavior data corresponding to the account and the multimedia resource as first target behavior data, where the first target behavior data includes a plurality of account identifiers, a plurality of multimedia resource identifiers, and the feedback relationship between the account corresponding to any account identifier and the multimedia resource corresponding to any multimedia resource identifier.

The first target behavior data is partial target behavior data. After the computer equipment acquires the behavior log, the computer equipment can analyze the behavior data in the behavior log and determine the behavior data with reference significance for multimedia resource classification as target behavior data. The target behavior data is a data basis for subsequent multimedia resource classification, and based on the target behavior data, a classification result of the multimedia resource can be determined according to the feedback condition of the user to the multimedia resource.

It can be understood that a user's preferences are generally well defined: a user tends to perform feedback operations on the same or similar multimedia resources. Therefore, if a user performs feedback operations on two multimedia resources, the contents of the two multimedia resources are likely to be the same or similar, and their classification results may be the same.

Through steps S41 and S42, the data in the behavior log is screened: the behavior data of reference significance for multimedia resource classification is retained and redundant data is removed, which reduces the subsequent amount of calculation and prevents redundant data from affecting classification accuracy.

Step S42 is the process in which the computer device acquires, from the behavior log, the behavior data of accounts and multimedia resources that have a feedback relationship. The behavior log indicates whether a feedback relationship exists between each account and each multimedia resource, and only the accounts and multimedia resources that have a feedback relationship are of reference significance for classifying multimedia resources. If a record in the behavior log indicates that a multimedia resource was displayed to an account and the account has a feedback relationship with it, that is, the user of the account performed a feedback operation on the multimedia resource, the record can be used as first target behavior data. If a record indicates that a multimedia resource was displayed to an account but the account has no feedback relationship with it, that is, the user did not perform a feedback operation on it, the record is not used as first target behavior data.

The feedback operations that a user can perform on a multimedia resource are of various kinds, for example an approval (like) operation, a comment operation, a reply operation, or a sharing operation, so the feedback relationship is likewise of various kinds. When classifying multimedia resources, a specific feedback relationship can be selected as required to determine the target behavior data.

In some embodiments, in response to any account having a target feedback relationship with any multimedia resource, the computer device determines the behavior data corresponding to the account and the multimedia resource as the first target behavior data. For example, the target feedback relationship indicates that the user of the account performed an approval operation on the multimedia resource. Of course, other feedback relationships can be selected, and the embodiments of the present disclosure do not limit this.

In other embodiments, the computer device may instead use the behavior data of all accounts and multimedia resources that have any feedback relationship as the first target behavior data, without distinguishing between kinds of feedback relationship. The embodiments of the present disclosure do not limit which implementation is adopted.

The account and the multimedia resource in the first target behavior data are uniquely identified by an account identifier and a multimedia resource identifier respectively, and correspondingly, the first target behavior data comprise a plurality of account identifiers, a plurality of multimedia resource identifiers and feedback relations between the accounts corresponding to any account identifier and the multimedia resource corresponding to any multimedia resource identifier.
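
The screening in step S42 can be illustrated with a short sketch. It assumes the behavior log is available as (user, photo, click) tuples; the function name and the dictionary layout of the result are hypothetical.

```python
def first_target_behavior_data(log):
    """Keep only records whose account has a feedback relationship with the resource."""
    kept = [(user, photo) for user, photo, click in log if click == 1]
    return {
        "accounts": sorted({user for user, _ in kept}),     # account identifiers
        "resources": sorted({photo for _, photo in kept}),  # multimedia resource identifiers
        "feedback": kept,                                   # (account, resource) pairs
    }
```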

In step S43, the computer device obtains a plurality of multimedia resources corresponding to the plurality of multimedia resource identifiers according to the plurality of multimedia resource identifiers.

After determining which multimedia resources are involved in the first target behavior data, the computer device can further determine which of them can carry an accurate target classification result; these are the first multimedia resources. It can also determine which of them need a classification result to be determined by the multimedia resource processing method provided by the present disclosure; these are the second multimedia resources. That is, the computer device can also determine second target behavior data, and the target behavior data includes the first target behavior data and the second target behavior data. The computer device can then determine the classification result of a second multimedia resource, whose classification result is unknown, from the first multimedia resources, whose classification results are known, according to the feedback relationships between accounts and multimedia resources.

Optionally, to divide the multimedia resources in this way, the computer device obtains the multimedia resources according to their identifiers, classifies them, and determines the classification results with higher prediction accuracy as the target classification results of the corresponding multimedia resources.

In some embodiments, the multimedia resource is stored in the computer device, or in another computer device or database. Accordingly, in step S43, the computer device extracts the multimedia resource corresponding to the multimedia resource identifier from local storage, or obtains it from another computer device or a database.

In step S44, the computer device extracts features of the multimedia resources and classifies the multimedia resources based on the extracted features to obtain a classification result of each multimedia resource, where the classification result is used to indicate a predicted multimedia resource label corresponding to each multimedia resource and a probability value that the label of each multimedia resource is the predicted multimedia resource label.

After the computer device obtains the plurality of multimedia resources, the plurality of multimedia resources can be classified. Optionally, each multimedia resource may correspond to a plurality of candidate multimedia resource tags, the classification result of each multimedia resource includes a probability value that the multimedia resource tag is the candidate multimedia resource tag, and the computer device can determine the candidate multimedia resource tag with the probability value greater than the target threshold as the predicted multimedia resource tag. The number of the predicted multimedia resource tags is one or more, which is not limited in the embodiments of the present disclosure.
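
As an illustration of selecting predicted labels, the sketch below assumes the classifier output for one resource is a mapping from candidate label to probability value; the function name is hypothetical.

```python
def predicted_labels(candidate_probs, target_threshold):
    """Return the candidate labels whose probability exceeds the target threshold."""
    return {label: p for label, p in candidate_probs.items() if p > target_threshold}
```

Because more than one candidate can exceed the threshold, the result may contain one or several predicted labels, in line with the paragraph above.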

In some embodiments, the classification process is implemented by a multimedia resource classification model that can be trained based on a large number of sample multimedia resources carrying target multimedia resource labels.

In step S45, the computer device determines the multimedia resource with the probability value greater than the threshold value as the first multimedia resource, determines the target classification result corresponding to the first multimedia resource according to the classification result of the first multimedia resource, and determines the target classification result corresponding to the first multimedia resource as the second target behavior data.

In the embodiments of the present disclosure, a probability value threshold is set to measure the accuracy of a prediction. If the probability value of the predicted multimedia resource label in the classification result of a multimedia resource is greater than the threshold, the classification result is considered relatively accurate; if it is less than or equal to the threshold, the classification result is considered not accurate enough. On this basis, the multimedia resources with relatively accurate classification are determined as first multimedia resources, and the classification result of each first multimedia resource is used as its target classification result.

The multimedia resources whose classification is not accurate enough are regarded as second multimedia resources, that is, multimedia resources that still need to be classified. The target classification result of the first multimedia resource is used as a data basis for the classification of the second multimedia resource. That is, in step S45, the computer device further determines the multimedia resources other than the first multimedia resources among the plurality of multimedia resources as second multimedia resources, where a second multimedia resource is a multimedia resource for which a classification result is to be determined.
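
The split performed in step S45 could look like the following sketch. It assumes the classification results are given as a mapping from resource identifier to a (predicted label, probability) pair; the function name `split_resources` is hypothetical.

```python
def split_resources(results, prob_thres):
    """Split resources by the probability value threshold.

    results maps a resource identifier to (predicted_label, probability).
    """
    first = {}   # identifier -> target classification result
    second = []  # identifiers whose classification result is still to be determined
    for photo, (label, prob) in results.items():
        if prob > prob_thres:
            first[photo] = label
        else:  # less than or equal to the threshold
            second.append(photo)
    return first, second
```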

The probability value threshold can be determined in several manners. In a first possible manner, the probability value threshold is set by a person skilled in the art as required; for example, the probability value threshold is 90%.

In a second possible manner, the probability value threshold is determined from statistics of the probability values in the classification results of the plurality of multimedia resources.

In a third possible manner, the probability value threshold is determined from statistics of the probability values in the classification results of other multimedia resources.

In the second and third possible manners, the probability value threshold may be determined according to a statistical result of the probability values in the classification results of multiple multimedia resources, where the multiple multimedia resources may be those acquired by the computer device this time, or those acquired by the computer device in a previous period or in another period. Determining the probability value threshold from statistical results fits the characteristics of the data better, reflects the regularities of general user behavior more accurately, and makes the determined sample data more accurate.

Specifically, after the classification results of the multiple multimedia resources are determined, the computer device can also obtain the accuracy of each multimedia resource classification, obtain the corresponding relationship between the probability value and the accuracy of the multimedia resource classification, and determine the probability value corresponding to the target accuracy as the probability value threshold.

In a specific possible embodiment, the computer device uses the probability value in the classification result of each multimedia resource as an abscissa, uses the accuracy for classifying each multimedia resource as an ordinate, fits to obtain a relationship curve of the probability value and the accuracy, and uses the abscissa of a point on the relationship curve, of which the ordinate is the target accuracy, as the probability value threshold.
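
One way to read a threshold off such a curve is sketched below. The disclosure does not name a fitting method, so this sketch assumes a piecewise-linear curve through the observed (probability, accuracy) points and assumes that accuracy does not decrease as probability grows; the function name is hypothetical.

```python
def threshold_for_accuracy(points, target_accuracy):
    """Find the probability at which a piecewise-linear curve reaches target_accuracy.

    points: non-empty list of (probability, accuracy) pairs.
    Returns None if the curve never reaches the target accuracy.
    """
    pts = sorted(points)
    if pts[0][1] >= target_accuracy:
        return pts[0][0]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 < target_accuracy <= y1:
            # Linear interpolation between the two neighbouring points.
            return x0 + (target_accuracy - y0) * (x1 - x0) / (y1 - y0)
    return None
```

A smoother fit (for example a polynomial or logistic curve) could be substituted; only the final lookup of the abscissa for the target accuracy is essential to the manner described above.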

For example, in one specific example, the multimedia resource classification model is denoted $model_{clf}$. For each photo in the behavior log $\langle user, photo, click \rangle$ (e.g., $photo_i$, where $i$ identifies the multimedia resource photo), $model_{clf}$ is used to predict the label and probability of the multimedia resource, i.e., $\langle photo_i, label_i, prob_i \rangle$. Based on prior statistical analysis, a probability $prob_{thres}$ corresponding to a higher accuracy is obtained and used as the probability value threshold. The photos satisfying $prob_i > prob_{thres}$ are selected to form a data set, which can be referred to as a meta tag data set, i.e., a data set of the first multimedia resources and their corresponding labels. The meta tag data set $data_{set}$ is as follows:

$$data_{set} = \bigcup_{i=1}^{I} \langle photo_i, label_i \rangle$$

where $\bigcup$ denotes union, $i$ identifies a multimedia resource photo and its label, and $I$ is the number of photos satisfying $prob_i > prob_{thres}$. Assuming that the number of photo multimedia resources in the behavior log is $N$, in general $I \ll N$, where $I$ and $N$ are positive integers.

It should be noted that, in the processes from step S43 to step S45, the first multimedia resource and the second multimedia resource in the plurality of multimedia resources and the target classification result corresponding to the first multimedia resource are determined, and in the above process, the data are determined according to the classification result of the multimedia resource classification model. The method can automatically determine the sample data and the data to be processed without manual marking, and can efficiently determine the sample data and the data to be processed without subjective factors, so that the classification result has objectivity and higher accuracy.

In another possible implementation, the first multimedia resource, the second multimedia resource, and the target classification result corresponding to the first multimedia resource can also be stored in the computer device or in other computer devices, and the computer device can obtain these data directly and use them as the target behavior data. For example, if some of the plurality of multimedia resources already had their classification results determined during production, distribution, or previous processing, those multimedia resources can be determined as first multimedia resources, and the other multimedia resources, which have no classification result, can be determined as second multimedia resources.

Steps S41 to S45 are the process of obtaining the target behavior data, where the target behavior data represents the feedback relationships between a plurality of accounts and a plurality of multimedia resources, and a first multimedia resource among the plurality of multimedia resources corresponds to a target classification result. The target behavior data includes the first target behavior data and the second target behavior data. In the target behavior data, the first multimedia resource corresponds to a target classification result, while the classification result of the second multimedia resource is unknown. Based on the target behavior data, the computer device can use the multimedia resources whose classification results are known as guidance and classify the second multimedia resources, whose classification results are unknown, according to the feedback relationships between accounts and multimedia resources.

In step S46, the computer device generates graph data according to the target behavior data, where an account node corresponding to any account in the graph data is connected to a multimedia resource node corresponding to a target multimedia resource, and the target multimedia resource has the feedback relationship with the any account.

The account and the multimedia resource related to the target behavior data are used as nodes in the graph data, and the edges are used for connecting the nodes of the account and the multimedia resource with a feedback relationship. In this disclosure, the graph data includes account nodes corresponding to the plurality of accounts, multimedia resource nodes corresponding to the plurality of multimedia resources, and edges connecting the account nodes having the feedback relationship with the multimedia resource nodes, and a first multimedia resource node corresponding to the first multimedia resource identifier corresponds to a target classification result.

Correspondingly, in step S46, the computer device maps the accounts represented by the target behavior data to account nodes in the graph data, maps the multimedia resources represented by the target behavior data to multimedia resource nodes in the graph data, and connects the account node corresponding to the account having the feedback relationship with the multimedia resource node corresponding to the multimedia resource to obtain an edge in the graph data.

In step S46, the graph is composed from the target behavior data, so that the connection relationships between nodes in the graph data visually and clearly represent the feedback relationships between the multiple accounts and the multiple multimedia resources and provide an overall view of user behavior.

In some embodiments, the account and the multimedia resource in the target behavior data are uniquely identified by an account identifier and a multimedia resource identifier, respectively, that is, the graph data includes account nodes corresponding to the multiple account identifiers, multimedia resource nodes corresponding to the multiple multimedia resource identifiers, and edges connecting the account nodes and the multimedia resource nodes having the feedback relationship, where a first multimedia resource node corresponding to a first multimedia resource identifier corresponds to a target classification result.

Optionally, the computer device can also set the weight of each edge to 1. In the embodiments of the present disclosure, an edge indicates that a feedback relationship exists between the two nodes it connects; setting the weight of each edge to 1 reduces the amount of calculation and improves classification efficiency while preserving the accuracy of the classification result. For example, taking a video as the multimedia resource, for an account i of a user and a video j, if a feedback behavior occurs (e.g., a click behavior, i.e., the user clicks on the video), an edge <user_i|photo_j> can be constructed. A video with a label (a first multimedia resource with a target classification result) is denoted photo_j^label. By analogy, a graph-based adjacency matrix can be formed for the accounts of all users and all videos. The adjacency matrix represents all edges between account nodes and multimedia resource nodes in the graph data.

The graph data can take various forms. In some embodiments, the graph data includes a set of nodes and a set of edges, e.g., the graph data is defined as G = (V, E), where G represents the graph data, V represents the set of nodes in the graph data, and E represents the set of edges in the graph data.
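
A minimal sketch of building such node and edge sets from feedback pairs follows. The tagging of nodes as ("user", id) or ("photo", id), the function name, and the separate label dictionary are assumptions for illustration.

```python
def build_graph(feedback_pairs, target_labels):
    """Build graph data as (nodes, edges, labels) from (account, resource) pairs."""
    nodes = set()
    edges = set()
    for user, photo in feedback_pairs:
        u, p = ("user", user), ("photo", photo)
        nodes.update((u, p))
        edges.add((u, p))  # unweighted edge, i.e. weight 1
    # Only first multimedia resource nodes carry a target classification result.
    labels = {
        ("photo", p): label
        for p, label in target_labels.items()
        if ("photo", p) in nodes
    }
    return nodes, edges, labels
```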

Optionally, the set of edges can be represented by an adjacency matrix. For example, assuming that the total number of account nodes and multimedia resource nodes is N, the adjacency matrix A is an N × N matrix whose rows and columns both correspond to the N nodes, and each element indicates whether two nodes are connected: the element A_ij in row i and column j indicates whether node i and node j are connected; if so, A_ij = 1, and if not, A_ij = 0.
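
The adjacency matrix representation can be sketched as follows. The node ordering (sorted) and the treatment of feedback edges as undirected are assumptions of this example, as is the function name.

```python
def adjacency_matrix(nodes, edges):
    """Return (ordered node list, N x N matrix of 0/1 entries)."""
    order = sorted(nodes)
    index = {node: i for i, node in enumerate(order)}
    n = len(order)
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        a[i][j] = 1
        a[j][i] = 1  # edges are treated as undirected (assumption)
    return order, a
```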

Optionally, the set of edges can be represented as an array. For example, in the set of edges, each edge is represented as follows: the edge connecting account node i and multimedia resource node j is <user_i|photo_j>.

In some embodiments, the graph data includes a set corresponding to each node in the graph data, for example, for a certain node, the data of the edge related to the node can be formed into an edge set corresponding to the node, and the edge sets of a plurality of nodes are taken as the graph data. Optionally, the edge related to the node at least includes an edge connected to the node, and optionally, the edge related to the node also includes an edge connected to another node connected to the node. Of course, the edge related to the node can also include other edges, for example, an edge in a certain direction of the node, and the setting of the edge corresponding to the node can be set by a related technician as required, and it can be understood that the greater the number of edges corresponding to the node, the higher the accuracy of multimedia resource classification based on such graph data, and the greater the calculation amount.

In step S47, the computer device inputs the graph data into a graph neural network, and trains the graph neural network based on the target classification result of the first multimedia resource node in the graph data.

The data related to the first multimedia resource nodes in the graph data serves as training sample data. The network parameters of the graph neural network can be trained with this data, so that the graph neural network learns to represent the features of multimedia resource nodes accurately and then to classify them accurately. The graph neural network captures the dependency relationships in the graph by passing information between nodes and has strong representation capability; using it to analyze the graph data and classify multimedia resources yields accurate feature representations and thereby improves the accuracy of the classification results.

In particular, the graph neural network is capable of graph convolution operations to enable feature extraction and classification. In step S47, after the computer device acquires the graph data, the graph data may be input into a graph neural network, and the graph neural network performs graph convolution on the graph data, so as to implement the steps of feature extraction and classification, and determine the prediction classification result of each multimedia resource node. The first multimedia resource node of the first multimedia resource corresponds to the target classification result, so that the network parameters of the graph neural network are adjusted based on the prediction classification result and the target classification result of the first multimedia resource node, and the accuracy of the graph neural network for processing the graph data is improved.

In one possible implementation, the graph neural network is a graph convolution network that includes at least one graph convolution layer, each graph convolution layer capable of graph convolution on input data and outputting the result to the next graph convolution layer. In summary, the graph convolution process can be implemented by the following formula:

$$H^{L+1} = f(H^{L}, A)$$

where $H$ denotes the result of graph convolution processing, $L$ denotes the layer number of the graph convolution layer, $H^{L}$ denotes the output of the previous graph convolution layer, $H^{L+1}$ denotes the output of the current graph convolution layer, and $A$ denotes the adjacency matrix, which represents the edge data in the graph data. In the first graph convolution layer of the graph convolution network, $H^{L}$ can be replaced by the node data $X$ of the graph data. The implementation of the function $f$ may differ between types of graph convolution networks. For example, the function $f$ first multiplies $H^{L}$, the weight parameter matrix of the current graph convolution layer, and the adjacency matrix $A$, and then applies an activation function to the product. As another example, the function $f$ first multiplies $H^{L}$, the weight parameter matrix of the current graph convolution layer, and the Laplacian matrix corresponding to the adjacency matrix $A$, and then applies an activation function to the product. The activation function is a ReLU function or another activation function.
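
The first variant of f described above (multiply by the adjacency matrix and a layer weight matrix, then apply ReLU) can be sketched in plain Python. The multiplication order A · H · W and the helper names are assumptions of this sketch; practical implementations usually normalize A and use a tensor library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Apply max(0, x) element-wise."""
    return [[max(0.0, x) for x in row] for row in m]


def graph_conv_layer(h, a, w):
    """One layer: next H = relu(A · H · W)."""
    return relu(matmul(matmul(a, h), w))
```

Stacking several calls, each with its own weight matrix `w`, corresponds to passing the output of one graph convolution layer to the next.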

The graph neural network can be of various kinds, for example a spectral-based graph neural network or a spatial-based graph neural network, and the graph convolution operation differs accordingly. For example, in a spectral-based graph neural network, the mathematical representation of the graph data is the normalized graph Laplacian matrix $L$, which in its standard form is:

$$L = I_N - D^{-1/2} A D^{-1/2}$$

where $A$ is the adjacency matrix of the graph, i.e., the edge data in the graph data, $I_N$ is the identity matrix, and $D$ is the diagonal matrix with $D_{ii} = \sum_j A_{i,j}$.

The Laplacian matrix is decomposed as $L = U \Lambda U^{T}$. The graph convolution performed by the graph neural network on the node data $X$ in the graph data is as follows: a first result obtained by applying the Fourier transform to $X$ is multiplied element-wise by a second result obtained by applying the Fourier transform to the filter function of the filter, and the inverse Fourier transform is then applied to the product. Here, $X$ can be formed from the feature vectors of the nodes in the graph, and $X_i$ represents the $i$-th node.
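
The spectral convolution described in words above can be written compactly in the usual notation. This is a sketch under assumptions: $U$ is taken to be the matrix of eigenvectors of the graph Laplacian, $g$ denotes the filter, and $\odot$ denotes the element-wise product; these symbols are introduced here for illustration.

$$\mathcal{F}(X) = U^{T} X, \qquad \mathcal{F}^{-1}(\hat{X}) = U \hat{X}$$

$$X *_{G} g = U\big((U^{T} X) \odot (U^{T} g)\big)$$

The first line gives the graph Fourier transform and its inverse; the second line combines them into the three steps named above: transform, element-wise multiplication, inverse transform.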

As another example, in a spatial-based graph neural network, the neighborhood of each node is determined by a window of a target size (e.g., 3 × 3), so that the neighbors of a node are the 8 nodes around it, and the positions of these eight nodes define the order of the node's neighbors. A filter is applied to the window, and the features are convolved by taking a weighted average of the features of the center node and its neighbors on each channel. Through this process, spatial-based graph convolution aggregates the feature representation of the center node with the feature representations of its neighboring nodes to obtain a new feature representation of the center node.
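
The aggregation step can be illustrated on a general graph with the sketch below. It assumes features are stored per node as lists of channel values and uses two scalar weights (one for the center node, one shared by all neighbors); a real filter would typically learn separate weights. All names are hypothetical.

```python
def spatial_conv(features, neighbors, self_weight=1.0, neighbor_weight=1.0):
    """Weighted average of each node's features with its neighbours', per channel."""
    out = {}
    for node, feat in features.items():
        total = [self_weight * x for x in feat]
        norm = self_weight
        for nb in neighbors.get(node, ()):
            total = [t + neighbor_weight * x for t, x in zip(total, features[nb])]
            norm += neighbor_weight
        out[node] = [t / norm for t in total]
    return out
```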

In the step S47, the computer device performs feature extraction on the first multimedia resource node according to the first data corresponding to the connection relationship based on the graph neural network, performs classification based on the extracted first feature to obtain a predicted classification result of the first multimedia resource node, obtains prediction accuracy according to the predicted classification result and a target classification result of the first multimedia resource node, and adjusts the network parameter of the graph neural network according to the prediction accuracy until a target condition is met.
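
The training criterion in step S47 uses only nodes that carry a target classification result. The sketch below shows that masking and a stopping rule; `update` and `predict` stand in for the parameter adjustment and forward pass of a graph neural network and are hypothetical callables, as is the choice of an accuracy target as the target condition.

```python
def masked_accuracy(predicted, target_labels):
    """Fraction of labelled (first multimedia resource) nodes predicted correctly."""
    labelled = [n for n in target_labels if n in predicted]
    if not labelled:
        return 0.0
    hits = sum(predicted[n] == target_labels[n] for n in labelled)
    return hits / len(labelled)


def train_until(update, predict, target_labels, target_accuracy=0.95, max_epochs=100):
    """Call update() until accuracy on labelled nodes meets the target condition."""
    for epoch in range(max_epochs):
        acc = masked_accuracy(predict(), target_labels)
        if acc >= target_accuracy:
            return epoch, acc
        update()  # adjust network parameters (placeholder)
    return max_epochs, masked_accuracy(predict(), target_labels)
```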

Owing to the strong representation capability of the graph neural network, the data related to the first multimedia resource nodes, which have already been classified, can be accurately extracted from the various data to be processed, yielding accurate feature representations and therefore accurate classification results. Training the graph neural network on this basis yields accurate network parameters and further improves its feature extraction and classification capability, so that accurate classification results can be obtained when the graph neural network is subsequently used to classify the second multimedia resource nodes.

The first data corresponding to the connection relationship of the first multimedia resource node is data on at least one target path that starts from the first multimedia resource node, where the number of nodes on the target path is a first target threshold and the number of edges on the target path is a second target threshold.

The first target threshold and the second target threshold are set by a relevant technician according to a requirement, for example, if the first target threshold is 2 and the second target threshold is 1, the data on at least one target path with the first multimedia resource node as a starting point includes data of an edge connected to the first multimedia resource node. For another example, if the first target threshold is 3 and the second target threshold is 2, the data on the at least one target path starting from the first multimedia resource node includes data of an edge connected to the first multimedia resource node and data of an edge connected to the first account node. The first account node is an account node connected with the first multimedia resource node.
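The two threshold settings in the examples above can be illustrated with a small path enumeration. This is a sketch only: the adjacency dictionary, the node names, and the function `target_paths` are hypothetical, and the sketch assumes simple paths (no repeated nodes), for which the number of edges is always one less than the number of nodes.

```python
def target_paths(adjacency, start, num_nodes, num_edges):
    """Return every simple path from `start` with exactly num_nodes nodes."""
    # Assumption: on a simple path the edge count is the node count minus one.
    if num_edges != num_nodes - 1:
        raise ValueError("thresholds are inconsistent for a simple path")
    paths = []

    def extend(path):
        if len(path) == num_nodes:
            paths.append(tuple(path))
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:
                extend(path + [neighbor])

    extend([start])
    return paths


# Hypothetical bipartite graph: resources connect only to accounts.
adjacency = {
    "photo_1": ["user_a", "user_b"],
    "photo_2": ["user_a"],
    "user_a": ["photo_1", "photo_2"],
    "user_b": ["photo_1"],
}

one_hop = target_paths(adjacency, "photo_1", 2, 1)
two_hop = target_paths(adjacency, "photo_1", 3, 2)
```

With thresholds 2 and 1 the paths cover only the edges connected to the starting resource node; with thresholds 3 and 2 they also cover the edges of the connected account nodes.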

For the network parameters, the network parameters at least comprise a weight matrix of the graph neural network, the feature expression obtained by feature extraction of the same graph data by the graph neural network can be changed by changing the network parameters, and the adjusted network parameters can enable the feature expression extracted by the graph neural network to be more accurate, so that the classification result is more accurate.

In one possible implementation, setting the first data for the graph neural network amounts to setting the size of the convolution kernel of the graph neural network.

For the process of extracting features by graph convolution, in one specific example, the computer device can perform graph convolution on the graph data using the formula $h_v = f(X_v, X_{co[v]}, h_{ne[v]}, X_{ne[v]})$, where $h_v$ is the feature vector representation of node v, $X_v$ is the feature of node v, $X_{co[v]}$ is the feature of the edges adjacent to node v, $h_{ne[v]}$ is the feature vector representation of the nodes adjacent to node v, and $X_{ne[v]}$ is the feature of the nodes adjacent to node v. It should be noted that "adjacent" in the above arguments is not limited to direct connection with node v; it can also refer to nodes or edges indirectly connected to node v through other nodes or edges. Which quantities the arguments specifically include is set by a relevant technician according to requirements; that is, the first data is set by a relevant technician according to requirements. The same applies to the target data in step S48, which is not described in detail here.

After extracting the features, the computer device can classify the extracted features, for example, by the following formula:

$o_v = g(h_v, X_v)$

where $h_v$ is the feature vector representation of node v, $X_v$ is the feature of node v, and $o_v$ is the output of node v, i.e., the classification result of node v.
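The roles of the two functions can be illustrated with a toy example on scalar features. The concrete forms below are assumptions chosen for brevity (a tanh of summed averages for f, a softmax over per-class linear scores for g); the disclosure does not specify either function, and `class_weights` is a hypothetical parameter.

```python
import math


def f(x_v, x_co, h_ne, x_ne):
    """Hypothetical transition function producing the state h_v of node v."""
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    # Combine the node feature with averaged edge and neighbour terms.
    return math.tanh(x_v + mean(x_co) + mean(h_ne) + mean(x_ne))


def g(h_v, x_v, class_weights):
    """Hypothetical output function: softmax over one linear score per class."""
    scores = [w_h * h_v + w_x * x_v for w_h, w_x in class_weights]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


h_v = f(x_v=0.5, x_co=[1.0, 1.0], h_ne=[0.2, 0.4], x_ne=[0.1, 0.3])
o_v = g(h_v, 0.5, class_weights=[(1.0, 0.0), (-1.0, 0.0)])
```

Here `o_v` is a probability distribution over two classes, and the index of its largest entry would be taken as the classification result of node v.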

Optionally, the target condition is set by a person skilled in the relevant art according to a requirement, for example, the target condition is convergence of prediction accuracy, or the target condition is that the number of iterations reaches a target number, which is not limited by the embodiment of the disclosure.
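The two example target conditions can be sketched as a stopping rule for the training loop. This is an illustration under assumptions: the prediction accuracy is represented by a single number per iteration, `step` is a hypothetical callable that runs one training iteration and returns that number, and `tolerance` is a hypothetical convergence criterion.

```python
def train_until(step, max_iterations, tolerance):
    """Run `step` until its value converges or the iteration budget is used up."""
    previous = None
    for iteration in range(1, max_iterations + 1):
        value = step()
        # Convergence: the value changed by less than `tolerance`.
        if previous is not None and abs(previous - value) < tolerance:
            return iteration, value
        previous = value
    # Iteration budget reached without convergence.
    return max_iterations, previous


# Stand-in for real training: a fixed sequence of per-iteration values.
values = iter([1.0, 0.5, 0.25, 0.2499, 0.2])
iterations, final_value = train_until(lambda: next(values),
                                      max_iterations=10, tolerance=1e-3)
```

In this run the loop stops at the fourth iteration because the value changes by less than the tolerance.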

In some embodiments, the prediction accuracy is expressed by a loss value, and the obtaining process of the prediction accuracy can be implemented by using a target loss function, and specifically, the computer device obtains a value of the target loss function according to the prediction classification result and the target classification result of the first multimedia resource node, where the value is the loss value.

The target loss function is any loss function, in a specific possible embodiment, the target loss function may be a cross-entropy loss function, and the obtaining process of the prediction accuracy may be implemented by the following formula:

where loss is the prediction accuracy and entropy() is the cross-entropy loss function; $label_v$ is the target classification result, which indicates the target label of the multimedia resource; $o_v$ is the predicted classification result; v is the identifier of a node; $photo_j^{label}$ is the first multimedia resource node; and $\Sigma$ is the summation symbol.
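One reading consistent with the symbol definitions above is that the loss accumulates the cross-entropy between the target label and the predicted output over the nodes that have a target classification result. The sketch below follows that reading; it is an assumption rather than the formula of the disclosure, and the node names and the functions `cross_entropy` and `labeled_loss` are hypothetical.

```python
import math


def cross_entropy(label, probabilities):
    """Cross-entropy between a one-hot target (class index) and predictions."""
    return -math.log(probabilities[label])


def labeled_loss(labels, outputs):
    """Sum the cross-entropy over nodes with a target label; skip the rest."""
    return sum(cross_entropy(labels[v], outputs[v]) for v in labels)


# photo_2 has no target label, so it does not contribute to the loss.
labels = {"photo_1": 0, "photo_3": 1}
outputs = {
    "photo_1": [0.5, 0.5],
    "photo_2": [0.9, 0.1],
    "photo_3": [0.25, 0.75],
}
loss = labeled_loss(labels, outputs)
```

Nodes without a target label still receive an output, but only labeled nodes drive the parameter update.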

The above is an example of the prediction accuracy being expressed by a loss value, and the prediction accuracy can also be expressed by other manners, such as a reward value, and the like, which is not limited by the embodiment of the present disclosure.

It should be noted that the computer device also performs similar feature extraction and classification steps on the second multimedia resource node based on the graph neural network, except that the second multimedia resource node has no corresponding target classification result and therefore does not participate in obtaining the prediction accuracy.

In step S48, the computer device performs feature extraction on the target data corresponding to the connection relationship of the second multimedia resource node in the graph data based on the trained graph neural network, performs classification based on the extracted features, and outputs a classification result of the second multimedia resource node.

After the graph neural network is trained, if the classification accuracy of the graph neural network is relatively good, the second multimedia resource node with unknown classification result in the graph data can be classified, and the classification process of the graph neural network on the second multimedia resource node is the same as the classification process on the first multimedia resource node in the step S47, which is not described in detail herein.

The graph neural network has strong representation capability. After it is trained on data related to the first multimedia resource nodes, its feature extraction and classification capability is greatly improved, so accurate classification results can be obtained when it is used to classify the second multimedia resource nodes. This new multimedia resource processing approach does not require manual labeling to add training samples; all multimedia resources can be classified using existing data, and the classification efficiency is high.

In a possible implementation, based on the trained graph neural network, feature extraction is performed on data on at least one target path in the graph data that starts from the second multimedia resource node, where the number of nodes on the target path is a first target threshold and the number of edges on the target path is a second target threshold. As with the first data, the embodiment of the present disclosure does not limit what the target data specifically includes.

The above steps S47 and S48 are the process of obtaining the classification result of the second multimedia resource node corresponding to the second multimedia resource, based on the target classification result of the first multimedia resource node corresponding to the first multimedia resource in the graph data and the connection relationship between the multimedia resource nodes and the account nodes in the graph data. In other embodiments, when the graph neural network is trained based on data of the first multimedia resource node and the network parameters are determined in the last iteration, the graph neural network processes both the first multimedia resource node and the second multimedia resource node to obtain predicted classification results, which are accurate classification results, and the graph neural network outputs the predicted classification result of the second multimedia resource node as the classification result of the second multimedia resource node. That is, when training is completed, the classification results of all multimedia resource nodes are determined. The above description uses the graph neural network only as an example of how the graph data is processed; the processing can also be implemented by the computer device directly invoking a target algorithm, which is not limited in the embodiment of the present disclosure.
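As a rough illustration of processing the graph data without a graph neural network, the sketch below uses a simple majority vote over resources that share an account. This voting scheme is an assumption introduced here for illustration; the disclosure does not name the target algorithm, and the function `propagate_labels`, the account and resource names, and the labels are hypothetical.

```python
from collections import Counter


def propagate_labels(feedback, known_labels):
    """Label each unlabelled resource by vote among resources sharing an account."""
    resources_by_account = {}
    for account, resource in feedback:
        resources_by_account.setdefault(account, set()).add(resource)

    votes = {}
    for resources in resources_by_account.values():
        for resource in resources:
            if resource in known_labels:
                continue
            # Each labelled resource reached through this account casts one vote.
            for other in resources:
                if other in known_labels:
                    votes.setdefault(resource, Counter())[known_labels[other]] += 1
    return {r: counter.most_common(1)[0][0] for r, counter in votes.items()}


feedback = [
    ("user_a", "photo_1"), ("user_a", "photo_2"),
    ("user_b", "photo_2"), ("user_b", "photo_3"),
    ("user_c", "photo_3"), ("user_c", "photo_4"),
    ("user_d", "photo_1"), ("user_d", "photo_2"),
]
known = {"photo_1": "pets", "photo_3": "food"}
predicted = propagate_labels(feedback, known)
```

The vote relies on the same observation as the rest of the disclosure, namely that an account tends to give feedback on the same or similar multimedia resources.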

The multimedia resources can be efficiently classified through the embodiment of the disclosure, so that the classified multimedia resources are increased, and the classified multimedia resources can also be used in other application scenarios, for example, for model training and the like. The data of model training is increased, the accuracy of model classification can be improved, the number of known classified multimedia resources is increased for all multimedia resources, and the recall rate can be improved in general. For example, the multimedia resource classification is to add multimedia resource labels to multimedia resources, and the multimedia resource labels can be efficiently and quickly added to a large number of multimedia resources in the manner, so that the coverage of the labels is increased, and the accuracy and the recall rate of multimedia resource label determination can be improved at the same time.

Fig. 5 is a block diagram illustrating a multimedia resource processing apparatus according to an exemplary embodiment. Referring to fig. 5, the apparatus includes:

an obtaining unit 501 configured to obtain target behavior data, where the target behavior data represents feedback relationships between a plurality of accounts and a plurality of multimedia resources, and a first multimedia resource among the plurality of multimedia resources corresponds to a target classification result;

a generating unit 502 configured to generate graph data according to the target behavior data, where the account node corresponding to any account in the graph data is connected to the multimedia resource node corresponding to a target multimedia resource, and the target multimedia resource has the feedback relationship with that account;

a processing unit 503 configured to obtain, based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource nodes and the account nodes in the graph data, a classification result of a second multimedia resource node corresponding to a second multimedia resource, where the second multimedia resource is a multimedia resource other than the first multimedia resource among the plurality of multimedia resources.

Optionally, the generating unit 502 is configured to perform:

Optionally, the processing unit 503 includes a training subunit and a classification subunit;

the training subunit is configured to perform inputting the graph data into a graph neural network, and train the graph neural network based on a target classification result of the first multimedia resource node in the graph data;

Optionally, the training subunit is configured to perform:

Optionally, the obtaining unit 501 is configured to perform:

the obtaining unit 501 is further configured to perform:

Optionally, the behavior log is a behavior log in a target time period.

The apparatus provided by the embodiment of the present disclosure adopts a new multimedia resource processing approach. Considering that users tend to give feedback on the same or similar multimedia resources, it introduces behavior data as an information source, so it is not limited to existing sample data and breaks through the data bottleneck. The behavior data can represent the feedback relationship between each account and each multimedia resource, and some of the multimedia resources in the behavior data have known classification results. The behavior data is organized and converted into graph data, in which accounts and multimedia resources are mapped to nodes, so that the connection relationships between nodes clearly represent the feedback relationships between the plurality of accounts and the plurality of multimedia resources. Because these connection relationships provide an overall understanding of user behavior, multimedia resources with unknown classification results can be accurately classified based on the multimedia resources with known classification results. No manual labeling is needed to add training samples, all multimedia resources can be classified using existing data, and the classification efficiency is high.
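The conversion of behavior data into graph data can be sketched as follows. This is a minimal illustration under assumptions: the behavior data is taken to be a list of (account identifier, multimedia resource identifier) pairs that have a feedback relationship, and the function `build_graph` and the tuple-based node naming are hypothetical.

```python
def build_graph(feedback_pairs):
    """Map accounts and resources to nodes and connect each feedback pair."""
    adjacency = {}
    for account, resource in feedback_pairs:
        account_node = ("account", account)
        resource_node = ("resource", resource)
        # An undirected edge is stored in both directions.
        adjacency.setdefault(account_node, set()).add(resource_node)
        adjacency.setdefault(resource_node, set()).add(account_node)
    return adjacency


graph = build_graph([("u1", "p1"), ("u1", "p2"), ("u2", "p2")])
```

Tagging each node with its type keeps account identifiers and resource identifiers from colliding when the two use the same values.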

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Alternatively, the computer device is a terminal shown in fig. 6 described below. Alternatively, the computer device is a server shown in fig. 7 described below.

Fig. 6 is a block diagram illustrating a structure of a terminal according to an exemplary embodiment. Optionally, the device types of the terminal 600 include: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, the terminal 600 includes: a processor 601 and a memory 602.

Optionally, the processor 601 includes one or more processing cores, such as a 4-core processor or an 8-core processor. Optionally, the processor 601 is implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). In some embodiments, the processor 601 includes a main processor and a coprocessor; the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 further includes an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

In some embodiments, the memory 602 includes one or more computer-readable storage media, which are optionally non-transitory. Optionally, the memory 602 also includes high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one program code, which is executed by the processor 601 to implement the multimedia resource processing method provided by the embodiments of the present disclosure.

In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 can be connected by bus or signal lines. Each peripheral can be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera assembly 606, an audio circuit 607, a positioning component 608, and a power supply 609.

The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 are implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. It converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Optionally, the radio frequency circuit 604 communicates with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 further includes NFC (Near Field Communication) related circuits, which is not limited in the present disclosure.

The display screen 605 is used to display a UI (User Interface). Optionally, the UI includes graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over its surface. The touch signal can be input to the processor 601 as a control signal for processing. Optionally, the display screen 605 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 605, disposed on the front panel of the terminal 600; in other embodiments, there are at least two display screens 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display screen 605 is a flexible display screen disposed on a curved surface or a folded surface of the terminal 600. Optionally, the display screen 605 can even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. Optionally, the display screen 605 is made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 606 also includes a flash. Optionally, the flash is a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and is used for light compensation at different color temperatures.

In some embodiments, audio circuitry 607 includes a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones are respectively disposed at different positions of the terminal 600. Optionally, the microphone is an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. Alternatively, the speaker is a conventional membrane speaker, or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to human, but also the electric signal can be converted into a sound wave inaudible to human for use in distance measurement or the like. In some embodiments, audio circuitry 607 also includes a headphone jack.

The positioning component 608 is used to determine the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). Optionally, the positioning component 608 is based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 609 is used to provide power to the various components in terminal 600. Optionally, the power supply 609 is an alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery supports wired charging or wireless charging. The rechargeable battery is also used to support fast charge technology.

In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.

In some embodiments, the acceleration sensor 611 detects acceleration magnitudes on three coordinate axes of a coordinate system established with the terminal 600. For example, the acceleration sensor 611 is used to detect components of the gravitational acceleration in three coordinate axes. Optionally, the processor 601 controls the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 is also used for acquisition of motion data of a game or a user.

In some embodiments, the gyro sensor 612 detects a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 cooperate to acquire a 3D motion of the terminal 600 by the user. The processor 601 implements the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Optionally, the pressure sensor 613 is disposed on the side frame of the terminal 600 and/or on the lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, the user's holding signal on the terminal 600 can be detected, and the processor 601 performs left/right hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 605. The operable controls comprise at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Optionally, the fingerprint sensor 614 is provided on the front, back or side of the terminal 600. When a physical button or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 can be integrated with the physical button or vendor Logo.

The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 controls the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, processor 601 also dynamically adjusts the shooting parameters of camera assembly 606 based on the ambient light intensity collected by optical sensor 615.

The proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 6 does not constitute a limitation of terminal 600, and can include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.

Fig. 7 is a schematic structural diagram illustrating a server 700 according to an exemplary embodiment. The server 700 may vary considerably depending on its configuration or performance. It includes one or more processors (Central Processing Units, CPUs) 701 and one or more memories 702, where at least one program code is stored in the memory 702 and is loaded and executed by the processor 701 to implement the multimedia resource processing method provided in the foregoing embodiments. Optionally, the server 700 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 700 further includes other components for implementing device functions, which are not described here.

In an exemplary embodiment, a storage medium comprising program code, such as a memory comprising program code, executable by one or more processors of a computer device to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, which includes one or more program codes executable by a processor of a computer device to perform the multimedia resource processing method provided by the above embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for processing multimedia resources, comprising:

obtaining a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and an account node in the graph data, wherein the second multimedia resource is a multimedia resource except the first multimedia resource in the plurality of multimedia resources;

the acquiring of the target behavior data comprises:

acquiring a behavior log, wherein the behavior log comprises behavior data of a plurality of account numbers, and the behavior data of each account number is used for indicating whether the account number has a feedback relationship with a plurality of multimedia resources; in response to the fact that any account number has a feedback relation with any multimedia resource, determining behavior data corresponding to the account number and the multimedia resource as first target behavior data;

the first target behavior data comprise a plurality of account identifications, a plurality of multimedia resource identifications and feedback relations between the accounts corresponding to any account identification and the multimedia resources corresponding to any multimedia resource identification; acquiring a plurality of multimedia resources corresponding to the multimedia resource identifications according to the multimedia resource identifications; extracting features of the multimedia resources, classifying the multimedia resources based on the extracted features to obtain a classification result of each multimedia resource, wherein the classification result is used for indicating a predicted multimedia resource label corresponding to each multimedia resource and a probability value of the predicted multimedia resource label of each multimedia resource; and determining the multimedia resource of which the probability value is greater than a probability value threshold value in the classification result as a first multimedia resource, determining the classification result of the first multimedia resource as a target classification result corresponding to the first multimedia resource, and determining the target classification result corresponding to the first multimedia resource as second target behavior data.

2. The method of claim 1, wherein the generating graph data from the target behavior data comprises:

3. The method according to claim 1, wherein the obtaining a classification result of a second multimedia resource node corresponding to a second multimedia resource based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and an account node in the graph data comprises:

4. The method according to claim 3, wherein the training the graph neural network based on the target classification result of the first multimedia resource node in the graph data comprises:

5. The method according to claim 3, wherein the performing feature extraction on the target data corresponding to the connection relationship by the second multimedia resource node in the graph data based on the trained graph neural network comprises:

6. The method as claimed in claim 1, wherein the probability value threshold is set by a technician according to requirements, or the probability value threshold is determined by statistics of probability values in the classification results of the multimedia resources, or the probability value threshold is determined by statistics of probability values in the classification results of other multimedia resources.
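As one possible reading of a threshold "determined by statistics of probability values", the sketch below takes a nearest-rank percentile of the observed probabilities. The choice of a percentile, the function name, and the sample values are assumptions made for illustration; the claim does not name a particular statistic.

```python
def threshold_from_statistics(probabilities, percentile=90):
    """Return the nearest-rank percentile of the observed probabilities.

    `percentile` is an integer in (0, 100]. Using a percentile is an
    assumption; any other statistic could stand in its place.
    """
    if not probabilities:
        raise ValueError("at least one probability value is required")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(probabilities)
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    rank = -(-percentile * len(ordered) // 100)
    return ordered[rank - 1]


observed = [0.2, 0.5, 0.6, 0.8, 0.95]
threshold = threshold_from_statistics(observed, percentile=80)
selected = [p for p in observed if p > threshold]
```

Because the method selects resources whose probability is strictly greater than the threshold, only the values above the chosen percentile are kept.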

7. The method of claim 1, wherein the behavior log is a behavior log in a target time period.
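Restricting the behavior log to a target time period could look like the following sketch. The record layout (dictionaries with a `timestamp` key) and the half-open interval convention are assumptions for illustration only.

```python
from datetime import datetime


def records_in_period(records, start, end):
    """Return the records whose timestamp falls in the interval [start, end)."""
    return [r for r in records if start <= r["timestamp"] < end]


# Assumed sample records; field names are hypothetical.
records = [
    {"account_id": "a1", "resource_id": "v1", "timestamp": datetime(2020, 8, 1)},
    {"account_id": "a2", "resource_id": "v2", "timestamp": datetime(2020, 8, 15)},
]
recent = records_in_period(records, datetime(2020, 8, 10), datetime(2020, 8, 20))
```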

8. A multimedia resource processing apparatus, comprising:

a processing unit, configured to obtain, based on a target classification result of a first multimedia resource node corresponding to the first multimedia resource in the graph data and a connection relationship between the multimedia resource node and an account node in the graph data, a classification result of a second multimedia resource node corresponding to a second multimedia resource, wherein the second multimedia resource is a multimedia resource other than the first multimedia resource among the plurality of multimedia resources;

wherein the obtaining unit is configured to: acquire a behavior log, wherein the behavior log comprises behavior data of a plurality of accounts, and the behavior data of each account indicates whether the account has a feedback relationship with a plurality of multimedia resources; in response to any account having a feedback relationship with any multimedia resource, determine the behavior data corresponding to the account and the multimedia resource as first target behavior data, wherein the first target behavior data comprises a plurality of account identifications, a plurality of multimedia resource identifications, and the feedback relationship between the account corresponding to any account identification and the multimedia resource corresponding to any multimedia resource identification; acquire, according to the plurality of multimedia resource identifications, the plurality of multimedia resources corresponding to the multimedia resource identifications; extract features of the plurality of multimedia resources, and classify the multimedia resources based on the extracted features to obtain a classification result of each multimedia resource, wherein the classification result indicates a predicted multimedia resource label of the multimedia resource and a probability value of the predicted multimedia resource label; and determine a multimedia resource whose probability value in the classification result is greater than a probability value threshold as a first multimedia resource, determine the classification result of the first multimedia resource as a target classification result corresponding to the first multimedia resource, and determine the target classification result corresponding to the first multimedia resource as second target behavior data.

9. The apparatus according to claim 8, wherein the generating unit is configured to perform:

10. The apparatus of claim 8, wherein the processing unit comprises a training subunit and a classification subunit;

11. The multimedia resource processing apparatus as claimed in claim 10, wherein the training subunit is configured to perform:

12. The apparatus according to claim 10, wherein the classification subunit is configured to perform feature extraction on data on at least one target path starting from the second multimedia resource node in the graph data based on the trained graph neural network, where the number of nodes on the target path is a first target threshold, and the number of edges on the target path is a second target threshold.
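To make the notion of a target path concrete, the sketch below enumerates paths of a fixed length starting from a multimedia resource node in an account-resource graph. It only gathers the paths whose data would then be passed to the trained graph neural network; the network itself is out of scope here. The adjacency-dictionary representation, the restriction to simple paths (no repeated node), and the use of the edge count alone (a simple path with k edges has k + 1 nodes) are assumptions for illustration.

```python
def target_paths(adjacency, start, num_edges):
    """Enumerate simple paths from `start` that have exactly `num_edges` edges."""
    paths = []

    def extend(path):
        # A path with k edges contains k + 1 nodes.
        if len(path) - 1 == num_edges:
            paths.append(tuple(path))
            return
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor not in path:  # keep paths simple (no revisits)
                extend(path + [neighbor])

    extend([start])
    return paths


# Hypothetical graph: "a*" are account nodes, "v*" are multimedia resource nodes.
adjacency = {
    "v2": ["a1", "a2"],
    "a1": ["v1", "v2"],
    "a2": ["v2", "v3"],
    "v1": ["a1"],
    "v3": ["a2"],
}
paths = target_paths(adjacency, "v2", num_edges=2)
```

Here each two-edge path links the unlabeled resource `v2`, through an account that fed back on it, to another resource that the same account also fed back on.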

13. The multimedia resource processing apparatus as claimed in claim 8, wherein the probability value threshold is set by a technician according to requirements, or the probability value threshold is determined by statistics of probability values in the classification results of the plurality of multimedia resources, or the probability value threshold is determined by statistics of probability values in the classification results of a plurality of other multimedia resources.

14. The apparatus according to claim 8, wherein the behavior log is a behavior log in a target time period.

15. A computer device, comprising:

one or more processors;

wherein the one or more processors are configured to execute the program code to implement the multimedia resource processing method of any one of claims 1 to 7.

16. A storage medium, wherein program code in the storage medium, when executed by one or more processors of a computer device, enables the computer device to perform the multimedia resource processing method of any one of claims 1 to 7.

17. A computer program product, comprising one or more pieces of program code which, when executed by one or more processors of a computer device, enable the computer device to perform the multimedia resource processing method according to any one of claims 1 to 7.