CN114417030A - Resource processing method, device, equipment and computer readable storage medium - Google Patents

Resource processing method, device, equipment and computer readable storage medium

Info

Publication number
CN114417030A
Authority
CN
China
Prior art keywords
resource
sub
feature vector
multimedia
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210093667.5A
Other languages
Chinese (zh)
Inventor
黄剑辉 (Huang Jianhui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210093667.5A priority Critical patent/CN114417030A/en
Publication of CN114417030A publication Critical patent/CN114417030A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a resource processing method, device, equipment and computer-readable storage medium, belongs to the field of internet technology, and can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method comprises the following steps: acquiring a multimedia resource, where the multimedia resource comprises at least two sub-resources; determining a feature vector corresponding to each sub-resource, where the feature vector is used to characterize the sub-resource; extracting features corresponding to a partial region from the feature vector corresponding to each sub-resource, to obtain an activation vector corresponding to each sub-resource; and determining a target feature vector based on the feature vector and the activation vector corresponding to each sub-resource, where the target feature vector is used to represent the multimedia resource and to perform resource recommendation. The target feature vector of the multimedia resource determined by the method has high accuracy and a high matching degree with the multimedia resource.

Description

Resource processing method, device, equipment and computer readable storage medium
Technical Field
The embodiment of the application relates to the technical field of internet, in particular to a resource processing method, a resource processing device, resource processing equipment and a computer readable storage medium.
Background
With the rapid development of internet technology, the number of multimedia resources such as audio resources, video resources, and text resources keeps increasing. To facilitate the management of multimedia resources, a feature vector of the multimedia resource is generally determined, and a tag of the multimedia resource is then determined based on that feature vector. In other words, determining the feature vector of a multimedia resource is a prerequisite for determining its tag. Therefore, a resource processing method is needed to determine the feature vector of the multimedia resource.
In the related art, when a multimedia resource includes at least two sub-resources, the feature vector of the multimedia resource is determined as follows: determine a feature vector corresponding to each sub-resource, and splice the feature vectors corresponding to the sub-resources to obtain the feature vector of the multimedia resource.
However, the accuracy of the feature vector determined by this resource processing method is low and its matching degree with the multimedia resource is not high, so the accuracy of the tag of the multimedia resource determined based on that feature vector is also low.
Disclosure of Invention
The embodiments of the application provide a resource processing method, device, equipment and computer-readable storage medium, which can be used to solve the problems in the related art that the accuracy of the determined feature vector of a multimedia resource is low and its matching degree with the multimedia resource is not high. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a resource processing method, where the method includes:
acquiring a multimedia resource, wherein the multimedia resource comprises at least two sub-resources;
determining a feature vector corresponding to each sub-resource, wherein the feature vector is used for representing the sub-resource;
extracting the features corresponding to partial areas from the feature vectors corresponding to the sub-resources respectively to obtain the activation vectors corresponding to the sub-resources respectively;
and determining a target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, where the target feature vector is used for representing the multimedia resource and for resource recommendation.
In another aspect, an embodiment of the present application provides a resource processing apparatus, where the apparatus includes:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring multimedia resources which comprise at least two sub-resources;
the determining module is used for determining a feature vector corresponding to each sub-resource, and the feature vector is used for representing the sub-resources;
the extraction module is used for extracting features corresponding to a partial region from the feature vector corresponding to each sub-resource, to obtain the activation vector corresponding to each sub-resource;
the determining module is further configured to determine a target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, where the target feature vector is used to represent the multimedia resource and for resource recommendation.
In a possible implementation manner, the determining module is configured to determine an intermediate feature vector based on an activation vector corresponding to a first sub-resource and a feature vector corresponding to a second sub-resource, where the intermediate feature vector is a feature vector obtained after fusing the second sub-resource on the basis of the first sub-resource, the number of the intermediate feature vector is the same as the number of the sub-resources, the first sub-resource is any one of the sub-resources included in the multimedia resource, and the second sub-resource is a sub-resource other than the first sub-resource among the sub-resources included in the multimedia resource; determining the target feature vector based on the intermediate feature vector.
In a possible implementation manner, the determining module is configured to multiply, for each vector dimension, the value of the activation vector corresponding to the first sub-resource by the value of the same vector dimension in the feature vector corresponding to the second sub-resource, to obtain a reference feature vector; and add the values of the vector dimensions in the reference feature vector to obtain the intermediate feature vector.
In a possible implementation manner, the extraction module is configured to input a feature vector corresponding to a first sub-resource into a target activation model corresponding to the first sub-resource, where the target activation model is obtained by training an initial activation model through a sample sub-resource and an activation vector of the sample sub-resource, where the sample sub-resource is of the same resource type as the first sub-resource, and the first sub-resource is any one of sub-resources included in the multimedia resource; and obtaining an activation vector corresponding to the first sub-resource based on an output result of the target activation model corresponding to the first sub-resource.
In a possible implementation manner, the obtaining module is further configured to obtain tags of the multimedia resource, where the number of the tags of the multimedia resource is at least two;
the determining module is further configured to determine a feature vector corresponding to the tag of the multimedia resource; determining the matching degree of the label of the multimedia resource and the multimedia resource based on the feature vector corresponding to the label of the multimedia resource and the target feature vector; and taking the label of which the matching degree with the multimedia resource meets the matching requirement in the labels of the multimedia resources as the resource label of the multimedia resources.
In one possible implementation, the apparatus further includes:
and the storage module is used for correspondingly storing the multimedia resources and the resource labels of the multimedia resources.
In one possible implementation, the apparatus further includes:
the device comprises a receiving module, a recommending module and a recommending module, wherein the receiving module is used for receiving a content recommending request which carries a recommending label;
the determining module is further configured to determine a multimedia resource to be recommended based on the recommendation tag, the multimedia resource, and the resource tag of the multimedia resource;
and the recommending module is used for recommending the multimedia resource to be recommended.
In a possible implementation manner, the determining module is configured to input a first sub-resource into a target vector determination model corresponding to the first sub-resource, where the target vector determination model is obtained by training an initial vector determination model with a sample sub-resource and the feature vector corresponding to the sample sub-resource, the sample sub-resource is of the same resource type as the first sub-resource, and the first sub-resource is any one of the sub-resources included in the multimedia resource; and to obtain the feature vector corresponding to the first sub-resource based on an output result of the target vector determination model.
In a possible implementation manner, the first sub-resource is a video resource, and the determining module is configured to intercept a video image in the video resource; and inputting the video image into a target vector determination model corresponding to the video resource.
In one possible implementation, the apparatus further includes:
the training module is used for acquiring the sample sub-resources, the activation vectors of the sample sub-resources and the initial activation model; inputting the sample sub-resources into the initial activation model to obtain first feature vectors of the sample sub-resources; determining a first loss value between an activation vector of the sample sub-resources and the first feature vector; in response to the first loss value not being greater than a first loss threshold, treating the initial activation model as the target activation model.
In a possible implementation manner, the training module is further configured to adjust the initial activation model in response to that the first loss value is greater than the first loss threshold, so as to obtain an adjusted activation model; inputting the sample sub-resources into the adjusted activation model to obtain a second feature vector of the sample sub-resources; determining a second loss value between the activation vector and the second feature vector for the sample sub-resource; in response to the second loss value not being greater than the first loss threshold, treating the adjusted activation model as the target activation model.
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so that the electronic device implements any one of the resource processing methods described above.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to make a computer implement any of the above-mentioned resource processing methods.
In another aspect, a computer program or a computer program product is provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement any one of the above resource processing methods.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
the target feature vector of the multimedia resource determined by the technical scheme provided by the embodiment of the application is the feature vector obtained after the features of each sub-resource included in the multimedia resource are fused, the accuracy of the determined target feature vector of the multimedia resource is higher, the matching degree with the multimedia resource is higher, and the multimedia resource can be better expressed. Due to the fact that the accuracy of the determined target characteristic vector of the multimedia resource is higher, when the resource label of the multimedia resource is determined based on the target characteristic vector of the multimedia resource, the accuracy of the determined resource label of the multimedia resource can be improved, and the multimedia resource can be recommended better.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a resource processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a resource processing method provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a process for determining a tag corresponding to a video resource according to an embodiment of the present application;
fig. 4 is a schematic diagram of a process for determining a tag corresponding to a text resource according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a resource processing procedure provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a resource processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a resource processing method according to an embodiment of the present application, and as shown in fig. 1, the implementation environment includes: a terminal device 101 and a server 102.
The resource processing method provided in the embodiments of the application may be implemented by the terminal device 101 alone, by the server 102 alone, or through interaction between the terminal device 101 and the server 102.
The terminal device 101 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a household appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal device 101 may be generally referred to as one of a plurality of terminal devices, and the present embodiment is only illustrated by the terminal device 101. Those skilled in the art will appreciate that the number of terminal devices 101 may be greater or fewer. For example, the number of the terminal device 101 may be only one, or the number of the terminal device 101 may be several tens or several hundreds, or more, and the number of the terminal devices and the device types are not limited in the embodiment of the present application.
The server 102 is a single server, a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in the embodiments of the present application. The server 102 and the terminal device 101 are communicatively connected via a wired or wireless network. The server 102 has data receiving, data processing, and data sending functions. Of course, the server 102 may also have other functions, which are not limited in this embodiment.
The embodiments of the application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
Based on the foregoing implementation environment, an embodiment of the present application provides a resource processing method. Taking the flowchart of the resource processing method shown in fig. 2 as an example, the method may be executed by the terminal device 101 in fig. 1. As shown in fig. 2, the method comprises the following steps:
in step 201, a multimedia resource is obtained, where the multimedia resource includes at least two sub-resources.
In the exemplary embodiment of the present application, the multimedia resource may be any resource that needs to be classified. Before the multimedia resource is classified, its feature vector needs to be determined, and a resource tag of the multimedia resource is then determined based on that feature vector, thereby achieving the purpose of classifying the multimedia resource. The multimedia resource to be classified may be a food resource, a lifestyle resource, or another type of resource, which is not limited in the embodiments of the present application. The multimedia resource includes at least two sub-resources, which may be at least two of a text resource, a video resource, and an audio resource; this is likewise not limited in the embodiments of the present application.
In a possible implementation manner, the multimedia resource may be a multimedia resource uploaded by a user, a multimedia resource acquired from the internet, or a multimedia resource stored in the storage space of the terminal device.
In step 202, a feature vector corresponding to each of the sub-resources is determined, and the feature vector is used to characterize the sub-resources.
In a possible implementation manner, the process of determining the feature vector corresponding to each sub-resource includes: inputting the first sub-resource into a target vector determination model corresponding to the first sub-resource, where the target vector determination model is obtained by training an initial vector determination model with a sample sub-resource and the feature vector corresponding to the sample sub-resource, the sample sub-resource has the same resource type as the first sub-resource, and the first sub-resource is any one of the sub-resources included in the multimedia resource; and obtaining the feature vector corresponding to the first sub-resource based on an output result of the target vector determination model. The process of determining the feature vectors corresponding to the other sub-resources included in the multimedia resource is similar to that for the first sub-resource and is not repeated here.
When the resource type of the first sub-resource is a video resource, the resource type of the sample sub-resource is also a video resource; when the resource type of the first sub-resource is a text resource, the resource type of the sample sub-resource is also a text resource; when the resource type of the first sub-resource is an audio resource, the resource type of the sample sub-resource is also an audio resource.
Optionally, when the first sub-resource is a text resource, the text resource is input directly into the target vector determination model corresponding to the text resource. When the first sub-resource is an audio resource, the audio resource is input directly into the target vector determination model corresponding to the audio resource. When the first sub-resource is a video resource, a video image is intercepted from the video resource, and the video image is input into the target vector determination model corresponding to the video resource.
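As a minimal sketch of this per-type routing, the following Python (all names are hypothetical; the dict-shaped sub-resources and callable models are assumptions, since the embodiment does not fix a data structure) dispatches each sub-resource to the target vector determination model of its resource type:

```python
def capture_video_image(video_resource):
    """Hypothetical helper: intercept a representative video image from the
    video resource before it is handed to the vector determination model."""
    return video_resource["frames"][0]  # assumes decoded frames are available

def determine_feature_vector(sub_resource, target_vector_models):
    """Route a sub-resource to the target vector determination model that
    matches its resource type."""
    resource_type = sub_resource["type"]
    if resource_type == "video":
        # Video resources are not fed in whole; a video image is captured first.
        return target_vector_models["video"](capture_video_image(sub_resource))
    # Text and audio resources are input into their models directly.
    return target_vector_models[resource_type](sub_resource["data"])
```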
Before the first sub-resource is input into the target vector determination model corresponding to the first sub-resource, the target vector determination model corresponding to the first sub-resource needs to be obtained through training. The process of training to obtain the target vector determination model corresponding to the first sub-resource comprises the following steps: obtaining sample sub-resources, sample vectors corresponding to the sample sub-resources and an initial vector determination model; and inputting the sample sub-resources into the initial vector determination model to obtain a third feature vector corresponding to the sample sub-resources. Determining a third loss value between the third feature vector and the sample vector corresponding to the sample sub-resource; and in response to the third loss value not being larger than the second loss threshold value, taking the initial vector determination model as a target vector determination model corresponding to the first sub-resource. Wherein the resource type of the sample sub-resource is the same as the resource type of the first sub-resource.
In response to the third loss value being greater than the second loss threshold value, adjusting the initial vector determination model to obtain an adjusted vector determination model; inputting the sample sub-resources into the adjusted vector determination model to obtain a fourth feature vector corresponding to the sample sub-resources; determining a fourth loss value between the fourth feature vector and the sample vector corresponding to the sample sub-resource; and in response to the fourth loss value not being larger than the second loss threshold value, taking the adjusted vector determination model as a target vector determination model corresponding to the first sub-resource.
And in response to the fourth loss value still being larger than the second loss threshold, adjusting the initial vector determination model again until the loss value between the feature vector of the sample sub-resource obtained based on the vector determination model after being adjusted again and the sample vector of the sample sub-resource is not larger than the second loss threshold, and taking the vector determination model after being adjusted again as the target vector determination model corresponding to the first sub-resource.
The second loss threshold is set based on experience, or adjusted according to an application scenario, which is not limited in the embodiment of the present application. The initial vector determination model may be any type of model, and is not limited in this application. Illustratively, the initial vector determination model is a residual neural network (RESNET) model.
The process of determining the third loss value between the third feature vector and the sample vector corresponding to the sample sub-resource includes: calling a loss function to process the third feature vector and the sample vector corresponding to the sample sub-resource, to obtain the third loss value. The loss function may be any one of an absolute loss function, a log (logarithmic) loss function, a square loss function, an exponential loss function, and a hinge loss function, which is not limited in this embodiment of the present application.
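As an illustrative sketch of this threshold-driven training loop, the toy linear model, the square-loss choice, and the gradient-style adjustment below are all assumptions standing in for the actual initial vector determination model (e.g., a RESNET) and loss function:

```python
import numpy as np

class ToyLinearModel:
    """Hypothetical stand-in for the initial vector determination model."""
    def __init__(self, dim_in, dim_out, learning_rate=0.1):
        rng = np.random.default_rng(0)
        self.W = rng.normal(scale=0.1, size=(dim_out, dim_in))
        self.learning_rate = learning_rate

    def forward(self, x):
        return self.W @ x

    def adjust(self, x, target):
        # One gradient step on the square loss mean((Wx - target)^2).
        grad = 2.0 * np.outer(self.forward(x) - target, x) / target.size
        self.W -= self.learning_rate * grad

def train_vector_model(model, sample_sub_resource, sample_vector,
                       loss_threshold, max_steps=1000):
    """Adjust the model until the loss between its output feature vector and
    the sample vector is not greater than the loss threshold."""
    for _ in range(max_steps):
        feature_vector = model.forward(sample_sub_resource)
        loss = np.mean((feature_vector - sample_vector) ** 2)  # square loss
        if loss <= loss_threshold:
            break  # this model is taken as the target vector determination model
        model.adjust(sample_sub_resource, sample_vector)
    return model
```

The same loop shape applies to training the target activation model in step 203, with the sample vector replaced by the activation vector of the sample sub-resource.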
Optionally, taking the first sub-resource as a video resource as an example, the process of determining the feature vector corresponding to the video resource includes: intercepting a video image from the video resource and inputting the video image into the target vector determination model corresponding to the video resource, where that model is obtained by training an initial vector determination model with sample videos and the sample vectors corresponding to the sample videos; and obtaining the feature vector corresponding to the video resource based on an output result of the target vector determination model. Because the feature vector corresponding to the video resource is determined based on a video image in the video resource, it can represent the video resource.
Optionally, taking the first sub-resource as a text resource as an example, the process of determining the feature vector corresponding to the text resource includes: inputting the text resource into the target vector determination model corresponding to the text resource, where that model is obtained by training an initial vector determination model with sample texts and the sample vectors corresponding to the sample texts; and obtaining the feature vector corresponding to the text resource based on an output result of the target vector determination model. Because the feature vector corresponding to the text resource is determined based on the text resource itself, it can represent the text resource.
It should be noted that, when the first sub-resource is a resource of another type, the process of determining the feature vector corresponding to the first sub-resource is similar to the process of determining the feature vector corresponding to the video resource and the process of determining the feature vector corresponding to the text resource, and details are not repeated here.
In step 203, the features corresponding to the partial regions are extracted from the feature vectors corresponding to the sub-resources, so as to obtain the activation vectors corresponding to the sub-resources.
Optionally, the process of extracting features corresponding to the partial region from the feature vectors corresponding to the respective sub-resources to obtain the activation vectors corresponding to the respective sub-resources includes: inputting the feature vector corresponding to the first sub-resource into a target activation model corresponding to the first sub-resource, wherein the target activation model is obtained by training an initial activation model through a sample sub-resource and an activation vector of the sample sub-resource, the sample sub-resource is the same as the first sub-resource in resource type, and the first sub-resource is any one of the sub-resources included in the multimedia resource. And obtaining an activation vector corresponding to the first sub-resource based on an output result of the target activation model corresponding to the first sub-resource.
The determination process of the target activation model corresponding to other sub-resources included in the multimedia resource is similar to the determination process of the target activation model corresponding to the first sub-resource, and is not described herein again.
Before inputting the feature vector corresponding to the first sub-resource into the target activation model corresponding to the first sub-resource, the target activation model corresponding to the first sub-resource needs to be obtained through training. The process of training to obtain the target activation model corresponding to the first sub-resource comprises the following steps: acquiring sample sub-resources, activation vectors of the sample sub-resources and an initial activation model; inputting the sample sub-resources into an initial activation model to obtain a first feature vector of the sample sub-resources; determining a first loss value between an activation vector and a first feature vector of a sample sub-resource; in response to the first loss value not being greater than the first loss threshold, the initial activation model is taken as the target activation model.
Responding to the first loss value being larger than a first loss threshold value, adjusting the initial activation model to obtain an adjusted activation model; and inputting the sample sub-resources into the adjusted activation model to obtain a second feature vector of the sample sub-resources. Determining a second loss value between the activation vector and a second feature vector of the sample sub-resource; and in response to the second loss value not being greater than the first loss threshold, taking the adjusted activation model as the target activation model.
In response to the second loss value still being larger than the first loss threshold, the activation model is adjusted again until the loss value between the feature vector of the sample sub-resource obtained from the re-adjusted activation model and the activation vector of the sample sub-resource is not larger than the first loss threshold, and the re-adjusted activation model is taken as the target activation model.
The first loss threshold is set based on experience, or adjusted according to the application scenario, which is not limited in the embodiment of the present application. The initial activation model may be any type of model, which is not limited in this application. Illustratively, the initial activation model may be a RESNET model, a BERT (Bidirectional Encoder Representations from Transformers) model, or an LSTM (Long Short-Term Memory) network model.
Optionally, taking the first sub-resource as a video resource as an example, the process of extracting features corresponding to the partial region from the feature vector corresponding to the video resource to obtain the activation vector corresponding to the video resource includes: inputting the feature vector corresponding to the video resource into the target activation model corresponding to the video resource, where that model is obtained by training an initial activation model with sample videos and the activation vectors of the sample videos; and obtaining the activation vector corresponding to the video resource based on an output result of the target activation model. Because the activation vector corresponding to the video resource is a feature vector obtained by focusing on a partial region of the video resource, it is more targeted and can therefore express the video resource better.
Optionally, features corresponding to the partial region are extracted from the feature vector corresponding to the video resource based on the following formula (1), to obtain the activation vector Z1 corresponding to the video resource:

Z1 = σ(W1^T X1 + B1)   (1)

In formula (1), σ is the activation function, W1 is a parameter mapping matrix, the superscript T denotes the matrix transpose, X1 is the feature vector corresponding to the video resource, and B1 is a bias vector.
Optionally, taking the first sub-resource as a text resource as an example, the process of extracting features corresponding to the partial region from the feature vector corresponding to the text resource to obtain the activation vector corresponding to the text resource includes: inputting the feature vector corresponding to the text resource into the target activation model corresponding to the text resource, where that model is obtained by training an initial activation model with sample texts and the activation vectors of the sample texts; and obtaining the activation vector corresponding to the text resource based on an output result of the target activation model. Because the activation vector corresponding to the text resource is a feature vector obtained by focusing on some characters of the text resource, it is more targeted and can therefore express the text resource better.
Optionally, features corresponding to the partial region are extracted from the feature vector corresponding to the text resource based on the following formula (2), to obtain the activation vector Z2 corresponding to the text resource:

Z2 = σ(W2^T X2 + B2)   (2)

In formula (2), σ is the activation function, W2 is a parameter mapping matrix, the superscript T denotes the matrix transpose, X2 is the feature vector corresponding to the text resource, and B2 is a bias vector.
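Formulas (1) and (2) share the same form, so one NumPy sketch covers both; the sigmoid choice for σ is an assumption, since the embodiment only calls σ an activation function:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def activation_vector(W, X, B, sigma=sigmoid):
    """Z = sigma(W^T X + B): extract the features corresponding to a partial
    region from the feature vector X, yielding the activation vector Z."""
    return sigma(W.T @ X + B)

# Usage mirroring formulas (1) and (2):
#   Z1 = activation_vector(W1, X1, B1)  # video resource
#   Z2 = activation_vector(W2, X2, B2)  # text resource
```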
It should be noted that, when the first sub-resource is a sub-resource of another type, the process of determining the activation vector corresponding to the first sub-resource is similar to the process of determining the activation vector corresponding to the video resource and the process of determining the activation vector corresponding to the text resource, and details are not repeated here.
In step 204, a target feature vector is determined based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource; the target feature vector is used to represent the multimedia resource and for resource recommendation.
In a possible implementation manner, the process of determining the target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource includes: determining an intermediate feature vector based on an activation vector corresponding to the first sub-resource and a feature vector corresponding to the second sub-resource, wherein the intermediate feature vector is obtained after the second sub-resource is fused on the basis of the first sub-resource, the number of the intermediate feature vector is the same as that of the sub-resources, the first sub-resource is any one of the sub-resources included in the multimedia resource, and the second sub-resource is a sub-resource except the first sub-resource in the sub-resources included in the multimedia resource. Based on the intermediate feature vector, a target feature vector is determined.
Illustratively, the number of sub-resources included in the multimedia resource is 2, and the number of the determined intermediate feature vectors is 2. If the number of the sub-resources included in the multimedia resource is 3, the number of the determined intermediate feature vectors is 3.
The process of determining the intermediate feature vector based on the activation vector corresponding to the first sub-resource and the feature vector corresponding to the second sub-resource includes: multiplying, for each vector dimension, the value of the activation vector corresponding to the first sub-resource by the value of the same vector dimension in the feature vector corresponding to the second sub-resource, to obtain a reference feature vector; and adding the values of the vector dimensions in the reference feature vector to obtain the intermediate feature vector.
Optionally, taking the case where the multimedia resource includes two sub-resources as an example, the intermediate feature vector f1 is determined according to the following formula (3), based on the activation vector corresponding to the first sub-resource and the feature vector corresponding to the second sub-resource:

f1 = Z1 * X2   (3)

In formula (3), Z1 is the activation vector corresponding to the first sub-resource, X2 is the feature vector corresponding to the second sub-resource, and * denotes dot (element-wise) multiplication.
In a possible implementation manner, after the intermediate feature vectors are determined, the process of determining the target feature vector based on them includes: splicing the intermediate feature vectors to obtain the target feature vector.
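For the two-sub-resource case, the fusion of formula (3) and the splicing step might be sketched as follows; element-wise multiplication is assumed for the * of formula (3), per the description of multiplying values of the same vector dimension:

```python
import numpy as np

def intermediate_feature_vector(activation, feature):
    """f = Z * X (formula (3)): multiply the values of the same vector
    dimension of the activation vector and the feature vector."""
    return activation * feature  # element-wise ("dot") multiplication

def target_feature_vector(act_first, feat_first, act_second, feat_second):
    """Fuse each sub-resource onto the other, then splice the two
    intermediate feature vectors into the target feature vector."""
    f1 = intermediate_feature_vector(act_first, feat_second)
    f2 = intermediate_feature_vector(act_second, feat_first)
    return np.concatenate([f1, f2])
```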
Optionally, after the target feature vector of the multimedia resource is determined, resource recommendation may be performed based on the target feature vector, that is, a resource tag of the multimedia resource is determined based on the target feature vector. And then resource recommendation is carried out based on the resource label of the multimedia resource. The process of determining the resource label of the multimedia resource comprises the following steps: and acquiring at least two labels of the multimedia resource, and determining the feature vector corresponding to the label of the multimedia resource. And determining the resource label corresponding to the multimedia resource based on the target characteristic vector of the multimedia resource and the characteristic vector corresponding to the label of the multimedia resource.
The tags corresponding to the multimedia resource are the tags corresponding to the sub-resources included in the multimedia resource. The process of obtaining the tag corresponding to a sub-resource includes: inputting the first sub-resource into a label determination model, and obtaining the tag corresponding to the first sub-resource based on an output result of the label determination model. The first sub-resource is any one of the sub-resources included in the multimedia resource. Optionally, the label determination model may be any type of model, which is not limited in this embodiment. Illustratively, the label determination model may be a RESNET model or an LSTM model.
The process of determining the feature vector corresponding to the tag of a sub-resource includes: inputting the tag corresponding to the sub-resource into a vector determination model, and obtaining the feature vector corresponding to the tag based on an output result of the vector determination model. Optionally, the vector determination model may be any type of model, which is not limited in this application. Illustratively, the vector determination model may be an LSTM model.
When the sub-resource is a video resource, the process of acquiring the tag corresponding to the video resource includes: inputting the video resource into the label determination model, and obtaining the tag corresponding to the video resource based on an output result of the label determination model. Optionally, the label determination model may be any type of model, which is not limited in this embodiment. Illustratively, the label determination model is a RESNET model. The process of determining the feature vector corresponding to the tag includes: inputting the tag into the vector determination model, and obtaining the feature vector corresponding to the tag based on an output result of the vector determination model.
Taking the first sub-resource as a video resource as an example, fig. 3 is a schematic diagram of the process for determining the tags corresponding to a video resource provided in an embodiment of the present application. In fig. 3, the video resource is input into a RESNET model, and the tags obtained for the video resource are "beauty" and "pig trotter".
When the sub-resource is a text resource, the process of obtaining the tag corresponding to the text resource includes: inputting the text resource into the label determination model, and obtaining the tag corresponding to the text resource based on an output result of the label determination model. Optionally, the label determination model may be any type of model, which is not limited in this embodiment. Illustratively, the label determination model is an LSTM model. The process of determining the feature vector corresponding to the tag includes: inputting the tag into the vector determination model, and obtaining the feature vector corresponding to the tag based on an output result of the vector determination model.
Fig. 4 is a schematic diagram illustrating the process for determining the tag corresponding to a text resource according to an embodiment of the present application. In fig. 4, the text resource "They say eating meat can get you onto the trending searches, so I will give it a try too." is input into the LSTM model, and the tag obtained for the text resource is "food".
The process of determining a resource label for a multimedia resource comprises: and determining the matching degree of the label of the multimedia resource and the multimedia resource based on the feature vector corresponding to the label of the multimedia resource and the target feature vector. And taking the label of which the matching degree with the multimedia resource meets the matching requirement in the labels of the multimedia resources as the resource label of the multimedia resources.
The label with the matching degree meeting the matching requirement may be the label with the highest matching degree, or may be the label with the matching degree greater than the matching threshold, which is not limited in the embodiment of the present application. Optionally, the matching threshold is set based on experience, and may also be adjusted according to an application scenario, which is not limited in the embodiment of the present application. Illustratively, the matching threshold is 85.
Optionally, the matching degree S1 between a tag corresponding to the multimedia resource and the multimedia resource is determined according to the following formula (4), based on the feature vector corresponding to the tag and the target feature vector:

S1 = Y1·Z1 + Y2·Z2 + … + Yn·Zn   (4)

In formula (4), Y1 is the value of the first vector dimension in the target feature vector, and Z1 is the value of the first vector dimension in the feature vector corresponding to the tag; Y2 and Z2 are the values of the second vector dimension in the target feature vector and the tag's feature vector, respectively; and Yn and Zn are the values of the nth vector dimension in the two vectors.
It should be noted that other ways may also be selected to determine the matching degree between the label corresponding to the multimedia resource and the multimedia resource, which is not limited in this embodiment of the application.
Illustratively, the tags corresponding to the multimedia resource include "beauty", "pig trotter", and "food", and the tag whose matching degree meets the requirement is the tag with the highest matching degree. The matching degree of the tag "beauty" with the multimedia resource is 70, that of the tag "pig trotter" is 90, and that of the tag "food" is 80. The tag "pig trotter" has the highest matching degree with the multimedia resource, so the tag "pig trotter" is taken as the resource tag of the multimedia resource.
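Reading formula (4) as the sum of per-dimension products, the tag-selection step might be sketched as follows; the "highest matching degree wins" rule from the example above is used, and the dict-of-tag-vectors shape is an assumption:

```python
import numpy as np

def matching_degree(target_vector, tag_vector):
    """S = Y1*Z1 + Y2*Z2 + ... + Yn*Zn over matching vector dimensions."""
    return float(np.dot(target_vector, tag_vector))

def select_resource_tag(target_vector, tag_vectors):
    """Take the tag whose matching degree with the multimedia resource is
    highest as the resource tag of the multimedia resource."""
    return max(tag_vectors,
               key=lambda tag: matching_degree(target_vector, tag_vectors[tag]))
```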
In a possible implementation manner, after the resource tag of the multimedia resource is determined, the multimedia resource and the resource tag of the multimedia resource are correspondingly stored. Optionally, the multimedia resource and the resource tag of the multimedia resource may be stored in a storage space of the terminal device, or the multimedia resource and the resource tag of the multimedia resource may be sent to the server, and the server stores the multimedia resource and the resource tag of the multimedia resource in the storage space of the server. The embodiment of the application does not limit the storage positions of the multimedia resources and the resource tags of the multimedia resources.
Optionally, when the multimedia resource and the resource tag of the multimedia resource are stored, a KEY-VALUE storage form may be used: the multimedia resource is the KEY and the resource tag of the multimedia resource is the VALUE, or the multimedia resource is the VALUE and the resource tag of the multimedia resource is the KEY. This is not limited in the embodiments of the present application.
In a possible implementation manner, after the multimedia resource and the resource tag of the multimedia resource are correspondingly stored, a content recommendation request can be received, where the content recommendation request carries the recommendation tag. And determining the multimedia resource to be recommended based on the recommendation label, the multimedia resource and the resource label of the multimedia resource, and recommending the multimedia resource to be recommended.
The process of determining the multimedia resource to be recommended based on the recommendation label, the multimedia resource and the resource label of the multimedia resource comprises the following steps: and taking the multimedia resource with the resource label consistent with the recommended label as the multimedia resource to be recommended. Or determining the similarity between the resource label and the recommended label, and taking the multimedia resource corresponding to the resource label with the similarity meeting the similarity threshold as the multimedia resource to be recommended. Optionally, the similarity threshold is set based on experience, or adjusted according to an implementation environment, which is not limited in this application. Illustratively, the similarity threshold is 90.
Illustratively, the recommendation tag carried in the content recommendation request is "pig trotter", the resource tag corresponding to the first multimedia resource is "pig trotter", the resource tag corresponding to the second multimedia resource is "pig foot", and the resource tag corresponding to the third multimedia resource is "pork". The resource tag corresponding to the first multimedia resource is consistent with the recommendation tag carried by the content recommendation request, so the first multimedia resource is taken as the multimedia resource to be recommended.
For another example, the recommendation tag carried in the content recommendation request is "pig trotter", the resource tag corresponding to the first multimedia resource is "pig trotter", the resource tag corresponding to the second multimedia resource is "pig foot", and the resource tag corresponding to the third multimedia resource is "pork". The similarity between "pig trotter" and "pig trotter" is 100, the similarity between "pig foot" and "pig trotter" is 95, and the similarity between "pork" and "pig trotter" is 70. The similarity threshold is 90; since the similarities for "pig trotter" and "pig foot" exceed 90, the first multimedia resource and the second multimedia resource are taken as the multimedia resources to be recommended.
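Both selection strategies (exact tag match, or tag similarity meeting the similarity threshold) can be sketched in one pass; the similarity function is a placeholder assumption, since the embodiment does not fix how tag similarity is computed:

```python
def resources_to_recommend(recommendation_tag, stored_resources,
                           similarity, similarity_threshold=90):
    """stored_resources: mapping of multimedia resource id -> resource tag
    (the KEY-VALUE storage described above). A resource is picked if its tag
    equals the recommendation tag or its tag similarity meets the threshold."""
    picked = []
    for resource_id, resource_tag in stored_resources.items():
        if (resource_tag == recommendation_tag
                or similarity(resource_tag, recommendation_tag) >= similarity_threshold):
            picked.append(resource_id)
    return picked
```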
The target feature vector of the multimedia resource determined by the above method is obtained by fusing the features of each sub-resource included in the multimedia resource, so the determined target feature vector has higher accuracy and a higher matching degree with the multimedia resource, and can express the multimedia resource better. Because the accuracy of the determined target feature vector is higher, the accuracy of the resource tag determined based on it is also improved, and the multimedia resource can be recommended better.
Fig. 5 is a schematic diagram illustrating a resource processing procedure according to an embodiment of the present application. In fig. 5, the multimedia resource includes two sub-resources: a video resource and a text resource. The video resource is input into the target vector determination model (a RESNET model) corresponding to the video resource to obtain the feature vector corresponding to the video resource. The text resource "They say eating meat can get you onto the trending searches, so I will give it a try too." is input into the target vector determination model (an LSTM model) corresponding to the text resource to obtain the feature vector corresponding to the text resource. The feature vector corresponding to the text resource is input into the activation model (module 1) corresponding to the text resource to obtain the activation vector corresponding to the text resource, and the feature vector corresponding to the video resource is input into the activation model (module 2) corresponding to the video resource to obtain the activation vector corresponding to the video resource. A first intermediate feature vector is determined based on the activation vector corresponding to the video resource and the feature vector corresponding to the text resource; it is the feature vector obtained by fusing the video resource on the basis of the text resource. A second intermediate feature vector is determined based on the activation vector corresponding to the text resource and the feature vector corresponding to the video resource; it is the feature vector obtained by fusing the text resource on the basis of the video resource. The first intermediate feature vector and the second intermediate feature vector are spliced to obtain the target feature vector corresponding to the multimedia resource. Because the two intermediate feature vectors are generated jointly from the vectors of the video resource and the text resource, the features of each sub-resource are fused, enabling retrieval across pictures and text.
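Putting the pieces of fig. 5 together, a hedged end-to-end pass over one video-plus-text resource could reuse the activation_vector and target_feature_vector helpers sketched above; every shape and parameter here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 128

# Stand-ins for the outputs of the RESNET and LSTM vector determination models.
X1 = rng.normal(size=dim)  # feature vector of the video resource
X2 = rng.normal(size=dim)  # feature vector of the text resource

# Stand-ins for the parameters of activation module 2 (video) and module 1 (text).
W1, B1 = rng.normal(scale=0.1, size=(dim, dim)), rng.normal(size=dim)
W2, B2 = rng.normal(scale=0.1, size=(dim, dim)), rng.normal(size=dim)

Z1 = activation_vector(W1, X1, B1)  # activation vector of the video resource
Z2 = activation_vector(W2, X2, B2)  # activation vector of the text resource

# The first intermediate vector fuses the video onto the text, the second the
# reverse; splicing them yields the target feature vector of the resource.
target = target_feature_vector(Z1, X1, Z2, X2)
assert target.shape == (2 * dim,)
```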
Optionally, the video resource tags and the text resource tags may also be obtained, and the video resource tags and the text resource tags are used as the multimedia resource tags, where the number of the multimedia resource tags is at least two. And determining a feature vector corresponding to the label of the multimedia resource. And determining the matching degree between the label of the multimedia resource and the multimedia resource based on the feature vector corresponding to the label of the multimedia resource and the target feature vector corresponding to the multimedia resource. And taking the label of which the matching degree with the multimedia resource meets the matching requirement in the labels of the multimedia resources as the resource label of the multimedia resources. And correspondingly storing the multimedia resources and the resource labels of the multimedia resources.
Optionally, a resource recommendation request may also be received, where the resource recommendation request carries a recommendation tag. And determining the multimedia resource to be recommended based on the recommendation label, the resource label of the multimedia resource and the multimedia resource, and then recommending the multimedia resource to be recommended.
Fig. 6 is a schematic structural diagram of a resource processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:
an obtaining module 601, configured to obtain a multimedia resource, where the multimedia resource includes at least two sub-resources;
a determining module 602, configured to determine a feature vector corresponding to each sub-resource, where the feature vector is used to characterize the sub-resource;
an extracting module 603, configured to extract features corresponding to the partial region from the feature vectors corresponding to each sub-resource, so as to obtain an activation vector corresponding to each sub-resource;
the determining module 602 is further configured to determine a target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, where the target feature vector is used to represent the multimedia resource and for resource recommendation.
In a possible implementation manner, the determining module 602 is configured to determine an intermediate feature vector based on an activation vector corresponding to a first sub-resource and a feature vector corresponding to a second sub-resource, where the intermediate feature vector is a feature vector obtained after a second sub-resource is fused on the basis of the first sub-resource, the number of the intermediate feature vector is the same as the number of the sub-resources, the first sub-resource is any one of the sub-resources included in the multimedia resource, and the second sub-resource is a sub-resource other than the first sub-resource among the sub-resources included in the multimedia resource; based on the intermediate feature vector, a target feature vector is determined.
In a possible implementation manner, the determining module 602 is configured to multiply, dimension by dimension, the values of the activation vector corresponding to the first sub-resource and the values of the feature vector corresponding to the second sub-resource to obtain a reference feature vector, and to add the values of the vector dimensions in the reference feature vector to obtain the intermediate feature vector; a hedged sketch of this computation follows below.
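A sketch of the multiply-then-add step. Read literally, summing all dimensions of the reference feature vector collapses it to a scalar (a weighted dot product); if the second sub-resource instead contributes one feature vector per region, the same recipe yields an intermediate vector via attention-style pooling. Both readings are shown; the region-based variant is an assumption:

```python
import torch

# Literal reading: per-dimension products, then a sum over dimensions.
activation = torch.tensor([0.9, 0.1, 0.5])  # activation vector, first sub-resource
feature = torch.tensor([1.0, 2.0, 3.0])     # feature vector, second sub-resource
reference = activation * feature            # reference feature vector
intermediate = reference.sum()              # scalar: weighted dot product

# Region-based reading (assumed): weight per-region features, sum over regions.
region_feats = torch.randn(4, 8)                 # 4 regions, 8-dim features
weights = torch.softmax(torch.randn(4), dim=0)   # activation over regions
intermediate_vec = (weights[:, None] * region_feats).sum(dim=0)  # shape (8,)
```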
In a possible implementation manner, the extracting module 603 is configured to input the feature vector corresponding to a first sub-resource into a target activation model corresponding to the first sub-resource, where the target activation model is obtained by training an initial activation model with a sample sub-resource and an activation vector of the sample sub-resource, the sample sub-resource is of the same resource type as the first sub-resource, and the first sub-resource is any one of the sub-resources included in the multimedia resource; and to obtain an activation vector corresponding to the first sub-resource based on an output result of the target activation model corresponding to the first sub-resource.
In a possible implementation manner, the obtaining module 601 is further configured to obtain tags of multimedia resources, where the number of the tags of the multimedia resources is at least two;
the determining module 602 is further configured to determine a feature vector corresponding to each tag of the multimedia resource; determine the matching degree between each tag and the multimedia resource based on the feature vector corresponding to the tag and the target feature vector; and take, among the tags of the multimedia resource, a tag whose matching degree with the multimedia resource satisfies the matching requirement as a resource tag of the multimedia resource, as illustrated in the sketch below.
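The patent does not fix the similarity measure for the matching degree; the sketch below assumes cosine similarity between the tag's feature vector and the target feature vector, with a hypothetical threshold as the matching requirement:

```python
import torch
import torch.nn.functional as F

def select_resource_tags(tag_vectors, target_vector, threshold=0.6):
    """Return indices of tags whose matching degree meets the requirement.

    tag_vectors: (num_tags, dim) tag feature vectors
    target_vector: (dim,) target feature vector of the multimedia resource
    threshold: assumed matching requirement
    """
    sims = F.cosine_similarity(tag_vectors, target_vector.unsqueeze(0), dim=-1)
    return [i for i, s in enumerate(sims.tolist()) if s >= threshold]

tags = torch.randn(5, 512)
selected = select_resource_tags(tags, torch.randn(512))
```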
In one possible implementation, the apparatus further includes:
and the storage module is used for correspondingly storing the multimedia resources and the resource labels of the multimedia resources.
In one possible implementation, the apparatus further includes:
the receiving module is used for receiving a content recommendation request, and the content recommendation request carries a recommendation tag;
the determining module 602 is further configured to determine a multimedia resource to be recommended based on the recommendation tag, the multimedia resource, and a resource tag of the multimedia resource;
and the recommending module is used for recommending the multimedia resources to be recommended.
In a possible implementation manner, the determining module 602 is configured to input the first sub-resource into a target vector determination model corresponding to the first sub-resource, where the target vector determination model is obtained by training an initial vector determination model with a sample sub-resource and a feature vector corresponding to the sample sub-resource, the sample sub-resource is of the same resource type as the first sub-resource, and the first sub-resource is any one of the sub-resources included in the multimedia resource; and to obtain the feature vector corresponding to the first sub-resource based on an output result of the target vector determination model.
In a possible implementation manner, the first sub-resource is a video resource, and the determining module 602 is configured to capture video images from the video resource and input the video images into the target vector determination model corresponding to the video resource; a sketch of one frame-sampling policy follows below.
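The patent does not specify how video images are captured; the sketch below assumes OpenCV and a fixed one-frame-per-second sampling policy, both of which are illustrative choices:

```python
import cv2

def sample_frames(video_path, every_n_seconds=1.0):
    """Capture one frame per `every_n_seconds` from the video resource."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unavailable
    step = max(int(fps * every_n_seconds), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # BGR image array, fed to the vector model
        idx += 1
    cap.release()
    return frames
```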
In one possible implementation, the apparatus further includes:
the training module is used for acquiring the sample sub-resources, the activation vectors of the sample sub-resources, and the initial activation model; inputting the sample sub-resources into the initial activation model to obtain first feature vectors of the sample sub-resources; determining a first loss value between the activation vector of a sample sub-resource and the first feature vector; and, in response to the first loss value not being greater than a first loss threshold, taking the initial activation model as the target activation model.
In a possible implementation manner, the training module is further configured to adjust the initial activation model in response to the first loss value being greater than the first loss threshold, so as to obtain an adjusted activation model; input the sample sub-resources into the adjusted activation model to obtain second feature vectors of the sample sub-resources; determine a second loss value between the activation vector and the second feature vector of a sample sub-resource; and, in response to the second loss value not being greater than the first loss threshold, take the adjusted activation model as the target activation model. The sketch below generalizes this adjust-and-retest procedure into a loop.
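A minimal sketch of this threshold-gated training, assuming an MSE loss and gradient-based adjustment (the patent names neither); repeated adjustment is folded into a loop:

```python
import torch
import torch.nn as nn

def train_activation_model(model, sample, target_activation,
                           loss_threshold=1e-3, lr=1e-3, max_steps=1000):
    """Adjust `model` until its loss is not greater than `loss_threshold`."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # assumed loss; the patent does not specify one
    for _ in range(max_steps):
        predicted = model(sample)                     # feature vector of the sample
        loss = loss_fn(predicted, target_activation)  # loss vs. activation vector
        if loss.item() <= loss_threshold:
            break                                     # model becomes the target model
        optimizer.zero_grad()
        loss.backward()                               # adjust the activation model
        optimizer.step()
    return model

model = nn.Sequential(nn.Linear(256, 256), nn.Sigmoid())
trained = train_activation_model(model, torch.randn(8, 256), torch.rand(8, 256))
```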
The target feature vector of the multimedia resource determined by the apparatus is obtained by fusing the features of each sub-resource included in the multimedia resource. The determined target feature vector is therefore accurate, matches the multimedia resource closely, and expresses the multimedia resource well. Because the target feature vector is more accurate, determining the resource tag of the multimedia resource based on it also improves the accuracy of the determined resource tag, so that the multimedia resource can be recommended more effectively.
It should be understood that, when the apparatus provided in the above embodiments implements its functions, the division into the above functional modules is merely illustrative; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for their specific implementation processes, refer to the method embodiments, which are not repeated here.
Fig. 7 shows a block diagram of a terminal device 700 according to an exemplary embodiment of the present application. The terminal device 700 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal device 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal device 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement the resource handling methods provided by method embodiments herein.
In some embodiments, the terminal device 700 may further include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, display screen 705, camera assembly 706, audio circuitry 707, and power supply 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals: it converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminal devices via at least one wireless communication protocol. Supported networks and protocols include, but are not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wi-Fi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, it also has the ability to capture touch signals on or over its surface; a touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the terminal device 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal device 700 or in a foldable design; in still other embodiments, the display screen 705 may be a flexible display disposed on a curved or folded surface of the terminal device 700. The display screen 705 may even be arranged in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display screen 705 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. In general, the front camera is provided on the front panel of the terminal device 700 and the rear camera is provided on the rear panel. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined for a background blurring function, and the main camera and the wide-angle camera can be combined for panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash, which may be a monochrome-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 701 for processing, or to the radio frequency circuit 704 to implement voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be provided at different positions of the terminal device 700. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 707 may also include a headphone jack.
The power supply 709 is used to supply power to various components in the terminal device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 709 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal device 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal device 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the user with respect to the terminal device 700. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side bezel of the terminal device 700 and/or underneath the display screen 705. When the pressure sensor 713 is arranged on the side bezel of the terminal device 700, a holding signal applied by the user to the terminal device 700 can be detected, and the processor 701 performs left/right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the display screen 705, the processor 701 controls operability controls on the UI according to the user's press operations on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is adjusted down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
A proximity sensor 716, also called a distance sensor, is typically provided on the front panel of the terminal device 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the terminal device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal device 700 gradually decreases, the processor 701 controls the display screen 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that the distance gradually increases, the processor 701 controls the display screen 705 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the terminal device 700, which may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary greatly in configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where at least one program code is stored in the one or more memories 802 and is loaded and executed by the one or more processors 801 to implement the resource processing methods provided by the foregoing method embodiments. Of course, the server 800 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, there is also provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to cause a computer to implement any of the above-described resource processing methods.
Alternatively, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program or a computer program product having at least one computer instruction stored therein, the at least one computer instruction being loaded and executed by a processor to cause a computer to implement any of the resource processing methods described above.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the principles of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method for processing resources, the method comprising:
acquiring a multimedia resource, wherein the multimedia resource comprises at least two sub-resources;
determining a feature vector corresponding to each sub-resource, wherein the feature vector is used for representing the sub-resource;
extracting the features corresponding to partial areas from the feature vectors corresponding to the sub-resources respectively to obtain the activation vectors corresponding to the sub-resources respectively;
and determining a target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, wherein the target feature vector is used for representing the multimedia resource and for performing resource recommendation.
2. The method according to claim 1, wherein the determining a target feature vector based on the feature vector corresponding to each of the sub-resources and the activation vector corresponding to each of the sub-resources comprises:
determining an intermediate feature vector based on an activation vector corresponding to a first sub-resource and a feature vector corresponding to a second sub-resource, wherein the intermediate feature vector is obtained by fusing the second sub-resource on the basis of the first sub-resource, the number of intermediate feature vectors is the same as the number of sub-resources, the first sub-resource is any one of the sub-resources included in the multimedia resource, and the second sub-resource is a sub-resource, among the sub-resources included in the multimedia resource, other than the first sub-resource;
determining the target feature vector based on the intermediate feature vector.
3. The method of claim 2, wherein determining an intermediate feature vector based on the activation vector corresponding to the first sub-resource and the feature vector corresponding to the second sub-resource comprises:
multiplying the activation vector corresponding to the first sub-resource and the numerical value of the same vector dimension in the feature vector corresponding to the second sub-resource to obtain a reference feature vector;
and adding the numerical values of the vector dimensions in the reference feature vector to obtain the intermediate feature vector.
4. The method according to claim 1, wherein the extracting features corresponding to partial regions from the feature vectors corresponding to the sub-resources to obtain the activation vectors corresponding to the sub-resources respectively comprises:
inputting a feature vector corresponding to a first sub-resource into a target activation model corresponding to the first sub-resource, wherein the target activation model is obtained by training an initial activation model through a sample sub-resource and an activation vector of the sample sub-resource, the sample sub-resource is the same as the first sub-resource in resource type, and the first sub-resource is any sub-resource included in the multimedia resource;
and obtaining an activation vector corresponding to the first sub-resource based on an output result of the target activation model corresponding to the first sub-resource.
5. The method according to any of claims 1 to 4, wherein after the acquiring the multimedia resource, the method further comprises:
acquiring the tags of the multimedia resources, wherein the number of the tags of the multimedia resources is at least two;
determining a feature vector corresponding to the label of the multimedia resource;
after determining the target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, the method further includes:
determining the matching degree of the label of the multimedia resource and the multimedia resource based on the feature vector corresponding to the label of the multimedia resource and the target feature vector;
and taking the label of which the matching degree with the multimedia resource meets the matching requirement in the labels of the multimedia resources as the resource label of the multimedia resources.
6. The method according to claim 5, wherein after the tag whose matching degree with the multimedia resource satisfies the matching requirement is taken as the resource tag of the multimedia resource, the method further comprises:
and correspondingly storing the multimedia resource and the resource label of the multimedia resource.
7. The method of claim 6, wherein after storing the multimedia asset and the asset tag of the multimedia asset in correspondence, the method further comprises:
receiving a content recommendation request, wherein the content recommendation request carries a recommendation tag;
determining a multimedia resource to be recommended based on the recommendation label, the multimedia resource and a resource label of the multimedia resource;
and recommending the multimedia resource to be recommended.
8. The method according to any one of claims 1 to 4, wherein the determining the feature vector corresponding to each of the sub-resources comprises:
inputting a first sub-resource into a target vector determination model corresponding to the first sub-resource, wherein the target vector determination model is obtained by training an initial vector determination model through a sample sub-resource and a feature vector corresponding to the sample sub-resource, the sample sub-resource and the first sub-resource are of the same resource type, and the first sub-resource is any sub-resource included in the multimedia resource;
and obtaining a feature vector corresponding to the first sub-resource based on an output result of the target vector determination model.
9. The method of claim 8, wherein the first sub-resource is a video resource, and wherein the inputting the first sub-resource into the target vector determination model corresponding to the first sub-resource comprises:
intercepting video images in the video resources;
and inputting the video image into a target vector determination model corresponding to the video resource.
10. The method according to claim 4, wherein before inputting the feature vector corresponding to the first sub-resource into the target activation model corresponding to the first sub-resource, the method further comprises:
acquiring the sample sub-resources, the activation vectors of the sample sub-resources and the initial activation model;
inputting the sample sub-resources into the initial activation model to obtain first feature vectors of the sample sub-resources;
determining a first loss value between an activation vector of the sample sub-resources and the first feature vector;
in response to the first loss value not being greater than a first loss threshold, treating the initial activation model as the target activation model.
11. The method of claim 10, further comprising:
responding to the first loss value being larger than the first loss threshold value, adjusting the initial activation model to obtain an adjusted activation model;
inputting the sample sub-resources into the adjusted activation model to obtain a second feature vector of the sample sub-resources;
determining a second loss value between the activation vector and the second feature vector for the sample sub-resource;
in response to the second loss value not being greater than the first loss threshold, treating the adjusted activation model as the target activation model.
12. An apparatus for resource handling, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring multimedia resources which comprise at least two sub-resources;
the determining module is used for determining a feature vector corresponding to each sub-resource, and the feature vector is used for representing the sub-resources;
the extraction module is used for extracting features corresponding to partial regions from the feature vector corresponding to each sub-resource to obtain an activation vector corresponding to each sub-resource;
the determining module is further configured to determine a target feature vector based on the feature vector corresponding to each sub-resource and the activation vector corresponding to each sub-resource, where the target feature vector is used to represent the multimedia resource and to perform resource recommendation.
13. An electronic device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to cause the electronic device to implement the resource processing method according to any one of claims 1 to 11.
14. A computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to cause a computer to implement the resource processing method according to any one of claims 1 to 11.
15. A computer program product having stored therein at least one computer instruction which is loaded and executed by a processor to cause a computer to implement a resource handling method as claimed in any one of claims 1 to 11.
CN202210093667.5A 2022-01-26 2022-01-26 Resource processing method, device, equipment and computer readable storage medium Pending CN114417030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210093667.5A CN114417030A (en) 2022-01-26 2022-01-26 Resource processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114417030A true CN114417030A (en) 2022-04-29

Family

ID=81277034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210093667.5A Pending CN114417030A (en) 2022-01-26 2022-01-26 Resource processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114417030A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019141042A1 (en) * 2018-01-19 2019-07-25 北京达佳互联信息技术有限公司 Image classification method, device, and terminal
CN111858973A (en) * 2020-07-30 2020-10-30 北京达佳互联信息技术有限公司 Multimedia event information detection method, device, server and storage medium
CN112100437A (en) * 2020-09-10 2020-12-18 北京三快在线科技有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2021190174A1 (en) * 2020-03-24 2021-09-30 腾讯科技(深圳)有限公司 Information determining method and apparatus, computer device, and storage medium
CN113590849A (en) * 2021-01-27 2021-11-02 腾讯科技(深圳)有限公司 Multimedia resource classification model training method and multimedia resource recommendation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YASHAR DELDJOO et al.: "Recommender Systems Leveraging Multimedia Content", ACM COMPUTING SURVEYS, vol. 53, no. 5, 28 September 2020 (2020-09-28), pages 1-38 *

Similar Documents

Publication Publication Date Title
CN110097019B (en) Character recognition method, character recognition device, computer equipment and storage medium
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN110471858B (en) Application program testing method, device and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN112733970B (en) Image classification model processing method, image classification method and device
CN110675412A (en) Image segmentation method, training method, device and equipment of image segmentation model
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN113918767A (en) Video clip positioning method, device, equipment and storage medium
CN111753498A (en) Text processing method, device, equipment and storage medium
CN114691860A (en) Training method and device of text classification model, electronic equipment and storage medium
CN113343709B (en) Method for training intention recognition model, method, device and equipment for intention recognition
CN113361376B (en) Method and device for acquiring video cover, computer equipment and readable storage medium
CN113822916B (en) Image matching method, device, equipment and readable storage medium
CN114817709A (en) Sorting method, device, equipment and computer readable storage medium
CN111414496B (en) Artificial intelligence-based multimedia file detection method and device
CN115221888A (en) Entity mention identification method, device, equipment and storage medium
CN111652432A (en) Method and device for determining user attribute information, electronic equipment and storage medium
CN112487162A (en) Method, device and equipment for determining text semantic information and storage medium
CN112766389A (en) Image classification method, training method, device and equipment of image classification model
CN111782767A (en) Question answering method, device, equipment and storage medium
CN111597823A (en) Method, device and equipment for extracting central word and storage medium
CN114417030A (en) Resource processing method, device, equipment and computer readable storage medium
CN110795465B (en) User scale prediction method, device, server and storage medium
CN111581481B (en) Search term recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code: HK; Ref legal event code: DE; Ref document number: 40070945; Country of ref document: HK