CN114528496B - Multimedia data processing method, device, equipment and storage medium

Info

Publication number: CN114528496B
Application number: CN202210426196.5A
Authority: CN
Inventors: 李涛; 刘松; 刘峰; 龚千健; 许笑; 倪翔
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-22
Filing date: 2022-04-22
Publication date: 2022-07-08
Anticipated expiration: 2042-04-22
Also published as: CN114528496A

Abstract

The embodiments of the present application disclose a multimedia data processing method, apparatus, device and storage medium, applicable to fields such as artificial intelligence, blockchain and transportation. The method comprises: acquiring initial media feature vectors of a target object, a first associated object and a second associated object, respectively; performing first-order association processing on the initial media feature vector of the target object and the initial media feature vector of the first associated object to obtain a first associated media feature vector; performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector; generating a target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector; and pushing multimedia data to the target object according to the target media feature vector. The method and apparatus can improve the accuracy of multimedia data pushing.

Description

Multimedia data processing method, device, equipment and storage medium

Technical Field

The present application relates to fields such as artificial intelligence, blockchain, and cloud technology, and in particular to a multimedia data processing method, apparatus, device, and storage medium.

Background

With the development of big data, various multimedia platforms (such as audio and video platforms and game platforms) have successively introduced big-data-based personalized push algorithms to push multimedia data to users, making it more convenient for users to obtain multimedia data. When multimedia data is pushed, user information is generally generated from user characteristics such as gender, age, and educational background, and multimedia data that may interest the user is pushed according to that user information. In practice, such user information reflects the user's interest in multimedia data only one-sidedly, which results in low push accuracy.

Disclosure of Invention

The embodiments of the present application provide a multimedia data processing method, apparatus, device, and storage medium, which can improve the accuracy of multimedia data pushing.

An embodiment of the present application provides a multimedia data processing method, including:

respectively acquiring initial media feature vectors of a target object, a first associated object and a second associated object with respect to multimedia data; wherein the target object and the first associated object have a first-order association relationship, and the target object and the second associated object have a second-order association relationship;

performing first-order association processing on the initial media characteristic vector of the target object and the initial media characteristic vector of the first associated object to obtain a first associated media characteristic vector of the target object;

performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object;

generating a target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector, and pushing multimedia data for the target object according to the target media feature vector.
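The four steps above can be sketched as a small pipeline. This is only an illustrative sketch: the text at this point does not say how the first-order association processing, the second-order association processing, or the final generation step are computed, so element-wise averaging is used as a stand-in assumption, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty sequence of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def first_order_association(target: Vector,
                            first_neighbors: Sequence[Vector]) -> Vector:
    # Assumption: combine the target with its first-order associated
    # objects by plain averaging.
    return mean_vectors([target, *first_neighbors])


def second_order_association(target: Vector,
                             first_neighbors: Sequence[Vector],
                             second_neighbors: Sequence[Vector]) -> Vector:
    # Assumption: summarize first- and second-order associated objects
    # into one neighborhood vector, then average it with the target.
    neighborhood = mean_vectors([*first_neighbors, *second_neighbors])
    return mean_vectors([target, neighborhood])


def target_media_feature_vector(first_assoc: Vector,
                                second_assoc: Vector) -> Vector:
    # Assumption: fuse the two associated vectors by averaging.
    return mean_vectors([first_assoc, second_assoc])


# Toy two-dimensional vectors (hypothetical values).
target = [1.0, 0.0]
first_neighbors = [[1.0, 1.0]]
second_neighbors = [[0.0, 1.0]]

first_assoc = first_order_association(target, first_neighbors)
second_assoc = second_order_association(target, first_neighbors, second_neighbors)
fused = target_media_feature_vector(first_assoc, second_assoc)
```

Only the data flow matters here: the first-order step takes two inputs, the second-order step takes three, and the final vector depends on both associated vectors. The averaging itself is a placeholder for whatever association processing an embodiment uses.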

An embodiment of the present application provides a multimedia data processing apparatus, including:

an acquisition module, configured to respectively acquire initial media feature vectors of a target object, a first associated object and a second associated object with respect to multimedia data; wherein the target object and the first associated object have a first-order association relationship, and the target object and the second associated object have a second-order association relationship;

the processing module is used for performing first-order association processing on the initial media characteristic vector of the target object and the initial media characteristic vector of the first associated object to obtain a first associated media characteristic vector of the target object; performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object;

and the generating module is used for generating a target media characteristic vector of the target object according to the first associated media characteristic vector and the second associated media characteristic vector, and pushing multimedia data for the target object according to the target media characteristic vector.

An aspect of the embodiments of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

An aspect of the embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method.

An aspect of an embodiment of the present application provides a computer program product, which includes a computer program that, when being executed by a processor, implements the steps of the method.

In the present application, the computer device may acquire initial media feature vectors of the target object, the first associated object, and the second associated object, respectively, where an initial media feature vector reflects the basic attribute information and the significant media tags of an object. The initial media feature vector of the target object and the initial media feature vector of the first associated object may then undergo first-order association processing to obtain a first associated media feature vector, and the initial media feature vectors of the target object, the first associated object, and the second associated object may undergo second-order association processing to obtain a second associated media feature vector. The first-order and second-order association processing can mine potential media tags of the target object; that is, the first associated media feature vector and the second associated media feature vector reflect not only the basic attribute information and significant media tags of the target object but also its potential media tags. Because the target media feature vector generated from the first and second associated media feature vectors reflects this richer set of media tags, pushing multimedia data to the target object according to the target media feature vector can improve push accuracy and achieve accurate pushing of multimedia data.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a block diagram of a multimedia data processing system according to the present application;

FIG. 2 is a flow chart illustrating a multimedia data processing method provided herein;

FIG. 3 is a schematic diagram of a scene for obtaining a target media feature vector of a target object according to the present application;

FIG. 4 is a flow chart illustrating a multimedia data processing method provided herein;

FIG. 5 is a schematic diagram of a scenario of a training process for a candidate media recognition model provided in the present application;

FIG. 6 is a schematic diagram of a scenario for obtaining predicted media feature vectors of target sample objects based on a graph neural network model provided in the present application;

FIG. 7 is a block diagram of a multimedia data processing apparatus according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The present application relates to artificial intelligence, and mainly to machine learning technology within artificial intelligence. Machine learning is used to perform first-order association processing on the initial media feature vector of a target object and the initial media feature vector of a first associated object to obtain a first associated media feature vector of the target object, to perform second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the first associated object and the initial media feature vector of a second associated object to obtain a second associated media feature vector of the target object, and to determine the target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector. In other words, machine learning allows more potential media tags of the target object to be mined, which improves the accuracy of the media feature vector obtained for the target object. Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

In connection with blockchain technology, for example, a computer device may store the basic attribute information and historical multimedia behavior data of a user in a blockchain network. When the computer device needs to acquire the data, a node device in the blockchain network verifies the validity of the computer device; once the node device determines that the computer device is valid, the computer device may read the data from the blockchain network, which improves the security of user data.

To facilitate a clearer understanding of the present application, a multimedia data processing system that implements the multimedia data processing method of the present application is introduced first. As shown in FIG. 1, the multimedia data processing system includes a server 10 and a terminal cluster; the terminal cluster may include one or more terminals, and the number of terminals is not limited here. As shown in FIG. 1, the terminal cluster may specifically include terminal 1, terminal 2, …, and terminal n. It is understood that terminal 1, terminal 2, terminal 3, …, and terminal n may each be connected to the server 10 over a network, so that each terminal can exchange data with the server 10 through the network connection.

The terminal is equipped with a multimedia platform for providing multimedia data to the user; the kinds of multimedia platform are not limited here. The terminal can push multimedia data to the user according to the target media feature vector of the user. Here, an associated user is a user who has an association relationship with the user. The association relationship may include a friend relationship, a kinship relationship, a colleague relationship, and the like. A friend relationship may refer to a relationship between users who follow and collect the same type of multimedia data, or to a relationship between users who belong to the same communication group. The association relationship may include a first-order association relationship and a second-order association relationship, which may be determined according to the affinity between users. The affinity may be determined according to information such as the interaction frequency, the follow duration, and the association relationship category between users. An interaction may be an operation such as playing, liking, or collecting multimedia data published by another user, or an exchange of information on a social platform. The association relationship category includes direct association and indirect association. For example, for user A, a user having a direct association relationship with user A is a user in user A's address book, such as user B; a user having an indirect association relationship with user A is a user who is not in user A's address book but is in user B's address book.
For example, for user A, a user having a first-order association relationship with user A may be a user whose affinity value with user A is greater than an affinity threshold, and a user having a second-order association relationship with user A may be a user whose affinity value with user A is less than or equal to the affinity threshold.
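The affinity-threshold rule in the preceding example can be sketched as follows. The affinity scores, the threshold value, and the function name are hypothetical; how affinity is computed from interaction frequency, follow duration, and relationship category is not specified here and is outside this sketch.

```python
from typing import Dict, List, Tuple


def split_by_affinity(affinity: Dict[str, float],
                      threshold: float) -> Tuple[List[str], List[str]]:
    """Split associated users into first-order and second-order groups.

    A user whose affinity is strictly greater than the threshold is treated
    as first-order; a user whose affinity is less than or equal to the
    threshold is treated as second-order.
    """
    first_order = [user for user, value in affinity.items() if value > threshold]
    second_order = [user for user, value in affinity.items() if value <= threshold]
    return first_order, second_order


# Hypothetical affinity values between user A and its associated users.
affinity_to_a = {"B": 0.9, "C": 0.5, "D": 0.2}
first_order, second_order = split_by_affinity(affinity_to_a, threshold=0.5)
```

With a threshold of 0.5, user B falls in the first-order group, while users C and D (affinity at or below the threshold) fall in the second-order group.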

It is understood that multimedia data refers to different specific content on different multimedia platforms. For example, on a game application download platform, multimedia data may refer to a game application, such as a stand-alone game, an online game, a mobile game, or a mini game; on a short video platform, multimedia data may refer to a video clip; on an audio and video playing platform, multimedia data may be film and television works, audio data, and the like; on a shopping platform, multimedia data may refer to a product or service sold on the platform; and on a content distribution platform, multimedia data may refer to a literary work, a news item, a travel story, and the like.

It is understood that the target media feature vector may be generated according to the user's own initial media feature vector and the initial media feature vectors of the associated users. An initial media feature vector includes basic feature values reflecting the user's basic attribute information and media feature values reflecting the user's L media tags. The basic attribute information may include inherent attributes of the user, such as gender, age, academic qualifications, education level, and address. The media feature value corresponding to a media tag is either a target value or a non-target value: a target value means that the user has the media tag, that is, the user is interested in the media data corresponding to the media tag; a non-target value means that the user does not have the media tag, that is, the user is not interested in the media data corresponding to the media tag. The media feature value corresponding to a media tag is determined according to the user's historical multimedia behavior data, which includes behavior data for operations such as downloading, playing, following, collecting, and commenting on multimedia data. The initial media feature vector of a user therefore reflects which media tags the user has. Because the media feature values in the initial media feature vector are obtained directly from the user's historical multimedia behavior data, the initial media feature vector can also be said to reflect the user's significant media tags, that is, the media tags obtained directly from the user's historical multimedia behavior data.
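A minimal sketch of how such an initial media feature vector might be laid out is given below. The numeric encoding of the basic attributes, the choice of 1.0 as the target value and 0.0 as the non-target value, the tag names, and the function name are all assumptions made for illustration; the text does not fix any of them.

```python
from typing import List, Sequence, Set

TARGET_VALUE = 1.0      # assumed encoding: the user has the media tag
NON_TARGET_VALUE = 0.0  # assumed encoding: the user does not have the media tag


def initial_media_feature_vector(basic_values: Sequence[float],
                                 tag_vocabulary: Sequence[str],
                                 observed_tags: Set[str]) -> List[float]:
    """Concatenate basic feature values with one media feature value per tag.

    `observed_tags` stands for the media tags derived from the user's
    historical multimedia behavior data (downloads, plays, follows, ...).
    """
    media_values = [
        TARGET_VALUE if tag in observed_tags else NON_TARGET_VALUE
        for tag in tag_vocabulary
    ]
    return [*basic_values, *media_values]


# Hypothetical user: age 14, gender encoded as 1.0, interested in game_Y2 only.
vocabulary = ["game_Y1", "game_Y2", "game_Y3"]  # L = 3 media tags
vector = initial_media_feature_vector([14.0, 1.0], vocabulary, {"game_Y2"})
```

In this layout the first entries carry the basic attribute information and the last L entries carry the media feature values, so the significant media tags are exactly the positions holding the target value.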

Similarly, the target media feature vector of the user also includes basic feature values for reflecting basic attribute information of the user, and media feature values for reflecting L media tags of the user. The basic characteristic value of the target media characteristic vector used for reflecting the basic attribute information of the user is the basic characteristic value of the initial media characteristic vector of the user used for reflecting the basic attribute information of the user; the media characteristic value corresponding to the media label in the target media characteristic vector is obtained by association processing according to the initial media characteristic vector of the user and the initial media characteristic vector of the associated user; that is, the media feature value corresponding to the media tag in the target media feature vector is obtained by adjusting the media feature value in the initial media feature vector of the user according to the media feature value in the initial media feature vector of the associated user. Therefore, the target media feature vector of the user can also be said to reflect the significant media tag and the potential media tag of the user, and the significant media tag is inherited from the initial media feature vector of the user. The potential media tags can be mined in any one or more of the following three ways:

1) Determining a joint association relationship between basic attribute information and media tags according to the initial media feature vectors of the associated users, and mining potential media tags of the user according to the joint association relationship and the initial media feature vector of the user. For example, if it is determined from the initial media feature vectors of user A's associated users that most boys under the age of 15 have the media tag corresponding to game Y1, then the basic attribute information of being a boy under the age of 15 is determined to be associated with the media tag corresponding to game Y1. If the initial media feature vector of user A indicates that user A is 14 years old and male, the media tag corresponding to game Y1 can be determined as a potential media tag of user A. Here, the media tag corresponding to game Y1 may refer to the name or category of game Y1, and so on.

2) Determining a media association relationship between media tags according to the initial media feature vectors of the associated users, and mining potential media tags of the user according to the media association relationship and the initial media feature vector of the user. For example, if it is determined from the initial media feature vectors of user A's associated users that most of the associated users who have the media tag corresponding to game Y2 also have the media tag corresponding to game Y3, then game Y2 is determined to be associated with game Y3. If the initial media feature vector of user A indicates that user A has the media tag corresponding to game Y2, the media tag corresponding to game Y3 is determined as a potential media tag of user A based on the association between game Y2 and game Y3.

3) Counting, according to the initial media feature vectors of the associated users, the number of users who have a given media tag, and mining potential media tags of the user according to that number and the initial media feature vector of the user. For example, if the initial media feature vectors of user A's associated users show that 10 of user A's 12 associated users have the media tag corresponding to game Y4, that is, most of the associated users have the media tag corresponding to game Y4, then the media tag corresponding to game Y4 can be determined as a potential media tag of user A.
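Ways 2) and 3) above can be sketched with simple set operations. The sketch assumes each associated user is represented by the set of media tags whose media feature value is the target value, and it interprets "most" as "more than half"; both the representation and the 0.5 ratio are assumptions, and the function names are hypothetical.

```python
from typing import List, Set


def tags_held_by_majority(neighbor_tags: List[Set[str]],
                          ratio: float = 0.5) -> Set[str]:
    """Way 3: tags held by more than `ratio` of the associated users."""
    total = len(neighbor_tags)
    all_tags = set().union(*neighbor_tags)
    return {
        tag for tag in all_tags
        if sum(tag in tags for tags in neighbor_tags) > ratio * total
    }


def co_occurring_tags(user_tags: Set[str],
                      neighbor_tags: List[Set[str]],
                      ratio: float = 0.5) -> Set[str]:
    """Way 2: tags that most holders of one of the user's own tags also hold."""
    potential: Set[str] = set()
    for owned in user_tags:
        holders = [tags for tags in neighbor_tags if owned in tags]
        if holders:
            potential |= tags_held_by_majority(holders, ratio)
    # Tags the user already has are significant tags, not potential ones.
    return potential - user_tags


# Way 3 example from the text: 10 of 12 associated users have the tag for game Y4.
neighbors_y4 = [{"Y4"}] * 10 + [set(), set()]
majority = tags_held_by_majority(neighbors_y4)

# Way 2 example: most associated users holding Y2 also hold Y3.
neighbors_y2 = [{"Y2", "Y3"}, {"Y2", "Y3"}, {"Y2"}, {"Y5"}]
co_occurring = co_occurring_tags({"Y2"}, neighbors_y2)
```

Way 1) follows the same counting pattern, with the associated users first filtered by basic attribute information (for example, age and gender) instead of by an owned tag.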

It is understood that the server 10 may be a device that provides backend services for the multimedia platform. For example, the server 10 may be configured to review and organize multimedia data that users publish on the multimedia platform, and may further be configured to store each user's relationship chain, historical multimedia behavior data, and basic attribute information, where the relationship chain records the users who have an association relationship with that user.

The server may be an independent physical server, a server cluster or distributed system formed by at least two physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The terminal may specifically be, but is not limited to, a vehicle-mounted terminal, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a speaker with a screen, a smart watch, and the like. Each terminal and each server may be connected directly or indirectly through wired or wireless communication, and the number of terminals and the number of servers may each be one or at least two, which is not limited here.

It should be noted that an object in the present application may refer to a user. When the above embodiments of the present application are applied to a specific product or technology, the user's permission or consent must be obtained, and the collection, use, and processing of user information such as the user's basic attribute information, historical multimedia behavior data, initial media feature vector, target media feature vector, and tagged media feature vector must comply with the relevant laws, regulations, and standards of the relevant countries and regions. That is, the computer device can acquire such user information only after it has obtained the user's authorization for that information.

For example, the computer device may display a permission prompt interface in a multimedia interface of the multimedia platform, where the permission prompt interface informs the user that the user's initial media feature vector is about to be collected. The step of acquiring the user's initial media feature vector is executed only after the user performs a confirmation operation on the permission prompt interface; otherwise, the process ends.

Further, please refer to fig. 2, which is a flowchart illustrating a multimedia data processing method according to an embodiment of the present application. As shown in fig. 2, the method may be performed by the terminal in fig. 1, the server in fig. 1, or both the terminal and the server in fig. 1, and devices for performing the method in this application may be collectively referred to as computer devices. The multimedia data processing method comprises the following steps S101-S104:

S101: The computer device respectively acquires initial media feature vectors of a target object, a first associated object and a second associated object with respect to multimedia data; the target object and the first associated object have a first-order association relationship, and the target object and the second associated object have a second-order association relationship.

In the present application, the computer device may acquire the basic attribute information and historical multimedia behavior data of the target object, and generate the initial media feature vector of the target object with respect to multimedia data according to them. The initial media feature vector of the target object includes basic feature values reflecting the basic attribute information of the target object and media feature values of the media tags of the target object. The media feature value of each media tag is determined according to the historical multimedia behavior data of the target object, and a media tag whose media feature value is the target value is called a significant media tag of the target object; that is, the initial media feature vector of the target object reflects the significant media tags of the target object.

Similarly, the computer device may acquire the basic attribute information and historical multimedia behavior data of the first associated object, and generate the initial media feature vector of the first associated object with respect to multimedia data according to them. The initial media feature vector of the first associated object includes basic feature values reflecting the basic attribute information of the first associated object and media feature values of the media tags of the first associated object. The media feature value of each media tag is determined according to the historical multimedia behavior data of the first associated object, and a media tag whose media feature value is the target value is called a significant media tag of the first associated object; that is, the initial media feature vector of the first associated object reflects the significant media tags of the first associated object. The computer device may likewise acquire the basic attribute information and historical multimedia behavior data of the second associated object, and generate the initial media feature vector of the second associated object with respect to multimedia data according to them.
The initial media feature vector of the second associated object includes basic feature values reflecting the basic attribute information of the second associated object and media feature values of the media tags of the second associated object. The media feature value of each media tag is determined according to the historical multimedia behavior data of the second associated object, and a media tag whose media feature value is the target value is called a significant media tag of the second associated object; that is, the initial media feature vector of the second associated object reflects the significant media tags of the second associated object.

Optionally, step S101 includes: acquiring a first associated object network corresponding to a target object, wherein the first associated object network comprises a first node for reflecting an initial media characteristic vector of the target object, a second node for reflecting an initial media characteristic vector of a candidate associated object, and an edge formed by connecting the first node and the second node; the target object is associated with the candidate associated object; according to a node path taking the first node as a starting point, sampling processing is sequentially carried out on second nodes in the first associated object network, and a first associated object having a first-order association relationship with the target object and a second associated object having a second-order association relationship with the target object are obtained; and respectively acquiring initial media feature vectors of the target object, the first associated object and the second associated object relative to the multimedia data from the first associated object network.

The computer device may use all or part of the candidate objects in the first association object network having a first order association relationship with the target object as the first association objects, and use all or part of the candidate objects in the first association object network having a second order association relationship with the target object as the second association objects; and respectively acquiring initial media feature vectors of the target object, the first associated object and the second associated object from the first associated object network.

For example, as shown in fig. 3, the first associated object network includes a first node, namely node A, and second nodes, namely node B, node C, node D, node E, node F, node G, and node H. Node A reflects the initial media feature vector of the target object; nodes B, C, and D each reflect the initial media feature vector of a first candidate associated object having a first-order association relationship with the target object; and nodes E, F, G, and H each reflect the initial media feature vector of a second candidate associated object having a second-order association relationship with the target object. The computer device may sequentially traverse the second nodes in the first associated object network along node paths starting from node A, to obtain the first candidate associated objects having a first-order association relationship with the target object and the second candidate associated objects having a second-order association relationship with the target object. Further, some of the first candidate associated objects are sampled as first associated objects having a first-order association relationship with the target object; as shown in fig. 3, the first associated objects include the first candidate associated objects corresponding to nodes B and C, while the first candidate associated object corresponding to node D does not satisfy the sampling condition. Likewise, some of the second candidate associated objects are sampled as second associated objects having a second-order association relationship with the target object; as shown in fig. 3, the second associated objects include the second candidate associated objects corresponding to nodes E, F, G, and H, that is, the second candidate associated objects corresponding to nodes E, F, G, and H all satisfy the sampling condition. The initial media feature vector of the candidate associated object corresponding to a first associated object in the first associated object network is determined as the initial media feature vector of that first associated object, the initial media feature vector of the candidate associated object corresponding to a second associated object in the first associated object network is determined as the initial media feature vector of that second associated object, and the initial media feature vector of the target object is acquired from the first associated object network. Sampling the candidate associated objects that are associated with the target object avoids the resource waste caused by an excessive number of candidate associated objects, and avoids the low accuracy of the target media feature vector of the target object caused by an unbalanced distribution of the candidate associated objects.
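
The traversal described above can be illustrated with a minimal Python sketch. It assumes that a first-order association means a direct edge to the target node and that a second-order association means a node reached in exactly two hops; the edge list is hypothetical, since fig. 3 is not reproduced here, and the function name `candidates_by_order` is illustrative only.

```python
from collections import deque

def candidates_by_order(adjacency, start, max_order=2):
    """Breadth-first traversal from `start`; returns {order: [nodes]} for orders 1..max_order."""
    distance = {start: 0}
    by_order = {order: [] for order in range(1, max_order + 1)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_order:
            continue  # do not expand beyond the requested order
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                by_order[distance[neighbor]].append(neighbor)
                queue.append(neighbor)
    return by_order

# Hypothetical edges that reuse the node names of fig. 3 (assumption, not from the figure).
adjacency = {
    "A": ["B", "C", "D"],
    "B": ["A", "E", "F"],
    "C": ["A", "G"],
    "D": ["A", "H"],
    "E": ["B"], "F": ["B"], "G": ["C"], "H": ["D"],
}
orders = candidates_by_order(adjacency, "A")
first_candidates = orders[1]   # first candidate associated objects
second_candidates = orders[2]  # second candidate associated objects
```

The sampling condition that excludes node D in fig. 3 is not modelled here; the sketch only collects the candidates from which the sampling would then be made.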

Optionally, the initial media feature vector includes media feature values corresponding to L media tags, and sequentially sampling the second nodes in the first associated object network along a node path starting from the first node, to obtain a first associated object having a first-order association relationship with the target object and a second associated object having a second-order association relationship with the target object, includes: sequentially sampling the second nodes in the first associated object network along a node path starting from the first node, to obtain N first candidate associated objects having a first-order association relationship with the target object and M second candidate associated objects having a second-order association relationship with the target object; acquiring a first object number, namely the number of first candidate associated objects, among the N first candidate associated objects, whose media feature values corresponding to the L media tags are all non-target values; acquiring a second object number, namely the number of second candidate associated objects, among the M second candidate associated objects, whose media feature values corresponding to the L media tags are all non-target values; if the first object number is smaller than a first number threshold, determining the N first candidate associated objects as the first associated objects having a first-order association relationship with the target object; and if the second object number is smaller than a second number threshold, determining the M second candidate associated objects as the second associated objects having a second-order association relationship with the target object. M and N may or may not be equal; for example, N may be greater than or equal to M, or N may be less than or equal to M.

The initial media feature vector includes media feature values corresponding to L media tags, where a media tag reflects a type or a label of multimedia data. A media feature value may be a target value or a non-target value: a media tag whose media feature value is the target value indicates that the target object has the media tag, and a media tag whose media feature value is the non-target value indicates that the target object does not have the media tag. For example, the target value may be 1 and the non-target value may be 0, and the media tag may be a media tag of sports information. If the media feature value of the target object for sports information is 1, the target object is interested in sports information, that is, the target object has the media tag of sports information; if the media feature value of the target object for sports information is 0, the target object is not interested in sports information, that is, the target object does not have the media tag of sports information. If the media feature values corresponding to the L media tags in the initial media feature vector of an object are all non-target values, the object may be called an object without media tags, or an unlabeled object; if the media feature values corresponding to the L media tags in the initial media feature vector of an object are not all non-target values, the object may be called an object with media tags, or a labeled object.

In order to avoid imbalance of various media tags in the first associated object and the second associated object of the target object, the computer device may sequentially sample the second node in the first associated object network according to a node path starting from the first node to obtain N first candidate associated objects having a first-order association relationship with the target object and M second candidate associated objects having a second-order association relationship with the target object, that is, obtain a limited number of first associated objects and second associated objects from the first associated object network, so as to avoid resource waste caused by an excessive number of objects in the first associated object or the second associated object. Further, a first object number of the first candidate associated objects whose media feature values corresponding to L media tags are all non-target numerical values may be obtained, and a second object number of the second candidate associated objects whose media feature values corresponding to L media tags are all non-target numerical values may be obtained from the M second candidate associated objects; that is, the first number of objects is used to reflect the number of objects corresponding to the unlabeled object in the N first candidate associated objects, and the second number of objects is used to reflect the number of objects corresponding to the unlabeled object in the M second candidate associated objects.

Further, if the first object number is smaller than the first number threshold, there are few unlabeled objects and many labeled objects among the N first candidate associated objects, that is, most of the N first candidate associated objects are labeled objects, and the N first candidate associated objects are determined as the first associated objects having a first-order association relationship with the target object. If the second object number is smaller than the second number threshold, there are few unlabeled objects and many labeled objects among the M second candidate associated objects, that is, most of the M second candidate associated objects are labeled objects, and the M second candidate associated objects are determined as the second associated objects having a second-order association relationship with the target object. The first number threshold may be determined based on the number of objects in the first associated object, and the second number threshold may be determined based on the number of objects in the second associated object. Sampling the first candidate associated objects and the second candidate associated objects in this way avoids the situation in which a large number of unlabeled objects prevents the potential media tags of the target object from being accurately mined, and thus improves the accuracy of the target media feature vector of the target object.
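
The threshold check above can be sketched as follows. This is a minimal illustration that assumes the non-target value is 0, as in the example given earlier, and that each candidate is represented only by its L media feature values; the function names and the threshold value are hypothetical.

```python
def is_unlabeled(media_values, non_target=0):
    """True when every one of the L media feature values is the non-target value."""
    return all(value == non_target for value in media_values)

def accept_candidates(candidate_vectors, count_threshold):
    """Return (accepted, unlabeled_count) for one batch of sampled candidates.

    The batch is accepted when the number of unlabeled candidates is
    strictly smaller than the threshold.
    """
    unlabeled_count = sum(1 for values in candidate_vectors if is_unlabeled(values))
    return unlabeled_count < count_threshold, unlabeled_count

# N = 4 first candidate associated objects with L = 3 media tags each.
first_candidates = [[1, 0, 0], [0, 0, 0], [0, 1, 1], [1, 1, 0]]
# Hypothetical first number threshold of 2 (half of N); the source only says
# that the threshold may depend on the number of objects.
accepted, unlabeled = accept_candidates(first_candidates, count_threshold=2)
```

The same check would apply to the M second candidate associated objects with the second number threshold.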

S102, the computer equipment carries out first-order association processing on the initial media characteristic vector of the target object and the initial media characteristic vector of the first associated object to obtain the first associated media characteristic vector of the target object.

In the application, the computer device may perform first-order association processing on the initial media feature vector of the target object and the initial media feature vector of the first associated object to obtain a first associated media feature vector of the target object; the first-order association processing refers to mining potential media characteristics of the target object according to the initial media characteristic vector of the target object and the initial media characteristic vector of the first association object; that is, the first associated media feature vector is used to reflect the salient media features, the potential media features, and the basic attribute information of the target object.

Optionally, step S102 may include: calling a first-order vector recognition layer of a target media recognition model, carrying out averaging processing on initial media feature vectors of at least two first associated objects to obtain first average media feature vectors, determining first media associated information about the at least two first associated objects according to the first average media feature vectors, and determining the first associated media feature vectors of the target objects according to the first media associated information and the initial media feature vectors of the target objects.

The computer device may call the first-order vector recognition layer of the target media recognition model and, according to the intimacy between the target object and each first associated object, perform weighted averaging on the initial media feature vectors of the at least two first associated objects to obtain a first average media feature vector. Further, first media association information about the at least two first associated objects is determined according to the first average media feature vector. The first media association information includes one or more of a first association relationship, a first media association relationship, and a fourth object number of each media tag: the first association relationship reflects the association between the media tags of the first associated objects and the basic attribute information; the first media association relationship reflects the association between media tags; and the fourth object number of each media tag reflects the number of first associated objects having that media tag. Thus, the computer device can determine the first associated media feature vector in any one or more of the following three ways:

1) The first media association information comprises the first association relationship; a first potential media tag of the target object is mined according to the first association relationship and the initial media feature vector of the target object, and the media feature value corresponding to the first potential media tag in the initial media feature vector of the target object is adjusted to the target value, so that a first associated media feature vector is obtained. For example, the first association relationship reflects that the basic attribute information of an age in [18, 27] and a male gender is associated with the media tag corresponding to sports information, that is, most male objects aged in [18, 27] have the media tag corresponding to sports information. If it is determined from the initial media feature vector of the target object that the age of the target object falls in [18, 27] and the gender is male, the media tag corresponding to sports information is determined as a first potential media tag of the target object, and the media feature value corresponding to the first potential media tag in the initial media feature vector of the target object is adjusted to the target value to obtain the first associated media feature vector of the target object.

2) The first media association information comprises a first media association relation, a second potential media tag of the target object is mined according to the first media association relation and the initial media feature vector of the target object, and a media feature value corresponding to the second potential media tag in the initial media feature vector of the target object is adjusted to be a target numerical value, so that a first associated media feature vector is obtained. For example, the first media association reflects that the media tag corresponding to the sports information is associated with the media tag corresponding to the game information, i.e., most of the first associated objects having the media tag corresponding to the sports information also have the media tag corresponding to the game information. If the target object is determined to have the media tag corresponding to the sports information according to the initial media feature vector of the target object, the media tag corresponding to the game information is determined to be a second potential media tag of the target object, and the media feature value corresponding to the second potential media tag in the initial media feature vector of the target object is adjusted to be a target numerical value, so that a first associated media feature vector is obtained.

3) The first media association information comprises the fourth object number of each media tag; a third potential media tag of the target object is mined according to the fourth object number and the initial media feature vector of the target object, and the media feature value corresponding to the third potential media tag in the initial media feature vector of the target object is adjusted to the target value, so that a first associated media feature vector is obtained. For example, if the total number of objects in the first associated object is 12 and the number of objects in the first associated object having the media tag corresponding to sports information is 10, most of the first associated objects have this media tag, so the media tag corresponding to sports information is determined as a third potential media tag of the target object, and the media feature value corresponding to the third potential media tag in the initial media feature vector of the target object is adjusted to the target value to obtain a first associated media feature vector. Potential media tags of the target object can be mined through the first-order association processing, which improves the accuracy of the target media feature vector of the target object.
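
As an illustration of the weighted averaging and of way 3) above, the following Python sketch may help. It is not the recognition layer itself: it assumes that each vector holds only the L media feature values (basic attribute information is omitted), that the target value is 1, and that a tag counts as a potential tag when at least a fraction `min_ratio` of the first associated objects have it. The ratio 0.8 and both function names are hypothetical; the source gives only the 10-out-of-12 example.

```python
def weighted_average(vectors, weights):
    """Average the vectors element-wise, weighting each by its intimacy weight."""
    total = sum(weights)
    length = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(length)]

def mine_by_tag_count(target_vector, neighbor_vectors, min_ratio, target_value=1):
    """Way 3: set a tag on the target when enough first associated objects have it."""
    updated = list(target_vector)
    total = len(neighbor_vectors)
    for i in range(len(updated)):
        count = sum(1 for v in neighbor_vectors if v[i] == target_value)
        if count / total >= min_ratio:
            updated[i] = target_value
    return updated

# Tags in order: [sports information, game information].
# 12 first associated objects, 10 of which have the sports tag.
neighbors = [[1, 0]] * 10 + [[0, 1]] * 2
first_associated_vector = mine_by_tag_count([0, 0], neighbors, min_ratio=0.8)

# Intimacy-weighted average of two first associated objects (weights are hypothetical).
average = weighted_average([[1, 0], [0, 1]], weights=[3, 1])
```

In this sketch the sports tag is set on the target object because 10 of 12 first associated objects have it, while the game tag is left unchanged.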

S103, the computer equipment carries out second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object.

In this application, the computer device may perform second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object, and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object, where the second-order association processing refers to mining potential media features of the target object according to the sum of the initial media feature vectors of the target object, the initial media feature vector of the first associated object, and the initial media feature vector of the second associated object; that is, the second associated media feature vector is used to reflect the salient media features, the potential media features, and the basic attribute information of the target object. Potential media tags of target users can be mined through second-order association processing, and accuracy of obtaining target media feature vectors of target objects is improved.

Optionally, the computer device may obtain the second associated media feature vector of the target object in any one of the following two manners:

Mode one: the computer device calls a second-order vector recognition layer of the target media recognition model, averages the initial media feature vector of the target object and the initial media feature vector of the second associated object to obtain a second average media feature vector, and determines second media association information between the target object and the second associated object according to the second average media feature vector; it then determines the second associated media feature vector of the target object according to the second media association information and the initial media feature vector of the first associated object.

In the first mode: the computer device may call the second-order vector recognition layer of the target media recognition model and, according to the intimacy between the target object and the second associated object, perform weighted averaging on the initial media feature vector of the target object and the initial media feature vector of the second associated object to obtain a second average media feature vector. Further, second media association information about the target object and the second associated object is determined according to the second average media feature vector. The second media association information includes one or more of a second association relationship, a second media association relationship, and a fifth object number of each media tag: the second association relationship reflects the association between the media tags of the second associated object and the target object and the basic attribute information; the second media association relationship reflects the association between media tags; and the fifth object number of each media tag reflects the number of objects, among the second associated object and the target object, that have the media tag. The computer device may mine the potential media tag of the first associated object according to the second media association information and the initial media feature vector of the first associated object, use the potential media tag of the first associated object as the potential media tag of the target object, and update the initial media feature vector of the target object according to the potential media tag of the target object to obtain the second associated media feature vector of the target object.

Mode two: the computer device calls a second-order vector recognition layer of the target media recognition model, averages the initial media feature vector of the first associated object and the initial media feature vector of the second associated object to obtain a fourth average media feature vector, determines third media association information between the first associated object and the second associated object according to the fourth average media feature vector, and determines the second associated media feature vector of the target object according to the third media association information and the initial media feature vector of the target object.

In the second mode: the computer device may call the second-order vector recognition layer of the target media recognition model and, according to the intimacy between the target object and the first associated object and the intimacy between the target object and the second associated object, perform weighted averaging on the initial media feature vector of the first associated object and the initial media feature vector of the second associated object to obtain a fourth average media feature vector. Further, third media association information between the first associated object and the second associated object is determined according to the fourth average media feature vector. The third media association information includes one or more of a third association relationship, a third media association relationship, and a sixth object number of each media tag: the third association relationship reflects the association between the media tags of the second associated object and the first associated object and the basic attribute information; the third media association relationship reflects the association between media tags; and the sixth object number of each media tag reflects the number of objects, among the second associated object and the first associated object, that have the media tag. The computer device may mine a potential media tag of the target object according to the third media association information and the initial media feature vector of the target object, and update the initial media feature vector of the target object according to the potential media tag of the target object to obtain the second associated media feature vector of the target object.

For how the second-order association processing mines the potential media tags of the target object, reference may be made to how the first-order association processing mines the potential media tags of the target object; repeated parts are not described again.

S104, the computer equipment generates a target media characteristic vector of the target object according to the first associated media characteristic vector and the second associated media characteristic vector, and pushes multimedia data for the target object according to the target media characteristic vector.

In the present application, the first associated media feature vector is used to reflect basic attribute information of the target object, a salient media feature and a first potential media feature of the target object, and the second associated media feature vector is used to reflect basic attribute information of the target object, a salient media feature and a second potential media feature of the target object, or the second associated media feature vector is used to reflect basic attribute information of the target object, a salient media feature, a second potential media feature and a third potential media feature of the target object. Further, the computer device may perform processing such as splicing on the first associated media feature vector and the second associated media feature vector to obtain a target media feature vector of the target object, where the target media feature vector may reflect rich media features of the target object, and therefore, the computer device may push multimedia data for a target user according to the target media feature vector, thereby improving accuracy of pushing the multimedia data.

Optionally, step S104 may include: calling a target vector recognition layer of a target media recognition model, and averaging second associated media characteristic vectors corresponding to at least two second associated objects respectively to obtain a third average media characteristic vector; and splicing the first associated media feature vector and the third average media feature vector to obtain a target media feature vector of the target object.

The computer device may call the target vector recognition layer of the target media recognition model, average the second associated media feature vectors respectively corresponding to at least two second associated objects to obtain a third average media feature vector, and splice the first associated media feature vector and the third average media feature vector to obtain a spliced media feature vector. Here, splicing may refer to joining the feature values of the first associated media feature vector and the feature values of the third average media feature vector end to end to obtain the spliced media feature vector; for example, if the first associated media feature vector and the third average media feature vector are both 4x3, the spliced media feature vector is 4x6.

Further, the spliced media feature vectors are subjected to normalization processing to obtain target media feature vectors of the target objects, the target media feature vectors of the target objects can reflect rich media features of the target objects, and accuracy of obtaining the media feature vectors of the target objects is improved.
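
The averaging, splicing, and normalization steps can be sketched in plain Python as follows. This is a sketch under stated assumptions: the feature vectors are treated as 4x3 nested lists as in the example above, and row-wise L2 normalization is used only as a placeholder, because the source does not say which normalization is applied. All function names are illustrative.

```python
import math

def mean_matrices(matrices):
    """Element-wise mean of equally shaped matrices (lists of rows)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / len(matrices) for c in range(cols)]
            for r in range(rows)]

def splice(left, right):
    """Join two matrices row by row, so 4x3 and 4x3 give 4x6."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]

def l2_normalize_rows(matrix):
    """Scale every row to unit L2 norm (assumed normalization); zero rows stay zero."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm if norm else 0.0 for x in row])
    return normalized

first_associated = [[1.0, 0.0, 1.0] for _ in range(4)]   # 4x3
second_a = [[1.0, 1.0, 0.0] for _ in range(4)]            # 4x3, one second associated object
second_b = [[0.0, 1.0, 0.0] for _ in range(4)]            # 4x3, another second associated object

third_average = mean_matrices([second_a, second_b])        # 4x3
spliced = splice(first_associated, third_average)          # 4x6
target_vector = l2_normalize_rows(spliced)                 # 4x6, rows of unit length
```

Splicing keeps the first-order and second-order information in separate columns, which is why the width doubles rather than the values being summed.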

Further, please refer to fig. 4, which is a flowchart illustrating a multimedia data processing method according to an embodiment of the present application. As shown in fig. 4, the method may be performed by the terminal in fig. 1, the server in fig. 1, or both the terminal and the server in fig. 1, and devices for performing the method in this application may be collectively referred to as computer devices. The multimedia data processing method may include the following steps S201 to S208:

S201, the computer device respectively acquires labeled media feature vectors of a target sample object, a first associated sample object, and a second associated sample object with respect to multimedia data; the target sample object has a first-order association relationship with the first associated sample object, and the target sample object has a second-order association relationship with the second associated sample object.

In the application, the computer device may respectively obtain the labeled media feature vectors of the target sample object, the first associated sample object, and the second associated sample object with respect to the multimedia data. These labeled media feature vectors may be obtained through multiple rounds of manual labeling and verification, and they reflect the real media tags of the objects.

Optionally, step S201 includes: acquiring a second associated object network, wherein the second associated object network comprises nodes used for reflecting labeled media characteristic vectors of candidate sample objects and edges formed by connecting nodes corresponding to the associated candidate sample objects, and the labeled media characteristic vectors comprise media characteristic values corresponding to L media labels; according to the node path in the second associated object network, sequentially sampling the nodes in the second associated object network to obtain K target candidate sample objects corresponding to the L media tags respectively; a target candidate sample object corresponding to a media tag is a candidate sample object of which the media characteristic value corresponding to the media tag is a target numerical value in the second associated object network; determining a target sample object, a first associated sample object and a second associated sample object according to K target candidate sample objects respectively corresponding to the L media tags and the second associated object network; and respectively acquiring the labeled media feature vectors of the target sample object, the first associated sample object and the second associated sample object relative to the multimedia data from the second associated object network.

For example, as shown in fig. 5, the computer device may obtain a second associated object network, and obtain the target sample object, the first associated sample object, and the second associated sample object from the second associated object network by balanced sampling. In balanced sampling, the nodes in the second associated object network are sampled in sequence along the node paths in the second associated object network to obtain K target candidate sample objects corresponding to each of the L media tags; that is, K target candidate sample objects having the media tag are sampled for each media tag, so that the number of sample objects corresponding to each media tag is balanced. Some of the K target candidate sample objects respectively corresponding to the L media tags may be selected as target sample objects and some as verification sample objects, and then a first associated sample object having a first-order association relationship with the target sample object and a second associated sample object having a second-order association relationship with the target sample object are determined from the second associated object network. Further, the labeled media feature vectors of the target sample object, the first associated sample object, and the second associated sample object with respect to the multimedia data are respectively obtained from the second associated object network. By performing balanced sampling on the second associated object network, the number of objects corresponding to each media tag is balanced, and the generalization capability of the target media identification model is improved, that is, the inductive learning capability of the target media identification model is improved.

For example, the labeled media feature vector corresponding to any node u in the second associated object network includes a basic feature value $x_u^0$ reflecting basic attribute information, and the media feature values corresponding to the L media tags. For example, the media feature value of the i-th media tag may be represented as:

$$y_u^i \in \{0, 1\}$$

where i is less than or equal to L, and the media tags are not mutually exclusive. Some sample objects in the second associated object network have one or more media tags, and some sample objects have no media tag. In practice, it is found that the number of objects having certain media tags is too large; if the labeled media feature vectors of the sample objects are directly used to train the candidate media identification model, the target media identification model becomes good at identifying those media tags but poor at identifying the other media tags, so that the generalization capability of the target media identification model is poor. Thus, the computer device may use balanced sampling: for each media tag $y^i$, it randomly samples k target candidate sample objects having that media tag from the set V of candidate sample objects in the second associated object network, and finally takes the union to obtain a node set $V'$ formed by the target candidate sample objects. The balanced sampling process can be expressed by the following formula (1):

$$V' = \bigcup_{i=1}^{L} \mathrm{Sample}_k\left(\{\, u \in V : y_u^i = 1 \,\}\right) \tag{1}$$

After the node set $V'$ is obtained, a training sample object set and a verification sample object set can be divided from it in proportion. The candidate sample objects in the training sample object set are used for training the candidate media identification model and may be the target sample objects described above; the candidate sample objects in the verification sample object set are used for verifying the candidate media identification model and may be the verification sample objects described below.
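
The balanced sampling and the proportional split described above can be sketched as follows. The sketch assumes that labels are stored as a dictionary from node to a list of L values with 1 as the target value, that a tag with fewer than k holders contributes all of its holders, and that the split is a simple shuffle; these choices and the function names are assumptions, not details given in the source.

```python
import random

def balanced_sample(labels_by_node, num_tags, k, seed=0):
    """For each media tag, sample up to k nodes having that tag; return the union."""
    rng = random.Random(seed)
    selected = set()
    for i in range(num_tags):
        holders = sorted(node for node, labels in labels_by_node.items()
                         if labels[i] == 1)
        # Assumption: when fewer than k nodes have the tag, take all of them.
        selected.update(rng.sample(holders, min(k, len(holders))))
    return selected

def split_train_validation(nodes, train_ratio, seed=0):
    """Shuffle the sampled nodes and divide them in proportion."""
    rng = random.Random(seed)
    ordered = sorted(nodes)
    rng.shuffle(ordered)
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

# Tag 0 is very common, tag 1 is rare; u7 has no media tag at all.
labels_by_node = {
    "u1": [1, 0], "u2": [1, 0], "u3": [1, 0], "u4": [1, 0],
    "u5": [1, 1], "u6": [0, 1], "u7": [0, 0],
}
sampled_nodes = balanced_sample(labels_by_node, num_tags=2, k=2)
train_nodes, validation_nodes = split_train_validation(sampled_nodes, train_ratio=0.75)
```

Because each tag contributes at most k nodes, the common tag can no longer dominate the sampled set, and unlabeled nodes such as u7 are never selected as target candidate sample objects.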

Optionally, the determining a target sample object, a first associated sample object, and a second associated sample object according to the K target candidate sample objects and the second associated object network respectively corresponding to the L media tags includes: determining a first associated candidate sample object having a first-order association relationship with a target candidate sample object Pi from the second associated object network; the target candidate sample objects Pi belong to K target candidate sample objects respectively corresponding to the L media labels, and i is a positive integer; determining a second associated candidate sample object having a second order association relationship with the target candidate sample object Pi from the second associated object network; taking the target candidate sample object Pi as a target sample object, and sampling Q first associated candidate sample objects from the first associated candidate sample objects as the first associated sample objects; sampling D second associated candidate sample objects from the second associated candidate sample objects as the second associated sample objects.

The computer device may determine, from the second associated object network, a first associated candidate sample object having a first-order association relationship with the target candidate sample object Pi and a second associated candidate sample object having a second-order association relationship with the target candidate sample object Pi. The target candidate sample object Pi may be taken as the target sample object; Q first associated candidate sample objects may be sampled from the first associated candidate sample objects as the first associated sample objects, and D second associated candidate sample objects may be sampled from the second associated candidate sample objects as the second associated sample objects. Q may be greater than or equal to D, or less than D; Q and D may be determined according to the performance of the computer device (e.g., memory size, processing efficiency). Through this sampling, the numbers of first and second associated sample objects of the target candidate sample object Pi do not become excessive, which avoids insufficient memory during training of the candidate media identification model caused by super sample objects (i.e., sample objects with too many associated sample objects).

Among the first associated sample objects, the number of objects having media tags is greater than the number of objects without media tags, and the same holds for the second associated sample objects. A sample object having a media tag means that some or all of the media feature values in its labeled media feature vector are the target numerical value; a sample object not having a media tag means that all of the media feature values in its labeled media feature vector are non-target numerical values. This prevents the candidate media identification model from being completely dominated by unlabeled associated sample objects during training, which would cost it the ability to mine the potential media tags of the target sample object from the labeled media feature vectors of other associated objects.

That is, during training and verification, in order to improve the training efficiency of the candidate media identification model and to support inductive learning, not all of the first associated candidate sample objects and second associated candidate sample objects of the target sample object are input into the candidate media identification model; instead, they are sampled in advance. In the sampling process, the maximum numbers of first associated sample objects and second associated sample objects of each target sample object are set to Q and D respectively, which avoids insufficient memory during training caused by super nodes (i.e., target sample objects with too many associated sample objects). Meanwhile, among the first associated sample objects and second associated sample objects of each sample object, the number of labeled associated sample objects must not be less than the number of unlabeled associated sample objects. This prevents the candidate media identification model from being completely dominated by unlabeled associated objects during training, which would cost it the ability to mine the potential media tags of the target sample object from the labeled media feature vectors of the associated sample objects.
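The capped neighbor sampling can be illustrated with a short sketch. It assumes an adjacency-list graph and one particular policy for satisfying the constraint (keep labeled neighbors first, then fill the remaining budget with at most as many unlabeled ones); the source does not specify the policy, and the names `first_and_second_order` and `sample_neighbors` as well as the toy graph are hypothetical.

```python
import random

def first_and_second_order(graph, target):
    """Return the first-order and second-order associated objects of target."""
    first = set(graph.get(target, []))
    second = {n for f in first for n in graph.get(f, [])} - first - {target}
    return sorted(first), sorted(second)

def sample_neighbors(candidates, tagged, max_count, rng):
    """Sample at most max_count objects so that unlabeled objects never
    outnumber labeled ones (assumed policy: labeled objects first)."""
    with_tag = [c for c in candidates if c in tagged]
    without_tag = [c for c in candidates if c not in tagged]
    rng.shuffle(with_tag)
    rng.shuffle(without_tag)
    kept = with_tag[:max_count]
    # Remaining budget, capped so that unlabeled <= labeled.
    budget = min(max_count - len(kept), len(kept))
    return kept + without_tag[:budget]

# Hypothetical graph shaped like the example of fig. 6 (p is the target).
graph = {
    "p": ["n1", "n2", "n3"],
    "n1": ["p", "n4", "n5"],
    "n2": ["p", "n6", "n7"],
    "n3": ["p", "n8", "n9"],
}
tagged = {"n1", "n2", "n4", "n5", "n6"}  # objects that carry a media tag
rng = random.Random(0)

first, second = first_and_second_order(graph, "p")
first_sampled = sample_neighbors(first, tagged, max_count=2, rng=rng)    # Q = 2
second_sampled = sample_neighbors(second, tagged, max_count=4, rng=rng)  # D = 4
```

With Q = 2 and D = 4, the target keeps two first-order and four second-order associated objects regardless of how many candidates exist, which bounds the memory needed per target sample object.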

S202, the computer device calls the candidate media identification model, and mask processing is respectively carried out on the tagged media feature vectors of the target sample object, the first associated sample object and the second associated sample object, so that the tagged media feature vectors after mask processing respectively corresponding to the target sample object, the first associated sample object and the second associated sample object are obtained.

In this application, the computer device may perform mask processing on the labeled media feature vector of the target sample object, the labeled media feature vector of the second associated sample object, and the labeled media feature vector of the first associated sample object, respectively, to obtain the masked labeled media feature vectors corresponding to the target sample object, the second associated sample object, and the first associated sample object. Mask processing may refer to adjusting the media feature values in a labeled media feature vector, and corresponds to the encoding process of noise reduction self-encoding (i.e., denoising autoencoding), which comprises an encoding process and a decoding process. In the encoding process, an encoder in the candidate media identification model introduces noise into the labeled media feature vectors of the target sample object, the second associated sample object, and the first associated sample object, respectively. In the decoding process, a decoder in the candidate media identification model reconstructs the real media feature vector (i.e., the labeled media feature vector) corresponding to each object. Performing mask processing on the labeled media feature vectors improves the generalization capability of the predicted media feature vector of the target sample object, that is, the generalization capability of the target media identification model.

Optionally, step S202 may include: the computer equipment calls a candidate media identification model, adjusts the media characteristic value in the marked media characteristic vector of the target sample object, and obtains the marked media characteristic vector after mask processing corresponding to the target sample object; adjusting a media characteristic value in the labeled media characteristic vector of the first associated sample object to obtain a mask processed labeled media characteristic vector corresponding to the first associated sample object; and adjusting the media characteristic value in the marked media characteristic vector of the second associated sample object to obtain the marked media characteristic vector after mask processing corresponding to the second associated sample object.

The computer device may call the encoder of the candidate media identification model and, according to a target probability, randomly adjust the target numerical value corresponding to a media tag in the labeled media feature vector of the target sample object to a non-target numerical value, obtaining the masked labeled media feature vector corresponding to the target sample object. In other words, a real media tag of the target sample object is randomly discarded according to the target probability, which is equivalent to randomly introducing noise into the labeled media feature vector of the target sample object according to the target probability. Similarly, the computer device may call the encoder of the candidate media identification model and, according to the target probability, randomly adjust the target numerical value corresponding to a media tag in the labeled media feature vector of the first associated sample object to a non-target numerical value, obtaining the masked labeled media feature vector corresponding to the first associated sample object; that is, a real media tag of the first associated sample object is randomly discarded, which is equivalent to randomly introducing noise into the labeled media feature vector of the first associated sample object. The computer device may likewise call the encoder of the candidate media identification model and, according to the target probability, randomly adjust the target numerical value corresponding to a media tag in the labeled media feature vector of the second associated sample object to a non-target numerical value, obtaining the masked labeled media feature vector corresponding to the second associated sample object; that is, a real media tag of the second associated sample object is randomly discarded, which is equivalent to randomly introducing noise into the labeled media feature vector of the second associated sample object. Introducing noise into the labeled media feature vectors improves the generalization capability of the predicted media feature vector of the target sample object, that is, the generalization capability of the target media identification model.
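The random discarding of real media tags can be sketched as below. The sketch assumes that the target numerical value is 1 and the non-target numerical value is 0, and that a labeled media feature vector is the concatenation of basic feature values and media feature values; the function names and sample values are hypothetical.

```python
import random

TARGET = 1      # assumed target numerical value (object has the tag)
NON_TARGET = 0  # assumed non-target numerical value

def mask_labels(label_values, drop_prob, rng):
    """Reset each target-valued entry to the non-target value with
    probability drop_prob; all other entries are left unchanged."""
    return [
        NON_TARGET if v == TARGET and rng.random() < drop_prob else v
        for v in label_values
    ]

def mask_feature_vector(basic_values, label_values, drop_prob, rng):
    """Masked labeled media feature vector: basic values + masked labels."""
    return list(basic_values) + mask_labels(label_values, drop_prob, rng)

rng = random.Random(0)
basic = [0.3, 0.7]       # hypothetical basic feature values
labels = [1, 0, 1, 1]    # hypothetical media feature values for L = 4 tags

unchanged = mask_feature_vector(basic, labels, drop_prob=0.0, rng=rng)
all_dropped = mask_feature_vector(basic, labels, drop_prob=1.0, rng=rng)
masked = mask_feature_vector(basic, labels, drop_prob=0.5, rng=rng)
```

Calling the function again for every training batch yields a fresh mask each time, which matches the idea of re-drawing the noise whenever batch training data is input.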

For example, the candidate media identification model may be a deep neural network model, a graph neural network model, a logistic regression network model, a linear regression network model, or the like; fig. 6 takes the case in which the candidate media identification model is a graph neural network model as an example. As shown in fig. 6, the input of the graph neural network model is the second associated object network, which includes a node corresponding to the target sample object, nodes corresponding to the first associated sample objects having a first-order association relationship with the target sample object, and nodes corresponding to the second associated sample objects having a second-order association relationship with the target sample object. In fig. 6, the first associated sample objects having a first-order association relationship with the target sample object include object 1, object 2, and object 3, and the second associated sample objects having a second-order association relationship with the target sample object include object 4, object 5, object 6, object 7, object 8, and object 9. Objects 4 and 5 have a first-order association relationship with object 1, objects 6 and 7 have a first-order association relationship with object 2, and objects 8 and 9 have a first-order association relationship with object 3. Each node in the second associated object network reflects the labeled media feature vector of a sample object; the labeled media feature vector includes basic feature values reflecting basic attribute information and media feature values reflecting real media tags. In fig. 6, the basic attribute information of a sample object is identified by a triangle, and the real media tag of a sample object is identified by a non-filled rectangle.
The encoder of the graph neural network model first applies dynamic masking to the real label Y_u of each input node u: each time a batch of training data is input to the graph neural network model, each real label in Y_u of every node u is discarded anew with a certain random probability α, so that some of the labels i in Y_u are reset to the non-target numerical value. The masked labels, together with the basic feature values, finally form the input of the graph neural network model (i.e., the processed labeled media feature vector).

S203, the computer device performs correlation prediction on the marked media feature vector after mask processing corresponding to the target sample object, the marked media feature vector after mask processing corresponding to the second associated sample object, and the marked media feature vector after mask processing corresponding to the first associated sample object, so as to obtain a predicted media feature vector of the target sample object.

In the application, the computer device can perform first-order association prediction on the marked media feature vector after mask processing corresponding to the target sample object and the marked media feature vector after mask processing corresponding to the first associated sample object to obtain a first association prediction media feature vector; and performing second-order correlation prediction on the marked media feature vector after mask processing corresponding to the target sample object, the marked media feature vector after mask processing corresponding to the first correlation sample object and the marked media feature vector after mask processing corresponding to the second correlation sample object to obtain a second correlation prediction media feature vector, and determining the predicted media feature vector of the target sample object according to the first correlation prediction media feature vector and the second correlation prediction media feature vector.

It is understood that the first order correlation prediction here has a similar meaning to the first order correlation processing, that is, the first order correlation prediction refers to mining the potential media characteristics of the target sample object according to the masked labeled media characteristic vector corresponding to the target sample object and the masked labeled media characteristic vector corresponding to the first correlation sample object. Similarly, the second-order correlation prediction has a similar meaning to the second-order correlation processing, that is, the second-order correlation prediction is to mine the potential media characteristics of the target sample object according to the masked labeled media characteristic vector corresponding to the target sample object, the masked labeled media characteristic vector corresponding to the first correlation sample object, and the masked labeled media characteristic vector corresponding to the second correlation sample object.

For example, define the sampled neighbor node set (the set of sample objects having a first-order association relationship) of each node u as N(u), and set the sampling depth (i.e., the number of iterations) to depth = 2; the forward propagation flow of each batch of training data is then as shown in Table 1 below:

TABLE 1

In Table 1 above, h^d_{N(u)} represents the average media feature vector obtained by averaging the masked labeled media feature vectors of the sample objects in the neighbor node set of node u, h^d_u represents the associated media feature vector corresponding to node u, and W^d is a weight value that may be derived empirically. Table 1 is illustrated below taking fig. 6 as an example. The neighbor node set of the target sample object is N_0 = {object 1, object 2, object 3}; the neighbor node set of object 1 is N_1 = {object 4, object 5, target sample object}; the neighbor node set of object 2 is N_2 = {object 6, object 7, target sample object}; and the neighbor node set of object 3 is N_3 = {object 8, object 9, target sample object}. The masked labeled media feature vectors corresponding to the target sample object, object 1, object 2, object 3, object 4, object 5, object 6, object 7, object 8, and object 9 are h^0_a, h^0_b, h^0_c, h^0_d, h^0_e, h^0_f, h^0_g, h^0_k, h^0_m, and h^0_n, respectively. As shown in fig. 6, these processed labeled media feature vectors reflect the basic attribute information and the masked media tags of the sample objects in the second associated object network; in fig. 6, the masked media tag of a sample object is identified by a non-filled rectangle. For the target sample object, when d = 1, the computer device may perform step (3) in Table 1, i.e., use the following formula (2) to calculate, according to N_0, the average media feature vector h^1_{N(a)} corresponding to the target sample object, and perform step (4) in Table 1, i.e., use the following formula (3) to calculate the first associated media feature vector h^1_a corresponding to the target sample object according to that average media feature vector and the processed labeled media feature vector of the target sample object:

Similarly, for object 1, the computer device may perform step (3) in Table 1, i.e., use the following formula (4) to calculate, according to N_1, the average media feature vector h^1_{N(b)} corresponding to object 1, and perform step (4) in Table 1, i.e., use the following formula (5) to calculate the first associated media feature vector h^1_b corresponding to object 1 according to the average media feature vector corresponding to object 1 and the processed labeled media feature vector of object 1:

Likewise, for object 2, the computer device performs steps (3) and (4) in Table 1 according to N_2 and the masked labeled media feature vector corresponding to object 2, obtaining the first associated media feature vector h^1_c corresponding to object 2. For object 3, the computer device performs steps (3) and (4) in Table 1 according to N_3 and the masked labeled media feature vector corresponding to object 3, obtaining the first associated media feature vector h^1_d corresponding to object 3. Further, the computer device may determine the first associated media feature vectors corresponding to object 1, object 2, and object 3 as the second associated media feature vectors of the target sample object. d is then increased by 1, i.e., d = 2, and step (3) in Table 1 is performed, i.e., the second associated media feature vectors are averaged to obtain an average media feature vector; step (4) in Table 1 is performed, i.e., a third associated media feature vector corresponding to the target sample object is calculated from this average media feature vector and the first associated media feature vector h^1_a of the target sample object; and step (6) in Table 1 is performed, i.e., the third associated media feature vector is normalized to obtain the output of the graph neural network model, namely the predicted media feature vector of the target sample object, which is identified by the filled rounded rectangles in fig. 6.
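The two-iteration forward propagation walked through above can be sketched in plain Python. Since Table 1 and formulas (2) to (5) are not reproduced in this text, the sketch assumes a GraphSAGE-style update: step (3) averages the neighbor vectors, step (4) multiplies W^d with the concatenation of the node's own vector and that average and applies a ReLU, and step (6) applies L2 normalization. The concatenation, the ReLU, the L2 norm, and all names and values are assumptions made for illustration.

```python
import math

def mean_vec(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)

def forward(h0, neighbors, weights):
    """h0: node -> masked feature vector; neighbors: node -> sampled N(u);
    weights: one matrix W^d per iteration d = 1..depth."""
    h = dict(h0)
    for W in weights:
        nxt = {}
        for u in h:
            nbrs = [h[v] for v in neighbors.get(u, []) if v in h]
            agg = mean_vec(nbrs) if nbrs else [0.0] * len(h[u])  # step (3)
            combined = matvec(W, h[u] + agg)                     # step (4)
            nxt[u] = [max(0.0, c) for c in combined]             # assumed ReLU
        h = nxt
    return {u: l2_normalize(vec) for u, vec in h.items()}        # step (6)

# Hypothetical three-node example: a is the target, b and c are its neighbors.
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # adds self and average
z = forward(h0, neighbors, weights=[W, W])
```

After the first iteration, node a holds information about b and c; after the second iteration it also reflects what b and c gathered from their own neighbors, which is how second-order association enters the predicted vector of the target.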

S204, the computer equipment adjusts the candidate media identification model according to the marked media characteristic vector and the predicted media characteristic vector of the target sample object to obtain the target media identification model.

In this application, the computer device may obtain the vector distance between the predicted media feature vector and the labeled media feature vector of the target sample object, and determine the media feature prediction error of the candidate media identification model according to the vector distance; the media feature prediction error reflects how accurately the candidate media identification model predicts media feature vectors. The greater the vector distance, the greater the difference between the predicted media feature vector and the labeled media feature vector of the target sample object, the greater the media feature prediction error, and the lower the prediction accuracy of the candidate media identification model; conversely, the smaller the vector distance, the smaller the difference, the smaller the media feature prediction error, and the higher the prediction accuracy of the candidate media identification model. Therefore, the computer device can adjust the candidate media identification model according to the media feature prediction error to obtain the target media identification model, which improves the accuracy of the target media identification model.

Optionally, the computer device may adjust the candidate media identification model according to the labeled media characteristic vector and the predicted media characteristic vector of the target sample object in any one of the following two ways to obtain the target media identification model:

Mode one: the computer device adjusts the candidate media identification model according to the labeled media feature vector and the predicted media feature vector of the target sample object to obtain an adjusted candidate media identification model, counts the number of adjustments of the candidate media identification model, and, if the number of adjustments is greater than a count threshold, determines the adjusted candidate media identification model as the target media identification model.

In mode one, the computer device may obtain the vector distance between the labeled media feature vector and the predicted media feature vector of the target sample object, determine the media feature prediction error of the candidate media identification model according to the vector distance, and adjust the candidate media identification model according to the media feature prediction error to obtain an adjusted candidate media identification model. Here, adjusting the candidate media identification model according to the media feature prediction error may refer to adjusting the model parameters of the candidate media identification model: for example, an adjustment amplitude is determined according to the media feature prediction error, and the model parameters are adjusted by that amplitude. The adjustment amplitude can be understood as an adjustment step size and is positively correlated with the media feature prediction error; the larger the media feature prediction error, the larger the adjustment amplitude, and the smaller the media feature prediction error, the smaller the adjustment amplitude. Further, the number of adjustments of the candidate media identification model is counted. If the number of adjustments is greater than the count threshold, the adjusted candidate media identification model is determined as the target media identification model. If the number of adjustments is less than or equal to the count threshold, the computer device may proceed as in mode two, starting from determining a verification sample object set according to the K target candidate sample objects respectively corresponding to the L media tags and the second associated object network. Limiting the number of adjustments avoids the waste of resources caused by adjusting the candidate media identification model too many times.

Mode two: the computer device adjusts the candidate media identification model according to the labeled media feature vector and the predicted media feature vector of the target sample object to obtain an adjusted candidate media identification model, and determines a verification sample object set according to the K target candidate sample objects respectively corresponding to the L media tags and the second associated object network; determines a convergence state of the adjusted candidate media identification model based on the verification sample object set; and determines the target media identification model according to the convergence state and the adjusted candidate media identification model.

In the second mode, the computer device may obtain a vector distance between the labeled media characteristic vector of the target sample object and the predicted media characteristic vector, determine a media characteristic prediction error of the candidate media identification model according to the vector distance, and adjust the candidate media identification model to obtain an adjusted candidate media identification model. Then, a target candidate sample object is randomly selected from the K target candidate sample objects respectively corresponding to the L media tags as a verification sample object, the verification sample object is different from the target sample object, a third correlation sample object having a first order correlation with the verification sample object and a fourth correlation sample object having a second order correlation with the verification sample object are determined from the second correlation object network, and the verification sample object, the third correlation sample object and the fourth correlation sample object are determined as a set of verification sample objects. Then, a convergence status of the adjusted candidate media identification model may be determined based on the set of verification sample objects, where the convergence status is used to reflect whether the media feature prediction capability of the adjusted candidate media identification model is optimal. Accordingly, the computer device may determine the target media identification model based on the convergence status and the adjusted candidate media identification model. 
Determining the convergence state of the adjusted candidate media identification model based on the verification sample object set means that the media feature identification accuracy of the adjusted candidate media identification model is verified against that set, which improves both the media feature identification accuracy and the inductive learning capability of the target media identification model.

Optionally, the verification sample object set includes a verification sample object, a third associated sample object, and a fourth associated sample object; the verification sample object is different from the target sample object, the verification sample object and the third associated sample object have a first-order association relationship, and the verification sample object and the fourth associated sample object have a second-order association relationship. The determining the convergence state of the adjusted candidate media identification model based on the verification sample object set includes: calling the adjusted candidate media identification model to perform mask processing on the labeled media feature vectors of the verification sample object, the third associated sample object, and the fourth associated sample object respectively, to obtain the masked labeled media feature vectors respectively corresponding to the verification sample object, the third associated sample object, and the fourth associated sample object; calling the adjusted candidate media identification model to perform association prediction on the masked labeled media feature vector corresponding to the verification sample object, the masked labeled media feature vector corresponding to the third associated sample object, and the masked labeled media feature vector corresponding to the fourth associated sample object, to obtain a predicted media feature vector of the verification sample object; determining a predicted identification error of the adjusted candidate media identification model according to the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object; and determining the convergence state of the adjusted candidate media identification model according to the predicted identification error.

The computer device may call the encoder of the adjusted candidate media identification model to perform mask processing on the labeled media feature vector of the verification sample object, obtaining the masked labeled media feature vector corresponding to the verification sample object; to perform mask processing on the labeled media feature vector of the third associated sample object, obtaining the masked labeled media feature vector corresponding to the third associated sample object; and to perform mask processing on the labeled media feature vector of the fourth associated sample object, obtaining the masked labeled media feature vector corresponding to the fourth associated sample object. Further, the computer device calls the decoder of the adjusted candidate media identification model to perform association prediction on the masked labeled media feature vector corresponding to the verification sample object, the masked labeled media feature vector corresponding to the third associated sample object, and the masked labeled media feature vector corresponding to the fourth associated sample object, obtaining the predicted media feature vector of the verification sample object. A predicted identification error of the adjusted candidate media identification model is then determined according to the predicted media feature vector and the labeled media feature vector of the verification sample object. If the predicted identification error is smaller than an error threshold, the adjusted candidate media identification model is determined to be in a convergence state; if the predicted identification error is greater than or equal to the error threshold, the adjusted candidate media identification model is determined not to be in a convergence state. Verifying the media feature identification accuracy of the adjusted candidate media identification model against the verification sample object set improves both the media feature identification accuracy and the inductive learning capability of the target media identification model.
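The interplay of the two stopping rules (number of adjustments greater than a count threshold, or verification error smaller than an error threshold) can be sketched as a simple control loop. The toy one-parameter model, the use of the same error for training and verification, and all names and threshold values are hypothetical simplifications, not the model of this application.

```python
def train_until_done(adjust, verification_error, count_threshold, error_threshold):
    """Repeat adjustments until one of the two stopping rules fires.

    Returns (number_of_adjustments, reason)."""
    adjustments = 0
    while True:
        adjust()
        adjustments += 1
        if adjustments > count_threshold:            # rule of mode one
            return adjustments, "count_exceeded"
        if verification_error() < error_threshold:   # rule of mode two
            return adjustments, "converged"

class ToyModel:
    """Hypothetical one-parameter model; its 'vector distance' to the
    label is |label - w|."""

    def __init__(self, label=1.0, rate=0.5):
        self.w, self.label, self.rate = 0.0, label, rate

    def error(self):
        return abs(self.label - self.w)

    def adjust(self):
        # Larger prediction error -> larger adjustment step.
        self.w += self.rate * (self.label - self.w)

model = ToyModel()
result = train_until_done(model.adjust, model.error,
                          count_threshold=10, error_threshold=0.1)

capped = ToyModel()
capped_result = train_until_done(capped.adjust, capped.error,
                                 count_threshold=2, error_threshold=0.1)
```

In the first run the error halves at every adjustment and drops below 0.1 after four adjustments; in the second run the count threshold of 2 is exceeded first, so training stops even though the error threshold has not been reached.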

Optionally, the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object both include media feature values corresponding to the L media tags. The determining the predicted identification error of the adjusted candidate media identification model according to the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object includes: inputting a target probability and the media feature value corresponding to the jth media tag in the labeled media feature vector of the verification sample object into a cross-entropy loss function to obtain a candidate identification error corresponding to the jth media tag, where the target probability is the probability that the media feature value corresponding to the jth media tag in the predicted media feature vector of the verification sample object is the target numerical value, and j is a positive integer less than or equal to L; balancing the candidate identification error corresponding to the jth media tag to obtain a balanced candidate identification error corresponding to the jth media tag; and accumulating the balanced candidate identification errors respectively corresponding to the L media tags to obtain the predicted identification error of the adjusted candidate media identification model. Calculating an identification error for each media tag improves the accuracy of the target media identification model, and balancing the per-tag errors avoids the low accuracy that an unbalanced distribution of media tags would otherwise cause.

Optionally, the balancing the candidate identification error corresponding to the jth media tag to obtain a balanced candidate identification error corresponding to the jth media tag includes: acquiring a third object quantity, namely the number of sample objects in the verification sample object set whose media feature value corresponding to the jth media tag is the target numerical value; and generating a balance parameter according to the third object quantity, and balancing the candidate identification error corresponding to the jth media tag by using the balance parameter to obtain the balanced candidate identification error corresponding to the jth media tag. Determining the balance parameter from the number of objects whose media feature value is the target numerical value prevents a media tag carried by very many objects from dominating training; without it, the imbalance between objects that carry the media tag and objects that do not would lower the accuracy of the target media identification model.

For example, as shown in FIG. 3, the predicted media feature vector of node u (i.e., the target sample object) is z_u = h_u^2. The above steps complete the encoder part of the denoising autoencoder: the whole graph neural network model serves as the encoder, and in the decoder part a simple classifier is adopted as the decoder structure of the denoising autoencoder in this scheme and is used for the final loss function calculation. The output of the decoder can be expressed by the following formula (6):

In formula (6), each dimension of p_u represents the target probability of the corresponding original tag, and W^g is a learnable weight of the graph neural network model, obtained through training on sample data. In general, a cross-entropy loss function can be used as the final network loss, which can be expressed by the following formula (7):

In formula (7), Loss(p_u, y_u) represents the predicted identification error of the candidate media identification model, p_u^i represents the probability that the media feature value of the ith media tag in the predicted media feature vector of the target sample object is the target numerical value, and y_u^i represents the media feature value of the ith media tag in the labeled media feature vector of the target sample object. In an actual scenario, although a media tag balancing operation has already been performed at the sampling stage, the media tags finally obtained may still be unbalanced. To alleviate the unbalanced training of the media tags that this causes, the scheme introduces a class-balanced loss function, as shown in the following formula (8):

In formula (8), the coefficient applied to the loss term of each media tag is the balance parameter, n_i is the number of samples in the training sample object set with y^i = 1, and b is a hyperparameter that controls the weight of the loss computed for the current tag in each sample: the closer b is to 1, the closer the weight is to the reciprocal of the number of samples, and the closer b is to 0, the closer the result is to using no weight at all. After the balanced loss value (the predicted identification error) of the candidate media identification model is obtained, the candidate media identification model may be adjusted according to the predicted identification error to obtain the adjusted candidate media identification model. Similarly, the computer device may calculate the predicted identification error of the adjusted candidate media identification model on the verification sample object set by the above method; the repeated parts are not described again.
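The balanced loss described above can be sketched as follows. This is only an illustration: the exact expression of formula (8) is not reproduced in the text, so the sketch assumes the balance parameter takes the form (1 - b) / (1 - b^{n_i}), which agrees with the stated behavior (the weight approaches 1/n_i as b approaches 1 and equals 1 when b is 0). The function names, the default value of b, and the handling of tags with no positive samples are assumptions.

```python
import math

def balance_weight(n_i, b):
    """Assumed balance parameter (1 - b) / (1 - b**n_i) for a tag with n_i positive samples."""
    if n_i <= 0:
        return 1.0  # assumption: a tag without positive samples is left unweighted
    if b >= 1.0:
        return 1.0 / n_i  # limiting value as b approaches 1
    return (1.0 - b) / (1.0 - b ** n_i)

def balanced_loss(p_u, y_u, n, b=0.999, eps=1e-12):
    """Accumulate the weighted cross-entropy terms over the L media tags.

    p_u: predicted target probabilities, one per tag.
    y_u: labeled media feature values (0 or 1), one per tag.
    n:   number of positive samples per tag in the sample set.
    """
    total = 0.0
    for p, y, n_i in zip(p_u, y_u, n):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        ce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += balance_weight(n_i, b) * ce
    return total
```

With b = 0 every weight is 1 and the function reduces to the plain cross-entropy of formula (7); with b close to 1, frequent tags are down-weighted roughly in proportion to their sample counts.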

Optionally, the determining the target media identification model according to the convergence status and the adjusted candidate media identification model includes: if the adjusted candidate media identification model is not in a convergence state, adjusting the adjusted candidate media identification model according to the tagged media characteristic vector of the target sample object and an updated predicted media characteristic vector to obtain the target media identification model, wherein the updated predicted media characteristic vector is obtained by calling the adjusted candidate media identification model to perform association prediction on the masked tagged media characteristic vector corresponding to the target sample object, the masked tagged media characteristic vector corresponding to the second associated sample object and the masked tagged media characteristic vector corresponding to the first associated sample object; and if the adjusted candidate medium identification model is in a convergence state, determining the adjusted candidate medium identification model as the target medium identification model.

If the adjusted candidate media identification model is not in the convergence state, its predicted identification error has not yet reached its minimum, that is, its predicted identification accuracy is not yet optimal. The computer device may therefore continue to adjust the adjusted candidate media identification model according to the labeled media feature vector of the target sample object and the updated predicted media feature vector until the model reaches the convergence state, thereby obtaining the target media identification model. If the adjusted candidate media identification model is in the convergence state, its predicted identification error is at its minimum, that is, its predicted identification accuracy is optimal, and the computer device may determine the adjusted candidate media identification model as the target media identification model.
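The adjust-then-verify loop can be sketched as below. The application does not state how the convergence state is tested, so this sketch assumes the model counts as converged once the error on the verification sample object set stops decreasing by more than a tolerance; `adjust_step`, `validation_error`, `max_rounds` and `tol` are hypothetical names introduced for illustration.

```python
def train_until_converged(adjust_step, validation_error, max_rounds=100, tol=1e-4):
    """Repeat model adjustment until the verification error stops improving.

    adjust_step:      hypothetical callable that performs one adjustment of the model.
    validation_error: hypothetical callable that returns the predicted identification
                      error on the verification sample object set.
    Returns (number of rounds run, last verification error).
    """
    best = validation_error()
    err = best
    for round_idx in range(1, max_rounds + 1):
        adjust_step()
        err = validation_error()
        if best - err < tol:  # assumed criterion: no meaningful improvement
            return round_idx, err
        best = err
    return max_rounds, err
```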

S205, the computer equipment respectively obtains initial media feature vectors of the target object, the first associated object and the second associated object relative to the multimedia data; the target object and the first association object have a first order association relationship therebetween, and the target object and the second association object have a second order association relationship therebetween.

S206, the computer equipment calls a target media identification model, and first-order association processing is carried out on the initial media characteristic vector of the target object and the initial media characteristic vector of the first associated object to obtain the first associated media characteristic vector of the target object.

S207, the computer device calls a target media identification model, and second-order association processing is carried out on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object.

S208, the computer equipment generates a target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector, and pushes multimedia data for the target object according to the target media feature vector.

Please refer to fig. 7, which is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the present application. The multimedia data processing apparatus may be a computer program (including program code) running in a network device; for example, the multimedia data processing apparatus is application software. The apparatus may be configured to perform the corresponding steps in the methods provided in the embodiments of the present application. As shown in fig. 7, the multimedia data processing apparatus may include: an obtaining module 711, a processing module 712, a generating module 713, a predicting module 714 and an adjusting module 715.

The acquisition module is used for respectively acquiring initial media characteristic vectors of the target object, the first associated object and the second associated object relative to the multimedia data; the target object and the first association object have a first-order association relationship, and the target object and the second association object have a second-order association relationship;

Optionally, the obtaining module obtains initial media feature vectors of the target object, the first associated object, and the second associated object with respect to the multimedia data, respectively, and includes:

acquiring a first associated object network corresponding to a target object, wherein the first associated object network comprises a first node for reflecting an initial media characteristic vector of the target object, a second node for reflecting an initial media characteristic vector of a candidate associated object, and an edge formed by connecting the first node and the second node; the target object is associated with the candidate associated object;

according to a node path taking the first node as a starting point, sampling processing is sequentially carried out on second nodes in the first associated object network, and a first associated object having a first-order association relation with the target object and a second associated object having a second-order association relation with the target object are obtained;

and respectively acquiring initial media feature vectors of the target object, the first associated object and the second associated object relative to multimedia data from the first associated object network.
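The sampling described above can be sketched as follows. This is a minimal illustration, assuming the first associated object network is stored as an adjacency dictionary and that neighbors are drawn uniformly at random. The function name `sample_neighbors`, its parameters, and the rule that second-order candidates exclude the target and its direct neighbors are assumptions, not details given in this application.

```python
import random

def sample_neighbors(adj, target, n_first, n_second, seed=0):
    """Sample first-order neighbors of `target`, then second-order neighbors
    reached through each sampled first-order neighbor.

    adj: dict mapping a node to the set of nodes it shares an edge with.
    Returns (list of first-order nodes, dict first-order node -> second-order nodes).
    """
    rng = random.Random(seed)
    first_pool = sorted(adj.get(target, ()))
    first = rng.sample(first_pool, min(n_first, len(first_pool)))
    second = {}
    for v in first:
        # Assumption: skip the target itself and its direct neighbors.
        pool = sorted(w for w in adj.get(v, ())
                      if w != target and w not in first_pool)
        second[v] = rng.sample(pool, min(n_second, len(pool)))
    return first, second
```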

Optionally, the initial media feature vector includes media feature values corresponding to L media tags; the acquiring module sequentially sampling the second nodes in the first associated object network according to a node path taking the first node as a starting point, to obtain a first associated object having a first-order association relationship with the target object and a second associated object having a second-order association relationship with the target object, includes:

according to a node path taking the first node as a starting point, sampling second nodes in the first associated object network in sequence to obtain N first candidate associated objects having a first-order association relationship with the target object and M second candidate associated objects having a second-order association relationship with the target object;

acquiring a first object quantity, namely the number of first candidate associated objects, among the N first candidate associated objects, whose media feature values corresponding to the L media tags are all non-target numerical values; acquiring a second object quantity, namely the number of second candidate associated objects, among the M second candidate associated objects, whose media feature values corresponding to the L media tags are all non-target numerical values;

if the first object quantity is smaller than a first quantity threshold, determining the N first candidate associated objects as first associated objects having a first-order association relationship with the target object;

and if the second object quantity is smaller than a second quantity threshold, determining the M second candidate associated objects as second associated objects having a second-order association relationship with the target object.
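The threshold check on the sampled candidates can be sketched as below. The sketch assumes the target numerical value is 1 and that an object counts toward the quantity only when all L of its media feature values are non-target; both function names are hypothetical.

```python
def count_unlabeled(vectors, target_value=1):
    """Number of objects whose L media feature values are all non-target."""
    return sum(1 for vec in vectors if all(v != target_value for v in vec))

def accept_candidates(vectors, threshold, target_value=1):
    """Accept the sampled candidate set only if fewer than `threshold`
    of its objects carry no media tag at all."""
    return count_unlabeled(vectors, target_value) < threshold
```

The same check applies to the N first candidate associated objects with the first quantity threshold and to the M second candidate associated objects with the second quantity threshold.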

Optionally, the number of the first associated objects is at least two, and the processing module performs first-order association processing on the initial media feature vector of the target object and the initial media feature vector of the first associated object to obtain the first associated media feature vector of the target object, including:

calling a first-order vector recognition layer of a target media recognition model, and averaging initial media characteristic vectors of at least two first associated objects to obtain a first average media characteristic vector;

determining first media association information about at least two of the first association objects according to the first average media feature vector;

and determining a first associated media feature vector of the target object according to the first media associated information and the initial media feature vector of the target object.
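A minimal sketch of the first-order association processing follows. The text above fixes only the order of operations (average, derive association information, combine with the target's own vector); the specific choices here, namely a linear map `W_neigh` applied to the average, a linear map `W_self` applied to the target vector, and a ReLU on their sum, are assumptions in the style of a mean-aggregating graph layer and are not taken from this application.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def first_order_vector(x_target, x_first, W_self, W_neigh):
    """Assumed form: ReLU(W_self @ x_target + W_neigh @ mean(x_first))."""
    avg = mean_vector(x_first)       # first average media feature vector
    info = matvec(W_neigh, avg)      # first media association information (assumed linear)
    own = matvec(W_self, x_target)   # contribution of the target's own vector
    return [max(0.0, a + b) for a, b in zip(own, info)]
```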

Optionally, the performing, by the processing module, second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object, and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object, includes:

calling a second-order vector recognition layer of a target media recognition model, and averaging the initial media feature vector of the target object and the initial media feature vector of the second associated object to obtain a second average media feature vector;

determining second media association information between the target object and the second associated object according to the second average media feature vector;

and determining a second associated media feature vector of the target object according to the second media associated information and the initial media feature vector of the first associated object.
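The second-order association processing can be sketched in the same spirit for a single second associated object. Only the order of operations comes from the text above; the linear maps `W_pair` and `W_first`, the ReLU, and the use of one bridging first associated object per second associated object are assumptions made for illustration.

```python
def second_order_vector(x_target, x_second, x_first, W_pair, W_first):
    """Assumed form: ReLU(W_pair @ mean(x_target, x_second) + W_first @ x_first)."""
    # Second average media feature vector of the target and the second associated object.
    avg = [(a + b) / 2.0 for a, b in zip(x_target, x_second)]
    # Second media association information (assumed to be a linear map of the average).
    info = [sum(w * v for w, v in zip(row, avg)) for row in W_pair]
    # Contribution of the first associated object that links the two.
    own = [sum(w * v for w, v in zip(row, x_first)) for row in W_first]
    return [max(0.0, a + b) for a, b in zip(info, own)]
```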

Optionally, the number of the second associated objects is at least two, and the generating module generating a target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector includes:

calling a target vector recognition layer of a target media recognition model, and averaging second associated media characteristic vectors corresponding to at least two second associated objects respectively to obtain a third average media characteristic vector;

and splicing the first associated media feature vector and the third average media feature vector to obtain a target media feature vector of the target object.
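The averaging and splicing steps above amount to an element-wise mean followed by a concatenation, as in this short sketch (the function name `target_vector` is hypothetical):

```python
def target_vector(first_assoc, second_assocs):
    """Concatenate the first associated media feature vector with the
    average of the second associated media feature vectors."""
    n = len(second_assocs)
    third_avg = [sum(col) / n for col in zip(*second_assocs)]  # third average vector
    return list(first_assoc) + third_avg
```

The resulting vector has the combined length of the two inputs, so later layers see the first-order and second-order information side by side.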

Optionally, the obtaining module is further configured to obtain labeled media feature vectors of the target sample object, the first associated sample object, and the second associated sample object with respect to the multimedia data, respectively; the target sample object and the first associated sample object have a first order association relationship therebetween, and the target sample object and the second associated sample object have a second order association relationship therebetween;

the processing module is used for calling a candidate media identification model, and performing mask processing on the marked media characteristic vectors corresponding to the target sample object, the first associated sample object and the second associated sample object respectively to obtain masked marked media characteristic vectors corresponding to the target sample object, the first associated sample object and the second associated sample object respectively;

the prediction module is used for performing associated prediction on the marked media feature vector after mask processing corresponding to the target sample object, the marked media feature vector after mask processing corresponding to the second associated sample object and the marked media feature vector after mask processing corresponding to the first associated sample object to obtain a predicted media feature vector of the target sample object;

and the adjusting module is used for adjusting the candidate media identification model according to the labeled media characteristic vector and the predicted media characteristic vector of the target sample object to obtain the target media identification model.

Optionally, the processing module invoking a candidate media identification model and performing mask processing on the labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object, to obtain masked labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object, includes:

calling a candidate media identification model, and adjusting a media characteristic value in the labeled media characteristic vector of the target sample object to obtain a mask processed labeled media characteristic vector corresponding to the target sample object;

adjusting a media characteristic value in the labeled media characteristic vector of the first associated sample object to obtain a mask processed labeled media characteristic vector corresponding to the first associated sample object;

and adjusting the media characteristic value in the marked media characteristic vector of the second associated sample object to obtain the marked media characteristic vector after mask processing corresponding to the second associated sample object.
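One common way to realize such mask processing, in line with the denoising setup, is to overwrite randomly chosen entries of the labeled vector with a fixed value. The sketch below assumes that approach; the mask rate, the mask value 0 and the function name are assumptions rather than values given in this application.

```python
import random

def mask_vector(vec, mask_rate=0.3, mask_value=0, seed=None):
    """Return a copy of `vec` in which each entry is replaced by
    `mask_value` with probability `mask_rate`."""
    rng = random.Random(seed)
    return [mask_value if rng.random() < mask_rate else v for v in vec]
```

The same function would be applied to the labeled media feature vectors of the target sample object, the first associated sample object and the second associated sample object in turn.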

Optionally, the obtaining module obtains the labeled media feature vectors of the target sample object, the first associated sample object, and the second associated sample object with respect to the multimedia data, respectively, and includes:

acquiring a second associated object network; the second associated object network comprises nodes used for reflecting labeled media characteristic vectors of the candidate sample objects and edges formed by connecting the nodes corresponding to the associated candidate sample objects, wherein the labeled media characteristic vectors comprise media characteristic values corresponding to L media labels;

according to the node path in the second associated object network, sequentially sampling the nodes in the second associated object network to obtain K target candidate sample objects corresponding to the L media tags respectively; a target candidate sample object corresponding to a media tag is a candidate sample object in the second associated object network, and a media characteristic value corresponding to the media tag is a target numerical value;

determining a target sample object, a first associated sample object and a second associated sample object according to K target candidate sample objects respectively corresponding to the L media tags and the second associated object network;

and respectively acquiring the marked media feature vector of the target sample object relative to the multimedia data, the marked media feature vector of the first associated sample object relative to the multimedia data and the marked media feature vector of the second associated sample object relative to the multimedia data from the second associated object network.
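The per-tag selection of K target candidate sample objects can be sketched as below. For brevity the sketch draws uniformly from all nodes that carry a tag instead of walking node paths, and it assumes the target numerical value is 1; the function name and parameters are hypothetical.

```python
import random

def sample_per_tag(labels, k, target_value=1, seed=0):
    """Pick up to k candidate sample objects for each media tag.

    labels: dict mapping a node to its list of L media feature values.
    Returns a dict mapping tag index j to the nodes sampled for that tag.
    """
    rng = random.Random(seed)
    num_tags = len(next(iter(labels.values())))
    picked = {}
    for j in range(num_tags):
        # Candidates for tag j are the nodes whose jth value is the target value.
        pool = sorted(node for node, vec in labels.items() if vec[j] == target_value)
        picked[j] = rng.sample(pool, min(k, len(pool)))
    return picked
```

Drawing the same number of candidates for every tag is what keeps rare tags from being swamped by frequent ones at the sampling stage.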

Optionally, the determining, by the obtaining module, a target sample object, a first associated sample object, and a second associated sample object according to the K target candidate sample objects and the second associated object network respectively corresponding to the L media tags includes:

determining a first associated candidate sample object having a first-order association relationship with a target candidate sample object Pi from the second associated object network; the target candidate sample objects Pi belong to K target candidate sample objects respectively corresponding to the L media tags, and i is a positive integer;

determining a second associated candidate sample object having a second order association relationship with the target candidate sample object Pi from the second associated object network;

taking the target candidate sample object Pi as a target sample object, and sampling Q first associated candidate sample objects from the first associated candidate sample objects as the first associated sample objects;

sampling D second associated candidate sample objects from the second associated candidate sample objects as the second associated sample objects.

Optionally, the adjusting module adjusts the candidate media identification model according to the labeled media characteristic vector of the target sample object and the predicted media characteristic vector to obtain the target media identification model, including:

adjusting the candidate media identification model according to the marked media characteristic vector and the predicted media characteristic vector of the target sample object to obtain an adjusted candidate media identification model;

determining a verification sample object set according to K target candidate sample objects respectively corresponding to the L media tags and the second associated object network;

determining a convergence state of the adjusted candidate media identification model based on the set of verification sample objects;

and determining the target medium identification model according to the convergence state and the adjusted candidate medium identification model.

Optionally, the verification sample object set includes a verification sample object, a third associated sample object, and a fourth associated sample object; the verification sample object is different from the target sample object, the verification sample object has a first order association relationship with the third association sample object, and the verification sample object has a second order association relationship with the fourth association sample object; the adjustment module determines a convergence state of the adjusted candidate media identification model based on the set of verification sample objects, including:

calling the adjusted candidate media identification model, and performing mask processing on the labeled media feature vectors of the verification sample object, the third associated sample object and the fourth associated sample object respectively, to obtain masked labeled media feature vectors respectively corresponding to the verification sample object, the third associated sample object and the fourth associated sample object;

performing correlation prediction on the marked media feature vector after mask processing corresponding to the verification sample object, the marked media feature vector after mask processing corresponding to the third correlation sample object, and the marked media feature vector after mask processing corresponding to the fourth correlation sample object to obtain a predicted media feature vector of the verification sample object;

determining a predicted identification error of the adjusted candidate media identification model according to the predicted media characteristic vector of the verification sample object and the marked media characteristic vector of the verification sample object;

and determining the convergence state of the adjusted candidate medium identification model according to the prediction identification error.

Optionally, the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object both include media feature values corresponding to L media tags; the adjusting module determining the predicted identification error of the adjusted candidate media identification model according to the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object includes:

inputting the target probability and a media characteristic value corresponding to the jth media tag in the labeled media characteristic vector of the verification sample object into a cross entropy loss function to obtain a candidate identification error corresponding to the jth media tag; the target probability is the probability that the media characteristic value corresponding to the jth media tag in the predicted media characteristic vector of the verification sample object is a target numerical value; j is a positive integer less than or equal to L;

balancing the candidate identification error corresponding to the jth media tag to obtain a balanced candidate identification error corresponding to the jth media tag;

and accumulating the balanced candidate identification errors corresponding to the L media tags respectively to obtain the predicted identification error of the adjusted candidate media identification model.

Optionally, the adjusting module balancing the candidate identification error corresponding to the jth media tag to obtain a balanced candidate identification error corresponding to the jth media tag includes:

acquiring a third object quantity, namely the number of sample objects in the verification sample object set whose media feature value corresponding to the jth media tag is the target numerical value;

and generating a balance parameter according to the third object quantity, and balancing the candidate identification error corresponding to the jth media tag by using the balance parameter to obtain the balanced candidate identification error corresponding to the jth media tag.

Optionally, the determining, by the adjusting module, the target media identification model according to the convergence state and the adjusted candidate media identification model includes:

if the adjusted candidate media identification model is not in a convergence state, continuously adjusting the adjusted candidate media identification model according to the tagged media characteristic vector of the target sample object and an updated predicted media characteristic vector to obtain the target media identification model, wherein the updated predicted media characteristic vector is obtained by calling the adjusted candidate media identification model to perform association prediction on the masked tagged media characteristic vector corresponding to the target sample object, the masked tagged media characteristic vector corresponding to the second associated sample object and the masked tagged media characteristic vector corresponding to the first associated sample object;

and if the adjusted candidate medium identification model is in a convergence state, determining the adjusted candidate medium identification model as the target medium identification model.

According to an embodiment of the present application, the steps involved in the multimedia data processing method shown in fig. 2 may be performed by various modules in the multimedia data processing apparatus shown in fig. 7. For example, step S101 shown in fig. 2 may be performed by the acquisition module 711 in fig. 7, and step S102 and step S103 shown in fig. 2 may be performed by the processing module 712 in fig. 7; step S104 shown in fig. 2 may be performed by the generation module 713 in fig. 7.

According to an embodiment of the present application, each module in the multimedia data processing apparatus shown in fig. 7 may be respectively or completely combined into one or several units to form the apparatus, or one (some) of the units may be further split into at least two sub-units with smaller functions, which may implement the same operation without affecting implementation of technical effects of the embodiment of the present application. The modules are divided based on logic functions, and in practical applications, the functions of one module may also be implemented by at least two units, or the functions of at least two modules may also be implemented by one unit. In other embodiments of the present application, the multimedia data processing apparatus may also include other units, and in practical applications, these functions may also be implemented by assistance of other units, and may be implemented by cooperation of at least two units.

According to an embodiment of the present application, the multimedia data processing apparatus as shown in fig. 7 may be constructed by running a computer program (including program codes) capable of executing the steps involved in the corresponding method as shown in fig. 3 on a general-purpose computer device such as a computer including a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and the like as processing components and storage components, and implementing the multimedia data processing method of the embodiment of the present application. The computer program may be recorded on a computer-readable recording medium, for example, and loaded into and executed by the computing apparatus via the computer-readable recording medium.

Please refer to fig. 8, which is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1005 may optionally be at least one storage device remote from the processor 1001. As shown in fig. 8, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the computer device 1000 shown in fig. 8, the network interface 1004 may provide a network communication function; and the user interface 1003 is mainly used for an interface for providing input; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:

respectively acquiring initial media characteristic vectors of a target object, a first associated object and a second associated object relative to multimedia data; the target object and the first association object have a first-order association relationship, and the target object and the second association object have a second-order association relationship;

It should be understood that the computer device 1000 described in this embodiment of the present application may carry out the multimedia data processing method described in the embodiment corresponding to fig. 2 or fig. 4, and may also implement the multimedia data processing apparatus described in the embodiment corresponding to fig. 7, which is not described herein again. Likewise, the beneficial effects of the same method are not described again.

Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, where a computer program executed by the aforementioned multimedia data processing apparatus is stored in the computer-readable storage medium, and the computer program includes program instructions, and when the processor executes the program instructions, the descriptions of the multimedia data processing method in the embodiments corresponding to fig. 2 and fig. 4 can be executed, so that the descriptions of the multimedia data processing method will not be repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of embodiments of the method of the present application.

As an example, the program instructions described above may be executed on one computer device, or on at least two computer devices distributed over at least two sites and interconnected by a communication network; such interconnected computer devices may constitute a blockchain network.

The computer-readable storage medium may be an internal storage unit of the multimedia data processing apparatus provided in any of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

The terms "first," "second," and the like in the description, claims, and drawings of the embodiments of the present application are used for distinguishing between different media and not for describing a particular sequential order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may optionally include other steps or modules not listed or inherent to such process, method, apparatus, product, or device.

An embodiment of the present application further provides a computer program product that includes a computer program or instructions. When the computer program or instructions are executed by a processor, the multimedia data processing method described in the embodiments corresponding to fig. 2 and fig. 4 is implemented; details are therefore not repeated here. Likewise, the beneficial effects, being the same as those of the method, are not described again. For technical details not disclosed in the embodiments of the computer program product of the present application, refer to the description of the method embodiments of the present application.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowcharts and/or block diagrams provided by the embodiments of the present application. Each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable network connection device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable network connection device, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable network connection device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable network connection device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The above disclosure merely illustrates preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; equivalent variations and modifications made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims

1. A method for processing multimedia data, comprising:

respectively acquiring initial media feature vectors of a target object, a first associated object and a second associated object with respect to multimedia data; the target object and the first associated object have a first-order association relationship therebetween, and the target object and the second associated object have a second-order association relationship therebetween; the first-order association relationship reflects that the first associated object and the target object have an association relationship; the second-order association relationship reflects that the second associated object and the target object do not have an association relationship, and that the second associated object and the first associated object have an association relationship;

performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object; the second-order association processing is to update the initial media feature vector of the target object according to second media association information and the initial media feature vector of the first associated object; the second media association information is determined according to the initial media feature vector of the second associated object and the initial media feature vector of the target object, and the second media association information reflects at least one of: an association relationship between a media tag of the second associated object and basic attribute information, an association relationship between media tags of the second associated object, and the number of objects having the same media tag among the target object and the second associated object;
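A minimal sketch of how the first-order and second-order association processing might fit together is given below. It is not the claimed implementation: the sigmoid of a dot product used as the association information, the additive update, the averaging used to fuse the two vectors, and all function names are assumptions chosen only to make the data flow concrete. What it keeps from the text is that the second media association information is computed from the target object and the second associated object, and is then applied to the first associated object's vector to update the target object's vector.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def first_order(target: Vector, first: Vector) -> Vector:
    # Assumed form: association information between target and first associated object.
    info = sigmoid(dot(target, first))
    return [t + info * f for t, f in zip(target, first)]


def second_order(target: Vector, second: Vector, first: Vector) -> Vector:
    # "Second media association information": derived from target and SECOND object
    # (assumed here to be a scalar weight).
    info = sigmoid(dot(target, second))
    # The target vector is updated using that information and the FIRST object's vector.
    return [t + info * f for t, f in zip(target, first)]


def fuse(first_vec: Vector, second_vec: Vector) -> Vector:
    # Assumed fusion: element-wise mean of the two associated vectors.
    return [(a + b) / 2.0 for a, b in zip(first_vec, second_vec)]


# Hypothetical initial media feature vectors over L = 3 media tags.
target = [1.0, 0.0, 0.0]
first = [0.0, 1.0, 0.0]
second = [1.0, 0.0, 1.0]

target_vec = fuse(first_order(target, first), second_order(target, second, first))
```

In this toy setting the target object gains a non-zero value for the second media tag, which it did not have initially, because that tag is carried by its first associated object.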

2. The method of claim 1, wherein the obtaining initial media feature vectors of the target object, the first associated object, and the second associated object with respect to the multimedia data, respectively, comprises:

3. The method of claim 2, wherein the initial media feature vector comprises media feature values corresponding to L media tags; and the sequentially sampling second nodes in the first associated object network along the node path starting from the first node, to obtain a first associated object having a first-order association relationship with the target object and a second associated object having a second-order association relationship with the target object, comprises:

acquiring a first object number, being the number of first candidate associated objects, among the N first candidate associated objects, whose media feature values corresponding to the L media tags are non-target values; acquiring a second object number, being the number of second candidate associated objects, among the M second candidate associated objects, whose media feature values corresponding to the L media tags are non-target values;

4. The method of claim 1, wherein the number of the first associated objects is at least two, and the performing first-order association processing on the initial media feature vector of the target object and the initial media feature vector of the first associated object to obtain the first associated media feature vector of the target object comprises:

5. The method of claim 1, wherein the performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object, and the initial media feature vector of the first associated object to obtain the second associated media feature vector of the target object comprises:

6. The method of claim 1, wherein the number of the second associated objects is at least two, and the generating the target media feature vector of the target object according to the first associated media feature vector and the second associated media feature vector comprises:

7. The method of any one of claims 4-6, further comprising:

respectively obtaining labeled media feature vectors of a target sample object, a first associated sample object and a second associated sample object with respect to the multimedia data; the target sample object has a first-order association relationship with the first associated sample object, and the target sample object has a second-order association relationship with the second associated sample object;

calling a candidate media identification model, and performing mask processing on the labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object, to obtain masked labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object;

performing association prediction on the masked labeled media feature vector corresponding to the target sample object, the masked labeled media feature vector corresponding to the second associated sample object, and the masked labeled media feature vector corresponding to the first associated sample object, to obtain a predicted media feature vector of the target sample object;

and adjusting the candidate media identification model according to the labeled media feature vector and the predicted media feature vector of the target sample object to obtain a target media identification model.
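The training flow recited above (mask, predict, compare with the label, adjust) can be sketched as a toy loop. Everything concrete here is an assumption for illustration: the mask value 0.0, the choice of which positions are masked, the model form (one scalar weight per source vector), the squared-error loss, and the gradient-descent update are hypothetical stand-ins and are not taken from the embodiments.

```python
from typing import List, Set, Tuple

Vector = List[float]


def mask(vec: Vector, positions: Set[int]) -> Vector:
    """Hide the media feature values at `positions` (assumed mask value: 0.0)."""
    return [0.0 if i in positions else v for i, v in enumerate(vec)]


def predict(weights: Vector, target: Vector, first: Vector, second: Vector) -> Vector:
    """Toy association prediction: weighted sum of the three masked vectors."""
    w_t, w_f, w_s = weights
    return [w_t * t + w_f * f + w_s * s for t, f, s in zip(target, first, second)]


def train_step(weights: Vector, masked: Tuple[Vector, Vector, Vector],
               label: Vector, lr: float = 0.1) -> Tuple[Vector, float]:
    """One adjustment of the candidate model; returns (new weights, loss before update)."""
    pred = predict(weights, *masked)
    err = [p - y for p, y in zip(pred, label)]
    loss = sum(e * e for e in err)
    # Gradient of the squared error with respect to each scalar weight.
    grads = [2.0 * sum(e * x for e, x in zip(err, src)) for src in masked]
    return [w - lr * g for w, g in zip(weights, grads)], loss


# Hypothetical labeled media feature vectors over L = 3 media tags.
labeled_target = [1.0, 0.0, 1.0]
labeled_first = [1.0, 1.0, 1.0]
labeled_second = [0.0, 0.0, 1.0]
masked = (mask(labeled_target, {0}), mask(labeled_first, {1}), mask(labeled_second, {2}))

weights = [0.0, 0.0, 0.0]
losses = []
for _ in range(20):
    weights, loss = train_step(weights, masked, labeled_target)
    losses.append(loss)
```

Because the target sample's own value for the first media tag is masked, the toy model can only recover it from the first associated sample object, which is the kind of neighbor-based inference the masking step is meant to encourage.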

8. The method of claim 7, wherein the calling a candidate media identification model, and performing mask processing on the labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object, to obtain masked labeled media feature vectors respectively corresponding to the target sample object, the first associated sample object and the second associated sample object, comprises:

9. The method of claim 7, wherein the respectively obtaining labeled media feature vectors of the target sample object, the first associated sample object and the second associated sample object with respect to the multimedia data comprises:

according to the node path in the second associated object network, sequentially sampling the nodes in the second associated object network to obtain K target candidate sample objects corresponding to the L media tags respectively; a target candidate sample object corresponding to one media tag is a candidate sample object of which the media characteristic value corresponding to the one media tag is a target numerical value in the second associated object network;

and respectively acquiring, from the second associated object network, the labeled media feature vectors of the target sample object, the first associated sample object and the second associated sample object with respect to the multimedia data.

10. The method of claim 9, wherein the determining a target sample object, a first associated sample object, and a second associated sample object according to the K target candidate sample objects and the second associated object network respectively corresponding to the L media tags comprises:

11. The method of claim 7, wherein the adjusting the candidate media identification model according to the labeled media feature vector and the predicted media feature vector of the target sample object to obtain the target media identification model comprises:

adjusting the candidate media identification model according to the labeled media characteristic vector and the predicted media characteristic vector of the target sample object to obtain an adjusted candidate media identification model;

12. The method of claim 11, wherein the set of validation sample objects comprises a validation sample object, a third associated sample object, a fourth associated sample object; the verification sample object is different from the target sample object, the verification sample object and the third associated sample object have a first order association relationship therebetween, and the verification sample object and the fourth associated sample object have a second order association relationship therebetween;

determining a convergence state of the adjusted candidate media identification model based on the set of verification sample objects, comprising:

calling the adjusted candidate media identification model, and performing mask processing on the labeled media feature vectors of the verification sample object, the third associated sample object and the fourth associated sample object respectively, to obtain masked labeled media feature vectors respectively corresponding to the verification sample object, the third associated sample object and the fourth associated sample object;

determining a predicted identification error of the adjusted candidate media identification model according to the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object;

13. The method of claim 12, wherein the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object each include media feature values corresponding to L media tags;

the determining a predicted identification error of the adjusted candidate media identification model according to the predicted media feature vector of the verification sample object and the labeled media feature vector of the verification sample object includes:

14. The method as claimed in claim 13, wherein the balancing the candidate identification error corresponding to the jth media tag to obtain a balanced candidate identification error corresponding to the jth media tag comprises:

and generating a balance parameter according to the third object number, and balancing the candidate identification error corresponding to the jth media tag using the balance parameter to obtain a balanced candidate identification error corresponding to the jth media tag.
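One plausible reading of this balancing step is an inverse-frequency weighting, sketched below. This is an assumption and not the claimed formula. The text here does not define the third object number, so the sketch treats it as the count of objects carrying the jth media tag, and it assumes that errors on rarer tags should be weighted up. The function names and the smoothing term are hypothetical.

```python
from typing import List


def balance_parameter(third_object_count: int, smoothing: float = 1.0) -> float:
    """Assumed form: inverse of the (smoothed) number of objects carrying the tag."""
    return 1.0 / (third_object_count + smoothing)


def balanced_errors(candidate_errors: List[float], third_object_counts: List[int]) -> List[float]:
    """Scale the candidate identification error of each media tag by its balance parameter."""
    return [e * balance_parameter(n) for e, n in zip(candidate_errors, third_object_counts)]


# Hypothetical values for L = 2 media tags: tag 0 is common, tag 1 is rare.
candidate_errors = [0.5, 0.5]
third_object_counts = [9, 1]

balanced = balanced_errors(candidate_errors, third_object_counts)
predicted_identification_error = sum(balanced)
```

With equal raw errors, the rare tag contributes five times as much to the total as the common tag, so a model cannot reach a low error simply by fitting the most frequent media tags.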

15. The method of claim 11, wherein the determining the target media identification model based on the convergence state and the adjusted candidate media identification model comprises:

16. A multimedia data processing apparatus, comprising:

the acquisition module is used for respectively acquiring initial media feature vectors of a target object, a first associated object and a second associated object with respect to multimedia data; the target object and the first associated object have a first-order association relationship therebetween, and the target object and the second associated object have a second-order association relationship therebetween; the first-order association relationship reflects that the first associated object and the target object have an association relationship; the second-order association relationship reflects that the second associated object and the target object do not have an association relationship, and that the second associated object and the first associated object have an association relationship;

the processing module is used for performing first-order association processing on the initial media feature vector of the target object and the initial media feature vector of the first associated object to obtain a first associated media feature vector of the target object; performing second-order association processing on the initial media feature vector of the target object, the initial media feature vector of the second associated object and the initial media feature vector of the first associated object to obtain a second associated media feature vector of the target object; the second-order association processing is to update the initial media feature vector of the target object according to second media association information and the initial media feature vector of the first associated object; the second media association information is determined according to the initial media feature vector of the second associated object and the initial media feature vector of the target object, and the second media association information reflects at least one of: an association relationship between a media tag of the second associated object and basic attribute information, an association relationship between media tags of the second associated object, and the number of objects having the same media tag among the target object and the second associated object;

17. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 15.

18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 15.

19. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 15 when executed by a processor.