CN115018009A - Object description method, and network model training method and device

Object description method, and network model training method and device

Info

Publication number
CN115018009A
Authority
CN
China
Prior art keywords
description
sample
target
feature
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210807314.7A
Other languages
Chinese (zh)
Other versions
CN115018009B (en)
Inventor
尉桢楷
李雅楠
何伯磊
和为
徐伟
谢楚曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210807314.7A
Publication of CN115018009A
Application granted
Publication of CN115018009B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations

Abstract

The present disclosure provides an object description method, a network model training method, and corresponding apparatuses, electronic devices, storage media and program products, and relates to the technical field of artificial intelligence, in particular to graph neural networks, intelligent office and intelligent search technologies. The implementation scheme comprises: determining a first description feature of a target object and a second description feature of each candidate object of at least one candidate object; determining an aggregate description feature for the target object according to the first description feature and the second description feature; and obtaining an object description result of the target object based on the aggregate description feature, wherein the target object and each candidate object have an association relationship.

Description

Object description method, and network model training method and device
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to graph neural networks, intelligent office and intelligent search technologies, and specifically to an object description method, and a network model training method and device.
Background
Object description has wide application in scenarios such as object matching, object classification and information recommendation. However, in some scenarios, object description suffers from poor accuracy and low efficiency.
Disclosure of Invention
The present disclosure provides an object description method, a network model training method, and corresponding apparatuses, electronic devices, storage media and program products.
According to an aspect of the present disclosure, there is provided an object description method, including: determining a first description feature of a target object and a second description feature of each of at least one candidate object; determining an aggregate description feature for the target object from the first description feature and the second description feature; and obtaining an object description result of the target object based on the aggregate description feature, wherein the target object and each candidate object have an association relationship.
According to another aspect of the present disclosure, there is provided a training method of a network model, including: determining at least one positive sample node and at least one negative sample node corresponding to a target sample node according to an object relationship network comprising the target sample node; determining sample description features of sample nodes to be processed, the sample nodes to be processed including the target sample node, the at least one positive sample node, and the at least one negative sample node; taking the sample description features of the sample nodes to be processed and the sample description features of at least one candidate sample node having an association relationship with the sample nodes to be processed as input data of a target network model to be trained to obtain aggregate sample description features of the sample nodes to be processed; and adjusting model parameters of the target network model to be trained according to the aggregate sample description features corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node, respectively, to obtain the trained target network model, wherein the target sample node and each positive sample node have an association relationship.
According to another aspect of the present disclosure, there is provided an object description apparatus including: a first processing module for determining a first description feature of a target object and a second description feature of each of at least one candidate object; a second processing module for determining an aggregate description feature for the target object based on the first description feature and the second description feature; and a third processing module for obtaining an object description result of the target object based on the aggregate description feature, wherein the target object and each candidate object have an association relationship.
According to another aspect of the present disclosure, there is provided a training apparatus of a network model, including: a sixth processing module for determining at least one positive sample node and at least one negative sample node corresponding to a target sample node according to an object relationship network comprising the target sample node; a seventh processing module for determining sample description features of sample nodes to be processed, where the sample nodes to be processed include the target sample node, the at least one positive sample node, and the at least one negative sample node; an eighth processing module for taking the sample description features of the sample nodes to be processed and the sample description features of at least one candidate sample node having an association relationship with the sample nodes to be processed as input data of a target network model to be trained, to obtain aggregate sample description features of the sample nodes to be processed; and a ninth processing module for adjusting model parameters of the target network model to be trained according to the aggregate sample description features corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node, respectively, to obtain the trained target network model, wherein the target sample node and each positive sample node have an association relationship.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object description method or the network model training method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the object description method or the training method of the network model described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the object description method or the training method of a network model as described above.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a system architecture of an object description method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of an object description method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart of an object description method according to a further embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of training a network model according to an embodiment of the present disclosure;
FIG. 5 schematically shows a schematic diagram of a training process of a network model according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of an object description apparatus according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus for a network model according to an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an electronic device for object description according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
The embodiment of the disclosure provides an object description method. The method comprises: determining a first description feature of a target object and a second description feature of each candidate object of at least one candidate object; determining an aggregate description feature for the target object according to the first description feature and the second description feature; and obtaining an object description result of the target object based on the aggregate description feature, wherein the target object and each candidate object have an association relationship.
Fig. 1 schematically illustrates a system architecture of an object description method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
The system architecture 100 according to this embodiment may include a requesting terminal 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between requesting terminals 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, network services, middleware services, and the like.
The requesting terminal 101 interacts with the server 103 through the network 102 to receive or transmit data. For example, the requesting terminal 101 is configured to initiate an object description request to the server 103, and to send object description data of the target object to the server 103. The object description data may include object portrait data of the target object and association relationship data between the target object and other objects; the association relationship data may include, for example, attribute association data and interaction association data.
The server 103 may be a server that provides various services, and may be, for example, a background processing server (merely an example) that performs object description processing according to an object description request transmitted by the requesting terminal 101.
For example, the server 103 determines, in response to an object description request acquired from the request terminal 101, a first description feature of the target object and a second description feature of each candidate object of the at least one candidate object, determines an aggregate description feature for the target object according to the first description feature and the second description feature, and obtains an object description result of the target object based on the aggregate description feature, where the target object and each candidate object have an association relationship.
It should be noted that the object description method provided by the embodiment of the present disclosure may be executed by the server 103. Accordingly, the object description apparatus provided by the embodiment of the present disclosure may be disposed in the server 103. The object description method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 103 and is capable of communicating with the requesting terminal 101 and/or the server 103. Accordingly, the object description apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 103 and capable of communicating with the requesting terminal 101 and/or the server 103.
It should be understood that the number of requesting terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of requesting terminals, networks, and servers, as desired for an implementation.
An object description method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 3 in conjunction with the system architecture of fig. 1. The object description method of the embodiment of the present disclosure may be performed by, for example, the server 103 shown in fig. 1.
Fig. 2 schematically shows a flow chart of an object description method according to an embodiment of the present disclosure.
As shown in fig. 2, the object description method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S230.
In operation S210, a first descriptive feature of the target object and a second descriptive feature of each of the at least one candidate object are determined.
In operation S220, an aggregate description feature for the target object is determined according to the first description feature and the second description feature.
In operation S230, an object description result of the target object is obtained based on the aggregation description feature, and the target object and each candidate object have an association relationship therebetween.
The following illustrates exemplary flows of the operations of the object description method of the present embodiment.
For example, a first description feature of the target object and a second description feature of at least one candidate object having an association relationship with the target object may be determined from the object description data. The object description data for the target object may include, for example, object portrait data of the target object and association relationship data between the target object and other objects.
The object description data may be obtained in various public, legally compliant ways, such as from a public data set, or by a data collection site after obtaining user authorization associated with the object description data. The object description data is not scene data for a specific user and does not reflect personal information of a specific user. The scope of application of the object description data is limited to the extent that the user has the right to know and is authorized to use.
In one example, at least one neighboring node corresponding to a target node is determined from an object relationship network that includes the target node, and the object characterized by each neighboring node is taken as a candidate object having an association relationship with the target object.
Determining the at least one candidate object from the object relationship network containing the target node effectively ensures the screening efficiency and rationality of candidate selection, which in turn improves the precision of the target object description.
The object relationship network includes a plurality of nodes characterizing objects and edges characterizing associations between those nodes. The target node characterizes the target object, and a neighboring node of the target node is a node connected to the target node by an edge, that is, a neighbor of the target node. The object relationship network may be, for example, a heterogeneous graph network, in which any node may be determined by its connection relationships with other nodes.
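As a hedged illustration of the object relationship network described above, the following Python sketch stores nodes and association edges as an adjacency list and returns the neighboring nodes of a target node; the class name, node identifiers and association types are assumptions introduced for this example, not names taken from the patent.
```python
from collections import defaultdict

# Minimal adjacency-list representation of an object relationship network.
# Node identifiers and association types below are illustrative only.
class ObjectRelationshipNetwork:
    def __init__(self):
        self.edges = defaultdict(set)   # node -> set of neighboring nodes
        self.edge_types = {}            # (u, v) -> association type

    def add_edge(self, u, v, association_type):
        self.edges[u].add(v)
        self.edges[v].add(u)
        self.edge_types[(u, v)] = association_type
        self.edge_types[(v, u)] = association_type

    def neighbors(self, target_node):
        """Nodes connected to the target node by an edge, i.e. its candidates."""
        return sorted(self.edges[target_node])

network = ObjectRelationshipNetwork()
network.add_edge("user_a", "user_b", "interaction")   # e.g. communication behaviour
network.add_edge("user_a", "group_x", "attribute")    # e.g. group membership

# The objects characterized by these neighboring nodes become candidate objects.
print(network.neighbors("user_a"))  # ['group_x', 'user_b']
```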
At least one first initial description feature of the target object may be determined based on the object description data, and the first description feature of the target object is determined from the at least one first initial description feature. Similarly, at least one second initial description feature of each candidate object is determined, and the second description feature of each candidate object is determined from the at least one second initial description feature.
Illustratively, based on the object description data, at least one first initial description feature of the target object is determined. The at least one first initial description feature is aggregated according to the attention weight corresponding to the feature type of each first initial description feature, yielding a first aggregation result, and the first description feature of the target object is obtained from the first aggregation result. The at least one first initial description feature includes an object portrait feature of the target object and an association relationship feature between the target object and other objects.
Likewise, at least one second initial description feature of the at least one candidate object is determined based on the object description data. The at least one second initial description feature is aggregated according to the attention weight corresponding to the feature type of each second initial description feature, yielding a second aggregation result, and the second description feature of the at least one candidate object is obtained from the second aggregation result. The at least one second initial description feature includes an object portrait feature of the at least one candidate object and an association relationship feature between the at least one candidate object and other objects.
The first description feature and the second description feature thus fuse rich object description information, providing credible data support for describing the target object and improving both the precision and the efficiency of the description.
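The following sketch illustrates the feature-type attention aggregation described above for the initial description features; the use of a softmax over per-type scores, the feature names and the dimensions are assumptions introduced for illustration only.
```python
import numpy as np

# Sketch of aggregating initial description features by feature-type attention
# weights. Feature names, dimensions and the softmax weighting are assumptions;
# the patent does not fix these details.
def aggregate_initial_features(features_by_type: dict, type_scores: dict) -> np.ndarray:
    types = list(features_by_type)
    scores = np.array([type_scores[t] for t in types], dtype=float)
    weights = np.exp(scores) / np.exp(scores).sum()      # attention weight per feature type
    stacked = np.stack([features_by_type[t] for t in types])
    return (weights[:, None] * stacked).sum(axis=0)      # aggregation result

first_initial = {
    "object_portrait": np.random.rand(64),               # e.g. attribute/behaviour portrait
    "association_relationship": np.random.rand(64),      # e.g. attribute/interaction association
}
first_description_feature = aggregate_initial_features(
    first_initial, type_scores={"object_portrait": 0.8, "association_relationship": 0.5}
)
print(first_description_feature.shape)  # (64,)
```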
The objects to be processed may be divided into individual objects and group objects. The object portrait features of an object to be processed may include at least one of attribute portrait features and behavior portrait features, and the association relationship features between the object to be processed and other objects may include at least one of attribute association features and interaction association features.
Taking an individual object as an example, its attribute portrait features may include the name of the individual object, the department to which it belongs, and the like, while its behavior portrait features may include the historical search behavior of the individual object, task data submitted by the individual object, and the like. Taking a group object as an example, its attribute portrait features may include a group object tag, a group object name, and the like.
Interaction association features may exist between individual objects and may include communication behavior information, co-occurrence behavior information, and the like. Attribute association features may exist between an individual object and a group object, indicating, for example, that the individual object is a member of the group object.
An aggregate description feature for the target object is then determined from the first description feature of the target object and the second description feature of the at least one candidate object, and the object description result of the target object is determined based on the aggregate description feature. Illustratively, the aggregate description feature may be subjected to convolution processing to obtain a target vector representation of the target object as the object description result.
According to the embodiment of the disclosure, the first description feature of the target object and the second description feature of at least one candidate object are determined, an aggregate description feature for the target object is determined from the first description feature and each second description feature, and the object description result of the target object is obtained based on the aggregate description feature. This effectively improves the precision and efficiency of object description, reduces its cost, and provides credible data support for downstream applications such as object search, relation chain determination, association analysis, cluster classification and object visualization.
Fig. 3 schematically shows a flow chart of an object description method according to another embodiment of the present disclosure.
As shown in fig. 3, the object description method 300 of the embodiment of the present disclosure may include, for example, operation S210, operation S310, and operation S230.
In operation S210, a first descriptive feature of the target object and a second descriptive feature of each of the at least one candidate object are determined.
In operation S310, the first description feature and the second description feature of each candidate object are respectively subjected to linear transformation processing to obtain at least one intermediate description feature, and the at least one intermediate description feature is aggregated to obtain an aggregated description feature for the target object.
In operation S230, an object description result of the target object is obtained based on the aggregation description feature, and the target object and each candidate object have an association relationship therebetween.
An example flow of each operation of the object description method of the present embodiment is illustrated below.
Illustratively, the first description feature and the second description feature of each candidate object are respectively subjected to linear transformation processing to obtain at least one intermediate description feature. The linear transformation maps the corresponding description features into a uniform vector space that matches the object type; object types may include, for example, individual object types and group object types. The linear transformation may be based on a linear rectification function, implemented for example as a ramp function or a leaky rectified linear unit, which this embodiment does not limit.
By way of example, the first description feature may be subjected to a first linear transformation to obtain a first intermediate description feature corresponding to the target object, and the second description feature may be subjected to a second and a third linear transformation to obtain, respectively, a second intermediate description feature and a third intermediate description feature corresponding to the candidate object.
A similarity between the first intermediate description feature and each second intermediate description feature may then be calculated, yielding an attention evaluation value for each candidate object. The attention evaluation value of each candidate object is used to weight the corresponding third intermediate description feature, and the weighted summation result gives the aggregate description feature for the target object.
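A minimal numpy sketch of operation S310 as described above, assuming the similarity is a scaled dot product and the attention evaluation values are normalized before the weighted summation; the matrix shapes and the normalization choice are assumptions for illustration.
```python
import numpy as np

# Sketch of operation S310: project the target's first description feature (query)
# and each candidate's second description feature (key/value), score candidates by
# similarity, and take the attention-weighted sum. Shapes are assumptions.
rng = np.random.default_rng(0)
d_in, d_model = 64, 32
W_q, W_k, W_v = (rng.standard_normal((d_in, d_model)) for _ in range(3))

def aggregate(first_feature, second_features):
    q = first_feature @ W_q                  # first intermediate description feature
    K = second_features @ W_k                # second intermediate description features
    V = second_features @ W_v                # third intermediate description features
    scores = K @ q / np.sqrt(d_model)        # attention evaluation values
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # normalized weights
    return weights @ V                       # weighted-summation result

target = rng.standard_normal(d_in)
candidates = rng.standard_normal((5, d_in))  # 5 candidate objects
print(aggregate(target, candidates).shape)   # (32,)
```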
By applying linear transformations to the first description feature and the second description features of the candidate objects, multi-dimensional description features can be mapped into a unified vector space, which effectively speeds up the aggregation of description features, helps reduce the hardware requirements of heterogeneous graph learning, and improves the efficiency of the target object description.
Illustratively, the weighted summation result may be used directly as the aggregate description feature for the target object. Alternatively, the weighted summation result may be used as an initial aggregate description feature: a weighted product is computed for the initial aggregate description feature based on a preset residual connection matrix, and the first intermediate description feature is summed with the weighted product result to obtain the aggregate description feature for the target object. Convolution processing is then applied to the aggregate description feature to obtain a target vector representation of the target object as the object description result.
The initial aggregate description feature may be characterized by a description feature vector, and the preset residual connection matrix may be a weight matrix of the residual connection. Introducing a residual connection makes it possible to highlight certain types of initial aggregate description features as a complement. The product of the residual connection matrix and the initial aggregate description feature is fed into a linear rectification function to obtain the weighted product result, which may be characterized by a weighted description feature vector.
For example, the weighted product result can be calculated as y_self = ReLU(W_self · h_k), where W_self denotes the weight matrix of the residual connection, h_k denotes the initial aggregate description feature of the target object, and ReLU denotes the linear rectification function. The weighted product result y_self may be characterized, for example, by a weighted description feature vector and contributes to the aggregate description feature for the target object.
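Continuing the sketch above, the residual connection step might look as follows, assuming the first intermediate description feature and the initial aggregate description feature share the same dimension; the shapes and the elementwise ReLU are assumptions.
```python
import numpy as np

# Residual connection over the initial aggregate description feature h_k:
# y_self = ReLU(W_self · h_k), then summed with the first intermediate feature q.
# The shape of W_self is an assumption for illustration.
def residual_aggregate(q, h_k, W_self):
    y_self = np.maximum(0.0, W_self @ h_k)   # weighted product passed through ReLU
    return q + y_self                        # aggregate description feature for the target

rng = np.random.default_rng(1)
q = rng.standard_normal(32)                  # first intermediate description feature
h_k = rng.standard_normal(32)                # initial aggregate description feature
W_self = rng.standard_normal((32, 32))       # preset residual connection matrix
print(residual_aggregate(q, h_k, W_self).shape)  # (32,)
```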
In one example, the matching degree between the target object and each other object can be determined from the object description result of the target object and the object description result of at least one other object; other objects whose matching degree meets a preset matching condition are taken as the object matching result for the target object. Determining the object matching result from the object description results in this way effectively improves both the efficiency and the precision of object matching.
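A small sketch of the matching step, assuming cosine similarity as the matching degree and a fixed threshold as the preset matching condition; both choices are assumptions made for illustration.
```python
import numpy as np

# Compare the target's object description result with those of other objects and
# keep the ones whose matching degree passes a preset threshold. Cosine similarity
# and the 0.8 threshold are illustrative assumptions.
def match(target_repr, other_reprs, threshold=0.8):
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = {name: cos(target_repr, vec) for name, vec in other_reprs.items()}
    return [name for name, s in scores.items() if s >= threshold]

rng = np.random.default_rng(2)
target_repr = rng.standard_normal(32)
others = {"object_1": rng.standard_normal(32), "object_2": target_repr + 0.01}
print(match(target_repr, others))  # likely ['object_2']
```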
According to the embodiment of the disclosure, the first description feature of the target object and the second description feature of each candidate object are subjected to linear transformation to obtain at least one intermediate description feature, and the at least one intermediate description feature is aggregated to obtain the aggregate description feature for the target object. By making full use of the description features of the target object and each candidate object, and of the association relationships between them, the accuracy of the target object description is effectively ensured while its efficiency and generalization capability are improved.
FIG. 4 schematically shows a flow chart of a method of training a network model according to an embodiment of the present disclosure.
As shown in FIG. 4, the training method 400 may include operations S410-S440, for example.
In operation S410, at least one positive sample node and at least one negative sample node corresponding to a target sample node are determined according to an object relationship network including the target sample node.
In operation S420, a sample description feature of a to-be-processed sample node is determined, the to-be-processed sample node including a target sample node, at least one positive sample node, and at least one negative sample node.
In operation S430, the sample description feature of the sample node to be processed and the sample description feature of at least one candidate sample node having an association relationship with the sample node to be processed are used as input data of the target network model to be trained, so as to obtain an aggregate sample description feature of the sample node to be processed.
In operation S440, model parameters of the target network model to be trained are adjusted according to the aggregated sample description features corresponding to the target sample node, the at least one positive sample node, and the at least one negative sample node, respectively, to obtain a trained target network model.
An example flow of each operation of the model training method of the present embodiment is illustrated below.
Illustratively, the target network model to be trained may be, for example, a graph neural network model. The graph neural network model may map graph data (which may be understood as a high-dimensional dense matrix) into low-dimensional dense vectors. By learning the structural features of graph data and the adjacency relation between nodes, the nodes are mapped into equal-dimension vectors, and data support can be provided for applications such as downstream cluster classification, association analysis and visualization tasks.
At least one positive sample node and at least one negative sample node corresponding to the target sample node are determined from the object relationship network that includes the target sample node. The object relationship network may be, for example, embedded graph data containing the target sample node. The target sample node is any node among the at least one sample node, and the target sample node and each positive sample node have an association relationship, which may include at least one of an attribute association relationship and an interaction association relationship.
In one example, random walks are performed with the target sample node as the starting node to determine the degree of association between each sample node in the object relationship network and the target sample node; sample nodes whose degree of association is higher than a preset threshold are taken as positive sample nodes corresponding to the target sample node. A positive sample node may be, for example, a sample node connected to the target sample node in the object relationship network, that is, a neighbor of the target sample node.
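A hedged sketch of the random-walk positive sampling described above; the walk count, walk length, visit-frequency statistic and threshold are all assumptions, since the patent only states that nodes whose degree of association exceeds a preset threshold become positive samples.
```python
import random
from collections import Counter

# Random walks starting from the target sample node estimate each node's degree of
# association; nodes visited more often than a preset threshold are treated as
# positive samples. Parameters below are illustrative assumptions.
def positive_samples(adjacency, start, num_walks=200, walk_len=5, threshold=0.05):
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            neighbors = adjacency.get(node)
            if not neighbors:
                break
            node = random.choice(neighbors)
            visits[node] += 1
    total = sum(visits.values()) or 1
    return [n for n, c in visits.items() if n != start and c / total >= threshold]

adjacency = {"t": ["p1", "p2"], "p1": ["t", "p2"], "p2": ["t", "p1", "q"], "q": ["p2"]}
print(positive_samples(adjacency, "t"))
```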
At least one initial sample description feature may be determined for the sample nodes to be processed, which include the target sample node, the at least one positive sample node, and the at least one negative sample node. The at least one initial sample description feature is aggregated according to the attention weight corresponding to the feature type of each initial sample description feature, yielding the sample description feature of the sample node to be processed.
The at least one initial sample description feature may include a node representation feature of the sample node to be processed and an association relationship feature between the sample node to be processed and other sample nodes, and the association relationship feature may include at least one of an attribute association feature and an interaction association feature.
The sample description feature of the sample node to be processed and the sample description features of at least one candidate sample node having an association relationship with it can then be used as input data of the target network model to be trained, to obtain the aggregate sample description feature of the sample node to be processed. In one example, the target network model to be trained applies linear transformations to the sample description feature of the sample node to be processed and the sample description feature of each candidate sample node to obtain at least one intermediate sample description feature, and aggregates the at least one intermediate sample description feature into the aggregate sample description feature.
For example, the sample description feature of the sample node to be processed may be subjected to a first linear transformation to obtain a first intermediate sample description feature corresponding to that node, and the sample description feature of each candidate sample node may be subjected to a second and a third linear transformation to obtain, respectively, a second intermediate sample description feature and a third intermediate sample description feature corresponding to each candidate sample node.
The similarity between the first intermediate sample description feature and each second intermediate sample description feature may be calculated to obtain the attention evaluation value corresponding to each candidate sample node. The attention evaluation value of each candidate sample node is used to weight the corresponding third intermediate sample description feature, and the weighted summation result gives the aggregate sample description feature for the target sample node.
A target loss function may be determined from the aggregate sample description features corresponding to the target sample node, the at least one positive sample node, and the at least one negative sample node, respectively, and the model parameters of the target network model to be trained are adjusted according to the target loss function to obtain the trained target network model.
In one example, a first similarity between the aggregate sample description feature of the target sample node and the aggregate sample description feature of each positive sample node is determined, and a first similarity weight parameter is obtained from the first similarity. A second similarity between the aggregate sample description feature of the target sample node and the aggregate sample description feature of each negative sample node is determined, and a second similarity weight parameter is obtained from the second similarity. The target loss function is determined from the first similarity weight parameter and the second similarity weight parameter.
The target loss function may be positively correlated with the first similarity weight parameter and negatively correlated with the second similarity weight parameter. Model parameters of the target network model to be trained are adjusted with the training objective of increasing the similarity between the aggregate sample description feature of the target sample node and those of the positive sample nodes, while decreasing the similarity between the aggregate sample description feature of the target sample node and those of the negative sample nodes, yielding the trained target network model.
For example, the target loss function L can be calculated by equation (1).
[Equation (1) appears as an embedded image in the original publication.]
Here z_i denotes the aggregate sample description feature of the target sample node, z_m that of a positive sample node, and z_n that of a negative sample node; p_i denotes the set of positive sample nodes and n_i the set of negative sample nodes; sim(z_i, z_m) denotes the similarity between the aggregate sample description feature of the target sample node and that of a positive sample node, and sim(z_i, z_n) denotes the similarity between the aggregate sample description feature of the target sample node and that of a negative sample node. The similarity may indicate the distance between the corresponding sample nodes.
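Since equation (1) is only available as an image in the extracted text, the following LaTeX shows one common contrastive form consistent with the symbol definitions above and with the stated goal of rewarding similarity to positive sample nodes while penalizing similarity to negative sample nodes; the temperature τ and the exact normalization are assumptions, and the patent's actual equation may differ.
```latex
L = -\frac{1}{|p_i|} \sum_{m \in p_i}
    \log \frac{\exp\!\big(\mathrm{sim}(z_i, z_m)/\tau\big)}
              {\exp\!\big(\mathrm{sim}(z_i, z_m)/\tau\big)
               + \sum\limits_{n \in n_i} \exp\!\big(\mathrm{sim}(z_i, z_n)/\tau\big)}
```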
In another example, the aggregate sample description feature of the target sample node may further be used as input data of the target network model to be trained to obtain a sample description result for the target sample node. For example, a convolution layer of the target network model may perform convolution processing on the aggregate sample description feature of the target sample node, yielding a target vector representation of the target sample node as the sample description result.
A first loss function may be determined from the first similarity weight parameter and the second similarity weight parameter, and a second loss function may be determined from the sample description result and a preset description result label associated with the target sample node. The first loss function and the second loss function are summed with weights to obtain the target loss function.
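A one-line sketch of the weighted summation of the two losses described above; the weights alpha and beta are hyperparameters assumed here for illustration, not values given in the patent.
```python
# Sketch of combining the two losses; alpha and beta are illustrative hyperparameters.
def target_loss(first_loss: float, second_loss: float, alpha: float = 1.0, beta: float = 0.5) -> float:
    # first_loss: contrastive term built from the similarity weight parameters
    # second_loss: supervised term comparing the sample description result with its label
    return alpha * first_loss + beta * second_loss

print(target_loss(0.73, 0.41))  # 0.935
```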
Determining the target loss function that guides model training from the aggregate sample description features corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node effectively improves the convergence rate of the target network model, ensures the accuracy of training, and preserves the generalization capability of the trained model.
In one example, an object description model is derived from the trained target network model. The trained target network model may be used directly as the object description model, or it may be adaptively adjusted and the adjusted model used as the object description model. Adaptive adjustment may include, for example, adjusting model parameter weights, the model structure, and the like, which this embodiment does not limit.
The first description feature of the target object and the second description feature of the at least one candidate object may be used as input data of the object description model to obtain the aggregate description feature for the target object; the object description model then performs convolution processing on the aggregate description feature and outputs the object description result for the target object. Determining the object description result with the trained target network model effectively improves object description efficiency and reduces its cost.
According to the embodiment of the disclosure, the model parameters of the target network model to be trained are adjusted according to the aggregate sample description features corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node, to obtain the trained target network model. This effectively improves the convergence rate of network model training, ensures the accuracy and generalization of the trained model, improves object description precision, reduces the hardware requirements of object description, and provides credible data support for downstream applications such as object search, relation chain determination, association analysis, cluster classification and object visualization.
Fig. 5 schematically shows a schematic diagram of a training process of a network model according to an embodiment of the present disclosure.
As shown in fig. 5, for the target sample node 51A, at least one positive sample node (e.g., including the positive sample nodes 51B, 51C) and at least one negative sample node (e.g., including the negative sample nodes 51D, 51E) corresponding to the target sample node 51A are determined according to the object relationship network including the target sample node 51A.
Sample description features are determined for the sample nodes to be processed, which include the target sample node, at least one positive sample node, and at least one negative sample node. For example, the sample description feature 52A of the target sample node 51A is determined, the sample description features 52B and 52C of the positive sample nodes 51B and 51C are determined, and the sample description features 52D and 52E of the negative sample nodes 51D and 51E are determined.
The target network model to be trained may apply linear transformations to the sample description feature of the sample node to be processed and the sample description feature of each candidate sample node to obtain at least one intermediate sample description feature, and aggregate the at least one intermediate sample description feature into the aggregate sample description feature of the sample node to be processed. A candidate sample node is a sample node having an association relationship with the sample node to be processed, where the association relationship may include at least one of an attribute association relationship and an interaction association relationship.
Taking the target sample node 51A as the sample node to be processed as an example, a first linear transformation is applied to the sample description feature 52A to obtain the first intermediate sample description feature Q. A second linear transformation is applied to the sample description features 52B, 52C, 52D and 52E of the candidate sample nodes to obtain the second intermediate sample description features K1, K2, K3 and K4, and a third linear transformation is applied to the same features to obtain the third intermediate sample description features V1, V2, V3 and V4.
According to the weight matrices W1, W2, W3 and W4 corresponding to the candidate sample nodes, the similarity between the first intermediate sample description feature Q and the second intermediate sample description features K1, K2, K3 and K4 is calculated, giving the attention evaluation values R1, R2, R3 and R4 for the candidate sample nodes.
The attention evaluation values R1, R2, R3 and R4 may be normalized with a Softmax function. According to the weight matrices W1', W2', W3' and W4' corresponding to the candidate sample nodes, the normalized attention evaluation values are used to weight the third intermediate sample description features V1, V2, V3 and V4, and a weighted summation yields the result A-Linear.
A residual unit (ReLU) may be used to apply residual processing to the weighted summation result A-Linear, and the residual processing result is summed, with weights, with the sample description feature 52A to obtain the aggregate sample description feature of the target sample node 51A.
The convolution layer of the target network model to be trained may then perform convolution processing on the aggregate sample description feature of the target sample node 51A, yielding the sample description result H_L of the target sample node 51A.
The target loss function may be determined from the aggregate sample description features of the target sample node, the at least one positive sample node and the at least one negative sample node, and from the sample description result of the target sample node. Model parameters of the target network model to be trained are adjusted based on the target loss function to obtain the trained target network model. This effectively improves the convergence rate of network model training and ensures the accuracy and generalization of the trained model.
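To tie the training procedure of Fig. 5 together, the following PyTorch sketch runs one training step with a simplified attention aggregator and a contrastive loss; the module, dimensions, cosine similarity, temperature and optimizer settings are all assumptions and not the patent's exact network.
```python
import torch
import torch.nn.functional as F

# End-to-end sketch of one training step for the aggregation described above.
# The module below is a simplified stand-in, not the patent's exact network.
class AttentionAggregator(torch.nn.Module):
    def __init__(self, d_in=64, d_model=32):
        super().__init__()
        self.q = torch.nn.Linear(d_in, d_model)   # first linear transformation
        self.k = torch.nn.Linear(d_in, d_model)   # second linear transformation
        self.v = torch.nn.Linear(d_in, d_model)   # third linear transformation

    def forward(self, node_feat, candidate_feats):
        scores = self.k(candidate_feats) @ self.q(node_feat)          # attention evaluation values
        weights = torch.softmax(scores, dim=0)
        return self.q(node_feat) + weights @ self.v(candidate_feats)  # weighted sum plus residual-style term

def contrastive_loss(z_i, z_pos, z_neg, tau=0.1):
    pos = torch.exp(F.cosine_similarity(z_i, z_pos, dim=-1) / tau).sum()
    neg = torch.exp(F.cosine_similarity(z_i, z_neg, dim=-1) / tau).sum()
    return -torch.log(pos / (pos + neg))

model = AttentionAggregator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

feats = {name: torch.randn(64) for name in ["target", "pos", "neg"]}
cands = {name: torch.randn(4, 64) for name in feats}                  # candidate sample nodes per node

z = {name: model(feats[name], cands[name]) for name in feats}
loss = contrastive_loss(z["target"].unsqueeze(0), z["pos"].unsqueeze(0), z["neg"].unsqueeze(0))
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```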
Fig. 6 schematically shows a block diagram of an object description apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the object description apparatus 600 of the embodiment of the present disclosure includes, for example, a first processing module 610, a second processing module 620, and a third processing module 630.
A first processing module 610, configured to determine a first descriptive feature of the target object and a second descriptive feature of each candidate object of the at least one candidate object; a second processing module 620, configured to determine an aggregate description feature for the target object according to the first description feature and the second description feature; and a third processing module 630, configured to obtain an object description result of the target object based on the aggregation description feature, where the target object and each candidate object have an association relationship.
According to the embodiment of the disclosure, the first description feature of the target object and the second description feature of at least one candidate object are determined, the aggregation description feature for the target object is determined according to the first description feature and each second description feature, the object description result of the target object is obtained based on the aggregation description feature, the description precision of the description of the target object can be effectively improved, the object description efficiency can be effectively improved, the object description cost can be effectively reduced, and the credible data support can be provided for downstream applications such as object search, relation chain determination, association analysis, cluster classification and object visualization.
According to an embodiment of the present disclosure, the apparatus further includes a fourth processing module configured to: determine at least one neighboring node corresponding to the target node according to an object relationship network comprising the target node; and take the object represented by each neighboring node as a candidate object, wherein the target node is used for representing the target object, and the object relationship network comprises a plurality of nodes representing objects and edges representing the association relationships among the nodes.
According to an embodiment of the present disclosure, the second processing module includes: the first processing submodule is used for respectively carrying out linear transformation processing on the first description characteristic and the second description characteristic of each candidate object to obtain at least one intermediate description characteristic; and the second processing submodule is used for aggregating the at least one intermediate description feature to obtain an aggregated description feature.
According to an embodiment of the present disclosure, the first processing submodule includes: the first processing unit is used for performing first linear transformation processing on the first description characteristic to obtain a first intermediate description characteristic, and performing second linear transformation processing and third linear transformation processing on the second description characteristic respectively to obtain a second intermediate description characteristic and a third intermediate description characteristic corresponding to the candidate object; and the second processing submodule comprises: the second processing unit is used for calculating the similarity between the first intermediate description characteristic and each second intermediate description characteristic to obtain an attention evaluation value corresponding to each candidate object; and the third processing unit is used for carrying out weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description feature, and obtaining an aggregation description feature based on the weighted summation result.
According to an embodiment of the present disclosure, the third processing unit includes: a first processing subunit for carrying out weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description feature to obtain an initial aggregate description feature; a second processing subunit for computing a weighted product of the initial aggregate description feature based on the residual connection matrix to obtain a weighted product result; and a third processing subunit for obtaining the aggregate description feature according to the first intermediate description feature and the weighted product result.
According to an embodiment of the present disclosure, the first processing module includes: a third processing submodule for determining at least one first initial description feature of the target object; aggregating the at least one first initial description feature according to the attention weight corresponding to the feature type of each first initial description feature to obtain a first aggregation result; and obtaining the first description feature based on the first aggregation result, wherein the at least one first initial description feature includes an object portrait feature of the target object and an association relationship feature between the target object and other objects; and a fourth processing submodule for determining at least one second initial description feature of the at least one candidate object; aggregating the at least one second initial description feature according to the attention weight corresponding to the feature type of each second initial description feature to obtain a second aggregation result; and obtaining the second description feature based on the second aggregation result, wherein the at least one second initial description feature includes an object portrait feature of the at least one candidate object and an association relationship feature between the at least one candidate object and other objects.
According to an embodiment of the present disclosure, the third processing module includes: and the fifth processing submodule is used for performing convolution processing on the aggregation description characteristics to obtain target vector representation aiming at the target object, and the target vector representation is used as an object description result.
According to an embodiment of the present disclosure, the apparatus further includes a fifth processing module configured to: determining the matching degree between the target object and each other object according to the object description result of the target object and the object description result of at least one other object; and taking other objects with the corresponding matching degrees meeting the preset matching conditions as object matching results aiming at the target object.
Fig. 7 schematically shows a block diagram of a training apparatus of a network model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 for a network model according to the embodiment of the present disclosure includes, for example, a sixth processing module 710, a seventh processing module 720, an eighth processing module 730, and a ninth processing module 740.
A sixth processing module 710, configured to determine at least one positive sample node and at least one negative sample node corresponding to a target sample node according to an object relationship network including the target sample node; a seventh processing module 720, configured to determine sample description characteristics of sample nodes to be processed, where the sample nodes to be processed include a target sample node, at least one positive sample node, and at least one negative sample node; the eighth processing module 730 is configured to use the sample description feature of the sample node to be processed and the sample description feature of at least one candidate sample node having an association relationship with the sample node to be processed as input data of the target network model to be trained, to obtain an aggregate sample description feature of the sample node to be processed; and a ninth processing module 740, configured to adjust model parameters of the target network model to be trained according to the aggregated sample description features corresponding to the target sample node, the at least one positive sample node, and the at least one negative sample node, respectively, to obtain a trained target network model, where there is an association relationship between the target sample node and each positive sample node.
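One common way to realize the positive/negative sample selection of the sixth processing module 710, consistent with the requirement that each positive sample node be associated with the target sample node, is to take direct neighbors in the object relationship network as positive sample nodes and randomly drawn non-neighbors as negative sample nodes. The networkx sketch below is an illustrative assumption, not the disclosed procedure.

```python
import random
import networkx as nx


def sample_pos_neg(graph: nx.Graph, target, num_neg: int = 5):
    """Positive samples: nodes adjacent (associated) to the target sample node.
    Negative samples: randomly drawn nodes with no edge to the target (an
    assumption; the disclosure only requires positives to be associated)."""
    positives = list(graph.neighbors(target))
    non_neighbors = [n for n in graph.nodes if n != target and n not in positives]
    negatives = random.sample(non_neighbors, min(num_neg, len(non_neighbors)))
    return positives, negatives


# Hypothetical usage on a small object relationship network.
g = nx.Graph([("a", "b"), ("a", "c"), ("d", "e"), ("f", "g")])
pos_nodes, neg_nodes = sample_pos_neg(g, "a", num_neg=2)
```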
According to the embodiment of the present disclosure, the model parameters of the target network model to be trained are adjusted according to the aggregated sample description features respectively corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node, so as to obtain the trained target network model. This can effectively improve the convergence rate of network model training, ensure the accuracy and generalization of the trained target network model, improve object description precision, reduce the hardware requirements of object description, and provide credible data support for downstream applications such as object search, relationship chain determination, association analysis, cluster classification, object visualization, and object intimacy analysis.
According to an embodiment of the present disclosure, the eighth processing module includes: the sixth processing submodule is used for respectively carrying out linear transformation processing on the sample description characteristics of the sample nodes to be processed and the sample description characteristics of each candidate sample node by using the target network model to be trained to obtain at least one intermediate sample description characteristic; and the seventh processing submodule is used for aggregating the description characteristics of the at least one intermediate sample to obtain the aggregated sample description characteristics.
According to an embodiment of the present disclosure, the ninth processing module includes: the eighth processing submodule is used for determining first similarity between the aggregation sample description characteristics of the target sample node and the aggregation sample description characteristics of each positive sample node, and obtaining a first similarity weight parameter based on the first similarity; the ninth processing submodule is used for determining second similarity between the aggregation sample description characteristics of the target sample node and the aggregation sample description characteristics of each negative sample node, and obtaining a second similarity weight parameter based on the second similarity; the tenth processing submodule is used for determining a target loss function according to the first similarity weight parameter and the second similarity weight parameter; and an eleventh processing submodule, configured to perform network model training based on the target loss function, to obtain a trained target network model.
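The first/second similarity weight parameters and the resulting target loss are not given in closed form in this section; an InfoNCE-style contrastive loss, in which exp(similarity/temperature) plays the role of the similarity weight parameter, is one consistent reading. A minimal sketch under that assumption (function and parameter names are illustrative):

```python
import torch
import torch.nn.functional as F


def contrastive_loss(target_feat: torch.Tensor,
                     pos_feats: torch.Tensor,
                     neg_feats: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: the first similarity weights come from target-positive
    similarities, the second similarity weights from target-negative similarities."""
    pos_sim = F.cosine_similarity(target_feat.unsqueeze(0), pos_feats, dim=-1)  # first similarity
    neg_sim = F.cosine_similarity(target_feat.unsqueeze(0), neg_feats, dim=-1)  # second similarity
    pos_w = torch.exp(pos_sim / temperature)  # first similarity weight parameters
    neg_w = torch.exp(neg_sim / temperature)  # second similarity weight parameters
    # Pull positives close, push negatives away.
    return -torch.log(pos_w.sum() / (pos_w.sum() + neg_w.sum()))
```

Minimizing such a loss pulls the aggregated sample description feature of the target sample node toward those of its positive sample nodes and pushes it away from those of the negative sample nodes, which is consistent with the convergence and generalization benefits noted above.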
According to an embodiment of the disclosure, the eighth processing module is further configured to: taking the aggregation sample description characteristics of the target sample nodes as input data of a target network model to be trained to obtain a sample description result aiming at the target sample nodes; the tenth processing submodule includes: the fourth processing unit is used for determining a first loss function according to the first similarity weight parameter and the second similarity weight parameter; the fifth processing unit is used for determining a second loss function according to the sample description result and the description result label of the target sample node; and a sixth processing unit for obtaining a target loss function based on the first loss function and the second loss function.
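Combining the first (contrastive) loss with the second (supervised) loss against the description result label could, for example, be a weighted sum with a cross-entropy term. The sketch below assumes the sample description result is a logit vector and the description result label is a class index (both assumptions), and first_loss could be supplied by the contrastive_loss sketch above.

```python
import torch
import torch.nn.functional as F


def target_loss(first_loss: torch.Tensor,
                sample_logits: torch.Tensor,
                label: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    # Second loss (assumption): cross-entropy between the sample description
    # result (logits) and the description result label (class index).
    second_loss = F.cross_entropy(sample_logits.unsqueeze(0), label.unsqueeze(0))
    # Target loss (assumption): weighted sum of the first and second losses.
    return first_loss + alpha * second_loss
```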
It should be noted that, in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the information involved all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
Fig. 8 schematically shows a block diagram of an electronic device for object description according to an embodiment of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running deep learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the object description method. For example, in some embodiments, the object description method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the object description method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the object description method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable model training apparatus, such that the program codes, when executed by the processor or controller, cause the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with an object, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the object; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the object can provide input to the computer. Other kinds of devices may also be used to provide for interaction with an object; for example, feedback provided to the object can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the object may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., an object computer having a graphical object interface or a web browser through which objects can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (27)

1. An object description method comprising:
determining a first descriptive feature of a target object and a second descriptive feature of each of at least one candidate object;
determining an aggregate descriptive feature for the target object from the first descriptive feature and the second descriptive feature; and
obtaining an object description result of the target object based on the aggregation description feature,
wherein the target object and each candidate object have an association relationship.
2. The method of claim 1, further comprising:
determining at least one adjacent node corresponding to a target node according to an object relationship network comprising the target node; and
taking the object characterized by each adjacent node as the candidate object,
wherein the target node is used for characterizing the target object, and the object relationship network comprises a plurality of nodes for characterizing objects and edges for characterizing the association relationships among the plurality of nodes.
3. The method of claim 1, wherein the determining an aggregate descriptive feature for the target object from the first descriptive feature and the second descriptive feature comprises:
respectively carrying out linear transformation processing on the first description characteristics and the second description characteristics of each candidate object to obtain at least one intermediate description characteristic; and
aggregating the at least one intermediate description feature to obtain the aggregated description feature.
4. The method of claim 3, wherein,
the performing linear transformation processing on the first description feature and the second description feature of each candidate object respectively to obtain at least one intermediate description feature includes:
performing first linear transformation processing on the first description features to obtain first intermediate description features, and performing second linear transformation processing and third linear transformation processing on the second description features respectively to obtain second intermediate description features and third intermediate description features corresponding to the candidate objects; and
the aggregating the at least one intermediate description feature to obtain the aggregated description feature includes:
calculating the similarity between the first intermediate description feature and each second intermediate description feature to obtain an attention evaluation value corresponding to each candidate object;
and carrying out weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description characteristic, and obtaining the aggregation description characteristic based on the weighted summation result.
5. The method according to claim 4, wherein the weighted summation of the attention assessment value corresponding to each candidate object and the corresponding third intermediate description feature to obtain the aggregated description feature based on the weighted summation result comprises:
carrying out weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description feature to obtain an initial aggregation description feature;
based on a residual connection matrix, performing weighted product operation on the initial aggregation description characteristics to obtain a weighted product operation result; and
obtaining the aggregation description feature according to the first intermediate description feature and the weighted product result.
6. The method of claim 1, wherein the determining a first descriptive characteristic of the target object and a second descriptive characteristic of each of the at least one candidate object comprises:
determining at least one first initial descriptive feature of the target object; according to the attention weight corresponding to the feature type of each first initial description feature, aggregating the at least one first initial description feature to obtain a first aggregation result; and obtaining the first description feature based on the first aggregation result, wherein the at least one first initial description feature comprises an object portrait feature of the target object and an association relationship feature between the target object and other objects;
determining at least one second initial descriptive feature of the at least one candidate object; according to the attention weight corresponding to the feature type of each second initial description feature, aggregating the at least one second initial description feature to obtain a second aggregation result; and obtaining the second description feature based on the second aggregation result, wherein the at least one second initial description feature comprises an object portrait feature of the at least one candidate object and an association relationship feature between the at least one candidate object and other objects.
7. The method according to any one of claims 1 to 6, wherein the obtaining of the object description result of the target object based on the aggregated description feature comprises:
performing convolution processing on the aggregation description features to obtain a target vector representation for the target object, wherein the target vector representation is used as the object description result.
8. The method of claim 1, further comprising:
determining the matching degree between the target object and each other object according to the object description result of the target object and the object description result of at least one other object; and
taking other objects with the corresponding matching degrees meeting preset matching conditions as object matching results aiming at the target object.
9. A training method of a network model comprises the following steps:
determining at least one positive sample node and at least one negative sample node corresponding to a target sample node according to an object relationship network comprising the target sample node;
determining sample description features of sample nodes to be processed, wherein the sample nodes to be processed include the target sample node, the at least one positive sample node, and the at least one negative sample node;
taking the sample description characteristics of the sample node to be processed and the sample description characteristics of at least one candidate sample node having an association relationship with the sample node to be processed as input data of a target network model to be trained to obtain the aggregation sample description characteristics of the sample node to be processed; and
adjusting model parameters of the target network model to be trained according to the aggregated sample description characteristics corresponding to the target sample node, the at least one positive sample node and the at least one negative sample node respectively to obtain a trained target network model,
wherein the target sample node and each positive sample node have an association relationship.
10. The method according to claim 9, wherein the obtaining of the aggregated sample description feature of the sample node to be processed by using the sample description feature of the sample node to be processed and the sample description feature of at least one candidate sample node having an association relationship with the sample node to be processed as input data of a target network model to be trained comprises:
respectively carrying out linear transformation processing on the sample description characteristics of the sample nodes to be processed and the sample description characteristics of each candidate sample node by using a target network model to be trained to obtain at least one intermediate sample description characteristic; and
aggregating the at least one intermediate sample description feature to obtain an aggregated sample description feature.
11. The method of claim 9, wherein the adjusting model parameters of the target network model to be trained according to the aggregated sample description features corresponding to the target sample node, the at least one positive sample node, and the at least one negative sample node, respectively, to obtain a trained target network model comprises:
determining a first similarity between the aggregation sample description characteristics of the target sample node and the aggregation sample description characteristics of each positive sample node, and obtaining a first similarity weight parameter based on the first similarity;
determining a second similarity between the aggregation sample description characteristics of the target sample node and the aggregation sample description characteristics of each negative sample node, and obtaining a second similarity weight parameter based on the second similarity;
determining a target loss function according to the first similarity weight parameter and the second similarity weight parameter;
and adjusting the model parameters of the target network model to be trained based on the target loss function to obtain the trained target network model.
12. The method of claim 11, further comprising:
taking the aggregation sample description characteristics of the target sample nodes as input data of the target network model to be trained to obtain sample description results aiming at the target sample nodes;
determining a target loss function according to the first similarity weight parameter and the second similarity weight parameter includes:
determining a first loss function according to the first similarity weight parameter and the second similarity weight parameter;
determining a second loss function according to the sample description result and a preset sample description label associated with the target sample node; and
obtaining the target loss function based on the first loss function and the second loss function.
13. An object description apparatus comprising:
the first processing module is used for determining a first description characteristic of a target object and a second description characteristic of each candidate object in at least one candidate object;
a second processing module for determining an aggregate descriptive feature for the target object based on the first descriptive feature and the second descriptive feature; and
a third processing module, configured to obtain an object description result of the target object based on the aggregation description feature,
wherein the target object and each candidate object have an association relationship.
14. The apparatus of claim 13, further comprising a fourth processing module to:
determining at least one adjacent node corresponding to a target node according to an object relationship network comprising the target node; and
taking the object characterized by each adjacent node as the candidate object,
wherein the target node is used for characterizing the target object, and the object relationship network comprises a plurality of nodes for characterizing objects and edges for characterizing the association relationships among the plurality of nodes.
15. The apparatus of claim 13, wherein the second processing module comprises:
the first processing submodule is used for respectively carrying out linear transformation processing on the first description characteristic and the second description characteristic of each candidate object to obtain at least one intermediate description characteristic; and
the second processing submodule is used for aggregating the at least one intermediate description feature to obtain the aggregated description feature.
16. The apparatus of claim 15, wherein the first processing sub-module comprises:
the first processing unit is used for performing first linear transformation processing on the first description characteristic to obtain a first intermediate description characteristic, and performing second linear transformation processing and third linear transformation processing on the second description characteristic respectively to obtain a second intermediate description characteristic and a third intermediate description characteristic corresponding to the candidate object; and
the second processing sub-module comprises:
a second processing unit, configured to calculate a similarity between the first intermediate descriptive feature and each of the second intermediate descriptive features, and obtain an attention evaluation value corresponding to each of the candidate objects;
and the third processing unit is used for carrying out weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description feature, and obtaining the aggregation description feature based on the weighted summation result.
17. The apparatus of claim 16, wherein the third processing unit comprises:
the first processing subunit is configured to perform weighted summation on the attention evaluation value corresponding to each candidate object and the corresponding third intermediate description feature to obtain an initial aggregation description feature;
the second processing subunit is used for performing weighted product calculation on the initial aggregation description characteristics based on the residual connection matrix to obtain a weighted product calculation result; and
the third processing subunit is configured to obtain the aggregation description feature according to the first intermediate description feature and the weighted product result.
18. The apparatus of claim 13, wherein the first processing module comprises:
a third processing submodule for determining at least one first initial descriptive feature of the target object; according to the attention weight corresponding to the feature type of each first initial description feature, aggregating the at least one first initial description feature to obtain a first aggregation result; and obtaining the first description feature based on the first aggregation result, wherein the at least one first initial description feature comprises an object portrait feature of the target object and an association relationship feature between the target object and other objects;
a fourth processing sub-module for determining at least one second initial descriptive feature of the at least one candidate object; according to the attention weight corresponding to the feature type of each second initial description feature, aggregating the at least one second initial description feature to obtain a second aggregation result; and obtaining the second description feature based on the second aggregation result, wherein the at least one second initial description feature comprises an object portrait feature of the at least one candidate object and an association relationship feature between the at least one candidate object and other objects.
19. The apparatus of any of claims 13 to 18, wherein the third processing module comprises:
and the fifth processing submodule is used for performing convolution processing on the aggregation description characteristics to obtain a target vector representation aiming at the target object, and the target vector representation is used as the object description result.
20. The apparatus of claim 13, further comprising a fifth processing module to:
determining the matching degree between the target object and each other object according to the object description result of the target object and the object description result of at least one other object; and
taking other objects with the corresponding matching degrees meeting preset matching conditions as object matching results aiming at the target object.
21. An apparatus for training a network model, comprising:
the sixth processing module is used for determining at least one positive sample node and at least one negative sample node corresponding to a target sample node according to an object relationship network comprising the target sample node;
a seventh processing module, configured to determine a sample description feature of a to-be-processed sample node, where the to-be-processed sample node includes the target sample node, the at least one positive sample node, and the at least one negative sample node;
the eighth processing module is configured to use the sample description features of the sample nodes to be processed and the sample description features of at least one candidate sample node having an association relationship with the sample nodes to be processed as input data of a target network model to be trained, so as to obtain aggregation sample description features of the sample nodes to be processed; and
a ninth processing module, configured to adjust model parameters of the target network model to be trained according to the aggregated sample description features corresponding to the target sample node, the at least one positive sample node, and the at least one negative sample node, respectively, to obtain a trained target network model,
wherein the target sample node and each positive sample node have an association relationship.
22. The apparatus of claim 21, wherein the eighth processing module comprises:
the sixth processing submodule is used for respectively carrying out linear transformation processing on the sample description characteristics of the sample nodes to be processed and the sample description characteristics of each candidate sample node by using a target network model to be trained to obtain at least one intermediate sample description characteristic; and
the seventh processing submodule is used for aggregating the at least one intermediate sample description feature to obtain the aggregated sample description feature.
23. The apparatus of claim 21, wherein the ninth processing module comprises:
the eighth processing submodule is used for determining first similarity between the aggregation sample description characteristics of the target sample node and the aggregation sample description characteristics of each positive sample node, and obtaining a first similarity weight parameter based on the first similarity;
a ninth processing submodule, configured to determine a second similarity between the aggregation sample description feature of the target sample node and the aggregation sample description feature of each negative sample node, and obtain a second similarity weight parameter based on the second similarity;
a tenth processing submodule, configured to determine a target loss function according to the first similarity weight parameter and the second similarity weight parameter; and
the eleventh processing submodule is used for adjusting the model parameters of the target network model to be trained based on the target loss function to obtain the trained target network model.
24. The apparatus of claim 23, wherein the eighth processing module is further configured to:
taking the aggregation sample description characteristics of the target sample nodes as input data of the target network model to be trained to obtain sample description results aiming at the target sample nodes;
the tenth processing sub-module includes:
a fourth processing unit, configured to determine a first loss function according to the first similarity weight parameter and the second similarity weight parameter;
the fifth processing unit is used for determining a second loss function according to the sample description result and the description result label of the target sample node; and
a sixth processing unit, configured to obtain the target loss function based on the first loss function and the second loss function.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object description method of any one of claims 1 to 8 or to perform the training method of the network model of any one of claims 9 to 12.
26. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the object description method of any one of claims 1 to 8 or the training method of the network model of any one of claims 9 to 12.
27. A computer program product comprising a computer program which, when executed by a processor, implements an object description method as claimed in any one of claims 1 to 8, or implements a training method for a network model as claimed in any one of claims 9 to 12.
CN202210807314.7A 2022-07-07 2022-07-07 Object description method, and network model training method and device Active CN115018009B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210807314.7A CN115018009B (en) 2022-07-07 2022-07-07 Object description method, and network model training method and device

Publications (2)

Publication Number Publication Date
CN115018009A true CN115018009A (en) 2022-09-06
CN115018009B CN115018009B (en) 2023-04-07

Family

ID=83079950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210807314.7A Active CN115018009B (en) 2022-07-07 2022-07-07 Object description method, and network model training method and device

Country Status (1)

Country Link
CN (1) CN115018009B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021159776A1 (en) * 2020-02-13 2021-08-19 腾讯科技(深圳)有限公司 Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
CN112069398A (en) * 2020-08-24 2020-12-11 腾讯科技(深圳)有限公司 Information pushing method and device based on graph network
CN113205377A (en) * 2021-04-06 2021-08-03 北京三快在线科技有限公司 Information recommendation method and device
CN113837260A (en) * 2021-09-17 2021-12-24 北京百度网讯科技有限公司 Model training method, object matching method, device and electronic equipment
CN114611022A (en) * 2022-03-03 2022-06-10 北京三快在线科技有限公司 Method, device, equipment and storage medium for pushing display information

Also Published As

Publication number Publication date
CN115018009B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN114357105B (en) Pre-training method and model fine-tuning method of geographic pre-training model
US11334758B2 (en) Method and apparatus of data processing using multiple types of non-linear combination processing
WO2022152161A1 (en) Training and prediction of hybrid graph neural network model
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
US20220130495A1 (en) Method and Device for Determining Correlation Between Drug and Target, and Electronic Device
CN112231592A (en) Network community discovery method, device, equipment and storage medium based on graph
CN115082920A (en) Deep learning model training method, image processing method and device
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN112989170A (en) Keyword matching method applied to information search, information search method and device
CN114860411B (en) Multi-task learning method, device, electronic equipment and storage medium
CN115018009B (en) Object description method, and network model training method and device
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN114756774A (en) Travel scheme recommendation method, travel scheme recommendation device, model training equipment and storage medium
CN114357180A (en) Knowledge graph updating method and electronic equipment
CN113961797A (en) Resource recommendation method and device, electronic equipment and readable storage medium
US20220383064A1 (en) Information processing method and device
CN114066278B (en) Method, apparatus, medium, and program product for evaluating article recall
CN117522614B (en) Data processing method and device, electronic equipment and storage medium
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN116188875B (en) Image classification method, device, electronic equipment, medium and product
CN113362428B (en) Method, apparatus, device, medium, and product for configuring color
CN114757304B (en) Data identification method, device, equipment and storage medium
CN114547448B (en) Data processing method, model training method, device, equipment, storage medium and program
CN113761379B (en) Commodity recommendation method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant