CN112036418A - Method and device for extracting user features - Google Patents

Method and device for extracting user features

Info

Publication number
CN112036418A
Authority
CN
China
Prior art keywords
network
sub
user
networks
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010919065.1A
Other languages
Chinese (zh)
Inventor
李杰桠
李德华
郑邦祺
彭南博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN202010919065.1A priority Critical patent/CN112036418A/en
Publication of CN112036418A publication Critical patent/CN112036418A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the disclosure discloses a method and a device for extracting user features. One embodiment of the method comprises: acquiring a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users; extracting at least two sub-networks from the user relationship network; and processing the at least two sub-networks by utilizing a pre-trained graph convolution network to obtain user characteristics corresponding to the users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting the characteristics of the users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network. This embodiment helps to reduce the computational complexity of the graph convolution network.

Description

Method and device for extracting user features
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for extracting user features.
Background
User relationship networks are a common way to represent the connections within a group of users. A user relationship network generally takes users as its subjects: each node in the user relationship network can correspond to one user, and an edge between two nodes can represent the connection between the users corresponding to the two connected nodes. How to analyze and study the user relationship network, including how to mine the data it contains in order to understand its users, is a current research hotspot.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for extracting user features.
In a first aspect, an embodiment of the present disclosure provides a method for extracting user features, where the method includes: acquiring a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users; extracting at least two sub-networks from the user relationship network; and processing the at least two sub-networks by utilizing a pre-trained graph convolution network to obtain user characteristics corresponding to the users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting the characteristics of the users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network.
In some embodiments, a subnetwork of the at least two subnetworks belongs to a homogeneous network.
In some embodiments, extracting at least two sub-networks from the user relationship network comprises: at least two sub-networks are extracted from the user relationship network according to the association types between the users in the user relationship network, wherein the association types between the users in the sub-networks are the same for the sub-networks in the at least two sub-networks.
In some embodiments, the at least two sub-networks comprise all edges of the user relationship network.
In some embodiments, the convolutional layer is further configured to aggregate the feature extraction results of the at least two sub-networks according to the weights respectively corresponding to the sub-networks.
In some embodiments, the graph convolution network further includes a residual network.
In some embodiments, the above method further comprises: and determining the attributes of the users in the user relationship network according to the user characteristics corresponding to the users in the user relationship network.
In a second aspect, an embodiment of the present disclosure provides an apparatus for extracting user features, the apparatus including: an acquisition unit configured to acquire a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users; an extraction unit configured to extract at least two sub-networks from the user relationship network; and a processing unit configured to process the at least two sub-networks by using a pre-trained graph convolution network to obtain user characteristics corresponding to the users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting the characteristics of the users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results respectively corresponding to each sub-network.
In some embodiments, a subnetwork of the at least two subnetworks belongs to a homogeneous network.
In some embodiments, the above extraction unit is further configured to: at least two sub-networks are extracted from the user relationship network according to the association types between the users in the user relationship network, wherein the association types between the users in the sub-networks are the same for the sub-networks in the at least two sub-networks.
In some embodiments, the at least two sub-networks comprise all edges of the user relationship network.
In some embodiments, the convolutional layer is further configured to aggregate the feature extraction results of the at least two sub-networks according to the weights respectively corresponding to the sub-networks.
In some embodiments, the graph convolution network further includes a residual network.
In some embodiments, the above apparatus further comprises: the determining unit is configured to determine the attribute of the user in the user relationship network according to the user characteristic corresponding to the user in the user relationship network.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the method and the device for extracting user features provided by the embodiments of the disclosure, at least two sub-networks are extracted from the user relationship network, the pre-trained graph convolution network is then used to respectively extract the features of the users in each sub-network, and the feature extraction results respectively corresponding to each sub-network are aggregated to obtain the user features corresponding to the users in the user relationship network, which helps to reduce the calculation amount and the computational complexity of the graph convolution network.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for extracting user features according to the present disclosure;
FIG. 3 is a flow diagram of yet another embodiment of a method for extracting user features according to the present disclosure;
FIG. 4 is a schematic diagram of one application scenario of a method for extracting user features according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an embodiment of an apparatus for extracting user features according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary architecture 100 to which embodiments of the disclosed method for extracting user features or apparatus for extracting user features may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various client applications may be installed on the terminal devices 101, 102, 103. For example, browser-like applications, search-like applications, shopping-like applications, social platform software, instant messaging tools, information flow-like applications, financial-like applications, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services. The server may process a user relationship network constructed in advance based on users corresponding to the terminal devices 101, 102, and 103, respectively, to extract user characteristics corresponding to the users corresponding to the terminal devices 101, 102, and 103, respectively. Further, the server 105 may also determine an attribute of the user based on the extracted user feature, and return information to the terminal device corresponding to the user according to the attribute of the user.
It should be noted that the user relationship network may be stored in the terminal devices 101, 102, and 103, and at this time, the server 105 may acquire the user relationship network from the terminal devices 101, 102, and 103. In addition, the user relationship network may be directly stored locally in the server 105, and the server 105 may directly extract the user relationship network stored locally and process the user relationship network, in which case, the terminal apparatuses 101, 102, and 103 and the network 104 may not be present.
It should be noted that the method for extracting the user features provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for extracting the user features is generally disposed in the server 105.
It is further noted that the terminal devices 101, 102, 103 may also have processing capabilities for the user relationship network. At this time, the method for extracting the user feature may be executed by the terminal apparatuses 101, 102, 103, and accordingly, the means for extracting the user feature may be provided in the terminal apparatuses 101, 102, 103. At this point, the exemplary system architecture 100 may not have the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for extracting user features in accordance with the present disclosure is shown. The method for extracting the user features comprises the following steps:
step 201, acquiring a user relationship network.
In this embodiment, the user relationship network may be composed of nodes and edges between the nodes. Nodes in the user relationship network may be used to characterize users. The edges in the user relationship network may be used to represent the association relationships between users respectively corresponding to the connected nodes.
The user relationship network can be flexibly constructed in advance according to different application scenarios and actual application requirements. For example, a user relationship network may be pre-constructed for the user group of one application or of several applications.
The association relationship between users can also be flexibly set according to different application scenarios and actual application requirements. For example, the association between users may include, but is not limited to: friend relationships, address book relationships, transaction relationships, possession of the same item, and so on.
As an example, the user relationship network is constructed in advance according to a certain e-commerce platform. At this time, the users in the user relationship network may include all users using the e-commerce platform. The association between users may include, but is not limited to: the same or similar items have been purchased, the same terminal device has been used, the same shipping address has been used, items have been purchased for the same user, and so on.
The users in the user relationship network can be represented by user data collected in advance. The user data may be various data about the user and may be set according to different application scenarios. For example, the user data may be the user's historical behavior data, attribute data, and the like.
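As an illustration only, such a user relationship network could be held in memory as a node feature matrix plus a list of typed edges. The sketch below is a hypothetical representation, not one prescribed by the disclosure; the field values and association type names are invented for the example.

```python
import numpy as np

# Hypothetical user data: one row per user, e.g. purchase count, total spend, credit value.
user_features = np.array([
    [3.0, 120.0, 0.8],
    [1.0,  15.0, 0.6],
    [7.0, 430.0, 0.9],
])

# Edges of the user relationship network, annotated with the association type they represent.
typed_edges = [
    (0, 1, "same_shipping_address"),
    (1, 2, "used_same_device"),
    (0, 2, "purchased_same_item"),
]
```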
In this embodiment, the executing agent (e.g., server 105 shown in fig. 1) of the method for extracting user features may retrieve a pre-constructed user relationship network from a local or other storage device.
It should be noted that the user relationship network may be pre-constructed by the execution subject, or may be pre-constructed by another terminal device.
At step 202, at least two sub-networks are extracted from the user relationship network.
In this embodiment, the number of edges in each sub-network of the user relationship network is less than the number of edges in the user relationship network. Various methods can be flexibly adopted to extract at least two sub-networks from the user relationship network according to the actual application requirements. For example, the edges in the user relationship network may be arbitrarily divided into at least two groups, and then, for each group of edges, the edges in the group and the nodes connected by those edges form the sub-network corresponding to that group, thereby obtaining at least two sub-networks.
For another example, according to the attributes of the users in the user relationship network, the users having the same attribute and the edges between the users can be extracted as a sub-network, so as to obtain at least two sub-networks. Wherein, the attribute of the user can be set by a technician according to the actual application requirement. By way of example, the attributes of the user include, but are not limited to, age, gender, account rating, and the like.
In some optional implementations of this embodiment, each of the extracted at least two subnetworks may be a homogeneous network. A homogeneous network may refer to a network in which nodes or edges have the same attributes of a certain aspect.
By designing each sub-network to belong to a homogeneous network, the need for processing heterogeneous networks can be avoided. Generally, as the scale of the user relationship network increases, the user relationship network is usually heterogeneous, i.e., the users have very complex association relationships. When dealing with a heterogeneous network, the heterogeneous network needs to be converted into homogeneous networks. Therefore, in the prior art, when the user relationship network is processed, the entire user relationship network needs to be converted into a homogeneous network, and the complexity and the calculation amount of this conversion process increase with the scale of the user relationship network. Based on this, extracting homogeneous sub-networks from the user relationship network directly eliminates the conversion process, thereby reducing the computational complexity and improving the computational efficiency.
In some optional implementations of this embodiment, the at least two sub-networks may be extracted from the user relationship network according to a type of association between users in the user relationship network. And, for a sub-network of the at least two sub-networks, the type of association between users in that sub-network is the same.
Wherein, the association type between users may refer to the type of association relationship between users. The type of association between users in each sub-network is the same, i.e. each sub-network is a homogeneous network. In particular, edges of the same association type and nodes associated with the edges may be extracted as one sub-network, resulting in at least two sub-networks.
It should be noted that, if there are N association types between users, the number of extracted sub-networks may be no less than N. If the number of extracted sub-networks is equal to N, each extracted sub-network may correspond to one association type. If the number of extracted sub-networks is greater than N, two or more of the extracted sub-networks may correspond to the same association type.
It should also be noted that each node may be connected to multiple edges. Therefore, any two of the extracted sub-networks may include the same node, or may have no nodes in common.
As an example, the association relationship between users includes a friend relationship, the same or similar items purchased, and the items purchased for the users corresponding to the same mobile phone number. At this time, the user relationship network may be divided into three sub-networks, and the three sub-networks correspond to three association relationships, namely, a friend relationship, the purchase of the same or similar articles, and the purchase of the articles for the user corresponding to the same mobile phone number.
By extracting, according to the association types corresponding to the edges in the user relationship network, a sub-network for each association type, not only does each sub-network belong to a homogeneous network so that the computational complexity is reduced, but the influence that different association types among users have on the users can also be captured more accurately.
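A minimal sketch of this extraction step is given below, under the assumptions that associations are undirected and that one sub-network is built per association type; the function name split_by_type and the sample edges are illustrative, not taken from the disclosure.

```python
import numpy as np
from collections import defaultdict

def split_by_type(num_users, typed_edges):
    """Build one adjacency matrix per association type; typed_edges holds (i, j, type) triples."""
    sub_adjs = defaultdict(lambda: np.zeros((num_users, num_users)))
    for i, j, edge_type in typed_edges:
        sub_adjs[edge_type][i, j] = 1.0
        sub_adjs[edge_type][j, i] = 1.0  # associations are assumed undirected here
    return dict(sub_adjs)

# Example: three users, three associations of two types -> two homogeneous sub-networks.
sub_networks = split_by_type(3, [(0, 1, "friend"), (1, 2, "same_device"), (0, 2, "friend")])
```

Because every edge is assigned to the sub-network of its own type, the extracted sub-networks together cover all edges of the original network.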
Optionally, the extracted at least two sub-networks may comprise all edges in the user relationship network. Therefore, the situation that some incidence relations among users are lost due to the process of extracting the sub-networks, and therefore the accuracy of feature extraction of subsequent users is influenced can be avoided.
Step 203, processing at least two sub-networks by using a pre-trained graph convolution network to obtain user characteristics corresponding to users in the user relationship network, wherein the graph convolution network comprises convolution layers for respectively extracting characteristics of users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network.
In this embodiment, a Graph Convolutional Network (GCN) is a kind of neural network that has become popular in recent years, and compared with a conventional neural network (e.g., a Convolutional neural network CNN, etc.), the Graph Convolutional Network (GCN) can process various irregular data such as a social network, a communication network, a protein molecular structure, etc., and mine features and rules of the data.
The algorithm adopted by the graph convolution neural network is based on algebraic graph theory. In general, it can be seen as a spectral convolution operation using a Chebyshev first-order polynomial approximation of the graph Laplacian matrix. From the perspective of spectral graph convolution, the graph convolution neural network can be viewed as a special form of graph Laplacian smoothing. The convolution operation of the graph convolution neural network can be regarded as transforming the feature information of each node, sending it to the node's neighbor nodes, and then fusing the feature information of the neighbor nodes.
In this embodiment, the graph convolution network may be configured to process the extracted at least two sub-networks to extract the user characteristics of the user in the user relationship network. In particular, the graph convolution network may perform a convolution operation on the extracted at least two sub-networks by using convolution layers included therein to extract user features of the user in the user relationship network.
It should be noted that the graph convolutional neural network may include at least one convolutional layer. The number of convolutional layers can be set by the skilled person according to the actual application requirements.
The feature extraction of each sub-network in the at least two sub-networks can be realized by convolution operation of the convolutional layer, and the feature extraction results of each sub-network are aggregated. Specifically, for each sub-network, the convolution layer is used to perform convolution operation on the sub-network to extract the user features respectively corresponding to the users in the sub-network as the feature extraction result of the sub-network.
After obtaining the feature extraction results corresponding to the respective sub-networks, the obtained feature extraction results may be aggregated to aggregate the user features of the users belonging to different sub-networks at the same time, and the aggregation result may be used as the user feature of the user extracted by the convolutional layer.
Since different sub-networks may comprise the same node, the feature extraction results corresponding to different sub-networks may include user features extracted for the same user. Therefore, aggregating the feature extraction results of the respective sub-networks allows the user features of each user to be determined more accurately.
Wherein, the algorithm of aggregation can be flexibly set by technical personnel according to the actual application requirement. For example, for each user, an average value of the user features of the user included in the feature extraction results respectively corresponding to the sub-networks may be calculated as an aggregation result, that is, the user features of the user extracted by the current convolutional layer.
At this time, the expression of the graph convolution network may be as follows (1):
H^{(t)} = \sigma\left( \sum_{d=1}^{D} A^{(d)} H^{(t-1)} V_d \right)

wherein A^{(d)} is computed by the following formula (2):

A^{(d)} = \left( K^{(d)} \right)^{-\frac{1}{2}} S^{(d)} \left( K^{(d)} \right)^{-\frac{1}{2}}
wherein σ is the nonlinear activation function of the graph convolution network. T is a positive integer that may refer to the number of convolutional layers of the graph convolution network, and t denotes the t-th convolutional layer. H^(t) may refer to the output of the (t-1)-th convolutional layer as well as the input of the t-th convolutional layer; correspondingly, H^(t-1) may refer to the input of the (t-1)-th convolutional layer. D is a positive integer that may refer to the number of extracted sub-networks, and d denotes the d-th sub-network. V_d may refer to the parameter of the d-th sub-network, that is, one of the parameters to be learned during the training process of the graph convolution network.
A^(d) may refer to the feature matrix corresponding to the d-th sub-network. S^(d) may refer to the sum of the adjacency matrix of the d-th sub-network and the identity matrix. K^(d) may refer to the degree matrix of the d-th sub-network. The adjacency matrix may represent the connections between nodes in the user relationship network. The degree matrix is a diagonal matrix whose diagonal elements are the degrees of each node in the user relationship network.
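The following NumPy sketch shows one way such a convolution layer could be written, following formulas (1) and (2) as reconstructed above; the function names, the summation as the aggregation step, and the choice of tanh as the activation σ are assumptions for illustration only.

```python
import numpy as np

def normalize_adjacency(adj):
    """A^(d) = (K^(d))^(-1/2) S^(d) (K^(d))^(-1/2), where S^(d) = adj + I and K^(d) is its degree matrix."""
    s = adj + np.eye(adj.shape[0])                       # S^(d): adjacency matrix plus identity
    k_inv_sqrt = np.diag(1.0 / np.sqrt(s.sum(axis=1)))   # (K^(d))^(-1/2), a diagonal matrix
    return k_inv_sqrt @ s @ k_inv_sqrt

def gcn_layer(sub_adjs, h, params, activation=np.tanh):
    """Extract features per sub-network via A^(d) H V_d, then aggregate across sub-networks."""
    out = sum(normalize_adjacency(a) @ h @ v for a, v in zip(sub_adjs, params))
    return activation(out)
```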
The graph convolution network can be obtained by training a technician in advance by using a machine learning method.
In some optional implementations of this embodiment, the graph convolution network may further include a residual network. The expression of the residual error network may be specifically designed by a technician according to an actual application scenario.
Alternatively, when the graph convolution network includes a residual network, expression (1) of the above graph convolution network may be adjusted to the following expression (3):
H^{(t)} = \sigma\left( \sum_{d=1}^{D} A^{(d)} H^{(t-1)} V_d + X W \right)
as can be seen, the expression of the residual network may be XW. Where X may refer to an adjacency matrix of the user relationship network. W may be a parameter of the residual error network, i.e. one of the parameters to be learned by the training process of the graph convolution network.
By introducing the residual network, long-range association relationships can be connected and the number of layers of the graph convolution network can be increased, while situations such as degradation and vanishing gradients during training are avoided, thereby ensuring the accuracy and precision of the processing results of the graph convolution network.
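A sketch of the residual variant in formula (3) is given below, passing in the whole-network matrix X described above together with a residual parameter W; the function name and the use of pre-normalized sub-network matrices are illustrative assumptions.

```python
import numpy as np

def gcn_layer_residual(norm_adjs, h, params, x, w_res, activation=np.tanh):
    """norm_adjs are pre-normalized A^(d) matrices; x @ w_res is the residual term XW."""
    out = sum(a @ h @ v for a, v in zip(norm_adjs, params))
    return activation(out + x @ w_res)
```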
In some optional implementation manners of this embodiment, after the user characteristics corresponding to the user in the user relationship network are obtained, the attribute of the user may be determined according to the user characteristics corresponding to each user. The attributes of the user may be attributes of various aspects of the user. Specifically, the attributes of the user may be set according to actual application requirements.
After the user characteristics corresponding to each user in the user relationship network are obtained, the attributes of the users can be determined according to the user characteristics by using various existing methods. For example, the user may be classified according to user characteristics using a neural network for classification to determine a category to which the user belongs. For another example, the neural network for prediction may be used to predict the attributes of the user based on the user characteristics.
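As a purely illustrative example of this step (the linear form, the sigmoid, and the two-class threshold are assumptions rather than requirements of the disclosure), the extracted user features could be fed to a simple classifier:

```python
import numpy as np

def predict_attribute(user_features, w, b):
    """user_features: (num_users, feature_dim); returns a binary attribute per user."""
    logits = user_features @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return (probs > 0.5).astype(int)       # e.g. whether the user belongs to a given category
```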
Optionally, the graph convolution network may further include an attribute determination model for determining an attribute of the user according to the user characteristic. At this time, the convolution layer of the graph convolution network can be directly used for extracting the user characteristics corresponding to the user in the user relationship network, and the attribute of the user is output by the output layer, so that the determination speed of the attribute of the user can be improved.
In the prior art, the user relationship network is generally processed directly by a graph convolution network. Because the user relationship network generally includes a large number of nodes and edges, the training process of the graph convolution network, or the processing of the user relationship network by the trained graph convolution network, consumes very large computer resources and takes a long time, while some information in the graph is lost. The method provided by the embodiment of the disclosure extracts a plurality of sub-networks from the user relationship network, so that the processing of the user relationship network is converted into the processing of each sub-network, thereby greatly reducing the calculation amount and the computational complexity; the training process and the practical application process of the graph convolution network then do not consume excessive computer resources such as memory, the processing efficiency is improved, and the graph convolution network can be applied to scenarios with higher real-time requirements. In addition, the method makes full use of the prior information in the user relationship network, which can improve the accuracy of the calculation results.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method for extracting user features is shown. The flow 300 of the method for extracting user features comprises the following steps:
step 301, obtaining a user relationship network.
Step 302, at least two sub-networks are extracted from the user relationship network.
Step 303, processing the at least two sub-networks by using a pre-trained graph convolution network to obtain user characteristics corresponding to users in the user relationship network, where the graph convolution network includes convolution layers for respectively performing characteristic extraction on users in each of the at least two sub-networks, and aggregating the characteristic extraction results of each sub-network according to weights respectively corresponding to each of the at least two sub-networks.
In this embodiment, after the convolution layer performs convolution operation on each sub-network to obtain the feature extraction result corresponding to each sub-network, the feature extraction results of each sub-network may be aggregated according to the weight corresponding to each sub-network, so as to assign different weights to different sub-networks.
At this time, the expression of the graph convolution network may be as follows (4):
H^{(t)} = \sigma\left( \sum_{d=1}^{D} \alpha_d A^{(d)} H^{(t-1)} V_d \right)
wherein α_d may represent the weight of the d-th sub-network, which is also one of the parameters to be learned by the graph convolution network during the training process. Specifically, the weights of the sub-networks may be normalized. For example, the normalized weight of each sub-network may be obtained by calculating the ratio of the weight of that sub-network to the sum of the weights of all sub-networks.
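A sketch of the weighted aggregation in formula (4) is shown below, with the sub-network weights normalized by their sum as described above; the function name, the activation, and the use of pre-normalized sub-network matrices are illustrative assumptions.

```python
import numpy as np

def gcn_layer_weighted(norm_adjs, h, params, alphas, activation=np.tanh):
    """alphas: one learnable weight per sub-network, normalized here so that they sum to 1."""
    a = np.asarray(alphas, dtype=float)
    a = a / a.sum()
    out = sum(w_d * adj @ h @ v for w_d, adj, v in zip(a, norm_adjs, params))
    return activation(out)
```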
Optionally, the graph convolution network may also include a residual network while giving different weights to different sub-networks. In this case, the expression of the graph convolution network may be as follows (5):
H^{(t)} = \sigma\left( \sum_{d=1}^{D} \alpha_d A^{(d)} H^{(t-1)} V_d + X W \right)
therefore, the residual error network is utilized to increase the layer number of the graph convolution network so as to increase the accuracy and precision of the processing result of the graph convolution network, and simultaneously avoid the situations of degradation, gradient disappearance and the like of the graph convolution network during training,
the content that is not specifically described in the above steps 401, 402, and 403 may refer to the related description in steps 201, 202, and 203 in the corresponding embodiment of fig. 2, and is not repeated herein.
With continued reference to fig. 4, fig. 4 is an exemplary application scenario 400 of the method for extracting user features according to the present embodiment. In the application scenario of fig. 4, the execution subject may acquire in advance, from the database 401, a user association relationship 4011 used for representing associations between users and user data 4012. For each user, the user data 4012 of the user includes consumption data (e.g., consumption times, consumption amount, consumption field, etc.), credit data (e.g., number of loans, historical credit value, etc.), and browsing data (e.g., type of browsed information, browsing duration, etc.), wherein the credit value can be used to characterize the credit degree of the user. The user association relationship 4011 may include a transfer relationship, use of the same electronic device, having the same shipping address, and the like.
Then, the user relationship network 402 can be constructed with the user association relationship 4011 and the user data 4012. Then, according to the association relationship represented by each edge in the user relationship network 402, the sub-network 4021, the sub-network 4022, and the sub-network 4023 are extracted from the user relationship network 402. Wherein the edges in sub-network 4021 all represent transfer relationships, the edges in sub-network 4022 all represent usage of the same electronic device, and the edges in sub-network 4023 all represent having the same shipping address.
Then, the adjacency matrix and degree matrix of the sub-network 4021, the sub-network 4022, and the sub-network 4023 may be extracted, the feature matrix corresponding to each sub-network may be determined according to the adjacency matrix and degree matrix corresponding to each sub-network, and the feature matrices corresponding to the sub-networks 4021, 4022, and 4023 may be input to the graph convolution network 403 trained in advance. The graph convolutional network firstly uses the convolutional layer to process the feature matrices corresponding to the sub-network 4021, the sub-network 4022 and the sub-network 4023, respectively, so as to extract the user features corresponding to each user in the user relationship network 402, and then uses the output layer to determine the user credit value 404 corresponding to each user according to the user features corresponding to each user.
In this application scenario, the credit value of the user is comprehensively evaluated by the graph convolution network from information of multiple dimensions, such as the user's consumption behavior, historical loan behavior, and browsing behavior. The graph convolution network can aggregate the complex association relationships among users in the user relationship network and comprehensively consider information from multiple aspects, so that the credit value of a user can be accurately evaluated even when the user's own data is weak.
In the prior art, when a graph convolution network is used to process a user relationship network, the strength of the association relationships between users in the user relationship network is generally treated as uniform, so that the detailed information carried by the edges in the user relationship network is not used. As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the method for extracting user features in this embodiment highlights the step of giving different weights to different sub-networks. Therefore, the scheme described in this embodiment can make full use of the association relationship information between users in the user relationship network, further improving the accuracy of the processing results of the graph convolution network.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for extracting user features, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for extracting user features provided by the present embodiment includes an acquisition unit 501, an extraction unit 502, and a processing unit 503. The obtaining unit 501 is configured to obtain a user relationship network, where nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships between users; the extracting unit 502 is configured to extract at least two sub-networks from the user relationship network; the processing unit 503 is configured to process at least two sub-networks by using a pre-trained graph convolution network to obtain user features corresponding to users in the user relationship network, where the graph convolution network includes convolution layers for performing feature extraction on users in each of the at least two sub-networks respectively and aggregating feature extraction results corresponding to each sub-network respectively.
In the present embodiment, in the apparatus 500 for extracting a user feature: the specific processing of the obtaining unit 501, the extracting unit 502 and the processing unit 503 and the technical effects thereof can refer to the related descriptions of step 201, step 202 and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some alternative implementations of this embodiment, a sub-network of the at least two sub-networks belongs to a homogeneous network.
In some optional implementations of the present embodiment, the extracting unit 502 is further configured to: at least two sub-networks are extracted from the user relationship network according to the association types between the users in the user relationship network, wherein the association types between the users in the sub-networks are the same for the sub-networks in the at least two sub-networks.
In some alternative implementations of this embodiment, the at least two sub-networks comprise all edges of the user relationship network.
In some optional implementations of the embodiment, the convolutional layer is further configured to aggregate the feature extraction results of the at least two sub-networks according to the weights respectively corresponding to the sub-networks.
In some optional implementations of this embodiment, the graph convolution network further includes a residual network.
In some optional implementations of this embodiment, the apparatus for extracting a user feature further includes: the determining unit (not shown in the figures) is configured to determine the attributes of the users in the user relationship network according to the user characteristics corresponding to the users in the user relationship network.
The apparatus provided in the foregoing embodiment of the present disclosure acquires a user relationship network through an acquisition unit, where a node in the user relationship network is used to represent users, and an edge in the user relationship network is used to represent an association relationship between users; the extracting unit extracts at least two sub-networks from the user relationship network; the processing unit processes the at least two sub-networks by using a pre-trained graph convolution network to obtain user characteristics corresponding to users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting characteristics of the users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network. Therefore, compared with the method for processing the whole user relationship network by using the graph convolution network, the method greatly reduces the calculation amount and the calculation complexity, so that the training process and the practical application process of the graph convolution network do not need to consume excessive computer resources such as memory and the like, and the processing efficiency is improved.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., server 105 of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the server; or may exist separately and not be assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquire a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users; extract at least two sub-networks from the user relationship network; and process the at least two sub-networks by utilizing a pre-trained graph convolution network to obtain user characteristics corresponding to the users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting the characteristics of the users in each sub-network of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an extraction unit, and a processing unit. Where the names of these elements do not in some cases constitute a limitation of the element itself, for example, the obtaining element may also be described as an "element for obtaining a user relationship network".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure also fall within this scope.

Claims (10)

1. A method for extracting user features, comprising:
acquiring a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users;
extracting at least two sub-networks from the user relationship network;
and processing the at least two sub-networks by using a pre-trained graph convolution network to obtain user characteristics corresponding to users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting the characteristics of the users in each of the at least two sub-networks and aggregating the characteristic extraction results corresponding to each sub-network.
2. The method of claim 1, wherein a sub-network of the at least two sub-networks belongs to a homogeneous network.
3. The method of claim 2, wherein said extracting at least two sub-networks from the user relationship network comprises:
and extracting at least two sub-networks from the user relationship network according to the association types among the users in the user relationship network, wherein the association types among the users in the sub-networks are the same for the sub-networks in the at least two sub-networks.
4. The method of claim 1, wherein the at least two sub-networks comprise all edges of the user relationship network.
5. The method of claim 1, wherein the convolutional layer is further configured to aggregate the feature extraction results of each of the at least two sub-networks according to the weight corresponding to each of the sub-networks.
6. The method of claim 1, wherein the graph convolution network further comprises a residual network.
7. The method according to one of claims 1-6, wherein the method further comprises:
and determining the attributes of the users in the user relationship network according to the user characteristics corresponding to the users in the user relationship network.
8. An apparatus for extracting a user feature, wherein the apparatus comprises:
an acquisition unit configured to acquire a user relationship network, wherein nodes in the user relationship network are used for representing users, and edges in the user relationship network are used for representing association relationships among the users;
an extracting unit configured to extract at least two sub-networks from the user relationship network;
and the processing unit is configured to process the at least two sub-networks by using a pre-trained graph convolution network to obtain user features corresponding to users in the user relationship network, wherein the graph convolution network comprises convolution layers, and the convolution layers are used for respectively extracting features of the users in each sub-network of the at least two sub-networks and aggregating feature extraction results corresponding to each sub-network.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
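For orientation alongside the claims above, the following is a minimal, hypothetical sketch of how claims 1 to 6 could be realized in PyTorch: a heterogeneous user relationship network is split into homogeneous sub-networks by association type, and a convolution layer extracts features on each sub-network separately before aggregating the per-sub-network results with learnable weights and a residual connection. The names used here (SubNetworkConv, split_into_subnetworks, the (src, dst, edge_type) triples) are illustrative assumptions and are not taken from the application.

```python
# Hypothetical sketch only; not the applicant's implementation.
import torch
import torch.nn as nn


class SubNetworkConv(nn.Module):
    """Convolution layer that extracts features per sub-network and aggregates
    the per-sub-network results with learnable weights and a residual path."""

    def __init__(self, in_dim: int, out_dim: int, num_subnetworks: int):
        super().__init__()
        # One linear transform per homogeneous sub-network.
        self.transforms = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(num_subnetworks)
        )
        # Learnable aggregation weight per sub-network (cf. claim 5).
        self.weights = nn.Parameter(torch.ones(num_subnetworks))
        # Residual projection so the layer input can be added back (cf. claim 6).
        self.residual = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adjs: list) -> torch.Tensor:
        # x: (num_users, in_dim); adjs[k]: normalized adjacency of sub-network k.
        alpha = torch.softmax(self.weights, dim=0)
        out = x.new_zeros(x.size(0), self.transforms[0].out_features)
        for k, adj in enumerate(adjs):
            # Feature extraction on sub-network k: neighborhood aggregation + transform.
            out = out + alpha[k] * (adj @ self.transforms[k](x))
        # Residual connection followed by a non-linearity.
        return torch.relu(out + self.residual(x))


def split_into_subnetworks(edges, num_users: int):
    """Split a heterogeneous user relationship network into homogeneous
    sub-networks, one per association type (cf. claims 2-4). `edges` is a list
    of (src_user, dst_user, edge_type) triples; all edges are preserved."""
    by_type = {}
    for src, dst, etype in edges:
        adj = by_type.setdefault(etype, torch.zeros(num_users, num_users))
        adj[src, dst] = adj[dst, src] = 1.0
    # Row-normalize each sub-network adjacency (self-loops added for stability).
    adjs = []
    for adj in by_type.values():
        adj = adj + torch.eye(num_users)
        adjs.append(adj / adj.sum(dim=1, keepdim=True))
    return adjs
```

In a full model of this kind, several such layers would typically be stacked, and the resulting user features passed to a downstream classifier to predict user attributes as in claim 7; this sketch omits training entirely.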
CN202010919065.1A 2020-09-04 2020-09-04 Method and device for extracting user features Pending CN112036418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010919065.1A CN112036418A (en) 2020-09-04 2020-09-04 Method and device for extracting user features

Publications (1)

Publication Number Publication Date
CN112036418A (en) 2020-12-04

Family

ID=73592064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010919065.1A Pending CN112036418A (en) 2020-09-04 2020-09-04 Method and device for extracting user features

Country Status (1)

Country Link
CN (1) CN112036418A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150236910A1 (en) * 2014-02-18 2015-08-20 Telefonaktiebolaget L M Ericsson (Publ) User categorization in communications networks
CN108322473A (en) * 2018-02-12 2018-07-24 北京京东金融科技控股有限公司 User behavior analysis method and apparatus
CN111046237A (en) * 2018-10-10 2020-04-21 北京京东金融科技控股有限公司 User behavior data processing method and device, electronic equipment and readable medium
CN110009093A (en) * 2018-12-07 2019-07-12 阿里巴巴集团控股有限公司 For analyzing the nerve network system and method for relational network figure
CN110148053A (en) * 2019-04-25 2019-08-20 北京淇瑀信息科技有限公司 User's credit line assessment method, apparatus, electronic equipment and readable medium
CN110473083A (en) * 2019-07-08 2019-11-19 阿里巴巴集团控股有限公司 Tree-shaped adventure account recognition methods, device, server and storage medium
CN111274491A (en) * 2020-01-15 2020-06-12 杭州电子科技大学 Social robot identification method based on graph attention network
CN111309983A (en) * 2020-03-10 2020-06-19 支付宝(杭州)信息技术有限公司 Method and device for processing service based on heterogeneous graph
CN111400560A (en) * 2020-03-10 2020-07-10 支付宝(杭州)信息技术有限公司 Method and system for predicting based on heterogeneous graph neural network model
CN111581450A (en) * 2020-06-24 2020-08-25 支付宝(杭州)信息技术有限公司 Method and device for determining service attribute of user

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林增跃; 朱涛; 王鹏; 宋国杰: "Classification method for railway transportation safety accidents based on network sub-graph representation", 综合运输, no. 07 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734693A (en) * 2020-12-18 2021-04-30 平安科技(深圳)有限公司 Pipeline weld defect detection method and related device
CN112734693B (en) * 2020-12-18 2024-06-07 平安科技(深圳)有限公司 Pipeline weld defect detection method and related device

Similar Documents

Publication Publication Date Title
CN108280115B (en) Method and device for identifying user relationship
CN112085615B (en) Training method and device for graphic neural network
CN110009486B (en) Method, system, equipment and computer readable storage medium for fraud detection
CN107451854B (en) Method and device for determining user type and electronic equipment
US11748452B2 (en) Method for data processing by performing different non-linear combination processing
CN112231592A (en) Network community discovery method, device, equipment and storage medium based on graph
CN111369258A (en) Entity object type prediction method, device and equipment
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN115496970A (en) Training method of image task model, image recognition method and related device
CN113807926A (en) Recommendation information generation method and device, electronic equipment and computer readable medium
CN114139052B (en) Ranking model training method for intelligent recommendation, intelligent recommendation method and device
CN115423031A (en) Model training method and related device
CN110720099A (en) System and method for providing recommendation based on seed supervised learning
CN113850669A (en) User grouping method and device, computer equipment and computer readable storage medium
CN112036418A (en) Method and device for extracting user features
CN110347973B (en) Method and device for generating information
CN116956204A (en) Network structure determining method, data predicting method and device of multi-task model
CN110930226A (en) Financial product recommendation method and device, electronic equipment and storage medium
CN113362097B (en) User determination method and device
CN114610996A (en) Information pushing method and device
CN114780847A (en) Object information processing and information pushing method, device and system
CN109597851B (en) Feature extraction method and device based on incidence relation
CN109712011B (en) Community discovery method and device
CN112035581A (en) Model-based task processing method, device, equipment and medium
US20230334096A1 (en) Graph data processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co., Ltd