CN112131446B - Graph node classification method and device, electronic equipment and storage medium - Google Patents

Graph node classification method and device, electronic equipment and storage medium

Info

Publication number
CN112131446B
CN112131446B (application CN202010838847.2A)
Authority
CN
China
Prior art keywords
node
nodes
current node
data
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010838847.2A
Other languages
Chinese (zh)
Other versions
CN112131446A (en)
Inventor
余晓填
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202010838847.2A priority Critical patent/CN112131446B/en
Publication of CN112131446A publication Critical patent/CN112131446A/en
Application granted granted Critical
Publication of CN112131446B publication Critical patent/CN112131446B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a graph node classification method that can classify graphs with tens of millions of nodes and above, comprising the following steps: acquiring relation data among all nodes; extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space; acquiring the relation data of the current node and j nodes related to the current node; obtaining an initialized sample label matrix corresponding to the current node according to the classified nodes; reading the first relation sparse representation data of the current node and the j nodes from the first storage space; performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes; and classifying the current node and the j nodes according to the final label distribution matrix. The node classification efficiency of large-scale graph structures is thereby improved.

Description

Graph node classification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a graph node classification method, apparatus, electronic device, and storage medium.
Background
The graph structure is a data structure that expresses the relationships between objects through the connections between nodes, and can be used for relationship analysis between objects, such as relationship analysis of personnel objects or relationship analysis between place objects. Relationship analysis between objects is mainly embodied in classification: in a graph structure, each node is classified. For classification problems, sample labels are generally learned through modeling, so as to achieve the purpose of classifying the nodes in the graph structure. The larger the scale of the graph structure, the higher the demands on the storage and computing resources for the node data, because the node data are high-dimensional and contain much redundant information (such as zero values). For M nodes, an M×M matrix is required to represent the connection relationships between the M nodes; if the attribute dimension of a node is d, the storage complexity is O(d²) and the computational complexity is O(d³). Thus, as the scale of the graph structure increases, more storage resources and computing resources are required. Therefore, when classifying the nodes of a large-scale graph structure (a graph structure with millions of nodes or more), the existing graph structure classification methods are inefficient due to their high storage and computing resource requirements.
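To make the storage argument above concrete, the following is a hypothetical sketch (not part of the patent; the node counts and average degree are assumptions for illustration) comparing the number of stored entries in a dense M×M relation matrix against triplet-based sparse storage:

```python
# Hypothetical illustration: entry counts for dense vs. sparse relation storage.
def dense_entries(m):
    """Number of entries in a dense M x M relation matrix."""
    return m * m

def sparse_entries(num_edges):
    """Triplet storage keeps one (i, j, value) record per non-zero relation."""
    return 3 * num_edges

m = 10_000_000        # tens of millions of nodes, the scale targeted here
avg_degree = 20       # assumed average number of related nodes per node
print(dense_entries(m))                 # 10^14 entries: infeasible densely
print(sparse_entries(m * avg_degree))   # 6 * 10^8 entries: tractable
```

The dense count grows quadratically in M while the triplet count grows only with the number of actual relations, which is the motivation for the sparse representation introduced below in the disclosure.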
Disclosure of Invention
The embodiment of the invention provides a graph node classification method which can improve the node classification efficiency of a large-scale graph structure.
In a first aspect, an embodiment of the present invention provides a graph node classification method, for classifying an image file, including:
acquiring the relation data among the nodes, wherein the relation data of the nodes are obtained by extracting data according to the image files of the nodes, and each node corresponds to one image file;
extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space;
acquiring relation data of a current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1;
according to the classified nodes, an initialized sample label matrix corresponding to the current node is obtained;
Reading first relation sparse representation data of the current node and the j nodes from the first storage space, and respectively calculating to obtain the similarity between the current node and the j nodes;
performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes;
and classifying the current node and the j nodes according to the final label distribution matrix.
In a second aspect, an embodiment of the present invention provides a graph node classification apparatus for classifying an image file, including:
the first acquisition module is used for acquiring the relation data among the nodes, wherein the relation data of the nodes are obtained by data extraction according to the image files of the nodes, and each node corresponds to one image file;
the first processing module is used for extracting first relation sparse representation data of each node according to the relation data and storing the first relation sparse representation data into a first storage space;
the second acquisition module is used for acquiring the relationship data of the current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1;
The label acquisition module is used for acquiring an initialized sample label matrix corresponding to the current node according to the classified nodes;
the first calculation module is used for reading the first relation sparse representation data of the current node and the j nodes from the first storage space, and respectively calculating the similarity between the current node and the j nodes;
the iteration module is used for carrying out label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and carrying out iteration to obtain a final label distribution matrix of the current node and the j nodes;
and the classification module is used for classifying the current node and the j nodes according to the final label distribution matrix.
In a third aspect, an embodiment of the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps in the graph node classification method provided by the embodiment of the invention when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps in the graph node classification method provided by the embodiment of the present invention.
In the embodiment of the invention, the relation data among all nodes are acquired, wherein the relation data of all nodes are obtained by data extraction according to the image files of all nodes, and each node corresponds to one image file; extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space, wherein the first relation sparse representation data comprises first pointer data for indexing into the first relation sparse representation data; acquiring relation data of a current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1; according to the classified nodes, an initialized sample label matrix corresponding to the current node is obtained; according to the first pointer data, first relation sparse representation data of the current node and the j nodes are read from the first storage space, and similarity between the current node and the j nodes is calculated respectively; performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes; and classifying the current node and the j nodes according to the final label distribution matrix. 
By extracting and storing the first relation sparse representation data from the relation data among the nodes, matrix storage is not needed and redundant data are removed, which reduces the storage complexity of the data. Meanwhile, in the classification calculation process, similarity calculation is also performed through the first relation sparse representation data without introducing redundant data, which reduces the computational complexity, improves the I/O speed of the data during classification calculation, and improves the calculation speed. The node classification efficiency of large-scale graph structures is thereby effectively improved, and the method can be used for classifying large-scale graphs with millions or tens of millions of nodes and above.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a graph node classification method provided by an embodiment of the invention;
FIG. 2 is a flowchart of a similarity calculation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another similarity calculation method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a relationship between time overhead and graph nodes provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a node classification diagram structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a structure of various node diagrams provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a node classification device according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a first processing module according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a first computing module according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a second storage sub-module according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a first computing sub-module according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another iteration module according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of another node classification device according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a graph node classification method according to an embodiment of the present invention, where the method is used for classifying image files, as shown in fig. 1, and includes the following steps:
101. And acquiring the relation data among the nodes.
In the embodiment of the invention, the relationship data of each node is obtained by extracting data according to the image files of each node, and each node corresponds to one image file. The archive object of the image archive may be one of a personnel object, a place object and an article object.
In a possible embodiment, the object may be a person object, and the image file records the image data corresponding to the person object. The image data may be an image of the person object acquired by an image acquisition device in a certain area at a certain time, and may be a face image or a whole-body image of the person object. The face image may be extracted after face detection on a whole-body image of a person; for example, the image acquisition device collects a large image containing H person objects, the faces of the H person objects can be detected by a face detection engine, and the face images of the H person objects can be extracted respectively, so a face image can be understood as a small image extracted from within the large image. After the face images are obtained, each face image is filed with the corresponding person: all face images of one person object are put into one image file, yielding the image file of each person object, and the image files are stored in a file repository, where the file repository includes the image data of each object. Further, the image data may be a structured or semi-structured face image, where the structured or semi-structured face image at least includes the face image information, the time information of the face image, and the location information of the face image. The time information may be determined by the time when the image acquisition device acquired the face image. The location information may also be referred to as spatial information, and may be determined by the location of the image acquisition device; further, it may be determined by the number of the image acquisition device, since the numbers of the image acquisition devices are not repeated, and by binding the number to a set area, the spatial information can be determined according to the number of the image acquisition device.
In one possible embodiment, the object may be a location object, and the image file records the image data corresponding to the location object. The image data may be an image of the location object acquired by the image acquisition device at a certain time; it may be an image of a person appearing in the location area, a traffic image appearing in the location area, or the like, and in this embodiment the image data is preferably an image of a person appearing in the location area. The person images appearing in the location range may be extracted after person detection on the collected images; for example, the image acquisition device collects a large image containing H person objects, the persons can be detected by a person detection engine, and the person images of the H person objects can be extracted respectively, where a person image can be understood as a small image extracted from within the large image. After the person images are acquired, each person image is filed with the corresponding location: an image file is built from all person images extracted within the range of one location, yielding the image file of each location object, and the image files are stored in a file repository, where the file repository includes the image data of each object. Further, the image data may be a structured or semi-structured person image, where the structured or semi-structured person image at least includes the face image information corresponding to the person, the time information of the face image, and the location information of the face image. The time information can be determined by the time when the image acquisition device acquired the image of the person.
The above location information may also be referred to as spatial information, and may be determined by a location where the image capturing device is located, further, may also be determined by a number of the image capturing device, where the number of the image capturing device has no repetition, and further, the spatial information may be determined according to the number of the image capturing device by binding the number to the set area.
In a possible embodiment, the object may be an article object, and the image file records the image data corresponding to the article object. The image data may be an image of the article acquired by the image acquisition device in a certain area at a certain time; it may also include an image of a person or an animal appearing in the location area, and the article may be a beehive, a tree, a cable, a dustbin, and the like. After the article images are obtained, each article image is filed with the corresponding article: all images of one article establish an image file, yielding the image file of each article object, and the image files are stored in a file repository, where the file repository includes the image data of each object. Further, the image data may be structured or semi-structured images, where a structured or semi-structured image at least includes the image information, the time information of the image, and the location information of the image. The time information can be determined by the time when the image acquisition device acquired the image. The location information may also be referred to as spatial information, and may be determined by the location of the image acquisition device; further, it may be determined by the number of the image acquisition device, since the numbers of the image acquisition devices are not repeated, and by binding the number to a set area, the spatial information can be determined according to the number of the image acquisition device.
The relationship data may be relationships between objects, for example, relationships between people may be relationships between couples, parents, siblings, sisters, other relatives, colleagues, friends, strangers, and the like. The relationship between sites may be the same legal person, peer competition, peer cooperation, the same supply chain, customer overlap, etc. The relationship between the items may be the same function, the same shape, the same color, the distance of the position, etc. In a possible embodiment, the relationship data may be stored in the base after being extracted in advance, and when in use, the relationship data corresponding to each node is directly read, so as to improve the efficiency of classification of the graph nodes.
The above-mentioned objects such as persons, places, and articles may be objects such as persons, places, and articles in a wide range such as a street range and a city range.
The relationship data includes a first relationship value (which may also be referred to as a relationship weight), where the first relationship value is used to describe a relationship between each node, and is preset according to a relationship type between two nodes. Taking a person object as an example of a node, the mapping relationship between the persons and the first relationship value may be as shown in table 1:
TABLE 1

    Relationship type                       First relationship value
    couple / mother-child / father-child    1.00
    brother / sister                        0.90
    other relatives                         0.80
    friend                                  0.75
    colleague                               0.70
    stranger                                [0, 0.70)
It should be noted that table 1 is an example of a person object, and should not be construed as limiting the present invention, and it is understood that the relationship between location objects and the relationship between article objects may be represented by a relationship format similar to table 1.
The relationship type between the nodes may be recorded when the information of each node is collected, that is, the first relationship value is also recorded when the information of each node is collected.
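The mapping of Table 1 could be sketched, for illustration only, as a simple lookup table (the names below are assumptions, not from the patent):

```python
# Assumed sketch of the Table 1 mapping from relationship type to
# first relationship value, taking person objects as nodes.
FIRST_RELATION_VALUE = {
    "couple": 1.00,
    "mother-child": 1.00,
    "father-child": 1.00,
    "sibling": 0.90,
    "other relative": 0.80,
    "friend": 0.75,
    "colleague": 0.70,
}

def first_relation_value(relation_type):
    # Strangers fall in [0, 0.70); 0.0 is used here as a placeholder default.
    return FIRST_RELATION_VALUE.get(relation_type, 0.0)

print(first_relation_value("friend"))   # 0.75
```

Such a table would be populated when node information is collected, matching the recording step described above.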
102. And extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space.
In the embodiment of the invention, each node in the base further comprises a corresponding node ID.
Further, assuming that the number of nodes is M and V is a first relationship value of non-zero value, the above relationship data may be represented in the base as table 2:
          x_1     x_2     x_i     x_j     x_M
    x_1   V_1,1   V_1,2   0       0       0
    x_2   V_2,1   V_2,2   V_2,i   0       V_2,j
    x_i   0       V_i,2   V_i,i   V_i,j   0
    x_j   0       0       V_j,i   V_j,j   0
    x_M   0       V_M,2   0       0       V_M,M

TABLE 2
As can be seen in Table 2, the table contains many zero-valued first relationship values that occupy storage space, so the storage complexity of Table 2 is O(d²); moreover, when the data are read out for classification calculation, the zero-valued first relationship values are also read and participate in the calculation, which increases the amount of computation. The node IDs can be parsed as x_1, x_2, x_i, x_j, x_M in Table 2, and the subscripts 1, 2, i, j and M can be understood as index identifiers of the nodes; the relationship data of the corresponding node can be indexed in the base library according to these index identifiers.
In the embodiment of the present invention, the first relationship sparse representation data may be a node pair corresponding to a first relationship value of a non-zero value, where the node pair corresponding to the first relationship value of the non-zero value is obtained by extracting the first relationship value of the non-zero value, so as to discard the node pair corresponding to the first relationship value of the zero value, and represent the relationship between the nodes. The first relationship sparse representation data includes first pointer data and first content data, the first pointer data is used for indexing the first relationship sparse representation data, the first content data may be a first relationship value meeting an expression condition, and the first relationship sparse representation data may specifically be:
S={(i,j,V)}
In the above formula, i and j are pointer data indicating a data position, i.e., the i-th row and j-th column in Table 2, and V is content data; if the expression condition of the first relation sparse representation data is a non-zero value, the first content data V represents a non-zero first relation value. Further, S(x_i, x_j) = {(x_i, x_j, V_i,j)} is used to represent that the first relation value between node x_i and node x_j is V_i,j. Thus, the storage complexity is O(Nz(M×M)), where Nz(M×M) represents the number of non-zero first relation values in Table 2.
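A minimal sketch of extracting such (i, j, V) triplets from a dense relation table — an assumed illustration, not the patent's implementation. The `threshold` parameter covers both the non-zero expression condition (threshold 0) and a preset relation value:

```python
# Assumed sketch: extract first relation sparse representation data
# S = {(i, j, V)} from a dense relation table, discarding entries
# whose first relation value is not above the threshold.
def extract_sparse(relation, threshold=0.0):
    triplets = []
    for i, row in enumerate(relation):
        for j, v in enumerate(row):
            if v > threshold:          # drop zero / low-value node pairs
                triplets.append((i, j, v))
    return triplets

table = [
    [1.0, 0.4, 0.0],
    [0.4, 1.0, 0.8],
    [0.0, 0.8, 1.0],
]
print(extract_sparse(table))                 # all non-zero entries
print(extract_sparse(table, threshold=0.5))  # only stronger relations
```

Only the kept triplets are written to the first storage space, so zero-valued relations never participate in later I/O or computation.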
In the embodiment of the present invention, the first storage space may be a storage area in the base, which is dedicated to storing the first relational sparse representation data, or may be a newly built database, which is dedicated to storing the first relational sparse representation data. By extracting the first relation sparse representation data of each node for storage, the I/O read-write quantity of the first relation value of zero value can be reduced when the data of each node is read, and the I/O read-write speed is improved.
Optionally, if the expression condition of the first relationship sparse representation data is a preset relationship value, node pairs whose first relationship value is greater than the preset relationship value may be extracted to obtain the first relationship sparse representation data of each node, and the node pairs are stored, together with the node IDs, into the first storage space. The first relationship value indicates the degree of relationship between two nodes; node pairs whose first relationship value is smaller than the preset relationship value can be discarded, and only node pairs whose first relationship value is greater than the preset relationship value are extracted and converted into the first relationship sparse representation data. For example, when the preset relationship value is 0.5, node pairs with a first relationship value greater than 0.5 are extracted, and node pairs with a first relationship value less than 0.5 are discarded. In this way, the storage space occupied by the first relationship sparse representation data is smaller and the amount of calculation is smaller, without affecting the classification of nodes with larger first relationship values.
103. And acquiring the relationship data of the current node and j nodes related to the current node.
Wherein the j nodes include at least one classified node, and j is 1 or more.
The j nodes related to the current node can be understood as: all nodes with a first relation value of the current node being a non-zero value, or all nodes with a first relation value of the current node being greater than a preset relation value.
The relationship data of the j nodes related to the current node may be obtained from a base library or may be obtained from a first storage space. Preferably, in the embodiment of the present invention, relationship data of a current node and j nodes related to the current node are obtained from a first storage space.
Specifically, the corresponding node data can be obtained through the pointer data in the sparse representation S = {(i, j, V)}. Assuming that the current node is x_i, all nodes related to x_i are obtained; as shown in Table 2, the nodes related to x_i are x_2, x_j, etc.
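The pointer-based lookup described above can be sketched as follows (an assumed illustration: grouping the stored triplets by their row pointer i, so that S(x_i) and the j related nodes are read directly):

```python
# Assumed sketch of indexing the first storage space by pointer data.
from collections import defaultdict

def build_index(triplets):
    """Group (i, j, V) triplets by row pointer i, so the j nodes
    related to a current node x_i can be fetched in one lookup."""
    index = defaultdict(list)
    for i, j, v in triplets:
        index[i].append((j, v))
    return index

S = [(0, 1, 0.9), (1, 0, 0.9), (1, 2, 0.75), (2, 1, 0.75)]
index = build_index(S)
print(index[1])   # related nodes of x_1 with their first relation values
```

Because only non-zero (or above-threshold) pairs were stored, this lookup returns exactly the j related nodes without scanning zero entries.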
The classified nodes can be understood as nodes whose categories have already been calibrated; they serve as guidance when classifying the nodes to be classified. Further, the number of classified nodes is smaller than j.
104. And obtaining an initialized sample label matrix corresponding to the current node according to the classified nodes.
In this step, the above-mentioned classified nodes can be understood as selecting a small number of nodes from the current node and j nodes to perform manual classification, so as to obtain corresponding classified nodes. The classified nodes comprise sample label data, and an initialized sample label matrix of the current node is constructed according to the sample label data.
Specifically, in the embodiment of the present invention, the data of the nodes are set as (x_1, y_1), …, (x_n, y_n), (x_{n+1}, y_{n+1}), …, (x_{n+m}, y_{n+m}). X = (x_1, …, x_n, x_{n+1}, …, x_{n+m}) is the statistical data of the nodes, which includes the relationship data of each node, and the sample label data are the labels given to the classified nodes by manual labeling or an automatic labeling algorithm. Here (x_1, y_1), …, (x_n, y_n) are the data of the classified nodes, where x_1 represents the relationship data of the 1st classified node, x_n represents the relationship data of the n-th classified node, y_1 represents the sample label data corresponding to the 1st classified node, and y_n represents the sample label data corresponding to the n-th classified node; the sample label matrix is then Y_n = (y_1, …, y_n). (x_{n+1}, y_{n+1}), …, (x_{n+m}, y_{n+m}) are the data of the unlabeled nodes to be classified, where x_{n+1} represents the relationship data of the (n+1)-th node to be classified and x_{n+m} represents the relationship data of the (n+m)-th node to be classified, and n is an integer far smaller than m. Y_m = (y_{n+1}, …, y_{n+m}) are the unlabeled entries, which can be understood as corresponding to the nodes to be classified. According to Y_n and Y_m, the initialization sample label matrix Y = (y_1, …, y_n, y_{n+1}, …, y_{n+m}) is constructed, where C is the number of node classes; in binary classification C is 2, and when there are 5 classes C is 5. Since the number of classified nodes is far smaller than the number of nodes to be classified, the initialization sample label matrix Y is a sparse matrix.
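The construction of the initialization sample label matrix Y can be sketched as follows (an assumed illustration: one-hot rows for the n classified nodes and zero rows for the m unclassified nodes, consistent with Y being sparse because n << m):

```python
# Assumed sketch of building the initialization sample label matrix Y.
import numpy as np

def init_label_matrix(labels, total_nodes, num_classes):
    """labels: dict mapping node index -> class index for classified nodes.
    Returns a (total_nodes x num_classes) matrix: one-hot rows for
    classified nodes, all-zero rows for nodes still to be classified."""
    Y = np.zeros((total_nodes, num_classes))
    for node, cls in labels.items():
        Y[node, cls] = 1.0
    return Y

# two classified nodes out of five, with C = 2 classes
Y = init_label_matrix({0: 1, 3: 0}, total_nodes=5, num_classes=2)
print(Y)
```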
The principle of the embodiment of the invention is to predict Y_m from X and Y_n. Specifically, label propagation is performed with X and Y_n, and a label distribution matrix Y_t is obtained by prediction. In Y_t, the entries corresponding to Y_m have been assigned values by the label propagation, so the labels corresponding to Y_m can be derived.
105. And according to the first pointer data, reading the current node and the first relation sparse representation data of the j nodes from the first storage space, and respectively calculating to obtain the similarity between the current node and the j nodes.
In this step, the j nodes are the j nodes related to the current node. The similarity may be the Euclidean distance, Gaussian kernel distance, cosine distance, Jaccard distance, or Mahalanobis distance.
Optionally, referring to fig. 2, fig. 2 is a flowchart of a similarity calculation method provided in an embodiment of the present invention, as shown in fig. 2, including:
201. and extracting first relation sparse representation data of the current node and the target node from the first storage space according to the first pointer data.
In this step, the target node is any one of the j nodes. The first relationship sparse representation data of the current node and the target node can be read from the first storage space according to the first pointer data in the first relationship sparse representation data. For example, let the current node be x_1 and the target node be x_2; then the first relationship sparse representation data of the current node x_1 is S(x_1) = {(1, j, V_1,j)} and that of the target node x_2 is S(x_2) = {(2, k, V_2,k)}. Because the first relationship sparse representation data are extracted from the first storage space, V_1,j and V_2,k are both greater than the preset relationship value, or both non-zero. Therefore, since only the first relationship sparse representation data of the current node and the target node are read from the first storage space, redundant data reads and writes are avoided and the data read/write speed is improved.
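The first relationship sparse representation S = {(i, j, V_ij)} can be sketched in Python as a dictionary keyed by the (i, j) pointer data (a hypothetical illustration; the patent does not prescribe a concrete data structure):

```python
def to_sparse(relation_matrix, threshold=0.0):
    """Keep only node pairs whose first relation value exceeds the threshold."""
    store = {}
    for i, row in enumerate(relation_matrix):
        for j, v in enumerate(row):
            if v > threshold:
                store[(i, j)] = v   # (i, j) is the pointer data, v the content data
    return store

S = to_sparse([[0, 5, 0], [5, 0, 2], [0, 2, 0]])
# read one node's relations directly via the pointer data
row_0 = {ptr: v for ptr, v in S.items() if ptr[0] == 0}
```

Only the non-zero relation values are stored or read back, which is the redundancy-avoidance property the step above relies on.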
202. And converting the first relation sparse representation data of the current node and the target node into vectors to obtain a current node vector and a target node vector.
In this step, let the first relationship sparse representation data of the current node x_1 be S(x_1) = {(1, j, V_1,j)} and that of the target node x_2 be S(x_2) = {(2, k, V_2,k)}; then the converted current node vector is x_1 = (V_1,1, V_1,2, …, V_1,j) and the target node vector is x_2 = (V_2,1, V_2,2, …, V_2,k). Converting the first relationship sparse representation data of the current node and the target node into vectors abstracts their spatial meaning and makes it convenient to calculate the similarity between the current node and the target node.
203. The current node vector is aligned with the target node vector.
In this step, since the current node vector is x_1 = (V_1,1, V_1,2, …, V_1,j) and the target node vector is x_2 = (V_2,1, V_2,2, …, V_2,k), where j and k may be the same or different, the dimensions of the two vectors may differ; the current node vector and the target node vector are therefore aligned to unify the spatial dimensions of the current node and the target node. Specifically, j and k represent the maximum dimension values of the current node vector and the target node vector respectively; j and k can be compared, and both vectors aligned to the larger of the two dimensions. The redundant dimensions of each vector are filled with 0. For example, if j = 7 and k = 9, the alignment can be as shown in Table 3:
        1       2       3       4       5       6       7       8       9
x_1     V_1,1   V_1,2   0       0       V_1,5   0       V_1,7   0       0
x_2     V_2,1   V_2,2   0       V_2,4   0       V_2,6   0       0       V_2,9
TABLE 3
In Table 3, the current node vector is x_1 = (V_1,1, V_1,2, V_1,5, V_1,7) and the target node vector is x_2 = (V_2,1, V_2,2, V_2,4, V_2,6, V_2,9). After alignment, the aligned current node vector is x_1 = (V_1,1, V_1,2, 0, 0, V_1,5, 0, V_1,7, 0, 0) and the aligned target node vector is x_2 = (V_2,1, V_2,2, 0, V_2,4, 0, V_2,6, 0, 0, V_2,9).
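Assuming the sparse rows are held as {dimension: value} mappings (an illustrative choice, not mandated by the patent), the zero-filling alignment of step 203 can be sketched as:

```python
def align(row_a, row_b):
    """Align two sparse rows to the larger dimension, filling gaps with 0."""
    dim = max(max(row_a), max(row_b)) + 1           # larger maximum dimension
    a = [row_a.get(k, 0) for k in range(dim)]       # 0-fill redundant dimensions
    b = [row_b.get(k, 0) for k in range(dim)]
    return a, b

# Table 3 example with 0-based dimensions: x_1 is non-zero at dims 1,2,5,7
# (1-based) and x_2 at dims 1,2,4,6,9 (1-based)
x1 = {0: 1.0, 1: 2.0, 4: 3.0, 6: 4.0}
x2 = {0: 5.0, 1: 6.0, 3: 7.0, 5: 8.0, 8: 9.0}
a, b = align(x1, x2)
```

Both vectors come back with the same length (9 here), as in the aligned rows of Table 3.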
204. And obtaining second relation sparse representation data of the current node and the target node through the aligned current node vector and target node vector, and storing the second relation sparse representation data into a second storage space.
In the step, spatial distance calculation can be performed on the aligned current node vector and the target node vector, so as to obtain second relation sparse representation data of the current node and the target node. The above-mentioned spatial distance calculation may be to calculate the euclidean distance, gaussian kernel distance, cosine distance, jaccard distance, or mahalanobis distance between the current node and the target node. The above-mentioned spatial distance may also be referred to as a second relationship value.
The second storage space may be a storage area in the database dedicated to storing the second relationship sparse representation data, a newly built database dedicated to storing the second relationship sparse representation data, or a memory area into which the second relationship sparse representation data is loaded as hot data, so that during subsequent computation the memory area is accessed directly to obtain the second relationship sparse representation data. The second relationship sparse representation data can be understood as intermediate data produced during computation.
In the embodiment of the invention, the second relation value of the current node and the target node can be obtained through a Gaussian kernel function calculation from the aligned current node vector and target node vector. Specifically, the second relation value between the current node and the target node may be calculated by:

w_ij = exp( -||z_i - z_j||^2 / (2σ^2) )

where w_ij is the second relation value, z_i is the aligned current node vector, z_j is the aligned target node vector, and σ^2 is the variance of the first relation values of the current node and the j nodes.
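A sketch of this Gaussian-kernel computation in Python (illustrative; σ² is passed in directly, and how it is estimated from the first relation values is left out):

```python
import math

def second_relation_value(z_i, z_j, sigma2):
    """Gaussian kernel w_ij = exp(-||z_i - z_j||^2 / (2 * sigma2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return math.exp(-sq_dist / (2.0 * sigma2))

w_same = second_relation_value([1.0, 0.0], [1.0, 0.0], sigma2=1.0)  # identical vectors
w_far = second_relation_value([1.0, 0.0], [0.0, 1.0], sigma2=1.0)   # distant vectors
```

Identical vectors give the maximum value 1, and the value decays toward 0 as the aligned vectors move apart, which is why w_ij serves as a spatial-distance-based relation value.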
The second relationship sparse representation data includes second pointer data, used for indexing into the second relationship sparse representation data, and second content data. The second relationship sparse representation data may be expressed as W = {(i, j, w_ij)}, expressing the relationship between the current node x_i and the target node x_j, where (i, j) is the second pointer data and w_ij is the second content data (the second relation value). Further, the storage complexity is on the order of the number of stored second relation values w_ij.
After the second relationship sparse representation data of the current node and the target node is obtained from the second relation value, it is stored into the second storage space, so that the second relation value can be read conveniently during subsequent calculation.
In the embodiment of the invention, the second relation value between the nodes can be calculated in a parallel mode, and the second relation value between the nodes is converted into the second relation sparse representation data and stored in the second storage space.
205. And according to the second pointer data, second relation sparse representation data of the current node and the target node are read from the second storage space, and the similarity between the current node and the target node is calculated.
In this step, the second relationship sparse representation data includes a second relationship value, where the second relationship value is a spatial distance between the current node and the target node, and the similarity between the current node and the target node may be calculated based on the second relationship value.
In the embodiment of the invention, the second relation value can be used directly as the similarity between the current node and the target node; the second relation value can be weighted and the weighted result used as the similarity; or the second relation value can be normalized and the normalized result used as the similarity between the current node and the target node.
Referring to fig. 3, fig. 3 is a flowchart of another similarity calculation method according to an embodiment of the present invention, as shown in fig. 3, including:
301. the sum of the elements in the current node vector is calculated.
In this step, taking Table 3 as an example, let the current node vector be x_1 = (V_1,1, V_1,2, V_1,5, V_1,7); then the sum of the elements in the current node vector is d_1 = V_1,1 + V_1,2 + V_1,5 + V_1,7. Writing the current node vector as x_i, it can also be expressed as:

d_i = Σ_j V_i,j

where d_i is the sum of the elements in the current node vector, i is the index of the current node, and j ranges over the j nodes related to the current node.
302. The sum of the elements in the target node vector is calculated.
Similarly to step 301, taking Table 3 as an example, let the target node vector be x_2 = (V_2,1, V_2,2, V_2,4, V_2,6, V_2,9); then the sum of the elements in the target node vector is d_2 = V_2,1 + V_2,2 + V_2,4 + V_2,6 + V_2,9. Writing the target node vector as x_j, it can also be expressed as:

d_j = Σ_k V_j,k

where d_j is the sum of the elements in the target node vector, j is the index of the target node, and k ranges over the k nodes related to the target node.
In steps 301 and 302 above, the sum of the elements in the current node vector and the sum of the elements in the target node vector can be understood as the sums of the first relation values of the current node and of the target node, respectively. These first relation values may be obtained from the database, or from the first storage space. As S = {(i, j, V)} shows, the first relationship sparse representation data contains the first relation value V of each node pair. The embodiment of the invention preferably obtains the first relation values of the current node and of the target node from the first storage space, and from them calculates the sum of the elements in the current node vector and the sum of the elements in the target node vector. Alternatively, in step 201, when the first relationship sparse representation data of the current node and the target node is extracted from the first storage space, the sums of their first relation values can be calculated at the same time and stored in a fourth storage space; when steps 301 and 302 are executed, these sums can then be read directly from the fourth storage space.
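Reading the first relation values from a sparse store S = {(i, j): V_ij} and summing them per node can be sketched as follows (the dictionary structure is a hypothetical illustration, as before):

```python
def row_sum(store, i):
    """d_i: sum of node i's first relation values in the sparse store."""
    return sum(v for (a, _), v in store.items() if a == i)

S = {(1, 1): 5.0, (1, 4): 2.0, (2, 1): 5.0}
d_1 = row_sum(S, 1)   # 5.0 + 2.0
d_2 = row_sum(S, 2)   # 5.0
```

In practice these sums could be computed once during step 201 and cached (the fourth storage space mentioned above), rather than recomputed per pair.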
303. And reading second relation sparse representation data of the current node and the target node from a second storage space according to the second pointer data.
In this step, the second relationship sparse representation data includes the second relation value, and the corresponding second pointer data can be constructed from the node IDs of the current node and the target node. When the second relationship sparse representation data needs to be read, it can be looked up in the second storage space through the second pointer data. Assume the node IDs of the current node and the target node are x_1 and x_2 respectively; the constructed second pointer data may then be (1, 2), and according to the second relationship sparse representation data W = {(i, j, w_ij)}, the second relationship sparse representation data of the current node and the target node can be indexed as W(x_1, x_2) = {(1, 2, w_12)}, where w_12 is the second relation value of the current node and the target node.
304. And calculating the similarity between the current node and the target node according to the sum of all elements in the current node vector, the sum of all elements in the target node vector and second relation sparse representation data of the current node and the target node.
In this step, let the sum of the elements in the current node vector be d_i, the sum of the elements in the target node vector be d_j, and the second relation value in the second relationship sparse representation data be w_ij; the similarity p_ij between the current node and the target node can then be calculated by the following formula:

p_ij = w_ij / (d_i d_j)
in the embodiment of the invention, the similarity between the current node and the target node is calculated through the sum of all elements in the current node vector, the sum of all elements in the target node vector and the second relation sparse representation data of the current node and the target node, so that the space information fitting effect of the current node and the target node is better, and the convergence speed in the subsequent iteration process is faster.
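Putting the pieces together, the similarity p_ij = w_ij / (d_i · d_j) of step 304 can be sketched over the sparse stores (again with invented, illustrative data structures):

```python
def similarity(W, d, i, j):
    """p_ij = w_ij / (d_i * d_j): second relation value normalised by row sums."""
    return W[(i, j)] / (d[i] * d[j])

W = {(1, 2): 0.5}          # second relation sparse store, keyed by pointer data
d = {1: 2.0, 2: 5.0}       # precomputed element sums d_i, d_j
p_12 = similarity(W, d, 1, 2)   # 0.5 / (2.0 * 5.0)
```

The division by d_i d_j is the normalisation that, per the text above, improves the spatial-information fit and speeds convergence of the later iterations.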
206. And traversing to calculate the similarity between the current node and the j nodes.
In this step, the similarity between the current node and j nodes can be traversed by parallel computation threads.
106. And carrying out label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iteratively obtaining a final label distribution matrix of the current node and the j nodes.
In the embodiment of the invention, a graph structure can be constructed with the similarity between every two nodes as the connecting line between those two nodes. The graph structure comprises nodes to be classified and classified nodes; through label propagation, the labels of the classified nodes are propagated to the nodes to be classified, thereby completing the classification of the nodes to be classified.
In one possible embodiment, the classified nodes include positive sample label data and negative sample label data; the corresponding sample nodes accordingly include positive sample nodes and negative sample nodes, and the graph structure includes nodes to be classified, positive sample nodes and negative sample nodes.
Alternatively, the connection between two nodes represents the similarity between the two nodes, and in a possible embodiment, different similarities may correspond to different forms of connection, for example, different similarities correspond to connection with different color values, different similarities correspond to connection with different thicknesses, or different similarity values are marked on the corresponding connection.
In the graph structure, the current node is connected with the j related nodes, where j is greater than or equal to 1; for example, if the number of nodes related to the current node is j, the current node is connected to those j nodes.
In the embodiment of the invention, since the nodes to be classified are unlabelled and the classified nodes are labelled, label propagation can be performed on the nodes to be classified according to the classified nodes and the similarities related to the classified nodes. Through label propagation, nodes to be classified that have high similarity with a classified node are regarded as nodes with the same class attribute, and the label of the classified node is propagated to those nodes to be classified, so that they obtain the same label as the classified node. In this way, the labels of the nodes to be classified can be predicted, and the corresponding nodes classified according to those labels.
And traversing all nodes of the graph structure, and carrying out label propagation on all nodes to be classified to obtain label matrixes corresponding to all nodes.
In the embodiment of the invention, the label propagation step is carried out on the nodes to be classified based on the similarity between the current node and the j nodes and the initialized sample label matrix until convergence, so as to obtain the final label distribution matrix.
The above convergence means that the label distribution error between two successive iterations is smaller than the convergence tolerance.
Optionally, before step 106, similarity sparse representation data of the current node and the j nodes may be obtained according to the similarities between the current node and the j nodes, where the similarity sparse representation data includes third pointer data and third content data: the third pointer data is used for indexing into the similarity sparse representation data, and the third content data may be the similarity between the corresponding nodes. Specifically, the similarity sparse representation data may be expressed as P = {(i, j, p_ij)}, where p_ij = w_ij/(d_i d_j). The similarity sparse representation data is stored into a third storage space; according to the third pointer data, the similarity sparse representation data of the current node and the j nodes is read from the third storage space; and the similarities between the current node and the j nodes are obtained from the similarity sparse representation data. Furthermore, converting the similarities between nodes into similarity sparse representation data stored in the third storage space further reduces the storage complexity and improves the I/O read/write speed of the inter-node similarity data.
In the embodiment of the invention, a first prior parameter β_1 and a second prior parameter β_2 can be acquired before the iteration, where the first prior parameter β_1 and the second prior parameter β_2 are non-negative numbers whose sum is 1, i.e. β_2 = 1 − β_1.
In the current iteration, the label distribution matrix L_t obtained in the previous iteration is obtained first. The product matrix of the similarities between the current node and the j nodes and the label distribution matrix L_t obtained in the previous iteration is calculated, and the product matrix is adjusted through the prior parameter β_1. The initialization label matrix is adjusted through the prior parameter β_2, and the adjusted initialization label matrix is added to the adjusted product matrix to obtain the label distribution matrix of the current iteration.
Here the prior parameters reflect the user's degree of confidence in the initialized sample label matrix, which the user needs to set when the initialized sample label data is constructed. When the user's confidence in the initialized sample label matrix is high, β_2 can be set larger and β_1 smaller; the label matrix L_{t+1} obtained in the current iteration is then more strongly influenced by the initialized label matrix. When the user's confidence in the initialized label matrix is low, β_2 can be set smaller and β_1 larger; the label matrix L_{t+1} obtained in the current iteration is then less influenced by the initialized label matrix.
Specifically, in the embodiment of the present invention, the first prior parameter is set to β and the second prior parameter to (1 − β); the specific iteration formula is as follows:
L t+1 =βPL t +(1-β)Y
where Y is the initialized sample label matrix and L_t is the label distribution matrix obtained in the previous iteration. Iterative learning is carried out through this iteration formula, repeating the above steps until convergence. It should be noted that the initialized sample label matrix and the label distribution matrix obtained in each iteration have the same dimensions.
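The iteration L_{t+1} = βPL_t + (1 − β)Y with the convergence test described above can be sketched in Python as follows (a toy dense version for illustration; the patent's computation works on the sparse representations):

```python
import numpy as np

def label_propagation(P, Y, beta=0.9, tol=1e-6, max_iter=1000):
    """Iterate L_{t+1} = beta * P @ L_t + (1 - beta) * Y until the
    label distribution error between two iterations is below tol."""
    L = Y.copy()
    for _ in range(max_iter):
        L_next = beta * (P @ L) + (1 - beta) * Y
        if np.abs(L_next - L).max() < tol:   # convergence condition
            return L_next
        L = L_next
    return L

P = np.array([[0.0, 1.0], [1.0, 0.0]])   # toy 2-node similarity matrix
Y = np.array([[1.0, 0.0], [0.0, 0.0]])   # node 0 labelled with class 0
L = label_propagation(P, Y, beta=0.5)
labels = L.argmax(axis=1)                # final classification per node
```

In the toy run the unlabelled node 1 inherits class 0 from its labelled neighbour, which is exactly the propagation behaviour described above; a smaller β would pull L closer to the initialized matrix Y.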
More specifically, in the embodiment of the present invention, in the label propagation between the current node and the j nodes, the j nodes are the nodes related to the ith node, and a more specific iteration formula is as follows:

L_{t+1,i} = β Σ_j p_ij L_{t,j} + (1 − β) Y_i

where Y_i is the initialized sample label matrix of the current node and the j nodes, L_{t,i} is the label distribution matrix obtained in the previous iteration, and L_{t+1,i} represents the label distribution matrix of the ith node and the j nodes.
As can be seen from the above iteration formula, the time overhead of the solution lies in the calculation of the label distribution matrix, and is obviously related to the dimension of the label distribution matrix. Since p_ij is a scalar, the computational complexity is O(L_{t,i}); compared with a full matrix multiplication the computational complexity is reduced, and the time overhead of the solution is linear.
On a single-machine CPU server (x86, 2.30 GHz), large-scale graph node data (1 million nodes) were generated artificially, each node containing three non-zero first relation values (sparsely represented according to step 102); the time overhead of the classification solution over this large-scale graph structure (1 million nodes) is shown in fig. 4. As can be seen from fig. 4, the time overhead of the solution is substantially linear in the number of graph nodes.
After the iteration is completed, a final label distribution matrix of the ith node and the j nodes can be obtained, and all the nodes are traversed to obtain a final label distribution matrix of each node and the related nodes.
107. And classifying the current node and the j nodes according to the final label distribution matrix.
After the iteration is completed, the final label distribution matrix of the current node and the j nodes is obtained; according to the final label distribution matrix, the labels of the corresponding nodes to be classified can be queried, thereby determining the attribute classification of the nodes to be classified. The nodes to be classified may be at least one of the current node and the j nodes; the specific classification is shown in fig. 5, where nodes of the same color shade are nodes of the same class.
Alternatively, in step 106, the final label distribution matrix of each node and of the nodes related to each node may be calculated iteratively in parallel, so that each node is classified according to its final label distribution matrix, as shown in fig. 6, which shows the graph structures obtained for different nodes.
In the embodiment of the invention, the relation data among all nodes is acquired; extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space; acquiring relation data of a current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1; reading first relation sparse representation data of the current node and the j nodes from the first storage space, and respectively calculating to obtain the similarity between the current node and the j nodes; according to the classified nodes, an initialized sample label matrix corresponding to the current node is obtained; performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes; and classifying the current node and the j nodes according to the final label distribution matrix. 
Extracting the first relationship sparse representation data from the relationship data among the nodes for storage removes redundant data and reduces the storage complexity of the data. Meanwhile, in the classification calculation process, similarity calculation is performed on the first relationship sparse representation data without introducing redundant data, which reduces the computational complexity. The reduced storage complexity improves the I/O speed of the data during classification calculation, and the reduced computational complexity improves the calculation speed, so the node classification efficiency for large-scale graph structures is effectively improved, and the method can be used for classifying large-scale graphs with millions or tens of millions of nodes and above.
It should be noted that, the graph node classification method provided by the embodiment of the invention can be applied to devices such as a mobile phone, a monitor, a computer, a server and the like which can classify graph nodes.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a node classification device according to an embodiment of the present invention, where the device is used for classifying image files, and as shown in fig. 7, the device includes:
a first obtaining module 701, configured to obtain relationship data between each node, where the relationship data of each node is obtained by extracting data according to an image file of each node, and each node corresponds to one image file;
a first processing module 702, configured to extract first relationship sparse representation data of each node according to the relationship data, and store the first relationship sparse representation data in a first storage space, where the first relationship sparse representation data includes first pointer data for indexing into the first relationship sparse representation data;
a second obtaining module 703, configured to obtain relationship data of a current node and j nodes related to the current node, where the j nodes include at least one classified node, and j is greater than or equal to 1;
the label acquisition module 704 is configured to obtain an initialized sample label matrix corresponding to the current node according to the classified nodes;
A first calculation module 705, configured to read, according to the first pointer data, first relationship sparse representation data of the current node and the j nodes from the first storage space, and calculate, respectively, a similarity between the current node and the j nodes;
an iteration module 706, configured to perform label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterate to obtain a final label distribution matrix of the current node and the j nodes;
and the classification module 707 is configured to classify the current node and the j nodes according to the final label distribution matrix.
Optionally, the relationship data includes a preset first relationship value, and the node includes a node ID.
Optionally, as shown in fig. 8, the first processing module 702 includes:
a first extraction submodule 7021, configured to extract a node pair with a first relationship value greater than a preset relationship value, so as to obtain first relationship sparse representation data of each node;
a first storage submodule 7022, configured to store the node pair, together with its node IDs, in the first storage space.
Optionally, as shown in fig. 9, the first calculating module 705 includes:
a second extraction submodule 7051, configured to extract, according to the first pointer data, first relationship sparse representation data of the current node and a target node from the first storage space, where the target node is any node among the j nodes;
a conversion submodule 7052, configured to convert the first relationship sparse representation data of the current node and the target node into a vector, so as to obtain a current node vector and a target node vector;
an alignment submodule 7053, configured to align the current node vector with the target node vector;
a second storage submodule 7054, configured to obtain second relationship sparse representation data of the current node and the target node through the aligned current node vector and target node vector, and store the second relationship sparse representation data into a second storage space;
and a first calculating submodule 7055, configured to read the second relationship sparse representation data of the current node and the target node from the second storage space according to the second pointer data, and calculate the similarity between the current node and the target node.
Optionally, as shown in fig. 10, the second storage submodule 7054 includes:
a first calculating unit 70541, configured to calculate, according to the aligned current node vector and the target node vector, a second relationship value between the current node and the target node;
a processing unit 70542, configured to obtain the second relationship sparse representation data of the current node and the target node according to the second relation value, where the second relationship sparse representation data includes second pointer data for indexing into the second relationship sparse representation data;
a storage unit 70543 configured to store the second relationship sparse representation data in a second storage space.
Optionally, as shown in fig. 11, the first computing submodule 7055 includes:
a second calculating unit 70551 for calculating a sum of elements in the current node vector;
a third calculation unit 70552 for calculating a sum of elements in the target node vector;
a reading unit 70553, configured to read, according to the second pointer data, second relationship sparse representation data of the current node and a target node from the second storage space;
and a fourth calculating unit 70554, configured to calculate, according to the sum of the elements in the current node vector, the sum of the elements in the target node vector, and the second relationship sparse representation data of the current node and the target node, obtain the similarity between the current node and the target node.
Optionally, as shown in fig. 12, the iteration module 706 includes:
a first obtaining submodule 7061, configured to obtain a first prior parameter and a second prior parameter, where the first prior parameter and the second prior parameter are nonnegative numbers with a sum of 1;
a second obtaining sub-module 7062, configured to obtain a label distribution matrix obtained in a previous iteration;
a first adjustment submodule 7063, configured to calculate a product matrix of a similarity between the current node and the j nodes and a label distribution matrix obtained at the previous iteration, and perform weighted adjustment on the product matrix through the first prior parameter;
a second adjustment submodule 7064, configured to perform weighted adjustment on the initialized sample tag matrix through the second prior parameter, and add the weighted adjusted initialized sample tag matrix to the product matrix to obtain a tag distribution matrix of the current iteration;
and an iteration submodule 7065, configured to iterate the steps until convergence, so as to obtain a final label distribution matrix of the current node and the j nodes.
Alternatively, as shown in fig. 13, the apparatus includes:
a second processing module 708, configured to obtain similarity sparse representation data of the current node and the j nodes according to similarities between the current node and the j nodes, where the similarity sparse representation data includes third pointer data for indexing into the similarity sparse representation data;
A second storage module 709 for storing the similarity sparse representation data into a third storage space;
a reading module 710, configured to read, according to third pointer data, similarity sparse representation data of the current node and the j nodes from the third storage space;
and the second calculation module 711 is configured to obtain the similarity between the current node and the j nodes according to the similarity sparse representation data of the current node and the j nodes.
It should be noted that the graph node classification device provided by the embodiment of the invention can be applied to devices such as a mobile phone, a monitor, a computer, a server and the like which can classify graph nodes.
The graph node classification device provided by the embodiment of the invention can realize each process realized by the graph node classification method in the method embodiment, and can achieve the same beneficial effects. In order to avoid repetition, a description thereof is omitted.
Referring to fig. 14, fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device is used for image file classification and, as shown in fig. 14, includes: a memory 1402, a processor 1401, and a computer program stored on the memory 1402 and executable on the processor 1401, wherein:
The processor 1401 is configured to call a computer program stored in the memory 1402, and execute the following steps:
acquiring the relation data among the nodes, wherein the relation data of the nodes are obtained by extracting data according to the image files of the nodes, and each node corresponds to one image file;
extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space, wherein the first relation sparse representation data comprises first pointer data for indexing into the first relation sparse representation data;
acquiring relation data of a current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1;
acquiring, according to the classified nodes, an initialized sample label matrix corresponding to the current node;
reading, according to the first pointer data, the first relation sparse representation data of the current node and the j nodes from the first storage space, and calculating the similarity between the current node and the j nodes respectively;
performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes;
And classifying the current node and the j nodes according to the final label distribution matrix.
Optionally, the relationship data includes a preset first relationship value, and the node includes a node ID.
Optionally, the extracting, by the processor 1401, of the first relation sparse representation data of each node according to the relation data, and the storing of the first relation sparse representation data into a first storage space, where the first relation sparse representation data includes first pointer data for indexing into the first relation sparse representation data, includes:
extracting node pairs with the first relation value larger than a preset relation value to obtain first relation sparse representation data of each node;
and storing the node pairs, indexed by node ID, into the first storage space.
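The two steps above amount to a compressed-sparse-row (CSR) style layout: node pairs whose first relation value survives the threshold are kept in flat arrays, and the "first pointer data" records where each node's entries begin and end. A hedged sketch, assuming relation data arrives as (node ID, node ID, first relation value) triples; all names are illustrative, not from the patent:

```python
def build_sparse_relations(edges, threshold):
    """Keep only node pairs whose first relation value exceeds `threshold`,
    stored CSR-style: pointers[u] gives the (start, end) slice of u's entries
    in the flat neighbour-ID and value arrays (the 'first pointer data')."""
    adjacency = {}
    for u, v, w in edges:
        if w > threshold:                       # drop weak relations (sparsify)
            adjacency.setdefault(u, []).append((v, w))
    pointers, neighbour_ids, values = {}, [], []
    for u in sorted(adjacency):
        start = len(neighbour_ids)
        for v, w in adjacency[u]:
            neighbour_ids.append(v)
            values.append(w)
        pointers[u] = (start, len(neighbour_ids))
    return pointers, neighbour_ids, values

edges = [(0, 1, 0.9), (0, 2, 0.05), (1, 2, 0.7), (2, 0, 0.4)]
pointers, ids, vals = build_sparse_relations(edges, threshold=0.1)
start, end = pointers[0]
# node 0 keeps only the pair (0, 1); the pair (0, 2, 0.05) falls below the threshold
```

Reading a node's relations back is a single slice `ids[start:end]`, which is what makes the later pointer-based reads from the first storage space cheap.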
Optionally, the reading, by the processor 1401, the relationship sparse representation data of the current node and the j nodes from the first storage space according to the first pointer data, and calculating to obtain the similarity between the current node and the j nodes respectively includes:
extracting first relation sparse representation data of the current node and a target node from the first storage space according to the first pointer data, wherein the target node is any node in the j nodes;
converting the first relation sparse representation data of the current node and the target node into vectors to obtain a current node vector and a target node vector;
aligning the current node vector with a target node vector;
obtaining second relation sparse representation data of the current node and the target node through the aligned current node vector and target node vector, and storing the second relation sparse representation data into a second storage space;
and reading, according to the second pointer data, the second relation sparse representation data of the current node and the target node from the second storage space, and calculating the similarity between the current node and the target node.
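The vectorisation and alignment steps can be sketched as building a shared index from the union of the two nodes' neighbour IDs, so that elementwise operations line up. The union-based alignment scheme is an assumption for illustration; the patent does not fix it:

```python
def align_vectors(sparse_a, sparse_b):
    """Convert two sparse relation representations ({node ID: relation value})
    into dense vectors over a shared coordinate system."""
    shared_ids = sorted(set(sparse_a) | set(sparse_b))   # union of neighbour IDs
    vec_a = [sparse_a.get(i, 0.0) for i in shared_ids]   # missing entries become 0
    vec_b = [sparse_b.get(i, 0.0) for i in shared_ids]
    return shared_ids, vec_a, vec_b

# Current node relates to nodes 1 and 3; target node relates to nodes 1 and 2.
ids, a, b = align_vectors({1: 0.9, 3: 0.4}, {1: 0.7, 2: 0.6})
# ids == [1, 2, 3]; a == [0.9, 0.0, 0.4]; b == [0.7, 0.6, 0.0]
```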
Optionally, the obtaining, by the processor 1401, second relationship sparse representation data of the current node and the target node by using the aligned current node vector and the target node vector, and storing the second relationship sparse representation data in a second storage space includes:
according to the aligned current node vector and target node vector, calculating to obtain a second relation value of the current node and the target node;
obtaining second relation sparse representation data of the current node and the target node according to the second relation value, wherein the second relation sparse representation data comprises second pointer data used for indexing the second relation sparse representation data;
And storing the second relation sparse representation data into a second storage space.
Optionally, the reading, by the processor 1401, second relation sparse representation data of the current node and the target node from the second storage space according to the second pointer data, and calculating a similarity between the current node and the target node includes:
calculating the sum of all elements in the current node vector;
calculating the sum of all elements in the target node vector;
reading second relation sparse representation data of the current node and a target node from the second storage space according to the second pointer data;
and calculating the similarity between the current node and the target node according to the sum of all elements in the current node vector, the sum of all elements in the target node vector and the second relation sparse representation data of the current node and the target node.
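The patent leaves the exact similarity formula open. One plausible instantiation consistent with the three inputs above (the two element sums plus the second relation sparse data) is a weighted Jaccard similarity, with the second relation value taken as the elementwise minimum of the aligned vectors. Both choices are assumptions for illustration only:

```python
def second_relation(values_a, values_b):
    """One plausible second relation value: the elementwise minimum of the
    aligned vectors, kept sparsely (only non-zero positions are stored)."""
    return {k: min(x, y)
            for k, (x, y) in enumerate(zip(values_a, values_b))
            if min(x, y) > 0}

def similarity(values_a, values_b, second_rel):
    """Weighted Jaccard similarity from the two element sums and the sparse
    second relation data: overlap / (sum_a + sum_b - overlap)."""
    overlap = sum(second_rel.values())
    denom = sum(values_a) + sum(values_b) - overlap
    return overlap / denom if denom else 0.0

a = [0.9, 0.0, 0.4]           # aligned current node vector
b = [0.7, 0.6, 0.0]           # aligned target node vector
rel = second_relation(a, b)   # only position 0 overlaps: {0: 0.7}
sim = similarity(a, b, rel)
```

Note that only the overlap positions need to be stored and read back, which is why computing the two element sums once and combining them with the sparse second relation data avoids touching the full vectors again.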
Optionally, the performing, by the processor 1401, label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes, where the method includes:
acquiring a first prior parameter and a second prior parameter, wherein the first prior parameter and the second prior parameter are non-negative numbers whose sum is 1;
acquiring a label distribution matrix obtained in the last iteration;
calculating a product matrix of the similarity between the current node and the j nodes and the label distribution matrix obtained in the previous iteration, and performing weighted adjustment on the product matrix through the first prior parameter;
performing weighted adjustment on the initialized sample label matrix through the second prior parameter, and adding the weighted initialized sample label matrix to the weighted product matrix to obtain the label distribution matrix of the current iteration;
and iterating the steps until convergence, and obtaining the final label distribution matrix of the current node and the j nodes.
Optionally, after reading the first relation sparse representation data of the current node and the j nodes from the first storage space and calculating the similarity between the current node and the j nodes respectively, the processor 1401 further executes the following steps:
obtaining similarity sparse representation data of the current node and the j nodes according to the similarity between the current node and the j nodes, wherein the similarity sparse representation data comprises third pointer data used for indexing to the similarity sparse representation data;
Storing the similarity sparse representation data into a third storage space;
before performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain the final label distribution matrix of the current node and the j nodes, the processor 1401 further performs the following steps:
according to third pointer data, similarity sparse representation data of the current node and the j nodes are read from the third storage space;
and obtaining the similarity between the current node and the j nodes according to the similarity sparse representation data of the current node and the j nodes.
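The caching scheme described in these steps — compute similarities once, store them sparsely in a third storage space, and read them back via third pointer data just before label propagation — can be sketched as follows. Class and method names are illustrative, not from the patent:

```python
class SimilarityCache:
    """Minimal sketch of the 'third storage space': similarities are stored
    sparsely, and the pointer data recorded at write time lets them be read
    back later without recomputation."""

    def __init__(self):
        self._store = []        # flat third storage space: (neighbour ID, similarity)
        self._pointers = {}     # third pointer data: node -> (start, end) slice

    def write(self, node, sims):
        """Store the non-zero similarities between `node` and its j nodes."""
        start = len(self._store)
        self._store.extend((nbr, s) for nbr, s in sims.items() if s > 0)
        self._pointers[node] = (start, len(self._store))
        return self._pointers[node]

    def read(self, node):
        """Recover the similarities of `node` from its pointer data."""
        start, end = self._pointers[node]
        return dict(self._store[start:end])

cache = SimilarityCache()
cache.write(0, {1: 0.8, 2: 0.0, 3: 0.5})   # the zero entry is dropped (sparse)
sims = cache.read(0)                        # {1: 0.8, 3: 0.5}
```

The point of the third storage space is that the similarity computation (vector alignment plus the second relation data) runs once per node pair, while the label-propagation iteration re-reads the values many times.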
The electronic device may be a mobile phone, a monitor, a computer, a server, or the like, which may be used for classifying nodes of a graph.
The electronic device provided by the embodiment of the invention can realize each process realized by the graph node classification method in the embodiment of the method, can achieve the same beneficial effects, and is not repeated here for avoiding repetition.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the graph node classification method provided by the embodiments of the present invention is implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (11)

1. A graph node classification method for classifying image files, comprising the steps of:
acquiring the relation data among the nodes, wherein the relation data of the nodes are obtained by extracting data according to the image files of the nodes, and each node corresponds to one image file;
extracting first relation sparse representation data of each node according to the relation data, and storing the first relation sparse representation data into a first storage space, wherein the first relation sparse representation data comprises first pointer data for indexing into the first relation sparse representation data;
Acquiring relation data of a current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is greater than or equal to 1;
acquiring, according to the classified nodes, an initialized sample label matrix corresponding to the current node;
reading, according to the first pointer data, the first relation sparse representation data of the current node and the j nodes from the first storage space, and calculating the similarity between the current node and the j nodes respectively;
performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iterating to obtain a final label distribution matrix of the current node and the j nodes;
and classifying the current node and the j nodes according to the final label distribution matrix.
2. The method of claim 1, wherein the relationship data comprises a preset first relationship value and the node comprises a node ID.
3. The method of claim 2, wherein extracting the first relational sparse representation data for each node from the relational data and storing the first relational sparse representation data in the first storage space comprises:
Extracting node pairs with the first relation value larger than a preset relation value to obtain first relation sparse representation data of each node;
and storing the node pairs, indexed by node ID, into the first storage space.
4. The method of claim 1, wherein the reading the relationship sparse representation data of the current node and the j nodes from the first storage space according to the first pointer data, and calculating the similarity between the current node and the j nodes respectively, includes:
extracting first relation sparse representation data of the current node and a target node from the first storage space according to the first pointer data, wherein the target node is any node in the j nodes;
converting the first relation sparse representation data of the current node and the target node into vectors to obtain a current node vector and a target node vector;
aligning the current node vector with a target node vector;
obtaining second relation sparse representation data of the current node and the target node through the aligned current node vector and target node vector, and storing the second relation sparse representation data into a second storage space;
and reading, according to the second pointer data, the second relation sparse representation data of the current node and the target node from the second storage space, and calculating the similarity between the current node and the target node.
5. The method of claim 4, wherein the obtaining second relationship sparse representation data for the current node and the target node from the aligned current node vector and target node vector, and storing the second relationship sparse representation data in a second storage space, comprises:
according to the aligned current node vector and target node vector, calculating to obtain a second relation value of the current node and the target node;
obtaining second relation sparse representation data of the current node and the target node according to the second relation value, wherein the second relation sparse representation data comprises second pointer data used for indexing the second relation sparse representation data;
and storing the second relation sparse representation data into a second storage space.
6. The method of claim 5, wherein the reading the second sparse representation data of the second relationship between the current node and the target node from the second storage space based on the second pointer data, and calculating the similarity between the current node and the target node, comprises:
Calculating the sum of all elements in the current node vector;
calculating the sum of all elements in the target node vector;
reading second relation sparse representation data of the current node and a target node from the second storage space according to the second pointer data;
and calculating the similarity between the current node and the target node according to the sum of all elements in the current node vector, the sum of all elements in the target node vector and the second relation sparse representation data of the current node and the target node.
7. The method according to any one of claims 1 to 6, wherein the performing label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and iteratively obtaining a final label distribution matrix of the current node and the j nodes includes:
acquiring a first prior parameter and a second prior parameter, wherein the first prior parameter and the second prior parameter are non-negative numbers whose sum is 1;
acquiring a label distribution matrix obtained in the last iteration;
calculating a product matrix of the similarity between the current node and the j nodes and the label distribution matrix obtained in the previous iteration, and performing weighted adjustment on the product matrix through the first prior parameter;
performing weighted adjustment on the initialized sample label matrix through the second prior parameter, and adding the weighted initialized sample label matrix to the weighted product matrix to obtain the label distribution matrix of the current iteration;
and iterating the steps until convergence, and obtaining the final label distribution matrix of the current node and the j nodes.
8. The method according to any one of claims 1 to 6, wherein after the reading of the first relation sparse representation data of the current node and the j nodes from the first storage space and the calculating of the similarity between the current node and the j nodes respectively, the method further comprises:
obtaining similarity sparse representation data of the current node and the j nodes according to the similarity between the current node and the j nodes, wherein the similarity sparse representation data comprises third pointer data used for indexing to the similarity sparse representation data;
storing the similarity sparse representation data into a third storage space;
before the label propagation is performed on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and the final label distribution matrix of the current node and the j nodes is obtained through iteration, the method further comprises:
According to third pointer data, similarity sparse representation data of the current node and the j nodes are read from the third storage space;
and obtaining the similarity between the current node and the j nodes according to the similarity sparse representation data of the current node and the j nodes.
9. A graph node classification apparatus for classifying an image archive, the apparatus comprising:
the first acquisition module is used for acquiring the relation data among the nodes, wherein the relation data of the nodes are obtained by data extraction according to the image files of the nodes, and each node corresponds to one image file;
the first processing module is used for extracting first relation sparse expression data of each node according to the relation data and storing the first relation sparse expression data into a first storage space;
the second acquisition module is used for acquiring the relationship data of the current node and j nodes related to the current node, wherein the j nodes comprise at least one classified node, and j is more than or equal to 1;
the label acquisition module is used for acquiring an initialized sample label matrix corresponding to the current node according to the classified nodes;
The first calculation module is used for reading the first relation sparse representation data of the current node and the j nodes from the first storage space, and respectively calculating the similarity between the current node and the j nodes;
the iteration module is used for carrying out label propagation on the current node and the j nodes based on the similarity between the current node and the j nodes and the initialized sample label matrix, and carrying out iteration to obtain a final label distribution matrix of the current node and the j nodes;
and the classification module is used for classifying the current node and the j nodes according to the final label distribution matrix.
10. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the graph node classification method according to any one of claims 1 to 8 when the computer program is executed.
11. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps in the graph node classification method according to any of claims 1 to 8.
CN202010838847.2A 2020-08-19 2020-08-19 Graph node classification method and device, electronic equipment and storage medium Active CN112131446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010838847.2A CN112131446B (en) 2020-08-19 2020-08-19 Graph node classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010838847.2A CN112131446B (en) 2020-08-19 2020-08-19 Graph node classification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112131446A CN112131446A (en) 2020-12-25
CN112131446B true CN112131446B (en) 2023-11-17

Family

ID=73850521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010838847.2A Active CN112131446B (en) 2020-08-19 2020-08-19 Graph node classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112131446B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933442A (en) * 2015-06-16 2015-09-23 陕西师范大学 Method for propagating image label based on minimal cost path
CN106446806A (en) * 2016-09-08 2017-02-22 山东师范大学 Semi-supervised face identification method and system based on fuzzy membership degree sparse reconstruction

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8612364B2 (en) * 2009-10-29 2013-12-17 Xerox Corporation Method for categorizing linked documents by co-trained label expansion
WO2011156247A2 (en) * 2010-06-11 2011-12-15 Massachusetts Institute Of Technology Processor for large graph algorithm computations and matrix operations

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN104933442A (en) * 2015-06-16 2015-09-23 陕西师范大学 Method for propagating image label based on minimal cost path
CN106446806A (en) * 2016-09-08 2017-02-22 山东师范大学 Semi-supervised face identification method and system based on fuzzy membership degree sparse reconstruction

Non-Patent Citations (1)

Title
Minimum Cost Path Label Propagation Algorithm; Wang Xili; Lin Hongshuai; Chinese Journal of Computers (07); full text *

Also Published As

Publication number Publication date
CN112131446A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
US11494589B2 (en) Systems and methods for unifying statistical models for different data modalities
US10242224B2 (en) Differentially private processing and database storage
JP7360497B2 (en) Cross-modal feature extraction method, extraction device, and program
CN110532417B (en) Image retrieval method and device based on depth hash and terminal equipment
US8954365B2 (en) Density estimation and/or manifold learning
US20210012153A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN109871454B (en) Robust discrete supervision cross-media hash retrieval method
CN113569554B (en) Entity pair matching method and device in database, electronic equipment and storage medium
CN109284414B (en) Cross-modal content retrieval method and system based on semantic preservation
CN111209974A (en) Tensor decomposition-based heterogeneous big data core feature extraction method and system
CN111582506A (en) Multi-label learning method based on global and local label relation
CN116777006A (en) Sample missing label enhancement-based multi-label learning method, device and equipment
CN108804544A (en) Internet video display multi-source data fusion method and device
Atto et al. On joint parameterizations of linear and nonlinear functionals in neural networks
CN111506832B (en) Heterogeneous object completion method based on block matrix completion
WO2022162427A1 (en) Annotation-efficient image anomaly detection
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
CN112131446B (en) Graph node classification method and device, electronic equipment and storage medium
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN109299291A (en) A kind of Ask-Answer Community label recommendation method based on convolutional neural networks
CN115240782A (en) Drug attribute prediction method, device, electronic device and storage medium
CN111428741B (en) Network community discovery method and device, electronic equipment and readable storage medium
Li et al. Time series clustering based on relationship network and community detection
CN113537458B (en) Rational function neural network construction method, system and readable storage medium
CN114817668B (en) Automatic labeling and target association method for electromagnetic big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant