CN112925984A - GCN recommendation-based sample density aggregation method - Google Patents

Info

Publication number
CN112925984A
CN112925984A CN202110358626.XA CN202110358626A CN112925984A CN 112925984 A CN112925984 A CN 112925984A CN 202110358626 A CN202110358626 A CN 202110358626A CN 112925984 A CN112925984 A CN 112925984A
Authority
CN
China
Prior art keywords
matrix
gcn
degree
recommendation
order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110358626.XA
Other languages
Chinese (zh)
Inventor
董立岩
王浩
马心陶
刘元宁
朱晓冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202110358626.XA
Publication of CN112925984A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a GCN recommendation-based sample density aggregation method, which comprises the following steps: step one, collecting the original data samples of a GCN model; step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix; step three, aggregating the high-order degree matrices obtained in step two into a final degree matrix; and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings. The beneficial effect of the invention is that the amount of information carried by each object in the original sample is increased. Therefore, during GCN training, the neural network can obtain the density characteristic attribute of each object more completely and accurately, and the recommendation error rate of the recommendation system is reduced. The optimization can give users of GCN-dependent services a better experience. The method has been tested in practice, and the tests show that it is effective.

Description

GCN recommendation-based sample density aggregation method
Technical Field
The invention relates to a sample density aggregation method, in particular to a GCN recommendation-based sample density aggregation method.
Background
At present, GCN-based recommendation system models are applied in many fields. Such a model represents the characteristics of objects as data and then quantifies these characteristics with a graph convolutional neural network in order to determine the actual category of each object. For example, on an e-commerce website each user has favorite commodities or brands; by analyzing the relation network of commodities and the links between users and commodities, the GCN can estimate each user's level of interest in a certain type of commodity, and the recommendation system then recommends commodities to the corresponding users.
However, the current GCN has a technical defect: models using it cannot acquire the characteristic attributes of the target sufficiently and accurately. This deficiency arises because the GCN does not preprocess the original samples of the target domain, and in particular does not analyze the adjacency density characteristics of each object in the relation network.
Disclosure of Invention
The invention aims to solve the problem that a GCN model cannot sufficiently and accurately acquire the characteristic attribute of a target, and provides a GCN recommendation-based sample density aggregation method.
The invention provides a GCN recommendation-based sample density aggregation method, which comprises the following steps:
step one, collecting the original data samples of a GCN model, and organizing the original relation network into an object adjacency table and an object-attribute relation table, which serve as the basis of the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, namely, when the degree of each row in the adjacency matrix is calculated, additionally calculating the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the obtained Laplacian matrix is then normalized and used for feature mapping and training. Before use, the coordination parameters $\alpha_1$ and $\alpha_2$ need to be determined. Each coordination parameter usually has a usable value interval, which is obtained by statistically plotting the performance curve of the model; the specific process is as follows:
first, the value range of the low-order coordination parameter is determined: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded. In each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2. In most cases the resulting performance curve has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected. The value of the high-order coordination parameter is obtained in the same way, with the low-order coordination parameter held fixed; the initial values of the coordination parameters are set according to their orders;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain recommendation content of higher quality; after a set time, the values of the coordination parameters are determined again, because the relation network changes.
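Step one can be illustrated with a small Python sketch. It is only one possible reading of the step: the function name `build_tables`, the integer object and attribute indexes, the undirected treatment of links and the 0/1 encoding of attributes are all assumptions made for the example, not details given in the patent.

```python
import numpy as np

def build_tables(edges, attributes, num_objects, num_attributes):
    """Organize raw links into an adjacency table, an adjacency matrix A
    and an object-attribute matrix X (hypothetical layout)."""
    adjacency_table = {i: set() for i in range(num_objects)}
    for u, v in edges:
        if u == v:
            continue  # assumption: self-links are ignored
        adjacency_table[u].add(v)
        adjacency_table[v].add(u)  # assumption: the relation is undirected

    # Dense adjacency matrix derived from the adjacency table.
    A = np.zeros((num_objects, num_objects))
    for u, neighbours in adjacency_table.items():
        for v in neighbours:
            A[u, v] = 1.0

    # Object-attribute relation table as a 0/1 matrix.
    X = np.zeros((num_objects, num_attributes))
    for obj, attr in attributes:
        X[obj, attr] = 1.0
    return adjacency_table, A, X

# Toy relation network: a path 0-1-2-3 with three possible attributes.
edges = [(0, 1), (1, 2), (2, 3)]
attributes = [(0, 0), (1, 0), (1, 1), (3, 2)]
adjacency_table, A, X = build_tables(edges, attributes, 4, 3)
```

The matrices `A` and `X` produced here play the roles of the adjacency matrix and the input features in the later steps.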
The invention has the beneficial effects that:
the GCN recommendation-based sample density aggregation method optimizes the original samples of a GCN-related field by local density fusion, aggregating the features of each object in the relation network with its local adjacency density features. This increases the amount of information carried by each object in the original sample. Therefore, during GCN training, the neural network can obtain the density characteristic attribute of each object more completely and accurately, and the recommendation error rate of the recommendation system is reduced. The optimization can give users of GCN-dependent services a better experience. The method has been tested in practice, and the tests show that it is effective.
Drawings
FIG. 1 is a schematic diagram of an example of second-order degree matrix calculation on a graph with 4 nodes according to the present invention.
FIG. 2 is a schematic diagram of the GCN neural network according to the present invention.
FIG. 3 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
FIG. 4 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
FIG. 5 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
Detailed Description
Please refer to fig. 1 to 5:
the invention provides a GCN recommendation-based sample density aggregation method, which comprises the following steps:
step one, collecting the original data samples of a GCN model, and organizing the original relation network into an object adjacency table and an object-attribute relation table, which serve as the basis of the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, namely, when the degree of each row in the adjacency matrix is calculated, additionally calculating the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the obtained Laplacian matrix is then normalized and used for feature mapping and training. Before use, the coordination parameters $\alpha_1$ and $\alpha_2$ need to be determined. Each coordination parameter usually has a usable value interval, which is obtained by statistically plotting the performance curve of the model; the specific process is as follows:
first, the value range of the low-order coordination parameter is determined: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded. In each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2. In most cases the resulting performance curve has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected. The value of the high-order coordination parameter is obtained in the same way, with the low-order coordination parameter held fixed; the initial values of the coordination parameters are set according to their orders;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain recommendation content of higher quality; after a set time, the values of the coordination parameters are determined again, because the relation network changes.
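The coordination-parameter search described in step three can be sketched as follows. This is a minimal illustration under stated assumptions: `sweep_alpha` and `toy_accuracy` are hypothetical names, the quadratic fit uses NumPy's `polyfit`, and `toy_accuracy` merely stands in for "train the GCN with this value and measure its accuracy", which the patent does not specify in code.

```python
import numpy as np

def sweep_alpha(evaluate, start, stop, step):
    """Evaluate accuracy on an equidistant grid of alpha values, fit a
    quadratic to the curve and return (grid, accuracies, best_alpha)."""
    grid = np.arange(start, stop + step / 2, step)
    acc = np.array([evaluate(a) for a in grid])
    a2, a1, _ = np.polyfit(grid, acc, 2)  # coefficients, highest power first
    if a2 < 0:
        # Concave curve: the vertex of the parabola is the maximum.
        best = float(np.clip(-a1 / (2 * a2), grid[0], grid[-1]))
    else:
        # Curve is not concave: fall back to the best grid point.
        best = float(grid[np.argmax(acc)])
    return grid, acc, best

# Hypothetical stand-in for training and scoring the GCN at a given alpha.
def toy_accuracy(alpha):
    return 0.8 - (alpha - 0.3) ** 2

grid, acc, best_alpha1 = sweep_alpha(toy_accuracy, 0.0, 1.0, 0.05)
```

For the high-order parameter the same function would be called again with an `evaluate` callback that keeps the low-order parameter fixed at `best_alpha1` and varies only the high-order one.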
The implementation principle of the invention is as follows:
for a relational graph (also called graph) with n nodes, $G = (V, E)$, let its adjacency matrix and degree matrix be $A$ and $D$. In GCN, the Laplacian matrix of the network is calculated as $L = D^{-1/2} A D^{-1/2}$. The degree matrix used there is the first-order degree matrix of the nodes, called $D_1$. A high-order degree matrix is then defined to represent the local density characteristic of a given order: the first-order degree matrix represents the first-order density characteristics of the nodes, the second-order degree matrix represents the second-order density characteristics of the nodes, and so on. The aggregated degree matrix in the method can therefore be defined in the form:
$D = D_1 + \alpha_1 D_2 + \alpha_2 D_3 + \dots + \alpha_{x-1} D_x$
According to the formula, every degree matrix above the first order is multiplied by a coordination parameter $\alpha$ when it takes part in the calculation. The coordination parameter adjusts how strongly the high-order adjacency density influences the current node; it usually has a feasible interval, within which the recommendation accuracy improves.
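The aggregation formula and the matrix $L = D^{-1/2} A D^{-1/2}$ can be written out as a short NumPy sketch. The function names, the storage of each diagonal degree matrix as a vector, and the zero scale factor for isolated nodes are assumptions made for the example; the second-order degrees are entered by hand here.

```python
import numpy as np

def aggregate_degrees(degree_vectors, alphas):
    """D = D_1 + alpha_1 * D_2 + ... + alpha_{x-1} * D_x,
    with every diagonal degree matrix stored as a vector."""
    if len(alphas) != len(degree_vectors) - 1:
        raise ValueError("need one alpha per degree vector above first order")
    d = np.asarray(degree_vectors[0], dtype=float).copy()
    for alpha, d_k in zip(alphas, degree_vectors[1:]):
        d += alpha * np.asarray(d_k, dtype=float)
    return d

def normalized_matrix(A, d):
    """L = D^{-1/2} A D^{-1/2}; assumption: nodes with zero degree
    get a zero scale factor instead of a division by zero."""
    inv_sqrt = np.zeros_like(d)
    mask = d > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(d[mask])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

# Path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d1 = A.sum(axis=1)                    # first-order degrees: [1, 2, 2, 1]
d2 = np.array([1.0, 1.0, 1.0, 1.0])   # nodes at distance two, entered by hand
d = aggregate_degrees([d1, d2], alphas=[0.5])
L = normalized_matrix(A, d)
```

With $\alpha_1 = 0.5$ the aggregated degrees are $[1.5, 2.5, 2.5, 1.5]$, so the entry of $L$ for the edge between nodes 0 and 1 becomes $1/\sqrt{1.5 \cdot 2.5}$ instead of $1/\sqrt{1 \cdot 2}$.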
The above equation is the formal definition of the degree matrix calculation. The algorithm for calculating the high-order degree matrix, however, is unbalanced: the high-order adjacency calculated later for a node is affected by the nodes whose high-order adjacency has already been calculated, as shown in fig. 1.
Taking the calculation of a three-degree matrix as an example, the input of the graph is a sparse adjacency matrix, and the calculation process is as follows: 1. Convert the sparse adjacency matrix into a dense matrix and sum the elements of each row vector to obtain the first-order degree matrix. 2. Traverse the adjacency matrix row by row to find the number of second-order adjacent nodes of the current node, thereby obtaining the second-order degree matrix of all nodes. 3. Aggregate them using the formula given above to obtain the matrix D, which is then used to calculate the Laplacian matrix.
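The first two calculation steps above can be sketched in NumPy. The sketch rests on an assumption the text does not settle: the "second-order adjacent nodes" of a node are taken to be the nodes reachable in exactly two hops, excluding the node itself and its direct neighbours. It also uses an order-independent count, so it does not reproduce the unbalanced behaviour mentioned above; the sparse input is represented by hypothetical coordinate arrays `rows` and `cols`.

```python
import numpy as np

def first_order_degree(A):
    """Row sums of the dense adjacency matrix."""
    return A.sum(axis=1)

def second_order_degree(A):
    """For every node, count the nodes exactly two hops away
    (assumed meaning of 'second-order adjacent nodes')."""
    n = A.shape[0]
    d2 = np.zeros(n)
    for i in range(n):                        # row-by-row traversal
        neighbours = np.flatnonzero(A[i])
        two_hop = set()
        for j in neighbours:
            two_hop.update(np.flatnonzero(A[j]).tolist())
        two_hop.discard(i)                    # the node itself does not count
        two_hop.difference_update(neighbours.tolist())  # nor do direct neighbours
        d2[i] = len(two_hop)
    return d2

# Sparse (coordinate) description of the path graph 0-1-2-3, made dense.
rows = np.array([0, 1, 1, 2, 2, 3])
cols = np.array([1, 0, 2, 1, 3, 2])
A = np.zeros((4, 4))
A[rows, cols] = 1.0

d1 = first_order_degree(A)
d2 = second_order_degree(A)
```

The vectors `d1` and `d2` are the diagonals of the first-order and second-order degree matrices that enter the aggregation formula.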
Although a third-order degree matrix can be defined, in practical applications it does not bring a noticeably better effect, and it makes the coordination parameters harder to determine (because two coordination parameters then have to be set).
The structure of the GCN neural network is shown in fig. 2.
Expressed as a formula: $Z = \mathrm{softmax}(L \tanh(L X W_0) W_1)$;
In the formula, $W_0$ and $W_1$ are the weights between the input layer and the hidden layer and between the hidden layer and the output layer of the neural network, respectively, and $L$ is the Laplacian matrix; after the high-order density characteristic is added, this matrix contains more information that can distinguish each node. The eigendecomposition of the Laplacian matrix has the following form:
Figure BDA0003004579030000061
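The two-layer forward pass $Z = \mathrm{softmax}(L \tanh(L X W_0) W_1)$ given above can be sketched in NumPy. The layer sizes, the random weights and the toy path graph are assumptions chosen only to make the example runnable; no training is shown.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(L, X, W0, W1):
    """Z = softmax(L tanh(L X W0) W1)."""
    H = np.tanh(L @ X @ W0)      # hidden layer
    return softmax(L @ H @ W1)   # per-node class probabilities

# Toy setup: path graph 0-1-2-3, 3 input features, 5 hidden units, 2 classes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = A / np.sqrt(np.outer(d, d))  # D^{-1/2} A D^{-1/2} with first-order degrees

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W0 = rng.normal(size=(3, 5))
W1 = rng.normal(size=(5, 2))
Z = gcn_forward(L, X, W0, W1)
```

Replacing `d` with an aggregated degree vector changes only `L`; the forward pass itself stays the same.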
in the convolution operation, the Fourier transformation between the weights of the neural network and the matrix F during training requires the matrix U, where the eigenvector matrix U is obtained by eigendecomposition of the Laplacian matrix. A more accurate matrix U can therefore be obtained from the high-order Laplacian matrix defined by the local density, and training with this matrix yields a better model. The experimental results obtained with this optimization show the effectiveness of the method; the data sets used in the experiment are shown in Table 1. The model analyzes the link relations between nodes in the network, so that the attributes of the nodes are analyzed and the nodes are then recommended to users interested in the corresponding research direction.
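How the matrix U arises from the Laplacian matrix can be illustrated with NumPy's symmetric eigensolver. The sketch assumes the standard spectral form $L = U \Lambda U^{T}$ and the usual graph Fourier transform $U^{T} x$; the toy graph and the test signal are made up for the example.

```python
import numpy as np

# Path graph 0-1-2-3 and its matrix L = D^{-1/2} A D^{-1/2}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = A / np.sqrt(np.outer(d, d))

# L is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
eigenvalues, U = np.linalg.eigh(L)

signal = np.array([1.0, 0.0, 0.0, 0.0])  # a feature value on each node
spectrum = U.T @ signal                  # graph Fourier transform
recovered = U @ spectrum                 # inverse transform
```

Because U depends on L, changing the degree matrix used in L (for example by adding high-order terms) changes the basis in which the signal is expressed.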
Figure BDA0003004579030000062
TABLE 1 data set used in the experiment
The experimental results are shown in fig. 3, 4 and 5. The curves with solid circles represent the recommendation accuracy of the GCN optimized by the method, while the curves with open circles represent the performance of the GCN using the original samples.
The method is further optimized as follows. Because the calculation of the high-order degree matrix is unbalanced, the nodes are sorted in ascending or descending order before the high-order degree matrix is calculated, so as to account for the influence of the imbalance.
The sorting uses a matrix index conversion method, which is implemented as follows: 1. Sort the first-order degree matrix and record all node indexes and degrees after sorting. 2. Following the sorted indexes in ascending or descending order, calculate the high-order degree of each node one by one. The rest of the process is the same as before. The effect of this method was also verified by experiments. The verification results are shown in Tables 2 to 7 and are expressed as accuracy.
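The index conversion step can be sketched as a sort of the first-order degrees that keeps the original node indexes. The function name `sort_first_order` and the tie handling (a stable sort) are assumptions; the patent does not say how ties are broken or how the order-dependent high-order calculation itself is implemented.

```python
import numpy as np

def sort_first_order(d1, descending=False):
    """Return (order, sorted_degrees): node indexes sorted by first-order
    degree and the degrees in that order."""
    order = np.argsort(d1, kind="stable")
    if descending:
        order = order[::-1]
    return order, d1[order]

d1 = np.array([1.0, 3.0, 2.0, 2.0])            # hypothetical first-order degrees
order, sorted_degrees = sort_first_order(d1)
order_desc, _ = sort_first_order(d1, descending=True)

# The high-order degree of each node would then be computed by visiting
# the nodes in this order, one by one.
visit_sequence = [int(i) for i in order]
```

Here `order` maps positions in the sorted sequence back to the original node indexes, so results computed in sorted order can be written back to the right rows of the degree matrix.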
The verification test is as follows:
the experimental results obtained by the ascending and descending operations of the degree matrix and the adjustment of the coordination parameter α are shown in the following table:
Figure BDA0003004579030000071
TABLE 2 Experimental results on the cora data set (ascending order)
Figure BDA0003004579030000072
TABLE 3 Experimental results on the citeseer data set (ascending order)
Figure BDA0003004579030000073
TABLE 4 Experimental results on pubmed data set (ascending order)
Figure BDA0003004579030000074
TABLE 5 results of experiments on cora data set (descending order)
Figure BDA0003004579030000075
Figure BDA0003004579030000081
TABLE 6 results of experiments on citeseer data set (descending order)
Figure BDA0003004579030000082
TABLE 7 Experimental results on pubmed data set (descending order)
Finally, the best experimental result obtained by the method is compared with the results of other mainstream methods in the field, as shown in the following table. The data in the table directly reflect whether an item in the graph can be recommended to the user more accurately by the recommendation system. A higher accuracy improves the user experience and brings higher economic benefits to companies offering big data services.
Figure BDA0003004579030000083
TABLE 8 Comparison of the results obtained with other algorithms

Claims (1)

1. A GCN recommendation-based sample density aggregation method, characterized in that the method comprises the following steps:
step one, collecting the original data samples of a GCN model, and organizing the original relation network into an object adjacency table and an object-attribute relation table, which serve as the basis of the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, namely, when the degree of each row in the adjacency matrix is calculated, additionally calculating the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the obtained Laplacian matrix is then normalized and used for feature mapping and training; before use, the coordination parameters $\alpha_1$ and $\alpha_2$ need to be determined; each coordination parameter usually has a usable value interval, which is obtained by statistically plotting the performance curve of the model, and the specific process is as follows:
first, determining the value range of the low-order coordination parameter: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded; in each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2; in most cases the resulting performance curve has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain recommendation content of higher quality; after a set time, the values of the coordination parameters are determined again, because the relation network changes.
CN202110358626.XA 2021-04-02 2021-04-02 GCN recommendation-based sample density aggregation method Pending CN112925984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110358626.XA CN112925984A (en) 2021-04-02 2021-04-02 GCN recommendation-based sample density aggregation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110358626.XA CN112925984A (en) 2021-04-02 2021-04-02 GCN recommendation-based sample density aggregation method

Publications (1)

Publication Number Publication Date
CN112925984A true CN112925984A (en) 2021-06-08

Family

ID=76173874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110358626.XA Pending CN112925984A (en) 2021-04-02 2021-04-02 GCN recommendation-based sample density aggregation method

Country Status (1)

Country Link
CN (1) CN112925984A (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO WANG et al.: "A local density optimization method based on a graph convolutional network", Frontiers of Information Technology and Electronic Engineering, vol. 21, no. 12, 23 December 2020 (2020-12-23), pages 1795-1803, XP037320180, DOI: 10.1631/FITEE.1900663 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631847A (en) * 2022-10-19 2023-01-20 哈尔滨工业大学 Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment
CN115631847B (en) * 2022-10-19 2023-07-14 哈尔滨工业大学 Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics
CN116028727A (en) * 2023-03-30 2023-04-28 南京邮电大学 Video recommendation method based on image data processing
CN116028727B (en) * 2023-03-30 2023-08-18 南京邮电大学 Video recommendation method based on image data processing

Similar Documents

Publication Publication Date Title
US20170300546A1 (en) Method and Apparatus for Data Processing in Data Modeling
CN112925984A (en) GCN recommendation-based sample density aggregation method
CN110991786A (en) 10kV static load model parameter identification method based on similar daily load curve
CN108629436B (en) Method and electronic equipment for estimating warehouse goods picking capacity
CN110765418B (en) Intelligent set evaluation method and system for basin water and sand research model
CN108830492B (en) Method for determining spot-check merchants based on big data
CN109636467A (en) A kind of comprehensive estimation method and system of the internet digital asset of brand
Wu et al. Comparing the aggregation methods in the analytic hierarchy process when uniform distribution
CN111126865B (en) Technology maturity judging method and system based on technology big data
Chittenden et al. Modelling the galaxy–halo connection with semi-recurrent neural networks
CN106570616A (en) Quantitative evaluation method for scientific and technological project evaluation
CN115952426B (en) Distributed noise data clustering method based on random sampling and user classification method
Maharani et al. The MFEP and MAUT methods in selecting the best employees
CN116341290A (en) Long storage equipment reliability sampling detection method
CN112766537B (en) Short-term electric load prediction method
Senthilkumar et al. Construction and selection of repetitive deferred variables sampling (RDVS) plan indexed by quality levels
CN114880490A (en) Knowledge graph completion method based on graph attention network
CN111062118B (en) Multilayer soft measurement modeling system and method based on neural network prediction layering
TW202312030A (en) Recipe construction system, recipe construction method, computer readable recording media with stored programs, and non-transitory computer program product
CN111652384B (en) Balancing method for data volume distribution and data processing method
CN114625781A (en) Commodity housing value-based batch evaluation method
Charongrattanasakul et al. Designing of optimal required sample sizes for double acceptance sampling plans under the zero-inflated defective data
Banditvilai et al. Forecasting Models for Thailand’s Electrical Appliances Export Values
CN110287272A (en) A kind of configurable real-time feature extraction method, apparatus and system
JP5978183B2 (en) Measurement value classification apparatus, method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608