CN112925984A - GCN recommendation-based sample density aggregation method - Google Patents
- Publication number
- CN112925984A (application CN202110358626.XA)
- Authority
- CN
- China
- Prior art keywords
- matrix
- gcn
- degree
- recommendation
- order
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a GCN recommendation-based sample density aggregation method comprising the following steps: step one, collecting the original data samples of a GCN model; step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix; step three, aggregating the high-order degree matrices obtained in step two into a final degree matrix; and step four, completing the preprocessing of the original sample data using fixed coordination parameter settings. The beneficial effect of the invention is that the amount of information carried by each object in the original sample is increased. As a result, during GCN training the neural network can obtain the density feature of each object more completely and accurately, and the recommendation error rate of the recommendation system is reduced. This optimization can give users of GCN-based services a better experience. The method has been tested experimentally and shown to be effective.
Description
Technical Field
The invention relates to a sample density aggregation method, in particular to a GCN recommendation-based sample density aggregation method.
Background
At present, GCN-based recommendation system models are applied in many fields. Such a model represents the characteristics of objects as data and then uses a graph convolutional network to quantify those characteristics in order to infer the actual category of each object. For example, on an e-commerce website each user has favorite commodities or brands; by analyzing the relationship network among commodities and the interactions between users and commodities, the GCN can estimate each user's level of interest in a given type of commodity. The recommendation system then recommends commodities to the corresponding users.
However, the current GCN has a technical defect: models that use it cannot sufficiently and accurately acquire the feature attributes of the target. This deficiency arises because the GCN does not preprocess the original samples of the target domain and, in particular, does not analyze the adjacency density of each object in the relationship network.
Disclosure of Invention
The invention aims to solve the problem that a GCN model cannot sufficiently and accurately acquire the characteristic attribute of a target, and provides a GCN recommendation-based sample density aggregation method.
The invention provides a GCN recommendation-based sample density aggregation method, which comprises the following steps:
step one, collecting the original data samples of a GCN model, organizing the original relationship network into an object adjacency table and an object-attribute relation table, and taking these two tables as the basis for the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, that is, when the degree of each row of the adjacency matrix is computed, additionally computing the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the high-order degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the resulting Laplacian matrix is normalized and used for feature mapping and training; before use, the coordination parameters $\alpha_1$ and $\alpha_2$ must be determined; each coordination parameter usually has a usable value interval, which is obtained by using a statistical method to plot the model performance curve, as follows:
first, the value range of the low-order coordination parameter is determined: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded; in each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2; in most cases the resulting performance curve of the model has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected; the value of the high-order coordination parameter is obtained in the same way, with the low-order coordination parameter held fixed, and the initial values of the coordination parameters are set according to their orders;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain higher-quality recommendations; after a set period of time, the values of the coordination parameters are determined again, because the relationship network changes.
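The two-stage search for the coordination parameters described in step three can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the search range of 0 to 1, the choice of holding the high-order parameter at 0 while the low-order one is tuned, and the `toy_accuracy` stand-in (which replaces actual GCN training) are all assumptions made for the example. Only the equidistant step, here 0.05, comes from the range stated in the text.

```python
def sweep(evaluate, start, stop, step):
    """Return (best_value, history) over an equidistant grid from start to stop."""
    history = []
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        value = round(start + i * step, 10)  # avoid floating-point drift
        history.append((value, evaluate(value)))
    best_value, _ = max(history, key=lambda pair: pair[1])
    return best_value, history


def tune_coordination_parameters(accuracy, start=0.0, stop=1.0, step=0.05):
    """Tune alpha_1 first (alpha_2 held at 0), then alpha_2 with alpha_1 fixed."""
    alpha1, _ = sweep(lambda a1: accuracy(a1, 0.0), start, stop, step)
    alpha2, _ = sweep(lambda a2: accuracy(alpha1, a2), start, stop, step)
    return alpha1, alpha2


# Hypothetical stand-in for "train a GCN and measure its accuracy";
# a real run would train and evaluate the model for each candidate value.
def toy_accuracy(alpha1, alpha2):
    return 0.80 - (alpha1 - 0.30) ** 2 - (alpha2 - 0.10) ** 2


best_alpha1, best_alpha2 = tune_coordination_parameters(toy_accuracy)
print(best_alpha1, best_alpha2)
```

The `history` list returned by `sweep` holds the points of the performance curve, so it can be plotted to check whether the curve has the roughly quadratic shape the text describes.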
The invention has the beneficial effects that:
The GCN recommendation-based sample density aggregation method optimizes the original samples of a GCN application domain using local density fusion: it aggregates each object's own features in the relationship network with the local adjacency density features of that object. This increases the amount of information carried by each object in the original sample. As a result, during GCN training the neural network can obtain the density feature of each object more completely and accurately, and the recommendation error rate of the recommendation system is reduced. This optimization can give users of GCN-based services a better experience. The method has been tested experimentally and shown to be effective.
Drawings
FIG. 1 is a schematic diagram of an example second-order degree matrix calculation on a 4-node graph according to the present invention.
FIG. 2 is a schematic diagram of the GCN neural network according to the present invention.
FIG. 3 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
FIG. 4 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
FIG. 5 is a diagram illustrating the comparison of the GCN model using the neighbor density aggregation method with the original GCN model in the recommendation accuracy.
Detailed Description
Please refer to fig. 1 to 5:
the invention provides a GCN recommendation-based sample density aggregation method, which comprises the following steps:
step one, collecting the original data samples of a GCN model, organizing the original relationship network into an object adjacency table and an object-attribute relation table, and taking these two tables as the basis for the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, that is, when the degree of each row of the adjacency matrix is computed, additionally computing the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the high-order degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the resulting Laplacian matrix is normalized and used for feature mapping and training; before use, the coordination parameters $\alpha_1$ and $\alpha_2$ must be determined; each coordination parameter usually has a usable value interval, which is obtained by using a statistical method to plot the model performance curve, as follows:
first, the value range of the low-order coordination parameter is determined: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded; in each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2; in most cases the resulting performance curve of the model has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected; the value of the high-order coordination parameter is obtained in the same way, with the low-order coordination parameter held fixed, and the initial values of the coordination parameters are set according to their orders;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain higher-quality recommendations; after a set period of time, the values of the coordination parameters are determined again, because the relationship network changes.
The implementation principle of the invention is as follows:
For a relational graph (also called a graph) with n nodes, $G = (V, E)$, let its adjacency matrix and degree matrix be $A$ and $D$. In GCN, the Laplacian matrix of the network is calculated as $L = D^{-1/2} A D^{-1/2}$. The degree matrix used there is the first-order degree matrix of the nodes, denoted $D_1$. A high-order degree matrix is then defined to represent the local density at a given order: the first-order degree matrix represents the first-order density of the nodes, the second-order degree matrix represents their second-order density, and so on. The aggregated degree matrix in the method can therefore be defined in the form:
$$D = D_1 + \alpha_1 D_2 + \alpha_2 D_3 + \dots + \alpha_{x-1} D_x$$
According to this formula, every degree matrix of order higher than one is multiplied by a coordination parameter $\alpha$ when it enters the calculation. The coordination parameter adjusts how strongly the high-order adjacency density influences the current node; it usually has a feasible interval within which the recommendation accuracy is improved.
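To make the aggregation formula and the expression L = D^{-1/2} A D^{-1/2} concrete, the following minimal sketch stores each degree matrix as a list holding its diagonal. The function names, the handling of zero-degree nodes, and the example graph with its second-order degree values are illustrative assumptions rather than details taken from the patent. The standard library is used to keep the example dependency-free; a numerical library would be the usual choice at scale.

```python
import math


def aggregate_degree(degree_orders, alphas):
    """Combine per-node degrees: d = d_1 + alpha_1 * d_2 + alpha_2 * d_3 + ...

    degree_orders[k][i] is the (k+1)-th order degree of node i,
    i.e. the i-th diagonal entry of D_{k+1}.
    alphas holds the coordination parameters for orders 2, 3, ...
    """
    if len(alphas) != len(degree_orders) - 1:
        raise ValueError("need one coordination parameter per order above the first")
    weights = [1.0] + list(alphas)
    n = len(degree_orders[0])
    return [sum(w * d[i] for w, d in zip(weights, degree_orders)) for i in range(n)]


def normalized_adjacency(adj, degree):
    """Return D^{-1/2} A D^{-1/2} for a dense adjacency matrix (list of rows)."""
    # Assumption: nodes with zero aggregated degree get a zero scaling factor.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


# Path graph 0 - 1 - 2 - 3; the second-order degrees are illustrative values.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
d1 = [1, 2, 2, 1]
d2 = [1, 1, 1, 1]
d = aggregate_degree([d1, d2], alphas=[0.1])
L = normalized_adjacency(A, d)
```

With a coordination parameter of 0.1, each node's aggregated degree is its first-order degree plus one tenth of its second-order degree, so the high-order density only slightly rescales the entries of the resulting matrix.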
The above equation is the formal definition of the degree matrix calculation. The algorithm for computing the high-order degree matrix is, however, an unbalanced algorithm: the high-order adjacency computed for a node later in the traversal is affected by the nodes whose high-order adjacency has already been computed, as shown in fig. 1.
Taking the calculation of a third-order degree matrix as an example, the input graph is a sparse adjacency matrix, and the calculation proceeds as follows:
1. Convert the sparse adjacency matrix into a dense matrix and sum the elements of each row vector to obtain the first-order degree matrix.
2. Traverse the adjacency matrix row by row to find the number of second-order adjacent nodes of the current node, thereby obtaining the second-order degree matrix for all nodes.
3. Aggregate the degree matrices using the formula given above to obtain the matrix $D$, which is used to calculate the Laplacian matrix.
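The first two steps of this calculation can be sketched as below. The text does not define precisely what counts as a second-order adjacent node, so this example assumes it means a distinct node exactly two hops away, excluding the node itself and its direct neighbours. It also does not reproduce the order-dependent ("unbalanced") behaviour mentioned above. The function name and the example graph are hypothetical.

```python
def degree_orders(adj):
    """Return (first_order, second_order) degree lists for a dense 0/1 adjacency matrix.

    Assumption: the second-order degree of node i is the number of distinct
    nodes exactly two hops away (excluding i and its direct neighbours).
    """
    n = len(adj)
    # First-order degree: sum of the elements in each row.
    first = [sum(row) for row in adj]
    neighbours = [{j for j in range(n) if adj[i][j] and j != i} for i in range(n)]
    second = []
    for i in range(n):  # row-by-row traversal of the adjacency matrix
        two_hop = set()
        for j in neighbours[i]:
            two_hop |= neighbours[j]
        two_hop -= neighbours[i]
        two_hop.discard(i)
        second.append(len(two_hop))
    return first, second


# Hypothetical 4-node graph with edges 0-1, 0-2 and 2-3.
A = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
first, second = degree_orders(A)
print(first)   # [2, 1, 2, 1]
print(second)  # [1, 1, 1, 1]
```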
Although a third-order degree matrix can be defined, in practical applications it does not bring a noticeably better effect, and it makes the coordination parameters harder to determine (because two coordination parameters must then be set).
The structure of the GCN neural network is shown in fig. 2.
Expressed as a formula: $Z = \mathrm{softmax}(L \tanh(L X W_0) W_1)$;
In the formula, $W_0$ is the weight matrix between the input layer and the hidden layer of the neural network, $W_1$ is the weight matrix between the hidden layer and the output layer, and $L$ is the Laplacian matrix; after the high-order density features are added, this matrix contains more information for distinguishing the nodes. The eigendecomposition of the Laplacian matrix has the form $L = U \Lambda U^{T}$, where the columns of $U$ are the eigenvectors of $L$ and $\Lambda$ is the diagonal matrix of its eigenvalues.
In the convolution operation, the Fourier transform between the neural network weights and the matrix F during training requires the matrix U, where U is the eigenvector matrix obtained by eigendecomposition of the Laplacian matrix. A more accurate matrix U can therefore be obtained from the high-order Laplacian matrix defined by local density, and training with it yields a better model. The experimental results obtained with this optimization demonstrate the effectiveness of the method; the data sets used in the experiments are shown in table 1. The model analyzes the link relationships between nodes in the network, so that the attributes of the nodes are analyzed and the nodes are then recommended to users who are interested in the corresponding research direction.
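As a concrete reading of the two-layer propagation rule Z = softmax(L tanh(L X W0) W1) given earlier in this section, the following sketch runs a single forward pass. The helper names, the matrix sizes, and all numeric values are made up for illustration, and the weights are fixed rather than trained. The standard library is used so the example runs without dependencies.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(L, X, W0, W1):
    """Compute Z = softmax(L tanh(L X W0) W1)."""
    hidden = [[math.tanh(v) for v in row] for row in matmul(matmul(L, X), W0)]
    return softmax_rows(matmul(matmul(L, hidden), W1))


# Illustrative sizes: 3 nodes, 2 input features, 2 hidden units, 2 classes.
L = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.2, -0.1], [0.4, 0.3]]
W1 = [[0.5, -0.5], [-0.3, 0.3]]
Z = gcn_forward(L, X, W0, W1)
```

Each row of `Z` is a probability distribution over the output classes for one node. Because `L` enters both layers, any change to the aggregated degree matrix propagates through the whole forward pass.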
TABLE 1 data set used in the experiment
The experimental results are shown in fig. 3, 4 and 5. The curves with solid circles represent the recommendation accuracy of the GCN optimized by the method, while the curves with open circles represent the performance of the GCN using the original samples.
In addition, the method is further optimized. Because the calculation of the high-order degree matrix is unbalanced, the first-order degree matrix is sorted in ascending or descending order before the high-order degree matrix is calculated, to account for the influence of this imbalance.
The sorting uses a matrix index conversion method, implemented as follows:
1. Sort the first-order degree matrix and record the index and degree of every node after sorting.
2. Following the sorted indexes in ascending or descending order, calculate the high-order degree matrix of each node one by one.
The rest of the process is the same as before. The effect of this method was also verified by experiments; the verification results are shown in tables 2 to 7 and are expressed as accuracy.
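The index conversion used for sorting can be sketched as follows. The function name is hypothetical, and ties between nodes of equal degree are resolved by original index, which is an assumption the text does not address. The sketch only produces the processing order; the order-dependent high-order degree calculation itself is not reproduced here.

```python
def processing_order(first_order_degree, descending=False):
    """Return (node_index, degree) pairs sorted by first-order degree.

    Python's sort is stable, so nodes with equal degree keep their
    original relative order in both ascending and descending mode.
    """
    order = sorted(range(len(first_order_degree)),
                   key=lambda i: first_order_degree[i],
                   reverse=descending)
    return [(i, first_order_degree[i]) for i in order]


d1 = [2, 1, 3, 1]
ascending = processing_order(d1)
descending = processing_order(d1, descending=True)
print(ascending)   # [(1, 1), (3, 1), (0, 2), (2, 3)]
print(descending)  # [(2, 3), (0, 2), (1, 1), (3, 1)]
```

The high-order degree of each node would then be computed by iterating over these pairs in order, while the recorded indexes allow the results to be written back to the original node positions.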
The verification test is as follows:
the experimental results obtained by the ascending and descending operations of the degree matrix and the adjustment of the coordination parameter α are shown in the following table:
TABLE 2 Experimental results on the cora data set (ascending order)
TABLE 3 Experimental results on the citeseer data set (ascending order)
TABLE 4 Experimental results on pubmed data set (ascending order)
TABLE 5 results of experiments on cora data set (descending order)
TABLE 6 results of experiments on citeseer data set (descending order)
TABLE 7 Experimental results on pubmed data set (descending order)
Finally, the best experimental result obtained by the method is compared with the results of other mainstream methods in the field, as shown in the following table. The data in the table directly reflect whether an item in the graph can be recommended to the user more accurately by the recommendation system. Higher accuracy improves the user experience and brings greater economic benefit to companies providing big data services.
TABLE 8 Comparison of the results obtained with other algorithms
Claims (1)
1. A GCN recommendation-based sample density aggregation method, characterized in that the method comprises the following steps:
step one, collecting the original data samples of a GCN model, organizing the original relationship network into an object adjacency table and an object-attribute relation table, and taking these two tables as the basis for the subsequent calculation;
step two, calculating the degree matrix and the high-order degree matrices of the adjacency matrix, that is, when the degree of each row of the adjacency matrix is computed, additionally computing the degrees of the adjacent objects; this is done with a loop statement in the program and yields degree matrices of different orders;
step three, aggregating the high-order degree matrices obtained in step two into a final degree matrix $D$ in the form $D = D_1 + \alpha_1 D_2 + \alpha_2 D_3$, and then using this matrix to calculate the Laplacian matrix $L$ of the graph by $L = D^{-1/2} A D^{-1/2}$; the resulting Laplacian matrix is normalized and used for feature mapping and training; before use, the coordination parameters $\alpha_1$ and $\alpha_2$ must be determined; each coordination parameter usually has a usable value interval, which is obtained by using a statistical method to plot the model performance curve, as follows:
first, the value range of the low-order coordination parameter is determined: the GCN model is trained and used for recommendation, and the accuracy of the model at a given value of the low-order coordination parameter is recorded; in each subsequent round of statistics the value of the coordination parameter is increased or decreased by a fixed step, which is generally small, usually between 0.005 and 0.2; in most cases the resulting performance curve of the model has the shape of a quadratic function, and the coordination parameter corresponding to its maximum is selected;
and step four, completing the preprocessing of the original sample data using the fixed coordination parameter settings, and then using the processed samples for GCN recommendation to obtain higher-quality recommendations; after a set period of time, the values of the coordination parameters are determined again, because the relationship network changes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110358626.XA CN112925984A (en) | 2021-04-02 | 2021-04-02 | GCN recommendation-based sample density aggregation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110358626.XA CN112925984A (en) | 2021-04-02 | 2021-04-02 | GCN recommendation-based sample density aggregation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112925984A (en) | 2021-06-08 |
Family
ID=76173874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110358626.XA Pending CN112925984A (en) | 2021-04-02 | 2021-04-02 | GCN recommendation-based sample density aggregation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925984A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115631847A (en) * | 2022-10-19 | 2023-01-20 | 哈尔滨工业大学 | Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment |
CN116028727A (en) * | 2023-03-30 | 2023-04-28 | 南京邮电大学 | Video recommendation method based on image data processing |
Non-Patent Citations (2)
Title |
---|
HAO WANG等: ""A local density optimization method based on a graph convolutional network"", 《FRONTIERS OF INFORMATION TECHNOLOGY AND ELECTRONIC ENGINEERING》 * |
HAO WANG等: ""A local density optimization method based on a graph convolutional network"", 《FRONTIERS OF INFORMATION TECHNOLOGY AND ELECTRONIC ENGINEERING》, vol. 21, no. 12, 23 December 2020 (2020-12-23), pages 1795 - 1803, XP037320180, DOI: 10.1631/FITEE.1900663 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115631847A (en) * | 2022-10-19 | 2023-01-20 | 哈尔滨工业大学 | Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment |
CN115631847B (en) * | 2022-10-19 | 2023-07-14 | 哈尔滨工业大学 | Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics |
CN116028727A (en) * | 2023-03-30 | 2023-04-28 | 南京邮电大学 | Video recommendation method based on image data processing |
CN116028727B (en) * | 2023-03-30 | 2023-08-18 | 南京邮电大学 | Video recommendation method based on image data processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210608 |