CN110737837A - Scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform - Google Patents
Scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform
- Publication number
- CN110737837A (application CN201910981032.7A)
- Authority
- CN
- China
- Prior art keywords
- researchers
- scientific
- research
- similarity
- social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Abstract
The invention discloses a scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform. The method measures the association between a researcher and other researchers along three dimensions: the text similarity of the papers the researchers have published on ResearchGate, the social association between researchers, and each researcher's own influence. It builds a collaborator recommendation model by linear combination, performs Top-N recommendation, and provides researchers with a personalized collaborator recommendation service.
Description
Technical Field
The invention relates to a scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, and belongs to the technical fields of recommendation systems in software engineering, data mining and text mining.
Background
Finding useful collaboration among researchers with the same or similar research interests is a difficult task that takes up a great deal of a researcher's time, so a main problem in achieving scientific collaboration is identifying collaborators with similar research interests.
ResearchGate, the most popular research social networking platform at present, was founded in 2008 and aims to promote scientific collaboration worldwide. Used effectively as a catalyst for contact between scientists, ResearchGate can greatly promote scientific communication and progress. It would therefore be valuable if the platform could help researchers find other researchers with the same or similar research interests.
Disclosure of Invention
The method measures the association between a given researcher and other researchers along three dimensions: the text similarity of the researchers' published papers, the social association degree between researchers, and each researcher's own influence. It builds a collaborator recommendation model by linear combination, thereby providing researchers with a personalized collaborator recommendation service.
Technical solution: to achieve the above purpose, the invention adopts the following technical solution:
A scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, comprising the following steps:
(1) collecting data on researchers' published papers, social associations and own influence from the ResearchGate platform, and preprocessing it;
(2) calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model;
(3) establishing a relationship matrix among the researchers from the existing social network, whose elements mark the following relationships between researchers, and, based on this matrix, calculating the social association degree between researchers from their proportion of common friends;
(4) summing and averaging the data related to a researcher's own influence to calculate that researcher's influence;
(5) combining the three features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model;
(6) making recommendations with the recommendation model: calculating a comprehensive similarity score between each candidate collaborator and the given researcher, ranking the candidates, and generating a collaborator recommendation list for the given researcher.
Further, step (1) includes:
(11) collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields;
(12) collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields;
(13) collecting data on each researcher's own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields;
(14) collecting each researcher's self-description data on the ResearchGate platform, including the professional skills and topics fields;
(15) cleaning, denoising, deduplicating and standardizing the collected data.
Further, step (2) includes:
(21) reading an English corpus consisting of paper titles and abstracts and preprocessing it, including converting letter case and checking spelling errors, with punctuation marks treated as invalid words;
(22) treating each paper's title and abstract as one paragraph, and representing each paragraph and each word in it as a vector using the Distributed Memory model (PV-DM) of the paragraph-vector model Doc2Vec;
(23) after the vector space is generated, using cosine similarity to calculate the cosine of the angle between two paragraph vectors, which represents the similarity of papers between researchers; the cosine similarity is calculated as follows:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
where A and B are the vector representations of two paragraphs. The closer the cosine is to 1, the closer the angle is to 0 and the more similar the two vectors; the closer the cosine is to 0, the closer the angle is to 90 degrees and the less similar the two vectors.
Further, step (3) includes:
(31) establishing, from the existing social network, a relationship matrix A among n researchers, where A is an n × n matrix; if researcher u follows researcher v, then a_uv = 1, otherwise a_uv = 0;
(32) calculating the social association degree between researchers from their proportion of common friends, where a higher proportion of common friends indicates greater similarity; the calculation formula is as follows:
where out(u) is the set of friends that researcher u points to in the social network graph, out(v) is the set of friends that researcher v points to, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends followed in the sets out(u) and out(v) respectively.
Further, in step (4), the researcher's interest value, citation count, recommendation count, read count and paper count are summed and averaged to obtain the researcher's influence.
Further, step (5) includes:
(51) normalizing the text similarity, social association degree and influence separately, where normalization divides each feature value by the maximum value of that feature, so that the features operate on the same order of magnitude and the weight parameters are easy to adjust;
(52) linearly combining the three normalized dimensions to calculate the similarity between researchers, with the following formula:
score(u1,u2)=α*nor(paper(u1,u2))+β*nor(social(u1,u2))+γ*nor(popular(u2))
The formula integrates the three dimensions, where paper(u1,u2) denotes the text similarity between the papers of researchers u1 and u2, social(u1,u2) denotes the social association degree between researchers u1 and u2, popular(u2) denotes the influence of researcher u2, and α, β and γ are weight parameters;
(53) setting α, β and γ manually, then adjusting them according to the similarity of the researchers' professional skills and topics so as to optimize the model, finally obtaining the similarity score between candidate researcher u2 and the given researcher u1.
Beneficial effects: the personalized collaborator recommendation method based on multi-dimensional features on the ResearchGate platform measures the association between a researcher and other researchers along three dimensions (the text similarity of published papers, the social association degree between researchers, and each researcher's own influence) and builds a collaborator recommendation model by linear combination. It thereby provides researchers with a personalized collaborator recommendation service and addresses the difficulty of finding scientific research collaborators.
Drawings
FIG. 1 is a schematic diagram of the personalized collaborator recommendation method based on multi-dimensional features on the ResearchGate platform according to the present invention;
FIG. 2 is a diagram of the Doc2Vec network architecture.
Detailed Description
The personalized collaborator recommendation method based on multi-dimensional features mainly comprises the following six steps. Step one: collect data on researchers' published papers, social associations and own influence from the ResearchGate platform, and preprocess it. Step two: calculate the text similarity of papers between researchers using the Doc2Vec deep text representation model. Step three: establish a relationship matrix among the researchers from the existing social network, whose elements mark the following relationships between researchers, and calculate the social association degree between researchers from their proportion of common friends. Step four: sum and average the data related to a researcher's own influence to calculate that researcher's influence. Step five: combine the three features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model. Step six: run a recommendation test with the model, calculate the comprehensive similarity score between each candidate collaborator and the given researcher, rank the candidates, and generate a collaborator recommendation list for the given researcher.
The specific implementation process of each step is described in detail as follows:
Step one: collecting data on researchers' published papers, social associations and own influence from the ResearchGate platform, and preprocessing it.
The method specifically comprises the following steps:
Step 11: collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields, denoted Title and Abstract;
Step 12: collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields, denoted Following and Followers;
Step 13: collecting data on each researcher's own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields, denoted InterestValue, CiteCount, RecomCount, ReadCount and ItemCount respectively;
Step 14: collecting each researcher's self-description data on the ResearchGate platform, including the professional skills and topics fields, denoted Skills and Topics; these fields can be used to tune the parameters of the linear combination recommendation model;
Step 15: cleaning, denoising, deduplicating and standardizing the collected data.
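As an illustration of step 15 only, the following minimal Python sketch deduplicates collected paper records and normalizes whitespace in the Title and Abstract fields. The record layout, the `id` key and the rule that a record with neither title nor abstract counts as noise are assumptions made for this example; the source does not specify how cleaning is implemented.

```python
import re

def clean_records(records):
    """Deduplicate paper records and normalize their text fields.

    Each record is assumed (hypothetically) to be a dict with an "id" key
    plus the Title and Abstract fields named in step 11.
    """
    seen = set()
    cleaned = []
    for rec in records:
        rid = rec.get("id")
        if rid is None or rid in seen:
            # Drop records without an identifier and repeated records.
            continue
        seen.add(rid)
        # Collapse runs of whitespace and trim both ends.
        title = re.sub(r"\s+", " ", rec.get("Title", "")).strip()
        abstract = re.sub(r"\s+", " ", rec.get("Abstract", "")).strip()
        if not title and not abstract:
            # Assumption: a record with no text at all is treated as noise.
            continue
        cleaned.append({"id": rid, "Title": title, "Abstract": abstract})
    return cleaned
```

Keeping the first occurrence of each `id` is one possible deduplication policy; a real pipeline might instead keep the most recently collected copy.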
Step two: calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model.
This step uses the similarity between researchers' papers as the similarity between the researchers themselves: the more similar the papers, the more similar the researchers' research interests.
The method specifically comprises the following steps:
Step 21: reading an English corpus consisting of paper titles and abstracts and preprocessing it, including converting letter case and checking spelling errors, with punctuation marks treated as invalid words;
Step 22: treating each paper's title and abstract as one paragraph, and representing each paragraph and each word in it as a vector using the Distributed Memory model (PV-DM) of the paragraph-vector model Doc2Vec;
The procedure is explained with reference to FIG. 2. Here a PV-DM model improved with Hierarchical Softmax is used, building a three-layer neural network consisting of an input layer, a projection layer (hidden layer) and an output layer. Assume a sample (context(w), w), where context(w), consisting of the c words before and the c words after the center word w, is the input sample train_X, and the center word w is the output value train_Y.
First layer, the input layer: the input is a randomly initialized K-dimensional paragraph vector V(context(para)) and the 2c word vectors V(context(w)_1), V(context(w)_2), ..., V(context(w)_2c) of context(w) in the paragraph; all vectors have the same length;
K is set to 300; the larger K is, the higher the dimension of the space into which paragraphs and words are mapped, and the stronger the expressive power.
Choice of 2c: fixed-length contexts within a paragraph are generated by a sliding window; the larger 2c is, the stronger the expressive power, but the slower the convergence. The paragraph vector is shared across the contexts of a paragraph and can be regarded as the topic of the paragraph.
Second layer, the projection layer: this layer sums the 2c+1 vectors of the input layer and takes their average to obtain an intermediate K-dimensional vector X_w, which serves as the input of the Hierarchical Softmax output layer;
Third layer, the output layer: this layer is a Huffman tree whose leaf nodes are the words of the vocabulary. The non-leaf nodes (the colored nodes) correspond to the hidden-to-output parameters W' of the original DNN model; the weight of the i-th such node is denoted P_i and is a vector. The root node receives the projection-layer output X_w.
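The projection layer described above reduces to an element-wise mean of the paragraph vector and the 2c context word vectors. The following Python sketch illustrates only that averaging step; the function name `project` and the use of plain lists for vectors are choices made for this example, not part of the source.

```python
def project(paragraph_vec, context_word_vecs):
    """Average one paragraph vector and 2c word vectors into X_w.

    All vectors must share the same dimension K.
    """
    vectors = [list(paragraph_vec)] + [list(v) for v in context_word_vecs]
    k = len(vectors[0])
    if any(len(v) != k for v in vectors):
        raise ValueError("all input vectors must have the same dimension")
    # Sum the 2c+1 vectors component by component, then divide by 2c+1.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]
```

With K = 2 and c = 1, for example, `project([1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]])` averages three vectors and yields `[3.0, 4.0]`.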
Given the paragraph vector and the context word vectors, PV-DM predicts the word within the window. The specific training process is as follows:
1. Initializing a K-dimensional vector for each paragraph and for each word in the paragraphs, and constructing a Huffman tree from the word frequencies;
2. Training the paragraphs in sequence. Taking one paragraph as an example, the paragraph vector and the 2c word vectors in the context window of the center word w are input to the model, and the projection layer sums and averages them to obtain the K-dimensional intermediate vector X_w. X_w is the input of the Hierarchical Softmax output layer, entering at the root node of the Huffman tree, and travels along some path in the tree to a leaf node (the predicted current word w);
3. Since w is known, the correct path from the root node to the leaf node can be determined from the Huffman code of w, and so can the prediction that each classifier (non-leaf node) on the path should make. For example, if the code of w is "01101", then starting from the root of the Huffman tree, we want the Hierarchical Softmax probability of branch 0 at the root, computed from the intermediate vector and the root's parameters, to approach 1, the probability of branch 1 at the second level to approach 1, and so on until the leaf node is reached;
4. Continuing from step 3, the probabilities computed along the path are multiplied to obtain the probability P of w under the current network, with residual (1-P). Stochastic gradient descent is then used to adjust the parameters of the non-leaf nodes on the path (the gradients are obtained by back-propagation) so that the actual path approaches the correct path. After n iterations, on convergence, the vector representation of each paragraph and of each word in it is obtained; once the paragraph vectors are available, the similarity between papers is calculated by cosine similarity.
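The probability P of a word under Hierarchical Softmax, as described in steps 3 and 4, is the product of one binary decision per non-leaf node on the word's Huffman path. The sketch below computes only this forward probability and omits the gradient update. It assumes the common convention that branch 0 has probability sigmoid(X_w · θ) and branch 1 has probability 1 − sigmoid(X_w · θ); the source does not state which branch maps to which side, and the names `path_probability` and `node_params` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def path_probability(x_w, node_params, code):
    """Probability of reaching the leaf with Huffman code `code`.

    x_w         -- projection-layer output (list of K floats)
    node_params -- one parameter vector per non-leaf node on the path
    code        -- string of '0'/'1' branch decisions, root first
    """
    if len(node_params) != len(code):
        raise ValueError("need exactly one parameter vector per code bit")
    prob = 1.0
    for theta, bit in zip(node_params, code):
        # Assumed convention: sigmoid gives the probability of branch 0.
        p_zero = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        prob *= p_zero if bit == "0" else 1.0 - p_zero
    return prob
```

With all node parameters at zero, every decision has probability 0.5, so a five-bit code such as "01101" has probability 0.5^5 = 0.03125; training moves the node parameters so that this product approaches 1 for the observed word.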
In every training step, the weights of the non-leaf nodes in the Huffman tree, which correspond to the hidden-to-output weight parameters of the DNN model, are updated by back-propagation and stochastic gradient descent, and the paragraph vectors and word vectors are updated as well. With continued training, the PV-DM model parameters become more accurate, and so do the paragraph and word vector representations.
In PV-DM, each training step takes a small window of words from a paragraph by sliding, and the paragraph vector is shared across all training steps on the same paragraph; a paragraph therefore yields several training steps, and the input of each contains the paragraph vector.
The advantage of the PV-DM model improved with Hierarchical Softmax is that it uses a Huffman tree: words with larger weights (that is, higher frequency) sit at shallower leaves and get shorter codes, so that more frequent words are reached at lower cost.
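To make the point about code length concrete, the following sketch builds Huffman codes from word frequencies with the standard greedy merge of the two least frequent nodes. The function name `huffman_codes` and the sample frequencies are illustrative; which child receives bit 0 is an arbitrary choice here and may differ from any particular Doc2Vec implementation.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each word to its Huffman code given a {word: frequency} dict."""
    tie = count()  # tie-breaker so that heap entries never compare nodes
    heap = [(f, next(tie), word) for word, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a vocabulary word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

For the frequencies `{"the": 50, "model": 20, "vector": 10, "huffman": 5}`, the most frequent word receives a one-bit code and the rarest a three-bit code, so predicting a frequent word involves fewer binary decisions.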
Given a training word sequence w_1, w_2, w_3, …, w_T, the objective is to maximize the average log-likelihood:
$$\frac{1}{T}\sum_{t=c}^{T-c}\log p\left(w_t \mid w_{t-c}, \dots, w_{t+c}\right)$$
where p is the probability of correctly predicting the center word w_t, T is the length of the training word sequence, and c is the size of the context window, that is, the center word w_t is related to the c words before and after it.
Step 23: after the vector space is generated, cosine similarity is used to calculate the cosine of the angle between two paragraph vectors, which represents the similarity of papers between researchers; the cosine similarity is calculated as follows:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
where A and B are the vector representations of two paragraphs. The closer the cosine is to 1, the closer the angle is to 0 and the more similar the two vectors; the closer the cosine is to 0, the closer the angle is to 90 degrees and the less similar the two vectors.
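A minimal Python sketch of step 23, using the standard definition of cosine similarity (dot product divided by the product of the Euclidean norms). Returning 0.0 when either vector is all zeros is a convention chosen for this example, since the cosine is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_a * norm_b)
```

Parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` score 1 up to rounding, while orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0.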
Step three: establishing a relationship matrix among the researchers from the existing social network, whose elements mark the following relationships between researchers, and, based on this matrix, calculating the social association degree between researchers from their proportion of common friends.
This step starts from the existing social network among researchers and recommends new collaborators on that basis, on the assumption that two users with a higher proportion of common friends are more similar.
The method specifically comprises the following steps:
Step 31: establishing, from the existing social network, a relationship matrix A among n researchers, where A is an n × n matrix; if researcher u follows researcher v, then a_uv = 1, otherwise a_uv = 0;
For example, the friend list of researcher u in matrix A is the row u_a = (a_u1, a_u2, a_u3, …, a_un).
Step 32: calculating the social association degree between researchers from their proportion of common friends, where a higher proportion of common friends indicates greater similarity; the calculation formula is as follows:
where out(u) is the set of friends that researcher u points to in the social network graph, out(v) is the set of friends that researcher v points to, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends followed in the sets out(u) and out(v) respectively.
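The following sketch shows one way to compute a common-friend proportion from the relationship matrix A. The text here does not preserve the formula itself, so the normalization used below, |out(u) ∩ out(v)| / sqrt(|out(u)| · |out(v)|), is an assumption: it is a widely used form that is consistent with the symbols defined above, but it may not be the exact formula of the patent. The function names are hypothetical.

```python
import math

def out_set(A, u):
    """Set of researchers that u follows, read from row u of matrix A."""
    return {v for v, flag in enumerate(A[u]) if flag == 1 and v != u}

def social(A, u, v):
    """Assumed common-friend proportion between researchers u and v."""
    out_u, out_v = out_set(A, u), out_set(A, v)
    if not out_u or not out_v:
        return 0.0  # no followed friends: no evidence of association
    common = len(out_u & out_v)
    # Assumed normalization: geometric mean of the two out-degrees.
    return common / math.sqrt(len(out_u) * len(out_v))
```

For example, if researcher 0 follows {1, 2} and researcher 1 follows {2, 3}, they share one followed friend out of two each, giving 1 / sqrt(2 · 2) = 0.5 under this assumption.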
Step four: summing and averaging the data related to a researcher's own influence to calculate that researcher's influence.
It is generally considered that recommending more influential collaborators to a given researcher is more welcome.
The method specifically comprises the following steps: the researcher's interest value, citation count, recommendation count, read count and paper count are summed and averaged to obtain the researcher's influence.
The specific calculation formula is as follows:
$$popular(u) = \frac{InterestValue + CiteCount + RecomCount + ReadCount + ItemCount}{5}$$
where InterestValue is the interest value, CiteCount is the citation count, RecomCount is the recommendation count, ReadCount is the read count, and ItemCount is the paper count.
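Step four is a plain arithmetic mean of the five influence fields. A minimal sketch follows; the dictionary layout and the exact spelling of the `InterestValue` key are assumptions made for this example.

```python
# Field names follow step 13; the spelling "InterestValue" is assumed.
INFLUENCE_FIELDS = ("InterestValue", "CiteCount", "RecomCount", "ReadCount", "ItemCount")

def popular(profile):
    """Mean of the five influence fields of one researcher profile."""
    return sum(profile[field] for field in INFLUENCE_FIELDS) / len(INFLUENCE_FIELDS)
```

Because the raw fields live on very different scales (read counts are typically much larger than paper counts), the unweighted mean is dominated by the largest field; the normalization in step five only rescales the final influence value, not its components.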
Step five: combining the three features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model.
The method specifically comprises the following steps:
Step 51: to control the effect of the three features (text similarity, social association degree and influence) on the final result, each is normalized separately, where normalization divides each feature value by the maximum value of that feature, so that the features operate on the same order of magnitude and the parameters are easy to adjust;
Step 52: linearly combining the three normalized dimensions to calculate the similarity between researchers; the final score can be expressed by the following formula:
score(u1,u2)=α*nor(paper(u1,u2))+β*nor(social(u1,u2))+γ*nor(popular(u2))
The formula integrates the three dimensions, where paper(u1,u2) denotes the text similarity between the papers of researchers u1 and u2, social(u1,u2) denotes the social association degree between researchers u1 and u2, popular(u2) denotes the influence of researcher u2, and α, β and γ are weight parameters;
Step 53: α, β and γ are set manually and then adjusted through result feedback so as to optimize the model, finally obtaining the similarity score between candidate researcher u2 and the given researcher u1.
Specifically, the parameters are adjusted by result feedback as follows: in the test data set, the similarity between two researchers is calculated from the Skills and Topics fields of their self-description data, and this is taken as the real similarity; the parameters α, β and γ are then adjusted so that the similarity computed by the linear combination recommendation model comes closer to this real similarity.
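The scoring formula of step 52 can be sketched as follows for one given researcher u1 and a list of candidates u2. The sketch assumes all feature values are non-negative, so that dividing by the maximum maps them into [0, 1]; the function names and the example weights are illustrative, not values from the source.

```python
def max_normalize(values):
    """nor(.): divide each value by the feature's maximum (assumes values >= 0)."""
    values = list(values)
    top = max(values, default=0)
    if top == 0:
        return [0.0 for _ in values]
    return [v / top for v in values]

def combined_scores(paper, social, popular, alpha, beta, gamma):
    """score(u1,u2) for each candidate u2, given per-candidate feature lists."""
    if not (len(paper) == len(social) == len(popular)):
        raise ValueError("feature lists must have one entry per candidate")
    p = max_normalize(paper)
    s = max_normalize(social)
    q = max_normalize(popular)
    return [alpha * a + beta * b + gamma * c for a, b, c in zip(p, s, q)]
```

With weights α = 0.5, β = 0.3, γ = 0.2 and two candidates whose features are paper = [0.8, 0.4], social = [0.2, 0.4] and popular = [50, 100], both candidates score approximately 0.75: the first wins on paper similarity, the second on social association and influence. Tuning α, β and γ against the Skills and Topics similarity, as the text describes, would shift this balance.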
Step six: performing a recommendation test with the recommendation model: calculating the comprehensive similarity score between each candidate collaborator and the given researcher, ranking the candidates, generating a collaborator recommendation list for the given researcher, and returning the Top-N ranked collaborators to the given researcher.
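The ranking in step six amounts to selecting the N highest-scoring candidates. A minimal sketch using the standard library follows; the candidate identifiers and scores are made up for illustration.

```python
import heapq

def top_n(candidate_scores, n):
    """Return the n (candidate, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, candidate_scores.items(), key=lambda kv: kv[1])
```

For example, `top_n({"u2": 0.75, "u3": 0.9, "u4": 0.1}, 2)` returns `[("u3", 0.9), ("u2", 0.75)]`.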
Claims (6)
1. A scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, characterized by comprising the following steps:
(1) collecting data on researchers' published papers, social associations and own influence from the ResearchGate platform, and preprocessing it;
(2) calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model;
(3) establishing a relationship matrix among the researchers from the existing social network, whose elements mark the following relationships between researchers, and, based on this matrix, calculating the social association degree between researchers from their proportion of common friends;
(4) summing and averaging the data related to a researcher's own influence to calculate that researcher's influence;
(5) combining the three features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model;
(6) making recommendations with the recommendation model: calculating a comprehensive similarity score between each candidate collaborator and the given researcher, ranking the candidates, and generating a collaborator recommendation list for the given researcher.
2. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform according to claim 1, wherein step (1) comprises:
(11) collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields;
(12) collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields;
(13) collecting data on each researcher's own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields;
(14) collecting each researcher's self-description data on the ResearchGate platform, including the professional skills and topics fields;
(15) cleaning, denoising, deduplicating and standardizing the collected data.
3. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform as claimed in claim 1, wherein step (2) comprises:
(21) reading an English corpus consisting of paper titles and abstracts and preprocessing the data: converting letter case, checking for spelling errors and the like, and treating punctuation marks as invalid words;
(22) treating each paper's title and abstract as a paragraph, and representing each paragraph and each word in the paragraph in vector form using the distributed memory model (PV-DM) of the Doc2Vec paragraph vector model;
(23) after the vector space is generated, using cosine similarity to calculate the cosine of the angle between two paragraph vectors, which represents the similarity of papers between researchers; the cosine similarity is calculated as follows:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where A and B are the vector representations of the two paragraphs; the closer the cosine value is to 1, the closer the included angle is to 0 and the more similar the two vectors are; the closer the cosine value is to 0, the closer the angle is to 90 degrees and the more dissimilar the two vectors are.
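As a rough illustration of step (23), the cosine of the angle between two paragraph vectors can be computed as follows. The vectors here are made-up lists of floats; in the method they would come from a trained PV-DM model. The function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions of this sketch, not part of the claim.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a zero vector carries no direction, so report 0.0
        # instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical paragraph vectors for three papers.
paper_a = [1.0, 2.0, 3.0]
paper_b = [2.0, 4.0, 6.0]   # same direction as paper_a
paper_c = [3.0, 0.0, -1.0]  # orthogonal to paper_a

print(cosine_similarity(paper_a, paper_b))
print(cosine_similarity(paper_a, paper_c))
```

Parallel vectors give a value close to 1 and orthogonal vectors give 0, matching the interpretation stated in the claim.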
4. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform as claimed in claim 1, wherein step (3) comprises:
(31) according to the existing social network, establishing a relationship matrix A among n researchers, A being an n × n matrix in which $a_{uv} = 1$ if researcher u follows researcher v, and $a_{uv} = 0$ otherwise;
(32) calculating the social association degree between researchers from their proportion of common friends, where a higher proportion of common friends indicates that the two researchers are more similar; the calculation formula is as follows:
where out(u) is the set of friends that researcher u points to in the social network association graph, out(v) is the set of friends that researcher v points to in the social network association graph, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends that researchers u and v follow, that is, the sizes of the sets out(u) and out(v), respectively.
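The following sketch shows one way steps (31) and (32) could be computed from the relationship matrix. The exact formula is not reproduced in this text, so the normalization used here, |out(u) ∩ out(v)| / sqrt(|out(u)| · |out(v)|), is an assumption: it is one common way to turn a common-friend count into a proportion using only the quantities the claim defines. The function names and the example matrix are hypothetical.

```python
import math


def out_neighbors(A, u):
    """Set of researchers that u follows, read from row u of matrix A."""
    return {v for v, flag in enumerate(A[u]) if flag == 1 and v != u}


def social_association(A, u, v):
    """Common-friend proportion between u and v.

    ASSUMPTION: cosine-style normalization; the source formula may differ.
    """
    out_u, out_v = out_neighbors(A, u), out_neighbors(A, v)
    if not out_u or not out_v:
        # A researcher who follows nobody shares no friends with anyone.
        return 0.0
    return len(out_u & out_v) / math.sqrt(len(out_u) * len(out_v))


# Hypothetical 4 x 4 relationship matrix: A[u][v] == 1 if u follows v.
A = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

print(social_association(A, 0, 1))  # researchers 0 and 1 both follow 2 and 3
print(social_association(A, 0, 2))  # researcher 2 follows nobody
```

Swapping in a different normalization (for example, dividing by the size of the union) only requires changing the final return statement.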
5. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform as claimed in claim 1, wherein in step (4), the interest value, citation count, recommendation count, read count and paper count of a researcher are summed and averaged to obtain the researcher's personal influence.
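A minimal sketch of the averaging in step (4), assuming each researcher's indicators are held in a dictionary. The key names in `INFLUENCE_FIELDS` and the sample profile are hypothetical and do not reflect any actual ResearchGate data format.

```python
# Hypothetical field names for the five indicators listed in the claim.
INFLUENCE_FIELDS = ("interest", "citations", "recommendations", "reads", "papers")


def self_influence(profile):
    """Arithmetic mean of the five influence indicators of one researcher."""
    return sum(profile[field] for field in INFLUENCE_FIELDS) / len(INFLUENCE_FIELDS)


# Made-up indicator values for one researcher.
profile = {
    "interest": 120.0,
    "citations": 300,
    "recommendations": 15,
    "reads": 4500,
    "papers": 40,
}

print(self_influence(profile))  # (120 + 300 + 15 + 4500 + 40) / 5 = 995.0
```

Because the raw indicators sit on very different scales, the largest one (here the read count) dominates a plain average.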
6. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform as claimed in claim 1, wherein step (5) comprises:
(51) normalizing the text similarity, the social association degree and the influence separately, the normalization taking the ratio of each feature value to the maximum value of that feature, so that the features operate on the same order of magnitude and the weight parameters are easy to adjust;
(52) linearly combining the three normalized dimensions to calculate the similarity between researchers, the specific formula being:
$$\mathrm{score}(u_1, u_2) = \alpha \cdot \mathrm{nor}(\mathrm{paper}(u_1, u_2)) + \beta \cdot \mathrm{nor}(\mathrm{social}(u_1, u_2)) + \gamma \cdot \mathrm{nor}(\mathrm{popular}(u_2))$$
the formula integrates the factors of the three dimensions, where $\mathrm{paper}(u_1, u_2)$ denotes the text similarity between the papers of two researchers $u_1$ and $u_2$, $\mathrm{social}(u_1, u_2)$ denotes the social association between $u_1$ and $u_2$, $\mathrm{popular}(u_2)$ denotes the personal influence of researcher $u_2$, and α, β and γ are weight parameters;
(53) setting α, β and γ manually and then adjusting them based on the similarity of the researchers' professional skills and subject areas to optimize the model, finally obtaining the similarity score between the candidate researcher $u_2$ and the given researcher $u_1$.
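Steps (51) and (52) can be sketched as a max-normalization followed by a weighted sum. This is an illustrative sketch under stated assumptions: the function names, the candidate identifiers, the feature values and the weights 0.6 / 0.2 / 0.2 are all hypothetical, feature values are assumed to be non-negative, and the weight tuning of step (53) is not modeled.

```python
def normalize_by_max(values):
    """Divide every feature value by the maximum value of that feature."""
    peak = max(values.values(), default=0)
    if peak == 0:
        # Assumption: an all-zero (or empty) feature normalizes to zeros.
        return {key: 0.0 for key in values}
    return {key: value / peak for key, value in values.items()}


def combined_scores(paper, social, popular, alpha, beta, gamma):
    """Linear combination of the three normalized features per candidate.

    All three dictionaries must be keyed by the same candidate identifiers.
    """
    p = normalize_by_max(paper)
    s = normalize_by_max(social)
    q = normalize_by_max(popular)
    return {c: alpha * p[c] + beta * s[c] + gamma * q[c] for c in paper}


# Made-up feature values of two candidates relative to one given researcher.
paper = {"r2": 0.8, "r3": 0.4}
social = {"r2": 0.25, "r3": 0.5}
popular = {"r2": 500.0, "r3": 1000.0}

scores = combined_scores(paper, social, popular, alpha=0.6, beta=0.2, gamma=0.2)
print(scores)
```

With these made-up inputs, candidate r2 scores roughly 0.8 and r3 roughly 0.7, so the weight on paper similarity decides the ranking even though r3 leads on the other two features.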
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910981032.7A CN110737837B (en) | 2019-10-16 | 2019-10-16 | Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910981032.7A CN110737837B (en) | 2019-10-16 | 2019-10-16 | Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110737837A (en) | 2020-01-31 |
CN110737837B (en) | 2022-03-08 |
Family
ID=69269085
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910981032.7A Active CN110737837B (en) | 2019-10-16 | 2019-10-16 | Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110737837B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070266144A1 (en) * | 2006-05-09 | 2007-11-15 | Johan Bollen | Usage based indicators to assess the impact of scholarly works: architecture and method |
CN105260849A (en) * | 2015-10-21 | 2016-01-20 | 内蒙古科技大学 | Scientific researcher evaluation method across social networks |
CN105824922A (en) * | 2016-03-16 | 2016-08-03 | 重庆邮电大学 | Emotion classifying method fusing intrinsic feature and shallow feature |
CN107833142A (en) * | 2017-11-08 | 2018-03-23 | 广西师范大学 | Academic social networks scientific research cooperative person recommends method |
CN108427715A (en) * | 2018-01-30 | 2018-08-21 | 重庆邮电大学 | A kind of social networks friend recommendation method of fusion degree of belief |
CN109658277A (en) * | 2018-11-30 | 2019-04-19 | 华南师范大学 | A kind of science social networks friend recommendation method, system and storage medium |
CN109766431A (en) * | 2018-12-24 | 2019-05-17 | 同济大学 | A kind of social networks short text recommended method based on meaning of a word topic model |
Non-Patent Citations (7)
Title |
---|
C. YANG 等: "A Nearest Neighbor Based Personal Rank Algorithm for Collaborator Recommendation", 《2018 15TH INTERNATIONAL CONFERENCE ON SERVICE SYSTEMS AND SERVICE MANAGEMENT (ICSSSM)》 * |
吴燎原 等: "一种改进List-wise的科技论文推荐方法研究", 《计算机应用研究》 * |
曾庆旺: "基于ResearchGate的科研合作者推荐研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
杨辰: "科研社交网络平台中的合作者推荐", 《中国博士学位论文全文数据库 经济与管理科学辑》 * |
熊回香 等: "基于学术能力及合作网络的学者研究", 《情报科学》 * |
袁成哲 等: "面向学术社交网络的多维度团队推荐模型", 《计算机科学与探索》 * |
赵杨等: "国内外学术社交网络研究现状述评与思考", 《情报资料工作》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069306A (en) * | 2020-07-22 | 2020-12-11 | 中国科学院计算机网络信息中心 | Paper partner recommendation method based on author writing tree and graph neural network |
CN112069306B (en) * | 2020-07-22 | 2022-09-09 | 中国科学院计算机网络信息中心 | Paper partner recommendation method based on author writing tree and graph neural network |
CN113064965A (en) * | 2021-03-23 | 2021-07-02 | 南京航空航天大学 | Intelligent recommendation method for similar cases of civil aviation unplanned events based on deep learning |
CN116910628A (en) * | 2023-09-12 | 2023-10-20 | 联通在线信息科技有限公司 | Creator expertise portrait assessment method and system |
CN116910628B (en) * | 2023-09-12 | 2024-02-06 | 联通在线信息科技有限公司 | Creator expertise portrait assessment method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110737837B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ji et al. | Learning private neural language modeling with attentive aggregation | |
CN109635291B (en) | Recommendation method for fusing scoring information and article content based on collaborative training | |
CN108021616B (en) | Community question-answer expert recommendation method based on recurrent neural network | |
CN107133224B (en) | Language generation method based on subject word | |
EP3180742B1 (en) | Generating and using a knowledge-enhanced model | |
CN110222163B (en) | Intelligent question-answering method and system integrating CNN and bidirectional LSTM | |
CN110543557B (en) | Construction method of medical intelligent question-answering system based on attention mechanism | |
CN110516085A (en) | The mutual search method of image text based on two-way attention | |
CN109635083B (en) | Document retrieval method for searching topic type query in TED (tele) lecture | |
CN112966091B (en) | Knowledge map recommendation system fusing entity information and heat | |
CN108090229A (en) | A kind of method and apparatus that rating matrix is determined based on convolutional neural networks | |
CN110737837A (en) | Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform | |
CN110321421B (en) | Expert recommendation method for website knowledge community system and computer storage medium | |
CN109063147A (en) | Online course forum content recommendation method and system based on text similarity | |
CN111079409A (en) | Emotion classification method by using context and aspect memory information | |
CN109145083B (en) | Candidate answer selecting method based on deep learning | |
CN117236410B (en) | Trusted electronic file large language model training and reasoning method and device | |
CN112818106A (en) | Evaluation method of generating type question and answer | |
Lai et al. | Transconv: Relationship embedding in social networks | |
CN112632253A (en) | Answer extraction method and device based on graph convolution network and related components | |
CN116561251A (en) | Natural language processing method | |
CN111666374A (en) | Method for integrating additional knowledge information into deep language model | |
CN107239562A (en) | The analysis of public opinion method associated based on probability characteristics | |
CN116720519A (en) | Seedling medicine named entity identification method | |
CN116187852A (en) | Online course recommendation method based on community association and behavior feature learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |