CN110737837A - Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform - Google Patents

Publication number
CN110737837A
Authority
CN
China
Prior art keywords
researchers
scientific
research
similarity
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910981032.7A
Other languages
Chinese (zh)
Other versions
CN110737837B (en)
Inventor
张鹏程
邵孟巧
金惠颖
张雅玲
于佳男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201910981032.7A
Publication of CN110737837A
Application granted
Publication of CN110737837B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform. The method measures the association between a researcher and other researchers along three dimensions: the text similarity of the papers the researchers have published on ResearchGate, the social association between researchers, and each researcher's own influence. It builds a collaborator recommendation model by linear combination, performs Top-N recommendation, and thereby provides researchers with a personalized collaborator recommendation service.

Description

Scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform
Technical Field
The invention relates to a scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, and belongs to the technical field of recommendation systems in software engineering, data mining and text mining.
Background
However, finding useful exchanges among researchers with the same or similar research interests is a difficult task that takes up a great deal of a researcher's time, so a main problem in achieving scientific collaboration is identifying collaborators with similar research interests.
ResearchGate, currently the most popular social networking platform for researchers, was founded in 2008 and aims to promote scientific cooperation worldwide. Using ResearchGate effectively as a catalyst for contact between scientists can greatly promote scientific communication and progress. It is therefore meaningful for such a platform to help researchers find other researchers with the same or similar research interests.
Disclosure of Invention
The method measures the association between a researcher and other researchers along three dimensions: the text similarity of the researchers' published papers, the social association degree between researchers, and the researchers' own influence. It builds a collaborator recommendation model by linear combination and thereby provides researchers with a personalized collaborator recommendation service.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
A scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, comprising the following steps:
(1) collecting data on researchers' published papers, social associations and own influence on the ResearchGate platform, and preprocessing the data;
(2) calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model;
(3) building a relationship matrix among researchers from the existing social network, whose elements mark the follow relationships between researchers, and calculating the social association degree between researchers from the proportion of common friends, based on the relationship matrix;
(4) summing and averaging the influence-related data of a researcher to obtain the researcher's influence;
(5) combining the three dimensional features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model;
(6) recommending with the recommendation model: calculating the composite similarity score between each candidate collaborator and the given researcher, ranking the candidates, and generating a collaborator recommendation list for the given researcher.
Further, step (1) includes:
(11) collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields;
(12) collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields;
(13) collecting data on the researchers' own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields;
(14) collecting data on the researchers' self-description information on the ResearchGate platform, including the professional skills and topics fields;
(15) cleaning, denoising, deduplicating and standardizing the collected data.
Further, step (2) includes:
(21) reading an English corpus consisting of paper titles and abstracts and preprocessing the data, including case conversion, spell checking and similar operations, with punctuation marks treated as invalid words;
(22) treating the title and abstract of each paper as one paragraph, and representing each paragraph and each word in the paragraph in vector form using the Distributed Memory model of paragraph vectors (PV-DM) in the Doc2Vec model;
(23) after the vector space is generated, using cosine similarity, i.e. the cosine of the angle between two paragraph vectors, to represent the similarity of papers between researchers; the cosine similarity is calculated as follows:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2}\,\sqrt{\sum_{i} B_i^2}}$$
where A and B are the vector representations of two paragraphs. The closer the cosine value is to 1, the closer the angle is to 0 and the more similar the two vectors are; the closer the cosine value is to 0, the closer the angle is to 90 degrees and the less similar the two vectors are.
Further, step (3) includes:
(31) building, from the existing social network, a relationship matrix A among n researchers, where A is an n × n matrix; if researcher u follows researcher v, then a_uv = 1, otherwise a_uv = 0;
(32) calculating the social association degree between researchers from the proportion of their common friends; the higher the proportion of common friends, the more similar the researchers are; the calculation formula is as follows:
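$$\mathrm{social}(u,v) = \frac{\lvert \mathrm{out}(u) \cap \mathrm{out}(v) \rvert}{\sqrt{\lvert \mathrm{out}(u) \rvert \, \lvert \mathrm{out}(v) \rvert}}$$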
where out(u) is the set of friends that researcher u points to in the social network graph, out(v) is the set of friends that researcher v points to in the social network graph, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends in the sets out(u) and out(v), respectively.
Further, in step (4), the researcher's interest value, citation count, recommendation count, read count and paper count are added together and averaged to obtain the researcher's influence.
Further, step (5) includes:
(51) normalizing the text similarity, the social association degree and the influence separately, where normalization divides each feature value by the maximum value of that feature, so that the features operate on the same order of magnitude and the weight parameters are easier to adjust;
(52) linearly combining the three normalized dimensions to calculate the similarity between researchers, with the specific formula:
score(u1,u2) = α*nor(paper(u1,u2)) + β*nor(social(u1,u2)) + γ*nor(popular(u2))
The formula integrates the three dimensions, where paper(u1,u2) is the text similarity between the papers of researchers u1 and u2, social(u1,u2) is the social association degree between researchers u1 and u2, popular(u2) is the influence of researcher u2, and α, β and γ are weight parameters;
(53) α, β and γ are first set manually and then adjusted based on the similarity of the researchers' professional skills and topics fields, which optimizes the model and finally yields the similarity score between a candidate researcher u2 and the given researcher u1.
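The normalization and linear combination in steps (51) and (52) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the candidate identifiers, the feature values and the weights 0.5, 0.3 and 0.2 are all hypothetical, and returning zeros when a feature has no positive maximum is an assumption made here to avoid division by zero.

```python
from typing import Dict


def max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Divide every value by the maximum value of the feature (step 51)."""
    peak = max(values.values(), default=0.0)
    if peak <= 0:
        # Assumption: a feature without a positive maximum contributes nothing.
        return {k: 0.0 for k in values}
    return {k: v / peak for k, v in values.items()}


def combined_scores(paper: Dict[str, float], social: Dict[str, float],
                    popular: Dict[str, float],
                    alpha: float, beta: float, gamma: float) -> Dict[str, float]:
    """score(u1,u2) = alpha*nor(paper) + beta*nor(social) + gamma*nor(popular)."""
    p, s, q = max_normalize(paper), max_normalize(social), max_normalize(popular)
    return {c: alpha * p[c] + beta * s[c] + gamma * q[c] for c in paper}


# Hypothetical feature values of three candidates relative to one given researcher.
paper = {"v1": 0.8, "v2": 0.4, "v3": 0.2}
social = {"v1": 0.1, "v2": 0.5, "v3": 0.0}
popular = {"v1": 120.0, "v2": 60.0, "v3": 240.0}
scores = combined_scores(paper, social, popular, alpha=0.5, beta=0.3, gamma=0.2)
```

Dividing by the per-feature maximum keeps all three terms in the range 0 to 1 for non-negative inputs, so the weights alone decide how much each dimension contributes.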
Beneficial effects: the personalized scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform measures the association between a researcher and other researchers along three dimensions: the text similarity of the researchers' published papers, the social association degree between researchers, and the researchers' own influence. It builds a collaborator recommendation model by linear combination, provides researchers with a personalized collaborator recommendation service, and eases the difficulty of finding collaborators.
Drawings
FIG. 1 is a schematic diagram of the personalized scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform according to the present invention;
FIG. 2 is a diagram of the Doc2Vec network architecture.
Detailed Description
The personalized scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform mainly comprises the following six steps: (1) collecting data on researchers' published papers, social associations and own influence on the ResearchGate platform, and preprocessing the data; (2) calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model; (3) building a relationship matrix among researchers from the existing social network, whose elements mark the follow relationships between researchers, and calculating the social association degree between researchers from the proportion of common friends, based on the relationship matrix; (4) summing and averaging the influence-related data of a researcher to obtain the researcher's influence; (5) combining the three dimensional features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model; (6) running a recommendation test with the recommendation model, calculating the composite similarity score between each candidate collaborator and the given researcher, ranking the candidates, and generating a collaborator recommendation list for the given researcher.
The specific implementation process of each step is described in detail as follows:
Step one: collecting data on researchers' published papers, social associations and own influence on the ResearchGate platform, and preprocessing the data.
The method specifically comprises the following steps:
Step 11: collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields, denoted Title and Abstract;
Step 12: collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields, denoted Following and Followers;
Step 13: collecting data on the researchers' own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields, denoted InterestValue, CiteCount, RecomCount, ReadCount and ItemCount respectively;
Step 14: collecting data on the researchers' self-description information on the ResearchGate platform, including the professional skills and topics fields, denoted Skills and Topics; these fields can be used to tune the parameters of the linear combination recommendation model;
Step 15: cleaning, denoising, deduplicating and standardizing the collected data.
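The patent does not spell out how step 15 is carried out, so the following is only a sketch of one plausible reading for paper records. It assumes each record is a dictionary with the Title and Abstract fields named above; the function name `preprocess`, the whitespace rule, the "drop records without a title" rule and the case-insensitive duplicate check are illustrative choices, not the patent's procedure.

```python
from typing import Dict, List


def preprocess(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Clean, deduplicate and standardize raw paper records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize: collapse whitespace runs and trim both text fields.
        title = " ".join(rec.get("Title", "").split())
        abstract = " ".join(rec.get("Abstract", "").split())
        if not title:
            # Denoise: a record without a title is treated as unusable.
            continue
        key = title.lower()  # case-insensitive duplicate detection
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"Title": title, "Abstract": abstract})
    return cleaned


raw = [
    {"Title": "  Deep  Learning for QoS ", "Abstract": "We study ..."},
    {"Title": "deep learning for qos", "Abstract": "We study ..."},
    {"Title": "", "Abstract": "orphan abstract"},
]
papers = preprocess(raw)
```

In this toy input the second record is a duplicate of the first and the third has no title, so a single cleaned record remains.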
Step two: calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model.
This step uses the similarity of the papers of two researchers as the similarity between the researchers, on the premise that the more similar the papers are, the more similar the researchers' research interests are.
The method specifically comprises the following steps:
Step 21: reading an English corpus consisting of paper titles and abstracts and preprocessing the data, including case conversion, spell checking and similar operations, with punctuation marks treated as invalid words;
Step 22: treating the title and abstract of each paper as one paragraph, and representing each paragraph and each word in the paragraph in vector form using the Distributed Memory model of paragraph vectors (PV-DM) in the Doc2Vec model;
The procedure is explained with reference to FIG. 2. Here a PV-DM model improved with hierarchical Softmax is used to build a three-layer neural network consisting of an input layer, a projection layer (hidden layer) and an output layer. Assume a sample (Context(w), w), where Context(w), made up of the c words before and the c words after the center word w, is the input sample train_X, and the center word w is the output value train_Y.
First layer, input layer: the input of this layer is a randomly initialized K-dimensional paragraph vector V(Context(para)) and the 2c word vectors V(Context(w)_1), V(Context(w)_2), ..., V(Context(w)_2c) of the words in Context(w); all vectors have the same length;
K is set to 300. The larger K is, the higher the dimension of the space into which paragraphs and words are mapped, and the stronger the expressive power.
Choice of 2c: fixed-length contexts within a paragraph are generated with a sliding window. The larger 2c is, the stronger the expressive power, but the slower the convergence. The paragraph vector is shared across these contexts and can be regarded as the topic of the paragraph.
Second layer, projection layer: this layer sums the 2c+1 vectors of the input layer and takes their average to obtain an intermediate vector X_w (K-dimensional), which serves as the input of the hierarchical Softmax output layer;
Third layer, output layer: this layer is a Huffman tree whose leaf nodes are the words of the vocabulary. The non-leaf nodes (colored nodes) correspond to the hidden-to-output parameters W' of the original DNN model; the weight of each such node is denoted P_i and is a vector, and the root node receives the output X_w of the projection layer.
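The projection layer described above reduces to an element-wise average of the paragraph vector and the 2c context word vectors. The sketch below shows only that averaging; the function name `project` and the toy vectors are hypothetical, and K = 3 is used purely to keep the example readable (the text sets K to 300).

```python
from typing import List


def project(paragraph_vec: List[float], context_vecs: List[List[float]]) -> List[float]:
    """Average the paragraph vector and the 2c context word vectors into X_w."""
    vectors = [paragraph_vec] + context_vecs
    k = len(paragraph_vec)
    if any(len(v) != k for v in vectors):
        raise ValueError("all input vectors must have the same length K")
    # Element-wise mean over the 2c + 1 input vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


# Toy example with K = 3 and c = 1, i.e. two context word vectors.
x_w = project([0.3, 0.0, 0.6], [[0.0, 0.3, 0.0], [0.6, 0.0, 0.3]])
```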
Given the paragraph vector and the context word vectors, PV-DM predicts the word within the window. The specific training process is as follows:
1. initialize a K-dimensional vector for each paragraph and for each word in the paragraphs, and build a Huffman tree according to word frequency;
2. train the paragraphs in sequence. Taking one paragraph as an example, the paragraph vector and the 2c word vectors in the context window of a center word w are fed into the model, and the projection layer sums and averages them to obtain the K-dimensional intermediate vector X_w. X_w is the input of the hierarchical Softmax output layer, i.e. it enters at the root node of the Huffman tree, and it reaches a leaf node (the predicted current word w) along some path in the Huffman tree;
3. since w is known, the correct path from the root node to the leaf node can be determined from the Huffman code of w, and the prediction that each classifier (non-leaf node) on the path should make is therefore also determined. For example, if the code of w is "01101", then starting from the root node of the Huffman tree we want the probability, computed by hierarchical Softmax, that the intermediate vector is classified as 0 at the root node to be close to 1, the probability that it is classified as 1 at the second layer to be close to 1, and so on until the leaf node is reached;
4. after step 3, multiply the probabilities computed along the path to obtain the probability P of w under the current network; the residual is (1 - P). Then adjust the parameters of the non-leaf nodes on the path by stochastic gradient descent (with the gradient obtained by back propagation) so that the actual path approaches the correct path. After n iterations, once training has converged, the vector representation of each paragraph and of each word in the paragraph is obtained. With the paragraph vectors available, the similarity between papers is computed as cosine similarity.
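Steps 3 and 4 above can be illustrated with a small forward-pass sketch that multiplies the branch probabilities along a word's Huffman path. It is not a training loop and makes one assumption the text does not fix: at each non-leaf node, the sigmoid of the dot product between the node weight vector and X_w is taken as the probability of branch "1", and one minus that value as the probability of branch "0". The function names and the toy weights are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def path_probability(x_w: List[float], node_weights: List[List[float]], code: str) -> float:
    """Multiply the branch probabilities along the Huffman path of a word.

    Assumed convention: sigmoid(theta . x_w) is the probability of branch '1'
    at a non-leaf node, and 1 - sigmoid(theta . x_w) that of branch '0'.
    """
    if len(node_weights) != len(code):
        raise ValueError("one weight vector is needed per code bit")
    prob = 1.0
    for theta, bit in zip(node_weights, code):
        p_one = sigmoid(sum(t * x for t, x in zip(theta, x_w)))
        prob *= p_one if bit == "1" else 1.0 - p_one
    return prob


# With all-zero node weights every branch has probability 0.5,
# so the five-bit code "01101" gets probability 0.5 ** 5.
p = path_probability([0.3, 0.1, 0.3], [[0.0, 0.0, 0.0]] * 5, "01101")
residual = 1.0 - p
```

Training would then move the node weights and the input vectors so that `p` for the correct path grows, which is the role of the stochastic gradient descent update in step 4.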
In every training pass, back propagation and stochastic gradient descent update the weights of the non-leaf nodes of the Huffman tree, i.e. the hidden-to-output weight parameters of the DNN model, and the paragraph vectors and word vectors are updated as well. With continued training, the PV-DM model parameters become increasingly accurate, as do the paragraph and word vector representations.
In each training pass, PV-DM takes a small window of words from a paragraph by sliding, and the paragraph vector is shared across the training passes of the same paragraph. A paragraph therefore goes through several training passes, and the input of each pass contains its paragraph vector.
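The sliding-window sampling can be sketched as below, together with the simple text preprocessing of step 21 (case conversion and dropping punctuation). The function names are hypothetical, spell checking is left out, and skipping center words that lack c neighbours on both sides is an assumption; an implementation could pad the paragraph boundaries instead.

```python
import string
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and drop punctuation, which is treated as invalid."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def windows(tokens: List[str], c: int) -> List[Tuple[List[str], str]]:
    """Return (context, center) pairs with c words on each side of the center."""
    samples = []
    for t in range(c, len(tokens) - c):
        context = tokens[t - c:t] + tokens[t + 1:t + c + 1]
        samples.append((context, tokens[t]))
    return samples


paragraph = "Doc2Vec maps titles, and abstracts, to vectors."
samples = windows(tokenize(paragraph), c=2)
```

Every sample produced from one paragraph would be paired with that paragraph's single shared vector when fed to the input layer.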
The advantage of the PV-DM model improved with hierarchical Softmax is that it uses a Huffman tree, in which words with larger weight (i.e. higher frequency) sit at shallower leaves and get shorter codes, so more frequent words are found at lower cost.
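The effect of the Huffman tree on code length can be seen in a small standalone sketch that builds codes from word frequencies with a priority queue. The function name, the example vocabulary and its frequencies are hypothetical, and which child receives "0" or "1" is an arbitrary choice here.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Build Huffman codes from word frequencies; frequent words get short codes."""
    if not freq:
        return {}
    if len(freq) == 1:
        return {w: "0" for w in freq}
    tie = count()  # unique tie-breaker so the heap never compares the dict payloads
    heap = [(f, next(tie), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new parent node.
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in left.items()}
        merged.update({w: "1" + code for w, code in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


codes = huffman_codes({"research": 50, "model": 20, "vector": 20, "huffman": 5, "cosine": 5})
```

With these frequencies the most frequent word ends up one step below the root, while the two rarest words sit four levels deep, so predicting a frequent word touches fewer non-leaf classifiers.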
Given a training word sequence w_1, w_2, w_3, ..., w_T, the objective function is to maximize the average log-likelihood, as follows:
$$\frac{1}{T}\sum_{t=c}^{T-c}\log p\left(w_t \mid w_{t-c},\ldots,w_{t+c}\right)$$
where p is the probability of correctly predicting the center word w_t, T is the length of the training word sequence, and c is the size of the context window, i.e. the center word w_t is associated with the c context words on each side of it.
Step 23: after the vector space is generated, cosine similarity, i.e. the cosine of the angle between two paragraph vectors, is used to represent the similarity of papers between researchers. The cosine similarity is calculated as follows:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{K} A_i B_i}{\sqrt{\sum_{i=1}^{K} A_i^2}\,\sqrt{\sum_{i=1}^{K} B_i^2}}$$
where A and B are the vector representations of two paragraphs. The closer the cosine value is to 1, the closer the angle is to 0 and the more similar the two vectors are; the closer the cosine value is to 0, the closer the angle is to 90 degrees and the less similar the two vectors are.
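A minimal sketch of the cosine similarity between two paragraph vectors follows. The function name is hypothetical, and returning 0 for a zero-length vector is an assumption made here so that the function is defined for every input.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Here `same_direction` is close to 1 because the second vector is a scaled copy of the first, and `orthogonal` is 0 because the vectors are at 90 degrees.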
Step three: building a relationship matrix among researchers from the existing social network, whose elements mark the follow relationships between researchers, and calculating the social association degree between researchers from the proportion of common friends, based on the relationship matrix.
This step starts from the existing social network among researchers and recommends new collaborators on that basis, on the premise that two users with a higher proportion of common friends are more similar.
The method specifically comprises the following steps:
Step 31: building, from the existing social network, a relationship matrix A among n researchers, where A is an n × n matrix; if researcher u follows researcher v, then a_uv = 1, otherwise a_uv = 0;
For example, the friend list of researcher u in matrix A is: u_a = (a_u1, a_u2, a_u3, ..., a_un)
Step 32: calculating the social association degree between researchers from the proportion of their common friends; the higher the proportion of common friends, the more similar the researchers are; the calculation formula is as follows:
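$$\mathrm{social}(u,v) = \frac{\lvert \mathrm{out}(u) \cap \mathrm{out}(v) \rvert}{\sqrt{\lvert \mathrm{out}(u) \rvert \, \lvert \mathrm{out}(v) \rvert}}$$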
where out(u) is the set of friends that researcher u points to in the social network graph, out(v) is the set of friends that researcher v points to in the social network graph, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends in the sets out(u) and out(v), respectively.
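Steps 31 and 32 can be sketched together as follows. The relationship matrix is built from hypothetical Following lists, and out(u) is read off row u of the matrix. The formula image is missing from this text, so the denominator used here, the square root of |out(u)| times |out(v)|, is an assumption chosen to match the symbols the text defines; the function names and the user identifiers are also illustrative.

```python
import math
from typing import Dict, List, Set


def relation_matrix(users: List[str], following: Dict[str, Set[str]]) -> List[List[int]]:
    """A[u][v] = 1 if users[u] follows users[v], otherwise 0."""
    return [[1 if v in following.get(u, set()) else 0 for v in users] for u in users]


def social(a: List[List[int]], u: int, v: int) -> float:
    """Proportion of common followees of u and v (assumed denominator, see above)."""
    out_u = {j for j, flag in enumerate(a[u]) if flag}
    out_v = {j for j, flag in enumerate(a[v]) if flag}
    if not out_u or not out_v:
        return 0.0  # assumption: no followees means no measurable association
    return len(out_u & out_v) / math.sqrt(len(out_u) * len(out_v))


users = ["u", "v", "x", "y", "z"]
following = {"u": {"x", "y"}, "v": {"x", "y", "z"}, "x": {"u"}}
A = relation_matrix(users, following)
s = social(A, 0, 1)  # u and v share the followees x and y
```

Row 0 of `A` is the friend list u_a of researcher u described in step 31.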
Step four: summing and averaging the influence-related data of a researcher to obtain the researcher's influence.
It is generally considered that recommending more influential collaborators to a given researcher is more welcome.
The method specifically comprises the following steps: the researcher's interest value, citation count, recommendation count, read count and paper count are added together and averaged to obtain the researcher's influence.
The specific calculation formula is as follows:
$$\mathrm{popular}(u) = \frac{\mathrm{InterestValue} + \mathrm{CiteCount} + \mathrm{RecomCount} + \mathrm{ReadCount} + \mathrm{ItemCount}}{5}$$
where InterestValue is the interest value, CiteCount is the citation count, RecomCount is the recommendation count, ReadCount is the read count, and ItemCount is the paper count.
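The influence calculation is a plain average of the five fields, which the sketch below reproduces. The `Profile` class, the function name `popular` (borrowed from the score formula) and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Influence-related fields of one researcher (hypothetical container)."""
    InterestValue: float
    CiteCount: int
    RecomCount: int
    ReadCount: int
    ItemCount: int


def popular(p: Profile) -> float:
    """Average of the five influence fields."""
    total = p.InterestValue + p.CiteCount + p.RecomCount + p.ReadCount + p.ItemCount
    return total / 5


influence = popular(Profile(InterestValue=35.5, CiteCount=120, RecomCount=4,
                            ReadCount=800, ItemCount=40))
```

Because the fields are averaged without rescaling, the field with the largest raw magnitude (the read count in this example) contributes most of the result.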
Step five: combining the three dimensional features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model.
The method specifically comprises the following steps:
Step 51: to control the effect of the three features (text similarity, social association degree and influence) on the final result, each of them is normalized. Normalization divides each feature value by the maximum value of that feature, so that the features operate on the same order of magnitude and the parameters are easier to adjust;
Step 52: linearly combining the three normalized dimensions gives a method for calculating the similarity between researchers; the final score can be expressed by the following formula:
score(u1,u2) = α*nor(paper(u1,u2)) + β*nor(social(u1,u2)) + γ*nor(popular(u2))
The formula integrates the three dimensions, where paper(u1,u2) is the text similarity between the papers of researchers u1 and u2, social(u1,u2) is the social association degree between researchers u1 and u2, popular(u2) is the influence of researcher u2, and α, β and γ are weight parameters;
Step 53: α, β and γ are first set manually and then adjusted through result feedback, which optimizes the model and finally yields the similarity score between a candidate researcher u2 and the given researcher u1.
Specifically, the parameters are adjusted from result feedback as follows. On the test data set, the similarity between two researchers is computed from the professional skills (Skills) and topics (Topics) fields of their self-description data. The parameters α, β and γ are then adjusted so that the similarity computed by the linear combination recommendation model comes closer to this real similarity, i.e. the similarity computed from the Skills and Topics fields.
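The text describes the weight adjustment only as feedback-driven tuning toward the Skills/Topics similarity. One way to make that concrete is a small grid search, sketched below under several assumptions that are not in the patent: the weights are constrained to sum to 1, the fit is measured by squared error, the grid step is 0.1, and the reference similarities are taken as already computed. All names and numbers are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Pair = Tuple[str, str]


def tune_weights(features: Dict[Pair, Tuple[float, float, float]],
                 reference: Dict[Pair, float],
                 step: float = 0.1) -> Tuple[float, float, float]:
    """Grid-search (alpha, beta, gamma) with alpha + beta + gamma = 1.

    features maps a researcher pair to its normalized (paper, social, popular)
    values; reference maps the same pair to the Skills/Topics similarity.
    """
    n = round(1 / step)
    best, best_err = (1.0, 0.0, 0.0), float("inf")
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue
        a, b, g = i * step, j * step, (n - i - j) * step
        # Squared error between the model score and the reference similarity.
        err = sum((a * p + b * s + g * q - reference[pair]) ** 2
                  for pair, (p, s, q) in features.items())
        if err < best_err:
            best, best_err = (a, b, g), err
    return best


features = {("u1", "v1"): (1.0, 0.2, 0.5),
            ("u1", "v2"): (0.5, 1.0, 0.25),
            ("u1", "v3"): (0.25, 0.0, 1.0)}
reference = {("u1", "v1"): 0.66, ("u1", "v2"): 0.60, ("u1", "v3"): 0.325}
alpha, beta, gamma = tune_weights(features, reference)
```

The toy reference values were generated with weights 0.5, 0.3 and 0.2, so the search recovers approximately those weights; real Skills/Topics similarities would not fit exactly.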
Step six: running a recommendation test with the recommendation model: calculating the composite similarity score between each candidate collaborator and the given researcher, ranking the candidates, generating a collaborator recommendation list for the given researcher, and returning the Top-N ranked collaborators to the given researcher.
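The final ranking step can be sketched with a heap-based selection. The function name, the candidate identifiers and the scores are hypothetical; the scores stand in for the composite similarity scores from step five.

```python
import heapq
from typing import Dict, List, Tuple


def top_n(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-scoring candidates, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical composite scores of four candidates for one given researcher.
scores = {"v1": 0.66, "v2": 0.60, "v3": 0.325, "v4": 0.71}
recommended = top_n(scores, 2)
```

`heapq.nlargest` avoids sorting the full candidate list when N is small compared with the number of candidates.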

Claims (6)

1. A scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform, characterized by comprising the following steps:
(1) collecting data on researchers' published papers, social associations and own influence on the ResearchGate platform, and preprocessing the data;
(2) calculating the text similarity of papers between researchers using the Doc2Vec deep text representation model;
(3) building a relationship matrix among researchers from the existing social network, whose elements mark the follow relationships between researchers, and calculating the social association degree between researchers from the proportion of common friends, based on the relationship matrix;
(4) summing and averaging the influence-related data of a researcher to obtain the researcher's influence;
(5) combining the three dimensional features (paper text similarity, social association degree and own influence) to construct a linear combination recommendation model;
(6) recommending with the recommendation model: calculating the composite similarity score between each candidate collaborator and the given researcher, ranking the candidates, and generating a collaborator recommendation list for the given researcher.
2. The scientific research collaborator recommendation method based on multi-dimensional features on the ResearchGate platform as claimed in claim 1, wherein step (1) comprises:
(11) collecting data on the papers published by researchers on the ResearchGate platform, including the paper title and abstract fields;
(12) collecting data on the social associations among researchers on the ResearchGate platform, including the following and followers fields;
(13) collecting data on the researchers' own influence on the ResearchGate platform, including the interest value, citation count, recommendation count, read count and paper count fields;
(14) collecting data on the researchers' self-description information on the ResearchGate platform, including the professional skills and topics fields;
(15) cleaning, denoising, deduplicating and standardizing the collected data.
3. The scientific research collaborator recommendation method based on multi-dimensional features under the ResearchGate platform as claimed in claim 1, wherein step (2) comprises:
(21) reading an English corpus consisting of paper titles and abstracts and preprocessing the data, including case conversion and spell checking, with punctuation marks treated as invalid words;
(22) treating each paper's title and abstract as a paragraph, and representing each paragraph and each word in it as a vector using the Distributed Memory model of Paragraph Vectors (PV-DM) of Doc2Vec;
(23) after the vector space is generated, using cosine similarity, i.e. the cosine of the angle between two paragraph vectors, to represent the similarity of papers between scientific researchers; the cosine similarity is calculated as follows:
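$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$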
where A and B are the vector representations of the two paragraphs; the closer the cosine value is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are; the closer the cosine value is to 0, the closer the angle is to 90 degrees and the less similar the two vectors are.
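The cosine similarity of step (23) can be sketched in a few lines. This sketch assumes the two paragraph vectors have already been produced by a trained Doc2Vec model and are available as plain sequences of floats; the function name `cosine_similarity` and the zero-vector handling are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two paragraph vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector carries no direction, so report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Parallel vectors give 1, orthogonal vectors give 0.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the result depends only on direction, scaling a paragraph vector does not change its similarity to any other vector.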
4. The scientific research collaborator recommendation method based on multi-dimensional features under the ResearchGate platform as claimed in claim 1, wherein step (3) comprises:
(31) according to the existing social network, establishing a relationship matrix A among n scientific researchers, A being an n × n matrix, where a_uv = 1 if researcher u follows researcher v, and a_uv = 0 otherwise;
(32) calculating the social association degree between scientific researchers according to the proportion of friends they have in common, a higher proportion of common friends indicating that the two researchers are more similar; the calculation formula is as follows:
wherein out(u) is the set of friends that researcher u points to in the social network association graph, out(v) is the set of friends that researcher v points to, out(u) ∩ out(v) is the intersection of the two sets, and |out(u)| and |out(v)| are the numbers of friends in the sets out(u) and out(v) respectively.
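A small sketch of steps (31) and (32) follows. The formula itself is not reproduced in this text, so the normalization used here, the number of common friends divided by the geometric mean of |out(u)| and |out(v)|, is an assumption chosen only because it uses exactly the quantities the claim defines; the actual denominator may differ. The helper names `out_set` and `social_association` are hypothetical.

```python
import math
from typing import List, Set


def out_set(a: List[List[int]], u: int) -> Set[int]:
    """Researchers that u follows, read from row u of the relationship matrix."""
    return {v for v, flag in enumerate(a[u]) if flag == 1 and v != u}


def social_association(a: List[List[int]], u: int, v: int) -> float:
    """Proportion of common friends between u and v.

    ASSUMPTION: the denominator sqrt(|out(u)| * |out(v)|) is illustrative;
    the source does not show the exact formula.
    """
    out_u, out_v = out_set(a, u), out_set(a, v)
    if not out_u or not out_v:
        return 0.0  # no followed researchers, hence no common friends
    common = len(out_u & out_v)
    return common / math.sqrt(len(out_u) * len(out_v))


# Relationship matrix for n = 4 researchers: a[u][v] == 1 means u follows v.
relation = [
    [0, 1, 1, 1],  # researcher 0 follows 1, 2 and 3
    [0, 0, 1, 1],  # researcher 1 follows 2 and 3
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
degree_01 = social_association(relation, 0, 1)
```

In this example researchers 0 and 1 share two followed researchers (2 and 3), so `degree_01` is 2 divided by the square root of 3 × 2.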
5. The scientific research collaborator recommendation method based on multi-dimensional features under the ResearchGate platform as claimed in claim 1, wherein in step (4) a scientific researcher's interest value, citation count, recommendation count, read count and paper count are summed and averaged to obtain that researcher's personal influence.
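The averaging in step (4) can be sketched as an arithmetic mean over the five indicators. The field names in `INFLUENCE_FIELDS` and the sample numbers are hypothetical; only the sum-then-average rule comes from the claim.

```python
from typing import Mapping

# Hypothetical keys for the five influence indicators named in the claim.
INFLUENCE_FIELDS = ("interest", "citations", "recommendations", "reads", "papers")


def influence(metrics: Mapping[str, float]) -> float:
    """Arithmetic mean of the five influence indicators of one researcher."""
    return sum(metrics[field] for field in INFLUENCE_FIELDS) / len(INFLUENCE_FIELDS)


# Illustrative profile; the values are made up for the example.
profile = {
    "interest": 10.0,
    "citations": 40.0,
    "recommendations": 5.0,
    "reads": 100.0,
    "papers": 20.0,
}
profile_influence = influence(profile)  # (10 + 40 + 5 + 100 + 20) / 5
```

Since the raw indicators live on different scales, the largest one (here the read count) dominates the mean; the result is only brought onto a common scale later, when it is divided by the maximum influence value.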
6. The scientific research collaborator recommendation method based on multi-dimensional features under the ResearchGate platform as claimed in claim 1, wherein step (5) comprises:
(51) normalizing the text similarity, the social association degree and the influence separately, the normalization dividing each feature value by the maximum value of that feature, so that the features operate on the same order of magnitude and the weight parameters are easy to adjust;
(52) linearly combining the three normalized dimensions to calculate the similarity between scientific researchers, the specific formula being as follows:
score(u1,u2)=α*nor(paper(u1,u2))+β*nor(social(u1,u2))+γ*nor(popular(u2))
the formula integrates the factors of three dimensions, where paper(u1,u2) denotes the text similarity between the papers of scientific researchers u1 and u2, social(u1,u2) denotes the social association degree between u1 and u2, popular(u2) denotes the personal influence of scientific researcher u2, and α, β and γ are weight parameters;
(53) α, β and γ are first set manually and then adjusted based on the similarity of the researchers' professional skills and subject fields to optimize the model, finally yielding the similarity score between candidate researcher u2 and the given researcher u1.
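The max-normalization of step (51) and the linear combination of step (52) can be sketched together. The sketch assumes each feature is supplied as a dictionary keyed by candidate id for one fixed given researcher u1; the function names, the candidate ids and the weights 0.5, 0.3 and 0.2 are hypothetical and are not values from the patent.

```python
from typing import Dict


def normalize_by_max(values: Dict[str, float]) -> Dict[str, float]:
    """nor(x): divide every value of one feature by that feature's maximum."""
    peak = max(values.values(), default=0.0)
    if peak == 0:
        # Assumption: if the feature is zero for everyone, keep it at zero.
        return {key: 0.0 for key in values}
    return {key: value / peak for key, value in values.items()}


def combined_scores(
    paper: Dict[str, float],
    social: Dict[str, float],
    popular: Dict[str, float],
    alpha: float,
    beta: float,
    gamma: float,
) -> Dict[str, float]:
    """score = alpha*nor(paper) + beta*nor(social) + gamma*nor(popular) per candidate."""
    nor_paper = normalize_by_max(paper)
    nor_social = normalize_by_max(social)
    nor_popular = normalize_by_max(popular)
    return {
        c: alpha * nor_paper[c] + beta * nor_social[c] + gamma * nor_popular[c]
        for c in paper
    }


# Illustrative feature values of two candidates relative to one given researcher.
paper = {"a": 0.8, "b": 0.4}
social = {"a": 0.2, "b": 0.5}
popular = {"a": 30.0, "b": 60.0}
scores = combined_scores(paper, social, popular, alpha=0.5, beta=0.3, gamma=0.2)
```

With these made-up inputs candidate `a` scores about 0.72 and candidate `b` about 0.75, which shows how a candidate with lower text similarity can still rank first once social association and influence are weighted in.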

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910981032.7A CN110737837B (en) 2019-10-16 2019-10-16 Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910981032.7A CN110737837B (en) 2019-10-16 2019-10-16 Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform

Publications (2)

Publication Number Publication Date
CN110737837A (en) 2020-01-31
CN110737837B (en) 2022-03-08

Family

ID=69269085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910981032.7A Active CN110737837B (en) 2019-10-16 2019-10-16 Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform

Country Status (1)

Country Link
CN (1) CN110737837B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070266144A1 (en) * 2006-05-09 2007-11-15 Johan Bollen Usage based indicators to assess the impact of scholarly works: architecture and method
CN105260849A (en) * 2015-10-21 2016-01-20 内蒙古科技大学 Scientific researcher evaluation method across social networks
CN105824922A (en) * 2016-03-16 2016-08-03 重庆邮电大学 Emotion classifying method fusing intrinsic feature and shallow feature
CN107833142A (en) * 2017-11-08 2018-03-23 广西师范大学 Academic social networks scientific research cooperative person recommends method
CN108427715A (en) * 2018-01-30 2018-08-21 重庆邮电大学 A kind of social networks friend recommendation method of fusion degree of belief
CN109658277A (en) * 2018-11-30 2019-04-19 华南师范大学 A kind of science social networks friend recommendation method, system and storage medium
CN109766431A (en) * 2018-12-24 2019-05-17 同济大学 A kind of social networks short text recommended method based on meaning of a word topic model

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
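C. YANG et al.: "A Nearest Neighbor Based Personal Rank Algorithm for Collaborator Recommendation", 《2018 15TH INTERNATIONAL CONFERENCE ON SERVICE SYSTEMS AND SERVICE MANAGEMENT (ICSSSM)》 *
吴燎原 et al.: "Research on an improved list-wise recommendation method for scientific papers", 《计算机应用研究》 *
曾庆旺: "Research and implementation of scientific research collaborator recommendation based on ResearchGate", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
杨辰: "Collaborator recommendation in research social network platforms", 《中国博士学位论文全文数据库 经济与管理科学辑》 *
熊回香 et al.: "Research on scholars based on academic ability and collaboration networks", 《情报科学》 *
袁成哲 et al.: "A multi-dimensional team recommendation model for academic social networks", 《计算机科学与探索》 *
赵杨 et al.: "Review and reflections on the state of research on academic social networks in China and abroad", 《情报资料工作》 *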

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069306A (en) * 2020-07-22 2020-12-11 中国科学院计算机网络信息中心 Paper partner recommendation method based on author writing tree and graph neural network
CN112069306B (en) * 2020-07-22 2022-09-09 中国科学院计算机网络信息中心 Paper partner recommendation method based on author writing tree and graph neural network
CN113064965A (en) * 2021-03-23 2021-07-02 南京航空航天大学 Intelligent recommendation method for similar cases of civil aviation unplanned events based on deep learning
CN116910628A (en) * 2023-09-12 2023-10-20 联通在线信息科技有限公司 Creator expertise portrait assessment method and system
CN116910628B (en) * 2023-09-12 2024-02-06 联通在线信息科技有限公司 Creator expertise portrait assessment method and system

Also Published As

Publication number Publication date
CN110737837B (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant