CN106886571A - A method for predicting scientific cooperation sustainability based on social network analysis - Google Patents

Publication number
CN106886571A
Authority
CN
China
Prior art keywords
cooperation
sustainability
scholar
module
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710030918.4A
Other languages
Chinese (zh)
Inventor
夏锋
王伟
崔自鑫
高桐
孔祥杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201710030918.4A
Publication of CN106886571A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 - Fuzzy queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Automation & Control Theory (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for predicting the sustainability of scientific cooperation based on social network analysis. The problem is divided into predicting how long a cooperation lasts (cooperation duration) and predicting how many times it recurs (cooperation times). The personal attributes and social attributes of two scholars at the time of their first cooperation are computed and fed into a cooperation sustainability prediction model, which returns the predicted cooperation sustainability. A preprocessing module computes the required attribute data from raw records extracted from a real collection of computer science papers and normalizes them. A training module builds the model from a tree ensemble trained by gradient descent and tunes its parameters on the training set to make the predictions more accurate. A prediction module applies the tuned model to the data in the prediction set. An evaluation module compares actual and predicted results and applies a "jackknife" procedure to assess the overall performance of the prediction model and the degree to which each chosen input factor influences its predictions.

Description

A method for predicting scientific cooperation sustainability based on social network analysis
Technical field
The present invention relates to a method for predicting the sustainability of scientific cooperation between scholars, and more particularly to a scientific cooperation sustainability prediction method based on social network analysis.
Background art
With the rapid development of science and technology, more and more scholars choose to solve complex scientific problems together. By cooperating, scholars complement each other's strengths and weaknesses, improve research efficiency, shorten research time and make the research process more rigorous, which is a win-win outcome. Cooperation helps scholars carry out research and analysis more efficiently, whereas working alone is gradually being abandoned by academia because of the limitations of individual subjective thinking and its higher error rate. As scientific cooperation becomes more widespread, people have begun to study the mechanism of cooperation and to look for regularities in it. In academia, two scholars may cooperate more than once, that is, the cooperative relationship between scholars has a certain sustainability. Studying how two scholars turn from strangers into partners therefore makes it possible to predict the sustainability of their cooperation and to recommend more suitable partners, which promotes scientific cooperation and the progress of science and technology.
Accurately predicting the sustainability of scientific cooperation is difficult for three main reasons. First, the volume of academic data is so large that it is hard to obtain all the data required. Second, the continuation of a cooperation is subject to chance and uncertainty, follows a long-tailed distribution rather than a linear regression pattern, and is unevenly distributed over time, which makes a prediction model hard to build. Third, the factors that influence scientific cooperation sustainability have not yet been determined, and interactions between those factors can also interfere with the prediction results.
At present, no analysis has clearly demonstrated which factors influence the sustainability of cooperation between scholars, yet objective individual differences in that sustainability exist. For the cooperation mechanism between specific researchers, we pose the problem of predicting scientific cooperation sustainability and build a prediction model from a boosted tree ensemble. On this basis, the present invention proposes a scientific cooperation sustainability prediction method based on social network analysis.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a method that predicts cooperation sustainability from the interpersonal relationship network and the personal attributes of scholars. Specifically, the sustainability of scientific cooperation is analyzed in terms of cooperation duration and cooperation times, the personal attributes and network attributes of scholars that may influence cooperation sustainability are taken as influence factors, and extensive experiments on a real-world academic data set (DBLP) demonstrate the effectiveness of the proposed method. After the factors that influence cooperation sustainability between scholars have been analyzed, a prediction model based on a boosted tree ensemble is built and named the cooperation sustainability prediction model; it is used to predict the sustainability of scientific cooperation between scholars.
A method for predicting scientific cooperation sustainability based on social network analysis, characterized in that: the method is designed on the premise that cooperation sustainability can be predicted early, and the boosted tree ensemble can be used for both classification and regression. In the model, the early stage of the cooperation between two scholars is the moment of their first cooperation. From the social network attributes of the two scholars at that early stage, the cooperation sustainability prediction model predicts the cooperation duration and cooperation times over the whole period of their future cooperation, and the cooperation sustainability of the two scholars is evaluated on that basis.
Technical solution:
A method for predicting scientific cooperation sustainability based on social network analysis, with the following steps:
The cooperation sustainability prediction model used by the method comprises a data extraction module and a model design module;
The data extraction module comprises a data preprocessing module and an evaluation module, and the model design module comprises a training module and a prediction module;
(1) Data extraction module: mainly used to extract the factors that influence cooperation sustainability. Because there is currently no clear conclusion about which factors influence the cooperation mechanism, experiments are needed to confirm which factors affect the sustainability of cooperation; these factors are then used as the input factors from which the model predicts cooperation sustainability. The module comprises a data preprocessing module and an evaluation module;
1. Data preprocessing module: all data used to train and test the cooperation sustainability prediction model are extracted from the DBLP data set, which consists of papers published by scholars in computer science. To eliminate the influence of scholars who did research only briefly, only data on scholars who have published at least ten papers are used to train the model. After the scholar cooperation data set has been rebuilt, the cooperation records between every pair of scholars are obtained;
In society, many factors promote the sustainability of cooperation, such as personal goals, main area of activity, cooperation preference and the benefits gained. The more influence factors the prediction model takes into account, the more accurate its predictions can be. The data preprocessing module extracts personal attributes and social attributes, five influence factors in total, and analyzes their influence on cooperation sustainability;
All input variables are normalized to [0,1] to improve learning efficiency; the normalization used is as follows:
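The normalization formula itself is not reproduced in this text. As an illustration only, the sketch below assumes the common min-max rescaling x' = (x - min) / (max - min), which maps every value into [0,1]. The function name and the sample values are hypothetical and are not taken from the patent.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly onto [0, 1] (assumed min-max scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw academic ages of four scholars.
academic_ages = [2, 10, 25, 50]
normalized_ages = min_max_normalize(academic_ages)
```

Each input attribute would be rescaled separately in this way, so that attributes with large raw ranges (such as publication counts) do not dominate attributes with small ranges (such as shortest path).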
In addition, all input data are computed as of the moment of the two scholars' first cooperation. When computing the shortest path, a new scientific cooperation network is built for each cooperation record, and the shortest path between collaborators A and B is computed on that network. For example, if scholars A and B started cooperating in 2000, all papers published before 2000 are extracted to build the cooperation network. Because the publication dates in the data have different precisions, resolving dates to the day or month would lose information, so dates are resolved only to the year.
A. Personal attribute module: predicting the sustainability of cooperation between two scholars naturally depends on the scholars themselves, and individual factors play a very important role in establishing and maintaining research cooperation. Individual factors such as academic reputation, cooperation preference and career stage strongly influence a scholar's cooperation behavior and choice of partners. This method extracts three attributes as personal attribute factors: academic age, publication count and number of partners.
Academic age: the academic age of scholars A and B at the time of their first cooperation, computed as the year under consideration minus the year in which the scholar published his or her first paper. Scholars often follow different cooperation strategies at different career stages; for example, a scholar who is still a doctoral student tends to cooperate frequently with his or her assistant.
Publication count: the number of papers scholars A and B had each published by the time of their first cooperation. A scholar's publication count reflects his or her academic performance to some extent; outstanding scholars tend to have more cooperations and a higher reputation.
Number of partners: the number of scholars with whom A and B had each cooperated before their first cooperation. Together with academic age and publication count, these three attributes reflect a scholar's cooperation strategy.
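The three personal attributes can be sketched as follows. This is a minimal illustration, assuming each paper is available as a (year, set of author names) pair and that only papers published strictly before the year of the first cooperation are counted; the record layout, the function name and the toy data are hypothetical.

```python
def personal_attributes(papers, scholar, year):
    """Compute academic age, publication count and number of partners.

    papers: iterable of (publication_year, set_of_author_names).
    year:   year of the first cooperation under consideration.
    """
    # Keep only the scholar's own papers published before the given year.
    own = [(y, authors) for y, authors in papers if scholar in authors and y < year]
    if not own:
        return {"academic_age": 0, "papers": 0, "partners": 0}
    first_year = min(y for y, _ in own)
    partners = set()
    for _, authors in own:
        partners |= set(authors) - {scholar}
    return {
        "academic_age": year - first_year,  # year minus year of first paper
        "papers": len(own),
        "partners": len(partners),
    }


# Hypothetical records: A and B cooperate for the first time in 2015.
papers = [
    (2010, {"A", "C"}),
    (2012, {"A", "C", "D"}),
    (2013, {"B", "D"}),
    (2015, {"A", "B"}),
]
attrs_a = personal_attributes(papers, "A", 2015)
attrs_b = personal_attributes(papers, "B", 2015)
```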
B. Social attributes: besides the individual factors of the scholars, the other direct factor influencing cooperation sustainability is the social relationship network between the two scholars. Previous research shows that a scholar's social status strongly influences his or her academic ability. We therefore assume that the sustainability of cooperation is influenced by social factors. On this assumption, a large-scale scientific cooperation network is built from the DBLP academic data set, in which each node represents a scholar and an edge between two nodes indicates that the two scholars have cooperated. Two simple basic features are then extracted from this network: shortest path and common neighbours.
Common neighbours: the number of scholars with whom both A and B had cooperated before their first cooperation. According to triadic closure, a well-known sociological theory, two people with more common neighbours are more likely to cooperate in the future. Common neighbours are therefore used to measure how close the two scholars are in the cooperation network.
Shortest path: in the cooperation network built before the two scholars have cooperated, the number of scholars that must be passed through for one to reach the other. The shortest path can be used to measure how close the relationship between two scholars is.
Personal attribute data are obtained from the metadata of the DBLP data set. Obtaining social attribute data, however, requires building the cooperation network: two scholars are considered to have cooperated if they have co-authored at least one paper. To filter out isolated nodes, the largest connected component of the whole network is extracted, and the required social attribute data are extracted from that component.
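The social attributes described above can be sketched with a plain adjacency-set graph and breadth-first search. This is an illustrative sketch, not the patent's implementation: the (year, authors) record layout, the function names and the toy data are hypothetical, and the shortest path is counted in edges (hops), which yields 2 for two scholars who share a neighbour, matching the Lin Da and Bob example given later in the description.

```python
from collections import deque
from itertools import combinations


def build_network(papers, before_year):
    """Co-authorship graph from papers published strictly before before_year."""
    graph = {}
    for year, authors in papers:
        if year >= before_year:
            continue
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def common_neighbours(graph, a, b):
    """Number of scholars who have cooperated with both a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def largest_connected_component(graph):
    """Node set of the largest connected component."""
    best, visited = set(), set()
    for start in graph:
        if start in visited:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical records: A and B cooperate for the first time in 2015.
papers = [
    (2010, {"A", "C"}), (2011, {"B", "C"}), (2012, {"A", "D"}),
    (2013, {"B", "D"}), (2013, {"E", "F"}), (2015, {"A", "B"}),
]
graph = build_network(papers, before_year=2015)
cn = common_neighbours(graph, "A", "B")
sp = shortest_path_length(graph, "A", "B")
core = largest_connected_component(graph)
```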
2. Evaluation module: because no influence factor is known in advance to be a valid input, once the model has been preliminarily built, its performance and the influence of each input factor on the predictions must be analyzed to confirm that every selected input factor does influence cooperation sustainability.
A large number of experiments on two real data sets were designed to demonstrate the performance of the cooperation sustainability prediction model. Because it is the first model used to predict scientific cooperation sustainability, no model of the same type is available for comparison. We therefore follow common machine learning practice and evaluate the model's predictions with four typical evaluation measures for regression. To examine the contribution of each input attribute to the model, the following "jackknife" procedure is applied to each attribute: a. remove one attribute and predict with the remaining attributes (deletion strategy); b. predict with only one attribute (addition strategy); c. predict with all attributes (full strategy).
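The three jackknife strategies amount to enumerating feature subsets and training one model per subset. The sketch below only generates those subsets; the attribute labels are hypothetical names for the five influence factors, and the training step itself is left out.

```python
# Hypothetical labels for the five influence factors named in the description.
FEATURES = ["academic_age", "papers", "partners", "common_neighbours", "shortest_path"]


def jackknife_feature_sets(features):
    """Return the feature subsets used by the full, addition and deletion strategies."""
    plans = {"all": list(features)}                 # c. full strategy
    for f in features:
        plans["only:" + f] = [f]                    # b. addition strategy
        plans["without:" + f] = [x for x in features if x != f]  # a. deletion strategy
    return plans


plans = jackknife_feature_sets(FEATURES)
```

Comparing the score of each "without" run against the "all" run indicates how much the model loses when an attribute is removed, while each "only" run indicates how much that attribute can predict on its own.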
The sustainability prediction of scientific cooperation is a regression problem rather than a classification problem: the model must predict continuous values. Four typical indices are therefore used to evaluate the performance of the cooperation sustainability prediction model: MAE (mean absolute error), MSE (mean squared error), PCC (Pearson correlation coefficient) and CCC (concordance correlation coefficient). Given the actual value $y$ (cooperation duration or cooperation times) and the predicted value $\hat{y}$, they are defined as follows:
MAE is computed as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
MSE is computed as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
PCC is computed as:
$$\mathrm{PCC} = \frac{\sigma_{y\hat{y}}}{\sqrt{\sigma_y^2\,\sigma_{\hat{y}}^2}}$$
CCC is computed as:
$$\mathrm{CCC} = \frac{2\,\sigma_{y\hat{y}}}{\sigma_y^2 + \sigma_{\hat{y}}^2 + \left(\bar{y} - \bar{\hat{y}}\right)^2}$$
where $n$ is the number of predictions, and $y_i$ and $\hat{y}_i$ are the $i$-th actual and predicted values, respectively. $\sigma_{y\hat{y}}$ is the covariance between $y$ and $\hat{y}$, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ are the variances of $y$ and $\hat{y}$, and $\bar{y}$ and $\bar{\hat{y}}$ are their means. From these definitions it can be seen that the better the prediction performance, the lower the values of MAE and MSE and the higher the values of PCC and CCC.
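The four indices can be computed directly from their standard definitions. The sketch below assumes population (divide by n) variances and covariance; the function name and sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, PCC and CCC for two equally long sequences of numbers.

    Raises ZeroDivisionError for PCC if either sequence is constant.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((a - mean_t) ** 2 for a in y_true) / n
    var_p = sum((b - mean_p) ** 2 for b in y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred)) / n
    return {
        "MAE": sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n,
        "MSE": sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n,
        "PCC": cov / sqrt(var_t * var_p),
        "CCC": 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2),
    }


# Hypothetical example: every prediction is exactly one unit too high.
actual = [1, 2, 3, 4]
shifted = [2, 3, 4, 5]
scores = regression_metrics(actual, shifted)
```

In this example PCC stays at 1 because the predictions are perfectly correlated with the actual values, while CCC drops below 1 because it also penalizes the constant offset between them.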
Since no one had previously worked on predicting scientific cooperation sustainability, the most classical model, linear regression, is used for comparison with the cooperation sustainability prediction model. Linear regression finds a function $f(x)$ for the prediction task, which can be written as:
$$f(x) = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_d x_d + b$$
or, in vector form:
$$f(x) = \omega^{T} x + b$$
where $\omega$ and $b$ are learned from the training set. In the experiments, all candidate influence factors are used as input attributes of the linear regression model, and the linear regression method is also analyzed with the "jackknife" procedure.
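For reference, a linear baseline of this form can be fitted by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python; the patent does not say how its linear regression was fitted, so the solver, function names and toy data are assumptions made for illustration.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns (weights, bias). Raises ZeroDivisionError if X^T X is singular.
    """
    rows = [list(x) + [1.0] for x in X]  # append a constant column for the bias b
    d = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * pv for v, pv in zip(A[r], A[col])]
    coeffs = [A[i][d] for i in range(d)]
    return coeffs[:-1], coeffs[-1]


def predict_linear(weights, bias, x):
    """Evaluate f(x) = w^T x + b."""
    return sum(w * v for w, v in zip(weights, x)) + bias


# Hypothetical data generated from y = 2*x1 + 3*x2 + 1.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 3, 4, 6]
weights, bias = fit_linear_regression(X, y)
```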
(2) Model design module: responsible for building and training the whole cooperation sustainability prediction model; it comprises a training module and a prediction module.
1. Training module: because predicting scientific cooperation sustainability is a regression problem, the prediction model consists of a series of decision trees trained by gradient descent. The training module comprises a tree ensemble module and a gradient descent module.
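The idea of a series of trees trained stepwise can be illustrated with a deliberately small sketch: each round fits a one-split tree (a stump) to the current residuals, which for squared-error loss are the negative gradient. This is a simplified stand-in, assuming a single input feature, squared-error loss and no regularization; it is not the patent's model, and all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on one feature, minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Additive model: start from the mean, then add one stump per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        t, l, r = fit_stump(xs, residuals)
        stumps.append((t, l, r))
        preds = [p + learning_rate * (l if x <= t else r) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    """Sum the base value and the scaled outputs of all stumps."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)


# Hypothetical data: one normalised attribute and a cooperation duration target.
xs = [1, 2, 3, 4]
ys = [1, 1, 5, 5]
model = boost(xs, ys)
```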
A. Tree ensemble module: the cooperation sustainability prediction model attempts to obtain the prediction $y_i$ from the given input $x_i$, and finds the best parameters from the given training set. To find the parameters that best describe the data, an objective function of the following form is usually defined, consisting of a training loss and a regularization term:
$$\mathrm{Obj}(\Theta) = L(\theta) + \Omega(\theta)$$
where $L$ is the training loss function, $\Omega$ is the regularization term, $\Theta$ is the set of input factors and $\theta$ is each specific input factor. The training loss $L$ measures how well the model performs on the training set, and the regularization term $\Omega$ controls the complexity of the model to prevent overfitting.
The cooperation sustainability prediction model is an ensemble of classification and regression trees; the predictions of the individual trees are summed to obtain the final result:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $K$ is the number of trees, $f_k$ is a single tree and $F$ is the set of all possible trees. The objective above can therefore be rewritten as:
$$\mathrm{Obj}(\Theta) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l$ is the training loss function and $\Omega$ is the regularization term.
The regularization term $\Omega$ of the cooperation sustainability prediction model is:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$
where $T$ is the number of leaf nodes of a tree and $\omega_j$ is the prediction (weight) of leaf $j$. $\gamma$ and $\lambda$ are parameters that control the degree of regularization.
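A regularized objective of this kind can be evaluated numerically as below. The sketch assumes the usual boosted-tree penalty, gamma times the number of leaves plus half of lambda times the sum of squared leaf weights, and a squared-error training loss; representing a tree as a bare list of leaf weights is a simplification, and all names and values are hypothetical.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """Assumed penalty: gamma * T + 0.5 * lam * sum(w_j ** 2), with T leaves."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees, gamma, lam):
    """Training loss (squared error, an assumption) plus the penalty of every tree."""
    loss = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return loss + sum(tree_penalty(t, gamma, lam) for t in trees)


# Hypothetical ensemble with a single two-leaf tree.
trees = [[1.0, -2.0]]
penalty = tree_penalty(trees[0], gamma=0.5, lam=2.0)
obj = objective([1, 2], [1, 3], trees, gamma=0.5, lam=2.0)
```

Larger values of gamma discourage trees with many leaves, and larger values of lambda shrink the leaf weights, which is how the penalty limits model complexity.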
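B. Gradient descent module: the objective above takes functions as its parameters and therefore cannot be optimized in the traditional way. It is instead trained and optimized iteratively. Let $\hat{y}_i^{(t)}$ be the prediction for the $i$-th instance at the $t$-th iteration, and add $f_t$ to optimize the following objective:
$$\Gamma^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Here $\Gamma^{(t)}$ is the objective function $\mathrm{Obj}(\Theta)$ during optimization.
Applying a second-order Taylor expansion to this objective, defining $g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$, and dropping the terms that do not depend on $f_t$, the objective can be expanded as:
$$\tilde{\Gamma}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$
where the sum over $j$ runs over the $T$ leaf nodes and $I_j = \{\, i \mid q(x_i) = j \,\}$ is the set of instances assigned to leaf node $j$. The optimal weight of leaf node $j$ can therefore be computed as:
$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
and the resulting objective value is:
$$\tilde{\Gamma}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
The smaller this value, the better the structure of the boosted tree ensemble. A split is then tried at each leaf node, and the gain of a split is computed as:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$
where $L$ denotes the left node, $R$ denotes the right node and $I = I_L \cup I_R$. The first term in the brackets is the score of the left leaf node, the second term is the score of the right leaf node, and the third term is the score of the original node before the split. $\gamma$ is the regularization cost of the additional leaf.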
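The leaf weight and split gain can be checked with a few lines of arithmetic. The sketch assumes the standard second-order boosted-tree formulas (optimal weight equal to minus the gradient sum over the Hessian sum plus lambda, and gain equal to half the score improvement minus gamma) together with the loss l = (y - ŷ)²/2, for which g = ŷ - y and h = 1. Function names and numbers are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Assumed optimal leaf weight: -sum(g) / (sum(h) + lam)."""
    return -sum(g) / (sum(h) + lam)


def leaf_score(g, h, lam):
    """Structure score of one leaf: sum(g)^2 / (sum(h) + lam)."""
    return sum(g) ** 2 / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Half the score gained by splitting a node, minus the cost gamma of the extra leaf."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


# Hypothetical targets 1, 1, 5, 5 with current predictions of 0:
# for the loss (y - yhat)^2 / 2 the gradients are yhat - y and the Hessians are 1.
g = [-1.0, -1.0, -5.0, -5.0]
h = [1.0, 1.0, 1.0, 1.0]
w_left = leaf_weight(g[:2], h[:2], lam=0.0)
w_right = leaf_weight(g[2:], h[2:], lam=0.0)
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam=0.0, gamma=0.0)
```

With no regularization the leaf weights equal the mean targets on each side of the split. A positive gain means the split lowers the objective; raising gamma makes splits harder to justify.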
2. Prediction module: responsible for predicting the scientific cooperation sustainability of two scholars. Because the sustainability of scientific cooperation is studied and quantified in terms of cooperation duration and cooperation times, the prediction module likewise consists of two parts: a cooperation duration prediction module and a cooperation times prediction module.
Beneficial effects of the present invention: scholars can use the model to predict the sustainability of cooperation between themselves and candidate partners, which helps them choose more suitable partners, raises the level of scientific cooperation between scholars and advances scientific research. Cooperation sustainability between scholars also offers considerable room for future research; more factors can be considered as prediction attributes to improve the precision of the model. Sustainable cooperation prolongs cooperation between scholars. If more and more people take up research on cooperation sustainability, the mechanism of cooperation may be revealed, helping decision makers formulate cooperation policies across institutions, disciplines and even countries.
Brief description of the drawings
Fig. 1 is an example of sustained scientific cooperation.
Fig. 2 shows the distribution of scientific cooperation duration and cooperation times.
Fig. 3 shows the influence of scholars' individual factors on cooperation duration.
Fig. 4 shows the influence of scholars' individual factors on cooperation times.
Fig. 5 shows the influence of common neighbours on cooperation sustainability between scholars.
Fig. 6 shows the influence of shortest path on cooperation sustainability between scholars.
Fig. 7 is the overall framework of the cooperation sustainability prediction model.
Fig. 8 is an example of data extraction.
Fig. 9 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model for cooperation duration on different training sets.
Fig. 10 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model for cooperation times on different training sets.
Fig. 11 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model under different cooperation sustainabilities.
Detailed description of the embodiments
Specific embodiments of the invention are further described below with reference to the accompanying drawings and the technical solution.
As the example of sustained research cooperation in Fig. 1 shows, the sustainability of scientific cooperation does not follow a linear regression over time. The duration and number of continued cooperations are subject to chance and uncertainty and follow a long-tailed distribution; different pairs of partners may have the same cooperation duration but different cooperation frequencies; and a prediction model is hard to build for data so unevenly distributed over time. Ordinary regression models therefore cannot predict cooperation sustainability accurately.
Fig. 2 shows the two measures of cooperation sustainability, cooperation duration and cooperation times, in more detail; both variables follow a long-tailed distribution. Because long-tailed cooperation duration and cooperation times do not strictly follow a linear regression pattern, predictions can deviate considerably. Moreover, the cooperation between two scholars is not static: it may last several years (Δt), during which the two may cooperate several times (m).
Suppose the two scholars in a cooperative relationship are i and j. Define and compute an attribute set {x1, x2, x3, ..., xn} that determines how long the cooperation of the two scholars lasts. Designing the prediction model then amounts to finding a set of functions of the form f(X, Y) that describes the set Y(y1, y2), where y1 is the duration of the two scholars' cooperation and y2 is the number of times they cooperate during that period. Because cooperation duration and cooperation times have different value ranges, they are treated as two separate parameters of scientific cooperation sustainability.
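The two targets y1 and y2 can be derived from the co-authorship records of a pair of scholars. The sketch below assumes that cooperation duration is the span in years between the first and last joint paper and that cooperation times is the number of joint papers; the description does not spell out either definition, so both are assumptions, as are the record layout and the toy data.

```python
def cooperation_targets(papers, a, b):
    """Return first cooperation year, duration (y1) and times (y2) for scholars a and b.

    papers: iterable of (publication_year, set_of_author_names).
    Returns None if the two scholars never co-authored a paper.
    """
    years = sorted(y for y, authors in papers if a in authors and b in authors)
    if not years:
        return None
    return {
        "first_year": years[0],
        "duration": years[-1] - years[0],  # y1, assumed to be last year minus first year
        "times": len(years),               # y2, assumed to be the number of joint papers
    }


# Hypothetical records for scholars i and j.
papers = [
    (2000, {"i", "j"}),
    (2001, {"i", "k"}),
    (2003, {"i", "j", "k"}),
    (2006, {"i", "j"}),
]
targets = cooperation_targets(papers, "i", "j")
```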
Fig. 3 and Fig. 4 describe the relationship between individual factors and cooperation sustainability (cooperation duration, CD, and cooperation times, CT). In the first panel of Fig. 3, the colour of each pixel represents the average cooperation duration of two scholars A and B for different academic age values (the maximum considered is 50 years). The panel shows that cooperation duration decreases markedly as academic age increases: compared with scholars of higher academic age, the cooperation between two younger, junior scholars lasts longer. The second panel of Fig. 3 shows the influence of publication count on cooperation duration, where the maximum publication count is 300; cooperation duration falls sharply as publication count increases. A similar trend appears in the third panel of Fig. 3, which shows the influence of the number of partners on cooperation duration, with a maximum of 300 partners. The three panels of Fig. 3 show that cooperation between junior scholars, who have lower academic age, publication count and number of partners, is more likely to last longer, and that cooperation duration tends to fall as these individual factors increase. The influence of individual factors on cooperation times is shown in Fig. 4. The overall trend for cooperation times resembles that for cooperation duration: junior scholars cooperate more times, and cooperation times decrease markedly as academic age, publication count and number of partners increase.
It is often thought that when scholars choose new partners, they tend either to choose at random or to choose candidates who have published more papers. Triadic closure, a well-known sociological theory, holds that people more easily become friends with their neighbours. Our statistics, however, contradict what triadic closure would predict: the lower the social attribute similarity of two scholars, the more likely their cooperation is to last longer. When a scholar cooperates with scholars with whom he or she has closer social relationships, the cooperation neither lasts very long nor recurs many times. Fig. 5 and Fig. 6 show the influence of scholars' social attributes on cooperation sustainability.
In Fig. 5, the size of each circle represents the number of scholars in that state: the larger the area of the circle, the more scholars. The curve in each subplot is the fitted curve for that plot. In line with practice, the maximum common neighbours and shortest path considered are 10 and 50, respectively. Fig. 5 describes the influence of common neighbours on cooperation sustainability; it shows a clear pattern in which both cooperation duration and cooperation times tend to decline linearly as the number of common neighbours increases. Fig. 6 shows the positive influence of shortest path on cooperation duration and cooperation times: both gradually increase as the shortest path grows.
Based on the above data, the cooperation sustainability prediction model is proposed. It is designed on the premise that cooperation sustainability can be predicted early, and the boosted tree ensemble can be used for both classification and regression. In the model, the early stage of the cooperation between two scholars is the moment of their first cooperation. The overall framework of the cooperation sustainability prediction model is shown in Fig. 7, from which it can be seen that the model mainly comprises the following modules.
(1) data preprocessing module:
By Fig. 8, we are recognized that we extract the process of required data from DBLP.If it is desirable that prediction Cooperation sustainability between scholar Lin Da and Bob then has the process for extracting each attribute data as follows.Lin Da was opened since 2014 Before beginning research work is cooperated to her with Bob, delivered four papers altogether, and once respectively with scholar Xia Feng, Wang Wei, Yi Wanhe Made.Since Bob had two partners, Xia Feng and Yi Wan delivered three papers after research work 2005 altogether. Lin Da and Bob cooperate their first time since 2015.It is possible thereby to calculate, the academic age point of Lin Da and Bob It is not 2 and 10, publication amount is respectively 4 and 3, partner's quantity is respectively 4 and 2.Meanwhile, can be built by their cooperation record The cooperative network of 2015 (only shows basic cooperative network, removing should also have others outside above-mentioned several people for the sake of simple Scholar participates in the structure of cooperative network), it is 2 (including Xia Feng and Yi Wan) that we can obtain the common neighbours of Lin Da and Bob, Shortest path is 2.
We extracted two entirely different data sets from DBLP. Each metadata record contains the paper's title, authors, publication time, publication venue, and similar information. From this information we can build the required scientific cooperation network and compute all of the input attributes used. First, we selected the scholars in the DBLP data set who have published at least ten papers, which filters for scholars who have been continuously engaged in research. We then extracted the largest connected component of the scientific cooperation network and named the resulting data data set 1. In addition, we selected the scholars in the DBLP data set who have published more than 80 papers and assembled their attribute data into data set 2 (see Table 1 for the statistics of both data sets). Evidently, the cooperation sustainability between scholars in data set 2 is higher.
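The construction of data set 1 (keep scholars above a publication threshold, then keep the largest connected component of their cooperation network) might look like the sketch below. The function name `largest_component`, the input layout (one author list per paper), and the toy records are assumptions made for illustration.

```python
from collections import defaultdict

def largest_component(papers, min_papers=10):
    """Return the largest connected set of scholars who each have
    at least `min_papers` papers in `papers` (a list of author lists)."""
    counts = defaultdict(int)
    for authors in papers:
        for s in set(authors):
            counts[s] += 1
    active = {s for s, c in counts.items() if c >= min_papers}

    # Co-authorship edges between active scholars only.
    graph = defaultdict(set)
    for authors in papers:
        kept = [s for s in set(authors) if s in active]
        for s in kept:
            graph[s].update(x for x in kept if x != s)

    best, seen = set(), set()
    for start in active:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:  # depth-first traversal of one component
            for nxt in graph[stack.pop()]:
                if nxt not in comp:
                    comp.add(nxt)
                    stack.append(nxt)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Toy usage with a threshold of 2 papers instead of 10.
toy = [["p", "q"], ["p", "q"], ["q", "r"], ["r", "q"], ["s", "t"], ["s", "t"], ["z"]]
component = largest_component(toy, min_papers=2)
```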
(2) Training module and prediction module:
We divide each of the two data sets, data set 1 and data set 2, into two subsets: a training set and a test set. The training set is used to train the model parameters, and the test set is used to assess the performance of the proposed prediction method. Specifically, we randomly select 20% of the data in each data set as that data set's test set. To study how the amount of training data affects the performance of the cooperation sustainability prediction model, we vary the proportion of training data from 10% to 90% in steps. Cross-validation is also used in the experiments to ensure the stability and validity of our prediction method.
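A minimal sketch of this splitting scheme follows. The text does not say what the 10% to 90% proportions are taken of; the sketch assumes they are fractions of the data left after the 20% test set is held out. The function name, the fixed seed, and the 10-point step are likewise assumptions.

```python
import random

def split_with_training_ratios(samples, test_frac=0.2, ratios=None, seed=0):
    """Hold out a random test set, then build nested training sets
    that use 10%, 20%, ..., 90% of the remaining pool."""
    ratios = ratios or [r / 10 for r in range(1, 10)]
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, pool = shuffled[:n_test], shuffled[n_test:]
    train_sets = {r: pool[:round(len(pool) * r)] for r in ratios}
    return test, train_sets

test_set, train_sets = split_with_training_ratios(range(100))
```

Because the training sets are nested prefixes of one shuffled pool, a change in error between two ratios reflects the added data rather than a different random draw.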
To assess the performance of the cooperation sustainability prediction model on different data sets, we test it on both data sets (see Table 1). Specifically, scholars in data set 1 have published at least 10 papers, and scholars in data set 2 have published at least 80 papers. Table 1 shows that, for both cooperation count and cooperation duration, the scholars in data set 2 score higher than those in data set 1. We therefore run experiments on both data sets to assess the model's performance more fully (results are shown in Figs. 9 and 10).
Table 1: Statistics of the two data sets
Data set | Data volume | Cooperative relations | Average cooperation duration | Average cooperation count
1 (>10 articles) | 185739 | 3443845 | 2.699 | 2.983
2 (>80 articles) | 14022 | 14022 | 3.077 | 3.825
Fig. 9 shows how the cooperation sustainability prediction model and the LR model perform in predicting cooperation duration on data sets 1 and 2. Fig. 9(a) shows that the MAE of both models decreases as the training set grows, which means that prediction performance improves with more training data. Taking data set 1 as an example, when the training data increase from 20% to 90%, the MAE of the cooperation sustainability prediction model drops from 2.39 to 1.91. Moreover, the proposed model consistently outperforms the LR model: its MAE on data sets 1 and 2 is better than that of LR by 11% and 12%, respectively, precisely because the model adopts the idea of ensemble boosted trees. The model's predictive ability on data set 2 is better in every respect than on data set 1. We can therefore conclude that the model is better at predicting cooperation sustainability between scholars with higher academic achievement.
In addition, for cooperation duration prediction, the present invention outperforms the LR model in MSE (see Fig. 9(b)), PCC (see Fig. 9(c)), and CCC (see Fig. 9(d)). From Fig. 9 we can also conclude that, as the training set grows, both the cooperation sustainability prediction model and the LR model predict cooperation duration better, and the results of the cooperation sustainability prediction model are always better than those of the LR model.
As stated earlier, cooperation sustainability prediction covers not only cooperation duration but also cooperation count. Fig. 10 shows how training set size affects the cooperation count predictions of the cooperation sustainability prediction model and the LR model. Fig. 10(a) shows the effect of training set size on the MAE of both models when predicting cooperation count. As the proportion of training data increases, the ability of both models to predict cooperation count improves. The MAE of the cooperation sustainability prediction model is always lower than that of the LR model, which means its prediction performance is better; this matches the result of the cooperation duration experiment. Furthermore, as the proportion of training data increases, the MAE of the cooperation sustainability prediction model falls faster. Fig. 10(b), (c), and (d) likewise show that the cooperation sustainability prediction model outperforms LR.
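The four metrics reported in Figs. 9 and 10 can be computed as in the sketch below. It uses the standard textbook definitions of MAE, MSE, the Pearson correlation coefficient, and the concordance correlation coefficient with population (divide-by-n) variances; the function name `evaluate` is an assumption, and the sketch does not guard against constant inputs, for which PCC is undefined.

```python
from math import sqrt

def evaluate(y_true, y_pred):
    """Return MAE, MSE, PCC and CCC for paired true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return {
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MSE": sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n,
        "PCC": cov / sqrt(var_t * var_p),
        # CCC also penalizes a shift between the two means, unlike PCC.
        "CCC": 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2),
    }

# Predictions that are perfectly correlated but offset by 1.
scores = evaluate([1, 2, 3, 4], [2, 3, 4, 5])
```

In the usage line the predictions track the true values exactly but with a constant offset, so PCC stays at its maximum while CCC drops below it; this is why the two coefficients are reported side by side.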
(3) Evaluation module:
To predict the cooperation sustainability between scholars, we introduce two kinds of influence factors: personal attributes and social attributes. We apply the "jackknife" idea: a. remove one attribute and predict with the remaining attributes (deletion strategy); b. predict with a single attribute only (addition strategy); c. predict with all attributes (all strategy). With these three strategies we can determine the influence of each variable on the overall prediction result. Tables 2, 3, 4, and 5 show the MAE, MSE, PCC, and CCC of the prediction results under this idea.
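The three strategies amount to enumerating feature subsets and comparing the resulting errors. The sketch below shows one way to organize this; the feature identifiers, the run labels (`delete:`, `add:`, `all`), and both function names are assumptions, and the model training that would produce each MAE is left out.

```python
# Hypothetical identifiers for the five input attributes.
FEATURES = ["academic_age", "publications", "partners",
            "common_neighbors", "shortest_path"]

def jackknife_subsets(features):
    """Map each run label to the feature subset it should be trained on."""
    plan = {"all": list(features)}                              # all strategy
    for f in features:
        plan[f"delete:{f}"] = [x for x in features if x != f]  # deletion strategy
        plan[f"add:{f}"] = [f]                                  # addition strategy
    return plan

def rank_by_deletion(mae_by_run):
    """Order features by how much the MAE grows when each one is removed."""
    base = mae_by_run["all"]
    deltas = {label.split(":", 1)[1]: mae - base
              for label, mae in mae_by_run.items()
              if label.startswith("delete:")}
    return sorted(deltas, key=deltas.get, reverse=True)

plan = jackknife_subsets(FEATURES)
```

With five attributes this yields eleven runs: one with all attributes, five deletions, and five single-attribute runs. A large rise in MAE after a deletion marks that attribute as important.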
Table 2: Influence-factor analysis for cooperation duration on data set 1
Table 2 shows the performance of the cooperation sustainability prediction model in predicting cooperation duration on data set 1 under the "jackknife" idea. When common neighbors is removed from the input factors, the MAE rises by 10% (from 1.702 to 1.867), which indicates that common neighbors is the key factor for predicting cooperation duration. By contrast, removing any other input factor raises the MAE only slightly, so the other factors also affect the accuracy of duration prediction, but to a limited extent. Under the addition strategy, although the influence of partner count on the prediction result increases, common neighbors still has the largest influence. Similar results can be seen in the MSE, PCC, and CCC figures. The model performs best when all input factors are used, which shows that all of the input factors currently considered influence the model's cooperation sustainability predictions on data set 1.
Table 3: Influence-factor analysis for cooperation count on data set 1
Table 3 shows the performance of the cooperation sustainability prediction model in predicting cooperation count on data set 1 under the "jackknife" idea. Table 3 shows that common neighbors still plays an important role in prediction performance. When common neighbors is removed from the input factors, the MAE reaches 2.11, the highest MAE among all deletion runs. In other words, common neighbors is the factor most closely tied to the prediction result. Under the addition strategy, publication count has the largest MAE, i.e., publication count has a positive effect on the prediction result. When all factors are used as input, MAE, MSE, PCC, and CCC are all at their best, i.e., all of the input factors improve the prediction of cooperation count on data set 1.
Table 4: Influence-factor analysis for cooperation duration on data set 2
Table 4 shows the performance of the cooperation sustainability prediction model in predicting cooperation duration on data set 2 under the "jackknife" idea. Table 4 shows that, under both the deletion strategy and the addition strategy, all selected factors have a substantial influence on prediction performance. Unlike the duration prediction on data set 1, under the addition strategy on data set 2, partner count has the largest influence on the prediction result. Compared with the addition and deletion strategies, MAE, MSE, PCC, and CCC are best only when all factors under consideration are used as input.
Table 5: Influence-factor analysis for cooperation count on data set 2
Table 5 shows the performance of the cooperation sustainability prediction model in predicting cooperation count on data set 2. As in Table 3, the factor with the largest effect on MAE under the deletion strategy is common neighbors, and the highest MAE under the addition strategy belongs to publication count, at 3.306. That is, publication count has the largest influence on the prediction results for data set 2.
Fig. 11 shows the prediction performance of the cooperation sustainability prediction model at different levels of cooperation sustainability. Cooperation sustainability may differ between pairs of scholars. For example, scholars A and B may collaborate for more than 20 years and be lifelong partners, whereas scholars C and D may collaborate only once because scholar D does not stay in academia. We therefore also test the model's prediction performance across different levels of cooperation sustainability. To do so, we divide the data into 20 groups and compute the average cooperation duration and cooperation count of each group.
Fig. 11(a) shows the MAE of the LR model and the cooperation sustainability prediction model at different cooperation durations. As cooperation duration increases, the MAE of the cooperation sustainability prediction model on both data sets keeps rising; that is, the model is more accurate when predicting short-term cooperation. As Fig. 2 shows, most collaborations last only a short time (within 3 years), so although the MAE rises from 1.05 to 16.51 as cooperation duration goes from 1 year to 20 years, the model's overall performance remains relatively good. Fig. 11(b) shows the MAE of both models at different cooperation counts. As in Fig. 11(a), the performance of both models declines as cooperation count increases. Fig. 11 also shows that the cooperation sustainability prediction model performs better on data set 2 than on data set 1, and that its overall performance is better than that of the LR model.
In summary, when predicting the sustainability of scientific cooperation on data sets 1 and 2, the number of common neighbors is the key factor determining the model's prediction performance, followed by partner count and the number of published papers. The prediction model reaches its best performance only when all influence factors are used as input, which also demonstrates the validity of the input factors we selected. In addition, the model is better suited to predicting cooperation sustainability between scholars who have published more papers, and it predicts short-term cooperation better, and more accurately, than long-term cooperation.

Claims (1)

1. A method for predicting the sustainability of scientific cooperation based on social network analysis, characterized in that the steps are as follows:
The cooperation sustainability prediction model used by the prediction method comprises a data extraction module and a model design module;
The data extraction module comprises a data preprocessing module and an evaluation module, and the model design module comprises a training module and a prediction module;
(1) Data extraction module: used to extract the factors that influence cooperation sustainability; these factors serve as the input factors of the model, which predicts the sustainability of cooperation; the data extraction module comprises the data preprocessing module and the evaluation module;
1. Data preprocessing module: all data used to train and test the cooperation sustainability prediction model are extracted from the DBLP data set; the DBLP data consist of papers published by scholars in computer science; only data of scholars who have published ten or more papers are used to train the cooperation sustainability prediction model; after the scholar cooperation data set is rebuilt, the cooperation records between every pair of scholars are obtained;
In the data preprocessing module, personal attributes and social attributes, five influence factors in total, are extracted, and their influence on cooperation sustainability is analyzed;
All input data are normalized to [0, 1] to improve learning efficiency, using the following normalization:
$x^{*} = \frac{x - x_{min}}{x_{max} - x_{min}}$;
In addition, all input data are computed at the time node of the two scholars' first collaboration;
When computing the shortest path, a new scientific cooperation network is built for each cooperation record, and the shortest path between the scholars A and B who are about to collaborate is computed on that network; the time precision is one year;
a. Personal attributes: in this method three attributes, academic age, publication count, and partner count, are extracted as personal attributes;
Academic age: the academic ages of scholar A and scholar B at their first collaboration; it is computed as the year under investigation minus the year in which the scholar published his or her first paper;
Publication count: the number of papers scholar A and scholar B had each published at the time of their first collaboration;
Partner count: the number of scholars that scholar A and scholar B had each collaborated with before their first collaboration;
b. Social attributes: in this method two attributes, shortest path and common neighbors, are extracted as social attributes;
Common neighbors: the number of scholars that both scholar A and scholar B had collaborated with before their first collaboration; according to the triadic closure theory in sociology, two people with more common neighbors are more likely to collaborate in the future; common neighbors is therefore used to measure the relative position and proximity of two scholars in the cooperation network;
Shortest path: the number of scholars that must be passed through for two scholars to reach each other in the cooperation network before they have collaborated; the shortest path is used to measure the closeness between two scholars;
2. Evaluation module: following typical machine learning practice, four typical evaluation metrics used in linear regression are applied to evaluate the model's prediction results; in addition, to examine the contribution of each input attribute to the model, the following "jackknife" idea is used: a. remove one attribute and predict with the remaining attributes, i.e., the deletion strategy; b. predict with a single attribute only, i.e., the addition strategy; c. predict with all attributes, i.e., the all strategy;
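Four typical metrics, namely the mean absolute error (MAE), the mean squared error (MSE), the Pearson correlation coefficient (PCC), and the concordance correlation coefficient (CCC), are used to evaluate the performance of the cooperation sustainability prediction model; given the actual value $y$ and the predicted value $\hat{y}$, they are as follows:
The calculation of MAE: $MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$
The calculation of MSE: $MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The calculation of PCC: $PCC = \frac{\sigma_{y\hat{y}}}{\sigma_{y}\,\sigma_{\hat{y}}}$
The calculation of CCC: $CCC = \frac{2\sigma_{y\hat{y}}}{\sigma_{y}^{2} + \sigma_{\hat{y}}^{2} + (\mu_{y} - \mu_{\hat{y}})^{2}}$
where $n$ is the number of predictions, and $y_i$ and $\hat{y}_i$ are the $i$-th actual value and the $i$-th predicted value, respectively; $\sigma_{y\hat{y}}$ is the covariance between $y$ and $\hat{y}$, $\sigma_{y}^{2}$ and $\sigma_{\hat{y}}^{2}$ are the variances of $y$ and $\hat{y}$, and $\mu_{y}$ and $\mu_{\hat{y}}$ are the means of $y$ and $\hat{y}$; the better the prediction performance, the lower the MAE and MSE and the higher the PCC and CCC;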
In this method a linear regression model is compared with the cooperation sustainability model; for the prediction task, the linear regression model seeks a function $f(x)$, expressed as:
$f(x) = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_d x_d + b$
or, in vector form:
$f(x) = \omega^{T} x + b$
where $\omega$ and $b$ are learned from the training set;
(2) Model design module: responsible for building and training the whole cooperation sustainability prediction model; it comprises the training module and the prediction module;
1. Training module: the cooperation sustainability prediction model consists of a series of decision trees trained by gradient descent, specifically a tree ensemble module and a gradient descent module;
a. Tree ensemble module: the cooperation sustainability prediction model tries to obtain the prediction $y_i$ from the given parameters $x_i$, and finds the optimal parameters from the given training set; an objective function of the following form is defined, which generally consists of two parts, training loss and regularization;
$Obj(\Theta) = L(\theta) + \Omega(\theta)$
where $L$ is the training loss function, $\Omega$ is the regularization term, $\Theta$ is the set of input factors, and $\theta$ is each specific input factor; the training loss function $L$ measures the performance of the proposed model on the training set, and the regularization term $\Omega$ controls the complexity of the model to prevent overfitting;
The cooperation sustainability prediction model is an ensemble of classification and regression trees; the predictions of the individual trees are summed to obtain the final result, computed as follows:
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$
where $K$ is the number of trees in the ensemble, $f_k$ is an individual tree, and $F$ is the set of all possible trees; the above formula therefore becomes:
$Obj(\Theta) = \sum_{i}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$
where $l$ is the training loss function and $\Omega$ is the regularization term;
The regularization term $\Omega$ of the cooperation sustainability prediction model is as follows:
$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^{2}$
where $T$ and $\omega$ denote the number of leaf nodes of a tree and the corresponding leaf predictions, respectively; $\gamma$ and $\lambda$ are parameters that control the degree of regularization;
b. Gradient descent module: let $\hat{y}_i^{(t)}$ be the prediction for the $i$-th instance at the $t$-th iteration, and add $f_t$ to optimize the following objective function:
$\Gamma^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$
Here $\Gamma^{(t)}$ is the objective function $Obj(\Theta)$ during the optimization process;
A Taylor expansion is applied to this objective function, with $g_i = \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ and $h_i = \partial^{2}_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ defined as the first and second derivatives of the loss; the above formula therefore expands as follows:
$\hat{\Gamma}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^{2} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^{2} \right] + \gamma T$
where $T$ is the number of leaf nodes, and $I_j = \{ i \mid q(x_i) = j \}$ is the set of instances assigned to leaf node $j$; the optimal leaf weight $\omega_j^{*}$ is therefore computed as follows:
$\omega_j^{*} = -\frac{G_j}{H_j + \lambda}$
where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$; the resulting objective value is computed as follows:
$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T$;
In this case, a smaller $Obj$ value means a better structure for the ensemble boosted tree; at the same time a split is attempted at each leaf node, and the gain after the split is computed as:
$Gain = \frac{1}{2} \left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] - \gamma$
where $L$ refers to the left node and $R$ to the right node; $\frac{G_L^{2}}{H_L + \lambda}$ is the score of the left leaf node and $\frac{G_R^{2}}{H_R + \lambda}$ is the score of the right leaf node; $\frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}$ is the score of the original node before the split; $\gamma$ is the regularization cost of the additional leaf;
2. Prediction module: responsible for predicting the scientific cooperation sustainability of two scholars; because scientific cooperation sustainability is studied and quantified in terms of cooperation duration and cooperation count, the prediction work of the prediction module also consists of two parts, namely a cooperation duration sustainability prediction module and a cooperation count sustainability prediction module.
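The leaf-weight and split-gain expressions of the training module can be illustrated with the sketch below. It assumes a squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the claim does not fix the loss function, so this choice, the function names, and the single-feature threshold scan are assumptions rather than the patented procedure.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight: -G / (H + lambda)."""
    return -G / (H + lam)

def leaf_score(G, H, lam):
    """Score term G^2 / (H + lambda) used in the gain formula."""
    return G * G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain of splitting one node into a left and a right leaf."""
    return 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                  - leaf_score(GL + GR, HL + HR, lam)) - gamma

def best_split(xs, ys, preds, lam=1.0, gamma=0.0):
    """Scan the thresholds of one feature and return (gain, threshold).
    Assumes squared-error loss, so g = pred - y and h = 1 per instance."""
    rows = sorted(zip(xs, (p - y for p, y in zip(preds, ys))))
    G, H = sum(g for _, g in rows), float(len(rows))
    best = (0.0, None)  # a split is kept only if its gain is positive
    GL = HL = 0.0
    for (x, g), (x_next, _) in zip(rows, rows[1:]):
        GL += g
        HL += 1.0
        if x == x_next:  # no threshold can separate equal feature values
            continue
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best[0]:
            best = (gain, (x + x_next) / 2)
    return best

# Toy usage: targets jump between x = 2 and x = 3; current predictions are 0.
gain, threshold = best_split([1, 2, 3, 4], [1, 1, 5, 5], [0, 0, 0, 0], lam=0.0)
```

Because $\gamma$ is subtracted from every candidate gain, a larger $\gamma$ suppresses splits whose improvement does not justify an extra leaf, which is how the regularization term limits tree complexity.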

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170623