CN106886571A - A method for predicting the sustainability of scientific cooperation based on social network analysis - Google Patents
- Publication number
- CN106886571A (application number CN201710030918.4A)
- Authority
- CN
- China
- Prior art keywords
- cooperation
- sustainability
- scholar
- module
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Automation & Control Theory (AREA)
- Marketing (AREA)
- Tourism & Hospitality (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A cooperation sustainability prediction method based on social network analysis, divided into a cooperation duration prediction problem and a cooperation count prediction problem. The personal attributes and social attributes of two scholars at the time of their first cooperation are computed and fed into the cooperation sustainability prediction model, which outputs the predicted cooperation sustainability. The preprocessing module takes raw data extracted from a real collection of computer science papers, computes the required attribute data and normalizes it. The training module builds the model using ensemble trees and gradient descent, and tunes the parameters on the training set to make the predictions more accurate. The prediction module makes predictions using the data in the prediction set and the tuned model. The evaluation module compares actual and predicted results, evaluates the model's predictions with the "jackknife" approach, and computes the overall performance of the prediction model and the degree to which each selected input factor influences the predictions.
Description
Technical field
The present invention relates to a method for predicting the sustainability of scientific cooperation between scholars, and more particularly to a scientific cooperation sustainability prediction method based on social network analysis.
Background technology
With the rapid development of science and technology, more and more scholars choose to solve complex scientific problems together through cooperation. By cooperating, scholars can complement each other's strengths and weaknesses, improve research efficiency, shorten research time, make the research process more rigorous, and ultimately achieve a win-win outcome. Cooperation helps scholars carry out research and analysis more efficiently, whereas working alone is gradually being abandoned by the academic community because of the limitations of individual subjective thinking and its high error rate. As scientific cooperation becomes more widespread, people have begun to study cooperation mechanisms and to look for the regularities in them. In academia, two scholars may cooperate more than once; that is, a cooperative relationship between scholars has a certain sustainability. Studying how two scholars turn from strangers into partners therefore makes it possible to predict the sustainability of their cooperation and to recommend more suitable partners to scholars, which promotes scientific cooperation and the progress of science and technology.
Accurately predicting the sustainability of scientific cooperation is difficult, mainly for three reasons. First, the volume of academic data is huge, so it is hard to obtain all the required data. Second, whether a cooperation continues is subject to chance and uncertainty; it follows a long-tailed distribution rather than a linear pattern and is unevenly distributed over time, so a prediction model is hard to build. Third, the factors that influence scientific cooperation sustainability are currently unknown, and interactions between those factors can also interfere with the predictions.
At present there is no clear analysis or evidence establishing which factors influence the sustainability of cooperation between scholars, yet the sustainability of cooperation objectively differs from one pair of scholars to another. For the cooperation mechanism between specific researchers, we pose the problem of predicting scientific cooperation sustainability and build a prediction model based on ensemble boosted trees. On this basis, the present invention proposes a scientific cooperation sustainability prediction method based on social network analysis.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a method that predicts cooperation sustainability from the social relationship network and the personal attributes of scholars. Specifically, we analyze the sustainability of scientific cooperation in terms of cooperation duration and cooperation count, take the personal attributes and network attributes of scholars that may affect cooperation sustainability as influence factors, and carry out extensive experiments on the DBLP academic data set to demonstrate the effectiveness of the proposed method. After analyzing and verifying the factors that influence cooperation sustainability between scholars, we propose a new model construction approach, the ensemble boosted tree, and build a prediction model, named the cooperation sustainability prediction model, for predicting the sustainability of scholars' scientific cooperation.
A method for predicting the sustainability of scientific cooperation based on social network analysis, characterized in that: the method is designed on the premise that cooperation sustainability can be predicted at an early stage, and ensemble boosted trees can be used for both classification and regression. The model takes the moment of the two scholars' first cooperation as the early-stage time point. From the scholars' attributes and their social network attributes at that point, the cooperation sustainability prediction model predicts the overall duration and count of their future cooperation, and these are used to evaluate the sustainability of the two scholars' cooperation.
Technical scheme:
A method for predicting the sustainability of scientific cooperation based on social network analysis, with the following steps:
The cooperation sustainability prediction model used by the method comprises a data extraction module and a model design module. The data extraction module comprises a data preprocessing module and an evaluation module; the model design module comprises a training module and a prediction module.
(1) Data extraction module: mainly used to extract the factors that influence cooperation sustainability. Because there is currently no clear conclusion about which factors drive the cooperation mechanism, experiments are needed to confirm which factors affect cooperation sustainability. These factors are then used as the model's input factors, from which the model predicts cooperation sustainability. The module comprises a data preprocessing module and an evaluation module.
1. Data preprocessing module: all data used to train and test the cooperation sustainability prediction model are extracted from the DBLP data set, which consists of papers published by computer science scholars. To eliminate the influence of scholars who did only short-term research work, only data from scholars who have published more than ten papers are used to train the model. After the scholar cooperation data set is rebuilt, the cooperation records between any two scholars are obtained.
In society, many factors promote the sustainability of cooperation, such as personal goals, main area of activity, cooperation preferences and benefits gained. The more influence factors the prediction model takes into account, the more accurate its predictions can be. The data preprocessing module extracts personal attributes and social attributes, five influence factors in total, and analyzes their effect on cooperation sustainability.
All input variables are normalized to [0, 1] to improve learning efficiency. The normalization used is as follows:
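The formula itself is not reproduced here. A minimal sketch, assuming ordinary min-max scaling (an assumption, not confirmed by the source), shows how one attribute column could be mapped to [0, 1]. The function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Linearly rescale numbers to [0, 1] (min-max scaling is assumed here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw academic ages for four scholars.
academic_ages = [0, 5, 10, 50]
print(min_max_normalize(academic_ages))
```

Each of the five input attributes would be scaled separately, so that attributes with large ranges (such as publication count) do not dominate those with small ranges (such as shortest path).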
In addition, every input attribute is computed as of the time of the two scholars' first cooperation. When computing the shortest path, a new scientific cooperation network is built for each cooperation record, and the shortest path between collaborators A and B is computed on that network. For example, if scholars A and B started cooperating in 2000, we extract all papers published before 2000 and build the cooperation network from them. Because publication dates in the data are recorded with differing precision, working at day or month precision would lose information, so we use year precision only.
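The network construction described above can be sketched as follows. This is an illustration only: the `(year, authors)` record layout and the function name `build_network` are assumptions, not part of the source.

```python
from collections import defaultdict
from itertools import combinations


def build_network(papers, before_year):
    """Co-authorship graph from papers published strictly before `before_year`.

    `papers` is an iterable of (year, [author, ...]) tuples (assumed layout).
    Returns a dict mapping each scholar to the set of scholars they co-authored with.
    """
    graph = defaultdict(set)
    for year, authors in papers:
        if year >= before_year:
            continue  # only year precision is used, as in the description
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical records: A and B first cooperate in 2000.
papers = [
    (1998, ["A", "C"]),
    (1999, ["B", "C"]),
    (2000, ["A", "B"]),
]
network = build_network(papers, before_year=2000)
print(sorted(network["C"]))
```

The 2000 paper is excluded, so the network reflects what was known before A and B first cooperated.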
A. Personal attribute module: predicting the sustainability of cooperation between two scholars naturally depends on the scholars themselves, and personal factors play a very important role in establishing and maintaining scientific cooperation. Personal factors such as academic reputation, cooperation preference and career stage strongly influence a scholar's cooperation behaviour and choice of partners. This method extracts three attributes as personal factors: academic age, publication count and number of collaborators.
Academic age: the academic age of scholars A and B at the time of their first cooperation, computed as the year under consideration minus the year in which the scholar published his or her first paper. Scholars often follow different cooperation strategies at different career stages; for example, a doctoral student often cooperates with his or her supervisor.
Publication count: the number of papers scholars A and B had each published at the time of their first cooperation. A scholar's publication count reflects his or her academic performance to some extent; outstanding scholars tend to have more cooperations and a higher reputation.
Number of collaborators: the number of scholars that A and B had each cooperated with before their first cooperation with each other. Together with academic age and publication count, these three attributes reflect a scholar's cooperation strategy.
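A minimal sketch of how the three personal attributes could be computed from publication records. The record layout, the function name, and the choice to count only papers published strictly before the first-cooperation year are assumptions made for illustration.

```python
def personal_attributes(papers, scholar, first_coop_year):
    """Return (academic_age, publication_count, collaborator_count) for `scholar`.

    `papers` is an iterable of (year, [author, ...]) tuples (assumed layout).
    Only papers published before `first_coop_year` are counted (assumption).
    """
    own = [(y, authors) for y, authors in papers
           if scholar in authors and y < first_coop_year]
    if not own:
        return (0, 0, 0)  # no earlier record; treated as a newcomer (assumption)
    first_year = min(y for y, _ in own)
    collaborators = set()
    for _, authors in own:
        collaborators.update(authors)
    collaborators.discard(scholar)
    return (first_coop_year - first_year, len(own), len(collaborators))


# Hypothetical records for a scholar S who first cooperates with T in 2015.
papers = [
    (2005, ["S", "X"]),
    (2008, ["S", "Y"]),
    (2010, ["S", "X"]),
    (2015, ["S", "T"]),
]
print(personal_attributes(papers, "S", 2015))
```

The same function is called once for each of the two scholars, giving six personal values per pair before normalization.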
B. Social attributes: besides the scholars' personal factors, the other direct influence on cooperation sustainability is the social relationship network between the two scholars. Previous research shows that a scholar's social status strongly affects his or her academic ability. We therefore assume that the sustainability of cooperation is influenced by social factors. Based on this assumption, we construct a large-scale scientific cooperation network from the DBLP academic data set, in which each node represents a scholar and an edge between two nodes indicates that the two scholars have cooperated. From this network we extract two simple basic features: shortest path and common neighbours.
Common neighbours: the number of scholars with whom both A and B had cooperated before their first cooperation with each other. According to the well-known sociological theory of triadic closure, two people with more common neighbours are more likely to cooperate in the future. We therefore use common neighbours to measure how close two scholars are to each other in the cooperation network.
Shortest path: in the cooperation network built before the two scholars have cooperated, the number of scholars that must be passed through for one to reach the other. The shortest path can be used to measure how close the two scholars are.
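Both social attributes can be read off an adjacency-set graph. The sketch below is illustrative: the function names are hypothetical, and the shortest path is counted in hops (edges) found by breadth-first search, which is an assumption; under that convention two scholars who share a neighbour are at distance 2.

```python
from collections import deque


def common_neighbours(graph, a, b):
    """Number of scholars who have cooperated with both a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def shortest_path_length(graph, a, b):
    """Hop count of the shortest path from a to b, or None if unreachable."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical network before A and B first cooperate.
graph = {
    "A": {"C", "D", "E"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}
print(common_neighbours(graph, "A", "B"), shortest_path_length(graph, "A", "B"))
```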
Personal attribute data are obtained from the metadata of the DBLP data set. Obtaining social attribute data, however, requires building the cooperation network. Two scholars are considered to have cooperated if they have co-authored at least one paper. To filter out isolated nodes, we extract the largest connected component of the whole network and compute the required social attribute data on it.
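Extracting the largest connected component can be sketched with a simple graph traversal. The adjacency-set representation and the function name are assumptions for illustration.

```python
def largest_connected_component(graph):
    """Return the set of nodes in the largest connected component of `graph`.

    `graph` maps each node to the set of its neighbours (assumed layout).
    """
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical network with two components.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
print(sorted(largest_connected_component(graph)))
```

Restricting the analysis to this component guarantees that a shortest path exists between any two remaining scholars.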
2. Evaluation module: because there are no established influence factors to use as inputs, once the model has been built its performance and the effect of each input factor on the predictions must be analyzed, to confirm that every selected input factor does influence cooperation sustainability.
We designed extensive experiments on two real data sets to demonstrate the performance of the cooperation sustainability prediction model. It is the first model for predicting scientific cooperation sustainability, so there is no model of the same type to compare against. We therefore follow standard machine learning practice and evaluate the model's predictions with four typical regression metrics. To examine the contribution of each input attribute to the model, we apply the following "jackknife" strategies: a. remove one attribute and predict with the remaining attributes (deletion strategy); b. predict with a single attribute only (addition strategy); c. predict with all attributes (full strategy).
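The three jackknife strategies amount to enumerating feature subsets and retraining the model on each. A minimal sketch that only generates the subsets is shown below; the attribute names and the function name are hypothetical labels for the five input factors.

```python
def jackknife_feature_sets(features):
    """Enumerate the feature subsets used by the three jackknife strategies."""
    return {
        # a. deletion strategy: drop one attribute, keep the rest
        "deletion": [[f for f in features if f != left_out] for left_out in features],
        # b. addition strategy: use a single attribute on its own
        "addition": [[f] for f in features],
        # c. full strategy: use every attribute
        "full": [list(features)],
    }


features = ["academic_age", "publication_count", "collaborator_count",
            "common_neighbours", "shortest_path"]
plans = jackknife_feature_sets(features)
for name, subsets in plans.items():
    print(name, len(subsets))
```

Comparing a metric for each deletion subset against the full strategy indicates how much the removed attribute contributes; the addition subsets show how informative each attribute is on its own.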
Predicting the sustainability of scientific cooperation is a regression problem rather than a classification problem: the model must predict continuous values. We therefore use four typical metrics, MAE (mean absolute error), MSE (mean squared error), PCC (Pearson correlation coefficient) and CCC (concordance correlation coefficient), to evaluate the performance of the cooperation sustainability prediction model. Given the actual value $y$ (cooperation duration or cooperation count) and the predicted value $\hat{y}$, they are defined as follows:
MAE:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
MSE:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
PCC:
$$\mathrm{PCC} = \frac{\sigma_{y\hat{y}}}{\sqrt{\sigma_y^2\,\sigma_{\hat{y}}^2}}$$
CCC:
$$\mathrm{CCC} = \frac{2\,\sigma_{y\hat{y}}}{\sigma_y^2 + \sigma_{\hat{y}}^2 + \left(\bar{y} - \bar{\hat{y}}\right)^2}$$
where $n$ is the number of predictions, $y_i$ and $\hat{y}_i$ are the i-th actual and predicted values, $\sigma_{y\hat{y}}$ is the covariance between $y$ and $\hat{y}$, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ are the variances of $y$ and $\hat{y}$, and $\bar{y}$ and $\bar{\hat{y}}$ are their means. From these definitions, the better the prediction performance, the lower the values of MAE and MSE and the higher the values of PCC and CCC.
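The four metrics can be computed directly from their standard definitions. The sketch below is illustrative; the function name is hypothetical, population (divide-by-n) variance and covariance are assumed, and inputs with zero variance are not handled.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, PCC and CCC for two equal-length sequences."""
    n = len(y_true)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((a - mean_t) ** 2 for a in y_true) / n
    var_p = sum((b - mean_p) ** 2 for b in y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred)) / n
    pcc = cov / sqrt(var_t * var_p)
    ccc = 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
    return {"MAE": mae, "MSE": mse, "PCC": pcc, "CCC": ccc}


# Hypothetical durations: predictions are perfectly correlated but biased by +1.
actual = [1, 2, 3, 4]
predicted = [2, 3, 4, 5]
print(regression_metrics(actual, predicted))
```

In this example PCC is 1 because the predictions move in step with the actual values, while CCC is lower because it also penalizes the constant offset.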
Since no one has previously worked on predicting scientific cooperation sustainability, we compare the cooperation sustainability model with the most classical model, linear regression. A linear regression model finds a function $f(x)$ for the prediction task, which can be written as:
$$f(x) = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_d x_d + b$$
or in vector form:
$$f(x) = \omega^{T} x + b$$
where $\omega$ and $b$ are learned from the training set. In the experiments we use all candidate influence factors as the input attributes of the linear model, and also analyze the linear regression method with the "jackknife" strategies.
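As an illustration of the linear baseline, the sketch below fits ω and b by ordinary least squares through the normal equations. The solver choice and the function name are assumptions; the source does not say how the linear model was fitted, and a singular system (for example, duplicated features) is not handled.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares: returns (weights, bias) for f(x) = w^T x + b."""
    rows = [list(x) + [1.0] for x in X]  # constant column carries the bias b
    d = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    solution = [A[i][d] for i in range(d)]
    return solution[:-1], solution[-1]


# Hypothetical normalized inputs generated from y = 2*x1 + 3*x2 + 1.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 3, 4, 6]
weights, bias = fit_linear_regression(X, y)
print(weights, bias)
```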
(2) Model design module: responsible for building and training the whole cooperation sustainability prediction model; it comprises a training module and a prediction module.
1. Training module: because predicting scientific cooperation sustainability is a regression problem, the prediction model consists of a series of decision trees trained by gradient descent. It comprises an ensemble tree module and a gradient descent module.
A. Ensemble tree module: the task of the cooperation sustainability prediction model is to obtain the prediction $y_i$ from the given input $x_i$, and to find the best parameters from the given training set. To find the parameters that best describe the data, one defines an objective function of the following form, which generally contains two parts, a training loss and a regularization term:
$$\mathrm{Obj}(\Theta) = L(\theta) + \Omega(\theta)$$
where $L$ is the training loss function, $\Omega$ is the regularization term, $\Theta$ is the set of input factors and $\theta$ is each individual input factor. The training loss $L$ measures how well the model performs on the training set, and the regularization term $\Omega$ controls the complexity of the model to prevent overfitting.
The cooperation sustainability prediction model is an ensemble of classification and regression trees; the predictions of the individual trees are summed to give the final result, computed as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $K$ is the number of trees, $f_k$ is a single tree and $F$ is the set of all possible trees. The objective above can therefore be rewritten as:
$$\mathrm{Obj}(\Theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l$ is the training loss function and $\Omega$ is the regularization term.
The regularization term $\Omega$ of the cooperation sustainability prediction model is:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$
where $T$ is the number of leaf nodes of a tree and $\omega_j$ is the prediction (leaf weight) of leaf $j$; $\gamma$ and $\lambda$ are parameters controlling the degree of regularization.
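A minimal sketch of the two ideas above: the ensemble prediction is the sum of the individual tree outputs, and each tree is penalized by its number of leaves and the size of its leaf weights. Representing a tree as a plain Python callable, and the function names, are simplifications made for illustration; the penalty assumes the usual boosted-tree form γT + ½λΣω².

```python
def ensemble_predict(trees, x):
    """Sum the outputs of all trees for one input vector x."""
    return sum(tree(x) for tree in trees)


def regularization(leaf_weights, gamma, lam):
    """Complexity penalty of one tree: gamma * T + 0.5 * lam * sum(w_j ** 2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


# Two hypothetical trees: a one-split stump and a constant tree.
trees = [
    lambda x: 1.0 if x[0] < 0.5 else 2.0,
    lambda x: 0.5,
]
print(ensemble_predict(trees, [0.2]))
print(regularization([1.0, -2.0], gamma=0.1, lam=1.0))
```

Larger γ discourages trees with many leaves, while larger λ shrinks the leaf weights toward zero.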
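B. Gradient descent module: the objective above takes functions as its parameters, so it cannot be optimized in the traditional way. We therefore train and optimize it iteratively. Let $\hat{y}_i^{(t)}$ be the prediction for the i-th instance at the t-th iteration, and add $f_t$ to optimize the objective:
$$\Gamma^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Here $\Gamma^{(t)}$ is the objective $\mathrm{Obj}(\Theta)$ at step $t$ of the optimization.
Applying a second-order Taylor expansion to this objective and defining $g_i = \partial_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$, the formula expands (dropping constant terms) to:
$$\Gamma^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2 = \sum_{j=1}^{T}\left[\Big(\sum_{i\in I_j} g_i\Big)\omega_j + \frac{1}{2}\Big(\sum_{i\in I_j} h_i + \lambda\Big)\omega_j^2\right] + \gamma T$$
where $t$ is the iteration index and $I_j = \{\, i \mid q(x_i) = j \,\}$ is the instance set of leaf node $j$. The optimal weight of leaf $j$ can then be computed as:
$$\omega_j^{*} = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda}$$
and the corresponding objective value as:
$$\Gamma^{(t)*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{\big(\sum_{i\in I_j} g_i\big)^2}{\sum_{i\in I_j} h_i + \lambda} + \gamma T$$
In this case, a smaller Obj value means a better structure for the ensemble boosted tree. Each leaf node is also tested for splitting; the gain of a split is computed as:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\big(\sum_{i\in I_L} g_i\big)^2}{\sum_{i\in I_L} h_i + \lambda} + \frac{\big(\sum_{i\in I_R} g_i\big)^2}{\sum_{i\in I_R} h_i + \lambda} - \frac{\big(\sum_{i\in I} g_i\big)^2}{\sum_{i\in I} h_i + \lambda}\right] - \gamma$$
where $L$ denotes the left node and $R$ the right node, with $I = I_L \cup I_R$. The first term is the score of the left leaf node, the second is the score of the right leaf node, the third is the score of the original node before splitting, and $\gamma$ is the regularization cost of the additional leaf.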
2. Prediction module: responsible for predicting the scientific cooperation sustainability of two scholars. Because the sustainability of scientific cooperation can be studied and quantified in terms of both cooperation duration and cooperation count, the prediction module also consists of two parts: a cooperation duration prediction module and a cooperation count prediction module.
Beneficial effects of the present invention: scholars can use the model to predict the sustainability of cooperation between themselves and candidate partners, which helps them choose more suitable partners, raises the level of scientific cooperation between scholars and advances scientific research. Cooperation sustainability between scholars also offers considerable room for future research; more factors could be considered as prediction attributes to improve the model's accuracy. Sustainable cooperation can extend the cooperation between scholars. If more and more people take up research on cooperation sustainability, the mechanism of cooperation may be revealed, helping leaders formulate cooperation policies across institutions, disciplines and even countries.
Brief description of the drawings
Fig. 1 is an example of sustained scientific cooperation.
Fig. 2 shows the distribution of scientific cooperation duration and cooperation count.
Fig. 3 shows the effect of scholars' personal factors on cooperation duration.
Fig. 4 shows the effect of scholars' personal factors on cooperation count.
Fig. 5 shows the effect of common neighbours on cooperation sustainability between scholars.
Fig. 6 shows the effect of shortest path on cooperation sustainability between scholars.
Fig. 7 is the overall framework of the cooperation sustainability prediction model.
Fig. 8 is an example of data extraction.
Fig. 9 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model for cooperation duration on different training sets.
Fig. 10 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model for cooperation count on different training sets.
Fig. 11 shows the prediction performance curves of the cooperation sustainability prediction model and the LR model at different levels of cooperation sustainability.
Detailed description of the embodiments
Specific embodiments of the invention are further described below with reference to the drawings and the technical scheme.
The example of sustained scientific cooperation in Fig. 1 shows that the sustainability of scientific cooperation does not follow a linear pattern over time. The duration and count of a continuing cooperation are subject to chance and uncertainty and follow a long-tailed distribution; different pairs of partners may have the same cooperation duration but different cooperation frequencies; and the uneven distribution over time makes a prediction model hard to build. Ordinary regression models therefore cannot predict cooperation sustainability accurately.
Fig. 2 shows in detail the two measures of cooperation sustainability, cooperation duration and cooperation count; in their distribution over time, both variables also follow a long-tailed distribution. Because long-tailed cooperation durations and counts do not strictly follow a linear regression pattern, predictions can deviate considerably. Moreover, cooperation between two scholars is not static: it may last several years (Δt), during which the two may cooperate several times (m).
Suppose the two scholars in a cooperative relationship are i and j. Define and compute an attribute set X = {x1, x2, x3, ..., xn} that may determine how long the two scholars' cooperation lasts. Designing the prediction model then amounts to finding a function of the form f(X, Y) that describes the set Y = (y1, y2), where y1 is the duration of the two scholars' cooperation and y2 is the number of times they cooperate during it. Because cooperation duration and cooperation count have different value ranges, they are treated as two distinct parameters of scientific cooperation sustainability.
Figs. 3 and 4 describe the relationship between personal factors and cooperation sustainability (cooperation duration, CD, and cooperation count, CT). In the first panel of Fig. 3, the colour of each pixel represents the average cooperation duration of two scholars A and B for different academic age values (the maximum considered is 50 years). The figure shows that as academic age increases, cooperation duration decreases markedly: compared with scholars of higher academic age, two younger, junior scholars cooperate for longer. The second panel of Fig. 3 shows the effect of publication count on cooperation duration, with a maximum publication count of 300; as publication count increases, cooperation duration drops sharply. The third panel of Fig. 3 shows a similar trend for the effect of the number of collaborators on cooperation duration, with a maximum of 300 collaborators. From the three panels of Fig. 3 we find that cooperation between junior scholars, who have smaller academic age, publication count and number of collaborators, is more likely to last longer; as these personal factor values increase, cooperation duration tends to decline. The effect of personal factors on cooperation count is shown in Fig. 4. The overall trend is similar to that of cooperation duration: junior scholars cooperate more times, and cooperation count decreases markedly as academic age, publication count and number of collaborators increase.
It is often assumed that when scholars choose new partners they tend to choose at random or to pick candidates with more publications. The well-known sociological theory of triadic closure states that people generally become friends more easily with their neighbours. Our statistics, however, run contrary to what triadic closure predicts: the lower the social attribute similarity of two scholars, the more likely their cooperation is to last longer. Cooperation between a scholar and others with whom he or she has closer social ties does not last very long and does not recur many times. Figs. 5 and 6 show the effect of scholars' social attributes on cooperation sustainability.
In Fig. 5, the size of each circle represents the number of scholars in that state: the larger the circle's area, the more scholars. The curve in each sub-figure is the fitted curve. In line with the data, the maximum common neighbours and shortest path considered are 10 and 50 respectively. Fig. 5 describes the effect of common neighbours on cooperation sustainability. A clear pattern is visible in the figure: as the number of common neighbours increases, both cooperation duration and cooperation count decline roughly linearly. Fig. 6 shows the positive effect of shortest path on cooperation duration and cooperation count: as the shortest path grows, both gradually increase.
Based on the above data we propose the cooperation sustainability prediction model. It is designed on the premise that cooperation sustainability can be predicted at an early stage, and ensemble boosted trees can be used for both classification and regression. The model takes the moment of the two scholars' first cooperation as the early-stage time point. The overall framework of the cooperation sustainability prediction model is shown in Fig. 7; as can be seen there, the model mainly comprises the following modules.
(1) data preprocessing module:
Fig. 8 illustrates how the required data are extracted from DBLP. Suppose we want to predict the sustainability of cooperation between scholars Linda and Bob; the attribute data are then extracted as follows. Between starting research work in 2014 and beginning to cooperate with Bob, Linda published four papers in total and cooperated with the scholars Xia Feng, Wang Wei and Yi Wan. Since starting research work in 2005, Bob has had two partners, Xia Feng and Yi Wan, and has published three papers in total. Linda and Bob began their first cooperation in 2015. From this, the academic ages of Linda and Bob are 2 and 10, their publication counts are 4 and 3, and their numbers of collaborators are 4 and 2, respectively. From their cooperation records we can also build the cooperation network for 2015 (for simplicity only the basic network is shown; besides the scholars above, other scholars also take part in the network). We obtain that Linda and Bob have 2 common neighbours (Xia Feng and Yi Wan) and a shortest path of 2.
We extracted two distinct data sets from DBLP. Each metadata record contains a paper's title, authors, publication time, publication venue and similar information. From this information we can build the required scientific collaboration network and compute all the input attributes used. First, we selected the scholars in the DBLP data set who had published at least ten papers, in order to filter for scholars who remain committed to research, and then extracted the largest connected component of the scientific collaboration network. We named this collection data set 1. In addition, we selected the scholars in the DBLP data set who had published more than 80 papers and assembled their attribute data into data set 2 (statistics for the two data sets are given in Table 1). Cooperation sustainability among the scholars in data set 2 is clearly higher.
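The selection step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the input format (a list of author lists, one per paper) and the function name are assumptions, and the largest connected component is found with a plain depth-first search.

```python
from collections import defaultdict

def filter_and_largest_component(papers, min_papers):
    """Keep scholars with at least min_papers papers, then return the
    largest connected component of their co-authorship network."""
    counts = defaultdict(int)
    for authors in papers:
        for author in set(authors):
            counts[author] += 1
    keep = {a for a, c in counts.items() if c >= min_papers}

    # Undirected co-authorship graph restricted to the kept scholars.
    graph = {a: set() for a in keep}
    for authors in papers:
        kept = [a for a in set(authors) if a in keep]
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)

    best, seen = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical toy input: each inner list is the author list of one paper.
toy_papers = [["A", "B"], ["A", "B"], ["B", "C"], ["C", "B"],
              ["D", "E"], ["D", "E"], ["A", "C"], ["F"]]
core = filter_and_largest_component(toy_papers, min_papers=2)
```

For data set 1 the threshold described in the text corresponds to `min_papers=10`.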
(2) Training module and prediction module:
We divide each of the two data sets, data set 1 and data set 2, into two subsets: a training set and a test set. The training set is used to fit the model parameters, and the test set is used to assess the performance of the proposed prediction method. Specifically, we randomly select 20% of the data in each data set as that data set's test set. To study how the amount of training data affects the performance of the cooperation sustainability prediction model, we vary the proportion of data used for training from 10% to 90%. Cross-validation is also used in the experiments to ensure the stability and validity of our prediction method.
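One way to read this protocol is sketched below. It is an assumption, not the documented procedure: the text does not say how the 20% test set relates to the 10% to 90% training proportions, so this sketch holds out the test set first and then takes a fraction of the remaining pool for training. The function and parameter names are hypothetical.

```python
import random

def split_dataset(samples, test_fraction=0.2, train_fraction=1.0, seed=0):
    """Hold out a random test set, then take a fraction of the rest for training."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    pool = shuffled[n_test:]
    n_train = round(len(pool) * train_fraction)
    return pool[:n_train], test

# Sweep the training proportion to study its effect on model performance.
data = list(range(100))
splits = {p: split_dataset(data, train_fraction=p / 100) for p in range(10, 100, 10)}
```

Fixing `seed` keeps the held-out test set identical across training proportions, so that differences in error reflect the training size rather than a different test sample.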
To assess the performance of the cooperation sustainability prediction model on different data sets, we test the model on both data sets (see Table 1). Specifically, the scholars in data set 1 have published at least 10 papers, and the scholars in data set 2 have published at least 80 papers. Table 1 shows that scholars in data set 2 exceed those in data set 1 in both the number of collaborations and collaboration duration. We therefore run experiments with the model on both data sets to assess its performance more fully (results are shown in Figs. 9 and 10).
Table 1. Statistics of the two data sets
Data set | Data volume | Number of collaboration relationships | Average collaboration duration | Average number of collaborations |
---|---|---|---|---|
1 (>10 articles) | 185739 | 3443845 | 2.699 | 2.983 |
2 (>80 articles) | 14022 | 14022 | 3.077 | 3.825 |
Fig. 9 shows how the cooperation sustainability prediction model and the linear regression (LR) model perform when predicting collaboration duration on data sets 1 and 2. Fig. 9(a) shows that the MAE of both models falls as the amount of training data grows, which means that prediction performance improves with a larger training set. Taking data set 1 as an example, when the training data increase from 20% to 90%, the MAE of the cooperation sustainability prediction model falls from 2.39 to 1.91. We can also see that the proposed model always outperforms the LR model: its MAE is better than that of LR by 11% on data set 1 and by 12% on data set 2, precisely because the model adopts the idea of ensemble boosted trees. The model's predictions on data set 2 are better than those on data set 1 in every respect. We can therefore conclude that the cooperation sustainability prediction model is better at predicting cooperation sustainability between scholars with higher academic achievement.
In addition, for predicting collaboration duration, the present invention outperforms the LR model in MSE (see Fig. 9(b)), PCC (see Fig. 9(c)) and CCC (see Fig. 9(d)). From Fig. 9 we can also conclude that, as the training set grows, both the cooperation sustainability prediction model and the LR model predict collaboration duration more accurately, and that the results of the cooperation sustainability prediction model are always better than those of the LR model.
As stated earlier, predicting cooperation sustainability covers not only collaboration duration but also the number of collaborations. Fig. 10 shows how training set size affects the results of the cooperation sustainability prediction model and the LR model when predicting the number of collaborations. Fig. 10(a) shows the effect of training set size on the MAE of the two models. As the training proportion grows, both models become better at predicting the number of collaborations. The MAE of the cooperation sustainability prediction model is always lower than that of the LR model, which means that it predicts better than the LR model, consistent with the results of the collaboration duration experiments. In addition, as the training proportion grows, the MAE of the cooperation sustainability prediction model declines faster. Fig. 10(b), (c) and (d) likewise show that the cooperation sustainability prediction model outperforms LR.
(3) Evaluation module:
To predict the cooperation sustainability between scholars, we introduce two kinds of influence factors: personal attributes and social attributes. We adopt the "jackknife" idea: a. remove one attribute and predict with the remaining attributes (deletion strategy); b. predict with a single attribute only (addition strategy); c. predict with all attributes (all-attributes strategy). With these three strategies we can determine the influence of each variable on the overall prediction result. Tables 2, 3, 4 and 5 give the MAE, MSE, PCC and CCC values of the predictions under this approach.
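The three strategies amount to enumerating feature subsets and re-running the prediction on each. The sketch below shows only that enumeration; the attribute identifiers are hypothetical names for the five input factors described in this document, and the model training that would consume each subset is left out.

```python
# Hypothetical identifiers for the five input attributes.
ATTRIBUTES = [
    "academic_age",
    "publication_count",
    "collaborator_count",
    "common_neighbors",
    "shortest_path",
]

def jackknife_feature_sets(attributes):
    """Return the feature subsets used by each of the three strategies."""
    return {
        # a. deletion strategy: drop one attribute, keep the rest
        "delete": {dropped: [a for a in attributes if a != dropped]
                   for dropped in attributes},
        # b. addition strategy: use a single attribute on its own
        "add": {kept: [kept] for kept in attributes},
        # c. all-attributes strategy
        "all": list(attributes),
    }

plans = jackknife_feature_sets(ATTRIBUTES)
```

Comparing the error of each `delete` subset against the `all` baseline indicates how much the dropped attribute contributes.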
Table 2. Influence-factor analysis for collaboration duration on data set 1
Table 2 shows how the cooperation sustainability prediction model performs when predicting collaboration duration on data set 1 under the "jackknife" approach. When common neighbors are removed from the input factors, the MAE rises by about 10% (from 1.702 to 1.867). This shows that common neighbors are the key factor in predicting collaboration duration. By contrast, removing any of the other input factors raises the MAE only slightly, which shows that those factors also affect the accuracy of collaboration duration prediction, but to a limited degree. Under the addition strategy, although the influence of collaborator count on the prediction result grows, common neighbors still have the greatest influence. Similar results can be seen in the MSE, PCC and CCC figures. The model performs best when all input factors are used. This indicates that all the input factors currently considered affect the model's cooperation sustainability predictions on data set 1.
Table 3. Influence-factor analysis for the number of collaborations on data set 1
Table 3 shows how the cooperation sustainability prediction model performs when predicting the number of collaborations on data set 1 under the "jackknife" approach. As Table 3 shows, common neighbors still play an important role in prediction performance. When common neighbors are removed from the input factors, the MAE reaches 2.11, the highest MAE among all the deletion strategies. That is, common neighbors are the factor most closely tied to the prediction result. Under the addition strategy, the publication count factor has the largest MAE; that is, publication count has a positive effect on the prediction result. When all factors are used as inputs, MAE, MSE, PCC and CCC are all at their best, so every input factor helps the prediction of the number of collaborations on data set 1.
Table 4. Influence-factor analysis for collaboration duration on data set 2
Table 4 shows how the cooperation sustainability prediction model performs when predicting collaboration duration on data set 2 under the "jackknife" approach. As Table 4 shows, under both the deletion strategy and the addition strategy, all the selected factors have an important influence on prediction performance. Unlike the model's collaboration duration predictions on data set 1, under the addition strategy on data set 2 collaborator count has the greatest influence on the prediction result. Also, compared with the addition and deletion strategies, MAE, MSE, PCC and CCC are at their best only when all the factors considered are used as inputs.
Table 5. Influence-factor analysis for the number of collaborations on data set 2
Table 5 shows the performance of the cooperation sustainability prediction model when predicting the number of collaborations on data set 2. As in Table 3, common neighbors are the factor whose removal affects the MAE most under the deletion strategy, and publication count gives the highest MAE under the addition strategy, at 3.306. That is, publication count has the greatest influence on the prediction results for data set 2.
Fig. 11 shows the prediction performance of the cooperation sustainability prediction model at different levels of cooperation sustainability. Cooperation sustainability can differ between scholars. For example, scholar A and scholar B may collaborate for more than 20 years and be lifelong collaborators, whereas scholar C and scholar D may collaborate only once, perhaps because scholar D does not remain in academia. We therefore also test the model's prediction performance across different cooperation sustainability relationships. To do so, we divide the data into 20 groups and compute the average collaboration duration and number of collaborations of each group.
Fig. 11(a) shows the MAE of the LR model and the cooperation sustainability prediction model at different collaboration durations. The figure shows that, as collaboration duration grows, the MAE of the cooperation sustainability prediction model on data sets 1 and 2 keeps increasing; that is, the model is more accurate when predicting short collaboration durations. As Fig. 2 shows, most collaborations last only a short time (within 3 years), so although the MAE rises from 1.05 to 16.51 as collaboration duration grows from 1 year to 20 years, the overall performance of the model is still relatively good. Fig. 11(b) shows the MAE of the LR model and the cooperation sustainability prediction model at different numbers of collaborations. As in Fig. 11(a), the performance of both models declines as the number of collaborations grows. Fig. 11 also shows that the cooperation sustainability prediction model performs better on data set 2 than on data set 1, and that its overall performance is better than that of the LR model.
In summary, when predicting the sustainability of scientific collaboration on data sets 1 and 2, the number of common neighbors is the key factor determining the model's prediction performance, followed by collaborator count and the number of papers published. The prediction model reaches its best performance only when all the influence factors are used as inputs, which also demonstrates the validity of the input factors we selected. The cooperation sustainability prediction model is also better suited to predicting cooperation sustainability between scholars who have published more papers, and it predicts short-term collaboration more accurately than long-term collaboration.
Claims (1)
1. A method for predicting the sustainability of scientific collaboration based on social network analysis, characterized by the following steps:
The cooperation sustainability prediction model used by the method comprises a data extraction module and a model design module; the data extraction module comprises a data preprocessing module and an evaluation module, and the model design module comprises a training module and a prediction module;
(1) Data extraction module: extracts the factors that influence cooperation sustainability; these factors serve as the model's input factors for predicting the sustainability of collaboration; the data extraction module comprises a data preprocessing module and an evaluation module;
1. Data preprocessing module: all data used to train and test the cooperation sustainability prediction model are extracted from the DBLP data set; the DBLP data consist of papers published by computer science scholars; only data on scholars who have published ten or more papers are used to train the cooperation sustainability prediction model; after the scholar collaboration data set is rebuilt, the collaboration records between every pair of scholars are obtained;
In the data preprocessing module, personal attributes and social attributes are extracted, five influence factors in total, and their influence on cooperation sustainability is analyzed;
All input data are normalized to [0,1] to improve learning efficiency; the normalization used is as follows:
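The normalization formula itself does not appear in this text. A minimal sketch, assuming standard min-max scaling (an assumption, chosen because it maps each attribute onto [0,1]):

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] by min-max scaling.

    Assumption: this is one common way to reach the [0, 1] range; the
    source does not state which normalization it uses.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([2, 4, 6])
```

Each attribute column would be scaled separately, so that attributes with large ranges (such as publication count) do not dominate those with small ranges (such as shortest path).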
In addition, every input attribute is computed as of the moment the two scholars first collaborate;
A new scientific collaboration network is built for each collaboration record, and when the shortest path is computed, the shortest path between the scholars A and B who are about to collaborate is computed on that network; time is resolved to the year;
a. Personal attributes: the method extracts three personal attributes: academic age, publication count and collaborator count;
Academic age: the academic age of scholar A and of scholar B at the time of their first collaboration; it is computed as the year under consideration minus the year in which the scholar published their first paper;
Publication count: the number of papers that scholar A and scholar B had each published at the time of their first collaboration;
Collaborator count: the number of scholars that scholar A and scholar B had each collaborated with before the two of them first collaborated;
b. Social attributes: the method extracts two social attributes: shortest path and common neighbors;
Common neighbors: the number of scholars with whom both scholar A and scholar B had collaborated before their first collaboration; according to the triadic closure theory from sociology, two people with more common neighbors are more likely to collaborate in the future; common neighbors are therefore used to measure how close two scholars' relative positions are in the collaboration network;
Shortest path: in the collaboration network before the two scholars have collaborated, the number of scholars that must be passed through for one to reach the other; the shortest path is used to measure how close the two scholars are;
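The three personal attributes can be derived from a scholar's publication records. The sketch below is illustrative only: the record format (a list of `(year, authors)` pairs), the function name, and the choice to count only papers published strictly before the year of the first collaboration are all assumptions.

```python
def personal_attributes(papers, scholar, first_collab_year):
    """Compute academic age, publication count and collaborator count for
    one scholar, using papers published before first_collab_year."""
    own = [(year, authors) for year, authors in papers
           if scholar in authors and year < first_collab_year]
    if not own:
        return {"academic_age": 0, "publication_count": 0, "collaborator_count": 0}
    first_year = min(year for year, _ in own)
    partners = set()
    for _, authors in own:
        partners.update(a for a in authors if a != scholar)
    return {
        # year under consideration minus year of first published paper
        "academic_age": first_collab_year - first_year,
        "publication_count": len(own),
        "collaborator_count": len(partners),
    }

# Hypothetical records: scholar "A" first collaborates with "B" in 2015.
records = [
    (2010, ["A", "X"]),
    (2012, ["A", "Y"]),
    (2013, ["A", "X", "Z"]),
    (2015, ["A", "B"]),
]
attrs = personal_attributes(records, "A", 2015)
```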
2. Evaluation module: following common machine learning practice, four typical evaluation measures used with linear regression are applied to evaluate the model's prediction results; in addition, to examine the contribution of each input attribute to the model, the following "jackknife" approach is used to assess each attribute's contribution: a. remove one attribute and predict with the remaining attributes, i.e. the deletion strategy; b. predict with a single attribute only, i.e. the addition strategy; c. predict with all attributes, i.e. the all-attributes strategy;
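Four typical indicators are used to evaluate the performance of the cooperation sustainability prediction model: mean absolute error (MAE), mean squared error (MSE), the Pearson correlation coefficient (PCC) and the concordance correlation coefficient (CCC); given the true values $y$ and the predicted values $\hat{y}$, they are defined as follows:
MAE is computed as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
MSE is computed as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
PCC is computed as:
$$\mathrm{PCC} = \frac{\sigma_{y\hat{y}}}{\sqrt{\sigma_y^2\,\sigma_{\hat{y}}^2}}$$
CCC is computed as:
$$\mathrm{CCC} = \frac{2\,\sigma_{y\hat{y}}}{\sigma_y^2 + \sigma_{\hat{y}}^2 + \left(\mu_y - \mu_{\hat{y}}\right)^2}$$
where $n$ is the number of predictions, and $y_i$ and $\hat{y}_i$ are the $i$-th true value and the $i$-th predicted value respectively; $\sigma_{y\hat{y}}$ is the covariance between $y$ and $\hat{y}$, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ are the variances of $y$ and $\hat{y}$, and $\mu_y$ and $\mu_{\hat{y}}$ are the means of $y$ and $\hat{y}$; the better the prediction performance, the lower the MAE and MSE and the higher the PCC and CCC;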
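The four indicators can be computed directly from their standard definitions. The sketch below uses population (divide-by-n) variance and covariance; the function names are illustrative, and PCC is undefined when either series is constant.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)

def mae(y, y_hat):
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)

def pcc(y, y_hat):
    """Pearson correlation coefficient."""
    return _cov(y, y_hat) / math.sqrt(_cov(y, y) * _cov(y_hat, y_hat))

def ccc(y, y_hat):
    """Concordance correlation coefficient."""
    mean_gap = _mean(y) - _mean(y_hat)
    return 2 * _cov(y, y_hat) / (_cov(y, y) + _cov(y_hat, y_hat) + mean_gap ** 2)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]   # every prediction is off by exactly +1
scores = (mae(y_true, y_pred), mse(y_true, y_pred),
          pcc(y_true, y_pred), ccc(y_true, y_pred))
```

The toy data show why CCC is reported alongside PCC: a constant offset leaves PCC at 1 but lowers CCC, because CCC also penalizes the gap between the means.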
In this method a linear regression model is compared with the cooperation sustainability prediction model; linear regression finds a function $f(x)$ for the prediction task, expressed as:
$$f(x) = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_d x_d + b$$
or, in vector form:
$$f(x) = \omega^{T} x + b$$
where $\omega$ and $b$ are learned from the training set;
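As a small illustration of the baseline, the sketch below evaluates $f(x)$ for given weights and fits the single-feature case by ordinary least squares. The function names are illustrative, and the one-feature closed form is a simplification; the source does not say how its linear regression was fitted.

```python
def predict(weights, bias, x):
    """Evaluate f(x) = w . x + b for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def fit_single_feature(xs, ys):
    """Closed-form ordinary least squares for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

# Points lying exactly on y = 2x + 1.
w, b = fit_single_feature([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
y_new = predict([w], b, [4.0])
```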
(2) Model design module: responsible for building and training the whole cooperation sustainability prediction model; it comprises a training module and a prediction module;
1. Training module: the cooperation sustainability prediction model consists of a series of decision trees trained by gradient descent, specifically a tree ensemble module and a gradient descent module;
a. Tree ensemble module: the cooperation sustainability prediction model tries to obtain the prediction $y_i$ from the given inputs $x_i$, and finds the optimal parameters from the given training set; an objective function of the following form is defined, which generally comprises two parts, training loss and regularization;
$$\mathrm{Obj}(\Theta) = L(\theta) + \Omega(\theta)$$
where $L$ is the training loss function, $\Omega$ is the regularization term, $\Theta$ is the set of input factors and $\theta$ is each specific input factor; the training loss function $L$ measures how well the model performs on the training set, and the regularization term $\Omega$ controls the complexity of the model to prevent overfitting;
The cooperation sustainability prediction model is an ensemble of classification and regression trees; the predictions of the individual trees are summed to obtain the final result, computed as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $K$ is the number of trees, $f_k$ is an individual tree and $F$ is the set of all possible trees; the objective above therefore becomes:
$$\mathrm{Obj}(\Theta) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l$ is the training loss function, $\Omega$ is the regularization term, and the first sum runs over the $n$ training instances;
The regularization term $\Omega$ of the cooperation sustainability prediction model is as follows:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$
where $T$ and $\omega$ denote the number of leaf nodes of the tree and the predictions (leaf weights) corresponding to them; $\gamma$ and $\lambda$ are parameters that control the degree of regularization;
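The additive prediction and the regularization term can be illustrated in a few lines. This is a sketch under assumptions: trees are represented as plain Python callables, the regularizer follows the usual boosted-tree form $\gamma T + \tfrac{1}{2}\lambda\sum_j \omega_j^2$ that matches the parameters named above, and all names are hypothetical.

```python
def ensemble_predict(trees, x):
    """Sum the outputs of all trees for one input vector x."""
    return sum(tree(x) for tree in trees)

def regularization(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2) for one tree,
    where T is the number of leaves and w_j are the leaf weights."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

# Two toy "trees" standing in for trained regression trees.
toy_trees = [lambda x: 1.0, lambda x: 0.5 * x[0]]
y_hat = ensemble_predict(toy_trees, [2.0])
penalty = regularization([1.0, -2.0], gamma=0.1, lam=1.0)
```

A larger `gamma` discourages trees with many leaves, while a larger `lam` shrinks the leaf weights toward zero.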
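b. Gradient descent module: let $\hat{y}_i^{(t)}$ be the prediction for the $i$-th instance at the $t$-th iteration, and add $f_t$ to optimize the following objective:
$$\Gamma^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Here $\Gamma^{(t)}$ is the objective function $\mathrm{Obj}(\Theta)$ during the optimization process;
A Taylor expansion of this objective is taken, with $g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$; dropping constant terms, the formula above expands as follows:
$$\tilde{\Gamma}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$
where $T$ is the number of leaf nodes and $I_j = \{\, i \mid q(x_i) = j \,\}$ is the instance set of leaf node $j$; the optimal leaf weight $\omega_j^{*}$ is therefore computed as:
$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
and the resulting objective value is computed as:
$$\tilde{\Gamma}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
In this case, a smaller objective value means a better tree structure; each leaf node is also tested for splitting, and the gain after a split is computed as:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right] - \gamma$$
where $L$ denotes the left node and $R$ the right node, with $G_L = \sum_{i \in I_L} g_i$, $H_L = \sum_{i \in I_L} h_i$ and likewise for $R$; the first term is the score of the left leaf node, the second term is the score of the right leaf node, and the third term is the score of the original node before the split; $\gamma$ is the regularization cost of the additional leaf;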
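The leaf-weight and split-gain computations can be sketched numerically. The code below assumes the standard second-order gradient tree boosting formulas (as used in XGBoost-style boosted trees) together with a squared-error loss $l = \tfrac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the loss choice and all function names are assumptions, not details stated in the source.

```python
def squared_error_grad_hess(y, y_prev):
    """First and second derivatives of 0.5 * (y - y_hat)^2 w.r.t. y_hat."""
    g = [p - t for t, p in zip(y, y_prev)]
    h = [1.0] * len(y)
    return g, h

def leaf_weight(g, h, lam):
    """Optimal weight of a leaf holding the instances with gradients g, h."""
    return -sum(g) / (sum(h) + lam)

def leaf_score(g, h, lam):
    """Quality term G^2 / (H + lambda) of a leaf."""
    return sum(g) ** 2 / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Objective reduction from splitting one leaf into left and right."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma

# Two instances with targets 3 and 5, both currently predicted as 2.
g, h = squared_error_grad_hess([3.0, 5.0], [2.0, 2.0])
w_star = leaf_weight(g, h, lam=0.0)
gain = split_gain(g[:1], h[:1], g[1:], h[1:], lam=0.0, gamma=0.5)
```

A split is worth making only when `gain` is positive, that is, when the improvement in fit outweighs the `gamma` cost of the extra leaf.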
2. Prediction module: responsible for predicting the scientific cooperation sustainability of two scholars; because scientific cooperation sustainability is studied and quantified in two respects, collaboration duration and number of collaborations, the work of the prediction module also consists of two parts, namely a collaboration duration prediction module and a number-of-collaborations prediction module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710030918.4A CN106886571A (en) | 2017-01-18 | 2017-01-18 | A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106886571A true CN106886571A (en) | 2017-06-23 |
Family
ID=59176536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710030918.4A Pending CN106886571A (en) | 2017-01-18 | 2017-01-18 | A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886571A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145087A (en) * | 2018-07-30 | 2019-01-04 | 大连理工大学 | A kind of scholar's recommendation and collaborative forecasting method based on expression study and competition theory |
CN109657122A (en) * | 2018-12-10 | 2019-04-19 | 大连理工大学 | A kind of Academic Teams' important member's recognition methods based on academic big data |
CN109858675A (en) * | 2018-12-28 | 2019-06-07 | 中译语通科技股份有限公司 | A kind of expert's science vitality period forecasting method |
CN111126396A (en) * | 2019-12-25 | 2020-05-08 | 北京科技大学 | Image recognition method and device, computer equipment and storage medium |
CN111191902A (en) * | 2019-12-24 | 2020-05-22 | 中国科学技术大学 | Method for analyzing and predicting cooperative effect |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609546A (en) * | 2011-12-08 | 2012-07-25 | 清华大学 | Method and system for excavating information of academic journal paper authors |
CN103077175A (en) * | 2012-01-12 | 2013-05-01 | 西安邮电学院 | Effective collaborative construction and self-adoptive evolution method of academic collaboration relation network |
US20150193551A1 (en) * | 2014-01-08 | 2015-07-09 | Joshua Asher Gordon | System and method for facilitating research collaboration cross reference |
Non-Patent Citations (2)
Title |
---|
ASMELASH TEKA HADGU: "Mining Scholarly Communication and Interaction on the Social Web", 《PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB》 * |
康文杰等: "基于社会网络分析的学术合作关系研究", 《计算机技术与发展》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145087A (en) * | 2018-07-30 | 2019-01-04 | 大连理工大学 | A kind of scholar's recommendation and collaborative forecasting method based on expression study and competition theory |
CN109145087B (en) * | 2018-07-30 | 2020-12-11 | 大连理工大学 | Learner recommendation and cooperation prediction method based on expression learning and competition theory |
CN109657122A (en) * | 2018-12-10 | 2019-04-19 | 大连理工大学 | A kind of Academic Teams' important member's recognition methods based on academic big data |
CN109858675A (en) * | 2018-12-28 | 2019-06-07 | 中译语通科技股份有限公司 | A kind of expert's science vitality period forecasting method |
CN111191902A (en) * | 2019-12-24 | 2020-05-22 | 中国科学技术大学 | Method for analyzing and predicting cooperative effect |
CN111126396A (en) * | 2019-12-25 | 2020-05-08 | 北京科技大学 | Image recognition method and device, computer equipment and storage medium |
CN111126396B (en) * | 2019-12-25 | 2023-08-22 | 北京科技大学 | Image recognition method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shrestha et al. | Algorithm supported induction for building theory: How can we use prediction models to theorize? | |
Petropoulos et al. | ‘Horses for Courses’ in demand forecasting | |
Laver et al. | Party competition: An agent-based model | |
Thangavel et al. | Student placement analyzer: A recommendation system using machine learning | |
CN106886571A (en) | A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis | |
Gao et al. | Deep cognitive diagnosis model for predicting students’ performance | |
Garavaglia | Modelling industrial dynamics with “History-friendly” simulations | |
Guala | Experimentation in economics | |
Chang et al. | Towards an improved Adaboost algorithmic method for computational financial analysis | |
Maghsoodi et al. | A machine learning driven multiple criteria decision analysis using LS-SVM feature elimination: sustainability performance assessment with incomplete data | |
Li et al. | MOOC-FRS: A new fusion recommender system for MOOCs | |
Li | Applying grey system theory to evaluate the relationship between industrial characteristics and innovation capabilities within Chinese high-tech industries | |
Liu et al. | Model design and parameter optimization of CNN for side-channel cryptanalysis | |
Van Dinther | Agent-based simulation for research in economics | |
Rahaie et al. | Critic learning in multi agent credit assignment problem | |
Alaimo | Open issues in composite indicators construction | |
Plonsky et al. | Prediction oriented behavioral research and its relationship to classical decision research | |
Fernandes et al. | Learning and ensembling lexicographic preference trees with multiple kernels | |
Kumar et al. | Explainable neural network analysis on movie success prediction | |
Rezaee et al. | A data-driven decision support framework for DEA target setting: an explainable AI approach | |
Yuan et al. | Early Detecting the At-risk Students in Online Courses Based on Their Behavior Sequences | |
Kutnjak et al. | Applying the decision tree method in identifying key indicators of the Digital Economy and Society Index (DESI) | |
Horn | Multi-objective analysis of machine learning algorithms using model-based optimization techniques | |
Yang | The Freedom of Constraint: An Agent-Based Game Theoretic Model of The Politics of Fertility and Economic Development (POFED) | |
Carroll et al. | Capability Ratios Predict Nothing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170623 |