CN108510205B - Author skill evaluation method based on hypergraph - Google Patents

Publication number
CN108510205B
CN108510205B CN201810316651.XA CN201810316651A CN108510205B CN 108510205 B CN108510205 B CN 108510205B CN 201810316651 A CN201810316651 A CN 201810316651A CN 108510205 B CN108510205 B CN 108510205B
Authority
CN
China
Prior art keywords
author
skill
field
distance
time
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810316651.XA
Other languages
Chinese (zh)
Other versions
CN108510205A (en)
Inventor
夏锋
杨安东
刘雷
孔祥杰
于硕
宁兆龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-04-08
Filing date
2018-04-08
Publication date
2021-07-16
Application filed by Dalian University of Technology
Priority to CN201810316651.XA
Publication of CN108510205A
Application granted
Publication of CN108510205B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of author skill evaluation and relates to a hypergraph-based author skill evaluation method that can evaluate, at fine granularity, an author's level in a particular skill within a particular field, and can reflect how the author's skills change over time. The method considers factors such as the number of papers, the quality of the papers, differences between fields, and change over time. Using the hypergraph concept lets the method fuse authors, fields, and skills, and thereby provide a fine-grained evaluation scheme. The distances between author, field, and skill are computed by extending traditional evaluation parameters such as paper citation count and H-index, which preserves reliability, while normalization improves computational efficiency and reduces error. Finally, a time factor is added so that the method can analyze how an author's fields and skills change over time, providing more material for further research.

Description

Author skill evaluation method based on hypergraph
Technical Field
The invention belongs to the technical field of author skill evaluation, and relates to an author skill evaluation method based on a hypergraph.
Background
With the continuous development of science and technology, more and more authors are engaged in scientific research, and the growing number of researchers has stimulated research on researchers themselves. Evaluating a researcher's level, areas of expertise, and publication patterns helps in building research teams, allocating project funding, comprehensively assessing authors' academic level, comparing different authors, studying the mechanisms of academic collaboration, and discovering latent patterns in scientific research, which benefits the progress of academia and of society as a whole.
At present, author level is mostly evaluated with the H-index, together with indicators such as the citation count and the number of published papers. These indicators generally evaluate an author as a whole; they reveal neither the author's proficiency in a particular skill nor how the author's academic level changes over a given period, which limits the scope and depth of research.
A hypergraph is a generalized graph in which a hyperedge may contain more than two vertices. This property lets a hypergraph fuse multiple attributes of an author, which makes it well suited to handling the three elements that author skill research requires: the author, the field, and the skill.
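As a minimal sketch of how such hyperedges might be represented, the following Python snippet treats each (author, field, skill) triple as one hyperedge and collects the papers that support it. The record layout and the names `Paper` and `build_hyperedges` are hypothetical illustrations and are not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Paper(NamedTuple):
    """Hypothetical paper record; the attribute names are illustrative assumptions."""
    paper_id: str
    field: str
    contributions: dict  # author name -> list of skills used in this paper


def build_hyperedges(papers):
    """Map each (author, field, skill) hyperedge to the ids of the papers behind it."""
    hyperedges = defaultdict(set)
    for paper in papers:
        for author, skills in paper.contributions.items():
            for skill in skills:
                hyperedges[(author, paper.field, skill)].add(paper.paper_id)
    return dict(hyperedges)


papers = [
    Paper("p1", "biology", {"A. Author": ["data analysis", "writing"]}),
    Paper("p2", "biology", {"A. Author": ["data analysis"], "B. Author": ["experiment"]}),
]
edges = build_hyperedges(papers)
```

Each key of `edges` has exactly three vertices, which is the property that distinguishes a hyperedge from an ordinary two-vertex edge.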
Disclosure of Invention
To address the shortcomings of existing research, the invention provides an author skill evaluation method: a hypergraph-based author evaluation algorithm that analyzes an author's contributions to published papers and incorporates a time factor. The algorithm evaluates authors at fine granularity, taking into account the number and quality of papers as well as the differences between fields. It yields an author's proficiency in a specific skill in a particular field, and, because a time factor is included, it also shows how the author's skills change over time.
The technical scheme of the invention is as follows:
An author skill evaluation method based on a hypergraph comprises the following steps:
step 1): the author, the skill the author used in a paper, and the field of the paper are combined into a hyperedge; all papers the author took part in are counted and the skill types are merged, yielding statistics on authors, skills, and paper fields; a hypergraph is highly compatible and can fuse the three factors of author, field, and skill; the information on the papers published by the author is collected, with the author, the skill, and the field serving as the three vertices of a hyperedge;
a hyperedge connects an author with a particular skill in a particular field; the author's proficiency in that skill in that field is reflected by the weight of the hyperedge, and using a hypergraph effectively reduces the scale of the author network;
because the raw skill names are numerous and inconsistent, which would affect the calculations in the following steps, the skills in the data set are merged to obtain a uniformly standardized data set;
step 2): the three vertices of the hyperedge are combined pairwise, and the distance between each pair of vertices is calculated using the following formulas:
distance between author j and field f:
Figure GDA0003096019640000021
where n is the number of authors in the field, n_f is the total number of papers in the field, c_i is the citation count of paper i, and h_j is the H-index of author j;
the distance between author j and field f is normalized to facilitate subsequent data processing; the normalization formula is as follows:
Figure GDA0003096019640000022
where avg(dis(field)) is the mean of the distances between all authors and the field, i.e., the computed distances are summed and then averaged;
distance between author j and skill s:
Figure GDA0003096019640000031
where n is the number of papers in which the author used the skill, c_i is the citation count of paper i, h_j is the H-index of author j, and n_is is the number of authors who contributed the skill in paper i;
the distance between author j and skill s is normalized using the following formula:
Figure GDA0003096019640000032
where avg(dis) is the mean of the distances between all authors and skills, i.e., the computed distances are summed and then averaged;
distance between field f and skill s:
Figure GDA0003096019640000033
where n_f is the total number of papers in the field, n_s is the total number of papers containing the skill, and n_fs is the number of fields that contain the skill;
the distance between field f and skill s is normalized using the following formula:
Figure GDA0003096019640000034
where avg(field, skill) is the mean of the distances between all fields and skills, i.e., the computed distances are summed and then averaged;
step 3): the weight of the hyperedge is calculated with a hyperedge weight calculation method; this weight is the author's proficiency in a specific skill in a particular field; following hypergraph theory, the hyperedge weight is calculated with a variant of the Gaussian kernel function that combines the three distances from step 2), yielding the author's level parameter for a specific skill in a particular field;
the hyperedge weight is calculated using the following formula:
Figure GDA0003096019640000041
where d(x, y) is the distance between any two of the author, the field, and the skill, and σ is the mean of the corresponding distances;
Figure GDA0003096019640000042
is the level of skill s of author i in field f;
step 4): the above process is applied to each year in turn to obtain how an author's specific skill in a particular field changes over time; an author's research career contains various turning points, such as moving to a different research institution or changing research direction; knowing how each of the author's skills changes at different times makes it possible to study how the author's skills evolve and thus to discover latent patterns in scientific research; to this end, the data set is divided by year into sub-data sets, each holding the data of that year and of all earlier years; step 2) and step 3) are repeated for each sub-data set, and each author's skill changes in each year are extracted from the results, giving the change in the author's skills over time.
The beneficial effects of the invention are as follows: the author skill evaluation is a hypergraph-based method that takes into account the number and quality of papers, the differences between fields, change over time, and other factors. Using the hypergraph concept lets the method fuse authors, fields, and skills, and thereby provide a fine-grained evaluation scheme. The distances between author, field, and skill are computed by extending traditional evaluation parameters such as paper citation count and H-index, which preserves reliability, while normalization improves computational efficiency and reduces error.
The invention also adds a time factor so that the change of an author's fields and skills over time can be analyzed, providing more material for follow-up research.
Drawings
FIG. 1 is a flow chart of the data preprocessing performed on the PLOS ONE dataset according to the experimental requirements of the invention.
FIG. 2 is author skill radar chart a, a final result of the invention.
FIG. 3 is author skill radar chart b, a final result of the invention.
FIG. 4 is schematic diagram a of an author's highest skill level in each year as time increases.
FIG. 5 is schematic diagram b of an author's highest skill level in each year as time increases.
FIG. 6 is an example graph of an author's skills over time.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
The embodiment of the invention provides an author skill evaluation method based on a hypergraph, which comprises the following steps:
Step 1: The PLOS ONE dataset is selected as the experimental dataset for the method and is preprocessed; the preprocessing procedure is shown in FIG. 1.
To capture an author's contribution to a paper, i.e., the skill the author used in that paper, the invention uses the PLOS ONE dataset. The raw data of the dataset are as follows:
TABLE 1 PLOS ONE dataset
Figure GDA0003096019640000051
As Table 1 shows, the number of distinct skills is very large. This is probably because there is no standard naming rule for skills, so similar skills are expressed in different ways. Using the dataset's original skill names directly would lead to inaccurate and redundant results.
The invention therefore compiled statistics on the skills and found that 10342 skills appear fewer than 10 times, while 21 skills show a sharp jump in frequency. Skills appearing fewer than 10 times are discarded, and the remaining skill names are classified on the basis of the 21 high-frequency skills, finally yielding 16 skill classes, each represented by its most frequent skill name.
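The skill-merging step can be sketched as follows, under stated assumptions. The threshold of 10 occurrences comes from the text; the mapping `class_of` is a hypothetical stand-in for the classification of skill names into classes, which the text does not specify in detail, and the function name `merge_skills` is illustrative.

```python
from collections import Counter

MIN_OCCURRENCES = 10  # threshold stated in the text


def merge_skills(skill_mentions, class_of):
    """Drop rare skills, then rename each class after its most frequent member.

    skill_mentions: list of raw skill names, one entry per occurrence.
    class_of: hypothetical mapping from raw skill name to a class id.
    Returns a mapping from each kept raw name to its class representative.
    """
    counts = Counter(skill_mentions)
    kept = {s: c for s, c in counts.items() if c >= MIN_OCCURRENCES}

    # For every class, remember the raw name with the highest count.
    best = {}
    for skill, count in kept.items():
        cls = class_of.get(skill)
        if cls is None:
            continue  # unclassified skills are left out of this sketch
        if cls not in best or count > best[cls][1]:
            best[cls] = (skill, count)

    return {s: best[class_of[s]][0] for s in kept if s in class_of}


mentions = (
    ["analyzed data"] * 12
    + ["data analysis"] * 30
    + ["wrote paper"] * 15
    + ["rare skill"] * 3
)
class_of = {
    "analyzed data": "analysis",
    "data analysis": "analysis",
    "wrote paper": "writing",
    "rare skill": "other",
}
canonical = merge_skills(mentions, class_of)
```

In this toy input, "rare skill" is discarded because it appears only 3 times, and both analysis variants map to "data analysis", the more frequent name in their class.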
Because the number of authors is large, it is highly likely that different authors share the same name, which may interfere with the experimental results. To mitigate this, and given that the dataset records each author's institution, the invention disambiguates author names following existing name disambiguation algorithms, using authors' collaborations and their research institutions as references. The rules used are as follows: if two authors with the same name have collaborated with the same author, they are considered likely to be the same person; if two authors with the same name belong to the same research institution, they are likely to be the same person. Because name disambiguation is still an open research problem without a better solution, the invention does not discuss it further.
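A minimal sketch of the two disambiguation rules, assuming each author occurrence is stored as a dictionary with `name`, `coauthors`, and `institution` keys. This record layout and the function name `same_person` are hypothetical; the patent does not specify a data format or name the existing algorithm it follows.

```python
def same_person(record_a, record_b):
    """Treat two same-name records as one person if they share a coauthor or an institution."""
    if record_a["name"] != record_b["name"]:
        return False
    shared_coauthor = bool(set(record_a["coauthors"]) & set(record_b["coauthors"]))
    same_institution = record_a["institution"] == record_b["institution"]
    return shared_coauthor or same_institution


r1 = {"name": "L. Wang", "coauthors": ["F. Xia", "X. Kong"], "institution": "Univ A"}
r2 = {"name": "L. Wang", "coauthors": ["X. Kong"], "institution": "Univ B"}
r3 = {"name": "L. Wang", "coauthors": ["Q. Zhao"], "institution": "Univ C"}

merged_12 = same_person(r1, r2)  # shared coauthor
merged_13 = same_person(r1, r3)  # nothing in common besides the name
```

The names and institutions in the example are made up for illustration only.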
Step 2: The three vertices of the hyperedge are combined pairwise, and the distance between each pair is calculated.
The data obtained in Step 1, with standardized skill names and disambiguated author names, are fed into the distance formulas, and the results are then normalized to obtain the distances between authors and fields, between authors and skills, and between skills and fields.
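The normalization formulas themselves appear only as figures in this record. As a hedged sketch, the snippet below assumes that normalization divides each raw distance by the mean of all distances of the same kind, which is consistent with the description of the avg(...) terms but is not confirmed by the text. The function name and the sample values are illustrative.

```python
def normalize_by_mean(distances):
    """Divide every raw distance by the mean of all distances (assumed form)."""
    if not distances:
        return {}
    mean = sum(distances.values()) / len(distances)
    if mean == 0:
        raise ValueError("mean distance is zero; cannot normalize")
    return {key: value / mean for key, value in distances.items()}


# Hypothetical raw author-to-field distances.
raw = {("alice", "biology"): 2.0, ("bob", "biology"): 6.0}
normalized = normalize_by_mean(raw)
```

With this choice the normalized distances have a mean of 1, which keeps the three kinds of distance on comparable scales before they are combined.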
Step 3: The weight of the hyperedge is calculated with the hyperedge weight calculation method; this weight is the author's proficiency in a specific skill in a particular field.
Following hypergraph theory, a variant of the Gaussian kernel function is used to calculate the hyperedge weight, with the following formula:
Figure GDA0003096019640000061
where d(j, s) is the distance between the author and the skill, d(j, f) is the distance between the author and the field, d(f, s) is the distance between the field and the skill, σ_js is the mean of the distances between all authors and skills, σ_jf is the mean of the distances between all authors and fields, and σ_fs is the mean of the distances between all fields and skills.
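The exact weight formula is given only as a figure in this record. The sketch below therefore uses one common Gaussian-kernel-style form, the exponential of the negative sum of squared distance-to-σ ratios; this specific form is an assumption and may differ from the patented formula. The function name is illustrative.

```python
import math


def hyperedge_weight(d_js, d_jf, d_fs, sigma_js, sigma_jf, sigma_fs):
    """Gaussian-kernel-style weight over three pairwise distances (assumed form)."""
    exponent = (
        (d_js / sigma_js) ** 2
        + (d_jf / sigma_jf) ** 2
        + (d_fs / sigma_fs) ** 2
    )
    return math.exp(-exponent)


# Hypothetical normalized distances with all sigma values set to 1.
w = hyperedge_weight(0.5, 1.0, 1.5, sigma_js=1.0, sigma_jf=1.0, sigma_fs=1.0)
```

Whatever the exact variant, dividing each distance by its own σ keeps the three terms on comparable scales before they are combined into a single weight.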
To present an author's skill distribution concretely and concisely, the invention uses radar charts. FIG. 2 and FIG. 3 give example radar charts of author skill distributions; each circle in the figures corresponds to one field.
Step 4: The above process is applied to each year in turn to obtain how an author's specific skill in a particular field changes over time.
The PLOS ONE data are divided by year into sub-data sets in chronological order, giving 12 sub-data sets covering 2006 to 2017.
Steps 2 and 3 are applied to the sub-data set of each year, and each author's skill levels in the different years are extracted from the results, giving the change over time of the author's different skills in different fields.
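The yearly split can be sketched as below. Following the statement in the Disclosure that each sub-data set stores the data of that year and of earlier years, the subsets here are cumulative. The record layout (a dictionary with `id` and `year` keys) and the function name are hypothetical.

```python
def cumulative_subsets(papers, first_year=2006, last_year=2017):
    """Return {year: papers published in that year or earlier}."""
    return {
        year: [p for p in papers if p["year"] <= year]
        for year in range(first_year, last_year + 1)
    }


papers = [
    {"id": "p1", "year": 2006},
    {"id": "p2", "year": 2010},
    {"id": "p3", "year": 2017},
]
subsets = cumulative_subsets(papers)
ids_2010 = [p["id"] for p in subsets[2010]]
```

Steps 2 and 3 would then be run once per entry of `subsets`, so each year's skill levels reflect everything published up to that year.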
To show how an author's skills change over time, the invention combines the author's skill levels and the years into a line graph; FIG. 6 gives such a line graph of an author's skills over time. Because there are many skills and each author has multiple skill levels that change over time, patterns are hard to discern. The author's highest skill level in each year is therefore extracted and plotted against time, giving the scatter plots shown in FIG. 4 and FIG. 5.
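Extracting the highest skill level per author and year could look like the following sketch. The input layout, a mapping from (author, year, field, skill) to a hyperedge weight, and the function name are assumptions made for illustration, and the sample weights are made up.

```python
def top_skill_per_year(levels):
    """Map (author, year) to the (skill, weight) pair with the highest weight."""
    best = {}
    for (author, year, _field, skill), weight in levels.items():
        key = (author, year)
        if key not in best or weight > best[key][1]:
            best[key] = (skill, weight)
    return best


# Hypothetical hyperedge weights for one author over two years.
levels = {
    ("alice", 2006, "biology", "data analysis"): 0.40,
    ("alice", 2006, "biology", "writing"): 0.25,
    ("alice", 2007, "biology", "data analysis"): 0.45,
    ("alice", 2007, "biology", "writing"): 0.60,
}
top = top_skill_per_year(levels)
```

The resulting (year, weight) pairs for one author are the kind of points that a scatter plot of highest skill level against time would show.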

Claims (1)

1. An author skill evaluation method based on a hypergraph, characterized by comprising the following steps:
step 1): the author, the skill the author used in a paper, and the field of the paper are combined into a hyperedge; all papers the author took part in are counted and the skill types are merged, yielding statistics on authors, skills, and paper fields; the information on the papers published by the author is collected, with the author, the skill, and the field serving as the three vertices of a hyperedge;
step 2): the three vertices of the hyperedge are combined pairwise, and the distance between each pair of vertices is calculated using the following formulas:
distance between author j and field f:
Figure FDA0003096019630000011
where n is the number of authors in the field, n_f is the total number of papers in the field, c_i is the citation count of paper i, and h_j is the H-index of author j;
the distance between author j and field f is normalized to facilitate subsequent data processing; the normalization formula is as follows:
Figure FDA0003096019630000012
where avg(dis(field)) is the mean of the distances between all authors and the field, i.e., the computed distances are summed and then averaged;
distance between author j and skill s:
Figure FDA0003096019630000013
where n is the number of papers in which the author used the skill, c_i is the citation count of paper i, h_j is the H-index of author j, and n_is is the number of authors who contributed the skill in paper i;
the distance between author j and skill s is normalized using the following formula:
Figure FDA0003096019630000014
where avg(dis) is the mean of the distances between all authors and skills, i.e., the computed distances are summed and then averaged;
distance between field f and skill s:
Figure FDA0003096019630000021
where n_f is the total number of papers in the field, n_s is the total number of papers containing the skill, and n_fs is the number of fields that contain the skill;
the distance between field f and skill s is normalized using the following formula:
Figure FDA0003096019630000022
where avg(field, skill) is the mean of the distances between all fields and skills, i.e., the computed distances are summed and then averaged;
step 3): the weight of the hyperedge is calculated with a hyperedge weight calculation method; this weight is the author's proficiency in a specific skill in a particular field; following hypergraph theory, the hyperedge weight is calculated with a variant of the Gaussian kernel function that combines the three distances from step 2), yielding the author's skill level parameter in a particular field;
the hyperedge weight is calculated using the following formula:
Figure FDA0003096019630000023
where d(x, y) is the distance between any two of the author, the field, and the skill, and σ is the mean of the corresponding distances;
Figure FDA0003096019630000024
is the level of skill s of author i in field f;
step 4): the above process is applied to each year in turn to obtain how an author's specific skill in a particular field changes over time; an author's research career contains various turning points, such as moving to a different research institution or changing research direction; knowing how each of the author's skills changes at different times makes it possible to study how the author's skills evolve and thus to discover latent patterns in scientific research; to this end, the data set is divided by year into sub-data sets, each holding the data of that year and of all earlier years; step 2) and step 3) are repeated for each sub-data set, and each author's skill changes in each year are extracted from the results, giving the change in the author's skills over time.
CN201810316651.XA 2018-04-08 2018-04-08 Author skill evaluation method based on hypergraph Expired - Fee Related CN108510205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810316651.XA CN108510205B (en) 2018-04-08 2018-04-08 Author skill evaluation method based on hypergraph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810316651.XA CN108510205B (en) 2018-04-08 2018-04-08 Author skill evaluation method based on hypergraph

Publications (2)

Publication Number Publication Date
CN108510205A CN108510205A (en) 2018-09-07
CN108510205B CN108510205B (en) 2021-07-16

Family

ID=63381338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810316651.XA Expired - Fee Related CN108510205B (en) 2018-04-08 2018-04-08 Author skill evaluation method based on hypergraph

Country Status (1)

Country Link
CN (1) CN108510205B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862015A (en) * 2021-04-01 2021-05-28 北京理工大学 Paper classification method and system based on hypergraph neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609546A (en) * 2011-12-08 2012-07-25 清华大学 Method and system for excavating information of academic journal paper authors
CN104090936A (en) * 2014-06-27 2014-10-08 华南理工大学 News recommendation method based on hypergraph sequencing
CN105956197A (en) * 2016-06-15 2016-09-21 杭州量知数据科技有限公司 Social media graph representation model-based social risk event extraction method
CN106778011A (en) * 2016-12-29 2017-05-31 大连理工大学 A kind of scholar's influence power appraisal procedure based on academic heterogeneous network
CN107273207A (en) * 2017-05-25 2017-10-20 天津大学 A kind of related data storage method based on hypergraph partitioning algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996567B2 (en) * 2014-05-30 2018-06-12 Georgetown University Process and framework for facilitating data sharing using a distributed hypergraph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Music Recommendation by Unified Hypergraph: Combining Social Media Information and Music Content; Jiajun Bu et al.; Proceedings of the 18th ACM international conference on Multimedia; 20101029; full text *

Also Published As

Publication number Publication date
CN108510205A (en) 2018-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210716