CN108510205B

CN108510205B - A hypergraph-based method for assessing author skills

Info

Publication number: CN108510205B
Application number: CN201810316651.XA
Authority: CN
Inventors: 夏锋 (Feng Xia); 杨安东 (Andong Yang); 刘雷 (Lei Liu); 孔祥杰 (Xiangjie Kong); 于硕 (Shuo Yu); 宁兆龙 (Zhaolong Ning)
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2018-04-08
Filing date: 2018-04-08
Publication date: 2021-07-16
Anticipated expiration: 2038-04-08
Also published as: CN108510205A

Abstract

The invention belongs to the technical field of scholar skill assessment and relates to a hypergraph-based scholar skill assessment method. The method assesses, at a fine granularity, a scholar's skill level in a given field and reveals how the scholar's skills change over time. It takes into account the number and quality of papers, the differences between fields, and changes over time. Using the hypergraph concept, the method integrates scholars, fields, and skills, which allows it to provide a fine-grained evaluation scheme. The distances between scholars, fields, and skills are computed by extending traditional evaluation parameters such as paper citation counts and the H-index, which helps ensure reliability, while normalization improves computational efficiency and reduces errors. Finally, adding a time factor enables the method to analyze how scholars' fields and skills change over time, providing more "raw material" for research.

Description

Author skill evaluation method based on hypergraph

Technical Field

The invention belongs to the technical field of author skill evaluation, and relates to an author skill evaluation method based on a hypergraph.

Background

With the continuous development of science and technology, more and more people are engaged in scientific research, and this growth has in turn stimulated research on researchers themselves. Evaluating a researcher's level, areas of expertise, and publication patterns can support the formation of research teams, project investment decisions, comprehensive evaluation of authors' academic level, comparison between authors, the study of the mechanisms behind academic collaboration, and the discovery of latent patterns in scientific research, all of which benefit the progress of the academic community and of society as a whole.

Currently, author level is mostly evaluated with the H-index, together with indicators such as paper citation counts and the number of published papers. These indicators generally give an overall evaluation of an author: they cannot reveal the author's proficiency in a particular skill, nor how the author's academic level changes over a given period, which limits the scope and depth of such research.

A hypergraph is a generalized graph in which a hyperedge may contain more than two vertices. This property allows a hypergraph to fuse multiple attributes of an author, making it well suited to the three elements (author, field, and skill) that must be handled jointly in author skill research.

Disclosure of Invention

The invention addresses these shortcomings of existing research by providing an author skill evaluation method: a hypergraph-based author evaluation algorithm that analyzes each author's contribution to published papers and incorporates a time factor. The algorithm evaluates authors at a fine granularity, taking into account both the number and quality of papers and the differences between fields. It yields an author's proficiency in a specific skill within a given field, and, because time is included, it also shows how that skill changes over time.

The technical scheme of the invention is as follows:

An author skill assessment method based on hypergraphs comprises the following steps:

Step 1): Combine each author, the skill the author contributed in a paper, and the field of that paper into a hyperedge. Count all papers the author participated in and merge the skill types to obtain statistics on authors, skills, and paper fields. The hypergraph is flexible enough to integrate all three factors: for each paper published by the author, the author, the skill, and the field serve as the three vertices of a hyperedge;

a hyperedge links an author's particular skill in a particular field, and the author's proficiency in that skill within that field is reflected by the weight of the hyperedge; using a hypergraph also effectively reduces the size of the author network;

skill names in the raw data are heterogeneous, which would affect the calculations in the following steps, so the skills in the data set are merged to obtain a uniform, standardized data set;
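
The hyperedge construction in step 1) can be sketched as follows. This is a minimal Python illustration under assumed inputs: the `Paper` record, its attribute names, and the `build_hyperedges` helper are hypothetical, and each hyperedge is simply counted once per paper in which the author used the skill.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record layout; the source does not specify a schema.
    paper_id: str
    field: str                          # research field of the paper
    year: int
    citations: int
    contributions: dict[str, list[str]]  # author name -> skills used in this paper


def build_hyperedges(papers):
    """Count (author, field, skill) hyperedges over all papers.

    Each paper in which an author used a skill adds one count to the
    hyperedge {author, field, skill}.
    """
    edges = Counter()
    for paper in papers:
        for author, skills in paper.contributions.items():
            for skill in set(skills):  # merge duplicate mentions within one paper
                edges[(author, paper.field, skill)] += 1
    return edges
```

The resulting counter gives, for every author, the fields and skills that occur together, which is the raw material for the distance calculations of step 2).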

Step 2): Combine the three vertices of each hyperedge pairwise and calculate the distance between each pair of vertices. The distances between attributes are calculated with the following formulas:

distance of author j from field f:

where n is the number of the author's papers in the field, n_f is the total number of papers in the field, c_i is the citation count of paper i, and h_j is the H-index of author j;
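
The H-index h_j used above follows the standard definition: the largest h such that the author has at least h papers with at least h citations each. A small Python helper, with the hypothetical name `h_index`, is:

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # later papers have even fewer citations
    return h
```

For example, an author whose papers are cited 10, 8, 5, 4, and 3 times has an H-index of 4.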

the distance between author j and field f is normalized to facilitate subsequent data processing; the normalization formula is as follows:

where avg(dis(author, field)) is the mean of the distances between all authors and the field, i.e., the distances are summed and then averaged;
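
The normalization formula itself is not reproduced in this text. Assuming that it divides each raw distance by the average avg(dis(author, field)) over all authors of the same field, a sketch is shown below; the function name and the dictionary input are illustrative assumptions, and the same helper would apply unchanged to the author-skill and field-skill distances.

```python
def normalize_by_mean(distances):
    """Divide each raw distance by the mean of all distances (assumed normalization).

    `distances` maps a pair key, e.g. (author, field) for a single field,
    to its raw distance.
    """
    if not distances:
        return {}
    mean = sum(distances.values()) / len(distances)
    if mean == 0:
        raise ValueError("mean distance is zero; cannot normalize")
    return {pair: d / mean for pair, d in distances.items()}
```

Under this assumption a normalized value of 1 means "average distance for this field", which puts all fields on a comparable scale.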

distance of author j from skill s:

where n is the number of the author's papers in which the author uses the skill, c_i is the citation count of paper i, h_j is the H-index of the author, and n_is is the number of contributors who used skill s in paper i;

the distance between author j and skill s is normalized with the following formula:

where avg(dis(author, skill)) is the mean of the distances between all authors and skills, i.e., the distances are summed and then averaged;

distance of field f from skill s:

where n_f is the total number of papers in the field, n_s is the total number of papers containing the skill, and n_fs is the number of fields that contain the skill;
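
The field-skill distance formula is likewise not reproduced here, but its inputs are defined above. The sketch below only computes those counts, assuming each paper is given as a (field, skills) pair; the input format and the name `field_skill_counts` are hypothetical.

```python
from collections import defaultdict


def field_skill_counts(papers):
    """Compute n_f, n_s and n_fs as defined in the text.

    `papers` is a list of (field, skills) pairs, where `skills` holds the
    skills used by any contributor of that paper (hypothetical input format).
    Returns (n_f, n_s, n_fs):
      n_f[field]  -> number of papers in the field
      n_s[skill]  -> number of papers containing the skill
      n_fs[skill] -> number of distinct fields in which the skill appears
    """
    n_f = defaultdict(int)
    n_s = defaultdict(int)
    fields_of_skill = defaultdict(set)
    for field, skills in papers:
        n_f[field] += 1
        for skill in set(skills):
            n_s[skill] += 1
            fields_of_skill[skill].add(field)
    n_fs = {skill: len(fields) for skill, fields in fields_of_skill.items()}
    return dict(n_f), dict(n_s), n_fs
```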

the distance between field f and skill s is normalized using the following formula:

where avg(dis(field, skill)) is the mean of the distances between all fields and skills;

Step 3): Calculate the weight of each hyperedge with a hyperedge weight calculation method; the weight represents an author's proficiency in a specific skill within a given field. Following hypergraph theory, the hyperedge weight is calculated with a variant of the Gaussian kernel function that combines the three distances from step 2) into the author's level parameter for a specific skill in a given field;

the hyperedge weight is calculated with the following formula:

where d(x, y) is the distance between two vertices of the hyperedge (author, field, or skill), and σ is the mean of the corresponding distances;

and the resulting weight is the level value of skill s of author i in field f;

Step 4): Apply the above process to each year to obtain how an author's specific skill in a given field changes over time. An author's research career contains various turning points, such as moving to another research institution or changing research direction; knowing how each of the author's skills changes at different times makes it possible to study the evolution of those skills and thereby discover latent patterns in scientific research. To this end, the data set is divided cumulatively by year: each yearly subset contains the data for that year and all earlier years. Steps 2) and 3) are repeated for each subset, and the skill changes of each author in each year are extracted from the results, which shows how the author's skills change over time.

The beneficial effects of the invention are as follows: the hypergraph-based author skill assessment takes into account the number and quality of papers, the differences between fields, changes over time, and so on. Using the hypergraph concept, the method fuses authors, fields, and skills and can therefore provide a fine-grained assessment. The distances between authors, fields, and skills are computed by extending traditional evaluation parameters such as paper citation counts and the H-index, which helps ensure reliability, while normalization improves computational efficiency and reduces errors.

By adding a time factor, the invention can analyze how an author's fields and skills change over time, providing more raw material for follow-up research.

Drawings

Fig. 1 is a flow chart of the preprocessing performed on the PLOS ONE dataset according to the experimental requirements of the present invention.

Fig. 2 is author skill radar chart (a) from the final results of the present invention.

Fig. 3 is author skill radar chart (b) from the final results of the present invention.

Fig. 4 is schematic diagram (a) of an author's highest annual skill level over time.

Fig. 5 is schematic diagram (b) of an author's highest annual skill level over time.

Fig. 6 is an example graph of an author's skills over time.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.

The embodiment of the invention provides an author skill evaluation method based on a hypergraph, which comprises the following steps:

Step 1: Select the PLOS ONE dataset as the experimental dataset of the method and preprocess it; the process is shown in Fig. 1.

To capture each author's contribution to a paper, i.e., the skills the author used in that paper, the invention uses the PLOS ONE dataset. The raw data of the dataset are as follows:

Table 1 PLOS ONE dataset

As Table 1 shows, the number of distinct skills is very large, probably because there is no standard naming convention for skills, so similar skills are expressed in different ways. Using the original skill names directly would lead to inaccurate and redundant results.

The invention therefore compiled statistics on the skills and found that 10342 skills appear fewer than 10 times, while 21 skills show a sharp jump in frequency. Skills appearing fewer than 10 times are discarded, and the remaining skill names are classified around the 21 high-frequency skills, finally yielding 16 skill classes, each represented by its most frequent skill name.
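
The filtering and relabeling described above can be sketched as follows. The threshold of 10 occurrences and the choice of the most frequent name as each class label come from the description; the mapping from raw names to classes (`skill_to_class`) is not published, so it is passed in as a hypothetical input, and `merge_skills` is an illustrative name.

```python
from collections import Counter


def merge_skills(skill_mentions, skill_to_class, min_count=10):
    """Drop rare skill names and relabel the rest by their class representative.

    skill_mentions: raw skill names, one entry per occurrence.
    skill_to_class: hypothetical mapping raw name -> class id.
    Names occurring fewer than `min_count` times, or without a class, are
    discarded. Each class is labeled with its most frequent member name.
    """
    freq = Counter(skill_mentions)
    kept = {name: n for name, n in freq.items()
            if n >= min_count and name in skill_to_class}
    # Choose the most frequent raw name in each class as the class label.
    best = {}
    for name, n in kept.items():
        cls = skill_to_class[name]
        if cls not in best or n > kept[best[cls]]:
            best[cls] = name
    return {name: best[skill_to_class[name]] for name in kept}
```

The returned dictionary maps every surviving raw name to its standardized label, so it can be applied to each (author, skill) record before the hyperedges are built.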

Because the number of authors is large, different authors are quite likely to share the same name, which may interfere with the experimental results. To mitigate this, and given that the dataset records each author's institution, the invention follows existing name-disambiguation algorithms and uses each author's collaborators and research institution as references. The disambiguation rules are: if two authors with the same name have collaborated with the same author, it is reasonable to assume they are likely the same person; if two authors with the same name belong to the same research institution, they are also likely the same person. Because name disambiguation is still an open research problem with no better solution at present, the invention does not discuss it further.
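
The two disambiguation rules can be expressed as a simple predicate plus a grouping step. This is a sketch under assumptions: the `AuthorRecord` structure is hypothetical, and merging every pair that satisfies either rule transitively (via union-find) is one possible way to apply the rules, not necessarily the one used by the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical representation of one occurrence of an author name.
    name: str
    institution: str
    coauthors: frozenset


def likely_same_person(a, b):
    """Same name AND (a shared coauthor OR the same institution)."""
    if a.name != b.name:
        return False
    return bool(a.coauthors & b.coauthors) or a.institution == b.institution


def group_records(records):
    """Group records transitively with union-find using the rule above."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if likely_same_person(records[i], records[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())
```

The pairwise loop is quadratic, which is acceptable when it is run only within each set of records that share a name.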

Step 2: Combine the three vertices of each hyperedge pairwise and calculate the distances between them.

Using the standardized skill names from Step 1 and the disambiguated author data, compute the distances with the distance formulas and then normalize the results to obtain the author-field, author-skill, and field-skill distances.

Step 3: Calculate the hyperedge weight with the hyperedge weight calculation method; the weight is an author's proficiency in a specific skill within a given field.

Following hypergraph theory, the hyperedge weight is calculated with a variant of the Gaussian kernel function, as follows:

where d(j, s) is the distance between the author and the skill, d(j, f) is the distance between the author and the field, d(f, s) is the distance between the field and the skill, σ_js is the mean of all author-skill distances, σ_jf is the mean of all author-field distances, and σ_fs is the mean of all field-skill distances.
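
The exact kernel expression is not reproduced in this text. One plausible reading, stated here purely as an assumption, scales each distance by its mean σ and applies a Gaussian-style exponential decay to the sum of the squared scaled distances; the function name `hyperedge_weight` is hypothetical.

```python
import math


def hyperedge_weight(d_js, d_jf, d_fs, sigma_js, sigma_jf, sigma_fs):
    """Assumed Gaussian-kernel variant combining the three pairwise distances.

    Each distance is divided by the mean distance of its type (sigma_*),
    and the weight is exp(-(sum of squared scaled distances)).
    Smaller distances give weights closer to 1.
    """
    exponent = ((d_js / sigma_js) ** 2
                + (d_jf / sigma_jf) ** 2
                + (d_fs / sigma_fs) ** 2)
    return math.exp(-exponent)
```

If the distances have already been divided by their means, all three σ values are 1. Under this assumed form a higher weight corresponds to shorter distances, i.e. higher proficiency; whether the original formula has the same orientation cannot be confirmed from the text.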

To present authors' skill distributions concretely and concisely, the invention uses radar charts. Figs. 2 and 3 give example radar charts of author skill distributions, with one circle per field.

Step 4: Apply the above process to each year to obtain how an author's specific skill in a given field changes over time.

The invention divides the PLOS ONE data cumulatively by year, yielding 12 sub-datasets covering 2006 to 2017.

Steps 2 and 3 are applied to each yearly subset, and the skill level of each author in each year is extracted from the results, which shows how the author's different skills in different fields change over time.
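
The cumulative yearly split can be sketched as below, assuming each record carries a publication year; `cumulative_yearly_subsets` and the `year_of` accessor are illustrative names. Each subset would then be passed through Steps 2 and 3.

```python
def cumulative_yearly_subsets(records, year_of, start=2006, end=2017):
    """Return {year: records published in or before that year} for start..end.

    `year_of` is a function that extracts the publication year from a record.
    """
    ordered = sorted(records, key=year_of)
    subsets = {}
    for year in range(start, end + 1):
        subsets[year] = [r for r in ordered if year_of(r) <= year]
    return subsets
```

With the default range this produces the 12 subsets from 2006 to 2017 mentioned above.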

To show how an author's skills change over time, the invention plots skill level against year as a line graph; Fig. 6 gives an example for one author. Because there are many skills and each author has many skill levels changing over time, patterns are hard to see, so the author's highest skill level in each year is extracted and plotted against time, giving the scatter plots shown in Figs. 4 and 5.
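
Extracting the highest skill level per year, as used for these scatter plots, can be sketched as follows; the nested-dictionary input format and the function name `max_skill_per_year` are assumptions.

```python
def max_skill_per_year(levels):
    """Keep the highest-level (field, skill) pair for each year of one author.

    levels: {year: {(field, skill): level}}.
    Returns [(year, (field, skill), level)] sorted by year; years with no
    data are skipped.
    """
    series = []
    for year in sorted(levels):
        by_skill = levels[year]
        if not by_skill:
            continue
        best = max(by_skill, key=by_skill.get)  # pair with the largest level
        series.append((year, best, by_skill[best]))
    return series
```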

Claims

1. A hypergraph-based method for assessing author skills, characterized by the following steps:

Step 1): Combine the author, the author's skills in a paper, and the field of that paper into a hyperedge; count all papers the author participated in, merge the skill types, and obtain statistics on authors, skills, and paper fields; the information on papers published by the author is counted with author, skill, and field as the three vertices of the hyperedge;

Step 2): Combine the three vertices of the hyperedge pairwise and calculate the distance between each pair of vertices; the distance between each pair of attributes is calculated with the following formulas:

The distance between author j and domain f:

where n is the number of the author's papers in the field, n_f is the total number of papers in the field, c_i is the citation count of paper i, and h_j is the H-index of the author;

The distance between author j and domain f is normalized to facilitate subsequent data processing; the normalization formula is as follows:

where avg(dis(author, field)) is the mean of the distances between all authors and the field, i.e., the distances are summed and then averaged;

The distance between author j and skill s:

where n is the number of papers in which the author uses the skill, c_i is the citation count of paper i, h_j is the H-index of the author, and n_is is the number of contributors who used the skill in paper i;

The distance between author j and skill s is normalized using the following formula:

where avg(dis(author, skill)) is the mean of the distances between all authors and skills, i.e., the distances are summed and then averaged;

The distance between domain f and skill s:

where n_f is the total number of papers in the field, n_s is the total number of papers with the skill, and n_fs is the number of fields with the skill;

The distance between domain f and skill s is normalized using the following formula:

where avg(dis(field, skill)) is the mean of the distances between all fields and skills;

Step 3): Calculate the weight of the hyperedge with the hyperedge weight calculation method; the weight is an author's proficiency in a specific skill within a given field. According to hypergraph theory, a variant of the Gaussian kernel function is used to calculate the hyperedge weight, linking the three distances from step 2) to obtain the author's level parameter for the skill in that field;

The hyperedge weights are calculated using the following formula:

where d(x, y) is the distance between two of the vertices (author, field, skill), and σ is the mean of the corresponding distances;

and the resulting weight is the level value of skill s of author i in field f;

Step 4): Apply the above process to each year to obtain how the author's specific skills in a given field change over time. An author's research career contains various turning points, such as changing research institutions or research direction; knowing how the author's skills change at different times makes it possible to study their evolution and thereby discover latent patterns in scientific research. To achieve this goal, the data set is divided cumulatively by year, with each yearly sub-dataset storing the data for that year and all earlier years; steps 2) and 3) are repeated for each subset, and the skill changes of each author in each year are extracted from the results, giving the changes in the author's skills over time.