CN117252481A - Manuscript-examining expert evaluation method based on supervised learning, computer program and device - Google Patents
- Publication number
- CN117252481A (Application number CN202311344985.5A)
- Authority
- CN
- China
- Prior art keywords
- manuscript
- model
- data
- examining
- evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
A supervised-learning-based review-expert evaluation method scores and predicts review experts by decomposing and calculating reviewing-behavior elements such as activity level, review quality and review speed. Its core steps are: determining evaluation indexes and data requirements; collecting and processing data; confirming validity information and extracting features; determining a weight for each evaluation index; constructing an evaluation model from the extracted features, validity information and weights; and evaluating and optimizing the model. The method improves and optimizes the review evaluation system and provides more reliable support for assessing the quality of scientific papers.
Description
Technical Field
The invention relates to the field of computer applications, and in particular to a supervised-learning-based method for evaluating review experts, together with a related computer program and an electronic device.
Background
The academic quality of scientific papers is a core element of academic journals, so the peer-review evaluation system plays a crucial role in the whole publishing process. However, current review evaluation systems still face many challenges: (1) Subjectivity and inconsistency: the evaluation of paper quality often relies on the subjective judgment of editors and review experts, whose personal views and preferences directly influence the result. In addition, different journals and academic institutions apply different evaluation criteria and indexes to judge manuscript quality; this inconsistency of standards further distorts reviewers' judgments and burdens both authors and reviewers. (2) Lack of evaluation indexes for review experts: without objective, quantifiable criteria, and given the complexity of analyzing reviewing-behavior data, conventional methods struggle to process large volumes of review data, making it difficult to evaluate review experts objectively and thereby identify excellent reviewers. (3) Lack of predictive capability: existing evaluation systems only provide feedback on a reviewer's current performance and can hardly predict future performance, which limits the ability of academic journals to optimize their pool of review experts.
In view of the above limitations of the background art, the present invention aims to provide a supervised-learning-based review-expert evaluation method that overcomes these problems and improves the objectivity, consistency and accuracy of evaluation. By learning the scoring patterns of review experts, the invention is expected to improve and optimize the review evaluation system and to provide more reliable support for assessing the quality of scientific papers.
Disclosure of Invention
The invention provides a supervised-learning-based review-expert evaluation method that not only scores review experts but can also predict their scores. Through supervised learning, the method automatically learns reviewers' scoring patterns and thereby achieves more accurate, objective and consistent evaluation results.
The implementation steps of the invention are as follows:
step S1: and determining evaluation indexes and data requirements. In the embodiment of the invention, a series of evaluation indexes for evaluating the manuscript specialist are definitely defined, wherein the evaluation indexes comprise key dimensions such as the liveness of a manuscript inspector, the manuscript quality, the manuscript inspection speed and the like. The following are the detailed steps:
1.1 defining evaluation criteria including, but not limited to: liveness: the participation frequency and the manuscript number of the manuscript inspector. The manuscript quality: the detail degree, quality and accuracy of the manuscript opinion. Manuscript-examining speed: the time required for the manuscript to be finished by the manuscript inspector.
1.2 identify data requirements. Based on the evaluation index, relevant data to be extracted from the journal examination manuscript system is determined, including but not limited to: the historical manuscript examination data of the manuscript examination person comprises manuscript examination opinion content, manuscript examination time, manuscript examination conclusion and the like. Related information of manuscripts, such as manuscript type, manuscript examining period, manuscript examining state and the like.
1.3 integrate journal examination manuscript system. In order to acquire necessary manuscript examination data, a data connection or interface is established with a journal manuscript examination system, so that historical manuscript examination data of a manuscript examination person is extracted and used for calculation of subsequent evaluation indexes and ranking of manuscript examination specialists.
1.4 determining a data processing method. The method for cleaning, preprocessing and converting the extracted data is clear, so that the accuracy and consistency of the data are ensured.
1.5 data collection plan. A detailed data collection plan is formulated, including the frequency of data acquisition, data updating strategies, and data interaction modes with the journal approval system.
Step S2: data collection and processing. The purpose of this step is to acquire the data needed for evaluation-index calculation and to process it accordingly. The detailed sub-steps are:
2.1 Data collection: extract reviewers' historical review data from the journal's peer-review system according to the data-collection plan of step S1, including review opinions, review timestamps, review conclusions and related manuscript information.
2.2 Data time range: the collection window covers, for example, the review records and related information of the last five years.
2.3 Data cleaning and preprocessing: clean, de-duplicate and reformat the data extracted from the peer-review system to ensure accuracy and consistency, including but not limited to removing duplicate records, handling missing values and normalizing formats.
2.4 Data conversion: convert the cleaned and preprocessed data into a format suitable for index calculation; this may include deriving, for each individual reviewer, the measurement variables required by the evaluation indexes.
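The cleaning sub-step 2.3 can be illustrated with a minimal Python sketch. The record fields, the choice to drop records with missing opinions, and the lower-casing of conclusions are illustrative assumptions, not requirements taken from the patent.

```python
# Illustrative raw review records; field names are assumptions for this sketch.
raw_records = [
    {"reviewer": "A", "opinion": "Solid methods section.", "days": 9,  "conclusion": " Accept "},
    {"reviewer": "A", "opinion": "Solid methods section.", "days": 9,  "conclusion": " Accept "},  # duplicate
    {"reviewer": "B", "opinion": None,                     "days": 30, "conclusion": "reject"},    # missing value
]

def clean(records):
    """De-duplicate, drop records with missing opinions, and normalize formats."""
    seen, out = set(), []
    for r in records:
        key = (r["reviewer"], r["opinion"], r["days"], r["conclusion"])
        if key in seen or r["opinion"] is None:  # remove duplicates and handle missing values
            continue
        seen.add(key)
        # Format normalization: trim whitespace, lower-case the conclusion label.
        out.append({**r, "conclusion": r["conclusion"].strip().lower()})
    return out

cleaned = clean(raw_records)
```

Here the duplicate record is dropped and the record with a missing opinion is excluded, leaving one normalized record.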
Step S3: confirm validity information and extract features. In this step, validity information is screened for each reviewer and features are extracted.
3.1 Confirm validity information. The historical review data of each reviewer is analyzed against predetermined criteria to determine validity. This may include checking the reviewer's participation frequency and number of reviews against the validity requirements: for example, a reviewer may be judged invalid based on whether there is any record of a returned review, or only rejected invitations, in the past 36 months. Invalid reviewers do not participate in the subsequent index calculations.
3.2 Extract features. Features used for evaluation are extracted from each valid reviewer's historical review data, including: information required by the evaluation indexes, such as review-opinion length and review time; features related to the reviewer's review quality and review speed; and the measurement variables defined by the specific evaluation indexes.
3.3 Extract and calculate behavioral variables. Reviewing-behavior variables are extracted and calculated from the historical data of each review expert, including but not limited to: the relative review-opinion length X1; the conclusion-consistency variable X2; the conclusion-variability variable X3; the review-speed variable X4; the recently-completed-reviews variable X5; and the review-completion ratio X6.
3.4 Normalize the data. The behavioral variables extracted from each reviewer's history are normalized to ensure that comparisons and composite calculations across different variables are accurate.
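The normalization of sub-step 3.4 is not specified further in the text; a common choice is min-max scaling into [0, 1], sketched here under that assumption.

```python
def min_max_normalize(values):
    """Scale a list of behavioral-variable values into [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # degenerate case: all reviewers identical
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# e.g. raw review-speed variables for four reviewers (illustrative values)
speeds = [0.5, 1.0, 2.0, 1.5]
normalized = min_max_normalize(speeds)
```

After scaling, every variable lies on the same [0, 1] range, so the composite calculations of step S4 compare like with like.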
Step S4: weight determination and model construction.
4.1 Determine the weights. In this step, a weight is assigned to each evaluation index (e.g., activity level, review quality, review speed). Weights may be set according to the needs and goals of the editorial team, or determined by data-driven methods such as analysis of historical data and feedback. Different indexes may receive different weights reflecting their importance in the evaluation.
4.2 Construct the model. Based on the extracted features, the validity information and the determined weights, an evaluation model is built that can learn the scores of review experts.
Step S5: model evaluation and optimization.
5.1 In this step, a decision-tree model is built on the training set and tested on the test set. The model is evaluated with performance metrics and cross-validation, and is then tuned according to the evaluation results to improve prediction accuracy; tuning may include adjusting model parameters, feature selection and performance optimization. Real-time and scalability checks are also performed so that the method can run in a live reviewing environment and handle large-scale review data.
5.2 Generate review-expert ranking recommendations: the trained model scores and ranks new review experts.
In some embodiments, the reviewing-behavior variables are described and calculated as follows:
the relative length variable X1 of the manuscript opinion: for evaluating the contribution of the length of the manuscript opinion of the manuscript inspector relative to the average length. This helps determine whether the manuscript opinion of the manuscript is exhaustive. The calculation method comprises the following steps: in each manuscript-examining period, calculating the ratio of the single manuscript-examining opinion length to the average manuscript-examining opinion length, and then carrying out weighted average on the result of each manuscript-examining period.
Conclusion-consistency variable X2: measures whether a reviewer's review conclusions are consistent; a reviewer whose assessments are consistent across different manuscripts scores higher. Calculation: in each review cycle, count the number of reviews with inconsistent conclusions and the total number of reviews, yielding an inconsistency ratio X2'; the consistency variable is then X2 = 1 - X2'.
Conclusion-variability variable X3: measures the difference between a reviewer's conclusions and the overall distribution of review conclusions; higher variability may lower the score. Calculation: compute the proportions of each conclusion type among the reviewer's own reviews and among all reviews, giving two conclusion-count vectors, denoted X and Y. A distance normalization of the two vectors then yields the variability variable X3, using one of the following: the Spearman correlation coefficient of the two vectors, converted into X3; the Euclidean distance of the two vectors, converted into X3; or a weighted distance D(X, Y) = Σ wi(xi - yi), where xi and yi are the elements of vectors X and Y and wi is the weight of the corresponding element, with wi set to zero whenever xi equals zero.
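Two of the X3 options can be sketched as follows. Two details are assumptions of this sketch: the weighted distance uses absolute differences (which the formula presumably intends, since a signed sum could cancel), and the distance is mapped into a bounded variable via d / (1 + d), one of several plausible conversions.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two conclusion-proportion vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_distance(x, y, w):
    """D(X, Y) = sum_i w_i * |x_i - y_i|, with w_i forced to 0 wherever x_i == 0,
    per the zero-weight rule in the text (absolute difference is an assumption)."""
    return sum((0 if a == 0 else wi) * abs(a - b) for a, b, wi in zip(x, y, w))

# Conclusion-proportion vectors: reviewer vs. overall (e.g. accept/minor/major/reject shares)
x = [0.5, 0.2, 0.2, 0.1]
y = [0.4, 0.3, 0.2, 0.1]
d = euclidean(x, y)
x3 = d / (1 + d)   # one possible conversion of a distance into a bounded variable
```

With identical vectors the distance, and hence X3, is zero; larger disagreement pushes X3 toward 1.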
Review-speed variable X4: evaluates how quickly the reviewer completes reviews; a higher review speed yields a higher score. Calculation: within each review cycle, compute the number of days each review took; compare that number with a reference number of days for each review (a preset standard or a specified period, used to judge the reasonableness and speed of the review); average the ratio (review days / reference days) over all review cycles; and normalize the average to obtain X4.
Recently-completed-reviews variable X5: examines how many reviews the reviewer completed in the recent period (e.g., the last three months). Calculation: within the specified recent period, record the number of reviews assigned to the reviewer and the number for which comments were returned (completed reviews), then compute the ratio of completed reviews to assigned reviews.
Review-completion ratio X6: examines the proportion of reviews a reviewer completed within a specific period, where "completed" means accepting the invitation and returning review comments. Calculation: within the period (e.g., the last year or last three years), record each reviewer's number of completed reviews and number of invited-but-uncompleted reviews; compute the ratio X6' = (invited-but-uncompleted reviews / completed reviews), and set X6 = 1 - X6'.
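The ratio-style variables X2, X5 and X6 reduce to simple counts; a sketch with illustrative counts follows. Note that X6, taken literally as 1 - (uncompleted / completed), can go negative when incompletions exceed completions; the text does not say how that case is handled.

```python
def consistency_x2(inconsistent, total):
    """X2 = 1 - (reviews with inconsistent conclusions / total reviews)."""
    return 1 - inconsistent / total

def recent_completion_x5(completed, assigned):
    """X5: reviews completed vs. reviews assigned in the recent window (e.g. 3 months)."""
    return completed / assigned

def completion_degree_x6(not_completed, completed):
    """X6 = 1 - (invited-but-not-completed / completed), as literally described."""
    return 1 - not_completed / completed

x2 = consistency_x2(inconsistent=2, total=10)        # 2 of 10 conclusions inconsistent
x5 = recent_completion_x5(completed=4, assigned=5)   # 4 of 5 recent assignments done
x6 = completion_degree_x6(not_completed=1, completed=4)
```

All three land in [0, 1] for typical inputs, matching the normalized variables the composite index expects.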
A review cycle is one complete reviewing process by a reviewer, i.e., the period from receiving the manuscript to submitting the review opinion. It comprises the reviewer receiving the manuscript, reading it, submitting the review opinion, and the end of the cycle.
The review-cycle threshold is the time limit within which a reviewer must complete a reviewing task; it generally refers to the maximum review cycle (e.g., 7 days, 15 days or 20 days, set by the peer-review system).
In some embodiments, the reviewer-evaluation index of steps S3 and S4 is calculated by taking, in turn, a weighted arithmetic mean or weighted geometric mean over the variables associated with the three dimensions of activity level, review quality and review speed, yielding the reviewing-behavior score. The variables related to activity level are X4, X5 and X6; those related to review quality are X1, X2 and X3; and the variable related to review speed is X4.
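The composite index can be sketched in two levels: averaging variables within each dimension, then averaging the dimensions with their weights. The within-dimension equal weights, the dimension weights (0.3/0.5/0.2) and the variable values are illustrative assumptions.

```python
import math

def weighted_arithmetic_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_geometric_mean(values, weights):
    """exp of the weighted mean of logs; requires strictly positive values."""
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)

# Normalized behavioral variables for one reviewer (illustrative values).
x = {"X1": 0.9, "X2": 0.8, "X3": 0.7, "X4": 0.6, "X5": 0.8, "X6": 0.75}

activity = weighted_arithmetic_mean([x["X4"], x["X5"], x["X6"]], [1, 1, 1])
quality  = weighted_arithmetic_mean([x["X1"], x["X2"], x["X3"]], [1, 1, 1])
speed    = x["X4"]

# Dimension weights (activity, quality, speed) are an editorial choice, per step 4.1.
score = weighted_arithmetic_mean([activity, quality, speed], [0.3, 0.5, 0.2])
```

The geometric variant penalizes reviewers with one very weak dimension more sharply than the arithmetic one, which may be why both options are offered.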
In some embodiments, step S4 includes model construction. The model may employ various machine-learning algorithms, such as linear regression, decision trees, random forests or neural networks. Construction may comprise the following steps: (a) feature engineering: further process the features, including feature selection and feature transformation, to prepare the model inputs; (b) model selection: choose a suitable machine-learning algorithm, or build several models for comparison; (c) model training: train the model on historical review data and scoring data so that it learns the scoring patterns of review experts from the data; (d) model evaluation: assess the model's performance and accuracy by cross-validation or other evaluation methods.
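Step (c) can be sketched with ordinary least squares, linear regression being one of the algorithms named above. The single composite feature, the training pairs and the score scale are illustrative assumptions standing in for the fuller feature set.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Composite behavioral index -> historical expert score (illustrative training pairs).
xs = [0.5, 0.6, 0.7, 0.8]
ys = [55.0, 62.0, 69.0, 76.0]
a, b = fit_linear(xs, ys)

def predict(x):
    return a * x + b
```

On these (perfectly linear) pairs the fit recovers slope 70 and intercept 20, so a new expert with composite index 0.9 would be predicted a score of 83.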
The model-optimization stage of step S5 tunes the model according to the evaluation results, and may include: (a) adjusting the model's hyperparameters (e.g., learning rate, regularization parameters, tree depth) to find the best combination; (b) refining the feature engineering: removing unimportant or redundant features to simplify the model and improve its generalization, or introducing new features, feature combinations, or deeper processing of text features to capture more information; (c) final model selection: several machine-learning algorithms or model structures may be tried during construction, with the best-performing one chosen as the final model; (d) ensemble-learning methods, such as random forests or gradient boosting, which combine the predictions of multiple models to improve overall performance; (e) cross-validation, e.g., partitioning the dataset into training and validation sets and rotating through different validation sets for evaluation; (f) regularization techniques, such as L1 or L2 regularization, to reduce the risk of overfitting; (g) performance metrics: measure the model with appropriate metrics such as mean squared error, accuracy or F1 score, and adjust accordingly; (h) model interpretation: determine which features most influence the final ranking, for further improvement and optimization; (i) real-time and scalability checks: ensure the model runs efficiently in a real reviewing environment, avoids performance bottlenecks and can handle large-scale review data; (j) continuous monitoring and improvement: keep monitoring the model's performance and adjust it promptly to adapt to changing requirements and data.
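The k-fold cross-validation of item (e) can be sketched generically. A trivial mean predictor stands in for the decision-tree model here, since the point is the fold rotation, not the learner; fold layout and sample scores are illustrative.

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds of near-equal size."""
    fold, rem, out, start = n // k, n % k, [], 0
    for i in range(k):
        size = fold + (1 if i < rem else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out

def cross_validate(ys, k=3):
    """Mean squared error averaged over k folds, using a mean-predictor
    baseline in place of the decision-tree model described in the text."""
    folds = k_fold_indices(len(ys), k)
    errors = []
    for val in folds:
        train = [i for i in range(len(ys)) if i not in val]
        pred = sum(ys[i] for i in train) / len(train)      # "train" the baseline
        errors.append(sum((ys[i] - pred) ** 2 for i in val) / len(val))
    return sum(errors) / k

scores = [0.7, 0.8, 0.75, 0.9, 0.6, 0.85]   # historical expert scores (illustrative)
mse = cross_validate(scores, k=3)
```

Each fold serves as the validation set exactly once, so the reported error reflects every sample while no sample is predicted by a model trained on itself.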
The expert-ranking part of step S5 includes: (a) data preparation: collect the new review experts' information and related features (e.g., X1, X2, X3, X4, X5) as model inputs; (b) feature engineering: extract and convert features from the new experts' data to prepare the model inputs; (c) model prediction: use the trained review-expert evaluation model to predict scores for the new review experts; the model may compare new experts with existing ones and assign scores according to the patterns learned from historical data; (d) ranking recommendation: use the predicted scores to generate a ranking of review experts, ordered from highest to lowest score, to identify the most suitable experts; (e) result visualization: present the generated ranking so that decision-makers and editors can clearly see the recommended experts; (f) feedback and continuous improvement: periodically collect feedback on the effectiveness and accuracy of the ranking recommendations, and keep improving and optimizing the model accordingly to raise ranking accuracy and satisfaction.
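Item (d), generating the ranking recommendation, is simply a sort on the predicted scores; the expert names and score values below are illustrative.

```python
# Predicted scores for candidate review experts (illustrative values).
predicted = {"expert_A": 0.82, "expert_B": 0.91, "expert_C": 0.77}

def rank_reviewers(scores, top_n=None):
    """Sort reviewers by predicted score, highest first, optionally truncating
    to the top_n most suitable experts."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n] if top_n else ranked

recommendation = rank_reviewers(predicted, top_n=2)
```

The truncated list is what an editor would see as the shortlist of most suitable review experts for a new manuscript.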
Another aspect of an embodiment of the invention provides an electronic device comprising a processor that executes executable computer-program instructions stored in a memory to implement the reviewing-behavior-based review-expert evaluation method.
Yet another aspect of the embodiments provides a computer program product comprising program instructions that, when executed by a processor, implement the reviewing-behavior-based review-expert evaluation method.
The embodiment of the invention provides a supervised-learning-based review-expert evaluation algorithm that can quickly identify and filter out invalid reviewers through the validity check. This step both improves reviewing efficiency and shortens the review cycle: invalid reviewers often cause unnecessary delays and wasted resources, and identifying and removing them keeps the review process efficient. Second, the algorithm comprises not only quantitative evaluation but also supervised learning and model optimization. Quantitative evaluation provides a concrete numerical score for each reviewer's performance; these scores help determine reviewers' overall capability for ranking and selection. The supervised-learning component trains models on historical review data and associated scoring data to predict reviewers' scores, allowing the model to learn reviewers' scoring patterns from data and so improve evaluation accuracy. Model optimization may involve hyperparameter tuning, feature engineering, performance evaluation and model selection, all aimed at improving the model's ability to evaluate reviewer performance accurately. In summary, the algorithm is a comprehensive optimization approach that combines multiple techniques and methods to maximize the efficiency of the review evaluation system, including improving reviewing efficiency, shortening the review cycle and improving review quality. It leverages data science and machine-learning techniques to help editors and decision-makers select and manage review experts more intelligently throughout the reviewing process.
Drawings
Fig. 1 is a flowchart of the manuscript-examining expert evaluation method based on manuscript-examining behavior.
Detailed Description
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a manuscript-examining expert evaluation method based on manuscript-examining behavior, used to assist journal editors by providing ranking recommendations of manuscript-examining experts (reviewers), the method comprising the steps of:
S1, data collection and processing. Historical review data of each reviewer are acquired and processed, including basic review information, review opinions, review timestamps, review conclusions and the like. A data collection time range is determined, for example, review records and related information from the last three years. The data extracted from the journal review system are cleaned, de-duplicated and formatted to ensure accuracy and consistency.
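The cleaning and de-duplication step of S1 can be sketched as follows; the record field names (`reviewer_id`, `manuscript_id`, `review_date`) are illustrative assumptions, not names taken from any specific journal system:

```python
from datetime import date, timedelta

def clean_review_records(records, today, years=3):
    """Deduplicate review records, drop rows with missing required fields,
    and keep only records inside the data collection window."""
    cutoff = today - timedelta(days=365 * years)
    seen, cleaned = set(), []
    for r in records:
        key = (r.get("reviewer_id"), r.get("manuscript_id"))
        # drop rows missing required fields, and exact duplicates
        if None in key or r.get("review_date") is None or key in seen:
            continue
        if r["review_date"] < cutoff:  # outside the 3-year window
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned
```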
S2, determining validity information. A reviewer with no review record within the last 36 months is marked as an invalid reviewer and does not participate in the subsequent index calculation.
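The validity judgment of S2 reduces to a date comparison; a minimal sketch, assuming months are approximated as 30 days:

```python
from datetime import date, timedelta

def is_valid_reviewer(last_review_date, today, months_inactive=36):
    """A reviewer with no completed review within the last `months_inactive`
    months is flagged invalid and excluded from index calculation."""
    if last_review_date is None:  # no review record at all
        return False
    return last_review_date >= today - timedelta(days=30 * months_inactive)
```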
S3, feature extraction. Features for evaluation, such as review-opinion length, review time, conclusion consistency, conclusion variability and review speed, are extracted from each reviewer's historical review data. These features are used in the subsequent evaluation index calculations; for example, for reviewer A, X1 to X6 are calculated as follows:
a) Relative length variable X1 of the review opinion
Reviewer A reviewed 3 manuscripts in the last 5 years, denoted manuscripts 1, 2 and 3 respectively.
(1) Manuscript 1: 3 reviewers in total (initial review and re-review combined as one), average opinion length L0 = 319 words, reviewer A's opinion length L = 200 words. A score is assigned from the relation between L and L0 according to the following table:
L >= 150%*L0 | 5 |
150%*L0 > L >= 120%*L0 | 4 |
120%*L0 > L >= 80%*L0 | 3 |
80%*L0 > L >= 50%*L0 | 2 |
50%*L0 > L | 1 |
Reviewer A's score on manuscript 1 is therefore 2; similarly, the score on manuscript 2 is 4 and on manuscript 3 is 5 (manuscript 2: 5 reviewers in total, A's opinion length 300 words, average opinion length 240 words; manuscript 3: 3 reviewers in total, A's opinion length 500 words, average opinion length 300 words).
(2) The weighted average of reviewer A's scores is calculated and normalized to obtain X1.
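The X1 scoring table and normalization above can be sketched as follows; equal weights are assumed in the sketch, since the weights of the weighted average are not specified in the text:

```python
def length_score(L, L0):
    """Map a review-opinion length L to a 1-5 score relative to the
    average opinion length L0, following the table above."""
    r = L / L0
    if r >= 1.5:
        return 5
    if r >= 1.2:
        return 4
    if r >= 0.8:
        return 3
    if r >= 0.5:
        return 2
    return 1

def x1(scores, weights=None):
    """Weighted mean of per-manuscript scores, normalized from the
    1-5 scale to [0, 1]; equal weights are an assumption here."""
    if weights is None:
        weights = [1] * len(scores)
    avg = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return avg / 5
```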
b) Conclusion consistency variable X2
Reviewer A reviewed 3 times with 2 inconsistent conclusions, so the inconsistency variable X2' = 2/3 = 0.667 and the conclusion consistency variable X2 = 1 - X2' = 0.333.
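The consistency variable is a simple complement of the inconsistency rate:

```python
def x2(total_reviews, inconsistent_reviews):
    """Conclusion consistency variable: 1 minus the inconsistency rate."""
    return 1 - inconsistent_reviews / total_reviews
```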
c) Conclusion variability variable X3 of reviewer A
(1) Calculate the ratio of each conclusion among reviewer A's reviews; over the 5 possible conclusions the result is A(accept, minor revision, major revision, re-review, reject) = (1/3, 1/3, 0, 0, 1/3). The overall conclusion ratio is taken as the uniform average, i.e., T(accept, minor revision, major revision, re-review, reject) = (1/5, 1/5, 1/5, 1/5, 1/5).
(2) Calculate the difference between reviewer A's review conclusions and the overall conclusions, where the distance is computed from the arithmetic averages of the conclusions:
where sign(x) is the sign function: sign(x) = 1 when x > 0, sign(x) = 0 when x = 0, and sign(x) = -1 when x < 0.
(3) The resulting distance does not need to be normalized.
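The exact distance formula is given only as an equation image in the original; the sketch below assumes, purely as one plausible reading consistent with the sign() definition above, a sign-weighted sum of squared differences between reviewer A's conclusion distribution and the overall distribution T:

```python
def sign(x):
    """Sign function: 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return 1 if x > 0 else (-1 if x < 0 else 0)

def x3(a_ratios, t_ratios):
    """Assumed signed distance between reviewer A's conclusion-ratio
    vector and the overall vector T (the patent's exact formula is
    not reproduced in the text)."""
    return sum(sign(a - t) * (a - t) ** 2 for a, t in zip(a_ratios, t_ratios))
```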
d) Manuscript-examining speed variable X4
(1) Calculate the review-speed score
Over a 5-year window, screen reviewer A's review duration D and the corresponding planned days D0:
Manuscript 1: planned days 28, actual review days 4;
Manuscript 2: planned days 60, actual review days 32;
Manuscript 3: planned days 10, actual review days 16.
A score is assigned according to the ratio of D to D0 by the following table:
D <= 1/2*D0 | 5 |
1/2*D0 < D <= D0 | 4 |
D0 < D <= 3/2*D0 | 3 |
3/2*D0 < D <= 2*D0 | 2 |
2*D0 < D | 1 |
That is, reviewer A's scores on the 3 reviewed manuscripts are 5, 4 and 2, respectively.
(2) The normalized mean score is taken as variable X4.
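The speed table and the normalized mean can be sketched as follows, reproducing the worked example (scores 5, 4, 2, giving X4 = 11/15, i.e. the 73.33% used later):

```python
def speed_score(D, D0):
    """Map actual review days D against planned days D0 to a 1-5 score
    per the table above."""
    if D <= 0.5 * D0:
        return 5
    if D <= D0:
        return 4
    if D <= 1.5 * D0:
        return 3
    if D <= 2 * D0:
        return 2
    return 1

def x4(pairs):
    """Mean speed score over (D, D0) pairs, normalized to [0, 1]."""
    scores = [speed_score(D, D0) for D, D0 in pairs]
    return sum(scores) / len(scores) / 5
```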
e) Recently completed review count variable X5
Taking 24 months as the specific time period, reviewer A's screened number of review assignments is 2 and the number of returned opinions is 2 (multiple review rounds on the same manuscript count at least once). The ratio is taken as X5:
X5=R/R0=2/2=1。
f) Review completion degree X6
Reviewer A completed 3 reviews within 5 years and was invited but did not complete 2 times: X6 = 1 - 2/3 = 0.333.
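The two ratio variables above are straightforward; the X6 sketch mirrors the arithmetic of the worked example exactly (1 minus the ratio of uncompleted to completed reviews):

```python
def x5(returned_opinions, assigned_reviews):
    """Recent-activity ratio R/R0 over the chosen window (24 months in
    the example); repeated rounds on one manuscript count once."""
    return returned_opinions / assigned_reviews

def x6(completed, not_completed):
    """Completion degree as computed in the worked example:
    1 - not_completed / completed."""
    return 1 - not_completed / completed
```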
S4, model construction
In the model construction part, the invention uses a decision tree model to evaluate reviewer A, with the following concrete steps:
a) Model selection: decision tree. The decision tree model is intuitive and easy to interpret, and is suitable for evaluating performance under multiple complex factors. It helps the invention understand the impact of different features on performance and provides an interpretable score for each reviewer.
b) Feature and weight input:
The invention extracts the following features from reviewer A's historical review data and combines them according to the specified weights:
(1) Calculating the liveness index
Select the related parameters X4, X5 and X6 and calculate the weighted average with the specified weights:
Liveness = 30%*X4 + 40%*X5 + 30%*X6 = 71.98%
(2) Calculating the review quality index
Select the related parameters X1, X2 and X3 and calculate the weighted average with the specified weights:
Review quality = 40%*X1 + 40%*X2 + 20%*(1 - X3) = 54.64%
(3) Calculating the review speed index
Select the related parameter X4, which serves directly as the review speed index:
Review speed = X4 * 100% = 73.33%
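The three indices are plain weighted sums of the X variables; a sketch using the worked-example values (X4 = 11/15, X5 = 1, X6 = 0.333), where the small gap between the computed value and the 71.98% reported in the text comes from rounding X6:

```python
def liveness(x4, x5, x6):
    """Liveness index with the weights stated in the text."""
    return 0.30 * x4 + 0.40 * x5 + 0.30 * x6

def review_quality(x1, x2, x3):
    """Review quality index: 40% X1 + 40% X2 + 20% (1 - X3)."""
    return 0.40 * x1 + 0.40 * x2 + 0.20 * (1 - x3)

def review_speed(x4):
    """Review speed index is X4 itself, expressed as a percentage."""
    return x4
```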
c) Model training:
The invention uses reviewer A's historical review data as the training set, with the features and the corresponding review performance as input data. The decision tree model learns to predict reviewer A's performance score from these features.
(1) The construction of the decision tree comprises the following steps. Root node: the root node of the decision tree represents the whole sample set; here, the samples are reviewer A's historical data. Splitting nodes: at each node the decision tree selects one feature and divides the dataset into two subsets; the selected feature is determined by information gain, the Gini coefficient or the like, which helps find the feature that most effectively segments the data. Leaf nodes: when a split reaches a termination condition, a node is defined as a leaf node and splits no further; in the present case this may be, depending on the specific termination condition, that the depth of the tree reaches a certain value or that the number of samples is insufficient for further splitting.
Suppose the invention constructs a decision tree to predict whether reviewer A will meet the review-period requirement, i.e., whether the review can be completed within the prescribed 28 days. The following features are selected to construct this decision tree:
Historical average review time of reviewer A: the average review time across reviewer A's past assignments. If this time is short, it may indicate that he completes reviews quickly.
Recent review speed of reviewer A: the review speed of reviewer A in the last year. If his recent reviews are faster, it may indicate that he has performed well during this period.
Number of reviews by reviewer A: the number of reviewer A's past reviews. If he has reviewed more times, he may have rich experience in the field.
The decision tree is then built as follows:
Root node: the root node of the tree includes all of reviewer A's historical review data.
First split node: to find the best splitting feature, the decision tree may select reviewer A's historical average review time. If the average review time is below a set threshold (e.g., 10 days), the sample goes to the left subtree; otherwise, to the right subtree.
Left subtree (fast completion): if the condition at the first split node is met, this branch represents the cases in which reviewer A's average review time is short. Splitting may continue here, for example by next splitting on reviewer A's recent review speed.
Right subtree (slower completion): if the condition at the first split node is not met, this branch represents the cases in which reviewer A's average review time is long.
Leaf nodes: splitting continues until a termination condition is met (e.g., the depth of the tree reaches a certain value or the number of samples is insufficient for further splitting), at which point the node becomes a leaf. Each leaf node represents a group of review cases, and from their feature values it can be predicted whether the review will be completed within the review period.
(2) The interpretation of the decision tree includes the following. Feature importance: the decision tree model indicates the relative importance of each feature, since important features are more likely to be selected when splitting a node. Splitting rules: each node of the decision tree carries a conditional statement describing how the data are split according to a feature value; for example, if X1 is less than a certain threshold, the left subtree is entered, otherwise the right subtree. Evaluation prediction: by following a path through the tree, reviewer A's evaluation score can be predicted; the feature values are passed along the split path until a leaf node is reached and its predicted value is obtained.
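The tree-construction step above can be sketched with scikit-learn (the patent names no library, so this choice is an assumption); the feature rows and labels below are synthetic illustrations of [average review days, recent speed score, number of past reviews] predicting on-time completion:

```python
from sklearn.tree import DecisionTreeClassifier

# synthetic illustration: [avg_review_days, recent_speed_score, n_past_reviews]
X = [
    [4, 5, 30], [6, 4, 22], [8, 4, 15], [9, 5, 40],   # completed on time
    [20, 2, 3], [30, 1, 5], [25, 2, 8], [40, 1, 2],   # completed late
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = review finished within the planned period

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# feature importances reflect which features were chosen for splits
importances = model.feature_importances_
prediction = model.predict([[5, 5, 25]])[0]
```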
S5, model evaluation and optimization
The model evaluation includes the steps of:
(1) Splitting the dataset: first, the historical review data are divided into a training set and a test set. Typically, most of the data (e.g., 80%) are used to train the model, while the remaining data (e.g., 20%) are used to test its performance.
(2) Training the model: using the data in the training set, a decision tree model is constructed and trained on the selected features and weights.
(3) Testing the model: the model is tested using the data in the test set. For reviewer A, his historical review data are input into the model, which predicts whether his review period is satisfactory.
(4) Performance metrics: common metrics for evaluating model performance include Accuracy, Precision, Recall and F1-Score. For example, recall here is the proportion of reviewers who failed to complete reviews on time that the model successfully identifies, out of all reviewers who failed to complete reviews on time.
(5) Cross-validation: to ensure stability and generality of the model, the model may be evaluated using cross-validation. For example, using k-fold cross-validation, the data is divided into k subsets, and then the model is trained and tested multiple times to average the performance metrics.
(6) Adjusting model parameters: based on the results of the model evaluation, if the performance is poor, an attempt may be made to adjust the parameters of the model, change feature choices or weights, to improve the accuracy of the model.
(7) Verifying model stability: verification tests are typically required to ensure stability and performance of the model in a real environment before the model is brought on-line.
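Steps (1) to (5) above can be sketched end to end with scikit-learn (an assumed library choice); the review records below are synthetic [average review days, number of past reviews] rows, and recall is computed for the "failed on time" class (label 0) as in step (4):

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, recall_score

# synthetic review records: [avg_review_days, n_past_reviews]
X = [[d, n] for d, n in zip(range(2, 42, 2), range(20, 0, -1))]
y = [1 if d <= 20 else 0 for d, _ in X]  # 1 = usually finishes on time

# step (1): 80/20 split; steps (2)-(3): train and test the tree
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# step (4): performance metrics
acc = accuracy_score(y_te, y_pred)
rec = recall_score(y_te, y_pred, pos_label=0)  # recall on the "late" class

# step (5): 5-fold cross-validation for stability
cv_mean = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                          X, y, cv=5).mean()
```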
By training and evaluating on reviewer A's historical data, the invention obtains a decision tree model that can predict reviewer A's performance score from the features X1 to X6. This score can be used to rank reviewer A and to provide information about his performance to the editing team, supporting better assignment of review tasks and decision making. Meanwhile, the interpretability of the model lets the editing team understand the basis of the performance evaluation, so that reviewer behavior can be guided better and review quality improved.
Example 2
An electronic device, comprising a processor and a memory, the processor executing executable computer program instructions in the memory to implement the manuscript-examining expert evaluation method based on manuscript-examining behavior of Embodiment 1.
A computer program product comprising program instructions which, when executed by a processor, implement the manuscript-examining expert evaluation method based on manuscript-examining behavior of Embodiment 1.
The embodiment of the invention provides a manuscript-examining expert evaluation algorithm based on manuscript-examining behavior, a corresponding computer program and an electronic device, realizing a manuscript-examining expert evaluation algorithm based on supervised learning. First, through the validity judgment, the algorithm can quickly identify and filter out invalid reviewers. Secondly, the algorithm comprises not only quantitative evaluation but also supervised learning, model optimization and the like: supervised learning trains a model on historical review data and the associated scoring data to predict a reviewer's score, allowing the model to learn reviewers' scoring patterns from the data and improving evaluation accuracy, while model optimization may involve hyper-parameter tuning, feature engineering, performance evaluation and model selection so that reviewer performance is evaluated more accurately. In summary, the present algorithm combines a variety of techniques and methods to maximize the efficiency of the review evaluation system, including improving review efficiency, shortening the review period and improving review quality, and it leverages data science and machine learning to support editors and decision makers in selecting and managing reviewers more intelligently during the review process.
In the description of this specification, program code or instructions for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The foregoing is merely illustrative of embodiments of the present invention, but the scope of the present invention is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall be covered by the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (9)
1. A manuscript-examining expert evaluation method based on supervised learning, the method being used for providing a score for a manuscript-examining expert and predicting the manuscript-examining expert's score, the method comprising the steps of:
a) Determining evaluation indexes and data requirements, wherein the evaluation indexes comprise key dimensions such as reviewer liveness, review quality and review speed;
b) Data collection and processing, comprising extracting reviewers' historical review data from the journal review system, cleaning, de-duplicating and formatting them, and converting the data into a format suitable for calculating the evaluation indexes;
c) Confirming reviewer validity information and extracting features, including determining which reviewers meet the validity requirement and extracting the features used for calculating the evaluation indexes;
d) Behavioral variable extraction and calculation, including extracting review behavioral variables from the historical review data, including: the relative length variable X1 of the reviewer's review opinion; the conclusion consistency variable X2; the conclusion variability variable X3; the review speed variable X4; the recently completed review count variable X5; and the review completion proportion X6;
e) Weight determination and model construction, wherein the weight determination and model construction comprise the steps of determining the weight of each evaluation index and constructing a machine learning model to learn the scoring mode of the manuscript specialist.
2. The manuscript-examining expert evaluation method of claim 1, wherein the data cleaning and preprocessing in step b) include removing duplicate data, processing missing values, and format normalization.
3. The manuscript-examining expert evaluation method according to claim 1 or 2, wherein the weight determination and model construction in step e) comprise feature engineering, model selection, model training and model evaluation.
4. The manuscript-examining expert evaluation method according to claim 1 or 2, wherein the weight determination and model construction in step e) employ a machine learning algorithm comprising linear regression, decision trees, random forests or neural networks.
5. The method of claim 1 or 2, wherein step e) further comprises real-time and scalability checking of the model to ensure that it runs in a real-time review environment and processes large-scale review data.
6. The manuscript expert assessment method according to claim 1 or 2, wherein the weight of the assessment index is determined based on a data driven method, including based on historical data analysis and feedback.
7. The method of claim 1 or 2, wherein predicting the manuscript specialist score comprises scoring and ranking new manuscript specialists using a trained model.
8. A computer program for executing the supervised learning based manuscript specialist assessment method according to any of claims 1 to 7.
9. An electronic device having a central processor and a memory for executing the program of the supervised learning based manuscript specialist assessment method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311344985.5A CN117252481A (en) | 2023-10-17 | 2023-10-17 | Manuscript-examining expert evaluation method based on supervised learning, computer program and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117252481A true CN117252481A (en) | 2023-12-19 |
Family
ID=89132994
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||