CN103823896A - Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm - Google Patents

Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm

Info

Publication number
CN103823896A
Authority
CN
China
Prior art keywords
subject
project
evaluation expert
algorithm
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410092584.XA
Other languages
Chinese (zh)
Other versions
CN103823896B (en)
Inventor
王晓华
张超
张钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BENGBU MEDICAL COLLEGE
Original Assignee
BENGBU MEDICAL COLLEGE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BENGBU MEDICAL COLLEGE
Priority to CN201410092584.XA
Publication of CN103823896A
Application granted
Publication of CN103823896B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a subject characteristic value algorithm and a project evaluation expert recommendation algorithm based on it. The recommendation algorithm comprises the following steps: (1) text similarity calculation: 1) word segmentation of the text of the project research content and of the evaluation expert's research direction, 2) construction of text feature vector models for the project research content and the evaluation expert's research direction, and 3) similarity calculation between the two text feature vectors; (2) the subject characteristic value algorithm; (3) calculation of the project evaluation expert recommendation value by the formula ProSim(V, U) = w(c) × exp[sim(V, U)]; (4) sorting of the recommendation values calculated in step (3). The advantage of the invention is that, without manual intervention, a program applying the recommendation algorithm can automatically calculate the recommendation values between scientific research projects and different evaluation experts, which saves the user's time.

Description

A subject characteristic value algorithm and a project evaluation expert recommendation algorithm based on it
Technical field
The present invention relates to the field of recommendation algorithms, and specifically to a subject characteristic value algorithm and a project evaluation expert recommendation algorithm based on it, used so that a computer can automatically complete the work of recommending evaluation experts for scientific research projects.
Background art
Effective scientific research projects are a fundamental prerequisite for organizing and implementing China's scientific and technological development, and a strong guarantee for China's science and technology development strategy. In evaluating science and technology projects, the most critical point is finding "the right person for the right job", yet the current selection of evaluation experts does not achieve this. The reason is that there is at present no unified automatic selection model: selection still relies on research administrators reading the project application forms and then choosing evaluation experts by experience and intuition.
This pattern is outdated and its accuracy is open to question. In particular, when there are many application forms and the pool of candidate evaluation experts is large, research administrators are unfamiliar with the research directions and fields of expertise of some evaluation experts and tend to select the wrong ones, so that research projects with good ideas and directions are eliminated at the evaluation stage. Automatically performing optimized matching and recommendation from the information of the research project itself and of the evaluation experts is therefore a problem that urgently needs to be solved.
The core of building a scientific, reasonable and effective evaluation expert recommendation system for scientific research projects is designing a complete and effective evaluation expert recommendation algorithm. Recommendation algorithms in general are gradually maturing, and the work spans many research fields, including cognitive science, psychology, information retrieval and management. Researchers have proposed multiple recommendation methods, such as content-based recommendation, collaborative filtering recommendation and hybrid recommendation, and have implemented them with different mathematical models, such as text clustering, back-propagation neural networks and association rules.
However, these recommendation algorithms are mostly derived from commercial recommendation models and are essentially based on measurement in the two-dimensional "user-target" space. For example, the algorithm based on project research content and evaluation expert research direction computes the similarity of text feature vectors from extracted keywords and thereby ignores other relevant information. In practice, matching research projects with evaluation experts often involves other factors, the most important being selection based on subject classification at different levels.
Summary of the invention
The technical problem to be solved by the present invention is to provide a project evaluation expert recommendation algorithm that can be applied in a computer program, can calculate the recommendation value of project evaluation experts quickly, accurately and automatically, and saves manpower and time.
To solve the above technical problem, the present invention adopts the following technical solution. First, a subject characteristic value algorithm is provided. It is a subject characteristic value algorithm for projects and evaluation experts based on subject classification, and comprises the following steps:
(1) Subject modeling of the project and the evaluation expert:
According to the national standard "Subject Classification and Code", the project subject and the evaluation expert subject are modeled as vectors; each forms a feature vector with the following representation:
p={c1,c2,c3}
where c1, c2 and c3 denote the first-level, second-level and third-level subject codes in the subject classification, respectively;
(2) Calculation of the subject characteristic value of the project and the evaluation expert, using the following formula:
$$w(c) = \frac{(\mathit{Nc}_1)^n \times B_1}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_2)^n \times B_2}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_3)^n \times B_3}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n}$$
where Nc1, Nc2 and Nc3 denote the number of subjects at each level of the subject classification to which the project under evaluation belongs; B1, B2 and B3 indicate whether the project and the evaluation expert have the same subject code at the corresponding level, taking the value 1 if the codes are identical and 0 if they differ; and the exponent n is a characteristic value that expresses the penalty applied according to whether the subjects at the different levels are identical.
Existing algorithms based on project research content and evaluation expert research direction compute the similarity of text feature vectors from extracted keywords. By contrast, the subject characteristic value algorithm provided by the invention is based on the national standard "Subject Classification and Code", which is itself a scientific and reasonable classification standard. The invention compares projects and evaluation experts against this standard and can therefore calculate accurately at the subject level. The formula also takes full account of the weight of each subject level and emphasizes the degree of subject subdivision: the finer the subdivision, the larger the difference between different second-level subjects, so the result is more reasonable and effective.
Preferably, the exponent n is 2, which keeps the calculation clear and convenient.
The invention also provides a project evaluation expert recommendation algorithm based on the subject characteristic value algorithm, comprising the following steps:
(1) Text feature vector similarity calculation:
1) Word segmentation of the text of the project research content and the evaluation expert's research direction: extract keywords from the project research content and the evaluation expert's research direction and perform semantic reconstruction;
2) Construction of the text feature vector models of the project research content and the evaluation expert's research direction: use the keyword-weight-based vector space model TF-IDF algorithm, which produces a vector of weighted terms by extracting keywords and calculating the frequency with which each occurs in the target text and its inverse text frequency over the whole text collection;
3) Similarity calculation between the text feature vectors of the project research content and the evaluation expert's research direction, using the following formula:
$$sim(V, U) = \frac{\sum_{i=1}^{n} (V_i \times U_i)}{\sqrt{\sum_{i=1}^{n} (V_i)^2} \times \sqrt{\sum_{i=1}^{n} (U_i)^2}}$$
where V and U denote the n-dimensional feature vectors extracted from the project application content and the evaluation expert information, respectively; the text similarity result is obtained by calculating the cosine of the angle between the two vectors;
(2) Subject characteristic value algorithm:
1) Subject modeling of the project and the evaluation expert:
According to the national standard "Subject Classification and Code", the project subject and the evaluation expert subject are modeled as vectors; each forms a feature vector with the following representation:
p={c1,c2,c3}
where c1, c2 and c3 denote the first-level, second-level and third-level subject codes in the subject classification, respectively;
2) Calculation of the subject characteristic value of the project and the evaluation expert, using the following formula:
$$w(c) = \frac{(\mathit{Nc}_1)^n \times B_1}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_2)^n \times B_2}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_3)^n \times B_3}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n}$$
where Nc1, Nc2 and Nc3 denote the number of subjects at each level of the subject classification to which the project under evaluation belongs; B1, B2 and B3 indicate whether the project and the evaluation expert have the same subject code at the corresponding level, taking the value 1 if the codes are identical and 0 if they differ; and the exponent n is a characteristic value that expresses the penalty applied according to whether the subjects at the different levels are identical;
(3) Calculation of the project evaluation expert recommendation value, using the following formula:
ProSim(V,U)=w(c)×exp[sim(V,U)]
In the above formula, w(c) is the subject characteristic value of the project and the evaluation expert calculated in step (2); sim(V, U) is the similarity between the text feature vectors of the project research content and the evaluation expert's research direction calculated in step (1); and exp[sim(V, U)] is the exponential function with base e evaluated at sim(V, U);
(4) Sort the project evaluation expert recommendation values calculated in step (3).
Compared with the prior art, the invention has clear advantages. Existing algorithms compute only the similarity of text feature vectors built from the project research content and the evaluation expert's research direction. The project evaluation expert recommendation algorithm of the invention considers both research content and subject classification: it combines the existing text feature vector similarity algorithm with the subject characteristic value algorithm provided by the invention, and compares three aspects, namely the project research content, the evaluation expert's research direction, and the subject features of the project and the evaluation expert, so the calculated recommendation value is more scientific and reasonable. The algorithm is also well organized, with clearly defined steps, and is readily implementable: it can be deployed in a computer program by writing code, so that a program applying it can calculate project evaluation expert recommendation values automatically, quickly and accurately without manual intervention, complete the recommendation of project evaluation experts, and save manpower, resources and the user's time.
Brief description of the drawings
Fig. 1 is a block diagram of the project evaluation expert recommendation algorithm based on the subject characteristic value algorithm of the invention.
Fig. 2 is a line chart of the effect of the Top keyword percentage on the F-measure.
Fig. 3 is a line chart of the effect of different μ values on the F-measure.
Fig. 4 is a bar chart of the effect of different subject characteristic values on the F-measure.
Fig. 5 is a bar chart of the effect on the F-measure of calculating with different subject characteristic values together with the research content.
Fig. 6 is a line chart of the effect of different μ values and Top keyword percentages on the F-measure.
Fig. 7 is a bar chart of the effect on the F-measure, after data reconstruction, of calculating with different subject characteristic values together with the research content.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings:
As shown in Fig. 1, the project evaluation expert recommendation algorithm based on the subject characteristic value algorithm of the invention comprises the following steps:
(1) Text feature vector similarity calculation
1) Word segmentation of the text of the project research content and the evaluation expert's research direction
The research content vector model is built on keyword processing of the research content text. Text preprocessing in this algorithm consists mainly of word segmentation. Because of the particular way Chinese words are delimited, this embodiment uses ICTCLAS from the Chinese Academy of Sciences (http://www.ictclas.org) as the word segmentation tool. It serves two main purposes: removing stop words and performing semantic reconstruction of the extracted keywords.
Stop-word removal mainly removes common function words whose presence has no effect on the meaning of the text, for example common adverbs and prepositions, as well as specific place names, unit names or organization names that appear in certain designated texts. These words are ignored during feature selection so that they do not affect the construction of the feature vector.
The second purpose is semantic reconstruction of the extracted keywords. Research project applications contain many proper terms composed of common nouns. For example, "data mining" and "data structure" are two different terms representing two entirely different subjects. During semantic analysis, however, depending on how its rules are set, the segmenter often splits them into four words: "data", "mining", "data" and "structure". In subsequent analysis, two entirely different texts would then be marked as 50% similar, which is a very serious error. Reconstruction rules must therefore be set to distinguish the different concepts.
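The 50% figure and the effect of reconstruction can be illustrated with a minimal Python sketch. It is not the ICTCLAS rule mechanism: the rule set `COMPOUND_TERMS`, the helper names `reconstruct` and `cosine`, and the use of raw term counts are assumptions made only for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical reconstruction rules: adjacent tokens to keep together as one term.
COMPOUND_TERMS = {("data", "mining"), ("data", "structure")}

def reconstruct(tokens):
    """Merge adjacent tokens that together form a known compound term."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUND_TERMS:
            merged.append(" ".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def cosine(a, b):
    """Cosine similarity of two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

text_a = ["data", "mining"]      # segmenter output for "data mining"
text_b = ["data", "structure"]   # segmenter output for "data structure"

naive = cosine(text_a, text_b)                            # shared token "data": about 0.5
fixed = cosine(reconstruct(text_a), reconstruct(text_b))  # no shared term: 0.0
```

With the split tokens the two texts share one of two terms, which gives a cosine of one half; once the compound terms are restored they share nothing.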
In general, an evaluation expert's basic information is obtained through a questionnaire, which may include the expert's name, age, subject, research keywords and research content. When evaluation experts are selected for a project, attention usually focuses on the expert's subject, research keywords and research content, so keywords are extracted from these items and then semantically reconstructed.
2) Construction of the text feature vector models of the project research content and the evaluation expert's research direction
2.1) Construction of the text feature vector model of the project research content
After word segmentation of the research content and direction comes keyword extraction and weight calculation. The keyword-weight-based vector space model TF-IDF algorithm is used: it produces a vector of weighted terms by extracting keywords and calculating the frequency with which each occurs in the target text and its inverse text frequency over the whole text collection.
The formula is as follows:
TF-IDF(wd) = tf(wd) × idf(wd) = tf(wd) × log[N / df(wd)]
where tf(wd) is the frequency with which a given feature keyword occurs in the target text; idf(wd) is the inverse text frequency of that keyword; df(wd) is the number of texts in the whole collection in which keyword wd occurs; and N is the total number of texts in the collection. After this calculation has been performed for all target keywords, a feature vector based on keywords and keyword weights is obtained: v(t, d) = {[t1, w(d1)], [t2, w(d2)], [t3, w(d3)], …, [ti, w(di)]}, where ti, i = 1, 2, 3, …, n, are the extracted keywords; w(di) is the keyword weight obtained from the TF-IDF calculation; and v(t, d) is the feature vector formed from the keywords extracted from the whole research content.
Note that, for different texts, the keyword vector determined by TF-IDF may contain a large number of feature keywords, so the number of trusted keywords selected affects the result to some degree. It is generally held that if too few keywords are selected, the information they carry is insufficient, while if too many are selected, more noise terms may be introduced into the keyword vector, reducing the accuracy of the text similarity calculation. Tests show that for this algorithm the optimal Top keyword percentage is 60% and the optimal similarity threshold μ is 0.8.
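The weighting and Top keyword selection described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: texts are assumed to be already segmented into token lists, tf is taken as relative frequency, the logarithm is the natural logarithm, the 60% cut is rounded up, and the function name `tf_idf_vector` and the sample corpus are hypothetical.

```python
from math import ceil, log

def tf_idf_vector(doc_tokens, corpus, top_ratio=0.6):
    """Return {keyword: weight} for one tokenized text, keeping the top-weighted share."""
    n_docs = len(corpus)
    weights = {}
    for term in set(doc_tokens):
        tf = doc_tokens.count(term) / len(doc_tokens)  # frequency in the target text
        df = sum(1 for d in corpus if term in d)       # number of texts containing the term
        if df == 0:
            continue                                   # term unseen in the collection: skip
        weights[term] = tf * log(n_docs / df)          # tf x log(N / df)
    # Sort by descending weight; ties are broken alphabetically (an assumption).
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = ceil(len(ranked) * top_ratio)               # Top keyword percentage
    return dict(ranked[:keep])

# Hypothetical, already segmented and reconstructed texts.
corpus = [
    ["data mining", "cluster", "algorithm", "cluster"],
    ["data structure", "algorithm", "tree"],
    ["pattern recognition", "image", "algorithm"],
]
vec = tf_idf_vector(corpus[0], corpus)
```

In this small corpus "algorithm" occurs in every text, so its weight is zero and it falls outside the retained 60%, while "cluster" and "data mining" remain.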
2.2) Construction of the text feature vector model of the evaluation expert's research direction
In general, an evaluation expert's basic information is obtained through a questionnaire, which may include the expert's name, age, subject, research keywords and research content. When evaluation experts are selected for a project, attention usually focuses on the expert's subject, research keywords and research content. A feature vector based on research content and subject direction can therefore be built in the same way as the text feature vector of the project research content.
The feature vector of the evaluation expert's research direction is modeled much like that of the project research content. The expert's basic information is first obtained from the evaluation expert database; after word segmentation, the TF-IDF algorithm is used to extract keywords and calculate their weights, building the personal information feature vector based on the expert's keywords: u(t, d) = {[t1, w(d1)], [t2, w(d2)], [t3, w(d3)], …, [ti, w(di)]}, where ti, i = 1, 2, 3, …, n, are the extracted keywords and w(di) is the keyword weight obtained from the TF-IDF calculation.
3) Similarity calculation between the text feature vectors of the project research content and the evaluation expert's research direction
For the similarity of keyword feature vectors, the invention applies the cosine similarity method to the keyword vector models built after TF-IDF extraction; the resulting similarity clearly expresses the relationship between the feature vectors. The formula is as follows:
$$sim(V, U) = \frac{\sum_{i=1}^{n} (V_i \times U_i)}{\sqrt{\sum_{i=1}^{n} (V_i)^2} \times \sqrt{\sum_{i=1}^{n} (U_i)^2}}$$
where V and U denote the n-dimensional feature vectors extracted from the project application content and the evaluation expert information, respectively. The similarity result is obtained by calculating the cosine of the angle between the two vectors.
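A minimal Python sketch of this cosine calculation over sparse keyword vectors is given below. The dictionary representation, the function name `cosine_similarity`, the sample weights, and the choice to return 0 for an empty vector are assumptions of the example; a keyword missing from one vector is treated as a zero component, so the n dimensions are the union of both keyword sets.

```python
from math import sqrt

def cosine_similarity(v, u):
    """Cosine of the angle between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * u[t] for t, w in v.items() if t in u)  # only shared keywords contribute
    norm_v = sqrt(sum(w * w for w in v.values()))
    norm_u = sqrt(sum(w * w for w in u.values()))
    if norm_v == 0 or norm_u == 0:
        return 0.0  # assumption: an empty vector has no similarity to anything
    return dot / (norm_v * norm_u)

# Hypothetical keyword weights for one project and one evaluation expert.
project = {"data mining": 3.0, "cluster": 4.0}
expert = {"data mining": 3.0, "neural network": 4.0}

score = cosine_similarity(project, expert)  # 9 / (5 * 5) = 0.36
```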
(2) Subject characteristic value algorithm
1) Subject modeling of the project and the evaluation expert
The national standard "Subject Classification and Code" (GB/T 13745-2009) is the authoritative specification for subject classification; it classifies subjects into first, second and third levels. A first-level subject is denoted by three digits, and second- and third-level subjects by two digits each; the first- and second-level codes are separated by a dot. The code structure is XXXXXXX, for example 5702520, where 570 is the first-level subject, 25 the second-level subject and 20 the third-level subject.
The subject indicated in the project application and the subject of the evaluation expert are likewise processed by building vectors. Based on "Subject Classification and Code", the project subject and the evaluation expert subject can each form a feature vector with the following representation:
p={c1,c2,c3}
where c1, c2 and c3 denote the first-level, second-level and third-level subject codes in the subject classification, respectively.
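Splitting a code into the vector p = {c1, c2, c3} can be sketched in Python as below. The type name `SubjectVector` and the function `parse_subject_code` are hypothetical, and accepting an optional dot separator is an assumption of the example.

```python
from typing import NamedTuple

class SubjectVector(NamedTuple):
    c1: str  # first-level subject code (3 digits)
    c2: str  # second-level subject code (2 digits)
    c3: str  # third-level subject code (2 digits)

def parse_subject_code(code: str) -> SubjectVector:
    """Split a 7-digit code such as '5702520' into p = {c1, c2, c3}."""
    digits = code.replace(".", "")  # tolerate a dot after the first-level code
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"expected a 7-digit subject code, got {code!r}")
    return SubjectVector(digits[:3], digits[3:5], digits[5:7])

p = parse_subject_code("5702520")  # c1='570', c2='25', c3='20'
```

Codes are kept as strings rather than integers so that leading zeros in a level code are not lost.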
2) Calculation of the subject characteristic value of the project and the evaluation expert
No dedicated literature has yet proposed an algorithm for extracting the subject classification and comparing it as a feature of projects and evaluation experts. To use subject attributes as a feature reference value, the invention therefore proposes a characteristic value algorithm based on the full subject value. The formula is as follows:
$$w(c) = \frac{(\mathit{Nc}_1)^n \times B_1}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_2)^n \times B_2}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n} + \frac{(\mathit{Nc}_3)^n \times B_3}{(\mathit{Nc}_1 + \mathit{Nc}_2 + \mathit{Nc}_3)^n}$$
where Nc1, Nc2 and Nc3 denote the number of subjects at each level of the subject classification to which the project under evaluation belongs. For example, if the first-level subject of an application project contains 5 second-level subjects and 10 third-level subjects, then Nc1, Nc2 and Nc3 are 1, 5 and 10 respectively. B1, B2 and B3 indicate whether the project and the evaluation expert have the same subject code at the corresponding level, taking the value 1 if identical and 0 if different: B1 is 1 if the first-level subject codes of the project and the evaluation expert are identical and 0 otherwise; likewise, B2 depends on whether their second-level subject codes are identical, and B3 on whether their third-level subject codes are identical. The exponent n is a characteristic value that expresses the penalty applied according to whether the subjects at the different levels are identical.
The benefit of this design is that it emphasizes the degree of subject subdivision: the finer the subdivision, the larger the difference between different second-level subjects.
For example, 520 denotes the subject "computer science and technology" in the subject classification; 52010 through 52060, together with the separate 52099, denote 7 different second-level subjects, and under these the number of third-level subjects reaches 45.
To calculate, with this algorithm, the similarity between the subject "artificial intelligence", code 5202010, and the subject "pattern recognition", code 5202040, B1 is set to 1 and B2 is set to 1 according to their classification, while B3 is 0 because the third-level subjects differ.
Applying the formula gives:
$$w(c) = \frac{1^n \times 1}{(1 + 7 + 45)^n} + \frac{7^n \times 1}{(1 + 7 + 45)^n} + \frac{45^n \times 0}{(1 + 7 + 45)^n}$$
The exponent n, as a characteristic value, expresses the penalty applied according to whether the subjects at the different levels are identical; different values of n give different penalty strengths. It is generally taken from the interval [1, 2], and in the invention n is preferably 2.
The final result is therefore:
$$w(c) = \frac{1^2 \times 1}{(1 + 7 + 45)^2} + \frac{7^2 \times 1}{(1 + 7 + 45)^2} + \frac{45^2 \times 0}{(1 + 7 + 45)^2} \approx 0.018$$
Following this formula, a similarity matrix between the project and the evaluation experts can be obtained, as shown in Table 1:
Table 1: Similarity between a single project and single evaluation experts
      P0    P1        P2      P3
T     0     0.00036   0.018   0.74
Here P0, P1, P2 and P3 denote evaluation experts whose subjects are, respectively, entirely different from those of the evaluated project T, identical at the first level only, identical at the first and second levels, and identical at all levels.
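The values in Table 1 can be reproduced with a short Python sketch of the formula, using the counts 1, 7 and 45 from the computer science example and n = 2. The function name `subject_value` and the four expert codes are hypothetical. The sketch also assumes that a lower level counts as identical only when every higher level matches as well, since the same two digits under a different parent denote a different subject.

```python
def subject_value(project, expert, counts, n=2):
    """w(c) for two (c1, c2, c3) code tuples.

    counts is (Nc1, Nc2, Nc3); n is the penalty exponent.
    """
    total = sum(counts) ** n
    value = 0.0
    prefix_equal = True
    for level in range(3):
        # Assumption: level k matches only if all levels up to k match.
        prefix_equal = prefix_equal and project[level] == expert[level]
        b = 1 if prefix_equal else 0  # B1, B2, B3 in the formula
        value += counts[level] ** n * b / total
    return value

counts = (1, 7, 45)               # first-level 520: 7 second-level, 45 third-level subjects
project_t = ("520", "20", "10")   # evaluated project T

experts = {                       # hypothetical expert codes
    "P0": ("570", "25", "20"),    # entirely different subject
    "P1": ("520", "60", "10"),    # same first level only
    "P2": ("520", "20", "40"),    # same first and second levels
    "P3": ("520", "20", "10"),    # identical at all levels
}
table = {name: subject_value(project_t, code, counts) for name, code in experts.items()}
# Rounded: P0 = 0, P1 = 0.00036, P2 = 0.018, P3 = 0.74
```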
Calculating the similarity of multiple projects against different evaluation experts gives the similarity matrix shown in Table 2:
Table 2: Similarity between multiple projects and multiple evaluation experts
       P1       P2       P3       P4       Pn
T1     W(1,1)   W(1,2)   W(1,3)   W(1,4)   W(1,n)
T2     W(2,1)   W(2,2)   W(2,3)   W(2,4)   W(2,n)
T3     W(3,1)   W(3,2)   W(3,3)   W(3,4)   W(3,n)
Tn     W(4,1)   W(4,2)   W(4,3)   W(4,4)   W(4,n)
The similarity matrix in Table 2 gives the similarity values calculated for the different evaluated projects against the different evaluation experts.
Note that this similarity matrix is sparse, so the large number of entries whose value is 0 can be dropped from the subsequent overall similarity calculation to save computing resources.
(3) Calculation of the project evaluation expert recommendation value
Once the subject characteristic value of the project and the evaluation expert has been obtained, together with the similarity between the text feature vectors of the corresponding project research content and evaluation expert research direction, the next step is to calculate the recommendation value between the project and the evaluation expert.
In general, the subject characteristic values form a sparse matrix, and distinct subject characteristic values can be calculated for most projects. The keyword similarity calculated from the keyword feature vectors is harder to handle: even when the subject characteristic value is high, differences in specific research direction mean there is a considerable chance that the keyword similarity is 0. The two values therefore cannot be combined in a simple way.
The recommendation value formula defined by the invention is:
ProSim(V,U)=w(c)×exp[sim(V,U)]
In the above formula, w(c) is the subject characteristic value calculated for the research project and the evaluation expert; sim(V, U) is the similarity calculated from the keyword vectors of the project information and the evaluation expert information; and exp[sim(V, U)] is the exponential function with base e evaluated at sim(V, U). The purpose is that, as sim(V, U) increases, the overall value rises along a rapidly increasing curve, which better highlights the contribution of text similarity to the overall result. ProSim(V, U) is the final value calculated jointly from the subject characteristic value and the text similarity.
(4) Sort the project evaluation expert recommendation values calculated in step (3)
The project evaluation expert recommendation values calculated in step (3) are sorted from high to low or from low to high.
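Steps (3) and (4) can be sketched together in Python. The function names `pro_sim` and `rank_experts`, the expert identifiers and the input values are hypothetical. Skipping candidates whose subject characteristic value is 0 is an assumption of the sketch that follows the earlier remark about dropping zero entries of the sparse matrix; their recommendation value would be 0 in any case.

```python
from math import exp

def pro_sim(w_c, sim):
    """Recommendation value ProSim(V, U) = w(c) * exp(sim(V, U))."""
    return w_c * exp(sim)

def rank_experts(candidates):
    """candidates: {expert_id: (w_c, sim)} -> [(expert_id, ProSim)], highest first."""
    # Entries with w(c) == 0 are skipped: their ProSim is 0 regardless of sim.
    scored = {e: pro_sim(w, s) for e, (w, s) in candidates.items() if w > 0}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical (w(c), sim(V, U)) pairs for one project and four experts.
candidates = {
    "expert_a": (0.74, 0.10),   # same subject at all levels, some keyword overlap
    "expert_b": (0.018, 0.90),  # same second-level subject, strong keyword overlap
    "expert_c": (0.74, 0.00),   # same subject at all levels, no keyword overlap
    "expert_d": (0.0, 0.95),    # entirely different subject
}
ranking = rank_experts(candidates)
```

Because exp(0) = 1, an expert with no keyword overlap keeps a recommendation value equal to w(c) instead of dropping to zero, which is how the formula copes with text similarities of 0 for experts in the matching subject.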
Experimental section:
The algorithm provided by the invention is verified experimentally below, with reference to the drawings:
Data set description:
Because research on recommendation algorithms for scientific research evaluation is currently lacking, there is no public, common project evaluation expert database against which recommendation algorithms can be analyzed and assessed. The best available test method is therefore to select from research projects that have already been evaluated.
"Subject Classification and Code", as defined by the national standard of the People's Republic of China, establishes 58 first-level subjects, 573 second-level subjects and nearly 6000 third-level subjects. Because there are too many subjects, this experiment selects 20 relatively popular third-level subjects as the source of experimental data.
The data set of this experiment comes from a higher-level scientific research project database. 300 evaluation experts were randomly drawn from the evaluation experts of the selected third-level subjects, including 248 natural science evaluation experts and 58 social science evaluation experts. Based on these experts, 961 evaluated research projects were then randomly drawn, ensuring that every evaluation expert has at least 2 evaluated projects in the extracted project library. The subject classification structure, the number of evaluation experts and the number of projects per category are shown in Table 3:
Table 3: Subject classification structure, number of evaluation experts and number of projects per category
[Table 3 is provided as an image in the original publication: Figure BDA0000476490220000091]
Determination of the evaluation index:
No directly effective method currently exists for verifying the final output of the algorithm. Given the algorithm's practical purpose, the closer the evaluation experts it finally recommends are to those selected manually, the more accurate the result is taken to be.
To this end, this experiment uses the F-measure, which is commonly used to test text similarity algorithms. The F-measure is a balanced index that combines recall and precision, and its value lies between 0 and 1. It checks whether the computed result assigns each scientific research project to the correct, manually identified evaluation experts. The larger the F-measure, the closer the selection is to the true situation.
Let Rc be the set of evaluation experts recommended by the algorithm and Pc the set of evaluation experts selected manually. Recall, precision, and the F value are computed as follows:
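$$\mathrm{Recall} = \frac{|Rc \cap Pc|}{|Rc|}$$
$$\mathrm{Precision} = \frac{|Rc \cap Pc|}{|Pc|}$$
$$F = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$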
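The three formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `f_measure` and the expert identifiers are hypothetical, and the zero-overlap case (where the F formula is undefined) is assumed to return 0.

```python
def f_measure(recommended, manual):
    """Return (recall, precision, F) for two collections of expert IDs.

    Follows the definitions given in the text: recall divides the overlap
    by |Rc| (algorithm-recommended set) and precision by |Pc| (manually
    selected set).
    """
    rc, pc = set(recommended), set(manual)
    overlap = len(rc & pc)
    if overlap == 0:
        # Assumption: treat the undefined 0/0 case as a score of 0.
        return 0.0, 0.0, 0.0
    recall = overlap / len(rc)
    precision = overlap / len(pc)
    f = 2 * recall * precision / (recall + precision)
    return recall, precision, f


# Hypothetical example: 4 recommended experts, 3 manually chosen, 2 shared.
recall, precision, f = f_measure({"e1", "e2", "e3", "e4"}, {"e2", "e3", "e5"})
```

Because F is symmetric in recall and precision, swapping the roles of Rc and Pc changes the two intermediate values but not the F value.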
Experimental result and analysis
The experiment first uses the natural language processing tool ICTCLAS to preprocess all project content and the research content of all evaluation experts, then applies the TF-IDF algorithm to all keywords to obtain the corresponding keyword feature vectors. The subject characteristic values are computed with the project and evaluation expert subject characteristic value algorithm proposed by the invention. Finally, the project evaluation expert recommendation algorithm defined by the invention computes the final recommendation value.
Experiment 1
Selection of the Top keyword ratio and the similarity threshold
To reflect the proposed project evaluation expert recommendation algorithm more objectively, the Top keyword ratio and the similarity threshold used in text classification must first be determined, which settles the text clustering problem.
Experiment 1 first determines the effect of different Top keyword ratios on text clustering. Based on an analysis of actual projects, the similarity threshold is set to μ = 0, so that all semantic similarities in the text are treated as equally important. Fig. 2 shows the F-measure at different Top keyword percentages. The experiment shows that choosing the top 60% of keywords in the text gives good results.
With the Top keyword ratio fixed, the next step is to determine the similarity threshold μ that gives the best text clustering. This part of the experiment uses the 60% Top keyword ratio obtained in the previous step and studies the effect of different thresholds on text clustering.
Fig. 3 shows the effect of different μ values on the F-measure. As the figure shows, the F-measure rises steadily as μ increases and peaks at about μ = 0.8; increasing μ further causes the F-measure to fall.
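The two-step selection used in this experiment (fix μ = 0 and sweep the Top keyword ratio, then fix the best ratio and sweep μ) can be sketched as follows. The names `top_keywords`, `two_stage_search`, and the `evaluate` callback are hypothetical. The text does not specify how keywords are truncated or how μ is applied, so keeping the highest-weighted fraction of TF-IDF keywords is an assumption made only for illustration.

```python
import math


def top_keywords(weights, ratio):
    """Keep the highest-weighted fraction `ratio` of a {keyword: weight} dict.

    Assumption: "Top keyword ratio" means the share of keywords retained
    after ranking by TF-IDF weight.
    """
    keep = math.ceil(len(weights) * ratio)
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:keep])


def two_stage_search(evaluate, ratios, thresholds):
    """Pick the Top keyword ratio first, then the similarity threshold.

    `evaluate(ratio, mu)` is a hypothetical callback that runs the
    recommendation on the data set and returns the F-measure.
    """
    # Stage 1: with mu fixed at 0, choose the ratio with the highest F.
    best_ratio = max(ratios, key=lambda r: evaluate(r, 0.0))
    # Stage 2: with that ratio fixed, choose the threshold with the highest F.
    best_mu = max(thresholds, key=lambda mu: evaluate(best_ratio, mu))
    return best_ratio, best_mu, evaluate(best_ratio, best_mu)
```

A sequential search of this kind evaluates far fewer combinations than a full grid over both parameters, at the cost of possibly missing a better pair when the two parameters interact.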
Experiment 2
Effect of different subject characteristic values on the F-measure
Experiment 2 uses only the project and evaluation expert subject characteristic value algorithm provided by the invention to compute the F-measure for projects and evaluation experts, without the text classification part based on research content. The experiment is run three times. The first run follows the complete subject characteristic value computation proposed by the algorithm: all three subject levels are computed and substituted into the recommendation formula to obtain the final F value. The second run computes the subject characteristic value at the second-level subject and substitutes it into the formula to obtain the F value. The third run computes only the first-level subject to obtain the F value. The final results are shown in Fig. 4.
Experiment 2 reveals a notable pattern in the results computed with the subject characteristic value algorithm provided by the invention. When all three subject levels are computed, the F-measure is 0.19. When the subject level is raised from third-level to second-level, the F-measure rises to 0.55, which still differs noticeably from the sample result. When only the first-level subject is used as the classification basis, the F-measure rises to 0.93, which matches the actual sample result well.
Experiment 3
Recommendation computed jointly from different subject characteristic values and research content
Experiment 3 sets the Top keyword ratio to 60% and the similarity threshold μ to 0.8, and computes the recommendation value with the project evaluation expert recommendation algorithm provided by the invention. It follows the procedure of Experiment 2, computing the F-measure at different subject levels. The results are shown in Fig. 5. With all three subject levels computed and substituted into the recommendation formula, the final F-measure is 0.12; when the subject level is raised to second-level, the F-measure rises to 0.19; with the first-level subject as the basis, the F-measure is highest, at 0.39.
Analyzing the results, Experiment 2 shows that the F-measure computed from subject features alone is largely consistent with the actual result at the first-level subject classification, and drops markedly as the subject subdivision becomes finer. This may be because, when scientific research projects are assigned to evaluation experts, the assignment focuses mainly on the first-level subject and ignores the second-level and third-level classifications.
The large gap between the Experiment 3 result and the actual result may arise because introducing the text similarity computation on research content turns the text result into a source of interference, so that the final result differs substantially from the original data.
Experiment 4
Testing the recommendation algorithm on a reconstructed data source
This experiment reconstructs the data source of scientific research projects and evaluation experts. From the original project database, 112 scientific research projects in 7 third-level subjects were extracted; 37 relevant evaluation experts were then manually reselected, strictly matching the projects' subject classification and research direction, to form a new manually selected data source. The data source is shown in Table 4:
Table 4: Reselected subject classification structure, number of evaluation experts, and number of projects per category
[Table 4 appears as an image in the original publication: BDA0000476490220000111]
First, the Top keyword ratio and the similarity threshold are selected for the projects. The procedure is similar to that of Experiment 1 and is not repeated here; the final result is shown in Fig. 6. The F-measure reaches its maximum with the Top keyword ratio at 75% and μ in [0.7, 0.8]; beyond that point the curve declines as the Top percentage and μ increase, showing that raising the threshold further does not raise the F-measure.
Next, the project evaluation expert recommendation algorithm of the invention is run with the subject classification set to different levels. The results are shown in Fig. 7. The F-measure is 0.92 using the first-level subject, 0.87 using the second-level subject, and 0.86 using the third-level subject. This shows that the final F-measure obtained by the algorithm reflects well the relation between the evaluation experts recommended by the system and those recommended manually.
It should be understood that the examples and embodiments described herein are for illustration only; those skilled in the art may make various modifications or variations in light of them, all of which fall within the protection scope of the invention.

Claims (3)

1. A subject characteristic value algorithm, comprising the following steps:
(1) Subject modeling of projects and evaluation experts:
According to the national standard "Subject Classification and Code", a vector model is used to model the subject of the project and the subject of the evaluation expert; the project subject and the evaluation expert subject are each represented by a feature vector of the following form:
p={c1,c2,c3}
where c1, c2, and c3 denote the first-level subject code, the second-level subject code, and the third-level subject code in the subject classification, respectively;
(2) Computation of the subject characteristic value of the project and the evaluation expert, using the following formula:
$$w(c) = \frac{(Nc_1)^n \times B_1}{(Nc_1 + Nc_2 + Nc_3)^n} + \frac{(Nc_2)^n \times B_2}{(Nc_1 + Nc_2 + Nc_3)^n} + \frac{(Nc_3)^n \times B_3}{(Nc_1 + Nc_2 + Nc_3)^n}$$
where Nc1, Nc2, and Nc3 denote the number of subjects at each level of the subject classification to which the project under evaluation belongs; B1, B2, and B3 indicate whether the project and the evaluation expert have the same code at the corresponding subject level, taking the value 1 if the codes are identical and 0 otherwise; and the exponent n is a characteristic value that expresses the penalty applied according to whether the subjects at different levels are identical.
2. The subject characteristic value algorithm according to claim 1, characterized in that the exponent n is 2.
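As an illustration only, the formula of claims 1 and 2 can be written as a short Python function. The function name `subject_value`, the tuple layout, and the subject codes in the example are hypothetical; the level counts used below are placeholder values and are not taken from the experiments.

```python
def subject_value(level_counts, project_codes, expert_codes, n=2):
    """Compute w(c) for one project/expert pair.

    level_counts:  (Nc1, Nc2, Nc3), the subject counts per level.
    project_codes: (c1, c2, c3) subject codes of the project.
    expert_codes:  (c1, c2, c3) subject codes of the evaluation expert.
    n:             the exponent; claim 2 sets n = 2.
    """
    total = sum(level_counts)
    if total == 0:
        # Assumption: with no subjects counted, return 0 instead of dividing by zero.
        return 0.0
    # B1, B2, B3: 1 if the codes at that level are identical, else 0.
    matches = [1 if p == e else 0 for p, e in zip(project_codes, expert_codes)]
    numerator = sum((nc ** n) * b for nc, b in zip(level_counts, matches))
    return numerator / total ** n


# Placeholder counts; the codes match at the first and second level only.
w = subject_value((1, 1, 1), ("A", "A1", "A11"), ("A", "A1", "A12"))
```

All three terms share the denominator (Nc1 + Nc2 + Nc3)^n, so the function sums the numerators and divides once.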
3. A project evaluation expert recommendation algorithm based on the subject characteristic value algorithm according to claim 1 or 2, comprising the following steps:
(1) Similarity computation of text feature vectors:
1) Word segmentation of the text of the project research content and the evaluation expert's research direction: keywords are extracted from the project research content and the evaluation expert's research direction, and semantic reconstruction is performed;
2) Construction of the text feature vector model of the project research content and the evaluation expert's research direction: using the keyword-weight-based vector space model TF-IDF algorithm, a vector of weighted terms is produced by extracting and computing the frequency with which each keyword occurs in the target text and its inverse document frequency across the whole text collection;
3) Similarity computation of the text feature vectors of the project research content and the evaluation expert's research direction, using the following formula:
$$\mathrm{sim}(V, U) = \frac{\sum_{i=1}^{n} (V_i \times U_i)}{\sqrt{\sum_{i=1}^{n} (V_i)^2} \times \sqrt{\sum_{i=1}^{n} (U_i)^2}}$$
where V and U denote the n-dimensional feature vectors extracted from the project application content and the evaluation expert's information, respectively; the text similarity is obtained by computing the cosine of the angle between the two vectors;
(2) Subject characteristic value algorithm:
1) Subject modeling of projects and evaluation experts:
According to the national standard "Subject Classification and Code", a vector model is used to model the subject of the project and the subject of the evaluation expert; the project subject and the evaluation expert subject are each represented by a feature vector of the following form:
p={c1,c2,c3}
where c1, c2, and c3 denote the first-level subject code, the second-level subject code, and the third-level subject code in the subject classification, respectively;
2) Computation of the subject characteristic value of the project and the evaluation expert, using the following formula:
$$w(c) = \frac{(Nc_1)^n \times B_1}{(Nc_1 + Nc_2 + Nc_3)^n} + \frac{(Nc_2)^n \times B_2}{(Nc_1 + Nc_2 + Nc_3)^n} + \frac{(Nc_3)^n \times B_3}{(Nc_1 + Nc_2 + Nc_3)^n}$$
where Nc1, Nc2, and Nc3 denote the number of subjects at each level of the subject classification to which the project under evaluation belongs; B1, B2, and B3 indicate whether the project and the evaluation expert have the same code at the corresponding subject level, taking the value 1 if the codes are identical and 0 otherwise; and the exponent n is a characteristic value that expresses the penalty applied according to whether the subjects at different levels are identical;
(3) Computation of the project evaluation expert recommendation value, using the following formula:
ProSim(V,U)=w(c)×exp[sim(V,U)]
where w(c) is the subject characteristic value of the project and the evaluation expert computed in step (2), sim(V, U) is the similarity of the text feature vectors of the project research content and the evaluation expert's research direction computed in step (1), and exp[sim(V, U)] is the exponential function with base e evaluated at sim(V, U);
(4) The project evaluation expert recommendation values computed in step (3) are sorted.
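As an illustration only, steps (1) 3), (3), and (4) of claim 3 can be sketched in Python as follows. The function names, the tuple layout of `experts`, and the sample vectors are hypothetical. The subject characteristic value w(c) is assumed to have been computed beforehand and is passed in as a number, and the ranking is assumed to be in descending order of recommendation value, which the claim does not state explicitly.

```python
import math


def cosine(v, u):
    """Cosine similarity sim(V, U) of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(v, u))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in u))
    # Assumption: an all-zero vector yields similarity 0 rather than an error.
    return dot / norm if norm else 0.0


def pro_sim(w_c, v, u):
    """Recommendation value ProSim(V, U) = w(c) * exp[sim(V, U)]."""
    return w_c * math.exp(cosine(v, u))


def rank_experts(project_vec, experts):
    """Score and sort experts, highest recommendation value first.

    experts: list of (expert_id, w_c, text_vector) tuples, where w_c is the
    precomputed subject characteristic value for this project/expert pair.
    """
    scored = [(eid, pro_sim(w_c, project_vec, vec)) for eid, w_c, vec in experts]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: two-dimensional TF-IDF vectors and precomputed w(c).
ranking = rank_experts(
    [1.0, 0.0],
    [("b", 1.0, [0.0, 1.0]), ("c", 0.0, [1.0, 0.0]), ("a", 1.0, [1.0, 0.0])],
)
```

Because the subject characteristic value multiplies the exponential term, an expert with w(c) = 0 receives a recommendation value of 0 regardless of text similarity, while an expert with a matching subject and zero text similarity still scores w(c).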
CN201410092584.XA 2014-03-13 2014-03-13 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm Active CN103823896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410092584.XA CN103823896B (en) 2014-03-13 2014-03-13 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410092584.XA CN103823896B (en) 2014-03-13 2014-03-13 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm

Publications (2)

Publication Number Publication Date
CN103823896A true CN103823896A (en) 2014-05-28
CN103823896B CN103823896B (en) 2017-02-15

Family

ID=50758960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410092584.XA Active CN103823896B (en) 2014-03-13 2014-03-13 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm

Country Status (1)

Country Link
CN (1) CN103823896B (en)

Also Published As

Publication number Publication date
CN103823896B (en) 2017-02-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant