CN103631859B - Intelligent review expert recommending method for science and technology projects - Google Patents


Info

Publication number
CN103631859B
CN103631859B (application CN201310509358.2A)
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201310509358.2A
Other languages
Chinese (zh)
Other versions
CN103631859A (en)
Inventor
徐小良
吴仁克
林建海
陈秋
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201310509358.2A
Publication of application CN103631859A
Application granted
Publication of granted patent CN103631859B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval of unstructured textual data
    • G06F 16/31 — Indexing; Data structures therefor; Storage structures
    • G06F 16/316 — Indexing structures
    • G06F 16/328 — Management therefor


Abstract

The invention provides an intelligent review-expert recommendation method for science and technology projects. The method comprises the following steps: (1) the main texts of the projects to be reviewed and of the expert information are cut into substring sequences, the Chinese Academy of Sciences ICTCLAS segmenter is applied to them, and stop words are filtered from the segmentation result to obtain a term set; (2) a term network of the project information is built and feature words are extracted based on statistical characteristics and aggregation characteristics; if the expert information is relatively concise, the term set obtained in step (1) serves directly as its feature words; (3) a knowledge representation model is built from the fields and weights of the feature words, and a corresponding information index is built; (4) for grouped expert recommendation, feature-merging operations between fields and between projects are carried out on the knowledge representation models; (5) the semantic similarity between experts and the projects or groups to be reviewed is computed, threshold truncation is applied, and the final list of recommended experts is generated. The method greatly alleviates the heavy workload of manual recommendation and the lack of scientific rigor in review decisions.

Description

An intelligent review-expert recommendation method for science and technology projects
Technical field
The invention belongs to the field of expert recommendation technology and specifically relates to a network-service-based method for intelligently recommending review experts for science and technology projects. It is an intelligent method for assisting funding decisions on such projects.
Background technology
As project management systems for science and technology have spread rapidly across China's government departments, project review has evolved from the traditional centralized-conference model to today's networked model, removing the regional restrictions on review experts. Review experts evaluate project proposals according to their domain knowledge and the funding body's criteria, and the funding body decides whether to fund a project based on the experts' assessments.
At present, review experts for science and technology projects are mostly recommended by project managers based on subjective judgment, and a pending project usually requires several experts to review it. Manual recommendation inevitably suffers from low efficiency, heavy workload, and a lack of scientific rigor, and the selected experts are often not the most suitable. Research on intelligent recommendation of review experts is therefore crucial: it can effectively alleviate the mismatch between experts and the contents of the projects they review, and greatly improve the quality of service in project review.
Existing intelligent recommendation techniques, such as collaborative filtering and content-based recommendation, are mostly applied to video and e-commerce recommendation websites; research on and application to review-expert databases for science and technology projects is rare. Owing to domain-specific constraints, recommending experts for science and technology projects differs from general-purpose recommendation in two ways. First, a project management system covers all trades and industries, so the domain knowledge involved is extremely complex. Second, the recommendation concerns project funding, so the demands on objectivity, fairness, and accuracy are very high. In this respect, China still lacks systematic methodological guidance and mature technical support. Text information is typically semi-structured, and the content of expert information can be matched against that of pending projects; the present invention makes full use of structural features and word semantics to compute the similarity between project and expert information. A high similarity indicates that an expert is familiar with the project, and a list of such experts is generated to review it. The invention also provides a Decision Support System (DSS) for recommending review experts: it assigns each pending project to experts whose domain knowledge matches it, helping decision-making users reach scientific decisions, improving their decision quality, and making the review process more scientific and objective.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides an intelligent review-expert recommendation method for science and technology projects.
The review-expert recommendation process of the present invention comprises the following steps:
Step 1. Treat general terms and common words in project and expert information as a domain-specific stop-word dictionary; treat punctuation marks and non-Chinese characters as segmentation markers.
Step 2. Segment the project and expert information. Using the segmentation markers, cut project information such as the project title, main research content, and technical specifications into substring sequences; likewise, cut expert information such as undertaken projects and their performance, awards, inventions, published papers, and research directions into substring sequences, one substring sequence per field. Then apply the Chinese Academy of Sciences ICTCLAS segmenter to each substring sequence.
Step 3. Extract project feature words. Filter stop words from the segmentation result using both the general stop-word dictionary and the domain-specific one; the general dictionary is the Harbin Institute of Technology stop-word list. The words remaining after stop-word removal form a word set.
Building the domain-specific stop-word dictionary is a self-learning, continually improving process: word frequencies are accumulated during segmentation, and any word whose probability of occurrence in the text exceeds a threshold is added to the dictionary.
Because project information is relatively large, semantic similarity is computed between the words in the set; a term network is then built from the semantic relations and co-occurrence relations of the words, and each word's aggregation feature value in the network is computed. Combining this with the word's statistical feature value, a criticality score is computed and the project's feature words are extracted. Feature-word extraction thus synthesizes both the statistical and the semantic feature information of the text, extracting feature words more accurately.
The semantic similarity computation proceeds as follows:
In the HowNet semantic dictionary, suppose word W1 has n concepts S11, S12, ..., S1n and word W2 has m concepts S21, S22, ..., S2m. The similarity SimSEM(W1, W2) equals the maximum similarity over all concept pairs:

$$\mathrm{SimSEM}(W_1, W_2) = \max_{i=1,\dots,n;\; j=1,\dots,m} \mathrm{Sim}(S_{1i}, S_{2j})$$
Content words and function words have different description languages, so similarity must be computed between the corresponding sememes or relational sememes. A content-word concept comprises the first basic sememe, other basic sememes, relational-sememe descriptions, and relational-symbol descriptions, whose similarities are denoted Sim1(p1, p2), Sim2(p1, p2), Sim3(p1, p2), and Sim4(p1, p2), respectively. The similarity of two feature structures ultimately reduces to the similarity of basic sememes or concrete words:

$$\mathrm{Sim}(S_1, S_2) = \sum_{i=1}^{4} \beta_i\, \mathrm{Sim}_i(S_1, S_2)$$

The βi (1 ≤ i ≤ 4) are adjustable parameters satisfying β1 + β2 + β3 + β4 = 1 and β1 ≥ β2 ≥ β3 ≥ β4.
Let CW = {C1, C2, ..., Cm} be the word set obtained after processing. Its semantic similarity adjacency matrix Sm is defined as Sm = [Sim(Ci, Cj)] (m × m), where Sim(Ci, Cj) is the semantic similarity of words Ci and Cj, Sim(Ci, Ci) = 1, and Sim(Ci, Cj) = Sim(Cj, Ci).
By symmetry, only m × (1 + m)/2 pairwise similarity values need to be computed for the word set CW.
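The symmetric construction above can be sketched as follows. The pairwise similarity function here is a toy stand-in (character overlap), since the HowNet-based Sim() is not reproduced in this text; only m(m+1)/2 values are computed, the rest follow from symmetry.

```python
def semantic_matrix(words, sim):
    """Build the symmetric semantic-similarity adjacency matrix S_m.

    Only m*(m+1)/2 pairwise values are computed; the rest follow from
    symmetry (Sim(Ci, Cj) = Sim(Cj, Ci)) and Sim(Ci, Ci) = 1.
    """
    m = len(words)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        S[i][i] = 1.0
        for j in range(i + 1, m):
            S[i][j] = S[j][i] = sim(words[i], words[j])
    return S

def toy_sim(a, b):
    """Toy stand-in for the HowNet-based Sim(): shared-character overlap."""
    common = len(set(a) & set(b))
    return common / max(len(set(a)), len(set(b)))

S = semantic_matrix(["data", "database", "graph"], toy_sim)
```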
The co-occurrence relation of words is computed as follows:
The word co-occurrence model is one of the important statistical models in natural language processing. According to this model, if two words frequently co-occur within the same window unit of a document (e.g., a sentence or a paragraph), they are semantically related and together express, to some extent, the semantic information of the text. A sliding window of length 3 is used to compute the word co-occurrence degree over the word sequence, as shown in Fig. 1.
First, the word sequence is cleaned: spaces and empty strings are removed and identical words are merged, yielding the word set CW = {C1, C2, ..., Cm}, where m ≤ n.
The word co-occurrence degree matrix Cm corresponding to the word set CW is defined as Cm = [Coo(Ci, Cj)] (m × m), with every element Coo(Ci, Cj) initialized to 0 (1 ≤ i, j ≤ m).
The co-occurrence degrees are computed by sliding a window over the sequence; the words inside the window are T(i-1), T(i), T(i+1) (1 < i < n):
1) If i = n - 1, go to 4). If T(i-1) is a space or empty, slide the window to the next word (i++); otherwise, go to 2).
2) If T(i) is Chinese, increment Coo(T(i-1), T(i)) and go to 3); if T(i) is empty, go to 3); otherwise go to 1).
3) If T(i+1) is Chinese, increment Coo(T(i-1), T(i+1)), i++, and go to 1); otherwise, go to 1).
4) If T(n-2) is Chinese, go to 5); otherwise, go to 7).
5) If T(n-1) is Chinese, increment Coo(T(n-2), T(n-1)) and go to 6); if T(n-1) is a space, go to 6); otherwise terminate.
6) If T(n) is Chinese, increment Coo(T(n-2), T(n)); terminate.
7) If T(n-1) and T(n) are both Chinese, increment Coo(T(n-1), T(n)); terminate.
This computation yields the word co-occurrence degree matrix Cm, each element of which is then normalized by dividing by the maximum element of the matrix, max{Coo(Ci, Cj) | 1 ≤ i, j ≤ m}.
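A simplified sketch of the windowed co-occurrence count and the max-normalization. It treats all tokens uniformly, omitting the patent's special-casing of spaces, empty strings, and non-Chinese characters.

```python
from collections import defaultdict

def cooccurrence(tokens, window=3):
    """Count word co-occurrence within a sliding window of the given
    length, then normalize every count by the matrix maximum, as in the
    patent's construction of C_m. Counts are kept symmetric by storing
    each pair under a sorted key; self-pairs are skipped."""
    coo = defaultdict(float)
    n = len(tokens)
    for i in range(n):
        for j in range(i + 1, min(i + window, n)):
            a, b = sorted((tokens[i], tokens[j]))
            if a != b:
                coo[(a, b)] += 1.0
    mx = max(coo.values(), default=1.0)
    return {pair: c / mx for pair, c in coo.items()}

C = cooccurrence(["expert", "review", "project", "review", "expert"])
```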
The term network is built as follows:
To build the weighted term network, the weight matrix Wm is first obtained. It is defined as Wm = α·Cm + β·Sm, where α = 0.3 and β = 0.7, strengthening the semantic relations between words and weakening the co-occurrence relations.
Wm serves as the adjacency matrix of the term network, whose corresponding graph is defined as G = {V, E}, where G is an undirected weighted graph, V is the vertex set of G, E is the edge set, and vi is the i-th vertex (word) in V.
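The combination of the two matrices into Wm is elementwise; a minimal sketch (α and β per the patent's stated values):

```python
def weight_matrix(Cm, Sm, alpha=0.3, beta=0.7):
    """Combine the normalized co-occurrence matrix C_m and the semantic
    similarity matrix S_m into the term-network weight matrix
    W_m = alpha * C_m + beta * S_m (alpha = 0.3, beta = 0.7 per the
    patent, emphasizing semantic relations over co-occurrence)."""
    m = len(Cm)
    return [[alpha * Cm[i][j] + beta * Sm[i][j] for j in range(m)]
            for i in range(m)]

W = weight_matrix([[0.0, 1.0], [1.0, 0.0]],
                  [[1.0, 0.5], [0.5, 1.0]])
```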
The word aggregation feature value is computed as follows:
A term network exhibits characteristic distributions of degree, average shortest path length, clustering, and clustering coefficient. A node's degree reflects how it associates with other nodes; its clustering and clustering coefficient reflect the interconnection density of the nodes in its local neighborhood, and together with the degree they indicate the node's importance in that local range. The present invention computes each node's aggregation feature value from its weighted degree, clustering coefficient, and betweenness, so that important words receive higher weights while words related to many important words also score highly.
In the semantic similarity network, let the unordered pair (vi, vj) denote the edge between nodes vi and vj. The weighted degree of node vi is defined as

$$WD_i = \frac{1}{n}\sum_{j=1}^{n} w_{ij}$$

where wij is the weight of the edge between vi and vj and n is the total number of nodes.
The unweighted degree of node vi is Di = |{(vi, vj) : (vi, vj) ∈ E, vi, vj ∈ V}|, and its clustering Ti is the number of actual edges among its neighbors, Ti = |{(vj, vk) : (vi, vj) ∈ E, (vi, vk) ∈ E, (vj, vk) ∈ E}|. The clustering coefficient Ci of node vi is then defined as

$$C_i = \frac{T_i}{\binom{D_i}{2}} = \frac{2T_i}{D_i(D_i - 1)}$$
In the semantic similarity network, the betweenness Bi of node vi measures the probability that a shortest path between two nodes w and x passes through vi. Interaction between two non-adjacent nodes depends on the nodes lying on the shortest paths connecting them; such nodes potentially control the information flow between the pair, and Bi reflects vi's connectivity in its local environment. Betweenness is defined as

$$B_i = \sum_{w \in G,\; x \in G} \frac{r_{v_i}(w, x)}{d(w, x)}$$

where d(w, x) is the number of shortest paths between any two nodes w and x in the weighted semantic similarity network, and r_{vi}(w, x) is the number of those shortest paths that pass through vi (vi ∈ G).
The node's weighted degree, clustering coefficient, and betweenness are combined into its aggregation feature value; the aggregation feature value Zi of node vi is defined as

$$Z_i = a \cdot WD_i + b \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + c \cdot B_i$$

where a + b + c = 1.
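A minimal sketch of Zi over the weight matrix, computing the weighted-degree and clustering-coefficient terms. The betweenness term Bi is omitted here (c = 0) for brevity, since it needs all-pairs shortest paths; the coefficients a and b are illustrative.

```python
def aggregation_values(W, a=0.5, b=0.5, edge_threshold=0.0):
    """Z_i = a*WD_i + b*C_i/sum(C_j) (+ c*B_i, omitted: c = 0).
    W is the symmetric term-network weight matrix; an edge exists
    wherever the weight exceeds edge_threshold."""
    n = len(W)
    # Weighted degree: WD_i = (1/n) * sum_j w_ij
    WD = [sum(W[i]) / n for i in range(n)]
    # Neighbors under the edge threshold, for D_i and T_i
    neigh = [[j for j in range(n) if j != i and W[i][j] > edge_threshold]
             for i in range(n)]
    C = []
    for i in range(n):
        D = len(neigh[i])
        if D < 2:
            C.append(0.0)
            continue
        # T_i: edges actually present among v_i's neighbors
        T = sum(1 for x in range(D) for y in range(x + 1, D)
                if W[neigh[i][x]][neigh[i][y]] > edge_threshold)
        C.append(2.0 * T / (D * (D - 1)))
    total_C = sum(C) or 1.0
    return [a * WD[i] + b * C[i] / total_C for i in range(n)]

# Triangle graph: every node has WD = 2/3 and clustering coefficient 1
Z = aggregation_values([[0.0, 1.0, 1.0],
                        [1.0, 0.0, 1.0],
                        [1.0, 1.0, 0.0]])
```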
The statistical feature value of a word is computed as follows:
Word frequencies are normalized. The word-frequency weight TFi of word Wi in the text is defined as

$$TF_i = \frac{f(W_i)}{\sum_{j=1}^{n} f(p_j)}$$

where TFi is the word-frequency weight of Wi, pj is a word in the text, and f is the word-frequency function.
Words that can characterize a Chinese text are usually content words such as nouns, verbs, and adjectives, whereas function words such as interjections, prepositions, and conjunctions are meaningless for determining a text's category and would introduce substantial noise into feature-word extraction. The part-of-speech weight posi of word Wi in the text is defined as:
Longer words convey more concrete information, while the meaning of shorter words tends to be more abstract. In particular, feature words in a document are often multi-word technical terms; the longer a term, the more definite its meaning and the better it reflects the text's topic. Increasing the weight of long words helps avoid splitting such terms and reflects a word's importance in the document more accurately.
The word-length weight leni of word Wi in the text is defined as:
For each word in the word sequence, the statistical feature value is

$$stats_i = A \cdot TF_i + B \cdot pos_i + C \cdot len_i$$

where A + B + C = 1.
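A sketch of stats_i. TFi follows the formula above, while the posi and leni schemes below are illustrative assumptions, since the patent's exact definitions appear only in figures that are not reproduced in this text.

```python
def stat_feature(word, pos, freq, total_freq, A=0.5, B=0.3, C=0.2):
    """stats_i = A*TF_i + B*pos_i + C*len_i with A+B+C = 1 (patent
    formula).  TF_i = f(W_i) / sum_j f(p_j).  The pos_i and len_i
    schemes below are illustrative assumptions, not the patent's exact
    definitions."""
    tf = freq / total_freq
    # Assumed part-of-speech weights: content words score higher
    pos_w = {"n": 1.0, "v": 0.8, "a": 0.6}.get(pos, 0.2)
    # Assumed length weight: longer terms favored, capped at 1.0
    len_w = min(len(word) / 4.0, 1.0)
    return A * tf + B * pos_w + C * len_w

s = stat_feature("neural", "n", freq=5, total_freq=50)
```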
The criticality of word Wi is computed as follows:
For each node in the weighted term network, the criticality value Impi is defined as

$$Imp_i = \beta \cdot stats_i + (1 - \beta) \cdot Z_i$$

where 0 < β < 1.
The criticality values obtained are sorted in descending order; a threshold γ (0 < γ < 1) is set, and the top q words are taken as the project's feature words. These words fully reflect the topic and are the most important words.
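The criticality ranking and truncation can be sketched as follows (β and q are illustrative values within the patent's stated ranges):

```python
def select_feature_words(words, stats, Z, beta=0.6, top_q=3):
    """Imp_i = beta*stats_i + (1-beta)*Z_i (0 < beta < 1), then keep
    the top-q words by criticality, as in the patent's truncation
    step."""
    imp = {w: beta * s + (1 - beta) * z
           for w, s, z in zip(words, stats, Z)}
    ranked = sorted(imp, key=imp.get, reverse=True)
    return ranked[:top_q], imp

top, imp = select_feature_words(
    ["cloud", "index", "the", "graph"],
    [0.8, 0.6, 0.1, 0.7],   # stats_i values
    [0.5, 0.9, 0.1, 0.4],   # Z_i values
    beta=0.6, top_q=2)
```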
Step 4. Extract review-expert feature words. Expert information is much smaller than project information, so the project-side technique of building a term network and extracting features from statistical and semantic characteristics is unsuitable for it. Instead, stop words are filtered directly with the general and domain-specific stop-word dictionaries to obtain each expert's feature-word set. The general dictionary is again the Harbin Institute of Technology stop-word list; the domain-specific dictionary must be maintained continually by staff.
Step 5. Build the per-field knowledge representation models of projects and review experts. Extending the vector space model and the matter-element knowledge set model, a text representation model PRO = (id, F, WF, T, V) is established from the different fields of project information, where id is the identifying field in the project database; F is the set of field categories of the project; WF is the field weight; T is the feature word; and V is the set of words and weights per field, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, with vij the j-th feature word of the i-th field and f(vij) the keyword frequency of vij. The knowledge representation of project information is as follows:
Likewise, a knowledge representation model TM = (id, F, WF, T, V) is established from the different fields of expert information, where id is the identifying field in the expert database; F is the set of field categories of the expert; WF is the set of field weights; T is the feature word; and V is the set of feature words and weights per field, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, with vij the j-th feature word of the i-th field and f(vij) the frequency with which vij occurs in the corresponding field. The knowledge representation of expert information is:
Building the review-expert information index database: once the expert knowledge representation models are built, the information is indexed. First, an expert's content information is read from the expert database; a phrase-semantic network is built from the segmentation result and the expert's feature words are extracted; an index is built over the knowledge representation model using Apache Lucene; and the established index is added, by category, to the corresponding index database, until all review experts have been indexed.
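The patent builds the index with Apache Lucene; purely as an illustration of the idea, a minimal in-memory inverted index over expert feature words might look like this (a stand-in sketch, not Lucene's API):

```python
from collections import defaultdict

class ExpertIndex:
    """Minimal in-memory inverted index over expert feature words --
    a stand-in sketch for the Apache Lucene index the patent builds."""

    def __init__(self):
        self.postings = defaultdict(set)   # feature word -> expert ids

    def add(self, expert_id, feature_words):
        for w in feature_words:
            self.postings[w].add(expert_id)

    def lookup(self, feature_words):
        """Experts sharing at least one feature word with the query."""
        hits = set()
        for w in feature_words:
            hits |= self.postings.get(w, set())
        return hits

idx = ExpertIndex()
idx.add("E1", ["data mining", "clustering"])
idx.add("E2", ["image processing", "clustering"])
hits = idx.lookup(["clustering"])
```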
Step 6. Depending on the number of projects, the recommendation mode is either single-project recommendation or grouped (multi-project) recommendation. Grouped recommendation performs feature merging on the pending projects' knowledge representation models from Step 5 both between corresponding fields and between projects; single-project recommendation performs only the inter-field feature merge. Meanwhile, the expert knowledge representation models from Step 5 undergo inter-field feature merging. Indexes are then built over the merged feature information with Apache Lucene; project indexes are built at recommendation time.
In project application management systems, pending projects often require grouped recommendation. The feature-merge operations above preserve the differing contributions to the similarity computation that Step 5 established by assigning different field weights in the knowledge representation models.
The feature merging of pending projects and review experts is carried out via the merge operator ⊕, as follows:
(1) Inter-field feature merging for a pending project or a review expert
Suppose field feature-word sets W'1 and W'2 are merged; the merge rule ⊕ of W'1 and W'2 is defined as

$$W'_1 \oplus W'_2 = \left\{ \forall i, j:\; \left\langle word_{1i},\; \frac{f(word_{1i}) + f(word_{2j})}{2} \right\rangle \;\middle|\; word_{1i} = word_{2j} \right\}$$

where word1i and word2j are feature words.
Adding field weights refines and extends this definition; the inter-field merge rule for review experts and projects becomes

$$W'_1 \oplus W'_2 = \left\{ \forall i, j:\; \left\langle word_{1i},\; \frac{w_1 f(word_{1i}) + w_2 f(word_{2j})}{w_1^2 + w_2^2} \right\rangle \;\middle|\; word_{1i} = word_{2j} \right\}$$
(2) Inter-project feature merging for a group of pending projects
This merge operates only on the feature vectors of pending projects, not on expert feature vectors, which need only the inter-field merge. Let V(d1) and V(d2) be the vector models of two projects after inter-field merging; for any t1i ∈ V(d1) and t2j ∈ V(d2), identical terms are merged:

$$V(d_1) \oplus V(d_2) = \left\{ \left\langle t_k,\; w_k(p) = \frac{w_i(d_1) + w_j(d_2)}{2} \right\rangle \right\}$$

where k = 1, ..., n, tk is a feature term, and wk(p) is the weight of tk.
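A sketch of the weighted inter-field merge rule above. Note that the rule as written keeps only feature words common to both fields; a full implementation would presumably also carry over non-shared words, but this sketch follows the formula literally.

```python
def merge_fields(f1, w1, f2, w2):
    """Field-weighted feature merge per the patent's rule: for each
    feature word present in both fields, keep it with weight
    (w1*f1[t] + w2*f2[t]) / (w1**2 + w2**2).
    f1, f2 map feature words to frequencies; w1, w2 are field weights."""
    merged = {}
    for t in f1.keys() & f2.keys():
        merged[t] = (w1 * f1[t] + w2 * f2[t]) / (w1 ** 2 + w2 ** 2)
    return merged

m = merge_fields({"cloud": 4, "index": 2}, 0.6,
                 {"cloud": 2, "graph": 1}, 0.4)
```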
The knowledge representation model of a project group is produced as follows:
a) Merge each project's inter-field features to obtain its vector model V(d);
b) Apply the merge strategy ⊕ to the set of all project vector models; in this way the project group obtains a vector-space-based knowledge representation model

$$V(p) = \{ \langle t_1, w_1(p) \rangle, \langle t_2, w_2(p) \rangle, \dots, \langle t_n, w_n(p) \rangle \}$$

where k = 1, ..., n, tk is a group feature term, and wk(p) is the weight of tk.
Step 7. After the inter-field merging of the expert and project knowledge representation models in Step 6, suppose the expert information vector is P = {s1, f(s1), s2, f(s2), ..., sn, f(sn)} and the project (group) information vector is Q = {t1, f(t1), t2, f(t2), ..., tn, f(tn)}. The semantic similarity between the pending project (group) vector and the expert vector is computed with a maximum matching algorithm.
Step 8. A similarity cut-off is set, a recommendation index is produced from the similarity values, and the final list of recommended review experts is generated.
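Steps 7 and 8 can be sketched together. The patent computes similarity with a bipartite maximum matching algorithm; this sketch substitutes a greedy matching approximation and a toy word-similarity function, and the normalization and threshold are illustrative assumptions.

```python
def semantic_similarity(P, Q, word_sim):
    """Similarity between an expert vector P and a project (group)
    vector Q. The patent uses maximum bipartite matching; this sketch
    greedily matches the highest-similarity remaining word pair.
    P, Q map feature words to weights; word_sim(a, b) is in [0, 1]."""
    pairs = sorted(((word_sim(a, b), a, b) for a in P for b in Q),
                   reverse=True)
    used_p, used_q, score = set(), set(), 0.0
    for s, a, b in pairs:
        if a not in used_p and b not in used_q:
            used_p.add(a)
            used_q.add(b)
            score += s * P[a] * Q[b]
    norm = sum(P.values()) * sum(Q.values()) or 1.0
    return score / norm

def recommend(experts, project, word_sim, threshold=0.2):
    """Score every expert against the project (group) and keep those at
    or above the threshold, sorted descending -- the patent's
    threshold-truncation step."""
    scored = [(semantic_similarity(p, project, word_sim), eid)
              for eid, p in experts.items()]
    return [eid for s, eid in sorted(scored, reverse=True)
            if s >= threshold]

def exact_sim(a, b):
    """Toy word similarity: 1 for identical words, else 0."""
    return 1.0 if a == b else 0.0

experts = {"E1": {"graph": 1.0, "index": 1.0}, "E2": {"tree": 1.0}}
project = {"graph": 1.0, "search": 1.0}
ranked = recommend(experts, project, exact_sim)
```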
The beneficial effects of the present invention are:
Review experts for science and technology projects can be recommended quickly, conveniently, intelligently, and accurately. The burden on management staff of assigning review tasks to experts in project application management systems is greatly reduced, lowering the cost of management. Experts and pending projects are guaranteed a high degree of domain match, ensuring that experts review projects objectively, fairly, and scientifically. Automatic, efficient, and impartial decision support avoids improper review practices such as favoritism networks and the "Matthew effect" in project approval.
Brief description of the drawings
Fig. 1 shows the sliding window used to compute word co-occurrence degree in the present invention.
Fig. 2 is a schematic of the bipartite-graph-based maximum matching algorithm in the present invention.
Fig. 3 is the flow chart of the intelligent review-expert recommendation method for science and technology projects in the present invention.
Fig. 4 is the flow chart for extracting feature words from project and expert information in the present invention.
Fig. 5 is the flow chart for building the review-expert knowledge index database in the present invention.
Detailed description of the invention
The invention is further described below with reference to the drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications. All other embodiments obtained by persons of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
As shown in Fig. 3, the main idea of the recommendation method of the present invention is: (1) segment the main texts of the expert information and the pending project information in the project application management system into substring sequences, apply the Chinese Academy of Sciences ICTCLAS segmenter, and filter stop words from the segmentation result to obtain word sets; (2) project information (main research content, technical specifications, etc.) is relatively large, so a term network is built from the semantic and co-occurrence relations of the words, node aggregation feature values are computed, word criticality is computed as a weighted combination with statistical feature values, and each project's feature words are extracted; (3) expert information is more concise than project information, so each expert's filtered word set serves directly as its feature words; (4) field weights are set according to the differing importance of the project and expert fields, knowledge representation models are built for projects and experts from the feature words of (2) and (3), and the expert index database is built; (5) in grouped recommendation, feature merging is performed on the pending projects' knowledge representation models both between fields and between projects, whereas single-project recommendation performs only the inter-field merge; the expert knowledge representation models simultaneously undergo inter-field merging; (6) taking the fuzzy semantic matching of words into account, the similarity between expert information and pending project information is computed, and the final recommendation list is generated by threshold truncation.
Step 1. Treat general terms and common words in project and expert information as a domain-specific stop-word dictionary; treat punctuation marks and non-Chinese characters as segmentation markers.
Step 2. Segment the project and expert information. Using the segmentation markers, cut project information such as the project title, main research content, and technical specifications into substring sequences; likewise, cut expert information such as undertaken projects and their performance, awards, inventions, published papers, and research directions into substring sequences, one substring sequence per field. Then apply the Chinese Academy of Sciences ICTCLAS segmenter to each substring sequence.
Step 3. Extract project feature words. Filter stop words from the segmentation result using both the general stop-word dictionary and the domain-specific one; the general dictionary is the Harbin Institute of Technology stop-word list. The words remaining after stop-word removal form a word set; see Fig. 4.
Building the domain-specific stop-word dictionary is a self-learning, continually improving process: word frequencies are accumulated during segmentation, and any word whose probability of occurrence in the text exceeds a threshold is added to the dictionary.
Because project information is relatively large, semantic similarity is computed between the words in the set; a term network is then built from the semantic relations and co-occurrence relations of the words, and each word's aggregation feature value in the network is computed. Combining this with the word's statistical feature value, a criticality score is computed and the project's feature words are extracted. Feature-word extraction thus synthesizes both the statistical and the semantic feature information of the text, extracting feature words more accurately.
The semantic similarity calculation process is as follows:
In the HowNet semantic dictionary, suppose word W1 has n concepts S11, S12, ..., S1n and word W2 has m concepts S21, S22, ..., S2m. The similarity SimSEM(W1, W2) of W1 and W2 equals the maximum of the similarities over all concept pairs:
SimSEM(W1, W2) = max_{1≤i≤n, 1≤j≤m} Sim(S1i, S2j)
Content words and function words have different description languages, so the similarity between the corresponding syntactic sememes or relational sememes must be computed. A content-word concept is described by the first basic sememe, the other basic sememes, the relational sememe descriptions and the relational symbol descriptions; the corresponding similarities are denoted Sim1(p1, p2), Sim2(p1, p2), Sim3(p1, p2) and Sim4(p1, p2). The similarity calculation of two feature structures ultimately reduces to the similarity of basic sememes or of concrete words:
Sim(S1, S2) = Σ_{i=1..4} βi · Simi(S1, S2)
where the βi (1 ≤ i ≤ 4) are adjustable parameters satisfying β1 + β2 + β3 + β4 = 1 and β1 ≥ β2 ≥ β3 ≥ β4.
Let CW = {C1, C2, ..., Cm} be the word set obtained after the above processing. Its semantic similarity adjacency matrix Sm is defined by Sm[i][j] = Sim(Ci, Cj), where Sim(Ci, Cj) is the semantic similarity of words Ci and Cj, Sim(Ci, Ci) = 1, and Sim(Ci, Cj) = Sim(Cj, Ci).
For the word set CW = {C1, C2, ..., Cm}, the semantic similarity calculation thus yields m × (m + 1) / 2 pairwise similarity values.
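A minimal sketch of the word-level similarity (maximum over concept pairs) collected into the symmetric matrix Sm. The concept inventory and the concept-level similarity stub below are made-up placeholders, not HowNet data:

```python
def word_similarity(concepts1, concepts2, concept_sim):
    # SimSEM(W1, W2): maximum similarity over all concept pairs
    return max(concept_sim(s1, s2) for s1 in concepts1 for s2 in concepts2)

def similarity_matrix(words, concepts, concept_sim):
    # Sm[i][j] = Sim(Ci, Cj); symmetric, diagonal entries are 1
    m = len(words)
    sm = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            s = word_similarity(concepts[words[i]], concepts[words[j]], concept_sim)
            sm[i][j] = sm[j][i] = s
    return sm

# hypothetical concept inventory and a toy concept-level similarity
concepts = {"network": ["net#1", "web#1"], "graph": ["net#1"], "poem": ["art#1"]}
toy_sim = lambda a, b: 1.0 if a == b else (0.2 if a[0] == b[0] else 0.1)
sm = similarity_matrix(["network", "graph", "poem"], concepts, toy_sim)
```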
The co-occurrence relation of words is computed as follows:
The word co-occurrence model is one of the important statistical models in natural language processing. According to it, if two words frequently co-occur in the same window unit of a document (e.g. a sentence or a paragraph), the two words are related in meaning and to some degree express the semantic information of the text. A sliding window (of length 3) is used to compute the word co-occurrence degrees over the word sequence, as shown in Fig. 1:
First, the word sequence is cleaned: spaces and nulls are removed and identical words are merged, giving the word set CW = {C1, C2, ..., Cm}, where m ≤ n.
The word co-occurrence degree matrix Cm corresponding to the word set CW has entries Coo(Ci, Cj); initially, Coo(Ci, Cj) = 0 for all 1 ≤ i, j ≤ m.
The co-occurrence degrees are computed by sliding the window over the word sequence; the words in the sliding window are Ti-1, Ti, Ti+1 (1 < i < n):
1) If i = n-1, go to 4). If Ti-1 is a space or null, slide the window to the next word, i++; otherwise, go to 2).
2) If Ti is Chinese, then Coo(Ti-1, Ti)++ and go to 3); if Ti is null, go to 3); otherwise go to 1).
3) If Ti+1 is Chinese, then Coo(Ti-1, Ti+1)++, i++, go to 1); otherwise, go to 1).
4) If Tn-2 is Chinese, go to 5); otherwise, go to 7).
5) If Tn-1 is Chinese, Coo(Tn-2, Tn-1)++ and go to 6); if Tn-1 is a space, go to 6); otherwise terminate.
6) If Tn is Chinese, Coo(Tn-2, Tn)++ and terminate; otherwise terminate.
7) If Tn-1 is Chinese and Tn is also Chinese, then Coo(Tn-1, Tn)++ and terminate; otherwise terminate.
After this computation the co-occurrence degree matrix Cm is obtained, and each element of Cm is normalized, i.e. divided by the maximum of all elements in the matrix, max{Coo(Ci, Cj) | 1 ≤ i, j ≤ m}.
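The stepwise procedure above amounts to counting, for every window of three consecutive tokens, the pairs (Ti-1, Ti) and (Ti-1, Ti+1), then normalizing. A simplified sketch, treating any non-empty token as valid rather than testing for Chinese characters:

```python
from collections import defaultdict

def cooccurrence(tokens, window=3):
    # coo[(a, b)]: co-occurrence degree of words a and b within the sliding window
    coo = defaultdict(int)
    n = len(tokens)
    for i in range(n):
        if not tokens[i]:          # empty string plays the role of "null"/space
            continue
        # pair the word at i with the next (window - 1) positions
        for j in range(i + 1, min(i + window, n)):
            if tokens[j]:
                coo[(tokens[i], tokens[j])] += 1
    # normalize each element by the maximum of all elements
    mx = max(coo.values()) if coo else 1
    return {pair: c / mx for pair, c in coo.items()}

coo = cooccurrence(["a", "b", "", "c", "b"])
```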
The word network is built as follows:
To build the weighted word network, the weight matrix Wm of the network must first be obtained; it is defined as the weighted combination of the co-occurrence matrix and the semantic similarity matrix, Wm = α · Cm + β · Sm,
where α is 0.3 and β is 0.7, strengthening the semantic relation between words and weakening the co-occurrence relation.
Wm serves as the adjacency matrix of the input word network; the corresponding network is defined as G = {V, E}, where G is an undirected weighted graph, V is the vertex set of G, E is the edge set of G, and vi denotes the i-th vertex (word) in V.
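Assuming the combination Wm = α·Cm + β·Sm implied by the stated coefficients (α = 0.3 on co-occurrence, β = 0.7 on semantic similarity; the exact formula is not legible in the source), the adjacency matrix can be formed element-wise:

```python
def weight_matrix(cm, sm, alpha=0.3, beta=0.7):
    # Wm = alpha * Cm + beta * Sm, element-wise over the m x m matrices
    m = len(cm)
    return [[alpha * cm[i][j] + beta * sm[i][j] for j in range(m)] for i in range(m)]

cm = [[0.0, 1.0], [1.0, 0.0]]   # normalized co-occurrence degrees
sm = [[1.0, 0.4], [0.4, 1.0]]   # semantic similarities
wm = weight_matrix(cm, sm)       # wm[0][1] = 0.3*1.0 + 0.7*0.4 = 0.58
```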
The aggregation feature value of a word is computed as follows:
Key characteristics of a word network are its degree distribution, average shortest path length, clustering degree and clustering coefficient. The degree of a node reflects how the node is associated with other nodes. The clustering degree and clustering coefficient of a node reflect the interconnection density of the nodes in its local neighborhood; the degree and clustering coefficient of a node thus reflect its importance in the local range. The present invention computes the aggregation feature value of a node from its weighted degree, clustering coefficient and node betweenness, so that important words receive higher weights while words related to many important words also obtain a higher score.
In the semantic similarity network, the unordered pair (vi, vj) denotes the edge between nodes vi and vj; the weighted degree of node vi is defined as:
WDi = (Σ_{j=1..n} wij) / n
where wij is the weight of the edge between nodes vi and vj, and n is the total number of nodes.
In the semantic similarity network, the unweighted degree Di of node vi is Di = |{(vi, vj) : (vi, vj) ∈ E, vi, vj ∈ V}|; the clustering degree Ti of node vi is the number of actual edges among its neighbors: Ti = |{(vj, vk) : (vi, vj) ∈ E, (vi, vk) ∈ E, (vj, vk) ∈ E, vi, vj, vk ∈ V}|. The clustering coefficient Ci of node vi is then defined as:
Ci = Ti / C(Di, 2) = 2Ti / (Di(Di - 1))
In the semantic similarity network, the node betweenness Bi is the probability that a shortest path between two nodes w and x passes through node vi. The interaction between two non-adjacent nodes depends on the nodes on the shortest paths connecting them; these nodes potentially play the role of controlling the information flow between nodes, and Bi reflects the connectivity of node vi in the local environment. The node betweenness is defined as:
Bi = Σ_{w∈G, x∈G} r_vi(w, x) / d(w, x)
where d(w, x) denotes the number of shortest paths between any two nodes w and x in the weighted semantic similarity network, and r_vi(w, x) denotes the number of shortest paths between w and x that pass through vi (vi ∈ G).
The weighted degree, clustering coefficient and betweenness of node vi are combined into a weighted sum to measure the aggregation of the node; the aggregation feature value Zi of node vi is defined as:
Zi = a × WDi + b × Ci / Σ_{j=1..n} Cj + c × Bi
where a + b + c = 1.
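A minimal sketch of the three measures and their combination on a toy weighted word network. For brevity the betweenness is computed on the unweighted topology, and the coefficients a, b, c are illustrative, not values from the patent:

```python
from collections import deque

# Toy word network: adjacency lists with symmetric edge weights.
W = {
    0: {1: 0.8, 2: 0.6},
    1: {0: 0.8, 2: 0.5},
    2: {0: 0.6, 1: 0.5, 3: 0.4},
    3: {2: 0.4},
}
nodes = sorted(W)
n = len(nodes)

def bfs_counts(src):
    # unweighted BFS: distances and shortest-path counts from src
    dist, sigma = {src: 0}, {src: 1}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in W[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

D = {s: bfs_counts(s) for s in nodes}   # all-pairs distances / path counts

def weighted_degree(i):
    return sum(W[i].values()) / n        # WD_i = (sum_j w_ij) / n

def clustering(i):
    nbrs = list(W[i])
    d = len(nbrs)
    if d < 2:
        return 0.0
    t = sum(1 for a in nbrs for b in nbrs if a < b and b in W[a])
    return 2 * t / (d * (d - 1))         # C_i = 2*T_i / (D_i*(D_i - 1))

def betweenness(i):
    # B_i: sum over pairs (w, x) of (shortest paths through i) / (all shortest paths)
    b = 0.0
    for w in nodes:
        for x in nodes:
            if w < x and i not in (w, x):
                dw, sw = D[w]
                di, si = D[i]
                if dw[i] + di[x] == dw[x]:
                    b += sw[i] * si[x] / sw[x]
    return b

def aggregation(i, a=0.4, b=0.3, c=0.3):
    # Z_i = a*WD_i + b*C_i/sum_j(C_j) + c*B_i, with a + b + c = 1
    csum = sum(clustering(j) for j in nodes)
    return a * weighted_degree(i) + b * clustering(i) / csum + c * betweenness(i)
```

On this graph node 2 is the bridge to node 3, so it gets the largest betweenness and the highest aggregation value.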
The statistical feature value of a word is computed as follows:
A nonlinear function is used to normalize the word frequency. The word frequency weight TFi of word Wi in the text is defined as:
TFi = f(Wi) / Σ_{j=1..n} f(pj)
where TFi denotes the word frequency weight of word Wi, pj denotes a word of the text, and f is the word frequency counting function.
The features that identify a Chinese text are usually carried by content words such as nouns, verbs and adjectives, whereas function words such as interjections, prepositions and conjunctions are meaningless for determining the text category and introduce considerable interference into feature word extraction. A part-of-speech weight posi of word Wi in the text is defined accordingly.
The longer a word is, the more concrete the information it reflects; conversely, the meaning of a shorter word is more abstract. In particular, most feature words in a document are specialized academic compound terms: the longer they are, the clearer their meaning and the better they reflect the text topic. Increasing the weight of long words helps avoid splitting such terms and thus reflects the importance of a word in the document more accurately.
A word length weight leni of word Wi in the text is defined accordingly.
For each word in the word sequence, its statistical feature value is
statsi = A × TFi + B × posi + C × leni
where A + B + C = 1.
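Under assumed part-of-speech and length weight tables (the patent's concrete posi and leni definitions are not reproduced in this text, so the values below are illustrative), the statistical feature value can be computed as:

```python
def stats_value(freq, total_freq, pos_weight, len_weight, A=0.5, B=0.3, C=0.2):
    # stats_i = A*TF_i + B*pos_i + C*len_i, with A + B + C = 1
    tf = freq / total_freq               # TF_i = f(W_i) / sum_j f(p_j)
    return A * tf + B * pos_weight + C * len_weight

# hypothetical part-of-speech weights: content words weigh more than function words
POS = {"n": 1.0, "v": 0.8, "a": 0.6, "other": 0.2}

def len_weight(word, max_len=4):
    # longer words get a higher weight, capped at max_len characters
    return min(len(word), max_len) / max_len

# a noun occurring 6 times in a 60-word text, 4 characters long
s = stats_value(freq=6, total_freq=60, pos_weight=POS["n"], len_weight=len_weight("数据挖掘"))
```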
The criticality of word Wi is computed as follows:
For each node in the weighted word network, its criticality value Impi is defined as:
Impi = β × statsi + (1 - β) × Zi
where 0 < β < 1.
The criticality values obtained by this calculation are sorted in descending order; a threshold γ (0 < γ < 1) is set and the top q values are taken. The corresponding words serve as the feature words of the science and technology project: they fully reflect the topic and are the more important words.
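Combining the two scores and keeping the top-ranked words might look like the following sketch (β, γ and the interpretation of γ as a fraction of the word set are illustrative assumptions):

```python
def key_words(words, stats, z, beta=0.6, gamma=0.3):
    # Imp_i = beta*stats_i + (1-beta)*Z_i; keep the top q = round(gamma*m) words
    imp = {w: beta * stats[w] + (1 - beta) * z[w] for w in words}
    ranked = sorted(words, key=lambda w: imp[w], reverse=True)
    q = max(1, round(gamma * len(words)))
    return ranked[:q]

words = ["数据", "挖掘", "的", "方法"]
stats = {"数据": 0.5, "挖掘": 0.6, "的": 0.1, "方法": 0.3}
z = {"数据": 0.4, "挖掘": 0.5, "的": 0.05, "方法": 0.2}
top = key_words(words, stats, z)
```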
Step 4. Review expert feature word extraction: the review expert information is small compared with the project information, so the network-based extraction technique combining statistical and semantic features used for projects is not suitable for it. Instead, stop-word filtering is performed directly with the general stop-word dictionary and the domain-specific stop-word dictionary, and the feature word set of each expert is extracted; the general stop-word dictionary again adopts the Harbin Institute of Technology stop-word list, while the domain-specific stop-word dictionary requires continual maintenance.
Step 5. Build the per-field knowledge representation models of projects and review experts: the vector space model and matter-element knowledge set model are extended, and a text representation model PRO = (id, F, WF, T, V) is built from the different fields of the project information, where id denotes the identification field in the project library; F denotes the set of field categories in the project; WF is the field weight; T is the feature word; V denotes the words corresponding to a field and their weights, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij denotes the j-th feature word of the i-th field and f(vij) denotes the frequency corresponding to the keyword vij. The knowledge representation of the project information follows this model.
Similarly, a knowledge representation model TM = (id, F, WF, T, V) is built from the different fields of the expert information, where id denotes the identification field in the expert library; F denotes the set of field categories of the review expert; WF is the field weight set; T is the feature word; V denotes the feature words corresponding to a field and their weights, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij denotes the j-th feature word of the i-th field and f(vij) denotes the occurrence frequency of feature word vij in the corresponding field. The knowledge representation of the review expert information follows this model.
Building the review expert information index library: after the review expert knowledge representation models are built, the information is indexed and stored: first, the content field information of one review expert is read from the expert library; a word semantic network is built on the segmentation result and the feature words of the review expert are extracted; an index is built on the knowledge representation model using Apache Lucene; the built index is added, by category, to the corresponding index library, until all review experts have been indexed and stored, see Fig. 5.
Step 6: According to the number of projects, the recommendation mode is divided into recommending experts for a single pending project and recommending experts for a group of (multiple) pending projects. For group recommendation, the knowledge representation models of the pending projects from step 5 undergo feature merging both across fields and across projects; single-project recommendation performs only the cross-field feature merging. Meanwhile, the knowledge representation model of each review expert from step 5 undergoes cross-field feature merging. Apache Lucene is then used to build an index on the merged feature information according to the knowledge representation model. The project index is built at recommendation time.
In a science and technology project application management system, pending projects often need to be recommended in groups; the feature merging above ensures that the different field weights set on the knowledge representation models in step 5 do not introduce unintended contribution differences into the similarity calculation and the recommendation.
The feature merging of a pending project and a review expert via a logical merge operation proceeds as follows:
(1) Cross-field feature merging for a pending project or a review expert
Suppose the field feature word sets W'1 and W'2 are merged; the merge rule ⊕ of W'1 and W'2 is defined as:
W'1 ⊕ W'2 = { ∀i, j: {word1i, (f(word1i) + f(word2j)) / 2} | word1i = word2j }
where word1i, word2j are feature words.
Adding the field weights improves and extends the above definition; the cross-field features of a review expert or a project are merged with the rule:
W'1 ⊕ W'2 = { ∀i, j: {word1i, (w1 · f(word1i) + w2 · f(word2j)) / (w1² + w2²)} | word1i = word2j }
(2) Cross-project feature merging for a group of pending projects
This merging operation applies only to the feature vectors of pending projects, not to review expert feature vectors; expert feature vectors only need cross-field feature merging. Let V(d1) and V(d2) be the vector models of two projects after cross-field feature merging. For any t1i ∈ V(d1), t2j ∈ V(d2), if t1i and t2j are identical, they are merged; the merge is defined as:
V(d1) ⊕ V(d2) = { < tk, wk(p) = (wi(d1) + wj(d2)) / 2 > }
where k = 1, ..., n, tk is a feature term and wk(p) is the weight of tk.
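A sketch of the weighted cross-field merge. The rule above only covers words shared by both fields; keeping words unique to one field unchanged is an assumption made here for completeness:

```python
def merge_fields(f1, f2, w1, w2):
    # f1, f2: {word: frequency} for two fields; w1, w2: the field weights
    merged = {}
    for word in set(f1) | set(f2):
        if word in f1 and word in f2:
            # shared word: weighted combination per the merge rule
            merged[word] = (w1 * f1[word] + w2 * f2[word]) / (w1 ** 2 + w2 ** 2)
        else:
            # word unique to one field: carried over as-is (assumption)
            merged[word] = f1.get(word, 0) + f2.get(word, 0)
    return merged

title = {"云计算": 2, "调度": 1}
content = {"云计算": 5, "虚拟化": 3}
m = merge_fields(title, content, w1=0.6, w2=0.4)
```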
The basic process of generating the knowledge model representation of a project group is as follows:
a). Merge the cross-field features of each project, obtaining the vector model V(d) of each project;
b). Apply the merge strategy V(p) = V(d1) ⊕ V(d2) ⊕ ... ⊕ V(dn) to the set of all project vector models. Through the above method, a knowledge representation model based on the vector space is built for the projects:
V(p) = { < t1, w1(p) >, < t2, w2(p) >, ..., < tn, wn(p) > }
where k = 1, ..., n, tk is a feature term of the project group and wk(p) is the weight of tk.
Step 7. After the cross-field feature merging of the review experts and the knowledge representation models of the projects in step 6, suppose the review expert information vector is expressed as P = {s1, f(s1), s2, f(s2), ..., sn, f(sn)} and the project (group) information vector as Q = {t1, f(t1), t2, f(t2), ..., tn, f(tn)}; the semantic similarity of the pending project (group) vector and the review expert is computed with a maximum matching algorithm.
The semantic similarity calculation between the pending project (group) vector and the review expert vector based on the bipartite-graph maximum matching algorithm proceeds as follows:
Computing semantic similarity with the maximum matching algorithm means obtaining the similarity of the two texts with a bipartite-graph maximum matching. As shown in Fig. 2, the maximum matching algorithm on a bipartite graph computes the similarity of the feature terms: each feature word of the review expert vector is taken as a vertex of part X, and each feature word of the project (group) vector as a vertex of part Y; this is equivalent to finding the maximum weight matching of a complete bipartite graph. In Fig. 2, the thick edges are the maximum semantic similarities between a part-X feature word and some part-Y feature word.
The semantic similarity here is obtained from the HowNet-based similarity calculation. The present invention computes the semantic similarity between the pending project (group) and the review expert through the HowNet semantic dictionary and the maximum matching algorithm; the formula is:
SimSEM(P, Q) = ( Σ_{k=1..p} f(si) × f(tj) × SimSEM(si, tj) ) / min(m, n)
where si, tj are the two word nodes of an edge with maximum semantic similarity (a thick edge in Fig. 2); m and n are the numbers of feature words in the project vector representation and the expert vector representation, respectively; and p is the number of edges with maximum semantic similarity (thick edges in Fig. 2).
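A small sketch of the bipartite maximum weight matching, using brute force over permutations (fine for tiny vectors; a production system would use e.g. the Hungarian algorithm) and a stubbed word-level similarity:

```python
from itertools import permutations

def match_similarity(p_words, q_words, sim):
    # p_words, q_words: {word: frequency}; sim(a, b): word-level semantic similarity.
    # Find the maximum-weight matching of the complete bipartite graph, then apply
    # SimSEM(P, Q) = sum(f(s) * f(t) * sim(s, t)) / min(m, n) over the matched edges.
    ps, qs = list(p_words), list(q_words)
    if len(ps) > len(qs):                 # match the smaller side into the larger
        ps, qs = qs, ps
        p_words, q_words = q_words, p_words
    best = 0.0
    for perm in permutations(qs, len(ps)):
        score = sum(p_words[s] * q_words[t] * sim(s, t) for s, t in zip(ps, perm))
        best = max(best, score)
    return best / min(len(ps), len(qs))

sim = lambda a, b: 1.0 if a == b else 0.1   # toy similarity stub, not HowNet
expert = {"云计算": 2, "调度": 1}
project = {"云计算": 3, "存储": 2}
s = match_similarity(expert, project, sim)
```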
The above semantic similarity between the pending project (group) and the review expert information involves many factors such as language, word semantics and word structure; it represents the matching degree of the two. A large similarity indicates a high matching degree, i.e. the review expert is suited to reviewing this project (group).
Step 8. A similarity cut-off is set, a recommendation index is produced according to the magnitude of the similarity, and the final recommended review expert list is generated.
The above is only the preferred embodiment of the present invention. It should be noted that, for intelligent machine recommendation in the field of science and technology project review experts, several improvements and variations can be made without departing from the technical principles of the present invention, and these improvements and variations should also be considered within the legal scope of the present invention.

Claims (3)

1. An intelligent review expert recommendation method for science and technology projects, characterized in that the method comprises the following steps:
Step 1, taking general terms and common words in the science and technology project and expert information as the domain-specific stop-word dictionary, and punctuation marks and non-Chinese characters as the cut marker library;
Step 2, segmenting the project information and the expert information: according to the cut markers, cutting the project title, main research content and technical specifications of the project information into substring sequences; according to the cut markers in the review expert information, cutting the expert profile, awards, inventions, published papers, undertaken projects and their performance, and research directions into substring sequences, one substring sequence per information field; performing word segmentation on the substring sequences with the ICTCLAS segmenter of the Chinese Academy of Sciences;
Step 3, project feature word extraction: filtering stop words out of the segmentation result with a general stop-word dictionary and a domain-specific stop-word dictionary, the general stop-word dictionary adopting the Harbin Institute of Technology stop-word list, and taking the segmentation result with the stop words removed as a word set;
building the domain-specific stop-word dictionary is a self-learning, continuously refined process: word frequencies are accumulated during segmentation, and a word whose probability of occurrence in the text exceeds a given threshold is added to the stop-word dictionary;
since the project information is comparatively large, computing semantic similarity between the words of the word set, building a word network from the semantic relations and co-occurrence relations of the words, and calculating the aggregation feature value of each word in the network; then, combined with the statistical feature value of each word, computing the criticality of the word and extracting the project feature words; the feature word extraction thus combines the statistical feature information and semantic feature information of the text, extracting feature words more accurately;
Step 4, review expert feature word extraction: filtering stop words with the general stop-word dictionary and the domain-specific stop-word dictionary, and extracting the feature word set of each expert;
Step 5, building the per-field knowledge representation models of projects and review experts: extending the vector space model and matter-element knowledge set model, and building a text representation model PRO = (id, F, WF, T, V) from the different fields of the project information, where id denotes the identification field in the project library; F denotes the set of field categories in the project; WF is the field weight; T is the feature word; V denotes the words corresponding to a field and their weights, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij denotes the j-th feature word of the i-th field and f(vij) denotes the frequency corresponding to the keyword vij; the knowledge representation of the project information follows this model;
similarly, building a knowledge representation model TM = (id, F, WF, T, V) from the different fields of the expert information, where id denotes the identification field in the expert library; F denotes the set of field categories of the review expert; WF is the field weight set; T is the feature word; V denotes the feature words corresponding to a field and their weights, i.e. Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij denotes the j-th feature word of the i-th field and f(vij) denotes the occurrence frequency of the feature word in the corresponding field; the knowledge representation of the review expert information follows this model;
building the review expert information index library: after the review expert knowledge representation models are built, indexing and storing the information: first reading the content field information of one review expert from the expert library; building a word semantic network on the segmentation result and extracting the feature words of the review expert; building an index on the knowledge representation model with Apache Lucene; adding the built index, by category, to the corresponding index library, until all review experts have been indexed and stored;
Step 6, dividing the recommendation mode, according to the number of projects, into recommending experts for a single pending project and recommending experts for a group of pending projects; for group recommendation, performing feature merging on the knowledge representation models of the pending projects from step 5 both across fields and across projects, while single-project recommendation performs only the cross-field feature merging; meanwhile, performing cross-field feature merging on the knowledge representation model of each review expert from step 5; building an index on the merged feature information with Apache Lucene according to the knowledge representation model; the project index being built at recommendation time;
in a science and technology project application management system, pending projects often need to be recommended in groups; the feature merging above ensures that the different field weights set on the knowledge representation models in step 5 do not introduce unintended contribution differences into the similarity calculation and the recommendation;
Step 7, after the cross-field feature merging of the review experts and the knowledge representation models of the projects in step 6, supposing the review expert information vector is expressed as P = {s1, f(s1), s2, f(s2), ..., sn, f(sn)} and the project information vector as Q = {t1, f(t1), t2, f(t2), ..., tn, f(tn)}, computing the semantic similarity of the pending project vector and the review expert with a maximum matching algorithm;
Step 8, setting a similarity cut-off, producing a recommendation index according to the magnitude of the similarity, and generating the final recommended review expert list.
2. The intelligent review expert recommendation method for science and technology projects according to claim 1, characterized in that the semantic similarity calculation process in step 3 is as follows:
in the HowNet semantic dictionary, suppose word W1 has n concepts S11, S12, ..., S1n and word W2 has m concepts S21, S22, ..., S2m; the similarity SimSEM(W1, W2) of W1 and W2 equals the maximum of the similarities over all concept pairs:
SimSEM(W1, W2) = max_{1≤i≤n, 1≤j≤m} Sim(S1i, S2j);
content words and function words have different description languages, so the similarity between the corresponding syntactic sememes or relational sememes must be computed; a content-word concept is described by the first basic sememe, the other basic sememes, the relational sememe descriptions and the relational symbol descriptions, with the corresponding similarities denoted Sim1(p1, p2), Sim2(p1, p2), Sim3(p1, p2) and Sim4(p1, p2); the similarity calculation of two feature structures ultimately reduces to the similarity of basic sememes or of concrete words:
Sim(S1, S2) = Σ_{i=1..4} βi · Simi(S1, S2);
the βi (1 ≤ i ≤ 4) are adjustable parameters satisfying β1 + β2 + β3 + β4 = 1 and β1 ≥ β2 ≥ β3 ≥ β4;
let CW = {C1, C2, ..., Cm} be the word set obtained after processing; its semantic similarity adjacency matrix Sm is defined by Sm[i][j] = Sim(Ci, Cj), where Sim(Ci, Cj) is the semantic similarity of words Ci and Cj, Sim(Ci, Ci) = 1, and Sim(Ci, Cj) = Sim(Cj, Ci);
for the word set CW = {C1, C2, ..., Cm}, the semantic similarity calculation yields m × (m + 1) / 2 pairwise similarity values;
the co-occurrence relation of words is computed as follows:
the word co-occurrence model is one of the important statistical models in natural language processing; according to it, if two words frequently co-occur in the same window unit of a document, the two words are related in meaning and to some degree express the semantic information of the text; a sliding window is used to compute the word co-occurrence degrees over the word sequence:
first, the word sequence is cleaned: spaces and nulls are removed and identical words are merged, giving the word set CW = {C1, C2, ..., Cm}, where m ≤ n;
the word co-occurrence degree matrix Cm corresponding to the word set CW has entries Coo(Ci, Cj); initially, Coo(Ci, Cj) = 0 for all 1 ≤ i, j ≤ m;
the co-occurrence degrees are computed by sliding the window over the word sequence; the words in the sliding window are Ti-1, Ti, Ti+1 (1 < i < n):
1) if i = n-1, go to 4); if Ti-1 is a space or null, slide the window to the next word, i++; otherwise, go to 2);
2) if Ti is Chinese, then Coo(Ti-1, Ti)++ and go to 3); if Ti is null, go to 3); otherwise go to 1);
3) if Ti+1 is Chinese, then Coo(Ti-1, Ti+1)++, i++, go to 1); otherwise, go to 1);
4) if Tn-2 is Chinese, go to 5); otherwise, go to 7);
5) if Tn-1 is Chinese, Coo(Tn-2, Tn-1)++ and go to 6); if Tn-1 is a space, go to 6); otherwise terminate;
6) if Tn is Chinese, Coo(Tn-2, Tn)++ and terminate; otherwise terminate;
7) if Tn-1 is Chinese and Tn is also Chinese, then Coo(Tn-1, Tn)++ and terminate; otherwise terminate;
after this computation the co-occurrence degree matrix Cm is obtained, and each element of Cm is normalized, i.e. divided by the maximum of all elements in the matrix, max{Coo(Ci, Cj) | 1 ≤ i, j ≤ m};
the word network is built as follows:
to build the weighted word network, the weight matrix Wm of the network must first be obtained; it is defined as the weighted combination Wm = α · Cm + β · Sm, where α is 0.3 and β is 0.7, strengthening the semantic relation between words and weakening the co-occurrence relation;
Wm serves as the adjacency matrix corresponding to the input word network; the corresponding network is defined as G = {V, E}, where G is an undirected weighted graph, V denotes the vertex set of G, E denotes the edge set of G, and vi denotes the i-th vertex in V;
the calculation process of the word aggregation feature value is as follows:
key characteristics of a word network are its degree distribution, average shortest path length, clustering degree and clustering coefficient; the degree of a node reflects how the node is associated with other nodes; the clustering degree and clustering coefficient of a node reflect the interconnection density of the nodes in its local neighborhood; the degree and clustering coefficient of a node reflect its importance in the local range; the aggregation feature value of a node is computed from its weighted degree, clustering coefficient and node betweenness, so that important words receive higher weights while words related to many important words also obtain a higher score;
in the semantic similarity network, the unordered pair (vi, vj) denotes the edge between nodes vi and vj; the weighted degree of node vi is defined as:
WDi = (Σ_{j=1..n} wij) / n;
where wij is the weight of the edge between nodes vi and vj, and n is the total number of nodes;
in the semantic similarity network, the unweighted degree Di of node vi is Di = |{(vi, vj) : (vi, vj) ∈ E, vi, vj ∈ V}|; the clustering degree Ti of node vi is the number of actual edges among its neighbors: Ti = |{(vj, vk) : (vi, vj) ∈ E, (vi, vk) ∈ E, (vj, vk) ∈ E, vi, vj, vk ∈ V}|; the clustering coefficient Ci of node vi is then defined as:
Ci = Ti / C(Di, 2) = 2Ti / (Di(Di - 1));
in the semantic similarity network, the node betweenness Bi is the probability that a shortest path between nodes w and x passes through node vi; the interaction between two non-adjacent nodes depends on the nodes on the shortest paths connecting them; these nodes potentially play the role of controlling the information flow between nodes, and Bi reflects the connectivity of node vi in the local environment; the node betweenness is defined as:
Bi = Σ_{w∈G, x∈G} r_vi(w, x) / d(w, x);
where d(w, x) denotes the number of shortest paths between any two nodes w and x in the weighted semantic similarity network, and r_vi(w, x) denotes the number of shortest paths between any two nodes w and x that pass through vi, vi ∈ G;
the weighted degree, clustering coefficient and betweenness of node vi are combined into a weighted sum to measure the aggregation of the node; the aggregation feature value Zi of node vi is defined as:
Zi = a × WDi + b × Ci / Σ_{j=1..n} Cj + c × Bi;
where a + b + c = 1;
The calculating process of the statistical characteristics of described word is as follows:
Use nonlinear function that word frequency is normalized;Word WiWord frequency weight TFi in the text is defined as:
T F i = f ( W i ) &Sigma; j = 1 n f ( p j ) ;
Wherein, TFi represents word WiWord frequency weight, pjRepresenting certain word in text, f is word frequency statistics function;
The part-of-speech weight pos_i of word W_i in the text is defined as:
Longer words convey more concrete information, whereas shorter words express more abstract meanings. The feature words of a document are mostly specialized academic compound terms: the longer such a term, the more definite its meaning and the better it reflects the text's topic. Increasing the weight of long words helps split compound vocabulary correctly and thus reflects a word's importance in the document more accurately.
The word-length weight len_i of word W_i in the text is defined as:
For each word in the word sequence, its statistical feature value is
stats_i = A × TF_i + B × pos_i + C × len_i
where A + B + C = 1.
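A minimal sketch of the statistical feature value stats_i follows. The part-of-speech weight table and the word-length normalisation are assumptions, since the source leaves pos_i and len_i unspecified here, and the coefficients A, B, C are arbitrary example values satisfying A + B + C = 1.

```python
from collections import Counter

def statistical_features(words, pos_weight, A=0.5, B=0.3, C=0.2):
    # stats_i = A*TF_i + B*pos_i + C*len_i, with A + B + C = 1.
    # TF_i is the word frequency normalised by the total word count;
    # len_i rewards longer, more specific terms (here normalised by the
    # longest word in the text -- a hypothetical choice).
    freq = Counter(words)
    total = sum(freq.values())
    max_len = max(len(w) for w in freq)
    return {
        w: A * freq[w] / total
           + B * pos_weight.get(w, 0.0)
           + C * len(w) / max_len
        for w in freq
    }
```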
The criticality of word W_i is calculated as follows:
For each node in the weighted term network, its criticality value Imp_i is defined as:
Imp_i = β × stats_i + (1 - β) × Z_i
where 0 < β < 1.
The criticality values obtained by this calculation are sorted in descending order; a threshold γ is set, with 0 < γ < 1, and the first q values are taken. The corresponding words then serve as the feature words of the science and technology project: they fully reflect the topic and are the important terms.
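Putting the criticality formula and the top-q selection together, a minimal sketch (β and q are example values; the threshold γ from the claim is not modelled here):

```python
def select_feature_words(stats, agg, beta=0.6, q=2):
    # Imp_i = beta*stats_i + (1 - beta)*Z_i, with 0 < beta < 1;
    # sort in descending order of criticality and keep the first q words.
    imp = {w: beta * stats[w] + (1 - beta) * agg.get(w, 0.0) for w in stats}
    ranked = sorted(imp.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:q]]
```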
2. The intelligent review expert recommending method for science and technology projects according to claim 1, characterized in that the feature merging by logical XOR operation described in step 6 proceeds as follows:
(1) Inter-field feature merging for a pending project or a review expert
Suppose field feature word sets W'_1 and W'_2 are to be merged; the merge rule W'_1 ⊕ W'_2 is then defined as:
W'_1 ⊕ W'_2 = { ∀i, j, {word_{1i}, (f(word_{1i}) + f(word_{2j})) / 2} | word_{1i} = word_{2j} };
where word_{1i} and word_{2j} are feature words.
Adding field weights improves and extends the above definition; for merging the inter-field features of review experts and of science and technology projects, the merge rule is:
W'_1 ⊕ W'_2 = { ∀i, j, {word_{1i}, (w_1 × f(word_{1i}) + w_2 × f(word_{2j})) / (w_1^2 + w_2^2)} | word_{1i} = word_{2j} };
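Both merge rules can be sketched over word-frequency dictionaries; only words present in both field sets are kept, per the rule above. The function name and the dictionary representation are illustrative.

```python
def merge_fields(f1, f2, w1=1.0, w2=1.0):
    # Weighted inter-field merge: for each word in both sets, the combined
    # frequency is (w1*f1 + w2*f2) / (w1^2 + w2^2).  With w1 = w2 = 1 this
    # reduces to the plain average (f1 + f2) / 2 of the first merge rule.
    return {w: (w1 * f1[w] + w2 * f2[w]) / (w1 ** 2 + w2 ** 2)
            for w in f1.keys() & f2.keys()}
```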
(2) Inter-project feature merging within a group of pending projects
This merging operation applies only to the feature vectors of pending science and technology projects, not to review expert feature vectors; an expert feature vector needs only the inter-field merge operation. Let V(d_1) and V(d_2) be the vector models of two science and technology projects after inter-field feature merging. For any t_{1i} ∈ V(d_1) and t_{2j} ∈ V(d_2), if t_{1i} and t_{2j} are identical, they are merged; the merge is defined as:
V(d_1) ⊕ V(d_2) = { <t_k, w_k(p) = (w_i(d_1) + w_j(d_2)) / 2> };
where k = 1, …, n, t_k is a feature term entry, and w_k(p) is the weight of t_k.
The basic process of producing the knowledge representation model is as follows:
A). Merge the inter-field features of each science and technology project to obtain the vector model V(d) of each project;
B). Apply the merge strategy ⊕ to the set of all science and technology project vector models. By the above method, a vector-space-based knowledge representation model is established for the science and technology projects:
V(p) = { <t_1, w_1(p)>, <t_2, w_2(p)>, …, <t_n, w_n(p)> };
where k = 1, …, n, t_k is a project-group feature word entry, and w_k(p) is the weight of t_k.
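The two-stage process can be sketched as a fold over a list of project vectors. Carrying over terms that appear in only one project is an assumption, since the claim only defines the shared-term case; the function names are illustrative.

```python
from functools import reduce

def merge_project_vectors(v1, v2):
    # Identical terms t_k get the averaged weight (w_i(d1) + w_j(d2)) / 2;
    # terms unique to one vector are kept unchanged (an assumption).
    merged = dict(v1)
    for t, w in v2.items():
        merged[t] = (merged[t] + w) / 2 if t in merged else w
    return merged

def knowledge_model(vectors):
    # V(p): fold the pairwise merge over all project vector models.
    return reduce(merge_project_vectors, vectors)
```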
CN201310509358.2A 2013-10-24 2013-10-24 Intelligent review expert recommending method for science and technology projects Expired - Fee Related CN103631859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310509358.2A CN103631859B (en) 2013-10-24 2013-10-24 Intelligent review expert recommending method for science and technology projects


Publications (2)

Publication Number Publication Date
CN103631859A CN103631859A (en) 2014-03-12
CN103631859B true CN103631859B (en) 2017-01-11

Family

ID=50212901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310509358.2A Expired - Fee Related CN103631859B (en) 2013-10-24 2013-10-24 Intelligent review expert recommending method for science and technology projects

Country Status (1)

Country Link
CN (1) CN103631859B (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823896B (en) * 2014-03-13 2017-02-15 蚌埠医学院 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
CN104361102B (en) * 2014-11-24 2018-05-11 清华大学 A kind of expert recommendation method and system based on group matches
US20160203140A1 (en) * 2015-01-14 2016-07-14 General Electric Company Method, system, and user interface for expert search based on case resolution logs
CN105912581A (en) * 2016-03-31 2016-08-31 比美特医护在线(北京)科技有限公司 Information processing method and device
CN107194672B (en) * 2016-11-09 2021-07-13 北京理工大学 Review distribution method integrating academic expertise and social network
CN108427667B (en) * 2017-02-15 2021-08-10 北京国双科技有限公司 Legal document segmentation method and device
CN107229738B (en) * 2017-06-18 2020-04-03 杭州电子科技大学 Academic paper search ordering method based on document scoring model and relevancy
CN107609006B (en) * 2017-07-24 2021-01-29 华中师范大学 Search optimization method based on local log research
CN107656920B (en) * 2017-09-14 2020-12-18 杭州电子科技大学 Scientific and technological talent recommendation method based on patents
CN107784087B (en) * 2017-10-09 2020-11-06 东软集团股份有限公司 Hot word determination method, device and equipment
CN107807978B (en) * 2017-10-26 2021-07-06 北京航空航天大学 Code reviewer recommendation method based on collaborative filtering
CN108229684B (en) * 2018-01-26 2022-04-15 中国科学技术信息研究所 Method and device for constructing expert knowledge vector model and terminal equipment
CN108399491B (en) * 2018-02-02 2021-10-29 浙江工业大学 Employee diversity ordering method based on network graph
CN108804633B (en) * 2018-06-01 2021-10-08 腾讯科技(深圳)有限公司 Content recommendation method based on behavior semantic knowledge network
CN108549730A (en) * 2018-06-01 2018-09-18 云南电网有限责任公司电力科学研究院 A kind of search method and device of expert info
CN108846056B (en) * 2018-06-01 2021-04-23 云南电网有限责任公司电力科学研究院 Scientific and technological achievement review expert recommendation method and device
CN108920556B (en) * 2018-06-20 2021-11-19 华东师范大学 Expert recommending method based on discipline knowledge graph
CN108873706B (en) * 2018-07-30 2022-04-15 中国石油化工股份有限公司 Trap evaluation intelligent expert recommendation method based on deep neural network
CN109308315B (en) * 2018-10-19 2022-09-16 南京理工大学 Collaborative recommendation method based on similarity and incidence relation of expert fields
CN109857872A (en) * 2019-02-18 2019-06-07 浪潮软件集团有限公司 The information recommendation method and device of knowledge based map
CN109992642B (en) * 2019-03-29 2022-11-18 华南理工大学 Single task expert automatic selection method and system based on scientific and technological entries
CN110046225B (en) * 2019-04-16 2020-11-24 广东省科技基础条件平台中心 Scientific and technological project material integrity assessment decision model training method
CN112182327B (en) * 2019-07-05 2024-06-14 北京猎户星空科技有限公司 Data processing method, device, equipment and medium
CN110442618B (en) * 2019-07-25 2023-04-18 昆明理工大学 Convolutional neural network review expert recommendation method fusing expert information association relation
CN110443574B (en) * 2019-07-25 2023-04-07 昆明理工大学 Recommendation method for multi-project convolutional neural network review experts
CN111143690A (en) * 2019-12-31 2020-05-12 中国电子科技集团公司信息科学研究院 Expert recommendation method and system based on associated expert database
CN111598526B (en) * 2020-04-21 2023-02-03 奇计(江苏)科技服务有限公司 Intelligent comparison review method for describing scientific and technological innovation content
CN111666420B (en) * 2020-05-29 2021-02-26 华东师范大学 Method for intensively extracting experts based on subject knowledge graph
CN111951141A (en) * 2020-07-09 2020-11-17 广东港鑫科技有限公司 Double-random supervision method and system based on big data intelligent analysis and terminal equipment
CN111782797A (en) * 2020-07-13 2020-10-16 贵州省科技信息中心 Automatic matching method for scientific and technological project review experts and storage medium
CN112100370B (en) * 2020-08-10 2023-07-25 淮阴工学院 Picture-trial expert combination recommendation method based on text volume and similarity algorithm
CN112287679A (en) * 2020-10-16 2021-01-29 国网江西省电力有限公司电力科学研究院 Structured extraction method and system for text information in scientific and technological project review
CN112381381B (en) * 2020-11-12 2023-11-17 深圳供电局有限公司 Expert's device is recommended to intelligence
CN112487260A (en) * 2020-12-07 2021-03-12 上海市研发公共服务平台管理中心 Instrument project declaration and review expert matching method, device, equipment and medium
CN112417870A (en) * 2020-12-10 2021-02-26 北京中电普华信息技术有限公司 Expert information screening method and system
CN112948527B (en) * 2021-02-23 2023-06-16 云南大学 Improved TextRank keyword extraction method and device
CN113554210A (en) * 2021-05-17 2021-10-26 南京工程学院 Comment scoring and declaration prediction system and method for fund project declaration
CN113255364A (en) * 2021-05-28 2021-08-13 华斌 Multi-expert opinion machine integration method for government affair informatization project based on knowledge fusion
CN113516094B (en) * 2021-07-28 2024-03-08 中国科学院计算技术研究所 System and method for matching and evaluating expert for document
CN113569575B (en) * 2021-08-10 2024-02-09 云南电网有限责任公司电力科学研究院 Evaluation expert recommendation method based on pictographic-semantic dual-feature space mapping
CN113643008A (en) * 2021-10-15 2021-11-12 中国铁道科学研究院集团有限公司科学技术信息研究所 Acceptance expert matching method, device, equipment and readable storage medium
CN114186002A (en) * 2021-12-14 2022-03-15 智博天宫(苏州)人工智能产业研究院有限公司 Scientific and technological achievement data processing and analyzing method and system
CN115033772B (en) * 2022-06-20 2024-06-21 浙江大学 Creative excitation method and device based on semantic network
CN115577696B (en) * 2022-11-15 2023-04-07 四川省公路规划勘察设计研究院有限公司 Project similarity evaluation and analysis method based on WBS tree
CN116303642A (en) * 2023-02-07 2023-06-23 中国计量科学研究院 Method and device for optimizing and avoiding test expert in scientific and technological achievement test
CN117093670A (en) * 2023-07-18 2023-11-21 北京智信佳科技有限公司 Method for realizing intelligent recommending expert in paper
CN117034273A (en) * 2023-08-28 2023-11-10 山东省计算中心(国家超级计算济南中心) Android malicious software detection method and system based on graph rolling network
CN117131279A (en) * 2023-09-13 2023-11-28 合肥工业大学 Data processing method and device for expert recommendation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075942A (en) * 2007-06-22 2007-11-21 清华大学 Method and system for processing social network expert information based on expert value progation algorithm
CN102495860A (en) * 2011-11-22 2012-06-13 北京大学 Expert recommendation method based on language model
CN102855241A (en) * 2011-06-28 2013-01-02 上海迈辉信息技术有限公司 Multi-index expert suggestion system and realization method thereof
CN102880657A (en) * 2012-08-31 2013-01-16 电子科技大学 Expert recommending method based on searcher


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of a Review Expert Recommendation System for Science and Technology Projects; Hu Bin; China Master's Theses Full-text Database (Information Science and Technology); 2013-07-15 (No. 7); full text *

Also Published As

Publication number Publication date
CN103631859A (en) 2014-03-12

Similar Documents

Publication Publication Date Title
CN103631859B (en) Intelligent review expert recommending method for science and technology projects
Saad et al. Twitter sentiment analysis based on ordinal regression
Desai et al. Techniques for sentiment analysis of Twitter data: A comprehensive survey
CN108363790 Method, apparatus, device and storage medium for assessment
CN107609121A (en) Newsletter archive sorting technique based on LDA and word2vec algorithms
CN106599065B (en) Food safety network public opinion early warning system based on Storm distributed framework
CN110929034A (en) Commodity comment fine-grained emotion classification method based on improved LSTM
CN106250438A (en) Based on random walk model zero quotes article recommends method and system
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
CN108304509B (en) Junk comment filtering method based on text multi-directional expression mutual learning
CN110598219A (en) Emotion analysis method for broad-bean-net movie comment
CN104298732B (en) The personalized text sequence of network-oriented user a kind of and recommendation method
CN111159396B (en) Method for establishing text data classification hierarchical model facing data sharing exchange
CN103631858A (en) Science and technology project similarity calculation method
CN108363784A (en) A kind of public sentiment trend estimate method based on text machine learning
CN105930509A (en) Method and system for automatic extraction and refinement of domain concept based on statistics and template matching
Zhao et al. Multi-layer features ablation of BERT model and its application in stock trend prediction
Pouromid et al. ParsBERT post-training for sentiment analysis of tweets concerning stock market
CN109918648A (en) A kind of rumour depth detection method based on the scoring of dynamic sliding window feature
Zhang et al. A hybrid neural network approach for fine-grained emotion classification and computing
Kanev et al. Sentiment analysis of multilingual texts using machine learning methods
Zhang Application of data mining technology in the analysis of e-commerce emotional law
Su et al. An improved BERT method for the evolution of network public opinion of major infectious diseases: Case Study of COVID-19
CN104572613A (en) Data processing device, data processing method and program
CN106599304B (en) Modular user retrieval intention modeling method for small and medium-sized websites

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140312

Assignee: Hangzhou eddy current technology Co.,Ltd.

Assignor: HANGZHOU DIANZI University

Contract record no.: X2020330000008

Denomination of invention: Intelligent review expert recommending method for science and technology projects

Granted publication date: 20170111

License type: Common License

Record date: 20200117

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170111
