An intelligent recommendation method for science and technology project review experts
Technical field
The invention belongs to the technical field of expert recommendation, and in particular relates to a network-service-based intelligent recommendation method for science and technology project review experts. It is an intelligent method for assisting funding decisions on science and technology projects.
Background art
As project management systems have been rapidly adopted by government departments across China, the review of science and technology projects has evolved from the traditional centralized-conference model to the current online review model, removing the geographic restrictions on review experts. Review experts appraise project application documents according to their domain knowledge and the funding criteria of the funding agency, and the agency decides whether to fund a project based on the experts' appraisals.
At present, review experts for science and technology projects are mostly assigned according to the subjective judgment of project managers, and each pending project generally requires evaluation by several experts. Manual assignment therefore suffers from low efficiency, heavy workload, and a lack of scientific rigor, and the selected experts are often not the most suitable. Research on intelligently recommending review experts for science and technology projects is thus crucial: it can effectively alleviate mismatches between experts and the content of the projects under review, and greatly improve the public-service capability of project review.
Existing intelligent recommendation techniques, such as collaborative filtering and content-based recommendation, are mostly applied to video and e-commerce recommendation websites; there is little research on or application to databases of science and technology project review experts. Owing to domain-specific constraints, recommending experts for science and technology projects differs from general recommendation: first, a project management system covers all trades and professions, so the domain knowledge involved is extremely complex; second, the recommendation of review experts bears on the funding of projects, so the requirements for objectivity, fairness, and accuracy are very high. China currently lacks systematic methodological guidance and mature technical support in this respect. Both expert information and pending project information are semi-structured texts whose contents can be matched. The present invention makes full use of structural features and word semantics to compute the similarity between project and expert information: a high similarity indicates that the expert is familiar with the project, and a recommended expert list is generated for reviewing it. The invention also provides a Decision Support System (DSS) for recommending review experts for science and technology projects, which assigns each project to review experts with matching domain knowledge for scientific evaluation, helping decision-making users improve the level and quality of their decisions and making reviews more scientific and objective.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides an intelligent recommendation method for science and technology project review experts.
The review expert recommendation process of the present invention for science and technology projects comprises the following steps:
Step 1. Build a domain stop-word dictionary from the general terms and common words in the science and technology project and expert information; use punctuation marks and non-Chinese characters as the cutting-mark library.
Step 2. Segment the project and expert information. According to the cutting marks, the project title, main research content, technical specifications, and other fields of the project information are cut into substring sequences; likewise, according to the cutting marks in the review expert information, the expert's basic information, awards, inventions, publications, undertaken projects and achievements, research directions, and other fields are cut into substring sequences, one substring sequence per information field. The substring sequences are then segmented into words using the Chinese Academy of Sciences ICTCLAS segmenter.
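The cutting of Step 2 can be sketched as follows. This is a minimal illustration, assuming the cutting-mark library of Step 1 consists of everything outside the CJK Unified Ideographs range; the subsequent ICTCLAS word segmentation is an external tool and is omitted here.

```python
import re

def cut_into_substrings(field_text):
    """Cut one field's text into a substring sequence at punctuation and
    non-Chinese characters (the cutting-mark library of Step 1)."""
    # Anything outside the CJK Unified Ideographs block acts as a cutting mark.
    parts = re.split(r"[^\u4e00-\u9fff]+", field_text)
    return [p for p in parts if p]

# Illustrative field text (a made-up project title with punctuation and digits).
subs = cut_into_substrings("基于网络的专家推荐方法(2015年,第3版)")
```

Each field of a project or expert record would be passed through this function separately, yielding one substring sequence per field as the step requires.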
Step 3. Extract the feature words of the science and technology project. Filter stop words out of the segmentation result using both a general stop-word dictionary and the domain stop-word dictionary; the general stop-word dictionary adopts the Harbin Institute of Technology stop-word list. The segmentation result with stop words removed forms a word set.
Building the domain stop-word dictionary is a continuously improving self-learning process: word frequencies are accumulated while segmenting the information, and whenever the probability of a word occurring in the texts exceeds a given threshold, the word is added to the stop-word dictionary.
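The self-learning rule above can be sketched as follows. Interpreting "probability of occurring in the texts" as document frequency, and the 0.6 threshold, are illustrative assumptions not fixed by the source.

```python
from collections import Counter

def learn_stop_words(documents, threshold=0.6):
    """Self-learning domain stop-word dictionary: a word whose fraction of
    documents containing it exceeds `threshold` is treated as a stop word.
    (Threshold value and document-probability reading are assumptions.)"""
    doc_count = Counter()
    for words in documents:
        doc_count.update(set(words))   # count documents, not raw tokens
    n = len(documents)
    return {w for w, c in doc_count.items() if c / n > threshold}

docs = [["研究", "方法", "网络"], ["研究", "系统"], ["研究", "网络", "评审"]]
stops = learn_stop_words(docs, threshold=0.6)
```

A word like "研究" ("research") that appears in nearly every record carries no discriminative power, which is exactly why the step routes it into the stop-word dictionary.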
The amount of information in a science and technology project is relatively large. Semantic similarity is computed between the words in the word set; a word network is built from the semantic relations and co-occurrence relations of the words, and an aggregation feature value is computed for each word in the network. This is then combined with the statistical feature value of each word to compute the word's criticality and extract the project's feature words. Feature-word extraction for projects thus combines the statistical and semantic feature information of the text, extracting feature words more accurately.
The semantic similarity computation proceeds as follows:
In the HowNet semantic dictionary, suppose word W1 has n concepts S11, S12, ..., S1n and word W2 has m concepts S21, S22, ..., S2m. The similarity SimSEM(W1, W2) of words W1 and W2 equals the maximum of the similarities over all concept pairs:
SimSEM(W1, W2) = max{ Sim(S1i, S2j) | 1 ≤ i ≤ n, 1 ≤ j ≤ m }
Content words and function words have different description languages, so the similarity between their corresponding sememes or relational sememes must be computed. A content-word concept is described by the first basic sememe, the other basic sememes, the relational-sememe descriptions, and the relational-symbol descriptions; the corresponding similarities are denoted Sim1(p1, p2), Sim2(p1, p2), Sim3(p1, p2), and Sim4(p1, p2), respectively. The similarity computation of two feature structures ultimately reduces to the similarity of basic sememes or of concrete words. The overall concept similarity combines the four parts as
Sim(p1, p2) = Σ(i = 1..4) βi · Π(j = 1..i) Simj(p1, p2)
where βi (1 ≤ i ≤ 4) are adjustable parameters satisfying β1 + β2 + β3 + β4 = 1 and β1 ≥ β2 ≥ β3 ≥ β4.
Let CW = {C1, C2, ..., Cm} be the word set obtained after processing. Its semantic similarity adjacency matrix Sm has elements Sim(Ci, Cj), where Sim(Ci, Cj) is the semantic similarity of words Ci and Cj, Sim(Ci, Ci) = 1, and Sim(Ci, Cj) = Sim(Cj, Ci). Since the matrix is symmetric, only m × (1 + m)/2 pairwise similarity values need to be computed for the word set CW.
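Building the symmetric matrix Sm can be sketched as follows. The pairwise function `toy_sim` is a placeholder standing in for the HowNet-based SimSEM above; only m(m+1)/2 pairs are evaluated, mirroring the pair count stated in the text.

```python
def similarity_matrix(words, sim):
    """Build the symmetric semantic-similarity adjacency matrix Sm.
    `sim` is any pairwise similarity function; the diagonal is fixed at 1
    and only the upper triangle (m*(m+1)/2 pairs) is evaluated."""
    m = len(words)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            s = 1.0 if i == j else sim(words[i], words[j])
            S[i][j] = S[j][i] = s   # enforce Sim(Ci,Cj) = Sim(Cj,Ci)
    return S

# Toy similarity standing in for HowNet SimSEM (an assumption for illustration).
toy_sim = lambda a, b: 0.8 if a[0] == b[0] else 0.1
S = similarity_matrix(["网络", "网站", "评审"], toy_sim)
```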
The co-occurrence relations of the words are computed as follows:
The word co-occurrence model is one of the important statistical models in natural language processing. According to this model, if two words frequently co-occur in the same window unit of a document (e.g., the same sentence or paragraph), they are related in meaning and together express, to some extent, the semantic information of the text. A sliding window of length 3 is moved over the word sequence to compute word co-occurrence degrees, as shown in Fig. 1.
First, words are extracted from the word sequence: spaces and nulls are removed and identical words are merged, giving the word set CW = {C1, C2, ..., Cm}, where m ≤ n.
The word co-occurrence degree matrix Cm corresponding to the word set CW is defined by its elements Coo(Ci, Cj); initially, Coo(Ci, Cj) = 0 (1 ≤ i, j ≤ m).
The sliding window moves over the word sequence T1 T2 ... Tn; the words in the window are Ti-1 Ti Ti+1 (1 < i < n):
1) If i = n-1, go to 4). If Ti-1 is a space or null, slide the window to the next word (i++); otherwise, go to 2).
2) If Ti is Chinese, then Coo(Ti-1, Ti)++ and go to 3); if Ti is null, go to 3); otherwise, go to 1).
3) If Ti+1 is Chinese, then Coo(Ti-1, Ti+1)++, i++, and go to 1); otherwise, go to 1).
4) If Tn-2 is Chinese, go to 5); otherwise, go to 7).
5) If Tn-1 is Chinese, Coo(Tn-2, Tn-1)++ and go to 6); if Tn-1 is a space, go to 6); otherwise, end.
6) If Tn is Chinese, Coo(Tn-2, Tn)++ and end; otherwise, end.
7) If Tn-1 is Chinese and Tn is also Chinese, then Coo(Tn-1, Tn)++ and end; otherwise, end.
The preceding computation yields the word co-occurrence degree matrix Cm. Each element of Cm is then normalized by dividing it by the maximum element of the matrix, max{Coo(Ci, Cj) | 1 ≤ i, j ≤ m}.
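The sliding-window counting and normalization can be sketched as follows. This is a simplification: the patent's stepwise procedure with its space/null cases is collapsed into counting every token pair that falls within the same length-3 window, followed by the same division by the maximum entry.

```python
from collections import defaultdict

def cooccurrence(tokens, window=3):
    """Simplified length-3 sliding-window co-occurrence degree: each pair
    of distinct tokens within a window is incremented, then the counts
    are normalized by the maximum count (as in the patent's final step)."""
    coo = defaultdict(int)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            pair = tuple(sorted((tokens[i], tokens[j])))  # undirected pair
            if pair[0] != pair[1]:
                coo[pair] += 1
    peak = max(coo.values(), default=1)
    return {p: c / peak for p, c in coo.items()}

coo = cooccurrence(["网", "络", "专", "家", "网", "络"])
```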
The word network is built as follows:
To build the weighted word network, the weight matrix Wm of the network is obtained first. The weight matrix is defined as Wm = α·Cm + β·Sm, where α is 0.3 and β is 0.7, strengthening the semantic relations between words and weakening their co-occurrence relations. Wm serves as the adjacency matrix of the input word network; the corresponding network is defined as G = {V, E}, where the graph G is an undirected weighted graph, V is the vertex set of G, E is the edge set of G, and vi is the i-th vertex (word) in V.
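The combination of the two matrices can be sketched as follows. The exact combining formula is not spelled out in the source; the linear form W = α·C + β·S is an assumption implied by "strengthen the semantic relations, weaken the co-occurrence relations" with α = 0.3 and β = 0.7.

```python
def weight_matrix(Cm, Sm, alpha=0.3, beta=0.7):
    """Weighted adjacency matrix of the word network, assuming the linear
    combination W = alpha*C + beta*S (the formula itself is an assumption;
    alpha=0.3 and beta=0.7 are the values stated in the text)."""
    m = len(Sm)
    return [[alpha * Cm[i][j] + beta * Sm[i][j] for j in range(m)]
            for i in range(m)]

W = weight_matrix([[0.0, 1.0], [1.0, 0.0]],   # co-occurrence matrix Cm
                  [[1.0, 0.5], [0.5, 1.0]])   # semantic matrix Sm
```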
The aggregation feature value of a word is computed as follows:
The key characteristics of the word network are its degree distribution, average shortest path length, clustering degree, and clustering coefficient. The degree of a node reflects how the node is associated with other nodes; its clustering degree and clustering coefficient reflect how densely the nodes in its neighborhood are interconnected; the degree and clustering coefficient together reflect the node's importance within its local range. The present invention computes a node's aggregation feature value from its weighted degree, clustering coefficient, and node betweenness, so that important words receive higher weights while words related to many important words also receive high scores.
In the semantic similarity network, the unordered pair (vi, vj) denotes the edge between nodes vi and vj. The weighted degree of node vi is defined as the sum of the weights of its incident edges, WDi = Σj wij, where wij is the weight of the edge between vi and vj and n is the total number of nodes.
The unweighted degree of node vi is Di = |{(vi, vj) : (vi, vj) ∈ E, vi, vj ∈ V}|. The clustering degree Ti of node vi is the number of actual edges existing among its neighbor nodes: Ti = |{(vj, vk) : (vi, vj) ∈ E, (vi, vk) ∈ E, (vj, vk) ∈ E, vi, vj, vk ∈ V}|. The clustering coefficient Ci of node vi is then defined as Ci = 2Ti / (Di(Di − 1)).
In the semantic similarity network, the betweenness Bi of node vi is the probability that the shortest path between two nodes w and x passes through vi. The interaction between two non-adjacent nodes depends on the nodes on the shortest paths connecting them; such nodes potentially play the role of controlling the information flow between the pair. Bi reflects the connecting degree of vi in its local environment, and the node betweenness is defined as
Bi = Σ(w ≠ x ≠ vi) dvi(w, x) / d(w, x)
where d(w, x) is the number of shortest paths between any two nodes w and x in the weighted semantic similarity network, and dvi(w, x) is the number of those shortest paths that pass through vi (vi ∈ G).
The weighted degree, clustering coefficient, and betweenness of node vi are combined with weights to comprehensively measure the node's aggregation feature value. The aggregation feature value Zi of node vi is defined as
Zi = a·WDi + b·Ci + c·Bi
where a + b + c = 1.
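The combination Zi = a·WDi + b·Ci + c·Bi can be sketched as follows. The source constrains a + b + c = 1 but does not fix the individual values; 0.4/0.3/0.3, the toy network, and the clustering/betweenness inputs are illustrative assumptions.

```python
def weighted_degree(W, i):
    """Weighted degree WD_i of node i: sum of its incident edge weights."""
    return sum(W[i][j] for j in range(len(W)) if j != i)

def aggregation_value(wdeg, clustering, betweenness, a=0.4, b=0.3, c=0.3):
    """Aggregation feature value Z_i = a*WD_i + b*C_i + c*B_i, a+b+c = 1.
    The split 0.4/0.3/0.3 is an illustrative assumption."""
    assert abs(a + b + c - 1.0) < 1e-9
    return a * wdeg + b * clustering + c * betweenness

# 3-node toy word network (weights from a combined Wm matrix).
W = [[0.0, 0.6, 0.2],
     [0.6, 0.0, 0.4],
     [0.2, 0.4, 0.0]]
# Clustering coefficient 1.0 and betweenness 0.0 supplied for illustration.
z0 = aggregation_value(weighted_degree(W, 0), 1.0, 0.0)
```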
The statistical feature value of a word is computed as follows:
A nonlinear function is used to normalize the word frequency. The word-frequency weight TFi of word Wi in the text is defined in terms of the word-frequency statistics function f, where pj denotes a word in the text.
In Chinese text, the features that identify a text are usually content words such as nouns, verbs, and adjectives, whereas function words such as interjections, prepositions, and conjunctions are meaningless for determining the text category and introduce great interference if extracted as feature words. The part-of-speech weight posi of word Wi in the text is defined accordingly.
Longer words reflect more concrete information; conversely, the meaning of shorter words is more abstract. In particular, the feature words in documents are mostly specialized academic compound terms whose greater length makes their meaning clearer and better reflects the text topic. Increasing the weight of long words counteracts the splitting of such terms and reflects a word's importance in a document more accurately. The word-length weight leni of word Wi in the text is defined accordingly.
For each word in the word sequence, the statistical feature value is
statsi = A·TFi + B·posi + C·leni
where A + B + C = 1.
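The statistical feature value can be sketched as follows. The source does not give the nonlinear TF normalization or the exact part-of-speech and length formulas, so the logarithmic TF, the linear length ratio, and the weights A, B, C are all illustrative assumptions.

```python
import math

def stats_value(tf, pos_weight, length, max_length, A=0.5, B=0.3, C=0.2):
    """Statistical feature value stats_i = A*TF_i + B*pos_i + C*len_i with
    A+B+C = 1. The log-based TF normalization and the length ratio are
    stand-ins for the unspecified formulas in the source."""
    tf_w = math.log(1 + tf)          # assumed nonlinear word-frequency weight
    len_w = length / max_length      # assumed word-length weight in [0, 1]
    return A * tf_w + B * pos_weight + C * len_w

# Example chosen so that log(1 + tf) = 1 exactly.
s = stats_value(tf=math.e - 1, pos_weight=1.0, length=2, max_length=4)
```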
The criticality of word Wi is computed as follows:
For each node in the weighted word network, its criticality value Impi is defined as
Impi = β·statsi + (1 − β)·Zi
where 0 < β < 1.
The criticality values are computed and sorted in descending order; a threshold γ (0 < γ < 1) is set and the top q values are taken. The corresponding words serve as the feature words of the science and technology project: they fully reflect the topic and are the more important words.
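The ranking and selection step can be sketched as follows. The source sets a threshold γ in (0, 1) and takes the top q values without relating the two explicitly; interpreting q as the fraction γ of the word set is an assumption.

```python
def select_feature_words(words, imp, gamma=0.5):
    """Rank words by criticality Imp_i and keep the top fraction.
    Interpreting q = gamma * len(words) is an assumption; the source
    only states that a threshold gamma in (0,1) and top-q are used."""
    ranked = sorted(words, key=lambda w: imp[w], reverse=True)
    q = max(1, int(gamma * len(words)))
    return ranked[:q]

imp = {"网络": 0.9, "专家": 0.7, "方法": 0.4, "情况": 0.1}
top = select_feature_words(list(imp), imp, gamma=0.5)
```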
Step 4. Extract the feature words of the review experts. The amount of information per expert is small compared with project information, so the project technique of building a network and extracting features from statistical and semantic features is not suitable for expert information. Instead, stop words are filtered directly with the general and domain stop-word dictionaries to extract each expert's feature word set. The general stop-word dictionary again adopts the Harbin Institute of Technology stop-word list, while the domain stop-word dictionary requires continuous manual maintenance.
Step 5. Build the per-field knowledge representation models of the science and technology projects and the review experts by extending the vector space model and the matter-element knowledge-set model. From the field information of a project, the text representation model PRO = (id, F, WF, T, V) is established, where id is the identification field in the project library; F is the set of field categories in the project; WF is the set of field weights; T is the feature words; and V is the set of words and their weights per field, i.e., Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij is the j-th feature word of the i-th field and f(vij) is the frequency of the keyword vij. The knowledge representation of the project information follows this model.
Similarly, the knowledge representation model TM = (id, F, WF, T, V) is established from the field information of an expert, where id is the identification field in the expert library; F is the set of field categories of the review expert; WF is the set of field weights; T is the feature words; and V is the set of feature words and their weights per field, Vi = {vi1, f(vi1), vi2, f(vi2), ..., vin, f(vin)}, where vij is the j-th feature word of the i-th field and f(vij) is the frequency of occurrence of vij in the corresponding field. The knowledge representation of the review expert information follows this model.
Building the review expert information index library: after the expert knowledge representation models are built, the information is indexed and stored. First, the content of one review expert is read from the expert library; a phrase semantic network is built from the segmentation result and the expert's feature words are extracted; an index is built on the knowledge representation model using Apache Lucene; and the established index is added to the corresponding index library by category, until all review experts have been indexed and stored.
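The indexing idea can be sketched as a plain inverted index. Apache Lucene itself is a Java library; this Python stand-in only illustrates the word-to-expert mapping that the index library provides, not Lucene's actual API.

```python
from collections import defaultdict

def build_expert_index(experts):
    """Stand-in for the Lucene index of Step 5: an inverted index mapping
    each feature word to the set of expert ids containing it."""
    index = defaultdict(set)
    for expert_id, feature_words in experts.items():
        for w in feature_words:
            index[w].add(expert_id)
    return index

idx = build_expert_index({"E1": {"网络", "推荐"}, "E2": {"网络", "化学"}})
```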
Step 6. According to the number of projects, recommendation is divided into recommending experts for a single pending project and for a group of (multiple) pending projects. For group recommendation, feature merge operations are performed both across the fields of each pending project's knowledge representation model from Step 5 and between the projects; for single-project recommendation, only the cross-field feature merge is performed. Meanwhile, the knowledge representation models of the review experts undergo the cross-field feature merge. Indexes are then built on the merged feature information with Apache Lucene according to the knowledge representation model; the project index is built at the time of recommendation.
In a project application management system, pending projects often need to be recommended in groups. The feature merge operations above must preserve the differing contributions to the similarity computation made by the different field weights set in Step 5.
The feature merging of pending projects and review experts through the logical merge operation ⊕ proceeds as follows:
(1) Cross-field feature merge within a pending project or a review expert
Suppose field feature word sets W'1 and W'2 are merged; the merge rule W'1 ⊕ W'2 is defined over their feature words word1i and word2j. Adding field weights improves and extends this definition, giving the cross-field merge rule for review experts and science and technology projects.
(2) Inter-project feature merge within a group of pending projects
This merge operates only on the feature vectors of the pending science and technology projects, not on the expert feature vectors; expert feature vectors need only the cross-field merge. Let V(d1) and V(d2) be the vector models of two projects after their respective cross-field merges. For any t1j ∈ V(d1) and t2j ∈ V(d2), if t1j and t2j are identical, they are merged, where tk (k = 1, ..., n) are the feature terms and wk(p) is the weight of tk.
The basic process producing the knowledge representation model of a science and technology project group is as follows:
a) Merge the cross-field features of each project to obtain its vector model V(d);
b) Apply the merge strategy to the set of all project vector models. Through this method, a vector-space-based knowledge representation model is established for the project group:
V(p) = {<t1, w1(p)>, <t2, w2(p)>, ..., <tn, wn(p)>}
where k = 1, ..., n, tk is a feature term of the project group, and wk(p) is the weight of tk.
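The inter-project merge of weighted term vectors can be sketched as follows. The source states that identical terms are merged but leaves the combined-weight formula unspecified; summing the weights here is an assumption.

```python
def merge_vectors(v1, v2):
    """Inter-project merge sketch: identical feature terms are merged and,
    as an assumption, their weights are summed (the source does not give
    the combined-weight formula)."""
    merged = dict(v1)
    for term, w in v2.items():
        merged[term] = merged.get(term, 0.0) + w
    return merged

# Two project vectors after their cross-field merges (illustrative values).
group = merge_vectors({"网络": 0.5, "推荐": 0.3}, {"网络": 0.2, "化学": 0.4})
```

Folding the whole project set through this function left to right yields the group model V(p).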
Step 7. After the cross-field merges of the review expert models and the project knowledge representation models in Step 6, suppose the review expert information vector is P = {s1, f(s1), s2, f(s2), ..., sn, f(sn)} and the project (group) information vector is Q = {t1, f(t1), t2, f(t2), ..., tn, f(tn)}. The semantic similarity between the pending project (group) vector and each review expert is computed based on a maximum matching algorithm.
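The maximum-matching similarity of Step 7 (cf. the bipartite-graph matching of Fig. 2) can be sketched as follows. For clarity this uses brute force over permutations rather than an efficient matching algorithm, and the averaging normalization is an assumption; it is adequate only for the small term sets of this sketch.

```python
from itertools import permutations

def matching_similarity(p_terms, q_terms, sim):
    """Semantic similarity of two term lists via maximum bipartite
    matching: each term of the shorter list is paired one-to-one with a
    term of the longer list so that the total pairwise similarity is
    maximized, then averaged (normalization is an assumption)."""
    short, long_ = sorted((p_terms, q_terms), key=len)
    best = 0.0
    for perm in permutations(long_, len(short)):
        total = sum(sim(a, b) for a, b in zip(short, perm))
        best = max(best, total)
    return best / max(len(short), 1)

# Exact-match similarity standing in for the semantic SimSEM of Step 3.
sim = lambda a, b: 1.0 if a == b else 0.0
score = matching_similarity(["网络", "推荐"], ["推荐", "化学", "网络"], sim)
```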
Step 8. A similarity cutoff is set; recommendation indices are produced according to the similarity values, and the final recommended review expert list is generated.
The beneficial effects of the present invention are as follows:
Review experts for science and technology projects can be recommended quickly, conveniently, intelligently, and accurately. The workload of scientific staff in distributing projects to review experts within a project application management system is significantly reduced, lowering management costs. A high domain matching degree between review experts and pending projects is ensured, so that the experts' evaluations achieve objectivity, fairness, and scientific rigor. Automatic, efficient, and impartial decision support is provided, avoiding improper review problems in project approval such as favoritism networks and the "Matthew effect".
Brief description of the drawings
Fig. 1 shows the sliding window used for the word co-occurrence degree computation in the present invention.
Fig. 2 is a schematic diagram of the bipartite-graph-based maximum matching algorithm in the present invention.
Fig. 3 is a flow chart of the intelligent recommendation method for science and technology project review experts in the present invention.
Fig. 4 is a flow chart of the feature-word extraction for project and review expert information in the present invention.
Fig. 5 is a flow chart of building the review expert knowledge index library in the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the invention.
As shown in Fig. 3, the main idea of the recommendation method of the present invention is: (1) the expert information and pending project information in the project application management system are cut into substring sequences, segmented with the Chinese Academy of Sciences ICTCLAS segmenter, and filtered for stop words to obtain word sets; (2) project information, which includes the main research content, technical specifications, and other fields, carries a relatively large amount of information, so the invention builds a word network from the semantic relations and co-occurrence relations of words, computes the node aggregation feature values of the network, weights them with the statistical feature values to obtain word criticality, and extracts the feature words of each project; (3) expert information is simpler and smaller than project information, so the filtered word set of each expert is used directly as its feature words; (4) field weights are set according to the differing importance of project and expert fields, knowledge representation models are built for projects and experts from the feature words of (2) and (3), and the expert index library is built; (5) for group recommendation, the pending-project knowledge representation models undergo feature merges both across fields and between projects, whereas single-project recommendation needs only the cross-field merge; the expert knowledge representation models undergo the cross-field merge as well; (6) considering the semantic fuzzy-matching property of words, the similarity between expert information and pending project information is computed, and the final recommended expert list is produced by a similarity threshold cutoff.
Step 1. disables dictionary using the general term in science and technology item and expert info and usual word as specialty;Punctuate is accorded with
Number, non-Chinese character is as cutting signature library.
Step 2. carries out participle to science and technology item information, expert info: according to cutting labelling in science and technology item information, by item
The information such as mesh title, main research, technical specification are cut into substring sequence;According to cutting labelling in evaluation expert's information,
Project that extraction expert info, prize-winning situation, invention situation, the situation that publishes thesis, problem undertook and performance, research side
To etc. information be cut into substring sequence, that is one field information of a sub-string sequence;Utilize Chinese Academy of Sciences ICTCLAS antithetical phrase string sequence
Carry out participle.
Step 3. science and technology item feature word extracts: utilizes and general disables dictionary and specialty disables dictionary and stops participle
Word is filtered, and the general dictionary that disables uses Harbin Institute of Technology to disable vocabulary, using the word segmentation result removing stop words as a word collection
Close, see Fig. 4.
It is the most perfect process of a self study that specialty disables the structure of dictionary, constantly adds up during information participle
The word frequency of word, the probability that word occurs at text is more than certain threshold value, it is brought into and disable dictionary.
Science and technology item quantity of information is relatively big, and set of words is carried out Semantic Similarity Measurement between word, according to the semantic pass of word
The cooccurrence relation of system and word builds term network, calculates the word aggregation characteristic value in network;Statistics then in conjunction with word is special
Value indicative, the criticality calculating word extracts science and technology item feature word;The feature word of science and technology item extracts comprehensively exactly
The statistical nature information of text and semantic feature information, more accurately extract feature word.
Described Semantic Similarity Measurement process is as follows:
In knowing net semantic dictionary, if for two word W1And W2, W1There is a n concept: S11, S12 ..., S1n, W2
There is a m concept: S21, S22 ..., S2m.Word W1And W2Similarity SimSEM (W1, W2) equal to the similarity of each concept
Maximum:
Notional word and function word have different description language, need to calculate the syntax justice of its correspondence is former or relation adopted former between
Similarity.Notional word concept includes that the first basic meaning is former, other basic meanings are former, the adopted former description of relation, relational symbol describe, similarity
It is designated as Sim1 (p respectively1,p2)、Sim2(p1,p2)、Sim3(p1,p2)、Sim4(p1,p2).The Similarity Measure of two feature structures
Finally revert to the Similarity Measure of the former or concrete word of basic meaning.
βi(1≤i≤4) are adjustable parameters, and have: β1+β2+β3+β4=1, β1≥β2≥β3≥β4。
If CW={C1, C2 ..., Cm} be process after the set of words that obtains, the semantic similarity adjacency matrix of its correspondence
SmIt is defined as:
Wherein, Sim (C1,C2) it is word C1With word C2Semantic similarity, Sim (Ci,Ci) it is 1, Sim (Ci,Cj)=Sim
(Cj,Ci)。
Set of words CW={C1, C2 ..., Cm} is calculated m × (1+m)/2 word through semantic similarity
Between the value of similarity.
It is as follows that the cooccurrence relation of described word calculates process:
Word co-occurrence model is one of important models of natural language processing research field based on statistical method.According to word altogether
Existing model, if two frequent co-occurrences of word are at the same window unit (such as a word, a paragragh etc.) of document, the two word exists
Being to be mutually related in meaning, they express the semantic information of the text to a certain extent.Utilize sliding window (sliding window
A length of 3) word in sequence of terms is carried out word co-occurrence degree calculate, sliding window as shown in Figure 1:
First, sequence of terms is carried out word extraction, i.e. remove space, null and merge identical word, obtain word
Set CW={C1, C2 ..., Cm}, wherein m≤n.
Word co-occurrence degree Matrix C m corresponding to set of words CW is defined as:
When Cm is initial, Coo (Ci, Cj) is 01 (1≤i, j≤m).
By sliding window, sequence of terms being carried out word co-occurrence degree to calculate, the word in sliding window is Ti-1TiTi+1(1<i
< n):
1) if i=n-1,4 are turned);If Ti-1Being space or null, sliding window slides to next word, i++;Otherwise, 2 are turned).
2) if TiFor Chinese, then Coo (Ti-1,Ti) ++, turn 3);If TiFor null, turn 3);Otherwise turn 1).
3) if TiChinese, then Coo (Ti-1,Ti+1) ++, i++, turn 1);Otherwise, 1 is turned).
4) if Tn-2It is Chinese, turns 5);Otherwise, 7 are turned)
5) if Tn-1It is Chinese, Coo (Tn-2,Tn-1) ++, turn 6);If Tn-1It is space, turns 6);Otherwise terminate.
6) if TnIt is Chinese, Coo (Tn-2,Tn) ++, terminate;Otherwise terminate.
7) if Tn-1It is Chinese, and TnAlso be Chinese, then Coo (Tn-1,Tn) ++, terminate;Otherwise terminate.
Through the calculating of previous step, obtain word co-occurrence degree Matrix C m, and each element of Cm is normalized
Processing, namely each element is divided by the maximum of all elements in matrix, i.e. max{Coo (Ci,Cj)|1≤i,j≤m}。
Described term network is as follows:
When building cum rights term network, first having to obtain the weight matrix of term network, definition weight matrix Wm is:
Wherein, α is 0.3, and β is 0.7, the semantic relation between strengthening word, weakens the cooccurrence relation between word.
WmAs the adjacency matrix that the term network of input is corresponding, then the network of its correspondence is defined as: G={V, E};Its
Middle figure G is undirected weighted graph, and V represents the vertex set in figure G, and E represents the limit collection in G, viRepresent i-th summit (word) in V.
The calculation of the word aggregation characteristic value is as follows:
Important characteristics of a term network include the degree distribution, the average shortest path length, the clustering degree and the clustering coefficient. The degree of a node reflects how the node is associated with the other nodes. The clustering degree and clustering coefficient of a node reflect the interconnection density among the nodes in its local neighborhood, and together with the degree they reflect the node's importance within that neighborhood. The present invention computes the aggregation characteristic value of a node from its weighted degree, clustering coefficient and node betweenness, so that important words receive higher weights while words related to many important words also receive high scores.
In the semantic similarity network, the unordered pair (v_i, v_j) denotes the edge between nodes v_i and v_j. The weighted degree of node v_i is then defined as the sum of the weights of its incident edges:
D_i^w = Σ_{j=1}^{n} w_ij
where w_ij is the weight of the edge between nodes v_i and v_j and n is the total number of nodes.
In the semantic similarity network, the unordered pair (v_i, v_j) denotes the edge between nodes v_i and v_j. The unweighted degree D_i of node v_i is D_i = |{(v_i, v_j) : (v_i, v_j) ∈ E, v_i, v_j ∈ V}|. The clustering degree of node v_i is the number of edges that actually exist between its neighbor nodes: T_i = |{(v_j, v_k) : (v_i, v_j) ∈ E, (v_i, v_k) ∈ E, (v_j, v_k) ∈ E, v_j, v_k ∈ V}|. The clustering coefficient C_i of node v_i is then defined as:
C_i = 2T_i / (D_i(D_i − 1))
In the semantic similarity network, the node betweenness is the probability that a shortest path between two nodes w and x passes through node v_i. The interaction between two non-adjacent nodes depends on the nodes lying on the shortest paths connecting them; these nodes potentially play the role of controlling the information flow between the nodes. B_i reflects the connectivity of node v_i in its local environment. The node betweenness is then defined as:
B_i = Σ_{w ≠ x} d_{v_i}(w, x) / d(w, x)
where d(w, x) denotes the number of shortest paths between any two nodes w and x in the weighted semantic similarity network, and d_{v_i}(w, x) denotes the number of shortest paths between w and x that pass through v_i (v_i ∈ G).
The weighted degree, clustering coefficient and betweenness of node v_i are combined to weigh the aggregation characteristic value of the node. The aggregation characteristic value Z_i of node v_i is defined as:
Z_i = a·D_i^w + b·C_i + c·B_i
where a + b + c = 1.
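Under the definitions above, the aggregation value Z_i can be sketched in pure Python for a small graph. Two simplifications are assumed: shortest paths are counted by hop count rather than edge weight, and the coefficients a, b, c are illustrative values, not the patent's.

```python
from collections import defaultdict, deque
from itertools import combinations

def aggregation_values(edges, a=0.4, b=0.3, c=0.3):
    """edges: {(u, v): weight} of the undirected weighted term network.
    Returns Z_i = a*Dw_i + b*C_i + c*B_i for every node; shortest paths
    are taken over hop count (not edge weight) for simplicity."""
    adj = defaultdict(set)
    dw = defaultdict(float)                      # weighted degree D_i^w
    for (u, v), w in edges.items():
        adj[u].add(v); adj[v].add(u)
        dw[u] += w; dw[v] += w
    nodes = sorted(adj)

    def clustering(i):                           # C_i = 2*T_i / (D_i*(D_i-1))
        nb, d = adj[i], len(adj[i])
        if d < 2:
            return 0.0
        t = sum(1 for x, y in combinations(nb, 2) if y in adj[x])
        return 2.0 * t / (d * (d - 1))

    def bfs(s):                                  # hop distances + path counts
        dist, sigma, q = {s: 0}, defaultdict(float), deque([s])
        sigma[s] = 1.0
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        return dist, sigma

    info = {s: bfs(s) for s in nodes}
    B = defaultdict(float)                       # betweenness B_i
    for s, t in combinations(nodes, 2):
        dist_s, sig_s = info[s]
        dist_t, sig_t = info[t]
        if t not in dist_s:
            continue                             # disconnected pair
        for v in nodes:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:   # v lies on a shortest path
                B[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return {v: a * dw[v] + b * clustering(v) + c * B[v] for v in nodes}
```

On the path a - b - c, the middle node b gets a higher Z value than the endpoints, since it has both the larger weighted degree and all the betweenness.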
The calculation of the statistical characteristic value of a word is as follows:
A nonlinear function is used to normalize the word frequency. The term frequency weight TF_i of word W_i in the text is defined on this basis, where TF_i denotes the term frequency weight of word W_i, p_j denotes a word in the text, and f is the word frequency counting function.
The features that identify a Chinese text are usually content words, such as nouns, verbs and adjectives, while function words such as interjections, prepositions and conjunctions are meaningless for determining the text category and, if extracted as feature words, introduce great interference. The part-of-speech weight pos_i of word W_i in the text is defined on this basis.
The longer a word is, the more concrete the information it can reflect; conversely, the meaning of a shorter word is more abstract. In particular, the feature words in documents are mostly specialized academic compound terms; the longer they are, the more definite their meaning and the better they reflect the text topic. Increasing the weight of long words helps to avoid splitting such vocabulary and thus reflects the importance of a word in a document more accurately.
The word length weight len_i of word W_i in the text is defined on this basis. For each word in the word sequence, its statistical characteristic value is
stats_i = A·TF_i + B·pos_i + C·len_i
where A + B + C = 1.
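A minimal sketch of stats_i follows. The exact nonlinear TF function, part-of-speech table and coefficients A, B, C are not published in the text, so a logarithmic TF normalization, an illustrative POS weight table and a capped length weight are assumed here.

```python
import math

# Illustrative values; the exact POS table and A, B, C are not given in the text.
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.6}   # noun / verb / adjective
A, B, C = 0.5, 0.3, 0.2                       # must satisfy A + B + C = 1

def stats(word, pos, freq, max_freq, max_len=6):
    """stats_i = A*TF_i + B*pos_i + C*len_i with an assumed log TF
    normalization and a length weight capped at max_len characters."""
    tf = math.log(1 + freq) / math.log(1 + max_freq)  # nonlinear TF in [0, 1]
    p = POS_WEIGHT.get(pos, 0.0)                      # function words get 0
    ln = min(len(word), max_len) / max_len            # longer words weigh more
    return A * tf + B * p + C * ln
```

With these choices, a long content word outranks a short function word of equal frequency, as the text intends.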
The calculation of the keyness of word W_i is as follows:
For each node in the weighted term network, its keyness value Imp_i is defined as:
Imp_i = β·stats_i + (1 − β)·Z_i
where 0 < β < 1.
The keyness values obtained by this calculation are sorted in descending order; a threshold γ (0 < γ < 1) is set and the top q values are taken. These words then serve as the feature words of the science and technology project: they fully reflect the topic and are the more important words.
Step 4. Extraction of the evaluation experts' feature words: the amount of information about an evaluation expert is small compared with that of a science and technology project, so the feature word extraction technique used for projects, which builds a network and relies on statistical and semantic features, is not suitable for expert information. Instead, stop words are filtered directly according to a general stop-word dictionary and a specialized stop-word dictionary, and the feature word set of each expert is extracted. The general stop-word dictionary is the Harbin Institute of Technology stop-word list, while the specialized stop-word dictionary needs to be maintained continuously by personnel.
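For the experts, step 4 reduces to stop-word filtering; a sketch with stand-in dictionaries follows (the real system uses the HIT stop-word list as the general dictionary and a manually maintained specialty dictionary).

```python
# Stand-in dictionaries, for illustration only.
GENERAL_STOPWORDS = {"的", "了", "和", "是"}
DOMAIN_STOPWORDS = {"教授", "研究员"}

def expert_feature_words(tokens):
    """Filter a segmented expert profile against both stop-word
    dictionaries; whatever survives is the expert's feature word set."""
    banned = GENERAL_STOPWORDS | DOMAIN_STOPWORDS
    return {t for t in tokens if t not in banned}
```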
Step 5. Building the field-partitioned knowledge representation models of science and technology projects and evaluation experts: the vector space model and the matter-element knowledge set model are extended, and the text representation model PRO = (id, F, WF, T, V) is established according to the different field information in a science and technology project, where id denotes the identification field in the project library; F denotes the set of field categories in the project; WF is the field weight; T is the feature word; and V denotes the words corresponding to a field and their weights, i.e. V_i = {v_i1, f(v_i1), v_i2, f(v_i2), ..., v_in, f(v_in)}, where v_ij denotes the j-th feature word of the i-th field and f(v_ij) denotes the frequency of the keyword v_ij. The knowledge representation of science and technology project information is as follows:
Similarly, the knowledge representation model TM = (id, F, WF, T, V) is established according to the different field information of an expert, where id denotes the identification field in the expert database; F denotes the set of field categories of the evaluation expert; WF is the set of field weights; T is the feature word; and V denotes the feature words corresponding to a field and their weights, i.e. V_i = {v_i1, f(v_i1), v_i2, f(v_i2), ..., v_in, f(v_in)}, where v_ij denotes the j-th feature word of the i-th field and f(v_ij) denotes the frequency of occurrence of the feature word v_ij in the corresponding field. The knowledge representation of evaluation expert information is:
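One plausible way to hold the (id, F, WF, T, V) tuple in code is sketched below; the type names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FieldKnowledge:
    """One field of PRO/TM: its category name (an element of F), its
    weight WF, and V_i as a map feature word v_ij -> frequency f(v_ij)."""
    name: str
    weight: float
    terms: Dict[str, float]

@dataclass
class KnowledgeModel:
    """PRO = (id, F, WF, T, V) for a project, or TM for an expert."""
    id: str
    fields: List[FieldKnowledge] = field(default_factory=list)

    def feature_words(self):
        """All feature words T across the model's fields."""
        return {t for fld in self.fields for t in fld.terms}
```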
Building the evaluation expert information index library: after the expert knowledge representation model has been built, the information is indexed and stored. First, the content field information of one evaluation expert is read from the expert database; the word semantic network is built on the basis of the word segmentation result and the feature words contained in the expert information are extracted; an index is built for them with Apache Lucene according to the knowledge representation model; the established index is added by category to the corresponding index library, until all evaluation experts have been indexed and stored, see Fig. 5.
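In place of Apache Lucene (which the actual system uses), a minimal per-category inverted index conveys the warehousing idea of the step above.

```python
from collections import defaultdict

class ExpertIndex:
    """Minimal stand-in for the Lucene index libraries: one inverted
    index per field category, mapping feature word -> expert ids."""
    def __init__(self):
        self.by_category = defaultdict(lambda: defaultdict(set))

    def add(self, category, expert_id, feature_words):
        """Warehouse one expert's feature words under a category."""
        for w in feature_words:
            self.by_category[category][w].add(expert_id)

    def lookup(self, category, word):
        """Return the ids of the experts indexed under word in category."""
        return set(self.by_category[category].get(word, set()))
```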
Step 6: According to the number of projects, the recommendation mode is divided into recommending experts for a single pending project and recommending experts for a group of (multiple) pending projects. Group recommendation performs feature merging both between the corresponding fields and between the projects of the pending project knowledge representation models of step 5, while single-project recommendation performs only the inter-field feature merging. Meanwhile, inter-field feature merging is performed on the knowledge representation models of the evaluation experts of step 5. An index is then built on the merged feature information with Apache Lucene according to the knowledge representation model; the science and technology project index is constructed at the time the project recommendation is carried out.
In a science and technology project application management system, pending projects often need to be recommended in groups. The feature merging described above ensures that the different field weights set by the knowledge representation model in step 5 make different contributions to the similarity computation and hence to the recommendation.
The feature merging of pending projects and evaluation experts through the logical OR (set union) operation proceeds as follows:
(1) Inter-field feature merging for a pending project or an evaluation expert
Assume the field feature word sets W'_1 and W'_2 are to be merged; the merge rule for W'_1 and W'_2 is their union, where word1_i and word2_j are feature words.
Adding the field weights improves and extends the above definition; in the merge rule for the inter-field feature merging of evaluation experts and science and technology projects, each feature word is weighted by the weight of the field it comes from.
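A sketch of the weighted inter-field union follows, under the assumption that a word appearing in both fields accumulates its field-weighted frequencies; the patent's exact rule is given as an image and is not reproduced here.

```python
def merge_fields(v1, wf1, v2, wf2):
    """Union of two field feature-word maps (word -> frequency f(v_ij)).
    Each frequency is scaled by its field weight WF, and a word present
    in both fields accumulates both contributions.  The weighting scheme
    is an assumption based on the text, not the patent's exact rule."""
    merged = {w: wf1 * f for w, f in v1.items()}
    for w, f in v2.items():
        merged[w] = merged.get(w, 0.0) + wf2 * f
    return merged
```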
(2) Inter-project feature merging for a group of pending projects
This merging operation applies only to the feature vectors of the pending science and technology projects, not to the expert feature vectors; an expert feature vector only needs the inter-field feature merging. Let V(d_1) and V(d_2) be the vector models of two science and technology projects after inter-field feature merging; for any t_{1j} ∈ V(d_1) and t_{2j} ∈ V(d_2), if t_{1j} and t_{2j} are identical, they are merged, where k = 1, ..., n, t_k is a feature term and w_k(p) is the weight of t_k.
The basic process of generating the knowledge model representation of a science and technology project group is as follows:
a) Merge the inter-field features of each science and technology project to obtain the vector model V(d) of the project;
b) Apply the merging strategy to the set of all science and technology project vector models.
By the above method, a vector-space-based knowledge representation model is established for the science and technology projects:
V(p) = {<t_1, w_1(p)>, <t_2, w_2(p)>, ..., <t_n, w_n(p)>}
where k = 1, ..., n, t_k is a feature term of the project group and w_k(p) is the weight of t_k.
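Steps a)-b) above can be sketched as follows, assuming that identical terms are merged by summing their weights (the patent's merge formula is an image and is not reproduced here).

```python
def merge_project_vectors(vectors):
    """Merge the inter-field-merged vectors V(d1), V(d2), ... of a
    project group into V(p): identical terms t_k are merged, here by
    summing their weights w_k (an assumption; the formula image is
    omitted from the text)."""
    group = {}
    for vec in vectors:
        for t, w in vec.items():
            group[t] = group.get(t, 0.0) + w
    return group
```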
Step 7. After the inter-field feature merging of the evaluation experts and of the science and technology project knowledge representation models in step 6, suppose the evaluation expert information vector is expressed as P = {s_1, f(s_1), s_2, f(s_2), ..., s_n, f(s_n)} and the science and technology project (group) information vector is expressed as Q = {t_1, f(t_1), t_2, f(t_2), ..., t_n, f(t_n)}; the semantic similarity between the pending project (group) vector and the evaluation expert is computed based on a maximum matching algorithm.
The semantic similarity computation between the pending science and technology project (group) vector and the evaluation expert vector based on the bipartite-graph maximum matching algorithm is as follows:
Computing semantic similarity based on the maximum matching algorithm means computing the similarity of two texts with a bipartite-graph-based maximum matching algorithm. As shown in Fig. 2, the bipartite-graph-based maximum matching algorithm computes the similarity of the feature items; its principle is to take each feature word of the science and technology project (group) vector as a vertex of part X and each feature word of the evaluation expert vector as a vertex of part Y, which is equivalent to finding the maximum weight matching of a complete bipartite graph. The thick lines in Fig. 2 are the maximum semantic similarities between the X-part feature words and certain Y-part feature words.
The semantic similarity itself is obtained by the HowNet-based similarity computation. The present invention computes the semantic similarity between the pending project (group) and the evaluation expert through the HowNet semantic dictionary and the maximum matching algorithm; in the computation formula, s_i and t_j are the two word nodes corresponding to an edge of maximum semantic similarity SimSEM(s_i, t_j) (a thick line in Fig. 2), m and n are respectively the number of feature words in the science and technology project vector representation and in the evaluation expert vector representation, and p is the number of edges (thick lines in Fig. 2) of maximum semantic similarity.
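A brute-force sketch of the bipartite maximum-weight matching similarity follows. Two things are assumptions: the word similarity function is a toy stand-in for HowNet's SimSEM, and the final scaling of the matched-edge sum by 2p/(m+n) is one plausible reading of the omitted formula, not its confirmed form.

```python
from itertools import permutations

def word_sim(a, b):
    """Toy stand-in for the HowNet-based SimSEM(s_i, t_j):
    exact match scores 1.0, otherwise shared-character overlap."""
    if a == b:
        return 1.0
    common = set(a) & set(b)
    return len(common) / max(len(set(a)), len(set(b)))

def match_similarity(proj_words, expert_words):
    """Maximum-weight matching of the complete bipartite graph whose X
    part is the project (group) feature words and whose Y part is the
    expert feature words, found by brute force (adequate for small
    vectors), then scaled by 2p/(m+n) -- an assumed normalization."""
    xs, ys = list(proj_words), list(expert_words)
    if len(xs) > len(ys):                 # ensure |X| <= |Y| for permutations
        xs, ys = ys, xs
    best = 0.0
    for perm in permutations(ys, len(xs)):
        best = max(best, sum(word_sim(x, y) for x, y in zip(xs, perm)))
    m, n = len(proj_words), len(expert_words)
    return 2.0 * best / (m + n) if (m + n) else 0.0
```

For vectors of realistic size, the brute-force search would be replaced by a polynomial-time assignment algorithm such as Kuhn-Munkres.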
The above semantic similarity between the pending project (group) and the evaluation expert information involves many factors such as language, word semantics and word structure, and represents the degree of matching between the two: the larger the similarity, the higher the matching degree, and the more suitable the evaluation expert is for evaluating the project (group).
Step 8. A similarity cutoff is set, a recommendation index is produced according to the magnitude of the similarity, and the final list of recommended evaluation experts is produced.
The above is only the preferred embodiment of the present invention. It should be noted that, for intelligent machine recommendation technology in the field of science and technology project evaluation experts, several improvements and variations can also be made without departing from the technical principles of the present invention, and these improvements and variations should likewise be regarded as falling within the protection scope of the present invention.