CN103455596B - Method for evaluating science and technology project approval applications based on big data - Google Patents

Publication number
CN103455596B
Authority
CN
China
Prior art keywords
result
project
science
text
vector space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310393575.XA
Other languages
Chinese (zh)
Other versions
CN103455596A (en
Inventor
罗亮
卢智星
方少亮
徐迪威
蔡建新
林珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Science & Technology Infrastructure Center
Original Assignee
Guangdong Science & Technology Infrastructure Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a big-data-based method for evaluating science and technology project approval applications, comprising the following steps: obtaining the project approval application data; processing the latest development directions and hotspots of the technology related to the project and comparing them with the application data to obtain a first result; processing information on the national policy support direction and the local government's industrial development direction and comparing it with the application data to obtain a second result; processing the feedback from the competent department's on-site industry surveys together with comprehensive-strength information and outputting a third result; processing the competent department's approval record for similar science and technology projects over the years, together with the construction status and outcomes of previously approved projects, and comparing the result with the application data to obtain a fourth result; and obtaining an evaluation of the application and a recommendation report from the first, second, third and fourth results and their weights. The method improves the quality and level of science and technology project evaluation and makes the evaluation results more objective and more comprehensive.

Description

Method for evaluating science and technology project approval applications based on big data
Technical field
The present invention relates to the field of project evaluation, and in particular to a big-data-based method for evaluating science and technology project approval applications.
Background
Science and technology project information has typical big-data characteristics: the data are largely unstructured and the volume is huge. The data often come from many regions and have accumulated over many years, so processing them with ordinary machines and algorithms takes a long time. Existing science and technology project evaluation relies on expert scoring. The evaluation mainly covers: the necessity and feasibility of approving the project; the development status, technical status and research level of the project, used as indicators; the project's research and development content, technical process route and implementation plan; the capability of the project undertaker, including its scientific research capacity, the professional level of its technical personnel, its research instruments and equipment, and its ability to run science and technology projects; and the funding conditions of the project, including the funding sources, self-raised funds, government matching funds and the amounts involved. Because science and technology project information has the complexity of big data, because experts differ in understanding, have gaps in their fields of knowledge and bring a degree of subjectivity, and because the application procedure is complex and the application conditions are ambiguous, conclusions drawn from expert scoring alone often lack a sufficient scientific basis, and a sound consensus on a project is hard to reach. Expert evaluation can therefore be arbitrary, blind and one-sided, which lowers the quality and level of science and technology project evaluation and leaves the expert evaluation mechanism imperfect. The comprehensiveness and credibility of expert project scoring need to be further strengthened.
Summary of the invention
The technical problem to be solved by the present invention is the poor quality and level of science and technology project evaluation in the prior art described above. To address it, the invention provides a big-data-based method for evaluating science and technology project approval applications that achieves a higher quality and level of evaluation.
The technical solution adopted by the present invention is a big-data-based method for evaluating science and technology project approval applications, comprising the following steps:
A) obtaining the science and technology project approval application data;
B) obtaining information on the latest development directions and hotspots of the technology related to the project, processing it to obtain current research hotspot data, and comparing the current research hotspot data with the corresponding part of the application data to obtain a first result;
C) obtaining information on the national policy support direction and the local government's industrial development direction, processing it to obtain support direction data, and comparing the support direction data with the corresponding part of the application data to obtain a second result;
D) obtaining the feedback from the competent department's on-site industry surveys and the comprehensive-strength information of the applicant unit, processing them, and outputting a third result expressed as data;
E) obtaining the competent department's approval record for similar science and technology projects over the years, together with the construction status and outcomes of previously approved projects, processing them to obtain a set of vector spaces, and comparing the set of vector spaces with the corresponding part of the application data to obtain a fourth result;
F) determining the weights of the first, second, third and fourth results, and obtaining the evaluation of the application and a project approval recommendation report from the four results and their respective weights.
In the method for science and technology items based on big data of the present invention project verification assessment, described step A) Farther include:
A1) the first text about science and technology item project verification application is obtained;
A2) described first text is processed the primary vector space obtained for representing project verification application project; Described science and technology item project verification request for data is described primary vector space.
In the method for science and technology items based on big data of the present invention project verification assessment, described step B) enter One step includes:
B1) obtain the latest development direction of science and technology item correlation technique and hot information and form a series of text Record;
B2) each text entry is carried out successively Chinese word segmentation, filters extraction text feature after stop words;
B3) described text feature is clustered, and extract the primary vector representing current research focus;
B4) described primary vector is compared with described primary vector space, obtain for representing science and technology item Mesh project verification application and the first result of current techniques focus degree of association.
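As an illustration only, the preprocessing in step B2) can be sketched in Python as follows. The patent does not disclose its segmentation algorithm at this point, so the sketch substitutes forward maximum matching, a simple dictionary-based technique; the dictionary, the stop-word list and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical dictionary and stop-word list, for illustration only.
DICTIONARY = {"大数据", "数据", "科技", "项目", "科技项目", "评估", "立项"}
STOP_WORDS = {"的", "和", "与"}

def segment(text, dictionary=DICTIONARY, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def text_features(text):
    """Segment, drop stop words, and count the remaining terms."""
    return Counter(t for t in segment(text) if t not in STOP_WORDS)

features = text_features("大数据的科技项目立项评估")
```

The resulting term counts are one plausible form of the "text features" that step B3) then clusters; a production system would more likely use a full segmentation library and a curated stop-word list.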
In the method for science and technology items based on big data of the present invention project verification assessment, described step C) enter One step includes:
C1) obtain about national policy support direction and the second text of local government's industrial development direction;
C2) described second text is carried out successively Chinese word segmentation, filter stop words after obtain the 3rd text;
C3) by described 3rd text calculating word frequency is obtained current support direction key word;
C4) weight of each key word is distributed equally and builds secondary vector space;
C5) described secondary vector space is compared with described primary vector space, obtain representing described section Skill is approved and initiate a project and is applied for and the second result of the support direction goodness of fit.
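Steps C3) to C5) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the text is already segmented, the number of keywords kept is a free parameter, and the "comparison" of the two vector spaces is taken to be cosine similarity, which the patent does not specify. All names and sample data are hypothetical.

```python
import math
from collections import Counter

def top_keywords(tokens, k):
    """C3: return the k most frequent tokens as the support-direction keywords."""
    return [word for word, _ in Counter(tokens).most_common(k)]

def equal_weight_vector(keywords):
    """C4: give every keyword the same weight so that the weights sum to 1."""
    return {word: 1.0 / len(keywords) for word in keywords}

def cosine(u, v):
    """C5 (assumed comparison): cosine similarity of two sparse dict vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical segmented policy text and application vector.
policy_tokens = ["新能源", "储能", "新能源", "芯片", "储能", "新能源", "农业"]
support_vector = equal_weight_vector(top_keywords(policy_tokens, 2))
application_vector = {"新能源": 0.5, "电池": 0.5}
m2 = cosine(support_vector, application_vector)
```

With non-negative weights the similarity lies between 0 and 1, so it can serve directly as a degree-of-fit value for the second result.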
In the method for science and technology items based on big data of the present invention project verification assessment, described step D) Farther include:
D1) obtain about the comprehensive strength information of project applying unit, request slip in scientific and technological resources investigating system Position project construction situation over the years and authorities' industry on the spot investigate a series of texts note of feedback advisory information Record, and form the 4th text;Described comprehensive strength information includes manpower, financial resources, material resources and base information;
D2) from described 4th text, performance information is extracted;
D3) according to described performance information, described applying unit is carried out prestige scoring;
D4) scoring of described prestige is normalized obtains the 3rd result of data type.
In the method for science and technology items based on big data of the present invention project verification assessment, described step E) enter One step includes:
E1) obtain about authorities' similar science and technology item over the years project verification situation, established project construction over the years Document data the source C={C1, C2 of situation and effect ... Ci ... };
E2) from document data source C={C1, C2 ... Ci ... read a text Ci in };
E3) the word frequency word order prototype vector Vi of described text Ci is initialized;
E4) described text Ci is carried out Chinese word segmentation, and the participle filter that will obtain after described text Ci participle Except stop words, obtain first participle vector space Ti=(Ti1, Ti2 ..., Tin);
E5) calculate described first participle vector space Ti=(Ti1, Ti2 ..., Tin) in the word of vector element Tij Frequently, obtain word frequency weight Fij in corresponding described text Ci, obtain the first word frequency weighing vector space Fi=(Fi1, Fi2,……,Fin);
E6) described word frequency weighing vector space Fi is carried out dimensionality reduction, obtain the second word frequency weighing vector space Fi ' =(Fi1, Fi2 ..., Fik) and the second participle vector space Ti '=(Ti1, Ti2 ..., Tik);
E7) calculate the second participle vector space Ti ' in the word order of vector element, obtain word order weight Sij (j=1,2 ..., k), and obtain word order weighing vector space S i=(Si1, Si2 ..., Sik);
E8) the word frequency lexical order vector Vi=(Ti ', Fi ', Si) of text is built;
E9) judge that the text in described document data source C runs through the most, in this way, perform step E10); Otherwise, step E2 is returned) read next text;
E10) generate corresponding to described document data source C={C1, C2 ... Ci ... } the word frequency word order of Chinese version Vector space V={V1, V2 ... Vi};
E11) by described word frequency lexical order vector SPACE V={ V1, V2 ... Vi} enters with described primary vector space Row compares, and obtains representing the 4th result of similarity between the project verification application of described science and technology item and project over the years.
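The loop in steps E2) to E10) can be sketched in Python as follows. This is not the HpTF/IDF-MM algorithm itself, whose details are not given here; it is a simplified stand-in. Three things are assumed: texts arrive already segmented and stop-word filtered, dimensionality reduction keeps the k most frequent terms, and the word-order weight of a term is 1 minus its first position divided by the text length. The function name and sample corpus are hypothetical.

```python
from collections import Counter

def freq_order_vector(tokens, k):
    """Build V = (T', F', S) for one segmented, stop-word-free text."""
    total = len(tokens)
    if total == 0:
        return [], [], []
    counts = Counter(tokens)
    # E5/E6: relative word frequency, then keep the k most frequent terms
    # (an assumed dimensionality-reduction rule).
    terms = [term for term, _ in counts.most_common(k)]
    freqs = [counts[term] / total for term in terms]
    # E7: assumed word-order weight; earlier first occurrence gives a higher weight.
    orders = [1 - tokens.index(term) / total for term in terms]
    return terms, freqs, orders

# E2/E9/E10: iterate over every text in the (hypothetical) document source.
corpus = [
    ["储能", "电池", "储能", "材料"],
    ["芯片", "设计", "芯片", "芯片"],
]
space = [freq_order_vector(tokens, k=2) for tokens in corpus]
```

Each entry of `space` corresponds to one Vi; comparing these entries with the vector built from the application text in the same way would then yield a similarity for step E11).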
In the method for science and technology items based on big data of the present invention project verification assessment, described step F) enter One step includes:
F1) judging whether the 4th result is more than the threshold value set, in this way, scoring is 0;Otherwise, step is performed F2);
F2) described first result, the second result, the 3rd result and the weight of the 4th result are determined;
F3) according to described first result, the second result, the 3rd result, the 4th result and respective weight thereof Obtain score value;The suggestion report of output project verification simultaneously.
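Steps F1) to F3) can be illustrated with the short Python sketch below. The weighted average used to combine the results is an assumption, since the patent says only that the score is obtained from the four results and their weights; the threshold, weights and result values are hypothetical.

```python
def project_score(results, weights, similarity_threshold):
    """Combine M1..M4 into one score; a near-duplicate application scores 0."""
    if len(results) != 4 or len(weights) != 4:
        raise ValueError("expected four results and four weights")
    m4 = results[3]
    if m4 > similarity_threshold:      # F1: too similar to a past project
        return 0.0
    total = sum(weights)               # F2: weights, here supplied by experts
    # F3 (assumed rule): weighted average of the four results.
    return sum(w * m for w, m in zip(weights, results)) / total

weights = (0.3, 0.3, 0.2, 0.2)
score = project_score((0.8, 0.5, 0.8, 0.2), weights, similarity_threshold=0.9)
duplicate = project_score((0.8, 0.5, 0.8, 0.95), weights, similarity_threshold=0.9)
```

The early return in F1 treats an application that closely matches a previously approved project as a repeat submission, regardless of how well it scores on the other three results.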
In the method for science and technology items based on big data of the present invention project verification assessment, described project verification suggestion Report includes similarity the setting up the project over the years in set point with science and technology item project verification application, declares unit phase Like project, unit performance situation, current support direction and key technology hot information.
In the method for science and technology items based on big data of the present invention project verification assessment, described step A2) In primary vector space be according to described E2) to E11) and method obtain.
Implementing the method of the present invention has the following advantages. The first result is obtained by processing the latest development directions and hotspot information of the technology related to the project and comparing them with the corresponding part of the application data. The second result is obtained by processing the national policy support direction and the local government's industrial development direction and comparing them with the corresponding part of the application data. The third result is output after processing the feedback from the competent department's on-site industry surveys and the comprehensive-strength information of the applicant unit. The fourth result is obtained by processing the competent department's approval record for similar projects over the years, together with the construction status and outcomes of previously approved projects, and comparing them with the corresponding part of the application data. The evaluation of the application and the project approval recommendation report are then obtained from the four results and their respective weights. The project approval is thus evaluated in a comparatively scientific, standardized and careful way. Experts can refer to the model's evaluation to understand the specifics of the project under review more comprehensively, carefully and responsibly, which avoids some of the arbitrariness, blindness and one-sidedness of past expert evaluation and raises the quality and level of science and technology project evaluation.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of one embodiment of the big-data-based method for evaluating science and technology project approval applications;
Fig. 2 is a detailed flow chart of obtaining the project approval application data in that embodiment;
Fig. 3 is a detailed flow chart of processing the latest development directions and hotspot information of the technology related to the project in that embodiment;
Fig. 4 is a detailed flow chart of processing the information on the national policy support direction and the local government's industrial development direction in that embodiment;
Fig. 5 is a detailed flow chart of processing the feedback from the competent department's on-site industry surveys and the comprehensive-strength information of the applicant unit in that embodiment;
Fig. 6 is a detailed flow chart of processing the competent department's record of similar science and technology projects over the years in that embodiment;
Fig. 7 is a detailed flow chart of obtaining the evaluation of the application and the project approval recommendation report in that embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows the flow chart of an embodiment of the big-data-based method for evaluating science and technology project approval applications. In Fig. 1, the method comprises the following steps:
Step S01, obtaining the science and technology project approval application data: in this step, the application data, that is, the data on the newly proposed project, is obtained. How the application data is obtained is described in detail later.
Step S02, obtaining information on the latest development directions and hotspots of the technology related to the project, processing it to obtain current research hotspot data, and comparing the current research hotspot data with the corresponding part of the application data to obtain the first result: in this step, the relevant big data from large professional technology websites, journals, and the science and technology resource survey and management system are analyzed statistically. The data are processed with a compound Chinese word segmentation algorithm and the K-means clustering algorithm to output the frontier research hotspot data of current scientific research, and the hotspot data are compared with the corresponding part of the application data to obtain the first result. This gives an objective evaluation of the latest development directions and hotspots of the technology related to the project.
Step S03, obtaining information on the national policy support direction and the local government's industrial development direction, processing it to obtain support direction data, and comparing the support direction data with the corresponding part of the application data to obtain the second result: in this step, the information consists mainly of official documents on national policy support and on the support direction for local government industrial development, so processing it is a text data mining task. Processing yields the support direction data, which are compared with the corresponding part of the application data to obtain the second result; the second result is the support direction keywords. This gives an objective evaluation of the project against the national policy support direction and the support direction for local government industrial development.
Step S04, obtaining the feedback from the competent department's on-site industry surveys and the comprehensive-strength information of the applicant unit, processing them, and outputting the third result expressed as data: in this step, the information consists mainly of feedback from the competent department's on-site surveys of local industrial development and of engineering centers and enterprise key laboratories at all levels. After processing, the third result is output as data; the third result is the support direction keywords and a support strength factor. This gives an objective evaluation of the feedback from the competent department's on-site industry surveys.
Step S05, obtaining the competent department's approval record for similar science and technology projects over the years, together with the construction status and outcomes of previously approved projects, processing them to obtain a set of vector spaces, and comparing the set of vector spaces with the corresponding part of the application data to obtain the fourth result: in this step, the competent department's previously approved projects and the applicant unit's construction status and performance information for its previously approved projects are obtained from the integrated science and technology service management system. This information is processed with "the high performance computing text feature extraction algorithm based on the TF/IDF and Markov model" (abbreviated HpTF/IDF-MM) to obtain a set of vector spaces, which is compared with the corresponding part of the application data to obtain the fourth result. This gives an objective evaluation of the competent department's approval record for similar projects and of the construction status and performance of previously approved projects.
Step S06, determining the weights of the first, second, third and fourth results, and obtaining the evaluation of the application and the project approval recommendation report from the four results and their respective weights: in this embodiment, the weights are set by experts on the basis of experience. For convenience of description, the first, second, third and fourth results are denoted M1, M2, M3 and M4. The weights of M1, M2, M3 and M4 are determined from expert evaluations of project approvals in previous years, and the evaluation of the application and the project approval recommendation report are then obtained.
The project approval recommendation report gives experts reference advice in text form. Its content includes: the previously approved projects whose similarity to the application falls within a set range (in this embodiment, the five most similar previously approved projects), similar projects of the applicant unit, the unit's performance record, the current support direction, and key technology hotspot information. The five most similar previously approved projects serve as reference information; the applicant unit's similar projects are output to avoid approving the same project twice.
In this embodiment, steps S01 to S06 produce a score for the project; the higher the score, the more feasible the approval. Steps S02 and S03 reveal the current support direction and the hot key technologies; step S04 provides the unit's credit rating, comprehensive strength and performance record; step S05 extracts the names and regional distribution of the five most similar projects. Compared with the expert scoring of the prior art, the present invention evaluates applications with a quantified evaluation system and relatively uniform evaluation criteria. It scores six aspects: the latest technical development directions and hot technologies of the project; the national policy support direction; the support direction for local government industrial development; the competent department's approval record for similar projects over the years; the construction status and outcomes of previously approved projects; and the feedback from the competent department's on-site industry surveys. The project approval is thus evaluated in a comparatively scientific, standardized and careful way. Experts can refer to this evaluation method to understand the specifics of the project under review more comprehensively, carefully and responsibly, which avoids some of the arbitrariness, blind conformity and one-sidedness of current expert evaluation and helps improve the expert evaluation mechanism for science and technology projects.
It should be noted that steps S02 to S05 are shown in a particular order in Fig. 1 only for ease of description. In practice, these steps can be divided into several groups. The steps within a group run in a fixed order, while the groups themselves may run in the order shown above, in parallel, or in a different order. For example, steps S02 and S03 can form one group and steps S04 and S05 another; the two groups can then run side by side or in parallel, or run one after the other according to some rule, not necessarily in the order above.
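The grouping described above can be sketched in Python with the standard library's thread pool. The step functions below are hypothetical placeholders that return labels instead of doing real work; only the scheduling pattern is the point of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for steps S02 to S05; each returns a result label.
def step_s02():
    return "M1"

def step_s03():
    return "M2"

def step_s04():
    return "M3"

def step_s05():
    return "M4"

def run_group(steps):
    """Run the steps of one group strictly in order and collect their results."""
    return [step() for step in steps]

# Two groups, as in the example above: (S02, S03) and (S04, S05).
groups = [[step_s02, step_s03], [step_s04, step_s05]]
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    # The groups run in parallel; map() returns results in group order.
    grouped = list(pool.map(run_group, groups))

results = [r for group in grouped for r in group]
```

Because step S06 needs all four results, the weighting step would run only after both groups have finished, which the `with` block guarantees here.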
In this embodiment, step S01 can be refined further; Fig. 2 shows the detailed flow chart. In Fig. 2, step S01 further comprises:
Step S11, obtaining the first text concerning the science and technology project approval application: in this step, the first text concerning the application, that is, concerning the approval application for the new project, is obtained.
Step S12, processing the first text to obtain the first vector space representing the project applied for: in this step, the HpTF/IDF-MM algorithm is applied to the approval application for the new project to obtain the first vector space representing the new project. In this embodiment, the first vector space is denoted L, and the science and technology project approval application data is this first vector space. The HpTF/IDF-MM algorithm is introduced later.
In this embodiment, step S02 can be refined further; Fig. 3 shows the detailed flow chart. In Fig. 3, step S02 further comprises:
Step S21, obtaining information on the latest development directions and hotspots of the technology related to the project and forming a series of text records: in this step, the information is extracted from a data source. The data source is the national, provincial and municipal science and technology plan project data. Extracting from it the data set on the latest technical development directions and hot technologies of science and technology projects yields the information on the latest development directions and hotspots, from which a series of text records is formed.
Step S22, for each text record in turn, performing Chinese word segmentation, filtering out stop words, and then extracting text features: in this step, each text record is segmented, its stop words are filtered out, and its text features are then extracted.
Step S23, clustering the text features and extracting the first vector representing the current research hotspots: in this step, the text features are clustered with the K-means clustering algorithm, and the first vector representing the current research hotspots is extracted.
Step S24, comparing the first vector with the first vector space to obtain the first result representing the degree of correlation between the application and the current technology hotspots: in this step, the first vector is compared with the first vector space L to obtain the first result M1; the value range of M1 is 0 < M1 < 1.
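Steps S23 and S24 can be illustrated with the Python sketch below. It is a minimal version of Lloyd's K-means with deterministic seeding, not the patent's implementation. Three things are assumed: each text record is already a numeric feature vector, the hotspot vector is the centroid of the largest cluster, and the comparison is cosine similarity. The sample data are hypothetical.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain Lloyd's K-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, clusters

def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

# Columns are hypothetical term weights for ("储能", "芯片").
records = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]
centroids, clusters = kmeans(records, k=2)
# Assumed rule: the centroid of the largest cluster is the hotspot vector.
hotspot = centroids[max(range(2), key=lambda c: len(clusters[c]))]
m1 = cosine(hotspot, (1.0, 0.0))  # application vector is hypothetical
```

For non-negative feature weights the cosine value stays within the unit interval, in line with the range given for M1.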
In the present embodiment, step S03 above can be further refined; the detailed flowchart after refinement is shown in Fig. 4. In Fig. 4, step S03 further includes:
Step S31: Obtain a second text about the national policy support direction and the local government industrial development direction. In this step, the second text is extracted from the data source. The data source is the science and technology plan project data of the state and of the provinces and cities; extracting from it the data sets on the national policy support direction and the local government industrial development direction yields the second text.
Step S32: Perform Chinese word segmentation on the second text and filter out stop words to obtain a third text. In this step, the second text undergoes preprocessing such as Chinese word segmentation and stop-word filtering, in that order, to obtain the third text.
Step S33: Obtain the keywords of the current support direction by calculating word frequency on the third text. In this step, the keywords of the current support direction are obtained by calculating the word frequencies in the third text.
Step S34: Assign equal weights to the keywords and build the secondary vector space. In this step, the weight of each keyword is assigned equally and a vector space model is built directly; in the present embodiment, this vector space model is called the secondary vector space.
Step S35: Compare the secondary vector space with the primary vector space to obtain a second result representing how well the science and technology project approval application fits the support direction. In this step, the secondary vector space is compared with the primary vector space L to obtain the second result M2; the value range of the second result M2 is 0 < M2 < 1.
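Steps S32 to S35 can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list, the number of keywords kept, the sparse dictionary representation, and the use of cosine similarity for the comparison are not given in the description, and the English toy tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "and", "to", "in", "for"}  # illustrative list only

def support_keywords(tokens, top_n=3):
    """Steps S32-S33 sketch: drop stop words, keep the top_n most frequent terms."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

def equal_weight_vector(keywords):
    """Step S34 sketch: every keyword receives the same weight."""
    weight = 1.0 / len(keywords)
    return {word: weight for word in keywords}

def cosine_sparse(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical policy text and a hypothetical primary vector space L for the application.
policy_tokens = "support for the new energy vehicle industry and the energy storage industry".split()
keywords = support_keywords(policy_tokens)
secondary_space = equal_weight_vector(keywords)
primary_space = {"energy": 0.5, "vehicle": 0.3, "battery": 0.2}
m2 = cosine_sparse(secondary_space, primary_space)   # step S35
```

Equal weighting means M2 depends only on which support keywords the application shares, not on how often the policy text repeats them.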
In the present embodiment, step S04 above can be further refined; the detailed flowchart after refinement is shown in Fig. 5. In Fig. 5, step S04 further includes:
Step S41: Obtain a series of text records about the comprehensive strength of the project applicant unit as recorded in the science and technology resource survey system, the applicant unit's project construction record over the years, and the feedback from on-site industry investigations by the competent department, and form a fourth text. In this step, these text records are extracted from the data source described above and formed into the fourth text. It is worth mentioning that, in the present embodiment, the comprehensive strength information includes manpower, financial resources, material resources, and base information.
Step S42: Extract performance information from the fourth text. In this step, the performance information is extracted from the fourth text.
Step S43: Give the applicant unit a reputation score according to the performance information. In this step, by identifying the performance information in the fourth text, it can be determined whether the record is favorable or unfavorable, and the applicant unit is then given a reputation score.
Step S44: Normalize the reputation score to obtain a third result in numerical form. In this step, the reputation score is normalized and output as the numerical third result M3. The value range of the third result M3 is 0 < M3 < 1.
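One possible reading of steps S43 and S44 is sketched below. The description does not say how favorable and unfavorable performance records are recognized or which normalization is used; the cue-word lists, the count-difference score, and the logistic normalization are all assumptions chosen only so that the output falls strictly between 0 and 1.

```python
import math

# Hypothetical praise and criticism cue words; a real system would need a curated lexicon.
POSITIVE_CUES = {"completed", "passed", "excellent", "on-time"}
NEGATIVE_CUES = {"delayed", "failed", "terminated", "overdue"}

def reputation_score(performance_tokens):
    """Step S43 sketch: number of praise cues minus number of criticism cues."""
    positive = sum(1 for t in performance_tokens if t in POSITIVE_CUES)
    negative = sum(1 for t in performance_tokens if t in NEGATIVE_CUES)
    return positive - negative

def normalize(raw_score, scale=2.0):
    """Step S44 sketch: logistic squashing, so the result lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-raw_score / scale))

# Toy performance record extracted from the fourth text (hypothetical).
tokens = ["project", "completed", "on-time", "acceptance", "passed", "one", "milestone", "delayed"]
m3 = normalize(reputation_score(tokens))
```

A neutral record (raw score 0) maps to 0.5, so units with no history are neither rewarded nor penalized under this particular choice.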
In the present embodiment, the main input of step S05 above is the record of science and technology projects approved over the years, including each project's title, unit, application documents, feasibility report, and so on. The data source is organized so that each project is treated as one text record. The text records undergo Chinese word segmentation and stop-word filtering, text features are constructed, and a vector space is established: a high-performance computing text feature extraction algorithm based on TF/IDF and a Markov model (HpTF/IDF-MM) builds a set of vector spaces based on word frequency and word order, and its output is a series of vector spaces. These are compared with the primary vector space L, and the highest similarity value is taken and recorded as the fourth result M4. The detailed flowchart of step S05 is shown in Fig. 6. In Fig. 6, step S05 further includes:
Step S501: Obtain the document data source C = {C1, C2, ..., Ci, ...} about the competent department's record of approving similar science and technology projects over the years and the construction status and results of previously approved projects.
Step S502: Read one text Ci from the document data source C = {C1, C2, ..., Ci, ...}.
Step S503: Initialize the word-frequency/word-order prototype vector Vi of the text Ci.
Step S504: Perform Chinese word segmentation on the text Ci and filter the stop words out of the resulting words to obtain the first segmented-word vector space Ti = (Ti1, Ti2, ..., Tin).
Step S505: Calculate the word frequency of each vector element Tij in the first segmented-word vector space Ti = (Ti1, Ti2, ..., Tin) to obtain its word-frequency weight Fij in the text Ci, giving the first word-frequency weight vector space Fi = (Fi1, Fi2, ..., Fin).
Step S506: Reduce the dimensionality of the word-frequency weight vector space Fi to obtain the second word-frequency weight vector space Fi' = (Fi1, Fi2, ..., Fik) and the second segmented-word vector space Ti' = (Ti1, Ti2, ..., Tik).
Step S507: Calculate the word order of the vector elements in the second segmented-word vector space Ti' to obtain the word-order weights Sij (j = 1, 2, ..., k), giving the word-order weight vector space Si = (Si1, Si2, ..., Sik).
Step S508: Build the word-frequency/word-order vector of the text, Vi = (Ti', Fi', Si).
Step S509: Determine whether all texts in the document data source C have been read. If so, perform step S510; otherwise, return to step S502 to read the next text.
Step S510: Generate the word-frequency/word-order vector space V = {V1, V2, ..., Vi} corresponding to the texts in the document data source C = {C1, C2, ..., Ci, ...}. This step is performed if the result of step S509 above is yes.
Step S511: Compare the word-frequency/word-order vector space V = {V1, V2, ..., Vi} with the primary vector space to obtain a fourth result representing the similarity between the science and technology project approval application and the projects of previous years. In this step, the word-frequency/word-order vector space V = {V1, V2, ..., Vi} is compared with the primary vector space L to obtain the fourth result M4. The value range of the fourth result M4 is 0 < M4 < 1.
It is worth noting that the primary vector space L in step S12 above is obtained by the method of steps S502 to S511 above.
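The loop of steps S502 to S511 can be sketched as below. Several details are assumptions rather than the method of the description: dimensionality reduction is taken to mean keeping the k most frequent terms, the word-order weight Sij is taken to be higher for terms that first appear earlier in the text (the description refers to a Markov model without giving a formula), and the comparison with L is taken to be cosine similarity over the product Fij * Sij. Function names and toy data are hypothetical.

```python
import math
from collections import Counter

def word_frequency_order_vector(tokens, k=3):
    """Steps S504-S508 sketch: {term: (frequency weight, order weight)} for the top-k terms."""
    counts = Counter(tokens)
    total = len(tokens)
    top_terms = [term for term, _ in counts.most_common(k)]   # S506: keep k terms
    vector = {}
    for term in top_terms:
        frequency_weight = counts[term] / total                # S505: Fij
        order_weight = 1.0 - tokens.index(term) / total        # S507: Sij (assumed form)
        vector[term] = (frequency_weight, order_weight)
    return vector

def similarity(v1, v2):
    """Cosine similarity over combined weights F * S (an assumed comparison rule)."""
    a = {t: f * s for t, (f, s) in v1.items()}
    b = {t: f * s for t, (f, s) in v2.items()}
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def max_history_similarity(application_tokens, history):
    """Steps S502-S511 sketch: build V = {V1, V2, ...} and return the highest similarity to L."""
    primary_space = word_frequency_order_vector(application_tokens)
    vector_space = [word_frequency_order_vector(doc) for doc in history]  # S509-S510 loop
    return max(similarity(primary_space, v) for v in vector_space)

# Toy data (hypothetical): two previously approved projects and one new application.
history = [
    ["graphene", "battery", "electrode", "battery"],
    ["genome", "sequencing", "pipeline"],
]
application = ["battery", "graphene", "electrode", "battery"]
m4 = max_history_similarity(application, history)
```

In this toy case the application uses the same terms as the first historical project in a slightly different order, so M4 comes out high but below 1; the word-order weight is what keeps the two texts from being scored as identical.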
The HpTF/IDF-MM algorithm is suited to high-performance computing clusters and is programmed with MapReduce, so it can process the large volume of project material quickly and efficiently. Its input is a set of text documents, and its output is the set of document space vectors after feature weighting. The algorithm extracts good feature information and builds representative vector spaces. Its distinguishing property is that it extracts the key features of a text through word frequency while also reflecting, through word order, the features of the individual items of information in the project approval material. When extracting features from text, the HpTF/IDF-MM algorithm balances the load well in the text feature extraction stage, which raises computing speed and shortens running time. Because the algorithm fuses word-frequency and word-order features, it reflects the focus of a project and, through word order, the associations between items in the science and technology project declaration material; this makes it well suited to such material, which is unstructured but relatively standardized. In its implementation, the algorithm first exploits the low cost of the word-frequency computation: the result of the HpTF/IDF computation is used to reduce the dimensionality of the text, and only then is the word-order feature computed, which effectively reduces the time complexity of the algorithm.
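The internals of HpTF/IDF-MM are not given in the description. The sketch below only illustrates the general map and reduce pattern for the word-frequency part, using the standard TF-IDF definition and emulating both phases in a single process; it is not the algorithm of the invention, and the function names and toy documents are hypothetical.

```python
import math
from collections import defaultdict

def map_phase(doc_id, tokens):
    """Mapper: emit ((doc_id, term), 1) for every token."""
    return [((doc_id, term), 1) for term in tokens]

def reduce_phase(pairs):
    """Reducer: sum the counts for each (doc_id, term) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def tf_idf(documents):
    """Standard TF-IDF built from the map and reduce phases above."""
    emitted = [pair for doc_id, tokens in documents.items()
               for pair in map_phase(doc_id, tokens)]
    term_counts = reduce_phase(emitted)
    document_frequency = defaultdict(int)
    for (_, term) in term_counts:
        document_frequency[term] += 1      # each (doc, term) key counts one document
    n_docs = len(documents)
    return {
        (doc_id, term): (count / len(documents[doc_id]))
        * math.log(n_docs / document_frequency[term])
        for (doc_id, term), count in term_counts.items()
    }

# Toy corpus (hypothetical), already segmented and filtered.
documents = {"C1": ["battery", "graphene", "battery"], "C2": ["battery", "genome"]}
weights = tf_idf(documents)
```

On a real cluster the mapper outputs would be partitioned by key across reducers, which is where the load balancing mentioned above would come into play.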
In the present embodiment, step S06 above can be further refined; the detailed flowchart after refinement is shown in Fig. 7. In Fig. 7, step S06 further includes:
Step S61: Determine whether the fourth result is greater than a set threshold. In this step, it is determined whether the fourth result M4 is greater than the set threshold; if so, step S62 is performed; otherwise, step S63 is performed.
Step S62: The score is 0. This step is performed if the result of step S61 above is yes. In the present embodiment, the score (the project score) is denoted F. In this step, when M4 is greater than the set threshold (the threshold is derived from expert experience on a training set as 0.9, and can also be set freely), the project score is F = 0.
Step S63: Determine the weights of the first result, the second result, the third result, and the fourth result. This step is performed if the result of step S61 above is no. In this step, the weights of the first, second, third, and fourth results are denoted W1, W2, W3, and W4 respectively; the weights are likewise set according to expert experience. In general, given what each type of data represents, the weights should satisfy the following conditions: W1 = W2 < W3 < W4 and W1 + W2 < 0.5.
Step S64: Obtain the score from the first result, the second result, the third result, the fourth result, and their respective weights, and at the same time output the project approval suggestion report. In this step, the score is obtained from the first result M1, the second result M2, the third result M3, the fourth result M4, and their respective weights. The project score is F = W1*M1 + W2*M2 + W3*M3 + W4*(1-M4). In this step, the project approval suggestion report is also output. Its content includes: the five previously approved projects with the highest similarity, as reference information; similar projects of the declaring unit, to avoid duplicate approval; the unit's performance record; and the current support directions and hot key technologies.
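The decision in steps S61 to S64 translates directly into a few lines of code. In the sketch below, the threshold of 0.9, the weight constraints, and the formula for F come from the description; the specific weight values are illustrative assumptions, since the description leaves them to expert experience, and the function name is hypothetical.

```python
def project_score(m1, m2, m3, m4, weights=(0.15, 0.15, 0.3, 0.4), threshold=0.9):
    """Steps S61-S64: F = 0 for near-duplicate applications, otherwise the weighted sum."""
    w1, w2, w3, w4 = weights
    # Constraints stated for step S63: W1 = W2 < W3 < W4 and W1 + W2 < 0.5.
    if not (w1 == w2 < w3 < w4 and w1 + w2 < 0.5):
        raise ValueError("weights must satisfy W1 = W2 < W3 < W4 and W1 + W2 < 0.5")
    if m4 > threshold:                       # S61 and S62: too similar to a past project
        return 0.0
    return w1 * m1 + w2 * m2 + w3 * m3 + w4 * (1.0 - m4)   # S64

score = project_score(0.6, 0.5, 0.7, 0.2)
duplicate_score = project_score(0.6, 0.5, 0.7, 0.95)
```

Note that M4 enters the sum as (1 - M4): the less an application resembles previously approved projects, the more it contributes to F.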
In summary, the method of the present embodiment assesses six aspects: the latest technological development directions and hot technologies of science and technology projects; the national policy support direction; the local government's industrial development support direction; the competent department's record of approving similar projects over the years; the construction status and results of previously approved projects; and the feedback from on-site industry investigations by the competent department. It uses big-data processing based on the "high-performance computing text feature extraction algorithm based on TF/IDF and a Markov model" to mine scoring factors from the mass of science and technology project information and to score projects objectively. Supplemented by expert review, this makes the evaluation of science and technology projects as scientific, reasonable, and realistic as possible. It serves the relevant competent departments by reducing blindness and arbitrariness in approving and funding projects, allocating science and technology resources efficiently and with clear priorities, and raising the quality and level of science and technology decision-making. Compared with relying on expert review alone, the present invention assesses project approval with a quantitative evaluation system and relatively uniform evaluation criteria, so that approved projects receive a more scientific, standardized, and careful assessment.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. the method for science and technology items based on a big data project verification assessment, it is characterised in that include walking as follows Rapid:
A) science and technology item project verification request for data is obtained;
B) obtain latest development direction and the hot information of science and technology item correlation technique, and it is processed To current research hot spot data, by the phase of described current research hot spot data with science and technology item project verification request for data Should partly compare and obtain the first result;
C) obtain national policy and help direction and local government's industrial development direction information, obtain after it is processed Help bearing data, the appropriate section of described support bearing data with science and technology item project verification request for data is carried out Relatively obtain the second result;
D) obtain authorities' industry on the spot and investigate the comprehensive strength letter of feedback advisory information and project application unit Breath, and export the 3rd result embodied with data mode after processing;
E) obtain authorities similar science and technology items over the years project verification situation, established project construction situation the most over the years and Effect, and obtain a vector space collection to after its process, by described vector space collection and described science and technology item The appropriate section of project verification request for data compares and obtains the 4th result;
F) described first result, the second result, the 3rd result and the weight of the 4th result are determined, and according to institute State the first result, the second result, the 3rd result, the 4th result and respective weight and obtain science and technology item project verification The assessment of application and project verification suggestion report;
Step A) further includes:
A1) obtaining a first text about the science and technology project approval application;
A2) processing the first text to obtain a primary vector space representing the project applying for approval; the science and technology project approval application data is the primary vector space;
Step B) further includes:
B1) obtaining the latest development directions and hot-spot information of the technologies related to the science and technology project and forming a series of text records;
B2) performing Chinese word segmentation on each text record, filtering out stop words, and then extracting text features;
B3) clustering the text features and extracting a primary vector representing current research hot spots;
B4) comparing the primary vector with the primary vector space to obtain a first result representing the degree of association between the science and technology project approval application and current technology hot spots;
Step C) further includes:
C1) obtaining a second text about the national policy support direction and the local government industrial development direction;
C2) performing Chinese word segmentation on the second text and filtering out stop words to obtain a third text;
C3) obtaining the keywords of the current support direction by calculating word frequency on the third text;
C4) assigning equal weights to the keywords and building a secondary vector space;
C5) comparing the secondary vector space with the primary vector space to obtain a second result representing how well the science and technology project approval application fits the support direction;
Step D) further includes:
D1) obtaining a series of text records about the comprehensive strength information of the project applicant unit in the science and technology resource survey system, the applicant unit's project construction record over the years, and the feedback from on-site industry investigations by the competent department, and forming a fourth text; the comprehensive strength information includes manpower, financial resources, material resources, and base information;
D2) extracting performance information from the fourth text;
D3) giving the applicant unit a reputation score according to the performance information;
D4) normalizing the reputation score to obtain a third result in numerical form;
Step E) further includes:
E1) obtaining a document data source C = {C1, C2, ..., Ci, ...} about the competent department's record of approving similar science and technology projects over the years and the construction status and results of previously approved projects;
E2) reading one text Ci from the document data source C = {C1, C2, ..., Ci, ...};
E3) initializing the word-frequency/word-order prototype vector Vi of the text Ci;
E4) performing Chinese word segmentation on the text Ci and filtering the stop words out of the resulting words to obtain a first segmented-word vector space Ti = (Ti1, Ti2, ..., Tin);
E5) calculating the word frequency of each vector element Tij in the first segmented-word vector space Ti = (Ti1, Ti2, ..., Tin) to obtain its word-frequency weight Fij in the text Ci, giving a first word-frequency weight vector space Fi = (Fi1, Fi2, ..., Fin);
E6) reducing the dimensionality of the word-frequency weight vector space Fi to obtain a second word-frequency weight vector space Fi' = (Fi1, Fi2, ..., Fik) and a second segmented-word vector space Ti' = (Ti1, Ti2, ..., Tik);
E7) calculating the word order of the vector elements in the second segmented-word vector space Ti' to obtain word-order weights Sij (j = 1, 2, ..., k), giving a word-order weight vector space Si = (Si1, Si2, ..., Sik);
E8) building the word-frequency/word-order vector of the text, Vi = (Ti', Fi', Si);
E9) determining whether all texts in the document data source C have been read; if so, performing step E10); otherwise, returning to step E2) to read the next text;
E10) generating the word-frequency/word-order vector space V = {V1, V2, ..., Vi} corresponding to the texts in the document data source C = {C1, C2, ..., Ci, ...};
E11) comparing the word-frequency/word-order vector space V = {V1, V2, ..., Vi} with the primary vector space to obtain a fourth result representing the similarity between the science and technology project approval application and the projects of previous years.
2. The method for assessing science and technology project approval based on big data according to claim 1, characterized in that step F) further includes:
F1) determining whether the fourth result is greater than a set threshold; if so, the score is 0; otherwise, performing step F2);
F2) determining the weights of the first result, the second result, the third result, and the fourth result;
F3) obtaining the score from the first result, the second result, the third result, the fourth result, and their respective weights, and at the same time outputting the project approval suggestion report.
3. The method for assessing science and technology project approval based on big data according to claim 2, characterized in that the project approval suggestion report includes the previously approved projects whose similarity to the science and technology project approval application falls within a set range, similar projects of the declaring unit, the unit's performance record, and information on the current support directions and key technology hot spots.
4. The method for assessing science and technology project approval based on big data according to claim 3, characterized in that the primary vector space in step A2) is obtained according to the method of steps E2) to E11).
CN201310393575.XA 2013-09-02 2013-09-02 A kind of method of science and technology items based on big data project verification assessment Active CN103455596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310393575.XA CN103455596B (en) 2013-09-02 2013-09-02 A kind of method of science and technology items based on big data project verification assessment

Publications (2)

Publication Number Publication Date
CN103455596A CN103455596A (en) 2013-12-18
CN103455596B true CN103455596B (en) 2016-11-02

Family

ID=49737959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310393575.XA Active CN103455596B (en) 2013-09-02 2013-09-02 A kind of method of science and technology items based on big data project verification assessment

Country Status (1)

Country Link
CN (1) CN103455596B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133842A (en) * 2014-06-24 2014-11-05 国家电网公司 Data processing method and data processing system with intelligent expert detection function
CN104156386A (en) * 2014-06-24 2014-11-19 国家电网公司 Data processing method and system with image recognition function
CN104133838A (en) * 2014-06-24 2014-11-05 国家电网公司 Data processing method and system with system detection function
CN104077485B (en) * 2014-06-30 2017-05-03 西安电子科技大学 Model correctness evaluation method based on goodness of fit
CN109636352A (en) * 2018-12-20 2019-04-16 湖南晖龙集团股份有限公司 A kind of distributed content duplicate checking early warning system based on financial big data
CN110046225B (en) * 2019-04-16 2020-11-24 广东省科技基础条件平台中心 Scientific and technological project material integrity assessment decision model training method
CN110310048B (en) * 2019-07-10 2023-07-04 云南电网有限责任公司电力科学研究院 Distribution network planning overall process evaluation method and device
CN111160778A (en) * 2019-12-30 2020-05-15 佰聆数据股份有限公司 Outbound project auditing and evaluating method and system based on big data and computer equipment
CN111598331B (en) * 2020-05-13 2023-07-07 中国科学院计算机网络信息中心 Project feasibility prediction analysis method based on scientific research multidimensional features
CN113159537B (en) * 2021-04-06 2023-05-23 南方电网能源发展研究院有限责任公司 Assessment method and device for new technical project of power grid and computer equipment
CN113421026A (en) * 2021-07-19 2021-09-21 首都医科大学附属北京儿童医院 Hospital scientific research project application management method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7359865B1 (en) * 2001-11-05 2008-04-15 I2 Technologies Us, Inc. Generating a risk assessment regarding a software implementation project
US7366680B1 (en) * 2002-03-07 2008-04-29 Perot Systems Corporation Project management system and method for assessing relationships between current and historical projects
CN101697217A (en) * 2009-11-06 2010-04-21 金蝶软件(中国)有限公司 Method and device for generating evaluation scheme
CN102402732A (en) * 2010-09-14 2012-04-04 中国船舶工业综合技术经济研究院 Method and system for evaluating scientific research projects
CN102426603A (en) * 2011-11-11 2012-04-25 任子行网络技术股份有限公司 Text information regional recognition method and device
CN103226751A (en) * 2013-03-29 2013-07-31 广西电网公司电力科学研究院 Post-evaluation method for science and technology projects of grid enterprise

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8019793B2 (en) * 2003-02-14 2011-09-13 Accenture Global Services Limited Methodology infrastructure and delivery vehicle
US8280897B2 (en) * 2009-01-30 2012-10-02 Accenture Global Services Limited Methods and systems for assessing project management offices

Also Published As

Publication number Publication date
CN103455596A (en) 2013-12-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510033 No. 171, Xin Lian Road, Guangdong, Guangzhou

Applicant after: GUANGDONG SCIENCE & TECHNOLOGY INFRASTRUCTURE CENTER

Address before: 510033 No. 171, Xin Lian Road, Guangzhou, Guangdong, Yuexiu District

Applicant before: Guangdong Computing Center

CB03 Change of inventor or designer information

Inventor after: Luo Liang

Inventor after: Lu Zhixing

Inventor after: Fang Shaoliang

Inventor after: Xu Diwei

Inventor after: Cai Jianxin

Inventor after: Lin Zhu

Inventor before: Luo Liang

Inventor before: Lu Zhixing

Inventor before: Fang Shaoliang

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant