CN111598331A - Project feasibility prediction analysis method based on scientific research multidimensional characteristics

Info

Publication number
CN111598331A
Authority
CN
China
Prior art keywords
project
feasibility
characteristic
value
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010403375.8A
Other languages
Chinese (zh)
Other versions
CN111598331B (en)
Inventor
王月
于建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN202010403375.8A
Publication of CN111598331A
Application granted
Publication of CN111598331B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention relates to a project feasibility prediction analysis method based on scientific research multidimensional characteristics, comprising the following steps: S1, defining a classification and representation method for project feasibility feature attributes; S2, extracting the evaluation feature attributes and calculating their values; S3, predicting project feasibility with a multilayer neural network; S4, mapping the scoring rules to the feature attributes and calculating the feature attribute values; S5, analyzing project feasibility based on the scoring rules; and S6, providing a project feasibility analysis report. The method uses the feature attributes and the scoring rules to realize neural-network-based project feasibility prediction and project feasibility scoring.

Description

Project feasibility prediction analysis method based on scientific research multidimensional characteristics
Technical Field
The invention belongs to the technical field of big data analysis. It provides a method that acquires the research content, technical route, team, and other information from the feasibility report of a scientific research project to form multidimensional features such as research advancement, team strength, and content similarity; establishes a feasibility prediction model that judges project advancement by comparison with a historical project library; and realizes project feasibility analysis and evaluation by combining multiple types of rules.
Background
When the feasibility of a project is analyzed and evaluated, factors such as whether the research content matches the project guide, whether the research content is advanced, and whether the project team is strong must be considered. Because projects span many types, research directions are diverse, and the submitted content varies, a unified evaluation method and unified evaluation rules are lacking, and whether a project is feasible is evaluated mainly by hand. The judgment rests chiefly on expert experience and is therefore highly subjective. With reasonable feature extraction, the relevant keywords and topics, as well as the scientific research capability and research foundation of the project team, can be extracted from the project feasibility report; that is, they are converted into feature attributes such as text content, relationship network, and time. Through feature attribute calculation, the project feasibility problem is converted into a binary classification problem that judges whether the project is feasible or infeasible and provides the reason and basis for the judgment. Therefore, reasonably selecting features, extracting the relevant feature attribute values, and feeding them as features into a binary classification prediction model, so that project feasibility can be judged and used as a reference for project review by domain experts, is a problem that urgently needs to be solved in applying big data analysis to project feasibility analysis.
The project feasibility prediction analysis process is essentially oriented to the project feasibility analysis problem. Using a historical project library and the project guide, it extracts relevant evaluation content such as project advancement, the capability of the research team, the feasibility of the technical route, and the degree of conformity with the guide; extracts text features and personnel relationship features from the project feasibility report; predicts project feasibility by constructing a neural network model; and evaluates and scores the project content according to the evaluation indexes.
Project feasibility prediction generally uses big data analysis to extract feature attributes, constraint conditions, and optimization targets from a historical project library and form a project feasibility evaluation index system; uses a machine learning algorithm to classify projects; gives a score for each index; and judges whether a project is feasible through its similarity to the project guide and to historical projects (as shown in FIG. 1).
Existing project feasibility evaluation methods have limited applicability to feasibility prediction. First, the feature attributes that reflect the actual level of a project are not sufficiently extracted. In actual project evaluation, whether a project is feasible depends on many factors, such as whether the research content matches the project guide, whether the technical route is advanced, and whether the project team is strong. Existing project evaluation systems and methods do not address how to extract index features describing feasibility from the project feasibility report and then establish an evaluation index system. Second, how to calculate the features related to project feasibility and then determine feasibility is either absent from current systems and methods or has not yet been proposed; that is, there is no systematic method and theory for solving the project feasibility prediction problem. In addition, beyond the conclusion itself, the judgment of whether a project is feasible depends heavily on the current evaluation rules: the evaluation rules need to be converted into evaluation index items, and a score needs to be calculated for each index. At the same time, a review explanation should be formed automatically, that is, summary information meeting the requirements of the review conclusion is generated from the text.
For the feasibility prediction model, the mainstream modeling approach is to extract keywords related to the research content from the text, construct a classification algorithm model, and perform similarity matching against the existing project library; on this basis, projects are reviewed manually by domain experts, who give the feasibility conclusion. Existing methods rely heavily on manual means and mainly on domain experts' understanding of the research content. They do not make full use of big data and machine learning to improve the quality and efficiency of project review, and they have difficulty identifying historically related projects in advance, for example when the project has been applied for through other channels. The main problems include: 1) A feature classification system for project feasibility evaluation, which divides the various features used for feasibility evaluation at a fine granularity, is lacking. 2) A mechanism for interpreting between the constraint conditions and feature attributes that represent feasibility and the evaluation rules is lacking. Scoring rules are defined according to the review characteristics of different projects, and the mutual conversion between scoring rules and feature attributes is difficult to achieve with existing evaluation methods and systems. 3) A complete rule-driven evaluation system is lacking, one that covers defining the scoring rules, converting the scoring rules into feature attributes, matching the feature attributes, a feasibility prediction mechanism based on multi-factor features, and converting combinations of feature attributes back into scoring rules.
Disclosure of Invention
The invention aims to solve the above problems of existing project feasibility evaluation methods.
To achieve this purpose, the invention provides a project feasibility prediction analysis method based on scientific research multidimensional characteristics, comprising the following steps:
step one, defining a classification and representation method for project feasibility feature attributes, representing project feasibility features and constraints at a fine granularity, and learning the weights of the feature attributes by principal component analysis;
step two, extracting and calculating the evaluation index values: extracting the content features and the relationship network features and calculating the corresponding feature attribute values;
step three, predicting project feasibility with a multilayer neural network;
step four, mapping the scoring rules to the feature attributes and calculating the feature attribute values;
step five, analyzing project feasibility based on the scoring rules;
step six, providing a project feasibility analysis report.
The project feasibility prediction analysis method based on scientific research multidimensional characteristics realizes a fine-grained description of the feasibility feature attributes of scientific research projects. The project feasibility prediction model based on a multilayer neural network realizes overall prediction of project feasibility. The rule-driven project feasibility analysis report realizes a mapping and matching mechanism between the feature attributes and the scoring rules, solves the problem of translating and interpreting scoring rules into feature attributes, and realizes project feasibility analysis through a similarity calculation model and automatic summarization.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic diagram of a conventional project feasibility prediction analysis method;
FIG. 2 is a schematic diagram of a project feasibility prediction analysis method based on scientific research multidimensional characteristics according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 2 is a schematic diagram of a project feasibility prediction analysis method based on scientific research multidimensional characteristics according to an embodiment of the present invention. As shown in fig. 2, the project feasibility prediction analysis method based on scientific research multidimensional features provided in the embodiments of the present invention includes the following steps:
Step S2.1: Define a classification and representation method for project feasibility feature attributes. Project feasibility features and constraints are represented at a fine granularity, and the weights of the feature attributes are learned by principal component analysis.
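The source does not say how principal component analysis turns into feature weights. As a minimal sketch, under the assumption that a feature's weight is the absolute value of its loading on the first principal component (rescaled so the weights sum to 1), the calculation could look as follows. The function name `pca_weights` and the sample matrix are hypothetical.

```python
import math

def pca_weights(rows, iterations=200):
    """Return one weight per feature from the first principal component.

    Assumption (not stated in the source): the weight of a feature is the
    absolute value of its loading on the first principal component,
    rescaled so that all weights sum to 1.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]

# Hypothetical feature matrix: rows are projects, columns are feature attributes.
features = [
    [0.9, 0.2, 0.5],
    [0.1, 0.3, 0.5],
    [0.8, 0.1, 0.5],
    [0.2, 0.4, 0.5],
]
weights = pca_weights(features)
```

In this sample the third column is constant, so it carries no variance and receives a weight of zero, while the first column, which varies most, receives the largest weight.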
The main sections of a project feasibility report that express its feasibility comprise text content, such as the research content, key scientific problems, and technical route, and personnel relationship and influence network features, such as team composition and the team's research foundation. Besides the feature attributes expressed by the project itself, extended attributes such as the degree of matching between the research content and the project guide, the heat of the research direction, and the project team's relationship network depend heavily on external scientific research data. By extracting the content of the project feasibility report and combining it with external data sources, the project feasibility evaluation indexes are decomposed into content features and relationship network features. The content features of the text comprise keywords extracted from sections such as the topic, keywords, research content, scientific problems, and technical route. The relationship network features comprise the centrality of a scholar's position in the researcher collaboration network, PageRank-based influence within the researcher's field, the team clustering coefficient, the influence of related results, the heat of keywords, the strength of the team, and so on.
By classifying the feature attributes of scientific research feasibility reports, any project feasibility report can be converted into a description in terms of feature attributes, and attribute values can be calculated for the corresponding feature attributes.
Step S2.2: Calculate the evaluation index values. The content features and the relationship network features are extracted, and the corresponding evaluation feature attribute values are calculated.
When the content features of the text are calculated, a feasibility report can be represented as a document; the document consists of different topics, and each topic consists of different keywords. For the calculation of the content feature attributes, the embodiment of the present invention therefore uses the LDA (Latent Dirichlet Allocation) document topic generation model for representation and calculation. That is, each keyword in the feasibility report is obtained by selecting a topic with a certain probability and then selecting a keyword from that topic with a certain probability; the document-to-topic distribution is multinomial, and the topic-to-keyword distribution is multinomial.
[Equation: content feature value computed by the LDA model; image not available]
where $P(po_j \mid lt_i)$ denotes the probability distribution of the content feature attribute $po_j$ over the topic $lt_i$, calculated by the LDA model. The larger $P(po_j \mid lt_i)$ is, the higher the correlation between the content feature attribute $po_j$ and the topic $lt_i$. $\sum_j P(po_j \mid lt_i)$ denotes the sum of the probability distributions of all content feature attributes over the topic.
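The generative story behind LDA described above (pick a topic for the document, then pick a keyword from that topic) can be illustrated with a short sketch. This only simulates the generative direction; fitting the distributions from real reports is the inference problem that an LDA implementation solves. The topic names, keywords, and probabilities below are hypothetical.

```python
import random

def generate_keywords(doc_topic, topic_word, length, seed=0):
    """Sample keywords following the LDA generative story.

    doc_topic:  {topic: probability} for one document (multinomial).
    topic_word: {topic: {keyword: probability}} (one multinomial per topic).
    """
    rng = random.Random(seed)
    topics = list(doc_topic)
    keywords = []
    for _ in range(length):
        # Step 1: pick a topic according to the document-topic distribution.
        topic = rng.choices(topics, weights=[doc_topic[t] for t in topics])[0]
        # Step 2: pick a keyword according to that topic's word distribution.
        words = list(topic_word[topic])
        word = rng.choices(words, weights=[topic_word[topic][w] for w in words])[0]
        keywords.append((topic, word))
    return keywords

# Hypothetical distributions for a single feasibility report.
doc_topic = {"networks": 0.7, "materials": 0.3}
topic_word = {
    "networks": {"routing": 0.5, "protocol": 0.3, "latency": 0.2},
    "materials": {"alloy": 0.6, "coating": 0.4},
}
sample = generate_keywords(doc_topic, topic_word, length=10)
```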
In addition to the static attribute values, the content features also need to take into account a time-dependent feature, namely the heat of the research direction. The heat of the research direction can be expressed as the heat of its keywords.
[Equation: keyword heat; image not available]
where $Num(\Delta t)$ denotes the count within the specified period $\Delta t$, and $sum(kw)$ denotes the number of keywords obtained. $N_{max}$ is used to normalize the value to $[0, 1]$.
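Because the heat formula itself is not available in the text, the following is only one plausible reading, stated as an assumption: the heat of a keyword is the number of times it occurs within the period, divided by the largest such count so that the result lies in [0, 1]. The record layout and the function name `keyword_heat` are hypothetical.

```python
from collections import Counter

def keyword_heat(records, start_year, end_year):
    """Return a heat value in [0, 1] for every keyword seen in the window.

    Assumption (the source formula is not available): heat is the number of
    occurrences of a keyword inside the period, divided by the largest such
    count (playing the role of N_max).
    """
    counts = Counter(
        kw
        for year, keywords in records
        if start_year <= year <= end_year
        for kw in keywords
    )
    if not counts:
        return {}
    n_max = max(counts.values())
    return {kw: n / n_max for kw, n in counts.items()}

# Hypothetical history: (year, keywords of one project or paper).
records = [
    (2017, ["blockchain"]),
    (2018, ["deep learning", "blockchain"]),
    (2019, ["deep learning", "knowledge graph"]),
    (2019, ["deep learning"]),
]
heat = keyword_heat(records, 2018, 2019)
```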
The relationship network features are calculated as follows:
1) Centrality of project members
Project member centrality measures the importance of a network node and reflects the person's position in the academic collaboration network. The closeness centrality value is the inverse of the average shortest-path length from a node to all other nodes:
$$CC_i = \frac{N - 1}{\sum_{j \neq i} d_{ij}}$$
where $N$ denotes the total number of nodes in the scholar relationship network, and $d_{ij}$ denotes the shortest path between any two nodes $i$ and $j$ in the network, usually calculated with Dijkstra's algorithm.
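A minimal sketch of closeness centrality as described here, taking the inverse of the mean shortest-path length to the other nodes and using Dijkstra's algorithm for the path lengths. The graph, the unit edge lengths, and the convention of returning 0 when some node is unreachable are assumptions made for illustration.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm; graph is {node: {neighbor: edge_length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

def closeness(graph, node):
    """Inverse of the mean shortest-path length from node to all other nodes."""
    dist = shortest_paths(graph, node)
    others = [d for n, d in dist.items() if n != node]
    if len(others) != len(graph) - 1 or not others:
        return 0.0  # assumption: unreachable nodes give centrality 0
    return len(others) / sum(others)

# Hypothetical collaboration network: A has worked with everyone else.
graph = {
    "A": {"B": 1, "C": 1, "D": 1},
    "B": {"A": 1},
    "C": {"A": 1},
    "D": {"A": 1},
}
```

In this star-shaped example, `closeness(graph, "A")` is 1.0 and `closeness(graph, "B")` is 0.6, reflecting that A sits at the center of the network.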
PageRank-based influence within the project members' field better reflects the academic influence of the project members in a given field. A higher PageRank value indicates that the person has greater influence and dissemination capability in the field:
$$PR(p_i) = \frac{1 - \alpha}{N} + \alpha \sum_{p_j \in M_{p_i}} \frac{PR(p_j)}{L(p_j)}$$
where $M_{p_i}$ is the set of all authors who collaborate directly with the person $p_i$, $L(p_j)$ is the number of collaborators of author $p_j$, $N$ is the total number of authors, and $\alpha$ typically takes the value 0.85.
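The PageRank-based influence can be sketched with plain power iteration over an undirected co-authorship graph. This is an illustrative sketch rather than the patent's implementation: it assumes that L(p_j) is the number of direct collaborators of p_j and that every author has at least one collaborator. The author labels are hypothetical.

```python
def pagerank(coauthors, alpha=0.85, iterations=100):
    """Power iteration for PageRank on an undirected co-authorship graph.

    coauthors: {author: set of direct collaborators}. L(p_j) is taken to be
    the number of collaborators of p_j (an assumption about the source).
    """
    n = len(coauthors)
    rank = {a: 1.0 / n for a in coauthors}
    for _ in range(iterations):
        new_rank = {}
        for author, partners in coauthors.items():
            # Each collaborator passes on an equal share of its own rank.
            incoming = sum(rank[p] / len(coauthors[p]) for p in partners)
            new_rank[author] = (1.0 - alpha) / n + alpha * incoming
        rank = new_rank
    return rank

# Hypothetical network: author "A" has collaborated with all the others.
coauthors = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
ranks = pagerank(coauthors)
```

The ranks sum to 1, and the central author "A" receives the highest value.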
The team clustering coefficient characterizes how closely the project members collaborate within a field. The clustering coefficient $C_i$ expresses the degree to which the neighbors of node $i$ are connected to one another and is defined as
$$C_i = \frac{2 M_i}{k_i (k_i - 1)}$$
where $k_i$ is the degree of node $i$ and $M_i$ is the number of edges that actually exist among the $k_i$ neighbors of node $i$. The degree is
$$k_i = \sum_j w_{ij}$$
where $w_{ij}$ denotes the weight of a collaboration network edge, namely the number of collaborations between node $i$ and node $j$.
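A short sketch of the clustering coefficient and the weighted degree. One assumption should be flagged: the sketch uses the number of neighbors as k_i inside the clustering coefficient (the usual unweighted definition) and reports the weighted degree, the sum of collaboration counts, as a separate quantity. The network and its weights are hypothetical.

```python
def clustering_coefficient(adj, node):
    """C_i = 2 * M_i / (k_i * (k_i - 1)) using the number of neighbors as k_i."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    # M_i: edges that actually exist between pairs of neighbors.
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in adj[neighbors[a]]
    )
    return 2.0 * links / (k * (k - 1))

def weighted_degree(adj, node):
    """Sum of edge weights w_ij, i.e. total number of collaborations of node."""
    return sum(adj[node].values())

# Hypothetical weighted collaboration network: weights are joint-work counts.
adj = {
    "A": {"B": 3, "C": 1, "D": 2},
    "B": {"A": 3, "C": 4},
    "C": {"A": 1, "B": 4},
    "D": {"A": 2},
}
```

Here node A has three neighbors, of which only B and C collaborate with each other, so its coefficient is 1/3, while its weighted degree is 6.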
Step S2.3: Predict project feasibility with a multilayer neural network. A feature attribute set is obtained using the feature attribute representation mechanism and extraction method and serves as the feature input of the multilayer neural network. Project feasibility prediction is a binary classification problem, that is, the outcome can be expressed as supported or not supported. The feature attributes are fed into the model for training. Based on the back-propagation (BP) algorithm, and using a loss function, the common measure of the error between the predicted value and the actual value, the three steps of forward computation, backward computation, and gradient-based weight update are carried out to improve prediction accuracy and performance. In practical applications, related feature attributes can be combined; for example, the text content features extracted from the keywords and the title can, after weight calculation, be used together as a single evaluation index for feature input.
Specifically, the content feature and relationship network feature index values are used as model input, a weighted loss function and a binary classification learning target are defined, and feasibility classification is realized with an LSTM neural network.
For a binary prediction problem, the error between the predicted value and the actual value of a neural network model is usually measured with a loss function. The cross-entropy loss function is commonly used for binary classification problems. The cross entropy of a single sample is
$$L = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]$$
where $y$ is the true value in the experimental data and $\hat{y}$ is the value predicted by the model.
A weight $w_i$ is further introduced to balance the distribution of positive and negative samples:
[Equations: weighted cross-entropy loss and the definition of the weights $w_i$; images not available]
where $P$ and $N$ are the positive and negative samples, respectively, and $N$ is the total number of samples.
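The exact weighting used in the weighted cross-entropy loss is not recoverable from the text. A minimal sketch, assuming inverse class-frequency weights (total / (2 × class count)), which reduce to the plain cross entropy when the classes are balanced, might look like this. Both classes are assumed to be present in the labels; all names and sample values are hypothetical.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights (an assumption; the source formula is missing)."""
    total = len(labels)
    positives = sum(labels)
    negatives = total - positives
    return total / (2.0 * positives), total / (2.0 * negatives)

def weighted_cross_entropy(labels, predictions, eps=1e-12):
    """Mean weighted binary cross-entropy over all samples."""
    w_pos, w_neg = class_weights(labels)
    loss = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return loss / len(labels)

# Hypothetical imbalanced batch: one feasible project, three infeasible ones.
labels = [1, 0, 0, 0]
predictions = [0.6, 0.2, 0.1, 0.3]
loss = weighted_cross_entropy(labels, predictions)
```

With one positive and three negatives, the positive sample receives weight 2 and each negative sample weight 2/3, so errors on the rarer class count more.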
Finally, prediction is performed with the multilayer LSTM neural network model improved by the weighted cross-entropy loss function. Based on the prediction, it is determined whether the project is feasible. When the project is determined to be feasible, step S2.4 is executed; otherwise step S2.5 is executed.
Step S2.4: Map the scoring rules to the feature attributes and calculate the feature attribute values. Once it has been predicted whether the project is feasible, the project needs to be scored and its feasibility analyzed and evaluated. Different projects have different scoring rules, so the scoring rules of a project need to be decomposed. For example, research content and technical route can be combined into one scoring rule: technical advancement. That is, each scoring rule that has been set needs to be decomposed into machine-understandable text content features or relationship network features, and the corresponding feature attributes are weighted and combined to form that scoring rule. The manually defined rules are analyzed and decomposed to form the feature attributes for evaluating a given type of project, and the values of the corresponding feature attributes are then calculated using the feature calculation methods provided in step S2.2. Under rule-driven scoring, a standard value must be given for each rule, namely the result value that best matches the rule; the standard value may be defined as 1. When an actual project is evaluated, it is converted into features, the attribute values are calculated and normalized to [0, 1], the weights of the feature attributes are calculated, the result is compared with the standard value, and the resulting ratio multiplied by 100 is taken as the final evaluation value.
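The scoring described in this step (weighted combination of normalized feature values, ratio to the standard value, multiplied by 100) can be sketched as follows. The rule name, feature names, weights, and values are hypothetical.

```python
def rule_score(feature_values, weights, standard_value=1.0):
    """Score one rule: weighted feature value relative to the standard, times 100."""
    total_weight = sum(weights.values())
    combined = sum(weights[name] * feature_values[name] for name in weights) / total_weight
    return 100.0 * combined / standard_value

# Hypothetical rule "technical advancement" built from two content features.
weights = {"research_content": 0.6, "technical_route": 0.4}
values = {"research_content": 0.75, "technical_route": 0.60}
score = rule_score(values, weights)
```

With these inputs the weighted value is 0.69, which against a standard value of 1 gives a score of 69.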
Step S2.5: Predict and judge the feasibility of the project. First, conformity with the project guide is judged through the similarity between the content feature attributes and the text feature attributes of the guide.
Content similarity:
$$Sim(P, Q) = -\frac{1}{2}\left[ D_{KL}(P \,\|\, Q) + D_{KL}(Q \,\|\, P) \right]$$
where
$$D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$
that is, the similarity is calculated with the negative symmetric KL divergence formula.
A threshold is set; when the similarity exceeds it, the content is deemed consistent with the guide.
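A sketch of the similarity check using the negative symmetric KL divergence between two topic distributions. The averaging factor of 1/2, the smoothing constant, and the threshold value are assumptions; the source gives no numeric threshold. The distributions are hypothetical.

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def content_similarity(p, q):
    """Negative symmetric KL divergence: 0 for identical distributions, lower otherwise."""
    return -0.5 * (kl(p, q) + kl(q, p))

def matches_guide(p, q, threshold=-0.1):
    """threshold is a hypothetical value; the source does not give one."""
    return content_similarity(p, q) >= threshold

# Hypothetical topic distributions of a project and two guide texts.
project = [0.7, 0.2, 0.1]
guide_close = [0.6, 0.3, 0.1]
guide_far = [0.05, 0.15, 0.8]
```

Because the similarity is a negated divergence, it is at most 0, and values closer to 0 indicate a better match; `matches_guide(project, guide_close)` holds while `matches_guide(project, guide_far)` does not.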
Further, a project feasibility prediction mechanism based on the multilayer neural network is utilized to judge whether the project is feasible or infeasible.
Step S2.6: Provide the project feasibility analysis report. The feasibility analysis report contains the specific scores for the defined scoring rules. For the feasibility analysis, the TextRank keyword extraction algorithm is used to extract phrases and form an automatic summary that evaluation experts can excerpt and quote directly.
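The TextRank idea mentioned here ranks words by running PageRank over a word co-occurrence graph. The following is a simplified sketch of that idea, not a full TextRank implementation: it assumes the text is already tokenized and stop-word filtered, and the window size, damping factor, and sample tokens are hypothetical.

```python
def textrank_keywords(tokens, window=2, top_k=3, damping=0.85, iterations=50):
    """Rank words by PageRank over a co-occurrence graph (TextRank idea)."""
    graph = {word: set() for word in tokens}
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it within the window.
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    score = {word: 1.0 for word in graph}
    for _ in range(iterations):
        score = {
            word: (1.0 - damping)
            + damping * sum(score[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    ranked = sorted(graph, key=lambda w: (-score[w], w))
    return ranked[:top_k]

# Hypothetical, already tokenized and stop-word-filtered text.
tokens = ["feasibility", "prediction", "model", "feasibility", "report",
          "feasibility", "analysis", "report", "model"]
keywords = textrank_keywords(tokens)
```

In this sample, "feasibility" and "model" co-occur with the most distinct words and therefore rank highest.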
Step S2.7: The flow ends. The project feasibility evaluation and the project feasibility analysis report are provided.
In one example, different projects have different evaluation rules, but these rules can be refined and decomposed into different feature attributes and combined by weighted calculation. The main sections of a project feasibility report that express its feasibility comprise text content, such as the research content, key scientific problems, and technical route, and personnel relationship and influence network features, such as team composition and the team's research foundation. The relevant feature attributes are extracted and their values calculated; these values serve as the feature input of the subsequent binary classification algorithm for predicting feasibility, and the calculated value of each scoring rule is formed by weighted combination and output as a score.
Judging whether a project is feasible mainly involves extracting two types of features: text and network relationships. The text features indicate the feasibility of the research part of the report, and the network relationships indicate the research foundation of the team. Taking a Natural Science Foundation application as an example, the keywords and topics extracted from sections such as the research content, research objectives, scientific problems to be solved, and research methods are taken as the content features of the text. The names, titles, and ages of the team members and the feasibility analysis section are extracted as basic data for the relationship network features, which are formed by combining this data with information such as papers and patents.
After the relevant feature attributes are obtained, the feature attribute values are calculated to form the input of the multilayer neural network. Before prediction, the model is trained on historical data with two classification labels, 0 and 1, where 0 denotes infeasible and 1 denotes feasible. The parameter values of the model are obtained by training the neural network model in combination with the experts' actual judgments.
The trained neural network model is then applied to the test data set, the prediction accuracy is checked, and a conclusion of feasible or infeasible is given.
For the scoring rules, three types of scoring criteria can be set, for example conformity with the guide, project advancement, and team strength. The three types of scoring rules are converted, respectively, into text similarity, a weighted combination of the content features of each section of the feasibility report, and team influence. Each is calculated with the corresponding feature calculation formula to give a score. For conformity with the guide, the text similarity is given, for example 80%. For project advancement, on the basis of a set standard value, the keywords of sections such as the research content, research objectives, scientific problems to be solved, and research methods are combined and calculated; if the calculated value is 0.69, the score is 69. For team influence, a value in [0, 1] is calculated and converted to a percentage to give a specific score, for example 87. For conformity with the guide, standard texts are set for different score levels, and specific opinions are given according to the score; for project advancement, summary information on the relevant content is formed automatically; for team strength, standard texts are set for different score levels, and specific opinions are given according to the score.
The embodiment of the invention provides a feature attribute representation mechanism and extraction method for scientific research reports, which identifies the content features, personnel relationships, and time features of a project feasibility report and converts them into an evaluation index system for project feasibility; a project feasibility prediction model based on a multilayer neural network, which performs binary classification prediction of project feasibility; and score calculation through the set scoring rules, which yields a project feasibility analysis report.
For example, extensive project review experience shows that a project application generally needs to match the content of the project guide. Provided the application is consistent with the guide, experts comprehensively evaluate factors such as the advancement of the research content, the feasibility of the technical route, and the team's capability and research foundation, which reflect the team's strength. During evaluation, projects are screened and filtered according to whether the research content is a research hotspot, whether the project is a repeated application, and so on. In actual project evaluation, the relevant text topic features and personnel relationship features are extracted from the project feasibility report and expressed as different feature attributes and evaluation indexes, and their feature values are calculated. On this basis, an algorithm model is used to judge the feasibility of the project and to provide the basis for the feasibility calculation. By using this method and process, review experts can speed up project review and quickly judge whether a project has already been applied for or whether its research content duplicates earlier work.
It will be obvious that many variations of the invention described herein are possible without departing from the true spirit and scope of the invention. Accordingly, all changes which would be obvious to one skilled in the art are intended to be included within the scope of this invention as defined by the appended claims. The scope of the invention is only limited by the claims.

Claims (10)

1. A project feasibility prediction analysis method based on scientific research multidimensional characteristics, characterized by comprising the following steps:
step one, defining a classification and representation method for project feasibility feature attributes, expressing project feasibility features and constraints in a fine-grained manner, and learning the weights of the feature attributes by principal component analysis;
step two, extracting and calculating the evaluation feature attributes: extracting the content features and the relationship network features, and calculating the values of the corresponding evaluation feature attributes;
step three, predicting project feasibility based on a multilayer neural network;
step four, mapping the scoring rules to the feature attributes and calculating the feature attribute values;
step five, analyzing project feasibility based on the scoring rules;
step six, providing a project feasibility analysis report.
2. The method according to claim 1, wherein in step two, the content features comprise topics, keywords, research content, scientific problems, and keywords extracted from technical route paragraphs; the relationship network features comprise the centrality of a scholar's position in the person collaboration network, the PageRank-based influence of a person within the field, the team clustering coefficient, the influence of related achievements, the heat of keywords, and team strength; the content features are represented and calculated with the LDA document topic generation model, with the following formula:
Figure FDA0002490342870000011
Figure FDA0002490342870000012
represents the probability distribution of text feature attribute po_j over topic lt_i, calculated by the LDA model;
Figure FDA0002490342870000013
a larger value indicates a higher correlation between text feature attribute po_j and topic lt_i;
Figure FDA0002490342870000014
represents the sum of the probability distributions of all text feature attributes over the topic.
3. The method according to claim 2, wherein in step two, a time-based feature is additionally considered, namely the heat of the research direction; the heat of the research direction can be expressed as the heat of its keywords, given by the following formula:
Figure FDA0002490342870000015
wherein Num(Δt) represents the number within a specified time period, and sum(kw) represents the number of keywords obtained; N_max is used to normalize the result to a value in [0,1].
4. The method according to claim 1 or 2, wherein in step two, the relationship network features are calculated as follows:
the centrality of a project member is calculated by the following formula:
Figure FDA0002490342870000021
wherein N represents the total number of nodes in the scholar relationship network; the formula uses the shortest path between any two nodes i and j in the network, which is usually calculated with Dijkstra's algorithm;
the PageRank value is calculated by the following formula:
Figure FDA0002490342870000022
wherein
Figure FDA0002490342870000023
is the set of all authors who have a direct collaboration relationship with person p_i, L(p_j) is the number of collaborators of author p_j, N is the total number of authors, and α is typically 0.85;
the team clustering coefficient is defined as
Figure FDA0002490342870000024
wherein the clustering coefficient C_i expresses the degree of interconnection among all neighboring nodes of node i, k_i is the degree of node i, and M_i is the number of edges that actually exist among the k_i neighboring nodes of node i; the degree is
Figure FDA0002490342870000025
wherein w_ij represents the weight of the collaboration network edge, namely the number of collaborations between node i and node j.
5. The method according to claim 1, wherein step three specifically comprises: taking the content feature and relationship network feature index values as model input, defining a weighted loss function and a binary classification learning objective, and performing feasibility classification based on an LSTM neural network.
6. The method of claim 5, wherein for the binary classification prediction problem, the error between the predicted value and the actual value of the neural network model is usually measured by a loss function; the cross-entropy loss function is commonly used for binary classification problems; for a single sample, the cross entropy is:
Figure FDA0002490342870000026
Figure FDA0002490342870000027
where y is the true value in the experimental data, and
Figure FDA0002490342870000028
represents the predicted value of the model;
a weight w_i is further introduced to balance the distribution of positive and negative samples:
Figure FDA0002490342870000031
Figure FDA0002490342870000032
Figure FDA0002490342870000033
where p and N are the positive and negative samples respectively, and N is the total number of samples;
finally, prediction is performed with the multilayer LSTM neural network model improved by the weighted cross-entropy loss function.
7. The method according to claim 1, wherein step four specifically comprises: analyzing and decomposing the manually defined rules to form the feature attributes for a given type of project review, and on this basis calculating the values of the corresponding feature attributes; the feature values are calculated with the feature calculation methods provided; under the scoring rules, a standard value must be given when each rule is scored, namely the result value that best conforms to the rule; the standard value can be defined as 1; when an actual project is evaluated, it is converted into features, the attribute values are calculated and normalized to values in [0,1], the weights of the feature attributes are calculated, the calculated result is compared with the standard value, and the resulting ratio is multiplied by 100 to give the final evaluation value.
8. The method according to claim 1, wherein step five specifically comprises: first, the conformity with the project guide is judged from the similarity between the content feature attributes and the feature attributes of the guide text;
the content similarity is
Figure FDA0002490342870000034
wherein
Figure FDA0002490342870000035
that is, it is calculated using the negative symmetric KL divergence formula;
a threshold is set, and when the similarity exceeds the threshold, the project is deemed consistent with the guide content.
9. The method of claim 8, wherein a project feasibility prediction mechanism based on a multi-layer neural network is utilized to give a feasible or infeasible judgment to the project.
10. The method of claim 1, wherein the feasibility analysis report comprises the specific scores under the defined scoring rules, and for feasibility, phrases are extracted and an automatic summary is formed using the TextRank keyword extraction algorithm.
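The guide conformity check of claim 8 can be sketched as follows. This is a hedged illustration only: it assumes that the two texts are already represented as topic probability distributions, that the similarity is the negated average of the two directed KL divergences, and that the threshold value is a free parameter. The function names, the smoothing constant, and the threshold of -0.1 are hypothetical.

```python
import math


def smooth(dist, eps=1e-9):
    """Add a small constant and renormalize so no probability is exactly zero."""
    total = sum(dist) + eps * len(dist)
    return [(x + eps) / total for x in dist]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def similarity(p, q):
    """Negative symmetric KL divergence: 0 for identical distributions, lower otherwise."""
    p, q = smooth(p), smooth(q)
    return -0.5 * (kl(p, q) + kl(q, p))


def matches_guide(project_topics, guide_topics, threshold=-0.1):
    """A project is deemed consistent with the guide when similarity exceeds the threshold."""
    return similarity(project_topics, guide_topics) > threshold


guide = [0.7, 0.2, 0.1]        # hypothetical topic distribution of the guide text
close = [0.6, 0.3, 0.1]        # a proposal on nearly the same topics
distant = [0.1, 0.2, 0.7]      # a proposal dominated by a different topic
```

Because the value is a negated divergence, it is never positive; a threshold therefore has to be chosen below zero, and a proposal whose topic distribution stays near the guide passes while one dominated by a different topic does not.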
CN202010403375.8A 2020-05-13 2020-05-13 Project feasibility prediction analysis method based on scientific research multidimensional features Active CN111598331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010403375.8A CN111598331B (en) 2020-05-13 2020-05-13 Project feasibility prediction analysis method based on scientific research multidimensional features


Publications (2)

Publication Number Publication Date
CN111598331A true CN111598331A (en) 2020-08-28
CN111598331B CN111598331B (en) 2023-07-07

Family

ID=72191481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010403375.8A Active CN111598331B (en) 2020-05-13 2020-05-13 Project feasibility prediction analysis method based on scientific research multidimensional features

Country Status (1)

Country Link
CN (1) CN111598331B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737285A (en) * 2012-06-15 2012-10-17 北京理工大学 Back propagation (BP) neural network-based appropriation budgeting method for scientific research project
CN103455596A (en) * 2013-09-02 2013-12-18 广东省计算中心 Science and technology project establishment evaluation method based on big data
WO2016053147A1 (en) * 2014-09-30 2016-04-07 Федеральное Государственное Бюджетное Образовательное Учреждение Высшего Образования "Российская Академия Народного Хозяйства И Государственной Службы При Президенте Российской Федерации" Evaluation of scientific research projects for correspondence to a world-class research level
CN109272228A (en) * 2018-09-12 2019-01-25 石家庄铁道大学 Scientific research influence power analysis method based on Research Team's cooperative network


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298377A (en) * 2021-05-21 2021-08-24 建信金融科技有限责任公司 Method and device for screening items in enterprise research and development expense and deduction
CN113298377B (en) * 2021-05-21 2023-06-16 建信金融科技有限责任公司 Method and device for screening project in enterprise development cost addition and deduction
CN116596395A (en) * 2023-05-29 2023-08-15 深圳市中联信信息技术有限公司 Operation quality control platform for engineering project evaluation unit guidance and detection
CN116596395B (en) * 2023-05-29 2023-12-01 深圳市中联信信息技术有限公司 Operation quality control platform for engineering project evaluation unit guidance and detection

Also Published As

Publication number Publication date
CN111598331B (en) 2023-07-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant