CN114169731A - Scientific research institution rating system, method, equipment and storage medium - Google Patents

Publication number
CN114169731A
Authority
CN
China
Prior art keywords
scientific research
output
research institution
characteristic field
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111451990.7A
Other languages
Chinese (zh)
Inventor
樊宇航
徐鹏景
袁华
朱悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Science And Technology Development Co ltd
Original Assignee
Shanghai Science And Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Science And Technology Development Co ltd
Priority to CN202111451990.7A
Publication of CN114169731A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Abstract

The application provides a scientific research institution rating system, method, device and storage medium. Scientific research data related to scientific research institutions is acquired and cleaned; the cleaned data is associated and aggregated to obtain statistical indexes, index characteristic fields are retained through dimension reduction, and the index characteristic fields are divided into input elements and output elements, each covering multiple dimensions; the output scores of each scientific research institution in the multiple dimensions are calculated by an entropy method from the output characteristic fields under the output elements; and a random forest classification tree model is constructed to determine the evaluation level of each scientific research institution according to the input characteristic fields under the input elements and the output scores. The application improves the efficiency and effect of expert review when large amounts of data are involved and provides good decision support; the algorithm is simple and easy to implement, saves labor cost, and shortens the evaluation period.

Description

Scientific research institution rating system, method, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a scientific research institution rating system, method, device, and storage medium.
Background
In recent years, the state has continuously increased its investment in scientific research institutions, aiming at the international scientific frontier and at realizing national development strategy goals. Shanghai, which aims to build a global science and technology innovation center, urgently needs to evaluate the input and output benefits of three main types of scientific research institution (engineering centers, key laboratories and professional technical service platforms), so as to optimize the allocation of the corresponding resources and achieve the purpose of "promoting construction through evaluation". At present, each evaluated unit fills in annual report information in the system, including basic unit information, personnel information, paper and patent output information, supporting project information and the like. Although this is structured data, it has many dimensions, covers many aspects, and is of uneven quality, so the annual report data needs to be reasonably classified and reduced in dimension in order to construct an evaluation index system for scientific research institutions. For the final grading, the conventional approach is to organize experts in related fields to conduct a review. Although authoritative, this approach is strongly subjective; moreover, because the volume of objective reported data is large, manual comparison and judgment is inefficient and error-prone, and the whole evaluation period is too long.
With machine learning, relevant features can be learned from multi-dimensional data and scientific research institutions can be graded according to those features, so that the machine learning method ultimately serves a function similar to that of expert review.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present application is to provide a scientific research institution rating system, method, device and storage medium to solve the problems of grading scientific research institutions in the prior art.
To achieve the above and other related objects, the present application provides a scientific research institution rating system, the system comprising: an acquisition module for acquiring scientific research data related to scientific research institutions and cleaning the data; a feature module for associating and aggregating the cleaned scientific research data to obtain statistical indexes, retaining index characteristic fields through dimension reduction, and dividing the index characteristic fields into input elements and output elements, each covering multiple dimensions; a scoring module for calculating, by an entropy method, the output scores of each scientific research institution in the multiple dimensions according to the output characteristic fields under the output elements; and a rating module for constructing a random forest classification tree model and determining the evaluation level of each scientific research institution according to the input characteristic fields under the input elements and the output scores.
In an embodiment of the present application, the data cleaning includes any one or a combination of: deleting duplicate values, filling in missing values, unifying units of measurement, and mapping numeric codes to text.
In an embodiment of the present application, the dimension reduction includes: retaining, as index characteristic fields, those fields of each statistical index whose proportion of non-zero values is higher than a preset value.
In an embodiment of the present application, the scoring module is configured to: construct an original matrix from the statistics of all output characteristic fields for all scientific research institutions; calculate the contribution degree of each scientific research institution to each output characteristic field; calculate the weight corresponding to each output characteristic field according to the contribution degrees; and multiply the statistic corresponding to each output characteristic field by its weight, and add the weighted values belonging to the same output element to obtain the output score of each dimension.
In an embodiment of the present application, in the scoring module, the original matrix M is:
$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
wherein M is the original matrix, $x_{ij}$ is the statistic of the i-th scientific research institution $A_i$ for the j-th output characteristic field $X_j$, m represents the number of scientific research institutions, and n represents the number of output characteristic fields. The contribution degree of each scientific research institution to each output characteristic field is:
$$P_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}$$
wherein $P_{ij}$ represents the contribution degree of the i-th scientific research institution $A_i$ to the j-th output characteristic field $X_j$. $E_j$ represents the total contribution of all scientific research institutions to attribute $X_j$:
$$E_j = -K \sum_{i=1}^{m} P_{ij} \ln(P_{ij})$$
wherein $\ln(P_{ij})$ denotes the logarithm of $P_{ij}$ to base e; $K = 1/\ln(m)$, and m is the total number of scientific research institutions. The weight $W_j$ corresponding to each output characteristic field is:
$$D_j = 1 - E_j$$
$$W_j = \frac{D_j}{\sum_{j=1}^{n} D_j}$$
In an embodiment of the present application, the rating module is configured such that: the input features are the input characteristic fields of each scientific research institution under the input elements; the target output is the evaluation level divided according to the output scores; a random forest classification tree model is constructed, initial parameters are set, and the parameters are tuned by grid search; the initial parameters include any one or more of the number of estimators, the depth of the classification trees, and the minimum number of samples per leaf node.
To achieve the above and other related objects, the present application provides a scientific research institution rating method, the method comprising: acquiring scientific research data related to scientific research institutions and cleaning the data; associating and aggregating the cleaned scientific research data to obtain statistical indexes, retaining index characteristic fields through dimension reduction, and dividing the index characteristic fields into input elements and output elements, each covering multiple dimensions; calculating, by an entropy method, the output scores of each scientific research institution in the multiple dimensions according to the output characteristic fields under the output elements; and constructing a random forest classification tree model, and determining the evaluation level of each scientific research institution according to the input characteristic fields under the input elements and the output scores.
To achieve the above and other related objects, the present application provides a computer apparatus, comprising: a memory, and a processor; the memory is to store computer instructions; the processor executes computer instructions to implement the functionality of the system as described above.
To achieve the above and other related objects, the present application provides a computer readable storage medium storing computer instructions which, when executed, perform the functions of the system as described above.
In summary, with the scientific research institution rating system, method, device and storage medium of the present application, scientific research data related to scientific research institutions is acquired and cleaned; the cleaned data is associated and aggregated to obtain statistical indexes, index characteristic fields are retained through dimension reduction, and the index characteristic fields are divided into input elements and output elements, each covering multiple dimensions; the output scores of each scientific research institution in the multiple dimensions are calculated by an entropy method from the output characteristic fields under the output elements; and a random forest classification tree model is constructed to determine the evaluation level of each scientific research institution according to the input characteristic fields under the input elements and the output scores.
The application has the following beneficial effects:
The application improves the efficiency and effect of expert review when large amounts of data are involved and provides good decision support; the algorithm is simple and easy to implement, saves labor cost, and shortens the evaluation period.
Drawings
FIG. 1 is a block diagram of a scientific research institution rating system according to an embodiment of the present application.
FIG. 2 is a flowchart illustrating a scientific research institution rating method according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present application in a schematic way. The drawings show only the components related to the present application and are not drawn according to the number, shape and size of the components in actual implementation; in actual implementation, the type, quantity and proportion of the components may be changed at will, and the layout of the components may be more complex.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
In order to solve the above problems, the application provides a scientific research institution rating system, method, device and storage medium, in which an evaluation index system for scientific research institutions is constructed mainly on the basis of an entropy method and a random forest, and institutions are graded according to that index system, thereby improving the efficiency of the review process and making it automated and streamlined.
FIG. 1 is a block diagram of a scientific research institution rating system according to one embodiment of the present application. As shown, the system 100 includes:
the acquisition module 101 is configured to acquire scientific research data related to a scientific research institution and perform data cleaning.
In some examples, the annual report data tables for each dimension of the scientific research institutions are downloaded from a Hive database by writing SQL queries. The scientific research data includes, but is not limited to, any one or more of basic information, supporting unit information, personnel information, paper information and project information. Each type of information may be exported as a separate Excel file.
In the application, the acquired scientific research data is cleaned in order to ensure the quality of the data subsequently fed to the model. The data cleaning includes any one or a combination of: deleting duplicate values, filling in missing values, unifying units of measurement, and mapping numeric codes to text. For example, duplicate records of papers, projects and the like are deleted to ensure the accuracy of subsequent counts; some missing values are filled in from an external data source; units of measurement are made consistent for fields such as instrument usage duration and monetary amounts; and numeric codes are mapped to text to facilitate understanding and subsequent aggregation.
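The four cleaning operations described above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the record layout, the field names (`institution_id`, `paper_title`, `host_unit`, `instrument_duration`, `duration_unit`, `title_code`) and the lookup tables are hypothetical and do not come from the application.

```python
def clean_records(records, external_source, unit_to_hours, title_codes):
    """Return cleaned copies of annual-report records (hypothetical layout)."""
    seen = set()
    cleaned = []
    for rec in records:
        # 1. Delete repeated records (same institution and same paper title).
        key = (rec["institution_id"], rec["paper_title"])
        if key in seen:
            continue
        seen.add(key)
        row = dict(rec)
        # 2. Supplement missing values from an external data source.
        fallback = external_source.get(row["institution_id"], {})
        for field, value in fallback.items():
            if row.get(field) is None:
                row[field] = value
        # 3. Unify measurement units: express instrument usage in hours.
        duration = row.pop("instrument_duration")
        unit = row.pop("duration_unit")
        row["instrument_hours"] = duration * unit_to_hours[unit]
        # 4. Map numeric codes to readable text.
        row["title"] = title_codes.get(row.pop("title_code"), "unknown")
        cleaned.append(row)
    return cleaned


# Illustrative data only; not taken from the application.
records = [
    {"institution_id": "I001", "paper_title": "Paper A", "host_unit": None,
     "instrument_duration": 2, "duration_unit": "day", "title_code": 1},
    {"institution_id": "I001", "paper_title": "Paper A", "host_unit": None,
     "instrument_duration": 2, "duration_unit": "day", "title_code": 1},
    {"institution_id": "I002", "paper_title": "Paper B", "host_unit": "Institute Y",
     "instrument_duration": 12, "duration_unit": "hour", "title_code": 2},
]
cleaned = clean_records(
    records,
    external_source={"I001": {"host_unit": "University X"}},
    unit_to_hours={"hour": 1, "day": 24},
    title_codes={1: "senior", 2: "intermediate"},
)
```

In practice the duplicate key and the unit table would depend on the actual annual report schema, which the application does not specify.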
The feature module 102 is configured to perform correlation aggregation on the cleaned scientific research data to obtain a statistical index, retain an index feature field through dimension reduction processing, and divide the index feature field into input elements with multiple dimensions and output elements with multiple dimensions.
In short, the application performs aggregate calculations on related tables such as personnel, papers and projects, and joins the resulting data tables. For example, aggregation yields statistical indexes of overall research volume, such as the total number of personnel and the total number of papers, as well as finer-grained indexes, such as the number of personnel with senior professional titles and the total funding of provincial-level projects.
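As a rough sketch of this aggregation step, the following Python groups three hypothetical tables by institution and joins the counts into one row per institution. The table layouts and index names (`total_staff`, `senior_staff`, `total_papers`, `provincial_project_amount`) are assumptions made for illustration.

```python
from collections import defaultdict


def new_row():
    """Empty statistics row for one institution (hypothetical index names)."""
    return {"total_staff": 0, "senior_staff": 0,
            "total_papers": 0, "provincial_project_amount": 0.0}


def aggregate(personnel, papers, projects):
    """Aggregate per-institution indexes and join them on institution_id."""
    stats = defaultdict(new_row)
    for person in personnel:
        row = stats[person["institution_id"]]
        row["total_staff"] += 1                      # overall volume index
        if person["title"] == "senior":
            row["senior_staff"] += 1                 # fine-grained index
    for paper in papers:
        stats[paper["institution_id"]]["total_papers"] += 1
    for project in projects:
        if project["level"] == "provincial":
            row = stats[project["institution_id"]]
            row["provincial_project_amount"] += project["amount"]
    return dict(stats)


# Illustrative data only.
personnel = [
    {"institution_id": "I001", "title": "senior"},
    {"institution_id": "I001", "title": "intermediate"},
    {"institution_id": "I002", "title": "senior"},
]
papers = [{"institution_id": "I001"}, {"institution_id": "I001"},
          {"institution_id": "I002"}]
projects = [
    {"institution_id": "I001", "level": "provincial", "amount": 50.0},
    {"institution_id": "I001", "level": "national", "amount": 200.0},
    {"institution_id": "I002", "level": "provincial", "amount": 30.0},
]
stats = aggregate(personnel, papers, projects)
```

Keying every table on the same institution identifier is what makes the join trivial here; real annual report tables may need an explicit key mapping first.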
In an embodiment of the present application, the dimension reduction includes: retaining, as index characteristic fields, those fields of each statistical index whose proportion of non-zero values is higher than a preset value.
In short, the dimension reduction removes fields whose proportion of zero values is too high, so that the data matrix fed to the model is not too sparse. For example, fields whose proportion of non-zero values is higher than 40% are retained as index characteristic fields.
It should be noted that, after the annual report data uploaded by the scientific research institutions is aggregated, some fields have an excessively high proportion of zero values. Such fields negatively affect the entropy method and the training of the random forest model, and indicate that the feature contributes little to distinguishing the levels of scientific research institutions. Therefore, only fields whose proportion of non-zero values is higher than the threshold are retained as features for subsequent training.
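A minimal sketch of this non-zero-ratio filter is shown below. The function name `keep_dense_fields`, the field names and the sample values are hypothetical; the 40% threshold is the example value given in the text, and "higher than" is read as a strict comparison.

```python
def keep_dense_fields(stats, threshold=0.4):
    """Return the fields whose share of non-zero values exceeds threshold.

    stats maps an institution id to a dict of aggregated statistical indexes.
    """
    fields = sorted({name for row in stats.values() for name in row})
    total = len(stats)
    kept = []
    for name in fields:
        non_zero = sum(1 for row in stats.values() if row.get(name, 0) != 0)
        if non_zero / total > threshold:   # strictly higher than the preset value
            kept.append(name)
    return kept


# Illustrative data only: five institutions, three aggregated fields.
sample_stats = {
    "I001": {"papers": 12, "patents": 3, "awards": 1},
    "I002": {"papers": 7, "patents": 0, "awards": 0},
    "I003": {"papers": 4, "patents": 2, "awards": 0},
    "I004": {"papers": 9, "patents": 0, "awards": 2},
    "I005": {"papers": 1, "patents": 5, "awards": 0},
}
kept_fields = keep_dense_fields(sample_stats)
```

Here `papers` (5 of 5 non-zero) and `patents` (3 of 5) are kept, while `awards` (2 of 5, exactly 40%) is dropped because it does not exceed the threshold.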
Further, the index characteristic fields are divided into input elements and output elements, each containing multiple dimensions, which facilitates the calculation of output scores and the training of the random forest model.
In some examples, the input elements are mainly the indexes that, after the above processing, clearly reflect the investment in a scientific research institution, such as annual research and development investment, the number of personnel by professional title and by age group, and the number and funding amounts of undertaken projects at different levels. The output elements may be divided into several dimensions, such as academic output, technical output, talent output and service output. Specifically, academic output mainly refers to indexes related to academic literature, such as the number of papers produced; technical output mainly refers to indexes related to technical capability, such as patents and standards; service output mainly refers to indexes related to external services, such as the service duration of open instruments; and talent output mainly refers to indexes related to talent cultivation, such as the number of doctoral students trained.
And the scoring module 103 is used for calculating the output scores of the plurality of dimensions corresponding to each research institution by using an entropy method according to the output feature fields under the output elements.
In particular, the scoring module is configured to:
A. Constructing an original matrix from the statistics of all output characteristic fields for all scientific research institutions.
For example, the original matrix M is:
$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
wherein M is the original matrix, $x_{ij}$ is the statistic of the i-th scientific research institution $A_i$ for the j-th output characteristic field $X_j$, m represents the number of scientific research institutions, and n represents the number of output characteristic fields.
B. Calculating the contribution degree of each scientific research institution to each output characteristic field.
Briefly, the contribution degree is the ratio of an institution's statistic for an output characteristic field to the sum of the statistics of all institutions for that same field. In other words, each value of the original matrix is divided by the sum of its column, forming a new contribution matrix. Therefore, the contribution degree of each scientific research institution to each output characteristic field can be expressed as:
$$P_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}$$
wherein $P_{ij}$ represents the contribution degree of the i-th scientific research institution $A_i$ to the j-th output characteristic field $X_j$.
C. Calculating the weight corresponding to each output characteristic field according to the contribution degrees.
Let $E_j$ represent the total contribution of all scientific research institutions to attribute $X_j$:
$$E_j = -K \sum_{i=1}^{m} P_{ij} \ln(P_{ij})$$
wherein $\ln(P_{ij})$ denotes the logarithm of $P_{ij}$ to base e; $K = 1/\ln(m)$, and m is the total number of scientific research institutions;
the weight $W_j$ corresponding to each output characteristic field is:
$$D_j = 1 - E_j$$
$$W_j = \frac{D_j}{\sum_{j=1}^{n} D_j}$$
D. Multiplying the statistic corresponding to each output characteristic field by its weight, and adding the weighted values belonging to the same output element to obtain the output score of each dimension.
Briefly, each weight $W_j$ obtained above is specific to one output characteristic field. In the actual calculation, the statistic of each output characteristic field is first multiplied by its weight to obtain a weighted value, and the weighted values belonging to the same output element are then added to obtain the output score of one dimension. The output scores of all dimensions are obtained in the same way.
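Steps A to D can be sketched in standard-library Python as below. This is an illustrative reading of the entropy method as described, not the applicant's implementation. It assumes at least two institutions, non-negative statistics with a positive sum in every column, and the common convention that a zero contribution adds nothing to the entropy; the function names, the sample matrix and the dimension labels are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy-method weights W_j for the columns of an m x n matrix."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)                       # K = 1 / ln(m), requires m >= 2
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    divergences = []
    for j in range(n):
        e_j = 0.0
        for i in range(m):
            p_ij = matrix[i][j] / col_sums[j]   # contribution degree P_ij
            if p_ij > 0:                        # assumption: 0 * ln(0) counts as 0
                e_j -= k * p_ij * math.log(p_ij)
        divergences.append(1.0 - e_j)           # D_j = 1 - E_j
    total = sum(divergences)
    return [d / total for d in divergences]     # W_j = D_j / sum of D_j


def dimension_scores(matrix, weights, field_dimensions):
    """Weighted sum of each institution's statistics per output dimension."""
    scores = []
    for row in matrix:
        per_dimension = {}
        for x, w, dim in zip(row, weights, field_dimensions):
            per_dimension[dim] = per_dimension.get(dim, 0.0) + w * x
        scores.append(per_dimension)
    return scores


# Illustrative matrix: 3 institutions (rows) x 3 output characteristic fields.
M = [
    [10, 1, 0],
    [10, 2, 0],
    [10, 3, 6],
]
weights = entropy_weights(M)
scores = dimension_scores(M, weights, ["academic", "academic", "technical"])
```

A field on which all institutions are identical (the first column) gets a weight of essentially zero, while a field concentrated in one institution (the third column) gets the largest weight, which is the behaviour the entropy method is meant to produce. Note that, following the text, the raw statistics are weighted directly, without any prior normalization.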
And the rating module 104 is used for constructing a random forest classification tree model and determining the evaluation level of each scientific research institution according to the input characteristic field under the input elements and the output score.
In an embodiment of the present application, the rating module is configured to:
A. The input features are the input characteristic fields of each scientific research institution under the input elements; the target output is the evaluation level divided according to the output scores;
B. Constructing a random forest classification tree model, setting initial parameters, and tuning the parameters by grid search; the initial parameters include any one or more of the number of estimators, the depth of the classification trees, and the minimum number of samples per leaf node.
For example, the data of the scientific research institutions with grading results may first be divided into a training set and a test set, for example 70% for training and 30% for testing.
Then, all input characteristic fields under the input elements of the scientific research institutions are used as input features, and the levels divided according to the output scores are used as the target output. For example, input characteristic fields such as research and development investment, high-tech ratio and amount of charge may be used as input features.
Finally, a random forest classification tree model is constructed and initial parameters are set, including but not limited to the number of estimators, the depth of the classification trees, and the minimum number of samples per leaf node. The parameters are then tuned by grid search to optimize the model, so that prediction precision and recall reach the desired level.
In some examples, the classification model may be trained using existing grading information, so that scientific research institutions are classified into four levels, A, B, C and D, based on the input characteristic fields.
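The training procedure above might look roughly like the following sketch, which assumes NumPy and scikit-learn are available. The data is synthetic, and the rule that derives the A to D levels from quartiles of a single output score is an assumption for illustration only, since the application does not state how the levels are cut. The three tuned parameters correspond to scikit-learn's `n_estimators`, `max_depth` and `min_samples_leaf`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n_institutions = 200

# Synthetic input characteristic fields (hypothetical, three columns).
X = rng.random((n_institutions, 3))
# Synthetic output score loosely driven by the inputs (illustration only).
output_score = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.02, n_institutions)

# Assumption: levels D, C, B, A are the quartiles of the output score.
cuts = np.quantile(output_score, [0.25, 0.5, 0.75])
levels = np.array(["D", "C", "B", "A"])[np.searchsorted(cuts, output_score)]

# 70% training set, 30% test set, as in the example above.
X_train, X_test, y_train, y_test = train_test_split(
    X, levels, test_size=0.3, random_state=0, stratify=levels
)

# Grid search over estimator count, tree depth and minimum leaf size.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

predicted = search.predict(X_test)
precision = precision_score(y_test, predicted, average="macro", zero_division=0)
recall = recall_score(y_test, predicted, average="macro", zero_division=0)
```

The grid values and the accuracy scoring are arbitrary choices for the sketch; a real evaluation would pick the grid and the selection metric to match the precision and recall targets mentioned above.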
Briefly, the present application provides a scientific research institution rating system 100 based on quantitative statistics and machine learning, with the following advantages and effects: based on the objective annual report data of scientific research institutions, the application reasonably reduces the dimensionality of redundant data to form a concise and effective index system that facilitates subsequent analysis; the application improves the efficiency and effect of expert review when large amounts of data are involved and provides good decision support; and the algorithm is simple and easy to implement, saves labor cost, and shortens the evaluation period.
It should be noted that the division of the system 100 into modules is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, may all be implemented in hardware, or may be implemented partly as software invoked by a processing element and partly in hardware. For example, each module may be a separately arranged processing element, may be integrated in a chip of the apparatus, or may be stored in a memory of the system 100 in the form of program code that a processing element of the system 100 calls to execute the functions of the module. Other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code called by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, the modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
FIG. 2 is a flow chart illustrating a scientific research institution rating method according to an embodiment of the present application. As shown, the method comprises:
step S201: acquiring scientific research data related to a scientific research institution and cleaning the data;
step S202: performing correlation polymerization on the cleaned scientific research data to obtain a statistical index, preserving an index characteristic field through dimension reduction processing, and dividing the index characteristic field into input elements and output elements with multiple dimensions;
step S203: calculating output scores of multiple dimensionalities corresponding to each research institution by using an entropy method according to the output characteristic fields under the output elements;
step S204: and constructing a random forest classification tree model, and determining the evaluation of each scientific research institution and the like according to the input characteristic fields under the input elements and the output scores.
It should be noted that the information interaction and execution processes of the above method are based on the same concept as the modules of the system described in this application and bring the same technical effects as the system; for details, refer to the description of the foregoing system embodiment, which is not repeated here.
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown, the computer device 300 includes: a memory 301 and a processor 302; the memory 301 is used for storing computer instructions; the processor 302 executes the computer instructions to implement the functions of the system described in FIG. 1.
In some embodiments, the computer device 300 may include one or more memories 301 and one or more processors 302; FIG. 3 takes one of each as an example.
In an embodiment of the present application, the processor 302 in the computer device 300 loads one or more instructions corresponding to the processes of the application program into the memory 301 according to the steps described in FIG. 1, and the processor 302 executes the application program stored in the memory 301, thereby implementing the functions of the system described in FIG. 1.
The memory 301 may include a Random Access Memory (RAM), or may also include a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 301 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an expanded set thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 302 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In some specific applications, the various components of the computer device 300 are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, however, the various buses are all referred to as the bus system in FIG. 3.
In an embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the functionality of the system as described in fig. 1.
The present application may be embodied as systems, methods, and/or computer program products, in any combination of technical details. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable programs described herein may be downloaded from a computer-readable storage medium to a variety of computing/processing devices, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Python, Smalltalk, C++ or the like, and procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present application.
In summary, the scientific research institution rating system, method, equipment and storage medium provided by the present application acquire scientific research data related to scientific research institutions and clean the data; perform correlation aggregation on the cleaned scientific research data to obtain statistical indexes, retain index characteristic fields through dimension reduction processing, and divide the index characteristic fields into input elements and output elements with multiple dimensions; calculate, by an entropy method, output scores in multiple dimensions for each scientific research institution according to the output characteristic fields under the output elements; and construct a random forest classification tree model and determine the evaluation level of each scientific research institution according to the input characteristic fields under the input elements and the output scores.
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the invention. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present application.

Claims (9)

1. A scientific research institution rating system, the system comprising:
the acquisition module is used for acquiring scientific research data related to a scientific research institution and cleaning the data;
the characteristic module is used for performing correlation aggregation on the cleaned scientific research data to obtain a statistical index, reserving an index characteristic field through dimension reduction processing, and dividing the index characteristic field into input elements and output elements with multiple dimensions;
the scoring module is used for calculating the output scores of the research institutions corresponding to the multiple dimensions by utilizing an entropy method according to the output characteristic fields under the output elements;
and the rating module is used for constructing a random forest classification tree model and determining the evaluation level of each scientific research institution according to the input characteristic field under the input elements and the output score.
2. The scientific research institution rating system of claim 1, wherein the data cleansing comprises any one or a combination of: deleting duplicate values, supplementing missing values, unifying measurement units, and mapping numeric codes to text.
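The four cleaning operations named in claim 2 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the application: the record layout, the field names (`institution`, `year`, `funding`, `unit`, `type_code`), the code table, the unit divisors, and the choice of mean imputation for missing values.

```python
from statistics import mean

# Hypothetical code table and unit divisors, for illustration only.
INSTITUTION_TYPE = {"1": "university", "2": "research institute"}
UNIT_DIVISOR = {"yuan": 10000.0, "10k_yuan": 1.0}


def clean_records(records):
    """Apply the four cleaning operations to a list of dict records."""
    # 1. Delete duplicate values: keep the first record per (institution, year).
    seen, unique = set(), []
    for rec in records:
        key = (rec["institution"], rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated

    # 2. Unify measurement units: express all funding in units of 10,000 yuan.
    for rec in unique:
        if rec["funding"] is not None:
            rec["funding"] = rec["funding"] / UNIT_DIVISOR[rec["unit"]]
        rec["unit"] = "10k_yuan"

    # 3. Supplement missing values with the mean of the known values
    #    (mean imputation is an assumption; the source names no strategy).
    known = [rec["funding"] for rec in unique if rec["funding"] is not None]
    fill = mean(known) if known else 0.0
    for rec in unique:
        if rec["funding"] is None:
            rec["funding"] = fill

    # 4. Map numeric codes to text labels.
    for rec in unique:
        rec["type"] = INSTITUTION_TYPE.get(rec["type_code"], "unknown")
    return unique
```

Units are unified before missing values are filled so that the imputed mean is computed on a single scale.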
3. The scientific research institution rating system of claim 1, wherein the dimension reduction process comprises: retaining, as index characteristic fields, the fields in each statistical index whose proportion of non-zero values is higher than a preset value.
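The dimension reduction rule of claim 3 amounts to a sparsity filter, which might be sketched in Python as follows. The function name `select_feature_fields`, the table layout (field name mapped to one value per institution), and the default threshold of 0.5 are assumptions for illustration; the application only speaks of "a preset value".

```python
def select_feature_fields(table, threshold=0.5):
    """Return the names of fields whose share of non-zero values
    is strictly higher than `threshold`.

    `table` maps a field name to a list of values, one per institution.
    """
    kept = []
    for field, values in table.items():
        if not values:
            continue  # an empty field carries no information
        nonzero_ratio = sum(1 for v in values if v != 0) / len(values)
        if nonzero_ratio > threshold:
            kept.append(field)
    return kept
```

With `{"papers": [12, 0, 7, 3], "patents": [0, 0, 0, 5]}` and the default threshold, only `papers` is retained, since three of its four values are non-zero while `patents` has only one.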
4. The scientific research institution rating system of claim 1, wherein the scoring module is configured to:
constructing an original matrix according to the statistics of all the output characteristic fields and all scientific research institutions;
calculating the contribution degree of each research institution to each output characteristic field;
calculating the weight corresponding to the output characteristic field according to each contribution degree;
and multiplying the statistic corresponding to each output characteristic field by its weight, and adding the weighted values belonging to the same output element to obtain the output score of each dimension.
5. The scientific research institution rating system of claim 4, wherein the scoring module is configured to:
the original matrix M is:
$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
wherein M is the original matrix, $x_{ij}$ is the statistic of the i-th scientific research institution $A_i$ for the j-th output characteristic field $X_j$, m represents the number of scientific research institutions, and n represents the number of output characteristic fields; A represents a scientific research institution;
the contribution degree of each scientific research institution to each output characteristic field is:
$$P_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}$$
wherein $P_{ij}$ represents the contribution degree of the i-th scientific research institution $A_i$ under the j-th output characteristic field, computed from the statistic $x_{ij}$;
$E_j$ represents the total contribution of all scientific research institutions to the attribute $X_j$:
$$E_j = -K \sum_{i=1}^{m} P_{ij} \ln(P_{ij})$$
wherein $\ln(P_{ij})$ denotes the natural (base-e) logarithm of $P_{ij}$; $K = 1/\ln(m)$, and m is the total number of scientific research institutions;
the weight $W_j$ corresponding to each output characteristic field is:
$$D_j = 1 - E_j$$
$$W_j = \frac{D_j}{\sum_{j=1}^{n} D_j}$$
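To make the scoring of claims 4 and 5 concrete, the following Python sketch computes weights in the standard form of the entropy weight method (contribution degree, entropy, divergence $D_j = 1 - E_j$, normalized weight) and then the weighted score per output dimension. It is a sketch under stated assumptions, not the applicant's implementation: the function names and the `dimensions` mapping from an output element to column indices are hypothetical, every column is assumed to have a positive sum, and at least one column is assumed to be non-uniform so that the divergences do not all vanish.

```python
import math


def entropy_weights(matrix):
    """Entropy-method weights for the columns of an m x n matrix of
    non-negative statistics (rows: institutions, columns: output fields).
    Assumes m >= 2 and a positive sum in every column."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)  # K = 1 / ln(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Contribution degree P_ij of institution i to field j.
        p = [x / total for x in column]
        # Entropy E_j; the term 0 * ln(0) is taken as 0 by convention.
        e_j = -k * sum(p_ij * math.log(p_ij) for p_ij in p if p_ij > 0)
        divergences.append(1.0 - e_j)  # D_j = 1 - E_j
    d_sum = sum(divergences)
    return [d / d_sum for d in divergences]  # W_j = D_j / sum of D_j


def output_scores(matrix, weights, dimensions):
    """Per institution, sum the weighted statistics of the fields that
    belong to each output dimension (`dimensions`: name -> column indices)."""
    return [
        {name: sum(row[j] * weights[j] for j in cols)
         for name, cols in dimensions.items()}
        for row in matrix
    ]
```

A field on which all institutions have identical statistics has entropy 1 and therefore weight 0, so it does not differentiate institutions in the score; fields with more uneven distributions receive larger weights.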
6. The scientific research institution rating system of claim 1, wherein the rating module is configured to:
taking, as input features, the input characteristic fields of each scientific research institution under the input elements, and taking, as target output, the evaluation level graded according to the output scores;
constructing a random forest classification tree model, setting initial parameters, and adjusting the initial parameters by a grid method; the initial parameters include any one or more of: the number of estimators, the classification tree depth, and the minimum number of samples of a leaf node.
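The model construction and grid-based parameter adjustment of claim 6 could be sketched with scikit-learn as below. This is only an illustration under assumptions: the application does not name a library, the data are synthetic stand-ins generated with `make_classification`, the three evaluation levels and the candidate parameter values are hypothetical, and how evaluation levels are derived from output scores is not specified in this section, so labels are simply generated. The three tuned parameters correspond to the number of estimators (`n_estimators`), the tree depth (`max_depth`), and the minimum number of samples of a leaf node (`min_samples_leaf`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: rows are institutions, columns are input
# characteristic fields, labels are evaluation levels 0..2 (all hypothetical).
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Candidate values for the three parameters named in the claim (assumed).
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 3],
}

# Exhaustive grid search with 3-fold cross-validation over all combinations.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_      # best combination found on the grid
predicted_levels = search.predict(X)   # levels from the refitted best model
```

`GridSearchCV` refits the best parameter combination on the full data by default, which is why `search.predict` can be called directly after `fit`.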
7. A method for rating a research institution, the method comprising:
acquiring scientific research data related to a scientific research institution and cleaning the data;
performing correlation aggregation on the cleaned scientific research data to obtain statistical indexes, retaining index characteristic fields through dimension reduction processing, and dividing the index characteristic fields into input elements and output elements with multiple dimensions;
calculating output scores of multiple dimensionalities corresponding to each research institution by using an entropy method according to the output characteristic fields under the output elements;
and constructing a random forest classification tree model, and determining the evaluation level of each scientific research institution according to the input characteristic field under the input elements and the output score.
8. A computer device, the device comprising: a memory, and a processor; the memory is to store computer instructions; the processor executes computer instructions to implement the functionality of the system according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon computer instructions which, when executed, perform the functions of the system of any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111451990.7A CN114169731A (en) 2021-12-01 2021-12-01 Scientific research institution rating system, method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114169731A true CN114169731A (en) 2022-03-11

Family

ID=80482031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111451990.7A Pending CN114169731A (en) 2021-12-01 2021-12-01 Scientific research institution rating system, method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114169731A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438101A (en) * 2022-10-13 2022-12-06 中国兵器工业计算机应用技术研究所 Data feature construction system and method based on feature morphology and data relationship
CN115438101B (en) * 2022-10-13 2023-06-06 中国兵器工业计算机应用技术研究所 Data feature construction system and method based on feature morphology and data relationship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination