CN114139979A

CN114139979A - Service platform for specific research and development mechanism

Info

Publication number: CN114139979A
Application number: CN202111467957.3A
Authority: CN
Inventors: 龙云凤; 任志宽; 陈雪; 张百尚; 何悦; 王鸿飞; 蔡利超; 刘威
Original assignee: GUANGDONG SCIENCE AND TECHNOLO
Current assignee: GUANGDONG SCIENCE AND TECHNOLO
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2022-03-04

Abstract

The invention discloses a service platform for a specific research and development organization, comprising: an enterprise self-evaluation table, which is an evaluation table with which an enterprise evaluates the specific research and development organization; a data acquisition module for acquiring the corresponding index data of a specific research and development organization; a data source management module for displaying a list of data acquisition sources and issuing early warnings for abnormal acquisition sources; a statistical center module for compiling statistics on the policies, labels and tasks corresponding to a specific research and development organization; an information management module for managing the information of a specific research and development organization; and an organization management module for setting the departments, accounts and permissions of a specific research and development organization. The invention dynamically collects service data and operating condition information and uses big data technology to perform correlation analysis and causal analysis, thereby realizing intelligent statistics, monitoring and early warning, reducing management cost and improving data management efficiency.

Description

Service platform for specific research and development mechanism

Technical Field

The invention relates to the technical field of big data, in particular to a service platform for a specific research and development organization.

Background

At present, novel research and development organizations are growing rapidly in fertile 'soil' for innovation, and have become a bond for the deep integration of industry, academia and research, an incubator of science and technology enterprises, a gathering place for high-end talent, and a source of disruptive innovation.

For the departments that manage novel research and development organizations, establishing a database monitoring platform for such organizations makes it possible, on the one hand, to follow their development regularly in online form and to monitor their operation effectively. On the other hand, analyzing and compiling statistics on the organizations' data reveals the development trends and characteristics of novel research and development organizations across the whole province, which in turn benefits the subsequent formulation of related special projects and policies.

In the prior art, a management department of a research and development organization can only manage data of the research and development organization manually, so that the management cost is high, and the data management efficiency is low.

Accordingly, the prior art is yet to be improved and developed.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a service platform for a specific research and development organization, and aims to solve the problems that in the prior art, a research and development organization management department can only manually manage data of the research and development organization, and the management cost is high.

The technical scheme of the invention is as follows:

A first embodiment of the present invention provides a service platform for a specific research and development organization, including:

an enterprise self-evaluation table, which is an evaluation table with which an enterprise evaluates the specific research and development organization;

a data acquisition module, used for acquiring the corresponding index data of a specific research and development organization;

a data source management module, used for displaying a list of data acquisition sources and issuing early warnings for abnormal acquisition sources;

a statistical center module, used for compiling statistics on the policies, labels and tasks corresponding to a specific research and development organization;

an information management module, used for managing the information of a specific research and development organization;

and an organization management module, used for setting the departments, accounts and permissions of a specific research and development organization.

Further, the platform comprises:

an intelligent policy acquisition module, used for automatically capturing policy data related to a specific research and development organization.

Further, the platform further comprises:

a policy data integration module, used for classifying the policy data and defining labels according to the classification.

Further, the intelligent policy acquisition module comprises:

a data acquisition source setting unit, used for obtaining the policy acquisition source specified by a user;

and a timed acquisition unit, used for collecting the policy data of the policy acquisition source on a schedule, at the frequency set by the user.

Further, the intelligent policy acquisition module further comprises:

a policy acquisition source management unit, used for managing the website name, URL, classification column and acquisition state corresponding to each policy acquisition source, and for sending early warning information to the administrator terminal if an abnormality is detected.

Further, the intelligent policy acquisition module further comprises:

a policy acquisition and analysis unit, used for analyzing the policy acquisition situation, acquisition source abnormalities, acquisition areas, the policy release situation, policy categories, the most frequently used labels, the policy click-rate ranking, and statistics on policy editing.

Further, the platform further comprises:

a research and development organization operation table management module, used for correcting and updating the annual operation table fields of the platform according to the data fields of a third-party system.

Further, the research and development organization operation table management module comprises:

an operation table filling time setting unit, used for setting the filling time of the operation table;

an operation table filling reminder unit, used for notifying a specific research and development organization by mail or short message to fill in the report once the set filling time is detected to have arrived;

an operation table data examination unit, used for performing a formal examination of the data filled in by a specific research and development organization and giving reminders of the errors detected by the formal examination;

and a data monitoring unit, used for monitoring and analyzing the index data of the annual execution report of a specific research and development organization and, if the analysis result shows bad data, annotating the data as a reminder.

Further, the organization management module includes:

an account list information viewing unit, used for viewing the account list information of users in the system, wherein the account list information comprises the name, affiliated unit, role, mobile phone number, login account, login mobile phone number, last login time and account state.

Further, the platform further comprises:

a log acquisition module, used for setting parameters of the platform and recording the system log.

Beneficial effects: the embodiment of the invention dynamically collects service data and operating condition information and uses big data technology to perform correlation analysis and causal analysis, thereby realizing intelligent statistics, monitoring and early warning, reducing management cost and improving data management efficiency.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a functional block diagram of a service platform for a specific research and development organization according to the present invention;

FIG. 2 is a schematic diagram of a data collection process of an embodiment of a service platform for a specific research and development organization according to the present invention;

FIG. 3 is a schematic diagram of a data collection application of a service platform for a specific research and development organization according to a preferred embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a data processing and analysis flow of a service platform for a specific research and development organization according to a preferred embodiment of the present invention;

FIG. 5 is a diagram illustrating an organization of data standards for a service platform for a specific research and development organization according to a preferred embodiment of the present invention;

FIG. 6 is a flowchart illustrating a data normalization process of a service platform for a specific research and development organization according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer and more definite, the present invention is further described in detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Embodiments of the present invention will be described below with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a schematic diagram of the functional modules of a preferred embodiment of the service platform for a specific research and development organization according to the present invention. As shown in fig. 1, the platform includes:

the enterprise wind collection module 11, used for displaying a list of specific research and development organizations and generating an enterprise self-evaluation table, wherein the enterprise self-evaluation table is an evaluation table with which an enterprise evaluates the specific research and development organization;

the data acquisition module 12, used for acquiring the corresponding index data of a specific research and development organization;

the data source management module 13, used for displaying a list of data acquisition sources and issuing early warnings for abnormal acquisition sources;

the statistical center module 14, used for compiling statistics on the policies, labels and tasks corresponding to a specific research and development organization;

the information management module 15, used for managing the information of a specific research and development organization;

and the organization management module 16, used for setting the departments, accounts and permissions of a specific research and development organization.

In a specific implementation, the specific research and development organization in the embodiment of the invention is a novel research and development organization: an independent legal entity that focuses on the needs of scientific and technological innovation, engages mainly in scientific research, technological innovation and research and development services, and has diversified investors, a modern management system, a market-oriented operating mechanism and a flexible employment mechanism. It may be legally registered as a science and technology private non-enterprise unit (social service organization), a public institution or an enterprise.

The framework of the service platform for a specific research and development organization in the embodiment of the invention uses the ThinkPHP framework to construct each layer of the system. ThinkPHP draws on many excellent foreign frameworks and patterns: it uses an object-oriented development structure and the MVC pattern, adopts a single-entry mode, integrates the Action idea of Struts, the tag library of JSP, and the ORM mapping and ActiveRecord pattern of RoR, and encapsulates CURD and some common operations. It performs distinctively in project configuration, class library import, the template engine, query language, automatic validation, view models, project compilation, the caching mechanism, SEO support, distributed databases, connection to and switching between multiple databases, the authentication mechanism, and extensibility. Each layer has a definite responsibility in the application and should not mix functions with other layers, which reduces the coupling between the code of the layers and improves the stability and extensibility of the system.

The enterprise wind collection module provided by the embodiment of the invention is mainly used for displaying novel research and development organizations and generating the enterprise self-evaluation table. It comprises three subunits: member management, organization list, and enterprise self-evaluation table. The member management subunit manages the members that have joined the service platform. The organization list subunit displays a list of novel research and development organizations. The enterprise self-evaluation table subunit generates the evaluation table, which contains the recognition conditions for a novel research and development organization; after logging in to the platform, an enterprise can evaluate itself against these conditions according to its own situation.

The data acquisition module is used for acquiring and displaying the corresponding index data of a specific research and development organization. It comprises three subunits: an acquisition list subunit, a to-be-checked list subunit, and a notification list subunit. The acquisition list subunit displays the acquired index data of novel research and development organizations that has already been checked; the to-be-checked list subunit displays the acquired index data that is still waiting to be checked by a manager; and the notification list subunit displays notification data.

The data source management module is used for displaying a list of data acquisition sources and issuing early warnings for abnormal acquisition sources. It comprises a total acquisition source list subunit, an abnormal acquisition source subunit, and an acquisition source early warning subunit. The total acquisition source list subunit obtains and displays all data sources, where an acquisition source is a source from which the corresponding index data of a specific research and development organization is collected. The abnormal acquisition source subunit determines which acquisition sources are abnormal, and the acquisition source early warning subunit sends abnormality information to the administrator terminal when an acquisition source is determined to be abnormal.
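The abnormal-source judgment and early warning described above can be illustrated with a minimal Python sketch. The description does not state how an acquisition source is judged abnormal, so the two rules used here (no successful collection within a time limit, or too many consecutive failures), the `AcquisitionSource` fields, the thresholds and the example URLs are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AcquisitionSource:
    name: str
    url: str
    last_success: datetime        # time of the last successful collection
    consecutive_failures: int = 0


def find_abnormal_sources(sources, now, max_silence=timedelta(days=2), max_failures=3):
    """Return the sources that violate either assumed rule."""
    abnormal = []
    for src in sources:
        too_quiet = now - src.last_success > max_silence
        too_many_failures = src.consecutive_failures >= max_failures
        if too_quiet or too_many_failures:
            abnormal.append(src)
    return abnormal


def build_warnings(abnormal):
    """Format one warning message per abnormal source for the administrator terminal."""
    return [f"WARNING: acquisition source '{s.name}' ({s.url}) is abnormal" for s in abnormal]


now = datetime(2021, 12, 3, 12, 0)
sources = [
    AcquisitionSource("Site A", "https://example.org/a", now - timedelta(hours=5)),
    AcquisitionSource("Site B", "https://example.org/b", now - timedelta(days=4)),
    AcquisitionSource("Site C", "https://example.org/c", now - timedelta(hours=1),
                      consecutive_failures=3),
]
messages = build_warnings(find_abnormal_sources(sources, now))
for message in messages:
    print(message)
```

In a deployed platform the messages would be delivered to the administrator terminal rather than printed; the delivery channel is not specified in the description.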

The statistical center module is used for compiling statistics on data related to a specific research and development organization, including collection, policies, labels and tasks. Specifically, it provides data statistics functions such as organization statistics, executive list statistics and system logs. Policy information statistics are also added, and the statistical contents include but are not limited to the policy acquisition situation, acquisition source abnormality analysis, acquisition area analysis, the policy release situation, the policy area/category structure, hot labels, the policy click-rate ranking, and statistics on the policies edited by each editor.

The data chart statistics function is optimized and a personalized chart display mode is provided. The platform uses the ECharts visualization tool to display statistical data and adopts the JSON data format, so that it runs smoothly on different terminal devices. ECharts follows the overall development principles of the system and is simple, easy to use and highly extensible. ECharts can present the acquired data in a variety of visual charts such as scatter plots, bar charts, line charts and pie charts; the charts are diverse in form and highly readable. Visualization of dynamic data allows complex data to be presented clearly in graphical form.

The information management module is used for managing the collected information related to novel research and development organizations, where the information includes the policy data collected for novel research and development organizations. Its functions include policy editing, review, return for re-editing, and automatic distribution.
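As an illustration of how statistical data might be handed to ECharts in JSON form, the following Python sketch builds a bar chart option object on the server side and serializes it. The keys `title`, `tooltip`, `xAxis`, `yAxis` and `series` are standard ECharts option fields; the function name `bar_chart_option` and the yearly policy counts are made up for the example and do not come from the description.

```python
import json


def bar_chart_option(title, categories, values):
    """Build an ECharts option dictionary for a simple bar chart."""
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": list(values)}],
    }


# Hypothetical statistics: number of policies collected per year.
policy_counts = {"2019": 12, "2020": 18, "2021": 25}
option = bar_chart_option("Policies collected per year",
                          policy_counts.keys(), policy_counts.values())

# JSON payload that a front-end page could pass to an ECharts chart instance.
payload = json.dumps(option, ensure_ascii=False)
print(payload)
```

Changing `"type": "bar"` to `"line"` or `"scatter"` in the series entry would switch the chart form without changing the data, which is one way the varied chart forms mentioned above can share a single JSON structure.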

The organization management module is used for setting the accounts, departments and permissions of the novel research and development organization platform. It also comprises an enterprise management subunit, an organization management subunit and an abnormal organization list display subunit.

The enterprise management subunit manages all registered enterprises in a unified way and maintains their basic information, including but not limited to: enterprise logo, enterprise name, registration address, registration time, registered capital, unified social credit code, enterprise type, category, industry, field, person in charge and contact details, and enterprise profile. Ordinary registered enterprises can be converted into novel research and development organizations.

The organization management subunit manages, in a unified way, the list of users that have become novel research and development organizations. Searches can combine multiple conditions such as organization classification, year of entry into the database, technical field, year of establishment, sales scale, personnel scale, intellectual property status, products, and enterprise keywords. The self-evaluation table and the annual operation table of an organization can also be viewed.

The abnormal organization list display subunit displays the organizations whose annual operation table data is abnormal. The relevant indexes of each organization's annual operation table are dynamically monitored and analyzed, and abnormal data is annotated as a reminder. For example, index data that is zero is marked as abnormal, and index data that is unchanged for two consecutive years is also marked as abnormal.
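The two example rules for annotating abnormal annual operation table data (a zero value, and a value unchanged for two consecutive years) can be sketched in Python as follows. The data layout `{index name: {year: value}}`, the index names and the figures are hypothetical, and "unchanged for two consecutive years" is interpreted here as a value equal to that of the immediately preceding year, which is an assumption.

```python
def annotate_abnormal(index_by_year):
    """Return (index name, year, reason) tuples for values that look abnormal.

    index_by_year maps each index name to a {year: value} dictionary.
    """
    notes = []
    for name, series in index_by_year.items():
        years = sorted(series)
        for i, year in enumerate(years):
            value = series[year]
            if value == 0:
                notes.append((name, year, "value is zero"))
            elif i > 0 and years[i - 1] == year - 1 and series[years[i - 1]] == value:
                notes.append((name, year, "unchanged from previous year"))
    return notes


# Hypothetical annual operation table data for one organization.
annual_data = {
    "rd_staff": {2019: 40, 2020: 40, 2021: 45},
    "patents": {2020: 0, 2021: 3},
}
notes = annotate_abnormal(annual_data)
for name, year, reason in notes:
    print(f"{name} {year}: {reason}")
```

A zero value takes precedence over the unchanged-value check in this sketch, so a run of zeros is reported once per year under the first rule only.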

The service platform for a specific research and development organization allows all business services to be handled by logging in to the platform with a single account, which effectively improves management efficiency. Information disclosure, declaration, full demonstration, checks and balances in examination and approval, and accountability in business management are realized by means of informatization, which effectively improves the efficiency of business management and the capacity for public service.

Precise governance is achieved. Service data and operating condition information are dynamically collected, and big data technology is used to perform correlation analysis and causal analysis, realizing intelligent statistics, monitoring and early warning and making governance precise.

Scientific decision-making is achieved. Based on science and technology data from the whole region, statistical analysis, monitoring and early warning are carried out on the development status and trends of novel research and development organizations at the macro, meso and micro levels, comprehensively improving the scientific basis of science and technology decision-making.

Furthermore, the information management module also comprises an information management unit and a scientific and technological achievement management unit. The information management unit is used for managing policies and for publishing, sorting, deleting and modifying information, and the scientific and technological achievement management unit is used for managing the publication of scientific and technological achievements and technologies.

Further, the database corresponding to the data sources in the embodiment of the present invention mainly includes 6 sub-databases: an index database, a project database, a policy database, an achievement database, a case database, and a dynamic news database. The system operation data is mainly of three types, namely structured data, unstructured data and semi-structured data, and exists in two forms, electronic documents and paper documents. Unstructured data includes video, audio, pictures, images, documents, text and the like, and semi-structured data includes mail, HTML, reports, repositories and the like.

Furthermore, the data acquisition module builds the database of novel research and development organizations according to the different data sources and analysis purposes. There are mainly 4 data acquisition methods: centralized acquisition, distributed acquisition, real-time acquisition and offline acquisition. The first two are distinguished by acquisition mode and the last two by analysis mode. Centralized acquisition collects data from one or more data sources and gathers it into a single file or database. Distributed acquisition collects data from multiple data sources simultaneously, but the collected data is not stored in a single storage object; it is stored in a distributed manner across a computing cluster composed of multiple computers, and a control server in the cluster maintains the consistency and integrity of the content. The acquisition process is shown in fig. 2.

Real-time acquisition and offline acquisition correspond to the real-time and offline analysis modes. The biggest difference between the two is that real-time acquisition converts data into streams that take part in real-time analysis directly; acquisition and analysis proceed at the same time, and this mode is generally used for real-time monitoring where timeliness requirements are high. Offline acquisition only stores data on a computer or a cluster. The storage may be centralized or distributed, but the acquisition itself is not directly connected to the offline analysis, and offline acquisition is mostly used for case analysis, trend analysis and the like. In terms of technical implementation, all 4 methods rely on web crawler technology and sensor technology in the first stage. The biggest difference from traditional acquisition is the problem of subsequent integration and storage caused by the huge data volume, so data acquisition must be accomplished using big data acquisition techniques. Among existing big data technologies, Flume is a relatively ideal data acquisition framework. Once a data connection is established between Flume and the web crawler and sensor acquisition programs, the acquired data can be stored directly in a data cluster, a single file or a database in offline mode, or output as a stream through the Kafka framework to the big data real-time computing framework Storm to take part in real-time computation. The application process is shown in fig. 3.

First, access data management. The access data mainly comprises the operation-related index data of novel research and development organizations. Access data management gathers the originally scattered and inconsistent data and imports it into the system database through data forms that follow the platform's unified standard and format, so that the data is organically integrated and data sharing and query efficiency are improved. Its main function is the unified maintenance and storage management of the data accessed through system exchange and the data reported directly through the system.

Second, exchange access data management. Its main functions include defining and designing the standard templates for exchange access data; extracting, checking, sorting, converting, loading and warehousing the exchange access data; and managing the exchange access data in classified ledgers.

Third, system direct-report data management. This function is mainly aimed at departments and key organizations that cannot be accessed through exchange, and provides online direct data reporting and batch data import. The operation-related index data of novel research and development organizations is entered into the system database through forms in the system's unified format, and the system classifies the data according to the field classification preset in the system, which makes it convenient for upper-level application systems to use. The system provides data import and export functions. For import, data entered in an Excel table in the specified format is imported in batches. For export, the data to be extracted from the system is exported to Excel, or the data and legends produced by statistics and analysis are exported in the format set by the system, for editing or other needs.
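The extract, check, sort, convert and load steps for exchange access data can be sketched as a small pipeline in Python. The standard template fields (`org_name`, `year`, `rd_staff`), the table name `access_data` and the use of an in-memory SQLite database are assumptions chosen to keep the example self-contained; the description does not name the fields or the database product.

```python
import sqlite3

# Hypothetical standard template: every exchanged record must carry these fields.
REQUIRED_FIELDS = ("org_name", "year", "rd_staff")


def check(record):
    """A record passes the check when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def convert(record):
    """Convert a raw string record into the typed row stored in the database."""
    return (record["org_name"].strip(), int(record["year"]), int(record["rd_staff"]))


def load(rows, conn):
    """Load converted rows into the (assumed) access_data table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_data (org_name TEXT, year INTEGER, rd_staff INTEGER)"
    )
    conn.executemany("INSERT INTO access_data VALUES (?, ?, ?)", rows)
    conn.commit()


# Extracted raw records (made-up values); the third one fails the check.
raw = [
    {"org_name": " Institute B ", "year": "2021", "rd_staff": "35"},
    {"org_name": "Institute A", "year": "2021", "rd_staff": "52"},
    {"org_name": "Institute C", "year": "2021", "rd_staff": ""},
]
valid = [r for r in raw if check(r)]
rejected = [r for r in raw if not check(r)]
rows = sorted(convert(r) for r in valid)      # sort after conversion

conn = sqlite3.connect(":memory:")
load(rows, conn)
stored = conn.execute(
    "SELECT org_name, rd_staff FROM access_data ORDER BY org_name"
).fetchall()
print(stored, "rejected:", len(rejected))
```

Rejected records are kept in a separate list here so that they could be returned to the submitting department for correction instead of being silently dropped.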

Fourth, Internet data collection. Internet information crawler technology is used to collect the operation-related data of novel research and development organizations. Collection rules (web page download rules, web page parsing rules and the like) are configured according to the requirements of the data collection client, and the data of interest to users is collected from the mass of Internet data. The main functions include:

web page download configuration: drawing up download rules as required, including login and download strategy settings, mainly for use by the web page download process;

web page download process: downloading data from web pages according to the established rules, to await analysis by the web page parsing process;

web page parsing configuration: formulating parsing rules, selecting a correction model, and running data acquisition tests on the parsing configuration;

web page parsing process: parsing and correcting the downloaded web pages, and writing the results to the specified storage through a plug-in;

acquisition task configuration: combining the web page download configuration with the web page parsing configuration, and then setting the output mode;

data import, export and backup: backing up collection tasks and downloaded data, and importing or exporting data so that it can be published in the system or on other platforms.
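The way a download configuration and a parsing configuration are combined into one acquisition task can be sketched in Python with the standard library's `html.parser`. All class names (`DownloadConfig`, `ParseConfig`, `CollectionTask`), the `policy-title` class attribute, the example URL and the sample page are hypothetical. To keep the sketch runnable offline, a fixed HTML string stands in for the output of the web page download process; no network request is made.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class DownloadConfig:          # web page download configuration
    url: str
    interval_minutes: int = 60


@dataclass
class ParseConfig:             # web page parsing configuration
    tag: str                   # element whose text is extracted
    css_class: str             # required value of its class attribute


@dataclass
class CollectionTask:          # acquisition task = download + parsing + output mode
    download: DownloadConfig
    parse: ParseConfig
    output: str = "database"


class ClassTextParser(HTMLParser):
    """Collects the text of every element matching the ParseConfig (no nesting assumed)."""

    def __init__(self, cfg):
        super().__init__()
        self.cfg, self.inside, self.items = cfg, False, []

    def handle_starttag(self, tag, attrs):
        if tag == self.cfg.tag and dict(attrs).get("class") == self.cfg.css_class:
            self.inside = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == self.cfg.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.items[-1] += data


def parse_page(task, html):
    """Web page parsing process: apply the task's parsing rule to a downloaded page."""
    parser = ClassTextParser(task.parse)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.items if text.strip()]


# Stand-in for the result of the download process.
SAMPLE_PAGE = """
<ul>
  <li><a class="policy-title" href="/p/1">Notice on research subsidies</a></li>
  <li><a class="nav" href="/">Home</a></li>
  <li><a class="policy-title" href="/p/2">Measures for talent introduction</a></li>
</ul>
"""

task = CollectionTask(
    download=DownloadConfig(url="https://example.org/policies"),
    parse=ParseConfig(tag="a", css_class="policy-title"),
)
titles = parse_page(task, SAMPLE_PAGE)
print(task.output, titles)
```

Keeping the download and parsing rules in separate objects mirrors the separation described above: the same parsing rule can be reused with a different download schedule, and the output mode is chosen per task.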

The data resources of novel research and development organizations fully match the '4V' characteristics of big data: large data volume (Volume), many data types (Variety), low value density (Value) and high processing speed (Velocity). Such massive and complex data is difficult to apply directly, and a series of complex processing steps must be performed, such as data structuring, data quality evaluation and data cleaning, data normalization, data fusion and data extraction. First, data structuring is carried out: the original data is analyzed, the required information is extracted, and it is converted into structured data. The quality of the processed data is then evaluated, and further data cleaning measures are taken if problems are found. A user can define data cleaning rules so that quality problems in the data are handled in batches, which improves the efficiency of data cleaning. Data collected from multiple data sources differs in type and structure, so it must be integrated into a single data set before subsequent data analysis can be carried out. Data integration mainly involves eliminating data heterogeneity and storing the data. Heterogeneity is generally eliminated by data annotation, and preprocessing such as deduplication and redundancy removal is of course required before annotation. Currently, ontology annotation is a common automated data annotation technique.
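User-defined cleaning rules applied in batch, followed by deduplication, can be sketched in Python as below. The rule functions, the record fields and the sample records are hypothetical; the description does not specify what form the cleaning rules take, so each rule here is simply a function that returns a repaired record or `None` to discard it.

```python
def strip_whitespace(record):
    """Rule: trim surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_if_missing_name(record):
    """Rule: discard records that have no organization name."""
    return record if record.get("org_name") else None


def clean(records, rules):
    """Apply every rule to every record in order, then remove exact duplicates."""
    cleaned, seen = [], set()
    for record in records:
        for rule in rules:
            record = rule(record)
            if record is None:
                break
        if record is None:
            continue
        key = tuple(sorted(record.items()))   # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


# Made-up structured records from two sources; the first two describe the same row.
records = [
    {"org_name": " Institute A ", "patents": 3},
    {"org_name": "Institute A", "patents": 3},
    {"org_name": "", "patents": 7},
]
result = clean(records, [strip_whitespace, drop_if_missing_name])
print(result)
```

Because rules run before the duplicate check, records that differ only in formatting (such as stray whitespace) collapse into one entry.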

Data normalization is closely related to data cleaning and is likewise a common problem in data preparation. Normalization includes simple low-level processing, such as data type conversion, unit conversion and format conversion, as well as more complex normalization of data items such as telephone numbers, postcodes and addresses. The granularity and representation of the data must be determined according to the characteristics of the application requirements.

A unified monitoring information acquisition and distribution center is constructed to acquire and manage centrally the monitoring resources required by each monitoring field. The monitoring service of each field submits the target sites to be monitored to the monitoring information acquisition and distribution center, where the control server manages the target sites in a unified way and merges repeated acquisition sites and site columns according to site characteristics. The acquisition task manager starts the corresponding acquisition tasks at a certain frequency within each day, according to the acquisition frequency that each monitoring field requires for the target site. These acquisition tasks are distributed to the distributed collectors, and each collector obtains the data of the target site from the Internet as required. The collected target site data is stored uniformly in the data warehouse of the monitoring information acquisition and distribution center, and new information resources are identified through processing such as duplicate checking and comparison. The information from the target sites submitted by each monitoring field is then distributed to the relevant monitoring fields, so that the intelligence value of each specific field can be computed further.
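The merging of repeated target sites and the later distribution of collected data back to the monitoring fields can be sketched in Python as follows. The merge key (lower-cased URL without a trailing slash), the choice of the highest requested frequency for a merged site, and the field names and URLs are all assumptions; the description says only that repeated sites are merged "according to site characteristics".

```python
from collections import defaultdict


def merge_targets(submissions):
    """Merge duplicate target sites submitted by different monitoring fields.

    submissions is a list of (field, site_url, runs_per_day) tuples.
    """
    merged = {}
    for field, url, runs_per_day in submissions:
        key = url.rstrip("/").lower()
        entry = merged.setdefault(key, {"fields": set(), "runs_per_day": 0})
        entry["fields"].add(field)
        # Assumed policy: collect as often as the most demanding field requires.
        entry["runs_per_day"] = max(entry["runs_per_day"], runs_per_day)
    return merged


def distribute(merged, collected):
    """Send the items collected from each site to every field that asked for that site."""
    per_field = defaultdict(list)
    for key, items in collected.items():
        for field in sorted(merged[key]["fields"]):
            per_field[field].extend(items)
    return dict(per_field)


submissions = [
    ("policy", "https://example.org/news/", 4),
    ("talent", "https://example.org/news", 2),
    ("policy", "https://example.org/notices", 1),
]
merged = merge_targets(submissions)

# Stand-in for what the distributed collectors brought back after duplicate checking.
collected = {
    "https://example.org/news": ["item-1"],
    "https://example.org/notices": ["item-2"],
}
per_field = distribute(merged, collected)
print(per_field)
```

With this arrangement each site is collected once per run regardless of how many monitoring fields subscribed to it, which is the saving that centralized acquisition is meant to provide.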

Fusing multiple data sets (possibly from multiple data sources) makes the data content richer. Data fusion is a process of integrating data sets. Some analysis tasks do not need all of the integrated data and may need only part of it. In this case, partial data (such as some samples or data segments) is extracted from the data set, reducing the amount of data that the data analysis model has to process. This process is called data extraction, and the relevant data must be extracted according to the characteristics of the task. Data fusion was first used in the military field. The multi-source data produced by data integration is the object processed by data fusion, and coordinated optimization and comprehensive processing are its core. The more common data fusion techniques mainly include the voting method, fuzzy regression, Bayesian aggregation, BP neural networks, Kalman filtering, and Dempster-Shafer (D-S) evidence theory.
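Of the fusion techniques listed, the voting method is the simplest to illustrate. The Python sketch below fuses several records that describe the same entity by taking, for each field, the value reported by the most sources. The record fields and values are made up, and resolving ties in favor of the value seen first is an assumption of this sketch rather than something stated in the description.

```python
from collections import Counter


def fuse_by_vote(values):
    """Return the most frequent value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def fuse_records(records_from_sources):
    """Fuse records describing one entity, voting field by field."""
    fields = {field for record in records_from_sources for field in record}
    return {
        field: fuse_by_vote([r[field] for r in records_from_sources if field in r])
        for field in sorted(fields)
    }


# Three hypothetical sources describing the same organization.
sources = [
    {"city": "Guangzhou", "founded": 2015},
    {"city": "Guangzhou", "founded": 2016},
    {"city": "Shenzhen", "founded": 2015},
]
fused = fuse_records(sources)
print(fused)
```

Voting treats every source as equally reliable; weighting sources by trustworthiness would be a natural extension but is outside what the description specifies.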

Further, the platform includes:

In specific implementation, in order to further improve the practicability of the service platform, the embodiment of the invention can provide an intelligent policy acquisition module that captures policies automatically. The manager only needs to edit, label and selectively release the captured policies, so that policy acquisition time is greatly shortened, policy release efficiency is improved, and users can keep up with the latest policy developments in a timely and comprehensive manner.

Further, the platform further comprises:

In the specific implementation process, in order to improve processing efficiency after policy collection, the policy data integration module performs simple preprocessing on the collected policies: an LDA (Latent Dirichlet Allocation) model is used to mine the text topics of the policies, and the policies are labeled and given a simple preliminary classification. After review, an administrator user can release the policies without extensive editing work. Finally, users can query the required policy information through functions such as policy classification, applicable region and keyword search.
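The final query step (classification, applicable region, keyword) can be illustrated with a small in-memory filter. The `Policy` fields and the `search_policies` function are hypothetical names chosen for this sketch; the topic mining with LDA is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    title: str
    category: str
    region: str
    tags: list = field(default_factory=list)
    released: bool = False  # set by the administrator after review


def search_policies(policies, category=None, region=None, keyword=None):
    """Return released policies that match every filter that is given."""
    hits = []
    for policy in policies:
        if not policy.released:
            continue
        if category is not None and policy.category != category:
            continue
        if region is not None and policy.region != region:
            continue
        if keyword is not None:
            text = " ".join([policy.title, *policy.tags]).lower()
            if keyword.lower() not in text:
                continue
        hits.append(policy)
    return hits


policies = [
    Policy("Funding rules for new R&D institutions", "funding", "Guangdong", ["subsidy"], True),
    Policy("Talent introduction notice", "talent", "Guangdong", ["recruitment"], True),
    Policy("Draft evaluation measures", "evaluation", "Shenzhen", ["assessment"], False),
]
found = search_policies(policies, region="Guangdong", keyword="subsidy")
```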

The policy data integration module performs simple preprocessing on the collected policies and can intelligently analyze the extracted basic information. However, the data mining structure itself is affected by economic information, and a data collection mechanism is difficult to construct, so in practical application the preprocessing of data is a non-trivial management mechanism and control measure. Through preprocessing of economic statistics, missing, untrue and incorrect data can be planned for and analyzed comprehensively. Data cleaning is the process of handling these basic data problems, and a mean value method, a smoothing method or a prediction method can be selected.

When the mean value method is selected, the noise data and null data points in the basic data are replaced by an average, so that the blanks in the database are filled with attribute averages. Statistical analysis data can be made more effective and complete only if the data mining system is sufficiently established in the underlying data analysis structure. When analyzing the value of a data point, the following formula is generally used:

$$C_i = \frac{1}{2k}\sum_{\substack{j=i-k \\ j\neq i}}^{i+k} C_j \qquad (1)$$

where the sum runs over the neighbouring points from i-k to i+k. When the data are processed by the smoothing method, the null values and noise data of the basic data are calculated in a unified way and combined with a weighted average, so that the influence weight of the extracted data is taken into account and the calculated result is closer to reality. The formula

$$C_i = \frac{\sum_j W_j C_j}{\sum_j W_j}$$

can be used to analyze the values of the data points. In the formula, $W_j$ represents the actual weight value of point $C_j$. When data mining technology is applied, data from the same area may follow different standards because the statistical subjects differ, so a corresponding data integration system must be adopted to optimize the data integration effect and improve the accuracy of the statistics, which is a goal data mining technology has always pursued.
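The two cleaning approaches can be sketched in a few lines. This is an illustration under assumptions: missing points are represented as `None`, the mean is taken over the known neighbours within `k` positions, and the weights passed to the smoothing function are chosen by the caller.

```python
def fill_by_neighbor_mean(values, k=1):
    """Mean value method: replace None with the mean of known values within k positions."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            lo, hi = max(0, i - k), min(len(values), i + k + 1)
            window = [values[j] for j in range(lo, hi) if j != i and values[j] is not None]
            if window:  # leave the gap if no neighbour is known
                filled[i] = sum(window) / len(window)
    return filled


def weighted_smooth(values, weights):
    """Smoothing method: weighted average sum(W_j * C_j) / sum(W_j)."""
    return sum(w * c for w, c in zip(weights, values)) / sum(weights)


series = fill_by_neighbor_mean([1.0, None, 3.0, None, None, 9.0], k=1)
smoothed = weighted_smooth([10.0, 20.0], [1, 3])
```

With `k=1`, the isolated gap between 1.0 and 3.0 is filled with 2.0, while each of the two adjacent gaps only sees one known neighbour and takes that value.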

On the one hand, the structure should be deeply integrated. Because the data in economic activities are large in volume and relatively complex in variety, the results of integrating economic data and the way the data are presented must be handled systematically and in depth. When data mining technology is used for comparative analysis of fields such as std-id and std-no, a dedicated comparison module is needed to guarantee the efficiency of entity recognition and to meet the quality standard requirements. On the other hand, the problem of redundancy should be analyzed in depth, since data mining is essentially a process of deeply processing data. To preserve the full value of the economic statistics, the technical model should be kept as simple as possible, positively correlated data should be handled in a centralized and simplified manner, redundant attributes should be considered comprehensively, and problem data should be processed in detail. Taking the per-capita gross domestic product of the People's Republic of China as an example, this figure is calculated from the gross domestic product and the population attribute, so the data need to be simplified by means of the formula

$$r_{AB} = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{(n-1)\,\sigma_A\,\sigma_B}$$

i.e. to determine the redundancy property. In the formula, $a_i$ and $b_i$ are the values of attributes A and B in the i-th record, n is the number of records, and $\bar{A}$ and $\bar{B}$ are the mean values of attributes A and B. $\sigma_A$ and $\sigma_B$ are the standard deviations of the two attributes. If $r_{AB} > 0$, the two attributes are positively correlated. If $r_{AB} = 0$, there is no direct relationship between the two attributes, i.e. they are independent of each other. If $r_{AB} < 0$, the two attributes are negatively correlated. The larger the absolute value of $r_{AB}$, the stronger the relationship between the two attributes.
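A redundancy check of this kind can be sketched with the standard Pearson correlation coefficient, using sample standard deviations. The threshold of 0.9 and the example figures are assumptions made for illustration only.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation r_AB of two equally long numeric attribute columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a) / (n - 1))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b) / (n - 1))
    covariance_sum = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return covariance_sum / ((n - 1) * sd_a * sd_b)


def is_redundant(a, b, threshold=0.9):
    """Flag an attribute pair as redundant when |r_AB| reaches the (assumed) threshold."""
    return abs(correlation(a, b)) >= threshold


gdp = [100.0, 200.0, 300.0, 400.0]
population = 10.0                      # held constant to keep the example simple
gdp_per_capita = [g / population for g in gdp]
r = correlation(gdp, gdp_per_capita)   # close to 1: the derived attribute adds no information
```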

In applying data mining technology, the results need to be analyzed and summarized systematically, so that the output is more complete and its form suits the decision-making needs of economic managers. It is worth noting that a decision tree is a fast method that classifies data visually and forms a data model for in-depth processing of the data.

The decision tree is constructed from a training set. A feasible analysis strategy is built by combining the specific problem with the parameter requirements, and a model for data analysis is output in a short time. Data are then classified with the resulting tree by a recursive process that proceeds from the root of the tree to the trunk and branches and finally outputs the data that satisfy the classification conditions. It should be noted that the stopping conditions are relatively strict: the most common condition is that all data at a node belong to the same class, and, where classification attributes are given, splitting can also stop once the input data has been divided twice. Throughout the process of compiling economic statistics with a decision tree, pruning is applied regularly to avoid the influence of fluctuations as far as possible and to improve the validity and stability of the data.
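The recursive construction and the stopping conditions can be illustrated with a very small tree builder over categorical attributes. The Gini impurity criterion, the `max_depth` limit (a crude stand-in for pruning) and all names below are assumptions for this sketch; the specification does not name a particular algorithm.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, features, max_depth=2):
    """Recursively split on the feature with the lowest weighted impurity.

    Recursion stops when all labels at a node agree, when no features
    remain, or when max_depth splits have been made.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority  # leaf node

    def split_impurity(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=split_impurity)
    node = {"feature": best, "default": majority, "children": {}}
    remaining = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_tree(
            [r for r, _ in pairs], [y for _, y in pairs], remaining, max_depth - 1
        )
    return node


def classify(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


rows = [
    {"funding": "high", "staff": "large"},
    {"funding": "high", "staff": "small"},
    {"funding": "low", "staff": "large"},
    {"funding": "low", "staff": "small"},
]
labels = ["strong", "strong", "weak", "weak"]
tree = build_tree(rows, labels, ["funding", "staff"])
```

In this toy training set the `funding` attribute separates the classes perfectly, so the tree stops after a single split because every child node contains only one class.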

The genetic algorithm is a random search algorithm that draws on natural selection and genetic mechanisms in biology. In practical application, social problems are considered comprehensively, information about specified groups of people is collected effectively, and the final result is obtained by integrating and analyzing the implicit information. Because the genetic algorithm has a certain implicit character, it can be combined effectively with other models to collect implicit data. The mined data are then analyzed in depth and applied in practice. It should be noted that economic problems develop and change, so their internal relations are very complex. Using the genetic algorithm as an important reference, the sources can be traced downward and data can be acquired effectively and analyzed as a whole, so that economic problems are presented more directly and specifically, the relevant staff can handle them more intuitively, and hidden problems are made explicit. In this way the statistical work becomes more direct and simple. As shown in fig. 4:

the data standardization methods are divided into linear standardization and nonlinear standardization, and include the Z-score method, the polarization method, the maximization method, the minimization method, the averaging method, the specific gravity method, the vector normalization method, the efficacy coefficient method and the like. The characteristics of each method are as follows:

(1) Z-score method

$$y_{ij} = \frac{x_{ij}-\bar{x}_j}{s_j}$$

Wherein, $x_{ij}$ is the value of index j for sample i, $y_{ij}$ is the standardized value, $\bar{x}_j$ is the average value of index j, and $s_j$ is the standard deviation of index j.

The method is characterized in that: the mean value of the normalized index is 0 and the variance is 1. This method is not suitable for small sample sizes; generally, the number of samples should be more than 30.

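(2) Polarization method

$$y_{ij} = \frac{x_{ij}-\min_i x_{ij}}{\max_i x_{ij}-\min_i x_{ij}}$$

Wherein, $x_{ij}$ is the value of index j for sample i, $\min_i x_{ij}$ is the minimum value of index j, and $\max_i x_{ij}$ is the maximum value of index j.

The method is characterized in that: the normalized index has a minimum value of 0 and a maximum value of 1, and the method is not applicable where the index value is constant, because the denominator is then 0.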

(3) Maximization method

$$y_{ij} = \frac{x_{ij}}{\max_i x_{ij}}$$

The method is characterized in that: the normalized index has a maximum value of 1 and no fixed minimum value.

(4) Minimization method

$$y_{ij} = \frac{x_{ij}}{\min_i x_{ij}}$$

The method is characterized in that: the normalized index has a minimum value of 1 and no fixed maximum value.

(5) Averaging method

$$y_{ij} = \frac{x_{ij}}{\bar{x}_j}$$

The method is characterized in that: after standardization, the mean value of each index is 1 and the variance is the square of the coefficient of variation, so averaging retains the information about the degree of variation of each index.

(6) Specific gravity method

$$y_{ij} = \frac{x_{ij}}{\sum_i x_{ij}}$$

The method is characterized in that: the method requires $\sum_i x_{ij} > 0$.

When the sample values are 0 or more, the normalized sample values lie between 0 and 1 and their sum is 1, that is, $\sum_i y_{ij} = 1$.

(7) Vector normalization method

$$y_{ij} = \frac{x_{ij}}{\sqrt{\sum_i x_{ij}^2}}$$

When the sample values are greater than or equal to 0, the normalized sample values lie between 0 and 1, and $\sum_i y_{ij}^2 = 1$.

(8) Efficacy coefficient method

$$y_{ij} = c + \frac{x_{ij}-m_j}{M_j-m_j}\times d$$

Wherein $M_j$ and $m_j$ respectively represent the satisfactory value and the unacceptable value of index j, and c and d are known positive constants, c being the translation amount and d the scaling amount, which are set by the evaluator according to actual requirements.

The method is characterized in that: the maximum value of the normalized index is c + d, and the minimum value is c. However, the satisfactory and unacceptable values are difficult to determine in this method and are usually replaced by the maximum and minimum values.
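The eight methods can be written compactly for a single index (one column of sample values). This is a sketch using the usual textbook definitions, which match the properties stated for each method; the function names are hypothetical, the Z-score uses the sample standard deviation, and the efficacy coefficient falls back to the maximum and minimum when no satisfactory or unacceptable value is supplied.

```python
from math import sqrt


def z_score(xs):
    mean = sum(xs) / len(xs)
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - mean) / sd for x in xs]


def polarization(xs):
    lo, hi = min(xs), max(xs)  # fails with ZeroDivisionError for a constant index
    return [(x - lo) / (hi - lo) for x in xs]


def maximization(xs):
    return [x / max(xs) for x in xs]


def minimization(xs):
    return [x / min(xs) for x in xs]


def averaging(xs):
    mean = sum(xs) / len(xs)
    return [x / mean for x in xs]


def specific_gravity(xs):
    total = sum(xs)
    return [x / total for x in xs]


def vector_normalization(xs):
    norm = sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]


def efficacy_coefficient(xs, c, d, satisfactory=None, unacceptable=None):
    big = max(xs) if satisfactory is None else satisfactory
    small = min(xs) if unacceptable is None else unacceptable
    return [c + (x - small) / (big - small) * d for x in xs]


xs = [2.0, 4.0, 6.0, 8.0]
scores = efficacy_coefficient(xs, c=60, d=40)  # ranges from c to c + d
```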

In accordance with eight principles of data standardization (invariance of the relative differences within the same index, uncertainty of the relative differences among different indexes, interval stability, total quantity constancy, monotonicity, invariance of difference ratios, translation independence and scaling independence), the database of novel research and development institutions processes data with the normal standardization, range standardization, mean standardization, median standardization, centralization and total intensity standardization methods. Missing values are first filled by multiple imputation, the data are then calculated with each standardization method, and the fitting, prediction and classification effects are tested to obtain the optimal standardization method.

In order to support the semantic expression of novel research and development institutions flexibly and stably, the operation-and-maintenance-oriented data standard logically divides the information organization process into three layers, as shown in fig. 5. The bottom layer determines how each piece of data is expressed, including the basic structure of the data, the protocols related to information interaction, and the physical specifications of data expression. The middle layer determines a universal abstract model, namely the paradigms that define how the bottom-layer data are organized, which are divided into three types: objects, relationships and attributes. All data and concepts of novel research and development institutions are determined by these three types of paradigms. The top layer is the standard semantic model of novel research and development institutions, which is defined item by item according to the hierarchy of fields, specialties, systems and equipment.
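The middle-layer abstraction (objects, relationships and attributes) might be represented as follows. The class names, the `path` tuple for the field/specialty/system/equipment hierarchy and the sample values are all illustrative assumptions rather than part of the described standard.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An object; `path` follows the field / specialty / system / equipment hierarchy."""
    id: str
    path: tuple
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two objects, referenced by id."""
    source: str
    target: str
    kind: str


def related(entity_id, relationships, kind=None):
    """Ids of objects that `entity_id` points to, optionally filtered by relationship kind."""
    return [r.target for r in relationships
            if r.source == entity_id and (kind is None or r.kind == kind)]


institute = Entity("org-1", ("research", "materials"), {"staff": 120})
device = Entity("dev-7", ("research", "materials", "testing", "spectrometer"), {"status": "running"})
links = [Relationship(institute.id, device.id, "operates")]
```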

Data must be standardized before it is classified. The data extracted for the first time has a series of problems and cannot meet the requirements of database monitoring applications. As shown in fig. 6, the specific standardization flow is as follows:

(1) some parameters in the data exported for the first time are not required in the monitoring activity and need to be deleted;

(2) the dimensions of some variables in the data exported for the first time are inconsistent with the simulation data; for example, different institutions report the same index in different units, and the units need to be unified. In addition, some parameters need to be calculated; for example, per-capita data need to be calculated from the total amount and the number of people;

(3) in order to facilitate analysis, the parameter order of the original data needs to be adjusted;

(4) data errors occasionally occur in the original data, for example infinite values appear in some data, and such erroneous data need to be processed;

(5) some data in the original data fluctuate abnormally, which hinders analysis, and need to be filtered.
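The five steps above can be sketched as a small cleaning routine. The field names (`income`, `staff`), the unit factor and the use of a median filter for step (5) are assumptions made for this illustration; the specification does not prescribe a particular filter.

```python
import math
from statistics import median


def clean_record(raw, keep, unit_factors, order):
    """Apply steps (1) to (4) to one exported record (a dict)."""
    record = {name: raw[name] for name in keep if name in raw}        # (1) drop unused parameters
    for name, factor in unit_factors.items():                         # (2) unify units
        if record.get(name) is not None:
            record[name] = record[name] * factor
    for name, value in list(record.items()):                          # (4) treat inf/NaN as missing
        if isinstance(value, float) and not math.isfinite(value):
            record[name] = None
    total, staff = record.get("income"), record.get("staff")          # (2) derived per-capita value
    record["income_per_capita"] = total / staff if total is not None and staff else None
    return {name: record.get(name) for name in order}                 # (3) fixed parameter order


def median_filter(series, window=3):
    """Step (5): damp abnormal spikes by replacing each point with a local median."""
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


raw = {"name": "Institute A", "income": 1200.0, "staff": 40, "fax": "n/a"}
cleaned = clean_record(
    raw,
    keep=["name", "income", "staff"],
    unit_factors={"income": 10_000},  # e.g. reported in units of ten thousand yuan
    order=["name", "staff", "income", "income_per_capita"],
)
```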

Further, the policy intelligence collection module includes:

During specific implementation, the data capturing source setting unit acquires the policy acquisition sources specified by the user, and the timing acquisition unit uses Python to collect from the specified policy acquisition sources on a schedule, adopting a multithreaded crawler to reduce the pressure on the server. The unit mainly monitors policy information about novel research and development institutions published across the country, which is intelligently mined, collected and stored by program; the number of policy sources required to be collected is less than 150.
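A scheduled, multithreaded collection round could look roughly like the sketch below, which uses only the Python standard library. The function names, the pool size and the injectable `fetcher` and `sleep` parameters are assumptions for illustration; the specification states only that Python and a multithreaded crawler are used.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page; any network error propagates to the caller."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def collect_once(sources, fetcher=fetch, max_workers=4):
    """Fetch every source with a bounded thread pool; return (results, errors)."""
    def task(url):
        try:
            return url, fetcher(url), None
        except Exception as exc:  # keep going when a single source fails
            return url, None, exc

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, body, exc in pool.map(task, sources):
            if exc is None:
                results[url] = body
            else:
                errors[url] = exc
    return results, errors


def run_on_schedule(sources, interval_seconds, rounds, fetcher=fetch, sleep=time.sleep):
    """Yield the outcome of `rounds` collection rounds, pausing between rounds."""
    for n in range(rounds):
        yield collect_once(sources, fetcher)
        if n + 1 < rounds:
            sleep(interval_seconds)
```

Bounding the pool with `max_workers` limits how many requests are in flight at once, and the `errors` mapping gives a later monitoring step a list of sources that failed in the round.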

Further, the policy intelligence collection module further comprises:

In specific implementation, the policy acquisition source management unit manages the names, URLs, classification columns, acquisition states and the like of the tracked and monitored target websites, and establishes monitoring, early warning and automatic repair mechanisms. If data acquisition is found to be abnormal, the system sends early warning information to the relevant personnel so that the system can be maintained in time.
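The acquisition state tracking and early warning can be sketched as below. The rule that a source becomes abnormal after three consecutive failures, and all class and function names, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionSource:
    name: str
    url: str
    column: str                      # classification column of the target website
    consecutive_failures: int = 0
    status: str = "normal"


def record_result(source, succeeded, max_failures=3):
    """Update the acquisition state after one collection attempt and return it."""
    if succeeded:
        source.consecutive_failures = 0
        source.status = "normal"
    else:
        source.consecutive_failures += 1
        if source.consecutive_failures >= max_failures:
            source.status = "abnormal"
    return source.status


def warning_messages(sources):
    """Build one early-warning message per abnormal source for the maintainers."""
    return [
        f"Acquisition abnormal: {s.name} ({s.url}), {s.consecutive_failures} consecutive failures"
        for s in sources
        if s.status == "abnormal"
    ]


source = AcquisitionSource("Provincial science department", "https://example.org/policy", "notices")
for outcome in (False, False, False):
    record_result(source, outcome)
messages = warning_messages([source])
```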

Further, the policy intelligence collection module further comprises:

During specific implementation, the policy acquisition and analysis unit performs statistical analysis of policies, covering recent policy acquisition, acquisition source exceptions, acquisition regions, policy release status, the region/category structure of policies, policy hit rates, statistics on the policies edited by each editor, and the like.
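Several of these statistics reduce to simple counting. The sketch below assumes each policy is a dict with hypothetical `region`, `category`, `released`, `hits` and `editor` keys.

```python
from collections import Counter


def policy_statistics(policies, top_n=3):
    """Summarize collected policies by region, category, release status, hits and editor."""
    ranked = sorted(policies, key=lambda p: p["hits"], reverse=True)
    return {
        "by_region": Counter(p["region"] for p in policies),
        "by_category": Counter(p["category"] for p in policies),
        "released": sum(1 for p in policies if p["released"]),
        "top_hits": [p["title"] for p in ranked[:top_n]],
        "by_editor": Counter(p["editor"] for p in policies),
    }


policies = [
    {"title": "P1", "region": "Guangdong", "category": "funding", "released": True, "hits": 120, "editor": "li"},
    {"title": "P2", "region": "Guangdong", "category": "talent", "released": False, "hits": 15, "editor": "li"},
    {"title": "P3", "region": "Shenzhen", "category": "funding", "released": True, "hits": 300, "editor": "wang"},
]
stats = policy_statistics(policies, top_n=2)
```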

Further, the platform further comprises:

In specific implementation, the service platform of the embodiment of the invention further comprises a research and development organization operation table management module. The research and development organization operation table management module corrects and updates the annual operation table fields according to the data fields of a third-party system. Taking the Guangdong Province Sunshine Government Affairs Platform as an example of the third-party system, the research and development organization operation table management module can refer to the relevant data fields of that platform to correct and update the annual operation table fields of the present platform.

In addition, the data monitoring unit performs dynamic data monitoring and data analysis on the relevant index data of the annual execution reports of novel research and development institutions, and marks and issues reminders for bad data.

Further, the organization management module includes:

In specific implementation, the account list information viewing unit can add personnel accounts of management departments at all levels, can perform operations such as deleting and resetting passwords, and can view personnel of all units in the system. The account list information that can be viewed is: name, affiliated unit, role, mobile phone number, login account, login mobile phone number, last login time, account status and other information.

Further, the platform further comprises:

In specific implementation, the log obtaining module mainly sets the parameters of the platform and records system logs. The log obtaining module comprises a parameter setting subunit and a system log obtaining subunit. The parameter setting subunit is used for setting the operation parameters of the platform. The system log obtaining subunit is used for obtaining and recording the system logs generated during operation of the platform, and for displaying them when a system log query instruction from the user is detected.

Further, the service platform of the embodiment of the invention further comprises a storage database, and the storage database is used for storing the data in the platform.

Specifically, for mass data, a NoSQL database is adopted for data storage, and data analysis and sharing are carried out at the same time. NoSQL databases can be broadly divided into four categories:

(1) key-value type database: data are stored as key-value pairs in a hash table, and the mapping mode is one-to-many. The data structure is simple and does not need to adhere strictly to ACID, so the read and write speed of this type of database is the fastest among NoSQL databases. The disadvantage is that a query can only be made by exact match on the key; compound searches by value or by other combinations are not possible.

(2) column store database: this type of database differs from a traditional database in its storage mode and is used quite differently, with columns as the main object of data operations. The column store database overlaps in part with the key-value type database in concept; the main difference is that a column store database can be updated locally on a per-column basis, which is extremely valuable for implementing many business forms in a big data environment.

(3) document type database: this type of database builds its data structures from documents, usually using loosely defined data formats such as JSON and XML. Because the document structure has a high degree of freedom, a document type database can suit almost any data structure and has very good adaptability.

(4) graph database: this type of database is constructed from the three basic elements of graph theory (nodes, relations and attributes) and records the association information between data on the basis of these elements. It is the NoSQL type closest to a relational database, but its design is more complex, so it is generally suitable for constructing large social network systems.
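The query limitation of a key-value store, compared with a document store, can be illustrated with plain Python data structures standing in for real databases. No actual NoSQL product or client API is used here; the keys, documents and helper names are invented for the example.

```python
# Key-value store: a hash table from key to an opaque value.
kv_store = {
    "org:1001": {"name": "Institute A", "city": "Guangzhou", "staff": 120},
    "org:1002": {"name": "Institute B", "city": "Shenzhen", "staff": 45},
}


def kv_get(store, key):
    """Only an exact key match is possible; the value itself is not searchable."""
    return store.get(key)


# Document store: a collection of JSON-like documents that can be queried by field.
documents = list(kv_store.values())


def find_documents(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(name) == value for name, value in criteria.items())]


by_key = kv_get(kv_store, "org:1002")
by_field = find_documents(documents, city="Guangzhou")
```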

The embodiment of the invention adopts a NoSQL database system with a distributed architecture for storage. Although such a system has a loosely structured database framework and a non-fixed data model, it effectively solves problems of other databases, such as insufficient resources and the lack of optimization for unstructured data, and it offers very high scalability and flexibility. A NoSQL system with a distributed architecture is well suited to processing large amounts of scientific data and also solves the problem of storing them.

In some other embodiments, the service platform for a specific research and development organization of the embodiment of the present invention employs a B/S (browser/server) framework model and an Eucalyptus cloud storage technology architecture, and should support efficient transmission of various types of data. Cloud storage of the data of novel research and development institutions means that the data are stored in the cloud on a large number of interconnected database servers, and the different cloud storage servers communicate using the same protocol in order to carry out data sharing tasks and batch processing tasks. To protect the confidentiality and security of the data stored in the cloud, the data that a user reads from and writes to the database server must be encrypted with a client key, the server must apply additional encryption, and the gateway also provides a firewall function to prevent data leakage and attacks on the data by outsiders. Cloud storage requires a unified and compatible storage structure; the different data categories of novel research and development institutions are classified and stored systematically, and cloud computing is used for the system division. To guard against a crash of or an attack on the local computer, data obtained by the user should be backed up to the cloud storage database in time, and data storage and data analysis processing are carried out in the cloud.

From the above embodiments, the embodiment of the present invention discloses a service platform for a specific research and development organization, including: the enterprise self-evaluation table is an evaluation table for the enterprise to evaluate the specific research and development organization; the data acquisition module is used for acquiring corresponding index data of a specific research and development mechanism; the data source management module is used for displaying an acquisition source list of data and early warning an abnormal acquisition source; the statistical center module is used for carrying out statistics on policies, labels and tasks corresponding to a specific research and development organization; the information management module is used for managing the information of a specific research and development mechanism; and the mechanism management module is used for setting departments, accounts and authorities of specific research and development mechanisms. The invention dynamically collects the service data and the operation condition information, and performs correlation analysis and causal analysis by using a big data technology, thereby realizing intelligent statistics, monitoring and early warning, reducing the management cost and improving the data management efficiency.

Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include (while other embodiments do not include) particular features, elements, and/or operations, among others, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more embodiments or that one or more embodiments must include logic for deciding, with or without input or prompting, whether such features, elements, and/or operations are included or are to be performed in any particular embodiment.

What has been described herein in the specification and drawings includes examples that can provide information distribution methods and apparatuses. It will, of course, not be possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications can be made to the disclosure without departing from the scope or spirit thereof. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A specific research and development institution-oriented service platform, the platform comprising:

2. The platform of claim 1, wherein the platform comprises:

3. The platform of claim 2, further comprising:

4. The platform of claim 3, wherein the policy intelligence collection module comprises:

5. The platform of claim 4, wherein the policy intelligence collection module further comprises:

6. The platform of claim 5, wherein the policy intelligence collection module further comprises:

and the policy acquisition and analysis unit is used for analyzing the policy acquisition condition, the acquisition source abnormal condition, the acquisition region, the policy issuing condition, the policy category, the label with higher utilization rate, the policy click rate ranking and the editing policy statistics.

7. The platform of claim 6, further comprising:

8. The platform of claim 7, wherein the research and development organization operation table management module comprises:

the operation form filling reminding unit is used for informing a specific research and development organization to fill in the report in a mail or short message mode after detecting that the set report filling time of the operation form is met;

9. The platform of claim 8, wherein the organization management module comprises:

10. The platform of claim 9, further comprising: