CN112966162A - Scientific and technological resource integration method and device based on data warehouse and middleware - Google Patents

Publication number
CN112966162A
Authority
CN
China
Prior art keywords
scientific
data
technological resource
technological
middleware
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110251114.3A
Other languages
Chinese (zh)
Inventor
张辉
涂昱
金盛豪
葛胤池
王德庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/9038 Presentation of query results

Abstract

The invention discloses a scientific and technological resource integration method and device based on a data warehouse and middleware. The method comprises the following steps: acquiring scientific and technological resource data from different sources and with different characteristics, and preprocessing the data; classifying the preprocessed data into the source database corresponding to either a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model; integrating the data in the source database of the data warehouse method by the data warehouse method and then loading it into the data warehouse; and, meanwhile, integrating the data in the source database of the middleware integration method by the middleware integration method and returning it to the user interface for presentation in the format required by the user. The method thus applies the data warehouse method or the middleware integration method to each class of scientific and technological resource data according to its structural characteristics, change rate, update rate and user demand.

Description

Scientific and technological resource integration method and device based on data warehouse and middleware
Technical Field
The invention relates to a scientific and technological resource integration method based on a data warehouse and a middleware, and also relates to a corresponding scientific and technological resource integration device, belonging to the technical field of data integration.
Background
Scientific and technological resources is the general term for the hard and soft elements, such as human, material and financial resources, that are devoted to scientific and technological activities; they provide material support for scientific research, including scientific and technological activities, management and decision-making. By building a sound scientific and technological resource platform, resources from different fields can be integrated effectively under a unified standard, so that massive scientific and technological resources can be shared, researchers and research projects can use them more efficiently, and the level of scientific and technological services is improved.
At present, the integration of scientific and technological resources faces the following problems. First, from the perspective of the managing bodies, governments, organizations and enterprises have each established information service systems and associated systems that store and manage scientific and technological resources in a distributed manner. These systems are not connected to one another and form "information islands", which together constitute a complex heterogeneous database environment. From the perspective of the managed objects, scientific and technological resource data are high-dimensional, heterogeneous and generated at high speed; the data cannot be transmitted automatically, and the degree of association and sharing among them is low. As scientific and technological resources keep accumulating, the data volume held by each industry tends to grow exponentially, which easily leads to outdated, duplicated and wasted resources.
Secondly, the integration and sharing of scientific and technological resources lacks the planning and guidance of a standard system: there is no unified information standard, and the ways in which resources can be used are limited. For example, metadata specifications and resource classification specifications are not coordinated, and metadata description rules are not uniform. Finally, because the number of users is huge, failing to screen users effectively compromises the security and stability of the database information and leads to data leakage and abuse.
Disclosure of Invention
The invention aims to provide a scientific and technological resource integration method based on a data warehouse and a middleware.
Another technical problem to be solved by the present invention is to provide a scientific and technological resource integration apparatus based on a data warehouse and a middleware.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a scientific and technological resource integration method based on a data warehouse and a middleware, including the following steps:
acquiring scientific and technological resource data with different sources and different characteristics, and preprocessing the scientific and technological resource data;
classifying the preprocessed scientific and technological resource data into a source database corresponding to a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model;
integrating the scientific and technological resource data in the source database of the data warehouse method by adopting the data warehouse method and then loading the data into the data warehouse; and meanwhile, integrating the scientific and technological resource data in the source database of the middleware integration method by adopting the middleware integration method, and returning the data to the user interface for presentation in the format required by the user.
Preferably, the step of preprocessing the acquired scientific and technological resource data comprises the following steps:
marking the scientific and technological resource data according to a preset standard;
and if the scientific and technological resource data do not have corresponding standards for marking, classifying the scientific and technological resource data according to the classification basis of the scientific and technological resource elements.
Preferably, the scientific and technological resource classification model trained in advance is obtained by training through the following steps:
inputting a scientific and technological resource training data set into a scientific and technological resource classification model established by a support vector machine for training to obtain optimal parameters of the scientific and technological resource classification model;
and testing the classification precision of the trained scientific and technological resource classification model.
Preferably, each scientific and technological resource data input into the scientific and technological resource classification model is represented in the form of a triple [ f1, f2, n ], wherein f1 represents data change rate, f2 represents data update rate, and n represents user requirement.
Preferably, the data change rate f1, the data update rate f2 and the user requirement n of each scientific and technological resource data are normalized respectively, input into the scientific and technological resource classification model to obtain corresponding labels, and the scientific and technological resource data are classified into the source database corresponding to the data warehouse method or the middleware integration method according to the labels.
Preferably, the data warehouse method is adopted to integrate each scientific and technological resource data in the source database, and specifically comprises the following steps:
extracting the attribute information of interest related to each scientific and technological resource data in the source database of the data warehouse method, and storing the attribute information in a temporary database;
cleaning the scientific and technological resource data that have been extracted into the temporary database;
converting the attribute information and the storage format of the cleaned scientific and technological resource data into attribute information and a storage format consistent with those of a target database;
and mapping the converted scientific and technological resource data in the temporary database, and loading the data into the data warehouse.
Preferably, when the middleware integration method is used for integrating the scientific and technological resource data in the source database, the method specifically comprises the following steps:
extracting data source description information from the scientific and technological resource data in the source database of the middleware integration method by adopting a wrapper, and establishing a mapping relation between the data source description information and a mediated schema;
parsing the user's query statement according to the metadata standard, checking the correctness and legality of the statement, and decomposing a valid user query statement into sub-query statements directed at the different scientific and technological resource data;
and integrating and optimizing the query results for the scientific and technological resources, and returning the results to the user interface for presentation in the format required by the user.
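The middleware steps listed above (check the query, split it per data source, merge the partial answers) can be illustrated with a minimal Python sketch. The schema fields, the source names and the `name` merge key are hypothetical and chosen only for illustration; the text does not specify a query language, so a query is modeled here simply as a list of requested fields.

```python
from typing import Dict, List, Set

# Hypothetical global schema and per-source field coverage (assumptions).
GLOBAL_SCHEMA: Set[str] = {"name", "year", "category"}
SOURCE_FIELDS: Dict[str, Set[str]] = {
    "source_a": {"name", "year"},
    "source_b": {"name", "category"},
}


def check_query(fields: List[str]) -> None:
    """Reject empty queries and fields that the global schema does not define."""
    if not fields:
        raise ValueError("empty query")
    unknown = [f for f in fields if f not in GLOBAL_SCHEMA]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")


def decompose(fields: List[str]) -> Dict[str, List[str]]:
    """Split a valid query into one sub-query per source that can answer part of it."""
    check_query(fields)
    sub_queries: Dict[str, List[str]] = {}
    for source, available in SOURCE_FIELDS.items():
        wanted = [f for f in fields if f in available]
        if wanted:
            sub_queries[source] = wanted
    return sub_queries


def merge_results(partials: List[List[dict]], key: str = "name") -> List[dict]:
    """Combine rows from several sources, joining rows that share the same key."""
    merged: Dict[object, dict] = {}
    for rows in partials:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

Joining on a single key is the simplest possible merge policy; a real system would also need conflict resolution when two sources disagree on a field.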
According to a second aspect of the embodiments of the present invention, there is provided a scientific and technological resource integration apparatus based on a data warehouse and middleware, which is characterized by comprising a data acquisition unit, a data operation and storage unit, a human-computer interaction unit, a metadata management unit and a system management unit;
the data acquisition unit is used for acquiring scientific and technological resource data with different sources and different characteristics and preprocessing the scientific and technological resource data;
the data operation and storage unit is used for classifying the preprocessed scientific and technological resource data into a source database corresponding to a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model, and integrating the corresponding scientific and technological resource data by adopting the data warehouse method and the middleware integration method;
the human-computer interaction unit is used for user registration, login, identity authentication and credit evaluation, for admitting the user to the query interface once the security requirements are met, and for returning the query result to the user interface for presentation in the format required by the user;
the metadata management unit is used for uniformly managing and effectively standardizing the metadata according to metadata standards;
and the system management unit is used for controlling management, job management, allocation and management of storage space and inspection of scientific and technological resource classification results.
Preferably, the data operation and storage unit comprises a classifier, a data warehouse module and a middleware module;
the classifier is used for classifying the preprocessed scientific and technological resource data into the source database corresponding to the data warehouse module or the middleware module by using a pre-trained scientific and technological resource classification model;
the data warehouse module and the middleware module are used for integrating corresponding scientific and technological resource data.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon instructions which, when run on a computer, cause the computer to perform the above-mentioned method.
The scientific and technological resource integration method and device based on the data warehouse and the middleware provided by the invention are used for carrying out classification integration by adopting a data warehouse method and a middleware integration method according to the structural characteristics, the change rate, the updating speed and the user requirements of scientific and technological resource data. In the process of integrating scientific and technological resource data, the method simultaneously improves the system integration efficiency and enhances the data real-time performance, and realizes the aims of improving the management level of scientific and technological resources and promoting the integration and sharing of scientific and technological resources.
Drawings
Fig. 1 is a flowchart of a scientific and technological resource integration method based on a data warehouse and a middleware according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating the labeling of scientific and technological resource data in GB/T32843-2016 scientific and technological resource identifier;
Fig. 3 is a diagram of scientific and technological resource classification descriptions;
Fig. 4 is a flowchart illustrating the operation of the ETL system in the method for integrating scientific and technological resources based on a data warehouse and a middleware according to an embodiment of the present invention;
Fig. 5 is a flowchart of an implementation of the middleware integration method;
Fig. 6 is a schematic structural diagram of a scientific and technological resource integration device based on a data warehouse and a middleware according to an embodiment of the present invention.
Detailed Description
The technical contents of the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
Existing mainstream methods for integrating multi-source heterogeneous data mainly comprise the federated database system, the data warehouse method and the middleware integration method. A federated database system consists of several autonomous member database systems, each operating in a distributed, heterogeneous environment; it integrates data by establishing access to the data sources rather than moving the data into a central repository. The data warehouse method can eliminate data inconsistency, retain historical data and add new data periodically. Users query and process data inside the data warehouse, so access efficiency is high and dependence on the network is weak. The middleware integration method offers good query performance and leaves each data source highly autonomous. All data are obtained directly from the data sources, which guarantees their timeliness.
Because scientific and technological resources are of many types, an integration method suited to the characteristics and structure of each type must be chosen. Resources such as scientific and technological information, experimental materials among scientific and technological material resources, and scientific and technological cases are updated relatively quickly and are highly time-sensitive, so they are suited to real-time collection by the middleware integration method. Scientific and technological human resources are updated infrequently, but their data volume keeps growing over time and becomes massive; since ETL (Extract-Transform-Load) technology is well suited to integrating large data volumes, the data warehouse method integrates human resources more effectively.
However, existing federated database systems suffer from highly complex interfaces, poor scalability and slow query response; they are unsuitable for frequently accessed data and are prone to resource conflicts. An existing data warehouse cannot update its data promptly as the data sources change, so it lacks the latest information and has poor real-time performance; it also has a long development cycle and high cost. The existing middleware integration method provides poor support for data deletion and write operations.
Therefore, as shown in fig. 1, an embodiment of the present invention provides a scientific and technological resource integration method based on a data warehouse and a middleware, including the following steps:
Step S1: acquiring scientific and technological resource data from different sources and with different characteristics, and preprocessing the data.
The scientific and technological resource data includes structured data, semi-structured data and unstructured data. For example, structured data collected from third party databases through API data interfaces, semi-structured data such as HTML pages crawled through crawler technology, and unstructured data such as natural language text, pictures, audio, video, etc.
The method for preprocessing the acquired scientific and technological resource data comprises the following steps:
Step S11: labeling the acquired scientific and technological resource data according to a preset standard; if no corresponding standard is available for labeling, step S12 is performed.
As shown in fig. 2, the acquired scientific and technological resource data with different sources and different characteristics are from a scientific and technological resource management main body, and the scientific and technological resource management main body labels the acquired scientific and technological resource data according to a preset standard, for example, the national standard "GB/T32843-.
Step S12: classifying the scientific and technological resource data according to the scientific and technological resource element classification basis.
If the acquired scientific and technological resource data are not labeled according to a corresponding standard, the scientific and technological resource management body classifies and labels them by scientific and technological resource element. As shown in Fig. 3, there are six categories: scientific and technological human resources, scientific and technological financial resources, scientific and technological material resources, scientific and technological information resources, scientific and technological organization resources and scientific and technological system resources.
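The two-branch preprocessing of steps S11 and S12 can be sketched as follows. This is only an illustration: the `Resource` record, its field names and the short category keys are hypothetical, and the category itself is supplied by the caller because the text leaves the element classification to the resource management body.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Short keys for the element categories named in the text (keys are assumptions).
ELEMENT_CATEGORIES = (
    "human", "financial", "material", "information", "organization", "system",
)


@dataclass(frozen=True)
class Resource:
    name: str
    standard_id: Optional[str] = None  # hypothetical: label assigned under a preset standard
    category: Optional[str] = None     # hypothetical: one of ELEMENT_CATEGORIES


def preprocess(resource: Resource, fallback_category: str) -> Resource:
    """S11: keep an existing standard label; S12: otherwise classify by element."""
    if resource.standard_id is not None:
        return resource  # already labeled under a preset standard
    if fallback_category not in ELEMENT_CATEGORIES:
        raise ValueError(f"unknown element category: {fallback_category}")
    return replace(resource, category=fallback_category)
```

For example, `preprocess(Resource("lab reagent"), "material")` returns a copy whose `category` is `"material"`, while a record that already carries a `standard_id` is returned unchanged.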
Step S2: classifying the preprocessed scientific and technological resource data into the source database corresponding to the data warehouse method or the middleware integration method by using a pre-trained scientific and technological resource classification model.
The pre-trained scientific and technological resource classification model is obtained by training through the following steps:
Step S21: inputting the scientific and technological resource training data set into a scientific and technological resource classification model established by a support vector machine for training, to obtain the optimal parameters of the scientific and technological resource classification model.
After a plurality of scientific and technological resource data are acquired in advance by adopting a plurality of ways, each scientific and technological resource data is labeled according to a preset standard, such as national standard GB/T32843-2016 scientific and technological resource identifier, or classified according to scientific and technological resource element classification basis. And each labeled or classified scientific and technological resource data is represented in the form of a triple [ f1, f2, n ], wherein f1 represents a data change rate (unit: times/day), f2 represents a data update rate (unit: times/day), and n represents a user requirement (unit: times/day). The data change rate f1, the data update rate f2 and the user demand n of each labeled or classified scientific and technological resource data can be obtained through a statistical method. Selecting the scientific and technological resource data with preset percentage from the scientific and technological resource data labeled or classified in advance as a scientific and technological resource training data set, and using the rest scientific and technological resource data as a scientific and technological resource testing data set.
A scientific and technological resource classification model is established using a support vector machine (SVM), a supervised binary classification method in machine learning. The data change rate f1, the data update rate f2 and the user demand n of each scientific and technological resource data in the training data set are normalized separately and input into the model for training. The parameters (the hyperplane normal vector w and the intercept b) are updated once each time all the scientific and technological resource data in the training data set have been input into the model, until the loss function value of the model reaches its minimum. This yields the optimal values of w and b, at which the output of the model is closest to the preset label values (-1, +1).
Specifically, the scientific and technological resource training data set is represented as X = [X1, X2, ..., Xr], where r is a positive integer, and each scientific and technological resource data in the training data set is represented as Xi = [f1, f2, n]. The data change rate f1, the data update rate f2 and the user demand n of each scientific and technological resource data in the training data set are normalized separately according to the following formula (1):
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$
In the above formula, X represents the data change rate f1, the data update rate f2 or the user demand n of the current scientific and technological resource data, X' represents its normalized value, Xmax represents the maximum value of the data change rate f1, the data update rate f2 or the user demand n, and Xmin represents the corresponding minimum value.
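As an illustration of this normalization, the sketch below applies min-max scaling to a small list of [f1, f2, n] triples. It assumes that the maximum and minimum are taken per feature over the whole data set, which is one reading of the definitions above; the sample values are invented, and mapping a constant column to 0.0 is likewise an assumption.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each column of [f1, f2, n] triples to the range [0, 1] independently."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread; map it to 0.0 (an assumption).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Invented sample triples: change rate, update rate, user demand (times/day).
samples = [[2.0, 10.0, 100.0], [4.0, 20.0, 300.0], [6.0, 30.0, 500.0]]
print(min_max_normalize(samples))
# → [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
```

Normalizing each feature separately keeps a quantity with a large numeric range, such as user demand, from dominating the distance to the hyperplane.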
Each scientific and technological resource is input into the scientific and technological resource classification model, and a corresponding label (corresponding to a classification value) is obtained according to the following formulas (2) and (3). After the scientific and technological resource data are input into the scientific and technological resource classification model, when the label output by the model is +1, the scientific and technological resource data are divided into the source database of the data warehouse method. After the scientific and technological resource data are input into the scientific and technological resource classification model, when the label output by the model is-1, the scientific and technological resource data are classified into a source database of the middleware integration method.
$$w^{T}X_i + b \geq +1,\quad y_i = +1 \qquad (2)$$
$$w^{T}X_i + b \leq -1,\quad y_i = -1 \qquad (3)$$
In the above formulas, the parameters w and b are the normal vector and the intercept of the hyperplane, respectively, T denotes the transpose, Xi represents the i-th scientific and technological resource data, and yi represents the label output by the model after the i-th scientific and technological resource data is input into the scientific and technological resource classification model: when $w^{T}X_i + b \geq +1$, the label output by the model is +1; when $w^{T}X_i + b \leq -1$, the label output by the model is -1.
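The labeling and routing rule can be sketched in a few lines of Python. The weights `W` and intercept `B` below are made-up values, not trained parameters, and the database names are hypothetical. Formulas (2) and (3) do not cover scores strictly between -1 and +1, so this sketch assumes the usual SVM convention of taking the sign of the score.

```python
from typing import Sequence

# Hypothetical parameters for illustration only; real values come from training.
W = [-1.0, -1.0, -0.5]
B = 1.0


def svm_label(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 from the sign of the score w^T x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def route(label: int) -> str:
    """Label +1 goes to the data warehouse source database, -1 to the middleware one."""
    return "data_warehouse_source_db" if label == 1 else "middleware_source_db"


slow_changing = [0.1, 0.1, 0.2]   # normalized [f1, f2, n]
fast_changing = [0.9, 0.9, 0.8]
print(route(svm_label(W, B, slow_changing)))  # → data_warehouse_source_db
print(route(svm_label(W, B, fast_changing)))  # → middleware_source_db
```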
The hinge loss function is used as the loss function of the scientific and technological resource classification model, and the loss is calculated once each time the model outputs the labels. The separating hyperplane is obtained by minimizing the loss function, which is expressed as follows:
$$L(w, b) = \frac{1}{r}\sum_{i=1}^{r}\max\bigl(0,\; 1 - y_i\,(w^{T}X_i + b)\bigr) + \lambda\lVert w\rVert^{2}$$
In the above formula, r represents the number of labels output by the scientific and technological resource classification model, yi represents the label (+1 or -1) of the i-th scientific and technological resource data as defined for formulas (2) and (3), λ represents the coefficient of the regularization term of the scientific and technological resource classification model, and the parameters w and b are the normal vector and the intercept of the hyperplane, respectively.
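A minimal training sketch follows, assuming the loss is the mean hinge loss over the training set plus a squared-norm regularization term weighted by λ, and that the parameters are updated once per full pass over the data by subgradient descent. The learning rate, epoch count, λ value and toy data are all assumptions made for illustration; the text does not name an optimizer.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, b, xs, ys, lam: float) -> float:
    """Mean hinge loss plus lam * ||w||^2 (the assumed form of the objective)."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys))
    return hinge / len(xs) + lam * dot(w, w)


def train(xs, ys, lam: float = 0.01, lr: float = 0.1,
          epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch subgradient descent: one parameter update per pass over the data."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [2.0 * lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1.0:      # sample violates the margin
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy normalized [f1, f2, n] triples: +1 = data warehouse, -1 = middleware.
XS = [[0.1, 0.1, 0.2], [0.2, 0.1, 0.1], [0.9, 0.8, 0.9], [0.8, 0.9, 0.7]]
YS = [1, 1, -1, -1]
w, b = train(XS, YS)
print(hinge_loss(w, b, XS, YS, 0.01) < hinge_loss([0.0, 0.0, 0.0], 0.0, XS, YS, 0.01))
```

With all parameters at zero every sample has a hinge loss of exactly 1, so comparing against that starting value is a quick sanity check that training reduced the objective.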
Step S22: testing the classification accuracy of the trained scientific and technological resource classification model.
The classification accuracy of the trained scientific and technological resource classification model is tested with the scientific and technological resource test data set. If the tested accuracy meets the requirement, the data change rate f1, the data update rate f2 and the user demand n of each preprocessed scientific and technological resource data are normalized separately and input into the scientific and technological resource classification model to obtain the corresponding label, and the scientific and technological resource data are classified into the source database corresponding to the data warehouse method or the middleware integration method according to the label. If the tested accuracy does not meet the requirement, the scientific and technological resource classification model is retrained until a model whose tested accuracy meets the requirement is obtained. It should be emphasized that the data change rate f1, the data update rate f2 and the user demand n of the preprocessed scientific and technological resource data need to be obtained through a statistical method.
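The split into training and test sets and the accuracy check can be sketched as below. The 80% training fraction, the fixed shuffle seed and the 0.9 acceptance threshold are assumptions; the text only speaks of a preset percentage and of accuracy that meets the requirement.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # ([f1, f2, n], label)


def split_dataset(samples: Sequence[Sample], train_fraction: float,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the labeled samples and cut them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict: Callable[[List[float]], int],
             test_set: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for x, y in test_set if predict(x) == y)
    return correct / len(test_set)


def meets_requirement(score: float, threshold: float = 0.9) -> bool:
    """Decide whether to deploy the model or retrain it (threshold is an assumption)."""
    return score >= threshold
```

A caller would train on the first list, pass the trained model's prediction function to `accuracy` together with the second list, and retrain whenever `meets_requirement` returns `False`.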
Step S3: integrating the scientific and technological resource data in the source database of the data warehouse method by the data warehouse method and loading the data into the data warehouse; meanwhile, integrating the scientific and technological resource data in the source database of the middleware integration method by the middleware integration method and returning the data to the user interface for presentation in the format required by the user.
The method for integrating the scientific and technological resource data in the source database by adopting the data warehouse method specifically comprises the following steps:
Step S31: extracting the attribute information of interest related to each scientific and technological resource data in the source database of the data warehouse method, and storing the extracted attribute information in a temporary database.
An ETL system accesses each scientific and technological resource data (such as a single data source shown in Fig. 4) in the source database of the data warehouse method through a database interface such as ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity), and then extracts the attribute information of interest related to each scientific and technological resource data. For example, for a certain plant whose attributes of interest are its color and growth environment, the ETL system extracts the attribute information related to the plant's color and growth environment. To improve the operating efficiency of the ETL system and the consistency of the data, the ETL system may operate incrementally: it extracts only newly added scientific and technological resource data and does not re-extract data that have already been extracted.
After the data extraction step is completed, the data extracted from the source database of the data warehouse method are stored in the temporary database.
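Incremental extraction into a temporary (staging) database can be sketched with Python's built-in `sqlite3` module standing in for an ODBC or JDBC connection. The table names, column names and the use of an increasing `id` as the high-water mark are all assumptions made for this illustration.

```python
import sqlite3

ATTRIBUTES_OF_INTEREST = ("name", "color", "habitat")  # hypothetical columns


def make_demo_databases():
    """Create an in-memory source database and an empty staging database."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT, "
        "color TEXT, habitat TEXT, notes TEXT)"
    )
    source.executemany(
        "INSERT INTO plants (name, color, habitat, notes) VALUES (?, ?, ?, ?)",
        [("fern", "green", "shade", "n/a"), ("cactus", "green", "desert", "n/a")],
    )
    staging = sqlite3.connect(":memory:")
    staging.execute(
        "CREATE TABLE staging_plants (id INTEGER PRIMARY KEY, name TEXT, "
        "color TEXT, habitat TEXT)"
    )
    return source, staging


def extract_incremental(source, staging, last_id: int) -> int:
    """Copy rows with id > last_id, keeping only the attributes of interest.

    Returns the new high-water mark to pass to the next run.
    """
    columns = ", ".join(("id",) + ATTRIBUTES_OF_INTEREST)
    rows = source.execute(
        f"SELECT {columns} FROM plants WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    placeholders = ", ".join("?" for _ in range(len(ATTRIBUTES_OF_INTEREST) + 1))
    staging.executemany(
        f"INSERT INTO staging_plants ({columns}) VALUES ({placeholders})", rows
    )
    staging.commit()
    return rows[-1][0] if rows else last_id
```

Tracking a high-water mark is only one way to detect new rows; sources without a monotonically increasing key would need timestamps or change logs instead.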
Step S32: cleaning the scientific and technological resource data that have been extracted into the temporary database.
The ETL system cleans the extracted scientific and technological resource data in the temporary database to remove fields that are not needed.
Step S33: converting the attribute information and the storage format of the cleaned scientific and technological resource data into attribute information and a storage format consistent with the target database.
As shown in fig. 4, since the storage format and the attribute information of the cleaned scientific and technological resource data in the temporary database may not be consistent with the requirements of the target database, a conversion function needs to be called to perform mapping conversion on the cleaned data, and a mapping expression is used to define the correspondence between matching elements. The attribute information of the cleaned scientific and technological resource data is described uniformly so that it is consistent with the target database; for example, taking a CPU as an example, the model and name of the CPU are collectively described as a CPU. At the same time, the storage format of the cleaned scientific and technological resource data is converted into the format consistent with the target database.
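Steps S32 and S33 can be sketched together as follows. This is a minimal sketch under stated assumptions: the mapping expression is modeled as a plain dictionary, and the field names, the unneeded field and the date formats are hypothetical and not taken from the original text.

```python
from datetime import datetime

# Hypothetical mapping expression: source attribute -> target attribute.
ATTRIBUTE_MAPPING = {"cpu_model": "cpu", "cpu_name": "cpu", "release": "release_date"}
# Hypothetical fields that the target database does not need.
UNNEEDED_FIELDS = {"internal_note"}


def clean(record):
    """Step S32: drop fields that are not needed."""
    return {k: v for k, v in record.items() if k not in UNNEEDED_FIELDS}


def convert(record):
    """Step S33: unify attribute names and convert storage formats."""
    target = {}
    for name, value in record.items():
        target_name = ATTRIBUTE_MAPPING.get(name, name)
        if target_name == "release_date":
            # Assumed source format DD/MM/YYYY, assumed target format ISO 8601.
            value = datetime.strptime(value, "%d/%m/%Y").date().isoformat()
        if target_name in target:
            # Several source attributes map to one target attribute: merge them.
            value = f"{target[target_name]} {value}"
        target[target_name] = value
    return target


record = {
    "cpu_name": "Intel Core",
    "cpu_model": "i7-9700",
    "release": "08/10/2018",
    "internal_note": "checked by hand",
}
converted = convert(clean(record))
```

Here the CPU name and CPU model are merged into the single target attribute `cpu`, mirroring the CPU example in the text.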
And step S34, mapping the scientific and technological resource data which are subjected to data conversion in the temporary database, and loading the data into a data warehouse.
When the middleware integration method is adopted to integrate the scientific and technological resource data in the source database, the method specifically comprises the following steps:
step S35, extracting data source description information from the scientific and technological resource data in the source database by using the wrapper, and establishing a mapping relationship between the data source description information and the intermediary schema.
Each scientific and technological resource data in the source database extracts the data source description information through the corresponding wrapper (for example, the data sources S1-S3 shown in fig. 5 correspond to the wrappers W1-W3).
The similarity of the fields corresponding to the extracted data source description information is calculated by a distributed similarity method to determine the matching degree of the data source description information, and a mapping relation between the data source description information and an intermediary mode (virtual global mode) is established according to the matching degree. For example, suppose the data source description information extracted by the wrappers from the scientific and technological resource data of the source database is a CPU model, CPU information, a graphics card model and graphics card information. Using the distributed similarity method, the similarity between the CPU model and the CPU information is calculated to be 95%, and the similarity between the graphics card model and the graphics card information is calculated to be 90%; the similarity between the CPU model and the graphics card model is 10%, between the CPU model and the graphics card information 8%, between the CPU information and the graphics card model 12%, and between the CPU information and the graphics card information 6%. It follows that the CPU model and the CPU information can establish a mapping relation with the CPU in the intermediary mode, and the graphics card model and the graphics card information can establish a mapping relation with the graphics card in the intermediary mode.
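The idea of deriving the mapping relation from similarity scores can be illustrated with a minimal sketch. The text does not specify how the distributed similarity method is computed, so a token-level Jaccard similarity between field names is swapped in here as a simple stand-in, and each field is matched directly against the elements of the intermediary mode; the threshold of 0.4 is likewise a hypothetical value.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two field names."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def build_mapping(source_fields, mediated_elements, threshold=0.4):
    """Map each source field to its best-matching element of the intermediary mode."""
    mapping = {}
    for field in source_fields:
        best = max(mediated_elements, key=lambda element: jaccard(field, element))
        # Fields without a sufficiently similar element stay unmapped.
        if jaccard(field, best) >= threshold:
            mapping[field] = best
    return mapping


fields = [
    "CPU model",
    "CPU information",
    "graphics card model",
    "graphics card information",
]
mapping = build_mapping(fields, ["CPU", "graphics card"])
```

With these inputs the two CPU fields map to `CPU` and the two graphics card fields map to `graphics card`, which is the grouping described in the example of the text; the scores themselves differ from the percentages quoted there because a different similarity measure is used.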
And step S36, carrying out syntactic analysis on the query statement of the user according to the metadata standard, checking the correctness and the legality of the statement, and decomposing the reasonable query statement of the user into sub-query statements aiming at different scientific and technological resource data.
Through various front-end tools, a unified data interface is provided to facilitate operations such as data query and data mining for users. And the user performs the steps of registration, authentication, login, identity verification, credit evaluation and the like, and enters a query interface after meeting the safety requirement. An authorization mechanism and a view mechanism are adopted to improve the system security: failure to successfully verify identity or disqualification of the credit assessment results will be prohibited from accessing the system.
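The decomposition of a user query into sub-queries in step S36 can be sketched as follows. This is a minimal sketch, assuming the mapping relation is available as a dictionary from attributes of the intermediary mode to fields of the individual data sources; the source names S1-S3 follow fig. 5, while the field names and the simplified query form (a list of requested attributes) are hypothetical.

```python
# Hypothetical mapping: attribute of the intermediary mode -> {source: source field}.
MEDIATED_SCHEMA = {
    "cpu": {"S1": "cpu_model", "S2": "cpu_info"},
    "graphics_card": {"S2": "gpu_info", "S3": "gpu_model"},
}


def decompose(attributes):
    """Check a query against the intermediary mode and split it per data source."""
    unknown = [a for a in attributes if a not in MEDIATED_SCHEMA]
    if unknown:
        # The query is rejected before any data source is contacted.
        raise ValueError(f"unknown attributes: {unknown}")
    fields_by_source = {}
    for attribute in attributes:
        for source, field in MEDIATED_SCHEMA[attribute].items():
            fields_by_source.setdefault(source, []).append(field)
    return {
        source: f"SELECT {', '.join(fields)} FROM {source}"
        for source, fields in fields_by_source.items()
    }


sub_queries = decompose(["cpu", "graphics_card"])
```

Each data source receives only the fields it actually holds, so a wrapper never has to interpret attributes of the intermediary mode that it does not know.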
And step S37, integrating and optimizing the inquiry results of the scientific and technical resources, and returning the inquiry results to the user interface in a user requirement format for presentation.
And S2, judging the type of scientific and technological resource required by the user, querying through the data warehouse method or the middleware integration method accordingly, responding to the user request, sorting the result into a uniform format (such as XML, Excel and the like), and returning it to the user interface for presentation. In addition, different views are defined for different users through a view mechanism, so that data which a user has no permission to access are hidden from that user. Moreover, user satisfaction is evaluated by methods such as extracting mouse movement behaviors, analyzing search results and surveying user evaluations, so that the data stored in the database can be supervised, and the mapping relation stored in the metadata management is modified or added according to the feedback results.
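The combination of a uniform result format and a view mechanism can be illustrated with a minimal sketch. It assumes XML as the uniform format and models a view as the set of fields a role may see; the role names, field names and sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical view definitions: role -> fields that the role may see.
VIEWS = {"guest": {"name"}, "researcher": {"name", "cpu", "owner"}}


def present(results, role):
    """Apply the view of the given role and serialize the results to XML."""
    allowed = VIEWS[role]
    root = ET.Element("results")
    for record in results:
        item = ET.SubElement(root, "resource")
        for field, value in record.items():
            # Fields outside the view are hidden from this role.
            if field in allowed:
                ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Results from the data warehouse and from the middleware are merged first.
warehouse_rows = [{"name": "server-1", "cpu": "i7", "owner": "lab A"}]
middleware_rows = [{"name": "sensor-7", "owner": "lab B"}]
merged = warehouse_rows + middleware_rows

guest_xml = present(merged, "guest")
researcher_xml = present(merged, "researcher")
```

Filtering at serialization time keeps the permission check in one place regardless of which integration method produced a record.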
In addition, as shown in fig. 6, an embodiment of the present invention further provides a scientific and technological resource integration apparatus based on a data warehouse and a middleware, including a data acquisition unit 1, a data operation and storage unit 2, a human-computer interaction unit 3, a metadata management unit 4 and a system management unit 5, where the data acquisition unit 1 is connected to the data operation and storage unit 2, the data operation and storage unit 2 is connected to the human-computer interaction unit 3, and the metadata management unit 4 and the system management unit 5 are respectively connected to the data acquisition unit 1, the data operation and storage unit 2 and the human-computer interaction unit 3.
The data acquisition unit 1 is used for acquiring scientific and technological resource data with different sources and different characteristics and preprocessing the data.
And the data operation and storage unit 2 is used for classifying the preprocessed scientific and technological resource data into a source database corresponding to a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model, and integrating the corresponding scientific and technological resource data by using the data warehouse method and the middleware integration method.
And the human-computer interaction unit 3 is used for registering, authenticating, logging in, verifying identity and evaluating credit of the user, entering a query interface after meeting the safety requirement, and returning the query result to the user interface for presentation according to the user requirement format.
And the metadata management unit 4 is used for uniformly managing and effectively standardizing the metadata in accordance with the metadata standard.
And the system management unit 5 is used for control management, job management, allocation and management of storage space, and inspection of scientific and technological resource classification results.
Specifically, as shown in fig. 6, the data manipulation and storage unit 2 includes a classifier, a data warehouse module, and a middleware module. And the classifier is used for classifying the preprocessed scientific and technological resource data into a source database corresponding to the data warehouse module or the middleware module by using a pre-trained scientific and technological resource classification model. The data warehouse module and the middleware module are used for integrating corresponding scientific and technological resource data.
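How the classifier might route a resource can be illustrated with a minimal sketch based on the triple [f1, f2, n] (data change rate, data update rate, user demand) and the normalization described for the classification model. The weights and bias below are hypothetical placeholders for the parameters of an already trained linear support vector machine, and the assumption that high values favor the middleware module is made only for this example.

```python
def min_max_normalize(rows):
    """Scale each column of the [f1, f2, n] triples to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical parameters of an already trained linear SVM (not from the source).
WEIGHTS = [1.0, 1.0, 1.0]
BIAS = -1.5


def classify(triple):
    """Apply the linear decision function sign(w . x + b) to a normalized triple."""
    score = sum(w * x for w, x in zip(WEIGHTS, triple)) + BIAS
    return "middleware" if score > 0 else "data_warehouse"


# Each row is [f1, f2, n] for one scientific and technological resource.
raw = [[0.1, 1, 2], [0.9, 50, 9], [0.5, 20, 5]]
normalized = min_max_normalize(raw)
labels = [classify(triple) for triple in normalized]
```

In practice the parameters would come from training on a labeled scientific and technological resource data set, and a non-linear kernel could replace the linear decision function.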
As shown in FIG. 6, the middleware module consists of a wrapper and an intermediary mode (virtual global mode), which is divided into two parts, query processing and result summarization (as shown in FIG. 5). The wrapper is used for extracting data source description information from scientific and technological resource data in a source database and establishing a mapping relation between the data source description information and the intermediary mode.
As shown in fig. 6, the human-computer interaction unit 3 includes a user management module, a judger, and a user feedback module. And the user management module is used for registering, authenticating, logging in, verifying identity and evaluating credit of the user. And the user feedback module is used for evaluating the user satisfaction by adopting methods such as mouse movement behavior extraction, search result analysis, user evaluation investigation and the like, monitoring the data stored in the database, and modifying or adding the mapping relation stored in the metadata management unit according to the feedback result. The judger is used for distinguishing the type of the scientific and technological resources required by the user and deciding to start the data warehouse module or the middleware module for resource integration.
For the metadata management unit 4, the essence of metadata is data describing data. First, the metadata needs to record the processing details of scientific and technological resources and describe the relationships and rules in the system, for example, define the extracted data sources, describe the mapping rules between the source scientific and technological resource data and the target scientific and technological resource data, provide the corresponding rules of data fields, and cover data interface standards. Second, the metadata indicates the current status and availability of information in the data warehouse, helping users to understand and access the data. In addition, the metadata comprises information such as manager posts, responsibilities and operations, and records operations such as updating, modifying and backing up. In the data warehouse module, metadata is mainly applied to the ETL process. The process metadata records the data migration during the ETL process, such as the date, data load records and data reject records. The technical metadata records information such as the type and length of data. The service metadata is provided for the user to perform data analysis and other operations. In the middleware module, the metadata implements a retrieval service of the resource. The metadata includes mapping rules, such as the mapping between the data source data description and the global schema data description, and the mapping between the user query statement and the query statements on the scientific and technological resources.
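The kinds of metadata listed above can be pictured with a minimal data-structure sketch. All class names, field names and the reject-rate helper are hypothetical and only illustrate how technical metadata, ETL process metadata and mapping rules might be kept together.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TechnicalMetadata:
    """Type and length of one stored attribute."""
    attribute: str
    data_type: str
    length: int


@dataclass
class EtlRunRecord:
    """Process metadata for one ETL run: date, loaded rows and rejected rows."""
    run_date: date
    rows_loaded: int
    rows_rejected: int


@dataclass
class MetadataCatalog:
    technical: dict = field(default_factory=dict)
    etl_runs: list = field(default_factory=list)
    # Mapping rules: data source field -> attribute of the global schema.
    mapping_rules: dict = field(default_factory=dict)

    def register(self, meta):
        self.technical[meta.attribute] = meta

    def log_run(self, run):
        self.etl_runs.append(run)

    def reject_rate(self):
        """Share of rejected rows over all recorded ETL runs."""
        loaded = sum(run.rows_loaded for run in self.etl_runs)
        rejected = sum(run.rows_rejected for run in self.etl_runs)
        total = loaded + rejected
        return rejected / total if total else 0.0


catalog = MetadataCatalog()
catalog.register(TechnicalMetadata("cpu", "TEXT", 64))
catalog.mapping_rules["cpu_model"] = "cpu"
catalog.log_run(EtlRunRecord(date(2021, 3, 8), rows_loaded=90, rows_rejected=10))
```

A catalog of this kind gives the administrator one place to compare what an ETL run was expected to load with what it actually loaded.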
The metadata management range comprises links of data generation, data storage, data processing and data presentation, a metadata management system of each link is respectively established, and unified management and effective specification of metadata are realized according to metadata standards. For metadata management, a data submission supervision mechanism, a data checking mechanism, a dynamic supervision management mechanism and a data updating long-acting mechanism are adopted.
As for the system management unit 5, the system management includes job management such as user query interface management, voice input control, and the like. Secondly, the storage space needs to be allocated and managed, such as user information storage, database data storage, metadata standard storage and the like. And meanwhile, the classification result of the scientific and technological resources needs to be checked. In addition, the system also comprises control management, the scientific and technological resource integration system needs to have stability and usability, and the task of an administrator is to carry out planning, running and detection on the operation of the ETL system, compare the difference between execution and plan, process error information, respond to faults and adjust the output of the system.
In addition, an embodiment of the present invention further provides an electronic device, which includes a processor, coupled to a memory, and configured to execute a program or instructions in the memory, so as to enable the electronic device to implement the method described in fig. 1.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method as described in fig. 1 above.
The scientific and technological resource integration method and device based on the data warehouse and the middleware provided by the invention are used for carrying out classification integration by adopting a data warehouse method and a middleware integration method according to the structural characteristics, the change rate, the updating speed and the user requirements of scientific and technological resource data. In the process of integrating scientific and technological resource data, the method simultaneously improves the system integration efficiency and enhances the data real-time performance, and realizes the aims of improving the management level of scientific and technological resources and promoting the integration and sharing of scientific and technological resources.
The scientific and technological resource integration method and device based on the data warehouse and the middleware provided by the invention are explained in detail above. It will be apparent to those skilled in the art that various modifications can be made without departing from the spirit of the invention.

Claims (9)

1. A scientific and technological resource integration method based on a data warehouse and a middleware is characterized by comprising the following steps:
acquiring scientific and technological resource data with different sources and different characteristics, and preprocessing the scientific and technological resource data;
classifying the preprocessed scientific and technological resource data into a source database corresponding to a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model;
integrating the scientific and technological resource data in the source database of the data warehouse method by adopting the data warehouse method and then loading the data into a data warehouse; and meanwhile, integrating the scientific and technological resource data in the source database of the middleware integration method by adopting the middleware integration method, and returning the data to the user interface for presentation in the format required by the user.
2. The method for integrating scientific and technological resources based on data warehouse and middleware as claimed in claim 1, wherein the step of acquiring scientific and technological resource data and preprocessing comprises the following steps:
marking the scientific and technological resource data according to a preset standard;
and if the scientific and technological resource data do not have corresponding standards for marking, classifying the scientific and technological resource data according to the classification basis of the scientific and technological resource elements.
3. A method for integrating scientific and technological resources based on data warehouses and middleware according to claim 1, wherein:
the pre-trained scientific and technological resource classification model is obtained by training through the following steps:
inputting a scientific and technological resource training data set into a scientific and technological resource classification model established by a support vector machine for training to obtain optimal parameters of the scientific and technological resource classification model;
and testing the classification precision of the trained scientific and technological resource classification model.
4. A method for integrating scientific and technological resources based on data warehouses and middleware according to claim 3, wherein:
each scientific and technological resource data input into the scientific and technological resource classification model is represented in the form of a triple [ f1, f2, n ], wherein f1 represents a data change rate, f2 represents a data update rate, and n represents a user demand.
5. A method for integrating scientific and technological resources based on data warehouses and middleware according to claim 4, wherein:
and respectively carrying out normalization processing on the data change rate f1, the data update rate f2 and the user demand n of each scientific and technological resource data, inputting the normalized data into the scientific and technological resource classification model to obtain a corresponding label, and classifying the scientific and technological resource data into a source database corresponding to the data warehouse method or the middleware integration method according to the label.
6. A method for integrating scientific and technological resources based on data warehouses and middleware according to claim 1, wherein:
the data warehouse method is adopted to integrate each scientific and technological resource data in the source database, and specifically comprises the following steps:
extracting interested attribute information related to each scientific and technological resource data in a source database of the data warehouse method, and storing the attribute information into a temporary database;
cleaning scientific and technological resource data which are extracted from the temporary database;
converting the attribute information and the storage format of the scientific and technological resource data subjected to data cleaning into the attribute information and the storage format consistent with those of a target database;
and mapping the scientific and technological resource data which are subjected to data conversion in the temporary database, and loading the data into a data warehouse.
7. A method for integrating scientific and technological resources based on data warehouses and middleware according to claim 1, wherein:
when the middleware integration method is adopted to integrate scientific and technological resource data in a source database, the method specifically comprises the following steps:
extracting data source description information from scientific and technological resource data in a source database of a middleware integration method by adopting a wrapper, and establishing a mapping relation between the data source description information and an intermediary mode;
according to the metadata standard, carrying out syntactic analysis on the query statement of the user, checking the correctness and the legality of the statement, and decomposing a reasonable user query statement into sub query statements aiming at different scientific and technological resource data;
and integrating and optimizing the inquiry results of the scientific and technological resources, and returning the inquiry results to the user interface in a user demand format for presentation.
8. A scientific and technological resource integration device based on a data warehouse and a middleware is characterized by comprising a data acquisition unit, a data operation and storage unit, a man-machine interaction unit, a metadata management unit and a system management unit;
the data acquisition unit is used for acquiring scientific and technological resource data with different sources and different characteristics and preprocessing the scientific and technological resource data;
the data operation and storage unit is used for classifying the preprocessed scientific and technological resource data into a source database corresponding to a data warehouse method or a middleware integration method by using a pre-trained scientific and technological resource classification model, and integrating the corresponding scientific and technological resource data by adopting the data warehouse method and the middleware integration method;
the human-computer interaction unit is used for registering, authenticating, logging in, authenticating identity and evaluating credit for the user, entering a query interface after meeting the safety requirement, and returning the query result to the user interface for presentation according to the user requirement format;
the metadata management unit is used for uniformly managing and effectively standardizing the metadata according to metadata standards;
and the system management unit is used for controlling management, job management, allocation and management of storage space and inspection of scientific and technological resource classification results.
9. A technology resource integration apparatus based on data warehouse and middleware as claimed in claim 8, wherein:
the data operation and storage unit comprises a classifier, a data warehouse module and a middleware module;
the classifier is used for classifying the preprocessed scientific and technological resource data into the source database corresponding to the data warehouse module or the middleware module by using a pre-trained scientific and technological resource classification model;
the data warehouse module and the middleware module are used for integrating corresponding scientific and technological resource data.
CN202110251114.3A 2021-03-08 2021-03-08 Scientific and technological resource integration method and device based on data warehouse and middleware Pending CN112966162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110251114.3A CN112966162A (en) 2021-03-08 2021-03-08 Scientific and technological resource integration method and device based on data warehouse and middleware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110251114.3A CN112966162A (en) 2021-03-08 2021-03-08 Scientific and technological resource integration method and device based on data warehouse and middleware

Publications (1)

Publication Number Publication Date
CN112966162A true CN112966162A (en) 2021-06-15

Family

ID=76276996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110251114.3A Pending CN112966162A (en) 2021-03-08 2021-03-08 Scientific and technological resource integration method and device based on data warehouse and middleware

Country Status (1)

Country Link
CN (1) CN112966162A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537927A (en) * 2021-06-28 2021-10-22 北京航空航天大学 Scientific and technological resource service platform transaction coordination system and method
CN113590722A (en) * 2021-07-01 2021-11-02 南京玄策智能科技有限公司 Digital rural operation and maintenance knowledge base platform based on edge intelligence and updating method
CN116092682A (en) * 2023-04-11 2023-05-09 中大体育产业集团股份有限公司 File management method and system for body measurement data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368588A (en) * 2017-07-24 2017-11-21 人教数字出版有限公司 A kind of heterogeneous resource Homogeneous method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
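Li Peng (李鹏): "Research on Key Technologies of Multi-Source Heterogeneous Data Integration for Geological Exploration" (面向地质勘查的多源异构数据集成关键技术研究), China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》) *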

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210615