CN111259006A - Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system

Info

Publication number
CN111259006A
Authority
CN
China
Prior art keywords
data
relational
metadata
service
management
Legal status
Granted
Application number
CN202010020974.1A
Other languages
Chinese (zh)
Other versions
CN111259006B (en)
Inventor
Liu Feng (刘峰)
Zhou Yuanchun (周园春)
Han Fang (韩芳)
Shen Zhihong (沈志宏)
Xia Jinglong (夏景隆)
Current Assignee
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Application filed by Computer Network Information Center of CAS
Publication of CN111259006A
Application granted
Publication of CN111259006B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a general-purpose method and system for the integrated physical aggregation, organization, release and service of distributed heterogeneous data. The method comprises the following steps: 1) registering common basic data at the central end; 2) aggregating, transmitting and synchronizing distributed heterogeneous data from the distribution ends to the central end; 3) building databases from, organizing and editing the aggregated data resources at the central end; 4) publishing and auditing data resources uniformly at the central end; and 5) providing integrated sharing services for the data resources at the central end. The invention achieves efficient aggregation, transmission and synchronization of distributed heterogeneous entity data; centralized database building, organization management and unified publication of data resources; and integrated sharing of multiple forms of data publication services through a data resource portal. Because the design is integrated and generally customizable, the aggregation, management, publication and service processes are connected end to end and are highly customizable and reusable, which greatly improves the generality and flexibility of data service encapsulation.

Description

Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system
Technical Field
The invention relates to the field of data management and sharing services, and in particular to a general-purpose method and system for the integrated physical aggregation, organization, release and service of distributed heterogeneous data, with which a user can carry out physical aggregation and transmission, organization and release, and integrated sharing services of heterogeneous data in a unified way.
Background
With the rapid development of cloud computing, big data and artificial intelligence, large volumes of data resources of many types are being generated in every field. Their importance is widely recognized across society, and data has been raised to the level of an important national strategic resource. At the same time, growing demand for open access and data sharing means that more and more data resources must be made available for open, shared use. Driven by information infrastructure programs in China and abroad, information (data) resource sharing service platforms keep emerging in many fields.
Traditional data sharing service platforms mostly organize shared resources as data sets consisting only of metadata and data files. Relational tables, the most common storage form for structured data, are usually served as spreadsheet files (such as Excel or CSV) or shared as bare data tables, without integrated data organization or metadata description. The main shortcomings are as follows:
(1) Heterogeneous data resources (relational and file-based) cannot be shared through a unified service, and entity data are offered only as files. This forfeits the advantages of online services over relational structured data, of fused services linking relational data with file data, and of services that relate relational tables to one another.
(2) Traditional distributed data aggregation and collection is mainly file-based and does not support remote transmission, aggregation and synchronized management of relational data.
(3) Existing platform systems support only a limited subset of services, or services for only some stages of the process. Most are designed and developed for a specific construction project and lack a customizable, generalized, decoupled design, which lowers development and implementation efficiency, causes a large amount of duplicated work and raises research and development costs.
(4) Shared data organization models lack internationally recognized unique identifiers and standardized data citation.
(5) In terms of service forms, existing platforms lack: full-text retrieval over the content of entity data files; customizable retrieval over all fields of relational data; integrated fusion services for relational data (such as links to files, images and videos, to data sub-tables, to various URL displays and to enumeration lists); multiple association-based recommendation modes for data sets; API encapsulation services for data resources; user-oriented personalized services; and internationalization support for the platform.
Disclosure of Invention
To address these shortcomings in distributed data management and sharing services, the invention provides a general-purpose method and system design for the integrated physical aggregation (centralized aggregation, storage and organization of entity data), organization, release and service of distributed heterogeneous data.
The technical solution adopted by the invention is as follows:
a general-purpose method for the integrated physical aggregation, organization, release and service of distributed heterogeneous data comprises the following steps:
1) registering common basic data at the central end, including registration of the distribution ends' data nodes, metadata extension elements, classification systems and license agreements;
2) aggregating, transmitting and synchronizing distributed heterogeneous data from the distribution ends to the central end;
3) building databases from, organizing and editing the aggregated data resources at the central end;
4) publishing and auditing data resources uniformly at the central end;
5) providing integrated sharing services for the data resources at the central end.
Further, data node registration manages the registration of each distribution end's data node information and node administrator authentication information;
metadata extension element registration supports customized configuration management of extension metadata items, whose configuration items comprise: Chinese name, English name, field type, whether mandatory, whether repeatable, sequence number and remarks;
classification system registration supports registering, editing and deleting tree-structured data classification systems; classification information comprises the classification name, classification code and classification description, and a user can add, edit, insert and delete any node of a tree classification system;
license agreement registration supports standard license agreements as well as registering, editing and deleting custom license content; the registration information comprises the agreement identification code, agreement name, agreement logo image and agreement description text.
Further, the aggregation, transmission and synchronization of distributed heterogeneous data comprises:
2.1) registering heterogeneous data sources, with unified registration and connection management of relational and file-type data sources;
2.2) constructing data transmission tasks, both relational data tasks and file-type data tasks;
2.3) managing the running of transmission tasks, so that the data tasks of a distribution end are transmitted to the central end remotely, efficiently and reliably;
2.4) managing relational data synchronization, by periodically synchronizing every record of a relational table or logical table in a distribution end's transmission task to the corresponding relational table at the central end.
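Step 2.4 does not specify how records are merged at the central end. A minimal sketch, assuming every synchronized table has a primary key and that a scheduler (not shown) calls the function periodically, is an upsert: new rows are inserted and rows whose key already exists are overwritten. Both ends are stood in for by SQLite connections (the `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or later); the function name `sync_table` and the example schema are hypothetical.

```python
import sqlite3

def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, columns: list[str], key: str) -> int:
    """Upsert every row of `table` from the distribution-end copy (src) into the
    central-end copy (dst). Returns the number of rows synchronized.

    Table and column names are assumed to come from the registered task
    configuration, not from user input, because identifiers cannot be bound
    as SQL parameters."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    non_key = [c for c in columns if c != key]
    if non_key:
        # Overwrite existing rows with the incoming values.
        action = "DO UPDATE SET " + ", ".join(f"{c} = excluded.{c}" for c in non_key)
    else:
        action = "DO NOTHING"
    upsert = (f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
              f"ON CONFLICT({key}) {action}")
    rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()
    with dst:  # commits on success, rolls back if any row fails
        dst.executemany(upsert, rows)
    return len(rows)
```

Deleted source rows are not propagated in this sketch; handling them would need, for example, a deletion log or a key comparison between the two tables.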
Further, building databases from, organizing and editing the aggregated data resources comprises:
3.1) creating relational tables, either by importing an Excel template to create a new table or by associating an existing relational data table and describing it;
3.2) describing the structure of relational tables and configuring field fusion; the structure description covers the relational table name and its field names; field fusion is configured by setting the display type of a field, which may be text, URL, enumeration, sub-table or file;
3.3) managing the data of all relational tables at the central end, with viewing, adding, editing and deleting of records;
3.4) managing file-type data, with network-drive-style management of all data files and directories at the central end.
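Step 3.2 attaches a display type to each field. One way to picture the effect is a function that turns a stored cell value into a display instruction according to that type. This is a sketch only: `FieldConfig`, `render` and the output dictionaries are hypothetical, and the URL template, enumeration labels, sub-table name and file root are assumed configuration values.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class FieldConfig:
    name: str
    display_type: str                                 # "text", "url", "enum", "subtable" or "file"
    enum_labels: dict = field(default_factory=dict)   # stored code -> display label
    url_template: str = "{value}"                     # e.g. "https://doi.org/{value}"
    subtable: str = ""                                # name of the linked sub-table
    file_root: str = ""                               # base directory in the central file store

def render(cfg: FieldConfig, value: Any) -> dict:
    """Turn a stored cell value into a display instruction for the portal."""
    v = "" if value is None else str(value)
    if cfg.display_type == "text":
        return {"kind": "text", "text": v}
    if cfg.display_type == "url":
        return {"kind": "link", "href": cfg.url_template.format(value=v), "text": v}
    if cfg.display_type == "enum":
        return {"kind": "text", "text": cfg.enum_labels.get(v, v)}  # fall back to raw code
    if cfg.display_type == "subtable":
        # The value is treated as the key of the related rows in the sub-table.
        return {"kind": "subtable", "table": cfg.subtable, "key": v}
    if cfg.display_type == "file":
        return {"kind": "file", "path": f"{cfg.file_root.rstrip('/')}/{v}"}
    raise ValueError(f"unknown display type: {cfg.display_type}")
```

Keeping the display decision in configuration rather than in table-specific code is what lets the same portal page serve any registered table.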
Further, the unified publication and auditing of data resources comprises:
4.1) filling in data set metadata online one record at a time or in batches, with the form generated dynamically from the built-in and extended metadata;
4.2) selecting entity data from the central end's relational tables and file system: online relational entity data tables, entity data files from the file directory system, or files uploaded immediately online;
4.3) editing, submitting and publishing the data set;
4.4) auditing the content of a data set awaiting publication, chiefly checking that the metadata have been filled in correctly and that the entity data are accurate, and selecting the range of users authorized to access the data set.
Further, the integrated sharing service of data resources comprises:
5.1) data set retrieval, with two retrieval modes, keyword search and classification navigation, and API encapsulation of each retrieval mode;
5.2) data set filtering and sorting, with a tag cloud of data resources, multi-condition step-by-step filtering based on it, and re-sorting of retrieval results under multiple conditions;
5.3) data set access and evaluation, including online browsing, playback and display of typical entity data files; online customized queries over relational entity data, with result download and fused, integrated display; full-text retrieval of text entity files; online metadata download and packaged API access services; and data social services;
5.4) data set recommendation, with recommendation services based on correlation of data set metadata content and on statistics of user access behavior;
5.5) data set service logging and statistics, with complete log management of user data access behavior and statistics and display of data set access and download counts;
5.6) personalized user services, including display of a user's access and download history, and management of the user's favorites, ratings and tags.
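Step 5.4 names two recommendation sources without fixing the similarity measure. As an illustration only, the sketch below uses Jaccard similarity between data sets' keyword sets for metadata correlation, and co-access counts over user sessions for behavior-based recommendation; both measures, and all function names, are assumptions rather than the method's prescribed algorithm.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_metadata(target: str, keywords: dict, k: int = 3) -> list:
    """Data sets whose metadata keywords overlap most with the target's."""
    scored = [(jaccard(keywords[target], kw), ds)
              for ds, kw in keywords.items() if ds != target]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [ds for score, ds in scored[:k] if score > 0]

def recommend_by_behavior(target: str, sessions: list, k: int = 3) -> list:
    """Data sets most often accessed in the same sessions as the target."""
    co_access = Counter()
    for visited in sessions:
        if target in visited:
            co_access.update(visited - {target})
    ranked = sorted(co_access.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ds for ds, _ in ranked[:k]]
```

The sessions could, for example, be derived from the access logs kept in step 5.5 by grouping each user's accesses per day; that grouping is likewise an assumption.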
A general-purpose system for the integrated physical aggregation, organization, release and service of distributed heterogeneous data comprises a central end and distribution ends. Each distribution end runs a data aggregation and transmission software module; the central end runs a data management and publication software module and a data sharing and service portal module, and a common basic data registration and service sub-module is integrated into the data management and publication software module;
the data aggregation and transmission software module is responsible for aggregating, transmitting and synchronizing distributed heterogeneous data from the distribution end to the central end;
the data management and publication software module is responsible for registering common basic data, for building databases from, organizing and editing the aggregated data resources, and for publishing and auditing data resources uniformly;
the data sharing and service portal module is responsible for the integrated sharing services of data resources.
The key innovations of the invention include:
1) A general-purpose method and system design for the integrated physical aggregation, organization and publication, and integrated fusion services of distributed heterogeneous (relational and file-type) data. The framework is easy to extend: users can add other relational data sources they need. For files, the invention implements local file system and FTP data sources, and users can add others such as Samba shares. Users can also add NoSQL data sources themselves, such as MongoDB.
2) The whole workflow of physical aggregation, organization, publication and integrated service of heterogeneous data resources (with particular support for relational data) is decoupled, and the design fully accounts for high customizability and reusability. This improves the generality and flexibility of the method and makes it applicable to a wide range of scenarios: users can aggregate, publish and serve distributed data effectively through configuration alone, which makes designing and developing distributed data sharing service systems more efficient and shortens software development cycles.
3) Customizable remote transmission, aggregation and synchronization management of relational data.
4) Full-text retrieval over the content of text entity data files, and customizable retrieval over all fields of relational data.
5) Fusion configuration and services across heterogeneous data resources (such as links to files, images and videos, to data sub-tables, to various URL displays and to enumeration lists).
6) Effective integration of a range of advanced data service functions that help users quickly discover, obtain, share and use data resources, in line with international practice: multiple retrieval modes and multiple association-based recommendation modes for data sets, step-by-step filtering and sorting through a tag cloud, automatic API packaging of data resources, user-oriented personalized services, bilingual platform support, unique identification and standardized data citation, and customizable data license agreements.
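Item 4 (and step 5.3) mentions full-text retrieval over text entity files without describing the index. The classic structure for this is an inverted index mapping each term to the files that contain it; the sketch below is a minimal in-memory version with AND semantics for multi-word queries. Its `\w+` tokenizer does not segment Chinese text, so a production system would need a Chinese word segmenter or a search engine such as Lucene or Elasticsearch; the class name is hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")

class InvertedIndex:
    """Term -> set of file ids; queries return files containing every term."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, file_id: str, text: str) -> None:
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(file_id)

    def search(self, query: str) -> set:
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never modified.
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```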
The invention has the following beneficial effects:
it achieves efficient aggregation, transmission and synchronization of distributed heterogeneous entity data (file-type and relational), centralized database building, organization management and unified publication of data resources, and finally integrated sharing of multiple forms of data publication services through a data resource portal. (The data set is the publication and organization model; it comprises three parts, a PID, metadata and entity data, where the PID is a persistent identifier of the data object, i.e. an internationally recognized globally unique identifier such as a Handle or DOI.)
Drawings
FIG. 1 is a diagram of the overall functional logic framework of the invention.
FIG. 2 is a diagram of the overall method steps of the invention and their relationships.
FIG. 3 is a structural diagram of the detailed process for common basic data registration.
FIG. 4 is a structural diagram of the detailed process for aggregation, transmission and synchronization of distributed heterogeneous data.
FIG. 5 is a prototype interface diagram for adding a relational data source.
FIG. 6 is a prototype interface diagram for constructing a relational data task.
FIG. 7 is a prototype interface diagram for constructing a file-type data task.
FIG. 8 is a structural diagram of the detailed process for centralized database building, organization and editing of data resources.
FIG. 9 is a diagram of the data template used to create a table by import.
FIG. 10 is a prototype interface diagram for creating a new table by import.
FIG. 11 is a prototype interface diagram for creating a new table by association.
FIG. 12 is a prototype interface diagram for configuring relational table descriptions and field fusion.
FIG. 13 is a prototype interface diagram for relational table data management.
FIG. 14 is a prototype interface diagram for file data management.
FIG. 15 is a structural diagram of the detailed process for unified publication and auditing of data resources.
FIG. 16 is a sample of online filling of data set metadata.
FIG. 17 is a sample of a data set's PID and citation requirements.
FIG. 18 is a sample of data set entity data selection.
FIG. 19 is a structural flow chart of the integrated data resource sharing service.
FIG. 20 is a diagram of the overall system software architecture of the invention.
FIG. 21 is a diagram of the system software deployment of the invention.
Detailed Description
To make the above objects, features and advantages of the invention clearer and easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The overall functional logic framework of the invention is shown in FIG. 1, and the overall method steps and their relationships are shown in FIG. 2. The overall process is divided into five major steps (or subsystems): 1. common basic data registration management; 2. aggregation, transmission and synchronization of distributed heterogeneous data; 3. database building, organization and editing of data resources; 4. unified publication and auditing of data; and 5. integrated sharing services for data resources.
Step 1 can be regarded as the initialization of the whole invention and mainly completes the registration of common basic data. Step 2 performs physical aggregation and transmission of distributed heterogeneous data resources and synchronization management of relational data. Step 3 performs database building, organizational description and editing management of the aggregated heterogeneous data. Step 4 performs unified publication and organization of data together with audit and authorization management. Step 5 provides integrated sharing services and management for the (published) data resources. Step 2 is carried out at the distribution ends and all other steps at the central end. The detailed flow and functions of each step are explained below.
1. Common basic data registration
This step registers common basic data in parallel, including registration management of basic operational data such as data resource nodes, metadata extension elements, classification systems and license agreements. It is used by the system administrator, who must first pass the system's identity authentication.
The main flow of this step is shown in FIG. 3; the implementation details of each part are described below.
1.1 Data resource node registration
This registers and manages the data node information of each distribution end and the authentication information of its node administrator. Specifically, it covers registering, filling in, editing and managing attributes such as the data node name, node code, node introduction, node contact person, contact telephone, email, node administrator account, node administrator password, data node creation time and sequence number.
The node administrator account and password are used when the distribution end deploys and runs step 2 (aggregation, transmission and synchronization of distributed heterogeneous data): the node administrator is authenticated when the distribution end starts. In addition, a vsftpd service is installed, deployed and started at the central end, and an FTP account is initially created with the same account name and password. At the lowest layer the system transmits data remotely over FTP, which the invention regards as more efficient and stable than conventional HTTP transfer and which makes resumable (breakpoint) transfer easy to implement. This also lets distributed data nodes use any third-party FTP tool to transfer files automatically with that account and password, giving broad compatibility with transfer tools. Data resource node registration is thus one embodiment of the universal customizability of the invention.
In practice, file-type entity data can be transmitted through transmission task operation management (section 2.3 below), and relational entity data must be transmitted that way. Because the central end uses a standard vsftpd service, however, users need not use the aggregation and transmission tool of part 2 for file-type entity files: they can use any third-party FTP client, log in directly with the FTP account and password given in the node information, and transfer files with full compatibility.
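Resumable (breakpoint) FTP upload can be done with the standard FTP `REST` command: ask the server how many bytes of the file it already holds, seek the local file to that offset and continue. A minimal sketch with Python's `ftplib`, assuming the server supports `SIZE` and honours `REST` before `STOR` (check the vsftpd configuration); the function names are hypothetical and this is not the invention's transfer code.

```python
import ftplib
import os
from typing import Optional

def resume_offset(local_size: int, remote_size: Optional[int]) -> int:
    """Byte offset at which to continue an upload.

    Start from 0 when the server has no copy yet, or when its copy is larger
    than the local file (it cannot be a partial upload of this file)."""
    if remote_size is None or remote_size > local_size:
        return 0
    return remote_size

def upload_with_resume(host: str, user: str, password: str,
                       local_path: str, remote_name: str) -> None:
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)      # FTP account created at node registration
        ftp.voidcmd("TYPE I")          # binary mode; many servers refuse SIZE in ASCII mode
        try:
            remote_size = ftp.size(remote_name)
        except ftplib.error_perm:      # e.g. 550: file not on the server yet
            remote_size = None
        local_size = os.path.getsize(local_path)
        offset = resume_offset(local_size, remote_size)
        if remote_size is not None and offset == local_size:
            return                     # remote copy already complete (by size)
        with open(local_path, "rb") as fh:
            fh.seek(offset)
            # With rest set, ftplib sends "REST <offset>" before "STOR".
            ftp.storbinary(f"STOR {remote_name}", fh, rest=offset or None)
```

Comparing sizes only detects truncated uploads; verifying content would additionally need a checksum agreed between the two ends.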
1.2 Metadata extension element registration
This supports customized configuration management of extension metadata items, including adding, editing and querying them. The configuration items of each metadata element are: Chinese name, English name, field type, whether mandatory, whether repeatable, sequence number and remarks.
1) Metadata extension element registration lets users extend the metadata structure with their own elements, and is one embodiment of the universal customizability of the invention.
2) The data set of the invention has built-in core metadata elements (extension elements are defined relative to these built-in core elements), as follows:
Table 1. Core metadata elements built into the data set of the invention
[Table 1 is reproduced as an image in the source publication (image BDA0002360741420000071).]
3) In the table above, "mandatory" indicates that the element must be filled in, and "uniqueness" indicates whether the element may be filled in more than once. The field type determines the input control shown for the element when metadata are entered later, such as a single-line text box, multi-line text box, date picker, drop-down list or upload control, and field types are highly customizable. The verification rule of a metadata element defines a basic format check; users can define their own rules, for example as regular expressions, against which input is then validated.
4) The Chinese and English name elements in the table are closely tied to the English-language data publication and English-language portal supported by the invention; this is described further under metadata filling in data resource publication below.
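The configuration items of a metadata element (mandatory, repeatable, sequence number, regular-expression verification rule) map naturally onto a small record type plus a validator. The sketch below is one possible shape under that assumption; the class, field names and field-type strings are hypothetical, and the same check could back both online entry (step 4.1) and the audit in step 4.4.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetadataElement:
    name_zh: str                 # Chinese name (shown in the Chinese portal)
    name_en: str                 # English name, also used here as the record key
    field_type: str              # selects the input control, e.g. "single_line", "date"
    required: bool = False       # mandatory item
    repeatable: bool = False     # may hold more than one value
    order: int = 0               # sequence number on the entry form
    rule: Optional[str] = None   # verification rule as a regular expression
    remark: str = ""

def validate(elements: list, record: dict) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    errors = []
    for el in sorted(elements, key=lambda e: e.order):
        raw = record.get(el.name_en)
        # Normalize to a list of values so single and repeated fields are handled alike.
        if isinstance(raw, list):
            values = raw
        elif raw is None or raw == "":
            values = []
        else:
            values = [raw]
        if el.required and not values:
            errors.append(f"{el.name_en}: required")
        if not el.repeatable and len(values) > 1:
            errors.append(f"{el.name_en}: only one value allowed")
        if el.rule:
            errors += [f"{el.name_en}: {v!r} does not match the rule"
                       for v in values if not re.fullmatch(el.rule, str(v))]
    return errors
```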
1.3 Data classification system registration
This supports registering, editing and deleting tree-structured data classification systems. Classification information includes, but is not limited to, the classification name, classification code and classification description, and a user can add, edit, insert and delete any node of a tree classification system.
The data classification system supports custom extension to multiple levels. It is closely tied to classification-navigation retrieval of data sets in the integrated data resource sharing service: the link is made through the built-in metadata element "classification code", which the publishing user selects when publishing data. Data classification system registration is one embodiment of the universal customizability of the invention.
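A tree classification system with add, insert, edit and delete operations, plus the subtree lookup that classification navigation needs (a data set filed under a child class should also appear when browsing its parent), might be sketched as follows. The class and method names are hypothetical; deleting a node here removes its whole subtree, which is only one possible policy.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class ClassNode:
    code: str
    name: str
    description: str = ""
    children: list = field(default_factory=list)

class Taxonomy:
    def __init__(self) -> None:
        self.root = ClassNode("", "root")   # invisible top of the tree
        self._nodes = {"": self.root}       # code -> node
        self._parent = {}                   # code -> parent code

    def get(self, code: str) -> Optional[ClassNode]:
        return self._nodes.get(code)

    def add(self, parent_code: str, code: str, name: str,
            description: str = "", position: Optional[int] = None) -> ClassNode:
        """Append a child (position=None) or insert it at a given sibling position."""
        if code in self._nodes:
            raise ValueError(f"duplicate classification code: {code}")
        node = ClassNode(code, name, description)
        siblings = self._nodes[parent_code].children
        siblings.insert(len(siblings) if position is None else position, node)
        self._nodes[code] = node
        self._parent[code] = parent_code
        return node

    def edit(self, code: str, name: Optional[str] = None,
             description: Optional[str] = None) -> None:
        node = self._nodes[code]
        if name is not None:
            node.name = name
        if description is not None:
            node.description = description

    def subtree_codes(self, code: str) -> list:
        """Codes of a node and all its descendants (used for navigation queries)."""
        stack, codes = [self._nodes[code]], []
        while stack:
            node = stack.pop()
            codes.append(node.code)
            stack.extend(node.children)
        return codes

    def delete(self, code: str) -> None:
        """Remove a node together with its whole subtree."""
        parent = self._nodes[self._parent[code]]
        parent.children = [c for c in parent.children if c.code != code]
        for c in self.subtree_codes(code):
            del self._nodes[c]
            self._parent.pop(c, None)
```

A navigation query for class `A` would then select data sets whose classification code is in `subtree_codes("A")`.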
1.4 License agreement registration
This supports standard license agreements such as CC, ODC and PDDL, as well as registering, editing and deleting custom license content. The main registration fields are the agreement identification code, agreement name, agreement logo image and agreement description text.
A license agreement governs how data may be obtained, reused and redistributed. Registered license agreements are linked to the data set detail display in the integrated data resource sharing service through the built-in metadata element "license agreement" described above, which the publishing user selects when publishing data. License agreement registration is also an embodiment of the universal customizability of the invention.
2. Aggregation, transmission and synchronization of distributed heterogeneous data
This step provides unified registration and connection management of relational and file data sources; supports building customized data transmission tasks to physically aggregate heterogeneous data; and supports resumable transmission tasks, customizable scheduled, automatic and manual synchronization of relational data, and logging of the whole transmission and synchronization process.
This step is deployed on the data nodes of the distribution end for use by the node administrator, who must first pass identity authentication.
The main flow is shown in FIG. 4; the implementation details of each part are described below.
2.1 Heterogeneous data source registration
This provides unified registration and connection management of relational and file-type data sources.
Relational data sources: registration of database connection information and connection testing are supported. The data source information includes at least the data source name, database type, host address, port number, user name and password. Supported database types include at least mainstream relational databases such as MySQL, Oracle and SQL Server, and other relational databases can be added. The prototype interface for adding a relational data source is shown in FIG. 5.
File type data source: the definition and management of address information of file type data storage is supported. The data source information at least comprises a data source name and a file access protocol (when the access protocol is a local file system, the subsequent information needs to comprise data file path information, and when the access protocol is FTP, the subsequent information needs to comprise an FTP account, an FTP password, FTP path information of a data file and the like); supports the extension of Samba and other protocols.
In the implementation of the method, both the relational data source and the file type data source need to realize connectivity test, and the validity of the registration information of the data source is ensured. The connectivity test can be carried out when the data source information is stored, and the registered user needs to be fed back in time when the problem that the communication cannot be carried out occurs.
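The registration-plus-connectivity check can be sketched in Python as follows. This is a minimal illustration, not the system's code: the `RelationalSource` fields mirror the registration items listed above, and the check only confirms that a TCP connection to host:port can be opened; a complete test would also log in with the driver for the chosen database type.

```python
import socket
from dataclasses import dataclass


@dataclass
class RelationalSource:
    """Registration record for a relational data source (field names are illustrative)."""
    name: str
    db_type: str      # e.g. "mysql", "oracle", "sqlserver"
    host: str
    port: int
    user: str
    password: str


def check_connectivity(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try to open a TCP connection and report the outcome as (ok, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "reachable"
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        return False, f"cannot connect to {host}:{port}: {exc}"


def register(source: RelationalSource, registry: dict) -> str:
    """Save the source only if it is reachable; otherwise report the problem at once."""
    ok, msg = check_connectivity(source.host, source.port)
    if not ok:
        raise ConnectionError(msg)
    registry[source.name] = source
    return msg
```

Raising on failure lets the registration form show the error message to the user immediately instead of storing an unusable source.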
Data source registration is the basis for hiding the heterogeneity of data resources. When subsequent transfer tasks move data, reading relational table structures and data is implemented by adapting each database type to standard SQL, while file data is read directly from the files.
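One way to picture the per-database adaptation is a small dialect table that turns a generic "read the first n rows of these columns" request into each database's SQL. The sketch below is hypothetical (the `DIALECTS` table and function names are not from the source) and covers only identifier quoting and row limiting; the Oracle `FETCH FIRST` form assumes Oracle 12c or later.

```python
# Hypothetical dialect table: identifier quoting and row-limiting style per database type.
DIALECTS = {
    "mysql":     {"quote": ("`", "`"), "limit": "limit"},   # SELECT ... LIMIT n
    "sqlserver": {"quote": ("[", "]"), "limit": "top"},     # SELECT TOP n ...
    "oracle":    {"quote": ('"', '"'), "limit": "fetch"},   # ... FETCH FIRST n ROWS ONLY
}


def quote_ident(db_type: str, name: str) -> str:
    """Quote an identifier, escaping the closing quote by doubling it."""
    left, right = DIALECTS[db_type]["quote"]
    return left + name.replace(right, right + right) + right


def preview_sql(db_type: str, table: str, columns: list[str], n: int) -> str:
    """Build a 'first n rows' query in the syntax of the given database type."""
    cols = ", ".join(quote_ident(db_type, c) for c in columns)
    tbl = quote_ident(db_type, table)
    style = DIALECTS[db_type]["limit"]
    if style == "top":
        return f"SELECT TOP {int(n)} {cols} FROM {tbl}"
    if style == "fetch":
        return f"SELECT {cols} FROM {tbl} FETCH FIRST {int(n)} ROWS ONLY"
    return f"SELECT {cols} FROM {tbl} LIMIT {int(n)}"
```

Supporting another database then amounts to adding one entry to the table. Note that quoted identifiers are case-sensitive in Oracle, so real table names would be passed in the case the catalog reports.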
2.2 Data transfer task construction
Construction, editing, viewing and deletion of relational data tasks and file data tasks are supported.
Relational data task construction: the relevant data tables are obtained by connecting to a previously registered relational data source, and either a physical (entity) data table or a logical data table defined by an SQL statement is selected to form a data transfer task. The prototype interface is shown in fig. 6.
File data task construction: the relevant file directory tree is obtained by connecting to a previously registered file data source, the relevant files or directories are selected, and a target directory at the central end is chosen to form a file data transfer task. The prototype interface is shown in fig. 7.
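The two task kinds can be modeled as simple records. The class and field names below are illustrative assumptions; the one rule enforced, that a relational task names either a physical table or a defining SQL statement but not both, follows from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationalTask:
    source: str                  # name of a registered relational data source
    table: Optional[str] = None  # a physical (entity) table ...
    sql: Optional[str] = None    # ... or a SELECT defining a logical table

    def __post_init__(self):
        if (self.table is None) == (self.sql is None):
            raise ValueError("specify exactly one of 'table' or 'sql'")


@dataclass
class FileTask:
    source: str         # name of a registered file data source
    paths: list[str]    # files or directories selected at the distribution end
    target_dir: str     # destination directory at the central end

    def __post_init__(self):
        if not self.paths:
            raise ValueError("select at least one file or directory")
```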
2.3 Transfer task execution management
Data tasks at the distribution end are transferred to the central end remotely, efficiently and reliably:
Breakpoint resume of data transfer tasks
Encrypted, compressed data transfer
Display of transfer progress
Log management covering the entire transfer process
As described earlier, entity data files are transferred over FTP through the Vsftp service at the central end, which is fully compatible with third-party FTP tools.
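Breakpoint resume over FTP can be sketched with Python's standard `ftplib`: ask the server how many bytes of the file it already holds, skip that many bytes locally, and append the rest. This is a minimal sketch under stated assumptions: the server supports the `SIZE` and `APPE` commands, the partial remote file is a prefix of the local one, and encryption and compression are left out.

```python
import os
from ftplib import error_perm


def remote_size(ftp, name: str) -> int:
    """Bytes already stored on the server for `name`, or 0 if the file does not exist."""
    try:
        ftp.voidcmd("TYPE I")          # SIZE is only meaningful in binary mode
        return ftp.size(name) or 0
    except error_perm:                 # e.g. "550 No such file"
        return 0


def resume_upload(ftp, local_path: str, name: str, blocksize: int = 64 * 1024) -> int:
    """Upload only the missing tail of `local_path`; return the number of bytes sent."""
    local_size = os.path.getsize(local_path)
    offset = remote_size(ftp, name)
    if offset > local_size:
        raise ValueError("remote file is larger than the local file; cannot resume")
    if offset == local_size:
        return 0                       # already complete
    with open(local_path, "rb") as f:
        f.seek(offset)                 # skip what the server already has
        # APPE appends to the partial remote file (and creates it when absent)
        ftp.storbinary(f"APPE {name}", f, blocksize)
    return local_size - offset
```

With a real connection, `ftp` would be an `ftplib.FTP` object (or `ftplib.FTP_TLS` for an encrypted channel) after `login()`.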
For relational entity data, the central end runs a relational database cluster of one type, e.g. MySQL. Table structures and data from the various relational databases at the distribution end (MySQL, Oracle, SQL Server, etc.) are extracted and mapped into table-creation and data-insertion SQL statements that match the structure of the central database. These statements are packaged and transmitted to the central end as a compressed file; after decompression, the central end executes them in its cloud relational database, completing the remote transfer of the relational tables and their data.
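The mapping-and-packaging step might look like the sketch below. The type mapping table is a hypothetical excerpt (a real deployment needs a complete, reviewed mapping per source database), values are rendered as MySQL literals by doubling single quotes and escaping backslashes, and `gzip` stands in for whatever compressed file format is actually used.

```python
import gzip

# Hypothetical excerpt of a source-type -> central MySQL type mapping.
TYPE_MAP = {
    ("oracle", "NUMBER"): "DECIMAL(38,10)",
    ("oracle", "VARCHAR2"): "VARCHAR(255)",
    ("oracle", "DATE"): "DATETIME",
    ("sqlserver", "NVARCHAR"): "VARCHAR(255)",
    ("sqlserver", "DATETIME2"): "DATETIME",
    ("sqlserver", "BIT"): "TINYINT(1)",
}


def q(name: str) -> str:
    """Quote a MySQL identifier."""
    return "`" + name.replace("`", "``") + "`"


def to_sql_literal(v) -> str:
    """Render a Python value as a MySQL literal."""
    if v is None:
        return "NULL"
    if isinstance(v, bool):
        return "1" if v else "0"
    if isinstance(v, (int, float)):
        return repr(v)
    s = str(v).replace("\\", "\\\\").replace("'", "''")
    return f"'{s}'"


def build_script(db_type: str, table: str, columns, rows) -> str:
    """columns: list of (name, source_type); unmapped types are passed through."""
    col_defs = ", ".join(f"{q(n)} {TYPE_MAP.get((db_type, t.upper()), t.upper())}" for n, t in columns)
    lines = [f"CREATE TABLE IF NOT EXISTS {q(table)} ({col_defs});"]
    names = ", ".join(q(n) for n, _ in columns)
    for row in rows:
        vals = ", ".join(to_sql_literal(v) for v in row)
        lines.append(f"INSERT INTO {q(table)} ({names}) VALUES ({vals});")
    return "\n".join(lines) + "\n"


def package(script: str) -> bytes:
    """Compress the script for transfer; the central end decompresses and executes it."""
    return gzip.compress(script.encode("utf-8"))
```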
2.4 Relational data synchronization management
Relational data at the distribution end is synchronized efficiently and reliably, on a schedule, to the relational database at the central end.
Synchronization applies only to relational data: every record of a relational table or logical table in a transfer task at the distribution end is periodically synchronized into the corresponding relational table at the central end. This addresses the case where records in a distribution-end table are added or changed, regularly or irregularly. Instead of creating a new transfer task, the user simply sets a synchronization frequency, and the system periodically updates the central-end table from the tables in the transfer task, keeping the records at the distribution end and the central end consistent.
File entity data, by contrast, changes infrequently and can be re-sent through a new transfer task, so the method does not currently support synchronization of file entity data.
Relational tables can be synchronized either on a timer or manually (timed synchronization supports a user-defined frequency, e.g. 1 hour, 12 hours, 1 day or 1 week).
Manual synchronization: the user clicks the "Synchronize now" button in a transfer task, and the records of the database tables in that task are immediately transferred to the corresponding tables at the central end, ensuring data consistency.
Timed synchronization: the user sets a synchronization period for a transfer task, such as 1 hour, 12 hours, 1 day or 1 week. A background process tracks each task's period; when it elapses, the system automatically transfers the records of the database tables in that task to the central end, keeping the tables at the distribution end and the central end consistent.
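A sketch of one synchronization pass, assuming each synchronized table has a primary key: rows read from the distribution end are upserted into the central table, and central rows whose keys no longer exist at the source are deleted, so both ends hold the same records afterwards. SQLite (`INSERT OR REPLACE`) stands in for both databases so the example runs on its own; on a MySQL central end, `INSERT ... ON DUPLICATE KEY UPDATE` would play the same role. The period labels and the `is_due` helper are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

# Illustrative user-selectable synchronization periods.
PERIODS = {"1h": timedelta(hours=1), "12h": timedelta(hours=12),
           "1d": timedelta(days=1), "1w": timedelta(weeks=1)}


def is_due(last_sync: Optional[datetime], period: str, now: datetime) -> bool:
    """True when the task has never run or its period has elapsed."""
    return last_sync is None or now - last_sync >= PERIODS[period]


def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, key: str, columns: list[str]) -> tuple[int, int]:
    """Make dst.table match src.table; returns (rows upserted, stale rows deleted).

    Table and column names must come from task metadata, not user input.
    """
    cols = ", ".join(columns)
    rows = src.execute(f"SELECT {cols} FROM {table}").fetchall()
    placeholders = ", ".join("?" for _ in columns)
    with dst:  # one transaction: committed on success, rolled back on error
        dst.executemany(f"INSERT OR REPLACE INTO {table} ({cols}) VALUES ({placeholders})", rows)
        key_idx = columns.index(key)
        source_keys = {r[key_idx] for r in rows}
        existing = [r[0] for r in dst.execute(f"SELECT {key} FROM {table}")]
        stale = [(k,) for k in existing if k not in source_keys]
        dst.executemany(f"DELETE FROM {table} WHERE {key} = ?", stale)
    return len(rows), len(stale)
```

A background process would call `is_due` for each task at a fixed interval and run `sync_table` when it returns `True`; the "Synchronize now" button would call `sync_table` directly.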
Detailed operation logs are recorded throughout the synchronization process so that it can be traced.
3. Centralized repository construction, organization and editing of data resources
Relational databases are built and managed at the central end: new table structures can be created online and table data imported and edited, giving users an online database-building and data management service. File data is managed like a network disk, with upload, download, copy, move and delete operations on file resources.
This step is used by the node administrator, who must pass node administrator identity authentication before the step begins.
The main flow is shown in fig. 8. The implementation of each step in fig. 8 is described below.
3.1 Relational database construction
New relational tables can be created either by importing an Excel template or by associating (joining) existing, already described relational tables.
Creation by import: a new table is created from an Excel template and the data in the template is stored in the database. Template rules: each sheet in the workbook represents one data table, and the sheet name is the name of the table to be created; the first row holds the field descriptions, the second row the field names, and the third row the data types (Varchar, Text, Integer, Float, Double, DateTime, etc.); the actual data starts from the fourth row. An example is shown in fig. 9.
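The template rules translate almost directly into code. The function below takes a sheet name and its rows as tuples (for example as produced by openpyxl's `iter_rows(values_only=True)`) and returns a MySQL `CREATE TABLE` statement plus the data rows. The `VARCHAR(255)` length and the use of column comments for the field descriptions are assumptions, not rules from the source.

```python
# Template type names (case-insensitive) -> assumed MySQL column types.
TEMPLATE_TYPES = {"varchar": "VARCHAR(255)", "text": "TEXT", "integer": "INT",
                  "float": "FLOAT", "double": "DOUBLE", "datetime": "DATETIME"}


def parse_sheet(sheet_name: str, rows):
    """Row 1: descriptions, row 2: field names, row 3: types, row 4 on: data."""
    rows = [list(r) for r in rows]
    if len(rows) < 3:
        raise ValueError("template needs description, name and type rows")
    descs, names, types = rows[0], rows[1], rows[2]
    cols = []
    for desc, name, typ in zip(descs, names, types):
        sql_type = TEMPLATE_TYPES.get(str(typ).strip().lower())
        if sql_type is None:
            raise ValueError(f"unsupported type {typ!r} for field {name!r}")
        comment = str(desc or "").replace("'", "''")
        cols.append(f"`{name}` {sql_type} COMMENT '{comment}'")
    ddl = f"CREATE TABLE `{sheet_name}` ({', '.join(cols)});"
    # Keep only non-empty data rows, trimmed to the declared fields.
    data = [tuple(r[:len(names)]) for r in rows[3:] if any(v is not None for v in r)]
    return ddl, data
```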
Creation by association: two ways are supported.
Field selection: the join fields of table A and table B and the fields making up the new table are selected in the interface, the new table name is entered, and the new table is built; its data can be previewed, as shown in fig. 10.
SQL statement: a new table name is defined and the new table is formed from a multi-table join SQL statement. The SQL statement can be validated and its result (the new table's data) previewed, and the new table's data can be refreshed at a user-defined frequency. The prototype interface is shown in fig. 11.
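Both association methods end in a join query. The sketch below builds that query from the selected join and output fields (the field-selection method), validates any SELECT by actually running it with a row limit, and materializes the result with `CREATE TABLE ... AS`. SQLite is used only so the example is self-contained, and the table names are invented. Identifiers are interpolated directly, so they must come from the registered table metadata, never from free user input.

```python
import sqlite3


def build_join_sql(left: str, right: str, on: tuple[str, str], fields) -> str:
    """fields: list of (alias, column) where alias is 'a' (left) or 'b' (right)."""
    select = ", ".join(f"{t}.{c} AS {c}" for t, c in fields)
    return f"SELECT {select} FROM {left} AS a JOIN {right} AS b ON a.{on[0]} = b.{on[1]}"


def preview(conn: sqlite3.Connection, select_sql: str, limit: int = 5):
    """Validate a SELECT by running it; return (column names, first rows).

    Invalid SQL raises sqlite3.OperationalError, which the UI reports to the user.
    """
    cur = conn.execute(f"SELECT * FROM ({select_sql}) LIMIT ?", (limit,))
    return [d[0] for d in cur.description], cur.fetchall()


def create_table(conn: sqlite3.Connection, new_table: str, select_sql: str) -> None:
    """Materialize the join result as a new table."""
    with conn:
        conn.execute(f"CREATE TABLE {new_table} AS {select_sql}")
```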
3.2 Relational table description and field fusion configuration
The structural information of relational tables selected at the central end is described and given a fusion configuration.
1. Description: assigning descriptive names to the relational table and to its fields; see fig. 12.
2. Fusion configuration: setting the display type of each field of the relational table, specifically:
Text type (the default display type)
URL type (further settings select FTP, HTTP, email, image link, etc.)
Enumeration type (either an enumeration string such as male/female, or an SQL statement returning a stored-value column and a display column, such as SELECT user_id, user_name FROM user)
Sub-table type (further settings select the associated sub-table's name and join field; multiple sub-tables can be added or removed)
File type (further settings select file, image or video, the root path of the file location, and the separator used when one record is associated with multiple files)
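Applying the fusion configuration at display time could look like this hypothetical renderer, which turns a stored cell value into HTML according to its configured display type. The configuration keys (`type`, `values`, `scheme`, `root`, `separator`) are invented for illustration, and the sub-table type is omitted because it needs a database lookup.

```python
import html


def render_field(value, cfg: dict, enum_lookup: dict = None) -> str:
    """Render one cell according to its display-type configuration (hypothetical keys)."""
    kind = cfg.get("type", "text")
    if value is None:
        return ""
    if kind == "enum":
        # Either a fixed mapping, or the result of the configured SQL (value -> label).
        mapping = cfg.get("values") or (enum_lookup or {})
        return html.escape(str(mapping.get(value, value)))
    if kind == "url":
        text = html.escape(str(value))
        scheme = cfg.get("scheme", "http")
        if scheme == "email":
            return f'<a href="mailto:{text}">{text}</a>'
        if scheme == "image":
            return f'<img src="{text}">'
        return f'<a href="{text}">{text}</a>'
    if kind == "file":
        root = cfg.get("root", "").rstrip("/")
        sep = cfg.get("separator", ";")
        parts = [p.strip() for p in str(value).split(sep) if p.strip()]
        links = []
        for p in parts:
            href = html.escape(f"{root}/{p.lstrip('/')}")
            links.append(f'<a href="{href}">{html.escape(p)}</a>')
        return "<br>".join(links)
    return html.escape(str(value))  # text type (default)
```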
3.3 Relational table data management
All relational tables at the central end are managed, with viewing, adding, editing and deleting of data.
Users can view all data tables in the databases they manage, view, add, update and delete their data, and search across all fields of a table. The prototype is shown in fig. 13.
3.4 File data management
All data files and directories at the central end are managed like a network disk. The prototype is shown in fig. 14.
Basic file and directory operations: renaming, moving, copying and deleting via the right-click menu.
Search: deep search, rooted at the current path, for files and directories whose names contain a given string.
Upload: files can be uploaded to the current path or to a selected path.
Download: a file can be downloaded by double-clicking it or through the right-click menu.
New directory: a folder is created under the current path.
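The deep search described above, together with a guard that keeps every operation inside the managed root directory, can be sketched with `pathlib`. The root-confinement check is a design assumption added here (a network disk should not let `..` paths escape its storage area); matching is a case-sensitive substring test on names.

```python
from pathlib import Path


def resolve_in_root(root: str, rel: str) -> Path:
    """Resolve `rel` under `root`, refusing paths that escape the managed root."""
    base = Path(root).resolve()
    target = (base / rel).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{rel!r} escapes the managed root")
    return target


def deep_search(root: str, current: str, needle: str) -> list[str]:
    """Recursively find files and directories under `current` whose names contain `needle`."""
    base = Path(root).resolve()
    start = resolve_in_root(root, current)
    hits = [p.relative_to(base).as_posix() for p in start.rglob("*") if needle in p.name]
    return sorted(hits)
```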
4. Unified publishing and auditing of data resources
Data resources are published with unified metadata description, data scope selection and publishing management for heterogeneous data. They are also audited in a unified way, with batch auditing, user permission settings and fusion configuration. The main flow is shown in fig. 15.
Within this step, publishing, editing and submission are used by the node administrator, who must pass node administrator identity authentication before using them; auditing and authorization are used by the system administrator, who must pass system administrator identity authentication before using them.
The implementation of each step in fig. 15 is described below.
4.1 Data set metadata filling
Based on the built-in and extended metadata of the invention, online and batch filling of data set metadata are generated dynamically.
1) For filling, based on the required flag, uniqueness, element type and validation rules defined for the built-in and extended metadata elements: ① an online metadata form is generated automatically so that metadata can be filled in one data set at a time (see fig. 16). The classification system and the data license agreement are offered as enumerated lists built from the basic public data registration, and the system stores the number of the selected item;
Both filling modes automatically check required items and validation rules. In addition, elements of the built-in metadata marked in Table 1 as automatically filled are not entered by the user, either online or in batch. Some are filled automatically once the user makes an online selection (e.g. classification system, license agreement); others are filled in the background when the record is saved: the PID is obtained from a background automatic PID registration interface, the data set release time is taken from the current time, the citation is assembled from the citation format string rules, and the total number of files and total storage size are computed by the background.
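The automatic checks and background filling can be sketched as below. The element definitions, English element keys and date format are assumptions for illustration; in the system they would come from the registered built-in and extended metadata.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Element:
    name: str
    required: bool = False
    unique: bool = False
    kind: str = "string"            # "string" | "int" | "date"
    pattern: Optional[str] = None   # validation rule as a regular expression


def validate(record: dict, elements: list[Element], existing: dict) -> list[str]:
    """existing: {element name: set of values already used by other data sets}."""
    errors = []
    for el in elements:
        v = record.get(el.name)
        if v in (None, ""):
            if el.required:
                errors.append(f"{el.name}: required")
            continue
        if el.kind == "int" and not str(v).isdigit():
            errors.append(f"{el.name}: not an integer")
        if el.kind == "date":
            try:
                datetime.strptime(str(v), "%Y-%m-%d")
            except ValueError:
                errors.append(f"{el.name}: expected YYYY-MM-DD")
        if el.pattern and not re.fullmatch(el.pattern, str(v)):
            errors.append(f"{el.name}: does not match {el.pattern}")
        if el.unique and v in existing.get(el.name, set()):
            errors.append(f"{el.name}: value already used")
    return errors


def autofill(record: dict, files: dict, now: datetime) -> dict:
    """Fill background elements; files maps path -> size in bytes."""
    record = dict(record)
    record["release_date"] = now.strftime("%Y-%m-%d")
    record["total_files"] = len(files)
    record["total_storage"] = sum(files.values())
    return record
```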
2) For automatic filling, the method connects to a globally unique persistent data identifier assignment interface to generate the PID of the current data set automatically, and generates the data citation text of the current data set from the citation format definition, filling the built-in citation metadata element automatically. Examples of the PID and the data citation are shown in fig. 17.
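Assembling the citation from a format string is essentially template substitution. In this sketch the format string is a hypothetical example and the PID is a placeholder, since the persistent identifier service's interface is not specified; the function refuses to produce a citation while any element it refers to is missing.

```python
import string

# Hypothetical citation format rule; placeholders name metadata elements.
CITATION_FORMAT = "{creator}. {title} [DS/OL]. {publisher}, {year}. {pid}"


def build_citation(fmt: str, metadata: dict) -> str:
    """Fill the citation template from metadata, listing any missing elements."""
    fields = [f for _, f, _, _ in string.Formatter().parse(fmt) if f]
    missing = [f for f in fields if f not in metadata]
    if missing:
        raise KeyError(f"citation format refers to unfilled elements: {missing}")
    return fmt.format_map(metadata)
```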
3) As mentioned above, the invention supports both Chinese and English. Metadata elements are displayed using the English names defined for built-in and extended metadata. For metadata content, after online or batch filling, the Chinese metadata is automatically translated into English (e.g. using the open translation interfaces of Baidu or Google); users can review and correct the translation manually, and the final Chinese and English metadata are stored together in the system backend.
4.2 Data set entity data selection
Based on the relational tables and file system at the central end, relational entity tables can be selected online and entity files can be selected through the file directory tree (heterogeneous tables and files can be selected separately or together); files can also be uploaded on the spot and selected. An example is shown in fig. 18.
4.3 Data set editing and submission for publishing
Metadata filling (4.1) and entity data selection (4.2) are the two key steps in organizing a data set for publishing. Both can be re-edited while the data set is being edited. Once the data has been confirmed as correct, the data set can be submitted to an auditor for publishing review.
When a data set is submitted for review, the backend automatically extracts the text content of all text entity files in it (e.g. txt, doc, pdf) and builds a full-text database of those files. This indexes file content and enables full-text retrieval of text entity files in the integration and sharing service.
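A minimal inverted index shows what building a full-text database of text entity files involves. This sketch is a simplification under assumptions: it reads only `.txt` files (doc, docx and pdf need a text-extraction library), and its `\w+` tokenizer suits space-delimited text; Chinese content would need a word segmenter.

```python
import re
from collections import defaultdict
from pathlib import Path

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> set[str]:
    return {t.lower() for t in TOKEN.findall(text)}


class FullTextIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> {(dataset_id, file path)}

    def add_text(self, dataset_id: str, path: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add((dataset_id, path))

    def add_file(self, dataset_id: str, path: str) -> None:
        # Only plain text is read here; other formats need an extractor first.
        if Path(path).suffix.lower() == ".txt":
            text = Path(path).read_text(encoding="utf-8", errors="ignore")
            self.add_text(dataset_id, path, text)

    def search(self, query: str) -> set:
        """Return the (dataset, file) pairs containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))
```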
4.4 Data set auditing and authorized publishing
The content of a data set awaiting publication is audited, mainly checking that the metadata is filled in correctly and that the entity data is accurate. The auditor also selects the range of users authorized to access the data set: either fully open to all users or open to specific users or user groups.
Besides online auditing, data sets can be exported in batch for offline auditing. In implementation, data set metadata is exported in batch to Excel, and the entity data files and relational data are wrapped behind HTTP- or FTP-based access interfaces that are automatically linked to the data set metadata. Working from the exported Excel file, an auditor can view the metadata offline, access the entity data, select an audit result and enter comments; the completed Excel audit results can then be imported into the system in batch.
Data set auditing and authorized publishing are closely tied to the data resource integration and sharing service of step 5: once a data set has been approved and published, users can find and view it in the sharing service, and users (or user groups) within its authorized range obtain full access to its entity data after logging in.
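The authorization range can be represented as a small grant record checked whenever entity data is requested. The names are hypothetical, and the sketch assumes that even fully open data sets require login to reach entity data, consistent with downloads being limited to authenticated users in step 5.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Grant:
    public: bool = False                          # fully open to all (logged-in) users
    users: set = field(default_factory=set)       # individually authorized users
    groups: set = field(default_factory=set)      # authorized user groups


def can_access_entity_data(grant: Grant, user: Optional[str], user_groups: Iterable[str]) -> bool:
    """Metadata is visible to everyone once published; entity data follows the grant."""
    if user is None:          # anonymous visitor: assumed to need login first
        return False
    if grant.public:
        return True
    return user in grant.users or bool(grant.groups & set(user_groups))
```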
5. Data resource integration and sharing service
An integrated data resource discovery and access service is provided, with bilingual Chinese/English service and automatic switching. It supports unified classified and keyword retrieval of data resources, tag cloud filtering and multiple sort orders, full-field customized queries of entity relational tables, full-text retrieval of text entity files, and online preview and playback of documents, images, video, audio and other data file formats. It provides recommendation and acquisition services, with several association recommendation modes based on content and on user behavior, several acquisition modes such as online download and API access, and management-oriented access statistics by category. It also provides personalized services such as favorites, recommendations, downloads, ratings and tagging.
In this step, retrieval, filtering and sorting, access and recommendation of data sets are available to anonymous users; downloading, rating and personalized services are available to authorized users, who must pass identity authentication before using them.
The main flow is shown in fig. 19. The implementation of each step in fig. 19 is described below.
5.1 Data set retrieval
Two retrieval modes, keyword and classified navigation, are supported (when a user-defined data set includes latitude/longitude metadata, online map retrieval should also be supported), and each retrieval mode is also exposed through an API.
Keyword retrieval: full-text search of data set metadata for a keyword, with the matching data sets sorted by relevance.
Classified navigation retrieval: data resources are browsed by category, or data sets are searched within a specified category, according to the globally configured classification system.
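Relevance ranking and category browsing can be sketched as follows. TF-IDF is used here as one common relevance measure; the source does not say which measure the system uses. Category browsing assumes hierarchical classification codes in which a child code extends its parent's prefix (e.g. `01` is the parent of `0102`).

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return [t.lower() for t in re.findall(r"\w+", text)]


def keyword_search(datasets: dict, query: str) -> list[str]:
    """datasets: {id: metadata text}. Return matching ids ranked by a TF-IDF score."""
    docs = {i: Counter(tokens(t)) for i, t in datasets.items()}
    n = len(docs)
    scores: dict = {}
    for term in set(tokens(query)):
        df = sum(1 for c in docs.values() if term in c)
        if df == 0:
            continue
        idf = math.log(1 + n / df)   # rarer terms weigh more
        for i, c in docs.items():
            if term in c:
                scores[i] = scores.get(i, 0.0) + c[term] * idf
    return sorted(scores, key=lambda i: (-scores[i], i))


def browse_category(dataset_codes: dict, code: str) -> list[str]:
    """Data sets filed under `code` or any of its sub-categories."""
    return sorted(i for i, c in dataset_codes.items() if c.startswith(code))
```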
5.2 Data set filtering and sorting
Tag clouds of data resources with multi-condition, step-by-step filtering are supported, as is re-sorting retrieval results by multiple criteria.
Combined step-by-step filtering: a tag cloud is generated dynamically from the current retrieval results, and users can filter the results step by step by tag; filtering can be combined with classified navigation and keywords.
Comprehensive sorting: data resources can be sorted dynamically by time, file type, access popularity and other criteria.
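Tag clouds and step-by-step filtering reduce to counting and successive narrowing, as this sketch shows. The record layout (`id`, `tags`, `year` keys) is a hypothetical example.

```python
from collections import Counter


def tag_cloud(results: list[dict]) -> Counter:
    """Tag -> number of current results carrying it; the UI scales tag size by count."""
    return Counter(tag for r in results for tag in set(r["tags"]))


def filter_by_tags(results: list[dict], selected: list[str]) -> list[dict]:
    """Each selected tag narrows the previous result set (step-by-step filtering)."""
    for tag in selected:
        results = [r for r in results if tag in r["tags"]]
    return results


def sort_results(results: list[dict], key: str, descending: bool = True) -> list[dict]:
    """Re-sort results by one criterion such as time or access count."""
    return sorted(results, key=lambda r: r[key], reverse=descending)
```

After each filtering step, the tag cloud would be regenerated from the narrowed results so that only still-applicable tags are offered.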
5.3 Data set access and evaluation
Driven by user needs, typical entity data files can be browsed, played and displayed online; relational entity data supports online customized queries, result download and integrated display based on the fusion configuration; text entity files support full-text retrieval; metadata and entity data can be downloaded online or accessed through API services; and social data services such as user tagging, rating and sharing are supported.
Online browsing of entity data files: supported formats include, but are not limited to, mainstream types such as doc, xls, pdf, mp3, csv, avi and txt; the list of formats can be extended dynamically, with preview and playback for the added formats.
Online query and display of table data: relational table data supports full-field customized retrieval (e.g. combining conditions on chosen fields), customized display (e.g. chosen columns and row order) and result download. Based on the table's fusion configuration, each row can display its associated sub-tables, files, videos and images, link to enumeration dictionaries, and render URLs (URL text is shown as a clickable link; http, ftp, email and similar links are supported).
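The full-field customized query can be sketched as a builder that accepts only known column names and passes values as bound parameters, so that user-chosen conditions cannot inject SQL. The operator set and function names are assumptions; SQLite is used for the runnable demo.

```python
import sqlite3

# Allowed comparison operators (user-facing name -> SQL operator).
OPS = {"=": "=", "!=": "<>", ">": ">", "<": "<", ">=": ">=", "<=": "<=", "like": "LIKE"}


def build_query(table: str, allowed: set, display: list[str], conditions,
                order_by: str = None, descending: bool = False):
    """conditions: list of (column, op, value), combined with AND."""
    referenced = list(display) + [c for c, _, _ in conditions] + ([order_by] if order_by else [])
    for col in referenced:
        if col not in allowed:
            raise ValueError(f"unknown column {col!r}")
    sql = f"SELECT {', '.join(display)} FROM {table}"
    params = []
    if conditions:
        parts = []
        for col, op, value in conditions:
            parts.append(f"{col} {OPS[op]} ?")   # unknown op -> KeyError
            params.append(value)
        sql += " WHERE " + " AND ".join(parts)
    if order_by:
        sql += f" ORDER BY {order_by} {'DESC' if descending else 'ASC'}"
    return sql, params
```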
Full-text retrieval of text documents: based on the content extracted and indexed from text documents (including but not limited to txt, doc, docx and pdf) when the data set was submitted for publishing, full-text search over text entity files is supported.
Data download: logged-in users can selectively download data entities for a data set, at different levels and scopes of the data files based on query results; metadata can also be downloaded. Besides download through the web interface, download through the API is also supported.
Data social services: logged-in users who have downloaded a data set can rate it; visitors can tag data sets, and background administrators audit and filter user tags to supplement and correct the existing tags of the data set. Users can also share a data set's URL to social media such as WeChat and Weibo.
5.4 Data set recommendation
Recommendation based on similarity of data set metadata content and recommendation based on statistics of user access behavior are both supported.
Metadata content recommendation: based on the description content of each metadata element, other data sets highly similar to the current one are recommended, helping users quickly find closely related data sets.
User access behavior recommendation: the access records of the users who accessed the current data set are analyzed statistically, and similar data sets the current user may be interested in are recommended, helping users quickly find related data resources.
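Both recommendation modes can be sketched compactly: cosine similarity between word-count vectors of metadata text for content-based recommendation, and "users who accessed this data set also accessed" counts for behavior-based recommendation. These are stand-in techniques chosen for illustration; the source does not specify the similarity measure or the statistics used.

```python
import math
import re
from collections import Counter


def _vec(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_datasets(current: str, metadata: dict, k: int = 3) -> list[str]:
    """Top-k other data sets by cosine similarity of their metadata text."""
    base = _vec(metadata[current])
    scored = [(cosine(base, _vec(t)), i) for i, t in metadata.items() if i != current]
    scored = [(s, i) for s, i in scored if s > 0]
    return [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1]))[:k]]


def also_accessed(current: str, access_log, k: int = 3) -> list[str]:
    """access_log: iterable of (user, dataset). Rank what users of `current` also opened."""
    by_user: dict = {}
    for user, ds in access_log:
        by_user.setdefault(user, set()).add(ds)
    counts = Counter(ds for seen in by_user.values() if current in seen
                     for ds in seen if ds != current)
    return [ds for ds, _ in sorted(counts.items(), key=lambda x: (-x[1], x[0]))[:k]]
```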
5.5 Data set service logging and statistics
All user data access behavior is logged, and access and download statistics of data sets are computed and displayed.
User access log management: complete logs of user login, access, download and other behaviors.
Data resource and service statistics: statistics and rankings of data set views, favorites and downloads.
Statistical display: results can be shown in various forms, such as histograms and line charts.
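Counting and ranking from the access log is the core of the statistics feature. The log format below is a hypothetical `(data set, action)` pair; charts would be drawn from the returned counts.

```python
from collections import Counter, defaultdict


def summarize(log) -> dict:
    """log: iterable of (dataset, action), action e.g. 'view', 'favorite', 'download'."""
    stats = defaultdict(Counter)
    for ds, action in log:
        stats[action][ds] += 1
    return stats


def ranking(stats: dict, action: str, k: int = 10) -> list[tuple[str, int]]:
    """Top-k data sets for one action, most frequent first."""
    return stats[action].most_common(k)
```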
5.6 User personalized services
Users can view their access and download history and manage their favorites, ratings and tags.
My access and downloads: users can quickly search and view the data resources they have accessed and downloaded.
My ratings: users can quickly search and view the data resources they have rated.
My tags: users can quickly search and view the tags they have added to data resources.
My favorites: users can add data resources to their favorites, making it easy to review and obtain the resources they are interested in.
6. System integration
In the system implementation, the steps of the method are grouped appropriately; the overall software structure is shown in fig. 20. From bottom to top, it consists of three software systems: data aggregation and transfer software, data management and publishing software, and a data sharing and service portal.
The overall deployment structure is shown in fig. 21. The system can be implemented with widely used web development technologies, following the MVC design pattern on a B/S (browser/server) architecture: the Model handles the application's data logic, the Controller handles user interaction, and the View handles data display.
7. Summary of the invention
The beneficial effect of the invention is a general method and system design for the integrated physical aggregation, organization and publishing, and integrated service of distributed heterogeneous (relational and file) data.
The method decouples the whole process of physical aggregation, organization, publishing and integration of heterogeneous (relational and file) data resources, and its design emphasizes high customizability and reusability, which makes the invention general and flexible and applicable to a wide range of scenarios. Users can aggregate, publish and serve distributed data through customized configuration alone, which greatly improves the efficiency of designing and developing distributed data sharing systems and shortens the software development cycle.
The method also offers advanced services: centralized, efficient physical aggregation, transfer and synchronization of heterogeneous (relational and file) data; batch filling, organization and auditing of data; integration of persistent data identifiers and data citation standards; bilingual publishing; full-text retrieval of text entity data and full-table customized retrieval; integrated service of structured and unstructured data; and integrated packaging of retrieval, filtering, access, download, recommendation, social and other services.
The invention provides a general, easily extensible method, schema and framework. Users can add heterogeneous data sources as needed: the system implements mainstream relational databases such as MySQL, Oracle and SQL Server, and users can add other relational data sources themselves. For files, the invention implements local file system and FTP data sources, and users can add others such as Samba. Users can also add NoSQL data sources, such as MongoDB.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Those skilled in the art may modify the technical solution or substitute equivalents without departing from the principle and scope of the invention, and the scope of protection shall be determined by the claims.

Claims (10)

1. A general distributed heterogeneous data integrated physical aggregation, organization, release and service method is characterized by comprising the following steps:
1) registering public basic data at a central terminal, wherein the public basic data comprises data node registration, metadata extension element registration, classification system registration and license agreement registration of a distribution terminal;
2) the distribution end carries out convergence transmission and synchronization of distributed heterogeneous data to the central end;
3) establishing a library, organizing and editing the converged data resources at the central end;
4) uniformly issuing and auditing data resources at a central terminal;
5) and performing integrated sharing service of data resources at the central end.
2. The method according to claim 1, wherein the data node registration implements registration management of data node information of a distribution end and node administrator authentication information;
the metadata extension element registration supports the customized configuration management of extension metadata items, and the configuration items of the metadata comprise: metadata Chinese name, metadata English name, field type, necessity of filling, repetition, sequence number and remark;
the classification system is registered, the registration, editing and deletion operations of the tree data classification system are supported, the classification system information comprises classification names, classification codes and classification descriptions, and a user can perform addition, editing, insertion and deletion operations on any tree classification system node information;
the license agreement registration supports standard license agreements and supports registration, editing and deletion operations of self-defined license contents, and the registration information comprises a protocol identification code, a protocol name, a protocol identification picture and a protocol description text.
3. The method of claim 2, wherein the data node is registered, and wherein the attribute information of the data node comprises: the method comprises the following steps of (1) data node name, node code, node introduction, node contact person, contact phone, Email, node administrator account, node administrator password, data node creation time and serial number; wherein the account number and the password of the node administrator are used for the identity authentication of the node administrator when the distribution end executes the step 2); the metadata extension element registry comprising the following metadata elements: data set unique persistent identification, data set cover, data set name, data set profile, keywords, category code, start time, end time, creation authority, creator, latest creation/update date, release authority, contact mail, contact phone, latest release date, license agreement, reference format, total storage, total number of files, total number of records.
4. The method of claim 1, wherein the aggregation, transmission and synchronization of the distributed heterogeneous data comprises:
2.1) registering heterogeneous data sources, including unified registration and connection management of relational data sources and file-type data sources;
2.2) constructing data transmission tasks, including relational data tasks and file-type data tasks;
2.3) managing the execution of transmission tasks, so that the data tasks of the distribution end are transmitted to the central end remotely, efficiently and stably;
and 2.4) managing relational data synchronization, whereby each record of a relational table or logical table in a distribution-end transmission task is periodically synchronized to a relational table at the central end.
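Step 2.4) periodically brings each record of a distribution-end table up to date in the corresponding central-end table. The sketch below performs one synchronization pass between two SQLite connections. It assumes both sides already hold a table with the same columns and a primary key, so that `INSERT OR REPLACE` acts as an upsert. SQLite stands in for whatever databases a real deployment uses, and `sync_table` is a hypothetical name. Rows deleted at the source are not propagated in this simplified version.

```python
import sqlite3

def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """One synchronization pass: upsert every row of `table` from src into dst.

    Assumes dst already has a table of the same name and columns with a
    PRIMARY KEY, so INSERT OR REPLACE overwrites rows that share a key.
    `table` is interpolated into SQL and must come from trusted configuration.
    Returns the number of rows copied.
    """
    cur = src.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    col_list = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    rows = cur.fetchall()
    with dst:  # commits on success, rolls back if any insert fails
        dst.executemany(
            f'INSERT OR REPLACE INTO "{table}" ({col_list}) VALUES ({placeholders})',
            rows,
        )
    return len(rows)
```

To make the synchronization periodic, a scheduler such as cron, or a loop built on `threading.Timer`, could call `sync_table` at the configured interval.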
5. The method according to claim 3, wherein, in step 2.2), a relational data task is constructed by connecting to the registered relational data source to obtain its data tables and selecting either an entity data table or a logical data table defined by an SQL query to form the data transmission task; and a file-type data task is constructed by connecting to the registered file data source to determine its file directory system, selecting the relevant entity files or directories, and choosing the target transmission directory at the central end to form the file-type data transmission task.
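The two kinds of transmission task in step 2.2) can be represented as simple records. In this sketch a relational task names either an entity table or an SQL-defined logical table. A file task expands the selected files and directories into pairs of source file and central-end target path. The class names, field names and path layout are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path, PurePosixPath
from typing import Optional

@dataclass
class RelationalTask:
    source_id: str                # id of a registered relational data source
    table: Optional[str] = None   # an entity data table, or ...
    sql: Optional[str] = None     # ... a logical table defined by a SELECT query

    def extraction_query(self) -> str:
        """The query the distribution end would run to read the task's data."""
        if (self.table is None) == (self.sql is None):
            raise ValueError("specify exactly one of table or sql")
        return f'SELECT * FROM "{self.table}"' if self.table is not None else self.sql

@dataclass
class FileTask:
    source_root: Path   # root directory of a registered file data source
    selected: list      # selected files or directories, relative to source_root
    target_dir: str     # target transmission directory at the central end

    def plan(self) -> list:
        """Expand selected directories and map each file to its central-end path."""
        pairs = []
        for rel in self.selected:
            path = self.source_root / rel
            if path.is_dir():
                files = sorted(p for p in path.rglob("*") if p.is_file())
            else:
                files = [path]
            for f in files:
                dest = PurePosixPath(self.target_dir, *f.relative_to(self.source_root).parts)
                pairs.append((f, str(dest)))
        return pairs
```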
6. The method according to claim 3, wherein the transmission task execution management in step 2.3) comprises: entity data files are transmitted over the FTP protocol using the Vftp service of the central end, which is fully compatible with third-party FTP tools; for relational entity data, based on a relational database cluster of a given type built at the central end, the relational table structures and data of the various database types at the distribution end are extracted and mapped into table-creation SQL statements and data-insertion SQL statements consistent with the structure of the central-end database; these statements are packaged and transmitted to the central end as a compressed file, and after decompression the central end executes the table-creation SQL and data-insertion SQL uniformly in the cloud relational database, thereby achieving remote transmission of relational data tables and their data.
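The relational part of step 2.3) can be sketched with the standard library alone. That part maps the source table structure and data to table-creation and insertion SQL, ships the statements compressed, and executes them centrally. In this sketch both ends are SQLite, and gzip stands in for whatever package format the system actually uses. `TYPE_MAP` is a hypothetical example of translating source column types, such as Oracle-style `VARCHAR2` or `NUMBER`, into central-end types. The FTP/Vftp file channel is not shown.

```python
import gzip
import sqlite3

# Hypothetical mapping from source column types to central-end column types.
TYPE_MAP = {"VARCHAR2": "TEXT", "NUMBER": "REAL", "INT": "INTEGER", "DATETIME": "TEXT"}

def sql_literal(value) -> str:
    """Render a Python value as an SQL literal for an INSERT statement."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, bytes):
        return "X'" + value.hex() + "'"
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes

def export_table(src: sqlite3.Connection, table: str) -> bytes:
    """Distribution end: CREATE TABLE + INSERT statements for `table`, gzip-compressed."""
    cols = src.execute(f'PRAGMA table_info("{table}")').fetchall()
    defs = []
    for _cid, name, ctype, notnull, _default, _pk in cols:
        base = ctype.split("(")[0].strip().upper()
        col = f'"{name}" {TYPE_MAP.get(base, base or "TEXT")}'
        defs.append(col + (" NOT NULL" if notnull else ""))
    # pk column of table_info gives each key column's position in the primary key
    pk = [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0]
    if pk:
        defs.append("PRIMARY KEY (" + ", ".join(f'"{n}"' for n in pk) + ")")
    lines = [f'CREATE TABLE "{table}" ({", ".join(defs)});']
    for row in src.execute(f'SELECT * FROM "{table}"'):
        lines.append(f'INSERT INTO "{table}" VALUES ({", ".join(sql_literal(v) for v in row)});')
    return gzip.compress("\n".join(lines).encode("utf-8"))

def import_package(dst: sqlite3.Connection, package: bytes) -> None:
    """Central end: decompress the package and execute its SQL script."""
    dst.executescript(gzip.decompress(package).decode("utf-8"))
```

Table names are interpolated into SQL, so they must come from trusted configuration. Non-finite floats such as infinity are not handled by `sql_literal`.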
7. The method of claim 1, wherein the pooling organization and editing of the aggregated data resources comprises:
3.1) building relational database tables, wherein a new table is created either by importing an Excel template or by associating an existing relational data table that has already been described;
3.2) describing the structure information of relational tables and configuring field fusion; the description of relational table structure covers the relational data table name and its field names; field fusion configuration sets the display type of a field of the relational data table, the display types comprising text, URL, enumeration, sub-table and file;
3.3) managing the data of all relational tables at the central end, including viewing, adding, editing and deleting data;
and 3.4) managing file-type data, including network-disk-style management of all data files and directories at the central end.
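Step 3.2) assigns each field a display type: text, URL, enumeration, sub-table or file. One plausible way to apply that configuration when showing a table row is a per-type rendering function. The HTML output, the `/table/...` link scheme for sub-tables and the meaning of the `option` argument are assumptions made for this sketch.

```python
from enum import Enum
from html import escape
from urllib.parse import quote

class DisplayType(Enum):
    TEXT = "text"
    URL = "url"
    ENUM = "enum"
    SUBTABLE = "subtable"
    FILE = "file"

def render_cell(value, dtype: DisplayType, option=None) -> str:
    """Render one field value as HTML according to its configured display type.

    `option` depends on the type (a convention assumed for this sketch):
    ENUM -> dict mapping stored codes to labels,
    SUBTABLE -> name of the linked sub-table,
    FILE -> URL prefix under which entity files are served.
    """
    if value is None:
        return ""
    text = escape(str(value))
    if dtype is DisplayType.URL:
        return f'<a href="{text}">{text}</a>'
    if dtype is DisplayType.ENUM:
        return escape(str(option.get(value, value)))
    if dtype is DisplayType.SUBTABLE:
        href = f"/table/{quote(option)}?key={quote(str(value))}"
        return f'<a href="{escape(href)}">{text}</a>'
    if dtype is DisplayType.FILE:
        href = f"{option}/{quote(str(value))}"
        return f'<a href="{escape(href)}">{text}</a>'
    return text  # DisplayType.TEXT
```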
8. The method according to claim 1, wherein the unified publishing and auditing of the data resources comprises:
4.1) filling in data set metadata online, either one by one or in batches, with the forms generated dynamically from the built-in and extended metadata;
4.2) based on the relational tables and file system of the central end, selecting online relational entity data tables and selecting entity data files through the file directory system, with support for uploading files online at the time of selection;
4.3) editing, submitting and publishing the data set;
4.4) auditing the content of data sets awaiting publication, chiefly checking whether the metadata has been filled in correctly and whether the entity data is accurate, and selecting the range of users who are authorized to access the data set.
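The submit, audit and authorize sequence in steps 4.3) and 4.4) behaves like a small state machine. The following sketch automatically rejects a data set whose required metadata is missing or that has no entity data selected. It then restricts access to the user range chosen at audit time. The state names and the list of required metadata fields are hypothetical. The human reviewer's judgment on whether the entity data is accurate is represented only by the `approve` flag.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    PUBLISHED = "published"
    REJECTED = "rejected"

REQUIRED_METADATA = ("title", "description", "license")  # hypothetical minimum

@dataclass
class DataSet:
    metadata: dict
    tables: list = field(default_factory=list)  # selected relational entity tables
    files: list = field(default_factory=list)   # selected entity data files
    state: State = State.DRAFT
    allowed_users: Optional[set] = None         # None means open to all users
    audit_note: str = ""

    def submit(self) -> None:
        if self.state not in (State.DRAFT, State.REJECTED):
            raise ValueError(f"cannot submit a {self.state.value} data set")
        self.state = State.SUBMITTED

    def audit(self, approve: bool, note: str = "", allowed_users: Optional[set] = None) -> None:
        """Reviewer decision; automatic checks can turn an approval into a rejection."""
        if self.state is not State.SUBMITTED:
            raise ValueError("only submitted data sets can be audited")
        problems = [f"missing metadata: {k}" for k in REQUIRED_METADATA if not self.metadata.get(k)]
        if not (self.tables or self.files):
            problems.append("no entity data selected")
        if approve and problems:
            approve, note = False, "; ".join(problems)
        self.state = State.PUBLISHED if approve else State.REJECTED
        self.audit_note = note
        if approve:
            self.allowed_users = allowed_users

    def can_access(self, user: str) -> bool:
        if self.state is not State.PUBLISHED:
            return False
        return self.allowed_users is None or user in self.allowed_users
```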
9. The method of claim 1, wherein the integrated sharing service of the data resources comprises:
5.1) data set retrieval, offering two retrieval modes, keyword search and category navigation, with API encapsulation of each retrieval mode;
5.2) data set filtering and sorting, including a tag cloud of data resources with multi-condition step-by-step filtering, and re-sorting of retrieval results under multiple conditions;
5.3) data set access and evaluation, including online browsing, playback and display of typical entity data files in the data resources; online custom queries, result download and integrated fusion display of relational-table entity data; full-text retrieval of text entity files; online metadata download and API access services; and data social services;
5.4) data set recommendation, including recommendation based on the correlation of data set metadata content and recommendation based on statistics of user access behavior;
5.5) data set service logging and statistics, including full logging of user data access behavior and statistics and display of data set access and download counts;
and 5.6) personalized user services, including display of each user's access and download history and management of user favorites, ratings and tags.
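Several of the services in claim 9 reduce to simple operations over the metadata catalogue. The sketch below covers three of them:

- keyword search and category navigation (5.1);
- tag-cloud counts with step-by-step tag filtering and re-sorting (5.2);
- a content-based recommendation (5.4) that ranks data sets by Jaccard similarity of their keyword sets.

The patent does not specify the correlation measure. Jaccard similarity, the dotted category codes and all names here are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class DatasetMeta:
    id: str
    title: str
    keywords: set     # also used as the data set's tags
    category: str     # dotted tree classification code, e.g. "01.02"
    published: str    # ISO date, so string order is chronological order

def search(catalog, keyword=None, category=None, tags=(), sort_by="published", reverse=True):
    """Keyword search and/or category navigation, then tag filtering and re-sorting."""
    hits = list(catalog)
    if keyword:
        kw = keyword.lower()
        hits = [d for d in hits
                if kw in d.title.lower() or kw in {k.lower() for k in d.keywords}]
    if category:  # a category node matches itself and all of its descendants
        hits = [d for d in hits
                if d.category == category or d.category.startswith(category + ".")]
    for tag in tags:  # step-by-step filtering: each tag narrows the previous result
        hits = [d for d in hits if tag in d.keywords]
    return sorted(hits, key=lambda d: getattr(d, sort_by), reverse=reverse)

def tag_cloud(datasets):
    """Keyword frequencies over a result set, e.g. for sizing labels in a tag cloud."""
    return Counter(k for d in datasets for k in d.keywords)

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target, catalog, k=3):
    """Ids of the k data sets whose keywords overlap most with the target's."""
    scored = [(jaccard(target.keywords, d.keywords), d.id)
              for d in catalog if d.id != target.id]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by id
    return [i for score, i in scored if score > 0][:k]
```

The recommendation based on user access behavior in 5.4) would need the access logs from 5.5), so it is not covered here.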
10. A universal distributed heterogeneous data integrated physical aggregation, organization, release and service system, characterized by comprising a central end and a distribution end, wherein the distribution end is provided with a data aggregation and transmission software module, the central end is provided with a data management and release software module and a data sharing and service portal module, and a common basic data registration and service sub-module is integrated into the data management and release software module;
the data aggregation and transmission software module is responsible for aggregating, transmitting and synchronizing distributed heterogeneous data from the distribution end to the central end;
the data management and release software module is responsible for registering common basic data, building the database for the aggregated data resources, organizing and editing the aggregated data resources, and uniformly publishing and auditing the data resources;
and the data sharing and service portal module is responsible for the integrated sharing service of data resources.
CN202010020974.1A 2019-11-19 2020-01-09 Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system Active CN111259006B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911135740 2019-11-19
CN2019111357405 2019-11-19

Publications (2)

Publication Number Publication Date
CN111259006A true CN111259006A (en) 2020-06-09
CN111259006B CN111259006B (en) 2023-06-27

Family

ID=70951160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010020974.1A Active CN111259006B (en) 2019-11-19 2020-01-09 Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system

Country Status (1)

Country Link
CN (1) CN111259006B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752249A (en) * 2020-07-06 2020-10-09 成都信息工程大学 Data acquisition and cataloguing method, system, terminal equipment and storage medium applied to production process of discrete manufacturing industry
CN112035438A (en) * 2020-09-01 2020-12-04 江苏风云科技服务有限公司 Government affair big data platform system
CN112307129A (en) * 2020-12-31 2021-02-02 成都四方伟业软件股份有限公司 Control system constructed based on data sharing and control method thereof
CN112463885A (en) * 2020-11-26 2021-03-09 北京宏景世纪软件股份有限公司 Data synchronization implementation method and centralized data system
CN113094393A (en) * 2021-03-16 2021-07-09 杭州数梦工场科技有限公司 Data aggregation method and device and electronic equipment
CN113110351A (en) * 2021-04-28 2021-07-13 广东省科学院智能制造研究所 Industrial production field heterogeneous state data acquisition system and method
CN113127413A (en) * 2021-05-12 2021-07-16 北京红山信息科技研究院有限公司 Operator data processing method, device, server and storage medium
CN113407810A (en) * 2021-06-04 2021-09-17 北京航空航天大学 City information and service integration system and method based on big data
CN113961625A (en) * 2021-10-27 2022-01-21 北京科杰科技有限公司 Task migration method for heterogeneous big data management platform
CN114240466A (en) * 2021-12-23 2022-03-25 中科星通(廊坊)信息技术有限公司 Remote sensing product authenticity checking method based on micro-service architecture
CN114253929A (en) * 2021-11-15 2022-03-29 北京计算机技术及应用研究所 Network disk system architecture based on distributed file storage
CN114844887A (en) * 2022-03-30 2022-08-02 广州市华懋科技发展有限公司 Novel Internet independent platform system and data interaction method thereof
CN115587087A (en) * 2022-12-13 2023-01-10 四川华西集采电子商务有限公司 Efficient data sharing platform based on data extraction and system modeling
CN115665040A (en) * 2022-06-22 2023-01-31 中兴智慧(北京)技术有限公司 Data processing method for heterogeneous data interface router
CN116303623A (en) * 2023-05-12 2023-06-23 国网信息通信产业集团有限公司 System and method for converging cross-network heterogeneous service data to mobile portal
CN116340691A (en) * 2023-05-25 2023-06-27 北京大学 Multi-source data-based data asset networking management and sharing method and system
CN116450578A (en) * 2023-06-15 2023-07-18 中国航发四川燃气涡轮研究院 Aircraft engine material data maintenance management method
CN116627955A (en) * 2023-05-30 2023-08-22 四川川大智胜系统集成有限公司 Heterogeneous data processing method, system, equipment and medium based on metadata
CN117520620A (en) * 2024-01-05 2024-02-06 中国电子科技集团公司第二十八研究所 Metadata-based automatic data resource association method and system
WO2024026931A1 (en) * 2022-08-05 2024-02-08 广东外语外贸大学南国商学院 Big data processing and forming method and model for adding value to data asset
CN115665040B (en) * 2022-06-22 2024-06-28 中兴智慧(北京)技术有限公司 Data processing method for heterogeneous data interface router

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657918A (en) * 2015-01-21 2015-05-27 胡宝清 Regional resource environmental data sharing and comprehensive service platform
WO2016015439A1 (en) * 2014-07-30 2016-02-04 国云科技股份有限公司 Database virtual microkernel data source registration and encapsulation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
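Li Yan (李燕) et al.: "Design and Implementation of the Yellow River Information Resource Sharing Service System", Yellow River (《人民黄河》) *
Gao Ang (高昂) et al.: "Spatial Data Access Integration and Object Query over Distributed Spatial Data Sources", Journal of Geo-information Science (《地球信息科学学报》) *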

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752249A (en) * 2020-07-06 2020-10-09 成都信息工程大学 Data acquisition and cataloguing method, system, terminal equipment and storage medium applied to production process of discrete manufacturing industry
CN112035438A (en) * 2020-09-01 2020-12-04 江苏风云科技服务有限公司 Government affair big data platform system
CN112463885A (en) * 2020-11-26 2021-03-09 北京宏景世纪软件股份有限公司 Data synchronization implementation method and centralized data system
CN112307129A (en) * 2020-12-31 2021-02-02 成都四方伟业软件股份有限公司 Control system constructed based on data sharing and control method thereof
CN113094393A (en) * 2021-03-16 2021-07-09 杭州数梦工场科技有限公司 Data aggregation method and device and electronic equipment
CN113110351A (en) * 2021-04-28 2021-07-13 广东省科学院智能制造研究所 Industrial production field heterogeneous state data acquisition system and method
CN113127413A (en) * 2021-05-12 2021-07-16 北京红山信息科技研究院有限公司 Operator data processing method, device, server and storage medium
CN113127413B (en) * 2021-05-12 2024-03-01 北京红山信息科技研究院有限公司 Operator data processing method, device, server and storage medium
CN113407810A (en) * 2021-06-04 2021-09-17 北京航空航天大学 City information and service integration system and method based on big data
CN113961625A (en) * 2021-10-27 2022-01-21 北京科杰科技有限公司 Task migration method for heterogeneous big data management platform
CN114253929B (en) * 2021-11-15 2024-04-05 北京计算机技术及应用研究所 Network disk system architecture based on distributed file storage
CN114253929A (en) * 2021-11-15 2022-03-29 北京计算机技术及应用研究所 Network disk system architecture based on distributed file storage
CN114240466B (en) * 2021-12-23 2023-04-18 中科星通(廊坊)信息技术有限公司 Remote sensing product authenticity checking method based on micro-service architecture
CN114240466A (en) * 2021-12-23 2022-03-25 中科星通(廊坊)信息技术有限公司 Remote sensing product authenticity checking method based on micro-service architecture
CN114844887B (en) * 2022-03-30 2024-04-19 广州市华懋科技发展有限公司 Novel internet independent platform system and data interaction method thereof
CN114844887A (en) * 2022-03-30 2022-08-02 广州市华懋科技发展有限公司 Novel Internet independent platform system and data interaction method thereof
CN115665040B (en) * 2022-06-22 2024-06-28 中兴智慧(北京)技术有限公司 Data processing method for heterogeneous data interface router
CN115665040A (en) * 2022-06-22 2023-01-31 中兴智慧(北京)技术有限公司 Data processing method for heterogeneous data interface router
WO2024026931A1 (en) * 2022-08-05 2024-02-08 广东外语外贸大学南国商学院 Big data processing and forming method and model for adding value to data asset
CN115587087A (en) * 2022-12-13 2023-01-10 四川华西集采电子商务有限公司 Efficient data sharing platform based on data extraction and system modeling
CN116303623A (en) * 2023-05-12 2023-06-23 国网信息通信产业集团有限公司 System and method for converging cross-network heterogeneous service data to mobile portal
CN116303623B (en) * 2023-05-12 2023-10-13 国网信息通信产业集团有限公司 System and method for converging cross-network heterogeneous service data to mobile portal
CN116340691A (en) * 2023-05-25 2023-06-27 北京大学 Multi-source data-based data asset networking management and sharing method and system
CN116340691B (en) * 2023-05-25 2024-02-20 北京大学 Multi-source data-based data asset networking management and sharing method and system
CN116627955A (en) * 2023-05-30 2023-08-22 四川川大智胜系统集成有限公司 Heterogeneous data processing method, system, equipment and medium based on metadata
CN116450578B (en) * 2023-06-15 2023-09-15 中国航发四川燃气涡轮研究院 Aircraft engine material data maintenance management method
CN116450578A (en) * 2023-06-15 2023-07-18 中国航发四川燃气涡轮研究院 Aircraft engine material data maintenance management method
CN117520620A (en) * 2024-01-05 2024-02-06 中国电子科技集团公司第二十八研究所 Metadata-based automatic data resource association method and system
CN117520620B (en) * 2024-01-05 2024-03-19 中国电子科技集团公司第二十八研究所 Metadata-based automatic data resource association method and system

Also Published As

Publication number Publication date
CN111259006B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN111259006B (en) Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system
US11113463B2 (en) Note browser
CN111274294B (en) Universal distributed heterogeneous data integrated logic convergence organization, release and service method and system
US9942121B2 (en) Systems and methods for ephemeral eventing
US10042862B2 (en) Methods and systems for connecting a social network to a geospatial data repository
US20200004727A1 (en) Suggesting content items to be accessed by a user
US8676001B2 (en) Automatic discovery of popular landmarks
US11798208B2 (en) Computerized systems and methods for graph data modeling
US20140195516A1 (en) Systems and methods for presenting content items in a collections view
US7991767B2 (en) Method for providing a shared search index in a peer to peer network
WO2018036324A1 (en) Smart city information sharing method and device
CN106294695A (en) A kind of implementation method towards the biggest data search engine
US11216516B2 (en) Method and system for scalable search using microservice and cloud based search with records indexes
US20150169207A1 (en) Systems and methods for generating personalized account reconfiguration interfaces
US10152538B2 (en) Suggested search based on a content item
EP2680174A1 (en) A method, a server, a system and a computer program product for copying data from a source server to a target server
US9870422B2 (en) Natural language search
RU2019100812A (en) Information retrieval method and corporate information retrieval system
CN111444694B (en) Universal information resource customized collection and release method
AU2017202664A1 (en) A method and system for integrating a social network and data repository to enable map creation
CN115617865A (en) Enterprise document knowledge platform based on B/S architecture
CN117453630A (en) File path checking method and device, electronic equipment and readable storage medium
Al Azad Big Data Analytics: Performance Analysis of NoSQL Databases and Hadoop Ecosystem
Guo et al. Research on Information Description Mechanisms for Equipment Maintenance Support Resource Based on Metadata

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant