CN111737364A - Safe multi-party data fusion and federal sharing method, device, equipment and medium - Google Patents

Safe multi-party data fusion and federal sharing method, device, equipment and medium

Info

Publication number
CN111737364A
Authority
CN
China
Prior art keywords
data
target
field
database
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010708220.5A
Other languages
Chinese (zh)
Other versions
CN111737364B (en)
Inventor
李宏宇
李晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bodun Xiyan Technology Co.,Ltd.
Original Assignee
Tongdun Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd
Priority to CN202010708220.5A
Publication of CN111737364A
Application granted
Publication of CN111737364B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The invention discloses a safe multi-party data fusion and federal sharing method, which relates to the technical field of computers and comprises the following steps: obtaining a plurality of corresponding data sources according to an acquired data integration requirement; determining a target field from a plurality of candidate fields in a first data integration view; obtaining the mapping relation between the target field and each data table field according to a first data mapping table; combining this mapping relation with the correspondence between each data table field and the metadata in the data sources to load, from the databases of the data sources, the original data described by the metadata corresponding to the target field; and storing that original data, as first target data, in the memories where the data sources are located to generate a first data warehouse. The method fuses multi-party data logically, helps improve data security, is efficient, adapts to diverse data integration requirements, and saves storage resources. The invention also discloses a safe multi-party data fusion and federal sharing device, electronic equipment and a computer storage medium.

Description

Safe multi-party data fusion and federal sharing method, device, equipment and medium
Technical Field
The invention relates to the technical field of computers, in particular to a safe multi-party data fusion and federal sharing method and device, electronic equipment and a storage medium.
Background
With the development of big data technology, data from different data sources often need to be modeled and analyzed for a particular data application purpose. The different data sources usually collect their statistics independently and under non-uniform acquisition standards, so the data differ from the outset and are difficult to combine effectively; in addition, different data application purposes require different data sources. For example, the outbreak of pneumonia caused by the novel coronavirus posed great challenges to public health management and epidemic prevention and control. During the traditional Spring Festival period in China in particular, the population is highly mobile, so the risk of epidemic spread is very high. Big data technology can be used to accurately grasp information on people in epidemic areas, to give targeted guidance on risk identification and home isolation measures for various groups, and even to support epidemic assessment and trend analysis.
The traditional data integration method mainly stores the data of a plurality of data sources together on one server to achieve physical data collection. This is inefficient and struggles to serve changing data application purposes. Moreover, once the personal information of all users in different fields (such as medical information, public transportation information and family relationship information) has been collected and stored on one server, any theft or leakage of that information harms personal privacy and the interests of individuals and enterprises, and may even seriously harm national interests and national security. It is therefore necessary to fuse the data of different data sources logically and securely according to the data application purpose, rather than to integrate the data physically.
Disclosure of Invention
In order to overcome the defects of the prior art, a first object of the present invention is to provide a secure multi-party data fusion and federal sharing method. According to a first data integration view and a first data mapping table generated in advance, the method loads first target data from the databases of the multiple data sources that correspond to a data integration requirement, and stores the first target data in the memories of the respective data sources to obtain a first data warehouse. Multi-party data are thereby fused logically; the method is secure and reliable and can adapt to diverse data integration requirements.
The first object of the invention is achieved by the following technical solution:
acquiring data integration requirements;
obtaining a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database, and the database comprises metadata describing original data and data table fields corresponding to the metadata;
loading first target data from a database of the plurality of data sources based on a first pre-stored data integration view and a first data mapping table, comprising: determining a target field from a plurality of candidate fields in the first data integration view, obtaining a mapping relation between the target field and each data table field in a database of the plurality of data sources according to the first data mapping table, obtaining metadata corresponding to the target field according to a corresponding relation between each data table field and the metadata in the database of the plurality of data sources and the mapping relation, and taking original data described by the metadata corresponding to the target field as first target data;
and storing the first target data to the memories of the plurality of data sources to generate a first data warehouse.
Further, the first data integration view and the first data mapping table are generated by:
carrying out standardization processing on each data table field which represents the same meaning to obtain a candidate field which has the same meaning with each data table field;
composing the first data integration view from a plurality of candidate fields representing different meanings;
and establishing a mapping relation between each data table field in the databases of the plurality of data sources and the candidate field that represents the same meaning, thereby generating the first data mapping table.
Further, determining a target field from a plurality of candidate fields in the first data integration view includes: querying whether the metadata represented by the data table fields mapped to a candidate field exists in the databases of the plurality of data sources, and if so, taking that candidate field as a target field.
Further, recording original data loaded from a database of each data source as sub-target data, wherein the first target data comprises sub-target data correspondingly loaded from the database of each data source; storing the first target data to the memories where the plurality of data sources are located to generate a first data warehouse, including:
storing the sub-target data to a memory where a corresponding data source is located;
and the first data warehouse is formed by a plurality of sub-target data correspondingly stored in the memories of the data sources.
Further, the method also includes:
acquiring a new data integration requirement;
obtaining a corresponding new data source according to the new data integration requirement;
obtaining a second data warehouse based on the new data source;
and associating the second data warehouse with the first data warehouse to obtain a new data warehouse.
Further, obtaining a second data warehouse based on the new data source comprises:
generating a second data integration view and a second data mapping table based on the new data source;
loading second target data from a database of the new data source based on the second data integration view and the second data mapping table;
storing the second target data to a memory where the new data source is located to generate a second data warehouse;
the second data mapping table and the first data mapping table contain identical candidate fields; associating the second data warehouse with the first data warehouse, including: taking the same candidate field as a same target field; and associating the second target data corresponding to the same target field with the first target data.
Further, the method also includes:
acquiring a new data integration requirement;
obtaining new candidate fields according to the new data integration requirement and the plurality of data sources, including: querying the data sources according to the new data integration requirement to obtain corresponding metadata; and performing field analysis on the metadata to obtain the new candidate fields, including: generating the new candidate fields from the data table fields that represent the same meaning in the metadata;
adding the new candidate field to the first data integration view, updating the first data integration view and the first data mapping table;
loading third target data from the database of the plurality of data sources based on the updated first data integration view and the first data mapping table;
and storing the third target data to the memories where the plurality of data sources are located, and forming a new data warehouse by the third target data and the first target data.
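The incremental update described in the preceding steps can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the integration view is a list of candidate field names, that the mapping table is a dictionary from candidate field to per-source data table field, and that each database is a dictionary of columns; the function names `extend_view` and `load_new_field` and all sample values are hypothetical and do not come from the patent.

```python
def extend_view(view, mapping, new_candidate, locations):
    """Add a new candidate field and its per-source table fields in place."""
    if new_candidate not in view:
        view.append(new_candidate)
    mapping.setdefault(new_candidate, {}).update(locations)


def load_new_field(warehouse, databases, mapping, new_candidate):
    """Load only the new field; data already in the warehouse is untouched."""
    for source, table_field in mapping[new_candidate].items():
        column = databases.get(source, {}).get(table_field)
        if column is not None:
            warehouse.setdefault(source, {})[new_candidate] = list(column)
    return warehouse


# Toy data (all names and values are made up for illustration).
databases = {"hospital": {"Sex": ["F", "M"], "Nianling": [34, 51]}}
view = ["Gender"]
mapping = {"Gender": {"hospital": "Sex"}}
warehouse = {"hospital": {"Gender": ["F", "M"]}}  # first target data

# A new requirement asks for age: update view and mapping, then load it.
extend_view(view, mapping, "Age", {"hospital": "Nianling"})
load_new_field(warehouse, databases, mapping, "Age")  # third target data
```

In this sketch the third target data is added next to the first target data under the same source key, so nothing already loaded is fetched again.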
The second objective of the present invention is to provide a secure multi-party data fusion and federation sharing apparatus, which loads first target data from a database of multiple data sources corresponding to data integration requirements according to a first data integration view and a first data mapping table generated in advance, and correspondingly stores the first target data in memories of the multiple data sources to obtain a first data warehouse, so as to implement logical fusion of multi-party data, and is secure and reliable, and can adapt to diverse data integration requirements.
The second object of the invention is achieved by the following technical solution:
a secure multi-party data fusion and federation sharing apparatus, comprising:
the integrated demand acquisition module is used for acquiring data integrated demands; obtaining a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database, and the database comprises metadata describing original data and data table fields corresponding to the metadata;
the intelligent data loading module is used for loading first target data from the databases of the multiple data sources based on a first data integration view and a first data mapping table which are stored in advance, and comprises the following steps: determining a target field from a plurality of candidate fields in the first data integration view, obtaining a mapping relation between the target field and each data table field in a database of the plurality of data sources according to the first data mapping table, obtaining metadata corresponding to the target field according to a corresponding relation between each data table field and the metadata in the database of the plurality of data sources and the mapping relation, and taking original data described by the metadata corresponding to the target field as first target data;
and the data dynamic storage module is used for storing the first target data to the memories of the plurality of data sources to generate a first data warehouse.
A further object of the present invention is to provide an electronic device for carrying out the first object of the invention, comprising a processor, a storage medium and a computer program, the computer program being stored in the storage medium and, when executed by the processor, performing the secure multi-party data fusion and federal sharing method of the first object.
A further object of the present invention is to provide a computer-readable storage medium for the first object of the invention, having stored thereon a computer program which, when executed by a processor, implements the secure multi-party data fusion and federal sharing method of the first object.
Compared with the prior art, the invention has the beneficial effects that:
In the present invention, the first target data are loaded from the databases of the corresponding data sources and stored in the memories of those same data sources to obtain the first data warehouse. Multi-party data are thus fused logically, rather than merely collected physically, without affecting the data sources. The approach is efficient and adapts to diverse data integration requirements, and the distributed storage mode helps improve data security and save storage resources.
Drawings
FIG. 1 is a flow chart of a secure multi-party data fusion and federation sharing method according to a first embodiment of the present invention;
FIG. 2 is a block diagram of a secure multi-party data fusion and federation sharing apparatus according to a sixth embodiment of the present invention;
FIG. 3 is a block diagram of an electronic device according to a seventh embodiment of the present invention.
Detailed Description
The present invention will now be described in more detail with reference to the accompanying drawings, in which the description of the invention is given by way of illustration and not of limitation. The various embodiments may be combined with each other to form other embodiments not shown in the following description.
Example one
This embodiment provides a safe multi-party data fusion and federal sharing method. According to a first data integration view and a first data mapping table generated in advance, the method dynamically loads first target data from the databases of the plurality of data sources corresponding to a data integration requirement and stores the first target data in the memories of the respective data sources to obtain a first data warehouse. Secure multi-party data fusion is thereby achieved logically, without affecting the data sources and without mere physical data collection, so the method is efficient, adapts to diverse data integration requirements, and helps improve data security.
Referring to fig. 1, a secure multi-party data fusion and federation sharing method includes the following steps:
S10, acquiring a data integration requirement.
A data integration requirement specifies the data required for modeling, determined according to the data application purpose. Different data integration requirements correspond to different modeling data, so as to meet the training and learning needs of different data analysis models.
S20, obtaining a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database. The database here is a strictly defined relational database, where the relation refers to the relationships among the entities and attributes in the database; such a database is generally suited to storing stable, durable data. The database contains metadata describing the original data and corresponding data table fields representing the meaning of the metadata. Metadata is data that describes data, mainly information describing data attributes, and supports functions such as indicating storage locations, recording history, resource searching and file recording. A data table field is a field contained in a data table in the database and represents the meaning of the metadata. A data source may be any database owner that can provide the data needed for modeling.
S30, loading first target data from the databases of the plurality of data sources based on a first data integration view and a first data mapping table that are stored in advance. Specifically, a target field is determined from a plurality of candidate fields in the first data integration view; the mapping relation between the target field and each data table field in the databases of the plurality of data sources is obtained from the first data mapping table; the metadata corresponding to the target field is obtained from that mapping relation together with the correspondence between each data table field and the metadata in the databases; and the original data described by that metadata is taken as the first target data. The data integration view is a transformation of the metadata in the plurality of data sources; it offers another way of viewing the metadata in each data source and helps the user screen and load data. The data mapping table records the mapping relation between each data table field in the database of each data source and each candidate field in the data integration view.
S40, storing the first target data in the memories where the plurality of data sources are located to generate a first data warehouse. The first target data contained in the first data warehouse are stored in distributed fashion in the memories where the data sources are located. The multi-party data are thus fused logically and securely instead of merely being migrated and collected physically, which reduces physical data transmission and collection, effectively improves integration efficiency, and, through distributed storage, improves the data security of each data source.
With this safe multi-party data fusion and federal sharing method, the data sources change with the data integration requirement, and the first target data dynamically loaded from them change accordingly, so that the first target data always match the modeling data called for by the requirement and the method adapts to diverse data integration requirements. Because no redundant data that the requirement cannot use is generated in the first data warehouse, the resource waste caused by storing redundant data is reduced and storage resources are saved.
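Steps S10 to S40 can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the patented implementation: each data source is modeled as an object holding a dictionary "database" and its own local store, and the names `DataSource`, `MAPPING_TABLE`, `INTEGRATION_VIEW`, `select_sources`, `load_target_data` and `store_in_place`, together with all sample values, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One party: a database (table field -> values) plus its own store."""
    name: str
    database: dict
    local_store: dict = field(default_factory=dict)


# Hypothetical mapping table: candidate field -> {source name: table field}.
MAPPING_TABLE = {
    "UserId": {"hospital": "patient_id", "transit": "uid"},
    "Gender": {"hospital": "Sex", "transit": "Xingbie"},
}
INTEGRATION_VIEW = ["UserId", "Gender"]


def select_sources(requirement, registry):
    """S20: pick the data sources named by the integration requirement."""
    return [registry[name] for name in requirement["sources"]]


def load_target_data(sources, view, mapping):
    """S30: load only fields that are mapped and present in each database."""
    loaded = {}
    for src in sources:
        sub = {}
        for candidate in view:
            table_field = mapping.get(candidate, {}).get(src.name)
            if table_field in src.database:
                sub[candidate] = list(src.database[table_field])
        loaded[src.name] = sub
    return loaded


def store_in_place(sources, loaded):
    """S40: keep each part at its own source; the warehouse only refers to it."""
    for src in sources:
        src.local_store["warehouse"] = loaded[src.name]
    return {src.name: src.local_store["warehouse"] for src in sources}


hospital = DataSource("hospital", {"patient_id": [1, 2], "Sex": ["F", "M"]})
transit = DataSource("transit", {"uid": [1, 3], "route": ["L2", "L7"]})
registry = {"hospital": hospital, "transit": transit}

requirement = {"sources": ["hospital", "transit"]}                   # S10
sources = select_sources(requirement, registry)                      # S20
loaded = load_target_data(sources, INTEGRATION_VIEW, MAPPING_TABLE)  # S30
warehouse = store_in_place(sources, loaded)                          # S40
```

Here the "warehouse" is just a dictionary of references to data that stays in each source's own store, which mirrors the logical rather than physical fusion described above. The `route` column is never loaded because no candidate field in the view maps to it.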
Example two
The second embodiment is an improvement on the first embodiment: the first data integration view and the first data mapping table are generated in advance from the plurality of data sources, which ensures the consistency of target data loaded from different data sources and improves the accuracy of secure multi-party data fusion. The first data integration view and the first data mapping table are generated as follows:
carrying out standardization processing on each data table field which represents the same meaning to obtain a candidate field which has the same meaning with each data table field;
constructing a first data integration view from a plurality of candidate fields representing different meanings;
and establishing a mapping relation between each data table field in the databases of the plurality of data sources and the candidate field that represents the same meaning, thereby generating the first data mapping table.
Specifically, field analysis is performed on the metadata in each data source, and a corresponding data view is generated from the metadata description; that is, for each data source, the data table fields representing its metadata form one data view. A data view is a transformation of the metadata in a data source and offers another way of looking at that metadata. An application corresponding to a data integration requirement can be built on the data view, so that the data view separates the application from the database tables. Each data source generates one data view, and the data table fields with the same meaning across the several data views are merged to obtain the first data integration view, whose candidate fields correspond to the data required for modeling. The integration view is generated by an interactive program, and the names of the candidate fields can be defined according to actual needs as the application purpose changes. For example, data table fields representing gender, such as "Sex" and "Xingbie", are standardized to obtain the candidate field "Gender". The first data integration view is easy to inspect and select from visually and helps the user screen and load data, which makes adding and modifying data sources simple and improves data integration efficiency.
The first data mapping table records the mapping relation between the data table fields of the data sources and the candidate fields in the first data integration view. When a data table field in a data source and a candidate field in the first data integration view represent the same meaning, the data table field is mapped to that candidate field; for example, the data table fields "Sex" and "Xingbie", which represent gender, are both mapped to the candidate field "Gender". When a data view is created or target data are loaded, the first data mapping table ensures that the target data loaded from different data sources are consistent, which improves the accuracy of secure multi-party data fusion.
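The generation of the integration view and the mapping table can be sketched as below. The patent states that the view is produced by an interactive program; this sketch replaces that interaction with a fixed synonym dictionary, which is an assumption made only to keep the example runnable. The names `SYNONYMS` and `build_view_and_mapping` and the sample field names other than "Sex", "Xingbie" and "Gender" are hypothetical.

```python
# Assumed synonym dictionary: normalized spelling -> candidate field name.
# In the patent this decision is made interactively, not by a fixed table.
SYNONYMS = {
    "sex": "Gender",
    "xingbie": "Gender",
    "gender": "Gender",
    "uid": "UserId",
    "user_id": "UserId",
}


def build_view_and_mapping(source_fields):
    """Return (integration view, mapping table) for the given table fields.

    source_fields maps a source name to the list of its data table fields.
    The mapping table maps candidate field -> {source name: table field}.
    """
    view = []
    mapping = {}
    for source, fields in source_fields.items():
        for table_field in fields:
            candidate = SYNONYMS.get(table_field.strip().lower())
            if candidate is None:
                continue  # no known meaning, so no candidate field
            if candidate not in view:
                view.append(candidate)
            mapping.setdefault(candidate, {})[source] = table_field
    return view, mapping


source_fields = {"A": ["Sex", "uid"], "B": ["Xingbie", "user_id", "memo"]}
view, mapping = build_view_and_mapping(source_fields)
```

Fields with the same meaning collapse into one candidate field, while the mapping table keeps the original spelling per source so that data can later be loaded under the right column name.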
Example three
The third embodiment is an improvement on the second embodiment: the target field is determined according to whether the metadata actually exists in the database of each data source, so that the first target data are loaded dynamically in line with what each database actually holds, which effectively ensures integration accuracy.
Determining a target field from a plurality of candidate fields in the first data integration view comprises: querying whether the metadata represented by the data table fields mapped to a candidate field exists in the databases of the plurality of data sources, and if so, taking that candidate field as a target field.
Specifically, the data table fields mapped to each candidate field are obtained from the first data mapping table. If the metadata represented by the data table fields mapped to every candidate field exists in the databases of the data sources, every candidate field in the first data integration view is taken as a target field, and the plurality of data sources can provide all the data required for modeling. If the metadata represented by the data table fields mapped to a candidate field cannot be found in the databases, that candidate field, whose query result is negative, is not taken as a target field, and the plurality of data sources can provide only part of the data required for modeling. The original data described by the metadata that can be queried are dynamically loaded from the plurality of data sources as the first target data. This matches the actual presence of metadata, and hence of original data, in each data source, which effectively ensures integration accuracy. Redundant data not needed for modeling is not loaded, which improves integration efficiency, reduces the resource waste caused by storing redundant data, and saves storage resources.
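The existence check can be illustrated with Python's standard `sqlite3` module, using SQLite's `PRAGMA table_info` to read column metadata. The choice of SQLite, the rule that a candidate field qualifies when at least one source holds the mapped column, and the names `column_exists`, `determine_target_fields`, `person`, `pid` and `Nianling` are all assumptions of this sketch rather than details given in the patent.

```python
import sqlite3


def column_exists(conn, table, column):
    """Check the database's own schema metadata for a column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # Table names come from the trusted mapping table, not from user input.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return any(row[1] == column for row in rows)


def determine_target_fields(view, mapping, connections):
    """Keep candidate fields whose mapped column exists in some source."""
    targets = []
    for candidate in view:
        locations = mapping.get(candidate, {})
        found = any(
            column_exists(connections[source], table, column)
            for source, (table, column) in locations.items()
        )
        if found:
            targets.append(candidate)
    return targets


# One in-memory database standing in for a data source (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (pid INTEGER, Sex TEXT)")
connections = {"hospital": conn}

view = ["UserId", "Gender", "Age"]
mapping = {
    "UserId": {"hospital": ("person", "pid")},
    "Gender": {"hospital": ("person", "Sex")},
    "Age": {"hospital": ("person", "Nianling")},  # column does not exist
}
targets = determine_target_fields(view, mapping, connections)
```

In this toy setup "Age" is dropped because no source has the mapped column, so only part of the modeling data can be provided, as in the negative-query case described above.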
Preferably, the first target data include a plurality of data items with different structures and formats. After the first target data are extracted from the plurality of data sources, these data items are standardized and aligned to obtain first target data with a uniform structure and format, which improves overall data fusion efficiency.
Specifically, the standardization includes formula-based standardization, such as min-max standardization and z-score standardization, and the standardization of data formats, measurement units and modes, for example unifying distance fields that the databases of different data sources record in km or in m, and unifying different date formats such as 2020/3/12 and 3/12/2020. The standardization is not limited to these approaches. The data items in the first target data are then aligned using a standard user id code, and the first target data are finally unified into a standard format for use.
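The standardization steps named above can be sketched with the Python standard library. Min-max scaling uses x' = (x - min) / (max - min) and z-score uses z = (x - mean) / standard deviation. The sketch assumes the population standard deviation, metres as the common distance unit, ISO 8601 as the common date format, and that 3/12/2020 is month-first; none of these choices is specified in the patent, and the function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev


def min_max(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by 0
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def to_metres(value, unit):
    """Unify distances recorded in km or m (assumed target unit: metres)."""
    factors = {"m": 1.0, "km": 1000.0}
    return value * factors[unit]


def to_iso_date(text):
    """Unify year-first and (assumed) month-first dates to YYYY-MM-DD."""
    for fmt in ("%Y/%m/%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


scaled = min_max([1, 2, 3])
centred = z_score([1, 2, 3])
distance = to_metres(1.5, "km")
same_day = (to_iso_date("2020/3/12"), to_iso_date("3/12/2020"))
```

After values are in a common scale, unit and format, records from different sources can be aligned on the standard user id, as the paragraph above describes.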
In some other embodiments, the original data loaded from the database of each data source are recorded as sub-target data, so that the first target data comprise the sub-target data loaded from each database. Each piece of sub-target data is stored in the memory where its data source is located; that is, sub-target data loaded from one data source are stored in the memory of that same data source. The first data warehouse is formed by the plurality of sub-target data stored in the memories of the respective data sources, so the first target data contained in the first data warehouse remain in the memories of their original data sources. This protects data security, requires no data storage allocation work, and further improves efficiency.
Example four
The fourth embodiment is an improvement on the second and/or third embodiments: a second data warehouse is built according to a new data integration requirement and associated with the first data warehouse to obtain a new data warehouse, so that the method adapts efficiently to diverse data integration requirements. The safe multi-party data fusion and federal sharing method further comprises the following steps:
acquiring a new data integration requirement;
obtaining a corresponding new data source according to the new data integration requirement;
obtaining a second data warehouse based on the new data source;
and associating the second data warehouse with the first data warehouse to obtain a new data warehouse.
The original data sources and the new data source together provide the data required for modeling. A second data warehouse is generated from the new data source and associated with the original first data warehouse to obtain the new data warehouse, so that diverse data integration requirements can be met quickly. When a data application corresponding to the new data integration requirement runs, the required new data can be obtained from the second data warehouse and the remaining data directly from the original first data warehouse; there is no need to query the new data source and the original data sources again or to load and collect data physically, so efficiency is higher.
Preferably, the second data integration view and the second data mapping table are generated based on the new data source, the second target data is loaded from the database of the new data source based on the second data integration view and the second data mapping table, and the second target data is stored in the memory where the new data source is located to generate the second data warehouse, so that the security of the second target data can be effectively protected. The second data mapping table and the first data mapping table contain the same candidate field (such as the same user id field), the same candidate field is used as the same target field, and the second target data corresponding to the same target field is associated with the first target data.
The first target data corresponding to the same target field is obtained as follows. From the mapping between the same target field and the data table fields in the original data sources, together with the correspondence between each data table field and the metadata in the databases of those data sources, the metadata corresponding to the same target field and the original data described by that metadata can be obtained. Similarly, from the mapping between the same target field and the data table fields in the new data source, together with the correspondence between each data table field and the metadata in the database of the new data source, the metadata corresponding to the same target field in the new data source and the original data it describes can be obtained; this is the second target data corresponding to the same target field. The first target data and the second target data can then be associated through the same target field, which realizes the logical integration of the second target data with the first target data efficiently. The second data warehouse and the first data warehouse can thus be used in association, and the new data integration requirement can be met quickly.
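The association through a shared target field can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the mapping-table layout (a dict from candidate field to per-source data table field names), the sample rows, and the helper names `shared_target_fields` and `associate` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping tables: candidate field -> {data source: data table field}.
first_mapping = {
    "user_id": {"hospital_a": "patient_no", "hospital_b": "pid"},
    "diagnosis_time": {"hospital_a": "diag_ts", "hospital_b": "confirmed_at"},
}
second_mapping = {
    "user_id": {"railway": "passenger_id"},
    "train_number": {"railway": "train_no"},
}


def shared_target_fields(m1: Dict[str, dict], m2: Dict[str, dict]) -> List[str]:
    """Candidate fields present in both mapping tables (the same target fields)."""
    return sorted(set(m1) & set(m2))


def associate(first_rows: List[dict], second_rows: List[dict], key: str) -> List[dict]:
    """Logically link rows of the two warehouses on the shared target field."""
    index: Dict[object, List[dict]] = {}
    for row in second_rows:
        index.setdefault(row[key], []).append(row)
    linked = []
    for row in first_rows:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


# Rows already expressed in terms of target fields (illustrative data only).
first_target = [{"user_id": 1, "diagnosis_time": "2020-02-01"},
                {"user_id": 2, "diagnosis_time": "2020-02-03"}]
second_target = [{"user_id": 1, "train_number": "G101"},
                 {"user_id": 3, "train_number": "D22"}]

keys = shared_target_fields(first_mapping, second_mapping)
linked = associate(first_target, second_target, keys[0])
```

In this sketch only the user present in both warehouses yields a linked record; how unmatched records should be treated is not specified in the description and is left open here.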
In other embodiments, different data integration requirements can share an existing data warehouse in a federated manner, so target data already in the existing data warehouse does not need to be integrated repeatedly. The on-demand multi-party data fusion method thus adapts quickly and efficiently to diverse data integration requirements.
A specific application of the method of the fourth embodiment is described below. For example, the data fusion and federal sharing method can adapt to the diverse data integration requirements that arise during an epidemic, and supports applications such as epidemic data analysis and query.
In predicting the COVID-19 epidemic, daily-changing data on three groups of people (susceptible, infected, and removed) is needed to train a simple dynamical model, such as an SIR model. Here the data integration requirement corresponds to the data required for predicting the development trend of the epidemic, and a plurality of hospitals or related institutions in different regions serve as a plurality of epidemic data sources. An epidemic data integration view and an epidemic data mapping table are generated in advance from these epidemic data sources. Based on the epidemic data integration view and the epidemic data mapping table, target epidemic data is dynamically loaded from the epidemic data sources and stored at those same data sources. The target epidemic data distributed across the epidemic data sources forms an epidemic data warehouse, so that secure multi-party data fusion is achieved logically.
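For orientation, the standard SIR model mentioned above tracks the susceptible (S), infected (I), and removed (R) groups with the equations dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, and dR/dt = gamma*I. The sketch below only simulates these equations with a forward Euler step; it does not fit the model to data. The function name `simulate_sir` and all parameter values are illustrative assumptions and do not come from the patent.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler; return daily (S, I, R)."""
    n = s0 + i0 + r0                      # total population, assumed constant
    s, i, r = float(s0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # S -> I flow
            new_removals = gamma * i * dt            # I -> R flow
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Illustrative parameters only: 10 initial cases in a population of 10,000.
history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=60)
```

In the scenario described, the daily counts loaded from the epidemic data warehouse would play the role of observations against which beta and gamma are estimated; that estimation step is outside this sketch.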
As the epidemic evolves, for example when the virus spreads with the large-scale movement of people during the Spring Festival, the influence of population migration on the epidemic trend must be considered, and the simple dynamical model can no longer support accurate prediction of the trend. New data is then needed to train a correspondingly improved dynamical model, which gives rise to a new data integration requirement. According to the new data integration requirement, transportation systems such as aviation and railways serve as traffic data sources, and a traffic data integration view and a traffic data mapping table are generated to obtain a traffic data warehouse containing data on the movement of people between cities. The target fields of the traffic data in the traffic data warehouse are user id, train number, and seat; the target fields of the epidemic data in the epidemic data warehouse are user id and COVID-19 diagnosis time. The traffic data mapping table and the epidemic data mapping table both contain the same candidate field, namely the user id, so the traffic data and the epidemic data are linked through the user id and the traffic data warehouse is associated with the epidemic data warehouse. Together the two warehouses can be used to train the new model, and thus support epidemic trend prediction when large numbers of people are on the move. Once the trend prediction application is finished, the corresponding data warehouses are automatically cleared.
If the propagation path of the epidemic needs to be traced, individual social relationship data is also required. Operators or social platforms then serve as social data sources to generate a social data warehouse containing individual movement trajectory data, and the social data warehouse is intelligently integrated with the traffic data warehouse. The integrated data warehouse can be dedicated to the propagation path tracing service and can also support other corresponding applications; each data warehouse can be automatically cleared after the application service is finished.
EXAMPLE five
The fifth embodiment is an improvement on the second and/or third embodiments. New candidate fields are obtained according to a new data integration requirement and the plurality of data sources, and the data integration view and the data mapping table are updated with the new candidate fields, so that a new data warehouse is obtained efficiently to adapt to diverse data integration requirements. The secure multi-party data fusion and federal sharing method further comprises the following steps:
acquiring a new data integration requirement;
obtaining new candidate fields according to the new data integration requirement and the plurality of data sources, comprising: querying the plurality of data sources according to the new data integration requirement to obtain corresponding metadata; and performing field analysis on the metadata to obtain the new candidate fields;
adding the new candidate field into the first data integration view, and updating the first data integration view and the first data mapping table;
loading third target data from a database of a plurality of data sources based on the updated first data integration view and the first data mapping table;
and storing the third target data to the memories of the plurality of data sources, and forming a new data warehouse by the third target data and the first target data.
Besides the first target data needed by the original data integration requirement, the original data sources also hold other data that the original requirement did not need, so the new data required for modeling under the new data integration requirement can be obtained from the original data sources. The data sources are queried according to the new data required for modeling to obtain the corresponding metadata, field analysis is performed on the metadata returned by the query, and, from the data table fields that represent the meaning of the metadata, new candidate fields having the same meaning as those data table fields are generated. The new candidate fields are added to the first data integration view, and the first data integration view and the first data mapping table are updated. Third target data is then loaded from the original data sources and is still stored in the memories of those data sources. The third target data and the first target data together form a new data warehouse, so the new data integration requirement is accommodated quickly and with high integration efficiency.
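The update of the integration view and mapping table can be sketched as below. The field analysis itself is not detailed in the description, so this sketch simply assumes its result is available as a dict from each data source to `{data table field: meaning}`. The type aliases, the function `add_candidate_fields`, and all field names are hypothetical.

```python
from typing import Dict, List

View = List[str]                          # ordered candidate fields
MappingTable = Dict[str, Dict[str, str]]  # candidate field -> {source: data table field}


def add_candidate_fields(view: View, mapping: MappingTable,
                         discovered: Dict[str, Dict[str, str]]) -> List[str]:
    """Extend the view and mapping table in place; return the newly added fields.

    `discovered` maps each source to {data table field: meaning}, standing in
    for the result of the metadata query plus field analysis.
    """
    added: List[str] = []
    for source, fields in discovered.items():
        for table_field, meaning in fields.items():
            if meaning not in view:
                view.append(meaning)      # new candidate field joins the view
                added.append(meaning)
            mapping.setdefault(meaning, {})[source] = table_field
    return added


view = ["user_id", "diagnosis_time"]
mapping = {
    "user_id": {"hospital_a": "patient_no", "hospital_b": "pid"},
    "diagnosis_time": {"hospital_a": "diag_ts", "hospital_b": "confirmed_at"},
}
# Hypothetical outcome of querying both hospitals for discharge information.
discovered = {
    "hospital_a": {"leave_ts": "discharge_time"},
    "hospital_b": {"discharged_at": "discharge_time"},
}
new_fields = add_candidate_fields(view, mapping, discovered)
```

Two differently named data table fields with the same meaning collapse into a single new candidate field, which is then available as a target field when the third target data is loaded.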
In other embodiments, the data warehouse is automatically cleared when a data application task stop instruction is received. The life cycle of the data warehouse is dynamic: once the data application task it supports is cancelled, the data warehouse is cleared accordingly, which further saves storage resources.
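One way to picture this task-bound life cycle is a Python context manager that clears the per-source memories when the task ends. This is a sketch under assumptions: the in-memory layout (a dict from data source name to a list of rows) and the name `data_warehouse` are hypothetical, and a real stop instruction would arrive through whatever task control channel the system uses.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def data_warehouse(memories: Dict[str, List[dict]]) -> Iterator[Dict[str, List[dict]]]:
    """Expose the per-source memories as a warehouse for the duration of one task."""
    try:
        yield memories
    finally:
        # Task finished or was stopped: clear the target data held at every source.
        for rows in memories.values():
            rows.clear()


memories = {
    "hospital_a": [{"user_id": 1}],
    "railway": [{"user_id": 1, "train_number": "G101"}],
}
with data_warehouse(memories) as warehouse:
    rows_seen = sum(len(rows) for rows in warehouse.values())
```

Because the clearing happens in the `finally` clause, it also runs if the task is interrupted by an exception.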
EXAMPLE six
The sixth embodiment discloses a secure multi-party data fusion and federal sharing device corresponding to the foregoing method embodiments; it is the virtual device structure of those embodiments. The device can realize the logical fusion of secure multi-party data and adapt to diverse data integration requirements. Referring to fig. 2, the device includes:
an integration requirement acquisition module 210, configured to acquire a data integration requirement and to obtain a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database which comprises metadata describing original data and corresponding data table fields representing the meaning of the metadata;
an intelligent data loading module 220, configured to load first target data from the databases of the plurality of data sources based on a pre-stored first data integration view and first data mapping table, including: determining a target field from a plurality of candidate fields in the first data integration view; obtaining, according to the first data mapping table, a mapping relation between the target field and each data table field in the databases of the plurality of data sources; obtaining the metadata corresponding to the target field according to the mapping relation and the correspondence between each data table field and the metadata in the databases of the plurality of data sources; and taking the original data described by the metadata corresponding to the target field as the first target data;
a dynamic data storage module 230, configured to store the first target data in the memories where the plurality of data sources are located to generate a first data warehouse.
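The division into three modules can be sketched as three small Python classes. This is a simplified illustration, not the device of fig. 2: the `DataSource` layout, the class names, the requirement registry, and the sample data are assumptions, and metadata is reduced here to the data table field names of each row.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class DataSource:
    """A participant with its own database and local memory (assumed layout)."""
    name: str
    database: List[Row]                              # original data by data table field
    memory: List[Row] = field(default_factory=list)  # local part of the warehouse


class IntegrationRequirementModule:                  # corresponds to module 210
    def __init__(self, registry: Dict[str, List[DataSource]]):
        self.registry = registry                     # requirement -> data sources

    def sources_for(self, requirement: str) -> List[DataSource]:
        return self.registry[requirement]


class IntelligentLoadingModule:                      # corresponds to module 220
    def __init__(self, view: List[str], mapping: Dict[str, Dict[str, str]]):
        self.view = view                             # candidate fields
        self.mapping = mapping                       # candidate -> {source: table field}

    def load(self, sources: List[DataSource]) -> Dict[str, List[Row]]:
        loaded: Dict[str, List[Row]] = {}
        for src in sources:
            # Target fields: candidate fields that this source can actually supply.
            targets = {c: self.mapping[c][src.name] for c in self.view
                       if src.name in self.mapping.get(c, {})}
            loaded[src.name] = [
                {c: row[col] for c, col in targets.items() if col in row}
                for row in src.database
            ]
        return loaded


class DynamicStorageModule:                          # corresponds to module 230
    def store(self, sources: List[DataSource], loaded: Dict[str, List[Row]]) -> None:
        for src in sources:
            src.memory = loaded[src.name]            # data stays at its own source


hospital_a = DataSource("hospital_a",
                        [{"patient_no": 1, "diag_ts": "2020-02-01", "ward": "3F"}])
hospital_b = DataSource("hospital_b", [{"pid": 2, "confirmed_at": "2020-02-03"}])
view = ["user_id", "diagnosis_time"]
mapping = {
    "user_id": {"hospital_a": "patient_no", "hospital_b": "pid"},
    "diagnosis_time": {"hospital_a": "diag_ts", "hospital_b": "confirmed_at"},
}

requirement_module = IntegrationRequirementModule({"epidemic_trend": [hospital_a, hospital_b]})
sources = requirement_module.sources_for("epidemic_trend")
loaded = IntelligentLoadingModule(view, mapping).load(sources)
DynamicStorageModule().store(sources, loaded)
```

After the run, each source's memory holds only its own rows, renamed to the target fields; the first data warehouse is the logical union of those memories, and fields outside the view (such as `ward`) are not loaded.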
EXAMPLE seven
Fig. 3 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention. As shown in fig. 3, the electronic device includes a processor 310, a memory 320, an input device 330, and an output device 340. The electronic device may have one or more processors 310; one processor 310 is taken as an example in fig. 3. The processor 310, the memory 320, the input device 330 and the output device 340 in the electronic device may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 3.
The memory 320 is a computer-readable storage medium and can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the secure multi-party data fusion and federal sharing method in the embodiments of the present invention (for example, the integration requirement acquisition module 210, the intelligent data loading module 220, and the dynamic data storage module 230 in the secure multi-party data fusion and federal sharing device). By running the software programs, instructions and modules stored in the memory 320, the processor 310 executes the various functional applications and data processing of the electronic device, that is, implements the secure multi-party data fusion and federal sharing method of the first to fifth embodiments.
The memory 320 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 320 may further include memory located remotely from the processor 310, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 330 may be used to receive data integration requirements, etc. The output device 340 may include a display device such as a display screen.
EXAMPLE eight
An eighth embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a secure multi-party data fusion and federal sharing method, the method including:
acquiring data integration requirements;
obtaining a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database, and the database comprises metadata describing original data and data table fields corresponding to the metadata;
loading first target data from the databases of the plurality of data sources based on a pre-stored first data integration view and first data mapping table, comprising: determining a target field from a plurality of candidate fields in the first data integration view, obtaining a mapping relation between the target field and each data table field in the databases of the plurality of data sources according to the first data mapping table, obtaining metadata corresponding to the target field according to the mapping relation and a corresponding relation between each data table field and the metadata in the databases of the plurality of data sources, and taking original data described by the metadata corresponding to the target field as first target data;
and storing the first target data to the memories of the plurality of data sources to generate a first data warehouse.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the above method operations, and may also perform related operations in the secure multi-party data fusion and federal sharing method provided by any embodiments of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and which includes instructions for enabling an electronic device (which may be a mobile phone, a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It is to be noted that, in the embodiment of the secure multiparty data fusion and federal sharing device, each unit and each module included in the embodiment are only divided according to functional logic, but are not limited to the above division, as long as corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.

Claims (10)

1. A secure multi-party data fusion and federal sharing method, characterized in that the method comprises the following steps:
acquiring data integration requirements;
obtaining a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database, and the database comprises metadata describing original data and data table fields corresponding to the metadata;
loading first target data from a database of the plurality of data sources based on a first pre-stored data integration view and a first data mapping table, comprising: determining a target field from a plurality of candidate fields in the first data integration view, obtaining a mapping relation between the target field and each data table field in a database of the plurality of data sources according to the first data mapping table, obtaining metadata corresponding to the target field according to a corresponding relation between each data table field and the metadata in the database of the plurality of data sources and the mapping relation, and taking original data described by the metadata corresponding to the target field as first target data;
and storing the first target data to the memories of the plurality of data sources to generate a first data warehouse.
2. The secure multi-party data fusion and federation sharing method of claim 1, wherein: the first data integration view and the first data mapping table are generated by:
carrying out standardization processing on each data table field which represents the same meaning to obtain a candidate field which has the same meaning with each data table field;
composing the first data integration view from a plurality of candidate fields representing different meanings;
and establishing a mapping relation between the data table fields that represent the same meaning in the databases of the plurality of data sources and the candidate field that represents that meaning, and generating the first data mapping table.
3. The secure multi-party data fusion and federation sharing method of claim 2, wherein: determining a target field from a plurality of candidate fields in the first data integration view includes: querying whether metadata represented by a data table field having a mapping relation with a candidate field exists in the databases of the plurality of data sources, and if so, taking that candidate field as a target field.
4. The secure multi-party data fusion and federation sharing method of claim 3, wherein: recording original data loaded from a database of each data source as sub-target data, wherein the first target data comprises the sub-target data correspondingly loaded from the database of each data source; storing the first target data to the memories where the plurality of data sources are located to generate a first data warehouse, including:
storing the sub-target data to a memory where a corresponding data source is located;
and the first data warehouse is formed by a plurality of sub-target data correspondingly stored in the memories of the data sources.
5. The secure multi-party data fusion and federation sharing method of claim 2, wherein: further comprising:
acquiring a new data integration requirement;
obtaining a corresponding new data source according to the new data integration requirement;
obtaining a second data warehouse based on the new data source;
and associating the second data warehouse with the first data warehouse to obtain a new data warehouse.
6. The secure multi-party data fusion and federation sharing method of claim 5, wherein: obtaining a second data warehouse based on the new data source, including:
generating a second data integration view and a second data mapping table based on the new data source;
loading second target data from a database of the new data source based on the second data integration view and the second data mapping table;
storing the second target data to a memory where the new data source is located to generate a second data warehouse;
the second data mapping table and the first data mapping table contain identical candidate fields; associating the second data warehouse with the first data warehouse, including: taking the same candidate field as a same target field; and associating the second target data corresponding to the same target field with the first target data.
7. The secure multi-party data fusion and federation sharing method of claim 2, wherein: further comprising:
acquiring a new data integration requirement;
obtaining new candidate fields according to the new data integration requirement and the plurality of data sources, including: querying the plurality of data sources according to the new data integration requirement to obtain corresponding metadata; and performing field analysis on the metadata to obtain the new candidate fields, including: generating, from the data table fields representing the meaning of the metadata, new candidate fields having the same meaning;
adding the new candidate field to the first data integration view, updating the first data integration view and the first data mapping table;
loading third target data from the database of the plurality of data sources based on the updated first data integration view and the first data mapping table;
and storing the third target data to the memories where the plurality of data sources are located, and forming a new data warehouse by the third target data and the first target data.
8. A secure multi-party data fusion and federal sharing device, characterized by comprising:
an integration requirement acquisition module, configured to acquire a data integration requirement and to obtain a plurality of corresponding data sources according to the data integration requirement, wherein each data source is provided with a database, and the database comprises metadata describing original data and data table fields corresponding to the metadata;
an intelligent data loading module, configured to load first target data from the databases of the plurality of data sources based on a pre-stored first data integration view and first data mapping table, including: determining a target field from a plurality of candidate fields in the first data integration view, obtaining a mapping relation between the target field and each data table field in the databases of the plurality of data sources according to the first data mapping table, obtaining metadata corresponding to the target field according to the mapping relation and a corresponding relation between each data table field and the metadata in the databases of the plurality of data sources, and taking original data described by the metadata corresponding to the target field as first target data;
and a dynamic data storage module, configured to store the first target data to the memories where the plurality of data sources are located to generate a first data warehouse.
9. An electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium, wherein the computer program, when executed by the processor, performs the secure multi-party data fusion and federation sharing method of any one of claims 1 to 7.
10. A computer storage medium having a computer program stored thereon, characterized in that: the computer program when executed by a processor implements the secure multi-party data fusion and federal sharing method of any of claims 1 to 7.
CN202010708220.5A 2020-07-22 2020-07-22 Safe multi-party data fusion and federal sharing method, device, equipment and medium Active CN111737364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010708220.5A CN111737364B (en) 2020-07-22 2020-07-22 Safe multi-party data fusion and federal sharing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111737364A true CN111737364A (en) 2020-10-02
CN111737364B CN111737364B (en) 2020-12-11

Family

ID=72656265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010708220.5A Active CN111737364B (en) 2020-07-22 2020-07-22 Safe multi-party data fusion and federal sharing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111737364B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
CN106326245A (en) * 2015-06-19 2017-01-11 北京京东尚科信息技术有限公司 Hive data warehouse-based fast association realization method and device
CN106874437A (en) * 2017-02-04 2017-06-20 中国人民大学 The internal storage data warehouse ranks storage conversion implementation method of data base-oriented all-in-one
CN107609154A (en) * 2017-09-23 2018-01-19 浪潮软件集团有限公司 Method and device for processing multi-source heterogeneous data
CN111352982A (en) * 2018-12-24 2020-06-30 核工业计算机应用研究所 Manpower extraction analysis system based on big data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘小秋等: "基于HANA内存计算技术开展信息系统性能优化", 《大众用电》 *
潘霄等: "《电力信息安全工程技术实战指南》", 30 September 2016, 东北大学出版社 *
舒天然: "《我国中央银行流动性救助及其决策支持系统研究》", 31 August 2018, 西安交通大学出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199434A (en) * 2020-11-17 2021-01-08 平安数字信息科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN112199434B (en) * 2020-11-17 2023-09-19 深圳平安智汇企业信息管理有限公司 Data processing method, device, electronic equipment and storage medium
CN113990068A (en) * 2021-10-27 2022-01-28 阿波罗智联(北京)科技有限公司 Traffic data processing method, device, equipment and storage medium
CN113990068B (en) * 2021-10-27 2023-02-24 阿波罗智联(北京)科技有限公司 Traffic data processing method, device, equipment and storage medium
US11899680B2 (en) 2022-03-09 2024-02-13 Oracle International Corporation Techniques for metadata value-based mapping during data load in data integration job

Also Published As

Publication number Publication date
CN111737364B (en) 2020-12-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210924

Address after: 311121 room 210, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Bodun Xiyan Technology Co.,Ltd.

Address before: Room 704, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: TONGDUN HOLDINGS Co.,Ltd.

TR01 Transfer of patent right