CN114416705A - Multi-source heterogeneous data fusion modeling method - Google Patents

Multi-source heterogeneous data fusion modeling method

Publication number
CN114416705A
Authority
CN
China
Prior art keywords
data
modeling method
source
source heterogeneous
hbase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111318577.3A
Other languages
Chinese (zh)
Inventor
李忱
陈忠国
周鑫
江何
门殿春
孟繁荣
姚志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Testor Technology Co ltd
Beijing Tongtech Co Ltd
Original Assignee
Beijing Testor Technology Co ltd
Beijing Tongtech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-09
Filing date
2021-11-09
Publication date
2022-04-29

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a multi-source heterogeneous data fusion modeling method and relates to the technical field of heterogeneous data processing in manufacturing. The method uses the Hibernate ORM core, which provides complete JPA support, to read and write multiple databases of different types faster and in a unified way. Data descriptions are created for the different types of original data, and protocol parsing rules are applied so that key feature data undergo decision-level fusion modeling during data fusion modeling. Data are extracted on the basis of multiple protocol parsing engines and the two-dimensional relationships of the data, which achieves decision-level fusion of key features across different types of data and improves fault tolerance and interference resistance. The two-dimensional relationships between the protocol parsing engines and the metadata also compensate for the low accuracy that the conventional decision-level modeling approach suffers because of low data precision, so that fast and accurate decision modeling is achieved.

Description

Multi-source heterogeneous data fusion modeling method
Technical Field
The invention relates to the technical field of manufacturing heterogeneous data processing, in particular to a multi-source heterogeneous data fusion modeling method.
Background
Multi-source heterogeneous data come from multiple data sources, including data sets collected by different database systems and by different devices in operation. The data sources differ in operating system and management system, in the storage mode and logical structure of the data, and in the generation time, place of use, and coding protocol of the data, which gives the data their "multi-source" character. This is currently the case in the manufacturing industry, particularly for data generated during product manufacturing. Such data are not only huge in volume, rich in sources, varied in type, and complex in structure; because the sources and storage forms of data differ among departments and systems, the data sources are also heterogeneous, distributed, and autonomous. The data types include structured data such as numeric and relational data as well as unstructured data such as images and audio. Modeling the production data once the whole process is complete allows the data to be displayed more intuitively and facilitates decision-making and deployment.
Because of the multi-source character of the data, the quality of the acquired data is difficult to guarantee during data integration. Missing, erroneous, inconsistent, and other invalid data that do not meet the specification are common, and the formats of data from different systems are not uniform, all of which makes effective analysis difficult. An efficient means of processing and integration is therefore needed to improve the integration efficiency of the various heterogeneous data. For decision-level modeling, conventional multi-source heterogeneous data suffer a certain degree of data loss during data fusion, which affects the accuracy of the model during feature extraction, so the modeling content cannot be controlled precisely while a rapid decision is reached.
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides a multi-source heterogeneous data fusion modeling method. Using the Hibernate ORM core and its complete JPA support, the method reads and writes multiple databases of different types faster and in a unified way while keeping the read/write process stable and efficient. Data cleaning improves the overall quality of the data and ensures that the data conversion process does only effective work, which speeds up real-time data processing and improves the efficiency of data integration.
To achieve this purpose, the invention provides the following technical scheme: a multi-source heterogeneous data fusion modeling method comprises a data acquisition process, a data integration process and a data analysis process, and specifically comprises the following steps:
Step one: in the data acquisition process, the original data are acquired accurately and in real time to provide an original data source for the data integration stage; a data description is written for the original data source, and the corresponding multiple protocol parsing engines are established.
Step two: according to the different types of data sources, HBase and NoSQL databases are used to store the data from each subsystem in a distributed manner.
Step three: Hibernate OGM is loaded and a unified HBase and NoSQL database access model is established on top of it, so that the two databases are read and written within the same framework according to unified rules, completing overall data access.
Step four: error data are processed by same-class mean interpolation: the standard deviation method of statistical analysis is first used to identify probable error values, and the identified error data are then cleared, which completes the screening of the data.
Step five: after cleaning, the data are screened, processed, and converted by an Extract-Transform-Load tool and then loaded into a data warehouse model for storage.
Step six: the FP-Growth parallel algorithm is used to extract and analyze the data in the data warehouse model, the associated information is marked, and the associated information is imported into the corresponding modeling algorithm.
As a further scheme of the invention: the HBase and NoSQL databases in step two may be replaced by any one of MySQL, Oracle, DB2, SQL Server, Redis, HBase, MongoDB, and Neo4j.
As a further scheme of the invention: the Extract-Transform-Load data warehouse technology in step five includes DataStage, Informatica, and Kettle.
As a further scheme of the invention: the distributed storage in step two uses a hash-table-based index structure in memory: the hash table stores the position index of the data on disk, and the disk stores the actual contents of the primary key and the value.
As a further scheme of the invention: while the data are screened in step four, potential errors in inconsistent data are detected on the basis of the consistency between associated data and are repaired, which completes the cleaning of data from multiple data sources.
As a further scheme of the invention: the original data source contains multiple kinds of heterogeneous data, and the data description of the original data source combines the extraction of key feature data with a protocol parsing rule.
As a further scheme of the invention: for the protocols configured in the data description, the multiple protocol parsing engines use the monitoring, pulling, and crawling modes of the relevant protocols, establish a two-dimensional relationship after parsing the data, store it in a message queue, and store the contents of the message queue in the corresponding HBase and NoSQL databases in sequence.
As a further scheme of the invention: a fixed-capacity recycle bin temporary storage strategy is applied to the error data cleared in step four.
The invention has the beneficial effects that:
1. The invention uses the Hibernate ORM core, which provides complete JPA support, to read and write multiple databases of different types faster and in a unified way while ensuring stability and efficiency. Data cleaning improves the overall quality of the data and ensures that the data conversion process does only effective work, which speeds up real-time data processing and improves the efficiency of data integration. Data descriptions corresponding to the different types of original data describe the data characteristics directly, and protocol parsing rules allow key feature data to undergo decision-level fusion modeling during data fusion modeling. Detailed data are extracted by monitoring, pulling, and crawling through the multiple protocol parsing engines and the two-dimensional relationships of the data, which achieves decision-level fusion of key features across different types of data, reduces the amount of calculation to a certain extent, and improves fault tolerance and interference resistance. The two-dimensional relationships between the protocol parsing engines and the metadata also compensate for the low modeling accuracy that the conventional decision-level modeling approach suffers because of low data precision, so that fast and accurate decision-level modeling is achieved.
2. The invention uses HBase and NoSQL databases to store the data from each subsystem in a distributed manner and adopts a hash-table-based index structure, in which the hash table stores the position index of the data on disk, so that the companies in the multiple aggregation intervals remain unchanged at the physical level while multi-source heterogeneous data are retrieved at the software level. Together with the unified HBase and NoSQL database access model, this provides integrated retrieval permissions and integration for the data as a whole. Combining error-value identification by the standard deviation method of statistical analysis with same-class mean interpolation of error data improves overall data quality markedly. After this preliminary quality improvement, an Extract-Transform-Load tool processes the required data further, identifying key data as a whole according to the key features of the data, and the data are stored in and retrieved from a unified data warehouse model. This prepares the modeling data, speeds up data access during modeling, and ensures that dirty data are handled.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the present invention;
FIG. 2 is a schematic block diagram of the system of the present invention;
FIG. 3 is a block diagram of the process of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
A multi-source heterogeneous data fusion modeling method comprises a data acquisition process, a data integration process and a data analysis process, and specifically comprises the following steps:
Step one: in the data acquisition process, the original data are acquired accurately and in real time to provide an original data source for the data integration stage; a data description is written for the original data source, and the corresponding multiple protocol parsing engines are established.
Step two: according to the different types of data sources, HBase and NoSQL databases are used to store the data from each subsystem in a distributed manner.
Step three: Hibernate OGM is loaded and a unified HBase and NoSQL database access model is established on top of it, so that the two databases are read and written within the same framework according to unified rules, completing overall data access.
Step four: error data are processed by same-class mean interpolation: the standard deviation method of statistical analysis is first used to identify probable error values, and the identified error data are then cleared, which completes the screening of the data.
Step five: after cleaning, the data are screened, processed, and converted by an Extract-Transform-Load tool and then loaded into a data warehouse model for storage.
Step six: the FP-Growth parallel algorithm is used to extract and analyze the data in the data warehouse model, the associated information is marked, and the associated information is imported into the corresponding modeling algorithm.
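The association mining in step six can be illustrated with a minimal single-process FP-Growth sketch in Python. The patent names an FP-Growth parallel algorithm but gives no implementation, so everything here is an assumption made for illustration: the names `Node`, `build_tree`, `fp_growth`, and `mine`, the representation of warehouse rows as lists of items, and the `min_support` threshold. The parallel partitioning of the work is not shown.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, a count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    header, frequent = build_tree(transactions, min_support)
    results = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        results[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, itemset))
    return results


def mine(baskets, min_support):
    """Convenience wrapper: each basket is one row of co-occurring items."""
    return fp_growth([(basket, 1) for basket in baskets], min_support)
```

Calling `mine(rows, min_support)` on rows taken from the warehouse returns every itemset that occurs in at least `min_support` rows; those itemsets are what a later stage could mark as associated information.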
The standard deviation method calculates the mean and standard deviation of a given sample and then sets the cut-off points for outliers at a chosen number of standard deviations from the mean; values beyond the resulting lower and upper limits are treated as outliers. This identifies error values, makes the data easier to clean, and improves data quality.
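The standard deviation screening and the same-class mean interpolation of step four can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names `flag_outliers` and `clean_by_class_mean`, the `(class_label, value)` record layout, and the default cut-off `k = 3.0` are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_outliers(values, k=3.0):
    """Return indices of values more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    lower, upper = mu - k * sigma, mu + k * sigma
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def clean_by_class_mean(records, k=3.0):
    """Replace outliers with the mean of the remaining values of the same class.

    records is a list of (class_label, value) pairs.
    Returns (cleaned_records, removed), where removed lists the cleared
    error data as (index, class_label, original_value).
    """
    by_class = defaultdict(list)
    for idx, (label, value) in enumerate(records):
        by_class[label].append((idx, value))

    cleaned = list(records)
    removed = []
    for label, items in by_class.items():
        values = [v for _, v in items]
        bad = set(flag_outliers(values, k))
        good = [v for j, v in enumerate(values) if j not in bad]
        if not good:
            continue  # no trustworthy values left to interpolate from
        fill = mean(good)
        for j in sorted(bad):
            idx, value = items[j]
            removed.append((idx, label, value))
            cleaned[idx] = (label, fill)
    return cleaned, removed
```

With small samples a cut-off of three standard deviations can never be exceeded, so a smaller `k` such as 2.0 may be needed; the patent does not state which multiple it uses.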
In other embodiments, the HBase and NoSQL databases in step two may be replaced by any one of MySQL, Oracle, DB2, SQL Server, Redis, HBase, MongoDB, and Neo4j. Because the database type in step two is selectable and replaceable, the method can accommodate the data that different manufacturing industries need to store and can use the most suitable storage mode, which broadens its compatibility.
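The idea of one access model over replaceable back ends can be sketched structurally in Python. Hibernate OGM itself is a Java framework and is not used here; the `Store` protocol, the in-memory `WideColumnStore` and `DocumentStore` stand-ins, and the `UnifiedAccess` facade are all hypothetical names chosen only to show how callers can read and write under one rule while the back end is swapped.

```python
from typing import Any, Dict, Optional, Protocol


class Store(Protocol):
    """The single read/write contract every back end must satisfy."""

    def put(self, key: str, record: Dict[str, Any]) -> None: ...

    def get(self, key: str) -> Optional[Dict[str, Any]]: ...


class WideColumnStore:
    """In-memory stand-in for a wide-column store such as HBase."""

    def __init__(self) -> None:
        self._cells: Dict[tuple, Any] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        for column, value in record.items():
            self._cells[(key, column)] = value  # one cell per (row, column)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        row = {c: v for (k, c), v in self._cells.items() if k == key}
        return row or None


class DocumentStore:
    """In-memory stand-in for a document store such as MongoDB."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        self._docs[key] = dict(record)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


class UnifiedAccess:
    """Routes every read and write through the same interface."""

    def __init__(self, backends: Dict[str, Store]) -> None:
        self._backends = backends

    def save(self, source: str, key: str, record: Dict[str, Any]) -> None:
        self._backends[source].put(key, record)

    def load(self, source: str, key: str) -> Optional[Dict[str, Any]]:
        return self._backends[source].get(key)
```

Replacing a back end then means passing a different object that satisfies `Store`; the calling code does not change.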
In other embodiments, the Extract-Transform-Load data warehouse technology in step five includes DataStage, Informatica, and Kettle. Screening and converting the data by Extract-Transform-Load processes the data further on the basis of step four and improves data quality again. Loading the processed data into a single data warehouse model ensures good data integration, lets the modeling process read the data directly, and safeguards modeling speed and quality.
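The Extract-Transform-Load flow can be sketched in a few lines of Python, with the standard-library `sqlite3` module standing in for the data warehouse. This does not reproduce DataStage, Informatica, or Kettle; the `measurements` table, the `device` and `value` fields, and the function names are assumptions made for illustration.

```python
import sqlite3


def extract(sources):
    """Yield (source_name, row) pairs from every source system."""
    for source_name, rows in sources.items():
        for row in rows:
            yield source_name, row


def transform(source_name, row):
    """Normalize one row to a common shape; return None to screen it out."""
    try:
        value = float(row["value"])
    except (KeyError, TypeError, ValueError):
        return None  # unusable value: drop the row
    device = str(row.get("device", "")).strip().lower()
    return (source_name, device, value)


def load(conn, records):
    """Load the converted records into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(source TEXT, device TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(sources, conn):
    """Run extract, transform, and load; return the number of rows loaded."""
    converted = (transform(name, row) for name, row in extract(sources))
    records = [record for record in converted if record is not None]
    load(conn, records)
    return len(records)
```

Because every source ends up in the same table with the same columns, a modeling step can read the warehouse with one query instead of one reader per source format.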
In other embodiments, the distributed storage in step two uses a hash-table-based index structure in memory: the hash table stores the position index of the data on disk, and the disk stores the actual contents of the primary key and the value. With distributed storage, the distribution can be selected according to the type of data while the data indexes are unified. The hash storage engine periodically merges old data and delete operations and retains only the latest data. An index record is also kept on disk; it is generated during the periodic merge, and after a power failure the index is rebuilt in memory directly from it, which ensures data security.
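A hash-table index over an append-only disk file can be sketched as below. The class name `HashIndexedLog`, the record layout (two 4-byte lengths followed by key and value), and the file handling are assumptions. As a simplification, this sketch rebuilds the in-memory index by scanning the data file itself rather than reading a separate index record, and it omits delete operations.

```python
import os
import struct

HEADER = struct.Struct(">II")  # key length, value length (big-endian uint32)


class HashIndexedLog:
    """Append-only data file plus an in-memory hash table of value positions."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset of value on disk, length of value)
        open(path, "ab").close()  # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the file from the start; later records overwrite earlier ones."""
        self.index.clear()
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                key_len, value_len = HEADER.unpack(header)
                key = f.read(key_len).decode("utf-8")
                self.index[key] = (f.tell(), value_len)
                f.seek(value_len, os.SEEK_CUR)

    def put(self, key, value):
        """Append key and value (bytes) to disk and record the value position."""
        key_bytes = key.encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            start = f.tell()
            f.write(HEADER.pack(len(key_bytes), len(value)) + key_bytes + value)
        self.index[key] = (start + HEADER.size + len(key_bytes), len(value))

    def get(self, key):
        """One hash lookup in memory, then one positioned read from disk."""
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def compact(self):
        """Merge the file so that only the latest value of each key remains."""
        live = {key: self.get(key) for key in self.index}
        tmp_path = self.path + ".compact"
        with open(tmp_path, "wb") as f:
            for key, value in live.items():
                key_bytes = key.encode("utf-8")
                f.write(HEADER.pack(len(key_bytes), len(value)) + key_bytes + value)
        os.replace(tmp_path, self.path)
        self._rebuild_index()
```

Opening a new `HashIndexedLog` on an existing file reconstructs the index, which is the behavior the text describes for recovery after a power failure.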
In other embodiments, while the data are screened in step four, potential errors in inconsistent data are detected on the basis of the consistency between associated data and are repaired, which completes the cleaning of data from multiple data sources. Judging and repairing possible errors on the basis of the consistency between associated data allows the data to be matched and sorted in step four and speeds up data integration.
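One possible reading of consistency-based repair is sketched below: records from different sources that share a key are expected to agree on an associated attribute, and a dissenting value is replaced by the majority value. The patent does not specify the repair rule, so the majority vote, the function name `repair_by_majority`, and the example field names are assumptions.

```python
from collections import Counter, defaultdict


def repair_by_majority(records, key, attr):
    """Repair records whose attr disagrees with the majority for the same key.

    records is a list of dicts. Returns (repaired_records, changes), where
    changes lists (index, old_value, new_value). Ties are left untouched
    because no value can be preferred. The input list is not modified.
    """
    votes = defaultdict(Counter)
    for record in records:
        votes[record[key]][record[attr]] += 1

    repaired, changes = [], []
    for i, record in enumerate(records):
        top = votes[record[key]].most_common(2)
        best_value, best_count = top[0]
        tie = len(top) > 1 and top[1][1] == best_count
        if not tie and record[attr] != best_value:
            changes.append((i, record[attr], best_value))
            record = {**record, attr: best_value}
        repaired.append(record)
    return repaired, changes
```

The returned `changes` list records what was altered, so the original values could be kept for later recovery.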
In other embodiments, the original data source contains multiple kinds of heterogeneous data, and the data description of the original data source combines the extraction of key feature data with a protocol parsing rule. Pairing key feature data with protocol parsing rules lets the original data be represented simply by their key feature data, and together with the index provided by the two-dimensional relationship this allows the key feature data to be processed on their own. During modeling, the original data are brought in and completed through the index, so decision-level modeling is reached quickly and is subsequently refined into accurate modeling as the data index is completed.
In other embodiments, for the protocols configured in the data description, the multiple protocol parsing engines use the monitoring, pulling, and crawling modes of the relevant protocols, establish a two-dimensional relationship after parsing the data, store it in a message queue, and store the contents of the message queue in the corresponding HBase and NoSQL databases in sequence. Because the two-dimensional relationship is established after the data are parsed, the data index between the feature data and the original data can be completed by monitoring, pulling, and crawling.
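The path from data description to parsing engine to message queue to store can be sketched as follows. The layout of the data description (`protocol`, `key_features`, `target`), the two example engines for JSON and CSV payloads, and the use of plain dictionaries as stand-ins for the HBase and NoSQL stores are assumptions; how payloads arrive (monitoring, pulling, or crawling) is outside the sketch.

```python
import csv
import io
import json
import queue


def parse_json(payload, fields):
    """Parse one JSON object and keep only the key feature fields."""
    document = json.loads(payload)
    return [{field: document.get(field) for field in fields}]


def parse_csv(payload, fields):
    """Parse CSV text with a header line and keep only the key feature fields."""
    reader = csv.DictReader(io.StringIO(payload))
    return [{field: row.get(field) for field in fields} for row in reader]


# One parsing engine per protocol named in a data description.
ENGINES = {"json": parse_json, "csv": parse_csv}


def ingest(description, payload, message_queue):
    """Parse a payload into two-dimensional rows and enqueue them."""
    engine = ENGINES[description["protocol"]]
    for row in engine(payload, description["key_features"]):
        message_queue.put((description["target"], row))


def drain(message_queue, stores):
    """Move queued rows, in arrival order, into the store named by each row."""
    while True:
        try:
            target, row = message_queue.get_nowait()
        except queue.Empty:
            break
        stores.setdefault(target, []).append(row)
```

Each data description selects its engine by protocol name, so supporting a new protocol means registering one more function in `ENGINES`.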
In other embodiments, a fixed-capacity recycle bin temporary storage strategy is applied to the error data cleared in step four. The recycle bin holds cleared error data temporarily, and once its capacity is full it is emptied in chronological order. This prevents mistaken deletions from becoming unrecoverable and improves the fault tolerance of the overall operation.
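A fixed-capacity recycle bin that evicts its oldest entries first can be sketched with an ordered dictionary. The class name `RecycleBin` and its methods are hypothetical, and counting capacity in records rather than bytes is an assumption.

```python
from collections import OrderedDict


class RecycleBin:
    """Holds cleared records temporarily; the oldest are dropped when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # insertion order is deletion order

    def discard(self, record_id, record):
        """Put a cleared record in the bin; return entries evicted for good."""
        self._items.pop(record_id, None)  # re-discarding refreshes its position
        self._items[record_id] = record
        evicted = []
        while len(self._items) > self.capacity:
            evicted.append(self._items.popitem(last=False))  # oldest first
        return evicted

    def restore(self, record_id):
        """Take a record back out of the bin; raises KeyError if it is gone."""
        return self._items.pop(record_id)

    def __len__(self):
        return len(self._items)
```

A record cleared by mistake can be recovered with `restore` for as long as newer discards have not pushed it out of the bin.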
Example 2:
A multi-source heterogeneous data fusion modeling method comprises a data acquisition process, a data integration process and a data analysis process, and specifically comprises the following steps:
Step one: in the data acquisition process, the original data are acquired accurately and in real time to provide an original data source for the data integration stage.
Step two: according to the different types of data sources, HBase and NoSQL databases are used to store the data from each subsystem in a distributed manner; a data description is written for the original data source, and the corresponding multiple protocol parsing engines are established.
Step three: error data are processed by same-class mean interpolation: the standard deviation method of statistical analysis is first used to identify probable error values, and the identified error data are then cleared, which completes the screening of the data.
Step four: after cleaning, the data are screened, processed, and converted by Extract-Transform-Load and then loaded into a data warehouse model for storage.
Step five: the FP-Growth parallel algorithm is used to extract and analyze the data in the data warehouse model, the associated information is marked, and the associated information is imported into the corresponding modeling algorithm.
The Extract-Transform-Load data warehouse technology in step five includes DataStage, Informatica, and Kettle.
The distributed storage in step two uses a hash-table-based index structure in memory: the hash table stores the position index of the data on disk, and the disk stores the actual contents of the primary key and the value.
While the data are screened in step four, potential errors in inconsistent data are detected on the basis of the consistency between associated data and are repaired, which completes the cleaning of data from multiple data sources.
Example 3:
A multi-source heterogeneous data fusion modeling method comprises a data acquisition process, a data integration process and a data analysis process, and specifically comprises the following steps:
Step one: in the data acquisition process, the original data are acquired accurately and in real time to provide an original data source for the data integration stage; a data description is written for the original data source, and the corresponding multiple protocol parsing engines are established.
Step two: according to the different types of data sources, HBase and NoSQL databases are used to store the data from each subsystem in a distributed manner.
Step three: Hibernate OGM is loaded and a unified HBase and NoSQL database access model is established on top of it, so that the two databases are read and written within the same framework according to unified rules, completing overall data access.
Step four: the data are screened, processed, and converted by Extract-Transform-Load and then loaded into a data warehouse model for storage.
Step five: the FP-Growth parallel algorithm is used to extract and analyze the data in the data warehouse model, the associated information is marked, and the associated information is imported into the corresponding modeling algorithm.
The HBase and NoSQL databases in step two may be replaced by any one of MySQL, Oracle, DB2, SQL Server, Redis, HBase, MongoDB, and Neo4j.
The Extract-Transform-Load data warehouse technology in step five includes DataStage, Informatica, and Kettle.
The distributed storage in step two uses a hash-table-based index structure in memory: the hash table stores the position index of the data on disk, and the disk stores the actual contents of the primary key and the value.
In conclusion, comparison of the embodiments shows that Hibernate OGM and the distributed database storage mode complement each other, making data reading and storage more convenient while keeping the data uniform. Same-class mean interpolation, combined with data cleaning and repair based on the consistency between associated data, improves data quality, ensures efficient fusion of the data, and allows the modeling algorithm to read the data directly.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the general description and the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims (8)

1. A multi-source heterogeneous data fusion modeling method, characterized by comprising a data acquisition process, a data integration process, and a data analysis process, and specifically comprising the following steps:
step one: in the data acquisition process, the original data are acquired accurately and in real time to provide an original data source for the data integration stage; a data description is written for the original data source, and the corresponding multiple protocol parsing engines are established;
step two: according to the different types of data sources, HBase and NoSQL databases are used to store the data from each subsystem in a distributed manner;
step three: Hibernate OGM is loaded and a unified HBase and NoSQL database access model is established on top of it, so that the two databases are read and written within the same framework according to unified rules, completing overall data access;
step four: error data are processed by same-class mean interpolation: the standard deviation method of statistical analysis is first used to identify probable error values, and the identified error data are then cleared, which completes the screening of the data;
step five: after cleaning, the data are screened, processed, and converted by an Extract-Transform-Load tool and then loaded into a data warehouse model for storage;
step six: the FP-Growth parallel algorithm is used to extract and analyze the data in the data warehouse model, the associated information is marked, and the associated information is imported into the corresponding modeling algorithm.
2. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: the HBase and NoSQL databases in step two may be replaced by any one of MySQL, Oracle, DB2, SQL Server, Redis, HBase, MongoDB, and Neo4j.
3. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: the Extract-Transform-Load tool in step five is any one of DataStage, Informatica, and Kettle.
4. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: the distributed storage in step two uses a hash-table-based index structure in memory: the hash table stores the position index of the data on disk, and the disk stores the actual contents of the primary key and the value.
5. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: while the data are screened in step four, potential errors in inconsistent data are detected on the basis of the consistency between associated data and are repaired, which completes the cleaning of data from multiple data sources.
6. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: the original data source contains multiple kinds of heterogeneous data, and the data description of the original data source combines the extraction of key feature data with a protocol parsing rule.
7. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: for the protocols configured in the data description, the multiple protocol parsing engines use the monitoring, pulling, and crawling modes of the relevant protocols, establish a two-dimensional relationship after parsing the data, store it in a message queue, and store the contents of the message queue in the corresponding HBase and NoSQL databases in sequence.
8. The multi-source heterogeneous data fusion modeling method according to claim 1, characterized in that: a fixed-capacity recycle bin temporary storage strategy is applied to the error data cleared in step four.
CN202111318577.3A 2021-11-09 2021-11-09 Multi-source heterogeneous data fusion modeling method Pending CN114416705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111318577.3A CN114416705A (en) 2021-11-09 2021-11-09 Multi-source heterogeneous data fusion modeling method

Publications (1)

Publication Number Publication Date
CN114416705A true CN114416705A (en) 2022-04-29

Family

ID=81265884

Country Status (1)

Country Link
CN (1) CN114416705A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100235748A1 (en) * 2008-03-14 2010-09-16 Johnson William J System and method for automated content presentation objects
CN108010573A (en) * 2017-11-24 2018-05-08 苏州市环亚数据技术有限公司 Hospital data fusion system, method, electronic equipment and storage medium
CN108985531A (en) * 2017-06-01 2018-12-11 中国科学院深圳先进技术研究院 Multi-mode heterogeneous electric power big data fusion analysis and management system and method
CN109344186A (en) * 2018-08-23 2019-02-15 成都四方伟业软件股份有限公司 Cross-source, cross-database fusion system and fusion method for multiple databases in a BI system
US20190231097A1 (en) * 2008-03-14 2019-08-01 William J. Johnson System and method for location based exchanges of data facilitiating distributed locational applications
CN112001539A (en) * 2020-08-21 2020-11-27 北京交通大学 High-precision passenger traffic prediction method and passenger traffic prediction system
CN112256782A (en) * 2020-10-30 2021-01-22 内蒙古电力(集团)有限责任公司乌海超高压供电局 Electric power big data processing system based on Hadoop
US20210056347A1 (en) * 2019-08-20 2021-02-25 International Business Machines Corporation Intelligent generation of image-like representations of ordered and heterogenous data to enable explainability of artificial intelligence results
CN112769605A (en) * 2020-12-30 2021-05-07 杭州东方通信软件技术有限公司 Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
CN113569054A (en) * 2021-05-12 2021-10-29 浙江工业大学 Knowledge graph construction method and system for multi-source Chinese financial bulletin document

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
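李东奎 et al.: "Design and Implementation of a Unified Access Model for SQL and NoSQL Databases Based on HibernateOGM", 《软件》 (Software) *
陈世超 et al.: "A Survey of Multi-Source Heterogeneous Data Processing Methods in Manufacturing Production Processes", 《大数据》 (Big Data) *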

Similar Documents

Publication Publication Date Title
CN113010506B (en) Multi-source heterogeneous water environment big data management system
CN112256782B (en) Hadoop-based power big data processing system
CN110389950B (en) Rapid running big data cleaning method
CN109325062B (en) Data dependency mining method and system based on distributed computation
CN111400354B (en) Machine tool manufacturing BOM (Bill of Material) storage query and tree structure construction method based on MES (manufacturing execution System)
CN113360722B (en) Fault root cause positioning method and system based on multidimensional data map
US20030033291A1 (en) SQL execution analysis
CN113010505A (en) Water environment big data cleaning method
CN112181955A (en) Data standard management method for information sharing of heavy haul railway comprehensive big data platform
CN117056867B (en) Multi-source heterogeneous data fusion method and system for digital twin
CN110674211A (en) Automatic analysis method and device for AWR report of Oracle database
CN111459646A (en) Big data quality management task scheduling method based on pipeline model and task combination
CN109634949B (en) Mixed data cleaning method based on multiple data versions
CN114528284A (en) Bottom layer data cleaning method and device, mobile terminal and storage medium
CN107133335A (en) Duplicate record detection method based on word segmentation and indexing technology
CN110704407B (en) Data deduplication method and system
CN114416705A (en) Multi-source heterogeneous data fusion modeling method
CN115587333A (en) Failure analysis fault point prediction method and system based on multi-classification model
CN110413602B (en) Layered cleaning type big data cleaning method
CN111221809A (en) Data cleaning method and system based on real-time database storage and storage medium
CN107402920A (en) The method and apparatus for determining relation database table connection complexity factor
CN112800219A (en) Method and system for feeding back customer service log to return database
CN117708186A (en) Intelligent adjustment and optimization method for application system
CN110781177A (en) Electric energy meter electricity utilization information sorting method and device and readable storage medium
CN115631866B (en) Rapid and accurate de-duplication method for medical big data acquisition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220429
