LU505379B1 - Method for managing information data based on big data - Google Patents
- Publication number
- LU505379B1
- Authority
- LU
- Luxembourg
Classifications
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6263—Protecting personal data, e.g. for financial or medical purposes, during internet communication, e.g. revealing personal data from cookies
- G06F16/176—Support for shared access to files; File sharing support
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/26—Visual data mining; Browsing structured data
Abstract
Provided is a method for managing information data based on big data, which relates to the field of data management. The method includes a data collection and acquisition module, a data storage and management module, a data pre-processing and cleaning module, a data integration and combination module, a data modeling and analysis module, a data visualization and presentation module, a data security and privacy module, a data governance and quality management module, and a scalability and performance optimization module. An ETL (extraction, transformation and loading) tool, commonly used for data integration and combination, may be added to the data integration and combination module. The ETL tool can automatically extract data from different data sources, perform data conversion and processing, and load the data into the target system, improving the efficiency and accuracy of data integration. An appropriate API and web service are defined and implemented to achieve data sharing and interoperability between different systems.
Description
METHOD FOR MANAGING INFORMATION DATA BASED ON BIG DATA
[0001] The present disclosure relates to the field of data management, and in particular to a method for managing information data based on big data.
[0002] Management of information data is widely used around the world, which helps an organization to effectively organize, store, retrieve and utilize information data resources to support decision-making, optimize business processes, improve efficiency, provide personalized user experience, and promote knowledge sharing and innovation. A method for managing information data based on big data is to collect, store, process and analyze massive data by utilizing the big data technology and tools, to provide insights and values. These methods include data collection, data cleaning, data integration, data storage, data analysis, and other steps, in order to cope with challenges of data speed, diversity, scale and complexity.
With the big data technology, an organization can discover a pattern, trend and association rule in the data, and make more accurate and targeted decisions and gain a competitive advantage. Therefore, a method for managing information data based on big data needs to be proposed.
[0003] From search, Chinese patent publication No. CN114840770A discloses a management method and system based on big data, in which authenticity and adaptability of data required by a demand client can be improved, and reliability of bill information obtained by a supply client is ensured.
[0004] However, in terms of management of information data, effective integration and combination is difficult because data from different data sources may have different formats and structures. In addition, the growth rate of data may further increase the difficulty of data integration and combination. Moreover, big data involves personal identification information, business secrets, and other sensitive information, which requires effective measures for data security and privacy protection.
[0005] An objective of the present disclosure is to provide a method for managing information data based on big data, in order to solve the following problems: effective integration and combination is difficult because data from different data sources may have different formats and structures; the growth rate of data may further increase the difficulty of data integration and combination; and big data involves personal identification information, business secrets, and other sensitive information, which requires effective measures for data security and privacy protection.
[0006] Technical solutions provided in the present disclosure for achieving the above objective are described below. A method for managing information data based on big data is provided. The method includes a data collection and acquisition module, a data storage and management module, a data pre-processing and cleaning module, a data integration and combination module, a data modeling and analysis module, a data visualization and presentation module, a data security and privacy module, a data governance and quality management module, and a scalability and performance optimization module.
[0007] In a preferred embodiment, the data collection and acquisition module is for collecting and acquiring big data from various sources, where the sources include sensors, log files, social media and web crawling, and the data collection and acquisition module involves automation of data source selection, data acquisition, and data collection.
[0008] In a preferred embodiment, the data storage and management module is for storing and managing large-scale information data, including a distributed storage system, a data warehouse, a data lake and a cloud storage technology, and the data storage and management module further includes data indexing, data backup, data recovery and data life cycle management.
[0009] In a preferred embodiment, the data pre-processing and cleaning module is for pre-processing and cleaning the collected information data, including noise removal, outlier processing, data cleaning and data standardization, to ensure quality and consistency of the data.
[0010] In a preferred embodiment, the data integration and combination module is for integrating and combining data from different sources and in different formats, which involves format transformation of data, data mapping, data merging and data standardization, to build a complete and consistent data set. An ETL tool is added to automatically extract data from different data sources, perform data transformation and processing, and load the data into a target system. The data is integrated into a web application or cloud service for data exchange and integration by utilizing an API and web service.
[0011] In a preferred embodiment, the data modeling and analysis module is for applying data analysis and modeling technologies to perform in-depth analysis of the information data, including statistical analysis, machine learning, data mining, and natural language processing, to discover a pattern, trend and association in the data.
[0012] In a preferred embodiment, the data visualization and presentation module is for presenting an analysis result to a user in a form of a visual chart, dashboard and report, to help the user to better understand and utilize the information data to support decision-making and strategic planning.
[0013] In a preferred embodiment, the data security and privacy module is for ensuring security and privacy protection of the information data, including data encryption, identity authentication, access control, data desensitization and compliance measures, to prevent unauthorized access and data leakage. Secure transfer protocols HTTPS and SSH are utilized to ensure confidentiality and integrity of data during transmission. Data anonymization is performed to protect privacy in a case that data desensitization is not possible.
[0014] In a preferred embodiment, the data governance and quality management module is for establishing a data governance framework and a data quality management mechanism to ensure accuracy, integrity and consistency of the data, which includes assessment of data quality, monitoring on data quality and improvement of data quality.
[0015] In a preferred embodiment, the scalability and performance optimization module is for ensuring that the method is able to process information data in a large scale and rapid growth, which involves distributed computing, parallel processing, cache optimization, query optimization and resource management technology, to improve system scalability and performance.
[0016] Beneficial effects of the present disclosure compared with the conventional technology are described here. The data integration and combination module may be added with the ETL (extraction, transformation and loading) tool, to automatically extract data from different data sources, perform data conversion and processing, and load the data into the target system. The ETL tool can improve efficiency and accuracy of data integration. An appropriate API and web service are defined and implemented, to achieve data sharing and interoperability between different systems. Data security and privacy issues can be solved by using the secure transfer protocols HTTPS and SSH, so as to ensure confidentiality and integrity of data during transmission, which can prevent data from being intercepted or tampered with. Data anonymization can be performed to protect privacy in a case that data desensitization is impossible.
[0017] FIG. 1 is a flow chart of an overall method according to an embodiment of the present disclosure; and
[0018] FIG. 2 is a flow chart of an overall method according to another embodiment of the present disclosure.
[0019] Hereinafter technical solutions of embodiments of the present disclosure are described clearly and completely in conjunction with the drawings of the embodiments of the present disclosure. Apparently, the embodiments described below are only some embodiments, rather than all the embodiments of the present disclosure. Any other embodiment obtained by those skilled in the art based on the embodiments in the present disclosure without any creative effort falls within the protection scope of the present disclosure.
[0020] Reference is made to Figure 1 to Figure 2. A technical solution of the present disclosure is provided as follows. A method for managing information data based on big data is provided. The method includes: a data collection and acquisition module, a data storage and management module, a data pre-processing and cleaning module, a data integration and combination module, a data modeling and analysis module, a data visualization and presentation module, a data security and privacy module, a data governance and quality management module, and a scalability and performance optimization module.
[0021] Further, the data collection and acquisition module is for collecting and acquiring big data from various sources. The sources include sensors, log files, social media and web crawling. The data collection and acquisition module involves automation of data source selection, data acquisition and data collection. An appropriate data collection tool and technology is adopted to connect to data sources and extract data from the data sources, so as to ensure integrity and accuracy of the data.
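The collection step above can be sketched as follows. This is a minimal illustration only; the log format, field names, and the in-memory "source" are assumptions standing in for a real source connector.

```python
import io

# A stand-in log "file"; in practice this would be a connector to a real
# data source such as a sensor feed or a log server (illustrative only).
log_stream = io.StringIO("INFO start\nERROR disk full\nINFO done\n")

def collect(stream):
    """Pull raw lines from a source and keep structured (level, message) pairs."""
    records = []
    for line in stream:
        # Split each line into a log level and the remaining message text.
        level, _, message = line.strip().partition(" ")
        records.append({"level": level, "message": message})
    return records

records = collect(log_stream)
```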
[0022] Further, the data storage and management module is for storing and managing large-scale information data, including a distributed storage system, a data warehouse, a data lake and a cloud storage technology. The data storage and management module further includes data indexing, data backup, data recovery and data life cycle management. An appropriate storage system is adopted to design a data model and table structure, and establish a data indexing and partitioning strategy, so as to provide efficient data storage and retrieval.
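The table-structure and indexing idea can be sketched with Python's built-in sqlite3 standing in for the storage system; the table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory database standing in for a larger storage system (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, source TEXT, payload TEXT)"
)
# A secondary index on the lookup column supports efficient retrieval,
# as the data indexing strategy suggests.
conn.execute("CREATE INDEX idx_records_source ON records (source)")
conn.executemany(
    "INSERT INTO records (source, payload) VALUES (?, ?)",
    [("sensor", "t=21.5"), ("log", "GET /"), ("sensor", "t=22.0")],
)
rows = conn.execute(
    "SELECT payload FROM records WHERE source = ? ORDER BY id", ("sensor",)
).fetchall()
```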
[0023] Further, the data pre-processing and cleaning module is for pre-processing and cleaning the collected information data, including noise removal, outlier processing, data cleaning and data standardization, to ensure quality and consistency of the data. Data cleaning and pre-processing techniques are applied, so as to normalize and transform data formats.
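One way to sketch the cleaning and normalization described above; the record fields and cleaning rules are hypothetical simplifications.

```python
def clean_records(records):
    """Drop records with missing values and normalize the name field."""
    cleaned = []
    for rec in records:
        if rec.get("value") is None:  # simplified noise/invalid-entry removal
            continue
        cleaned.append({
            "name": rec["name"].strip().lower(),  # standardize text format
            "value": float(rec["value"]),          # standardize numeric type
        })
    return cleaned

raw = [{"name": " Alice ", "value": "3"}, {"name": "Bob", "value": None}]
cleaned = clean_records(raw)
```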
[0024] Further, the data integration and combination module is for integrating and combining data from different sources and in different formats, which involves format transformation of data, data mapping, data merging and data standardization, to build a complete and consistent data set. An ETL tool may be added to automatically extract data from different data sources, perform data transformation and processing, and load the data into a target system. The data is integrated into a web application or cloud service for data exchange and integration by utilizing an API and web service. A data integration tool or customized data transformation and integration strategy is utilized to integrate and map data from different data sources, so that differences in data formats and structures are resolved.
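The extract-transform-load flow can be sketched as three small functions. The sources and target here are plain Python structures, an assumption for illustration; a real ETL tool would connect to databases, files, or APIs.

```python
def extract(sources):
    # Pull raw rows from each (hypothetical) source.
    return [row for src in sources for row in src["rows"]]

def transform(rows):
    # Map differing field names onto one schema and standardize types,
    # resolving differences in formats and structures between sources.
    out = []
    for r in rows:
        out.append({
            "id": r.get("id") or r.get("ID"),
            "amount": float(r.get("amount") or r.get("amt")),
        })
    return out

def load(rows, target):
    # Append the harmonized rows into the target system.
    target.extend(rows)
    return target

sources = [{"rows": [{"ID": 1, "amt": "2.5"}]},
           {"rows": [{"id": 2, "amount": "4"}]}]
warehouse = load(transform(extract(sources)), [])
```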
[0025] Further, the data modeling and analysis module is for applying data analysis and modeling technologies to perform in-depth analysis of the information data, including statistical analysis, machine learning, data mining, and natural language processing, to discover a pattern, trend and association in the data. A machine learning algorithm, a statistical analysis and a data mining technology are adopted to build a model and an algorithm, so as to perform data modeling, analysis and prediction.
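The statistical-analysis step could begin as simply as summary statistics and a crude outlier check with the standard library; the data and the two-standard-deviation rule are illustrative assumptions, not part of the disclosed method.

```python
import statistics

daily_visits = [120, 135, 128, 310, 122, 131]  # illustrative data

mean = statistics.mean(daily_visits)
median = statistics.median(daily_visits)
stdev = statistics.stdev(daily_visits)
# A crude anomaly flag: values more than two standard deviations above the mean.
outliers = [v for v in daily_visits if v > mean + 2 * stdev]
```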
[0026] Further, the data visualization and presentation module is for presenting an analysis result to a user in a form of a visual chart, dashboard and report, to help the user to better understand and utilize the information data to support decision-making and strategic planning.
A data visualization tool and technique is utilized to select an appropriate chart type, layout, and color scheme, and provide an interactive and customized function, so as to support the user in data exploration and decision-making.
[0027] Further, the data security and privacy module is for ensuring security and privacy protection of the information data, including data encryption, identity authentication, access control, data desensitization and compliance measures, to prevent unauthorized access and data leakage. Secure transfer protocols HTTPS and SSH are utilized to ensure confidentiality and integrity of data during transmission. Data anonymization is performed to protect privacy in a case that data desensitization is not possible. An access control mechanism, a data encryption technology and an identity verification method are utilized to ensure data confidentiality, integrity and availability, so as to comply with privacy protection regulations and standards.
[0028] Further, the data governance and quality management module is for establishing a data governance framework and a data quality management mechanism to ensure accuracy, integrity and consistency of the data, which includes assessment of data quality, monitoring on data quality and improvement of data quality. Data governance policies and rules are developed and a data quality assessment indicator is established to implement data quality monitoring and a maintenance measure, so as to perform data control and metadata management.
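A data-quality assessment indicator such as field completeness can be monitored with a small function; the field name and the alert threshold below are assumptions for illustration.

```python
def completeness(records, field):
    """Fraction of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

records = [{"email": "a@x.com"}, {"email": ""}, {"email": "b@x.com"}, {}]
score = completeness(records, "email")
# Hypothetical quality threshold that would trigger a maintenance measure.
alert = score < 0.9
```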
[0029] Further, the scalability and performance optimization module is for ensuring that the method is able to process information data in a large scale and rapid growth, which involves distributed computing, parallel processing, cache optimization, query optimization and resource management technology, to improve system scalability and performance. A data processing and query algorithm and strategy is optimized, a horizontal expansion and distributed computing architecture is designed, and performance testing and optimization are performed, so as to satisfy requirements for high-concurrency, large-scale data processing and analysis.
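The partition-and-process-in-parallel idea can be sketched with `concurrent.futures`. A thread pool is used here only for a portable, self-contained sketch; a real deployment would use processes or a distributed framework, and the per-chunk work function is a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for real per-partition work (e.g. aggregation or filtering).
    return sum(chunk)

data = list(range(100))
# Partition the data set into fixed-size chunks for parallel processing.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))  # preserves chunk order
total = sum(partials)
```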
[0030] An application of the method for managing information data based on big data and improvement modules thereof are described below.
[0031] S1, examples of steps of the method for managing information data based on big data are as follows.
[0032] In the data collection and acquisition module, big data is collected and acquired from various sources, where the sources include sensors, log files, social media, web crawling, and the like. The data collection and acquisition module involves automation of data source selection, data acquisition and data collection.
[0033] In the data storage and management module, large-scale information data is stored and managed, which includes a distributed storage system, a data warehouse, a data lake, a cloud storage technology, and the like. The data storage and management module further includes data indexing, data backup, data recovery, data life cycle management, and the like.
[0034] In the data pre-processing and cleaning module, pre-processing and cleaning is performed on the collected information data, which includes noise removal, outlier processing, data cleaning, data standardization, and the like, to ensure quality and consistency of the data.
[0035] In the data integration and combination module, data from different sources and in different formats are integrated and combined, which involves format transformation of data, data mapping, data merging and data standardization and other processing, to build a complete and consistent data set.
[0036] In the data modeling and analysis module, data analysis and modeling technologies are applied to perform in-depth analysis of the information data, including statistical analysis, machine learning, data mining, and natural language processing and other methods, to discover a pattern, trend and association in the data.
[0037] In the data visualization and presentation module, an analysis result is presented to a user in a form of a visual chart, dashboard, report, and the like, to help the user to better understand and utilize the information data to support decision-making and strategic planning.
[0038] In the data security and privacy module, security and privacy protection of the information data is ensured, including data encryption, identity authentication, access control, data desensitization and compliance and other measures, to prevent unauthorized access and data leakage.
[0039] In a data governance and quality management module, a data governance framework and a data quality management mechanism are established, to ensure accuracy, integrity and consistency of the data, which includes assessment of data quality, monitoring on data quality, improvement of data quality, and the like.
[0040] In the scalability and performance optimization module, it is ensured that the method is able to process information data in a large scale and rapid growth, which involves distributed computing, parallel processing, cache optimization, query optimization and resource management and other technologies, to improve system scalability and performance.
[0041] S2, in a case that data from different data sources have different formats and structures, making it difficult to integrate and combine the data effectively, a data integration tool and platform may be utilized to help achieve data integration and combination.
[0042] An ETL (extraction, transformation and loading) tool is common for data integration and combination. Such a tool can automatically extract data from different data sources, transform and process the data, and then load the data into a target system. The ETL tool can improve efficiency and accuracy of data integration.
[0043] To integrate data into a web application or cloud service, an API and web service may be utilized for data exchange and integration. By defining and implementing an appropriate
API and web service, data sharing and interoperability between different systems can be achieved.
[0044] A main responsibility of this module is to ensure data consistency, integrity and availability.
[0045] Data extraction: the data integration and consolidation module extracts data from various data sources by using the ETL tool. The data sources may include databases, files,
API interfaces, log files, and sensors. The ETL tool provides abilities to connect to and access different data sources. A standard connector or driving program may be utilized to communicate with the data sources and extract required data.
[0046] Data transformation: after the data extraction, the data integration and combination module transforms and processes the extracted data. This includes data cleaning, data format transformation, field renaming, data merging, data calculation and data normalization operations. The ETL tool provides a visual interface and various transformation functions, allowing a user to define and perform expected data transformation operations.
[0047] Data loading: after the transformation is completed, the data integration and combination module loads the data after transformation into a target system. The target system may be a data warehouse, a data lake, an analysis platform, a web application, or a cloud service. The ETL tool provides a function and adapter for loading data into a specific target system to ensure correct loading and storage of the data.
[0048] Data exchange and integration: the data integration and combination module further provides functions of data exchange and integration using the API and Web service. This allows other applications or services to access and obtain the integrated data through an API call. By defining and implementing an appropriate API and data exchange protocol, data sharing and integration between applications can be achieved.
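The data-exchange idea above can be sketched as a toy request dispatcher that serves the integrated data as JSON. The route, field names, and data are assumptions; a real deployment would sit behind an actual web server and API gateway.

```python
import json

# Integrated data set that other applications obtain through an API call
# (contents are illustrative).
INTEGRATED = [{"id": 1, "region": "EU", "amount": 2.5}]

def handle_request(path):
    """Return an HTTP-style (status, JSON body) pair for a requested path."""
    if path == "/api/v1/records":  # hypothetical route
        return 200, json.dumps(INTEGRATED)
    return 404, json.dumps({"error": "not found"})

status, body = handle_request("/api/v1/records")
```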
[0049] S3, the big data involves personal identification information, business secrets, and other sensitive information, which requires effective measures for data security and privacy protection.
[0050] Secure transmission protocols HTTPS and SSH are utilized to ensure confidentiality and integrity of data during transmission. This can prevent data from being intercepted or tampered with.
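On the client side, the TLS settings that give HTTPS its confidentiality and integrity guarantees can be inspected with Python's `ssl` module, without any network traffic; this only illustrates the default verification behavior, not the patented method itself.

```python
import ssl

# Build a client-side TLS context as an HTTPS client library would.
ctx = ssl.create_default_context()
# The defaults enforce certificate verification and hostname checking,
# which is what prevents interception and tampering in transit.
```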
[0051] Data anonymization may be performed to protect privacy in a case that data desensitization is impossible. Data anonymization performs structural adjustment, noise addition, data perturbation and other operations on the data, to prevent the data from being directly linked to a specific individual.
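The noise-addition operation can be sketched as bounded random perturbation; the data and noise scale are illustrative assumptions.

```python
import random

def perturb(values, scale):
    """Add bounded random noise so individual values cannot be read back exactly."""
    return [v + random.uniform(-scale, scale) for v in values]

random.seed(7)  # fixed seed so the sketch is reproducible
salaries = [3000, 3200, 2800]  # illustrative sensitive values
noisy = perturb(salaries, scale=50)
```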
[0052] Secure Transfer Protocol HTTPS and SSH: the HTTPS is a secure transfer protocol that encrypts and authenticates HTTP communications by using SSL or TLS protocols. The
HTTPS ensures confidentiality and integrity of data during transmission.
[0053] The SSH is an encrypted protocol for secure remote login and file transfer over a network. The SSH provides authentication and encrypted transmission, and protects the confidentiality and integrity of data during transmission.
[0054] By using these secure transfer protocols, the data is encrypted during transmission, and therefore the data cannot be intercepted and read by unauthorized users. This ensures confidentiality and tamper-proofness of data transmission, so that data security is protected.
[0055] Data anonymization: data anonymization is a method for protecting privacy, by which personal identification information in raw data is removed or replaced with a virtual identifier, so as to prevent the personal identification from being identified and associated. Some common techniques for data anonymization are described below.
[0056] Desensitization: removing or scrambling personal identification information, such as names, ID numbers and phone numbers, from sensitive data. This can be achieved by using data encryption, data replacement, data deletion, and other technologies.
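A simple masking rule illustrates desensitization; keeping the last four digits is a hypothetical policy, and the phone number is made up.

```python
def mask_phone(phone):
    """Scramble out all but the last four digits (illustrative masking rule)."""
    return "*" * (len(phone) - 4) + phone[-4:]

masked = mask_phone("13812345678")  # fictitious number
```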
[0057] Generalization: converting detailed data into a broader category or range, for example, converting an exact age into an age group.
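The age-group example can be sketched directly; the decade-wide bins are an assumed granularity.

```python
def generalize_age(age):
    """Convert an exact age into a coarser decade-wide age group."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

group = generalize_age(34)
```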
[0058] Data sampling: selecting a portion of data randomly from an original data set for analysis and processing, to reduce a potential impact on personal privacy.
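Random sampling can be sketched with the standard library; the data set and the 10% sampling rate are illustrative assumptions.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible
dataset = list(range(1, 101))       # stand-in for the original data set
sample = random.sample(dataset, 10)  # analyze 10% instead of the full set
```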
[0059] Although the embodiments of the present disclosure are shown and described, those skilled in the art can understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principle and purpose of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.
Claims (10)
1. A method for managing information data based on big data, comprising: a data collection and acquisition module, a data storage and management module, a data pre-processing and cleaning module, a data integration and combination module, a data modeling and analysis module, a data visualization and presentation module, a data security and privacy module, a data governance and quality management module, and a scalability and performance optimization module.
2. The method for managing information data based on big data according to claim 1, wherein the data collection and acquisition module is for collecting and acquiring big data from various sources, wherein the sources comprise sensors, log files, social media and web crawling, and the data collection and acquisition module involves automation of data source selection, data acquisition, and data collection.
3. The method for managing information data based on big data according to claim 1, wherein the data storage and management module is for storing and managing large-scale information data, comprising a distributed storage system, a data warehouse, a data lake and a cloud storage technology, and the data storage and management module further comprises data indexing, data backup, data recovery and data life cycle management.
4. The method for managing information data based on big data according to claim 1, wherein the data pre-processing and cleaning module is for pre-processing and cleaning collected information data, comprising noise removal, outlier processing, data cleaning and data standardization, to ensure quality and consistency of the data.
5. The method for managing information data based on big data according to claim 1, wherein the data integration and combination module is for integrating and combining data from different sources and in different formats, which involves format transformation of data, data mapping, data merging and data standardization, to build a complete and consistent data set, wherein an ETL (extraction, transformation and loading) tool is added to automatically extract data from different data sources, perform data transformation and processing, and load the data into a target system, and the data is integrated into a web application or cloud service for data exchange and integration by utilizing APIs and web services.
6. The method for managing information data based on big data according to claim 1, wherein the data modeling and analysis module is for applying data analysis and modeling technologies to perform in-depth analysis of the information data, comprising statistical analysis, machine learning, data mining, and natural language processing, to discover a pattern, trend and association in the data.
7. The method for managing information data based on big data according to claim 1, wherein the data visualization and presentation module is for presenting an analysis result to a user in a form of a visual chart, dashboard and report, to help the user to better understand and utilize the information data to support decision-making and strategic planning.
8. The method for managing information data based on big data according to claim 1, wherein the data security and privacy module is for ensuring security and privacy protection of the information data, comprising data encryption, identity authentication, access control, data desensitization and compliance measures, to prevent unauthorized access and data leakage, wherein secure transfer protocols HTTPS and SSH are utilized to ensure confidentiality and integrity of data during transmission, and data anonymization is performed to protect privacy in cases where data desensitization is not feasible.
9. The method for managing information data based on big data according to claim 1, wherein the data governance and quality management module is for establishing a data governance framework and a data quality management mechanism to ensure accuracy, integrity and consistency of the data, which comprises assessment of data quality, monitoring on data quality and improvement of data quality.
10. The method for managing information data based on big data according to claim 1, wherein the scalability and performance optimization module is for ensuring that the method is able to process information data in a large scale and rapid growth, which involves distributed computing, parallel processing, cache optimization, query optimization and resource management technology, to improve system scalability and performance.
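The extract-transform-load flow recited in claim 5 can be sketched end to end with the standard library. The CSV source, the field names and the SQLite target are illustrative stand-ins, not elements of the claims:

```python
import csv
import io
import sqlite3

# Hypothetical source: CSV text standing in for one of many heterogeneous sources.
raw = "name,age\nAlice, 34\nBob,41\n"

# Extract: read records out of the source format.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: standardize fields (trim whitespace, cast types).
cleaned = [{"name": r["name"].strip(), "age": int(r["age"])} for r in rows]

# Load: write the unified records into the target system (in-memory SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (name TEXT, age INTEGER)")
db.executemany("INSERT INTO person VALUES (:name, :age)", cleaned)
loaded = db.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

In a production pipeline the same three stages would run per source, with the standardization step enforcing the common schema that makes the merged data set consistent.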
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311139282.9A CN116910815A (en) | 2023-09-06 | 2023-09-06 | Information data management method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
LU505379B1 (en) | 2024-04-26 |
Family
ID=88351353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
LU505379A LU505379B1 (en) | 2023-09-06 | 2023-10-26 | Method for managing information data based on big data |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116910815A (en) |
LU (1) | LU505379B1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117131036B (en) * | 2023-10-26 | 2023-12-22 | 环球数科集团有限公司 | Data maintenance system based on big data and artificial intelligence |
CN117524434B (en) * | 2023-11-17 | 2024-04-30 | 中国人民解放军海军第九七一医院 | Expert information management optimization method and system based on vein treatment data platform |
CN117829794A (en) * | 2024-01-02 | 2024-04-05 | 浙江精创教育科技有限公司 | Human resource data processing method and system based on cloud computing |
CN117874117A (en) * | 2024-01-18 | 2024-04-12 | 杭州泛嘉科技有限公司 | Member value-added service platform for data information management |
CN118446428A (en) * | 2024-05-29 | 2024-08-06 | 北京星航机电装备有限公司 | Data management method and system for space discrete manufacturing |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8311863B1 (en) * | 2009-02-24 | 2012-11-13 | Accenture Global Services Limited | Utility high performance capability assessment |
CN109597848A (en) * | 2018-11-21 | 2019-04-09 | 北京域天科技有限公司 | A kind of shared exchange system of emergency resources |
CN112102111B (en) * | 2020-09-27 | 2021-06-08 | 华电福新广州能源有限公司 | Intelligent processing system for power plant data |
CN114756563A (en) * | 2022-05-06 | 2022-07-15 | 焦点科技股份有限公司 | Data management system with multiple coexisting complex service lines of internet |
CN116681250A (en) * | 2023-06-07 | 2023-09-01 | 山东天瀚企业管理咨询服务有限公司 | Building engineering progress supervisory systems based on artificial intelligence |
2023
- 2023-09-06: CN application CN202311139282.9A filed (status: active, Pending)
- 2023-10-26: LU application LU505379A filed, published as LU505379B1 (status: active, IP Right Grant)
Also Published As
Publication number | Publication date |
---|---|
CN116910815A (en) | 2023-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FG | Patent granted |
Effective date: 20240426 |