CN110543464A - Big data platform applied to smart park and operation method


Info

Publication number
CN110543464A
Authority
CN
China
Prior art keywords
data
module
unit
platform
file
Prior art date
Legal status
Granted
Application number
CN201910774850.XA
Other languages
Chinese (zh)
Other versions
CN110543464B (en)
Inventor
任菁倩
杨嘉欣
杜莎
Current Assignee
Guangdong Dingyi Interconnection Technology Co Ltd
Original Assignee
Guangdong Dingyi Interconnection Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Dingyi Interconnection Technology Co Ltd
Publication of CN110543464A
Application granted
Publication of CN110543464B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/18 File system types
                • G06F16/182 Distributed file systems
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/21 Design, administration or maintenance of databases
                • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2455 Query execution
                    • G06F16/24552 Database cache management
                  • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
                    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
                    • G06F16/2471 Distributed queries
              • G06F16/25 Integrating or interfacing systems involving database management systems
                • G06F16/258 Data format conversion from or to a database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data platform applied to a smart park. Its main feature is that the technical architecture of the whole project adopts a hybrid Hadoop + MPP + in-memory database model, while Storm is used to support the acquisition and computation of real-time data, yielding a big data system with high concurrency, scalability and high performance. The platform supports data sharing and processing across databases, messages, files and other modes, as well as MapReduce jobs, SQL operations, stream computing and in-memory computing. A rule engine reduces the complexity of the components that implement complex business logic, makes marketing scenarios more flexible to configure, lowers application maintenance costs and improves extensibility. The scheme scales well: the processing capacity of the cluster can later be increased by horizontal scaling to meet the needs of business growth.

Description

Big data platform applied to smart park and operation method
Technical Field
The invention relates to the technical field of big data platforms, and in particular to a big data platform applied to a smart park and an operation method thereof.
Background
Although big data technology is developing rapidly, the big data industry in China is still at an early stage and its industry chain is immature. Even after a big data industrial park is established, it may not attract enough enterprises to form a complete big data ecosystem.
Existing big data applications also face security risks, mainly in three respects. First, big data itself becomes a security target: once big data technologies, systems and applications accumulate a great deal of value, they inevitably attract attacks. Second, the abuse of big data has side effects, typically the disclosure of personal privacy; the analytical power of big data can also lead to the disclosure of commercial and state secrets. Third, there are psychological and awareness-related security issues. Threats to big data, its side effects and extreme attitudes toward it can all hinder and disrupt the development of big data.
Therefore, how to provide a secure big data platform, and an operation method for it, that enables efficient use of a smart park is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, the present invention provides a big data platform applied to a smart park and an operation method thereof. Building on advanced big data technology, it offers and implements a practical solution to the shortcomings of the prior art and widens the reach of big data applications so that more industrial parks can adopt them. Through big data applications, the platform enables efficient use of the smart park while keeping the park's data efficient and secure, helping the smart park form a more complete big data ecosystem.
In order to achieve the purpose, the invention adopts the following technical scheme:
A big data platform applied to a smart park, comprising a data acquisition module, a data storage module, a data calculation module, a data application module and a platform management and control module;
The data acquisition module is connected with the data storage module and stores acquired data into the data storage module;
The data calculation module is connected with the data storage module and is used for processing data in the data storage module;
The data application module is connected with the data calculation module and builds business logic that encapsulates business objects and business services;
The platform management and control module is used for connecting and monitoring the data acquisition module, the data storage module, the data calculation module and the data application module.
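The module connections described above can be sketched as plain Python classes. This is a minimal illustration only: it assumes in-memory lists in place of real storage, a hypothetical `visitors_per_gate` business service, and a hypothetical `health()` method through which the management module monitors the others; none of these names come from the patent.

```python
class DataStorageModule:
    def __init__(self):
        self.records = []               # stand-in for distributed storage

    def store(self, rows):
        self.records.extend(rows)

    def health(self):
        return {"records": len(self.records)}


class DataAcquisitionModule:
    def __init__(self, storage):
        self.storage = storage          # acquisition -> storage connection
        self.batches = 0

    def collect(self, source):
        rows = list(source())           # `source` is a hypothetical callable data source
        self.storage.store(rows)        # acquired data is written into storage
        self.batches += 1

    def health(self):
        return {"batches": self.batches}


class DataCalculationModule:
    def __init__(self, storage):
        self.storage = storage          # calculation reads from storage
        self.jobs = 0

    def count_by(self, field):
        self.jobs += 1
        counts = {}
        for row in self.storage.records:
            counts[row[field]] = counts.get(row[field], 0) + 1
        return counts

    def health(self):
        return {"jobs": self.jobs}


class DataApplicationModule:
    def __init__(self, calculation):
        self.calculation = calculation  # application builds on calculation results
        self.requests = 0

    def visitors_per_gate(self):
        # a hypothetical business service exposed to users
        self.requests += 1
        return self.calculation.count_by("gate")

    def health(self):
        return {"requests": self.requests}


class PlatformManagementModule:
    def __init__(self, **modules):
        self.modules = modules          # connected to every other module

    def status(self):
        # monitoring: poll each connected module for its health figures
        return {name: module.health() for name, module in self.modules.items()}
```

Keeping each connection as an explicit constructor argument mirrors the one-directional links in the claim: acquisition only writes to storage, calculation only reads from it, and the management module only observes.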
Preferably, in the above big data platform applied to the smart park, the data acquisition module includes a data extraction unit, a data input end and a data output end; the data input end is connected with a data source; the data extraction unit is connected with the data input end, classifies the acquired data and transmits it to the data storage module.
Preferably, in the above big data platform applied to the smart park, the data storage module includes a distributed file unit, a distributed database and a distributed cache unit; the distributed file unit is provided with an upload channel and a download channel and exchanges data with the distributed database; the distributed cache unit is connected with the distributed database to provide caching.
Preferably, in the above big data platform applied to the smart park, the data calculation module includes a MapReduce unit, a data warehouse unit, a machine learning and data mining library and a rule knowledge base; the data warehouse unit converts data files and runs them on the MapReduce unit; the machine learning and data mining library stores classic algorithms from the machine learning field; the rule knowledge base matches rules through a rule engine.
Preferably, in the above big data platform applied to the smart park, the platform management and control module includes a cluster management unit, a host management unit, a user management unit and a cluster log management unit; the cluster management unit is connected with the data calculation module; the host management unit is connected with the host nodes; the user management unit manages platform users; the cluster log management unit is connected with the data acquisition module, the data storage module, the data calculation module and the data application module respectively.
Preferably, in the above big data platform applied to the smart park, the platform further includes a data security module; the data security module includes an identity authentication and authorization unit, which is connected with the user management unit.
An operation method of a big data platform for a smart park comprises the following steps:
Step one: the data acquisition module extracts data from a data source, processes it and stores it; the ingested data is processed uniformly through file decompression, file merging and splitting, file-level verification, data-level verification, cleaning, conversion, association and summarization, and is then loaded into the data storage module;
Step two: the data storage module adopts a distributed scheme, using Hadoop to process semi-structured and unstructured data and an MPP (massively parallel processing) database to process high-quality structured data, and stores the data;
Step three: the stored data is transmitted to the data warehouse unit of the data calculation module, which converts the data files and runs them on the MapReduce unit; the MapReduce unit passes the results to the data application module for traffic statistics, service recommendation, trend analysis, user behavior analysis, data mining, offline analysis, online analysis and ad hoc queries;
Step four: the platform management and control module monitors the data acquisition module, the data storage module, the data calculation module and the data application module.
It can be seen from the above technical solution that, compared with the prior art, the present invention provides a big data platform applied to a smart park that, building on advanced big data technology, offers and implements a practical solution to the shortcomings of the prior art. It widens the reach of big data applications so that more industrial parks can adopt them, enables efficient use of the smart park, keeps the park's data efficient and secure, and helps the smart park form a more complete big data ecosystem.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a structural framework diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiment of the invention discloses a big data platform applied to a smart park. Building on advanced big data technology, it offers and implements a practical solution to the shortcomings of the prior art and widens the reach of big data applications so that more industrial parks can adopt them. Through big data applications, the platform enables efficient use of the smart park, keeps the park's data efficient and secure, and helps the smart park form a more complete big data ecosystem.
As shown in Fig. 1, a big data platform applied to a smart park includes a data acquisition module, a data storage module, a data calculation module, a data application module and a platform management and control module;
The data acquisition module is connected with the data storage module and stores acquired data into the data storage module;
The data calculation module is connected with the data storage module and is used for processing data in the data storage module;
The data application module is connected with the data calculation module and builds business logic that encapsulates business objects and business services;
The platform management and control module is used for connecting and monitoring the data acquisition module, the data storage module, the data calculation module and the data application module.
To further optimize the above technical solution, the data acquisition module includes a data extraction unit, a data input end and a data output end; the data input end is connected with a data source; the data extraction unit is connected with the data input end, classifies the acquired data and transmits it to the data storage module.
Furthermore, data in existing smart park information systems is inconsistent: data structures are heterogeneous, data lengths and formats differ, and some data is even erroneous, so the raw data is difficult to use directly. The system therefore needs to process the data from each data source into a useful, required format.
By building the big data extraction and conversion unit, the project collects and adapts government-affairs information, such as the shared government information base and online service-handling records. This helps integrate data from the various systems, and the integrated data supports further data mining and knowledge discovery.
Its main functions are extracting data from data sources, processing it and storing it, thereby restructuring the data to meet the format requirements of big data search and other data mining applications. It supports many file collection sources, including local folders, shared folders on the local area network, folders on FTP servers and folders on HTTP servers.
Unified data acquisition and scheduling serves as the project's data transfer hub. The data input end connects to data sources as required; the ingested data is processed uniformly through file decompression, file merging and splitting, file-level verification, data-level verification, cleaning, conversion, association, summarization and similar steps, then loaded into the system, with scheduling managed centrally by the platform.
The data output end also provides data services to upper-layer applications; service interfaces call the various data output ends to access data as required.
The data extraction unit extracts file-type interface files: the ETL subsystem provides an SFTP plug-in that extracts data over the SFTP protocol. The plug-in supports resuming interrupted transfers (breakpoint resume) and file-name wildcards. Its main functions are: file download; file integrity checking; resuming interrupted transfers; source file deletion, etc.
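A minimal sketch of the extraction functions listed above (wildcard matching, breakpoint resume, integrity check, source deletion). It assumes that local directories stand in for the SFTP server and uses an MD5 digest as the integrity check; a real plug-in would go through an SFTP client library, and the checksum actually agreed with the data source may differ. All function names are hypothetical.

```python
import fnmatch
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Digest used for the integrity check after transfer (MD5 is an assumption)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def resume_copy(src, dst, chunk_size=65536):
    """Copy src to dst, continuing from the end of any partial dst (breakpoint resume)."""
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)                          # skip bytes already transferred
        for block in iter(lambda: fin.read(chunk_size), b""):
            fout.write(block)
    return offset


def extract_files(src_dir, dst_dir, pattern, delete_source=False):
    """Transfer every file matching a wildcard, verify it, optionally delete the source."""
    extracted = []
    for name in sorted(os.listdir(src_dir)):
        if not fnmatch.fnmatch(name, pattern):    # file wildcard, e.g. "*.csv"
            continue
        src, dst = os.path.join(src_dir, name), os.path.join(dst_dir, name)
        resume_copy(src, dst)
        if md5_of(src) != md5_of(dst):            # file integrity check
            raise OSError(f"checksum mismatch for {name}")
        if delete_source:
            os.remove(src)                        # source file deletion
        extracted.append(name)
    return extracted
```

If a previous transfer of `a.csv` stopped part-way, `resume_copy` appends only the missing tail, and the checksum comparison catches a corrupted partial file.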
Furthermore, the data acquisition module also includes a database table synchronization plug-in that collects source data from various heterogeneous database tables and, after conversion and formatting, loads it into the target database table.
The data conversion function defines cleaning and conversion rules for the extracted data files according to the data specification of the target interface table, then cleans and converts the data according to these rules, producing formatted files together with quality data on the cleaning and conversion.
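The rule-driven cleaning and conversion step might look like the following sketch. The target columns, the `RULES` table and the `|`-separated output format are hypothetical; the point is that rules are declared per target column, and the function returns formatted records together with quality data.

```python
from datetime import datetime

# Hypothetical target-table specification: column -> cleaning/conversion rule
RULES = {
    "card_id": lambda v: v.strip().upper(),
    "gate": lambda v: v.strip().lower(),
    "ts": lambda v: datetime.strptime(v.strip(), "%Y/%m/%d %H:%M").strftime("%Y-%m-%d %H:%M:00"),
}


def clean_and_convert(rows, rules=RULES):
    """Apply the rules to every row; return formatted lines, rejects and quality data."""
    formatted, rejected = [], []
    for i, row in enumerate(rows):
        try:
            # one formatted line per row, columns in rule order
            formatted.append("|".join(rule(row[col]) for col, rule in rules.items()))
        except (KeyError, ValueError) as exc:   # missing column or unparseable value
            rejected.append((i, repr(exc)))
    quality = {"input": len(rows), "output": len(formatted), "rejected": len(rejected)}
    return formatted, rejected, quality
```

Because the rules are data rather than code paths, adapting to a new target interface table means editing `RULES`, not the conversion loop.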
The data verification function uses a data verification plug-in to verify the data unit files uploaded by the data source, and stores the resulting quality data in the management metadata database. The verification levels are as follows:
File-level verification: check the number of data files and the number of records against the check file.
Record-level verification: check the value ranges of record fields against agreed verification rules.
Index-level verification: verify the key business indicators of each interface unit against the check file.
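The three verification levels can be sketched as one function. The check-file fields (`file_count`, `record_count`, `index_total`) and the `amount` indicator in the usage example are assumptions for illustration, since the patent does not specify the check-file format.

```python
def verify_delivery(files, check, field_ranges, index_field):
    """Three-level check of one data delivery.

    files:        {file_name: [record_dict, ...]}
    check:        hypothetical check-file contents: file_count, record_count, index_total
    field_ranges: {field: (min, max)} agreed value ranges
    index_field:  key business indicator whose total must match the check file
    """
    problems = []
    # 1. File level: number of files and number of records
    record_count = sum(len(records) for records in files.values())
    if len(files) != check["file_count"]:
        problems.append(f"file count {len(files)} != {check['file_count']}")
    if record_count != check["record_count"]:
        problems.append(f"record count {record_count} != {check['record_count']}")
    # 2. Record level: value range of each checked field
    for name, records in files.items():
        for i, record in enumerate(records):
            for field, (low, high) in field_ranges.items():
                if not low <= record[field] <= high:
                    problems.append(f"{name}[{i}].{field}={record[field]} outside [{low}, {high}]")
    # 3. Index level: total of the key business indicator
    index_total = sum(r[index_field] for records in files.values() for r in records)
    if index_total != check["index_total"]:
        problems.append(f"{index_field} total {index_total} != {check['index_total']}")
    return problems
```

An empty list means the delivery passed; otherwise the problem strings are the kind of quality data that would be written to the management metadata database.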
The data loading plug-in takes the converted data files, assembles SQL statements according to the target format, loads the data into the target tables of the data warehouse in batches, and generates quality data for the loading process.
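A sketch of the loading plug-in, using Python's built-in `sqlite3` module as a stand-in for the data warehouse. It assembles one parameterised `INSERT` for the target format, loads rows in batches with one transaction per batch, and returns simple quality data. The function name and batch size are hypothetical. Table and column names are interpolated into the SQL, so they must come from trusted configuration, never from the data itself.

```python
import sqlite3


def load_batches(conn, table, columns, rows, batch_size=500):
    """Assemble an INSERT for the target table and load rows in batches."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    quality = {"batches": 0, "loaded": 0}
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:                        # commit per batch, roll back on error
            conn.executemany(sql, batch)
        quality["batches"] += 1
        quality["loaded"] += len(batch)
    return quality
```

Committing per batch limits how much work is lost if a load fails, at the cost of leaving earlier batches in place; a real loader would record which batch failed so it can resume.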
The data aggregation plug-in calls a stored procedure or function to perform a specific data aggregation task. Aggregation means preprocessing the underlying data, for example compressing record rows, joining tables and combining attributes, according to dimension granularity, indicators, computational elements and actual analysis needs; it is a form of data processing that computes statistics, such as sums and averages, over the detailed underlying data.
The result of aggregation is summary data pre-computed for the queries users are likely to issue. Aggregation takes many forms and can be performed along any one or more dimensions of the multidimensional data in the data warehouse; if a dimension is hierarchical, aggregation can be performed at any level of it. The aggregated data for one combination of dimensions is called a cuboid, and the lattice formed by all the cuboids of a given dimension set is called the data cube of that dimension set. The data cube is thus built by aggregation.
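To make the cuboid and data cube terms concrete, the following sketch pre-computes a total for every combination of dimensions, giving 2^n cuboids for n dimensions. The dimension names and the `kwh` measure in the example are hypothetical, and a production warehouse would usually materialise only the cuboids that queries actually need.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dimensions, measure):
    """Return {cuboid_dimensions: {group_key: total}} for every subset of dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):      # one cuboid per dimension subset
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


rows = [
    {"zone": "A", "month": "01", "kwh": 10},
    {"zone": "A", "month": "02", "kwh": 5},
    {"zone": "B", "month": "01", "kwh": 7},
]
cube = data_cube(rows, ("zone", "month"), "kwh")
```

Here `cube[()]` is the apex cuboid (the grand total, 22), `cube[("zone",)]` totals per zone, and `cube[("zone", "month")]` is the base cuboid; a query such as "energy per zone" reads a pre-computed answer instead of scanning the detail rows.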
Data aggregation improves the online analytical processing (OLAP) performance of the data warehouse unit and shortens query response time by preparing answers before questions are asked. It is the basis of the fast response of OLAP, mainly in the following respects:
Aggregation reduces the impact of direct access to underlying data on front-end applications
Online analytical processing usually requires summary data derived from detailed data, and running queries and statistics directly over massive base data greatly reduces system efficiency. Aggregation pre-computes the required summary data, so the underlying data need not be accessed directly.
Aggregation reduces repeated computation on underlying data
Different online analytical processing operations may all need the same processing of the same portion of underlying data. Aggregation pre-computes this summary data once, avoiding repeated computation on the relevant underlying data.
Aggregation helps guarantee data consistency to a certain extent
On the one hand, the underlying data in the data warehouse unit is not updated in real time, so aggregates derived from this relatively stable data reflect summary information over a period of time. On the other hand, the data in the data warehouse unit does change over time, as new data is added periodically. Aggregation therefore helps keep the data accessed during an analysis consistent and avoids the inconsistencies between successive summaries that arise from computing directly on the base data.
Event stream acquisition: the value of data decreases over time, so events must be processed as soon as possible after they occur, ideally immediately, one event at a time rather than buffered into batches. In the data stream model, some or all of the input data to be processed is not stored on randomly accessible disk or memory; instead it arrives as one or more continuous data streams.
Dataflow systems involve two kinds of operators: stateless operators, such as union and filter, and stateful operators, such as sort, join and aggregate. If execution fails, the state maintained by a stateful operator is lost, and the state and output produced by replaying the data stream are not necessarily consistent with those before the failure; after a stateless operator fails, by contrast, replaying the data stream reconstructs output consistent with that before the failure.
A dataflow computation can be viewed as a dataflow graph whose nodes are operators and whose edges are data streams.
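The stateless/stateful distinction and the operator graph can be illustrated with a small event-at-a-time pipeline. This sketch assumes a linear chain of operators (the simplest dataflow graph) and hypothetical vehicle events; `CountByKey` holds state that would be lost on failure, while the filter and map operators can simply be replayed.

```python
from collections import defaultdict


def filter_op(predicate):
    """Stateless: output depends only on the current event."""
    def op(event):
        return [event] if predicate(event) else []
    return op


def map_op(fn):
    """Stateless: transforms each event independently."""
    def op(event):
        return [fn(event)]
    return op


class CountByKey:
    """Stateful: keeps a running count per key; this state is lost if the process fails."""

    def __init__(self, key):
        self.key = key
        self.counts = defaultdict(int)

    def __call__(self, event):
        k = event[self.key]
        self.counts[k] += 1
        return [(k, self.counts[k])]


def run_pipeline(operators, stream):
    """Push each event through the operator chain as soon as it arrives (no batching)."""
    for event in stream:
        outputs = [event]
        for op in operators:
            outputs = [out for item in outputs for out in op(item)]
        yield from outputs
```

Usage: `run_pipeline([filter_op(lambda e: e["speed"] > 30), map_op(lambda e: {**e, "alert": True}), CountByKey("plate")], events)` emits a running speeding count per plate as each event arrives. Replaying the events into a fresh `CountByKey` rebuilds the same counts only if every event is replayed, which is why stateful operators need checkpointing in real stream engines.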
Apache Kafka is an open-source system that aims to provide a unified, high-throughput, low-latency distributed messaging platform for real-time data. It was originally developed at LinkedIn, open-sourced in 2011 and contributed to Apache. Compared with traditional messaging systems such as RabbitMQ and Apache ActiveMQ, Kafka differs mainly in that: it is a distributed system that is easy to scale out; it provides high throughput for both publishing and subscribing; it supports multiple subscribers and automatically balances consumers; and messages are persisted to disk, so they can also be consumed in bulk, for example by ETL.
Storm is a real-time dataflow computing system open-sourced by Twitter and written largely in the Clojure functional language. Storm provides a set of general primitives for distributed real-time computation that can be used for "stream processing", that is, processing messages and updating databases in real time, as an alternative to managing queues and worker clusters by hand. Storm borrows from Hadoop's computational model: Hadoop runs a Job, whereas Storm runs a Topology. A Job has a finite lifecycle and ends, while a Topology runs as a service, a job that never stops.
To further optimize the technical solution, the data storage module includes a distributed file unit, a distributed database and a distributed cache unit; the distributed file unit is provided with an upload channel and a download channel and exchanges data with the distributed database; the distributed cache unit is connected with the distributed database to provide caching.
Further, the data storage module adopts a hybrid Hadoop + MPP + in-memory database architecture with a distributed scheme: Hadoop handles semi-structured and unstructured data, while MPP handles high-quality structured data and also provides applications with rich SQL and transaction support. This addresses key technical challenges in storing, managing and efficiently accessing big data, making it possible to build a big data platform with petabyte-scale storage capacity that presents users with a transparent data management platform.
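The division of labour in the hybrid architecture amounts to a routing decision per payload. The sketch below is only an assumption about how that decision could be made: the `STRUCTURED_SCHEMA` column set is hypothetical, parseable JSON is treated as semi-structured, and the targets are plain labels rather than real storage clients.

```python
import json

# Hypothetical column set of a structured target table in the MPP database
STRUCTURED_SCHEMA = {"card_id", "gate", "ts"}


def classify(payload):
    """Roughly classify a payload as structured, semi-structured or unstructured."""
    if isinstance(payload, dict) and set(payload) == STRUCTURED_SCHEMA:
        return "structured"
    if isinstance(payload, (dict, list)):
        return "semi-structured"
    if isinstance(payload, str):
        try:
            json.loads(payload)
            return "semi-structured"
        except ValueError:
            return "unstructured"
    return "unstructured"            # e.g. raw bytes from images or video


def route(payload):
    """Structured data goes to the MPP database; everything else goes to Hadoop."""
    return "mpp" if classify(payload) == "structured" else "hadoop"
```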
The distributed file unit is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets.
The distributed database uses a shared-nothing architecture: each node runs its own operating system, database and so on, and nodes exchange information only over the network.
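A shared-nothing query can be sketched as hash-partitioning rows across nodes, letting each node aggregate only its own data, and merging the partial results. The node count, the CRC32 hash and the `gate` key are illustrative choices, not taken from the patent; the "nodes" here are just lists in one process.

```python
import zlib
from collections import Counter


def node_for(key, n_nodes):
    """Stable hash so the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def distribute(rows, key, n_nodes):
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[node_for(row[key], n_nodes)].append(row)
    return nodes


def local_count(node_rows, key):
    """Runs on one node using only that node's own data."""
    return Counter(row[key] for row in node_rows)


def mpp_count(rows, key, n_nodes=3):
    nodes = distribute(rows, key, n_nodes)
    partials = [local_count(part, key) for part in nodes]   # would run in parallel
    total = Counter()
    for partial in partials:          # coordinator merges results sent over the network
        total.update(partial)
    return dict(total)
```

Because rows are partitioned by the grouping key, each key's rows live on exactly one node, so the merge step only concatenates results; grouping by a different column would need the partial counts to be summed, which `Counter.update` also handles.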
The distributed cache unit is a high-performance key-value in-memory database. It supports relatively many value types, including string, list, set and zset (sorted set). These data types support push/pop, add/remove, intersection, union and difference, and other richer operations, all of which are atomic. On this basis, the distributed cache unit supports various kinds of sorting. For efficiency, data is cached in memory; unlike a pure memory cache, however, the distributed cache unit periodically writes updated data to disk or appends modification operations to a log file, and master-slave replication is built on top of this. In some scenarios the distributed cache unit complements a relational database well.
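The caching relationship between the distributed cache unit and the distributed database is commonly implemented with the cache-aside pattern. The sketch below uses an in-process dictionary with a time-to-live in place of a real key-value server, and `load_from_db` is a hypothetical callable standing in for the database query.

```python
import time


class CacheAside:
    """In-process stand-in for a key-value cache placed in front of a database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}                          # key -> (value, expires_at)

    def get(self, key):
        hit = self._data.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                        # cache hit: no database access
        value = self.load_from_db(key)           # miss or expired: read the database
        self._data[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        """Call after the database row changes so readers do not see stale data."""
        self._data.pop(key, None)
```

The injectable `clock` makes expiry testable; with a real cache server the time-to-live would be set on the key itself.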
By adopting a layered, open, sharing-oriented technical framework, the applications and data of the performance management system are decoupled to form a stable, open data sharing platform. This supports the integration of applications from multiple upper-layer vendors, turns data into a platform, supports diverse internal and external applications, provides the data processing and storage capacity needed for the related work, and stores data by category and tier according to its importance and timeliness.
The data storage module has the following characteristics:
1) Data openness
To ensure data validity and stable performance, the system provides shared interface management, access control, load control and other functions, enabling one-to-many application expansion:
The shared interface management function manages the interfaces of the data sharing platform uniformly, including query, subscription, message exchange and database interfaces.
The access control management function implements access-permission checks, session management, access-frequency management, request queue management, security control and so on.
2) Scalability
The platform supports smooth evolution, including hardware expansion, data configuration, system management and software upgrades, so it can adapt to continuous business development and growth in user scale:
The system runs on x86 PC server hardware and scales out easily;
It does not depend on specific source or target data and is compatible with many data sources;
It provides third-party applications with Hadoop storage and application support, as well as scheduling, management and monitoring of computing resources.
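Access-frequency management, one of the access control functions listed under data openness, is often implemented with a per-caller token bucket. The sketch below combines a permission check with such a bucket; the permission table, rates and return strings are hypothetical.

```python
import time


class TokenBucket:
    """Allows `rate_per_sec` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AccessController:
    def __init__(self, permissions, rate_per_sec=5, burst=10, clock=time.monotonic):
        self.permissions = permissions           # caller -> set of allowed interfaces
        self.rate, self.burst, self.clock = rate_per_sec, burst, clock
        self.buckets = {}

    def check(self, caller, interface):
        if interface not in self.permissions.get(caller, set()):
            return "forbidden"                   # access-permission check
        bucket = self.buckets.get(caller)
        if bucket is None:
            bucket = self.buckets[caller] = TokenBucket(self.rate, self.burst, self.clock)
        return "ok" if bucket.allow() else "throttled"   # access-frequency check
```

A throttled request could be placed in a request queue instead of being rejected, which is where the request queue management function would come in.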
To further optimize the above technical solution, the data calculation module includes a MapReduce unit, a data warehouse unit, a machine learning and data mining library and a rule knowledge base; the data warehouse unit converts data files and runs them on the MapReduce unit; the machine learning and data mining library stores classic algorithms from the machine learning field; the rule knowledge base matches rules through a rule engine.
The data calculation module supports a variety of workflows, algorithms and tools for parallel processing, including batch computation (Hadoop's greatest strength), iterative computation as used by many machine learning algorithms, stream computation, SQL relational queries and interactive ad hoc queries, enabling data fusion, statistics, offline analysis, online analysis, data mining and related techniques.
Hadoop is currently widely used for processing and offline analysis of massive data and has strong advantages in scalability, robustness, computational performance and cost. Hadoop processes large-scale data through the distributed processing framework of the MapReduce unit, which is highly flexible.
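The MapReduce model can be illustrated with a single-process sketch of a traffic-statistics job: map emits `(gate, 1)` per access record, shuffle groups by key, and reduce sums. The `timestamp,gate,card_id` record layout is an assumption; on Hadoop each input split would go to a separate mapper and the shuffle would happen across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """record: 'timestamp,gate,card_id' -> emit (gate, 1)."""
    _, gate, _ = record.split(",")
    yield gate, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def map_reduce(splits):
    mapped = chain.from_iterable(map_phase(r) for split in splits for r in split)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


splits = [["08:01,north,c1", "08:02,south,c2"], ["08:05,north,c3"]]
gate_counts = map_reduce(splits)
```

`gate_counts` is `{"north": 2, "south": 1}`: the two `north` records come from different splits and are only combined in the reduce phase.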
The data warehouse unit uses Hive, Hadoop's data warehouse software, which facilitates data summarization, ad hoc querying and analysis of large data sets.
Data mining uses Mahout, a scalable machine learning and data mining library that supports four main use cases: recommendation mining, clustering, classification and frequent itemset mining.
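One common technique behind the recommendation use case is item co-occurrence: recommend items that often appear together with what a user already uses. The following is a simplified, single-machine sketch of that idea, not Mahout's API; the user histories and park service names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user, histories, top_n=2):
    co = cooccurrence(histories)
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, n in co[item].items():
            if other not in seen:             # only recommend items the user lacks
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


histories = {
    "u1": ["gym", "cafe", "parking"],
    "u2": ["gym", "cafe"],
    "u3": ["gym", "laundry"],
    "u4": ["cafe"],
}
```

For `u4`, who only uses the cafe, `recommend("u4", histories)` returns `["gym", "parking"]`, because the gym co-occurs with the cafe twice and parking once.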
The rule engine uses Drools, an inference engine that matches rules from the rule knowledge base against the existing facts, resolves conflicts among the matched rules, and executes the rules that remain after this screening. The rule engine frees complex and frequently changing rules from hard-coded logic by storing them in files as rule scripts, so rule changes take effect immediately in the online environment without modifying code or restarting machines.
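The idea of keeping rules outside the code can be sketched with a tiny matcher. This is not Drools or its DRL syntax: the rules are a hypothetical JSON script, and conflict resolution is only loosely modeled on the Drools `salience` and `activation-group` rule attributes (within a group, only the highest-salience matching rule fires).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical rule script; in a Drools deployment, DRL files play this role.
RULES_JSON = """
[
  {"name": "vip-parking-discount", "salience": 10, "group": "parking-price",
   "when": {"member_level": "vip", "service": "parking"}, "then": {"discount": 0.5}},
  {"name": "default-parking-discount", "salience": 1, "group": "parking-price",
   "when": {"service": "parking"}, "then": {"discount": 0.9}},
  {"name": "canteen-coupon", "salience": 5, "group": "canteen",
   "when": {"service": "canteen"}, "then": {"coupon": "lunch-10"}}
]
"""


@dataclass
class Rule:
    name: str
    salience: int
    group: str
    when: Dict[str, Any]
    then: Dict[str, Any]

    def matches(self, facts: Dict[str, Any]) -> bool:
        """A rule matches when every condition equals the corresponding fact."""
        return all(facts.get(key) == value for key, value in self.when.items())


def load_rules(script: str) -> List[Rule]:
    return [Rule(**entry) for entry in json.loads(script)]


def fire(rules: List[Rule], facts: Dict[str, Any]) -> Dict[str, Any]:
    """Match rules against facts, resolve conflicts per group, then apply the winners."""
    activated = [rule for rule in rules if rule.matches(facts)]
    winners: Dict[str, Rule] = {}
    for rule in sorted(activated, key=lambda r: r.salience, reverse=True):
        winners.setdefault(rule.group, rule)  # highest salience in each group wins
    result = dict(facts)
    for rule in winners.values():
        result.update(rule.then)
    result["fired"] = sorted(rule.name for rule in winners.values())
    return result
```

Reloading `RULES_JSON` from a file changes the outcome of `fire` without touching its code, which is the property the paragraph describes.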
The data calculation module supports a wide range of applications, including traffic statistics, service recommendation, trend analysis, user behavior analysis, data mining, offline analysis, online analysis, ad hoc query, and the like.
To further optimize the technical solution, the data application module uses J2EE and Ajax technologies to implement its application functions through a web interface. It establishes business logic that encapsulates business objects and business services, and application services implement this business logic centrally. Implementing business logic outside the business objects reduces the coupling between them. Application services also encapsulate higher-level business logic in a separate component that calls the underlying business objects and business services. The main functions of the application layer are: four-network collaborative precision marketing support, user behavior trajectory extraction and scenario generation, and precision marketing for outdoor advertising.
To further optimize the above technical solution, the platform management and control module includes a cluster management unit, a host management unit, a user management unit, and a cluster log management unit. The cluster management unit is connected to the data calculation module; the host management unit is connected to the host nodes; the user management unit manages platform users; and the cluster log management unit is connected to the data acquisition module, the data storage module, the data calculation module, and the data application module.
The platform management and control module realizes the following functions:
1) Visual management of the big data platform
Administration and configuration are implemented using Cloudera Manager. Cloudera Manager is a component that makes it easy to install, monitor, and manage Hadoop and other big data services in a cluster, greatly simplifying the installation, configuration, and management of hosts and of services such as Hadoop, Hive, and Spark.
Cloudera Manager provides a visual management interface;
Cloudera Manager provides cluster management;
Cloudera Manager provides host management, application authorization, and related functions;
Cloudera Manager provides user management for the cluster;
Cloudera Manager provides cluster log management;
2) Big data platform configuration management
The big data platform provides installation, parameter configuration, and management of the Hadoop cluster.
It covers components such as HDFS, HBase, MapReduce, Hive, and ZooKeeper.
Installation and deployment can be performed through a wizard; a system administrator completes these tasks by supplying a small amount of input as prompted.
Automatic high-availability (HA) deployment of the master node is supported.
Automatic installation and deployment across more than 300 nodes is supported.
System configuration information can be added, deleted, modified, and queried, and every administrator operation is recorded in a log.
Nodes can be added to and removed from the system dynamically;
Clusters of heterogeneous servers can be configured, and the allocation of computing resources on heterogeneous servers can be tuned.
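The requirement that every configuration operation be logged can be sketched as a small audited store. The `ConfigStore` class, its method names, and the audit fields are hypothetical and are not Cloudera Manager's API; `dfs.replication` appears only as an example key.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class AuditEntry:
    timestamp: str
    admin: str
    action: str  # "add", "update", "delete" or "query"
    key: str
    old_value: Optional[str]
    new_value: Optional[str]


@dataclass
class ConfigStore:
    """Hypothetical configuration store that audits every administrator operation."""
    _data: Dict[str, str] = field(default_factory=dict)
    audit_log: List[AuditEntry] = field(default_factory=list)

    def _record(self, admin: str, action: str, key: str,
                old: Optional[str], new: Optional[str]) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(AuditEntry(stamp, admin, action, key, old, new))

    def add(self, admin: str, key: str, value: str) -> None:
        if key in self._data:
            raise KeyError(f"{key} already exists")
        self._data[key] = value
        self._record(admin, "add", key, None, value)

    def update(self, admin: str, key: str, value: str) -> None:
        old = self._data[key]  # raises KeyError for unknown keys
        self._data[key] = value
        self._record(admin, "update", key, old, value)

    def delete(self, admin: str, key: str) -> None:
        old = self._data.pop(key)  # raises KeyError for unknown keys
        self._record(admin, "delete", key, old, None)

    def query(self, admin: str, key: str) -> Optional[str]:
        value = self._data.get(key)
        self._record(admin, "query", key, value, value)  # reads are audited too
        return value
```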
3) Big data platform cluster monitoring
The big data platform supports visual monitoring and alarming for all cluster resources in a general Hadoop system, and supports unified monitoring of multiple clusters. Multi-level, multi-dimensional visual monitoring of the Hadoop cluster is provided through a web interface. "Multi-level" refers to five levels: cluster, service, node, process, and job. "Multi-dimensional" refers to metrics such as CPU utilization; memory capacity and utilization; disk or HDFS capacity and utilization; disk I/O throughput and utilization; and network bandwidth and utilization.
Storage and computing resources of each cluster node can be displayed visually, including racks, network topology, network segments, and server configuration;
Resource usage of each cluster node can be displayed visually, including the number of data blocks, the number of running jobs, and node health status, and periodic health checks are supported.
System services on each node, such as the distributed file unit, MapReduce, HBase, and ZooKeeper, can be monitored visually.
The operating state of each node (success, failure, cancellation, etc.) can be monitored visually, and the corresponding log information is captured.
The monitored content includes the following:
Host node: host name, idle CPU percentage, CPU percentage used in user space, CPU percentage used by niced (priority-adjusted) processes, CPU percentage used in kernel space, cache memory size, free memory size, shared memory size, memory used for kernel buffers, total swap space, total disk size, free disk space, number of running processes, total number of processes, 1-minute, 5-minute, and 15-minute system load averages, incoming packets per second, outgoing packets per second, inbound network bandwidth, and outbound network bandwidth.
Distributed file unit: total number of file system blocks, total capacity, total number of files, remaining capacity, corrupt blocks, under-replicated blocks, JVM thread state, etc.
MapReduce: task running status, task resource usage, etc.
HBase: request counts for the cluster and for each RegionServer, the number of registered RegionServers, etc.
Monitoring of and recovery from cluster software and hardware faults is supported, for example restarting a node after it goes down and restarting service processes that terminate abnormally.
When a fault or anomaly occurs, alarm information is displayed prominently.
When the fault or anomaly is resolved, the alarm is automatically dismissed from the user interface and the alarm record may be retrieved from the historical information.
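The raise, auto-dismiss, and history behavior can be sketched as a small state machine keyed by node and metric. The metric names and thresholds are hypothetical examples; in practice the samples would come from the monitoring agents rather than direct calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alarm:
    node: str
    metric: str
    value: float
    threshold: float
    raised_at: float
    cleared_at: Optional[float] = None


class AlarmManager:
    """Raise an alarm when a metric exceeds its threshold; clear it automatically on recovery."""

    def __init__(self, thresholds: Dict[str, float]) -> None:
        self.thresholds = thresholds
        self.active: Dict[Tuple[str, str], Alarm] = {}  # what the user interface displays
        self.history: List[Alarm] = []  # resolved alarms, retrievable later

    def observe(self, node: str, metric: str, value: float, ts: float) -> None:
        limit = self.thresholds.get(metric)
        if limit is None:
            return  # this metric is not monitored
        key = (node, metric)
        if value > limit:
            # One active alarm per (node, metric); repeated bad samples do not duplicate it.
            self.active.setdefault(key, Alarm(node, metric, value, limit, ts))
        elif key in self.active:
            alarm = self.active.pop(key)  # dismissed from the UI...
            alarm.cleared_at = ts
            self.history.append(alarm)  # ...but kept in the history
```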
4) Big data platform security management (permission isolation)
The big data platform supports permission management for system users and security authentication of nodes. Roles are defined by combining organizational structure, operation permissions, data permissions, and other factors, allowing flexible configuration and management. Each user can see only the execution of the applications they are authorized for. Before a user performs any operation on a job, a unified authentication service checks whether the user has the required permission. Files stored in the distributed file unit are protected by a Linux-like file and directory security model. Clients accessing the Hadoop system are subject to access authentication and security control, and Kerberos security authentication is supported for network connections. The platform thus provides security access control for the Hadoop system, and illegal access can be blocked by defining security policies.
SSL encryption: SSL allows clients to connect securely to servers in the cluster using trusted certificates, either self-managed or issued by a trusted certificate authority, and different certificate policies can be used. Certificate requirements depend on the policy chosen. Common policies are: one certificate per host (one-for-one), one certificate for multiple hosts (multiple-for-one), and a wildcard certificate. SSL must be enabled for all core Hadoop services (HDFS, MapReduce, YARN, etc.).
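The three certificate policies differ in which host names a single certificate covers. The simplified matcher below illustrates this, assuming RFC 6125-style rules in which a wildcard may only form the entire leftmost label and matches exactly one label; the host names are hypothetical. Production code should rely on the TLS library's own hostname verification (for example, Python's `ssl.SSLContext` with `check_hostname` enabled) rather than a hand-written matcher.

```python
from typing import Iterable


def _name_matches(pattern: str, hostname: str) -> bool:
    """Compare one certificate name with a host name, case-insensitively."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    first, *rest = p_labels
    if first == "*":
        # Require at least two fixed labels after the wildcard (rejects "*.com").
        return len(rest) >= 2 and rest == h_labels[1:]
    return p_labels == h_labels


def certificate_covers(names: Iterable[str], hostname: str) -> bool:
    """One-for-one: a single name; multiple-for-one: several names; wildcard: '*.domain'."""
    return any(_name_matches(name, hostname) for name in names)
```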
Kerberos authentication: Kerberos is based on the Needham-Schroeder protocol. It relies on a "trusted third party" called the Key Distribution Center (KDC), which consists of two logically separate parts: an Authentication Server (AS) and a Ticket Granting Server (TGS). Kerberos works on the basis of "tickets" that prove a user's identity. The KDC maintains a key database; each network entity, whether client or server, shares a secret key known only to itself and the KDC, and knowledge of this key proves the entity's identity. For communication between two entities, the KDC generates a session key that they use to encrypt their interactions.
The Kerberos mechanism makes the nodes in the cluster mutually recognized and trusted. Authentication keys are placed on the trusted nodes in advance, when the cluster is deployed. While the cluster runs, nodes authenticate with these keys, and only authenticated nodes can operate normally. A node attempting to impersonate a cluster member cannot communicate with the nodes in the cluster because it lacks the previously distributed key. This prevents malicious use of or tampering with the Hadoop cluster and ensures its reliability and security.
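The ticket flow can be sketched in a few functions. This is a simplified, hypothetical model: the KDC issues a service ticket directly (collapsing the separate AS and TGS exchanges into one step), and `seal`/`unseal` are a toy HMAC-based encrypt-then-MAC construction standing in for the standardized encryption types real Kerberos uses. It shows why a node without a pre-distributed key cannot join: it cannot recover the session key from the KDC's reply.

```python
import hashlib
import hmac
import json
import os
import time
from typing import Dict, Tuple

# Toy authenticated encryption: HMAC-SHA256 keystream plus encrypt-then-MAC.
# Illustration only; not a substitute for real Kerberos encryption types.


def _subkey(key: bytes, label: bytes) -> bytes:
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = [hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def seal(key: bytes, payload: dict) -> bytes:
    plaintext = json.dumps(payload).encode()
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(key: bytes, blob: bytes) -> dict:
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))


class KDC:
    """Holds each principal's long-term key, distributed at deployment time."""

    def __init__(self, principal_keys: Dict[str, bytes]) -> None:
        self.keys = principal_keys

    def issue_ticket(self, client: str, service: str) -> Tuple[bytes, bytes]:
        session_key = os.urandom(32).hex()
        reply = seal(self.keys[client], {"service": service, "session_key": session_key})
        ticket = seal(self.keys[service], {"client": client, "session_key": session_key})
        return reply, ticket


def build_request(client: str, client_key: bytes, reply: bytes, ticket: bytes) -> Tuple[bytes, bytes]:
    """Client side: only the holder of client_key can recover the session key."""
    session_key = bytes.fromhex(unseal(client_key, reply)["session_key"])
    authenticator = seal(session_key, {"client": client, "ts": time.time()})
    return ticket, authenticator


def accept(service_key: bytes, ticket: bytes, authenticator: bytes, max_skew: float = 300.0) -> str:
    """Service side: open the ticket, then verify the authenticator under the session key."""
    claims = unseal(service_key, ticket)
    auth = unseal(bytes.fromhex(claims["session_key"]), authenticator)
    if auth["client"] != claims["client"] or abs(time.time() - auth["ts"]) > max_skew:
        raise PermissionError("authenticator rejected")
    return claims["client"]
```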
Sentry service: Sentry is an open-source Hadoop component released by Cloudera that serves as an authorization module for Hadoop. To give the right users and applications the right level of access, Sentry provides fine-grained, role-based authorization and multi-tenant management. With Sentry, Hadoop can meet the RBAC (role-based access control) requirements of enterprise and government users in the following respects:
Security authorization: Sentry controls data access and grants data access privileges to authenticated users.
Fine-grained access control: Sentry supports fine-grained access control over Hadoop data and metadata.
Role-based management: Sentry simplifies administration through role-based authorization, making it easy to grant multiple groups different privilege levels on the same data set. For example, for a particular data set, you can give the anti-fraud team privileges to view all columns, give analysts privileges to view only non-sensitive or non-PII (personally identifiable information) columns, and give the data ingest pipeline the right to insert new data into HDFS.
Multi-tenant management: Sentry allows permissions for different data sets to be delegated to different administrators. In the case of Hive/Impala, Sentry can manage privileges at the database/schema level.
Unified platform: Sentry provides a unified platform for data security and uses the existing Hadoop Kerberos implementation for authentication. The same Sentry policies apply whether data is accessed through Hive or Impala. In the future, Sentry is to be extended to other components.
Sentry architecture: Sentry's authorization core is divided into two parts: the binding layer (Hive bindings and Impala bindings) and the core authorization provider (the policy engine and policy providers). The binding layer provides a pluggable interface for communicating with the policy engine. The policy engine works with the bindings to evaluate access requests and, if access is allowed, accesses the underlying data through the policy providers.
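The role-based model described above (users belong to groups, roles are granted to groups, and privileges are granted to roles) can be sketched as a small policy engine. This is a hypothetical illustration of the evaluation logic only; it does not use Sentry's API or policy format, and the database, table, column, group, and role names are invented from the role-based management example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Privilege:
    database: str
    table: str
    action: str  # e.g. "select" or "insert"
    columns: Optional[FrozenSet[str]] = None  # None grants every column


class PolicyEngine:
    """Evaluate a request against group -> role -> privilege mappings."""

    def __init__(self, group_roles: Dict[str, Set[str]],
                 role_privileges: Dict[str, Set[Privilege]]) -> None:
        self.group_roles = group_roles
        self.role_privileges = role_privileges

    def is_allowed(self, groups: Iterable[str], database: str, table: str,
                   action: str, columns: Iterable[str] = ()) -> bool:
        wanted = set(columns)
        for group in groups:
            for role in self.group_roles.get(group, set()):
                for priv in self.role_privileges.get(role, set()):
                    if (priv.database, priv.table, priv.action) != (database, table, action):
                        continue
                    if priv.columns is None or wanted <= priv.columns:
                        return True
        return False  # deny by default


# Hypothetical policy mirroring the role-based management example.
ENGINE = PolicyEngine(
    group_roles={"anti_fraud": {"fraud_role"}, "analysts": {"analyst_role"},
                 "ingest": {"ingest_role"}},
    role_privileges={
        "fraud_role": {Privilege("park", "visitors", "select")},
        "analyst_role": {Privilege("park", "visitors", "select",
                                   frozenset({"zone", "visit_time"}))},
        "ingest_role": {Privilege("park", "visitors", "insert")},
    },
)
```

In this picture, a binding would be the piece that turns a parsed Hive or Impala query into an `is_allowed` call; that mapping is an assumption for illustration.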
The cluster log management unit is connected to the data acquisition module, the data storage module, the data calculation module, and the data application module. Each log entry contains a timestamp, a level, user and module information, and the log text. System operation logs and audit logs can be recorded and viewed, including recording, querying, and presenting system run logs and user access operation logs. Run logs of HDFS, MapReduce, HBase, Hive, and ZooKeeper can be recorded and viewed. System run logs are graded by level, including INFO, DEBUG, WARN, ERROR, and FATAL. System audit logs of HDFS, MapReduce, and Hive can be recorded and viewed.
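Querying logs by level and module can be sketched as follows, using the fields listed above. The pipe-separated line format is a hypothetical assumption, and the levels are ordered by severity (DEBUG lowest, FATAL highest) so that a minimum level can be applied.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Ordered by increasing severity.
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


@dataclass
class LogRecord:
    timestamp: str
    level: str
    user: str
    module: str
    text: str


def parse_line(line: str) -> LogRecord:
    """Parse a hypothetical 'timestamp | level | user | module | text' line."""
    parts = [part.strip() for part in line.split("|", 4)]  # text may itself contain '|'
    if len(parts) != 5 or parts[1] not in LEVELS:
        raise ValueError(f"malformed log line: {line!r}")
    return LogRecord(*parts)


def query_logs(lines: Iterable[str], min_level: str = "INFO",
               module: Optional[str] = None) -> List[LogRecord]:
    """Return records at or above min_level, optionally restricted to one module."""
    threshold = LEVELS.index(min_level)
    records = (parse_line(line) for line in lines)
    return [r for r in records
            if LEVELS.index(r.level) >= threshold and (module is None or r.module == module)]
```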
To further optimize the technical solution, the system also includes a data security module. The data security module includes an identity verification and authorization unit, which is connected to the user management unit.
Further, authentication and authorization are the two core processes typically involved when a party attempts to interact with an IT system. Together, they help keep the system secure against attacks:
Authentication is the process of confirming that a party interacting with the system is who it claims to be. For human users, authentication typically involves providing a username and password. More advanced mechanisms are also available, such as biometric authentication and multi-factor authentication. The entity being authenticated (a person or a particular subsystem) is usually called the principal.
The authorization mechanism determines which operations a principal may perform on the system and which resources it may access. Authorization is typically triggered after authentication: once a principal has been authenticated, information about it is used to decide which operations it can and cannot perform.
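The ordering described above, authentication first and authorization second, can be sketched with the Python standard library. The `IdentityService` class, its in-memory stores, and the permission names are hypothetical; storing passwords as salted PBKDF2 hashes and comparing them in constant time is common practice, not something the source specifies.

```python
import hashlib
import hmac
import os
from typing import Dict, Iterable, Optional, Set, Tuple

PBKDF2_ITERATIONS = 100_000  # assumption: tune to hardware and current guidance


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 hash; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt, digest


class IdentityService:
    def __init__(self) -> None:
        self._credentials: Dict[str, Tuple[bytes, bytes]] = {}
        self._permissions: Dict[str, Set[str]] = {}

    def register(self, principal: str, password: str, permissions: Iterable[str]) -> None:
        self._credentials[principal] = hash_password(password)
        self._permissions[principal] = set(permissions)

    def authenticate(self, principal: str, password: str) -> bool:
        """Authentication: is the principal who it claims to be?"""
        record = self._credentials.get(principal)
        if record is None:
            return False
        salt, expected = record
        _, candidate = hash_password(password, salt)
        return hmac.compare_digest(candidate, expected)

    def authorize(self, principal: str, operation: str) -> bool:
        """Authorization: may this principal perform the operation?"""
        return operation in self._permissions.get(principal, set())

    def perform(self, principal: str, password: str, operation: str) -> str:
        if not self.authenticate(principal, password):
            raise PermissionError("authentication failed")
        if not self.authorize(principal, operation):
            raise PermissionError(f"{principal} may not {operation}")
        return f"{principal} performed {operation}"
```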
In monolithic applications, authentication and authorization are simple because the application itself handles them, so no advanced mechanism is needed. In microservice architectures, which are inherently distributed, more advanced patterns are required to avoid repeatedly intercepting calls between services to request credentials. The goal is to verify a principal's identity only once; this simplifies authentication and authorization, enables automation, and improves scalability.
Further, when establishing a security policy for the microservice architecture, the following approaches to inter-service authentication and authorization are adopted:
Trust boundary: containerization technologies (such as Docker) are used to reduce risk. Docker's many features let developers flexibly maximize the security of individual microservices and of the whole application at different levels. When building service code, developers are free to use penetration testing tools to stress-test any part of the build cycle. Because the source from which a Docker image is built is declared explicitly in Docker build artifacts (the Dockerfile and Docker Compose files), developers can easily control the image supply chain and enforce security policies when needed. In addition, services can easily be hardened by placing them in Docker containers and making them immutable, which adds a strong safeguard.
Further, by employing a software defined infrastructure, private networks can be quickly created and configured using scripting languages, and strong security policies can be enforced at the network level.
Single sign-on (SSO) is used for internal interactions between services in the microservice architecture. This approach can reuse existing infrastructure, simplifies access control for services, and centralizes all access control operations in one enterprise access directory server.
HTTP-based hash-based message authentication code (HMAC)
In HMAC, the request content is hashed together with a secret key shared by both parties, and the resulting hash value is sent with the request. The receiving end recomputes the hash value from its own copy of the secret key and the received request content. If the two values match, the request is allowed through; if the request has been tampered with, the values do not match, so the receiver detects the tampering and can respond appropriately.
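The signing and verification steps can be sketched with Python's standard `hmac` module. The canonical request layout (method, path, timestamp, and body joined by newlines), the timestamp freshness check, and the endpoint path are assumptions for illustration; a real deployment would fix these conventions in an interface specification shared by both services.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over a (hypothetical) canonical form of the request."""
    canonical = b"\n".join([method.upper().encode(), path.encode(),
                            str(timestamp).encode(), body])
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int,
                   signature: str, max_age: float = 300.0,
                   now: Optional[float] = None) -> bool:
    """Recompute the HMAC with the receiver's copy of the secret; compare in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > max_age:
        return False  # stale request: limits replay of captured messages
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed content means an attacker cannot refresh an old request without invalidating its signature.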
Managing keys with a dedicated service
To eliminate credential management overhead in a distributed model such as a microservice architecture, while still benefiting from a highly secure system, one option is to use a comprehensive key management tool. Such a tool supports storing, dynamically leasing, renewing, and revoking keys (for example, passwords, API keys, and certificates). These operations are essential in microservices because of the automation principles that microservice architectures prescribe.
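The store, lease, renew, and revoke lifecycle can be sketched as follows. The `SecretStore` class is hypothetical and only loosely modeled on the lease concept found in dedicated secret managers such as HashiCorp Vault; it does not reproduce any product's API, and it keeps secrets in plain memory purely for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Lease:
    secret_name: str
    expires_at: float
    revoked: bool = False


class SecretStore:
    """In-memory sketch of storing, leasing, renewing, rotating and revoking secrets."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}
        self._leases: Dict[str, Lease] = {}

    def put(self, name: str, value: str) -> None:
        self._secrets[name] = value

    def lease(self, name: str, ttl: float, now: float) -> Tuple[str, str]:
        """Hand out the secret under a time-limited lease; returns (lease_id, value)."""
        lease_id = secrets.token_hex(8)
        self._leases[lease_id] = Lease(name, now + ttl)
        return lease_id, self._secrets[name]

    def renew(self, lease_id: str, ttl: float, now: float) -> None:
        self._valid(lease_id, now).expires_at = now + ttl

    def revoke(self, lease_id: str) -> None:
        self._leases[lease_id].revoked = True

    def read(self, lease_id: str, now: float) -> str:
        return self._secrets[self._valid(lease_id, now).secret_name]

    def rotate(self, name: str) -> str:
        """Replace the secret with a fresh random value and revoke its outstanding leases."""
        self._secrets[name] = secrets.token_urlsafe(24)
        for lease in self._leases.values():
            if lease.secret_name == name:
                lease.revoked = True
        return self._secrets[name]

    def _valid(self, lease_id: str, now: float) -> Lease:
        lease = self._leases.get(lease_id)
        if lease is None or lease.revoked or now >= lease.expires_at:
            raise PermissionError("lease is unknown, revoked or expired")
        return lease
```

Passing `now` explicitly keeps the sketch deterministic; a real tool would use its own clock and persist leases.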
It should be understood that although no encryption method is unbreakable in theory, there are mature, proven, and widely used mechanisms (such as AES-128 or AES-256). These mechanisms should be used when addressing security, rather than methods invented in-house. In addition, the libraries that implement these mechanisms should be updated and patched promptly.
Key management tools: a primary best practice is never to store keys and data in the same location. Key management complexity should not violate the flexibility principle of the microservice architecture; use comprehensive tools that follow microservice design principles and do not disrupt your continuous integration and continuous delivery pipeline.
Adjusting security policies to business needs: security policies are developed based on business needs and adjusted continually, because strategic goals may change over time, as may the technologies involved in the solution.
Establishment of the big data security assurance system
1. Security architecture
The security assurance system comprises a security protection system and a security management system. The security protection system covers network security, system security, application security, and data security; the security management system comprises security policy management specifications, a security organization model, and a system of security rules and regulations.
2. Security protection system
The network security protection system mainly provides the network security measures required by the data application access modes; some applications can use a virtual private network (VPN) to ensure secure and reliable transmission of shared exchange data. Network-layer security protects the platform's key applications and encrypted data, improves data transmission efficiency, and supports rapid creation of new secure application environments to meet the needs of new application processes. It provides four major functions: boundary protection, zone protection, node protection, and high network availability.
The system security system mainly covers system operation security, system information security design, a trust service system, and permission management design, ensuring system security at every level.
The data security system secures data exchange mainly through four functions: encrypted data transmission (VPN), security assurance of the data exchange process, secure design of data exchange interfaces, and data auditing and protection.
3. Security management system
In building the security assurance system, technical measures alone can hardly prevent all potential security hazards, so a corresponding security management system must be established. Security management is the core of the entire security effort. An effective security organization, guided by security policies and supported by security technologies and products, keeps day-to-day security assurance simple and efficient.
The security management system mainly comprises security policies, a security organization, and security rules. To strengthen security management of the client network and ensure the security of key facilities, construction of the security management system should be reinforced.
The big data platform applied to a smart park according to the invention is mainly characterized in that the technical architecture of the whole project adopts a hybrid Hadoop + MPP + in-memory database model, together with Storm for real-time data acquisition and computation, yielding a highly concurrent, scalable, high-performance big data system. It supports data sharing and processing across databases, messages, files, and other modes, and supports MapReduce jobs, SQL jobs, stream computation, and in-memory computation. The rule engine reduces the complexity of components that implement complex business logic, makes marketing scenario configuration more flexible, lowers application maintenance costs, and improves extensibility. The scheme is highly extensible: the cluster's processing capacity can be increased in the future by scaling out horizontally to meet business growth.
By adopting the Hadoop big data technology and a distributed architecture, the system has no single point of failure and offers high elasticity and high availability. Large volumes of information can be indexed and searched in near real time, so that billions of files and petabytes of data can be searched quickly, while nearly every aspect of the engine can be customized.
Data acquisition tasks are executed in parallel using MapReduce; the captured data is preliminarily organized and submitted to the data storage layer, after which the data processing layer extracts structured information for data mining and analysis.
A distributed database built on Hadoop + HBase stores the original web page content and provides an online, real-time random read/write architecture. It scales horizontally extremely well, supports billions of rows and millions of columns, and supports real-time data acquisition.
The platform runs on a cluster of commodity hardware and uses a distributed architecture that can scale to thousands of machines. Its fault-tolerance mechanism ensures that the failure of some nodes causes neither data loss nor failure of computing tasks. It is highly available, failing over quickly when a node fails, and highly elastic: data storage capacity and computing speed can be increased simply by adding machines.
In addition, the security assurance system of the big data platform combines technical security protection with offline personnel security protection, overcoming the hazards that remain when only technical protection is used and providing stronger security assurance for the big data platform applied to a smart park. The security assurance system comprises a security protection system and a security management system. The security protection system achieves security mainly through technology and covers network security, system security, application security, and data security. The security management system mainly involves establishing a security organization under the guidance of leadership and formulating security protection rules to ensure the data security of the big data platform; it comprises security policy management specifications, a security organization model, and a system of security rules and regulations.
The embodiments in this description are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is brief; for relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A big data platform applied to an intelligent park, characterized by comprising: a data acquisition module, a data storage module, a data calculation module, a data application module, and a platform management and control module;
The data acquisition module is connected with the data storage module and stores acquired data into the data storage module;
The data calculation module is connected with the data storage module and is used for processing data in the data storage module;
The data application module is connected with the data calculation module and establishes business logic that encapsulates business objects and business services;
The platform management and control module is used for connecting and monitoring the data acquisition module, the data storage module, the data calculation module and the data application module.
2. The big data platform applied to the intelligent park according to claim 1, wherein the data acquisition module comprises a data extraction unit, a data input end, and a data output end; the data input end is connected to a data source; and the data extraction unit is connected to the data input end, classifies the acquired data, and transmits it to the data storage module.
3. The big data platform applied to the intelligent park according to claim 1, wherein the data storage module comprises a distributed file unit, a distributed database and a distributed cache unit; the distributed file unit is provided with an uploading channel and a downloading channel and performs data interaction with the distributed database; and the distributed cache unit is connected with the distributed database for cache processing.
4. The big data platform applied to the intelligent park according to claim 1, wherein the data calculation module comprises a MapReduce unit, a data warehouse unit, a machine learning and data mining library, and a rule knowledge base; the data warehouse unit converts data files and runs them on the MapReduce unit; the machine learning and data mining library stores classic algorithms from the machine learning field; and the rules in the rule knowledge base are matched by a rule engine.
5. The big data platform applied to the intelligent park according to claim 1, wherein the platform management and control module comprises a cluster management unit, a host management unit, a user management unit, and a cluster log management unit; the cluster management unit is connected to the data calculation module; the host management unit is connected to the host nodes; the user management unit manages the platform users; and the cluster log management unit is connected to the data acquisition module, the data storage module, the data calculation module, and the data application module.
6. The big data platform applied to the intelligent park according to claim 5, further comprising a data security module; the data security module comprises an identity verification and authorization unit; the identity authentication and authorization unit is connected with the user management unit.
7. An operation method for a big data platform of an intelligent park, characterized by comprising the following steps:
Step one: the data acquisition module extracts data from a data source, processes it, and stores it; the accessed data is processed uniformly through file decompression, file merging and splitting, file-level verification, data-level verification, cleaning, conversion, association, and summarization, and is loaded into the data storage module;
Step two: the data storage module adopts a distributed scheme, using Hadoop to process semi-structured and unstructured data and MPP (massively parallel processing) to process high-quality structured data, and stores the data;
Step three: the stored data is transmitted to the data warehouse unit of the data calculation module, where data files are converted and run on the MapReduce unit; the MapReduce unit transmits the results to the data application module for traffic statistics, service recommendation, trend analysis, user behavior analysis, data mining, offline analysis, online analysis, and ad hoc query;
Step four: the platform management and control module monitors the data acquisition module, the data storage module, the data calculation module, and the data application module.
CN201910774850.XA 2018-12-12 2019-08-21 Big data platform applied to intelligent park and operation method Active CN110543464B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201822083697X 2018-12-12
CN201822083697 2018-12-12

Publications (2)

Publication Number Publication Date
CN110543464A true CN110543464A (en) 2019-12-06
CN110543464B CN110543464B (en) 2023-06-23

Family

ID=68731627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910774850.XA Active CN110543464B (en) 2018-12-12 2019-08-21 Big data platform applied to intelligent park and operation method

Country Status (1)

Country Link
CN (1) CN110543464B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220692A1 (en) * 2014-05-09 2017-08-03 Paul Greenwood User-Trained Searching Application System and Method
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
US20160103877A1 (en) * 2014-10-10 2016-04-14 International Business Machines Corporation Joining data across a parallel database and a distributed processing system
CN104820670A (en) * 2015-03-13 2015-08-05 国家电网公司 Method for acquiring and storing big data of power information
CN105631764A (en) * 2015-12-31 2016-06-01 国网电力科学研究院武汉南瑞有限责任公司 Smart power grid big data application system orienting smart city
US20170270165A1 (en) * 2016-03-16 2017-09-21 Futurewei Technologies, Inc. Data streaming broadcasts in massively parallel processing databases
CN106126601A (en) * 2016-06-20 2016-11-16 华南理工大学 A kind of social security distributed preprocess method of big data and system
CN106547882A (en) * 2016-11-03 2017-03-29 国网重庆市电力公司电力科学研究院 A kind of real-time processing method and system of big data of marketing in intelligent grid
US20180181621A1 (en) * 2016-12-22 2018-06-28 Teradata Us, Inc. Multi-level reservoir sampling over distributed databases and distributed streams
CN107515927A (en) * 2017-08-24 2017-12-26 深圳市云房网络科技有限公司 A kind of real estate user behavioural analysis platform
CN107679192A (en) * 2017-10-09 2018-02-09 中国工商银行股份有限公司 More cluster synergistic data processing method, system, storage medium and equipment
CN107945086A (en) * 2017-11-17 2018-04-20 广州葵翼信息科技有限公司 A kind of big data resource management system applied to smart city
CN108197261A (en) * 2017-12-30 2018-06-22 北京通途永久科技有限公司 A kind of wisdom traffic operating system

Non-Patent Citations (1)

Title
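任菁 (Ren Jing): "Informatization Improves Government Management Efficiency: Maoming's Experience with 'Internet + Government Affairs, People's Livelihood and Party Building'", 《紫光阁》 (Ziguangge) *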

Cited By (28)

Publication number Priority date Publication date Assignee Title
CN110995869B (en) * 2019-12-23 2022-11-11 杭州雷数科技有限公司 Machine data collection method, device, equipment and medium
CN110995869A (en) * 2019-12-23 2020-04-10 杭州雷数科技有限公司 Machine data collection method, device, equipment and medium
CN111541542A (en) * 2019-12-31 2020-08-14 远景智能国际私人投资有限公司 Request sending and verifying method, device and equipment
CN111541542B (en) * 2019-12-31 2023-09-15 远景智能国际私人投资有限公司 Request sending and verifying method, device and equipment
CN111400326A (en) * 2020-02-28 2020-07-10 深圳市赛为智能股份有限公司 Smart city data management system and method thereof
CN111400326B (en) * 2020-02-28 2023-09-12 深圳市赛为智能股份有限公司 Smart city data management system and method thereof
CN111708645A (en) * 2020-06-12 2020-09-25 北京思特奇信息技术股份有限公司 Event processing method and system based on stream processing
CN111666559A (en) * 2020-06-19 2020-09-15 中信银行股份有限公司 Data bus management method and device supporting authority management, electronic equipment and storage medium
CN111861016A (en) * 2020-07-24 2020-10-30 北京合众伟奇科技有限公司 Method and system for summarizing, analyzing and managing predicted electricity selling amount of power grid
CN111861016B (en) * 2020-07-24 2024-03-29 北京合众伟奇科技股份有限公司 Summarized analysis management method and system for predicted sales amount of power grid
CN112181940A (en) * 2020-08-25 2021-01-05 天津农学院 Method for constructing national industrial and commercial big data processing system
CN111950809B (en) * 2020-08-26 2022-03-25 华北电力大学(保定) Master-slave game-based hierarchical and partitioned optimized operation method for comprehensive energy system
CN111950809A (en) * 2020-08-26 2020-11-17 华北电力大学(保定) Master-slave game-based hierarchical and partitioned optimized operation method for comprehensive energy system
CN112508677A (en) * 2020-11-06 2021-03-16 无锡艺界科技有限公司 Financial system based on big data wind accuse
CN112579701A (en) * 2020-12-15 2021-03-30 中国建设银行股份有限公司 Data processing method and device
CN112711399A (en) * 2020-12-29 2021-04-27 华润水泥投资有限公司 Audit application platform based on containerization design
CN112906907A (en) * 2021-03-24 2021-06-04 成都工业学院 Method and system for hierarchical management and distribution of machine learning pipeline model
CN112906907B (en) * 2021-03-24 2024-02-23 成都工业学院 Method and system for layering management and distribution of machine learning pipeline model
CN113254548A (en) * 2021-06-04 2021-08-13 深圳市智慧空间平台技术开发有限公司 Method for integrating multidimensional data of park
CN113378219A (en) * 2021-06-07 2021-09-10 北京许继电气有限公司 Processing method and system of unstructured data
CN113378219B (en) * 2021-06-07 2024-05-28 北京许继电气有限公司 Unstructured data processing method and system
CN113810272A (en) * 2021-09-29 2021-12-17 周明升 Wisdom garden data access gateway
CN114253519A (en) * 2022-03-01 2022-03-29 中国电子信息产业集团有限公司第六研究所 Wisdom garden security protection management system and electronic equipment
CN114253519B (en) * 2022-03-01 2022-06-24 中国电子信息产业集团有限公司第六研究所 Wisdom garden security protection management system and electronic equipment
CN115225730A (en) * 2022-07-05 2022-10-21 北京赛思信安技术股份有限公司 High-concurrency offline data packet analysis method supporting multiple tasks
CN115225730B (en) * 2022-07-05 2024-05-31 北京赛思信安技术股份有限公司 High concurrency offline data packet analysis method supporting multitasking
CN117057126A (en) * 2023-08-11 2023-11-14 上海电气集团智慧能源科技有限公司 Energy-saving management and control system and energy-saving management and control method for park
CN118092939A (en) * 2024-02-29 2024-05-28 南京祝华信息科技有限公司 Intelligent park system based on cloud platform

Also Published As

Publication number Publication date
CN110543464B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110543464B (en) Big data platform applied to intelligent park and operation method
US11882141B1 (en) Graph-based query composition for monitoring an environment
US11770398B1 (en) Guided anomaly detection framework
US20220247769A1 (en) Learning from similar cloud deployments
US11792284B1 (en) Using data transformations for monitoring a cloud compute environment
US11895135B2 (en) Detecting anomalous behavior of a device
CN112765245A (en) Electronic government affair big data processing platform
US20240080329A1 (en) Cloud Resource Risk Scenario Assessment and Remediation
US20220200869A1 (en) Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments
Sicari et al. Security&privacy issues and challenges in NoSQL databases
US20170279720A1 (en) Real-Time Logs
US20220303295A1 (en) Annotating changes in software across computing environments
US20220360600A1 (en) Agentless Workload Assessment by a Data Platform
US20220294816A1 (en) Ingesting event data into a data warehouse
US11818156B1 (en) Data lake-enabled security platform
CN112527873B (en) Big data management application system based on chain number cube
US11973784B1 (en) Natural language interface for an anomaly detection framework
US20240106846A1 (en) Approval Workflows For Anomalous User Behavior
US20230319092A1 (en) Offline Workflows In An Edge-Based Data Platform
CN118312626B (en) Data management method and system based on machine learning
CN112511515B (en) Chain number cube for data chaining
Shahin et al. Big data platform privacy and security, a review
WO2023034419A1 (en) Detecting anomalous behavior of a device
WO2023038957A1 (en) Monitoring a software development pipeline
Gharajeh Security issues and privacy challenges of NoSQL databases

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant