CN112214453B - Large-scale industrial data compression storage method, system and medium - Google Patents


Info

Publication number
CN112214453B
Authority
CN
China
Prior art keywords
data
format
avro
storage
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010961819.XA
Other languages
Chinese (zh)
Other versions
CN112214453A (en)
Inventor
高响
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Weiyi Intelligent Manufacturing Technology Co ltd and Changzhou Weiyizhi Technology Co Ltd
Priority to CN202010961819.XA
Publication of CN112214453A
Application granted
Publication of CN112214453B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a large-scale industrial data compression storage method, system and medium. The method comprises the following steps: step 1: configuring different data acquisition systems according to the types of the data sources, and extracting the data acquired by the data acquisition systems through an interface operation; step 2: defining a conversion chain, and temporarily converting the extracted data of different types into Avro format through a data cleaning plug-in; and step 3: compressing the Avro-format data by using a GPL protocol, with Snappy as the compression format, creating in the distributed file system a dataset whose storage format is Parquet, and storing the compressed data. The invention can define the conversion chain and the compression and storage format for any type of data, and greatly improves the data processing speed and data compression ratio of the computing platform.

Description

Large-scale industrial data compression storage method, system and medium
Technical Field
The invention relates to the technical field of data compression and storage, in particular to a large-scale industrial data compression and storage method, system and medium.
Background
With the rapid development of new infrastructure, more and more traditional industrial enterprises are using internet technology to increase productivity, and data is the most critical element. In traditional internet big-data processing, data volumes keep growing, and many enterprises keep two backup copies of their data, which wastes disk space.
Patent document CN108304472A (application No. 201711455790.2) discloses a data compression storage method and apparatus. The method includes a segmentation step, in which the original data is segmented into a plurality of fields, and a compression step, in which different compression strategies are applied to different fields according to their content and the compressed data is stored. Because the compression method is chosen according to the data content, compression efficiency is effectively improved, and the compression ratio is markedly better than that of general-purpose compression tools such as GZIP and SNAPPY.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a large-scale industrial data compression storage method, system and medium.
The large-scale industrial data compression and storage method provided by the invention comprises the following steps:
step 1: configuring different data acquisition systems according to the types of the data sources, and extracting data acquired by the data acquisition systems through an interface operation;
step 2: defining a conversion chain, and temporarily converting the formats of the extracted different types of data into an Avro format through a data cleaning plug-in;
step 3: compressing the Avro-format data by using a GPL protocol, with Snappy as the compression format, creating in the distributed file system a dataset whose storage format is Parquet, and storing the compressed data.
Preferably, the step 1 comprises:
step 1.1: classifying the data sources by data format and storage medium, wherein the data formats include structured and unstructured data, and the storage media include Kafka and RabbitMQ;
step 1.2: selecting the corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector and RabbitMQ corresponds to a RabbitMQ data source selector.
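Steps 1.1 and 1.2 amount to classifying each source and then looking up a selector by storage medium. The minimal Python sketch below illustrates that lookup under stated assumptions: the `DataSource` type, the `SELECTORS` registry and the selector strings are hypothetical stand-ins for whatever the configuration management tool actually stores.

```python
from dataclasses import dataclass

# Hypothetical registry for step 1.2: storage medium -> data source selector.
SELECTORS = {
    "kafka": "Kafka data source selector",
    "rabbitmq": "RabbitMQ data source selector",
}
DATA_FORMATS = ("structured", "unstructured")  # classes used in step 1.1


@dataclass
class DataSource:
    name: str
    data_format: str     # "structured" or "unstructured"
    storage_medium: str  # e.g. "Kafka" or "RabbitMQ"


def select_collector(source: DataSource) -> str:
    """Classify the source (step 1.1) and return its selector (step 1.2)."""
    if source.data_format not in DATA_FORMATS:
        raise ValueError(f"unknown data format: {source.data_format}")
    selector = SELECTORS.get(source.storage_medium.lower())
    if selector is None:
        raise ValueError(f"no selector configured for {source.storage_medium}")
    return selector


print(select_collector(DataSource("press-line-3", "unstructured", "Kafka")))
# prints: Kafka data source selector
```

Keeping the mapping in data rather than code mirrors the configuration-driven approach the invention describes: adding a new medium means adding a registry entry, not a new code path.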
Preferably, converting the data into Avro format in step 2 comprises: mapping the industrial data to an Avro schema and generating temporary Avro-format data.
Preferably, mapping the industrial data to the Avro schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept the data, preloading the Avro schema for use during data conversion, and injecting the schema into the event header.
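Step 2.1 defines the conversion chain purely by naming input fields and output fields. A minimal Python sketch of such a chain follows, assuming each link is a hypothetical (input field, output field, converter) triple; the field names are illustrative and not taken from the patent.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical chain configuration for step 2.1:
# (input field, output field, converter applied to the value).
Step = Tuple[str, str, Callable[[Any], Any]]

CHAIN: List[Step] = [
    ("devId", "device_id", str),
    ("temp", "temperature", float),
    ("ts", "date_time", int),
]


def apply_chain(record: Dict[str, Any], chain: List[Step]) -> Dict[str, Any]:
    """Map configured input fields to output fields; unlisted fields are dropped."""
    out: Dict[str, Any] = {}
    for src, dst, convert in chain:
        if src in record:
            out[dst] = convert(record[src])
    return out


raw = {"devId": 17, "temp": "36.5", "ts": "1594684800000", "noise": "x"}
print(apply_chain(raw, CHAIN))
# prints: {'device_id': '17', 'temperature': 36.5, 'date_time': 1594684800000}
```

Because the chain is plain data, it could be loaded from a configuration file, which is the property the invention relies on to avoid writing per-source conversion code.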
Preferably, generating the temporary Avro-format data from the industrial data comprises the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export (sink) component, which converts each log event into a record and passes it to ReadLine; ReadLine extracts the log lines from the data pipeline, applies regular-expression matching, emits one record per line of the input stream, and places each line as a string in the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro schema, and converting that data into temporary Avro-format data.
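The ReadLine behaviour in step 2.3 (one record per line, the line kept in a `message` field, then regular-expression matching) can be sketched in Python as below. This models the data flow only, not the Morphlines `readLine` command itself; the log layout and `LOG_PATTERN` are hypothetical.

```python
import re
from typing import Dict, Iterator

# Hypothetical log layout: "<timestamp> <device id> <LEVEL> <free text>".
LOG_PATTERN = re.compile(
    r"^(?P<time>\S+)\s+(?P<device>\S+)\s+(?P<level>[A-Z]+)\s+(?P<text>.*)$"
)


def read_lines(body: bytes, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Emit one record per non-empty line, the line stored in 'message'."""
    for line in body.decode(encoding).splitlines():
        if line.strip():
            yield {"message": line}


def match_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Add the regex's named groups to the record; keep it unchanged on no match."""
    m = LOG_PATTERN.match(record["message"])
    return {**record, **m.groupdict()} if m else record


event_body = (
    b"2020-07-14T08:00:00 CNC-01 WARN spindle temperature high\n"
    b"2020-07-14T08:00:05 CNC-01 INFO ok\n"
)
records = [match_fields(r) for r in read_lines(event_body)]
print(records[0]["device"], records[0]["level"], records[0]["text"])
# prints: CNC-01 WARN spindle temperature high
```

Lines that do not match are passed through unchanged here; a real pipeline might instead drop them or route them to an error channel.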
Preferably, the step 3 comprises:
step 3.1: generating a dataset partition JSON file, wherein the partitions organize the stored data so that it can be processed by time-based queries and by enterprise ID;
step 3.2: defining the dataset from a uniform resource identifier (URI) and a schema; the data management platform creates or specifies the dataset with a create command that supplies the dataset URI, the specified schema and the partition-field JSON.
Preferably, generating the dataset partition-strategy JSON file comprises:
step 3.1.1: specifying the partition fields and their types;
step 3.1.2: designating the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition-strategy JSON.
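Steps 3.1.1 to 3.1.3 can be sketched in Python as follows. The sketch assumes a Kite SDK-style partition strategy, a JSON list of `{name, source, type}` entries (to our understanding, Kite's CLI also offers a `partition-config` command that generates such a file); the field names `enterprise_id` and `date_time` are hypothetical, chosen to match the enterprise-ID and time partitioning described in step 3.1.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Step 3.1.1: partition fields and their types (field names are hypothetical).
PARTITIONS: List[Tuple[str, str]] = [
    ("enterprise_id", "identity"),  # keeps each enterprise's data separate
    ("date_time", "year"),
    ("date_time", "month"),
    ("date_time", "day"),
]


def partition_strategy(fields: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Build a Kite-style strategy: one {name, source, type} entry per field."""
    return [
        {"name": source if ptype == "identity" else ptype,
         "source": source, "type": ptype}
        for source, ptype in fields
    ]


def write_strategy(fields: List[Tuple[str, str]], path: Path) -> Path:
    """Steps 3.1.2-3.1.3: write the strategy JSON to the designated path."""
    path.write_text(json.dumps(partition_strategy(fields), indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = write_strategy(PARTITIONS, Path(tmp) / "partition.json")
    print(json.loads(out.read_text())[1])
# prints: {'name': 'year', 'source': 'date_time', 'type': 'year'}
```

Putting `enterprise_id` first makes each enterprise's data a separate top-level directory, which is one way to obtain the per-enterprise isolation the invention mentions.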
Preferably, the dataset is identified by a uniform resource identifier (URI),
and the address and storage mode of the stored data are obtained through the URI.
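How a URI can carry both the storage mode and the address is sketched below in Python. It assumes Kite-style dataset URIs of the form `dataset:<scheme>:<location>` (for example `dataset:hdfs:/user/...`); the dataset name `device_logs` and the host `namenode:8020` are hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit


def resolve_dataset_uri(uri: str) -> Dict[str, str]:
    """Split 'dataset:<scheme>:<location>' into storage scheme, host and path."""
    prefix = "dataset:"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dataset URI: {uri}")
    parts = urlsplit(uri[len(prefix):])
    return {"storage": parts.scheme, "host": parts.netloc, "path": parts.path}


print(resolve_dataset_uri("dataset:hdfs:/user/2020/7/14/device_logs"))
# prints: {'storage': 'hdfs', 'host': '', 'path': '/user/2020/7/14/device_logs'}
print(resolve_dataset_uri("dataset:hdfs://namenode:8020/user/2020/7/14/device_logs"))
# prints: {'storage': 'hdfs', 'host': 'namenode:8020', 'path': '/user/2020/7/14/device_logs'}
```

The scheme tells a client how the data is stored (here HDFS) and the path tells it where, which is the role the claims assign to the URI.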
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the types of the data sources, and extracting data acquired by the data acquisition systems through an interface operation;
module M2: defining a conversion chain, and temporarily converting the formats of the extracted different types of data into an Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data by using a GPL protocol, with Snappy as the compression format, creating in the distributed file system a dataset whose storage format is Parquet, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The method uses Flume as a data pipeline to connect the data sources of the industrial data platform, and uses Morphlines to reduce the time and effort needed to build and change ETL stream-processing applications. Developers only need to focus on business logic: the pipeline is configured through configuration files, and data can be extracted, transformed and loaded into a distributed storage system such as HDFS (Hadoop Distributed File System) without writing complex code;
2. Using a Dataset solves the problem that JSON data cannot be converted directly into Parquet format when stored in HDFS. When the dataset is created, it specifies a columnar storage format (Parquet) and the Snappy compression format. The compression ratio of Snappy-compressed data reaches 30%-40%, and the compression and decompression rates reach about 180 MB/s and 430 MB/s respectively, which greatly improves data landing efficiency and disk utilization;
3. Messages from Kafka and other message sources of the industrial data platform are ingested through Flume, processed and landed by Flume alone, stored in Parquet format and compressed with Snappy, so only one copy of the data is stored. Flume guarantees data consistency: if landing fails, Flume rolls back through its own transaction mechanism. Because no code needs to be written, developer working time is greatly reduced, work efficiency is improved, and resource utilization is increased.
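The storage and throughput figures in effect 2 can be turned into a quick estimate. The Python sketch below assumes that "compression ratio" means compressed size divided by original size, that 1 GB = 1000 MB, and that the 180 MB/s and 430 MB/s rates apply to a single stream; none of these interpretations is stated explicitly in the patent.

```python
from typing import Dict


def storage_estimate(raw_gb: float, ratio: float, copies: int = 1,
                     compress_mb_s: float = 180.0,
                     decompress_mb_s: float = 430.0) -> Dict[str, float]:
    """Estimate stored size and single-stream (de)compression time.

    Assumes ratio = compressed size / original size and 1 GB = 1000 MB.
    """
    raw_mb = raw_gb * 1000
    return {
        "stored_gb": raw_gb * ratio * copies,
        "compress_s": raw_mb / compress_mb_s,
        "decompress_s": raw_mb / decompress_mb_s,
    }


for ratio in (0.30, 0.40):
    est = storage_estimate(100, ratio)
    print({k: round(v, 1) for k, v in est.items()})
# prints: {'stored_gb': 30.0, 'compress_s': 555.6, 'decompress_s': 232.6}
#         {'stored_gb': 40.0, 'compress_s': 555.6, 'decompress_s': 232.6}
print(storage_estimate(100, 1.0, copies=2)["stored_gb"])  # two uncompressed copies
# prints: 200.0
```

Under these assumptions, 100 GB of raw data stored once with Snappy occupies 30 to 40 GB, compared with 200 GB for the two uncompressed backup copies mentioned in the Background.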
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all of these fall within the scope of the present invention.
Example:
Referring to FIG. 1, the large-scale industrial data compression and storage method provided by the invention comprises:
an industrial data extraction step: configuring different Flume sources for different data sources, and making the Flume source configuration universally configurable through an interface operation;
a temporary Avro preloading step: defining a conversion chain, configuring the schema of the Avro data format, and configuring Morphlines to temporarily convert data of different types into Avro format;
a Dataset creation step: creating through the Dataset a dataset in HDFS whose storage format is Parquet, compressing the data by a GPL protocol, and declaring that the final landed data is in Parquet format with Snappy compression;
a combined operation step: connecting and running the above steps through Flume configuration, so that large amounts of data in different formats are compressed, preprocessed, and finally stored in the distributed storage system in Parquet format.
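One possible way to wire the combined steps together in a single Flume agent is sketched below as a Python function that renders a Flume properties file. It assumes components that, to our understanding, ship with recent Flume 1.x releases (the Kafka source with Flume 1.7+ property names, the Morphline interceptor, the file channel and the Kite `DatasetSink`); the patent does not list its exact component classes, and every host name, topic, path and URI in the example call is a placeholder.

```python
from typing import Dict


def flume_agent_config(agent: str, brokers: str, topic: str,
                       morphline_file: str, dataset_uri: str) -> str:
    """Render a Flume agent: Kafka source -> FileChannel -> Kite DatasetSink."""
    src, ch, snk = "kafka-src", "file-ch", "dataset-sink"
    s, c, k = f"{agent}.sources.{src}", f"{agent}.channels.{ch}", f"{agent}.sinks.{snk}"
    props: Dict[str, str] = {
        f"{agent}.sources": src,
        f"{agent}.channels": ch,
        f"{agent}.sinks": snk,
        # Extraction: Kafka source reading the platform's message topic.
        f"{s}.type": "org.apache.flume.source.kafka.KafkaSource",
        f"{s}.kafka.bootstrap.servers": brokers,
        f"{s}.kafka.topics": topic,
        f"{s}.channels": ch,
        # Temporary Avro conversion: a Morphlines interceptor on the source.
        f"{s}.interceptors": "morphline",
        f"{s}.interceptors.morphline.type":
            "org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder",
        f"{s}.interceptors.morphline.morphlineFile": morphline_file,
        # Durable, transactional buffer between source and sink.
        f"{c}.type": "file",
        f"{c}.checkpointDir": "/var/lib/flume/checkpoint",
        f"{c}.dataDirs": "/var/lib/flume/data",
        # Landing: write events into the pre-created dataset.
        f"{k}.type": "org.apache.flume.sink.kite.DatasetSink",
        f"{k}.channel": ch,
        f"{k}.kite.dataset.uri": dataset_uri,
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items())


print(flume_agent_config("agent", "kafka1:9092", "device-logs",
                         "/etc/flume/conf/morphline.conf",
                         "dataset:hdfs:/user/2020/7/14/device_logs"))
```

The file channel is the component whose transactions let Flume roll back a failed write, which is the consistency guarantee the beneficial effects attribute to Flume. Verify class names and property keys against the user guide of the Flume version actually deployed.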
The step of configuring Flume sources through a universal interface comprises the following steps:
Step A1: the data stored by the industrial data processing platform is classified by data format and storage medium; the data formats include structured and unstructured data, and the storage media include Kafka and RabbitMQ.
Step A2: a corresponding Flume source is selected through a Flume configuration management tool: Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector.
The step of temporarily preloading data into Avro comprises: mapping the industrial data to the Avro schema and generating temporary Avro-format data.
The step of mapping the industrial data to the Avro schema comprises the following steps:
Step B1: a transformation chain is defined by configuring the input fields and the fields to be output; the transformation chain can accept any type of data from any type of data source.
Step B2: a Flume interceptor is configured to intercept the data before it flows to the next step, preload the Avro schema for use when the data is extracted and transformed, and inject the schema into the event header so that the AvroEventSerializer can pick it up.
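Step B2 injects the schema into the event header. The Python sketch below models the effect of such an interceptor on one event, represented here as a (headers, body) pair. It assumes the header name `flume.avro.schema.literal`, which, to our understanding, Flume's AvroEventSerializer and the Kite DatasetSink read (the alternative is `flume.avro.schema.url`). A real Flume interceptor is a Java class implementing `org.apache.flume.interceptor.Interceptor`; the schema below is hypothetical.

```python
import json
from typing import Dict, Tuple

# Hypothetical Avro schema for device log records, serialized once ("preloaded").
AVRO_SCHEMA = {
    "type": "record",
    "name": "DeviceLog",
    "namespace": "example.industrial",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "level", "type": "string"},
        {"name": "message", "type": "string"},
        {"name": "date_time", "type": "long"},
    ],
}
SCHEMA_LITERAL = json.dumps(AVRO_SCHEMA, separators=(",", ":"))
SCHEMA_HEADER = "flume.avro.schema.literal"


def inject_schema(headers: Dict[str, str], body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return a copy of the event headers with the schema literal added."""
    new_headers = dict(headers)
    new_headers.setdefault(SCHEMA_HEADER, SCHEMA_LITERAL)
    return new_headers, body


headers, body = inject_schema({"topic": "device-logs"}, b"raw event bytes")
print(json.loads(headers[SCHEMA_HEADER])["name"])  # prints: DeviceLog
```

Serializing the schema once at load time, rather than per event, is what "preloading" buys: every event gets the same precomputed header string.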
The step of generating temporary Avro-format data from the industrial data comprises:
Step C1: Flume receives industrial device log events and sends them to the Flume MorphlineSink, which converts each Flume event into a record and pipes it to the readLine command. The readLine command extracts log lines from the data stream, applies regular-expression pattern matching, and emits one record per line of the input stream, placing the line as a string in the message output field.
Step C2: a Flume interceptor is configured to intercept the data produced in step B2, convert the structured or unstructured data into temporary Avro-format data by matching it against the Avro schema, and pass that data into a FileChannel for further processing.
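Matching a parsed record against the Avro schema in step C2 essentially means keeping the schema's fields, casting values to the declared types, filling defaults and rejecting records with gaps. The Python sketch below shows only that validation logic under stated assumptions: the schema is hypothetical, only non-union primitive types are handled, and no Avro binary encoding is performed (that would require an Avro library).

```python
from typing import Any, Dict

# Hypothetical preloaded schema (same shape as an Avro record schema).
SCHEMA = {
    "type": "record",
    "name": "DeviceLog",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "level", "type": "string", "default": "INFO"},
        {"name": "message", "type": "string"},
        {"name": "date_time", "type": "long"},
    ],
}
# Only non-union primitive types are handled in this sketch.
CASTS = {"string": str, "int": int, "long": int, "float": float, "double": float}


def to_avro_record(record: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Keep schema fields only, cast values, fill defaults, reject gaps."""
    out: Dict[str, Any] = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = CASTS[field["type"]](record[name])
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"record is missing required field '{name}'")
    return out


parsed = {"device_id": "CNC-01", "message": "spindle temperature high",
          "date_time": "1594684800000", "raw_line": "..."}
print(to_avro_record(parsed, SCHEMA))
# prints: {'device_id': 'CNC-01', 'level': 'INFO', 'message': 'spindle temperature high', 'date_time': 1594684800000}
```

Records that raise `ValueError` here would, in a pipeline, be dropped or diverted before reaching the FileChannel, so that only schema-conformant data is landed.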
The creating Dataset step includes:
Step D1: a dataset partition JSON file is generated. A dataset is a collection of records, similar to a relational database table: the records resemble table rows, but the columns can contain not only strings or numbers but also nested data structures such as lists, maps and other records. Datasets are partitioned mainly through the create command, and a partition strategy can split data by year, month and day, for example date_time:year, date_time:month, date_time:day. Partitions define the logical layout of data storage. Data is most often processed with time-based queries: to use the data of 14 July 2020, Hadoop only needs to access the partition stored at data/year-2020/month-7/day-14. Using partitions that match the most common queries lets applications run faster and improves computation efficiency and resource utilization.
Step D2: defining a dataset requires at least a URI and a schema. The data management platform creates or specifies the dataset with a create command that mainly supplies the dataset URI, the specified schema and the partition-field JSON; the data is stored in Parquet format, the schema is the one defined in step B2, and the partition JSON is the one generated in step D1. The dataset is identified by its URI, which tells where and how the data is stored: for a dataset created with the URI HDFS:/user/2020/7/14/, the data is finally stored in the /user/2020/7/14/ directory of HDFS. Creating the dataset also generates a metadata folder in HDFS containing the schema and a descriptor file; the descriptor records the Snappy compression format, the Parquet data format, the data storage path and the partition fields.
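Steps D1 and D2 can be illustrated together in Python: computing the partition directory for a record, pruning partitions for a one-day query, and assembling (but not running) a dataset creation command. The sketch assumes Kite-style `name=value` partition directories (the patent writes them as data/year-2020/month-7/day-14), UTC timestamps in milliseconds, and the Kite CLI `create` options `--schema`, `--partition-by` and `--format` as we understand them; the enterprise IDs, file names and dataset name are hypothetical. No compression flag is passed; check how your Kite version selects Snappy.

```python
import shlex
from datetime import datetime, timezone
from typing import List


def partition_path(enterprise_id: str, epoch_ms: int) -> str:
    """Directory for one record: enterprise, then year/month/day (UTC)."""
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return (f"enterprise_id={enterprise_id}/year={ts.year}"
            f"/month={ts.month:02d}/day={ts.day:02d}")


def prune(paths: List[str], year: int, month: int, day: int) -> List[str]:
    """Keep only the partitions that a query for one day needs to read."""
    suffix = f"/year={year}/month={month:02d}/day={day:02d}"
    return [p for p in paths if p.endswith(suffix)]


def kite_create_command(dataset_uri: str, schema_file: str,
                        partition_file: str, fmt: str = "parquet") -> List[str]:
    """Build (not run) a Kite CLI 'create' call for the dataset in step D2."""
    return [
        "kite-dataset", "create", dataset_uri,
        "--schema", schema_file,           # Avro schema from step B2
        "--partition-by", partition_file,  # partition JSON from step D1
        "--format", fmt,                   # Parquet storage
    ]


DAY_MS = 86_400_000
t0 = 1594684800000  # 2020-07-14T00:00:00Z
stored = sorted({partition_path(e, t0 + d * DAY_MS)
                 for e in ("E001", "E002") for d in range(3)})
print(prune(stored, 2020, 7, 14))
# prints: ['enterprise_id=E001/year=2020/month=07/day=14', 'enterprise_id=E002/year=2020/month=07/day=14']
print(shlex.join(kite_create_command("dataset:hdfs:/user/2020/7/14/device_logs",
                                     "device_log.avsc", "partition.json")))
# prints: kite-dataset create dataset:hdfs:/user/2020/7/14/device_logs --schema device_log.avsc --partition-by partition.json --format parquet
```

Where the Kite CLI is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`; pruning shows why a one-day query reads two directories out of six here.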
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the types of the data sources, and extracting data acquired by the data acquisition systems through an interface operation;
module M2: defining a conversion chain, and temporarily converting the formats of the extracted different types of data into an Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data by using a GPL protocol, with Snappy as the compression format, creating in the distributed file system a dataset whose storage format is Parquet, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
The invention realizes the following functions:
1) defining the data conversion process in configuration files removes the need to write large amounts of code and to deploy and maintain that code;
2) the dataset that stores the data is created in advance and the data is temporarily converted into Avro format, so the data no longer has to be processed through Spark, which saves computing resources and shortens the data processing flow;
3) dataset partition fields are preset and data is partitioned automatically according to field contents, which meets the requirement that different enterprises' data be stored in isolation and improves the efficiency of subsequent data analysis and computation.
The invention performs data access, data flow and data storage through a configuration interface. It achieves very high compression and storage efficiency and greatly improves the utilization of storage and computing resources. By contrast, in mainstream approaches to storing data in Parquet format, data processing and storage are mostly done with Spark, which requires additional computing resources and data processing components, as well as separate code development and deployment for different industrial data, making maintenance and development cumbersome. For the same data sample, subsequent analysis and computation on data stored in this format is about 5 times faster.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (5)

1. A large-scale industrial data compression storage method is characterized by comprising the following steps:
step 1: configuring different data acquisition systems according to the types of the data sources, and extracting data acquired by the data acquisition systems through an interface operation;
step 2: defining a conversion chain, and temporarily converting the formats of the extracted different types of data into an Avro format through a data cleaning plug-in;
and step 3: compressing the Avro-format data by a GPL protocol, with Snappy as the compression format, creating a dataset in a distributed file system, the dataset taking Parquet as its storage format, and storing the compressed data;
wherein Flume is used as a data pipeline to connect each data source of an industrial data platform, and Morphlines is used to reduce the time required to build and change ETL (extract, transform, load) stream-processing applications, so that only business logic needs attention; configuration is performed through a configuration file, and the data is then extracted, transformed and loaded into a distributed storage system such as HDFS (Hadoop Distributed File System);
a Dataset is used, and the Dataset specifies a columnar storage format and the Snappy compression format when the dataset is created;
data from the Kafka message middleware of the industrial data platform is ingested through Flume, processed and landed by Flume alone, stored in Parquet format and compressed with Snappy, so that only one copy of the data is stored; data consistency is ensured through the Flume FileChannel, and when the data is landed, Flume performs rollback through its own transaction mechanism without any code being written;
the step 1 comprises the following steps:
step 1.1: classifying the data sources by data format and storage medium, wherein the data formats include structured and unstructured data, and the storage media include Kafka and RabbitMQ;
step 1.2: selecting a corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector;
converting the data into Avro format in step 2 comprises: mapping the industrial data to an Avro schema and generating temporary Avro-format data;
mapping the industrial data to the Avro schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept data, preloading the Avro schema for use during data conversion, and injecting the schema into the event header;
generating the temporary Avro-format data from the industrial data comprises the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export (sink) component, which converts each log event into a record and passes it to ReadLine; ReadLine extracts log lines from the data pipeline, applies regular-expression matching, emits one record per line of the input stream, and places each line as a string in the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro schema, and converting that data into temporary Avro-format data.
2. The large-scale industrial data compression storage method according to claim 1, wherein the step 3 comprises:
step 3.1: generating a dataset partition JSON file, wherein the partitions organize the stored data so that it can be processed by time-based queries and by enterprise ID;
step 3.2: defining the dataset from a uniform resource identifier (URI) and a schema, the data management platform creating or specifying the dataset with a create command, wherein the command supplies the dataset URI, the specified schema and the partition-field JSON.
3. The large-scale industrial data compression storage method according to claim 2, wherein generating the dataset partition-strategy JSON file comprises:
step 3.1.1: specifying the partition fields and their types;
step 3.1.2: designating the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition-strategy JSON.
4. The large-scale industrial data compression storage method according to claim 2, wherein the dataset is identified by a uniform resource identifier (URI),
and the address and storage mode of the stored data are obtained through the URI.
5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202010961819.XA 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium Active CN112214453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010961819.XA CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010961819.XA CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Publications (2)

Publication Number Publication Date
CN112214453A CN112214453A (en) 2021-01-12
CN112214453B true CN112214453B (en) 2021-10-01

Family

ID=74050285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010961819.XA Active CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Country Status (1)

Country Link
CN (1) CN112214453B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507013B (en) * 2021-02-07 2021-07-02 北京工业大数据创新中心有限公司 Industrial equipment data storage method and device
CN115017218B (en) * 2022-06-17 2024-01-30 中国电信股份有限公司 Processing method and device of distributed call chain, storage medium and electronic equipment
CN116719866B (en) * 2023-05-09 2024-02-13 上海银满仓数字科技有限公司 Multi-format data self-adaptive distribution method and system

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN107784039A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 A kind of data load method, apparatus and system
US11194551B2 (en) * 2017-06-07 2021-12-07 Ab Initio Technology Llc Dataflow graph configuration
US11042530B2 (en) * 2018-01-17 2021-06-22 International Business Machines Corporation Data processing with nullable schema information
CN110813783A (en) * 2019-10-29 2020-02-21 常州微亿智造科技有限公司 Appearance intelligent detection system based on manipulator
CN111125513A (en) * 2019-11-22 2020-05-08 博智安全科技股份有限公司 Recommendation system based on Spark
CN111324688A (en) * 2020-02-24 2020-06-23 南京莱斯网信技术研究院有限公司 Semi-structured data and unstructured data acquisition system based on events

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN103246671A (en) * 2012-02-09 2013-08-14 中兴通讯股份有限公司 Processing method and device for abstract syntax notation files
US10289739B1 (en) * 2014-05-07 2019-05-14 ThinkAnalytics System to recommend content based on trending social media topics
CN106294374A (en) * 2015-05-15 2017-01-04 北京国双科技有限公司 The method of small documents merging and data query system
US10592282B2 (en) * 2015-09-16 2020-03-17 Salesforce.Com, Inc. Providing strong ordering in multi-stage streaming processing
CN111046022A (en) * 2019-12-04 2020-04-21 山西云时代技术有限公司 Database auditing method based on big data technology
CN111625616A (en) * 2020-05-11 2020-09-04 苏州盈数智能科技有限公司 Enterprise-level data management system capable of realizing mass storage

Non-Patent Citations (1)

Title
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack; Geoffrey C. Fox et al.; 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing; 2015-07-09; pp. 1057-1066 *

Also Published As

Publication number Publication date
CN112214453A (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant