CN104615736A - Quick analysis and storage method of big data based on database - Google Patents

Publication number
CN104615736A
Authority
CN
China
Prior art keywords
data
list
thread
definition
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510070607.1A
Other languages
Chinese (zh)
Other versions
CN104615736B (en
Inventor
彭成志
刘钧钧
咸峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai chuangkin Mdt InfoTech Ltd
Original Assignee
Upper Seabird Scape Computer System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Upper Seabird Scape Computer System Co Ltd
Priority to CN201510070607.1A
Publication of CN104615736A
Application granted
Publication of CN104615736B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2282 - Tablespace storage structures; Management thereof
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2291 - User-Defined Types; Storage management thereof

Abstract

A database-based method for fast parsing and storage of big data, comprising: step 1, defining a data format in a file according to the data content actually to be parsed; step 2, reading the data format from the file into a data structure constructed in memory, which serves as the basis for data parsing; step 3, preparing for data parsing; step 4, parsing the data; step 5, storing the data; and step 6, after the list in step 5 has been stored, clearing the thread's result list, setting its processing-complete flag to false, setting its state to idle, returning it to the thread pool, and waiting for a new data block to be assigned. The method parses quickly, the parsed data structure is configurable and highly general, data storage redundancy is low, and the stored result data is convenient to analyze afterwards.

Description

Database-based method for fast parsing and storage of big data
Technical field
The present invention relates to computer data processing and, in particular, to the fast parsing of big data and the problem of analyzing it after the fact.
Background art
Many engineering projects must process large volumes of data generated at high speed. If the data cannot be processed in time, the effect on the whole software system can be fatal. Existing schemes for parsing big data essentially handle only fixed formats, so their extensibility and compatibility are very poor. Their processing speed on a single computer is also too low, because they do not make full use of existing techniques to exploit the computer's processing power, and they rely on distributed computing, which places high demands on network speed and stability. Nor do they provide a storage method that is both efficient and complete in the information it keeps, so extracting data for later processing is inflexible.
A search found that application No. 200810097594.7 discloses a method and system for processing massive data, which addresses the problem that massive data cannot be processed within a specified time, causing processing delays and ultimately a system crash. That method comprises: distributing source files to servers according to a source-file naming rule and splitting each source file into small files; and, for each small file obtained, distributing it to servers again according to a small-file naming rule and processing it. That invention can deploy multiple servers to split and process large data files simultaneously, which greatly improves the processing capacity of the system and ensures that files are fully processed within the specified time. The system also scales very well: when files grow larger or more numerous, demand can be met by adding servers, that is, it scales linearly, without buying higher-end servers or redeploying servers that are already running. It nevertheless has the following technical defects:
1. Generality: the files in the above patent must use a set data format, so extensibility and generality are poor;
2. Processing efficiency: the design logic of the above patent centers on how to split and combine large files, and the corresponding disk reads and writes and network transfers consume a great deal of time;
3. Data integrity: the above patent adopts a client-server model and requires distributed computing, yet the processing on each individual client does not make full use of the computer's resources; if the network connection fails, unpredictable delays or data loss may follow, with serious consequences.
Summary of the invention
In view of the technical problems in the prior art described above, the present invention provides a database-based method for fast parsing and storage of big data. It provides a data format definition method, yielding a general-purpose data parsing method; a method for processing data quickly, so that large volumes of data are processed efficiently per unit time; and an efficient data storage method that facilitates offline analysis.
To achieve the above object, the technical solution adopted by the present invention is as follows:
A database-based method for fast parsing and storage of big data, comprising the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file;
Step 2: read the data format from the file into memory and construct a data structure to serve as the basis for data parsing;
Step 3: prepare for data parsing: split the raw data into data blocks of a specified size; create a specified number of threads and save their identifiers in a list, which is regarded as a thread pool; and, for each thread, create a list for holding its parsing results;
Step 4: data parsing: distribute the data blocks split in step 3 to idle threads in the thread pool and, at the same time, assign sequence numbers to these threads according to the order of the data blocks; after taking its data, each thread parses it by matching against the sequential data item list in the data structure obtained in step 2, and stores the results in the result list created for it in step 3;
Step 5: data storage: create a data table T1 for storing the binary content of the serialized data structure corresponding to the data format definition obtained in step 2, and assign a unique data definition ID to this record; create a data table T2, associated with the data definition ID of T1, for storing the parsing results of the thread in step 4 whose processing-complete flag is set and whose sequence number is the smallest in the thread pool;
Step 6: after the list in step 5 has been stored, clear the thread's result list, set its processing-complete flag to false, set its state to idle, and return it to the thread pool to wait for a new data block to be assigned.
Step 1 is carried out as follows: the data format is written in Extensible Markup Language (XML) so that data of any format can be defined. Each data item in the data format defines its identifier, name, type, length, whether it is signed, and relative start position. If a data item can be subdivided into several data items, data sub-items are defined, each in the same way as a data item. For a data item whose length is not fixed, a calculation formula is defined.
The calculation formula may include ordinary mathematical expressions, or references to the lengths of other data items, so that the actual length of the data item can be computed dynamically from the concrete data content.
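The patent does not fix a syntax for the calculation formula. As a minimal sketch, assuming formulas are plain integer arithmetic over item identifiers, a length could be evaluated as follows. The function name `eval_length`, the dictionary of known values, and the allowed operator set are illustrative assumptions, not part of the disclosed method.

```python
import ast
import operator

# Arithmetic operators allowed in a length formula (an assumed, minimal set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

def eval_length(formula, known):
    """Evaluate a length formula such as 'header_len * 2 + 4'.

    Names in the formula are looked up in `known`, which maps the identifiers
    of other data items to integers (hypothetical convention).
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return known[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        # Anything else (calls, attributes, ...) is rejected.
        raise ValueError("unsupported expression in length formula")
    return walk(ast.parse(formula, mode="eval"))

length = eval_length("header_len * 2 + 4", {"header_len": 8})
```

Walking the syntax tree instead of calling `eval` keeps a user-editable format file from running arbitrary code.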
Step 2 is carried out as follows: the data structure preserves the tree structure among the data items defined in the data format definition file, and also organizes this tree into a sequential list of data items. The tree makes it easy to extract the content of a data item that contains sub-items once parsing is complete, while the sequential list allows the data to be parsed quickly by matching item by item.
In step 3, if the raw data is stored on disk, it is read into memory through a file mapping mechanism and then split, which improves the data read speed.
In step 4, because the length of every data item in each parsing result has been determined by this point, every record in the result list carries the actual lengths of all variable-length fields in that record, together with a binary data stream that holds the result laid out in the order of the sequential data item list of the data definition. To use a data item, it suffices to extract the corresponding number of bytes from this binary stream according to the item's length and start position. Once the data block assigned to a thread has been parsed, the processing-complete flag of the thread and its list is set to true.
Step 5 is carried out as follows: first create a data table T1; serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, store the binary content of that file in the table, and assign a unique data definition ID to the record. Then create a data table T2 for the parsing results; in T2 define a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result. Examine the thread information from step 4: if a thread's processing-complete flag is set and its sequence number is the smallest in the thread pool, store its result list in T2. When storing, set the foreign key of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and store the record's variable-length field lengths and binary data stream in the corresponding fields of T2.
The data stored in step 5 is extracted as follows: read the corresponding binary data from T1 into a temporary file and deserialize the data structure definition from that file; then take the binary data stream from T2 and match it, in order, against the data structure by data identifier to obtain the content of each field.
Compared with the massive data processing method and system disclosed in prior-art application 200810097594.7, the above technical solution of the present invention has the following advantages:
1. Generality: the present invention extracts the data feature information into a dedicated file. Users can add, delete, and modify the data feature descriptions in this file, and the software parses data according to the rules the file lays down, which gives good extensibility;
2. Processing efficiency: the file mapping mechanism greatly reduces disk read and write time, and a thread pool processes the data with multiple threads at the same time, so processing is fast and efficient;
3. Data integrity: the present invention uses the thread pool and the file mapping mechanism to make full use of the processing power of a single computer, so data integrity is assured.
The beneficial effects of the present invention are as follows:
1) parsing is fast;
2) the parsed data structure is configurable and highly general;
3) parsing results are stored in a database with little storage redundancy;
4) the stored result data is convenient to analyze afterwards.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawing:
Fig. 1 is a flow chart of the method provided by the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to a specific embodiment. The following embodiment will help those skilled in the art to understand the invention further, but does not limit it in any way. It should be noted that those of ordinary skill in the art can make variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention.
As shown in Fig. 1, the database-based method for fast parsing and storage of big data provided by the present invention comprises the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file.
The data format is written in Extensible Markup Language (XML) so that data of any format can be defined. Each data item in the data format defines its identifier, name, type, length, whether it is signed, relative start position, and so on. If a data item can be subdivided into several data items, data sub-items are defined, each in the same way as a data item. For a data item whose length is not fixed, a calculation formula is defined. The formula may include ordinary mathematical expressions as well as references to the lengths of other data items, so that the actual length of the data item can be computed dynamically from the concrete data content.
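The patent names the attributes a data item carries but gives no concrete schema. The sketch below shows what such a definition file might look like and how it could be loaded with Python's standard `xml.etree.ElementTree`. Every element and attribute name here (`format`, `item`, `id`, `lengthFormula`, and so on) is a hypothetical choice made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical format definition; element and attribute names are assumptions.
FORMAT_XML = """
<format>
  <item id="hdr" name="Header" type="struct" offset="0">
    <item id="len" name="PayloadLength" type="uint" length="2" signed="false" offset="0"/>
    <item id="kind" name="Kind" type="uint" length="1" signed="false" offset="2"/>
  </item>
  <item id="payload" name="Payload" type="bytes" lengthFormula="len" offset="3"/>
</format>
"""

def load_items(xml_text):
    """Return the top-level <item> elements of a format definition."""
    root = ET.fromstring(xml_text)
    return list(root.findall("item"))

items = load_items(FORMAT_XML)
for item in items:
    # Sub-items are nested <item> elements, defined the same way as items.
    subitems = [child.get("id") for child in item.findall("item")]
    print(item.get("id"), item.get("length") or item.get("lengthFormula"), subitems)
```

In this sketch the `payload` item has no fixed `length`; its `lengthFormula` refers to the `len` item, mirroring the variable-length case described above.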
Step 2: read the data format from the file into memory and construct a data structure to serve as the basis for data parsing.
This data structure preserves the tree structure among the data items defined in the data format definition file, and also organizes that tree into a sequential list of data items. The tree makes it easy to extract the content of a data item that contains sub-items once parsing is complete, and the sequential list allows the data to be parsed quickly by matching item by item.
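One way to keep both views is to hold the items as a tree and derive the sequential list by a depth-first walk. The sketch below assumes that only leaf items occupy bytes and therefore only leaves appear in the sequential list. The patent does not say whether container items are listed too, and the `DataItem` class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataItem:
    ident: str
    length: int = 0                      # 0 for a container of sub-items
    children: List["DataItem"] = field(default_factory=list)

def flatten(items):
    """Return leaf items in definition order (depth-first, left to right)."""
    ordered = []
    for item in items:
        if item.children:
            ordered.extend(flatten(item.children))
        else:
            ordered.append(item)
    return ordered

# The tree is kept for extracting whole items with sub-items later on.
tree = [
    DataItem("hdr", children=[DataItem("len", 2), DataItem("kind", 1)]),
    DataItem("crc", 4),
]
# The sequential list drives item-by-item matching during parsing.
ordered = flatten(tree)
```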
Step 3: prepare for data parsing by splitting the raw data into data blocks of a specified size.
If the raw data is stored on disk, it is read into memory through a file mapping mechanism and then split, which improves the data read speed. A specified number of threads is created and their identifiers are saved in a list, which can be regarded as a thread pool. For each thread, a list is created to hold its parsing results.
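A rough illustration of this preparation step, using Python's `mmap` module as the file mapping mechanism. The helper name `split_mapped_file`, the block size, and the dictionary layout of the pool entries are assumptions. The patent also does not say how records that straddle a block boundary are handled, and this sketch does not address that either.

```python
import mmap
import os
import tempfile

def split_mapped_file(path, block_size):
    """Map a file into memory and return its blocks as (sequence_no, bytes)."""
    blocks = []
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for seq, start in enumerate(range(0, len(view), block_size)):
                # Slicing copies the bytes out of the mapping.
                blocks.append((seq, view[start:start + block_size]))
    return blocks

# Demonstration on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(10)))
blocks = split_mapped_file(tmp.name, 4)
os.unlink(tmp.name)

# One result list per worker; the list of workers plays the role of the pool.
THREAD_COUNT = 3
pool = [{"id": n, "idle": True, "done": False, "results": []}
        for n in range(THREAD_COUNT)]
```

A real implementation would more likely hand each worker an offset range into the mapping rather than copying every block up front. The copy is used here only to keep the example short.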
Step 4: data parsing.
The data blocks split in step 3 are distributed to idle threads in the thread pool, and at the same time these threads are assigned sequence numbers according to the order of the data blocks. After taking its data, each thread parses it by matching against the sequential data item list in the data structure obtained in step 2, and stores the results in the result list created for it in step 3.
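The role of the sequence numbers can be shown with a short sketch. It substitutes Python's `concurrent.futures.ThreadPoolExecutor` for the patent's own list-of-threads pool, and `parse_block` is a stand-in parser that merely cuts each block into two-byte pieces. Both are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_block(seq, block):
    """Stand-in parser: returns the block's sequence number and its results."""
    return seq, [block[i:i + 2] for i in range(0, len(block), 2)]

blocks = [b"aabb", b"ccdd", b"ee"]

# Each submitted task carries the sequence number of its block.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(parse_block, seq, blk)
               for seq, blk in enumerate(blocks)]
    results = dict(f.result() for f in futures)

# Reassembling by sequence number restores the original data order,
# whichever thread happened to finish first.
merged = [item for seq in sorted(results) for item in results[seq]]
```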
Because the length of every data item in each parsing result has been determined by this point, every record in the result list carries the actual lengths of all variable-length fields in that record, together with a binary data stream that holds the result laid out in the order of the sequential data item list of the data definition. To use a data item, it suffices to extract the corresponding number of bytes from this binary stream according to the item's length and start position. After the data block assigned to a thread has been parsed, the processing-complete flag of the thread and its list is set to true.
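To make the record layout concrete, the sketch below assumes a hypothetical three-item format: a 1-byte length field, a payload of that many bytes, and a 2-byte code. Each parsed record keeps the list of variable-length field lengths next to its binary stream, and an item is later recovered by slicing at its start position. The function names and the format are illustrative only.

```python
def parse_record(data, pos=0):
    """Parse one record: 1-byte length field, payload, then a 2-byte code.

    Returns (var_lengths, stream, next_pos). The layout is hypothetical.
    """
    n = data[pos]                      # value of the fixed 1-byte length field
    end = pos + 1 + n + 2
    if end > len(data):
        raise ValueError("truncated record")
    return [n], bytes(data[pos:end]), end

def extract(var_lengths, stream, name):
    """Slice one item out of a stream using its start position and length."""
    n = var_lengths[0]
    layout = {"len": (0, 1), "payload": (1, n), "code": (1 + n, 2)}
    start, length = layout[name]
    return stream[start:start + length]

raw = bytes([3]) + b"abc" + b"\x01\x02" + bytes([1]) + b"z" + b"\x00\x07"
records, pos = [], 0
while pos < len(raw):
    lengths, stream, pos = parse_record(raw, pos)
    records.append((lengths, stream))

first_payload = extract(*records[0], "payload")
```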
Step 5: data storage.
First create a data table T1; serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, store the binary content of that file in the table, and assign a unique data definition ID to the record. Then create a data table T2 for the parsing results; in T2 define a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result. Examine the thread information from step 4: if a thread's processing-complete flag is set and its sequence number is the smallest in the thread pool, store its result list in T2. When storing, set the foreign key (the data definition ID) of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and store the record's variable-length field lengths and binary data stream in the corresponding fields of T2.
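The two-table layout can be sketched with SQLite. The column names, the use of `pickle` for serialization, the comma-separated encoding of the length list, and the `store_in_order` helper are all assumptions. The sketch also writes the serialized definition straight into T1 instead of going through a temporary file, and it tracks the next expected sequence number as a simple way of storing the lowest-numbered finished thread first.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE T1 (def_id INTEGER PRIMARY KEY, definition BLOB NOT NULL)")
conn.execute(
    "CREATE TABLE T2 ("
    " rec_id INTEGER PRIMARY KEY,"
    " def_id INTEGER NOT NULL REFERENCES T1(def_id),"
    " var_lengths TEXT NOT NULL,"
    " stream BLOB NOT NULL)"
)

# Serialized format definition (pickle stands in for the patent's serialization).
definition = [("len", 1), ("payload", None), ("code", 2)]
def_id = conn.execute(
    "INSERT INTO T1 (definition) VALUES (?)", (pickle.dumps(definition),)
).lastrowid

def store_in_order(workers, next_seq):
    """Store finished result lists strictly in block-sequence order."""
    while True:
        ready = [w for w in workers if w["done"] and w["seq"] == next_seq]
        if not ready:
            return next_seq            # the next block is not finished yet
        conn.executemany(
            "INSERT INTO T2 (def_id, var_lengths, stream) VALUES (?, ?, ?)",
            [(def_id, ",".join(map(str, lens)), stream)
             for lens, stream in ready[0]["results"]],
        )
        next_seq += 1

workers = [
    {"seq": 1, "done": True, "results": [([1], b"\x01z\x00\x07")]},
    {"seq": 0, "done": True, "results": [([3], b"\x03abc\x01\x02")]},
]
next_seq = store_in_order(workers, 0)
conn.commit()
```

Because the format definition is stored once in T1 and referenced by ID, each T2 row needs only the length list and the raw stream.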
A database stored in this way records the data structure of the data in full, and the data records carry little redundant information. Later analysis does not depend on any other configuration file: it is only necessary to read the corresponding binary data from T1 into a temporary file, deserialize the data structure definition from that file, and then take the binary data stream from T2 and match it, in order, against the data structure by data identifier to obtain the content of each field. Data extraction is therefore very simple.
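The extraction path can be illustrated in the same spirit. The sketch assumes a definition stored as a pickled list of (identifier, fixed length or None) pairs and a comma-separated list of variable lengths, neither of which is prescribed by the patent. It deserializes directly from the BLOB instead of a temporary file.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (def_id INTEGER PRIMARY KEY, definition BLOB)")
conn.execute("CREATE TABLE T2 (def_id INTEGER, var_lengths TEXT, stream BLOB)")
# Each entry: (item id, fixed length, or None when the length is variable).
conn.execute("INSERT INTO T1 VALUES (1, ?)",
             (pickle.dumps([("len", 1), ("payload", None), ("code", 2)]),))
conn.execute("INSERT INTO T2 VALUES (1, '3', ?)", (b"\x03abc\x01\x02",))

def read_records(conn):
    """Yield each stored record as a dict of item id -> raw bytes."""
    query = ("SELECT T1.definition, T2.var_lengths, T2.stream "
             "FROM T2 JOIN T1 ON T1.def_id = T2.def_id")
    for blob, lengths_text, stream in conn.execute(query):
        definition = pickle.loads(blob)      # rebuild the ordered item list
        var_lengths = iter(int(x) for x in lengths_text.split(",") if x)
        fields, pos = {}, 0
        for ident, fixed in definition:
            # Variable-length items take their length from the stored list.
            length = fixed if fixed is not None else next(var_lengths)
            fields[ident] = stream[pos:pos + length]
            pos += length
        yield fields

decoded = list(read_records(conn))
```

`pickle.loads` should only be applied to data the application wrote itself; a production design might prefer a safer serialization format.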
Step 6: after a thread's result list has been stored, clear the result list, set the processing-complete flag to false, set the thread's state to idle, and return it to the thread pool to wait for a new data block to be assigned.
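The recycling step amounts to resetting a small amount of per-thread state. In the sketch below, the `Worker` class and the `recycle` and `assign` helpers are hypothetical names. Actual thread objects and locking are omitted so that only the bookkeeping is shown.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    ident: int
    seq: int = -1                 # sequence number of the assigned block
    idle: bool = True
    done: bool = False            # processing-complete flag
    results: list = field(default_factory=list)

def recycle(worker):
    """Reset a worker after its results have been stored (step 6)."""
    worker.results.clear()
    worker.done = False
    worker.idle = True
    worker.seq = -1

def assign(pool, seq):
    """Give block number `seq` to the first idle worker, or return None."""
    for worker in pool:
        if worker.idle:
            worker.idle, worker.seq = False, seq
            return worker
    return None

pool = [Worker(0), Worker(1)]
first = assign(pool, 0)
first.results.append(b"parsed")
first.done = True
recycle(first)                    # the worker is available for a new block
```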
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (8)

1. A database-based method for fast parsing and storage of big data, characterized in that it comprises the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file;
Step 2: read the data format from the file into memory and construct a data structure to serve as the basis for data parsing;
Step 3: prepare for data parsing: split the raw data into data blocks of a specified size; create a specified number of threads and save their identifiers in a list, which is regarded as a thread pool; and, for each thread, create a list for holding its parsing results;
Step 4: data parsing: distribute the data blocks split in step 3 to idle threads in the thread pool and, at the same time, assign sequence numbers to these threads according to the order of the data blocks; after taking its data, each thread parses it by matching against the sequential data item list in the data structure obtained in step 2, and stores the results in the result list created for it in step 3;
Step 5: data storage: create a data table T1 for storing the binary content of the serialized data structure corresponding to the data format definition obtained in step 2, and assign a unique data definition ID to this record; create a data table T2, associated with the data definition ID of T1, for storing the parsing results of the thread in step 4 whose processing-complete flag is set and whose sequence number is the smallest in the thread pool;
Step 6: after the list in step 5 has been stored, clear the thread's result list, set its processing-complete flag to false, set its state to idle, and return it to the thread pool to wait for a new data block to be assigned.
2. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that step 1 is carried out as follows: the data format is written in Extensible Markup Language (XML) so that data of any format can be defined; each data item in the data format defines its identifier, name, type, length, whether it is signed, and relative start position; if a data item can be subdivided into several data items, data sub-items are defined, each in the same way as a data item; and for a data item whose length is not fixed, a calculation formula is defined.
3. The database-based method for fast parsing and storage of big data according to claim 2, characterized in that the calculation formula includes ordinary mathematical expressions, or references to the lengths of other data items, so that the actual length of the data item is computed dynamically from the concrete data content.
4. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that step 2 is carried out as follows: the data structure preserves the tree structure among the data items defined in the data format definition file, and also organizes this tree into a sequential list of data items, so that the content of a data item containing sub-items can be extracted easily once parsing is complete, and the data can be parsed quickly by matching item by item against the sequential list.
5. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that, in step 3, if the raw data is stored on disk, it is read into memory through a file mapping mechanism and then split, to improve the data read speed.
6. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that, in step 4, because the length of every data item in each parsing result has been determined, every record in the result list carries the actual lengths of all variable-length fields in that record, together with a binary data stream that holds the result laid out in the order of the sequential data item list of the data definition; to use a data item, it suffices to extract the corresponding number of bytes from this binary stream according to the item's length and start position; and once the data block assigned to a thread has been parsed, the processing-complete flag of the thread and its list is set to true.
7. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that step 5 is carried out as follows: first create a data table T1; serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, store the binary content of that file in the table, and assign a unique data definition ID to the record; then create a data table T2 for the parsing results, and in T2 define a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result; examine the thread information from step 4, and if a thread's processing-complete flag is set and its sequence number is the smallest in the thread pool, store its result list in T2; when storing, set the foreign key of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and store the record's variable-length field lengths and binary data stream in the corresponding fields of T2.
8. The database-based method for fast parsing and storage of big data according to claim 7, characterized in that the data stored in step 5 is extracted as follows: read the corresponding binary data from T1 into a temporary file and deserialize the data structure definition from that file; then take the binary data stream from T2 and match it, in order, against the data structure by data identifier to obtain the content of each field.
CN201510070607.1A 2015-02-10 2015-02-10 Big data fast resolving storage method based on database Active CN104615736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510070607.1A CN104615736B (en) 2015-02-10 2015-02-10 Big data fast resolving storage method based on database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510070607.1A CN104615736B (en) 2015-02-10 2015-02-10 Big data fast resolving storage method based on database

Publications (2)

Publication Number Publication Date
CN104615736A true CN104615736A (en) 2015-05-13
CN104615736B CN104615736B (en) 2017-10-27

Family

ID=53150178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510070607.1A Active CN104615736B (en) 2015-02-10 2015-02-10 Big data fast resolving storage method based on database

Country Status (1)

Country Link
CN (1) CN104615736B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582064A (en) * 2008-05-15 2009-11-18 阿里巴巴集团控股有限公司 Method and system for processing enormous data
US20140101178A1 (en) * 2012-10-08 2014-04-10 Bmc Software, Inc. Progressive analysis for big data
CN103778148A (en) * 2012-10-23 2014-05-07 阿里巴巴集团控股有限公司 Life cycle management method and equipment for data file of Hadoop distributed file system
CN104317800A (en) * 2014-09-19 2015-01-28 山东大学 Hybrid storage system and method for mass intelligent power utilization data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张静: ""解析大数据"", 《电脑开发与应用》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017028690A1 (en) * 2015-08-14 2017-02-23 阿里巴巴集团控股有限公司 File processing method and system based on etl
CN105528425A (en) * 2015-12-08 2016-04-27 普元信息技术股份有限公司 Method of implementing asynchronous data storage based on files in cloud computing environment
CN107092607B (en) * 2016-02-18 2021-04-23 中国移动通信集团安徽有限公司 Ticket storage method and device
CN107092607A (en) * 2016-02-18 2017-08-25 中国移动通信集团安徽有限公司 A kind of bill storage method and device
CN106055450A (en) * 2016-05-20 2016-10-26 北京神州绿盟信息安全科技股份有限公司 Binary log analysis method and apparatus
CN106055450B (en) * 2016-05-20 2019-07-02 北京神州绿盟信息安全科技股份有限公司 A kind of binary log analysis method and device
CN107977341A (en) * 2016-10-21 2018-05-01 北京航天爱威电子技术有限公司 Big data text immediate processing method
CN106528893A (en) * 2016-12-26 2017-03-22 北京奇虎科技有限公司 Data synchronization method and device
CN106528893B (en) * 2016-12-26 2020-01-10 北京奇虎科技有限公司 Data synchronization method and device
CN106709059B (en) * 2017-01-10 2020-06-19 南方电网科学研究院有限责任公司 Terminal online rate index monitoring method and device based on metering automation system
CN106709059A (en) * 2017-01-10 2017-05-24 南方电网科学研究院有限责任公司 Monitoring method and device for terminal online rate indexes based on metering automation system
CN107798122A (en) * 2017-11-10 2018-03-13 中国航空工业集团公司西安飞机设计研究所 A kind of unstructured data analytic method
CN107798122B (en) * 2017-11-10 2021-08-17 中国航空工业集团公司西安飞机设计研究所 Unstructured data analysis method
CN107992561A (en) * 2017-11-29 2018-05-04 四川巧夺天工信息安全智能设备有限公司 A kind of method of long field in parsing EDB database source files
CN108090137A (en) * 2017-11-29 2018-05-29 四川巧夺天工信息安全智能设备有限公司 A kind of method for parsing long field in EDB database source files
CN108090137B (en) * 2017-11-29 2021-11-26 四川巧夺天工信息安全智能设备有限公司 Method for analyzing overlong fields in EDB database source file
CN108763235A (en) * 2018-02-13 2018-11-06 阿里巴巴集团控股有限公司 A kind of document handling method, device and equipment
CN110866010A (en) * 2019-10-30 2020-03-06 苏州伽顿全盛信息科技有限公司 Formatted order information extraction method and device
CN110866010B (en) * 2019-10-30 2023-05-23 苏州伽顿全盛信息科技有限公司 Formatted order information extraction method and device
CN113282609A (en) * 2021-06-11 2021-08-20 东莞市盟大塑化科技有限公司 Intelligent data analysis method based on big data technology

Also Published As

Publication number Publication date
CN104615736B (en) 2017-10-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 201203 Shanghai City, Pudong New Area Chinese (Shanghai) free trade zone 498 GuoShouJing Road No. 14 building block 22301-985

Patentee after: Shanghai chuangkin Mdt InfoTech Ltd

Address before: 201203 Shanghai Guo Shou Jing Road, Zhangjiang High Tech Park of Pudong New Area No. 498 Pudong Software Park building 14, block 22301-985

Patentee before: Upper SeaBird scape computer system company limited