CN104615736A - Quick analysis and storage method of big data based on database - Google Patents
- Publication number
- CN104615736A (application No. CN201510070607.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- list
- thread
- definition
- item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
Abstract
A database-based method for fast parsing and storage of big data comprises the following steps: 1, according to the data content actually to be parsed, defining a data format in a file; 2, reading the data format from the file into a data structure constructed in memory, as the basis for data parsing; 3, preparing for data parsing; 4, parsing the data; 5, storing the data; 6, after a thread's results list has been stored in step 5, clearing the results list, setting the thread's processing-completion flag to false, setting the thread's status to idle, returning the thread to the thread pool, and waiting for a new data block to be allocated. The advantages of the method are that parsing is fast, the parsed data structure is configurable and highly general, data storage redundancy is low, and the stored result data are easy to analyze afterwards.
Description
Technical field
The present invention relates to computer data processing, and in particular to the fast parsing of big data and the analysis of that data after the fact.
Background technology
Many engineering projects must process massive amounts of data produced at high speed, and if these data cannot be processed in time, the whole software system can be critically affected. Existing parsing schemes for big data essentially handle only fixed formats, so their compatibility and extensibility are very poor. Their processing speed on a single computer is not high enough, because they do not make full use of existing techniques to exploit the computer's processing power. Distributed computing, in turn, places high demands on network speed and stability. Finally, these schemes provide no efficient storage method that preserves complete information, so extracting the data for later processing is inflexible.
A prior-art search found application No. 200810097594.7, which discloses a method and system for processing massive data. It addresses the problem that massive data cannot be processed within a specified time, which causes processing delays and ultimately system crashes. The method comprises: assigning servers according to a source-file naming rule and splitting the source file into small files; and, for each small file produced by the split, assigning servers again according to a small-file naming rule and processing that small file. That invention can deploy multiple servers to split and process large data files simultaneously, greatly improving the system's processing capacity and ensuring that files are completely processed within the specified time. The system is also highly extensible: when files grow larger or more numerous, demand can be met simply by adding servers, i.e. it scales linearly without buying higher-end servers or redeploying servers already in operation. However, it has the following technical shortcomings:
1, Generality: the files in that patent must follow a set data format, so extensibility and generality are poor;
2, Processing efficiency: its design logic centers on splitting files and assembling large files, and the corresponding disk reads/writes and network transmission consume a great deal of time;
3, Data integrity: it uses a client-server model that requires distributed computing, yet a single client's processing of the data does not make full use of the computer's resources; if the network connection fails, unpredictable delays or data loss may result, with serious consequences.
Summary of the invention
To solve these technical problems of the prior art, the present invention provides a database-based method for fast parsing and storage of big data. It provides a data format definition method that enables general-purpose data parsing; a method for fast data processing that handles massive data efficiently per unit time; and an efficient data storage method that facilitates offline computation and analysis.
To achieve the above objects, the present invention adopts the following technical solution:
A database-based method for fast parsing and storage of big data comprises the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file;
Step 2: read the data format from the file into memory and construct a data structure from it, as the basis for data parsing;
Step 3: prepare for data parsing: cut the raw data into data blocks of a specified size; create a specified number of threads and save their identifiers in a list, which is regarded as a thread pool; for each thread, create a list for holding its parsing results;
Step 4: parse the data: distribute the data blocks cut in step 3 to idle threads in the thread pool, and at the same time assign these threads sequence numbers according to the order of the data blocks; after receiving its data, each thread parses it by matching against the ordered data item list in the data structure obtained in step 2, and stores the parsing results in the results list created for it in step 3;
Step 5: store the data: create a data table T1 for storing the binary content of the serialized data structure corresponding to the data format definition obtained in step 2, and assign this record a unique data definition ID; create a data table T2 that references the data definition ID of table T1, for storing the parsing results of the thread from step 4 whose processing-completion flag has been set and whose sequence number is the smallest in the thread pool;
Step 6: after that thread's results list has been stored in step 5, clear the list, set the processing-completion flag to false, set the thread's status to idle, return the thread to the thread pool, and wait for a new data block to be allocated.
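Steps 3 to 6 form a loop whose key property is that results reach storage in block order even though blocks are parsed concurrently. The following is a minimal Python sketch of that loop, assuming one long-lived thread per pool entry, a hypothetical `parse(block) -> list` callable for step 4 and a hypothetical `store(seq, results)` callable for step 5; the class and function names are illustrative, not the patent's. For brevity, `store` runs while the shared lock is held and parse errors are not handled.

```python
import queue
import threading
from typing import Callable, List, Optional

class Worker:
    """One thread-pool entry: a long-lived thread with its own results list,
    a sequence number and a processing-completion flag."""

    def __init__(self, parse: Callable[[bytes], list], cond: threading.Condition):
        self.inbox: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self.results: list = []
        self.seq: Optional[int] = None
        self.done = False
        self.idle = True
        self._parse, self._cond = parse, cond
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self) -> None:
        # None is the shutdown signal; anything else is a data block to parse.
        while (block := self.inbox.get()) is not None:
            parsed = self._parse(block)          # step 4: parse this block
            with self._cond:
                self.results.extend(parsed)
                self.done = True                 # completion flag -> true
                self._cond.notify_all()

def run_pipeline(blocks: List[bytes], parse: Callable[[bytes], list],
                 store: Callable[[int, list], None], n_threads: int = 4) -> None:
    cond = threading.Condition()
    pool = [Worker(parse, cond) for _ in range(n_threads)]   # step 3
    next_block = next_to_store = 0
    with cond:
        while next_to_store < len(blocks):
            # Step 4: give blocks to idle threads; the block index is the sequence number.
            for w in pool:
                if w.idle and next_block < len(blocks):
                    w.idle, w.seq = False, next_block
                    w.inbox.put(blocks[next_block])
                    next_block += 1
            # Step 5: only the finished thread with the smallest pending sequence
            # number may be stored, so results reach storage in block order.
            ready = [w for w in pool if w.done and w.seq == next_to_store]
            if not ready:
                cond.wait()
                continue
            w = ready[0]
            store(w.seq, w.results)
            # Step 6: clear the list, reset the flag, mark idle, return to the pool.
            w.results, w.done, w.idle, w.seq = [], False, True, None
            next_to_store += 1
    for w in pool:
        w.inbox.put(None)   # stop the long-lived threads
        w.thread.join()
```

Because a block's sequence number is its position in the input, storing only the completed thread with the smallest outstanding number reproduces the original record order in T2 without any sorting afterwards.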
The specific method of step 1 is as follows: the data format is written in Extensible Markup Language (XML) so that data in any format can be defined. Each data item in the format must define its identifier, name, type, length, whether it is signed, and its relative start position. If a data item can be further subdivided into multiple data items, data sub-items are defined, each in the same way as a data item. For a data item whose length is not fixed, a length formula is defined.
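The patent does not publish its XML schema, so the element and attribute names below (`format`, `item`, `id`, `name`, `type`, `length`, `signed`, `offset`, `length_expr`) are hypothetical. This sketch shows one way a definition file carrying the listed properties, with nested sub-items defined exactly like items, could be loaded into an in-memory tree using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical format-definition file; names are illustrative assumptions.
FORMAT_XML = """
<format name="sample_record">
  <item id="1" name="header" type="bytes" length="4" signed="false" offset="0">
    <item id="1.1" name="version" type="uint" length="1" signed="false" offset="0"/>
    <item id="1.2" name="flags" type="uint" length="1" signed="false" offset="1"/>
    <item id="1.3" name="payload_len" type="uint" length="2" signed="false" offset="2"/>
  </item>
  <item id="2" name="temperature" type="int" length="2" signed="true" offset="4"/>
  <item id="3" name="payload" type="bytes" length_expr="payload_len" offset="6"/>
</format>
"""

@dataclass
class ItemDef:
    item_id: str
    name: str
    data_type: str
    length: Optional[int]        # fixed length in bytes, or None if variable
    length_expr: Optional[str]   # length formula for variable-length items
    signed: bool
    offset: int                  # start position relative to the parent item
    children: List["ItemDef"] = field(default_factory=list)

def parse_item(elem: ET.Element) -> ItemDef:
    length = elem.get("length")
    item = ItemDef(
        item_id=elem.get("id"),
        name=elem.get("name"),
        data_type=elem.get("type"),
        length=int(length) if length is not None else None,
        length_expr=elem.get("length_expr"),
        signed=elem.get("signed", "false") == "true",
        offset=int(elem.get("offset", "0")),
    )
    # Sub-items are defined exactly like top-level items, so recurse.
    item.children = [parse_item(child) for child in elem.findall("item")]
    return item

def load_format(xml_text: str) -> List[ItemDef]:
    root = ET.fromstring(xml_text.strip())
    return [parse_item(child) for child in root.findall("item")]
```

Because the parser only reads attributes, users can add, remove or change item descriptions in the file without touching code, which is the generality the method claims.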
The formula may consist of ordinary mathematical expressions or may include references to the lengths of other data items, so that the actual length of the data item can be computed dynamically from the given data content.
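A length formula has to be evaluated safely at parse time. The sketch below assumes formulas use integer arithmetic (`+ - * // %`, unary minus) and bare names that refer to other data items; the `eval_length` function and the `known` mapping are hypothetical. It walks Python's `ast` tree instead of calling `eval`, so anything other than arithmetic on known names is rejected.

```python
import ast
import operator
from typing import Dict

# Operators permitted in a length formula (an assumption: the patent only
# says "ordinary mathematical expressions").
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}

def eval_length(expr: str, known: Dict[str, int]) -> int:
    """Evaluate a length formula such as 'payload_len * 2 + 4'.

    Names in the formula are looked up in `known`, which maps other data
    items to values already available when this item is reached.
    """
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in known:
                raise KeyError(f"formula references unknown item {node.id!r}")
            return known[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax in length formula: {expr!r}")

    length = walk(ast.parse(expr, mode="eval"))
    if length < 0:
        raise ValueError(f"formula {expr!r} produced negative length {length}")
    return length
```

For example, `eval_length("payload_len * 2 + 4", {"payload_len": 3})` gives 10.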
The specific method of step 2 is as follows: the data structure preserves the tree structure among the data defined in the data format definition file and also organizes this tree into an ordered list of data items. After the data have been parsed, the content of a data item, including its sub-items, can therefore be extracted easily, while during parsing the data can be parsed quickly by matching item by item against the ordered data item list.
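A minimal sketch of keeping both views: the tree is flattened into an ordered list of leaf items whose dotted paths still encode the hierarchy, records are matched item by item against that list, and a parent item is reassembled from its sub-items afterwards. The names `flatten`, `match_record` and `subitem_content` are hypothetical, and only fixed-length leaves are handled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ItemDef:
    name: str
    length: Optional[int] = None   # None for items defined only by their sub-items
    children: List["ItemDef"] = field(default_factory=list)

def flatten(items: List[ItemDef], prefix: str = "") -> List[Tuple[str, int]]:
    """Turn the definition tree into an ordered list of (path, length) leaves."""
    ordered = []
    for item in items:
        path = f"{prefix}{item.name}"
        if item.children:
            ordered.extend(flatten(item.children, path + "."))
        else:
            ordered.append((path, item.length))
    return ordered

def match_record(ordered: List[Tuple[str, int]], data: bytes,
                 pos: int = 0) -> Tuple[Dict[str, bytes], int]:
    """Match one record item by item; return field bytes keyed by path and the next position."""
    fields = {}
    for path, length in ordered:
        if pos + length > len(data):
            raise ValueError(f"record truncated at item {path!r}")
        fields[path] = data[pos:pos + length]
        pos += length
    return fields, pos

def subitem_content(fields: Dict[str, bytes], parent: str) -> bytes:
    """Reassemble a parent item from its sub-items using the preserved tree paths."""
    # Dicts keep insertion order, so sub-items are joined in definition order.
    return b"".join(v for k, v in fields.items() if k.startswith(parent + "."))
```

Matching against a flat list keeps the hot loop free of recursion, while the paths preserve enough of the tree to recover composite items.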
In step 3, if the raw data are stored on disk, they are read into memory and cut by means of a file mapping (memory-mapped file) mechanism, to increase the data reading speed.
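In Python the file mapping mechanism corresponds to the standard `mmap` module. This sketch maps a file read-only and cuts it into numbered blocks of a specified size. It assumes fixed-size records, so the block size is rounded down to a multiple of the hypothetical `record_size` to keep each record inside one block; the patent does not say how records crossing block boundaries are handled. Slicing the map copies each block, but the OS pages the file in on demand rather than reading it through buffered I/O.

```python
import mmap
import os
from typing import Iterator, Tuple

def split_mapped_file(path: str, block_size: int,
                      record_size: int = 1) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, block) pairs from a memory-mapped file."""
    # Assumption: fixed-size records; rounding the block size down to a
    # multiple of record_size keeps every record inside a single block.
    block_size -= block_size % record_size
    if block_size <= 0:
        raise ValueError("block_size must be at least one record long")
    if os.path.getsize(path) == 0:
        return  # mmap cannot map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for seq, start in enumerate(range(0, len(mm), block_size)):
            # The sequence number records the block's position in the input.
            yield seq, mm[start:start + block_size]
```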
In step 4, once the data have been parsed, the length of every data item in each parsing result is known. Each record in the results list therefore carries the actual lengths of all variable-length fields in the record, together with a binary data stream containing that result laid out according to the ordered data item list of the data definition. To use any data item, one only needs to extract the corresponding number of bytes from the binary data stream, based on the item's length and start position. After the data block allocated to a thread has been parsed, the processing-completion flag of that thread and its list is set to true.
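The record layout above stores only the lengths of the variable-length fields plus one binary stream, because every start position can be recomputed from the ordered item list. A minimal sketch, assuming the layout is a list of `(name, fixed_length_or_None)` pairs; `pack_record` and `extract_item` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Ordered item list: (name, fixed length in bytes, or None if variable).
Layout = List[Tuple[str, Optional[int]]]

def pack_record(layout: Layout, values: Dict[str, bytes]) -> Tuple[List[int], bytes]:
    """Return (actual lengths of the variable-length items, binary stream)."""
    var_lengths, parts = [], []
    for name, fixed in layout:
        raw = values[name]
        if fixed is None:
            var_lengths.append(len(raw))
        elif len(raw) != fixed:
            raise ValueError(f"{name}: expected {fixed} bytes, got {len(raw)}")
        parts.append(raw)
    return var_lengths, b"".join(parts)

def extract_item(layout: Layout, var_lengths: List[int], stream: bytes, name: str) -> bytes:
    """Slice one item out of the stream using its start position and length."""
    pending = iter(var_lengths)   # variable lengths are consumed in layout order
    start = 0
    for item_name, fixed in layout:
        length = fixed if fixed is not None else next(pending)
        if item_name == name:
            return stream[start:start + length]
        start += length
    raise KeyError(name)
```

For a layout `[("id", 4), ("text", None), ("flag", 1)]`, only one integer (the length of `text`) has to be stored alongside the stream.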
The specific method of step 5 is as follows: first create a data table T1; serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, then store the binary content of the file in this table and assign the record a unique data definition ID. Then create a data table T2 for holding the parsing results: in T2, define a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result. Examine the thread information from step 4: if a thread's processing-completion flag is set and its sequence number is the smallest in the thread pool, store its results list in data table T2. When storing, set the foreign key of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and store each record's variable-length field lengths and binary data stream in the corresponding fields of T2.
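A minimal sketch of the two tables with Python's built-in `sqlite3` module. The patent names no database engine, serialization format or column names, so SQLite, `pickle`, the column names (`def_id`, `definition`, `var_lengths`, `stream`) and storing the lengths as JSON text are all assumptions; the temporary-file step is also skipped by serializing straight to bytes. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import json
import pickle
import sqlite3
from typing import Any, Iterable, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS T1 (
    def_id     INTEGER PRIMARY KEY AUTOINCREMENT,        -- unique data definition ID
    definition BLOB NOT NULL                             -- serialized format definition
);
CREATE TABLE IF NOT EXISTS T2 (
    rec_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    def_id      INTEGER NOT NULL REFERENCES T1(def_id),  -- foreign key to T1
    var_lengths TEXT NOT NULL,                           -- actual lengths of variable fields
    stream      BLOB NOT NULL                            -- binary data stream of one result
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    return conn

def store_definition(conn: sqlite3.Connection, definition: Any) -> int:
    """Serialize the in-memory data structure into T1 and return its definition ID."""
    cur = conn.execute("INSERT INTO T1 (definition) VALUES (?)", (pickle.dumps(definition),))
    return cur.lastrowid

def store_results(conn: sqlite3.Connection, def_id: int,
                  results: Iterable[Tuple[List[int], bytes]]) -> None:
    """Store one thread's results list in T2, each row pointing at the T1 definition."""
    conn.executemany(
        "INSERT INTO T2 (def_id, var_lengths, stream) VALUES (?, ?, ?)",
        [(def_id, json.dumps(lengths), stream) for lengths, stream in results],
    )
    conn.commit()
```

Storing the definition once in T1 and referencing it from every T2 row is what keeps redundancy low: each result row carries only its lengths and raw bytes.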
The data stored in step 5 are extracted as follows: read the corresponding binary data from data table T1 into a temporary file, then deserialize the temporary file to obtain the data structure definition; then take the binary data streams from data table T2 and match them in order, by data identifier, against the obtained data structure to recover the content of each field.
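Extraction is the reverse of storage: deserialize the definition from T1, then walk each T2 stream against it in order. This sketch assumes the same hypothetical SQLite schema and column names as a T1/T2 layout with `def_id`, `definition`, `var_lengths` (JSON text) and `stream`, and a definition pickled as a list of `(name, fixed_length_or_None)` pairs; it reads the blob directly instead of going through a temporary file. Only unpickle data from a database you trust, because `pickle` can execute code.

```python
import json
import pickle
import sqlite3
from typing import Dict, Iterator, List, Optional, Tuple

def load_definition(conn: sqlite3.Connection, def_id: int) -> List[Tuple[str, Optional[int]]]:
    """Read the serialized definition back from T1 and deserialize it."""
    row = conn.execute("SELECT definition FROM T1 WHERE def_id = ?", (def_id,)).fetchone()
    if row is None:
        raise KeyError(f"no data definition with ID {def_id}")
    return pickle.loads(row[0])   # trusted data only

def decode_records(conn: sqlite3.Connection, def_id: int) -> Iterator[Dict[str, bytes]]:
    """Match each T2 stream against the definition, in order, to recover every field."""
    layout = load_definition(conn, def_id)
    rows = conn.execute(
        "SELECT var_lengths, stream FROM T2 WHERE def_id = ? ORDER BY rec_id", (def_id,)
    )
    for var_lengths_json, stream in rows:
        pending = iter(json.loads(var_lengths_json))   # lengths of variable fields, in order
        record, pos = {}, 0
        for name, fixed in layout:
            length = fixed if fixed is not None else next(pending)
            record[name] = stream[pos:pos + length]
            pos += length
        yield record
```

No external configuration file is needed: everything required to decode T2 lives in T1.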
Compared with the massive data processing method and system disclosed in prior-art application 200810097594.7, the above technical solution of the present invention has the following advantages:
1, Generality: the present invention extracts the data characteristic information into a specific file, in which users can add, delete, or change data feature descriptions, and the software parses data according to the rules set in this file, so it is highly extensible;
2, Processing efficiency: the file mapping mechanism greatly reduces disk read/write time, and the thread pool processes data with multiple synchronized threads, so processing is fast and efficient;
3, Data integrity: the present invention uses the thread pool and the file mapping mechanism to make full use of a single computer's processing power, which ensures data integrity.
The beneficial effects of the present invention are as follows:
1) parsing is fast;
2) the parsed data structure is configurable and highly general;
3) parsing results are stored in a database with little data storage redundancy;
4) the stored result data are easy to analyze afterwards.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent on reading the following detailed description of non-limiting embodiments with reference to the drawings:
Fig. 1 is a flowchart of the method provided by the present invention.
Embodiment
The present invention is described in detail below with reference to a specific embodiment. The following embodiment will help those skilled in the art to further understand the invention but does not limit it in any form. It should be noted that those skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention.
As shown in Fig. 1, the database-based method for fast parsing and storage of big data provided by the present invention specifically comprises the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file.
The data format is written in Extensible Markup Language (XML) so that data in any format can be defined. Each data item in the format must define its identifier, name, type, length, whether it is signed, its relative start position, and so on. If a data item can be further subdivided into multiple data items, data sub-items are defined, each in the same way as a data item. For a data item whose length is not fixed, a length formula is defined. The formula may consist of ordinary mathematical expressions and may also include references to the lengths of other data items, so that the actual length of the data item can be computed dynamically from the given data content.
Step 2: read the data format from the file into memory and construct a data structure from it, as the basis for data parsing.
This data structure preserves the tree structure among the data defined in the data format definition file and also organizes this tree into an ordered list of data items. After the data have been parsed, the content of a data item, including its sub-items, can therefore be extracted easily, and during parsing the data can be parsed quickly by matching item by item against the ordered data item list.
Step 3: prepare for data parsing by cutting the raw data into data blocks of a specified size.
If the raw data are stored on disk, they are read into memory and cut by means of the file mapping mechanism, which increases the data reading speed. A specified number of threads are created and their identifiers are saved in a list, which can be regarded as a thread pool. For each thread, a list is created for holding its parsing results.
Step 4: parse the data.
The data blocks cut in step 3 are distributed to idle threads in the thread pool, and at the same time these threads are assigned sequence numbers according to the order of the data blocks. After receiving its data, each thread parses it by matching against the ordered data item list in the data structure obtained in step 2, and stores the parsing results in the results list created for it in step 3.
Because the length of every data item in each parsing result is known at this point, each record in the results list carries the actual lengths of all variable-length fields in the record, together with a binary data stream containing that result laid out according to the ordered data item list of the data definition. To use any data item, one only needs to extract the corresponding number of bytes from the binary data stream, based on the item's length and start position. After the data block allocated to a thread has been parsed, the processing-completion flag of that thread and its list is set to true.
Step 5: store the data.
First create a data table T1: serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, then store the binary content of the file in this table and assign the record a unique data definition ID. Then create a data table T2 for holding the parsing results: in T2, define a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result. Examine the thread information from step 4: if a thread's processing-completion flag is set and its sequence number is the smallest in the thread pool, store its results list in data table T2. When storing, set the foreign key (data definition ID) of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and store each record's variable-length field lengths and binary data stream in the corresponding fields of T2.
The database produced by this storage step records the data structure completely and contains little redundant information in its data records. Later analysis does not depend on any other configuration file: one only needs to read the corresponding binary data from table T1 into a temporary file, deserialize it to obtain the data structure definition, and then match the binary data streams taken from table T2 in order, by data identifier, against this structure to obtain the content of each field, so data extraction is very simple.
Step 6: after a thread's results list has been stored, the list is cleared, the processing-completion flag is set to false, the thread's status is set to idle, and the thread is returned to the thread pool to wait for a new data block to be allocated.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular implementations; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.
Claims (8)
1. A database-based method for fast parsing and storage of big data, characterized in that it comprises the following steps:
Step 1: according to the data content actually to be parsed, define the data format in a file;
Step 2: read the data format from the file into memory and construct a data structure from it, as the basis for data parsing;
Step 3: prepare for data parsing: cut the raw data into data blocks of a specified size; create a specified number of threads and save their identifiers in a list, which is regarded as a thread pool; for each thread, create a list for holding its parsing results;
Step 4: parse the data: distribute the data blocks cut in step 3 to idle threads in the thread pool, and at the same time assign these threads sequence numbers according to the order of the data blocks; after receiving its data, each thread parses it by matching against the ordered data item list in the data structure obtained in step 2, and stores the parsing results in the results list created for it in step 3;
Step 5: store the data: create a data table T1 for storing the binary content of the serialized data structure corresponding to the data format definition obtained in step 2, and assign this record a unique data definition ID; create a data table T2 that references the data definition ID of table T1, for storing the parsing results of the thread from step 4 whose processing-completion flag has been set and whose sequence number is the smallest in the thread pool;
Step 6: after that thread's results list has been stored in step 5, clear the list, set the processing-completion flag to false, set the thread's status to idle, return the thread to the thread pool, and wait for a new data block to be allocated.
2. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that the specific method of step 1 is: the data format is written in Extensible Markup Language so that data in any format can be defined; each data item in the format defines its identifier, name, type, length, whether it is signed, and its relative start position; if a data item can be further subdivided into multiple data items, data sub-items are defined, each in the same way as a data item; and for a data item whose length is not fixed, a length formula is defined.
3. The database-based method for fast parsing and storage of big data according to claim 2, characterized in that the formula consists of ordinary mathematical expressions or includes references to the lengths of other data items, so that the actual length of the data item is computed dynamically from the given data content.
4. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that the specific method of step 2 is: the data structure preserves the tree structure among the data defined in the data format definition file and also organizes this tree into an ordered list of data items, so that after the data have been parsed the content of a data item, including its sub-items, can be extracted easily, and during parsing the data can be parsed quickly by matching item by item against the ordered data item list.
5. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that, in step 3, if the raw data are stored on disk, they are read into memory and cut by means of a file mapping mechanism, to increase the data reading speed.
6. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that, in step 4, because the length of every data item in each parsing result is known, each record in the results list carries the actual lengths of all variable-length fields in the record, and each result contains a binary data stream laid out according to the ordered data item list of the data definition; to use any data item, only the corresponding number of bytes needs to be extracted from this binary data stream according to the item's length and start position; and after the data block allocated to a thread has been parsed, the processing-completion flag of that thread and its list is set to true.
7. The database-based method for fast parsing and storage of big data according to claim 1, characterized in that the specific method of step 5 is: first, create a data table T1, serialize the data structure corresponding to the data format definition obtained in step 2 into a temporary file, store the binary content of the file in this table, and assign this record a unique data definition ID; then create a data table T2 for holding parsing results, defining in T2 a foreign key associated with the data definition ID in T1, a field holding the actual lengths of all variable-length fields, and a field holding the binary data stream of each parsing result; examine the thread information from step 4, and if a thread's processing-completion flag is set and its sequence number is the smallest in the thread pool, store its results list in data table T2, setting the foreign key of each record to the ID assigned in T1 to the data structure corresponding to the data format definition obtained in step 2, and storing each record's variable-length field lengths and binary data stream in the corresponding fields of T2.
8. The database-based method for fast parsing and storage of big data according to claim 7, characterized in that the data stored in step 5 are extracted as follows: read the corresponding binary data from data table T1 into a temporary file, deserialize the temporary file to obtain the data structure definition, then take the binary data streams from data table T2 and match them in order, by data identifier, against the obtained data structure to obtain the content of each field.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510070607.1A CN104615736B (en) | 2015-02-10 | 2015-02-10 | Big data fast resolving storage method based on database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510070607.1A CN104615736B (en) | 2015-02-10 | 2015-02-10 | Big data fast resolving storage method based on database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104615736A true CN104615736A (en) | 2015-05-13 |
CN104615736B CN104615736B (en) | 2017-10-27 |
Family
ID=53150178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510070607.1A Active CN104615736B (en) | 2015-02-10 | 2015-02-10 | Big data fast resolving storage method based on database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104615736B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528425A (en) * | 2015-12-08 | 2016-04-27 | 普元信息技术股份有限公司 | Method of implementing asynchronous data storage based on files in cloud computing environment |
CN106055450A (en) * | 2016-05-20 | 2016-10-26 | 北京神州绿盟信息安全科技股份有限公司 | Binary log analysis method and apparatus |
WO2017028690A1 (en) * | 2015-08-14 | 2017-02-23 | 阿里巴巴集团控股有限公司 | File processing method and system based on etl |
CN106528893A (en) * | 2016-12-26 | 2017-03-22 | 北京奇虎科技有限公司 | Data synchronization method and device |
CN106709059A (en) * | 2017-01-10 | 2017-05-24 | 南方电网科学研究院有限责任公司 | Monitoring method and device for terminal online rate indexes based on metering automation system |
CN107092607A (en) * | 2016-02-18 | 2017-08-25 | 中国移动通信集团安徽有限公司 | A kind of bill storage method and device |
CN107798122A (en) * | 2017-11-10 | 2018-03-13 | 中国航空工业集团公司西安飞机设计研究所 | A kind of unstructured data analytic method |
CN107977341A (en) * | 2016-10-21 | 2018-05-01 | 北京航天爱威电子技术有限公司 | Big data text immediate processing method |
CN107992561A (en) * | 2017-11-29 | 2018-05-04 | 四川巧夺天工信息安全智能设备有限公司 | A kind of method of long field in parsing EDB database source files |
CN108090137A (en) * | 2017-11-29 | 2018-05-29 | 四川巧夺天工信息安全智能设备有限公司 | A kind of method for parsing long field in EDB database source files |
CN108763235A (en) * | 2018-02-13 | 2018-11-06 | 阿里巴巴集团控股有限公司 | A kind of document handling method, device and equipment |
CN110866010A (en) * | 2019-10-30 | 2020-03-06 | 苏州伽顿全盛信息科技有限公司 | Formatted order information extraction method and device |
CN113282609A (en) * | 2021-06-11 | 2021-08-20 | 东莞市盟大塑化科技有限公司 | Intelligent data analysis method based on big data technology |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582064A (en) * | 2008-05-15 | 2009-11-18 | 阿里巴巴集团控股有限公司 | Method and system for processing enormous data |
US20140101178A1 (en) * | 2012-10-08 | 2014-04-10 | Bmc Software, Inc. | Progressive analysis for big data |
CN103778148A (en) * | 2012-10-23 | 2014-05-07 | 阿里巴巴集团控股有限公司 | Life cycle management method and equipment for data file of Hadoop distributed file system |
CN104317800A (en) * | 2014-09-19 | 2015-01-28 | 山东大学 | Hybrid storage system and method for mass intelligent power utilization data |
Non-Patent Citations (1)
Title |
---|
张静 (Zhang Jing): "解析大数据" ("Analyzing Big Data"), 《电脑开发与应用》 (Computer Development & Applications) * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017028690A1 (en) * | 2015-08-14 | 2017-02-23 | 阿里巴巴集团控股有限公司 | File processing method and system based on etl |
CN105528425A (en) * | 2015-12-08 | 2016-04-27 | 普元信息技术股份有限公司 | Method of implementing asynchronous data storage based on files in cloud computing environment |
CN107092607B (en) * | 2016-02-18 | 2021-04-23 | 中国移动通信集团安徽有限公司 | Ticket storage method and device |
CN107092607A (en) * | 2016-02-18 | 2017-08-25 | 中国移动通信集团安徽有限公司 | A kind of bill storage method and device |
CN106055450A (en) * | 2016-05-20 | 2016-10-26 | 北京神州绿盟信息安全科技股份有限公司 | Binary log analysis method and apparatus |
CN106055450B (en) * | 2016-05-20 | 2019-07-02 | 北京神州绿盟信息安全科技股份有限公司 | A kind of binary log analysis method and device |
CN107977341A (en) * | 2016-10-21 | 2018-05-01 | 北京航天爱威电子技术有限公司 | Big data text immediate processing method |
CN106528893A (en) * | 2016-12-26 | 2017-03-22 | 北京奇虎科技有限公司 | Data synchronization method and device |
CN106528893B (en) * | 2016-12-26 | 2020-01-10 | 北京奇虎科技有限公司 | Data synchronization method and device |
CN106709059B (en) * | 2017-01-10 | 2020-06-19 | 南方电网科学研究院有限责任公司 | Terminal online rate index monitoring method and device based on metering automation system |
CN106709059A (en) * | 2017-01-10 | 2017-05-24 | 南方电网科学研究院有限责任公司 | Monitoring method and device for terminal online rate indexes based on metering automation system |
CN107798122A (en) * | 2017-11-10 | 2018-03-13 | 中国航空工业集团公司西安飞机设计研究所 | A kind of unstructured data analytic method |
CN107798122B (en) * | 2017-11-10 | 2021-08-17 | 中国航空工业集团公司西安飞机设计研究所 | Unstructured data analysis method |
CN107992561A (en) * | 2017-11-29 | 2018-05-04 | 四川巧夺天工信息安全智能设备有限公司 | A kind of method of long field in parsing EDB database source files |
CN108090137A (en) * | 2017-11-29 | 2018-05-29 | 四川巧夺天工信息安全智能设备有限公司 | A kind of method for parsing long field in EDB database source files |
CN108090137B (en) * | 2017-11-29 | 2021-11-26 | 四川巧夺天工信息安全智能设备有限公司 | Method for analyzing overlong fields in EDB database source file |
CN108763235A (en) * | 2018-02-13 | 2018-11-06 | 阿里巴巴集团控股有限公司 | A kind of document handling method, device and equipment |
CN110866010A (en) * | 2019-10-30 | 2020-03-06 | 苏州伽顿全盛信息科技有限公司 | Formatted order information extraction method and device |
CN110866010B (en) * | 2019-10-30 | 2023-05-23 | 苏州伽顿全盛信息科技有限公司 | Formatted order information extraction method and device |
CN113282609A (en) * | 2021-06-11 | 2021-08-20 | 东莞市盟大塑化科技有限公司 | Intelligent data analysis method based on big data technology |
Also Published As
Publication number | Publication date |
---|---|
CN104615736B (en) | 2017-10-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104615736A (en) | Quick analysis and storage method of big data based on database | |
US9953102B2 (en) | Creating NoSQL database index for semi-structured data | |
CN111400408B (en) | Data synchronization method, device, equipment and storage medium | |
US20180137134A1 (en) | Data snapshot acquisition method and system | |
CN111881210B (en) | Data synchronization method, device, intranet server and medium | |
CN106126601A (en) | A kind of social security distributed preprocess method of big data and system | |
CN109753502B (en) | Data acquisition method based on NiFi | |
CN102129425B (en) | The access method of big object set table and device in data warehouse | |
CN104317928A (en) | Service ETL (extraction-transformation-loading) method and service ETL system both based on distributed database | |
CN106970929B (en) | Data import method and device | |
CN108268586B (en) | Data processing method, device, medium and computing equipment across multiple data tables | |
CN104331435A (en) | Low-influence high-efficiency mass data extraction method based on Hadoop big data platform | |
CN102110102A (en) | Data processing method and device, and file identifying method and tool | |
CN112214453B (en) | Large-scale industrial data compression storage method, system and medium | |
US8543600B2 (en) | Redistribute native XML index key shipping | |
CN101645073A (en) | Method for guiding prior database file into embedded type database | |
CN105159820A (en) | Transmission method and device of system log data | |
KR101450239B1 (en) | A system for simultaneous and parallel processing of many twig pattern queries for massive XML data and method thereof | |
CN104881475A (en) | Method and system for randomly sampling big data | |
CN110704407B (en) | Data deduplication method and system | |
US9092338B1 (en) | Multi-level caching event lookup | |
CN110851437A (en) | Storage method, device and equipment | |
JP2016076100A (en) | File division system and method | |
Aniceto et al. | Genomic data persistency on a NoSQL database system | |
Li et al. | Evaluating spatial keyword queries under the mapreduce framework |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
CP03 | Change of name, title or address |
Address after: 201203 Shanghai City, Pudong New Area, China (Shanghai) Free Trade Zone, No. 498 GuoShouJing Road, Building 14, Block 22301-985; Patentee after: Shanghai chuangkin Mdt InfoTech Ltd; Address before: 201203 Shanghai, Pudong New Area, Zhangjiang High Tech Park, No. 498 Guo Shou Jing Road, Pudong Software Park, Building 14, Block 22301-985; Patentee before: Upper SeaBird scape computer system company limited |