CN105930381A - Global Argo data storage and update method based on mixed database architecture

Publication number
CN105930381A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610230748.XA
Other languages
Chinese (zh)
Inventor
曹敏杰
许建平
刘增宏
孙朝辉
吴晓芬
卢少磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Institute of Oceanography SOA
Original Assignee
Second Institute of Oceanography SOA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-04-13
Filing date
2016-04-13
Publication date
2016-09-07

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Abstract

The invention discloses a global Argo data storage and update method based on a mixed database architecture. The method comprises the following steps: 1) a data monitor monitors a designated directory and forwards new Argo data files to a data server; 2) a data classifier sorts the Argo data files collected on the data server into three classes according to file format; 3) a data controller checks whether the database already contains the current data and whether the data file is complete; 4) a data extractor extracts the relevant metadata and data blocks from the data files; 5) a data input module uploads the unstructured data blocks to an HDFS distributed storage system; 6) a data filing module archives the ingested data and generates log files. The method integrates the different classes of data files in the global Argo data into a single database platform for storage and updating, providing an efficient and flexibly extensible storage and update solution for global Argo data.

Description

Global Argo data storage and update method based on hybrid database architecture
Technical field
The present invention relates to the field of data storage and update methods, and in particular to a global Argo data storage and update method based on a hybrid database architecture.
Background art
The global Argo program is a global ocean observation program launched in 1998 by atmospheric and marine scientists from the United States, France, Japan, and other countries. It aims to collect temperature and salinity profile data of the upper global ocean quickly, accurately, and on a large scale, in order to improve the accuracy of climate prediction and to guard effectively against the threat posed to humanity by increasingly severe climate disasters worldwide (such as hurricanes, tornadoes, typhoons, ice storms, floods, and droughts). Over the past 15 years, the participating countries have deployed more than 12,000 Argo buoys in the global ocean and obtained about 1,500,000 temperature and salinity profiles in total, forming a huge body of global Argo data.
As the volume of global Argo data keeps growing, and because Argo data are multi-source, heterogeneous, dynamic, multidimensional, and massive, the efficient storage and updating of global Argo data has always been a challenging problem. At present, Argo data are stored mainly as files. Raw data kept in this form are difficult to extract and classify, which also hinders further data mining, and file storage cannot keep pace with the ever-growing volume of Argo data. The various kinds of Argo data accumulated over the years are stored in separate, isolated locations, so they cannot be used together effectively, and data updates cannot be made in a timely manner. Unifying the different types of global Argo data on a single database platform for storage and updating has therefore become an urgent need for scientific research and operational work.
Summary of the invention
The object of the present invention is to overcome the problems of the prior art by providing a global Argo data storage and update method based on a hybrid database architecture.
The global Argo data storage and update method based on a hybrid database architecture comprises the following steps:
1) a data monitor monitors a designated directory on a remote data host and, as soon as a new Argo data file is generated, forwards the data file to a data server;
2) a data classifier divides the Argo data files collected on the data server, according to file format, into three categories: global Argo buoy metadata, global Argo buoy observation profile data, and global Argo gridded data products;
3) a data controller checks whether the current data already exist in the database and whether the content of the data file is complete;
4) a data extractor extracts the relevant metadata and data blocks from the data file;
5) a data input module uploads the unstructured data blocks to an HDFS distributed storage system, records the structured metadata in a PostgreSQL relational database, and builds an index between the data blocks and the metadata;
6) a data filing module archives the data that have been ingested and generates log files.
Step 1) is specifically as follows: the data monitor is a module resident on the data server. It periodically starts a thread that connects to each remote data host holding global Argo data and uses a log file to determine whether new Argo data have appeared in the designated directory. As soon as a new file is found, the data file is forwarded to the data server and recorded in the log file.
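The log-based detection in step 1) can be illustrated with a minimal Python sketch. It assumes the log is simply the set of file names already forwarded and that the remote directory listing is available as a list of names; the function names and the file 1900726_284.nc are hypothetical, and the actual connection to the remote host is left out.

```python
from typing import Iterable, List, Set


def find_new_files(remote_listing: Iterable[str], forwarded_log: Set[str]) -> List[str]:
    """Return the file names on the remote host that the log has not seen yet."""
    return sorted(name for name in remote_listing if name not in forwarded_log)


def record_forwarded(new_files: Iterable[str], forwarded_log: Set[str]) -> None:
    """Record forwarded files so that the next polling cycle skips them."""
    forwarded_log.update(new_files)


# Hypothetical state: one file was forwarded earlier, one is new.
log = {"1900726_284.nc"}
listing = ["1900726_284.nc", "1900726_285.nc"]

new = find_new_files(listing, log)
record_forwarded(new, log)
```

Keeping the comparison separate from the transfer makes each polling cycle idempotent: a second pass over the same listing finds nothing new.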
Step 2) is specifically as follows: the data classifier classifies the Argo data files collected on the data server. Files in DAT format are global Argo buoy metadata, files in NetCDF format are global Argo buoy observation profile data, and files in PNG format are global Argo gridded data products. The three classes of files are thereby routed to their respective data centers.
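A classifier of this kind might be sketched as a lookup on the file extension. The mapping of `.dat`, `.nc`, and `.png` to the DAT, NetCDF, and PNG formats is an assumption (the description names the formats, not the extensions), and the category labels are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed extension-to-category mapping; labels are illustrative only.
CATEGORY_BY_EXTENSION = {
    ".dat": "buoy_metadata",    # DAT: global Argo buoy metadata
    ".nc": "profile_data",      # NetCDF: buoy observation profile data
    ".png": "gridded_product",  # PNG: gridded data product
}


def classify(file_name: str) -> str:
    """Return the category of an Argo data file based on its extension."""
    suffix = PurePosixPath(file_name).suffix.lower()
    try:
        return CATEGORY_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"unrecognized Argo file format: {file_name}") from None
```

Raising on unknown extensions, rather than guessing a category, keeps unexpected files out of all three data centers.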
Step 3) is specifically as follows: the data controller queries the PostgreSQL relational database by file name to determine whether the data file already exists, and judges from the file size whether the data file is complete. If the data file does not exist in the database and the file is complete, it is identified as a new data file and can be ingested.
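The two checks in step 3) can be sketched as a single predicate. This is a sketch under stated assumptions: a set of known file names stands in for the PostgreSQL query, and the 38 KB figure that the detailed description gives for a normal profile file is treated as a minimum size, since the exact comparison rule is not specified. The function and constant names are hypothetical.

```python
from typing import Set

# 38 KB reference value from the detailed description; using it as a
# minimum size is an assumption made for this sketch.
REFERENCE_SIZE_BYTES = 38 * 1024


def can_ingest(file_name: str, size_bytes: int, known_files: Set[str],
               reference_size: int = REFERENCE_SIZE_BYTES) -> bool:
    """Return True only for files that are both new and complete."""
    already_stored = file_name in known_files    # stand-in for the database lookup
    is_complete = size_bytes >= reference_size   # assumed completeness rule
    return (not already_stored) and is_complete
```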
Step 4) is specifically as follows: the data extractor extracts the corresponding metadata from the global Argo buoy metadata files, the global Argo buoy observation profile data files, and the global Argo gridded data products, respectively. At the same time, it extracts the data blocks from the global Argo buoy observation profile data files and converts them into JSON files.
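The conversion of a profile data block to JSON could look like the following sketch. It assumes the pressure, temperature, and salinity arrays have already been read from the NetCDF file (reading is omitted); the JSON layout, the function name, and the sample values are hypothetical placeholders, not real observations.

```python
import json
from typing import List


def profile_to_json(wmo: str, cycle: int, pressure: List[float],
                    temperature: List[float], salinity: List[float]) -> str:
    """Serialize one observed profile as a JSON document, one entry per level."""
    if not (len(pressure) == len(temperature) == len(salinity)):
        raise ValueError("profile arrays must have equal length")
    levels = [
        {"pressure": p, "temperature": t, "salinity": s}
        for p, t, s in zip(pressure, temperature, salinity)
    ]
    return json.dumps({"wmo": wmo, "cycle": cycle, "levels": levels})


# Placeholder values for illustration only.
doc = profile_to_json("1900726", 285, [5.0, 10.0], [20.1, 19.8], [35.2, 35.3])
```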
Step 5) specifically comprises the following sub-steps:
5.1) the data input module records the structured metadata extracted in step 4) in the PostgreSQL relational database, mainly in three kinds of tables: the buoy metadata table, the buoy observation profile data information table, and the buoy gridded data product information table. The buoy metadata table stores the metadata of all Argo buoys, that is, the technical parameters of each Argo buoy, including the WMO number, platform number, transmission system, signal transmission repetition rate, positioning system, manufacturer, profile sampling direction, sensor information, and cycle information. The buoy observation profile data information table stores information on all observed profiles; to improve query efficiency, this table is split into multiple tables by year, and it mainly includes the buoy ID, WMO number, profile cycle number, profile observation direction, date, and longitude and latitude. The buoy gridded data product information table stores information on all gridded data products, mainly including the product category, product date, and product coverage;
5.2) the data input module uploads the unstructured JSON and PNG file data blocks to the HDFS distributed storage system, where they are stored and redundantly backed up on multiple physical nodes; the access paths of the data blocks are kept on the master node of the cluster;
5.3) the data input module also stores each data block access path together with its corresponding metadata in the PostgreSQL relational database, thereby building the index between data blocks and metadata and achieving hybrid storage of global Argo data across the HDFS distributed storage system and the PostgreSQL relational database.
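The index described in sub-step 5.3) amounts to a metadata row that carries the access path of its data block. The sketch below shows only that idea and swaps in stand-ins so it runs anywhere: an in-memory dictionary replaces HDFS and `sqlite3` replaces PostgreSQL. The table name, column names, and the `/argo/profiles/` path prefix are hypothetical.

```python
import sqlite3
from typing import Dict, Optional

block_store: Dict[str, bytes] = {}  # stand-in for HDFS: access path -> data block

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute(
    "CREATE TABLE profile_info ("
    " file_name TEXT PRIMARY KEY, wmo TEXT, cycle INTEGER, block_path TEXT)"
)


def ingest(file_name: str, wmo: str, cycle: int, payload: bytes) -> None:
    """Store the block, then record its path next to the metadata."""
    path = f"/argo/profiles/{file_name}"  # hypothetical path layout
    block_store[path] = payload
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO profile_info VALUES (?, ?, ?, ?)",
            (file_name, wmo, cycle, path),
        )


def fetch_block(wmo: str, cycle: int) -> Optional[bytes]:
    """Look up the block path by metadata, then read the block from the store."""
    row = conn.execute(
        "SELECT block_path FROM profile_info WHERE wmo = ? AND cycle = ?",
        (wmo, cycle),
    ).fetchone()
    return None if row is None else block_store[row[0]]


ingest("1900726_285.json", "1900726", 285, b'{"levels": []}')
```

A query therefore touches the relational side first and reads from the block store only once the path is known, which is the division of labor the hybrid architecture relies on.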
Step 6) is specifically as follows: the data filing module archives the data that have been ingested, generating a separate log file per day for each of the three categories of global Argo data.
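Grouping ingested files into one log per day and per category might be sketched as follows. The log file naming scheme, the category labels, and the file name sst_grid.png are assumptions made for illustration.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

CATEGORIES = ("buoy_metadata", "profile_data", "gridded_product")  # assumed labels


def log_file_name(category: str, day: date) -> str:
    """Build the daily log file name for one category (hypothetical scheme)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return f"{category}_{day:%Y%m%d}.log"


def group_by_log(entries: Iterable[Tuple[str, date, str]]) -> Dict[str, List[str]]:
    """Map each daily log file name to the ingested files it should list."""
    logs: Dict[str, List[str]] = {}
    for category, day, file_name in entries:
        logs.setdefault(log_file_name(category, day), []).append(file_name)
    return logs


logs = group_by_log([
    ("profile_data", date(2016, 4, 13), "1900726_285.json"),
    ("gridded_product", date(2016, 4, 13), "sst_grid.png"),
])
```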
Compared with the prior art, the present invention has the following beneficial effects:
1) The present invention overcomes the inability of the current file-based storage approach and of a single relational database to store data cooperatively and update them effectively. It unifies the different types of data files in the global Argo data on a single database platform for storage and updating, providing an efficient and flexibly extensible storage and update solution for global Argo data.
2) The present invention uses HDFS to store the unstructured data in the global Argo data. Each data block is replicated to multiple nodes in the cluster, which improves fault tolerance, and nodes can be added or removed dynamically, which ensures scalability. This provides a basic guarantee for efficient access to and rapid updating of global Argo data.
Brief description of the drawings
Fig. 1 is a flow chart of the global Argo data storage and update method based on a hybrid database architecture;
Fig. 2 is a diagram of the hybrid database architecture.
Detailed description of the invention
The present invention is further elaborated below with reference to the accompanying drawings. The technical features of the embodiments of the present invention may be combined with one another provided they do not conflict.
As shown in Fig. 1, the steps of the global Argo data storage and update method based on a hybrid database architecture are as follows:
1) A data monitor monitors a designated directory on a remote data host and, as soon as a new Argo data file is generated, forwards the data file to a data server. Specifically:
The data monitor is a module resident on the data server. It periodically starts a thread that connects to each remote data host holding global Argo data and uses a log file to determine whether new Argo data have appeared in the designated directory. As soon as a new file is found, the data file is forwarded to the data server and recorded in the log file.
2) A data classifier divides the Argo data files collected on the data server, according to file format, into three categories: global Argo buoy metadata, global Argo buoy observation profile data, and global Argo gridded data products. Specifically:
The data classifier classifies the Argo data files collected on the data server. Files in DAT format are global Argo buoy metadata, files in NetCDF format are global Argo buoy observation profile data, and files in PNG format are global Argo gridded data products. The three classes of files are thereby routed to their respective data centers.
3) A data controller checks whether the current data already exist in the database and whether the content of the data file is complete. Specifically:
The data controller queries the PostgreSQL relational database by file name to determine whether the data file already exists, and judges from the file size whether the data file is complete. If the data file does not exist in the database and the file is complete, it is identified as a new data file and can be ingested; otherwise it cannot be ingested. A reference value can be preset for judging whether a data file is complete. The reference value can be determined from the current international Argo file standard; a normal profile file is 38 KB in size.
4) A data extractor extracts the relevant metadata and data blocks from the data file. Specifically:
The data extractor extracts the corresponding metadata from the global Argo buoy metadata files, the global Argo buoy observation profile data files, and the global Argo gridded data products, respectively. At the same time, it extracts the data blocks from the global Argo buoy observation profile data files and converts them into JSON files.
5) A data input module uploads the unstructured data blocks to an HDFS distributed storage system, records the structured metadata in a PostgreSQL relational database, and builds an index between the data blocks and the metadata. Specifically:
5.1) the data input module records the structured metadata extracted in step 4) in the PostgreSQL relational database, mainly in three kinds of tables: the buoy metadata table, the buoy observation profile data information table, and the buoy gridded data product information table. The buoy metadata table stores the metadata of all Argo buoys, that is, the technical parameters of each Argo buoy, including the WMO number, platform number, transmission system, signal transmission repetition rate, positioning system, manufacturer, profile sampling direction, sensor information, and cycle information. The buoy observation profile data information table stores information on all observed profiles; to improve query efficiency, this table is split into multiple tables by year, and it mainly includes the buoy ID, WMO number, profile cycle number, profile observation direction, date, and longitude and latitude. The buoy gridded data product information table stores information on all gridded data products, mainly including the product category, product date, and product coverage;
5.2) the data input module uploads the unstructured JSON and PNG file data blocks (that is, the aforementioned JSON files and the PNG-format global Argo gridded data products) to the HDFS distributed storage system, where they are stored and redundantly backed up on multiple physical nodes; the access paths of the data blocks are kept on the master node of the cluster;
5.3) the data input module also stores each data block access path together with its corresponding metadata in the PostgreSQL relational database, thereby building the index between data blocks and metadata and achieving hybrid storage of global Argo data across the HDFS distributed storage system and the PostgreSQL relational database.
6) A data filing module archives the data that have been ingested, generating a separate log file per day for each of the three categories of global Argo data.
The present invention is further illustrated below with an embodiment. The operating steps of the embodiment are consistent with the method described above; for brevity, some steps are not described in detail.
Embodiment
1) the data monitor monitors the designated directory on the remote data host (ftp://ftp.argo.org.cn/pub/ARGO); as soon as a new Argo data file is generated, for example when the new data file 1900726_285.nc is detected, the data file is forwarded to the data server;
2) the data classifier determines from its file format that the Argo data file 1900726_285.nc collected on the data server is global Argo buoy observation profile data, and transfers the file to the corresponding data center;
3) the data controller checks whether the current data already exist in the PostgreSQL relational database; if not, the file is identified as a new data file. It then judges from the file size whether the content of the data file is complete; if the file size is 38 KB, the file content is judged to be complete;
4) the data extractor extracts the relevant metadata and data block from the data file 1900726_285.nc. The metadata include WMO number 1900726, platform number 39506, transmission system ARGOS, positioning system ARGOS, manufacturer Webb, cycle number 285, and position 25.761°S, 115.159°W, among other information. The data block contains the extracted observed profile data and is converted into the JSON file 1900726_285.json;
5) the data input module uploads the unstructured file 1900726_285.json as a data block to the HDFS distributed storage system, where it is saved on the data nodes, and records the structured metadata in the buoy observation profile data information table of the PostgreSQL relational database. It also stores the data block access path together with its corresponding metadata in the PostgreSQL relational database, thereby building the index between the data block and the metadata, as shown in Fig. 2;
6) the data filing module archives the data that have been ingested and generates log files.
The embodiment described above is merely a preferred scheme of the present invention and is not intended to limit the invention. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, all technical solutions obtained by equivalent substitution or equivalent transformation fall within the scope of protection of the present invention.

Claims (7)

1. A global Argo data storage and update method based on a hybrid database architecture, characterized in that its steps are as follows:
1) a data monitor monitors a designated directory on a remote data host and, as soon as a new Argo data file is generated, forwards the data file to a data server;
2) a data classifier divides the Argo data files collected on the data server, according to file format, into three categories: global Argo buoy metadata, global Argo buoy observation profile data, and global Argo gridded data products;
3) a data controller checks whether the current data already exist in the database and whether the content of the data file is complete;
4) a data extractor extracts the relevant metadata and data blocks from the data file;
5) a data input module uploads the unstructured data blocks to an HDFS distributed storage system, records the structured metadata in a PostgreSQL relational database, and builds an index between the data blocks and the metadata;
6) a data filing module archives the data that have been ingested and generates log files.
2. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 1) is: the data monitor is a module resident on the data server; it periodically starts a thread that connects to each remote data host holding global Argo data and uses a log file to determine whether new Argo data have appeared in the designated directory; as soon as a new file is found, the data file is forwarded to the data server and recorded in the log file.
3. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 2) is: the data classifier classifies the Argo data files collected on the data server; files in DAT format are global Argo buoy metadata, files in NetCDF format are global Argo buoy observation profile data, and files in PNG format are global Argo gridded data products; the three classes of files are thereby routed to their respective data centers.
4. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 3) is: the data controller queries the PostgreSQL relational database by file name to determine whether the data file already exists, and judges from the file size whether the data file is complete; if the data file does not exist in the database and the file is complete, it is identified as a new data file and can be ingested.
5. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 4) is: the data extractor extracts the corresponding metadata from the global Argo buoy metadata files, the global Argo buoy observation profile data files, and the global Argo gridded data products, respectively, and at the same time extracts the data blocks from the global Argo buoy observation profile data files and converts them into JSON files.
6. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 5) is:
5.1) the data input module records the structured metadata extracted in step 4) in the PostgreSQL relational database, mainly in three kinds of tables: the buoy metadata table, the buoy observation profile data information table, and the buoy gridded data product information table; the buoy metadata table stores the metadata of all Argo buoys, that is, the technical parameters of each Argo buoy, including the WMO number, platform number, transmission system, signal transmission repetition rate, positioning system, manufacturer, profile sampling direction, sensor information, and cycle information; the buoy observation profile data information table stores information on all observed profiles and, to improve query efficiency, is split into multiple tables by year, including the buoy ID, WMO number, profile cycle number, profile observation direction, date, and longitude and latitude; the buoy gridded data product information table stores information on all gridded data products, including the product category, product date, and product coverage;
5.2) the data input module uploads the unstructured JSON and PNG file data blocks to the HDFS distributed storage system, where they are stored and redundantly backed up on multiple physical nodes; the access paths of the data blocks are kept on the master node of the cluster;
5.3) the data input module also stores each data block access path together with its corresponding metadata in the PostgreSQL relational database, thereby building the index between data blocks and metadata and achieving hybrid storage of global Argo data across the HDFS distributed storage system and the PostgreSQL relational database.
7. The global Argo data storage and update method based on a hybrid database architecture according to claim 1, characterized in that step 6) is: the data filing module archives the data that have been ingested, generating a separate log file per day for each of the three categories of global Argo data.
CN201610230748.XA 2016-04-13 2016-04-13 Global Argo data storage and update method based on mixed database architecture Pending CN105930381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610230748.XA CN105930381A (en) 2016-04-13 2016-04-13 Global Argo data storage and update method based on mixed database architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610230748.XA CN105930381A (en) 2016-04-13 2016-04-13 Global Argo data storage and update method based on mixed database architecture

Publications (1)

Publication Number Publication Date
CN105930381A true CN105930381A (en) 2016-09-07

Family

ID=56838072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610230748.XA Pending CN105930381A (en) 2016-04-13 2016-04-13 Global Argo data storage and update method based on mixed database architecture

Country Status (1)

Country Link
CN (1) CN105930381A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2466978A1 (en) * 2004-05-07 2005-11-07 Isca Technologies, Inc. Method for pest management using pest identification sensors and network accessible database
CN102999592A (en) * 2012-11-19 2013-03-27 北京中海新图科技有限公司 B/S architecture based global Argo multi-source marine data management and visualization system and method
CN104820670A (en) * 2015-03-13 2015-08-05 国家电网公司 Method for acquiring and storing big data of power information
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372256A (en) * 2016-09-30 2017-02-01 浙江大学 Distributed storage method for massive Argo data
CN106372367A (en) * 2016-09-30 2017-02-01 浙江大学 Visual simulation method for Argo float ocean product
CN106599158A (en) * 2016-12-07 2017-04-26 国家海洋局第二海洋研究所 Quick query method of typhoon sea area Argo information based on space-time dual approximate index
CN107317838A (en) * 2017-05-24 2017-11-03 重庆邮电大学 A kind of astronomical metadata archiving method and system based on stream data processing framework
CN107317838B (en) * 2017-05-24 2020-11-17 重庆邮电大学 Astronomical metadata filing method and system based on streaming data processing architecture
CN117669126A (en) * 2023-10-11 2024-03-08 宁波麦思捷科技有限公司武汉分公司 Large-scale buoy networking method and system for marine environment research

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160907
