CN104765876B - Cloud storage method for massive GNSS small files - Google Patents

Cloud storage method for massive GNSS small files

Info

Publication number
CN104765876B
Authority
CN
China
Prior art keywords
file
index
gnss
small files
observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510204235.7A
Other languages
Chinese (zh)
Other versions
CN104765876A (en)
Inventor
吕志平
李林阳
陈正生
崔阳
黄令勇
王宇谱
吕浩
孙大双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University
Priority to CN201510204235.7A
Publication of CN104765876A
Application granted
Publication of CN104765876B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a cloud storage method for massive GNSS small files, which effectively solves the problem of efficiently storing, managing, publishing, and sharing massive numbers of GNSS small files. The method first merges the GNSS small files into large files and builds an index over each merged large file. It then optimizes the index block storage strategy: the split file blocks and index blocks are stored on the node holding the data block or on the DataNode nearest to the data block, and the index of GNSS data types is stored on the NameNode. This reduces storage capacity consumption and NameNode memory consumption, and improves the performance of writing, accessing, and deleting large numbers of small files. The method is simple and easy to operate, saves storage space, reduces memory consumption, and improves write, read, and delete efficiency. It is a significant innovation in the management of massive GNSS small files, with great economic and social benefit.

Description

Cloud storage method for massive GNSS small files
Technical field
The present invention relates to the technical field of "Geodesy and Survey Engineering" within the discipline of "Surveying Science and Technology", and in particular to a cloud storage method for massive GNSS small files.
Background art
With the continuous development of science and technology, global, national, and regional Continuously Operating Reference Station (CORS) networks have been built up, and the Global Navigation Satellite System (GNSS) is widely used in many fields. In particular, independent CORS networks have been successively integrated into higher-level combined CORS networks with more base stations and continuous observation, so the volume of GNSS data keeps growing.
Massive data challenge storage and management, with large volumes of data at the TB level and above awaiting processing. Take GNSS observation data as an example: for 1 day of continuous observation at a 1-second sampling rate, the data volume for GPS satellites alone reaches 80 MB. With tens of thousands of observation stations worldwide, the data volume for one day reaches tens to hundreds of TB. In addition, unlike web logs and remote sensing images, GNSS data are rich and varied in type and format, and the GNSS data represented by GNSS observation files and solution products belong to the category of small files.
Faced with the challenge that massive GNSS small files pose to storage and management, the traditional storage area network (SAN, Storage Area Network) and network attached storage (NAS, Network-Attached Storage) have bottlenecks in scaling capacity and performance. The File Transfer Protocol (FTP) and the relational databases currently used by GNSS data centers have many limitations in managing massive GNSS data, and centralized storage methods cannot meet the needs of large-scale GNSS data storage applications. Research institutions and researchers at home and abroad have paid extensive attention to the storage of massive small files. Published work mainly includes, abroad: "An Optimized Approach for Storing and Accessing Small Files on Cloud Storage" in Journal of Network and Computer Applications; "Metadata-Aware Small Files Storage Architecture on Hadoop" in Web Information Systems and Mining; and "Hmfs: Efficient Support of Small Files Processing over HDFS" in Algorithms and Architectures for Parallel Processing. In China: "A scheme for improving the small file storage efficiency of cloud storage" in Journal of Xi'an Jiaotong University; and "A mass small file storage method combining RDBMS and Hadoop" and "A storage strategy for spatio-temporal small files in a cloud environment" in Geomatics and Information Science of Wuhan University.
Existing solutions focus on querying metadata models, analyzing correlations among massive small files, and adjusting the system structure and user access rules, but pay little attention to indexing by data type and characteristics or to the placement strategy for merged files, so they cannot be fully applied to the management of GNSS small files. Facing the storage demands of massive GNSS data represented by small files, designing a cloud storage method for massive GNSS small files on an underlying open source cloud platform, in light of GNSS data types and characteristics, becomes an effective route to the efficient storage, management, publication, and sharing of massive GNSS small files.
Summary of the invention
In view of the above, and to overcome the defects of the prior art, the purpose of the present invention is to provide a cloud storage method for massive GNSS small files that effectively solves the problem of efficiently storing, managing, publishing, and sharing massive GNSS small files.
The technical solution of the present invention addresses the defects and bottlenecks of centralized storage of massive GNSS small files. Based on an underlying open source cloud platform (Hadoop), a cloud storage method for massive GNSS small files is designed and built to achieve efficient cloud storage of such files. The massive GNSS small files are first merged into large files, and an index is built over each merged large file. The index block storage strategy is then optimized: the split file blocks and index blocks are stored on the node holding the data block or on the DataNode nearest to the data block, and the index of GNSS data types is stored on the NameNode. This reduces storage capacity consumption and NameNode memory consumption and improves the performance of writing, accessing, and deleting large numbers of small files. The method specifically comprises the following steps:
(1) Merge the massive GNSS small files into large files to reduce the NameNode memory occupied by large numbers of small files. Merging first combines files of the same observation period or solution time and of the same type: GNSS observation files are merged in the order of the four letters of the station name, and solution product files are merged in the order of the three letters of the GNSS analysis center name. The GNSS observation files are thereby merged into large observation files that are continuous over the observation periods, and the solution product files into large solution product files that are continuous in solution time order;
(2) Build indexes over the merged GNSS large files, that is, build separate indexes for observation files and solution products, in one-to-one correspondence between characters and index entries. For observation files, a five-level index is built from the file sequence number, the day of year, and the station name, and the location of the observation file is stored in the last-level index. For solution product files, a six-level index is built from the GPS week, the day of week, and the analysis center name, and the location of the solution product file is stored in the last-level index;
(3) Split the built index by data block size. Because GNSS data processing software can merge the observation data of one day, the file sequence number can be unified as 0, so the first-level index (file sequence number) of the observation files is also 0. When splitting, for the second to fifth index levels of observation files and the first to sixth index levels of solution products, the index size is computed bottom-up and the index is split into index blocks of 64 MB;
(4) Place each index block on the node storing the data block or on the node nearest to the data block, which improves file read speed and further reduces NameNode memory consumption;
(5) Store the file type index of the merged GNSS large files on the NameNode. The file block path mapping, and the mapping from the three characters identifying the observation file or solution product type to the index block path, are stored on the NameNode; the file blocks and index blocks are stored on DataNodes. This realizes cloud storage of massive GNSS small files.
The method of the invention is simple and easy to operate, saves storage space, reduces memory consumption, improves write, read, and delete efficiency, and effectively achieves the efficient storage, management, publication, and sharing of massive GNSS small files. It is a significant innovation in the management of massive GNSS small files, with great economic and social benefit.
Brief description of the drawings
Fig. 1 is a functional schematic of the small file storage platform of the present invention.
Fig. 2 is the index structure diagram for observation files of the present invention.
Fig. 3 is the index structure diagram for solution products of the present invention.
Fig. 4 is a schematic of the storage locations of observation files and solution product files of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Figs. 1-4, in a specific implementation the present invention comprises the following steps:
Step 1: Merge the massive GNSS small files into large files to reduce the NameNode memory occupied by large numbers of small files. Massive GNSS small files comprise two classes of file: observation files, represented by observation data, navigation ephemerides, and meteorological files; and solution product files, represented by coordinate files, precise ephemerides, and precise clock corrections. Both classes use internationally standardized formats. Observation files use the Receiver Independent Exchange Format (RINEX); solution products use the Solution (Software/technique) Independent Exchange Format (SINEX), the Ionosphere Exchange Format (IONEX), and the precise ephemeris format SP3 (NGS Standard GPS Format). Suppose n GNSS small files are stored in the system. Each GNSS small file carries three parameters, namely location, time, and file type, which distinguish the data from one another. The GNSS small file data set D is expressed as:
$D = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T,\ I_k \in I \,\},\quad i, j, k \in \mathbb{Z}$  Formula (1)
where L denotes the location at which a file is produced, mainly the station that collects an observation file or the institution that produces a solution product file; T denotes the time stamp of the file, and because stations observe continuously for 24 h and data centers solve and publish continuously on schedule, T is a continuous time series; I denotes the file type, defined by the standard formats above. L and T are obtained from the file name and the header section of the file record, and I is obtained from the file extension and the header section of the file record. D denotes the set; i, j, and k are the indices of the location, time, and type parameters, respectively; and Z is the set of integers;
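To make the three-parameter description concrete, the sketch below models each small file as a record d(L, T, I) and the data set D as a set of such records. The class name `GnssFile`, the helper `select`, and the sample values are illustrative assumptions, not part of the patent.

```python
from typing import NamedTuple


class GnssFile(NamedTuple):
    """One element d(L, T, I) of the data set D (hypothetical record type)."""
    location: str  # L: station name or analysis center
    time: str      # T: observation period or solution epoch
    ftype: str     # I: file type taken from the extension or header


def select(files, location=None, time=None, ftype=None):
    """Return the subset of D whose parameters match every given value."""
    return {
        f for f in files
        if (location is None or f.location == location)
        and (time is None or f.time == time)
        and (ftype is None or f.ftype == ftype)
    }


# Sample data set D with made-up values.
D = {
    GnssFile("bjfs", "2015-117", "o"),
    GnssFile("algo", "2015-117", "o"),
    GnssFile("igs", "1842-1", "sp3"),
}
```

Fixing T and I while leaving L free, as in `select(D, time="2015-117", ftype="o")`, yields the group of files that the first merging pass combines.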
When merging small files, files of the same observation period or solution time and of the same type are first merged in the order of the four characters of the station name or the three characters of the analysis center name. The merged GNSS small file set is expressed as:
Formula (2)
where $T_j$ denotes the j-th observation period or solution epoch, $I_k$ denotes the k-th file type, and Z is the set of integers;
Then the small files of each type are merged by continuous observation period or solution time sequence. Because weekly solutions are of general significance in GNSS processing, the observation files of 7 consecutive days and the daily solution files of 7 days are each merged into one large file, which can be expressed as:
Formula (3)
Through the two merging steps above, the GNSS observation files of 7 consecutive days are merged into one large observation file that is continuous over the observation periods, and the solution product files of 7 days are merged into one large solution product file that is continuous in solution time order. The name of a large file is marked with the file type, the start and end observation or solution time, and the first and last station name or analysis center name. The merged file is stored in the cloud storage system in blocks, with the data block size set to 64 MB. Each data block is a set of multiple small files and occupies 150 B of NameNode memory; compared with each small file occupying 150 B of memory before merging, this greatly reduces NameNode memory consumption;
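The bodies of Formulas (2) and (3) do not survive in this text. Read against Formula (1) and the definitions of $T_j$ and $I_k$, one plausible form of the two merging passes is sketched below. The subscripted set names and the start index m are assumptions introduced here, not the patent's own notation.

```latex
% First pass: fix the period T_j and the type I_k, collect over all locations.
D_{T_j, I_k} = \{\, d(L_i, T_j, I_k) \mid L_i \in L \,\}, \quad i \in \mathbb{Z}

% Second pass: chain 7 consecutive periods of the same type into one large file.
D_{I_k} = \bigcup_{j = m}^{m + 6} D_{T_j, I_k}, \quad j, m \in \mathbb{Z}
```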
The massive GNSS small files comprise GNSS observation files and solution product files, all of which follow internationally unified standard formats. Because GNSS data and product formats are continually upgraded, upgraded file formats and newly proposed file types can also be brought into the category of GNSS small files;
When merging files of the same observation period or solution time and of the same type, the files may also first be merged by identical observation period or solution date and then by continuous observation period or solution cycle. The name of a large file is marked with the file type, the start and end observation or solution time, and the first and last station name or analysis center name. After merging, the large file is stored in the cloud storage system in blocks, with the data block size set to 64 MB; each data block is a set of multiple GNSS small files;
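The two merging passes of Step 1 can be sketched as a planning function over observation file names of the form ssssdddf.yyt. This is a minimal illustration under stated assumptions: the function name `plan_merge`, the large-file name pattern, and the choice to start each 7-day window at the first day present are all hypothetical, and year boundaries are ignored.

```python
from collections import defaultdict


def plan_merge(names, span_days=7):
    """Map each (hypothetical) large-file name to its ordered small files."""
    # Pass 1: bucket by (extension, day of year), i.e. same type and period.
    per_day = defaultdict(list)
    for name in names:
        root, ext = name.lower().split(".")
        per_day[(ext, int(root[4:7]))].append(name.lower())

    # Collect the days seen for each type, in ascending order.
    per_type = defaultdict(list)
    for ext, doy in sorted(per_day):
        per_type[ext].append(doy)

    # Pass 2: chain consecutive days of one type into span_days windows.
    plan = {}
    for ext, days in per_type.items():
        windows = defaultdict(list)
        for doy in days:
            windows[(doy - days[0]) // span_days].append(doy)
        for window in windows.values():
            members = []
            for doy in window:
                # Within one day, order files by station name.
                members.extend(sorted(per_day[(ext, doy)]))
            stations = [m[:4] for m in members]
            big = (f"{ext}_{window[0]:03d}-{window[-1]:03d}"
                   f"_{min(stations)}-{max(stations)}")
            plan[big] = members
    return plan
```

For example, `plan_merge(["bjfs1170.15o", "algo1170.15o", "algo1180.15o"])` groups all three files under one large file whose name records the type, the first and last day, and the first and last station.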
Step 2: Build indexes over the merged GNSS large files, that is, build separate indexes for observation files and solution products according to L and T, as follows:
When building the index for observation files: observation files are saved in RINEX format, which uses the 8.3 naming convention, where 8 denotes a root name of 8 characters indicating what the file belongs to and 3 denotes an extension of 3 characters indicating the file type. The concrete form is ssssdddf.yyt, where ssss is the 4-character station name, ddd is the day of year, and f is the file sequence number within the day. Using the file sequence number f, the day of year ddd, and the station name ssss, a five-level index is built top-down, in one-to-one correspondence between characters and index entries, and the location of the observation file is stored at the leaf node of the last-level index. The first-level index is the file sequence number; its range consists of the two intervals [0,9] and [a,z], where [0,9] denotes the 10 Arabic digits and [a,z] the 26 lowercase English letters. The second-level index is the hundreds digit of the day of year, with range [0,3], that is, the 4 digits 0 to 3. The third-level index is the tens digit of the day of year, with range [0,9]. The fourth-level index is the units digit of the day of year, with range [0,9]. The fifth-level index is the four-character station name, each character of which falls in the intervals [0,9] and [a,z];
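The five-level observation index described above can be sketched as nested dictionaries keyed by the file sequence number, the three day-of-year digits, and the station name. The function names and the shape of the stored location (large-file name, offset, length) are assumptions made for illustration.

```python
import string

ALNUM = set(string.digits + string.ascii_lowercase)


def observation_index_path(name):
    """Return the five index keys (f, d, d, d, ssss) for ssssdddf.yyt."""
    root, _, ext = name.lower().partition(".")
    if len(root) != 8 or len(ext) != 3:
        raise ValueError(f"not an 8.3 observation file name: {name!r}")
    station, doy, seq = root[:4], root[4:7], root[7]
    if not set(station) <= ALNUM or seq not in ALNUM:
        raise ValueError(f"character outside [0,9] and [a,z]: {name!r}")
    if not all(c in string.digits for c in doy) or not 1 <= int(doy) <= 366:
        raise ValueError(f"invalid day of year: {name!r}")
    return (seq, doy[0], doy[1], doy[2], station)


def put(index, name, location):
    """Store a file location at the leaf of the five-level index."""
    *inner, leaf = observation_index_path(name)
    node = index
    for key in inner:
        node = node.setdefault(key, {})
    node[leaf] = location


def get(index, name):
    """Walk the five levels top-down and return the stored location."""
    node = index
    for key in observation_index_path(name):
        node = node[key]
    return node
```

A lookup for `bjfs1170.15o` therefore follows the keys `0`, `1`, `1`, `7`, `bjfs`; a missing entry surfaces as a `KeyError` in this sketch.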
Solution product files are saved in the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d in ddddd are the GPS week counted from 0 h on January 6, 1980, the last d is the day of week, and ttt is the type of solution product. A six-level index is built from the GPS week, the day of week, and the analysis center name, and the location of the solution product file is stored at the leaf node of the index. The first- to fourth-level indexes are the thousands, hundreds, tens, and units digits of the GPS week, each with range [0,9]. The fifth-level index is the day of the GPS week, with range [0,7], where the 7 integers in [0,6] denote the daily solution files of the week and 7 denotes the weekly solution file. The sixth-level index is the three-character analysis institution name, each character of which falls in the interval [a,z];
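The six index keys of a solution product can be read directly off a name of the form sssddddd.ttt. The sketch below only derives and validates the keys; the function names are illustrative assumptions, and the sample names follow the pattern in the text rather than any specific published product.

```python
import string


def product_index_path(name):
    """Return the six index keys (w, w, w, w, day, sss) for sssddddd.ttt."""
    root, _, ext = name.lower().partition(".")
    if len(root) != 8 or len(ext) != 3:
        raise ValueError(f"not an 8.3 product file name: {name!r}")
    center, week, day = root[:3], root[3:7], root[7]
    if not all(c in string.ascii_lowercase for c in center):
        raise ValueError(f"analysis center must be in [a,z]: {name!r}")
    if not all(c in string.digits for c in week):
        raise ValueError(f"GPS week must be four digits: {name!r}")
    if day not in "01234567":
        raise ValueError(f"day of week must be in [0,7]: {name!r}")
    return (week[0], week[1], week[2], week[3], day, center)


def is_weekly(name):
    """Day digit 7 marks a weekly solution; 0 to 6 mark daily solutions."""
    return product_index_path(name)[4] == "7"
```

The same nested-dictionary walk used for observation files applies here, with one more level.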
Five-level and six-level indexes are thus built for observation files and solution product files, respectively. Index construction follows the standard file formats, and the path of each file is stored in the last-level index;
Step 3: Split the built index by the 64 MB data block size. For observation files, because GNSS data processing software can merge the observation data of one day, the file sequence number is unified as 0, and the corresponding first-level index (file sequence number) is also 0. When splitting, for the second to fifth index levels of observation files and the first to sixth index levels of solution products, the size of the index block is computed bottom-up. When the sizes (IBlock) of the first i-1 and the first i indexes satisfy the following formula,
Formula (4)
the first i-1 indexes are saved as an independent index block. In this way, all the indexes built in Step 2 are split;
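The body of Formula (4) is not reproduced here, but the rule stated around it (and restated in the alternative implementation later in the text) is that a block closes when the first i-1 indexes still fit within the data block size and adding the i-th would exceed it. A minimal greedy sketch of that rule follows; the function name and the representation of the index as a flat list of entry sizes in bytes are assumptions, and the bottom-up size computation is taken as already done.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB data block size from the text


def split_index(entry_sizes, block_size=BLOCK_SIZE):
    """Group consecutive index entries into blocks no larger than block_size.

    Returns a list of blocks, each a list of entry positions.
    """
    blocks, current, used = [], [], 0
    for i, size in enumerate(entry_sizes):
        if size > block_size:
            raise ValueError(f"entry {i} alone exceeds the block size")
        # Adding entry i would overflow: close the block at entry i-1.
        if current and used + size > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        blocks.append(current)
    return blocks
```

With a toy block size of 10, `split_index([4, 4, 4, 10, 1], block_size=10)` closes the first block after two entries, because the third would push the total to 12.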
Step 4: Place each index block on the DataNode storing the data block or on the DataNode nearest to the data block, which improves read speed and further reduces NameNode memory consumption. The content of each split index block is matched against the names of the data blocks of the merged GNSS large files, level by level from top to bottom. When the index branches, the proportion of each index character at the branch is counted, and the character with the largest share of the index block is matched against the data blocks on the DataNodes; the node with the highest match rate becomes the storage node of the index block. Placing the index block on the node storing the data block or on the nearest node has two effects. First, it reduces communication overhead when reading data: once an index entry is found, the corresponding file content can be found on the local or an adjacent node, which improves read speed. Second, because the index is stored on DataNodes rather than on the NameNode, NameNode memory consumption is further reduced;
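The placement rule can be illustrated with a deliberately simplified sketch. Instead of the per-character branch statistics described above, it counts how many of the data blocks referenced by an index block each DataNode holds and picks the node with the highest share. The function name, the input shapes, and the tie-break by node identifier are assumptions, not the patent's procedure.

```python
from collections import Counter


def choose_node(referenced_blocks, block_locations):
    """Pick the DataNode holding the largest share of referenced data blocks.

    referenced_blocks: data block names that the index block points to.
    block_locations: dict mapping a data block name to the set of node ids
    that hold a replica of it.
    Returns (node id, match rate).
    """
    hits = Counter()
    for block in referenced_blocks:
        for node in block_locations.get(block, ()):
            hits[node] += 1
    if not hits:
        raise LookupError("no DataNode holds any referenced data block")
    # Ties go to the lowest node id so the result is deterministic.
    best = max(sorted(hits), key=lambda node: hits[node])
    return best, hits[best] / len(referenced_blocks)
```

Storing the index block on the returned node keeps most index lookups and the subsequent data reads on the same machine, which is the locality benefit the step aims for.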
Step 5: Store the file type index of the merged GNSS large files on the NameNode. For GNSS observation files, the index stored on the NameNode contains, in addition to the file type represented by one letter, the last two digits of the observation year. For solution product files, the index stored on the NameNode contains only the file type represented by three letters. Therefore, besides the data block replica counts and the large file name/path mapping, the NameNode also stores the file type/index block path mapping, whose key consists of three digits or letters, thereby realizing cloud storage of massive GNSS small files.
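Because both naming patterns put the type information in the three-character extension (yyt for observation files, ttt for solution products), the NameNode-side mapping can be sketched as a small table keyed by the extension. The class name `TypeIndex` and the example paths are hypothetical, and this is a plain in-memory stand-in, not a Hadoop API.

```python
def type_key(name):
    """Return the three-character type key (the file extension)."""
    _, sep, ext = name.lower().rpartition(".")
    if not sep or len(ext) != 3:
        raise ValueError(f"no three-character extension: {name!r}")
    return ext


class TypeIndex:
    """Stand-in for the file type to index block path mapping."""

    def __init__(self):
        self._paths = {}

    def register(self, key, index_block_path):
        if len(key) != 3 or not key.isalnum():
            raise ValueError(f"key must be three letters or digits: {key!r}")
        self._paths[key.lower()] = index_block_path

    def resolve(self, name):
        """Find the index block path for a small file name."""
        return self._paths[type_key(name)]
```

A read then proceeds in two hops: the type key resolves to an index block path on a DataNode, and the multi-level index inside that block resolves to the file's location.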
In a specific implementation, the present invention can also be realized as follows:
As shown in Fig. 1, the invention mainly comprises one NameNode as the master node and several DataNodes as storage nodes for file blocks and index blocks. The tasks of each DataNode include merging small files and building index blocks; one designated DataNode is responsible for merging the indexes and splitting the index blocks. The concrete steps are:
1) Merge the massive GNSS small files: massive GNSS small files comprise two classes, GNSS observation files and solution product files. Observation files are received by various receivers and converted by data format conversion software into standard RINEX files, mainly in the RINEX 2.0 and 3.0 formats; the file types comprise four classes: multi-system multi-frequency observation data, the navigation ephemeris of each system, satellite clock corrections, and observation summaries (summary files). Solution product files include precise ephemerides, precise clock corrections, earth rotation parameters, satellite yaw rates, coordinate files, and so on. They are computed by the analysis centers of the International GNSS Service (IGS) using high-precision GNSS data processing software, and their formats follow the SP3, SINEX, and IONEX standards;
Each observation file corresponds to an observation period and contains information such as the start time, end time, and sampling interval. The observation files of the same period can therefore first be merged by station name, and the observation files of different periods then merged in continuous observation time order. Each solution product corresponds to the period of the data solved and contains the start and end time of those data. The solution products corresponding to observation data of the same period can therefore be merged first, and the solution products of different periods then merged by continuous solution cycle. The name of a large file is marked with the file type, the start and end observation or solution time, and the first and last station name or analysis center name;
Each DataNode is responsible for merging the small files on that node;
2) Build indexes separately for the merged observation files and solution products. When building the index for observation files: observation data generally use the RINEX format, which uses the 8.3 naming convention, where 8 denotes a root name of 8 characters indicating what the file belongs to and 3 denotes an extension of 3 characters indicating the file type. The concrete form is ssssdddf.yyt. Using the file sequence number within the day (character f), the day of year (string ddd), and the station name (string ssss), a five-level index is therefore built top-down, in one-to-one correspondence between characters and index entries, and the path of the observation file is stored at the leaf node of the last-level index. As shown in the observation file index of Fig. 2, the first-level index is the file sequence number; its range consists of the two intervals [0,9] and [a,z], where [0,9] denotes the 10 Arabic digits and [a,z] the 26 lowercase English letters. The second-level index is the hundreds digit of the day of year, with range [0,3], that is, the 4 digits 0 to 3. The third-level index is the tens digit of the day of year, with range [0,9]. The fourth-level index is the units digit of the day of year, with range [0,9]. The fifth-level index is the four-character station name, each character of which falls in the intervals [0,9] and [a,z];
Solution product files are saved in the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d in ddddd are the GPS week counted from 0 h on January 6, 1980, the last d is the day of week, and ttt is the type of solution product. A six-level index is built from the GPS week, the day of week, and the analysis center name, and the location of the solution product file is stored at the leaf node of the index. As shown in the solution product file index of Fig. 3, the first- to fourth-level indexes are the thousands, hundreds, tens, and units digits of the GPS week, each with range [0,9]. The fifth-level index is the day of the GPS week, with range [0,7], where the 7 integers in [0,6] denote the daily solution files of the week and 7 denotes the weekly solution file. The sixth-level index is the three-character analysis institution name, each character of which falls in the interval [a,z];
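The GPS week and day-of-week digits used in the product name can be derived from a calendar date by counting whole days from January 6, 1980. The sketch below works at date resolution only and ignores leap seconds; the function names and the sample center and type are illustrative assumptions.

```python
from datetime import date

GPS_EPOCH = date(1980, 1, 6)  # start of GPS week 0, a Sunday


def gps_week_and_day(day):
    """Return (GPS week, day of week) with day 0 falling on Sunday."""
    elapsed = (day - GPS_EPOCH).days
    if elapsed < 0:
        raise ValueError("date precedes the GPS epoch")
    return divmod(elapsed, 7)


def product_name(center, day, ftype):
    """Compose a name of the form sssddddd.ttt for a daily product."""
    week, dow = gps_week_and_day(day)
    return f"{center}{week:04d}{dow}.{ftype}"
```

For instance, April 27, 2015 falls in GPS week 1842 on day 1, so `product_name("igs", date(2015, 4, 27), "sp3")` gives `igs18421.sp3`.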
Each DataNode is responsible for building the index of the small files on that node. After the indexes are built, they are merged on another designated DataNode;
3) Split the index into blocks: the index built in step 2 is split by the data block size (64 MB). For observation files, because GNSS data processing software can merge the observation data of one day, the file sequence number can be unified as 0, and the corresponding first-level index (file sequence number) is also 0. When splitting, for the second to fifth index levels of observation files and the first to sixth index levels of solution products, the size of the index block is computed bottom-up. When the index size first exceeds the data block size, the split steps back one index and saves the indexes up to that point as an independent index block. In this way, all the indexes built in step 2 are split;
Index splitting is carried out on the DataNode that merged the indexes in step 2;
4) Store the index blocks: each index block split in step 3 is stored on the DataNode holding the corresponding data block or on the DataNode nearest to the data block. The content of the index block is matched against the names of the data blocks of the merged GNSS large files, level by level. When the index block branches, the proportion of each index character at the branch is counted, and the character with the largest share at that index level is matched against the names of the data blocks on the DataNodes; the node with the highest match rate becomes the storage node of the index block;
5) Store the file type index/index block path on the NameNode, as shown in the storage location schematic for observation files and solution product files in Fig. 4. For GNSS observation files, the index stored on the NameNode contains, in addition to the file type represented by one letter, the last two digits of the observation year. For solution product files, the index stored on the NameNode contains only the file type represented by three letters. The one-to-one address mapping between each file type index and its index block is stored on the NameNode, which completes the mapping of the indexes built above. Therefore, besides the data block replica counts and the large file name/path mapping, the NameNode also stores the file type/index block path mapping, whose key consists of three digits or letters, thereby realizing cloud storage of massive GNSS small files.
The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any simple change or equivalent replacement of the technical solution that would be obvious to a person skilled in the art within the technical scope disclosed by the invention falls within the scope of protection of the invention.
From the foregoing, the present invention is a new cloud storage method for massive GNSS small files that supports their efficient storage, management, query, and sharing. In an experiment, a cluster of 9 nodes was built, with 1 node as the NameNode and the remaining 8 as DataNodes and the replication factor set to 3, and the write, read, and delete speeds for massive GNSS small files were tested. In the test, compared with the traditional HDFS method, the proposed small file storage method greatly saved storage space, reduced memory consumption by 1/2, and improved write speed by about 4 times, read speed by about 3 times, and delete speed by about 2.5 times. Results in practice depend closely on the scale of the storage system, the performance of each node, the network environment, and the size and type of the data. Compared with the prior art, the invention therefore has the following notable advantages:
(1) Saves storage space
According to GNSS data types and characteristics, the invention merges the observation data and solution products of continuous observation periods into large files. This improves on the situation in the Hadoop Distributed File System (HDFS) in which each small file occupies an entire data block: after merging and splitting, each data block of a large file fills the size of one data block, which effectively saves DataNode storage space and improves storage utilization.
(2) Reduces memory consumption
Following the naming rules of GNSS observation files and solution products, the invention builds an index over the merged large files and stores each file's path at the leaf node of the index. On the one hand, merging small files greatly reduces the number of data blocks in the storage system and thus the NameNode memory overhead. On the other hand, after the index over the merged large files is built and split, the index blocks are stored on DataNodes, and the NameNode keeps only the file type (file extension)/index path mapping and the large file name/file path mapping, which further reduces NameNode memory consumption.
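The memory argument can be checked with simple arithmetic, using the figures the description gives: 150 B of NameNode memory per small file before merging and 150 B per 64 MB data block after merging. The sketch below is a back-of-the-envelope estimate only. The function names and the example workload are assumptions, and the small type-index and file-name mappings that remain on the NameNode are ignored.

```python
BYTES_PER_ENTRY = 150          # NameNode memory per file or block, per the text
BLOCK_SIZE = 64 * 1024 ** 2    # 64 MB data block


def namenode_bytes_unmerged(n_files):
    """Each small file costs one entry before merging."""
    return n_files * BYTES_PER_ENTRY


def namenode_bytes_merged(total_bytes):
    """After merging, each 64 MB block of the large files costs one entry."""
    n_blocks = (total_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    return n_blocks * BYTES_PER_ENTRY
```

For a hypothetical workload of one million files of 1 MiB each, the unmerged estimate is 150,000,000 B, while the merged data fit in 15,625 blocks and cost 2,343,750 B.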
(3) Write, read and delete efficiency is improved
By merging GNSS small files and indexing the merged files, the proposed method establishes an efficient storage mechanism. It reduces communication between the client and the name node (NameNode), between the name node (NameNode) and the data nodes (DataNode), and between the client and the data nodes (DataNode), and shortens the response time of queries and retrieval, thereby improving write, read and delete efficiency.
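A two-stage lookup gives a rough picture of why fewer name node round trips are needed. The sketch assumes solution result names of the form sssddddd.ttt and the six index levels described in the claims; the two dictionaries stand in for the name node's type-to-index-block mapping and for index blocks held on data nodes. All names (`product_index_key`, `resolve`, the block identifier, the paths) are hypothetical.

```python
def product_index_key(filename):
    """Split a name of the form sssddddd.ttt into a six-level key and its type.

    Levels 1-4: digits of the GPS week; level 5: day of week (7 marks a
    weekly solution); level 6: three-character analysis center name.
    """
    root, _, ext = filename.lower().partition(".")
    if len(root) != 8:
        raise ValueError("expected a name of the form sssddddd.ttt: " + filename)
    center, week, day = root[:3], root[3:7], root[7]
    if not week.isdigit() or day not in "01234567":
        raise ValueError("invalid GPS week or day of week: " + filename)
    return (week[0], week[1], week[2], week[3], day, center), ext


def resolve(filename, type_to_index_block, index_blocks):
    """Resolve a file location with a single name node style lookup.

    type_to_index_block: file type -> index block id (kept on the name node).
    index_blocks: index block id -> {key: location} (kept on data nodes).
    """
    key, ext = product_index_key(filename)
    block_id = type_to_index_block.get(ext)  # the only step involving the name node
    if block_id is None:
        return None
    return index_blocks[block_id].get(key)
```

In this sketch the name node is consulted once per file type, and everything after that is answered from the index block, which the method places on or near the node holding the data.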
(4) Easy to extend
The proposed method is widely applicable: all kinds of GNSS observation files and solution result files can be stored efficiently after being merged, indexed and split into blocks. Newly added GNSS data and result formats can likewise be brought into the small-file storage system of the present invention after merging by data type and characteristics, index construction and block storage. The method therefore has broad applicability and strong extensibility, addresses the bottlenecks and challenges facing existing GNSS small-file storage, and delivers high storage efficiency. It can be applied effectively in the technical field of "Geodesy and Survey Engineering" within the discipline of "Surveying Science and Technology", achieving efficient storage, management, publication and sharing of massive GNSS small files, with great economic and social benefit.

Claims (5)

  1. A cloud storage method for massive GNSS small files, characterized in that massive GNSS small files are first merged into big files and an index is built for each merged big file; the index-block storage strategy is optimized so that the file blocks and index blocks obtained by splitting are stored on the node holding the data block or on the data node nearest to that data block, while the index of GNSS data types is stored on the name node, thereby reducing the consumption of storage capacity and the memory consumption of the name node and improving the performance of writing, accessing and deleting large numbers of small files; the method specifically comprises the following steps:
    (1) Merge massive GNSS small files into big files to reduce the name node memory occupied by large numbers of small files. Files of the same observation period or solution time and of the same type are merged first: observation files are merged in the order of the four letters of the station name, and solution result files are merged in the order of the three letters of the GNSS analysis center name; a large number of GNSS observation files are thereby merged into a big observation file covering continuous observation periods, and solution result files are merged into a big solution result file whose solution times are sequentially continuous;
    (2) Build indexes for the merged GNSS big files, that is, build indexes for observation files and solution results separately, with characters mapped one-to-one to index entries. For observation files, a five-level index is built from the file sequence number, the day of year and the station name, and the location of the observation file is stored in the last index level. The first-level index is the file sequence number, whose range consists of the two intervals [0,9] and [a,z], where [0,9] denotes the 10 Arabic digits and [a,z] the 26 lowercase English letters; the second-level index is the hundreds digit of the day of year, with range [0,3], denoting 4 Arabic digits; the third-level index is the tens digit of the day of year, with range [0,9]; the fourth-level index is the units digit of the day of year, with range [0,9]; the fifth-level index is the four-character station name, each character of which falls within [0,9] or [a,z]. For solution result files, a six-level index is built from the GPS week, the day of week and the analysis center name, and the location of the solution result file is stored in the last index level. The first- to fourth-level indexes are the thousands, hundreds, tens and units digits of the GPS week, each with range [0,9]; the fifth-level index is the day of the GPS week, with range [0,7], where the 7 integers in [0,6] denote the daily solution files of one week and the digit 7 denotes the weekly solution file; the sixth-level index is the three-character analysis center name, each character of which falls within [a,z];
    (3) Split the index thus built according to the data block size. Because GNSS data processing software can merge the observation data of one day, the file sequence number can be unified to 0, and the first-level index (file sequence number) of the corresponding observation file is therefore also 0. When splitting the index, the sizes of the second- to fifth-level indexes of observation files and of the first- to sixth-level indexes of solution results are computed bottom-up, and the index is split into index blocks of 64 MB;
    (4) Place each index block on the node that stores the data block or on the node nearest to the data block, which improves file read speed and further reduces the memory consumption of the name node;
    (5) Store the index of the file types of the merged GNSS big files on the name node: the file-block path mapping, and the mapping from the three characters that characterize the observation file or solution result type to the index block path, are stored on the name node, while the file blocks and index blocks are stored on data nodes, thereby realizing cloud storage of massive GNSS small files.
  2. The cloud storage method for massive GNSS small files according to claim 1, characterized by comprising the following steps:
    Step 1: Merge massive GNSS small files into big files to reduce the name node memory occupied by large numbers of small files. Massive GNSS small files comprise two types of files: observation files, represented by observation data, navigation ephemerides and meteorological files; and solution result files, represented by coordinate files, precise ephemerides and precise clock offsets. Both observation files and solution result files use internationally unified standard formats: observation files use the receiver-independent exchange format, and solution results use the solution-independent exchange format, the ionosphere exchange format and the precise ephemeris data format. Suppose n GNSS small files are stored in the system; each GNSS small file carries three kinds of parameters, namely location, time and file type, and the data are distinguished by these parameters. The GNSS small-file data set D is expressed as:
    $$D = \{\, d(L_i, T_j, I_k) \mid L_i \in L,\ T_j \in T,\ I_k \in I \,\}, \quad i, j, k \in \mathbb{Z} \tag{1}$$
    where L denotes the location information of where a file was produced, mainly the station that collected an observation file or the institution that produced a solution result file; T denotes the time stamp of when a file was produced, and because stations observe continuously for 24 h and data centers compute and publish solutions continuously on schedule, T is a continuous time series; I denotes the file type, defined by the standard formats above. L and T are obtained from the file name and from the header section recorded in the file, and I is obtained from the file extension and from the header section recorded in the file. D denotes the set; i, j and k are the indices of the file location, time and type parameters respectively; and Z is the set of integers;
    When merging small files, files of the same observation period or solution time and of the same type are first merged in the order of the four characters of the station name or the three characters of the analysis center name. The merged GNSS small-file set is expressed as:
    where $T_j$ denotes the j-th observation period or solution time, $I_k$ denotes the k-th file type, and Z is the set of integers;
    Then, the small files of each type are merged in order of continuous observation periods or solution times. Because weekly solutions are of general significance in GNSS measurement, the observation files of 7 consecutive days and the daily solution files of 7 days are each merged into one big file, which can be expressed as:
    Through the two merging steps above, the GNSS observation files of 7 consecutive days are merged into one big observation file covering continuous observation periods, and the solution result files of 7 days are merged into one big solution result file whose solution times are sequentially continuous. The big file is named using the file type, the start and end observation or solution times, and the first and last station names or analysis center names. The merged file is stored in the cloud storage system in blocks, with the data block size set to 64 MB. Each data block is a set of multiple small files and occupies 150 B of name node memory; compared with each small file occupying 150 B of memory before merging, this substantially reduces the memory consumption of the name node;
    Step 2: Build indexes for the merged GNSS big files, that is, build indexes for observation files and solution results separately according to L and T, as follows:
    For observation files: observation files are stored in RINEX format, which uses the 8.3 naming convention, where 8 denotes an 8-character root name indicating the origin of the file and 3 denotes a 3-character extension indicating the file type. The specific form is ssssdddf.yyt, where ssss is the 4-character station name, ddd is the day of year and f is the file sequence number within the day. Using the character f for the file sequence number within the day, the string ddd for the day of year and the string ssss for the station name, a five-level index is built from top to bottom with characters mapped one-to-one to index entries, and the location of the observation file is stored in the terminal node of the last index level. The first-level index is the file sequence number, whose range consists of the two intervals [0,9] and [a,z], where [0,9] denotes the 10 Arabic digits and [a,z] the 26 lowercase English letters; the second-level index is the hundreds digit of the day of year, with range [0,3], denoting 4 Arabic digits; the third-level index is the tens digit of the day of year, with range [0,9]; the fourth-level index is the units digit of the day of year, with range [0,9]; the fifth-level index is the four-character station name, each character of which falls within [0,9] or [a,z];
    For solution result files: these are stored under names of the form sssddddd.ttt, where sss is the three-character abbreviation of the analysis center, the first four d of ddddd are the GPS week counted from 0 h on January 6, 1980, the last d is the day of week, and ttt is the type of solution result. A six-level index is built from the GPS week, the day of week and the analysis center name, and the location of the solution result file is stored in the terminal node of the index. The first- to fourth-level indexes are the thousands, hundreds, tens and units digits of the GPS week, each with range [0,9]; the fifth-level index is the day of the GPS week, with range [0,7], where the 7 integers in [0,6] denote the daily solution files of one week and the digit 7 denotes the weekly solution file; the sixth-level index is the three-character analysis center name, each character of which falls within [a,z];
    Step 3: Split the index thus built by the 64 MB data block size. For observation files, because GNSS data processing software can merge the observation data within one day, the file sequence number is unified to 0, and the corresponding first-level index (file sequence number) is 0. When splitting the index, the size of the index blocks is computed bottom-up over the second- to fifth-level indexes of observation files and the first- to sixth-level indexes of solution results; when the sizes of the first i-1 and the first i indexes satisfy the following formula:
    the first i-1 indexes are saved as an independent index block; proceeding in this way completes the splitting of all indexes built in step 2;
    Step 4: Place each index block on the data node that stores the data block or on the data node nearest to the data block, which improves read speed and further reduces the memory consumption of the name node. The content of each split index block is matched against the names of the data blocks of the merged GNSS big file, level by level from top to bottom. When the index branches, the proportion of each index character at the branch point is counted, and the character accounting for the largest share of the index block is matched against the data blocks on the data nodes; the node with the highest matching rate is taken as the storage node of the index block. When an index block is placed on the node storing the data block or on the node nearest to the data block, on the one hand the communication overhead of reading data is reduced, since once an index entry is found the corresponding file content can be found locally or on an adjacent node, which improves read speed; on the other hand, because the index is stored not on the name node but on data nodes, the memory consumption of the name node is further reduced;
    Step 5: Store the index of the file types of the merged GNSS big files on the name node. For GNSS observation files, the index stored on the name node contains, besides the file type denoted by one letter, the last two digits of the observation year; for solution result files, the index stored on the name node contains only the file type denoted by three letters. Therefore, in addition to the number of data block replicas and the big-file name/path mapping already stored, the mapping from file type (three digits or letters) to index block path is also stored on the name node, thereby realizing cloud storage of massive GNSS small files.
  3. The cloud storage method for massive GNSS small files according to claim 2, characterized in that the massive GNSS small files of step 1 comprise GNSS observation files and solution result files, all of which follow internationally unified standard formats; because GNSS data and result formats are upgraded continually, upgraded file formats and newly proposed file types can also be brought within the scope of GNSS small files.
  4. The cloud storage method for massive GNSS small files according to claim 2, characterized in that, in step 1, files of the same observation period or solution time and of the same type are merged; the files may also first be merged by identical observation period or solution date and then merged by continuous observation periods or solution cycles; the big file is named using the file type, the start and end observation or solution times, and the first and last station names or analysis center names; after merging, the big file is stored in the cloud storage system in blocks, with the data block size set to 64 MB, each data block being a set of multiple GNSS small files.
  5. The cloud storage method for massive GNSS small files according to claim 2, characterized in that, in step 2, five-level and six-level indexes are built for observation files and solution result files respectively; the indexes are built in accordance with the standard file formats, and the location of the file is stored in the last index level.
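The index splitting recited in step 3 can be pictured with a short sketch. The exact splitting condition is given by a formula that is not reproduced in this text, so the rule used here is an assumption: entries are accumulated in order, and a block is closed as soon as adding the next entry would exceed the block limit. The function name `cut_index` and the sample sizes are hypothetical.

```python
def cut_index(entry_sizes, block_limit=64 * 1024 * 1024):
    """Greedily cut an ordered list of index entry sizes into blocks.

    Assumed rule (not taken from the patent formula): when the first i-1
    pending entries fit within block_limit but adding entry i would not,
    the first i-1 entries are saved as one independent index block.
    Returns a list of blocks, each a list of entry positions.
    """
    blocks, current, used = [], [], 0
    for position, size in enumerate(entry_sizes):
        if size > block_limit:
            raise ValueError("a single index entry exceeds the block limit")
        if current and used + size > block_limit:
            blocks.append(current)  # close the block before it overflows
            current, used = [], 0
        current.append(position)
        used += size
    if current:
        blocks.append(current)
    return blocks
```

For example, with a limit of 64 units, entry sizes 40, 30, 20 and 50 are cut into three blocks holding entries [0], [1, 2] and [3].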
CN201510204235.7A 2015-04-24 2015-04-24 Magnanimity GNSS small documents cloud storage methods Expired - Fee Related CN104765876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510204235.7A CN104765876B (en) 2015-04-24 2015-04-24 Magnanimity GNSS small documents cloud storage methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510204235.7A CN104765876B (en) 2015-04-24 2015-04-24 Magnanimity GNSS small documents cloud storage methods

Publications (2)

Publication Number Publication Date
CN104765876A CN104765876A (en) 2015-07-08
CN104765876B true CN104765876B (en) 2017-11-10

Family

ID=53647703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510204235.7A Expired - Fee Related CN104765876B (en) 2015-04-24 2015-04-24 Magnanimity GNSS small documents cloud storage methods

Country Status (1)

Country Link
CN (1) CN104765876B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608212B (en) * 2015-12-30 2020-02-07 成都国腾实业集团有限公司 Method and system for ensuring that MapReduce data input fragment contains complete record
CN106970928B (en) * 2016-01-14 2020-12-29 平安科技(深圳)有限公司 File management method and system
CN105843841A (en) * 2016-03-07 2016-08-10 青岛理工大学 Small file storage method and system
CN107402924A (en) * 2016-05-19 2017-11-28 普天信息技术有限公司 MR files apply the implementation method and device in HDFS
CN106528451B (en) * 2016-11-14 2019-09-03 哈尔滨工业大学(威海) The cloud storage frame and construction method prefetched for the L2 cache of small documents
CN107391423A (en) * 2017-07-26 2017-11-24 Tcl移动通信科技(宁波)有限公司 Method, storage medium and the mobile terminal of file are transmitted by OTG functions
CN109947703A (en) * 2017-11-09 2019-06-28 北京京东尚科信息技术有限公司 File system, file memory method, storage device and computer-readable medium
CN109947721B (en) * 2017-12-01 2021-08-17 北京安天网络安全技术有限公司 Small file processing method and device
CN108460121B (en) * 2018-01-22 2022-02-08 重庆邮电大学 Little file merging method for space-time data in smart city
CN108470577B (en) * 2018-02-02 2021-07-27 重庆金山医疗器械有限公司 Capsule endoscopy system data storage method
CN109033137B (en) * 2018-06-06 2021-11-05 千寻位置网络有限公司 Dynamic RINEX data storage method and device
CN109800184B (en) * 2018-12-12 2024-06-25 平安科技(深圳)有限公司 Caching method, system, device and storable medium for small block input
CN110795391A (en) * 2019-10-28 2020-02-14 深圳市元征科技股份有限公司 Automobile repair data processing method and device, electronic equipment and storage medium
CN111159120A (en) * 2019-12-16 2020-05-15 西门子电力自动化有限公司 Method, device and system for processing files in power system
CN111461537A (en) * 2020-03-31 2020-07-28 山东胜软科技股份有限公司 Oil gas production data based classified quantity counting method and control system
CN111475463B (en) * 2020-04-01 2023-02-24 中国人民解放军火箭军工程大学 GNSS observation data digital relation storage method
CN111400247B (en) * 2020-04-13 2023-08-01 杭州九州方园科技有限公司 User behavior auditing method and file storage method
CN112347045B (en) * 2020-11-30 2022-07-26 长春工程学院 Storage method of mass cable tunnel state signal data
CN113032348A (en) * 2021-05-25 2021-06-25 湖南省第二测绘院 Spatial data management method, system and computer readable storage medium
CN113420186B (en) * 2021-06-18 2022-10-04 自然资源部第三地形测量队 Data storage method, data storage device, computer readable storage medium and data reading method
CN114416811A (en) * 2021-12-07 2022-04-29 中国科学院国家授时中心 Distributed storage system for GNSS data
CN116150113A (en) * 2023-04-17 2023-05-23 江苏北斗信创科技发展有限公司 Data storage method for GNSS

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 Mass non-independent small file associated storage method based on Hadoop
CN102662992A (en) * 2012-03-14 2012-09-12 北京搜狐新媒体信息技术有限公司 Method and device for storing and accessing massive small files
WO2014000458A1 (en) * 2012-06-28 2014-01-03 华为技术有限公司 Small file processing method and device
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS
CN103856567A (en) * 2014-03-26 2014-06-11 西安电子科技大学 Small file storage method based on Hadoop distributed file system
CN104346384A (en) * 2013-07-31 2015-02-11 上海云端广告有限公司 Method and device for processing small files

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An optimized approach for storing and accessing small files on cloud storage;Bo Dong等;《Journal of Network and Computer Applications》;20120724;第35卷(第6期);第1847-1862页 *
基于Hadoop的海量小文件存储方法的研究;时倩等;《数字技术与应用》;20140115(第01期);第50,52页 *

Also Published As

Publication number Publication date
CN104765876A (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN104765876B (en) Magnanimity GNSS small documents cloud storage methods
CN109635068A (en) Mass remote sensing data high-efficiency tissue and method for quickly retrieving under cloud computing environment
CN107423422B (en) Spatial data distributed storage and search method and system based on grid
CN110291518A (en) Merge tree garbage index
US20120197900A1 (en) Systems and methods for search time tree indexes
CN110383261A (en) Stream for multithread storage device selects
US7533112B2 (en) Context hierarchies for address searching
CN106933833B (en) Method for quickly querying position information based on spatial index technology
CN105160039A (en) Query method based on big data
CN106909644A (en) A kind of multistage tissue and indexing means towards mass remote sensing image
CN109684428A (en) Spatial data building method, device, equipment and storage medium
CN105117502A (en) Search method based on big data
CN102982103A (en) On-line analytical processing (OLAP) massive multidimensional data dimension storage method
CN103678491A (en) Method based on Hadoop small file optimization and reverse index establishment
CN106599040A (en) Layered indexing method and search method for cloud storage
CN108804602A (en) A kind of distributed spatial data storage computational methods based on SPARK
US20130073553A1 (en) Information management method and information management apparatus
CN103399945A (en) Data structure based on cloud computing database system
CN108009265B (en) Spatial data indexing method in cloud computing environment
CN104199860A (en) Dataset fragmentation method based on two-dimensional geographic position information
CN106202378A (en) The immediate processing method of a kind of streaming meteorological data and system
CN104021210B (en) Geographic data reading and writing method of MongoDB cluster of geographic data stored in GeoJSON-format semi-structured mode
CN103678657B (en) Method for storing and reading altitude data of terrain
CN103970842A (en) Water conservancy big data access system and method for field of flood control and disaster reduction
CN104008209B (en) Reading-writing method for MongoDB cluster geographic data stored with GeoJSON format structuring method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171110

Termination date: 20180424
