CN105718499B - Geologic information data cleaning method and system - Google Patents
- Publication number
- CN105718499B CN201510920801.4A
- Authority
- CN
- China
- Prior art keywords
- file
- format
- data
- information
- geologic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Abstract
The present invention provides a geologic information data cleaning method and system. The method comprises: a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements for such files; a file format verification step of verifying and recording the file formats of the geologic information data retained after the file name verification step; and a file information acquisition step of recording, after the file format verification step, the corresponding format and configuration information for each file of the recorded geologic information data. With the method and system of the embodiments of the present invention, file name verification, file format verification and file information acquisition can be performed on geologic information data in sequence, so that multi-source, heterogeneous geologic information data from a wide range of sources can be cleaned automatically, yielding a fast, high-quality and information-rich data cleaning result.
Description
Technical field
The present invention relates to the field of geographic information systems (GIS), and in particular to a geologic information data cleaning method and system.
Background technique
Geologic information is an important basic information resource formed through geological work; it can be developed and utilized repeatedly and can provide services over the long term. Although a Ministry of Land and Resources document (Guotuzifa [2006] No. 210) specifies the submission format requirements for electronic files of geological achievement data, the result documents of the various kinds of professional and technical work differ from one another, the details of the technical requirements have not been refined, and the competence and attitude of the submitting units vary. The submitted data received therefore exhibit all kinds of heterogeneity, inconsistency and quality problems, such as inconsistency between the data and the catalogue, invalid entries in the data storage directory, or duplicate archive identifiers.
Geologic information data have distinctive working characteristics and application requirements across the whole process of compilation, receipt, management, processing and service. In the past, at every stage from submission through management to consultation and use, one of two situations applied. Either the management approach was rather coarse, for example storing data as one folder per archive while leaving the organization of the files within each folder to the submitter and performing no finer subdivision, which makes fine-grained data management difficult; or the technical methods and tools used had a low degree of automation, so that most of the work still relied on manual cleaning. This situation greatly limits the efficiency of data management, reduces the utilization of geologic information, and hinders the development of national geological work.
The data cleaning schemes in common use today generally target structured data; schemes for multi-source heterogeneous data are rare. Data cleaning generally comprises two steps or modules: data detection and data correction. Data detection finds file errors (including incomplete data and abnormal data) as well as duplicate and near-duplicate records and, after statistics are gathered, produces comprehensive dirty-data information. Duplicate and near-duplicate records are generally detected by operations such as field matching and record matching. In the correction step, the detected dirty data are cleaned, usually by deleting or replacing incomplete or duplicate data after manual judgment, so that the errors in the files are corrected.
In existing data cleaning schemes, cleaning is usually carried out according to predefined cleaning algorithms and cleaning rules provided by an algorithm library or rule base. In practice, however, the algorithms and rules often have to be redefined and adjusted for the different problems encountered, so the rules of the prior art are difficult to make general.
In addition, for large volumes of erroneous data the prior art cannot provide effective cleaning suggestions or statistics. The data generally have to be handed to the user for manual processing, which is time-consuming and laborious and makes quality hard to guarantee.
Furthermore, statistics and analysis of the error types in the data, and other statistical information, are difficult to obtain with current schemes.
Summary of the invention
Technical problem
In view of this, the technical problem to be solved by the present invention is how to automatically clean multi-source, heterogeneous geologic information data from a wide range of sources.
Solution
To solve the above technical problem, according to one embodiment of the present invention, a geologic information data cleaning method is provided, comprising:
a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements for geologic information files to be processed;
a file format verification step of verifying and recording the file formats of the geologic information data retained after the file name verification step; and
a file information acquisition step of recording, after the file format verification step, the corresponding format and configuration information for each file of the recorded geologic information data.
For the above geologic information data cleaning method, in one possible implementation, the file name verification step comprises:
judging the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verifying each of the characters in its file name.
For the above geologic information data cleaning method, in one possible implementation, verifying each of the characters in the file name of the geologic information file to be processed, where that file is valid, comprises:
verifying whether each character in the file name is a valid character, recording files that contain invalid characters, and predicting the likely intended characters;
judging, according to the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform;
judging, according to the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
For the above geologic information data cleaning method, in one possible implementation, the file format verification step comprises:
identifying and recording the file format of each file in the geologic information data retained after the file name verification step;
where files with the same file name but different file formats exist, determining the main format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
judging and recording whether the file header information of each file can be read and whether the content of each file can be opened.
For the above geologic information data cleaning method, in one possible implementation, the file information acquisition step comprises:
for files in a spatial data format, recording the format, the version number, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each layer;
for files in a structured data format, recording the format, version number, record count, field count and data volume;
for files of vector data or cartographic data, recording the format, version number and expression auxiliary information library information;
for files in a table data format, recording the format, version number, record count, field count and data volume;
for files in a document data format, recording the format, version number, character count and data volume; and
for files in a raster data format, recording the format, compression ratio, dot matrix and data volume.
To solve the above technical problem, according to another embodiment of the present invention, a geologic information data cleaning system is provided, comprising:
a file name verification module, configured to verify the file name of each geologic information file to be processed according to the submission format requirements for geologic information files to be processed;
a file format verification module, connected to the file name verification module and configured to verify and record the file formats of the geologic information data retained after the file name verification module has processed the geologic information data; and
a file information acquisition module, connected to the file format verification module and configured to record the corresponding format and configuration information for each file of the recorded geologic information data.
For the above geologic information data cleaning system, in one possible implementation, the file name verification module is configured to:
judge the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verify each of the characters in its file name.
For the above geologic information data cleaning system, in one possible implementation, verifying each of the characters in the file name of the geologic information file to be processed, where that file is valid, comprises:
verifying whether each character in the file name is a valid character, recording files that contain invalid characters, and predicting the likely intended characters;
judging, according to the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform;
judging, according to the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
For the above geologic information data cleaning system, in one possible implementation, the file format verification module is configured to:
identify and record the file format of each file in the geologic information data retained after processing by the file name verification module;
where files with the same file name but different file formats exist, determine the main format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
judge and record whether the file header information of each file can be read and whether the content of each file can be opened.
For the above geologic information data cleaning system, in one possible implementation, the file information acquisition module is configured to:
for files in a spatial data format, record the format, the version number, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each layer;
for files in a structured data format, record the format, version number, record count, field count and data volume;
for files of vector data or cartographic data, record the format, version number and expression auxiliary information library information;
for files in a table data format, record the format, version number, record count, field count and data volume;
for files in a document data format, record the format, version number, character count and data volume; and
for files in a raster data format, record the format, compression ratio, dot matrix and data volume.
Beneficial effect
With the geologic information data cleaning method and system of the embodiments of the present invention, file name verification, file format verification and file information acquisition can be performed on geologic information data in sequence, so that detection, correction and information collection are carried out on the data automatically. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding a fast, high-quality and information-rich data cleaning result. The method and system can turn poorly organized, locally disordered problem data and dirty data of all kinds into a geologic information database that is easy to manage and serve; that is, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve them effectively.
Other features and aspects of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the present invention together with the specification, and serve to explain the principles of the present invention.
Fig. 1 shows a flow chart of a geologic information data cleaning method according to one embodiment of the present invention;
Fig. 2 shows a flow chart of a geologic information data cleaning method according to another embodiment of the present invention;
Fig. 3 shows a schematic diagram of the file name of an electronic file of geological achievement data;
Fig. 4 shows a structural block diagram of a geologic information data cleaning system according to one embodiment of the present invention.
Specific embodiment
Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" as used herein means "serving as an example, embodiment or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.
In addition, numerous specific details are given in the following embodiments in order to better illustrate the present invention. Those skilled in the art will understand that the present invention can be practiced without certain of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.
Term definition
Geologic information: information formed in geological work that is worth preserving for the state and society, comprising geological and mineral information and physical materials in forms such as text, charts, audio and video, specimens, samples and rock cores; it is divided into three categories: original geologic information, achievement geologic information and physical geologic information;
Geological achievement data: the complete set of scientific and technical documents reflecting the results, provided in the form of text, figures, tables, multimedia, databases, software and the like upon completion of geological work and scientific research projects of all kinds, in accordance with the relevant technical specifications and the original project design requirements;
Geologic information data: digital code sequences that carry geological achievement information, can be recognized and processed by a computer system, are stored in a given format on magnetic or optical media, and can be transmitted between computers.
Embodiment 1
Fig. 1 shows a flow chart of a geologic information data cleaning method according to one embodiment of the present invention. As shown in Fig. 1, the geologic information data cleaning method may mainly comprise the following steps:
File name verification step S100: verifying the file name of each geologic information file to be processed according to the submission format requirements for geologic information files to be processed;
File format verification step S200: verifying and recording the file formats of the geologic information data retained after the file name verification step; and
File information acquisition step S300: after the file format verification step, recording the corresponding format and configuration information for each file of the recorded geologic information data.
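As a rough illustration of how the three steps chain together, the following Python sketch passes a list of file names through S100, S200 and S300 in order. The function `clean` and its three callback parameters (`name_ok`, `detect_format`, `collect_info`) are hypothetical names introduced here for illustration; the text does not prescribe a particular implementation.

```python
from typing import Callable, Dict, List, Optional


def clean(
    names: List[str],
    name_ok: Callable[[str], bool],                 # S100 check (hypothetical callback)
    detect_format: Callable[[str], Optional[str]],  # S200: format label, or None if unrecognized
    collect_info: Callable[[str, str], dict],       # S300: format/configuration information
) -> Dict[str, object]:
    # S100: keep files whose names pass; remember the rest for manual follow-up.
    retained, rejected = [], []
    for name in names:
        (retained if name_ok(name) else rejected).append(name)

    # S200: identify and record the format of every retained file.
    formats = {name: detect_format(name) for name in retained}
    unrecognized = [name for name, fmt in formats.items() if fmt is None]

    # S300: collect information only for files whose format was recorded.
    info = {
        name: collect_info(name, fmt)
        for name, fmt in formats.items()
        if fmt is not None
    }
    return {"rejected": rejected, "unrecognized": unrecognized, "info": info}


# Toy callbacks, assumed for the example only.
result = clean(
    ["Z01_0001.doc", "notes.tmp", "J01_0003.xyz"],
    name_ok=lambda n: len(n.split(".")[0]) == 8,
    detect_format=lambda n: {"doc": "document"}.get(n.rsplit(".", 1)[-1]),
    collect_info=lambda n, f: {"format": f},
)
```

Keeping the rejected and unrecognized files in the result, rather than dropping them, mirrors the method's emphasis on recording problem files for later manual review.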
In this way, with the geologic information data cleaning method of this embodiment, file name verification, file format verification and file information acquisition can be performed on geologic information data in sequence, so that detection, correction and information collection are carried out on the data automatically. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding a fast, high-quality and information-rich data cleaning result. The method can turn poorly organized, locally disordered problem data and dirty data of all kinds into a geologic information database that is easy to manage and serve; that is, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve them effectively.
Embodiment 2
Fig. 2 shows a flow chart of a geologic information data cleaning method according to another embodiment of the present invention. Steps in Fig. 2 with the same labels as in Fig. 1 have the same functions; for brevity, their detailed description is omitted.
As shown in Fig. 2, the method shown in Fig. 2 differs from the method shown in Fig. 1 mainly in that the file name verification step S100 may mainly comprise the following steps:
Step S1001: judging the validity of the geologic information file to be processed according to the length of its file name; and
Step S1002: where the geologic information file to be processed is valid, verifying each of the characters in its file name.
In one possible implementation, the above step S1002 may mainly comprise the following steps:
Step S10021: verifying whether each character in the file name of the geologic information file to be processed is a valid character, recording files that contain invalid characters, and predicting the likely intended characters;
Step S10022: judging, according to the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform;
Step S10023: judging, according to the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
Specifically, geologic information files (electronic files of geological achievement data) are named according to their category. As shown in Fig. 3, a file name (excluding the file extension) consists of 8 characters, divided by function into three parts: the category position, the volume order position and the file serial number position. According to content and form, electronic files of geological achievement data fall into the following eight categories: main text, approval, attached drawings, attached tables, attachments, databases and software, multimedia, and other. Under this format requirement, the file names of geologic information files are, for example, Z01_0001, S01_0002, J01_0003 and Q01_0004.
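The naming rule can be sketched as a small parser. The layout below (one uppercase category letter, a two-digit volume order, an underscore, and a four-digit file serial number) is an assumption inferred only from the example names such as Z01_0001; the text does not state the width of each part, and `ParsedName`, `NAME_PATTERN` and `parse_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    category: str  # category position, e.g. "Z"
    volume: str    # volume order position, e.g. "01"
    serial: str    # file serial number position, e.g. "0001"


# Assumed layout inferred from the examples: letter + 2 digits + "_" + 4 digits = 8 characters.
NAME_PATTERN = re.compile(r"([A-Z])([0-9]{2})_([0-9]{4})")


def parse_name(stem: str) -> Optional[ParsedName]:
    """Split a file name (without extension) into its three parts, or return None."""
    match = NAME_PATTERN.fullmatch(stem)
    if match is None:
        return None
    return ParsedName(*match.groups())


parsed = parse_name("Z01_0001")
```

A name that fails to parse is not necessarily discarded; under the method it would be recorded for the further checks and manual tracing described in the verification steps.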
The specific procedure of the above file name verification step S100 is as follows.
Step 1: Judge whether the file name length meets the technical specification. For files whose name length does not conform to the known technical specification, check whether the name resembles that of a common description file or log file, and decide whether to retain the file according to its size and file header features.
Step 2: Check whether each character in the file name is a valid character, excluding the effects of garbled characters and of character encodings such as Unicode compression. Record files that contain invalid characters such as garbled text, make a logical judgment based on the position at which the garbling occurs, and predict the likely characters. Files whose true characters cannot be inferred from context are also recorded, for manual file tracing.
Step 3: Judge whether the category code (category position) in the file name conforms to the known technical specification and whether the file category is consistent with the file format. Record files that do not conform, for manual file tracing.
Step 4: Judge the validity of the numeric code (file serial number position) in the file name, excluding the influence of non-numeric look-alike characters on the numeric code.
Step 5: Judge the uniqueness of the numeric code within its context. When a duplicate number is found, perform a file duplication check; where it cannot be determined which of the files sharing a number is valid, record the case for manual file tracing.
Step 6: Judge the continuity of the numeric code within its context. Record unexplained skips in the numbering, for manual file tracing.
Following the above steps completes the file name verification of the files in the geologic information data.
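Steps 4 to 6 can be sketched as follows. This is a minimal illustration, assuming the serial numbers have already been extracted as strings; the function name `check_serials` is hypothetical, and a real implementation would record each finding for manual tracing rather than simply returning lists.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_serials(serials: Iterable[str]) -> Tuple[List[str], List[int], List[int]]:
    """Return (invalid serial strings, duplicated numbers, missing numbers)."""
    serials = list(serials)

    # Step 4: validity. Only ASCII digits count, so look-alikes such as the
    # letter "O" or full-width digits are reported as invalid.
    def is_valid(s: str) -> bool:
        return s.isascii() and s.isdigit()

    invalid = [s for s in serials if not is_valid(s)]
    numbers = [int(s) for s in serials if is_valid(s)]

    # Step 5: uniqueness. Any number seen more than once is a duplicate.
    counts = Counter(numbers)
    duplicates = sorted(n for n, c in counts.items() if c > 1)

    # Step 6: continuity. Report numbers skipped between the smallest and largest.
    present = set(numbers)
    gaps = (
        [n for n in range(min(present), max(present) + 1) if n not in present]
        if present
        else []
    )
    return invalid, duplicates, gaps


findings = check_serials(["0001", "0002", "0002", "0004", "000O"])
```

In the example, "000O" ends in the letter O rather than a zero, serial 2 appears twice, and serial 3 is skipped.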
In one possible implementation, the above file format verification step S200 may mainly comprise the following steps:
Step S2001: identifying and recording the file format of each file in the geologic information data retained after the file name verification step;
Step S2002: where files with the same file name but different file formats exist, determining the main format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
Step S2003: judging and recording whether the file header information of each file can be read and whether the content of each file can be opened.
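The priority rule of step S2002 can be sketched as a lookup followed by a minimum search. The priority order is taken from the text; the mapping from file extension to format category (`EXT_KIND`) is purely an assumption for illustration, since the text does not assign specific extensions to categories, and `pick_main_files` is a hypothetical name.

```python
from pathlib import PurePath
from typing import Dict, Iterable

# Priority from high to low, as listed in step S2002.
PRIORITY = ["spatial", "structured", "vector", "cartographic", "table", "document", "raster"]
RANK = {kind: i for i, kind in enumerate(PRIORITY)}

# Hypothetical extension-to-category mapping, assumed for this example only.
EXT_KIND = {
    ".mpj": "spatial",
    ".mdb": "structured",
    ".dwg": "vector",
    ".xls": "table",
    ".doc": "document",
    ".jpg": "raster",
}


def pick_main_files(names: Iterable[str]) -> Dict[str, str]:
    """For each file stem, keep the file whose format has the highest priority."""
    best = {}
    for name in names:
        path = PurePath(name)
        kind = EXT_KIND.get(path.suffix.lower())
        if kind is None:
            continue  # unrecognized format: handled elsewhere
        rank = RANK[kind]
        if path.stem not in best or rank < best[path.stem][0]:
            best[path.stem] = (rank, name)
    return {stem: name for stem, (rank, name) in best.items()}


main_files = pick_main_files(["J01_0003.jpg", "J01_0003.mpj", "Z01_0001.doc"])
```

Here J01_0003 exists both as a raster image and as a spatial data file, so the spatial file is chosen as the main format.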
The specific procedure of the above file format verification step S200 is as follows.
Step 1: Identify whether the file data exist as a single file or as a folder, and register the result;
Step 2: Identify whether the file data exist in multiple formats (the same file name with several file extensions). If so, determine and record the main format according to the priority rule: spatial data format > structured database data format (structured data format) > vector data format/cartographic data format > table data format > document data format > raster data format. Common spatial data formats or vector formats for electronic files of the attached drawings category are, for example, MapGIS, ArcGIS, AutoCAD, CorelDraw and MapInfo; common raster data formats are, for example, JPEG, BMP and TIFF;
Step 3: Judge whether the file header information of each file can be read and whether the file content can be opened, thereby judging the usability of files of each format, and record the result.
Following the above steps completes the file format verification of the files in the geologic information data.
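One common way to approximate the header check in Step 3 is to compare the first bytes of a file with the known signature of its claimed format. The sketch below does this for the three raster formats named in the text, whose signatures are well established; it is a substitute technique chosen for illustration, not the check the patent specifies, and `MAGIC`, `header_ok` and `read_head` are hypothetical names. Proprietary GIS formats would need their own readers.

```python
from typing import Optional

# Leading byte signatures of the raster formats mentioned in the text.
MAGIC = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".bmp": (b"BM",),
    ".tif": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    ".tiff": (b"II*\x00", b"MM\x00*"),
}


def header_ok(suffix: str, head: bytes) -> Optional[bool]:
    """True/False if the header matches the extension; None if the format is unknown here."""
    signatures = MAGIC.get(suffix.lower())
    if signatures is None:
        return None
    return head.startswith(signatures)


def read_head(path: str, size: int = 8) -> bytes:
    """Read the first few bytes of a file for signature comparison."""
    with open(path, "rb") as handle:
        return handle.read(size)


jpeg_check = header_ok(".JPG", b"\xff\xd8\xff\xe0\x00\x10JF")
```

A matching signature only shows that the header is readable; whether the full content opens still has to be tested with a reader for that format.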
In one possible implementation, the file information acquisition step S300 collects the following information for each type of file format.
Step 1: Judge whether spatial data exist. If so, record their format and version, judge whether a project file exists and can be read, judge whether each feature class/layer exists and can be read, and collect the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each feature class/layer. A spatial data file may have multiple associated layer files but must have exactly one project file (or main layer file), for example .mpj for MapGIS, .mxd for ArcGIS, .dwg for AutoCAD, .cdr for CorelDraw and .mif for MapInfo. The project file (or main layer file) is the source file from which the corresponding drawing file in the archived electronic files is generated.
Step 2: Judge whether structured data exist. If so, record their format, version number, record count, field count and data volume.
Step 3: Judge whether vector data/cartographic data exist. If so, record their format, version number and expression auxiliary information library information.
Step 4: Judge whether table data exist. If so, record their format, version number, record count, field count and data volume.
Step 5: Judge whether document data exist. If so, record their format, version number, character count and data volume.
Step 6: Judge whether raster data exist. If so, record their format, compression ratio, dot matrix and data volume.
Following the above steps completes the file information acquisition for the files in the geologic information data.
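The per-format field lists above can be written down as a lookup table, and the table data case (Step 4) can be illustrated with a concrete reader. The sketch below uses CSV as a stand-in table format, which is an assumption: the text does not name any table format, and CSV has no version number, so that field is left out. `FIELDS_BY_KIND` and `table_info` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Fields to record for each format category, as listed in Steps 1 to 6.
FIELDS_BY_KIND: Dict[str, List[str]] = {
    "spatial": ["format", "version", "project_file", "projection_parameters",
                "expression_auxiliary_library", "layer_data_volumes"],
    "structured": ["format", "version", "record_count", "field_count", "size_bytes"],
    "vector_or_cartographic": ["format", "version", "expression_auxiliary_library"],
    "table": ["format", "version", "record_count", "field_count", "size_bytes"],
    "document": ["format", "version", "character_count", "size_bytes"],
    "raster": ["format", "compression_ratio", "dot_matrix", "size_bytes"],
}


def table_info(raw: bytes, encoding: str = "utf-8") -> dict:
    """Collect Step 4 information for a CSV table whose first row is a header."""
    rows = list(csv.reader(io.StringIO(raw.decode(encoding), newline="")))
    header, records = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "format": "CSV",
        "record_count": len(records),   # data rows, excluding the header row
        "field_count": len(header),
        "size_bytes": len(raw),
    }


info = table_info(b"id,name\n1,a\n2,b\n")
```

Keeping the field lists in one table makes it easy to check that every collected record carries the fields required for its category.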
Using the information collected by the geologic information data cleaning method of this embodiment, the user can at any time select different libraries on which to run data statistics as needed. The statistical results can be output as a report or, depending on the settings, as statistical charts (histograms, pie charts, line charts and scatter plots), helping the user to judge and grasp the state of the database intuitively.
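A statistics pass over the recorded findings could look like the following sketch. The issue labels and the `(file name, issue type)` record shape are assumptions made for illustration; the text does not define how findings are stored, and `summarize` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(issues: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count recorded issues by type, most frequent first."""
    return Counter(kind for _name, kind in issues).most_common()


# Hypothetical findings recorded during name and format verification.
recorded = [
    ("Z01_0001.doc", "invalid_character"),
    ("J01_0003.mpj", "duplicate_serial"),
    ("J01_0003.jpg", "duplicate_serial"),
    ("Q01_0007.tif", "duplicate_serial"),
    ("S01_0002.xls", "invalid_character"),
    ("B01_0009.bmp", "unreadable_header"),
]
report = summarize(recorded)
```

The resulting list of counts can be printed as a report or passed to a plotting library to produce the charts mentioned above.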
With the geologic information data cleaning method of this embodiment, file name verification, file format verification and file information acquisition can be performed on geologic information data in sequence, so that detection, correction and information collection are carried out on the data automatically. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding a fast, high-quality and information-rich data cleaning result. The method can turn poorly organized, locally disordered problem data and dirty data of all kinds into a geologic information database that is easy to manage and serve; that is, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve them effectively.
The geologic information data cleaning method of this embodiment can resolve the problems in most of the data in a single automated run, leaving only a small number of cases that require manual judgment to be checked again. This can greatly improve the working efficiency and work quality of geologic information managers.
Applying the geologic information data cleaning method of this embodiment quickly produces a geologic information database that can be managed and served by computer, greatly improving the utilization efficiency and utility value of geologic information.
Embodiment 3
Fig. 4 shows a structural block diagram of a geologic information data cleaning system according to an embodiment of the present invention. As shown in Fig. 4, the geologic information data cleaning system 40 may mainly include a file name verification module 41, a file format verification module 42, and a file information acquisition module 43. The file name verification module 41 is mainly used to verify the file name of each geologic information file to be processed according to the submission format requirements of the geologic information files to be processed. The file format verification module 42 is connected to the file name verification module 41 and is used to verify and record the file formats of the geologic information data retained after processing by the file name verification module 41. The file information acquisition module 43 is connected to the file format verification module 42 and is used to record, for each file of the recorded geologic information data, the corresponding format and configuration information.
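The chaining of the three modules can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CleaningReport` and `run_pipeline`, and the convention that a name check returns `None` for an acceptable name, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class CleaningReport:
    """Hypothetical container for what each stage rejected or recorded."""
    rejected: Dict[str, str] = field(default_factory=dict)  # file name -> reason
    formats: Dict[str, str] = field(default_factory=dict)   # file name -> format
    info: Dict[str, dict] = field(default_factory=dict)     # file name -> metadata


def run_pipeline(
    names: Iterable[str],
    check_name: Callable[[str], Optional[str]],
    detect_format: Callable[[str], str],
    collect_info: Callable[[str, str], dict],
) -> CleaningReport:
    """Run name verification, format verification, and information
    acquisition in that order, passing on only the files that are retained."""
    report = CleaningReport()
    for name in names:
        reason = check_name(name)           # stands in for module 41
        if reason is not None:
            report.rejected[name] = reason  # not passed to later stages
            continue
        fmt = detect_format(name)           # stands in for module 42
        report.formats[name] = fmt
        report.info[name] = collect_info(name, fmt)  # stands in for module 43
    return report
```

Passing the three stages in as callables keeps the ordering explicit while leaving each check replaceable, which mirrors the way module 42 only sees what module 41 retained.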
In one possible implementation, the file name verification module 41 is used to:
judge the validity of the geologic information file to be processed according to the length of its file name; and
when the geologic information file to be processed is valid, verify each of the characters in the file name of the geologic information file to be processed.
In one possible implementation, verifying each of the characters in the file name of the geologic information file to be processed, when that file is valid, includes:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file whose name contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of that file conforms to the specified types, and recording any file whose type does not conform; and
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
In one possible implementation, the file format verification module 42 is used to:
identify the files in the geologic information data retained after processing by the file name verification module 41, and record the corresponding file formats;
when files exist that have the same file name but different file formats, determine the primary format of the file according to a file format priority rule, where the file format priority, from high to low, is: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, and raster data format; and
judge and record whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
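A short Python sketch can illustrate the priority rule and the header check. The priority order follows the text, but the class labels, the extension-to-class mapping in `EXT_CLASS`, and the helper names are assumptions; the source does not say which extensions belong to which format class. The TIFF byte signatures are used only as one well-known example of a file header.

```python
from pathlib import Path

# Format classes from highest to lowest priority, in the order given in the text.
# The short labels themselves are illustrative.
PRIORITY = ["data_format", "structured", "vector", "cartographic",
            "table", "document", "raster"]

# Hypothetical mapping from extension to format class (not from the source).
EXT_CLASS = {".mdb": "structured", ".shp": "vector", ".xls": "table",
             ".doc": "document", ".tif": "raster"}

# TIFF files start with one of these byte sequences (little/big endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def primary_file(candidates):
    """Among files sharing one name, return the one whose class ranks highest.

    Files with an unknown extension are ignored; None means nothing is known.
    """
    known = [p for p in candidates if Path(p).suffix.lower() in EXT_CLASS]
    if not known:
        return None
    return min(known,
               key=lambda p: PRIORITY.index(EXT_CLASS[Path(p).suffix.lower()]))


def read_header(path, size=8):
    """Return the first bytes of a file, or None if it cannot be opened."""
    try:
        with open(path, "rb") as handle:
            return handle.read(size)
    except OSError:
        return None


def header_matches(head, signatures):
    """True if the header bytes start with one of the expected signatures."""
    return head is not None and any(head.startswith(s) for s in signatures)
```

For example, `primary_file(["a.tif", "a.shp", "a.doc"])` selects `"a.shp"` under this mapping, because the vector class ranks above the document and raster classes.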
In one possible implementation, the file information acquisition module 43 is used to:
for a file in Data Format, record the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume information of each layer;
for a file in structured data format, record the format, version number, number of records, number of fields, and data volume of the file;
for a file of vector data or cartographic data, record the format, version number, and expression auxiliary information library information of the file;
for a file in table data format, record the format, version number, number of records, number of fields, and data volume of the file;
for a file in document data format, record the format, version number, number of characters, and data volume of the file; and
for a file in raster data format, record the format, compression ratio, dot matrix, and data volume of the file.
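The per-format lists above amount to a lookup from format class to the fields that should be recorded. The sketch below expresses this in Python under stated assumptions: the field names and the functions `new_record` and `collect_table_info` are illustrative, and CSV text stands in for a table-format file only because it can be parsed with the standard library.

```python
import csv
import io

# Fields recorded per format class, following the lists in the text.
# The field names are illustrative labels, not taken from the source.
FIELDS = {
    "data_format": ["format", "version", "project_file_info",
                    "projection_parameters", "expression_library_info",
                    "layer_data_volumes"],
    "structured": ["format", "version", "record_count", "field_count", "size_bytes"],
    "vector": ["format", "version", "expression_library_info"],
    "cartographic": ["format", "version", "expression_library_info"],
    "table": ["format", "version", "record_count", "field_count", "size_bytes"],
    "document": ["format", "version", "character_count", "size_bytes"],
    "raster": ["format", "compression_ratio", "dot_matrix", "size_bytes"],
}


def new_record(format_class):
    """Return an empty record holding exactly the fields for this class."""
    return {name: None for name in FIELDS[format_class]}


def collect_table_info(text):
    """Fill a table-format record from CSV text (first row is the header).

    CSV carries no version number, so that field stays None.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    record = new_record("table")
    record.update(
        format="csv",
        record_count=len(body),
        field_count=len(header),
        size_bytes=len(text.encode("utf-8")),
    )
    return record
```

Keeping the field lists in one table makes it easy to see that structured and table files share a schema, as do vector and cartographic files, which matches the wording of the text.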
With the geologic information data cleaning system of this embodiment of the present invention, file name verification, file format verification, and file information acquisition can be performed on geologic information data in sequence, so that the data are automatically checked and corrected and their information is collected. Multi-source, heterogeneous geologic information data from a wide range of sources can therefore be cleaned automatically, yielding cleaning results quickly, with high quality, and with richly registered information. The system can turn loosely organized, partially disordered problem data and dirty data into a geologic information database that is easy to manage and serve. In other words, it transforms geologic information deliverables into a standardized, consistent organization of multi-source heterogeneous data, so that a management system can manage and serve them effectively.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
Claims (8)
1. A geologic information data cleaning method, characterized by comprising:
a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements of the geologic information files to be processed;
a file format verification step of verifying and recording the file formats of the geologic information data retained after the file name verification step; and
a file information acquisition step of recording, after the file format verification step and for each file of the recorded geologic information data, the corresponding format and configuration information;
wherein the file format verification step comprises:
identifying the files in the geologic information data retained after the file name verification step, and recording the corresponding file formats;
when files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the file format priority from high to low being: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, and raster data format; and
judging and recording whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
2. The method according to claim 1, characterized in that the file name verification step comprises:
judging the validity of the geologic information file to be processed according to the length of its file name; and
when the geologic information file to be processed is valid, verifying each of the characters in the file name of the geologic information file to be processed.
3. The method according to claim 2, characterized in that verifying each of the characters in the file name of the geologic information file to be processed, when that file is valid, comprises:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file whose name contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of that file conforms to the specified types, and recording any file whose type does not conform; and
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
4. The method according to claim 1, characterized in that the file information acquisition step comprises:
for a file in Data Format, recording the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume information of each layer;
for a file in structured data format, recording the format, version number, number of records, number of fields, and data volume of the file;
for a file of vector data or cartographic data, recording the format, version number, and expression auxiliary information library information of the file;
for a file in table data format, recording the format, version number, number of records, number of fields, and data volume of the file;
for a file in document data format, recording the format, version number, number of characters, and data volume of the file; and
for a file in raster data format, recording the format, compression ratio, dot matrix, and data volume of the file.
5. A geologic information data cleaning system, characterized by comprising:
a file name verification module for verifying the file name of each geologic information file to be processed according to the submission format requirements of the geologic information files to be processed;
a file format verification module, connected to the file name verification module, for verifying and recording the file formats of the geologic information data retained after processing by the file name verification module; and
a file information acquisition module, connected to the file format verification module, for recording, for each file of the recorded geologic information data, the corresponding format and configuration information;
wherein the file format verification module is used to:
identify the files in the geologic information data retained after processing by the file name verification module, and record the corresponding file formats;
when files exist that have the same file name but different file formats, determine the primary format of the file according to a file format priority rule, the file format priority from high to low being: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, and raster data format; and
judge and record whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
6. The system according to claim 5, characterized in that the file name verification module is used to:
judge the validity of the geologic information file to be processed according to the length of its file name; and
when the geologic information file to be processed is valid, verify each of the characters in the file name of the geologic information file to be processed.
7. The system according to claim 6, characterized in that verifying each of the characters in the file name of the geologic information file to be processed, when that file is valid, comprises:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file whose name contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of that file conforms to the specified types, and recording any file whose type does not conform; and
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
8. The system according to claim 5, characterized in that the file information acquisition module is used to:
for a file in Data Format, record the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume information of each layer;
for a file in structured data format, record the format, version number, number of records, number of fields, and data volume of the file;
for a file of vector data or cartographic data, record the format, version number, and expression auxiliary information library information of the file;
for a file in table data format, record the format, version number, number of records, number of fields, and data volume of the file;
for a file in document data format, record the format, version number, number of characters, and data volume of the file; and
for a file in raster data format, record the format, compression ratio, dot matrix, and data volume of the file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510920801.4A CN105718499B (en) | 2015-12-11 | 2015-12-11 | Geologic information data cleaning method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510920801.4A CN105718499B (en) | 2015-12-11 | 2015-12-11 | Geologic information data cleaning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105718499A CN105718499A (en) | 2016-06-29 |
CN105718499B true CN105718499B (en) | 2019-07-19 |
Family
ID=56146904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510920801.4A Expired - Fee Related CN105718499B (en) | 2015-12-11 | 2015-12-11 | Geologic information data cleaning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105718499B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108614819B (en) * | 2016-12-09 | 2021-03-09 | 中国地质调查局发展研究中心 | Geological data management system |
CN110019153B (en) * | 2017-09-13 | 2022-03-04 | 北京宸信征信有限公司 | Multi-type batch data processing system and processing method thereof |
CN109491994B (en) * | 2018-11-28 | 2020-12-18 | 中国科学院空天信息创新研究院 | Simplified screening method for Landsat-8 satellite selection remote sensing data set |
CN111339221B (en) * | 2018-12-18 | 2024-04-26 | 中兴通讯股份有限公司 | Data processing method, system and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101334266A (en) * | 2008-07-18 | 2008-12-31 | 北京优纳科技有限公司 | Circuit board defect off-line checking method based on large-capacity image storage technology |
CN102968349A (en) * | 2012-09-06 | 2013-03-13 | 北京吉威时代软件技术有限公司 | Method and system for file completeness verification of remote sensing image data |
CN103593352A (en) * | 2012-08-15 | 2014-02-19 | 阿里巴巴集团控股有限公司 | Method and device for cleaning mass data |
CN104731859A (en) * | 2015-02-02 | 2015-06-24 | 厦门市美亚柏科信息股份有限公司 | Data processing method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090234623A1 (en) * | 2008-03-12 | 2009-09-17 | Schlumberger Technology Corporation | Validating field data |
US8930361B2 (en) * | 2011-03-31 | 2015-01-06 | Nokia Corporation | Method and apparatus for cleaning data sets for a search process |
CN104252471A (en) * | 2013-06-27 | 2014-12-31 | 宁夏新航信息科技有限公司 | Intelligent file management system |
CN103578032A (en) * | 2013-11-14 | 2014-02-12 | 中国银行股份有限公司 | Data processing system |
CN103870594B (en) * | 2014-03-31 | 2017-02-01 | 国家电网公司 | Method and device for managing and calibrating long-distance engineering project digital photos |
Also Published As
Publication number | Publication date |
---|---|
CN105718499A (en) | 2016-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190719; Termination date: 20211211 |