CN105718499B - Geologic information data cleaning method and system - Google Patents

Info

Publication number
CN105718499B
Authority
CN
China
Prior art keywords
file
format
data
information
geologic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510920801.4A
Other languages
Chinese (zh)
Other versions
CN105718499A (en)
Inventor
王新春
吴轩
孔昭煜
高学正
李晓蕾
齐钒宇
段佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DEVELOPMENT AND Research CENTER GEOLOGIC SURVEY BUREAU OF CHINA
Original Assignee
DEVELOPMENT AND Research CENTER GEOLOGIC SURVEY BUREAU OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DEVELOPMENT AND Research CENTER GEOLOGIC SURVEY BUREAU OF CHINA
Priority to CN201510920801.4A
Publication of CN105718499A
Application granted
Publication of CN105718499B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Abstract

The present invention provides a geologic information data cleaning method and system. The method comprises: a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements for such files; a file format verification step of verifying and recording the file formats of the geologic information data retained after the file name verification step; and a file information acquisition step of recording, after the file format verification step, the corresponding format and configuration information for each recorded file of the geologic information data. With the method and system of the embodiments of the present invention, file name verification, file format verification, and file information acquisition can be performed on geologic information data in sequence, so that multi-source, heterogeneous geologic information data from a wide range of sources can be cleaned automatically, yielding cleaning results that are fast to obtain, of high quality, and rich in registered information.

Description

Geologic information data cleaning method and system
Technical field
The present invention relates to the field of geographic information systems (GIS), and in particular to a geologic information data cleaning method and system.
Background art
Geologic information is an important basic information resource produced by geological work; it can be developed and used repeatedly and can provide services over the long term. Although a Ministry of Land and Resources document (Guotuzifa [2006] No. 210) specifies the submission format requirements for electronic files of geological achievement data, the result files of the various kinds of professional technical work differ, the details of the technical requirements have not been refined, and the submitting units differ in skill level and attitude. The submitted data received therefore exhibit all kinds of heterogeneity, inconsistency, and quality problems, such as inconsistency between the data and the catalogue, invalid entries in the data storage directory, and duplicated archive identifiers.
Geologic information data have their own working characteristics and application requirements throughout the whole process from filing, receipt, management, and processing to service. In the past, across the many stages from submission to management and on to access and use, one of two problems arose. Either the management approach was rather crude: data were stored as folders, one folder per archive, but the organization of the files within each folder was left to the submitter and was not subdivided further, which makes it hard to meet the needs of fine-grained data management. Or the technical methods and tools used had a low degree of automation, and most of the work still relied on manual cleaning. This situation has greatly limited the efficiency of data management, reduced the utilization of geologic information, and hindered the development of national geological work.
Common data cleaning solutions today are generally designed for structured data; solutions for multi-source heterogeneous data are rare. Data cleaning generally comprises two steps or modules: data detection and data correction. Data detection finds file errors (including incomplete data and abnormal data) and duplicate or near-duplicate records; after statistics are compiled, a comprehensive picture of the dirty data is obtained. Duplicate and near-duplicate records are generally detected using operations such as field matching and record matching. In the step of cleaning the detected dirty data, incomplete or duplicate data are usually deleted or replaced after manual judgment, so that the errors in the files are corrected.
Existing data cleaning solutions usually perform cleaning according to predefined cleaning algorithms and cleaning rules provided by an algorithm library or rule base. In practical engineering, however, the algorithms and rules often have to be adjusted and redefined for the different problems encountered, so the rules of the prior-art solutions are hard to make general.
In addition, for large amounts of erroneous data, existing solutions cannot provide effective cleaning suggestions or statistics; the data generally have to be handed to the user for manual processing, which is time-consuming and laborious and makes quality hard to guarantee.
Furthermore, statistics and analysis of data error types, as well as other statistical information, are also hard to obtain conveniently with current solutions.
Summary of the invention
Technical problem
In view of this, the technical problem to be solved by the present invention is how to automatically clean multi-source, heterogeneous geologic information data from a wide range of sources.
Solution
To solve the above technical problem, according to one embodiment of the present invention, a geologic information data cleaning method is provided, comprising:
a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed;
a file format verification step of verifying and recording the file formats of the geologic information data retained after the file name verification step; and
a file information acquisition step of recording, after the file format verification step, the corresponding format and configuration information for each recorded file of the geologic information data.
For the above geologic information data cleaning method, in one possible implementation, the file name verification step comprises:
judging the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verifying each of the characters in its file name.
For the above geologic information data cleaning method, in one possible implementation, verifying each of the characters in the file name of the geologic information file to be processed, where that file is valid, comprises:
verifying whether each character in the file name is a valid character and, for a file whose name contains invalid characters, recording the file and making a preliminary judgment of the intended characters;
judging, from the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform; and
judging, from the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
For the above geologic information data cleaning method, in one possible implementation, the file format verification step comprises:
identifying each file in the geologic information data retained after the file name verification step and recording its file format;
where files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format; and
judging and recording whether the file header information of each file can be read successfully and whether the content of each file can be opened successfully.
For the above geologic information data cleaning method, in one possible implementation, the file information acquisition step comprises:
for a file in a spatial data format, recording the file's format and version number, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each layer;
for a file in a structured data format, recording the file's format, version number, number of records, number of fields, and data size;
for a file of vector data or cartographic data, recording the file's format, version number, and expression auxiliary information library information;
for a file in a table data format, recording the file's format, version number, number of records, number of fields, and data size;
for a file in a document data format, recording the file's format, version number, character count, and data size; and
for a file in a raster data format, recording the file's format, compression ratio, dot matrix (pixel dimensions), and data size.
To solve the above technical problem, according to another embodiment of the present invention, a geologic information data cleaning system is provided, comprising:
a file name verification module configured to verify the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed;
a file format verification module, connected to the file name verification module and configured to verify and record the file formats of the geologic information data retained after the geologic information data have been processed by the file name verification module; and
a file information acquisition module, connected to the file format verification module and configured to record the corresponding format and configuration information for each recorded file of the geologic information data.
For the above geologic information data cleaning system, in one possible implementation, the file name verification module is configured to:
judge the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verify each of the characters in its file name.
For the above geologic information data cleaning system, in one possible implementation, verifying each of the characters in the file name of the geologic information file to be processed, where that file is valid, comprises:
verifying whether each character in the file name is a valid character and, for a file whose name contains invalid characters, recording the file and making a preliminary judgment of the intended characters;
judging, from the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform; and
judging, from the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
For the above geologic information data cleaning system, in one possible implementation, the file format verification module is configured to:
identify each file in the geologic information data retained after the file name verification step and record its file format;
where files exist that have the same file name but different file formats, determine the primary format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format; and
judge and record whether the file header information of each file can be read successfully and whether the content of each file can be opened successfully.
For the above geologic information data cleaning system, in one possible implementation, the file information acquisition module is configured to:
for a file in a spatial data format, record the file's format and version number, the project file information, the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each layer;
for a file in a structured data format, record the file's format, version number, number of records, number of fields, and data size;
for a file of vector data or cartographic data, record the file's format, version number, and expression auxiliary information library information;
for a file in a table data format, record the file's format, version number, number of records, number of fields, and data size;
for a file in a document data format, record the file's format, version number, character count, and data size; and
for a file in a raster data format, record the file's format, compression ratio, dot matrix (pixel dimensions), and data size.
Beneficial effect
With the geologic information data cleaning method and system of the embodiments of the present invention, file name verification, file format verification, and file information acquisition can be performed on geologic information data in sequence, so that the data are automatically checked and corrected and their information is collected. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding cleaning results that are fast to obtain, of high quality, and rich in registered information. The method and system can turn all kinds of poorly organized, locally disordered problem data and dirty data into a geologic information database that is easy to manage and serve. In other words, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve the data effectively.
Other features and aspects of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present invention together with the specification, and serve to explain the principles of the present invention.
Fig. 1 shows a flow chart of a geologic information data cleaning method according to an embodiment of the present invention;
Fig. 2 shows a flow chart of a geologic information data cleaning method according to another embodiment of the present invention;
Fig. 3 shows a schematic diagram of the file name of an electronic file of geological achievement data;
Fig. 4 shows a structural block diagram of a geologic information data cleaning system according to an embodiment of the present invention.
Detailed description of the embodiments
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements that are functionally identical or similar. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the detailed description below in order to better illustrate the present invention. Those skilled in the art will understand that the present invention can be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.
Term definition
Geologic information: geological and mineral information and physical materials that are formed in geological work, have preservation value to the state and society, and exist in different forms such as text, charts, audio and video, samples, specimens, and rock cores; it is divided into three classes: original geologic information, achievement geologic information, and physical geologic information;
Geological achievement data: the complete set of scientific and technical documents that reflects the results of a geological work or scientific research project of any kind and is provided on completion of the project, in accordance with the relevant technical specifications and the original project design requirements, in the form of text, figures, tables, multimedia, databases, software, and so on;
Geologic information data: digital code sequences that carry geological achievement data information, can be recognized and processed by a computer system, are stored in a given format on media such as magnetic or optical media, and can be transmitted between computers.
Embodiment 1
Fig. 1 shows a flow chart of a geologic information data cleaning method according to an embodiment of the present invention. As shown in Fig. 1, the geologic information data cleaning method may mainly comprise the following steps:
File name verification step S100: verifying the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed;
File format verification step S200: verifying and recording the file formats of the geologic information data retained after the file name verification step; and
File information acquisition step S300: after the file format verification step, recording the corresponding format and configuration information for each recorded file of the geologic information data.
In this way, with the geologic information data cleaning method of the embodiment of the present invention, file name verification, file format verification, and file information acquisition can be performed on geologic information data in sequence, so that the data are automatically checked and corrected and their information is collected. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding cleaning results that are fast to obtain, of high quality, and rich in registered information. The method can turn all kinds of poorly organized, locally disordered problem data and dirty data into a geologic information database that is easy to manage and serve. In other words, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve the data effectively.
Embodiment 2
Fig. 2 shows a flow chart of a geologic information data cleaning method according to another embodiment of the present invention. Steps in Fig. 2 that are labeled the same as in Fig. 1 have the same function; for brevity, their detailed description is omitted.
As shown in Fig. 2, the main difference between the geologic information data cleaning method shown in Fig. 2 and the method shown in Fig. 1 is that the file name verification step S100 may mainly comprise the following steps:
Step S1001: judging the validity of the geologic information file to be processed according to the length of its file name; and
Step S1002: where the geologic information file to be processed is valid, verifying each of the characters in its file name.
In one possible implementation, the above step S1002 may mainly comprise the following steps:
Step S10021: verifying whether each character in the file name of the geologic information file to be processed is a valid character and, for a file whose name contains invalid characters, recording the file and making a preliminary judgment of the intended characters;
Step S10022: judging, from the category position in the file name, whether the file type of the geologic information file to be processed conforms to the specified types, and recording files whose type does not conform;
Step S10023: judging, from the file serial number position in the file name, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
Specifically, geologic information files (electronic files of geological achievement data) are named according to their category. As shown in Fig. 3, the file name (excluding the file name suffix) consists of 8 characters and is divided, by identifying function, into three parts: the category position, the volume order position, and the file serial number position. According to content and form, electronic files of geological achievement data fall into the following eight categories: text, approval, attached drawings, attached tables, attachments, databases and software, multimedia, and other. Under these format requirements, file names of geologic information files are, for example, Z01_0001, S01_0002, J01_0003, and Q01_0004.
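The naming rule above can be sketched as a small parser. This is a minimal sketch under stated assumptions, not the patented implementation: the layout (one category letter, a two-digit volume number, an underscore, and a four-digit serial number) is inferred only from the example names such as Z01_0001, and the names `parse_name`, `ParsedName`, and `EXAMPLE_CATEGORIES` are hypothetical. By default only the category letters that appear in the examples are accepted, because the full code table is not given in the text.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the example names in the text:
# category letter + 2-digit volume + "_" + 4-digit serial (8 characters).
# [0-9] is used instead of \d so that look-alike Unicode digits are rejected.
NAME_PATTERN = re.compile(r"([A-Z])([0-9]{2})_([0-9]{4})")

# Only the letters seen in the examples; the real code table may differ.
EXAMPLE_CATEGORIES = frozenset("ZSJQ")


class ParsedName(NamedTuple):
    category: str  # category position
    volume: int    # volume order position
    serial: int    # file serial number position


def parse_name(stem: str, categories=EXAMPLE_CATEGORIES) -> Optional[ParsedName]:
    """Return the parsed parts of a file name stem, or None if it is invalid."""
    if len(stem) != 8:                      # length check comes first
        return None
    match = NAME_PATTERN.fullmatch(stem)    # character-by-character validity
    if match is None or match.group(1) not in categories:
        return None
    return ParsedName(match.group(1), int(match.group(2)), int(match.group(3)))
```

A caller would pass the file name without its suffix, for example `parse_name("Z01_0001")`, and record any file for which the result is `None`.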
The specific verification procedure of the file name verification step S100 is as follows.
Step 1: Judge whether the file name length conforms to the technical specification. For a file whose name length does not conform to the known technical specification, check whether its name resembles that of a common readme file or log file, and decide from the file size and file header characteristics whether to retain the file.
Step 2: Verify whether each character in the file name is a valid character, ruling out the effects of garbled characters and of character encodings such as Unicode compression. Record files whose names contain invalid characters such as garbled characters, apply logical judgment to the positions where the garbling occurs, and predict the likely intended characters. Also record files whose true characters cannot be determined from context, so that the files can be traced back manually.
Step 3: Judge whether the category code (category position) in the file name conforms to the known technical specification and whether the file category is consistent with the file format; record files that do not conform, so that they can be traced back manually.
Step 4: Judge the validity of the numeric serial number (file serial number position) in the file name, ruling out the effect of non-numeric characters that merely look like digits.
Step 5: Judge the uniqueness of the serial number in the file name within its context. When a duplicate number is found, perform a duplicate-file judgment; where it cannot be determined which of the files with the duplicate number is valid, record the case so that the files can be traced back manually.
Step 6: Judge the continuity of the serial number in the file name within its context; record cases where the numbering skips for unknown reasons, so that the files can be traced back manually.
Following the above steps completes file name verification of the files in the geologic information data.
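Steps 5 and 6 can be illustrated with a short check on a list of serial numbers. This is a minimal sketch, assuming that the serial numbers within one category and volume are expected to form an unbroken run between the smallest and largest number present; the text does not state where numbering starts, and the function name `check_serials` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_serials(serials: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (duplicated serial numbers, missing serial numbers)."""
    counts = Counter(serials)
    # Uniqueness: any serial number that occurs more than once.
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    if not counts:
        return duplicates, []
    # Continuity: numbers skipped between the smallest and largest serial.
    present = set(counts)
    gaps = [s for s in range(min(present), max(present) + 1) if s not in present]
    return duplicates, gaps
```

Both returned lists would be written to the cleaning record so that a person can trace the affected files back later.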
In one possible implementation, the above file format verification step S200 may mainly comprise the following steps:
Step S2001: identifying each file in the geologic information data retained after the file name verification step and recording its file format;
Step S2002: where files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the priority from high to low being: spatial data format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
Step S2003: judging and recording whether the file header information of each file can be read successfully and whether the content of each file can be opened successfully.
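One way to approximate the header check of step S2003 for raster files is to compare the first bytes of a file with well-known format signatures. The following is a hedged sketch: the byte signatures for JPEG, BMP, and TIFF are standard, but the patent does not say how headers are checked, and the names `SIGNATURES`, `sniff_raster`, and `header_format` are hypothetical.

```python
from typing import Optional

# Leading byte signatures of three common raster formats.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "bmp": (b"BM",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}


def sniff_raster(header: bytes) -> Optional[str]:
    """Return the raster format whose signature starts the header, if any."""
    for fmt, prefixes in SIGNATURES.items():
        if any(header.startswith(prefix) for prefix in prefixes):
            return fmt
    return None


def header_format(path: str) -> Optional[str]:
    """Read the first bytes of a file; None means unreadable or unrecognized."""
    try:
        with open(path, "rb") as handle:
            return sniff_raster(handle.read(8))
    except OSError:
        return None
```

A `None` result would be recorded as a file whose header could not be read, leaving the decision about the file to a later manual review.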
The specific verification procedure of the file format verification step S200 is as follows.
Step 1: Identify whether the file data exist as a single file or as a folder, and register the result;
Step 2: Identify whether the file data exist in multiple formats (the same file name with several file suffixes); determine the primary format according to the priority rule spatial data format > structured database data format (structured data format) > vector data format / cartographic data format > table data format > document data format > raster data format, and record it. Common spatial data formats or vector formats for attached-drawing electronic files are, for example, MapGIS, ArcGIS, AutoCAD, CorelDraw, and MapInfo; common raster data formats are, for example, JPEG, BMP, and TIFF;
Step 3: Judge whether the file header information of each file can be read successfully and whether the file content can be opened successfully, thereby judging the usability of files of each format, and record the result.
Following the above steps completes file format verification of the files in the geologic information data.
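The priority rule of Step 2 can be sketched as a lookup over files that share a name. This is an illustrative sketch only: the priority order follows the text, but the suffix-to-class table `SUFFIX_CLASS` is a hypothetical assumption (the text does not assign suffixes to classes), as is the function name `primary_formats`.

```python
from collections import defaultdict
from pathlib import PurePath

# Priority classes from highest to lowest, as listed in Step 2.
# Vector and cartographic formats share one level.
PRIORITY = ["spatial", "structured", "vector_or_cartographic",
            "table", "document", "raster"]

# Hypothetical suffix table for illustration only; a real deployment
# would need its own mapping.
SUFFIX_CLASS = {
    ".mpj": "spatial", ".mxd": "spatial",
    ".mdb": "structured",
    ".dwg": "vector_or_cartographic", ".cdr": "vector_or_cartographic",
    ".xls": "table",
    ".doc": "document",
    ".jpg": "raster", ".bmp": "raster", ".tif": "raster",
}


def primary_formats(filenames):
    """Map each file name stem to the file whose format ranks highest."""
    candidates = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        cls = SUFFIX_CLASS.get(path.suffix.lower())
        if cls is not None:  # unknown suffixes are ignored in this sketch
            candidates[path.stem].append((PRIORITY.index(cls), name))
    # min() picks the lowest rank index, i.e. the highest priority.
    return {stem: min(ranked)[1] for stem, ranked in candidates.items()}
```

For a drawing delivered both as `J01_0003.jpg` and `J01_0003.mpj`, the sketch selects the `.mpj` file as the primary format and leaves the raster copy as a secondary rendition.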
In one possible implementation, the file information acquisition step S300 collects the following information for each type of file format.
Step 1: Judge whether spatial data exist. If so, record their format and version; judge whether a project file exists and whether it can be read; judge whether each feature class/layer exists and whether it can be read; and collect the projection coordinate parameters, the expression auxiliary information library information, and the data volume of each feature class/layer. A spatial data format file may have several related layer files, but it must have exactly one project file (or main layer file), such as .mpj for MapGIS, .mxd for ArcGIS, .dwg for AutoCAD, .cdr for CorelDraw, or .mif for MapInfo. The project file (or main layer file) is the source file from which the corresponding drawing file in the archived electronic documents is generated.
Step 2: Judge whether structured data exist. If so, record their format, version number, number of records, number of fields, and data size.
Step 3: Judge whether vector data/cartographic data exist. If so, record their format, version number, and expression auxiliary information library information.
Step 4: Judge whether table data exist. If so, record their format, version number, number of records, number of fields, and data size.
Step 5: Judge whether document data exist. If so, record their format, version number, character count, and data size.
Step 6: Judge whether raster data exist. If so, record their format, compression ratio, dot matrix (pixel dimensions), and data size.
Following the above steps completes file information acquisition for the files in the geologic information data.
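As a small illustration of Step 4, the following sketch collects the record count, field count, and data size of a table. It makes several assumptions that are not in the text: the table is a CSV file with one header row (CSV stands in for whatever table formats are actually submitted), so no version number is recorded, and the names `TableInfo` and `collect_table_info` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TableInfo:
    fmt: str          # file format label
    records: int      # number of data rows, excluding the header row
    fields: int       # number of columns in the header row
    size_bytes: int   # data size of the raw file content


def collect_table_info(raw: bytes, encoding: str = "utf-8") -> TableInfo:
    """Collect basic registration information from raw CSV bytes."""
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header = rows[0] if rows else []
    body = rows[1:]
    return TableInfo("csv", len(body), len(header), len(raw))
```

The same record shape could be reused for structured data (Step 2), which asks for the same fields plus a version number.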
Based on the information collected by the geologic information data cleaning method of the embodiment of the present invention, different libraries can be selected at any time, according to the user's needs, to run data statistics. The statistical results can be output as reports or, depending on the settings, as statistical charts (histograms, pie charts, line charts, and scatter plots), which helps users judge and grasp the state of the database intuitively.
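The kind of statistic described here can be sketched as a count of recorded findings per problem type. This is a minimal sketch: the finding labels such as "bad_length" and the function name `summarize` are hypothetical, and the text does not specify how findings are stored.

```python
from collections import Counter


def summarize(findings):
    """Count findings per problem kind and compute each kind's share.

    `findings` is an iterable of (file name, problem kind) pairs.
    """
    counts = Counter(kind for _name, kind in findings)
    total = sum(counts.values())
    return {kind: (n, n / total) for kind, n in counts.items()}
```

The resulting counts and shares are the values a report, histogram, or pie chart would display.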
With the geologic information data cleaning method of the embodiment of the present invention, file name verification, file format verification, and file information acquisition can be performed on geologic information data in sequence, so that the data are automatically checked and corrected and their information is collected. Multi-source, heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding cleaning results that are fast to obtain, of high quality, and rich in registered information. The method can turn all kinds of poorly organized, locally disordered problem data and dirty data into a geologic information database that is easy to manage and serve. In other words, geological achievement data are converted into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve the data effectively.
With the geologic information data cleaning method of the embodiment of the present invention, a single automated run can resolve the technical problems of most of the data, leaving only a small number of cases that need manual judgment and re-verification, which can greatly improve the working efficiency and work quality of geologic information data managers.
Applying the geologic information data cleaning method of the embodiment of the present invention can quickly produce a geologic information database that can be managed and served by computer, greatly improving the utilization efficiency and use value of geologic information.
Embodiment 3
Fig. 4 shows a structural block diagram of a geologic information data cleaning system according to an embodiment of the invention. As shown in Fig. 4, the geologic information data cleaning system 40 may mainly include a file name verification module 41, a file format verification module 42 and a file information acquisition module 43. The file name verification module 41 is mainly used to verify the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed. The file format verification module 42 is connected to the file name verification module 41 and is used to verify and record the file format of the geologic information data retained after the geologic information data has been processed by the file name verification module 41. The file information acquisition module 43 is connected to the file format verification module 42 and is used to record the corresponding format and configuration information for each file of the recorded geologic information data.
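The chaining of the three modules can be sketched as below. This is only an outline under stated assumptions: the function `run_cleaning`, the extension table `KNOWN` and the three stand-in checks are hypothetical and much simpler than the verification rules the embodiment describes.

```python
import os


def run_cleaning(files, check_name, check_format, collect_info):
    """Chain the three modules: name check, then format check, then info acquisition."""
    # Module 41 (sketch): keep only files whose names pass verification.
    retained = [f for f in files if check_name(f)]
    # Module 42 (sketch): verify and record the format of each retained file.
    recorded = {f: check_format(f) for f in retained}
    # Module 43 (sketch): record format and configuration info per recorded file.
    return {f: collect_info(f, fmt) for f, fmt in recorded.items() if fmt is not None}


# Hypothetical mapping from file extension to format category.
KNOWN = {".csv": "table", ".txt": "document", ".tif": "raster"}

result = run_cleaning(
    ["A0001.csv", "bad name!.txt", "A0002.tif", "A0003.xyz"],
    check_name=lambda f: f.replace(".", "").isalnum(),
    check_format=lambda f: KNOWN.get(os.path.splitext(f)[1].lower()),
    collect_info=lambda f, fmt: {"format": fmt},
)
```

Passing the three checks in as callables mirrors the connection between modules 41, 42 and 43: each stage only sees what the previous stage retained.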
In one possible implementation, the file name verification module 41 is used for:
judging the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verifying each character in the file name of the geologic information file to be processed.
In one possible implementation, verifying each character in the file name of the geologic information file to be processed, where the file is valid, comprises:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file that contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of the file conforms to a specified type, and recording any file whose type does not conform;
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
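The checks above might look like the following sketch. The naming convention used here (a five-character name made of one category letter followed by a four-digit serial number, with uppercase letters and digits as the valid characters) is purely hypothetical; the actual submission format requirements are not given in this section.

```python
import string

NAME_LENGTH = 5                       # hypothetical required name length
VALID_CHARS = set(string.ascii_uppercase + string.digits)
CATEGORIES = {"T", "D", "R"}          # hypothetical category letters


def check_name(stem):
    """Return a list of problems found in one file name (empty if none)."""
    if len(stem) != NAME_LENGTH:
        return ["invalid length"]     # invalid file: skip the character checks
    problems = []
    if any(c not in VALID_CHARS for c in stem):
        problems.append("invalid character")
    if stem[0] not in CATEGORIES:
        problems.append("unknown category")
    if not stem[1:].isdigit():
        problems.append("invalid serial number")
    return problems


def check_serials(stems):
    """Report duplicated and missing serial numbers per category."""
    by_category = {}
    for stem in stems:
        if not check_name(stem):      # only names that passed every check
            by_category.setdefault(stem[0], []).append(int(stem[1:]))
    report = {}
    for category, serials in by_category.items():
        duplicates = sorted({n for n in serials if serials.count(n) > 1})
        expected = set(range(min(serials), max(serials) + 1))
        report[category] = {
            "duplicates": duplicates,
            "missing": sorted(expected - set(serials)),
        }
    return report


serial_report = check_serials(["T0001", "T0002", "T0004", "T0002", "D0001"])
```

Uniqueness is reported as duplicated serial numbers and continuity as gaps between the lowest and highest serial number seen in each category.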
In one possible implementation, the file format verification module 42 is used for:
identifying the files in the geologic information data retained after the geologic information data has been processed by the file name verification module 41, and recording the corresponding file formats;
where files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the file format priority from high to low being: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
judging and recording whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
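A minimal sketch of the priority rule and of one header check follows. The priority order is taken from the text above; the mapping from file extensions to format categories and the helper names are assumptions for illustration. The TIFF signature bytes are the standard ones and serve as a single example of a header that can be read and validated.

```python
import os
from collections import defaultdict

# Priority order as listed in the text, highest first.
FORMAT_PRIORITY = [
    "data format", "structured data", "vector data", "cartographic data",
    "table data", "document data", "raster data",
]

# Hypothetical mapping from extension to category, for illustration only.
EXT_TO_CATEGORY = {
    ".mdb": "structured data",
    ".shp": "vector data",
    ".xls": "table data",
    ".doc": "document data",
    ".tif": "raster data",
}


def primary_formats(filenames):
    """For files sharing a name, pick the one whose format has highest priority."""
    groups = defaultdict(list)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        category = EXT_TO_CATEGORY.get(ext.lower())
        if category is not None:
            groups[stem].append((FORMAT_PRIORITY.index(category), name))
    # The smallest index is the highest priority.
    return {stem: min(candidates)[1] for stem, candidates in groups.items()}


TIFF_MAGIC = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian TIFF signatures


def tiff_header_ok(first_bytes: bytes) -> bool:
    """Check that the first four bytes form a valid TIFF signature."""
    return first_bytes[:4] in TIFF_MAGIC


chosen = primary_formats(["T0001.tif", "T0001.shp", "T0002.doc", "T0002.xls"])
```

Whether the content of a file can actually be opened depends on format-specific readers, which this sketch does not attempt to cover.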
In one possible implementation, the file information acquisition module 43 is used for:
for a file in Data Format, recording the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information and the data volume of each layer;
for a file in structured data format, recording the format, version number, record count, field count and data volume of the file;
for a file of vector data or cartographic data, recording the format, version number and expression auxiliary information library information of the file;
for a file in table data format, recording the format, version number, record count, field count and data volume of the file;
for a file in document data format, recording the format, version number, character count and data volume of the file; and
for a file in raster data format, recording the format, compression ratio, dot matrix and data volume of the file.
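The per-format items listed above can be captured in a lookup table, as in the sketch below. The field names are hypothetical identifiers chosen for this example, and `make_record` is an assumed helper rather than part of the described system.

```python
# Items to record for each format category, following the list above.
TABULAR_FIELDS = ["format", "version", "record_count", "field_count", "data_volume"]
SYMBOLIZED_FIELDS = ["format", "version", "expression_auxiliary_library_info"]

FIELDS_BY_CATEGORY = {
    "data format": [
        "format", "version", "project_file_info",
        "projection_coordinate_parameters",
        "expression_auxiliary_library_info", "layer_data_volumes",
    ],
    "structured data": TABULAR_FIELDS,
    "vector data": SYMBOLIZED_FIELDS,
    "cartographic data": SYMBOLIZED_FIELDS,
    "table data": TABULAR_FIELDS,
    "document data": ["format", "version", "character_count", "data_volume"],
    "raster data": ["format", "compression_ratio", "dot_matrix", "data_volume"],
}


def make_record(category, values):
    """Build the record for one file and list any required items not supplied."""
    fields = FIELDS_BY_CATEGORY[category]
    missing = [f for f in fields if f not in values]
    record = {"category": category}
    # Keep only the items defined for this category; absent items become None.
    record.update({f: values.get(f) for f in fields})
    return record, missing


record, missing = make_record(
    "raster data",
    {"format": "tif", "dot_matrix": (4000, 3000), "data_volume": 1024, "extra": 1},
)
```

Returning the missing items alongside the record makes it easy to flag files whose information could not be fully collected for later manual review.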
The geologic information data cleaning system of this embodiment performs file name verification, file format verification and file information acquisition on the geologic information data in sequence, so that the data is automatically checked and corrected and its information is collected. Multi-source heterogeneous geologic information data from a wide range of sources can thus be cleaned automatically, yielding data cleaning results that are fast, of high quality and rich in registered information. The system can turn all kinds of problem data and dirty data, whose organization is disordered wholly or in part, into a geologic information database that is convenient to manage and serve. That is, the geologic data results are transformed into a standardized, consistent organization of multi-source heterogeneous data, so that the management system can manage and serve them effectively.
The above is merely a specific embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of protection of the claims.

Claims (8)

1. A geologic information data cleaning method, characterized by comprising:
a file name verification step of verifying the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed;
a file format verification step of verifying and recording the file format of the geologic information data retained after the file name verification step; and
a file information acquisition step of, after the file format verification step, recording the corresponding format and configuration information for each file of the recorded geologic information data;
wherein the file format verification step comprises:
identifying the files in the geologic information data retained after the file name verification step, and recording the corresponding file formats;
where files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the file format priority from high to low being: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
judging and recording whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
2. The method according to claim 1, characterized in that the file name verification step comprises:
judging the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verifying each character in the file name of the geologic information file to be processed.
3. The method according to claim 2, characterized in that verifying each character in the file name of the geologic information file to be processed, where the file is valid, comprises:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file that contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of the file conforms to a specified type, and recording any file whose type does not conform;
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
4. The method according to claim 1, characterized in that the file information acquisition step comprises:
for a file in Data Format, recording the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information and the data volume of each layer;
for a file in structured data format, recording the format, version number, record count, field count and data volume of the file;
for a file of vector data or cartographic data, recording the format, version number and expression auxiliary information library information of the file;
for a file in table data format, recording the format, version number, record count, field count and data volume of the file;
for a file in document data format, recording the format, version number, character count and data volume of the file; and
for a file in raster data format, recording the format, compression ratio, dot matrix and data volume of the file.
5. A geologic information data cleaning system, characterized by comprising:
a file name verification module for verifying the file name of each geologic information file to be processed according to the submission format requirements for the geologic information files to be processed;
a file format verification module, connected to the file name verification module, for verifying and recording the file format of the geologic information data retained after the geologic information data has been processed by the file name verification module; and
a file information acquisition module, connected to the file format verification module, for recording the corresponding format and configuration information for each file of the recorded geologic information data;
wherein the file format verification module is used for:
identifying the files in the geologic information data retained after the geologic information data has been processed by the file name verification module, and recording the corresponding file formats;
where files exist that have the same file name but different file formats, determining the primary format of the file according to a file format priority rule, the file format priority from high to low being: Data Format, structured data format, vector data format, cartographic data format, table data format, document data format, raster data format;
judging and recording whether the file header information of each file can be read effectively and whether the content of each file can be opened effectively.
6. The system according to claim 5, characterized in that the file name verification module is used for:
judging the validity of the geologic information file to be processed according to the length of its file name; and
where the geologic information file to be processed is valid, verifying each character in the file name of the geologic information file to be processed.
7. The system according to claim 6, characterized in that verifying each character in the file name of the geologic information file to be processed, where the file is valid, comprises:
verifying whether each character in the file name of the geologic information file to be processed is a valid character, and recording and pre-judging any file that contains invalid characters;
judging, according to the category position in the file name of the geologic information file to be processed, whether the file type of the file conforms to a specified type, and recording any file whose type does not conform;
judging, according to the file serial number position in the file name of the geologic information file to be processed, the validity of the file serial number and the continuity and uniqueness of that serial number within the geologic information data.
8. The system according to claim 5, characterized in that the file information acquisition module is used for:
for a file in Data Format, recording the format and version number of the file, the project file information, the projection coordinate parameters, the expression auxiliary information library information and the data volume of each layer;
for a file in structured data format, recording the format, version number, record count, field count and data volume of the file;
for a file of vector data or cartographic data, recording the format, version number and expression auxiliary information library information of the file;
for a file in table data format, recording the format, version number, record count, field count and data volume of the file;
for a file in document data format, recording the format, version number, character count and data volume of the file; and
for a file in raster data format, recording the format, compression ratio, dot matrix and data volume of the file.
CN201510920801.4A 2015-12-11 2015-12-11 Geologic information data cleaning method and system Expired - Fee Related CN105718499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510920801.4A CN105718499B (en) 2015-12-11 2015-12-11 Geologic information data cleaning method and system

Publications (2)

Publication Number Publication Date
CN105718499A CN105718499A (en) 2016-06-29
CN105718499B true CN105718499B (en) 2019-07-19

Family

ID=56146904

Country Status (1)

Country Link
CN (1) CN105718499B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190719

Termination date: 20211211