CN112861508A - Standardization method and system for logging discrete data - Google Patents

Publication number
CN112861508A
CN112861508A CN202110039023.3A CN202110039023A CN112861508A CN 112861508 A CN112861508 A CN 112861508A CN 202110039023 A CN202110039023 A CN 202110039023A CN 112861508 A CN112861508 A CN 112861508A
Authority
CN
China
Prior art keywords
data
file
template
analysis
analyzed
Legal status
Pending
Application number
CN202110039023.3A
Other languages
Chinese (zh)
Inventor
余长江
杜钦波
李国军
张娟
段先斐
刘昱晟
Current Assignee
China National Petroleum Corp
China Petroleum Logging Co Ltd
Original Assignee
China National Petroleum Corp
China Petroleum Logging Co Ltd
Application filed by China National Petroleum Corp and China Petroleum Logging Co Ltd
Priority to CN202110039023.3A
Publication of CN112861508A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/186: Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for standardizing discrete logging data. The method comprises the following steps: determining the parsing template to be used according to the data formats of the original file and of the required result data; if no corresponding parsing template exists, creating one from the original file and the required result data and saving it; loading the original file, which a program parses according to the selected template to form result data; if the result data do not meet the requirements, adjusting the template parameters, correcting the result data, and saving the final template; and writing the final result to the target platform.

Description

Standardization method and system for logging discrete data
Technical Field
The invention belongs to the technical field of well logging interpretation in petroleum exploration, and in particular relates to a method and a system for standardizing discrete logging data.
Background
A qualified data source is a prerequisite for well logging interpretation. Interpretation draws on many kinds of data, and the particular nature of discrete data makes that work inconvenient. Unlike data with a fixed format, discrete data is organized freely: different units arrange the same data in somewhat different ways, and different interpretation platforms define discrete data differently, so the data must be standardized before it is used. In traditional data standardization, every data file requires a large amount of manual arrangement before the required data source is formed, which quietly adds a great deal of work.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a system for standardizing discrete logging data. A user can customize a standard template for each type of data, and when data is standardized the corresponding template can be used directly without a large amount of repetitive work, which frees interpreters from tedious data preparation.
To achieve this purpose, the invention provides the following technical solution: a method for standardizing discrete logging data, comprising the following steps: acquiring the original file to be parsed and the data format of the required result data;
parsing the original file into the required result data, determining whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and writing the final result data into the required system or file.
The invention also provides a system for implementing the discrete logging data standardization method, comprising:
an acquisition module, used for acquiring the original file to be parsed and the data format of the required result data, wherein the original file to be parsed is an Excel file or a text file;
a parsing module comprising a parsing template, wherein the parsing template is used for receiving the original file to be parsed, parsing it into the required result data, judging whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and a writing module, used for receiving the final result data and writing it into the target platform.
Further, the parsing template is created according to the data format of the required result data; if the original file to be parsed has no corresponding parsing template, the template is created first. If the data format of the result data obtained is incorrect, the parameters of the parsing template are adjusted, the original file is re-parsed to obtain the final result data, and the adjusted parsing template is saved.
Further, the parsing template is used for parsing an Excel file and comprises a sheet index, a header row, a start row, a blank row count, a key column, automatic segmentation, a reading mode and a target column index.
Further, the sheet index specifies the index of the worksheet to be read in the Excel file;
the header row specifies the row in which the data header of the Excel file is located;
the start row specifies the row at which reading of data in the Excel file begins;
the blank row count is used for deciding when to stop reading data from the Excel file: when the number of blank rows in the original file exceeds this value, the parsing template stops reading;
the key column is used for judging the validity of a row of data in the Excel file: if the key column of a row is empty, the row is invalid;
the automatic segmentation specifies the columns of the Excel file that need to be split and the separators used to split them;
the reading mode determines the format in which cells of the Excel file are read;
the target column index stores the correspondence between column indexes in the Excel file and column indexes in the target file, so that the data in the Excel file can be converted into the required result data.
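As an illustration of how the Excel template fields could drive a reader, the following is a minimal Python sketch and not the patented implementation. The class name `ExcelTemplate`, its field names, the function `read_sheet` and the 0-based indexes are all assumptions made for this example, and the worksheet is represented as an in-memory list of rows rather than being read through an Excel library. The sketch also assumes that the blank row count refers to consecutive blank rows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = List[Optional[str]]


@dataclass
class ExcelTemplate:
    """Hypothetical container for the Excel template fields."""
    sheet_index: int = 0      # worksheet to read (selected by the caller)
    header_row: int = 0       # 0-based row holding the column titles
    start_row: int = 1        # 0-based first row of data
    max_blank_rows: int = 2   # stop after more consecutive blank rows than this
    key_column: int = 0       # a row is invalid when this cell is empty
    target_columns: Dict[int, int] = field(default_factory=dict)  # source -> target


def is_empty(cell: Optional[str]) -> bool:
    return cell is None or str(cell).strip() == ""


def read_sheet(rows: List[Row], tpl: ExcelTemplate) -> List[Row]:
    """Apply the template to one worksheet given as a list of rows."""
    if not tpl.target_columns:
        raise ValueError("the template needs at least one column mapping")
    width = max(tpl.target_columns.values()) + 1
    result: List[Row] = []
    blank_run = 0
    for row in rows[tpl.start_row:]:
        if all(is_empty(cell) for cell in row):
            blank_run += 1
            if blank_run > tpl.max_blank_rows:
                break  # a long blank run is treated as the end of the data
            continue
        blank_run = 0
        key = row[tpl.key_column] if tpl.key_column < len(row) else None
        if is_empty(key):
            continue  # key column empty: skip the invalid row
        out: Row = [None] * width
        for src, dst in tpl.target_columns.items():
            out[dst] = row[src] if src < len(row) else None
        result.append(out)
    return result
```

A caller would pick the worksheet with `sheet_index`, pass its rows to `read_sheet`, and receive rows already arranged in the target column order.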
Further, the parsing template is used for parsing a text file and comprises the following contents: a header row, a data row, a separator and a target column index.
Further, the header row specifies the line in which the header of the text file is located;
the data row specifies the line at which the data in the text file begins;
the separator is used for cutting each line of data in the text file into several columns; the line separator in the text file is "\n", and the column separator is specified by the user;
the target column index stores the correspondence between column indexes in the text file and column indexes in the target file, so that the original data can be converted into the required result data.
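The text-file template can be illustrated in the same spirit. The sketch below is an assumption-laden example, not the patent's code: `TextTemplate`, `parse_text` and the 0-based line indexes are hypothetical names and conventions chosen for illustration. Lines are split on "\n", each line is cut with the user-chosen column separator, and the target column index reorders the columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TextTemplate:
    """Hypothetical container for the text-file template fields."""
    header_row: int                  # 0-based line holding the column titles
    data_row: int                    # 0-based first line of data
    separator: str                   # user-chosen column separator
    target_columns: Dict[int, int]   # source column -> target column


def parse_text(content: str, tpl: TextTemplate) -> Tuple[List[str], List[List[str]]]:
    """Return the reordered header and the reordered data rows."""
    lines = content.split("\n")      # the line separator is fixed at "\n"
    width = max(tpl.target_columns.values()) + 1

    def reorder(cells: List[str]) -> List[str]:
        out = [""] * width
        for src, dst in tpl.target_columns.items():
            out[dst] = cells[src].strip() if src < len(cells) else ""
        return out

    header = reorder(lines[tpl.header_row].split(tpl.separator))
    table = [
        reorder(line.split(tpl.separator))
        for line in lines[tpl.data_row:]
        if line.strip()              # ignore empty lines
    ]
    return header, table
```

Keeping the mapping as source-to-target pairs means a source column that the target format does not need is simply left out of `target_columns`.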
Furthermore, the parsing template is named and saved to an XML file, and can be called directly by its name when an original file is parsed.
Further, the parsing template is interpreted by a template manager: the template manager extracts the parsing rule of the template, reads the data in the original file according to that rule to form a two-dimensional data table, and then forms the required result data according to the column correspondence in the rule.
Furthermore, the parsing template can be created and adjusted interactively in a software interface.
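Storing named templates in an XML file and retrieving them by name could look like the following Python sketch, which uses the standard `xml.etree.ElementTree` module. The patent does not disclose its XML schema, so the element names `templates`, `template` and `param`, the attribute names, and the functions `save_template` and `load_template` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def save_template(path: str, name: str, params: Dict[str, str]) -> None:
    """Add the named template to the XML file, replacing any older version."""
    try:
        tree = ET.parse(path)
        root = tree.getroot()
    except FileNotFoundError:
        root = ET.Element("templates")   # hypothetical root element
        tree = ET.ElementTree(root)
    for old in root.findall("template"):
        if old.get("name") == name:
            root.remove(old)
    node = ET.SubElement(root, "template", {"name": name})
    for key, value in params.items():
        ET.SubElement(node, "param", {"key": key}).text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)


def load_template(path: str, name: str) -> Dict[str, str]:
    """Look a template up by name and return its parameters."""
    root = ET.parse(path).getroot()
    for node in root.findall("template"):
        if node.get("name") == name:
            return {p.get("key"): (p.text or "") for p in node.findall("param")}
    raise KeyError(name)
```

Saving under an existing name overwrites the earlier entry, which matches the idea of storing the adjusted template after a correction.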
Compared with the prior art, the invention has at least the following beneficial effects:
The standardization method provided by the invention parses discrete logging data, namely the original file, to obtain the required result data. It improves the efficiency of standardizing discrete logging data, lightens the data preparation work of processing and interpretation personnel, and avoids the situation in which non-standard discrete logging data can only be used after a large amount of tedious manual arrangement.
The system for implementing the method defines, in the parsing module, a parsing template for standardizing data; the template can be created and adjusted, and the final template is saved in a configuration file so that it can be called repeatedly.
Drawings
Fig. 1 is a system flow diagram.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The invention provides a method for standardizing discrete logging data, comprising: acquiring the original file to be parsed and the data format of the required result data, wherein the original file to be parsed is an Excel file or a text file;
parsing the original file into the required result data, determining whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and writing the final result data into the required system or file.
The specific steps of the standardization method are as follows:
Step 1: obtain the original file and the data format of the required result data, and determine the parsing template that corresponds to the original file. If a corresponding parsing template exists, jump directly to step 3; otherwise execute step 2.
Step 2: create a parsing template according to the original file and the data format of the required result data, and save it. The template is created interactively in a software interface. The original file is an Excel file or a text file.
Step 3: load the original file; the program parses it according to the selected parsing template to obtain result data. If the data format of the result data is correct, jump to step 5; otherwise execute step 4.
Step 4: adjust the parameters of the parsing template, correct the data format of the result data, and save the final parsing template.
Step 5: call the platform interface and import the final result data into the target software platform.
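The control flow of steps 1 to 5 can be sketched as a small driver function. This is a hedged illustration only: the function `standardize` and every callback it takes (`make_template`, `parse`, `is_valid`, `adjust`, `write`) are hypothetical stand-ins for the interactive template editor, the parser, the format check and the platform interface, none of which the patent specifies in code. The `max_rounds` limit is an added safeguard, not part of the described method.

```python
from typing import Callable, Dict, List

Table = List[List[str]]
Template = Dict[str, object]


def standardize(
    raw: str,
    name: str,
    store: Dict[str, Template],
    make_template: Callable[[str], Template],
    parse: Callable[[str, Template], Table],
    is_valid: Callable[[Table], bool],
    adjust: Callable[[Template, Table], Template],
    write: Callable[[Table], None],
    max_rounds: int = 5,
) -> Table:
    # Step 1: look for an existing template.
    template = store.get(name)
    # Step 2: create and save one if none exists.
    if template is None:
        template = make_template(raw)
        store[name] = template
    # Step 3: parse the original file with the current template.
    result = parse(raw, template)
    # Step 4: adjust the template and re-parse until the format is correct.
    rounds = 0
    while not is_valid(result):
        if rounds == max_rounds:
            raise ValueError("result still invalid after adjusting the template")
        template = adjust(template, result)
        store[name] = template
        result = parse(raw, template)
        rounds += 1
    # Step 5: hand the final result to the target platform.
    write(result)
    return result
```

Because the adjusted template is written back to `store`, the next file of the same kind goes straight from step 1 to step 3.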
Preferably, in step 3, the parsing template is named and saved to an XML file, and can be called directly by its name when an original file is parsed.
Preferably, in step 3, a template manager interprets the parsing template stored in the XML file: the template manager extracts the parsing rule of the template, reads the data in the original file according to that rule to form a two-dimensional data table, and then forms the required result data according to the column correspondence in the rule.
Preferably, a parsing template can store several rules; the user selects a suitable rule to parse the original file, and each parsing rule defines how the original file is reorganized into data in the required format.
Preferably, in step 4, the parameters of the parsing template are adjusted interactively in a software interface.
Preferably, in step 5, the target software platform used in this embodiment is a logging processing and interpretation platform; different users use different platforms, depending on the circumstances.
The invention provides a system for implementing the discrete logging data standardization method, which specifically comprises:
an acquisition module, used for acquiring the original file to be parsed and the data format of the required result data, wherein the original file to be parsed is an Excel file or a text file;
a parsing module comprising a parsing template, wherein the parsing template is used for receiving the original file to be parsed, parsing it into the required result data, judging whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and a writing module, used for receiving the final result data and writing it into the required system or file.
Preferably, the parsing template for parsing an Excel file includes the following contents:
a) sheet index: specifies the index of the worksheet to be read in the Excel file;
b) header row: specifies the row in which the header of the Excel file is located;
c) start row: specifies the row at which reading of data in the Excel file begins;
d) blank row count: used for deciding when to stop reading data from the Excel file; when the number of blank rows in the original file exceeds this value, the parsing template stops reading;
e) key column: used for judging the validity of a row of data in the Excel file; if the key column of a row is empty, the row is invalid;
f) automatic segmentation: specifies the columns of the Excel file that need to be split and the separators used to split them;
g) reading mode: determines the format in which cells of the Excel file are read, such as text, number or date;
h) target column index: stores the correspondence between original columns and target columns of the Excel file that is read, so that the original data can be converted into the required result data.
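The reading mode in item g) can be pictured as a per-column conversion rule. The Python sketch below is one possible reading of that item, under stated assumptions: the mode names "text", "number" and "date", the default date format, and the functions `convert_cell` and `convert_row` are all invented for this example and are not taken from the patent.

```python
from datetime import date, datetime
from typing import Dict, List, Union

Cell = Union[str, float, date, None]


def convert_cell(raw: object, mode: str, date_format: str = "%Y-%m-%d") -> Cell:
    """Convert one raw cell according to a hypothetical reading mode."""
    if raw is None or str(raw).strip() == "":
        return None                      # empty cells stay empty in every mode
    text = str(raw).strip()
    if mode == "text":
        return text
    if mode == "number":
        return float(text)
    if mode == "date":
        return datetime.strptime(text, date_format).date()
    raise ValueError(f"unknown reading mode: {mode}")


def convert_row(row: List[object], modes: Dict[int, str]) -> List[Cell]:
    """Columns without an explicit mode are read as text."""
    return [convert_cell(cell, modes.get(i, "text")) for i, cell in enumerate(row)]
```

Defaulting to text keeps values such as well names untouched, while depth or date columns are converted only when the template asks for it.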
Preferably, the parsing template for parsing a text file includes the following contents:
a) header row: specifies the line in which the header of the text file is located;
b) data row: specifies the line at which the data in the text file begins;
c) separator: cuts each line of data in the text file into several columns; the line separator in the text file is "\n", and the column separator is specified by the user, for example ",", ";", "|", "\t" or "\b";
d) target column index: stores the correspondence between original columns and target columns of the text file that is read, so that the original data can be converted into the required result data.
Preferably, an interactive data-arrangement operation can be saved as a template; when data in the same format is standardized later, the work can be completed with one click simply by selecting the corresponding template.
Taking a logging processing and interpretation platform as an example, the data of Table 1 is standardized as follows: the column filling function of the method automatically fills data such as well names and horizons into the corresponding cells, the column splitting function automatically splits the depth column into two columns, and the data columns of the original file are matched one to one with the target format by interactively adjusting the headers. The result after standardization is shown in Table 2:
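Table 1. Raw discrete data
(Table provided as images in the original publication: Figure BDA0002894899060000061 and Figure BDA0002894899060000071)
Table 2. Results after standardization
(Table provided as an image in the original publication: Figure BDA0002894899060000072)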
In summary, Table 1 is the original discrete data file and Table 2 is the final standardized result. Comparison shows that in Table 1 information such as well name and horizon is identical across several rows and is stored in merged cells, the start depth and end depth are stored in one column joined by a connector, the header names in the original file differ somewhat from the target format, and the order of the data columns does not fully match the target format.
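The two operations described for Table 1, filling values from merged cells into every row and splitting a combined depth column, can be sketched in Python as follows. This is an illustrative assumption of how such functions might work, not the patented code: the names `fill_down` and `split_column` are hypothetical, and the sketch assumes that a merged cell appears as a value in its first row followed by empty cells, which is how many spreadsheet readers expose merged ranges.

```python
from typing import Dict, List, Optional, Sequence

Row = List[Optional[str]]


def fill_down(rows: List[Row], columns: Sequence[int]) -> List[Row]:
    """Copy the last non-empty value into empty cells of the given columns."""
    last: Dict[int, Optional[str]] = {c: None for c in columns}
    filled = []
    for row in rows:
        new_row = list(row)
        for c in columns:
            if new_row[c] is None or new_row[c].strip() == "":
                new_row[c] = last[c]     # inherit the value from the merged cell
            else:
                last[c] = new_row[c]
        filled.append(new_row)
    return filled


def split_column(rows: List[Row], column: int, separator: str) -> List[Row]:
    """Replace one column by two, cutting each cell at the first separator."""
    result = []
    for row in rows:
        left, _, right = (row[column] or "").partition(separator)
        result.append(row[:column] + [left.strip(), right.strip()] + row[column + 1:])
    return result
```

For example, with invented sample rows (not the contents of Table 1) such as `["W1", "Layer A", "1000-1005"]` followed by `[None, None, "1005-1012"]`, calling `fill_down(rows, [0, 1])` and then `split_column(rows, 2, "-")` gives every row its own well name and horizon and separate start and end depth columns.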
The original data provided by production units is generally in Excel or text format. The formats differ between units, and the data is often arranged for appearance or ease of viewing (for example, cells are merged or several values are stored in one cell), so the original data formats are diverse.
When such discrete data is used on different software platforms, the original Excel file must be converted into a format that the platform can recognize. The same data has different storage formats on different platforms: for example, a given column may be stored as the first column on one platform but as the second column on another. A platform can only read data from the columns it specifies, so when original data is loaded into a platform every column must correspond strictly one to one.
The traditional approach is to copy each line of the original data manually to the position designated by the software platform, or to arrange the original file manually into the prescribed format and then import it, which is tedious. With the template-based parsing approach, only one parsing template needs to be created for each kind of data; after the data is loaded, the software automatically calls the corresponding template and standardizes the data quickly.
The main object of the invention is to provide a method for importing original data into a software platform quickly, rather than to make the data itself more usable: each software platform already defines how data is stored, and whichever way the data is loaded, the format finally stored in the platform is the same; otherwise the data cannot be used.

Claims (10)

1. A method for standardizing discrete logging data, characterized by comprising the following steps: acquiring the original file to be parsed and the data format of the required result data;
parsing the original file into the required result data, determining whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and writing the final result data into the required system or file.
2. A system for implementing a discrete logging data standardization method, characterized by comprising:
an acquisition module, used for acquiring the original file to be parsed and the data format of the required result data, wherein the original file to be parsed is an Excel file or a text file;
a parsing module comprising a parsing template, wherein the parsing template is used for receiving the original file to be parsed, parsing it into the required result data, judging whether the data format of the result data obtained is correct, and, if it is incorrect, re-parsing the original file to obtain the final result data;
and a writing module, used for receiving the final result data and writing it into the target platform.
3. The system for implementing the discrete logging data standardization method according to claim 2, wherein the parsing template is created according to the data format of the required result data, and if the original file to be parsed has no corresponding parsing template, the template is created first; and if the data format of the result data obtained is incorrect, the parameters of the parsing template are adjusted, the original file is re-parsed to obtain the final result data, and the adjusted parsing template is saved.
4. The system for implementing the discrete logging data standardization method according to claim 2, wherein the parsing template is used for parsing an Excel file and comprises a sheet index, a header row, a start row, a blank row count, a key column, a reading mode and a target column index.
5. The system for implementing the discrete logging data standardization method according to claim 4, wherein:
the sheet index specifies the index of the worksheet to be read in the Excel file;
the header row specifies the row in which the data header of the Excel file is located;
the start row specifies the row at which reading of data in the Excel file begins;
the blank row count is used for deciding when to stop reading data from the Excel file: when the number of blank rows in the original file exceeds this value, the parsing template stops reading;
the key column is used for judging the validity of a row of data in the Excel file: if the key column of a row is empty, the row is invalid;
the automatic segmentation specifies the columns of the Excel file that need to be split and the separators used to split them;
the reading mode determines the format in which cells of the Excel file are read;
the target column index stores the correspondence between column indexes in the Excel file and column indexes in the target file, so that the data in the Excel file can be converted into the required result data.
6. The system for implementing the discrete logging data standardization method according to claim 2, wherein the parsing template is used for parsing a text file and comprises the following contents: a header row, a data row, a separator and a target column index.
7. The system for implementing the discrete logging data standardization method according to claim 6, wherein:
the header row specifies the line in which the header of the text file is located;
the data row specifies the line at which the data in the text file begins;
the separator is used for cutting each line of data in the text file into several columns; the line separator in the text file is "\n", and the column separator is specified by the user;
the target column index stores the correspondence between column indexes in the text file and column indexes in the target file, so that the original data can be converted into the required result data.
8. The system for implementing the discrete logging data standardization method according to claim 2, wherein the parsing template is named and saved to an XML file, and can be called directly by its name when an original file is parsed.
9. The system for implementing the discrete logging data standardization method according to claim 2, wherein the parsing template is interpreted by a template manager; the template manager extracts the parsing rule of the template, reads the data in the original file according to that rule to form a two-dimensional data table, and then forms the required result data according to the column correspondence in the rule.
10. The system for implementing the discrete logging data standardization method according to claim 3, wherein the parsing template can be created and adjusted interactively in a software interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110039023.3A CN112861508A (en) 2021-01-12 2021-01-12 Standardization method and system for logging discrete data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110039023.3A CN112861508A (en) 2021-01-12 2021-01-12 Standardization method and system for logging discrete data

Publications (1)

Publication Number Publication Date
CN112861508A true CN112861508A (en) 2021-05-28

Family

ID=76003018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110039023.3A Pending CN112861508A (en) 2021-01-12 2021-01-12 Standardization method and system for logging discrete data

Country Status (1)

Country Link
CN (1) CN112861508A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297152A (en) * 2021-11-30 2022-04-08 厦门市美亚柏科信息股份有限公司 Data reporting method and terminal equipment
CN114661811A (en) * 2022-03-07 2022-06-24 深圳市欢太数字科技有限公司 Data display method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216952A (en) * 2014-08-20 2014-12-17 烽火通信科技股份有限公司 Universal report generation method and universal report generation system based on XML (extensive markup language) technology
CN108241642A (en) * 2016-12-23 2018-07-03 北京国双科技有限公司 Document analysis method and apparatus
CN108763185A (en) * 2018-05-31 2018-11-06 苏州市计量测试院 The method of calibration and system of Excel file
CN109543164A (en) * 2018-11-13 2019-03-29 中煤地第勘探局有限责任公司 The method of Autocad combination Excel generation log sheet
CN111309313A (en) * 2019-10-17 2020-06-19 天津大学 Method for quickly generating HTML (hypertext markup language) and storing form data
CN111400387A (en) * 2020-03-18 2020-07-10 浩云科技股份有限公司 Conversion method and device for import and export data, terminal equipment and storage medium
CN111444254A (en) * 2020-03-30 2020-07-24 北京东方金信科技有限公司 SK L system file format conversion method and system
CN111666114A (en) * 2020-04-28 2020-09-15 中国石油天然气集团有限公司 Plug-in type well logging data conversion method
CN111787061A (en) * 2020-05-28 2020-10-16 中国石油天然气集团有限公司 Transmission method of well site real-time logging multivariate data

Similar Documents

Publication Publication Date Title
US9092417B2 (en) Systems and methods for extracting data from a document in an electronic format
CN112861508A (en) Standardization method and system for logging discrete data
US20160055376A1 (en) Method and system for identification and extraction of data from structured documents
CN106844324B (en) Method for exporting variable column data into Excel format
DE10321944A1 (en) Devices and methods for processing text-based electronic documents
CN111897528B (en) Low-code platform for enterprise online education
CN110795858A (en) Method and device for generating home decoration design drawing
CN114782970B (en) Table extraction method, system and readable medium
CN112286934A (en) Database table importing method, device, equipment and medium
CN118332072B (en) Intelligent document retrieval generation method and system based on RAG technology
US5895473A (en) System for extracting text from CAD files
CN115391439B (en) Document data export method, device, electronic equipment and storage medium
CN117725437B (en) Machine learning-based data accurate matching analysis method
CN104536998A (en) Data import method and device
CN111708810A (en) Model optimization recommendation method and device and computer storage medium
CN112214473B (en) Data migration method and system between databases
CN112416340B (en) Webpage generation method and system based on sketch
CN113238865A (en) Method for quickly constructing knowledge graph based on Excel one-key import
CN116932694A (en) Intelligent retrieval method, device and storage medium for knowledge base
CN116186144A (en) Automatic formatting processing method and system for mine remote sensing monitoring data
CN114492436B (en) Audit interview information processing method, device and system
CN114118018A (en) Implementation mode of cross-platform electronic report
CN114840673A (en) Multi-source heterogeneous marine environment data integration method based on NetCDF
CN112632132A (en) Method, device and equipment for processing abnormal import data
CN118093597B (en) Table data reconstruction method and device and question-answering method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination