CN112861508A

CN112861508A - Standardization method and system for logging discrete data

Info

Publication number: CN112861508A
Application number: CN202110039023.3A
Authority: CN
Inventors: 余长江; 杜钦波; 李国军; 张娟; 段先斐; 刘昱晟
Original assignee: China National Petroleum Corp; China Petroleum Logging Co Ltd
Current assignee: China National Petroleum Corp; China Petroleum Logging Co Ltd
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-05-28

Abstract

The invention discloses a method and a system for standardizing logging discrete data. The method comprises the following steps: determining the analysis template to be used according to the data formats of the original file and of the required result data; if no corresponding analysis template exists, making one according to the original file and the required result data, and storing it; loading the original file and analyzing it by the program according to the selected template to form result data; if the result data do not meet the requirements, adjusting the template parameters, correcting the result data, and storing the final template; and writing the final result data into the target platform.

Description

Standardization method and system for logging discrete data

Technical Field

The invention belongs to the technical field of petroleum exploration logging interpretation, and particularly relates to a method and a system for standardizing logging discrete data.

Background

Qualified source data is a prerequisite for well logging interpretation. Many kinds of data are used in interpretation work, and the particular nature of discrete data makes that work inconvenient. Unlike data with a fixed format, discrete data is organized freely: different units organize the same data in somewhat different ways, and different interpretation platforms define discrete data differently, so the data must be normalized before it can be used. In traditional data normalization, each data file requires a large amount of manual arrangement before the required data source is formed, which adds a considerable hidden workload.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method and a system for standardizing discrete logging data. A user can customize a standard template for each type of data; when data is standardized, the corresponding template can be used directly without a large amount of repetitive work, so that interpreters are freed from tedious data preparation.

In order to achieve the purpose, the invention provides the following technical scheme: a normalization method for logging discrete data specifically comprises the following steps: acquiring a data format of an original file to be analyzed and required result data;

analyzing the original file to be analyzed into the required result data, determining whether the data format of the obtained result data is correct, and, if it is incorrect, re-analyzing the original file to be analyzed to obtain the final result data;

and writing the final result data into a required system or file.

The invention also provides a system for realizing the well logging discrete data standardization method, which comprises:

an acquisition module, used for acquiring an original file to be analyzed and the data format of the required result data, wherein the type of the original file to be analyzed is an Excel file or a text file;

an analysis module, which comprises an analysis template, the analysis template being used for receiving the original file to be analyzed, analyzing it into the required result data, judging whether the data format of the obtained result data is correct, and, if it is incorrect, re-analyzing the original file to obtain the final result data;

and a writing module, used for receiving the final result data and writing the final result data into the target platform.

Further, the analysis template is made according to the data format of the required result data; if the original file to be analyzed has no corresponding analysis template, the analysis template is made first; and if the data format of the obtained result data is incorrect, the parameters of the analysis template are adjusted, the original file is re-analyzed to obtain the final result data, and the adjusted analysis template is stored.

Further, the analysis template is used for analyzing the Excel file and comprises a form index, a header row, a start row, a blank row number, a key column, a reading mode and a target column index.

Further, the form index is used for specifying the index of the sheet to be read in the Excel file;

the header row is used for specifying the row in which the data header is located in the Excel file;

the start row is used for specifying the row at which reading of data starts in the Excel file;

the blank row number is used for judging when to finish reading data in the Excel file: when the number of blank rows in the original file exceeds the blank row number, the analysis template finishes reading the data in the Excel file;

the key column is used for judging the validity of row data in the Excel file: if the key column of a row is a null value, the row data is invalid;

the automatic segmentation is used for determining which columns in the Excel file need to be split and which separators are used to split them;

the reading mode is used for determining the format in which cells in the Excel file are read;

the target column index is used for storing the corresponding relation between the column index in the Excel file and the column index in the target file, so that the data in the Excel file is converted into the required result data.
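As an illustration only, the Excel template fields described above could be held in a small record and applied to one sheet as in the following sketch. The names `ExcelTemplate`, `read_rows` and `is_blank` are hypothetical and do not come from the invention; the sheet is assumed to have been loaded already into a list of rows, and the sketch further assumes that only consecutive blank rows are counted against the blank row number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Cell = Optional[str]


@dataclass
class ExcelTemplate:
    """Hypothetical container for the Excel template fields."""
    sheet_index: int = 0      # form index: which sheet to read (unused below)
    header_row: int = 0       # row holding the column headers
    start_row: int = 1        # first row of data
    max_blank_rows: int = 2   # blank row number: stop once this is exceeded
    key_column: int = 0       # a row is invalid if this cell is empty
    read_mode: str = "text"   # cell format; only plain text is handled here
    target_columns: Dict[int, int] = field(default_factory=dict)  # source -> target


def is_blank(row: List[Cell]) -> bool:
    """Return True when every cell of the row is empty."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def read_rows(sheet: List[List[Cell]], tpl: ExcelTemplate) -> List[List[Cell]]:
    """Collect the valid data rows of one sheet, given as a list of rows."""
    rows: List[List[Cell]] = []
    blanks = 0
    for row in sheet[tpl.start_row:]:
        if is_blank(row):
            blanks += 1
            if blanks > tpl.max_blank_rows:
                break  # too many blank rows: reading is finished
            continue
        blanks = 0  # assumption: only consecutive blank rows are counted
        key = row[tpl.key_column] if tpl.key_column < len(row) else None
        if key is None or str(key).strip() == "":
            continue  # key column is empty: the row is treated as invalid
        rows.append(list(row))
    return rows
```

A real implementation would obtain the sheet from a spreadsheet library and would also apply the reading mode and the target column index; both are left out here to keep the reading rules visible.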

Further, the parsing template is used for parsing the text file, and the parsing template includes the following contents: a header row, a data row, a separator, and a target column index.

Further, the header line is used for specifying a line in which the header is located in the text file;

the data line is used for specifying the position of a starting line of data in the text file;

the separator is used for cutting each line of data in the text file into a plurality of columns, the line separator in the text file is "\n", and the column separator is designated by a user;

the target column index is used for storing the corresponding relation between the column index in the text file and the column index in the target file, so that the original data is converted into the required result data.
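The four text template fields can be read as the parameters of a single parsing function. The sketch below is a minimal illustration under that reading; the function name `parse_text` and its signature are assumptions, not part of the invention.

```python
from typing import Dict, List, Tuple


def parse_text(content: str, header_row: int, data_row: int,
               separator: str, target_columns: Dict[int, int]
               ) -> Tuple[List[str], List[List[str]]]:
    """Split text into columns and move each source column to its target index."""
    lines = content.split("\n")  # the line separator is "\n"
    header = [h.strip() for h in lines[header_row].split(separator)]
    width = max(target_columns.values(), default=-1) + 1
    result: List[List[str]] = []
    for line in lines[data_row:]:
        if not line.strip():
            continue  # skip empty lines, such as the one after a trailing "\n"
        cells = [c.strip() for c in line.split(separator)]
        out = [""] * width
        for src, dst in target_columns.items():
            if src < len(cells):
                out[dst] = cells[src]
        result.append(out)
    return header, result
```

For example, with the column separator "|" and a target column index of `{1: 0, 0: 1, 2: 2}`, the first two source columns swap places in the result while the third stays where it is.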

Furthermore, the parsing template is named and stored to an XML file, and the corresponding parsing template can be directly called through the name of the parsing template when the original file is parsed.
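One possible way to store named templates in an XML file and recall them by name is sketched below with Python's standard `xml.etree.ElementTree` module. The element names `templates`, `template` and `param`, and the functions `save_template` and `load_template`, are assumptions made for illustration; the invention does not specify the XML layout.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def save_template(path: str, name: str, params: Dict[str, str]) -> None:
    """Add or replace the template called `name` in the XML file at `path`."""
    try:
        tree = ET.parse(path)
        root = tree.getroot()
    except FileNotFoundError:
        root = ET.Element("templates")
        tree = ET.ElementTree(root)
    # Remove any earlier template with the same name so that names stay unique.
    for old in root.findall("template"):
        if old.get("name") == name:
            root.remove(old)
    node = ET.SubElement(root, "template", {"name": name})
    for key, value in params.items():
        ET.SubElement(node, "param", {"key": key}).text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)


def load_template(path: str, name: str) -> Dict[str, str]:
    """Return the parameters of the template called `name`."""
    root = ET.parse(path).getroot()
    for node in root.findall("template"):
        if node.get("name") == name:
            return {p.get("key"): (p.text or "") for p in node.findall("param")}
    raise KeyError(name)
```

Saving a template under an existing name overwrites it, which matches the idea of storing the adjusted template after its parameters have been corrected.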

Further, the analysis template is analyzed through a template manager, the template manager extracts an analysis rule of the analysis template, reads data in the original file according to the analysis rule to form a two-dimensional data table, and then forms required result data according to a column corresponding relation in the analysis rule.
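The two stages described here, first reading the original file into a two-dimensional table and then forming the result from the column correspondence, might be arranged as in the following sketch. The class `TemplateManager` and its methods `register` and `run` are hypothetical names, and the reader is passed in as a function so that the sketch does not depend on any particular file format.

```python
from typing import Callable, Dict, List, Tuple

Table = List[List[str]]
Reader = Callable[[str], Table]


class TemplateManager:
    """Hypothetical manager that keeps one parsing rule per template name."""

    def __init__(self) -> None:
        self._rules: Dict[str, Tuple[Reader, Dict[int, int]]] = {}

    def register(self, name: str, reader: Reader,
                 column_map: Dict[int, int]) -> None:
        """Store the reading rule and the source-to-target column mapping."""
        self._rules[name] = (reader, dict(column_map))

    def run(self, name: str, raw: str) -> Table:
        """Apply the named rule: read a 2-D table, then reorder its columns."""
        reader, column_map = self._rules[name]
        table = reader(raw)  # stage 1: two-dimensional data table
        width = max(column_map.values(), default=-1) + 1
        result: Table = []
        for row in table:    # stage 2: apply the column correspondence
            out = [""] * width
            for src, dst in column_map.items():
                if src < len(row):
                    out[dst] = row[src]
            result.append(out)
        return result
```

Keeping the reading stage separate from the column mapping means the same mapping step can serve both Excel and text sources.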

Furthermore, the making and the adjusting of the analysis template can be interactively carried out in a software interface.

Compared with the prior art, the invention has at least the following beneficial effects:

the normalization method for discrete logging data provided by the invention analyzes the discrete logging data, namely the original file, to obtain the required result data. This improves the efficiency of normalizing the discrete logging data, lightens the data preparation work of processing interpreters, and avoids the situation in which non-normalized discrete logging data can be used by the processing interpreters only after a large amount of complicated manual arrangement.

The system for realizing the logging discrete data standardization method defines, in the analysis module, an analysis template for standardizing data; the analysis template can be made and adjusted, and the final analysis template is stored in a configuration file so that it can be called repeatedly.

Drawings

Fig. 1 is a system flow diagram.

Detailed Description

The invention is further described with reference to the following figures and detailed description.

The invention provides a standardization method for logging discrete data, which comprises the steps of obtaining an original file to be analyzed and a data format of required result data, wherein the type of the original file to be analyzed is an Excel file or a text file;

and writing the final result data into a required system or file.

The specific steps of the normalization method are as follows:

Step 1, obtaining the original file and the data format of the required result data, and determining the analysis template that corresponds to the original file; if the corresponding analysis template exists, jumping directly to step 3, otherwise executing step 2.

Step 2, making an analysis template according to the original file and the data format of the required result data, and storing the analysis template; the template is made interactively in a software interface, and the type of the original file is an Excel file or a text file.

Step 3, loading the original file and analyzing it by the program according to the selected analysis template to obtain result data; if the data format of the result data is correct, jumping to step 5, otherwise executing step 4.

Step 4, adjusting the parameters of the analysis template, correcting the data format of the result data, and storing the final analysis template.

Step 5, calling a platform interface and importing the final result data into the target software platform.
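The control flow of steps 1 to 5 can be summarized in a short sketch. Everything named here is an assumption made for illustration: the function `normalize`, the callbacks passed to it, and the `max_rounds` limit, which the method itself does not mention and which only keeps the sketch from looping forever. In the invention, making and adjusting the template are interactive operations; here they are stood in for by plain functions.

```python
from typing import Callable, Dict, List

Table = List[List[str]]
Template = Dict[str, str]


def normalize(raw: str,
              name: str,
              store: Dict[str, Template],
              make_template: Callable[[str], Template],
              parse: Callable[[str, Template], Table],
              is_valid: Callable[[Table], bool],
              adjust: Callable[[Template, Table], Template],
              write: Callable[[Table], None],
              max_rounds: int = 5) -> Table:
    """Run steps 1 to 5 for one original file and return the final result."""
    # Steps 1 and 2: reuse the stored template, or make and store a new one.
    template = store.get(name)
    if template is None:
        template = make_template(raw)
        store[name] = template
    # Step 3: parse the original file with the template.
    result = parse(raw, template)
    # Step 4: adjust the template until the result format is accepted.
    rounds = 0
    while not is_valid(result):
        if rounds == max_rounds:
            raise ValueError("result format still incorrect after adjustment")
        template = adjust(template, result)
        store[name] = template  # the adjusted template replaces the stored one
        result = parse(raw, template)
        rounds += 1
    # Step 5: hand the final result to the target platform.
    write(result)
    return result
```

Because the adjusted template is written back to the store, a later file of the same kind goes straight from step 1 to step 3 with the corrected parameters.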

Preferably, in step 3, the parsing template is named and stored in an XML file, and the corresponding parsing template can be directly called by the name of the parsing template when the original file is parsed.

Preferably, in step 3, the template manager is used for parsing the parsing template in the XML file, the template manager extracts the parsing rule of the parsing template, reads the data in the original file according to the parsing rule to form a two-dimensional data table, and then forms the required result data according to the column corresponding relationship in the parsing rule.

Preferably, a plurality of rules can be stored in the parsing template, the user selects a proper rule to parse the original file, and how to reorganize the original file into data in a required format is defined in the parsing rule.

Preferably, in step 4, parameters of the parsing template are interactively adjusted based on a software interface.

Preferably, in step 5, the target software platform adopted in this embodiment is a logging processing interpretation platform; different users may adopt different platforms as circumstances require.

The invention provides a system for realizing a well logging discrete data standardization method, which specifically comprises the following steps:

and the writing module is used for receiving the final result data and writing the final result data into a required system or file.

Preferably, the parsing template for parsing the Excel file includes the following contents:

a) form index: specifying the index of the sheet to be read in the Excel file;

b) header row: specifying the row in which the Excel file header is located;

c) start row: specifying the row at which reading of data starts in the Excel file;

d) blank row number: judging when to finish reading data in the Excel file; when the number of blank rows in the original file exceeds the blank row number, the parsing template finishes reading the data in the Excel file;

e) key column: judging the validity of row data in the Excel file; if the key column of a row is a null value, the row data is invalid;

f) automatic segmentation: determining the columns of the Excel file that need to be split and the separators used to split them;

g) reading mode: determining the format of the cells read from the Excel file, such as text, number or date;

h) target column index: storing the corresponding relation between the original columns and the target columns in the Excel file, so as to convert the original data into the required result data.

Preferably, the parsing template for parsing the text file includes the following contents:

a) a header row: specifying a line in which a header is located in the text file;

b) data row: specifying the position of a start line of data in a text file;

c) separator: the separator is used for cutting each line of data in the text file into a plurality of columns; the line separator in the text file is "\n", and the column separator is designated by the user, such as ",", ";", "|", "\t" or "\\b";

d) target column index: storing the corresponding relation between the original columns and the target columns in the text file, so as to convert the original data into the required result data.

Preferably, an interactive data arrangement operation can be stored as a template; when data in the same format is to be normalized, the arrangement can be completed in one click simply by selecting the corresponding template.

Taking a logging processing interpretation platform as an example, the data of Table 1 is normalized: the column filling function of the normalization method automatically fills data such as well names and horizons into the corresponding cells, the column splitting function automatically splits the depth column into two columns of data, and the data columns of the original file are matched one by one to the target format through interactive adjustment of the headers. The results after normalization are shown in Table 2:

TABLE 1 raw discrete data

TABLE 2 results after normalization

In summary, Table 1 is the original discrete data file and Table 2 is the final normalized result. Comparison shows that, in Table 1, information such as well name and horizon is identical across multiple rows and is stored in merged cells; the start depth and end depth are stored in the same column, joined by a connector; the header names in the original file differ somewhat from the target format; and the order of the data columns does not fully correspond to the target format.
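The column filling and column splitting operations mentioned in this example can be sketched as follows. The function names `fill_down` and `split_column` are hypothetical. The sketch assumes that a merged cell arrives as one value followed by empty cells in the rows below it, which is how spreadsheet readers commonly report merged ranges, and that the depth connector is a single character such as "-".

```python
from typing import Dict, List, Optional, Sequence

Row = List[Optional[str]]


def fill_down(rows: List[Row], columns: Sequence[int]) -> List[Row]:
    """Copy the last non-empty value into empty cells of the given columns."""
    last: Dict[int, Optional[str]] = {c: None for c in columns}
    filled: List[Row] = []
    for row in rows:
        new_row = list(row)
        for c in columns:
            if new_row[c] in (None, ""):
                new_row[c] = last[c]  # empty cell: reuse the value from above
            else:
                last[c] = new_row[c]  # remember the newest non-empty value
        filled.append(new_row)
    return filled


def split_column(rows: List[Row], column: int, connector: str) -> List[Row]:
    """Replace one column by the two parts on either side of the connector."""
    out: List[Row] = []
    for row in rows:
        value = row[column] or ""
        start, _, end = value.partition(connector)
        out.append(row[:column] + [start.strip(), end.strip()] + row[column + 1:])
    return out
```

With invented sample rows (not taken from Table 1) such as `["W-1", "H1", "1000-1010"]` followed by `[None, None, "1010-1025"]`, filling columns 0 and 1 and then splitting column 2 on "-" gives `["W-1", "H1", "1000", "1010"]` and `["W-1", "H1", "1010", "1025"]`.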

The original data provided by production units is generally in Excel or text format, and the formats provided by different units differ. For appearance or ease of viewing, the original data is often processed (for example, cells are merged, or several values are stored in the same cell), so that the original data formats are diverse.

When such discrete data is used in different software platforms, the original Excel file needs to be converted into a format that the platform can recognize. The same data has different storage formats on different software platforms: for example, a certain column of data may be stored in the first column on a first platform but in the second column on a second platform. Because a platform can only read data from the specified columns, every column of data must correspond strictly one to one when the original data is loaded into the software platform.

In the traditional mode, each line of the original data is manually copied to the line designated by the software platform, or the original file is manually arranged into the prescribed format and then imported, which is tedious. With the template-based analysis mode, only one corresponding analysis template needs to be made for each kind of data; after the data is loaded, the software automatically calls the corresponding template and quickly standardizes the data.

The main object of the present invention is to provide a method for quickly importing original data into a software platform, rather than to make the data itself more usable: each software platform already defines how the data is stored, and whichever way the data is loaded, the format finally stored in the software platform must be the same, otherwise the data cannot be used.

Claims

1. A normalization method for discrete logging data is characterized by specifically comprising the following steps: acquiring a data format of an original file to be analyzed and required result data;

and writing the final result data into a required system or file.

2. A system for realizing a well logging discrete data specification method is characterized by specifically comprising the following steps:

3. The system for realizing the well logging discrete data specification method according to claim 2, wherein the analysis template is manufactured according to the data format of the required result data, and if the original file to be analyzed has no corresponding analysis template, the analysis template is manufactured first; and if the data format of the obtained required result data is incorrect, adjusting the parameters of the analysis template, re-analyzing the original file to obtain the final result data, and storing the adjusted analysis template.

4. The system for implementing the well logging discrete data specification method according to claim 2, wherein the parsing template is used for parsing an Excel file, and the parsing template includes a form index, a header row, a start row, a blank row, a key column, a reading mode, and a target column index.

5. The system for implementing a well logging discrete data specification method according to claim 4,

the form index is used for designating an index where an Excel file needing to be read is located;

6. The system for implementing the well logging discrete data specification method according to claim 2, wherein the parsing template is used for parsing a text file, and the parsing template comprises the following contents: a header row, a data row, a separator, and a target column index.

7. The system for implementing a well logging discrete data specification method according to claim 6,

the header line is used for designating the line of the header in the text file;

8. The system for implementing the well logging discrete data specification method as claimed in claim 2, wherein the parsing template is named and saved to an XML file, and the corresponding parsing template can be called directly by the name of the parsing template when the original file is parsed.

9. The system for realizing the well logging discrete data specification method according to claim 2, wherein the analysis template is analyzed through a template manager, the template manager extracts an analysis rule of the analysis template, reads data in the original file according to the analysis rule to form a two-dimensional data table, and then forms required achievement data according to a column corresponding relation in the analysis rule.

10. The system for implementing the well logging discrete data specification method according to claim 3, wherein the making and adjusting of the parsing template can be interactively performed in a software interface.