CN107544090B - Seismic data analyzing and storing method based on MapReduce

Info

Publication number: CN107544090B
Application number: CN201710813111.8A
Authority: CN
Inventors: 李克文; 谢鹏; 冯德永; 朱剑兵; 李萍
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2021-08-24
Anticipated expiration: 2037-09-11
Also published as: CN107544090A

Abstract

The invention discloses a MapReduce-based method for parsing and storing seismic data. SEG-Y files are parsed in a distributed manner through the MapReduce framework, and the parsed seismic attribute values are stored in the HBase distributed column-oriented database through the Phoenix interface. Compared with single-machine operation, the method parses multiple SEG-Y files in parallel and stores the results in HBase for later use. The seismic attributes of a required work area can then be queried and exported from HBase through Phoenix, which saves a large amount of time compared with exporting the same volume of data on a single machine.

Description

Seismic data analyzing and storing method based on MapReduce

Technical Field

The invention belongs to the field of geophysical exploration and the field of machine learning, and particularly relates to a seismic data analyzing and storing method based on MapReduce.

Background

At present there are hundreds of seismic attribute data volumes. Each seismic attribute is stored in its own SEG-Y file, and each file occupies a large amount of space because seismic work areas are large and the survey-grid precision is high. The conventional extraction method imports the SEG-Y file into seismic analysis software and retrieves the seismic data of the required work area by reading the byte ranges of the target data. However, SEG-Y files keep growing as exploration technology improves, and extracting several seismic attribute values for one work area on a single machine requires reading and parsing several SEG-Y files, so both reading efficiency and query efficiency are low.

Disclosure of Invention

To overcome the shortcomings of single-machine parsing of existing seismic SEG-Y files, the invention provides a seismic data parsing and storage method based on MapReduce.

To achieve this purpose, the technical scheme of the invention is as follows. First, the seismic attribute SEG-Y file is stored on HDFS (the Hadoop Distributed File System). The file is then split in the InputFormat of MapReduce: the 3600-byte file header is cut off, and each Split is sized to the number of bytes occupied by a whole number of traces, which guarantees data integrity during parsing in the Map stage. The Map stage reads and parses one trace at a time, to prevent the memory of the Map task from being overloaded and crashing the node. The parsed seismic attributes are exported from the Map-stage output to the HBase distributed column-oriented database through the Phoenix interface.

The beneficial effects of the invention are as follows. Distributed parsing and storage of seismic files is realized: the extraction of SEG-Y files, previously performed on a single computer, is moved to a cluster, where the MapReduce framework parses multiple SEG-Y files in parallel and stores the results in HBase for later use. The seismic attributes of a required work area can then be queried and exported from HBase through Phoenix, which saves a large amount of time compared with exporting the same volume of data on a single machine.

Drawings

FIG. 1 shows the MapReduce parsing and storage process of the invention.

FIG. 2 shows the source data format of the invention.

FIG. 3 shows the data storage structure in HBase of the invention.

In the drawings: 11. file uploading; 12. InputFormat splitting; 13. Mapper task; 14. Phoenix interface; 15. HBase database; 21. SEG-Y file format; 31. HBase table data structure.

Detailed Description

FIG. 1 is a flow chart of the MapReduce parsing and storage process of the invention. The method is divided into three stages:

a, SEG-Y file header parsing: read the first 3600 bytes of the SEG-Y file and parse them to obtain the time depth TIME (the number of samples per trace) and the number of traces TRACES of the SEG-Y file;
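Step a can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the standard SEG-Y layout, in which the 3600-byte file header is a 3200-byte textual header followed by a 400-byte binary header, and the number of samples per trace is a big-endian 16-bit integer at file bytes 3221-3222. The patent does not say where TRACES is read from, so the sketch derives it from the file size as an assumption. The function name `parse_file_header` is hypothetical.

```python
import struct

FILE_HEADER_BYTES = 3600   # 3200-byte textual header + 400-byte binary header
TRACE_HEADER_BYTES = 240   # per-trace header
SAMPLE_BYTES = 4           # one IBM floating-point sample


def parse_file_header(header: bytes, file_size: int):
    """Return (TIME, TRACES) for a SEG-Y file.

    TIME is read from the binary header. TRACES is derived from the file
    size, which is an assumption of this sketch, not a statement of the
    patent's method.
    """
    if len(header) < FILE_HEADER_BYTES:
        raise ValueError("need the first 3600 bytes of the file")
    # Samples per trace: big-endian uint16 at file bytes 3221-3222 (offset 3220).
    (time,) = struct.unpack_from(">H", header, 3220)
    if time == 0:
        raise ValueError("binary header reports zero samples per trace")
    trace_bytes = TRACE_HEADER_BYTES + SAMPLE_BYTES * time
    body = file_size - FILE_HEADER_BYTES
    if body < 0 or body % trace_bytes != 0:
        raise ValueError("file size is not a whole number of traces")
    return time, body // trace_bytes
```

Deriving TRACES from the file size only works when every trace has the same length, which is what the fixed split size in the following step also relies on.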

b, InputFormat design: the SEG-Y file is uploaded (11) to HDFS and used as the input file of MapReduce. The InputFormat splitting step (12) of MapReduce removes the 3600-byte file header and sets the length of each Split to TIME × 4 × N, where N is a positive integer that can be adjusted according to single-machine performance to maintain load balance and data integrity;
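The splitting rule of step b can be sketched as a plain Python function that computes byte ranges. This is only an illustration of trace-aligned splitting, not Hadoop's InputFormat API. It assumes the per-trace size used in the worked example of this description (240 header bytes plus 4 × TIME data bytes), and the name `trace_aligned_splits` and its parameters are hypothetical.

```python
FILE_HEADER_BYTES = 3600
TRACE_HEADER_BYTES = 240
SAMPLE_BYTES = 4


def trace_aligned_splits(file_size: int, time: int, traces_per_split: int):
    """Return a list of (offset, length) byte ranges covering all traces.

    Every range starts after the 3600-byte file header and holds a whole
    number of traces, so no trace is cut across two splits.
    """
    if time < 1 or traces_per_split < 1:
        raise ValueError("time and traces_per_split must be positive")
    # Assumed trace size: 240-byte trace header + 4 bytes per sample.
    trace_bytes = TRACE_HEADER_BYTES + SAMPLE_BYTES * time
    body = file_size - FILE_HEADER_BYTES
    if body < 0 or body % trace_bytes != 0:
        raise ValueError("file size is not a whole number of traces")
    split_bytes = trace_bytes * traces_per_split
    splits = []
    offset = FILE_HEADER_BYTES
    while offset < file_size:
        # The last split may hold fewer than traces_per_split traces.
        length = min(split_bytes, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

A larger `traces_per_split` (the N of the text) means fewer, heavier Mapper tasks, while a smaller value spreads the load more evenly at the cost of more task overhead.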

c, Mapper task design: each Mapper task (13) takes one trace of data at a time from the Split, i.e. TIME × 4 bytes, parses the first 240 bytes to obtain the trace number of the trace, and then parses each 4 bytes as an IBM floating-point number to obtain the sample values. The output of each Mapper is stored directly in the HBase database (15) through the Phoenix interface (14) without passing through a Reduce phase, which reduces I/O and shortens the processing time. FIG. 2 is a data structure diagram of the present invention, in which seismic data 11 is matched to a plurality of well data 12 in a well-seismic data set specification 21; the matching method is accurate for the well data of each seismic grid as compared with the original time-depth conversion.
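The per-trace parsing of step c can be sketched in Python as follows. The IBM single-precision format has a sign bit, a 7-bit base-16 exponent with a bias of 64, and a 24-bit fraction. The sketch assumes that the trace number is the big-endian 32-bit integer in the first 4 bytes of the 240-byte trace header, since the patent does not name the field. It also assumes a trace of 240 + 4 × TIME bytes, as in the worked example. The names `ibm32_to_float` and `parse_trace` are hypothetical, and writing to HBase through Phoenix is left out.

```python
import struct

TRACE_HEADER_BYTES = 240
SAMPLE_BYTES = 4


def ibm32_to_float(word: int) -> float:
    """Convert one 32-bit IBM hexadecimal floating-point word to a float."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F                   # base-16 exponent, bias 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


def parse_trace(trace: bytes, time: int):
    """Return (trace_number, samples) for one complete trace."""
    expected = TRACE_HEADER_BYTES + SAMPLE_BYTES * time
    if len(trace) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(trace)}")
    # Assumed field: trace number in header bytes 1-4, big-endian int32.
    (trace_number,) = struct.unpack_from(">i", trace, 0)
    # Read all samples as big-endian unsigned 32-bit words, then convert.
    words = struct.unpack_from(f">{time}I", trace, TRACE_HEADER_BYTES)
    return trace_number, [ibm32_to_float(w) for w in words]
```

Because each call handles exactly one trace, memory use in a Mapper stays bounded by the trace size rather than the split size.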

Examples

The source data format used by the invention is the seismic SEG-Y file format (21). Taking 1001 data points per trace as an example, after the 3600-byte file header is removed, each trace consists of 240 bytes of trace header data and 4004 bytes of trace data. First, the SEG-Y file is uploaded (11) to HDFS. The number of bytes in each InputFormat split (12) must be an integer multiple of 4244, so that a Mapper task (13) parses an input stream of n × 4244 bytes and data integrity is ensured. The data from the Map stage is stored in the HBase database (15) through the Phoenix interface (14); the data structure of the HBase table is shown at 31. HBase can then be queried and the data exported through the Phoenix interface.
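The numbers in this example can be checked with a short Python sketch that walks a split one trace at a time. It only illustrates the integrity condition (a split of n × 4244 bytes for 1001 samples per trace) and is not the Mapper code of the invention. The name `iter_traces` is hypothetical.

```python
TRACE_HEADER_BYTES = 240
SAMPLE_BYTES = 4


def iter_traces(split: bytes, time: int):
    """Yield the traces of a split one at a time, each as a bytes object."""
    trace_bytes = TRACE_HEADER_BYTES + SAMPLE_BYTES * time  # 4244 when time == 1001
    if len(split) % trace_bytes != 0:
        raise ValueError("split length is not a whole number of traces")
    view = memoryview(split)  # avoids copying the whole split up front
    for start in range(0, len(split), trace_bytes):
        yield bytes(view[start:start + trace_bytes])
```

With `time = 1001`, each yielded trace is 240 + 1001 × 4 = 4244 bytes, and a split whose length is not a multiple of 4244 is rejected before any trace is parsed.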

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention. Any simple modification, change or amendment made to the above embodiments in accordance with the technical solution of the invention, without departing from that technical solution, falls within its protection scope.

Claims

1. A seismic data analysis and storage method based on MapReduce, characterized by comprising the following steps:

a, SEG-Y file header parsing:

reading the first 3600 bytes of the SEG-Y file, and parsing them to obtain the time depth TIME and the number of traces TRACES of the SEG-Y file;

b, InputFormat design:

uploading the SEG-Y file to the HDFS, taking the SEG-Y file uploaded to the HDFS as an input file of MapReduce, and removing the 3600-byte file header in the InputFormat splitting of the MapReduce, wherein the length of each Split is TIME × 4 × N, N is a positive integer, and N can be adjusted according to single-machine performance to maintain load balance and data integrity;

c, mapper task design:

each Mapper task obtains the data of one trace from the Split, namely TIME × 4 bytes, parses the first 240 bytes to obtain the trace number of the trace data, and then parses each 4 bytes as an IBM floating-point number to obtain a numerical value; the output of each Mapper is stored directly in an HBase database through a Phoenix interface without a Reduce process, so that I/O is reduced and the processing time is shortened.