CN107544090B - Seismic data analyzing and storing method based on MapReduce - Google Patents


Info

Publication number
CN107544090B
Authority
CN
China
Prior art keywords
seg
file
mapreduce
bytes
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710813111.8A
Other languages
Chinese (zh)
Other versions
CN107544090A (en)
Inventor
李克文 (Li Kewen)
谢鹏 (Xie Peng)
冯德永 (Feng Deyong)
朱剑兵 (Zhu Jianbing)
李萍 (Li Ping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-09-11
Filing date
2017-09-11
Publication date
2021-08-24
2017-09-11: Application filed by China University of Petroleum East China
2017-09-11: Priority to CN201710813111.8A
2018-01-05: Publication of CN107544090A
Application granted
2021-08-24: Publication of CN107544090B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Geophysics And Detection Of Objects (AREA)

Abstract

The invention discloses a MapReduce-based method for parsing and storing seismic data. SEG-Y files are parsed in a distributed manner by the MapReduce framework, and the parsed seismic attribute values are stored, through the Phoenix interface, in HBase, a distributed column-oriented database. Compared with single-machine processing, the method can parse multiple SEG-Y files quickly and in parallel and store them in HBase for later use; the seismic attributes of a required work area can then be queried and exported from HBase through Phoenix, saving a large amount of time compared with exporting the same volume of data on a single machine.

Description

Seismic data analyzing and storing method based on MapReduce
Technical Field
The invention belongs to the fields of geophysical exploration and machine learning, and particularly relates to a MapReduce-based method for parsing and storing seismic data.
Background
At present there are hundreds of seismic attribute volumes, each stored as a separate SEG-Y file, and because seismic work areas are large and survey grids are dense, each SEG-Y file occupies a large amount of space. The conventional seismic attribute extraction method imports the SEG-Y file into seismic analysis software and extracts the seismic data of the required work area by reading the byte ranges that hold the target data. As exploration technology improves, however, SEG-Y files keep growing, and extracting several seismic attribute values for one work area on a single machine requires reading and parsing several SEG-Y files, so both reading efficiency and query efficiency are low.
Disclosure of Invention
To overcome the shortcomings of parsing seismic SEG-Y files on a single machine, the invention provides a MapReduce-based method for parsing and storing seismic data.
To achieve this purpose, the technical scheme of the invention is as follows. First, the seismic attribute SEG-Y files are stored on HDFS, the Hadoop Distributed File System. Each SEG-Y file is then split in a MapReduce InputFormat: the 3600-byte file header is skipped, and each Split is sized to the number of bytes occupied by a whole number of traces, so that no trace is cut across two splits and data integrity is preserved during Map-stage parsing. In the Map stage, data are read and parsed one trace at a time so that an overloaded Mapper cannot crash its node, and the parsed seismic attributes output by the Map stage are exported through the Phoenix interface to the HBase distributed column-oriented database.
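The split rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes fixed-length traces with 4-byte samples and no extended textual headers, and it counts the 240-byte trace header in each trace's length, following the worked example in which a 1001-sample trace occupies 4244 bytes. The names `plan_splits`, `trace_bytes` and `traces_per_split` (a stand-in for the multiplier N) are hypothetical. In an actual Hadoop job, this logic would sit in the `getSplits()` method of a custom `FileInputFormat` subclass.

```python
SEGY_FILE_HEADER_BYTES = 3600  # 3200-byte textual header + 400-byte binary header
TRACE_HEADER_BYTES = 240       # fixed trace header in front of every trace
BYTES_PER_SAMPLE = 4           # 4-byte IBM floats (assumption)


def trace_bytes(samples_per_trace: int) -> int:
    """Length of one trace record: 240-byte trace header plus the samples."""
    return TRACE_HEADER_BYTES + samples_per_trace * BYTES_PER_SAMPLE


def plan_splits(file_size: int, samples_per_trace: int,
                traces_per_split: int) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges that skip the 3600-byte file
    header and always contain a whole number of traces.

    traces_per_split is a hypothetical stand-in for N, to be tuned to
    the performance of each node.
    """
    if traces_per_split < 1:
        raise ValueError("traces_per_split must be a positive integer")
    record = trace_bytes(samples_per_trace)
    payload = file_size - SEGY_FILE_HEADER_BYTES
    if payload < 0 or payload % record != 0:
        raise ValueError("file is not 3600 bytes plus whole fixed-length traces")

    split_len = record * traces_per_split
    splits = []
    offset = SEGY_FILE_HEADER_BYTES
    while offset < file_size:
        # The last split may hold fewer than traces_per_split traces.
        length = min(split_len, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

For a file holding ten 1001-sample traces, `plan_splits(3600 + 10 * 4244, 1001, 4)` returns `[(3600, 16976), (20576, 16976), (37552, 8488)]`: two splits of four traces and a final split of two, each starting on a trace boundary.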
The beneficial effects of the invention are as follows: seismic files are parsed and stored in a distributed manner, and SEG-Y extraction is moved from a single computer onto a cluster, where the MapReduce framework parses multiple SEG-Y files quickly and in parallel and stores them in HBase for later use; the seismic attributes of the required work area can be queried and exported from HBase through Phoenix, saving a large amount of time compared with exporting the same volume of data on a single machine.
Drawings
FIG. 1 shows the MapReduce parsing and storage process of the invention.
FIG. 2 shows the source data format of the invention.
FIG. 3 shows the data storage structure in HBase of the invention.
Reference numerals in the drawings: 11. file upload; 12. InputFormat splitting; 13. Mapper task; 14. Phoenix interface; 15. HBase database; 21. SEG-Y file format; 31. HBase table data structure.
Detailed Description
Fig. 1 is a flow chart of the MapReduce parsing and storage process of the invention. The method is divided into three stages:
a. SEG-Y file header parsing: read the first 3600 bytes of the SEG-Y file and parse them to obtain the time depth TIME (the number of samples per trace) and the number of traces TRACES of the SEG-Y file;
b. InputFormat design: upload the SEG-Y file to HDFS (file upload 11) and use the uploaded file as the MapReduce input file; in the MapReduce InputFormat splitting 12, skip the first 3600 bytes and make the length of each Split TIME × 4 × N bytes, where N is a positive integer that can be adjusted according to single-machine performance to maintain load balance and data integrity;
c. Mapper task design: each Mapper task 13 takes one trace of data from the Split, i.e., TIME × 4 bytes, parses the first 240 bytes to obtain the trace number, and then parses each subsequent 4 bytes as an IBM floating-point number to obtain its value. The output of each Mapper is stored directly in the HBase database 15 through the Phoenix interface 14 without a Reduce phase, which reduces I/O and shortens processing time. FIG. 2 shows the source data format of the invention, the SEG-Y file format 21.
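Stage a can be sketched as follows. The byte offsets come from the SEG-Y rev 1 binary header (samples per data trace at bytes 3221-3222 and the data sample format code at bytes 3225-3226, both big-endian, where code 1 means 4-byte IBM floating point). The patent does not say where TRACES is read from, so this sketch assumes fixed-length traces and no extended textual headers and derives it from the file size; the function name `parse_file_header` is hypothetical.

```python
import struct

TEXTUAL_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240
IBM_FLOAT_FORMAT_CODE = 1  # SEG-Y data sample format code for 4-byte IBM floats


def parse_file_header(header: bytes, file_size: int) -> tuple[int, int]:
    """Return (TIME, TRACES) from the first 3600 bytes of a SEG-Y file.

    Assumes fixed-length traces, 4-byte samples and no extended textual
    headers, so that TRACES can be derived from the file size.
    """
    header_len = TEXTUAL_HEADER_BYTES + BINARY_HEADER_BYTES
    if len(header) < header_len or file_size < header_len:
        raise ValueError("need the full 3600-byte file header")
    binary = header[TEXTUAL_HEADER_BYTES:header_len]

    # Bytes 3221-3222 (1-based): samples per data trace, big-endian.
    (time_samples,) = struct.unpack(">h", binary[20:22])
    # Bytes 3225-3226 (1-based): data sample format code, big-endian.
    (format_code,) = struct.unpack(">h", binary[24:26])
    if format_code != IBM_FLOAT_FORMAT_CODE:
        raise ValueError(f"expected IBM float samples, got format code {format_code}")
    if time_samples <= 0:
        raise ValueError("samples per trace must be positive")

    trace_len = TRACE_HEADER_BYTES + 4 * time_samples
    traces, leftover = divmod(file_size - header_len, trace_len)
    if leftover:
        raise ValueError("trailing bytes: traces are not fixed-length")
    return time_samples, traces
```

Only the 3600-byte header needs to be read, so this step is cheap enough to run once per file on the client before the job's splits are planned.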
Examples
The source data used by the invention are in the seismic SEG-Y file format 21. Taking traces of 1001 samples as an example: after the 3600-byte file header is removed, each trace consists of 240 bytes of trace header data followed by 4004 bytes (1001 × 4) of trace data, i.e., 4244 bytes per trace. First, the SEG-Y file is uploaded to HDFS (file upload 11); the byte count of each InputFormat split 12 must be an integer multiple of 4244, so that each Mapper task 13 parses an input stream of n × 4244 bytes and data integrity is preserved. The Map-stage output is stored in the HBase database 15 through the Phoenix interface 14, with the HBase table structure shown as 31, and HBase can then be queried and the data exported through the Phoenix interface.
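The Map-stage parsing of one split can be sketched as below, using the layout from the example (a 240-byte trace header followed by 4 × TIME bytes of samples). This is a hedged illustration, not the patent's code. Reading the trace number from trace-header bytes 1-4 (the trace sequence number within the line) is an assumption, since the text only says the number comes from the first 240 bytes. The Phoenix table `SEISMIC_ATTR` and its columns `TRACE_NO`, `SAMPLE_IDX` and `ATTR_VALUE` are hypothetical, because structure 31 is not reproduced here. Phoenix accepts `UPSERT INTO ... VALUES` statements with `?` placeholders through its JDBC driver; this sketch only builds the statement text and the rows.

```python
import struct

TRACE_HEADER_BYTES = 240

# Hypothetical Phoenix table; the real layout (structure 31) is not given here.
UPSERT_SQL = ("UPSERT INTO SEISMIC_ATTR (TRACE_NO, SAMPLE_IDX, ATTR_VALUE) "
              "VALUES (?, ?, ?)")


def ibm_to_float(word: int) -> float:
    """Decode a 32-bit IBM System/360 single-precision float."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F   # base-16 exponent, biased by 64
    fraction = word & 0x00FFFFFF     # 24-bit fraction, read as 0.ffffff (hex)
    return sign * (fraction / 16_777_216.0) * 16.0 ** (exponent - 64)


def parse_trace(record: bytes, samples: int) -> tuple[int, list[float]]:
    """Split one trace record into (trace number, decoded samples)."""
    if len(record) != TRACE_HEADER_BYTES + 4 * samples:
        raise ValueError("record is not exactly one trace long")
    # Assumption: trace number taken from trace-header bytes 1-4 (big-endian).
    (trace_no,) = struct.unpack(">i", record[:4])
    words = struct.unpack(f">{samples}I", record[TRACE_HEADER_BYTES:])
    return trace_no, [ibm_to_float(w) for w in words]


def map_split(split: bytes, samples: int):
    """Map-only step: walk a split one trace at a time and yield
    (trace_no, sample_idx, value) rows matching UPSERT_SQL."""
    trace_len = TRACE_HEADER_BYTES + 4 * samples
    if len(split) % trace_len:
        raise ValueError("split does not contain whole traces")
    for start in range(0, len(split), trace_len):
        trace_no, values = parse_trace(split[start:start + trace_len], samples)
        for sample_idx, value in enumerate(values):
            yield trace_no, sample_idx, value
```

Because `map_split` is a generator, each Mapper decodes and emits one trace before moving to the next, in the spirit of the one-trace-at-a-time rule. The rows could be sent in batches to a Phoenix `PreparedStatement` built from `UPSERT_SQL`, and since there is no Reduce phase the Hadoop job would be configured as map-only, for example with `job.setNumReduceTasks(0)`.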
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention. Any simple modification, change, or equivalent variation made to the above embodiment in accordance with the technical solution of the invention, without departing from that technical solution, falls within the protection scope of the technical solution of the invention.

Claims (1)

1. A MapReduce-based seismic data analysis and storage method, characterized by comprising the following steps:
a. SEG-Y file header parsing:
reading the first 3600 bytes of the SEG-Y file and parsing them to obtain the time depth TIME and the number of traces TRACES of the SEG-Y file;
b. InputFormat design:
uploading the SEG-Y file to HDFS and taking the uploaded SEG-Y file as the input file of MapReduce; skipping the first 3600 bytes in the InputFormat splitting of MapReduce, wherein the length of each Split is TIME × 4 × N, N being a positive integer that can be adjusted according to single-machine performance to maintain load balance and data integrity;
c. Mapper task design:
each Mapper task obtains one trace of data from the Split, namely TIME × 4 bytes, parses the first 240 bytes to obtain the trace number of the trace, and then parses each 4 bytes as an IBM floating-point number to obtain a numerical value; the output of each Mapper is stored directly in an HBase database through a Phoenix interface without a Reduce process, thereby shortening the I/O processing time.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710813111.8A CN107544090B (en) 2017-09-11 2017-09-11 Seismic data analyzing and storing method based on MapReduce

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710813111.8A CN107544090B (en) 2017-09-11 2017-09-11 Seismic data analyzing and storing method based on MapReduce

Publications (2)

Publication Number Publication Date
CN107544090A (en) 2018-01-05
CN107544090B (en) 2021-08-24

Family

ID=60963374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710813111.8A Active CN107544090B (en) 2017-09-11 2017-09-11 Seismic data analyzing and storing method based on MapReduce

Country Status (1)

Country Link
CN (1) CN107544090B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125216B (en) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546771A (en) * 2011-12-27 2012-07-04 西安博构电子信息科技有限公司 Cloud mining network public opinion monitoring system based on characteristic model
CN103336959A (en) * 2013-07-19 2013-10-02 西安电子科技大学 Vehicle detection method based on GPU (graphics processing unit) multi-core parallel acceleration
CN105375930A (en) * 2015-04-09 2016-03-02 国家电网公司 Energy storage power station massive data compression method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104570081B (en) * 2013-10-29 2017-12-26 中国石油化工股份有限公司 Integrated method and system for pre-stack time migration processing of seismic data
US10386514B2 (en) * 2014-07-24 2019-08-20 Conocophillips Company Target-oriented process for estimating fracture attributes from seismic data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
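Cloud computing and its key technologies (云计算及其关键技术); Chen Quan (陈全) et al.; Journal of Computer Applications (《计算机应用》); 2009-09-30; pp. 2562-2567 *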

Also Published As

Publication number Publication date
CN107544090A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
US11907244B2 (en) Modifying field definitions to include post-processing instructions
CN110019218B (en) Data storage and query method and equipment
US9619492B2 (en) Data migration
KR101696338B1 (en) System and method for processing and analysing big data provding efficiently using columnar index data format
CN109376196B (en) Method and device for batch synchronization of redo logs
US8321476B2 (en) Method and system for determining boundary values dynamically defining key value bounds of two or more disjoint subsets of sort run-based parallel processing of data from databases
US10002142B2 (en) Method and apparatus for generating schema of non-relational database
KR102610636B1 (en) Offload parallel compute to database accelerators
CN106970929A (en) Data lead-in method and device
CN108132986B (en) Rapid processing method for test data of mass sensors of aircraft
CN112650529B (en) System and method for configurable generation of mobile terminal APP codes
CN111400361A (en) Data real-time storage method and device, computer equipment and storage medium
US10552394B2 (en) Data storage with improved efficiency
JP6313864B2 (en) Stream data processing method and stream data processing apparatus
CN107544090B (en) Seismic data analyzing and storing method based on MapReduce
CN103810197A (en) Hadoop-based data processing method and system
CN106557483B (en) Data processing method, data query method, data processing equipment and data query equipment
CN105308579A (en) Series data parallel analysis infrastructure and parallel distributed processing method therefor
CN103412942B (en) A kind of voltage dip data analysing method based on cloud computing technology
CN110990472B (en) Hbase-based data deriving method and Hbase-based data deriving device
CN112052248A (en) Audit big data processing method and system
CN104750846A (en) Method and device for finding substring
CN108256003A (en) A kind of method that union operation efficiencies are improved according to analysis Data duplication rate
CN110188160A (en) Date storage method and method for reading data
CN110968555A (en) Dimension data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant