CN115687251A

CN115687251A - Method for quickly loading and using seismic mass data

Info

Publication number: CN115687251A
Application number: CN202110841154.3A
Authority: CN
Inventors: 隋志强; 廉西猛; 张云银; 曲志鹏; 王修银; 张猛; 隆文韬
Original assignee: China Petroleum and Chemical Corp; Geophysical Research Institute of Sinopec Shengli Oilfield Co
Current assignee: China Petroleum and Chemical Corp; Geophysical Research Institute of Sinopec Shengli Oilfield Co
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2023-02-03

Abstract

The invention provides a method for rapidly loading and using massive seismic data, comprising the following steps: step 1, acquiring cluster environment parameters and data file information; step 2, calculating the position offsets of the data blocks; step 3, starting an MPI job and distributing the start offset and end offset of each block to the processes; step 4, each process scanning its data blocks, acquiring the relevant trace header and gather information, and establishing a primary index; step 5, scanning the primary index of each block and establishing a secondary index; step 6, establishing the association between the data file and the index file; and step 7, reading data according to the index file. The method uses block-level parallelism, which can greatly improve the efficiency of loading and using massive seismic data. Because it is implemented with MPI, it can also meet the data requirements of the large number of existing MPI-based seismic exploration algorithms.

Description

Method for quickly loading and using seismic mass data

Technical Field

The invention relates to the technical field of seismic exploration, in particular to a method for quickly loading and using seismic mass data.

Background

To describe complex subsurface structures in detail, seismic exploration is moving toward smaller bin sizes and higher acquisition density. As a result, seismic data volumes keep growing, and scanning, loading, and processing such large volumes has become increasingly inefficient, which reduces the efficiency of seismic exploration production.

In seismic exploration, a computer cluster comprising multiple computing nodes is generally used to run seismic exploration algorithms in parallel. The main technologies are MPI (Message Passing Interface) multi-process parallelism and OpenMP (Open Multi-Processing) multi-thread parallelism, among others. These technologies parallelize an algorithm by dividing it into multiple tasks and distributing the tasks to multiple processes (or threads), thereby improving operating efficiency. However, seismic algorithms built on these technologies are parallel only at the algorithm level; data input and output are not parallelized. All tasks read data from one node or from one large data file, so parallel tasks frequently queue while waiting for data, and this becomes a bottleneck for further efficiency gains.

One solution to this problem is to use a newer distributed parallel framework such as Hadoop or Spark. These frameworks use distributed storage at the bottom layer, which improves data read and write efficiency. At present, however, a scheme based on such a framework has to be custom-developed for the characteristics of seismic exploration data, and seismic exploration algorithms must be substantially modified before they can be migrated to it.

Existing schemes based on MPI parallelism therefore suffer from an efficiency bottleneck in the data input and output stage, which is especially pronounced when massive seismic exploration data are processed. Schemes based on the Hadoop or Spark frameworks have two disadvantages. First, their support for seismic exploration data is not yet mature. Second, the large number of existing algorithm functions written for MPI would have to be rewritten for the new framework before they could be migrated to it; the workload is huge, so it is difficult to improve seismic data processing efficiency in the short term.

Application No. CN201910950908.1 relates to an MPI-based seismic data calculation method and system. The method may comprise the following steps: determining a plurality of minimum calculation data units of the seismic data to be calculated, and obtaining a seismic data file processing record table; distributing the minimum calculation data units to a plurality of computing nodes for calculation to obtain calculation completion data; and checking whether the calculation completion data match the seismic data file processing record table. If they match, the calculation of the seismic data is complete; if not, the computing nodes recalculate until the calculation completion data match the record table.

Application No. CN201310489545.9, which belongs to the field of seismic data imaging processing, relates to a method for extracting offset-domain common imaging point gathers in reverse time migration. The method comprises the following steps: each computer node performs reverse time migration on shot data to obtain a reverse time migration shot-domain imaging data volume, which is stored on the local disk of that node; and common imaging point gathers are extracted from the reverse time migration shot-domain imaging data of the multiple nodes.

Chinese patent application No. CN201510320303.6 relates to a method for custom partitioning of data in a Hadoop file system, which comprises the following steps: sorting the input data; partitioning the sorted input data into data blocks according to preset data blocking parameters, wherein the start position and end position of each data block in the sorted input data are recorded in the block information corresponding to that data block; and reading the corresponding data blocks from the sorted input data based on the block information so as to perform parallel processing.

The prior art above differs substantially from the present invention and does not solve the technical problem addressed here. A novel method for rapidly loading and using massive seismic data is therefore proposed.

Disclosure of Invention

The invention aims to provide a rapid seismic mass data loading and using method which is based on an MPI parallel technology, realizes rapid parallel loading and processing of seismic mass data and improves processing efficiency.

The object of the invention can be achieved by the following technical measures: the method for quickly loading and using the seismic mass data comprises the following steps:

step 1, acquiring cluster environment parameters and data file information;

step 2, calculating the position offset of the data block;

step 3, starting an MPI job, and distributing the start offset and end offset of each block to each process;

step 4, each process scanning its data blocks, acquiring the relevant trace header and gather information, and establishing a primary index;

step 5, scanning the primary index of each block and establishing a secondary index;

step 6, establishing the association between the data file and the index file;

and step 7, reading data according to the index file.

The object of the invention can also be achieved by the following technical measures:

In step 1, cluster environment parameters are obtained in a cluster environment configured with MPI, and the size of the seismic data file and the size of a single seismic trace are obtained according to the file name.

Step 1 also includes setting the number of processes used for loading data and the size of each data block.

In step 2, according to the parameter settings, the position offsets of the start trace and the end trace of each data block within the whole data file are calculated and stored.

In step 2, the basic unit of a data block is a single seismic trace, and a single trace must not be split across two blocks; the actual size of a data block is therefore not necessarily exactly equal to the set block size.
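The offset calculation in step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes fixed-length traces, and the function name `compute_block_offsets` and the optional `header_size` parameter (for a file-level header preceding the traces) are hypothetical. The set block size is rounded down to a whole number of traces so that no trace is split.

```python
def compute_block_offsets(file_size, trace_size, target_block_size, header_size=0):
    """Return a list of (start, end) byte offsets, one per block (end exclusive).

    Assumption: every trace has the same length `trace_size`. Blocks contain
    whole traces only, so a block may be smaller than `target_block_size`.
    """
    if trace_size <= 0 or target_block_size < trace_size:
        raise ValueError("a block must be able to hold at least one trace")
    data_bytes = file_size - header_size
    if data_bytes < 0 or data_bytes % trace_size != 0:
        raise ValueError("file does not contain a whole number of traces")

    total_traces = data_bytes // trace_size
    traces_per_block = target_block_size // trace_size  # round down to whole traces

    offsets = []
    for first in range(0, total_traces, traces_per_block):
        last = min(first + traces_per_block, total_traces)
        offsets.append((header_size + first * trace_size,
                        header_size + last * trace_size))
    return offsets


# 10 traces of 240 bytes with a 1000-byte target: 4 traces (960 bytes) per block.
blocks = compute_block_offsets(file_size=2400, trace_size=240, target_block_size=1000)
print(blocks)  # [(0, 960), (960, 1920), (1920, 2400)]
```

In this example the first two blocks hold 960 bytes rather than the requested 1000, and the last block holds the remaining two traces.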

In step 4, each process scans the data from its start offset to its end offset to obtain the relevant trace header and gather information, uses the information of each gather as one record to establish a primary index, and stores the primary index in a database or a file. The data of each block are also written to disk, with each block output as a separate file.
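A primary index of this kind might be built as in the sketch below. The trace layout is entirely hypothetical (an 8-byte header holding a gather number and a trace number, followed by four float samples), as are the record fields; the sketch also assumes that the traces of one gather are stored contiguously. Offsets are recorded relative to the start of the block, since each block is written out as its own file.

```python
import io
import struct

HEADER = struct.Struct("<ii")        # hypothetical trace header: (gather_no, trace_no)
N_SAMPLES = 4                        # hypothetical number of samples per trace
TRACE_SIZE = HEADER.size + 4 * N_SAMPLES


def make_trace(gather_no, trace_no):
    """Build one synthetic trace for the demonstration."""
    samples = struct.pack(f"<{N_SAMPLES}f", *([0.0] * N_SAMPLES))
    return HEADER.pack(gather_no, trace_no) + samples


def build_primary_index(stream, start, end):
    """Scan traces in [start, end) and return one record per gather."""
    records = []
    stream.seek(start)
    pos = start
    while pos < end:
        trace = stream.read(TRACE_SIZE)
        gather_no, trace_no = HEADER.unpack_from(trace)
        if records and records[-1]["gather"] == gather_no:
            records[-1]["n_traces"] += 1          # same gather continues
        else:
            records.append({"gather": gather_no,
                            "offset": pos - start,  # offset within the block file
                            "first_trace": trace_no,
                            "n_traces": 1})
        pos += TRACE_SIZE
    return records


# One block holding gather 1 (2 traces) followed by gather 2 (3 traces).
block = b"".join([make_trace(1, 1), make_trace(1, 2),
                  make_trace(2, 3), make_trace(2, 4), make_trace(2, 5)])
index = build_primary_index(io.BytesIO(block), 0, len(block))
for record in index:
    print(record)
```

A real trace header carries many more fields; only the two needed to group traces into gathers are modeled here.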

In step 5, the primary indexes of each block are scanned, the related information of each primary index is used as a record, a secondary index is established, and the secondary index is stored in a database or a file.
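One possible shape for the secondary index is sketched below: one record per block, summarizing the range of gather numbers covered by that block's primary index. The record fields, the file-naming pattern, and the assumption that gather numbers increase through the file are all illustrative and are not specified by the source.

```python
def build_secondary_index(primary_indexes):
    """Summarize each block's primary index as one secondary-index record.

    `primary_indexes` maps a block number to its list of primary records,
    each of which has at least a "gather" key (hypothetical layout).
    """
    secondary = []
    for block_no in sorted(primary_indexes):
        records = primary_indexes[block_no]
        if not records:
            continue  # an empty block contributes nothing
        secondary.append({
            "block": block_no,
            "first_gather": records[0]["gather"],
            "last_gather": records[-1]["gather"],
            "primary_index": f"block_{block_no:04d}.idx",  # hypothetical file name
        })
    return secondary


primary = {
    0: [{"gather": 1}, {"gather": 2}],
    1: [{"gather": 3}, {"gather": 4}, {"gather": 5}],
}
secondary = build_secondary_index(primary)
for record in secondary:
    print(record)
```

Because blocks are cut on trace boundaries rather than gather boundaries, a gather that straddles two blocks would appear in both ranges; a reader of such an index would need to consult both blocks.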

In step 6, the association between the data file and the index file is established and saved to a database or file.

In step 7, when the data are used, the association information is queried by data file name to obtain the index file name. The secondary index is then searched by the gather number or trace number of the required data to obtain the number of the data block that holds it. Next, the primary index of that block is searched to obtain the position offset of the required data within the block file. Finally, the data are read from the corresponding position of the corresponding data block file according to the block number and position offset.
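The two-level lookup in step 7 can be illustrated with a small self-contained sketch. Everything concrete here is an assumption made for the example: the in-memory dictionaries standing in for the stored association and indexes, the file names, the 8-byte trace size, and the helper name `read_gather`.

```python
import os
import tempfile

TRACE_SIZE = 8  # hypothetical fixed trace length in bytes


def read_gather(data_file, gather_no, association, block_dir):
    """Resolve a gather through both index levels and read its bytes."""
    index = association[data_file]                        # data file -> index
    block = next(b for b in index["secondary"]            # secondary -> block number
                 if b["first_gather"] <= gather_no <= b["last_gather"])
    record = next(r for r in index["primary"][block["block"]]  # primary -> offset
                  if r["gather"] == gather_no)
    with open(os.path.join(block_dir, block["file"]), "rb") as fh:
        fh.seek(record["offset"])
        return fh.read(record["n_traces"] * TRACE_SIZE)


association = {
    "survey.dat": {
        "secondary": [
            {"block": 0, "first_gather": 1, "last_gather": 2, "file": "block_0000.dat"},
            {"block": 1, "first_gather": 3, "last_gather": 3, "file": "block_0001.dat"},
        ],
        "primary": {
            0: [{"gather": 1, "offset": 0, "n_traces": 2},
                {"gather": 2, "offset": 16, "n_traces": 1}],
            1: [{"gather": 3, "offset": 0, "n_traces": 3}],
        },
    }
}

with tempfile.TemporaryDirectory() as block_dir:
    with open(os.path.join(block_dir, "block_0000.dat"), "wb") as fh:
        fh.write(b"A" * 16 + b"B" * 8)   # gather 1 (2 traces), gather 2 (1 trace)
    with open(os.path.join(block_dir, "block_0001.dat"), "wb") as fh:
        fh.write(b"C" * 24)              # gather 3 (3 traces)
    gather_2 = read_gather("survey.dat", 2, association, block_dir)
    gather_3 = read_gather("survey.dat", 3, association, block_dir)

print(gather_2, len(gather_3))  # b'BBBBBBBB' 24
```

Only the block file that contains the requested gather is opened, which is what lets several readers work on different blocks without contending for one large file.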

In step 7, reading data according to the index file runs in a multi-process environment: each process executes independently, and the required data are acquired in parallel.
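The independence of the readers can be sketched as follows. Note the substitution: a thread pool from the Python standard library stands in for the MPI processes described in the text, purely to keep the example self-contained. The request tuples and the `read_span` helper are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_span(request):
    """Read `length` bytes at `offset` from `path` using a private file handle."""
    path, offset, length = request
    with open(path, "rb") as fh:   # each worker opens its own handle
        fh.seek(offset)
        return fh.read(length)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "block_0000.dat")
    with open(path, "wb") as fh:
        fh.write(b"AAAABBBBCCCC")

    # Offsets and lengths as they might come out of a primary-index lookup.
    requests = [(path, 0, 4), (path, 4, 4), (path, 8, 4)]
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(read_span, requests))

print(results)  # [b'AAAA', b'BBBB', b'CCCC']
```

No worker shares a file position with another, so the reads need no coordination; the same property holds when each reader is a separate process.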

Compared with the prior art, the rapid loading and using method of the seismic mass data has the following advantages:

1. Compared with traditional seismic data loading and usage methods, the method adopts block-level parallelism, which makes full use of cluster resources and greatly improves the efficiency of loading and using massive seismic data.

2. The field of seismic exploration has accumulated many methods and algorithms based on MPI. With this method, those methods and algorithms need only a small amount of interface modification to acquire massive data efficiently in parallel, which further improves their processing efficiency.

Drawings

FIG. 1 is a flow diagram of one embodiment of a seismic mass data fast loading process of the present invention;

FIG. 2 is a flow chart of an embodiment of the rapid seismic mass data usage process of the present invention.

Detailed Description

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the features, steps, operations and/or combinations thereof.

As shown in FIG. 1, the method for rapidly loading and using massive seismic data comprises the following steps:

(1) In a cluster environment configured with MPI, the cluster environment parameters are obtained, and the size of the seismic data file and the size of a single seismic trace are obtained according to the file name.

(2) The number of processes used for loading data and the size of each data block are set.

(3) According to the parameter settings, the position offsets of the start trace and end trace of each data block within the whole data file are calculated and stored. The basic unit of a data block is a single seismic trace, and a single trace must not be split across two blocks. The actual size of a data block is therefore not necessarily exactly equal to the set block size.

(4) The MPI job is started, and the start offset and end offset of each block are distributed to the processes.

(5) Each process scans the data between its start offset and end offset, acquires the relevant trace header and gather information, uses the information of each gather as one record to establish a primary index, and stores the primary index in a database or a file. The data of each block are written to disk, with each block output as a separate file.

(6) The primary indexes of the blocks are scanned, the relevant information of each primary index is used as one record to establish a secondary index, and the secondary index is stored in a database or a file.

(7) The association between the data file and the index file is established and stored in a database or a file.

(8) When the data are used, the association information is queried by data file name to obtain the index file name. The secondary index is searched by the gather number or trace number of the required data to obtain the number of the data block that holds it. The primary index of that block is then searched to obtain the position offset of the required data within the block file. The data are read from the corresponding position of the corresponding data block file according to the block number and position offset. This process can run in a multi-process environment, with each process executing independently and acquiring the required data in parallel.
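The distribution of block offsets to processes described above can be sketched as a simple plan computed before the scan starts. The source does not say how blocks are assigned when there are more blocks than processes; the round-robin rule and the function name `assign_blocks` below are assumptions made for illustration.

```python
def assign_blocks(block_offsets, num_procs):
    """Assign blocks to process ranks round-robin (an assumed policy).

    Returns {rank: [(block_no, start, end), ...]}.
    """
    if num_procs <= 0:
        raise ValueError("at least one process is required")
    plan = {rank: [] for rank in range(num_procs)}
    for block_no, (start, end) in enumerate(block_offsets):
        plan[block_no % num_procs].append((block_no, start, end))
    return plan


# Five blocks of 960 bytes shared among three processes.
offsets = [(i * 960, (i + 1) * 960) for i in range(5)]
plan = assign_blocks(offsets, num_procs=3)
for rank, work in plan.items():
    print(rank, work)
```

In an MPI job, each process would pick the entry for its own rank (as returned by `MPI_Comm_rank`) and scan only those byte ranges.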

Example 1:

In specific embodiment 1 to which the present invention is applied, FIG. 1 shows the flow chart of the rapid loading process for massive seismic data. The loading process comprises the following steps:

(1) The cluster environment parameters are acquired in a cluster environment configured with MPI.

(2) File information, such as the size of the seismic data file and the size of a single seismic trace, is acquired according to the file name.

(3) The position offsets of the start trace and end trace of each data block within the whole data file are calculated and stored according to the set parameters.

(4) The MPI job is started.

(5) The start and end offsets of each block are assigned to the processes.

(6) Each process scans the data between its start offset and end offset to obtain the relevant trace header and gather information.

(7) The information of each gather is used as one record to establish a primary index, which is stored in a database or a file. The data of each block are written to disk, with each block output as a separate file.

(8) The primary indexes of the blocks are scanned, and the relevant information of each primary index is extracted.

(9) The relevant information of each primary index is used as one record to establish a secondary index, which is stored in a database or a file.

(10) The association between the data file and the index file is established and stored in a database or a file.
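As one way to store the data-file-to-index-file association in a database, the sketch below uses SQLite from the Python standard library. The choice of SQLite, the table and column names, and the file names are all assumptions; the source only says "a database or a file".

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the demonstration
conn.execute(
    "CREATE TABLE association ("
    "  data_file  TEXT PRIMARY KEY,"
    "  index_file TEXT NOT NULL)"
)


def register(conn, data_file, index_file):
    """Record (or update) which index file belongs to a data file."""
    conn.execute(
        "INSERT OR REPLACE INTO association (data_file, index_file) VALUES (?, ?)",
        (data_file, index_file),
    )
    conn.commit()


def index_file_for(conn, data_file):
    """Return the index file name for a data file, or None if unknown."""
    row = conn.execute(
        "SELECT index_file FROM association WHERE data_file = ?", (data_file,)
    ).fetchone()
    return row[0] if row else None


register(conn, "survey.dat", "survey.secondary.idx")
print(index_file_for(conn, "survey.dat"))   # survey.secondary.idx
print(index_file_for(conn, "missing.dat"))  # None
```

Keying the table on the data file name matches the lookup performed at read time, where the data file name is the only thing the caller knows.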

Example 2:

In specific embodiment 2 to which the present invention is applied, FIG. 2 shows the flow chart of the rapid usage process for massive seismic data. The usage process comprises the following steps:

(1) The data requirement information is acquired.

(2) The association information is queried by data file name to obtain the index file name.

(3) The secondary index is searched by the gather number or trace number of the required data to obtain the number of the data block that holds it.

(4) The primary index of that block is searched by block number to obtain the position offset of the required data within the block file.

(5) The data are read from the corresponding position of the corresponding data block file according to the block number and position offset.

Example 3:

In specific embodiment 3 to which the present invention is applied, 1.2 TB of seismic data is selected and loaded using both the conventional serial method and the technique of this patent. Conventional serial loading of this data takes about 7.6 hours. With the patented technique, loading takes about 3.3 hours with 3 processes and about 1.4 hours with 6 processes. The comparison shows that the patented technique greatly improves the loading efficiency of massive data.
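The timings reported above can be restated as speedup and parallel efficiency. The short calculation below uses only the figures given in this embodiment; "efficiency" here means speedup divided by the number of processes, a standard definition that the source itself does not use.

```python
serial_hours = 7.6                      # conventional serial loading
parallel_hours = {3: 3.3, 6: 1.4}       # processes -> reported loading time

summary = {}
for procs, hours in parallel_hours.items():
    speedup = serial_hours / hours
    efficiency = speedup / procs
    summary[procs] = (speedup, efficiency)
    print(f"{procs} processes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# 3 processes: speedup 2.30x, efficiency 77%
# 6 processes: speedup 5.43x, efficiency 90%
```

Since the reported times are approximate ("about"), these ratios are indicative only.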

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Everything other than the technical features described in this specification is known to those skilled in the art.

Claims

1. The rapid loading and using method for the seismic mass data is characterized by comprising the following steps:

step 1, acquiring cluster environment parameters and data file information;

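step 2, calculating the position offset of the data blocks;

step 3, starting an MPI job, and distributing the start offset and end offset of each block to each process;

step 4, each process scanning its data blocks, acquiring the relevant trace header and gather information, and establishing a primary index;

step 5, scanning the primary index of each block and establishing a secondary index;

step 6, establishing the association between the data file and the index file;

and step 7, reading data according to the index file.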

2. The method for rapidly loading and using the seismic mass data according to claim 1, characterized in that in step 1, cluster environment parameters are obtained in a cluster environment configured with MPI, and the size of the seismic data file and the size of a single seismic trace are obtained according to the file name.

3. The method for rapidly loading and using the seismic mass data according to claim 2, wherein the step 1 further comprises the steps of setting the number of processes for loading the data and the size of each data block.

4. The method for rapidly loading and using the seismic mass data according to claim 1, wherein in step 2, the position offsets of the start trace and the end trace of each data block within the whole data file are calculated and stored according to the parameter settings.

5. The method for rapidly loading and using the seismic mass data according to claim 4, wherein in step 2, the basic unit of a data block is a single seismic trace, and a single trace is not allowed to be split across two blocks; the actual size of a data block is therefore not necessarily exactly equal to the set block size.

6. The method for rapidly loading and using the seismic mass data according to claim 1, characterized in that in step 4, each process scans data from a start offset to an end offset to obtain related trace header and trace gather information, the information of each trace gather is used as a record, a primary index is established and stored in a database or a file; and outputting the data of each block to a disk, wherein each block is output as a file.

7. The method for rapidly loading and using the seismic mass data according to claim 1, wherein in step 5, the primary indexes of each block are scanned, the related information of each primary index is used as a record, a secondary index is established, and the record is stored in a database or a file.

8. The method for rapidly loading and using the seismic mass data according to claim 1, wherein in step 6, the association between the data file and the index file is established and stored in a database or a file.

9. The method for rapidly loading and using the seismic mass data according to claim 1, characterized in that in step 7, when the data are used, the association information is queried by data file name to obtain the index file name; the secondary index is searched by the gather number or trace number of the required data to obtain the number of the data block that holds the required data; the primary index of that block is then searched to obtain the position offset of the required data within the block file; and the data are read from the corresponding position of the corresponding data block file according to the block number and position offset.

10. The method for rapidly loading and using the seismic mass data according to claim 9, characterized in that in step 7, the process of reading data according to the index file is operated in a multi-process environment, each process is independently executed, and the required data is obtained in parallel.