CN106250380B - Custom partitioning method for Hadoop file system data - Google Patents

Custom partitioning method for Hadoop file system data

Info

Publication number
CN106250380B
CN106250380B CN201510320303.6A
Authority
CN
China
Prior art keywords
data
input data
block
file system
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510320303.6A
Other languages
Chinese (zh)
Other versions
CN106250380A (en)
Inventor
亢永敢
赵改善
杨祥森
孙成龙
许自龙
段文超
杨文广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Original Assignee
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp, Sinopec Geophysical Research Institute filed Critical China Petroleum and Chemical Corp
Priority to CN201510320303.6A priority Critical patent/CN106250380B/en
Publication of CN106250380A publication Critical patent/CN106250380A/en
Application granted granted Critical
Publication of CN106250380B publication Critical patent/CN106250380B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/113Details of archiving
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1858Parallel file systems, i.e. file systems supporting multiple processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A custom partitioning method for Hadoop file system data is proposed, comprising: sorting input data; partitioning the sorted input data according to preset data-partition parameters to obtain data blocks, wherein partitioning the sorted input data comprises recording the start position and end position of each data block within the sorted input data in block information corresponding to that data block; and, based on the block information, reading the corresponding data blocks from the sorted input data for parallel processing.

Description

Custom partitioning method for Hadoop file system data
Technical field
The invention belongs to the field of parallel file system data management within computer science, and in particular relates to a custom partitioning method for Hadoop file system data.
Background art
The Hadoop Distributed File System (HDFS) is an open-source counterpart of the Google File System (GFS). It is a fault-tolerant distributed file system suited to deployment on large numbers of inexpensive machines. HDFS provides high-throughput data access and supports the storage of large files, making it well suited to applications on large-scale data sets. HDFS is a subproject of Hadoop; it provides scalable, high-throughput storage of large files for upper-layer Hadoop applications and is the foundation of Hadoop cloud computing.
Fig. 1 is a schematic diagram of the structure of HDFS in the prior art. The basic structure of HDFS follows a master/slave model. An HDFS cluster contains one namenode, a primary server that manages the file namespace and regulates client access to files, together with a number of datanodes, usually one per machine, each of which manages the storage of its node. HDFS exposes the file namespace and allows user data to be stored in the form of files.
Internally, HDFS divides a file into one or more blocks, which are stored on a set of datanodes. The namenode performs namespace operations on files and directories, such as open, close and rename, and determines the mapping of blocks to datanodes. The datanodes serve read and write requests from file system clients; they also create, delete and replicate blocks under instruction from the namenode.
HDFS is designed to support large files, and the programs that run on it are likewise aimed at processing large data sets. Such programs write data once and then issue one or many read requests, and those reads are expected to proceed at streaming speed; that is, HDFS supports a write-once, read-many access model for files. The typical block size in HDFS is 64 MB, so an HDFS file is cut into multiple 64 MB blocks. This fixed partitioning limits the application domains of Hadoop: in prestack seismic migration, for example, one set of input data must be processed under several different partitioning schemes, which the fixed data partitioning of HDFS cannot satisfy.
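As a rough illustration of the fixed partitioning described above, the following Python sketch (hypothetical; HDFS itself is implemented in Java, and this is not its code) computes the byte ranges of the fixed 64 MB blocks of a file. Any record that happens to straddle one of these boundaries is split across blocks regardless of its content, which is exactly the limitation the invention addresses:

```python
BLOCK_SIZE = 64 * 1024 * 1024  # the 64 MB default block size cited in the text

def fixed_block_ranges(file_size, block_size=BLOCK_SIZE):
    """Return the (start, end) byte ranges of the fixed-size blocks of a file."""
    return [(off, min(off + block_size, file_size))
            for off in range(0, file_size, block_size)]

# A 150 MB file is cut into blocks of 64 MB, 64 MB and 22 MB; the cut points
# depend only on byte offsets, never on where a seismic trace begins or ends.
ranges = fixed_block_ranges(150 * 1024 * 1024)
print(len(ranges))  # 3
```

Once the file is imported, these block boundaries are frozen; re-partitioning requires rewriting the data, which motivates the descriptive scheme below.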
Summary of the invention
On the basis of the fixed data partitioning of HDFS, the present invention proposes a descriptive, user-defined partitioning method, realizing custom, descriptive partitioning of data in the HDFS file system. It solves the problem that HDFS adopts a fixed partitioning of the entity data in its data-partition access model and therefore cannot adapt to varying data-access requirements, and it improves the versatility and flexibility of HDFS data file access.
One aspect of the present invention proposes a custom partitioning method for Hadoop file system data, comprising: sorting the input data; partitioning the sorted input data according to preset data-partition parameters to obtain data blocks, wherein partitioning the sorted input data comprises recording the start position and end position of each data block within the sorted input data in block information corresponding to that data block; and, based on the block information, reading the corresponding data blocks from the sorted input data for parallel processing.
According to another embodiment of the present invention, a custom partitioning apparatus for Hadoop file system data is proposed, comprising: a component for sorting the input data; a component for partitioning the sorted input data according to preset data-partition parameters to obtain data blocks, wherein partitioning the sorted input data comprises recording the start position and end position of each data block within the sorted input data in block information corresponding to that data block; and a component for reading, based on the block information, the corresponding data blocks from the sorted input data for parallel processing.
The aspects of the present invention improve the HDFS file-access method, improve the versatility and flexibility of HDFS data file access, and provide a more efficient file storage service for the popularization and application of Hadoop technology.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure taken in conjunction with the accompanying drawings, in which identical reference labels generally denote identical parts.
Fig. 1 shows a schematic diagram of the structure of HDFS in the prior art.
Fig. 2 shows a flow chart of a custom partitioning method for Hadoop file system data according to an embodiment of the present invention.
Detailed description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be appreciated that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 2 shows a flow chart of the custom partitioning method for Hadoop file system data according to an embodiment of the present invention. In this embodiment, the method comprises:
Step 201: sort the input data;
Step 202: partition the sorted input data according to preset data-partition parameters to obtain data blocks, wherein partitioning the sorted input data comprises recording the start position and end position of each data block within the sorted input data in block information corresponding to that data block;
Step 203: based on the block information, read the corresponding data blocks from the sorted input data for parallel processing.
In the present embodiment, the sorted input data is partitioned according to preset data-partition parameters and block information is recorded, and data blocks are then read according to that block information. This is a descriptive, user-defined partitioning scheme, distinct from the traditional fixed partitioning of the entity data. It solves the problem that HDFS adopts a fixed partitioning of the entity data in its data-partition access model and cannot adapt to varying data-access requirements, improves the versatility and flexibility of HDFS data file access, and extends the application range of Hadoop cloud computing.
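Steps 201-203 can be sketched as follows in Python (an illustrative model only, not the patent's HDFS implementation; all names are invented for illustration). The key point is that step 202 produces only (start, end) descriptors over the sorted data, which step 203 then uses to read blocks:

```python
def descriptive_partition(records, sort_key, block_size):
    """Sort records (step 201), then describe blocks as (start, end) index
    ranges instead of physically splitting the data (step 202)."""
    ordered = sorted(records, key=sort_key)
    block_info = [(s, min(s + block_size, len(ordered)))
                  for s in range(0, len(ordered), block_size)]
    return ordered, block_info

def read_block(ordered, info):
    """Read one data block from the sorted data via its descriptor (step 203)."""
    start, end = info
    return ordered[start:end]

data = [5, 2, 9, 1, 7, 3]
ordered, info = descriptive_partition(data, sort_key=lambda x: x, block_size=2)
blocks = [read_block(ordered, b) for b in info]
print(blocks)  # [[1, 2], [3, 5], [7, 9]]
```

Because the blocks exist only as descriptors, changing `block_size` (or any other partition parameter) re-partitions the data without touching its storage.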
Data sorting
The purpose of sorting the input data is to supply regularized input data for the subsequent descriptive partitioning, guaranteeing the continuity of the data partitions. It also allows the block information in the subsequent partitioning step to be reduced to the start- and end-position information of the data, making the parallel processing more efficient.
Those skilled in the art will understand that the principle by which the input data is sorted can be chosen freely as needed. In one example, the input data may be sorted according to the characteristics of the parallel processing to be performed (for example, the order in which the parallel processing requires the input data).
In one example, the input data may first be classified before sorting, so that data with the same attribute are gathered together, and then sorted. This processing further achieves an attribute-wise classification and ordering of the data.
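The classify-then-sort preprocessing can be sketched as follows (hypothetical Python; the attribute and sort keys stand in for whatever the parallel processing requires):

```python
from collections import defaultdict

def classify_then_sort(records, attr_key, sort_key):
    """Group records by a shared attribute, then sort within each class,
    so that data with the same attribute is contiguous and ordered."""
    classes = defaultdict(list)
    for rec in records:
        classes[attr_key(rec)].append(rec)
    out = []
    for attr in sorted(classes):                 # deterministic class order
        out.extend(sorted(classes[attr], key=sort_key))
    return out

records = [("b", 3), ("a", 2), ("b", 1), ("a", 9)]
print(classify_then_sort(records, attr_key=lambda r: r[0], sort_key=lambda r: r[1]))
# [('a', 2), ('a', 9), ('b', 1), ('b', 3)]
```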
In one example, the input data of the present embodiment may be data that has been stored after the fixed partitioning of the entity data by the Hadoop file system. That is, this embodiment can be built as a secondary partitioning on top of the original fixed partitioning. Those skilled in the art will understand, however, that the present embodiment can also be used to replace the fixed partitioning in the Hadoop file system.
Data partitioning
The present embodiment partitions the sorted input data according to preset data-partition parameters, realizing user-defined partitioning. In addition, the present embodiment records the start position and end position of each data block within the sorted input data in block information corresponding to that data block, realizing descriptive partitioning. After such custom, descriptive partitioning, each data block corresponds to its block information while the entity data remains unchanged. With the storage of the entity data left untouched, the data can therefore be re-partitioned arbitrarily at any time according to the demands of the parallel processing.
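A minimal sketch of the "entity data remains unchanged" property (hypothetical Python): two runs can describe different partitionings of the same stored data without rewriting anything, because a partitioning is only a list of (start, end) descriptors:

```python
def make_block_info(n_records, max_per_block):
    """Build block descriptors (start, end) over n_records stored items;
    the stored data itself is never touched."""
    return [(s, min(s + max_per_block, n_records))
            for s in range(0, n_records, max_per_block)]

data = list(range(10))                   # the entity data, stored once
run_a = make_block_info(len(data), 4)    # one processing run wants blocks of 4
run_b = make_block_info(len(data), 3)    # a later run wants blocks of 3
print(run_a)  # [(0, 4), (4, 8), (8, 10)]
print(run_b)  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```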
In one example, the partition parameters are the parameters required for partitioning; they can be set freely by the user as needed, so as to satisfy the various partitioning schemes the user requires. That the partition parameters can be set freely is one embodiment of the "user-defined partitioning" the present embodiment achieves.
Parallel processing
In the present embodiment, the corresponding data blocks are read from the sorted input data based on the block information, for parallel processing. This approach is not constrained by the block-wise storage of the entity data, so data blocks can be accessed and processed in real time during a processing run.
In one example, the method may further comprise, before the parallel processing: starting parallel processing units according to the number of data blocks obtained, wherein one parallel processing unit may be started for each data block. After the parallel processing units are started, each unit reads its corresponding data block from the sorted input data based on the block information, for parallel processing.
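One-unit-per-block parallel processing might be sketched as follows, using Python threads as a stand-in for the patent's parallel processing units (an assumption for illustration; a real deployment would use Hadoop tasks):

```python
from concurrent.futures import ThreadPoolExecutor

def process_blocks_in_parallel(ordered, block_info, worker):
    """Start one processing unit per data block; each unit reads its block
    from the sorted data via the recorded (start, end) positions."""
    with ThreadPoolExecutor(max_workers=max(1, len(block_info))) as pool:
        futures = [pool.submit(worker, ordered[s:e]) for s, e in block_info]
        return [f.result() for f in futures]

ordered = [1, 2, 3, 5, 7, 9]
block_info = [(0, 2), (2, 4), (4, 6)]
print(process_blocks_in_parallel(ordered, block_info, worker=sum))
# [3, 8, 16]
```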
Data reduction
In one example, the method of the present embodiment may further comprise a data-reduction step that reduces the results of the parallel processing. The reduction may be performed for each parallel task as it completes, or the results of all parallel tasks may be reduced after they have all completed.
The reduction principle can be determined from the partitioning principle. For example, if the partitioning principle places input data of the same attribute into one data block, the reduction principle may combine the output results of the individual data blocks into one data result. After the reduction is complete, the processing result is output.
Those skilled in the art will understand that, in application scenarios where no reduction operation is necessary, the present embodiment may omit the reduction operation of this example.
Application example
Hereinafter, an application example of the embodiment of the present invention is given, taking seismic gather data as the input data. Those skilled in the art should understand that this application example is intended only to aid understanding of the present invention, and none of its details is intended to limit the invention.
Data sorting
Common seismic gather data include common-shot gather data and common-midpoint (CMP) gather data. A common-shot gather is gather data in which every trace was acquired from the same shot; common-midpoint gather data is gather data in which every trace shares the same midpoint between source and receiver.
Before seismic gather data are processed in parallel, the sorting according to the embodiment of the present invention can be carried out, optionally preceded by classification. In this application example, classification can be realized by gather extraction, i.e., an extraction operation turns the gather data (such as common-shot gather data or CMP gather data) into the required gather form. For example, in an application scenario that mainly needs to solve the partitioning of common-offset data, the required gather form is the common-offset gather. In common-offset gather data, every trace of the gather has the same offset (the distance from the shot to the receiver), so traces with identical offsets can be classified together to form common-offset gather data.
The classified gather data can then be sorted. For the common-offset gathers obtained by classification, the sorting may order the common-offset gather data by the magnitude of the offset value.
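The extraction of common-offset gathers and their ordering by offset can be sketched as follows (hypothetical Python; trace records are simplified to (offset, trace-id) pairs):

```python
def to_common_offset_gathers(traces):
    """Classify traces with identical offsets into one gather, then order
    the gathers by ascending offset value."""
    gathers = {}
    for offset, trace in traces:
        gathers.setdefault(offset, []).append(trace)
    return [(off, gathers[off]) for off in sorted(gathers)]

traces = [(400, "t1"), (200, "t2"), (400, "t3"), (200, "t4"), (600, "t5")]
print(to_common_offset_gathers(traces))
# [(200, ['t2', 't4']), (400, ['t1', 't3']), (600, ['t5'])]
```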
Data partitioning
The sorted seismic gather data can be partitioned according to preset data-partition parameters. In this application example, again taking the partitioning of common-offset data as an example, the data-partition parameters may include, but are not limited to, one or more of: the minimum offset value, the maximum offset value, the offset class interval, and the maximum number of traces in each data block. These parameters can be supplied by the user to determine the partitioning scheme. The start position and end position of each data block within the sorted input data are recorded in block information corresponding to that data block, thereby realizing custom, descriptive partitioning of the seismic gather data.
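Partitioning with the four parameters named above (minimum offset, maximum offset, offset class interval, maximum traces per block) might look as follows. This is one Python interpretation of those parameters, made for illustration, not the patented implementation:

```python
def offset_block_info(traces, min_off, max_off, interval, max_traces):
    """Describe blocks over offset-sorted traces using the four partition
    parameters named in the text (this reading of them is an assumption).
    Traces are (offset, trace_id) pairs; each block is a (start, end)
    index range over the sorted, offset-filtered trace list."""
    ordered = sorted(t for t in traces if min_off <= t[0] <= max_off)
    blocks, start = [], 0
    for i in range(1, len(ordered) + 1):
        end_of_data = i == len(ordered)
        # A block closes when the next trace falls into a new offset class...
        new_class = (not end_of_data and
                     (ordered[i][0] - min_off) // interval !=
                     (ordered[start][0] - min_off) // interval)
        # ...or when it has reached the maximum trace count.
        full = i - start == max_traces
        if end_of_data or new_class or full:
            blocks.append((start, i))
            start = i
    return ordered, blocks

ordered, blocks = offset_block_info(
    [(700, "f"), (100, "a"), (210, "c"), (150, "b"), (320, "e"), (260, "d")],
    min_off=100, max_off=400, interval=100, max_traces=10)
print(blocks)  # one block per 100-unit offset class; the 700 trace is filtered out
```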
It should be noted that, for ease of description, this application example is described using the "common offset" principle as an example. Those skilled in the art will understand, however, that the principle of data partitioning and processing is not limited to "common offset" and can follow whatever principle the data processing actually requires; shot-domain processing, for example, requires the data to be partitioned into shot gathers. In the field of seismic data processing, the main partitioning principles include offset partitioning, CMP-gather partitioning and shot-gather partitioning.
Parallel processing
Based on the block information, the corresponding data blocks can be read from the sorted input seismic gather data for parallel processing. The original processing mode of the Hadoop file system is fixed partitioning: when input data is stored into the file system it has already been split into multiple stored blocks, and this block storage is frozen once the input data is imported. In seismic data processing, however, and especially in prestack migration, the required partitioning of the data varies: on each program run the user may need a different data-partitioning scheme. In this application example, the custom, descriptive block-parallel processing of the embodiment of the present invention solves the problem that Hadoop's fixed block storage cannot adapt to seismic data processing; during processing the data can be partitioned arbitrarily in real time according to the user's definition, meeting the specific demands of seismic data processing.
Data reduction
This application example may also include reduction, and the reduction varies with the partitioning principle. Still taking common-offset processing as an example, the reduction mode can be determined from the offset grouping: the data of the same offset group (i.e., the same data block) can be reduced by stacking, i.e., for the multiple parallel tasks of one offset group the corresponding values of the results are added to produce one result; the data of different offset groups can be reduced by combination, i.e., the results of the parallel tasks for different offset groups are assembled together. For a shot-domain processing program, the final reduction may likewise be a stacking reduction. After the reduction is complete, the processing result is output.
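The two reduction modes just described, stacking within an offset group and combination across groups, can be sketched as follows (hypothetical Python; group labels and sample lists are invented for illustration):

```python
def stack_reduce(group_results):
    """Reduce per-block results: results belonging to the same offset group
    are stacked by adding corresponding sample values; results of different
    groups are simply combined side by side."""
    stacked = {}
    for group, samples in group_results:
        if group in stacked:
            stacked[group] = [a + b for a, b in zip(stacked[group], samples)]
        else:
            stacked[group] = list(samples)
    return stacked

partial = [("off200", [1.0, 2.0]), ("off400", [0.5, 0.5]), ("off200", [3.0, 4.0])]
print(stack_reduce(partial))
# {'off200': [4.0, 6.0], 'off400': [0.5, 0.5]}
```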
According to another embodiment of the present invention, a custom partitioning apparatus for Hadoop file system data is proposed, comprising: a component for sorting the input data; a component for partitioning the sorted input data according to preset data-partition parameters to obtain data blocks, wherein partitioning the sorted input data comprises recording the start position and end position of each data block within the sorted input data in block information corresponding to that data block; and a component for reading, based on the block information, the corresponding data blocks from the sorted input data for parallel processing.
In one example, the apparatus may further comprise a component for classifying the input data before it is sorted, so that data with the same attribute are gathered together.
In one example, the input data may be data stored after the fixed partitioning of the entity data by the Hadoop file system.
In one example, the apparatus may further comprise: a component for starting parallel processing units according to the number of data blocks obtained, wherein one parallel processing unit may be started for each data block; and a component for reading, using the started parallel processing units and based on the block information, the corresponding data blocks from the sorted input data for parallel processing.
In one example, the apparatus may further comprise a component for reducing the processing results of the parallel processing.
In one example, the input data may be seismic gather data.
In one example, sorting the input data may comprise: classifying seismic gather data with identical offsets together to form common-offset gather data; and sorting the common-offset gather data by the magnitude of the offset value.
In one example, partitioning the sorted input data according to preset data-partition parameters may comprise partitioning the sorted common-offset gather data according to one or more of the following data-partition parameters: the minimum offset value, the maximum offset value, the offset class interval, and the maximum number of traces in each data block.
In one example, the apparatus may further comprise a component for adding the corresponding values of the results of the multiple parallel tasks for the same data block, to realize the reduction.
In one example, the apparatus may further comprise a component for combining the results of the parallel tasks for different data blocks, to realize the reduction.
The embodiment of the present invention proposes a custom, descriptive data-partitioning mechanism, realizing custom, descriptive partitioning of data in the HDFS file system. On the basis of the fixed data-partitioning mechanism of HDFS, descriptive data information is used to partition the data in a user-defined way; without changing the storage of the entity data, arbitrary partitioning of the data is achieved. This flexible partitioning extends the data-management capability and application fields of the HDFS file system.
In seismic data processing scenarios, applying the custom, descriptive partitioning mechanism of the embodiment of the present invention enables Hadoop parallel processing of prestack seismic migration and improves the capacity to process massive data.
The present disclosure may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to carry out aspects of the present disclosure.
The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to carry out aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
The flowchart and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above; the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. A custom data partitioning method for Hadoop file system data, comprising:
sorting input data;
partitioning the sorted input data into data blocks according to preset data partitioning parameters, wherein partitioning the sorted input data comprises: recording the start position and the end position of each data block within the sorted input data in block information corresponding to that data block; and
reading, based on the block information, the corresponding data blocks from the sorted input data for parallel processing;
wherein the input data is data stored in the Hadoop file system after the raw data has been partitioned into fixed-size blocks;
wherein partitioning the sorted input data according to the preset data partitioning parameters comprises:
partitioning the sorted common-offset gather data according to one or more of the following data partitioning parameters: a minimum offset value, a maximum offset value, an offset class interval, and a maximum number of traces per data block.
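The partitioning step of claim 1 can be sketched as follows. This is a minimal illustration only, written in Python for clarity rather than against the Hadoop APIs the patent targets; the function name `partition_traces`, the `(offset, trace)` tuple layout, and the block-info dictionary keys are all assumptions, not part of the claimed method. A block is closed whenever the offset class interval boundary or the per-block trace limit is reached, and each block's start and end positions are recorded in its block information:

```python
def partition_traces(traces, min_offset, max_offset, bin_size, max_traces_per_block):
    """Partition sorted (offset, trace) records into data blocks.

    Mirrors the parameters named in claim 1: minimum/maximum offset,
    offset class interval (bin_size), and maximum traces per block.
    Each block's start and end indices are recorded in its block info.
    """
    blocks = []           # block info: {"start": i, "end": j, "bin": k}
    start = None
    current_bin = None
    for i, (offset, _) in enumerate(traces):
        if offset < min_offset or offset > max_offset:
            continue  # trace falls outside the configured offset range
        b = int((offset - min_offset) // bin_size)
        # Close the current block on a bin boundary or trace-count limit.
        if start is None or b != current_bin or i - start >= max_traces_per_block:
            if start is not None:
                blocks.append({"start": start, "end": i - 1, "bin": current_bin})
            start, current_bin = i, b
    if start is not None:
        blocks.append({"start": start, "end": len(traces) - 1, "bin": current_bin})
    return blocks
```

For example, five traces sorted by offset with a class interval of 100 would yield one block per occupied offset bin, each carrying its own start/end record.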
2. The custom data partitioning method for Hadoop file system data according to claim 1, further comprising:
before sorting the input data, classifying the input data so that data with the same attributes are grouped together.
3. The custom data partitioning method for Hadoop file system data according to claim 1, further comprising:
starting parallel processing units according to the number of data blocks obtained, wherein one parallel processing unit can be started for each data block; and
using the started parallel processing units to read, based on the block information, the corresponding data blocks from the sorted input data for parallel processing.
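The per-block parallelism of claim 3 can be illustrated with a short sketch. A Python thread pool stands in here for Hadoop's parallel processing units (in an actual MapReduce job, each block would typically back one map task); the block-info dictionaries with `start`/`end` keys and the `worker` callback are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def process_blocks(data, block_infos, worker):
    """Start one parallel unit per data block, as in claim 3.

    Each unit reads its own block from the sorted input using the
    start/end positions recorded in the block information, then
    applies the worker function to it.
    """
    def run(info):
        block = data[info["start"]:info["end"] + 1]  # read via block info
        return worker(block)

    with ThreadPoolExecutor(max_workers=len(block_infos)) as pool:
        return list(pool.map(run, block_infos))
```

Because each unit only touches the slice described by its block info, the blocks can be processed independently and in any order.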
4. The custom data partitioning method for Hadoop file system data according to claim 1, further comprising:
performing reduction processing on the results of the parallel processing.
5. The custom data partitioning method for Hadoop file system data according to claim 1, wherein
the input data is seismic trace gather data.
6. The custom data partitioning method for Hadoop file system data according to claim 5, wherein sorting the input data comprises:
grouping together seismic trace gather data with identical offsets to form common-offset gather data; and
sorting the common-offset gather data by the magnitude of the offset value.
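The two sorting sub-steps of claim 6 can be sketched as follows, again as an illustrative Python fragment (the trace representation as `(offset, sample)` pairs is an assumption; real seismic traces would carry full headers and sample arrays):

```python
from collections import defaultdict

def build_common_offset_gathers(traces):
    """Group traces with identical offsets into common-offset gathers
    (first sub-step of claim 6), then return the gathers sorted by
    offset value (second sub-step)."""
    gathers = defaultdict(list)
    for offset, sample in traces:
        gathers[offset].append(sample)
    return [(off, gathers[off]) for off in sorted(gathers)]
```

The resulting list is exactly the sorted input that the partitioning step of claim 1 consumes: contiguous runs of equal-offset traces in ascending offset order.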
7. The custom data partitioning method for Hadoop file system data according to claim 5, wherein the corresponding values of the results of multiple parallel processings of the same data block are added together to perform the reduction processing.
8. The custom data partitioning method for Hadoop file system data according to claim 5, wherein the results of the individual parallel processings of different data blocks are combined to perform the reduction processing.
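The two reduction variants of claims 7 and 8 can be sketched side by side. These are minimal illustrations under the assumption that each parallel result is a list of numeric values; the actual reduction in a Hadoop job would run in the reduce phase:

```python
def reduce_same_block(results):
    """Claim 7: add the corresponding values of several parallel
    results computed for the same data block (element-wise sum)."""
    return [sum(vals) for vals in zip(*results)]

def reduce_across_blocks(results):
    """Claim 8: combine (concatenate) the results produced by the
    parallel units that handled different data blocks."""
    combined = []
    for block_result in results:
        combined.extend(block_result)
    return combined
```

Element-wise addition merges redundant computations over one block, while concatenation stitches the per-block outputs back into a single result set.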
CN201510320303.6A 2015-06-12 2015-06-12 The customized method of partition of Hadoop file system data Active CN106250380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510320303.6A CN106250380B (en) 2015-06-12 2015-06-12 The customized method of partition of Hadoop file system data

Publications (2)

Publication Number Publication Date
CN106250380A CN106250380A (en) 2016-12-21
CN106250380B true CN106250380B (en) 2019-08-23

Family

ID=57626402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510320303.6A Active CN106250380B (en) 2015-06-12 2015-06-12 The customized method of partition of Hadoop file system data

Country Status (1)

Country Link
CN (1) CN106250380B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109655911A (en) * 2017-10-11 2019-04-19 中国石油化工股份有限公司 Seismic data visualization system and method based on WebService
CN110954941B (en) * 2018-09-26 2021-08-24 中国石油化工股份有限公司 Automatic first arrival picking method and system
CN114463962A (en) * 2020-10-21 2022-05-10 中国石油化工股份有限公司 Intelligent node data acquisition method, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231155A (en) * 2011-06-03 2011-11-02 中国石油集团川庆钻探工程有限公司地球物理勘探公司 Method for managing and organizing three-dimensional seismic data
CN102508902A (en) * 2011-11-08 2012-06-20 西安电子科技大学 Block size variable data blocking method for cloud storage system
CN103428494A (en) * 2013-08-01 2013-12-04 浙江大学 Image sequence coding and recovering method based on cloud computing platform
WO2014209375A1 (en) * 2013-06-28 2014-12-31 Landmark Graphics Corporation Smart grouping of seismic data in inventory trees

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Xiang, "Research on a Hadoop-based distributed storage strategy for seismic data" (基于Hadoop的地震数据分布式存储策略的研究), China Master's Theses Full-text Database, Information Science and Technology, 2015-02-15, chapters 2-5


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant