CN109656890A - A safe and fast input/output implementation method for large-scale parallel computing - Google Patents

Info

Publication number
CN109656890A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811583516.8A
Other languages
Chinese (zh)
Inventor
陈德训
郭恒
徐金秀
李芳�
徐占
孙唯哲
范昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-24
Filing date
2018-12-24
Publication date
2019-04-19
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201811583516.8A
Publication of CN109656890A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a safe and fast input/output implementation method for large-scale parallel computing. The method establishes a double-backup safe directory mechanism, manages breakpoint (checkpoint) files with a hierarchical directory structure, and selects an output factor according to the underlying network bandwidth and the number of main cores per compute node. The invention addresses both the safety and the efficient input and output of ultra-large-scale breakpoint files. It is a safe, reliable, and efficient method of writing breakpoint files, and it addresses the problem of safe and reliable large-scale parallel computing in the CFD field.

Description

A safe and fast input/output implementation method for large-scale parallel computing
Technical field
The present invention relates to the field of ultra-large-scale parallel computing, and more particularly to a safe and fast input/output implementation method for large-scale parallel computing.
Background
At present, ultra-large-scale parallel computing in the field of computational fluid dynamics (CFD) is characterized by tens of thousands of processes and tens of thousands of iteration steps. Writing breakpoint files takes a long time, and existing methods of writing breakpoint files suffer from poor safety and low efficiency. The Boltzmann equation, the fundamental equation of gas kinetic theory, describes molecular transport phenomena of gases in all flow regimes. It can be applied to many fields, from aerospace vehicles down to isotope separation at scales invisible to the eye, and it is a main basis for studying gas dynamics problems, including various micro-scale flows. Numerical methods for gas kinetics based on the Boltzmann model equation require a large amount of memory and long computation times. The distribution function is a six- or seven-dimensional array. All computations are based on the distribution function, and the output of macroscopic quantities such as temperature, pressure, and density also depends on it, so once the distribution function data are destroyed, the loss is irreparable. The breakpoint-file writing method currently used by numerical methods for the Boltzmann model equation is as follows: create one directory and write the breakpoint files of all processes into that directory.
However, this method has the following disadvantages: first, safety is poor; second, because of limits in the underlying management of the file system, listing the directory is very slow; third, the network conditions of the compute nodes are not taken into account, so input/output efficiency is low. Here, a breakpoint file is defined as follows: ultra-large-scale parallel computations generally run for a long time, and to sustain long numerical simulations, some data produced during the computation must be saved in files to guard against failures during the run. Once a job stops, the data in these files must be read back to resume the computation.
Summary of the invention
The object of the present invention is to provide a safe and fast input/output implementation method for large-scale parallel computing that solves the problems described in the Background section above.
To achieve this object, the present invention adopts the following technical solution:
A safe and fast input/output implementation method for large-scale parallel computing, comprising: (1) establishing a double-backup safe directory mechanism; and (2) managing breakpoint files with a hierarchical directory structure.
In particular, establishing the double-backup safe directory mechanism specifically comprises: creating two breakpoint file directories and, during the iteration, writing data to the two directories alternately, in turn.
In particular, managing breakpoint files with a hierarchical directory structure specifically comprises: grouping the breakpoint files in units of 1024, the number of directories being Num = (total number of processes - 1)/1024 + 1, where the division is integer division.
In particular, the method further comprises: selecting an output factor according to factors including, but not limited to, the total amount of output data, the network bandwidth capability of the proxy service nodes, and the network bandwidth capability of the compute nodes.
The method proposed by the present invention addresses both the safety and the efficient input and output of ultra-large-scale breakpoint files. It is a safe, reliable, and efficient method of writing breakpoint files, and it addresses the problem of safe and reliable large-scale parallel computing in the CFD field.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the safe and fast input/output implementation method for large-scale parallel computing provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the hierarchical directory of restart0 provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the invention and not to limit it. It should also be noted that, for ease of description, the drawings show only some, not all, of the content related to the invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which the invention belongs. The terms used herein are intended only to describe specific embodiments and are not intended to limit the invention.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the principle of the safe and fast input/output implementation method for large-scale parallel computing provided by an embodiment of the present invention.
The safe and fast input/output implementation method 100 for large-scale parallel computing in this embodiment specifically comprises: (1) establishing a double-backup safe directory mechanism 101; and (2) managing breakpoint files with a hierarchical directory structure 102.
Specifically, establishing the double-backup safe directory mechanism 101 in this embodiment comprises: creating two breakpoint file directories, for example restart0 and restart1, and writing data to the two directories alternately during the iteration, thereby preventing data from being destroyed during a write.
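The alternation between the two directories can be sketched in a few lines of Python. This is a minimal illustration, not the program of the patent: the function name `checkpoint_dir` and the idea of counting breakpoint writes with a 0-based integer index are assumptions; only the directory names restart0 and restart1 come from the text.

```python
def checkpoint_dir(write_index: int) -> str:
    """Return the directory for the write_index-th breakpoint write (0-based).

    Even-numbered writes go to restart0 and odd-numbered writes go to
    restart1, so the directory filled by the previous write is left
    untouched while the current one is being overwritten.
    (The indexing scheme is an assumption made for this sketch.)
    """
    if write_index < 0:
        raise ValueError("write_index must be non-negative")
    return f"restart{write_index % 2}"


# Hypothetical usage: show where four consecutive breakpoint writes land.
for i in range(4):
    print(i, checkpoint_dir(i))
```

With this scheme, a failure in the middle of one write leaves the other directory holding the files of the previous write, which matches the intent stated above.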
Placing ten thousand or tens of thousands of breakpoint files in a single directory puts enormous pressure on the underlying file system and makes management difficult. This embodiment therefore manages breakpoint files with a hierarchical directory structure 102, which specifically comprises: grouping the breakpoint files in units of 1024, the number of directories being Num = (total number of processes - 1)/1024 + 1. For example, the hierarchical directory of restart0 is shown in Fig. 2.
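The grouping rule can be illustrated with a short Python sketch. The formula for Num is taken from the text and is evaluated with integer division; everything else is an assumption made for illustration, in particular the 0-based ranks, the subdirectory naming, and the file naming, since the actual layout in Fig. 2 is not reproduced here.

```python
GROUP_SIZE = 1024  # breakpoint files per subdirectory, as stated in the text


def num_subdirs(total_procs: int) -> int:
    """Num = (total_procs - 1) / 1024 + 1, using integer division."""
    if total_procs < 1:
        raise ValueError("total_procs must be at least 1")
    return (total_procs - 1) // GROUP_SIZE + 1


def breakpoint_path(root: str, rank: int) -> str:
    """Hypothetical layout: <root>/<group index>/rank_<rank>.dat.

    Ranks are assumed to be 0-based, so ranks 0..1023 share group 0,
    ranks 1024..2047 share group 1, and so on.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    group = rank // GROUP_SIZE
    return f"{root}/{group:04d}/rank_{rank:06d}.dat"


# Hypothetical usage: a job with 10000 processes writing into restart0.
print(num_subdirs(10000))
print(breakpoint_path("restart0", 1023))
print(breakpoint_path("restart0", 1024))
```

Under these assumptions no subdirectory ever holds more than 1024 files, which is the property the hierarchical structure is meant to provide.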
The safe and fast input/output implementation method for large-scale parallel computing in this embodiment further comprises: selecting an output factor 103 according to the underlying network bandwidth and the number of main cores per compute node.
The program is implemented in Fortran, where myrank is the process rank and NGROP is the output factor.
The output factor is selected according to factors including, but not limited to, the total amount of output data, the network bandwidth capability of the proxy service nodes, and the network bandwidth capability of the compute nodes. Specifically: let Mmax GB/s denote the network bandwidth capability of the proxy service nodes and Mp MB/s denote the network bandwidth capability of a compute node, where the 4 processes on the same compute node share one network path. If the number of processes of a job is PROCS and the total amount of data in one output is OUTmax, the output factor NGROP is selected as follows:
1. When the output per process satisfies OUTmax/PROCS ≤ Mp:
NGROP = 2 or 4
2. When the output per process satisfies OUTmax/PROCS > Mp, NGROP must satisfy the following condition:
(OUTmax/PROCS)*(PROCS/NGROP) ≤ Mmax
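The two cases above can be expressed as a small Python function. This is a sketch under stated assumptions rather than the patent's Fortran program: the function name `choose_ngrop` is hypothetical, all quantities are assumed to have been converted to the same units beforehand, the choice between 2 and 4 in case 1 is left to the caller, and in case 2 the smallest integer satisfying the condition is returned, although the text only requires that the condition hold.

```python
import math


def choose_ngrop(out_max: float, procs: int, m_p: float, m_max: float,
                 small_output_ngrop: int = 4) -> int:
    """Pick an output factor NGROP following the two cases in the text.

    out_max: total amount of data in one output (OUTmax)
    procs:   number of processes of the job (PROCS)
    m_p:     network bandwidth capability of a compute node (Mp)
    m_max:   network bandwidth capability of the proxy service nodes (Mmax)
    All values are assumed to be expressed in consistent units.
    """
    if procs < 1 or m_p <= 0 or m_max <= 0:
        raise ValueError("procs, m_p and m_max must be positive")
    if small_output_ngrop not in (2, 4):
        raise ValueError("the text allows only 2 or 4 in case 1")

    per_process = out_max / procs
    if per_process <= m_p:
        # Case 1: OUTmax/PROCS <= Mp, so NGROP = 2 or 4.
        return small_output_ngrop

    # Case 2: (OUTmax/PROCS) * (PROCS/NGROP) <= Mmax, which in real
    # arithmetic reduces to NGROP >= OUTmax / Mmax. Returning the smallest
    # such integer is an assumption of this sketch.
    return max(1, math.ceil(out_max / m_max))


# Hypothetical numbers, all in the same units.
print(choose_ngrop(40960, 10240, 8, 20000))
print(choose_ngrop(1_000_000, 10_000, 50, 20_000))
```

Read this way, PROCS/NGROP is the number of processes whose output reaches the proxy service nodes at the same time; that reading follows from the form of the condition and is an interpretation, not a statement from the text.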
The present invention is a safe, reliable, and fast input/output method that is tightly coupled to the characteristics of the machine, and it solves the large-scale, long-running parallel computing problem of numerical methods for gas kinetics. The technical solution addresses both the safety and the efficient input and output of ultra-large-scale breakpoint files. It is a safe, reliable, and efficient method of writing breakpoint files, and it addresses the problem of safe and reliable large-scale parallel computing in the CFD field.
Those of ordinary skill in the art will understand that all or part of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a disk array, an optical disc, a read-only memory, a random access memory, or the like.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and its scope is determined by the scope of the appended claims.

Claims (4)

1. A safe and fast input/output implementation method for large-scale parallel computing, characterized by comprising: (1) establishing a double-backup safe directory mechanism; and (2) managing breakpoint files with a hierarchical directory structure.
2. The safe and fast input/output implementation method for large-scale parallel computing according to claim 1, characterized in that establishing the double-backup safe directory mechanism specifically comprises: creating two breakpoint file directories and, during the iteration, writing data to the two breakpoint file directories alternately, in turn.
3. The safe and fast input/output implementation method for large-scale parallel computing according to claim 1, characterized in that managing breakpoint files with a hierarchical directory structure specifically comprises: grouping the breakpoint files in units of 1024, the number of directories being Num = (total number of processes - 1)/1024 + 1.
4. The safe and fast input/output implementation method for large-scale parallel computing according to any one of claims 1 to 3, characterized by further comprising: selecting an output factor according to factors including, but not limited to, the total amount of output data, the network bandwidth capability of the proxy service nodes, and the network bandwidth capability of the compute nodes.

Priority Applications (1)

Application Number: CN201811583516.8A
Priority Date: 2018-12-24
Filing Date: 2018-12-24
Title: A safe and fast input/output implementation method for large-scale parallel computing

Publications (1)

Publication Number: CN109656890A
Publication Date: 2019-04-19

Family

ID=66116333

Family Applications (1)

Application Number: CN201811583516.8A
Status: Pending
Publication: CN109656890A (en)
Title: A safe and fast input/output implementation method for large-scale parallel computing

Country Status (1)

Country Link
CN (1) CN109656890A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-04-19)