WO2007072566A1 - Metadata retriever - Google Patents

Metadata retriever

Info

Publication number
WO2007072566A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
metadata
processing
data
data file
Prior art date
Application number
PCT/JP2005/023649
Other languages
English (en)
Japanese (ja)
Inventor
Hitoshi Uehara
Hideharu Sasaki
Yoshikazu Sasai
Original Assignee
Japan Agency For Marine-Earth Science And Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Japan Agency For Marine-Earth Science And Technology
Priority to PCT/JP2005/023649
Priority to JP2007550971A (patent JP4905989B2)
Publication of WO2007072566A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3212 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to a job, e.g. communication, capture or filing of an image
    • H04N2201/3214 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to a job, e.g. communication, capture or filing of an image of a date
    • H04N2201/3215 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to a job, e.g. communication, capture or filing of an image of a time or duration
    • H04N2201/3225 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document
    • H04N2201/3226 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document of identification information or the like, e.g. ID code, index, title, part of an image, reduced-size image
    • H04N2201/3228 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document of identification information or the like, e.g. ID code, index, title, part of an image, reduced-size image further additional information (metadata) being comprised in the identification information
    • H04N2201/3229 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document of identification information or the like, e.g. ID code, index, title, part of an image, reduced-size image further additional information (metadata) being comprised in the identification information further additional information (metadata) being comprised in the file name (including path, e.g. directory or folder names at one or more higher hierarchical levels)

Definitions

  • The present invention relates to a technology for processing a large number of data files, such as large-scale simulation data, in the computer science field.
  • Simulation results produced by a large-scale simulation system, such as a marine simulation system, consist of a large number of data files. It is not unusual for such a set of data files to total several tens of terabytes or more and to comprise 10,000 or more files. In general, these data files are not all created in the same format; the format often differs slightly depending on the content of the data.
  • In addition, in numerical simulation, the intersections of line segments as shown in FIG. 16A are generally defined as a calculation grid, and processing of numerical data (for example, calculation of physical quantities) proceeds based on this calculation grid. Depending on the physical characteristics of the numerical data and the form of the calculation formulas, processing (calculation of physical quantities) of some numerical data is often performed based on a different calculation grid, as shown in FIG. 16B.
  • first calculation grid: the calculation grid shown in FIG. 16A, on which the calculation of a first data file is based
  • second calculation grid: the calculation grid shown in FIG. 16B
  • Data indicating the details of the data in a data file, such as which calculation grid the first data file described above was calculated on and what data the file contains, is referred to herein as "metadata".
  • An object of the present invention is to provide a technology that eliminates the need for a user to specify metadata for processing target data.
  • The present invention adopts the following means in order to achieve the above object.
  • The present invention is a metadata search device including: metadata storage means storing metadata representing details of a data file; extraction means for extracting a keyword from a specified file identifier; and search means for searching the metadata storage means for the metadata corresponding to the extracted keyword and outputting it.
  • Each data file is stored in a directory structure, and the storage location information indicates a file path.
  • The metadata storage means is provided in a storage area different from the storage area of the data file to which the file identifier is assigned.
  • The present invention can also be specified as a metadata search method, a program, or a recording medium on which the program is recorded, each having the same features as the above-described metadata search device.
  • FIG. 1 is a view showing a configuration example of a simulation system to which the present invention can be applied.
  • FIG. 2 is a view showing an example of the configuration of a control computer shown in FIG. 1;
  • FIG. 3 is a diagram showing an example of the configuration of a node.
  • FIG. 4 is a view showing an example of the directory structure of a file database storing the processing target data files.
  • FIG. 5 is a view showing an example of the data structure of the metadata table shown in FIG. 2;
  • FIG. 6 is a view showing an example of the data structure of a usage and load distribution situation table shown in FIG. 2;
  • FIG. 7 is a diagram showing a display example of a user interface (designation screen) provided to a user of the system.
  • FIG. 8 is a diagram showing an example of description of a file of parallel processing designation information input using a user interface.
  • FIG. 9 is a flowchart showing a main routine of processing for creating a parallel processing job script and a parallel processing program setting file.
  • FIG. 10 is a flowchart showing a main routine of processing for creating a parallel processing job script and a parallel processing program setting file.
  • FIG. 11 is a flowchart showing a main routine of processing for creating a parallel processing job script and a parallel processing program setting file.
  • FIG. 12 is a flowchart showing a subroutine for analysis and acquisition of metadata.
  • FIG. 13 is a flowchart showing a subroutine of search and determination processing of a node to be a placement destination of a processing target data file.
  • FIG. 14 is a diagram showing an example of description of a parallel processing program setting file.
  • FIG. 15 is a flowchart showing execution processing of a parallel processing program.
  • FIG. 16A is a diagram showing an example of a calculation grid prepared as metadata for processing target data.
  • FIG. 16B is a diagram showing an example of a calculation grid different from the calculation grid of FIG. 16A, which is prepared as metadata for processing target data.
  • FIG. 1 is a view showing a configuration example of a simulation system to which the present invention can be applied.
  • The simulation system comprises a parallel computer group X and a control computer (information processing apparatus) Y connected to the parallel computer group X via a communication line (network).
  • The parallel computer group X is made up of a plurality of nodes #0 to #n (n is a natural number) that perform parallel processing on a large number of data files constituting large-scale simulation data, such as that of an ocean general circulation model.
  • Computer Y manages the simulation data to be processed by parallel computer group X (processing target data), and controls parallel computer group X so that it executes parallel processing using the simulation data in response to user operations.
  • A user of the simulation system enters parallel processing specification information, for having parallel computer group X execute parallel processing of a large amount of processing target data (processing target data group), through a UI (user interface) provided by computer Y.
  • The parallel processing specification information can include the plurality of simulation data files to be subjected to parallel processing (processing target data file group), the processing content that parallel computer group X applies to the processing target data file group (processing type and detailed processing parameters), the plurality of nodes (number of nodes) performing the parallel processing, and the storage location of the files generated as a result of the parallel processing (processed data files, or processing result files).
  • Based on the input parallel processing specification information, computer Y automatically generates a parallel processing job script for giving control instructions related to parallel processing to parallel computer group X (a control program for parallel computer group X; hereinafter sometimes written as "script") and a parallel processing program setting file (hereinafter referred to as "setting file") that each node executing parallel processing refers to when it processes a processing target data file.
  • In the course of generating the script, computer Y acquires metadata (detailed information on the processing target data) for each processing target data file and decides how the processing target data file group is placed on parallel computer group X. The metadata and the placement decisions are reflected in the content of the script.
  • By executing the script, computer Y distributes the processing target data files to a plurality of nodes and instructs these nodes to execute a parallel processing program (job).
  • Each node executes the parallel processing program according to the description of the setting file, and processes the processing target data files distributed to it based on the corresponding metadata.
  • A processing result file is created through this processing.
  • The processing result file is stored in the storage location specified in the parallel processing specification information.
  • FIG. 2 is a diagram showing an example of the configuration of a computer.
  • The computer Y includes a CPU 1, a main memory (MM; for example, RAM) 2, an external storage device (for example, a hard disk) 3, input/output interfaces (I/F) 4 and 5, and a communication interface (communication I/F) 6, which are connected to one another via a bus.
  • The I/F 4 is connected to an input device 7 (a keyboard, a pointing device such as a mouse, etc.) as input means, and the I/F 5 is connected to a display device (display) 8 as output means. Furthermore, the communication I/F 6 is connected to each of nodes #0 to #n via a communication line (network).
  • The external storage device 3 stores a file database (file DB) 31 holding the large number of simulation data files that constitute large-scale simulation data, a metadata table 32 holding the metadata (detailed information on the simulation data) corresponding to each data file, and a usage and load distribution situation table 33 (hereinafter referred to as "situation table 33") for each node, which is referred to when distributing the processing target data file group to a plurality of nodes.
  • The file DB 31 and the metadata table 32 are created in different storage areas.
  • The CPU 1 implements the functions described below by, for example, loading a program recorded in the external storage device 3 into the MM 2 and executing it.
  • The CPU 1 corresponds to the receiving means, the extraction means, and the search means according to the present invention.
  • The external storage device 3 corresponds to the metadata storage means (storage unit) according to the present invention.
  • FIG. 3 is a diagram showing an example of the configuration of a node.
  • The node includes a CPU 11, a main memory (MM) 12, a calculation processor 13, an external storage device (for example, a hard disk) 14, and a communication interface (communication I/F) 15, which are connected to one another via a bus B1.
  • The communication I/F 15 is connected to computer Y and to other nodes via a network.
  • The node receives the processing target data files transferred from the computer Y through the communication I/F 15 and stores them in the external storage device 14.
  • The node also receives parallel processing instructions and setting files from computer Y via the communication I/F 15.
  • The calculation processor 13 is used for calculations on the data to be processed.
  • The calculation processor 13 reads a processing target data file stored in the external storage device 14 into the MM 12 and uses it to execute predetermined processing (for example, cutting out a predetermined area of the data file, or calculating a physical quantity). This predetermined processing is performed based on the metadata.
  • A processing result file is generated and stored in the external storage device 14.
  • The processing result file stored in the external storage device 14 is then moved (transferred) to a predetermined storage location.
  • The CPU 11 corresponds to the receiving means, the extraction means, and the search means according to the present invention.
  • The external storage device 14 corresponds to the metadata storage means according to the present invention.
  • The file DB 31 classifies and stores a large number of simulation data files (hereinafter sometimes referred to simply as "data files") using a directory structure.
  • FIG. 4 is a diagram showing an example of the directory structure of the file DB 31. In the file DB 31, a directory tree starting from the root directory (the directory "data" in FIG. 4) is formed, and the directories at each level are given predetermined directory names. Data files are stored in the directories at the ends of the directory tree and are given predetermined data file names.
  • Data files are identified using file identifiers.
  • The file identifier consists of the data file name preceded by the names (path names) of the directories located on the path through the directory tree from the root directory to the end directory.
  • For example, the file identifier of the data file having the data file name "timeXXX.0000.000.dat" in FIG. 4 is /data/experimentA/3D/statisticsA/variableB/timeXXX.0000.000.
  • The file identifier thus contains the storage location information (file path) of the data file.
  • The directory names ("3D", "statisticsA", "variableB", etc.) and the data file name ("timeXXX.0000.000") in the file identifier are defined as keywords that indicate the details (properties, etc.) of the data in the data file. A keyword consists of one or more arbitrary characters and appears in at least one place among the directory names and the data file name. However, no keyword is set in the extension part of the file name. A keyword functions as a search key for retrieving the metadata corresponding to the processing target data.
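  • The way a file identifier yields candidate keywords can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_keywords`, the assumption that the root directory is always "data", and the assumption that only the last dot-separated suffix (".dat") counts as the extension are all hypothetical.

```python
from pathlib import PurePosixPath


def extract_keywords(file_identifier: str, root: str = "data") -> list[str]:
    """Split a file identifier into candidate keywords (hypothetical helper).

    Returns every directory name below the root directory, followed by the
    data file name without its extension, in path order.
    """
    parts = [p for p in PurePosixPath(file_identifier).parts if p != "/"]
    if len(parts) < 2 or parts[0] != root:
        raise ValueError(f"identifier must be a file below the root directory {root!r}")
    *directories, file_name = parts[1:]
    # Assumption: only the final suffix is the extension, so the file name
    # "timeXXX.0000.000.dat" contributes the keyword "timeXXX.0000.000".
    return directories + [PurePosixPath(file_name).stem]


if __name__ == "__main__":
    print(extract_keywords(
        "/data/experimentA/3D/statisticsA/variableB/timeXXX.0000.000.dat"))
```

  • Excluding the extension before matching mirrors the statement that no keyword is set in the extension part of the file name.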
  • The data files need not all be stored in one storage area; they may be distributed across a plurality of storage areas located inside or outside the computer Y.
  • The metadata table 32 stores the metadata corresponding to the keywords in the file identifiers.
  • FIG. 5 is a view showing an example of the data structure of the metadata table 32.
  • The metadata table 32 is composed of a plurality of records, each storing a search key (keyword) and the metadata corresponding to it.
  • The keyword used as a search key is extracted from the file identifier of the data file (processing target data file) specified by the user.
  • Metadata is information indicating details (properties, attributes, etc.) of the simulation data (data to be processed), for example, information indicating the physical properties of the data to be processed, its statistical processing, and its space-time (vertical, height, and time (date and time)).
  • The information on the calculation grids shown in FIGS. 16A and 16B is information on space.
  • As a keyword representing such calculation grid information, for example, a variable name consisting of an arbitrary number of characters is used.
  • FIG. 5 shows a case where one of the directory names included in the file identifier corresponds to one piece of metadata.
  • One piece of metadata may also be retrieved from a combination of a plurality of keywords included in one file identifier.
  • A keyword may also form only part of a directory name or data file name, in which case keywords may be extracted from file identifiers by partial-match search.
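  • The two lookup styles described above, exact match and partial match, can be sketched with an in-memory mapping standing in for the metadata table 32. The table contents ("meta0", "meta1", "meta2") and the function names are placeholders chosen for illustration; the actual records of FIG. 5 are not reproduced here.

```python
# Hypothetical stand-in for the metadata table 32: search key -> metadata.
# The metadata values are placeholders, not the contents of FIG. 5.
METADATA_TABLE = {
    "3D": "meta0",
    "statisticsA": "meta1",
    "variableB": "meta2",
}


def lookup_exact(name, table=METADATA_TABLE):
    """Return the metadata whose keyword equals the whole name, or None."""
    return table.get(name)


def lookup_partial(name, table=METADATA_TABLE):
    """Return the metadata of every keyword contained anywhere in the name."""
    return [metadata for keyword, metadata in table.items() if keyword in name]


if __name__ == "__main__":
    print(lookup_exact("statisticsA"))         # whole directory name is the keyword
    print(lookup_partial("run_3D_variableB"))  # keywords embedded in a longer name
```

  • Partial matching is more permissive, so in practice keywords would need to be chosen so that one is not accidentally a substring of an unrelated name.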
  • FIG. 6 is a view showing an example of the data structure of the situation table 33.
  • The situation table 33 is composed of a plurality of small tables 34, one prepared for each node.
  • Each small table 34 has the same data structure.
  • The small table 34 consists of a set of records whose elements (items) are the identification information (user ID) of a user authorized to use the node, the maximum size of the node's external storage device (hard disk) usable by that user, and the external storage capacity currently used by that user (load).
  • Each small table 34 is assigned a node identifier, and the information corresponding to that node identifier is stored in the small table.
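  • One possible in-memory shape for the situation table 33 is sketched below. The class names, field names, units (gigabytes), and sample values are assumptions made for illustration only; the source specifies just the three record items and the per-node grouping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UsageRecord:
    user_id: str   # user authorized to use the node
    max_gb: int    # maximum external-storage size usable by the user (assumed unit: GB)
    used_gb: int   # external-storage capacity currently used by the user (load)


@dataclass
class SmallTable:
    node_id: str                                  # node identifier of this small table
    records: list = field(default_factory=list)   # list of UsageRecord


def free_capacity(table: dict, node_id: str, user_id: str) -> Optional[int]:
    """Return the remaining capacity for a user on a node, or None if not authorized."""
    for record in table[node_id].records:
        if record.user_id == user_id:
            return record.max_gb - record.used_gb
    return None


# Hypothetical sample data: the situation table maps node identifiers to small tables.
situation_table = {
    "node0": SmallTable("node0", [UsageRecord("user01", 500, 120)]),
    "node1": SmallTable("node1", [UsageRecord("user01", 500, 40),
                                  UsageRecord("user02", 200, 0)]),
}

if __name__ == "__main__":
    print(free_capacity(situation_table, "node0", "user01"))
```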
  • Through execution of a program, the CPU 1 provides the user of the computer Y with an input environment (UI) for the parallel processing specification information.
  • Using the UI, the user can specify the elements (items) of the parallel processing specification information: the processing target data file group (file identifiers), the plurality of nodes that process the processing target data file group, the processing content for the processing target data file group (processing type and detailed parameters), and the storage location of the processing result files.
  • FIG. 7 is a diagram showing an example of a designation screen for parallel processing specification information provided as a UI.
  • The designation screen is displayed on the screen of the display device 8 through the execution of the program by the CPU 1.
  • The designation screen includes a file path display field 81, a file list display field 82, and a command input field 83.
  • The file path display field 81 displays the directory (file path) in the file DB 31 selected by the user using the input device 7.
  • The command input field 83 is used to input commands related to the processing of the data files to be processed.
  • The user can operate the input device 7 to display a desired file path in the file path display field 81 (that is, select a file path).
  • The display contents of the file list display field 82 change according to the selected file path, and the file list corresponding to the file path is displayed in the display field 82.
  • The user can designate the file identifier of a data file to be processed by selecting the desired file name from the file list displayed in the file list display field 82 with a cursor operation using the input device 7. Multiple data files can be designated at one time through the cursor operation. Thus, the user can designate the file identifiers of the data files to be processed using the file path display field 81 and the file list display field 82.
  • The user can use the command input field 83 to designate and input the nodes (number of nodes) used for parallel processing, the processing content for the processing target data file group, the storage location of the processing result files, and so on.
  • The parallel processing specification information is stored at a predetermined location in the external storage device 3 as a parallel processing specification information file written in a predetermined format.
  • FIG. 8 is a diagram showing an example of description of a parallel processing specification information file.
  • The parallel processing specification information file includes a specification line for computer resources, a specification line for processing details (processing content), and specification lines each giving a processing target data file and the storage location of its processing result.
  • Such a description (parallel processing specification information file) is created automatically by the CPU 1 when the user specifies the number of nodes, the processing content, the processing target data file group, and the storage location using the UI.
  • FIG. 9, FIG. 10 and FIG. 11 are flowcharts showing an example of a main routine of script and setting file creation processing executed by the CPU 1 (FIG. 2).
  • Execution of the processing is triggered by, for example, the completion of creation of the parallel processing specification information file or the input of a processing start instruction from the user.
  • When the process shown in FIG. 9 is started, the CPU 1 first performs an initialization process (step SO).
  • Next, the CPU 1 reads the parallel processing specification information file (see FIG. 8) stored in the external storage device 3.
  • The CPU 1 then executes loop processing that analyzes the parallel processing specification information.
  • The CPU 1 takes one specification line from the parallel processing specification information file.
  • The extracted line is set as the analysis target line and is analyzed.
  • That is, the CPU 1 determines whether or not the analysis target line extracted from the parallel processing specification information file is a specification line of computer resources (step S003).
  • If the analysis target line is a specification line of computer resources (S003; YES), the CPU 1 takes the argument in this line (the number of nodes: "3" in the example of FIG. 8), treats it as the computer resource parameter for parallel processing, and stores it at a predetermined location (a predetermined work area on the MM 2) (step S004). Thereafter, the CPU 1 sets the next specification line as the analysis target line and returns the process to step S003.
  • If it is determined in step S003 that the analysis target line is not a specification line of computer resources (S003; NO), the CPU 1 determines whether the analysis target line is a specification line of processing details (step S005).
  • If it is (S005; YES), the CPU 1 extracts the processing type specification and the arguments (processing parameters) from this analysis target line (in the example of FIG. 8, "PROC-A" (procedure A) is the processing type specification and "120.0 150.0 20.0 50.0" are the processing parameters) and stores the processing type and arguments at a predetermined location (work area) as the processing parameters for parallel processing (step S006). Thereafter, the CPU 1 sets the next specification line as the analysis target line and returns the process to step S003.
  • If it is determined in step S005 that the analysis target line is not a specification line of processing details (S005; NO), the CPU 1 judges that the analysis target line is a specification line of a processing target data file and a storage location. According to this judgment, it takes the file identifier and the identification information of the storage location out of the analysis target line and stores them at a predetermined location (work area) (step S007).
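  • The three-way classification of specification lines (steps S003 to S007) can be sketched as below. The concrete line syntax of FIG. 8 is not reproduced in this text, so the leading tokens "RESOURCE" and "PROCESS" and the two-token form of a file line are purely hypothetical; only the branching order follows the description.

```python
def parse_specification(lines):
    """Classify each specification line in the order described for S003-S007.

    Assumed (hypothetical) line syntax:
      RESOURCE <number of nodes>
      PROCESS <processing type> <parameter> ...
      <file identifier> <storage location>
    """
    spec = {"nodes": None, "process": None, "files": []}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        head, args = tokens[0], tokens[1:]
        if head == "RESOURCE":        # S003: YES -> S004, store the node count
            spec["nodes"] = int(args[0])
        elif head == "PROCESS":       # S005: YES -> S006, store type and parameters
            spec["process"] = (args[0], [float(a) for a in args[1:]])
        else:                         # S005: NO -> S007, file and storage location
            identifier, destination = tokens
            spec["files"].append((identifier, destination))
    return spec


if __name__ == "__main__":
    sample = [
        "RESOURCE 3",
        "PROCESS PROC-A 120.0 150.0 20.0 50.0",
        "/data/experimentA/3D/statisticsA/variableB/timeXXX.0000.000.dat /results",
    ]
    print(parse_specification(sample))
```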
  • In step S008, the CPU 1 outputs the header portion of the parallel processing job script.
  • The header is stored in advance at a predetermined location in the external storage device 3 as fixed text.
  • The header contains a setting file transfer instruction.
  • The nodes to be used for parallel processing are determined based on the number of data files to be processed and the specified number of nodes used for parallel processing. The usage and load status of each of nodes #0 to #n is managed by, for example, the OS (operating system) of the computer Y.
  • The number of processing target data files in the parallel processing specification information file and the number of nodes are passed to the OS.
  • The OS extracts, from nodes #0 to #n, the nodes that the user is permitted to use, and selects the specified number of nodes taking into account the usage and load status of the extracted nodes and the number of files. For example, from the extracted nodes, the specified number of nodes is chosen in ascending order of load as the nodes to be used for parallel processing. The usage and load status of each selected node is set as a small table 34 in the situation table 33. As a result, the processing target data file group is processed in parallel by the specified number of nodes selected by the OS.
  • Alternatively, the small tables 34 for all nodes #0 to #n may be stored in the situation table 33 (FIG. 6); the OS then refers to the small tables 34, selects the specified number of nodes in ascending order of load, and sets a mask (use not permitted) in the small tables 34 corresponding to the non-selected nodes.
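  • The example selection rule (permitted nodes only, lowest load first) can be sketched as follows. The function name, the dictionary-based inputs, and the tie-break on node name are assumptions for illustration; the source leaves the selection to the OS and does not fix these details.

```python
def select_nodes(loads, permitted, count):
    """Pick `count` nodes the user may use, in ascending order of load.

    loads:     mapping of node identifier -> current load (hypothetical units)
    permitted: set of node identifiers the user is authorized to use
    count:     number of nodes requested in the specification information
    """
    # Ties on load are broken by node identifier so the result is deterministic
    # (an assumption; the source does not say how ties are resolved).
    candidates = sorted((n for n in loads if n in permitted),
                        key=lambda n: (loads[n], n))
    if len(candidates) < count:
        raise ValueError("not enough permitted nodes for the requested count")
    return candidates[:count]


if __name__ == "__main__":
    loads = {"node0": 50, "node1": 10, "node2": 30, "node3": 5}
    print(select_nodes(loads, {"node0", "node1", "node2"}, 2))
```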
  • The CPU 1 executes loop processing for analysis and processing of the processing target data files.
  • The loop processing is executed for each file identifier (processing target data file) obtained in step S007.
  • The CPU 1 first identifies one of the designated processing target data files (the processing target data files having the file identifiers obtained in step S007) as the analysis target file.
  • The CPU 1 starts a subroutine for metadata analysis processing of the analysis target file (step S009).
  • FIG. 12 is a flowchart showing an example of a metadata analysis' acquisition subroutine.
  • the CPU 1 receives an input of data file designation (step S101). That is, the CPU 1 receives the file identifier of the analysis target file.
  • CPU 1 determines whether or not the file identifier has the correct format (step S102). At this time, if the file identifier does not have the correct format (S 102; NO), the processing for creating the script and the setting file ends, assuming that the processing is unsuccessful (NG). In this case, an error display process is performed to notify the user of an error. It is possible to do S.
  • the CPU 1 starts the keyword acquisition loop process. In the loop processing, first, the CPU 1 determines whether or not a keyword representing metadata is included in the file identifier (S 103).
  • the CPU 1 extracts the directory name that follows the root directory in the file identifier, collates this directory name against the keyword list in the metadata table 32 (FIG. 5) (the group of keywords stored in the metadata table 32), and searches for a keyword that matches the extracted directory name.
  • if no keyword matches, the CPU 1 extracts the next directory name and collates it with the keyword list. In this way, the CPU 1 repeats the extraction of a directory name or data file name and the collation with the keyword list until a directory name or data file name matching one of the keywords is found.
  • if the CPU 1 finds a keyword that matches the extracted directory name or data file name (S103; YES), the CPU 1 interrupts the extraction process and retrieves the metadata corresponding to the keyword from the metadata table 32 (step S104).
  • for example, consider the file identifier "/data/experimentA/3D/statisticsA/variableB/timeXXX".
  • when the CPU 1 has acquired metadata from the metadata table 32, it resumes extraction of directory names or the data file name and collation with the keyword list for the file identifier.
  • with the directory name "statisticsA", which follows the directory name "3D", as a keyword, the corresponding metadata "metal" is acquired from the metadata table 32.
  • in this way, the computer Y automatically identifies (acquires) the metadata corresponding to the processing target data by using the property information (keywords) included in the file identifier.
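The keyword acquisition loop described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the dictionary `METADATA_TABLE` is a hypothetical stand-in for the metadata table 32, its contents are invented, and the format check is reduced to "absolute path with a file name".

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the metadata table 32: keyword -> metadata.
METADATA_TABLE = {
    "3D": {"dimensions": 3},
    "statisticsA": {"statistic": "A"},
}

def acquire_metadata(file_identifier, table=METADATA_TABLE):
    """Collect metadata for every path component that matches a keyword."""
    path = PurePosixPath(file_identifier)
    # Simplified format check (assumption): an absolute path naming a file.
    if not path.is_absolute() or path.name == "":
        raise ValueError(f"malformed file identifier: {file_identifier!r}")
    found = {}
    # parts[0] is the root "/", so start with the directory after the root.
    for component in path.parts[1:]:
        if component in table:
            found[component] = table[component]
    return found
```

Because the whole identifier is scanned, an identifier that contains several keywords yields several metadata entries, in path order.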
  • in step S010, the CPU 1 analyzes the metadata and determines whether the parallel processing of the processing target data file (analysis target file) targeted by the loop processing requires not only the analysis target file itself but also data related to it (a related data file).
  • the file identifier can include, in a directory name or in the data file name, a character or character string giving component information that indicates whether the data is the X component, the Y component, or the Z component. The file identifiers of the Y component and Z component data files corresponding to the data file of a given component (for example, the X component) are created by changing this character or character string according to a fixed rule. For example, if the letter "X" of the component information contained in the file identifier is replaced with the letter "Y" or "Z" indicating the Y component or the Z component, the result is the file identifier of the corresponding Y component or Z component data file.
  • in step S010, if the CPU 1 finds through analysis of the metadata obtained in step S009 that the analysis target file is, for example, an X component data file, it determines that a related data file is required (S010; YES) and proceeds to step S011. If not (S010; NO), the CPU 1 advances the process to step S012.
  • in step S011, the CPU 1 generates a file identifier of the related data file.
  • the file identifier of the related data file can be generated, for example, by changing part of the file identifier of the analysis target file as described above.
  • the generated file identifier of the related data file is stored in a work area on the MM 2.
  • the related data file itself is stored in the file DB 31. Thereafter, the process proceeds to step S012.
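The generation of related file identifiers can be illustrated with a short Python sketch. The naming convention is an assumption made only for this example: the component letter is taken to be the suffix of a directory name that starts with the hypothetical prefix `component`. Only that path component is rewritten, so other occurrences of "X" in the identifier (as in "timeXXX") are left alone.

```python
COMPONENTS = ("X", "Y", "Z")

def related_file_identifiers(file_identifier, prefix="component"):
    """Return identifiers of the other two component files, or [] if none apply.

    Assumes (hypothetically) that component information appears as a path
    element of the form prefix + letter, e.g. "componentX".
    """
    parts = file_identifier.split("/")
    for index, part in enumerate(parts):
        letter = part[len(prefix):]
        if part.startswith(prefix) and letter in COMPONENTS:
            related = []
            for other in COMPONENTS:
                if other != letter:
                    new_parts = parts.copy()
                    new_parts[index] = prefix + other
                    related.append("/".join(new_parts))
            return related
    return []
```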
  • in step S012, a subroutine is executed that determines the placement of the analysis target file (designated data file), or of the analysis target file and the related data file.
  • FIG. 13 is a flowchart showing an example of the arrangement determination subroutine (S012).
  • the CPU 1 first estimates the size of the data file allocated to the node and the computer resource A required for the process (step S201).
  • the CPU 1 obtains the size of the analysis target file (for example, from the metadata). Subsequently, the CPU 1 estimates, according to the corresponding metadata, the size of the processing result file that will be created when the processing specified by the processing detail parameter obtained in step S006 (FIG. 9) is executed on the analysis target file. The CPU 1 calculates the sum of the size of the analysis target file and the size of the processing result file as the computer resource A.
  • the size of the processing result file can be estimated, for example, from the extraction range when the processing content specified by the processing detail parameter is a process of extracting the part of the analysis target file within a specified extraction range.
  • in step S201, if there is a related data file, the size of the related data file and the size of the processing result file for the related data file are also included in the computer resource A.
  • the size of the related data file and the size of its processing result file can be estimated, for example, from the size of the analysis target file and the size of its processing result file.
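As a rough illustration of step S201, the following Python sketch computes the computer resource A. Two simplifications are assumptions of this example, not statements of the source: the result size of an extraction is taken to be proportional to the extracted fraction of the file, and every related data file is taken to have the same size and result size as the analysis target file.

```python
def estimate_result_size(file_size, extract_fraction):
    """Estimate the result size when a fraction of the file is extracted."""
    if not 0.0 <= extract_fraction <= 1.0:
        raise ValueError("extract_fraction must be between 0 and 1")
    return int(file_size * extract_fraction)

def computer_resource_a(file_size, extract_fraction, related_count=0):
    """Sum of input and result sizes for the target and its related files."""
    per_file = file_size + estimate_result_size(file_size, extract_fraction)
    # Related files are assumed to need the same resources as the target.
    return per_file * (1 + related_count)
```

For a 1000-byte X component file with a 25% extraction range and two related files (Y and Z), `computer_resource_a(1000, 0.25, related_count=2)` returns 3750.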
  • the CPU 1 refers to the status table 33 (FIG. 6) and searches for the node that can provide the user with capacity corresponding to the computer resource A and whose load is expected to be the lightest in the current load distribution (step S202).
  • the CPU 1 refers to the status table 33 and refers to the user's record in each small table 34.
  • the user ID has already been input to the computer Y by the user, for example, when starting to use the simulation system, and the CPU 1 refers to the record corresponding to this user ID.
  • the CPU 1 subtracts the load (current usage size) from the maximum size in each record to determine the remaining usable size of the user at each node. Subsequently, the CPU 1 determines the node with the largest remaining usable size and the smallest load as the node on which the analysis target file (and the related data file) should be placed.
  • the CPU 1 updates the status table 33 based on the computer resource A (step S203). That is, the CPU 1 adds the value of the computer resource A to the load value (usage size) in the small table 34 corresponding to the determined node.
  • the CPU 1 ends the processing of the subroutine, and passes the identifier of the node determined as the file allocation destination to the main routine.
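Steps S202 and S203 can be sketched as follows. The nested dictionary is a hypothetical representation of the status table 33 (node identifier, then user ID, then a record with `max_size` and `used_size`); the field names are assumptions made for this example.

```python
def choose_node(status_table, user_id, resource_a):
    """Pick the node with the most free capacity for the user and reserve it."""
    best_node, best_free = None, -1
    for node_id, small_table in status_table.items():
        record = small_table.get(user_id)
        if record is None:
            continue  # this user has no allocation on the node
        free = record["max_size"] - record["used_size"]
        if free >= resource_a and free > best_free:
            best_node, best_free = node_id, free
    if best_node is None:
        raise RuntimeError("no node can provide the requested capacity")
    # Step S203: add computer resource A to the load of the chosen node.
    status_table[best_node][user_id]["used_size"] += resource_a
    return best_node
```

Updating the table immediately after each decision means that later files in the same loop see the load added by earlier placements.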
  • in step S013 of the main routine, the CPU 1 outputs a command statement concerning the placement of data on nodes (referred to as a "data placement command statement").
  • the CPU 1 reads a template of the data placement command statement (stored in advance in the external storage device 3).
  • the template is configured such that the command statement is completed when the identifier of the file to be placed and a node identifier are written at predetermined positions of the fixed command statement.
  • the CPU 1 writes the identifier of the analysis target file (and the related data file) and the node identifier obtained in step S012 at the predetermined positions of the template. The data placement command statement completed in this way becomes part of the parallel processing job script.
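Template completion of this kind can be illustrated with Python's `string.Template`. The command name `place_data` and its options are hypothetical; the source does not give the actual text of the template.

```python
from string import Template

# Hypothetical fixed command statement with two positions to fill in.
PLACEMENT_TEMPLATE = Template("place_data --file $file_id --node $node_id")

def data_placement_statements(file_ids, node_id):
    """Complete one placement statement per file for the chosen node."""
    return [
        PLACEMENT_TEMPLATE.substitute(file_id=file_id, node_id=node_id)
        for file_id in file_ids
    ]
```

Passing both the analysis target file and its related data files in `file_ids` places them on the same node, which matches the placement decision made in step S012.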
  • the CPU 1 outputs a command statement for moving the processed data (processing result file) to its storage location after completion of the parallel processing (referred to as a "processing result transfer command statement").
  • the CPU 1 reads a template of the processing result transfer command statement (stored in advance in the external storage device 3).
  • the template is configured such that the command statement is completed when the storage location specified via the UI is written at a predetermined position of the fixed command statement.
  • the CPU 1 writes the storage location of the processing result file for the analysis target file, obtained in step S007, at the predetermined position of the template. The processing result transfer command statement completed in this way becomes part of the parallel processing job script.
  • the CPU 1 stores data arrangement information (step S015). That is, the CPU 1 stores the correspondence between the file identifier and the node identifier as data arrangement information in a predetermined storage area.
  • when step S015 is finished, if there remains a file identifier of a processing target data file that has not yet been analyzed, the process returns to step S009 and steps S009 to S015 described above are executed again. When the processing for the file identifiers of all processing target data files is completed, the process proceeds to step S016.
  • the CPU 1 outputs a parallel processing program execution statement. That is, the CPU 1 reads out a parallel processing program execution statement stored in advance in the external storage device 3 and sets it as a part of the parallel processing job script. In this way, a parallel processing job script including a header, data placement command statements, processing result transfer command statements, and a parallel processing program execution statement is automatically generated.
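The assembly of the job script from its four kinds of parts might look like the sketch below. The grouping of statements (all placement statements, then all transfer statements) and the one-statement-per-line layout are assumptions of this example; the source only lists the parts the script contains.

```python
def build_job_script(header, placement_statements, transfer_statements,
                     execution_statement):
    """Join the script parts into one newline-terminated job script."""
    lines = [header, *placement_statements, *transfer_statements,
             execution_statement]
    return "\n".join(lines) + "\n"
```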
  • the CPU 1 starts the process of creating the setting file of the parallel processing program (step S017: FIG. 11).
  • the CPU 1 starts parallel processing program setting creation loop processing. This loop process is executed for each data file to be processed.
  • the CPU 1 creates the setting for the process target data file based on the data arrangement information (the correspondence between the file identifier and the node identifier) (S018).
  • the CPU 1 extracts the portion related to one processing target data file from the data arrangement information obtained in step S015 and combines it with the processing parameters corresponding to this file identifier (obtained in step S006). The CPU 1 writes the result of the combination in a predetermined format for the setting file.
  • the CPU 1 performs this process for each data file to be processed, and ends the main routine when the process of step S019 has been completed for all the data files to be processed.
  • FIG. 14 is a diagram showing an example of description of a setting file for parallel processing program.
  • the setting file consists of a plurality of lines described for each data file to be processed.
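A sketch of setting file creation is shown below. The line format (node identifier, file identifier, processing type, and processing parameters separated by spaces) is purely hypothetical, since the actual format is defined by the figure and is not reproduced in the text; the dictionary keys `type` and `args` are likewise assumptions.

```python
def setting_file_lines(placement, parameters):
    """Combine placement info and processing parameters, one line per file.

    placement:  file identifier -> node identifier (data arrangement info)
    parameters: file identifier -> {"type": ..., "args": ...} (hypothetical)
    """
    lines = []
    for file_id, node_id in placement.items():
        params = parameters[file_id]
        lines.append(f"{node_id} {file_id} {params['type']} {params['args']}")
    return lines
```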
  • when creation of the script and the setting file is completed, the CPU 1 starts executing the script. By executing the script, the computer Y transfers the setting file to each node of the parallel computer group X according to the setting file transfer command statement in the header.
  • by executing the data placement command statements, the computer Y transfers each processing target data file (processing target data file group) stored in the file DB 31 to its destination node according to the data arrangement information.
  • by executing the processing result transfer command statements, the computer Y instructs each node to store the processing result file (processed data), created by processing the processing target data file at that node, in the specified storage location (for example, one prepared in the file DB 31).
  • the computer Y instructs each node to start the execution of the parallel processing program by executing the parallel processing program execution statement.
  • each node (FIG. 3) on which the processing target data file group is placed receives the setting file and the processing target data files from the computer Y via the network. These are stored in the external storage device 14 of the node. After that, when the CPU 11 of each node receives the instruction to execute the parallel processing program, it starts executing the parallel processing program.
  • FIG. 14 is a flowchart showing an execution process of a parallel processing program executed by the CPU 11.
  • when the CPU 11 starts the process shown in FIG. 14, it first executes an initialization process (step S301).
  • the CPU 11 reads the setting file stored in the external storage device 14 into the MM 12 (step S302).
  • the CPU 11 executes a processing loop of the processing target data file according to the setting file.
  • the CPU 11 sets one line in the setting file as the line to be processed, and executes the process on the data file to be processed according to the setting contents described in the line to be processed.
  • the CPU 11 refers to the node identifier in the configuration file and determines whether this node identifier is equal to the identifier of the own node (step S303).
  • in step S303, if the node identifiers are not equal (S303; NO), the next line in the setting file is set as the line to be processed, and the process of step S303 is executed again.
  • if the node identifiers are equal, the CPU 11 performs processing for acquiring the metadata corresponding to the file identifier described in the line to be processed (step S304).
  • step S304 is the same process as the metadata acquisition subroutine described above. That is, the CPU 11 refers to the metadata table 32A stored in the external storage device 14 (its data structure is the same as that of the metadata table 32 (FIG. 5)) and retrieves the corresponding metadata.
  • the CPU 11 executes the process on the process target data file in accordance with the process type specification in the process target line, the process parameter, and the metadata (step S305). That is, the CPU 11 gives the calculation processor 13 specification of processing type, processing parameters, file identifier and metadata. Then, the calculation processor 13 reads the processing target data file corresponding to the file identifier from the external storage unit 14 to the MM 12 and executes processing according to the processing type specification and the processing parameter based on the metadata.
  • after that, when the processing by the calculation processor 13 is completed, the CPU 11 outputs the processing result data (processed data) as a processing result file (step S306).
  • the processing result file is transferred, for example, to the computer Y, and the computer Y stores the processing result file in the storage location designated by the user (for example, prepared in the file DB 31).
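The node-side loop (steps S303 to S306) can be sketched as below. The sketch assumes a hypothetical setting file format in which each line holds four space-separated fields: node identifier, file identifier, processing type, and processing parameters. The callables `lookup_metadata` and `run` are placeholders for the metadata acquisition and for the work handed to the calculation processor 13.

```python
def process_setting_file(lines, own_node_id, lookup_metadata, run):
    """Process only the lines whose node identifier matches this node."""
    results = []
    for line in lines:
        # Assumed format: "<node> <file> <type> <params>" (hypothetical).
        node_id, file_id, process_type, args = line.split(maxsplit=3)
        if node_id != own_node_id:
            continue  # S303; NO: move on to the next line
        metadata = lookup_metadata(file_id)  # S304
        results.append(run(process_type, args, file_id, metadata))  # S305
    return results
```

Because every node receives the same setting file and filters by its own identifier, no per-node setting file has to be generated.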
  • the storage area of the process target data file (simulation data file) is provided on the external storage device 3 of the computer Y
  • alternatively, the storage area may be provided on each node, or on a file server independent of the computer Y and the parallel computer group X.
  • the processing control program (script) and the setting file used to perform parallel processing on the processing target data file group are created automatically.
  • a desired script and setting file are automatically created simply by designating or inputting information that is an element of the above-described parallel processing designation information using the UI. This can greatly reduce the effort of the user.
  • because the time required to write a script is shortened, the time required to obtain parallel processing results can also be shortened.
  • the metadata for the processing target data is automatically searched for and acquired by the designation of the file identifier by the user. That is, when the user designates a file identifier, a keyword is extracted from the file identifier, and metadata corresponding to this keyword is treated as designated metadata. This eliminates the need for the user to input metadata specifications for each data file to be processed. Therefore, the effort of the user can be reduced, the processing time can be shortened, and the user's input error can be prevented.
  • a file identifier including data storage location information (a file path) is assigned to the processing target data file, and the file identifier includes a keyword indicating the nature of the processing target data (a keyword for metadata search).
  • the file identifier can include multiple keywords.
  • when designating a data file to be processed, the user designates the file identifier including the file path. In this way, specification of the file identifier doubles as keyword input. Therefore, the burden on the user can be reduced.
  • the metadata is configured to be stored in a storage area different from the processing target data file. As a result, the data file to be processed can be efficiently stored in the storage area.
  • the present invention is applicable to, for example, data processing in various numerical simulation systems.

Abstract

The invention relates to a metadata retriever comprising: an acceptor that accepts designation of a file identifier including both information on the storage location of a data file and a keyword for metadata retrieval; a metadata store holding the metadata corresponding to the data file; an extractor that extracts the keyword from the designated file identifier; and a retriever that retrieves the metadata corresponding to the extracted keyword from the metadata store.
PCT/JP2005/023649 2005-12-22 2005-12-22 Recuperateur de metadonnees WO2007072566A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2005/023649 WO2007072566A1 (fr) 2005-12-22 2005-12-22 Recuperateur de metadonnees
JP2007550971A JP4905989B2 (ja) 2005-12-22 2005-12-22 メタデータ検索装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2005/023649 WO2007072566A1 (fr) 2005-12-22 2005-12-22 Recuperateur de metadonnees

Publications (1)

Publication Number Publication Date
WO2007072566A1 true WO2007072566A1 (fr) 2007-06-28

Family

ID=38188357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/023649 WO2007072566A1 (fr) 2005-12-22 2005-12-22 Recuperateur de metadonnees

Country Status (2)

Country Link
JP (1) JP4905989B2 (fr)
WO (1) WO2007072566A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014134877A (ja) * 2013-01-08 2014-07-24 Aplix Ip Holdings Corp 情報処理装置およびキュー記憶方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002244900A (ja) * 2000-12-12 2002-08-30 Matsushita Electric Ind Co Ltd ファイル管理方法及びコンテンツ記録/再生装置
JP2003281163A (ja) * 2002-03-26 2003-10-03 Canon Inc 画像処理装置及び画像処理方法、記憶媒体
JP2005174063A (ja) * 2003-12-12 2005-06-30 Nippon Telegr & Teleph Corp <Ntt> ファイル管理装置,動的名前空間生成方法および動的名前空間生成プログラム
JP2005530242A (ja) * 2000-09-11 2005-10-06 アガミ システムズ, インコーポレイテッド 区画された移動可能メタデータを有する記憶システム

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11110276A (ja) * 1997-10-03 1999-04-23 Canon Inc ファイル検索方法及び情報処理装置
JP4724283B2 (ja) * 2000-09-20 2011-07-13 キヤノン株式会社 文書管理装置、文書管理方法および記憶媒体

Also Published As

Publication number Publication date
JP4905989B2 (ja) 2012-03-28
JPWO2007072566A1 (ja) 2009-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2007550971

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05820189

Country of ref document: EP

Kind code of ref document: A1