CN104484441A - File batch processing and scheduling method - Google Patents

Info

Publication number
CN104484441A
Authority
CN
China
Prior art keywords
file
external data
data file
files
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410816038.6A
Other languages
Chinese (zh)
Inventor
王莉
郭铸
王作为
陈世强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN201410816038.6A
Publication of CN104484441A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for batch processing and scheduling of files. The method comprises the following steps: receiving an external data file issued by a downloading platform; and loading the external data file into a database. The method uses a state-driven approach to schedule each processing stage of the external data file, so that files are processed efficiently and concurrently with controllable resource use. A status is set for each processing step of a file and recorded in the database; files are processed as soon as they are received, with the processing procedures scheduled in sequence, so that the processing stages of different files run with the greatest possible concurrency.

Description

Method for batch processing and scheduling of files
Technical field
The invention relates to a file processing method, and in particular to a method for batch processing and scheduling of files.
Background art
At present, in data-processing systems, the checking, cleaning, and loading of external data files from source systems is extremely important and forms the basis of data warehouse construction. For systems with very large data volumes in particular, achieving this efficiently and stably is even more critical.
The prior art offers no dedicated tool or method for the concurrent batch processing and scheduling of large numbers of files. Take the AIX (Advanced Interactive eXecutive) system as an example. AIX is a UNIX-like operating system developed by IBM on the basis of AT&T UNIX System V, and it runs on IBM minicomputer hardware built on the proprietary Power family of chips. It offers good security, manageability, and availability, and is widely used in fields such as banking and retail. For banks, however, the concurrent processing and scheduling of large numbers of files has always suffered from low efficiency and insufficient stability.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a method for batch processing and scheduling of files that processes and schedules external data files from source systems efficiently and stably.
To achieve this object, the method for batch processing and scheduling of files provided by the present invention comprises:
receiving an external data file issued by a downloading platform;
loading the external data file into a database.
Preferably, loading the external data file into the database comprises:
connecting to the database;
obtaining a load control file and loading the external data file into the database according to the load control file.
Preferably, after the database connection is established, the log file path is obtained first; after the external data file has been loaded into the database, the load log file is checked to determine whether the load succeeded; if it succeeded, the file status is updated and the database connection is closed.
Preferably, when the load control file is obtained, if it is obtained successfully, the data in the current partition of the table in the database is deleted and the step of loading the external data file is then entered; otherwise, the load control file is written first and then obtained again.
Preferably, before the external data file is loaded into the database, it is determined whether the external data file is a current-period file; if so, the external data file is loaded into the database; otherwise, the external data file is compressed and saved, and is decompressed when the period number reaches a preset value.
Preferably, before it is determined whether the external data file is a current-period file, the external data file is cleaned. This step comprises: checking the file control information; obtaining common information such as the file separator and the cleaning configuration file; cleaning the file line by line according to the cleaning rule of each field; writing the cleaned data line by line to a cleaned file; and calculating the cleaning error rate.
Preferably, before the external data file is cleaned, the external data file is checked, which comprises:
connecting to the database;
opening the external data file, reading the file control information, checking the file control information, and setting different statuses for the file according to the different check results.
Preferably, the file control information comprises the system name, the downloaded table name, the incremental/full flag, the file separator, the start date of the data content, and the end date of the data content.
Preferably, before the external data file is checked, the external data file is decompressed.
Compared with the prior art, the method of the present invention uses a state-driven approach to schedule each processing stage of an external data file, so that files are processed efficiently, concurrently, and with controllable resource use. A status is set for each processing step of a file and recorded in the database. Files are processed as soon as they arrive, with the processing procedures scheduled in sequence, so that the processing stages of different files run with the greatest possible concurrency.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method for batch processing and scheduling of files according to the present invention.
Fig. 2 is the flow chart of loading an external data file in the method for batch processing and scheduling of files according to the present invention.
Fig. 3 is the flow chart of checking an external data file in the method for batch processing and scheduling of files according to the present invention.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
The method for batch processing and scheduling of files provided by the present invention supplies concurrent processing and scheduling of large numbers of files under the AIX system, and provides control for the basic file data preparation stage of data warehouse construction. It essentially comprises: first, receiving the external data files issued by the downloading platform; then, loading the external data files into the database with maximum concurrency. These two steps form the most basic embodiment of the technical solution. Fig. 1 shows another, more specific embodiment. As shown in Fig. 1, the method comprises:
S10, receive the external data file. Here, external data files generally refers to all data files coming from the downloading platform. In data-processing systems, the checking, cleaning, and loading of external data files from source systems is extremely important and forms the basis of data warehouse construction; for systems with very large data volumes in particular, achieving this efficiently and stably is even more critical.
S11, decompress the external data file. If the external data file sent by the downloading platform is in a compressed format, it must be decompressed here to allow the subsequent operations. In practice, this step can be carried out by calling gunzip.
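The decompression in S11 can be sketched in Python as follows. This is only an illustration: it uses the standard gzip module in place of the gunzip command named above, and the function name and the ".gz" naming convention are assumptions, not part of the described method.

```python
import gzip
import shutil
from pathlib import Path


def decompress_external_file(gz_path: str) -> str:
    """Decompress a .gz file next to itself and return the output path.

    Files without a .gz suffix are treated as uncompressed and returned as-is.
    """
    src = Path(gz_path)
    if src.suffix != ".gz":
        return str(src)  # not compressed: nothing to do
    dst = src.with_suffix("")  # "acct.dat.gz" -> "acct.dat"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files fit in memory
    return str(dst)
```

Streaming with shutil.copyfileobj keeps memory use flat regardless of file size, which matters when many large files are decompressed concurrently.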
S12, check the external data file. This step checks whether the file control information of each file is complete, and reads from it the system name, the downloaded table name, the incremental/full flag, the file separator, and the start date and end date of the data content. The name of the database table corresponding to the external data file is then obtained from the configuration held in one database table, and this information is recorded in another table. For example, for an ODS (Operational Data Store) data set, the ODS table name corresponding to the file is obtained from the configuration in the SYS_TABNAMECHG table, and the information is recorded in the SYS_FTPFILECTL table. After the file check finishes, the file status is 3000.
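As an illustration of reading the six control fields listed in S12, the sketch below parses a header line in Python. The source does not specify how the file control information is laid out, so the comma-separated header, the "I"/"F" flag values, and the YYYYMMDD date format are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class FileControlInfo:
    system_name: str
    table_name: str   # downloaded table name
    load_mode: str    # incremental/full flag, assumed to be "I" or "F"
    separator: str    # field separator used by the data rows
    start_date: date  # start date of the data content
    end_date: date    # end date of the data content


def parse_control_info(header: str) -> FileControlInfo:
    """Parse an assumed comma-separated header; raise ValueError if incomplete."""
    parts = [p.strip() for p in header.rstrip("\n").split(",")]
    if len(parts) != 6 or any(p == "" for p in parts):
        raise ValueError("incomplete file control information")
    system, table, mode, sep, start, end = parts
    if mode not in ("I", "F"):
        raise ValueError(f"unknown incremental/full flag: {mode!r}")
    # strptime raises ValueError for malformed dates
    start_d = datetime.strptime(start, "%Y%m%d").date()
    end_d = datetime.strptime(end, "%Y%m%d").date()
    if start_d > end_d:
        raise ValueError("start date is after end date")
    return FileControlInfo(system, table, mode, sep, start_d, end_d)
```

For example, parse_control_info("CORE,ACCT_BAL,I,|,20141201,20141201") yields a record whose separator is "|"; a header with a missing field raises ValueError, which a caller could map to a failure status.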
Fig. 3 shows the flow chart of checking an external data file in the method for batch processing and scheduling of files according to the present invention. As shown in Fig. 3, the file check comprises the following steps:
S31, connect to the database; if the connection succeeds, go to S32.
S32, open the source file of the external data file to be checked; if it opens successfully, go to S33; if opening fails, set the file status to 2005.
S33, read the file control information of the external data file; if it is read successfully, go to S34; if reading fails, set the file status to 2001.
S34, check the external data file; if the check succeeds, go to S35; if the check fails, set different statuses for the file according to the different checks.
S35, update the file status; if the update succeeds, go to S36; if the update fails, set the file status to 2006.
S36, close the file.
S37, close the database connection.
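The check flow of Fig. 3 maps naturally onto a function that returns a status code. The Python sketch below is a simplified illustration: the status values 2001, 2005, 2006, and 3000 come from the description, while the function names, the `validate` and `save_status` callbacks, and the omission of the database connection steps (S31, S37) are assumptions of this example. The source does not list the status codes for failed checks, so the caller supplies them.

```python
from typing import Callable, Optional

STATUS_CHECKED = 3000        # check finished
STATUS_READ_FAILED = 2001    # reading the file control information failed
STATUS_OPEN_FAILED = 2005    # opening the source file failed
STATUS_UPDATE_FAILED = 2006  # updating the file status failed


def decide_status(path: str, validate: Callable[[str], Optional[int]]) -> int:
    """Run S32 to S34 and S36 for one file and return the resulting status."""
    try:
        f = open(path, "r", encoding="utf-8")  # S32
    except OSError:
        return STATUS_OPEN_FAILED
    with f:  # S36: the file is closed when this block exits
        try:
            header = f.readline()  # S33
        except (OSError, UnicodeDecodeError):
            return STATUS_READ_FAILED
        if not header.strip():
            return STATUS_READ_FAILED
        failure = validate(header)  # S34: None means the check passed
        return STATUS_CHECKED if failure is None else failure


def check_external_file(
    path: str,
    validate: Callable[[str], Optional[int]],
    save_status: Callable[[str, int], None],
) -> int:
    """Decide the status, then record it (S35); 2006 if recording fails."""
    status = decide_status(path, validate)
    try:
        save_status(path, status)  # stands in for the database update
    except Exception:
        return STATUS_UPDATE_FAILED
    return status
```

Returning a code from every branch, rather than raising, keeps the scheduler's view of a file limited to a single recorded status.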
S13, clean the external data file. This step checks the file control information; obtains common information such as the file separator and the cleaning configuration file; cleans the file line by line according to the cleaning rule of each field; writes the cleaned data line by line to a cleaned file; and calculates the cleaning error rate.
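A minimal Python sketch of the line-by-line cleaning in S13 follows. The source does not define the cleaning rules or how the error rate is computed, so the rule signature, the two sample rules, and the definition of the error rate as rejected rows divided by total rows are assumptions for illustration.

```python
import io
from typing import Callable, Sequence, TextIO

# A rule returns the cleaned value or raises ValueError to reject the row.
CleanRule = Callable[[str], str]


def clean_file(src: TextIO, dst: TextIO, sep: str,
               rules: Sequence[CleanRule]) -> float:
    """Clean src line by line into dst; return the fraction of rejected rows."""
    total = bad = 0
    for line in src:
        total += 1
        fields = line.rstrip("\n").split(sep)
        if len(fields) != len(rules):
            bad += 1  # wrong number of fields
            continue
        try:
            cleaned = [rule(value) for rule, value in zip(rules, fields)]
        except ValueError:
            bad += 1  # a field failed its rule
            continue
        dst.write(sep.join(cleaned) + "\n")
    return bad / total if total else 0.0


def strip_text(value: str) -> str:
    return value.strip()


def require_int(value: str) -> str:
    return str(int(value.strip()))  # int() raises ValueError on bad input


src = io.StringIO("1| alice \nx|bob\n2|carol\n")
dst = io.StringIO()
rate = clean_file(src, dst, "|", [require_int, strip_text])
```

In this run the second row is rejected because "x" is not an integer, so one of three rows fails and the cleaned output contains the other two rows with whitespace trimmed.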
S14, before the external data file is loaded into the database, determine whether it is a current-period file; if so, load the external data file into the database; otherwise go to S15.
S15, compress and save the external data file, and go to S16 when the period number reaches the preset value.
S16, decompress the external data file.
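Steps S14 to S16 can be sketched in Python as a hold-and-release pair. The source does not say how a current-period file is recognized or where non-current files are kept, so the date comparison in `is_current`, the holding directory, and all function names are assumptions made for this example.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path


def is_current(file_date: date, batch_date: date) -> bool:
    """Assumed rule for S14: current when the data date equals the batch date."""
    return file_date == batch_date


def hold(path: Path, hold_dir: Path) -> Path:
    """S15: compress a non-current file into the holding directory."""
    hold_dir.mkdir(parents=True, exist_ok=True)
    target = hold_dir / (path.name + ".gz")
    with open(path, "rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    path.unlink()  # keep only the compressed copy
    return target


def release(held: Path, work_dir: Path) -> Path:
    """S16: decompress a held file back into the working directory."""
    target = work_dir / held.name[: -len(".gz")]
    with gzip.open(held, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    held.unlink()
    return target
```

A scheduler would call `release` once the period number reaches the preset value and then pass the restored file on to the load step.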
S17, load the external data file into the database. This step loads the cleaned data file into the database table corresponding to it. Continuing the example above, the corresponding ODS table name is found through the SYS_TABNAMECHG table, and the cleaned external data file is then loaded into that ODS table by calling a program (for example, the sqlldr tool). At the same time, the status of a successfully loaded file is set to 6000.
In this step, as shown in Figure 2, be again specifically the loading having carried out external data file as follows: S21, connection data storehouse; S22, connection after database, first obtains journal file path, if obtain successfully, enters S23 step; S23, acquisition Loading Control file, successfully enter S25 step if obtained, otherwise enter S24 step when obtaining unsuccessfully; S24, when obtain in S23 step Loading Control file failure time, first can automatically write Loading Control file and obtain Loading Control file again, enter S25 step; In S25, delete database current table section data after enter and load external data file step.Such as, still for ODS data set, when connecting upper database and after obtaining Loading Control file, deleting the data of the current region that current ODS shows in this step; S26, loading external data file are to database; S27, acquisition Loading Control file and according to described Loading Control files loading external data file to database; S28, updating file state; S29, turn-off data storehouse connect.
In actual use, it is sufficient to deploy the corresponding programs in the file system, install the relevant parameter tables in the database, and create the required directories.
The method of the present invention uses a state-driven approach to schedule each processing stage of an external data file, so that files are processed efficiently, concurrently, and with controllable resource use. A status is set for each processing step of a file and recorded in the database. Files are processed as soon as they arrive, with the processing procedures scheduled in sequence, so that the processing stages of different files run with the greatest possible concurrency. This realizes, for example, concurrent processing and scheduling of large numbers of files under the AIX system, and provides control for the basic file data preparation stage of data warehouse construction.
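The state-driven scheduling summarized above can be sketched in Python as a status table plus a bounded worker pool. This is an illustrative design, not the patented implementation: the statuses 3000 and 6000 come from the description, while 1000, 2000, and 4000, the table name file_ctl, the use of SQLite, and all class and function names are assumptions.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

# status -> (handler for that stage, status recorded when the handler succeeds)
Stage = Tuple[Callable[[str], None], int]


class StateStore:
    """Records one status per file in a database table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()  # serialize access across worker threads
        self._db.execute(
            "CREATE TABLE file_ctl (name TEXT PRIMARY KEY, status INTEGER)")

    def set(self, name: str, status: int) -> None:
        with self._lock:
            self._db.execute(
                "INSERT OR REPLACE INTO file_ctl VALUES (?, ?)", (name, status))
            self._db.commit()

    def get(self, name: str) -> int:
        with self._lock:
            row = self._db.execute(
                "SELECT status FROM file_ctl WHERE name = ?", (name,)).fetchone()
        return row[0]


def drive(name: str, store: StateStore, stages: Dict[int, Stage]) -> None:
    """Advance one file stage by stage until no handler matches its status."""
    status = store.get(name)
    while status in stages:
        handler, next_status = stages[status]
        handler(name)
        store.set(name, next_status)
        status = next_status


def run_batch(names: Iterable[str], store: StateStore,
              stages: Dict[int, Stage], max_workers: int = 4) -> None:
    """Start each file as it arrives; max_workers caps concurrent resource use."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for name in names:
            store.set(name, 1000)  # hypothetical "received" status
            futures.append(pool.submit(drive, name, store, stages))
        for fut in futures:
            fut.result()  # re-raise any handler error
```

Each file moves through its stages in order, while different files occupy different stages at the same time; the pool size is the single knob that bounds resource use.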
Of course, the above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A method for batch processing and scheduling of files, characterized by comprising:
receiving an external data file issued by a downloading platform;
loading the external data file into a database.
2. The method for batch processing and scheduling of files according to claim 1, characterized in that loading the external data file into the database comprises:
connecting to the database;
obtaining a load control file and loading the external data file into the database according to the load control file.
3. The method for batch processing and scheduling of files according to claim 2, characterized in that, after the database connection is established, the log file path is obtained first; after the external data file has been loaded into the database, the load log file is checked to determine whether the load succeeded; and if the load succeeded, the file status is updated and the database connection is closed.
4. The method for batch processing and scheduling of files according to claim 2, characterized in that, when the load control file is obtained, if it is obtained successfully, the data in the current partition of the table in the database is deleted and the step of loading the external data file is then entered; otherwise, the load control file is written first and then obtained again.
5. The method for batch processing and scheduling of files according to claim 1, characterized in that, before the external data file is loaded into the database, it is determined whether the external data file is a current-period file; if so, the external data file is loaded into the database; otherwise, the external data file is compressed and saved, and is decompressed when the period number reaches a preset value.
6. The method for batch processing and scheduling of files according to claim 5, characterized in that, before it is determined whether the external data file is a current-period file, the external data file is cleaned, this step comprising: checking the file control information; obtaining the file separator and the cleaning configuration file; cleaning the file line by line according to the cleaning rule preset for each field; and then writing the cleaned data line by line to the cleaned file.
7. The method for batch processing and scheduling of files according to claim 6, characterized in that, before the external data file is cleaned, the external data file is checked, which comprises:
connecting to the database;
opening the external data file, reading the file control information, checking the file control information, and setting different statuses for the file according to the different check results.
8. The method for batch processing and scheduling of files according to claim 7, characterized in that the file control information comprises the system name, the downloaded table name, the incremental/full flag, the file separator, the start date of the data content, and the end date of the data content.
CN201410816038.6A 2014-12-23 2014-12-23 File batch processing and scheduling method Pending CN104484441A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410816038.6A CN104484441A (en) 2014-12-23 2014-12-23 File batch processing and scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410816038.6A CN104484441A (en) 2014-12-23 2014-12-23 File batch processing and scheduling method

Publications (1)

Publication Number Publication Date
CN104484441A true CN104484441A (en) 2015-04-01

Family

ID=52758982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410816038.6A Pending CN104484441A (en) 2014-12-23 2014-12-23 File batch processing and scheduling method

Country Status (1)

Country Link
CN (1) CN104484441A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294219A1 (en) * 2004-01-22 2007-12-20 International Business Machines Corporation Shared scans utilizing query monitor during query execution to improve buffer cache utilization across multi-stream query environments
JP2008242677A (en) * 2007-03-27 2008-10-09 Hitachi Information Systems Ltd Database construction-supporting system, database construction information-generating method, and program
CN101251861A (en) * 2008-03-18 2008-08-27 北京锐安科技有限公司 Method for loading and inquiring magnanimity data
CN103077241A (en) * 2013-01-10 2013-05-01 中国银行股份有限公司 Method for loading data in parallel after splitting files

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
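Xia Yang et al.: "Backup methods and strategies for Oracle databases", Microcomputer & Its Applications (《微型机与应用》) *
Li Hengrui: "Research on ETL systems for building data warehouses", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *
Li Yuping et al.: "Nine cases of Windows NT and Oracle7 server failures in a local area network", Medical Information (《医学信息》) *
Qin Fengwei et al.: "Optimization of a mass data loading scheme based on SQL*Loader", Journal of Wuhan University of Technology, Information & Management Engineering edition (《武汉理工大学学报信息与管理工程版》) *
Gu Yi: "Recovery after loss of database control files", Computer News (《电脑报》) *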

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150401
