CN105827702A - Distributed message queue based FTP data collection method - Google Patents

Info

Publication number
CN105827702A
Authority
CN
China
Prior art keywords
ftp
data
server
file
message queue
Prior art date
Legal status
Pending
Application number
CN201610149074.0A
Other languages
Chinese (zh)
Inventor
程永新
宋辉
吴泽锋
Current Assignee
Shanghai Qingwei Software Co Ltd
Original Assignee
Shanghai Qingwei Software Co Ltd
Priority date
2016-03-16
Filing date
2016-03-16
Publication date
2016-08-03
Application filed by Shanghai Qingwei Software Co Ltd
Priority to CN201610149074.0A
Publication of CN105827702A
Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention discloses an FTP data collection method based on a distributed message queue. The method comprises the following steps: a) reading an FTP configuration file, scanning an FTP server, and reading information about the FTP files to be collected; b) recording the time point of each scan, and filtering the data according to the time point of the previous scan; c) sending the filtered information about the FTP files to be collected to the distributed message queue ActiveMQ; and d) obtaining the file name, server IP, server account and password in the FTP file information from ActiveMQ, establishing an FTP connection according to the server IP, server account and password, and collecting the data in the directory where the file is located. The invention enables distributed FTP data collection, ensures data consistency, prevents duplicate collection, missed collection and anomalies, is easy to maintain, and is highly scalable.

Description

FTP data collection method based on a distributed message queue
Technical field
The present invention relates to a data collection method, and in particular to an FTP data collection method based on a distributed message queue.
Background
FTP is the English abbreviation of File Transfer Protocol, commonly shortened in Chinese to "file-transfer protocol". It controls the two-way transfer of files over the Internet. It is also an application: there are different FTP applications for different operating systems, and all of them follow the same protocol to transfer files. FTP is a safe, stable and fast technique for transferring blocks of files.
With the development of big data, the IT industry has produced many popular data transfer tools and technologies to make data transmission more accurate and timely; distributed file collection and transmission technologies such as Kafka and Flume have emerged and are replacing traditional data collection techniques. FTP file transfer nevertheless still plays a role that cannot be ignored. In the telecom and mobile industry in particular, with the rollout and spread of 4G, vendors rely on FTP to deliver data blocks to data centers for processing faster, better, more securely and more stably. To keep pace with the development of big data, data must be transmitted safely, stably and efficiently, and transmission delays cannot be tolerated.
At present, there are two common ways in the industry to achieve stable data transmission:
1. Single-program, point-to-point collection. The FTP IP address, account and password are hard-coded in the program. A timer-triggered scheduler scans the file information stored on the FTP server and writes the file records to disk file A as the original checklist; the files are then collected one by one, and each successfully collected file is recorded in file B for comparison. After collection, the program compares A and B, taking A as the reference, and re-fetches any entries missing from B. This approach has major drawbacks: files A and B must be locked to guarantee that no other process interferes, the verification logic is special-purpose and takes considerable manpower to maintain, and improper handling can deadlock the files or even overflow the program's memory. Verification then fails and files are missed, files in the production system cannot be processed in time, and data are lost or collected repeatedly, making system data inaccurate and causing heavy losses.
2. Still point-to-point collection, but the FTP IP address, account and password are kept in a configuration file that the program loads. A single program instance scans the file information stored on the FTP server, inserts it into a database, and verifies it by comparison using SQL statements. When there is too much information and operations are too frequent, the database cannot bear the pressure; over time, connection timeouts, connection waits, slow SQL execution and even database crashes occur. File collection then fails, failed database operations also cause data files to be missed, files in the production system cannot be processed in time, and system data become inaccurate, causing heavy losses.
Therefore, point-to-point file collection, whether verified manually or through traditional record keeping in a database, can no longer meet the timeliness requirements of large-scale data collection. In particular, with daily volumes from 5 TB to more than 20 TB and collection cycles of 1 to 5 minutes, the database connections and interaction patterns cannot bear the load. The existing data file collection approach therefore needs to be improved.
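To give a sense of the load described above, the daily volumes can be converted into sustained rates and per-cycle volumes. This is illustrative arithmetic only, assuming decimal units (1 TB = 10^12 B) and traffic spread evenly over the day; neither assumption comes from the text. Here $V_{\mathrm{day}}$ is the daily volume and $T$ the collection cycle:

```latex
\begin{aligned}
r &= \frac{V_{\mathrm{day}}}{86\,400\ \mathrm{s}}:
  \quad \frac{5\times10^{12}\ \mathrm{B}}{86\,400\ \mathrm{s}} \approx 57.9\ \mathrm{MB/s},
  \quad \frac{20\times10^{12}\ \mathrm{B}}{86\,400\ \mathrm{s}} \approx 231\ \mathrm{MB/s} \\
V_{\mathrm{cycle}} &= V_{\mathrm{day}}\cdot\frac{T}{1440\ \mathrm{min}}:
  \quad 5\ \mathrm{TB}\cdot\frac{1}{1440} \approx 3.5\ \mathrm{GB}\ (T = 1\ \mathrm{min}),
  \quad 20\ \mathrm{TB}\cdot\frac{5}{1440} \approx 69.4\ \mathrm{GB}\ (T = 5\ \mathrm{min})
\end{aligned}
```

Under these assumptions every collection cycle moves gigabytes of data, which is the scale at which per-file bookkeeping through a single database is described as breaking down.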
Summary of the invention
The technical problem to be solved by the present invention is to provide an FTP data collection method based on a distributed message queue that supports distributed FTP data collection, guarantees data consistency, avoids problems such as duplicate collection, missed collection and anomalies, and is easy to maintain and highly scalable.
To solve the above technical problem, the present invention provides an FTP data collection method based on a distributed message queue, comprising the following steps: a) reading an FTP configuration file, scanning the FTP server, and reading information about the FTP files to be collected; b) recording the time point of each scan, and filtering the data according to the time point of the previous scan; c) sending the filtered information about the FTP files to be collected to the distributed message queue ActiveMQ; d) obtaining the file name, server IP, server account and password in the FTP file information from ActiveMQ, establishing an FTP connection according to the server IP, server account and password, entering the directory where the file is located, and collecting the data.
In the above method, the FTP configuration file records the FTP server IP, account, password and directory; step a) connects to the FTP server according to the FTP configuration file, opens the corresponding directory, scans the files in that directory, and reads the FTP file information.
In the above method, step b) records the time point of the previous scan; if the current scan time point is less than or equal to the last recorded time, the current scan is discarded.
In the above method, step d) connects to ActiveMQ over TCP to read data, buffers the collected data directly in memory, and performs synchronization control in memory to handle concurrent collection programs.
In the above method, if data collection fails in step d), the information being processed is written back into the ActiveMQ queue to be retried next time.
Compared with the prior art, the present invention has the following beneficial effects: the FTP data collection method based on a distributed message queue uses a distributed queue as the transaction mechanism, which enables distributed FTP data collection and guarantees data consistency, without duplicate collection, missed collection or anomalies that would delay data processing or make it incorrect. It is widely applicable and practical, is not tied to any version, can be used with any FTP collection setup, is easy to maintain and highly scalable, and can ensure real-time data collection.
Brief description of the drawings
Fig. 1 is an architecture diagram of FTP data collection based on a distributed message queue according to the present invention;
Fig. 2 is a flowchart of FTP data collection based on a distributed message queue according to the present invention.
Detailed description of the invention
The invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is an architecture diagram of FTP data collection based on a distributed message queue according to the present invention; Fig. 2 is a flowchart of FTP data collection based on a distributed message queue according to the present invention.
Referring to Fig. 1 and Fig. 2, the FTP data collection method based on a distributed message queue provided by the present invention comprises the following steps:
Step S1: read the FTP configuration file, scan the FTP server, and read information about the FTP files to be collected;
Step S2: record the time point of each scan, and filter the data according to the time point of the previous scan;
Step S3: send the filtered information about the FTP files to be collected to the distributed message queue ActiveMQ;
Step S4: obtain the file name, server IP, server account and password in the FTP file information from ActiveMQ, establish an FTP connection using the server IP, server account and password, enter the directory where the file is located, and collect the data.
The system architecture of the present invention is built on the ActiveMQ distributed message queue. The high-performance ActiveMQ distributed message queue stores updated FTP file information in real time. The information to be collected over FTP is configured; a scheduled scanning program reads this configuration and writes information about the files to be collected from the FTP servers into the ActiveMQ queue. ActiveMQ messaging is highly available, which ensures that a server failure does not cause data to be missed, and ActiveMQ's consumption control ensures that no message is consumed twice, so no file is collected twice. If the FTP server fails after a message has been consumed and collection cannot complete, the information only needs to be written back into the ActiveMQ queue to be retried next time. The distributed collection programs read messages from ActiveMQ and, using the FTP server information and file directory information in each message, connect to the FTP server and collect the data.
The FTP data collection method based on a distributed message queue provided by the present invention is divided into four main steps: scanning the FTP server, filtering the data, sending the data to ActiveMQ, and FTP collection. The implementation of each step is given below.
1. Scanning the FTP server
Read the FTP configuration file, which contains the configured FTP server IP, account, password and directory.
Connect to the FTP server, open the corresponding directory, and scan the files in that directory.
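The scanning step can be sketched in Python using only the standard library. This is a minimal illustration, not the patent's program: the INI layout of the configuration file, the `FileInfo` record and the helper names are hypothetical, and the listing relies on the FTP MLSD command, which the target server is assumed to support together with the "type" and "modify" facts.

```python
import configparser
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileInfo:
    """One file found on an FTP server (hypothetical record type)."""
    host: str
    user: str
    password: str
    directory: str
    name: str
    modified: datetime  # server-reported modification time, in UTC


def load_servers(text: str) -> list:
    """Parse a hypothetical INI-style config with one section per FTP server."""
    parser = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    parser.read_string(text)
    keys = ("host", "user", "password", "directory")
    return [{k: parser[s][k] for k in keys} for s in parser.sections()]


def scan_directory(ftp, server: dict) -> list:
    """List regular files in the configured directory via the MLSD command.

    `ftp` is assumed to be a logged-in ftplib.FTP (or any object with the same
    `mlsd` method).
    """
    found = []
    for name, facts in ftp.mlsd(server["directory"], facts=["type", "modify"]):
        if facts.get("type") != "file":
            continue  # skip the directory itself, its parent and subdirectories
        # "modify" is YYYYMMDDHHMMSS[.sss] in UTC; drop any fractional seconds
        stamp = datetime.strptime(facts["modify"][:14], "%Y%m%d%H%M%S")
        found.append(FileInfo(server["host"], server["user"], server["password"],
                              server["directory"], name,
                              stamp.replace(tzinfo=timezone.utc)))
    return found
```

In use, `scan_directory` would be called once per configured server on a connection opened with `ftplib.FTP(server["host"])` followed by `login(server["user"], server["password"])`.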
2. Data filtering
To avoid duplicate collection, the time point of the previous scan is recorded. During the next scan, files are filtered so that only those whose time is later than the last recorded time are kept, which ensures that every file scanned is new and avoids repeated collection. The file name, server IP, server account and password are then obtained.
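A minimal sketch of this time-point filter, assuming each file record carries a server-reported modification time in a `modified` attribute. The `ScanFilter` name is hypothetical, and bounding the window on both sides, `last_scan < modified <= scan_time`, is a design choice added here rather than something the text specifies: a file modified while a scan is running falls into the next window instead of being collected twice.

```python
from datetime import datetime


class ScanFilter:
    """Time-watermark filter between successive scans (hypothetical helper).

    The watermark is kept in memory here; a real deployment would persist it
    so that a restart neither re-collects nor skips files.
    """

    def __init__(self, last_scan: datetime):
        self.last_scan = last_scan

    def apply(self, scan_time: datetime, files) -> list:
        """Keep files with last_scan < modified <= scan_time, then advance the watermark.

        A scan whose time point is <= the last recorded one is discarded, as in
        claim 3. `files` is any iterable of objects with a `modified` attribute.
        """
        if scan_time <= self.last_scan:
            return []  # stale or repeated scan: discard it
        fresh = [f for f in files if self.last_scan < f.modified <= scan_time]
        self.last_scan = scan_time
        return fresh
```

Both timestamps must come from the same clock and time zone (for example, UTC times reported by the FTP server); clock skew between the scanner and the server would otherwise shift the window.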
3. Writing data to ActiveMQ
The filtered FTP file information is sent to the distributed message queue ActiveMQ, which has the following features:
1) Clients can be written in multiple languages and protocols. Languages: Java, C, C++, C#, Ruby, Perl, Python, PHP. Application protocols: OpenWire, Stomp, REST, WS-Notification, XMPP, AMQP.
2) Full support for the JMS 1.1 and J2EE 1.4 specifications (persistence, XA messaging, transactions).
3) Support for multiple transport protocols: in-VM, TCP, SSL, NIO, UDP, JGroups, JXTA.
4) Fast message persistence using JDBC together with a high-performance journal.
5) Designed for high-performance clustering, client-server and peer-to-peer communication.
6) Built-in failover mechanism that ensures client high availability: when a broker fails, a standby broker is used automatically.
The program by which the present invention sends data to ActiveMQ is as follows:
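The listing announced here is not included in this text. As a stand-in, the following is a minimal Python sketch of the producer side, not the patent's program. Each filtered file record is serialized as a JSON text message (a hypothetical format) and handed to a `send` callable. With a real ActiveMQ broker, `send` would wrap a client library's send-to-queue call; here an in-process `queue.Queue` stands in for the broker so that the sketch runs on its own.

```python
import json
import queue
from typing import Callable, Iterable


def to_message(record: dict) -> str:
    """Serialize one file-info record as a JSON text message (hypothetical format)."""
    return json.dumps(record, sort_keys=True)


def publish(records: Iterable[dict], send: Callable[[str], None]) -> int:
    """Send each filtered record as one message and return the number sent.

    `send` abstracts the broker: with ActiveMQ it would wrap a client's
    send-to-queue call; in this sketch a queue.Queue's put method stands in.
    """
    sent = 0
    for record in records:
        send(to_message(record))
        sent += 1
    return sent
```

For example, with `broker = queue.Queue()`, calling `publish(records, broker.put)` enqueues one message per record, each carrying the host, account, password, directory and file name that step d) needs.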
4. Consuming data from ActiveMQ and collecting via FTP
The distributed collection programs consume data from ActiveMQ: each reads the file name, server IP, server account and password from a message, establishes an FTP connection using the IP, account and password, enters the directory where the file is located, and collects the data.
The collection program connects to ActiveMQ over TCP to read data, and the data are buffered directly in memory, which ensures that no two collection programs consume the same record. Synchronization control is performed in memory, with concurrency reaching up to one million per second, which keeps collection latency low. The program that reads ActiveMQ data is as follows:
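As with the sending program, the reading program is not included in this text. The sketch below, in Python with the standard library only, is an illustration under stated assumptions rather than the patent's implementation: a `queue.Queue` stands in for the ActiveMQ queue, the `Collector` class and `ftp_fetch` function are hypothetical, and an in-memory set guarded by a lock plays the role of the in-memory synchronization control. A failed collection puts the message back on the queue for a later retry, as in claim 5.

```python
import ftplib
import json
import os
import queue
import threading
from typing import Callable


def ftp_fetch(record: dict, local_dir: str = ".") -> None:
    """Download one file in binary mode (assumes plain, unencrypted FTP)."""
    with ftplib.FTP(record["host"]) as ftp:
        ftp.login(record["user"], record["password"])
        ftp.cwd(record["directory"])  # enter the directory holding the file
        path = os.path.join(local_dir, record["name"])
        with open(path, "wb") as out:  # a failed transfer may leave a partial file
            ftp.retrbinary("RETR " + record["name"], out.write)


class Collector:
    """Consume file-info messages, collect each file once, re-queue failures."""

    def __init__(self, broker: queue.Queue, fetch: Callable[[dict], None] = ftp_fetch):
        self.broker = broker
        self.fetch = fetch
        self.seen = set()             # keys already claimed in this process
        self.lock = threading.Lock()  # guards `seen` across worker threads

    def handle_one(self, timeout: float = 0.1) -> bool:
        """Process one message; return False once the queue stays empty."""
        try:
            body = self.broker.get(timeout=timeout)
        except queue.Empty:
            return False
        record = json.loads(body)
        key = (record["host"], record["directory"], record["name"],
               record.get("modified"))
        with self.lock:
            if key in self.seen:
                return True           # duplicate message: skip, do not re-collect
            self.seen.add(key)
        try:
            self.fetch(record)
        except ftplib.all_errors:
            # Collection failed (e.g. FTP server down): release the key and
            # write the message back so that a later attempt can retry it.
            with self.lock:
                self.seen.discard(key)
            self.broker.put(body)
        return True
```

Several worker threads can call `handle_one` on the same `Collector` in a loop until it returns `False`; `queue.Queue` is thread-safe, and the lock keeps two workers from claiming the same file. The sketch has no retry limit or back-off, so a permanently unreachable server would keep its messages cycling, and the `seen` set grows without bound; both would need handling in practice.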
In summary, the FTP data collection method based on a distributed message queue provided by the present invention uses ActiveMQ as the control point for collection information, guarantees the concurrency and consistency of collection by controlling the message queue, and can serve any solution that collects data via FTP. Its specific advantages are: 1) it solves the duplicate-consumption problem in FTP collection; 2) it provides high concurrency and low latency for distributed collection of big data; 3) it improves collection consistency without affecting overall collection performance; 4) it is widely applicable, practical and not tied to any version, and can be integrated into any current FTP collection program; 5) it is easy to maintain and highly scalable.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Anyone skilled in the art may make minor modifications and improvements without departing from the spirit and scope of the present invention; the scope of protection of the present invention is therefore defined by the claims.

Claims (5)

1. An FTP data collection method based on a distributed message queue, characterized by comprising the following steps:
a) reading an FTP configuration file, scanning the FTP server, and reading information about the FTP files to be collected;
b) recording the time point of each scan, and filtering the data according to the time point of the previous scan;
c) sending the filtered information about the FTP files to be collected to the distributed message queue ActiveMQ;
d) obtaining the file name, server IP, server account and password in the FTP file information from ActiveMQ, establishing an FTP connection according to the server IP, server account and password, and entering the directory where the file is located to collect the data.
2. The FTP data collection method based on a distributed message queue according to claim 1, characterized in that the FTP configuration file records the FTP server IP, account, password and directory, and step a) connects to the FTP server according to the FTP configuration file, opens the corresponding directory, scans the files in that directory, and reads the FTP file information.
3. The FTP data collection method based on a distributed message queue according to claim 1, characterized in that step b) records the time point of the previous scan, and if the current scan time point is less than or equal to the last recorded time, the current scan is discarded.
4. The FTP data collection method based on a distributed message queue according to claim 1, characterized in that step d) connects to ActiveMQ over TCP to read data, buffers the collected data directly in memory, and performs synchronization control in memory to handle concurrent collection programs.
5. The FTP data collection method based on a distributed message queue according to claim 1, characterized in that if data collection fails in step d), the information being processed is written back into the ActiveMQ queue to be retried next time.
CN201610149074.0A 2016-03-16 2016-03-16 Distributed message queue based FTP data collection method Pending CN105827702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610149074.0A CN105827702A (en) 2016-03-16 2016-03-16 Distributed message queue based FTP data collection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610149074.0A CN105827702A (en) 2016-03-16 2016-03-16 Distributed message queue based FTP data collection method

Publications (1)

Publication Number Publication Date
CN105827702A true CN105827702A (en) 2016-08-03

Family

ID=56987887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610149074.0A Pending CN105827702A (en) 2016-03-16 2016-03-16 Distributed message queue based FTP data collection method

Country Status (1)

Country Link
CN (1) CN105827702A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106789324A (en) * 2017-01-09 2017-05-31 上海轻维软件有限公司 FTP distributed acquisition method based on MapReduce
CN109739549A (en) * 2018-12-28 2019-05-10 武汉长光科技有限公司 Equipment performance acquisition method based on microservices
CN110532435A (en) * 2019-08-12 2019-12-03 广州海颐信息安全技术有限公司 Method and device for integrating external system with dynamically extensible privileged account scanning system
CN111526176A (en) * 2020-03-26 2020-08-11 青岛奥利普自动化控制系统有限公司 Data acquisition method and system for KraussMaffei injection molding machine

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101515863A (en) * 2008-02-22 2009-08-26 中国移动通信集团公司 Network data acquiring method, acquisition machine and acquisition system
CN101557295A (en) * 2008-04-10 2009-10-14 中兴通讯股份有限公司 Method for realizing repetitive phone bill document picking on FTP server
CN102118261A (en) * 2009-12-30 2011-07-06 中兴通讯股份有限公司 Method and device for data acquisition, and network management equipment
CN102375837A (en) * 2010-08-19 2012-03-14 中国移动通信集团公司 Data acquiring system and method
CN104486107A (en) * 2014-12-05 2015-04-01 曙光信息产业(北京)有限公司 Log collection device and method

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN106789324A (en) * 2017-01-09 2017-05-31 上海轻维软件有限公司 FTP distributed acquisition method based on MapReduce
CN106789324B (en) * 2017-01-09 2024-03-22 上海轻维软件有限公司 FTP distributed acquisition method based on MapReduce
CN109739549A (en) * 2018-12-28 2019-05-10 武汉长光科技有限公司 Equipment performance acquisition method based on microservices
CN110532435A (en) * 2019-08-12 2019-12-03 广州海颐信息安全技术有限公司 Method and device for integrating external system with dynamically extensible privileged account scanning system
CN110532435B (en) * 2019-08-12 2022-05-17 广州海颐信息安全技术有限公司 Method and device for integrating external system with dynamically extensible privileged account scanning system
CN111526176A (en) * 2020-03-26 2020-08-11 青岛奥利普自动化控制系统有限公司 Data acquisition method and system for KraussMaffei injection molding machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20160803