CN108108226A - A kind of large data files analysis and processing method under cloud computing environment - Google Patents

Info

Publication number
CN108108226A
Authority
CN
China
Prior art keywords
file
analysis
large data
data files
exploration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711380414.1A
Other languages
Chinese (zh)
Inventor
顾佳晨
王振宇
肖楼
邓枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Landocean (beijing) Cloud Technology Co Ltd
Original Assignee
Landocean (beijing) Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Landocean (beijing) Cloud Technology Co Ltd
Priority to CN201711380414.1A
Publication of CN108108226A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504: Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention discloses a method for analyzing and processing large data files in a cloud computing environment. A file upload module first uploads a large exploration and development data file to a cloud storage system. A data analysis and processing module then sends the ID of that data file to a message broker. An agent running in a virtual machine in the cloud computing environment listens for the message, downloads the file with that ID from the cloud storage system into a specified directory in the virtual machine, and opens the professional analysis and processing software installed in the virtual machine. After connecting to the virtual machine through the data analysis and processing module, the user analyzes and processes the data file directly in the opened software. The invention stores large data files in a cloud storage system, shares large professional analysis software among multiple researchers through virtual machines, and automatically downloads large exploration and development data files into the virtual machine for use by the professional analysis and processing software. It has advantages such as fast data processing and high stability.

Description

A method for analyzing and processing large data files in a cloud computing environment
Technical field
The present invention relates to methods for analyzing and processing large data files, and in particular to a method for analyzing and processing large data files in a cloud computing environment. It belongs to the field of cloud computing technology.
Background
Data analysis and processing in the exploration and development field often involves large data files of more than 100 GB, and the analysis tools are large professional software packages that can easily cost millions of yuan. The work is usually carried out on dedicated large machines. This approach is not only costly but also has the drawback that the analysis work cannot be shared among multiple people. With the rapid development of cloud computing technology, placing both the storage and the analysis and processing of data files in the cloud is therefore a reasonable solution: it allows multiple people to collaborate on analysis and research, reduces overall system cost, and makes efficient use of computing resources.
It is therefore important to develop a method that stores large exploration and development data files in a cloud storage system, shares large professional analysis software among multiple researchers through virtual machines, and automatically downloads the large data files into a virtual machine by real-time push technology for use by the analysis software. Such an invention also has important application prospects.
Summary of the invention
To address the drawbacks of existing data analysis and processing technology in the exploration and development field, the present invention provides a method for analyzing and processing large data files in a cloud computing environment. The method stores large exploration and development data files in a cloud storage system, shares the professional exploration and development analysis and processing software among multiple researchers through virtual machines, and automatically downloads the large data files into a virtual machine by real-time push technology for use by that software.
To achieve the above objective, the technical solution adopted by the present invention is:
A method for analyzing and processing large data files in a cloud computing environment, characterized in that the method comprises the following steps:
S1: the file upload module uploads the large exploration and development data file to the cloud storage system;
S2: the professional exploration and development analysis and processing software is installed in a virtual machine in the cloud computing environment;
S3: agent software is installed in each virtual machine;
S4: the user selects a file ID, and the data analysis and processing module then sends the file ID to the message broker;
S5: the agent software in the virtual machine listens on its own message channel in the message broker and, on receiving the message sent by the data analysis and processing module, obtains the ID of the file in the cloud storage system;
S6: the agent software in the virtual machine looks up the file in the cloud storage system by its file ID and, once the file is found, downloads the large exploration and development data file into the directory of the professional exploration and development analysis and processing software in that virtual machine;
S7: the agent software in the virtual machine starts the professional exploration and development analysis and processing software installed in the virtual machine;
S8: the user connects to the virtual machine through the data analysis and processing module and processes the large exploration and development data file in the professional analysis and processing software;
S9: the agent in the virtual machine uploads the processed data file to the cloud storage system.
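The flow of steps S1, S4 to S7 and S9 can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the dictionary standing in for cloud storage, the per-virtual-machine `queue.Queue` standing in for the message broker, the name `vm-1`, and all function names are hypothetical, since the patent does not specify any programming interface.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins; the patent does not define these interfaces.
cloud_storage = {}                   # file ID -> file bytes
broker = {"vm-1": queue.Queue()}     # one message channel per virtual machine
vm_directory = {}                    # files saved inside the virtual machine


def upload(data: bytes) -> str:
    """S1: store a file in cloud storage and return its ID."""
    file_id = str(uuid.uuid4())
    cloud_storage[file_id] = data
    return file_id


def send_file_id(vm_name: str, file_id: str) -> None:
    """S4: the data analysis and processing module publishes the file ID."""
    broker[vm_name].put(file_id)


def agent_step(vm_name: str) -> str:
    """S5 to S7: read one message and download the matching file."""
    file_id = broker[vm_name].get(timeout=1)          # S5: receive the file ID
    vm_directory[file_id] = cloud_storage[file_id]    # S6: download into the VM
    # S7 would start the analysis software here; omitted in this sketch.
    return file_id


def upload_result(file_id: str, processed: bytes) -> str:
    """S9: upload the processed file back to cloud storage."""
    result_id = f"{file_id}-result"
    cloud_storage[result_id] = processed
    return result_id


file_id = upload(b"seismic survey data")
send_file_id("vm-1", file_id)
handled = agent_step("vm-1")
# S8 stand-in: the user's processing is modelled as a trivial transformation.
result_id = upload_result(handled, vm_directory[handled].upper())
```

The sketch keeps each step as its own function so that the division of work between the upload module, the data analysis and processing module, and the agent stays visible.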
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S1, the file upload module first splits the large exploration and development data file into chunks; the client of the file upload module then uploads all chunk files, together with the total number of chunk files, to the server side of the file upload module; the server side of the file upload module then reassembles the chunk files into a single file; finally, the server side of the file upload module calls the cloud storage system interface to write the file into the cloud storage system.
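The client-side part of this step, splitting a file into numbered chunks and knowing the total count to send along, might look like the following sketch. The function names and the tiny chunk size used in the demonstration are assumptions; the text does not prescribe an implementation.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def chunk_count(total_size: int, chunk_size: int) -> int:
    """Number of chunks needed for total_size bytes (integer ceiling division)."""
    return (total_size + chunk_size - 1) // chunk_size


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, data) pairs, numbering chunks from 0 in order.

    Chunks are read lazily, so a very large file is never held in memory at once.
    """
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


# Demonstration on a 10-byte in-memory "file" with a 4-byte chunk size.
payload = b"0123456789"
total = chunk_count(len(payload), 4)
chunks = list(iter_chunks(io.BytesIO(payload), 4))
```

In a real client each yielded chunk would be sent to the server together with `total`, so that the server can tell when the upload is complete.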
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S1, the size of the large exploration and development data file is greater than 100 GB.
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S2, the virtual machine has an operating system installed, including but not limited to: Windows, Linux.
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S2, the professional analysis and processing software includes but is not limited to: EPoffice, Petrel.
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S3, the agent software is a program that can run cross-platform and has the following functions: it runs automatically after the virtual machine starts, connects to the message broker, listens on a message channel, reads messages, obtains the file ID, automatically downloads the file from the cloud storage system, and automatically saves it in a specified directory.
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S4, the data analysis and processing module is a module whose function is to send the ID of the file that the user has selected for analysis to the message channel, in the message broker, of a virtual machine that can process that file.
The aforementioned method for analyzing and processing large data files in a cloud computing environment, characterized in that, in step S4, the data analysis and processing module includes a mapping table between virtual machines and types of large exploration and development data file; after looking up the corresponding virtual machine by data file type, the data analysis and processing module sends a message containing the file ID to the message channel of that virtual machine in the message broker.
Compared with the prior art, the beneficial effects of the present invention are:
(1) large data files are stored in a cloud storage system, large professional analysis software is shared among multiple researchers through virtual machines, and large exploration and development data files are automatically downloaded into the virtual machine by real-time push technology for use by the professional analysis software, providing a one-stop solution centered on exploration and development big data;
(2) the method has advantages such as fast data processing, high stability, strong practicality, and wide applicability.
Description of the drawings
Fig. 1 is a system structure diagram of the method for analyzing and processing large data files in a cloud computing environment according to the present invention;
Fig. 2 is a flow diagram of the method for analyzing and processing large data files in a cloud computing environment shown in Fig. 1.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1 and Fig. 2, the method for analyzing and processing large data files in a cloud computing environment according to the present invention comprises the following steps:
S1: the file upload module uploads the large exploration and development data file to the cloud storage system;
S2: the professional exploration and development analysis and processing software is installed in a virtual machine in the cloud computing environment;
S3: agent software is installed in each virtual machine;
S4: the user selects a file ID, and the data analysis and processing module then sends the file ID to the message broker. The message broker is preferably the RabbitMQ message broker software installed in a separate virtual machine, with one message channel configured for each virtual machine; the exchange mechanism is preferably the direct mode;
S5: the agent software in the virtual machine listens on its own message channel in the message broker and, on receiving the message sent by the data analysis and processing module, obtains the ID of the file in the cloud storage system;
S6: the agent software in the virtual machine looks up the file in the cloud storage system by its file ID and, once the file is found, downloads the large exploration and development data file into the directory of the professional exploration and development analysis and processing software in that virtual machine, for example C:\analysisfile;
S7: the agent software in the virtual machine starts the professional exploration and development analysis and processing software installed in the virtual machine;
S8: the user connects to the virtual machine through the data analysis and processing module and processes the large exploration and development data file in the professional analysis and processing software;
S9: the agent in the virtual machine uploads the processed data file to the cloud storage system.
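The choice of a direct exchange with one channel per virtual machine in step S4 can be pictured with a toy model. In RabbitMQ, a direct exchange delivers a message only to queues whose binding key exactly matches the message's routing key. The class below imitates that rule in plain Python; it is not the RabbitMQ API, and the names `DirectExchange`, `vm-a` and `vm-b` are hypothetical.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy model of a direct exchange: deliver only on an exact routing-key match."""

    def __init__(self) -> None:
        self._bindings = defaultdict(list)   # routing key -> list of bound queues

    def bind(self, routing_key: str) -> deque:
        """Create a queue bound to routing_key and return it."""
        bound_queue = deque()
        self._bindings[routing_key].append(bound_queue)
        return bound_queue

    def publish(self, routing_key: str, body: str) -> int:
        """Deliver body to every queue bound with this key; return the delivery count."""
        queues = self._bindings.get(routing_key, [])
        for bound_queue in queues:
            bound_queue.append(body)
        return len(queues)


exchange = DirectExchange()
vm_a_queue = exchange.bind("vm-a")   # one channel per virtual machine
vm_b_queue = exchange.bind("vm-b")

delivered = exchange.publish("vm-a", "file-id-123")    # reaches only vm-a
unroutable = exchange.publish("vm-c", "file-id-456")   # no binding, so no delivery
```

With this routing rule a file ID sent for one virtual machine is never seen by the agents in the others, which matches the requirement that each agent listens only on its own channel.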
As a preferred solution, in step S1, the file upload module first splits the large exploration and development data file into chunks: the very large data file is cut into multiple subfiles of 20 MB each, and the subfiles are numbered in order. The client of the file upload module then uploads all chunk files, together with the total number of chunk files, to the server side of the file upload module. Each time the server side receives a file, it checks whether the total number of files has been reached; once all files have been uploaded, the server side reassembles the chunk files into a single file. The server side of the file upload module preferably generates the file's ID with a UUID generation algorithm. Finally, the server side calls the cloud storage system interface to write the file into the cloud storage system, using the UUID as the file's identifier. The cloud storage system is preferably Ceph or OpenStack Swift.
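A minimal sketch of the server-side behaviour described here: count received chunks, reassemble them in numbered order once the total is reached, and label the result with a UUID. The class and function names are hypothetical, a dictionary stands in for the Ceph or Swift interface, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
import uuid
from typing import Dict, Optional

CHUNK_SIZE = 20 * 1024 * 1024   # assumption: "20 MB" read as 20 MiB


def chunk_count(total_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed for a file of total_size bytes."""
    return (total_size + chunk_size - 1) // chunk_size


class ChunkAssembler:
    """Collects numbered chunks and reassembles them once all have arrived."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def add(self, index: int, data: bytes) -> Optional[bytes]:
        """Store one chunk; return the whole file once every chunk is present."""
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        self.received[index] = data
        if len(self.received) < self.total_chunks:
            return None   # total not reached yet
        return b"".join(self.received[i] for i in range(self.total_chunks))


def store(cloud_storage: dict, content: bytes) -> str:
    """Write the file under a freshly generated UUID and return that ID."""
    file_id = str(uuid.uuid4())
    cloud_storage[file_id] = content   # stand-in for the cloud storage interface call
    return file_id


assembler = ChunkAssembler(total_chunks=2)
first = assembler.add(1, b"world")     # chunks may arrive out of order
whole = assembler.add(0, b"hello ")
storage: dict = {}
file_id = store(storage, whole)
chunks_for_100_gib = chunk_count(100 * 1024 ** 3)
```

Under the stated size assumption, a 100 GiB file splits into 5,120 chunks, which gives a sense of how many uploads the server has to track per file.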
As a preferred solution, in step S1, the size of the large exploration and development data file is greater than 100 GB.
As a preferred solution, in step S2, the virtual machine has an operating system installed, including but not limited to: Windows, Linux.
As a preferred solution, in step S2, the professional analysis and processing software includes but is not limited to: EPoffice, Petrel.
As a preferred solution, in step S3, the agent software is a program that can run cross-platform and has the following functions: it runs automatically after the virtual machine starts, connects to the message broker, listens on a message channel, reads messages, obtains the file ID, automatically downloads the file from the cloud storage system, and automatically saves it in a specified directory. The agent program is preferably written in Python and registered as a system service so that it starts automatically when the system starts. After starting, the program connects to the message broker using the broker's IP address and port, and then begins listening on the channel named after the virtual machine.
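The listening part of such an agent could be sketched as follows, using `pika`, a third-party Python client for RabbitMQ. The patent names Python and RabbitMQ but no client library, so the use of `pika` is an assumption, as are the function names, the use of the host name as the channel name, and the plain-text message body. Registration as a system service is platform-specific and is not shown.

```python
import socket
from typing import Callable


def queue_name_for_this_vm() -> str:
    """Assumption: the virtual machine's host name doubles as its channel name."""
    return socket.gethostname()


def extract_file_id(body: bytes) -> str:
    """Assumption: the message body is simply the UTF-8 encoded file ID."""
    return body.decode("utf-8").strip()


def run_agent(broker_host: str, broker_port: int,
              handle_file_id: Callable[[str], None]) -> None:
    """Connect to the broker and block, passing each received file ID to the handler."""
    import pika  # imported lazily so the helpers above work without the library

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=broker_host, port=broker_port)
    )
    channel = connection.channel()
    queue_name = queue_name_for_this_vm()
    channel.queue_declare(queue=queue_name)

    def on_message(ch, method, properties, body):
        # The handler would download the file and start the analysis software.
        handle_file_id(extract_file_id(body))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=queue_name, on_message_callback=on_message)
    channel.start_consuming()
```

A caller would supply the download-and-launch logic as `handle_file_id`, for example `run_agent("10.0.0.5", 5672, print)` during testing, where the address is a placeholder and 5672 is RabbitMQ's default AMQP port.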
As a preferred solution, in step S4, the data analysis and processing module is a module whose function is to send the ID of the file that the user has selected for analysis to the message channel, in the message broker, of a virtual machine that can process that file.
As a preferred solution, in step S4, the data analysis and processing module includes a mapping table between virtual machines and types of large exploration and development data file; after looking up the corresponding virtual machine by data file type, the data analysis and processing module sends a message containing the file ID to the message channel of that virtual machine in the message broker. When the user selects the file to be processed, the data analysis and processing module obtains the file's ID from the user's selection and then uses the mapping table between virtual machines and file types to find a virtual machine that can process the file.
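The lookup described in this step amounts to a table from file type to virtual machine followed by a message addressed to that machine's channel. A minimal sketch, in which the table contents, the use of file extensions as the "file type", the virtual machine names, and the plain-text message body are all hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical mapping table: file type (here a file extension) -> virtual machine name.
VM_BY_FILE_TYPE: Dict[str, str] = {
    ".segy": "vm-seismic",
    ".las": "vm-welllog",
}


def route(file_id: str, file_type: str) -> Tuple[str, bytes]:
    """Return (channel name, message body) for the VM that can process this file type."""
    vm_name = VM_BY_FILE_TYPE.get(file_type.lower())
    if vm_name is None:
        raise LookupError(f"no virtual machine registered for file type {file_type!r}")
    # Assumption: the message body is the UTF-8 encoded file ID and nothing else.
    return vm_name, file_id.encode("utf-8")


channel_name, body = route("3f2b9c1e", ".SEGY")
```

The returned channel name would then serve as the routing key when the message is published to the broker, so that only the agent in the matching virtual machine receives the file ID.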
In summary, the present invention stores large exploration and development data files in a cloud storage system, shares large professional analysis software among multiple researchers through virtual machines, and automatically downloads the large data files into a virtual machine by real-time push technology for use by the analysis software, providing a one-stop solution centered on exploration and development big data. It has advantages such as fast data processing, high stability, strong practicality, and wide applicability.
It should be noted that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A method for analyzing and processing large data files in a cloud computing environment, characterized in that the method comprises the following steps:
S1: the file upload module uploads the large exploration and development data file to the cloud storage system;
S2: the professional exploration and development analysis and processing software is installed in a virtual machine in the cloud computing environment;
S3: agent software is installed in each virtual machine;
S4: the user selects a file ID, and the data analysis and processing module then sends the file ID to the message broker;
S5: the agent software in the virtual machine listens on its own message channel in the message broker and, on receiving the message sent by the data analysis and processing module, obtains the ID of the file in the cloud storage system;
S6: the agent software in the virtual machine looks up the file in the cloud storage system by its file ID and, once the file is found, downloads the large exploration and development data file into the directory of the professional exploration and development analysis and processing software in that virtual machine;
S7: the agent software in the virtual machine starts the professional exploration and development analysis and processing software installed in the virtual machine;
S8: the user connects to the virtual machine through the data analysis and processing module and processes the large exploration and development data file in the professional analysis and processing software;
S9: the agent in the virtual machine uploads the processed data file to the cloud storage system.
2. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S1, the file upload module first splits the large exploration and development data file into chunks; the client of the file upload module then uploads all chunk files, together with the total number of chunk files, to the server side of the file upload module; the server side of the file upload module then reassembles the chunk files into a single file; finally, the server side of the file upload module calls the cloud storage system interface to write the file into the cloud storage system.
3. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S1, the size of the large exploration and development data file is greater than 100 GB.
4. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S2, the virtual machine has an operating system installed, including but not limited to: Windows, Linux.
5. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S2, the professional analysis and processing software includes but is not limited to: EPoffice, Petrel.
6. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S3, the agent software is a program that can run cross-platform and has the following functions: it runs automatically after the virtual machine starts, connects to the message broker, listens on a message channel, reads messages, obtains the file ID, automatically downloads the file from the cloud storage system, and automatically saves it in a specified directory.
7. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S4, the data analysis and processing module is a module whose function is to send the ID of the file that the user has selected for analysis to the message channel, in the message broker, of a virtual machine that can process that file.
8. The method for analyzing and processing large data files in a cloud computing environment according to claim 1, characterized in that, in step S4, the data analysis and processing module includes a mapping table between virtual machines and types of large exploration and development data file; after looking up the corresponding virtual machine by data file type, the data analysis and processing module sends a message containing the file ID to the message channel of that virtual machine in the message broker.
CN201711380414.1A 2017-12-20 2017-12-20 A kind of large data files analysis and processing method under cloud computing environment Pending CN108108226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711380414.1A CN108108226A (en) 2017-12-20 2017-12-20 A kind of large data files analysis and processing method under cloud computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711380414.1A CN108108226A (en) 2017-12-20 2017-12-20 A kind of large data files analysis and processing method under cloud computing environment

Publications (1)

Publication Number Publication Date
CN108108226A true CN108108226A (en) 2018-06-01

Family

ID=62210378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711380414.1A Pending CN108108226A (en) 2017-12-20 2017-12-20 A kind of large data files analysis and processing method under cloud computing environment

Country Status (1)

Country Link
CN (1) CN108108226A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789477A (en) * 2011-05-19 2012-11-21 巴比禄股份有限公司 File managing apparatus for processing an online storage service
CN103257958A (en) * 2012-02-16 2013-08-21 中兴通讯股份有限公司 Cloud storage based translating method and system
CN106020902A (en) * 2016-05-31 2016-10-12 浪潮(北京)电子信息产业有限公司 Virtual machine mirror image file management method and system applied to cloud platform

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656682A (en) * 2018-12-03 2019-04-19 中国石油化工股份有限公司 A kind of system and method for the exploration and development big data processing platform based on container technique
CN111125050A (en) * 2019-12-26 2020-05-08 浪潮云信息技术有限公司 CephFS-based file storage method for providing NFS protocol in openstack environment
CN111125050B (en) * 2019-12-26 2023-08-22 浪潮云信息技术股份公司 File storage method based on CephFS to provide NFS protocol in openstack environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180601
