CN114185484A - Method, device, equipment and medium for clustering document storage - Google Patents


Info

Publication number
CN114185484A
Authority
CN
China
Prior art keywords
file
server
user
data
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111297292.6A
Other languages
Chinese (zh)
Inventor
张辉
吴桂荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Centerm Information Co Ltd
Original Assignee
Fujian Centerm Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Centerm Information Co Ltd
Priority to CN202111297292.6A
Publication of CN114185484A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0613Improving I/O performance in relation to throughput
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/178Techniques for file synchronisation in file systems
    • G06F16/1794Details of file format conversion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention provides a method, an apparatus, a device, and a medium for clustering document storage. The method comprises a file storage process and a file downloading process. File storage process: a file uploaded by a user is received through an nginx server and forwarded, according to a load balancing principle, to any one document system server in a back-end document system server cluster; the file is stored across the Hadoop servers of a Hadoop server cluster as distributed data blocks, and the number, order, and storage addresses of the data blocks are recorded by a NameNode component. File downloading process: a file download request from the user is received through the nginx server and forwarded, according to the load balancing principle, to any one document system server in the document system server cluster; the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file according to the storage address of each data block, reassembles the file, and returns it to the user.

Description

Method, device, equipment and medium for clustering document storage
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for clustering document storage.
Background
Conventionally, a document is stored directly in the local storage of a single server. This is single-point storage with only one copy of the file (the file itself), so if the server goes down or its storage is damaged, the file is inevitably lost, which may have irreparable consequences.
To address this, a hot standby server is usually deployed to synchronize file data from the main server in real time, for example by using rsync for file comparison and synchronization, so that the standby server can take over and provide file data service once the main server becomes unavailable or its storage is damaged. However, this prior-art master-slave scheme is not truly real-time: synchronization runs at intervals, so data is still at considerable risk of loss, and the master-slave mode increases data maintenance costs.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method, an apparatus, a device, and a medium for clustering document storage, in which files are stored as data blocks fully distributed across at least two machines of a Hadoop server cluster, each data block has at least two copies, no data synchronization is required, and distributed storage of data is realized in software.
In a first aspect, the present invention provides a method for clustering document storage, including:
a file storage process: receiving a file uploaded by a user through the nginx server, and forwarding the file to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server uniformly stores files into a Hadoop server cluster, where each file is stored as data blocks distributed across the Hadoop servers of the cluster, the number of data blocks, the order of each data block, and the storage addresses are recorded by a NameNode component, and each data block is backed up on at least two Hadoop servers at the same time;
a file downloading process: displaying a file list through the nginx server, receiving a file download request from a user, and forwarding the request to any one document system server in the document system server cluster according to the load balancing principle; the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassembles the file according to the number of data blocks and the order of each data block, and returns the file to the user through the nginx server.
In a second aspect, the present invention provides a document storage clustering apparatus, including:
an nginx service module, configured to receive the file uploaded by the user through the nginx server, display a file list through the nginx server, receive a file download request from the user, and forward the request to any one document system server in the back-end document system server cluster according to a load balancing principle;
a file service module, configured so that, when a file is uploaded, the document system server stores the file as data blocks distributed across the Hadoop servers of a Hadoop server cluster and records the number of data blocks, the order of each data block, and the storage addresses through a NameNode component, with each data block backed up on at least two Hadoop servers at the same time; and so that, when a file is downloaded, the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassembles the file according to the number of data blocks and the order of each data block, and returns the file to the user through the nginx server.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of the first aspect when executing the program.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages: files are stored as data blocks distributed across the Hadoop servers of the Hadoop server cluster, so the file servers are clustered, which raises the number of concurrent users and the throughput the system can support and thereby improves performance; distributed storage of files also ensures high availability and integrity of the data; because the data blocks are stored in a distributed manner, a file cannot be read directly on the Hadoop server cluster and can only be read through the document system server according to the storage addresses, which improves data security; and each data block is backed up on multiple Hadoop servers, which further improves the safety and availability of the data.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
The invention will be further described with reference to the following examples with reference to the accompanying drawings.
FIG. 1 is a schematic block diagram of the system of the present invention;
FIG. 2 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a medium according to a fourth embodiment of the present invention.
Detailed Description
The embodiments of the present application provide a method, an apparatus, a device, and a medium for clustering document storage, in which files are stored as data blocks fully distributed across at least two machines of a Hadoop server cluster, each data block has at least two copies, no data synchronization is required, and files cannot be read directly on the Hadoop server cluster, which protects data security.
The general idea of the technical solution in the embodiments of the present application is as follows: files are stored as data blocks distributed across the Hadoop servers of the Hadoop server cluster, so the file servers are clustered, which raises the number of concurrent users and the throughput the system can support and thereby improves performance; distributed storage of files also ensures high availability and integrity of the data; because the data blocks are stored in a distributed manner, a file cannot be read directly on the Hadoop server cluster and can only be read through the document system server, which improves data security; and each data block is backed up on multiple Hadoop servers, which further improves the safety and availability of the data.
Before the specific embodiments are described, the system framework corresponding to the method of the embodiments of the present application is introduced. As shown in FIG. 1, the system is divided into the following parts:
The nginx server uniformly receives user requests and forwards them to the back-end document system servers.
The document system server cluster comprises at least two document system servers. It uniformly handles users' day-to-day requests, carries the file operation services, and connects to the various data servers according to the nature of the service, such as the Hadoop server cluster, the database servers, the document conversion server cluster, and the redis cache server cluster.
The Hadoop server cluster comprises at least two Hadoop servers and is used for storing file data blocks, ensuring distributed storage of files and high availability of the Hadoop cluster. Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distributed system and can make full use of the cluster for high-speed computation and storage. Hadoop implements a distributed file system, one component of which is HDFS (Hadoop Distributed File System). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with large data sets. HDFS relaxes some POSIX (Portable Operating System Interface) requirements so that data in the file system can be accessed in a streaming manner.
The database main and standby system is used for storing users' service data. Two servers are deployed, a main database server and a standby database server, so that a single point of failure is avoided and high availability is guaranteed.
The document conversion server cluster comprises one or more document conversion servers and is used for deploying the document conversion service that users need in order to preview files.
The redis cache server cluster comprises at least two redis cache servers and is used for cluster-wide sharing of user sessions. It stores user session information, so that no matter which document system server a user's request is forwarded to, that server can read the user's session information from the unified redis cache server cluster, and the request can be handled by any machine in the document system server cluster without forcing the user to log in again. Because the system of the invention is a clustered, distributed architecture, each request of a user may be forwarded by the nginx server to a different document server. If the user logged in and was verified on document server A, and the session information were stored directly on document server A, then when the next request is sent to document server B, which does not hold the user's session information, verification would fail and a re-login message would be returned. Therefore the session information of a logged-in user is stored in the shared redis server cluster, and each document server obtains the user's session information from it, so that sessions are shared across the cluster and users are not forced to log in again.
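The session-sharing argument above can be illustrated with a minimal Python sketch. It is not the patented implementation: a plain dictionary stands in for the redis cache server cluster, a second dictionary stands in for the user database, and the names `SharedSessionStore`, `DocumentServer`, `login`, and `handle` are hypothetical.

```python
import secrets


class SharedSessionStore:
    """Stand-in for the shared redis cache cluster (a plain dict here)."""

    def __init__(self):
        self._sessions = {}

    def put(self, token, user):
        self._sessions[token] = user

    def get(self, token):
        return self._sessions.get(token)


class DocumentServer:
    """Hypothetical document server that keeps no session state locally."""

    def __init__(self, name, store, users):
        self.name = name
        self._store = store
        self._users = users  # username -> password; stands in for the database

    def login(self, username, password):
        # Verify the credentials, then publish the session to the shared store.
        if self._users.get(username) != password:
            return None
        token = secrets.token_hex(16)
        self._store.put(token, username)
        return token

    def handle(self, token):
        # Any server can validate the token because the store is shared.
        user = self._store.get(token)
        if user is None:
            return "re-login required"
        return f"{self.name} served {user}"


store = SharedSessionStore()
users = {"alice": "secret"}
server_a = DocumentServer("A", store, users)
server_b = DocumentServer("B", store, users)

token = server_a.login("alice", "secret")
print(server_b.handle(token))      # request lands on a different server
print(server_b.handle("unknown"))  # no session, so the user must log in again
```

Because neither server holds session state of its own, the load balancer is free to send each request to either of them.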
Embodiment One
As shown in FIG. 2, the present embodiment provides a method for clustering document storage, including:
File storage process: receiving a file uploaded by a user through the nginx server, and forwarding the file to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server uniformly stores files into a Hadoop server cluster, where each file is stored as data blocks distributed across the Hadoop servers of the cluster, the number of data blocks, the order of each data block, and the storage addresses are recorded by a NameNode component, and each data block is backed up on multiple Hadoop servers at the same time. Therefore, when a storage server goes down, its data blocks can be read from other servers, and when the data on one server is damaged, a backup can be found on other servers for recovery.
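A minimal sketch of the storage process described above, in Python. It assumes an in-memory model in which each Hadoop server is a dictionary and the returned list plays the role of the NameNode record; the function names, the rotating placement rule, and the sample file name are hypothetical and are not taken from HDFS or from the patent.

```python
def split_blocks(data, block_size):
    """Cut the file content into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(name, data, block_size, servers, replicas=2):
    """Store each block on `replicas` distinct servers and return the metadata.

    `servers` maps a server name to a dict acting as that server's disk.
    Entry i of the returned list holds the servers that keep block i, so the
    list records the block count, the block order, and the storage addresses.
    """
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replicas")
    names = sorted(servers)
    index = []
    for seq, block in enumerate(split_blocks(data, block_size)):
        # Rotate the starting server so blocks spread across the cluster.
        holders = [names[(seq + r) % len(names)] for r in range(replicas)]
        for holder in holders:
            servers[holder][(name, seq)] = block
        index.append(holders)
    return index


servers = {"hadoop1": {}, "hadoop2": {}, "hadoop3": {}}
index = store_file("report.doc", b"abcdefghij", 4, servers)
print(index)
```

With three servers and two replicas, every block ends up on two different machines, so losing any single server leaves at least one copy of each block.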
File downloading process: displaying a file list through the nginx server, receiving a file download request from a user, and forwarding the request to any one document system server in the document system server cluster according to the load balancing principle; the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassembles the file according to the number of data blocks and the order of each data block, and returns the file to the user through the nginx server. The load balancing principle is the nginx load balancing principle, which offers four configurations: IP hash, polling (round robin), weight, and least connections. The configuration can be modified manually.
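The four load balancing configurations named above can be sketched in Python as follows. These are simplified illustrations of the selection rules and not nginx's actual algorithms; the function names and server names are hypothetical.

```python
import itertools
import zlib


def round_robin(servers):
    """Yield servers in turn (the polling strategy)."""
    return itertools.cycle(servers)


def weighted(servers_with_weights):
    """Yield servers in proportion to integer weights (simple expansion)."""
    expanded = [s for s, w in servers_with_weights for _ in range(w)]
    return itertools.cycle(expanded)


def ip_hash(servers, client_ip):
    """Map a client IP to a fixed server so repeat requests land together."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


docs = ["doc1", "doc2", "doc3"]
rr = round_robin(docs)
print([next(rr) for _ in range(4)])
print(ip_hash(docs, "10.0.0.7"))
print(least_connections({"doc1": 5, "doc2": 2, "doc3": 9}))
```

IP hash keeps a given client on one document server, while the other three strategies spread requests without regard to the client, which is one reason the design stores sessions in a shared cache.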
As a more preferred or specific implementation of this embodiment, the method further includes a data block definition process, a user session management process, a file preview service process, and a data recovery process.
The data block definition process includes:
(1) a manual configuration process, in which the nginx server provides a manual configuration interface through which a user manually configures the number of data blocks of a file, and the data block size is the quotient of the file data size and the number of data blocks. For example, if the number of data blocks is manually configured as 6 and the resulting data block size is 5 MB, the file is divided into 6 data blocks of equal size for storage.
(2) an automatic configuration process, in which the size of the data block is set automatically according to the balance principle between the data transmission time and the addressing time of the disk. The balance principle is that the addressing time is 1% of the data transmission time, that is, the optimal data transmission time $T_c$ is 100 times the average addressing time $T_x$, and the data block size is calculated as:
$$\text{data block size} = V_c \times T_c = V_c \times 100\,T_x$$
where $V_c$ is the prevailing data transmission speed.
In general, the number of data blocks of a file can be configured manually. However, the addressing speed and the transmission speed are both affected by the data block size. If the blocks are too large, the time to transfer the data from the disk becomes much longer than the addressing time, so the program processes each block very slowly. If the blocks are too small, then on the one hand a file is split into a large number of small blocks whose metadata occupies a large amount of memory in Hadoop's NameNode component, and NameNode memory is limited, so this is undesirable; on the other hand, the addressing time increases, and the program spends most of its time locating the start of blocks. The block size is therefore made relatively large to reduce the share of addressing time, so that the time to transfer a file composed of several blocks depends mainly on the disk transfer speed. For this reason, the automatic configuration process sets the block size according to the balance principle between the data transmission time and the addressing time of the disk.
For example, the average addressing time $T_x$ in Hadoop's HDFS is approximately 10 ms. Extensive testing shows that the optimal state is reached when the addressing time is 1% of the transmission time, so the optimal transmission time is $T_c$ = 10 ms / 0.01 = 1000 ms = 1 s. With the transmission speed $V_c$ of current mainstream disks at about 100 MB/s, the optimal block size is 100 MB/s × 1 s = 100 MB, so the block size can be set to 128 MB.
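The two ways of choosing a block size can be checked with a short Python sketch. Two details are assumptions made only for illustration: the manual mode uses ceiling division so that the configured number of blocks always covers the file, and the automatic result is rounded up to the next power of two to arrive at 128 MB, a step the text above does not spell out. The 30 MB file size is also a hypothetical input.

```python
MB = 1024 * 1024


def manual_block_size(file_size, block_count):
    """Manual mode: block size is the file size divided by the block count."""
    # Ceiling division (an assumption) so block_count blocks cover the file.
    return -(-file_size // block_count)


def auto_block_size(avg_seek_s, transfer_bytes_per_s, seek_ratio=0.01):
    """Automatic mode: addressing time is seek_ratio of the transfer time."""
    transfer_time_s = avg_seek_s / seek_ratio
    return transfer_bytes_per_s * transfer_time_s


def round_up_pow2(n):
    """Smallest power of two that is not below n (an assumed rounding rule)."""
    p = 1
    while p < n:
        p *= 2
    return p


print(manual_block_size(30 * MB, 6) // MB)   # block size in MB, manual mode
optimal = auto_block_size(0.010, 100 * MB)   # T_x = 10 ms, V_c = 100 MB/s
print(optimal / MB)                          # optimal block size in MB
print(round_up_pow2(optimal) // MB)          # rounded block size in MB
```

Changing `avg_seek_s` or `transfer_bytes_per_s` shows how faster disks push the balanced block size upward.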
User session management process: before a user uploads or downloads a file, a login request of the user is received through the nginx server and forwarded to any one document system server in the back-end document system server cluster according to the load balancing principle; the document system server obtains the user information from the main and standby database system for verification; if verification succeeds, the user's login information is stored in the redis server cluster as the user's session credential, and a login success message is returned. After login succeeds, the user's file list is usually obtained through the Jackrabbit component and displayed so that the user can select files to download.
The redis server cluster provides cluster-wide sharing of user sessions: no matter which document system server the nginx server forwards the user's login request to, that document system server can read the user's session information from the unified redis cache server cluster.
File preview service process: during the file storage process or the file downloading process, a file preview request of a user is received through the nginx server and forwarded to any one document system server in the back-end document system server cluster according to the load balancing principle; the document system server forwards all data blocks corresponding to the file to the document conversion server cluster according to the load balancing principle, and the document conversion server cluster converts the data blocks into a previewable format and displays the preview.
Data recovery process: as described above, because each data block is backed up on multiple Hadoop servers at the same time, when the data of a Hadoop server in the Hadoop server cluster is damaged, the document system server reads a copy of the damaged data block from other Hadoop servers, according to the storage address of the data block recorded by the NameNode component, to recover the damaged data block.
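The download and data recovery behaviour can be sketched in Python under simplifying assumptions: each Hadoop server is modelled as a dictionary, a damaged block is modelled as a missing entry (a real system would more likely detect damage with checksums), and the `index` list stands in for the storage addresses recorded by the NameNode component. All names are hypothetical.

```python
def read_block(servers, holders, key):
    """Return the first available copy of a block, trying replicas in order."""
    for holder in holders:
        block = servers[holder].get(key)
        if block is not None:
            return block
    raise IOError(f"all replicas of block {key} are lost")


def fetch_file(name, index, servers):
    """Reassemble a file from its blocks in the recorded order."""
    return b"".join(
        read_block(servers, holders, (name, seq))
        for seq, holders in enumerate(index)
    )


def repair(name, index, servers):
    """Copy a healthy replica back onto every server whose copy is missing."""
    for seq, holders in enumerate(index):
        block = read_block(servers, holders, (name, seq))
        for holder in holders:
            servers[holder].setdefault((name, seq), block)


servers = {
    "hadoop1": {("f", 0): b"abcd", ("f", 2): b"ij"},
    "hadoop2": {("f", 0): b"abcd", ("f", 1): b"efgh"},
    "hadoop3": {("f", 1): b"efgh", ("f", 2): b"ij"},
}
index = [["hadoop1", "hadoop2"], ["hadoop2", "hadoop3"], ["hadoop3", "hadoop1"]]

servers["hadoop2"].clear()               # simulate data loss on one server
print(fetch_file("f", index, servers))   # still readable from the replicas
repair("f", index, servers)              # restore the missing copies
```

The read path never needs to know which server failed; it simply falls through to the next recorded address for each block.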
Based on the same inventive concept, the application also provides a device corresponding to the method in the first embodiment, which is detailed in the second embodiment.
Embodiment Two
As shown in FIG. 3, this embodiment provides an apparatus for clustering document storage, including:
an nginx service module, configured to receive the file uploaded by the user through the nginx server, display a file list through the nginx server, receive a file download request from the user, and forward the request to any one document system server in the document system server cluster according to a load balancing principle;
a file service module, configured so that, when a file is uploaded, the document system server stores the file as data blocks distributed across the Hadoop servers of a Hadoop server cluster and records the number of data blocks, the order of each data block, and the storage addresses through a NameNode component, with each data block backed up on multiple Hadoop servers at the same time; and so that, when a file is downloaded, the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassembles the file according to the number of data blocks and the order of each data block, and returns the file to the user through the nginx server.
As a more preferred or specific implementation of this embodiment, the apparatus further includes a data block definition module, a user session management module, a file preview service module, and a data recovery module.
The data block definition module is used for providing the following processes:
(1) a manual configuration process, in which the nginx server provides a manual configuration interface through which a user manually configures the number of data blocks of a file, and the data block size is the quotient of the file data size and the number of data blocks;
(2) an automatic configuration process, in which the size of the data block is set automatically according to the balance principle between the data transmission time and the addressing time of the disk. The balance principle is that the addressing time is 1% of the data transmission time, that is, the optimal data transmission time $T_c$ is 100 times the average addressing time $T_x$, and the data block size is calculated as:
$$\text{data block size} = V_c \times T_c = V_c \times 100\,T_x$$
where $V_c$ is the prevailing data transmission speed.
The user session management module is used for receiving a login request of a user through the nginx server before the user uploads or downloads a file, and forwarding the login request to any one document system server in the document system server cluster at the rear end according to a load balancing principle; the document system server acquires the user information from the main and standby database systems for verification; if the verification is successful, storing the login information of the user in the redis server cluster as a session information certificate of the user, and finally returning login success information;
the redis server cluster is cluster sharing of user session, and the document system server can read session information of the user from the uniform redis cache server cluster no matter which document system server the login request of the user is forwarded to by the nginx server.
The file preview service module is used for receiving, through the nginx server, a file preview request of a user during the file storage process or the file downloading process, and forwarding the file preview request to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server forwards all data blocks corresponding to the file to a document conversion server cluster according to a load balancing principle, and the document conversion server cluster converts the data blocks into a previewable format and displays the preview;
and the data recovery module is used for, when data on a Hadoop server in the Hadoop server cluster is damaged, having the document system server read a copy of the damaged data block from another Hadoop server, according to the storage address of the data block recorded by the NameNode component, and recover the damaged data block.
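The data recovery step can be sketched in a few lines. The sketch assumes a simple in-memory model in which a dictionary of block locations plays the role of the addresses recorded by the NameNode component and `None` marks a damaged copy; it does not use the Hadoop API, and all names are hypothetical.

```python
from typing import Dict, List, Optional

# server name -> {block id -> block bytes, or None if the copy is damaged}
Servers = Dict[str, Dict[str, Optional[bytes]]]


def recover_block(block_id: str,
                  damaged_server: str,
                  locations: Dict[str, List[str]],
                  servers: Servers) -> bytes:
    """Restore a damaged block on `damaged_server` from a healthy replica."""
    for server in locations[block_id]:
        if server == damaged_server:
            continue  # skip the damaged copy itself
        data = servers[server].get(block_id)
        if data is not None:
            servers[damaged_server][block_id] = data  # rewrite the damaged copy
            return data
    raise RuntimeError(f"no healthy replica found for block {block_id}")


servers: Servers = {
    "hadoop-1": {"blk-0": None},        # damaged copy
    "hadoop-2": {"blk-0": b"payload"},  # healthy replica
}
locations = {"blk-0": ["hadoop-1", "hadoop-2"]}

print(recover_block("blk-0", "hadoop-1", locations, servers))  # b'payload'
```

Recovery is only possible because each data block is backed up on at least two servers; with a single copy the function would raise.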
Since the apparatus described in the second embodiment of the present invention is the apparatus used to implement the method of the first embodiment of the present invention, a person skilled in the art can understand the specific structure of the apparatus and its variations on the basis of that method, and the details are therefore not repeated here. All apparatuses used to implement the method of the first embodiment of the present invention fall within the protection scope of the present invention.
Based on the same inventive concept, the application provides an electronic device embodiment corresponding to the first embodiment, which is detailed in the third embodiment.
Embodiment Three
The embodiment provides an electronic device, as shown in fig. 4, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, any implementation of the first embodiment can be carried out.
Since the electronic device described in this embodiment is the device used to implement the method of the first embodiment of the present application, those skilled in the art can understand the specific implementation of the electronic device and its variations on the basis of that method, and how the electronic device implements the method is therefore not described in detail here. Any device used by those skilled in the art to implement the method of the embodiments of the present application falls within the protection scope of the present application.
Based on the same inventive concept, the application provides a storage medium corresponding to the first embodiment, which is described in detail in the fourth embodiment.
Embodiment Four
The present embodiment provides a computer-readable storage medium, as shown in fig. 5, on which a computer program is stored; when the computer program is executed by a processor, any implementation of the first embodiment can be carried out.
The technical solutions provided in the embodiments of the application have at least the following technical effects or advantages: files are stored as data blocks distributed across the Hadoop servers of the Hadoop server cluster, so that the file servers are clustered, which can improve the user concurrency and throughput of the system and greatly improve performance; distributed storage of the files also ensures high availability and integrity of the data; because the data blocks are stored in a distributed manner, a file cannot be read directly on the Hadoop server cluster and can only be read through the document system server according to the storage addresses, which greatly improves data security; and each data block is backed up on multiple Hadoop servers, which improves the security and high availability of the data.
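The block-based storage summarized above can be illustrated with a minimal in-memory sketch. It assumes round-robin placement of replicas and uses a plain list as a stand-in for the records kept by the NameNode component; neither the placement policy nor any of the names below is taken from the application or from the Hadoop API.

```python
from typing import Dict, List, Tuple

Storage = Dict[str, Dict[int, bytes]]  # server name -> {block number -> block bytes}


def store_file(data: bytes, block_size: int, servers: List[str],
               replicas: int = 2) -> Tuple[List[List[str]], Storage]:
    """Split `data` into blocks and place each block on `replicas` distinct servers.

    Returns (index, storage); index[i] lists the servers holding block i, so the
    index records the number of blocks, their order and their storage addresses.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    storage: Storage = {server: {} for server in servers}
    index: List[List[str]] = []
    for block_no, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        # Hypothetical round-robin placement on consecutive servers.
        holders = [servers[(block_no + r) % len(servers)] for r in range(replicas)]
        for server in holders:
            storage[server][block_no] = block
        index.append(holders)
    return index, storage


def read_file(index: List[List[str]], storage: Storage) -> bytes:
    """Reassemble the file in block order, using any server that still holds each block."""
    parts: List[bytes] = []
    for block_no, holders in enumerate(index):
        for server in holders:
            block = storage[server].get(block_no)
            if block is not None:
                parts.append(block)
                break
        else:
            raise RuntimeError(f"block {block_no} is unavailable on all replicas")
    return b"".join(parts)


data = b"hello distributed world"
index, storage = store_file(data, block_size=5, servers=["h1", "h2", "h3"])
storage["h1"].clear()                    # simulate losing one server's data
print(read_file(index, storage) == data)  # True
```

No single server holds the whole file, and the file remains readable after one server is lost, which mirrors the availability and access-control effects claimed above.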
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus or system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (10)

1. A method of document storage clustering, characterized in that the method comprises:
a file storage process: receiving a file uploaded by a user through the nginx server, and forwarding the file to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server stores the files uniformly into a Hadoop server cluster, the files being stored in a distributed manner, in the form of data blocks, on the Hadoop servers of the Hadoop server cluster; the number of data blocks, the order of each data block and the storage addresses are recorded through a NameNode component, and each data block is backed up on at least two Hadoop servers at the same time;
a file downloading process: displaying a file list through the nginx server, receiving a file downloading request of a user, and forwarding the file downloading request to any one document system server in the document system server cluster according to a load balancing principle; the document system server requests the file from the Hadoop server cluster, obtains all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassembles the file according to the number of data blocks and the order of each data block, and returns the file to the user through the nginx server.
2. The method of document storage clustering of claim 1, wherein the method further comprises a data block definition process, comprising:
(1) a manual configuration process: a manual configuration interface is provided by the nginx server for a user to manually configure the number of data blocks of a file, and the size of the data block is the quotient of the file data size and the number of data blocks;
(2) an automatic setting process: the size of the data block is set automatically by balancing the data transmission time against the disk addressing time. The balance principle is that the addressing time is 1 percent of the data transmission time; that is, the optimal data transmission time T_c is 100 times the average addressing time T_x. The size of the data block is calculated as:
$$\text{data block size} = T_c \times V_c = 100 \times T_x \times V_c$$
where V_c is the prevailing data transmission speed.
3. The method of document storage clustering of claim 1, wherein the method further comprises:
a user session management process: before a user uploads or downloads a file, a login request of the user is received through the nginx server and forwarded to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server obtains the user information from the primary and standby database systems for verification; if the verification succeeds, the login information of the user is stored in the redis server cluster as the user's session credential, and login success information is finally returned;
the redis server cluster shares user sessions across the cluster: no matter which document system server the nginx server forwards the user's login request to, that document system server can read the user's session information from the unified redis cache server cluster.
4. The method of document storage clustering of claim 1, wherein the method further comprises:
a file preview service process: during the file storage process or the file downloading process, a file preview request of a user is received through the nginx server and forwarded to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server forwards all data blocks corresponding to the file to a document conversion server cluster according to a load balancing principle, and the document conversion server cluster converts the data blocks into a previewable format and displays the preview;
a data recovery process: when data on a Hadoop server in the Hadoop server cluster is damaged, the document system server reads a copy of the damaged data block from another Hadoop server, according to the storage address of the data block recorded by the NameNode component, and recovers the damaged data block.
5. An apparatus for document storage clustering, characterized in that the apparatus comprises:
the nginx service module, used for receiving the file uploaded by the user through the nginx server, displaying a file list through the nginx server, receiving a file downloading request of the user, and forwarding the file downloading request to any one document system server in the back-end document system server cluster according to a load balancing principle;
the document system service module, used for, when a file is uploaded, storing the file in a distributed manner, in the form of data blocks, on the Hadoop servers of a Hadoop server cluster, and recording the number of data blocks, the order of each data block and the storage addresses through a NameNode component, wherein each data block is backed up on at least two Hadoop servers at the same time; and, when a file is downloaded, having the document system server request the file from the Hadoop server cluster, obtain all data blocks corresponding to the file from the Hadoop servers in a streaming manner according to the storage address of each data block recorded by the NameNode component, reassemble the file according to the number of data blocks and the order of each data block, and return the file to the user through the nginx server.
6. The apparatus of claim 5, further comprising a data block definition module for providing the following processes:
(1) a manual configuration process: a manual configuration interface is provided by the nginx server for a user to manually configure the number of data blocks of a file, and the size of the data block is the quotient of the file data size and the number of data blocks;
(2) an automatic setting process: the size of the data block is set automatically by balancing the data transmission time against the disk addressing time. The balance principle is that the addressing time is 1 percent of the data transmission time; that is, the optimal data transmission time T_c is 100 times the average addressing time T_x. The size of the data block is calculated as:
$$\text{data block size} = T_c \times V_c = 100 \times T_x \times V_c$$
where V_c is the prevailing data transmission speed.
7. The apparatus of claim 5, further comprising:
the user session management module, used for receiving, through the nginx server, a login request of a user before the user uploads or downloads a file, and forwarding the login request to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server obtains the user information from the primary and standby database systems for verification; if the verification succeeds, the login information of the user is stored in the redis server cluster as the user's session credential, and login success information is finally returned;
the redis server cluster shares user sessions across the cluster: no matter which document system server the nginx server forwards the user's login request to, that document system server can read the user's session information from the unified redis cache server cluster.
8. The apparatus of claim 5, further comprising:
the file preview service module, used for receiving, through the nginx server, a file preview request of a user during the file storage process or the file downloading process, and forwarding the file preview request to any one document system server in the back-end document system server cluster according to a load balancing principle; the document system server forwards all data blocks corresponding to the file to a document conversion server cluster according to a load balancing principle, and the document conversion server cluster converts the data blocks into a previewable format and displays the preview;
and the data recovery module, used for, when data on a Hadoop server in the Hadoop server cluster is damaged, having the document system server read a copy of the damaged data block from another Hadoop server, according to the storage address of the data block recorded by the NameNode component, and recover the damaged data block.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202111297292.6A 2021-11-04 2021-11-04 Method, device, equipment and medium for clustering document storage Withdrawn CN114185484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111297292.6A CN114185484A (en) 2021-11-04 2021-11-04 Method, device, equipment and medium for clustering document storage

Publications (1)

Publication Number Publication Date
CN114185484A true CN114185484A (en) 2022-03-15

Family

ID=80601868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111297292.6A Withdrawn CN114185484A (en) 2021-11-04 2021-11-04 Method, device, equipment and medium for clustering document storage

Country Status (1)

Country Link
CN (1) CN114185484A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220315)