CN106776720A - File processing method and device - Google Patents

File processing method and device

Info

Publication number
CN106776720A
Authority
CN
China
Prior art keywords
file
pending
essential information
treatment
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611022842.2A
Other languages
Chinese (zh)
Inventor
卢加磊
余晓兵
唐泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201611022842.2A
Publication of CN106776720A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a file processing method and device. The method comprises: receiving a file processing request; locking the file directory to which the to-be-processed file corresponding to the file processing request belongs, wherein the basic information of the to-be-processed file is stored in a first file system and the file content of the to-be-processed file is stored in a second file system; reading the basic information of the to-be-processed file from the first distributed file system, the basic information including the storage path of the file in the second distributed file system; and processing the file processing request according to the basic information of the to-be-processed file. Embodiments of the present invention can improve file processing efficiency and thereby the access performance of online applications; they can also support concurrent access and thereby improve the throughput of online applications.

Description

File processing method and device
Technical field
The present invention relates to the technical field of storage, and in particular to a file processing method and a file processing device.
Background
With the development of Internet technology and the arrival of the big data era, various cloud computing services have emerged. Current cloud computing services can store massive numbers of files in a distributed file system and provide file retrieval services to applications. For online applications that require real-time processing, the file retrieval performance of the distributed file system directly affects the response time of their requests, which places higher demands on that retrieval performance.
HDFS (Hadoop Distributed File System) is highly fault tolerant. It spreads data across many computing devices in the form of one or more replicas, can store massive amounts of data with high reliability, provides fast and scalable access to data, and suits a write-once, read-many access pattern.
In the course of implementing the present invention, the inventors found that existing HDFS obtains files by enumerating a file list. Specifically, the HDFS name node maintains the mapping between file directories and data blocks and the mapping between data blocks and data nodes. After receiving a file access request from a client, it must therefore look up the list of data blocks that make up the requested file from the directory-to-block mapping, look up which data nodes store those blocks from the block-to-data-node mapping, and then read the file's data from the data nodes found. The process by which existing HDFS obtains a file is thus relatively cumbersome, which reduces file retrieval efficiency and in turn the response performance of online applications.
The inventors also found that existing HDFS suits only write-once, query-many scenarios and does not support concurrent writes, which limits the throughput of online applications.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a file processing method and a file processing device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a file processing method is provided, comprising:
receiving a file processing request;
locking the file directory to which the to-be-processed file corresponding to the file processing request belongs, wherein the basic information of the to-be-processed file is stored in a first file system and the file content of the to-be-processed file is stored in a second file system;
reading the basic information of the to-be-processed file from the first distributed file system, the basic information including the storage path of the file in the second distributed file system; and
processing the file processing request according to the basic information of the to-be-processed file.
Optionally, the file processing request includes a file download request, and the step of processing the file processing request according to the basic information of the to-be-processed file comprises:
reading the file content of the to-be-processed file from the second distributed file system according to the storage path; and
downloading the file content of the to-be-processed file.
Optionally, the file processing request includes a file deletion request, and the step of processing the file processing request according to the basic information of the to-be-processed file comprises:
deleting the file content of the to-be-processed file in the second distributed file system according to the storage path; and
deleting the basic information of the to-be-processed file in the first distributed file system.
Optionally, the file processing request includes a file status query request, the basic information further includes file status information, and the step of processing the file processing request according to the basic information of the to-be-processed file comprises:
obtaining the file status information from the basic information of the to-be-processed file and returning the file status information as the processing result.
Optionally, the file processing request includes a file upload request, the basic information further includes an upload-complete flag, and the step of processing the file processing request according to the basic information of the to-be-processed file comprises:
judging, according to the upload-complete flag in the basic information, whether the upload of the to-be-processed file has been completed; and
when it is determined that the upload of the to-be-processed file has been completed, returning a processing result indicating that the upload is complete.
Optionally, the basic information further includes a processing status flag, and the step of processing the file processing request according to the basic information of the to-be-processed file further comprises:
when it is determined that the upload of the to-be-processed file has not been completed, judging, according to the processing status flag in the basic information, whether the to-be-processed file has a processing exception;
when it is determined that the to-be-processed file has a processing exception, performing exception handling on the to-be-processed file in the second file system;
after the exception handling is completed, judging, according to the storage state of the to-be-processed file in the second file system, whether the to-be-processed file has been uploaded successfully; and
when it is determined that the to-be-processed file has been uploaded successfully, returning a processing result indicating a successful upload.
Optionally, the step of processing the file processing request according to the basic information of the to-be-processed file further comprises:
when it is determined that the to-be-processed file has no processing exception, or that the to-be-processed file has not been uploaded successfully, setting the processing status flag in the basic information to abnormal, writing the to-be-processed file to the second file system, and setting the processing status flag in the basic information to normal after the write succeeds; and
after the processing status flag in the basic information is set to normal, returning a processing result indicating a successful upload.
Optionally, the method further comprises:
when the basic information of the to-be-processed file does not exist in the first distributed file system, setting the processing status flag in the basic information to abnormal, writing the to-be-processed file to the second file system, and setting the processing status flag in the basic information to normal after the write succeeds; and
after the processing status flag in the basic information is set to normal, returning a processing result indicating a successful upload.
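The upload branches described above can be followed more easily as a small decision procedure. The following is a minimal sketch only, with two in-memory dictionaries standing in for the first (metadata) and second (content) file systems. The field names `upload_done` and `status`, the path layout, and the decision to set the upload-complete flag after a successful write are assumptions for illustration; the exception handling itself is not detailed in the text and is left as a placeholder.

```python
ABNORMAL, NORMAL = "abnormal", "normal"

def write_content(meta, content, key, data):
    """Mark the record abnormal, write the content, then mark it normal."""
    path = f"/files/{key}"  # hypothetical path layout
    meta[key] = {"path": path, "status": ABNORMAL, "upload_done": False}
    content[path] = data  # a crash at this point would leave status == ABNORMAL
    meta[key]["status"] = NORMAL
    meta[key]["upload_done"] = True  # assumption: flag is set after a successful write
    return "upload succeeded"

def handle_upload(meta, content, key, data):
    info = meta.get(key)
    if info is None:
        # No basic information yet: write the file from scratch.
        return write_content(meta, content, key, data)
    if info["upload_done"]:
        return "upload completed"
    if info["status"] == ABNORMAL:
        # Placeholder for the exception handling in the second file system.
        # Afterwards the storage state decides whether the upload succeeded;
        # here "stored" simply means the path exists in the content store.
        if info["path"] in content:
            return "upload succeeded"
    # No exception, or the earlier upload did not succeed: write again.
    return write_content(meta, content, key, data)
```

Setting the status flag to abnormal before the write and to normal after it means that an interrupted write is detectable on the next request for the same file.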
Optionally, the method further comprises:
receiving a random scan request, the random scan request carrying the primary keys that require random scanning;
reading, from the first distributed file system, the storage path in the second distributed file system of the file corresponding to each primary key that requires random scanning;
obtaining, according to the storage path, the file content of the file corresponding to the primary key from the second distributed file system; and
scanning the obtained file content with a scanning tool.
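The random scan steps amount to a keyed lookup followed by a fetch and a scan. A minimal sketch, assuming dictionaries in place of the two file systems and an arbitrary callable `scan` in place of the scanning tool (both are illustrative stand-ins, not part of the original text):

```python
def random_scan(meta, content, keys, scan):
    """Scan only the files whose primary keys are listed in the request."""
    results = {}
    for key in keys:
        info = meta.get(key)  # look up the storage path by primary key
        if info is None:
            continue  # assumption: keys without basic information are skipped
        data = content[info["path"]]  # fetch the content from the second store
        results[key] = scan(data)
    return results
```

Because each file is reached directly through its key, the scan touches only the requested files and never enumerates a directory.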
According to another aspect of the present invention, a file processing device is provided, comprising:
a request receiving module, configured to receive a file processing request;
a directory locking module, configured to lock the file directory to which the to-be-processed file corresponding to the file processing request belongs, wherein the basic information of the to-be-processed file is stored in a first file system and the file content of the to-be-processed file is stored in a second file system;
an information obtaining module, configured to read the basic information of the to-be-processed file from the first distributed file system, the basic information including the storage path of the file in the second distributed file system; and
a request processing module, configured to process the file processing request according to the basic information of the to-be-processed file.
According to the file processing method and device of the embodiments of the present invention, when there is a need to process (for example, access) a file, the storage path of the file in the second distributed file system can be obtained from the first distributed file system. Compared with existing HDFS, which obtains files by enumerating a file list, this allows the to-be-processed file to be located quickly, which improves file processing efficiency and in turn the access performance of online applications.
In addition, embodiments of the present invention can lock the file directory to which the to-be-processed file belongs. In a concurrent access scenario (that is, multiple users accessing at the same time), this locking ensures that only one user accesses a distributed file at any given moment. It therefore effectively resolves the access conflicts caused by multi-user access and guarantees the consistency of distributed files, so embodiments of the present invention can support concurrent access and thereby improve the throughput of online applications.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of optional embodiments. The drawings serve only to illustrate the optional embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 3 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 4 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 5 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 6 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 7 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention;
Fig. 8 shows a flowchart of the steps of a file processing method according to an embodiment of the present invention; and
Fig. 9 shows a schematic structural diagram of a file processing device according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Embodiments of the present invention store files using a first distributed file system and a second distributed file system. The first distributed file system can store the basic information of a file, such as the file primary key (key), the storage path of the file in the second distributed file system, the file size, the file name and file status information; the second distributed file system can store the file content. Thus, when there is a need to process (for example, access) a file, the storage path of the file in the second distributed file system can be obtained from the first distributed file system. Compared with existing HDFS, which obtains files by enumerating a file list, this allows the to-be-processed file to be located quickly, which improves file processing efficiency and in turn the access performance of online applications.
In addition, embodiments of the present invention can lock the file directory to which the to-be-processed file belongs. In a concurrent access scenario (that is, multiple users accessing at the same time), this locking ensures that only one user accesses a distributed file at any given moment. It therefore effectively resolves the access conflicts caused by multi-user access and guarantees the consistency of distributed files, so embodiments of the present invention can support concurrent access and thereby improve the throughput of online applications.
Referring to Fig. 1, a flowchart of the steps of a file processing method according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 101: receive a file processing request;
Embodiments of the present invention can be applied in a server-side file processing system. The file processing system can provide file processing services to clients based on the stored distributed files, and can also maintain the stored distributed files according to service requirements.
In practical applications, the file processing request sent by a client can be received. Optionally, the file processing request may include at least one of the following requests:
File upload request: upload the file at a preset path to the distributed file system; the processing result may include upload completed, upload succeeded, upload failed, and so on;
File status query request: query the file status information of a file; the processing result may be the file status information obtained by the query;
File download request: download a file; the processing result may be the file obtained by reading;
File deletion request: delete a file; the processing result may be deletion succeeded or deletion failed.
It will be appreciated that the above file processing requests are only optional embodiments. In practice, those skilled in the art can use various file processing requests according to actual application requirements, and embodiments of the present invention place no limitation on the specific file processing request.
In practical applications, a corresponding API (Application Programming Interface) can be provided for each kind of file processing request, and all the API interfaces can be integrated into an SDK (Software Development Kit) for clients to call and thereby generate the corresponding file processing requests. Optionally, the API interfaces may be based on HTTP (HyperText Transfer Protocol) for clients to call.
Step 102: lock the file directory to which the to-be-processed file corresponding to the file processing request belongs; the basic information of the to-be-processed file is stored in the first file system, and the file content of the to-be-processed file is stored in the second file system;
The locking ensures that only one user accesses a distributed file at any given moment, so it effectively resolves the access conflicts caused by multi-user access and guarantees the consistency of distributed files.
The lock in embodiments of the present invention may be a distributed lock. The distributed lock may offer advantages such as high performance, deadlock avoidance and support for lock re-entry, so that the distributed lock does not become a performance bottleneck of the system and so that other nodes are not blocked forever when the node holding the lock goes down.
In an optional embodiment of the present invention, the file directory to which the to-be-processed file corresponding to the file processing request belongs can be locked through ZooKeeper. ZooKeeper is a distributed, open-source coordination service for distributed applications that can provide consistency services for them. ZooKeeper comprises a simple set of primitives and provides Java and C interfaces. The ZooKeeper code release provides an interface for distributed exclusive locks. In one application example of the present invention, suppose node A requests the lock and obtains ID1 by registering with ZooKeeper, and node B requests the lock and obtains ID2 by registering with ZooKeeper. Node A can obtain the IDs of all nodes, determine that its own ID is the smallest, and thus acquire the lock. Node B can obtain the IDs of all nodes, determine that its own ID is not the smallest, and then listen for change events on the node with the largest ID smaller than its own. Further, suppose that after acquiring the lock, node A processes the file processing request and releases the lock when processing is complete, that is, deletes its own node. After receiving the change event for node A, node B can acquire the lock if it determines that its own ID is now the smallest. Node A and node B may be nodes used to process file processing requests.
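The decision each node makes in this example can be captured without a ZooKeeper connection. The sketch below is a pure function over the registered IDs and is illustrative only; with a real ZooKeeper ensemble the IDs would typically come from ephemeral sequential znodes, and the function and parameter names here are hypothetical.

```python
def lock_decision(my_id, all_ids):
    """Return (holds_lock, id_to_watch) for a node in an ID-ordered lock queue."""
    smaller = [i for i in all_ids if i < my_id]
    if not smaller:
        return True, None  # smallest ID: this node holds the lock
    # Otherwise watch the closest predecessor, i.e. the largest smaller ID.
    return False, max(smaller)
```

Watching only the closest predecessor means that a release wakes a single waiting node rather than every waiter at once.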
In another optional embodiment of the present invention, the file directory to which the to-be-processed file corresponding to the file processing request belongs can be locked through Redis. Redis is an open-source, network-capable key-value database written in ANSI C that can run in memory and can also persist data, with log-style persistence. In one application example of the present invention, the lock can be set with setnx(key, current time + timeout). If the set succeeds, the lock is acquired directly. If the set fails, the value v1 of the key (its expiry time) is read and compared with the current time to see whether the lock has timed out. If it has timed out (meaning the node holding the lock has gone down), v2 = getset(key, current time + timeout) is executed and v2 is compared with v1: if they are equal, locking succeeds; otherwise locking fails, and the node waits for a period of time (200 ms) before retrying.
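The setnx/getset sequence can be traced step by step with a small in-memory stand-in. This is a sketch under stated assumptions: `FakeStore` is a hypothetical class that mimics only the three commands named in the text and is not a Redis client, and the `now` parameter exists so the timing can be controlled in an example.

```python
import time

class FakeStore:
    """In-memory stand-in exposing setnx, get and getset semantics."""
    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def getset(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        return old

def try_lock(store, key, timeout, now=None):
    """One locking attempt; the caller retries (e.g. after 200 ms) on False."""
    now = time.time() if now is None else now
    expires = now + timeout
    if store.setnx(key, expires):
        return True  # nobody held the lock
    v1 = store.get(key)
    if v1 is not None and v1 < now:  # the holder's lock has expired
        v2 = store.getset(key, expires)
        return v2 == v1  # equal means no other node took over in between
    return False
```

The comparison of v2 with v1 is what prevents two nodes that both notice the expiry from both believing they hold the lock: only the first getset returns the old expiry value.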
It will be appreciated that locking the file directory to which the to-be-processed file belongs through ZooKeeper or Redis is only an optional embodiment. In practice, those skilled in the art can use other locking tools as required by the actual application, such as memcached. memcached is a distributed in-memory caching system with a built-in add function, and a distributed lock can be implemented using the characteristics of the add function.
Step 103: read the basic information of the to-be-processed file from the first distributed file system; the basic information may include the storage path of the file in the second distributed file system;
In embodiments of the present invention, the first distributed file system and the second distributed file system are different distributed file systems, and those skilled in the art can select any two distributed file systems as the first and the second according to actual application requirements. For example, they can be selected from GFS (Google File System), HBase (Hadoop Database), HDFS, Lustre (a blend of Linux and Cluster), Ceph (a petabyte-scale distributed file system for Linux), GridFS, MogileFS (an efficient automatic file backup component), TFS (Taobao File System), FastDFS and so on.
Optionally, the first distributed file system and the second distributed file system may belong to the same distributed system infrastructure, which improves the performance of that infrastructure and also avoids compatibility problems between the two file systems and the infrastructure. For example, embodiments of the present invention can use HBase and HDFS from Hadoop as the first and second distributed file systems: the file content can be stored in HDFS to support large files and massive numbers of files, while the basic information of files can be stored in HBase. In one application example of the present invention, suppose 15,000,000 files need to be stored and the basic information of each file is within 1 KB; HBase then needs about 15 GB of storage. Suppose the file content of the 15,000,000 files totals 50 TB; with the default of 2 backup copies, HDFS needs about 150 TB of storage.
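The storage figures in this example follow from simple arithmetic, worked through below. The sketch uses decimal units (1 KB = 1,000 bytes) and reads "2 backup copies" as two copies in addition to the original, which is the interpretation that yields the stated 150 TB; both points are assumptions about the example rather than statements in the text.

```python
files = 15_000_000
meta_bytes = files * 1_000   # at most about 1 KB of basic information per file
meta_gb = meta_bytes / 1e9   # metadata storage in GB

content_tb = 50              # total file content in TB
copies = 1 + 2               # the original plus two backup copies
hdfs_tb = content_tb * copies

print(f"metadata: {meta_gb:.0f} GB, content with copies: {hdfs_tb} TB")
```

The metadata store is therefore about four orders of magnitude smaller than the content store, which is what makes key-based lookup in it cheap.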
In this way, embodiments of the present invention can store massive numbers of files while using Hadoop MapReduce to perform full and random rescans of files. In existing solutions, HDFS obtains files by enumerating a file list and so cannot perform random scans, while the long data access times of existing HBase mean that one iteration of a full scan takes more than a week; embodiments of the present invention can provide both fast full rescan and random rescan capabilities. It will be appreciated that having the first distributed file system and the second distributed file system belong to the same distributed system infrastructure is only an optional embodiment; a first and a second distributed file system that do not belong to the same infrastructure also fall within the scope of protection of the embodiments of the present invention.
In practical applications, the request may carry information about the to-be-processed file, such as the file primary key (key), the file name, or the MD5 (Message Digest Algorithm 5) of the file. The basic information stored in the first distributed file system may include the file primary key (key), the storage path of the file in the second distributed file system, and so on. Because a file usually has a unique key, the other basic information of the to-be-processed file, such as its storage path in the second distributed file system, file size, file name and file status information, can be read from the first distributed file system according to its primary key. Note that when the request carries information such as the file name or the MD5 of the to-be-processed file, an operation such as hashing can be applied to that information to convert it into the file primary key; embodiments of the present invention place no limitation on the specific conversion process.
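Since the text leaves the conversion to a primary key open, the following is just one possible sketch. It hashes the file name together with the file's MD5 using SHA-256 from the Python standard library; the choice of SHA-256, the separator, and the function name `file_key` are all assumptions made for illustration.

```python
import hashlib

def file_key(filename: str, content_md5: str) -> str:
    """Derive a fixed-length primary key from request fields (illustrative)."""
    raw = f"{filename}\n{content_md5}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Any deterministic function with a low collision rate would serve the same purpose, since the key only needs to identify the file's record in the first distributed file system.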
Step 104: process the file processing request according to the basic information of the to-be-processed file.
In embodiments of the present invention, the basic information of the to-be-processed file serves as the basis for processing the file processing request. For example, when the file processing request is a file download request, the file content of the to-be-processed file can be read from the second distributed file system according to the storage path in the basic information and then downloaded.
Optionally, after the file processing request has been processed, the file directory to which the to-be-processed file belongs can be unlocked so that other file processing requests can access that directory. It will be appreciated that embodiments of the present invention place no limitation on the specific unlocking process or its timing.
In summary, with the file processing method of the embodiments of the present invention, when there is a need to process (for example, access) a file, the storage path of the file in the second distributed file system can be obtained from the first distributed file system. Compared with existing HDFS, which obtains files by enumerating a file list, this allows the to-be-processed file to be located quickly, which improves file processing efficiency and in turn the access performance of online applications.
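Steps 101 to 104, together with the optional unlock, can be put together in a compact sketch. Everything below is illustrative: dictionaries stand in for the two distributed file systems, an in-process `threading.Lock` per directory stands in for the distributed lock, and the request fields and handler names are hypothetical.

```python
import threading
from collections import defaultdict

_dir_locks = defaultdict(threading.Lock)  # stand-in for a distributed lock
_registry_guard = threading.Lock()        # protects creation of per-directory locks

def process_request(meta, content, request, handlers):
    # Step 101 has already happened: `request` is the received request.
    with _registry_guard:
        lock = _dir_locks[request["directory"]]
    with lock:  # step 102: lock the directory; released when the block exits
        info = meta.get(request["key"])      # step 103: read the basic information
        handler = handlers[request["type"]]
        return handler(info, content)        # step 104: process using that information

def query_status(info, content):
    return info["state"] if info else "not found"

def read_content(info, content):
    return content[info["path"]] if info else None
```

A caller would register one handler per request type, for example `{"status": query_status, "download": read_content}`, and pass each incoming request through `process_request`.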
Referring to Fig. 2, a flowchart of the steps of a file processing method according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 201: receive a file download request;
Step 202: lock the file directory to which the to-be-downloaded file corresponding to the file download request belongs; the basic information of the to-be-downloaded file is stored in the first file system, and the file content of the to-be-downloaded file is stored in the second file system;
Step 203: read the basic information of the to-be-downloaded file from the first distributed file system; the basic information may include the storage path of the file in the second distributed file system;
Step 204: read the file content of the to-be-processed file from the second distributed file system according to the storage path;
Step 205: download the file content of the to-be-processed file.
A file download request can be used to download the file with a given key; the processing result may be the file obtained by reading, which is returned. Optionally, the client can generate the file download request through the API interface corresponding to file download requests.
In an optional embodiment of the present invention, before step 204 is performed, whether the upload of the to-be-downloaded file has been completed can be judged according to the upload-complete flag in the basic information. If so, step 204 can be performed; otherwise a processing result such as download failed or file does not exist can be returned directly. For example, after a file download request is received, the file directory to which the to-be-downloaded file belongs can be locked through ZooKeeper, and basic information such as the storage path of the to-be-downloaded file in the second distributed file system and the file status information can be read from HBase. Because the file status information includes the upload-complete flag, the check described above can then be applied. This check based on the upload-complete flag yields a processing result quickly when the to-be-downloaded file has not finished uploading, and therefore improves the efficiency of processing download requests.
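The download path with the upload-complete check can be sketched as follows. Dictionaries again stand in for the two file systems, the field name `upload_done` is an assumption, and directory locking (step 202) is omitted to keep the example short.

```python
def download(meta, content, key):
    """Return (data, result) for a download request keyed by primary key."""
    info = meta.get(key)  # step 203: read the basic information
    if info is None or not info.get("upload_done"):
        # Fail fast without touching the content store.
        return None, "download failed or file does not exist"
    data = content[info["path"]]  # step 204: read by storage path
    return data, "ok"             # step 205: hand the content back
```

The early return is the efficiency gain the text describes: an incomplete upload is detected from the metadata alone.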
Referring to Fig. 3, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 301, receive a file deletion request;
Step 302, lock the file directory to which the file to be deleted, corresponding to the file deletion request, belongs; the essential information of the file to be deleted is stored in the first file system, and the file content of the file to be deleted is stored in the second file system;
Step 303, read the essential information of the file to be deleted from the first distributed file system; the essential information may include: the storage path of the file in the second distributed file system;
Step 304, according to the storage path, delete the file content of the file to be deleted from the second distributed file system;
Step 305, delete the essential information of the file to be deleted from the first distributed file system.
A file deletion request may be used to delete the stored content (including the essential information and the file content) of the file corresponding to a given key, and the processing result may be deletion success or deletion failure. Optionally, the client may generate the file deletion request through the API corresponding to file deletion requests.
In an optional embodiment of the invention, when the essential information of the file to be deleted cannot be read from the first distributed file system, this indicates that the file to be deleted does not exist, so a processing result such as deletion failure may be obtained directly.
The embodiment of the invention does not limit the execution order of step 304 and step 305; that is, the two steps may be performed in parallel, with step 304 first, or with step 305 first.
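A minimal sketch of steps 303 to 305 follows, assuming plain dictionaries as hypothetical stand-ins for the two file systems (metadata for the first, content for the second). Directory locking (step 302) is omitted for brevity, and the boolean return value is an assumed encoding of "deletion success" and "deletion failure".

```python
def delete_file(key, metadata, content):
    """Delete both the content and the essential information for a key.

    Returns True on success, False when the essential information cannot
    be read (the file to be deleted does not exist).
    """
    info = metadata.get(key)            # step 303: read essential information
    if info is None:
        return False                    # file does not exist: deletion fails
    content.pop(info["path"], None)     # step 304: delete the file content
    del metadata[key]                   # step 305: delete the essential information
    return True


# Usage with hypothetical data
metadata = {"k1": {"path": "/store/k1"}}
content = {"/store/k1": b"payload"}
first_result = delete_file("k1", metadata, content)
second_result = delete_file("k1", metadata, content)
```

Steps 304 and 305 are independent here, which reflects the statement that their order is not limited.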
Referring to Fig. 4, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 401, receive a file status query request;
Step 402, lock the file directory to which the file to be queried, corresponding to the file status query request, belongs; the essential information of the file to be queried is stored in the first file system, and the file content of the file to be queried is stored in the second file system;
Step 403, read the essential information of the file to be queried from the first distributed file system; the essential information may include: file status information;
Step 404, obtain the file status information from the essential information of the file to be queried, and return the file status information as the processing result.
A file status query request may be used to query the file status information of a given key, and the processing result may be the file status information found. Optionally, the client may generate the file status query request through the API corresponding to file status requests.
In an optional embodiment of the invention, the file status information includes at least one of: a processing status indicator, a file length, an uploaded length, an upload completion mark and a storage offset. The upload completion mark may indicate whether the upload of the file has been completed, and the processing status indicator may indicate whether an exception occurred during processing such as uploading or downloading of the file; optionally, both may use true and false, or 1 and 0, to represent the corresponding state. The file length may indicate the size in bytes of the file, and the uploaded length may indicate the size in bytes of the part of the file already uploaded; generally, the uploaded length is less than or equal to the file length. The storage offset may indicate the address offset of the data block corresponding to the file. It is understood that any file status information that can characterize the state of a file falls within the protection scope of the embodiment of the invention.
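The file status information listed above could be modelled as a small record. The sketch below is illustrative only: the class name FileStatus, the field names, and the helper remaining_bytes are assumptions introduced here, not names from the source.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    """Hypothetical record for the file status information."""
    processing_abnormal: bool   # processing status indicator (True = exception)
    file_length: int            # size of the whole file, in bytes
    uploaded_length: int        # bytes uploaded so far, normally <= file_length
    upload_complete: bool       # upload completion mark
    storage_offset: int         # address offset of the file's data block

    def remaining_bytes(self) -> int:
        """Bytes still to be uploaded; never negative."""
        return max(self.file_length - self.uploaded_length, 0)


status = FileStatus(
    processing_abnormal=False,
    file_length=100,
    uploaded_length=40,
    upload_complete=False,
    storage_offset=0,
)
```

A record like this is what step 404 would return as the processing result of a status query.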
Referring to Fig. 5, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 501, receive a file upload request;
Step 502, lock the file directory to which the pending file corresponding to the file upload request belongs; the essential information of the pending file is stored in the first file system, and the file content of the pending file is stored in the second file system;
Step 503, read the essential information of the pending file from the first distributed file system; the essential information may include: an upload completion mark and the storage path of the file in the second distributed file system;
Step 504, according to the upload completion mark in the essential information, judge whether the upload of the pending file has been completed;
Step 505, when it is determined that the upload of the pending file has been completed, return the processing result of completed upload.
A file upload request may be used to upload the file at a preset path to the location of a key in the distributed file system, and the processing result may include: upload completed, upload succeeded, upload failed, and so on. Optionally, the client may generate the file upload request through the API corresponding to file upload requests.
The embodiment of the invention may judge, according to the upload completion mark in the essential information, whether the upload of the pending file has been completed; if so, the pending file already exists in the distributed file system, and the processing result of completed upload can be obtained directly. Compared with the traditional scheme, which writes the pending file at the preset path into the second distributed file system, this processing based on the upload completion mark can improve the processing efficiency of file upload requests.
In an optional embodiment of the invention, the essential information may also include a processing status indicator, and the method may also include: when it is determined that the upload of the pending file has not been completed, judging, according to the processing status indicator in the essential information, whether the pending file has a processing exception; when it is determined that the pending file has a processing exception, performing exception handling for the pending file on the second file system; after the exception handling is completed, judging, according to the storage state of the pending file in the second file system, whether the pending file has been uploaded successfully; and when it is determined that the pending file has been uploaded successfully, returning the processing result of successful upload. In other words, this optional embodiment further uses the processing status indicator to detect a processing exception, repairs the exception on the second file system, and then checks whether the upload succeeded. Compared with the traditional scheme, which writes the pending file at the preset path into the second distributed file system, the exception handling (for example resuming from a breakpoint or repairing the file) builds on the processing already performed for the pending file, and can therefore improve the processing efficiency of file upload requests.
In another optional embodiment of the invention, the method may also include: when it is determined that the pending file has no processing exception, or that the pending file has not been uploaded successfully, setting the processing status indicator in the essential information to abnormal, writing the pending file into the second file system, and setting the processing status indicator in the essential information to normal after the write succeeds; and after the processing status indicator in the essential information is set to normal, returning the processing result of successful upload. This processing not only writes the pending file into the second file system but also keeps the processing status indicator up to date.
In another optional embodiment of the invention, the method may also include: when the essential information of the pending file does not exist in the first distributed file system, setting the processing status indicator in the essential information to abnormal, writing the pending file into the second file system, and setting the processing status indicator in the essential information to normal after the write succeeds; and after the processing status indicator in the essential information is set to normal, returning the processing result of successful upload. This processing not only writes the pending file into the second file system but also keeps the processing status indicator up to date.
Referring to Fig. 6, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 601, receive a file upload request;
Step 602, lock the file directory to which the pending file corresponding to the file upload request belongs; the essential information of the pending file is stored in the first file system, and the file content of the pending file is stored in the second file system;
Step 603, judge whether the essential information of the pending file exists in the first file system; if so, perform step 604; otherwise, perform step 608;
Step 604, judge, according to the upload completion mark in the essential information, whether the upload of the file to be uploaded has been completed; if so, perform step 609; otherwise, perform step 605;
Step 605, when it is determined that the upload of the pending file has not been completed, judge, according to the processing status indicator in the essential information, whether the pending file has a processing exception; if so, perform step 606; otherwise, perform step 608;
Step 606, when it is determined that the pending file has a processing exception, perform exception handling for the pending file on the second file system;
Step 607, after the exception handling is completed, judge, according to the storage state of the pending file in the second file system, whether the pending file has been uploaded successfully; if so, perform step 609; otherwise, perform step 608;
Step 608, when the essential information of the pending file does not exist in the first distributed file system, or it is determined that the pending file has no processing exception, or it is determined that the pending file has not been uploaded successfully, set the processing status indicator in the essential information to abnormal, write the pending file into the second file system, and set the processing status indicator in the essential information to normal after the write succeeds; after the processing status indicator in the essential information is set to normal, return the processing result of successful upload;
Step 609, return the processing result of completed upload;
Step 610, unlock the file directory to which the pending file belongs.
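The branching in steps 603 to 609 can be summarised as a short decision function. The sketch below makes several assumptions that the source does not state: the two file systems are plain dictionaries, the storage path is derived as "/store/" plus the key, the "storage state" check in step 607 compares stored length with expected length, repair is a hypothetical callback for the exception handling of step 606, and the upload completion mark is set at the end of step 608. Locking and unlocking (steps 602 and 610) are left out.

```python
def handle_upload(key, data, metadata, content, repair=lambda info, content: None):
    """Return "complete" (step 609) or "uploaded" (step 608)."""
    info = metadata.get(key)                                  # step 603
    if info is not None:
        if info["upload_complete"]:                           # step 604
            return "complete"                                 # step 609
        if info["abnormal"]:                                  # step 605
            repair(info, content)                             # step 606
            stored = content.get(info["path"], b"")
            if len(stored) == len(data):                      # step 607 (assumed check)
                return "complete"                             # step 609
    else:
        # No essential information yet: create it (path layout is hypothetical).
        info = metadata[key] = {
            "path": "/store/" + key,
            "upload_complete": False,
            "abnormal": False,
        }
    # Step 608: mark abnormal, write the file, then mark normal again.
    info["abnormal"] = True
    content[info["path"]] = data
    info["abnormal"] = False
    info["upload_complete"] = True    # assumption, not spelled out in step 608
    return "uploaded"
```

Setting the indicator to abnormal before the write means that a crash in the middle of step 608 leaves a record that a later request can detect in step 605.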
Referring to Fig. 7, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 701, receive a document processing request;
Step 702, lock the file directory to which the pending file corresponding to the document processing request belongs; the essential information of the pending file is stored in the first file system, and the file content of the pending file is stored in the second file system;
Step 703, read the essential information of the pending file from the first distributed file system; the essential information includes: the storage path of the file in the second distributed file system;
Step 704, process the document processing request according to the essential information of the pending file;
Relative to any of the embodiments of Fig. 1 to Fig. 6, the method of this embodiment may also include:
Step 705, receive a random scanning request; the random scanning request carries the primary keys that need random scanning;
Step 706, read from the first distributed file system the storage path, in the second distributed file system, of the file corresponding to each primary key that needs random scanning;
Step 707, according to the storage path, obtain from the second distributed file system the file content of the file corresponding to each primary key that needs random scanning;
Step 708, scan the obtained file content using a scanning tool.
In existing schemes, HDFS obtains files by enumerating file lists, so it cannot support random scanning; and data access on existing HBase is slow, so a single iteration of a full scan takes more than one week.
In contrast, the embodiment of the invention stores the essential information and the file content of a file separately, with the essential information stored in the first distributed file system, such as HBase. During random scanning, the storage path of the file in the second distributed file system, such as HDFS, is obtained quickly according to the key; the file content is then obtained through the native API in the flyback (rescan) framework package (JAR), and finally the file content is scanned by a third-party scanning tool. In addition, with the flyback framework of the invention, the third-party scanning tool can be passed to the framework as a parameter, so the user can perform random scanning of stored files with a third-party scanning tool without changing any code of the framework. The customized parameters may be included in the flyback framework package.
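Steps 705 to 708 amount to a keyed lookup followed by a pluggable scan. The following sketch illustrates that idea only; the dictionaries are hypothetical stand-ins for the two file systems, and scanner is an arbitrary callable standing in for a third-party scanning tool passed in as a parameter. Skipping keys without essential information is an assumption.

```python
def random_scan(keys, metadata, content, scanner):
    """Scan the files for the requested primary keys and collect the results."""
    results = {}
    for key in keys:
        info = metadata.get(key)          # step 706: look up the storage path by key
        if info is None:
            continue                      # unknown key: skipped (assumed behaviour)
        data = content[info["path"]]      # step 707: fetch the file content
        results[key] = scanner(data)      # step 708: run the supplied scanner
    return results


# Usage with hypothetical data and a toy scanner that flags a byte pattern
metadata = {"k1": {"path": "/m/000.mrg#k1"}, "k2": {"path": "/m/000.mrg#k2"}}
content = {"/m/000.mrg#k1": b"clean bytes", "/m/000.mrg#k2": b"has MARKER inside"}
report = random_scan(["k2", "missing", "k1"], metadata, content,
                     scanner=lambda data: b"MARKER" in data)
```

Because the scanner is an argument, swapping in a different tool does not require any change to random_scan itself, which mirrors the parameter-passing design described above.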
It is understood that, in addition to random scanning, the embodiment of the invention can also perform full scanning of stored files. In an application example of the invention, assume that the number of stored files exceeds 15,000,000, the total size of the stored files is 50 TB, and the average size of a stored file is 3.5 MB, and assume the scanning requirements are as follows: a daily full scan by a training algorithm, to find special result samples; occasional random rescans by a third-party scanner, to find the differences between the third-party scanner and the system itself; and, when the training algorithm is changed or optimized, a quick full scan so that the results of the improvement or optimization can be compared. With the embodiment of the invention, a full scan takes about 4 hours, compared with more than one week in the existing scheme, which greatly improves the efficiency of full scanning.
Referring to Fig. 8, a flow chart of the steps of a document handling method according to an embodiment of the invention is shown, which may specifically include the following steps:
Step 801, receive a document processing request;
Step 802, lock the file directory to which the pending file corresponding to the document processing request belongs; the essential information of the pending file is stored in the first file system, and the file content of the pending file is stored in the second file system;
Step 803, read the essential information of the pending file from the first distributed file system; the essential information includes: the storage path of the file in the second distributed file system;
Step 804, process the document processing request according to the essential information of the pending file;
Relative to any of the embodiments of Fig. 1 to Fig. 6, the method of this embodiment may also include:
Step 805, periodically perform a merge operation on newly stored files.
In existing schemes, the name node of HDFS keeps the metadata of stored files in memory, so the number of files that can be stored is limited by the memory capacity of the name node. Assuming that each file, directory and data block in HDFS occupies 150 bytes, storing 1 million files consumes at least 300 MB of memory, and storing 1 billion files would exceed the hardware capacity. HDFS is therefore not suitable for storing large numbers of small files.
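The 300 MB figure follows from a simple count, sketched below. The 150 bytes per object comes from the text; the assumption that each small file occupies exactly one data block, and that directories are ignored, is added here to reproduce the "at least" estimate.

```python
BYTES_PER_OBJECT = 150  # per file, directory or data block, as quoted in the text


def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Lower-bound memory estimate: one file object plus its block objects per file."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


one_million = namenode_memory_bytes(1_000_000)        # 300,000,000 bytes = 300 MB
one_billion = namenode_memory_bytes(1_000_000_000)    # 300,000,000,000 bytes = 300 GB
```

At one billion files the same estimate reaches 300 GB of name node memory, which illustrates why the text concludes that the file count, rather than the data volume, becomes the bottleneck.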
To reduce the number of small files in HDFS, the embodiment of the invention may also merge newly stored files into large files. Specifically, newly stored files are first placed in temporary storage, and in an hourly merge phase the temporarily stored files are merged into the corresponding large files. For example, newly stored files are placed under the temporary storage directory xstore/bucket, and the merged large files are placed under the merge storage directory xstore_merge/bucket.
In practical applications, a temporary storage directory and a merge storage directory may be set for each bucket; each directory may be divided into 16^n subdirectories according to parameter configuration, where n defaults to 3, and this parameter affects the maximum number of workers of the Hadoop merge task. During the merge operation on newly stored files, the files in the temporary storage directory may be enumerated and moved one by one; specifically, a hash value is computed from the key of each newly stored file, and the hash result is used as the final segment of the subdirectory of the merge storage directory.
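One way to obtain 16^n subdirectories is to keep the first n hexadecimal digits of a hash of the key. The sketch below assumes MD5 purely for illustration; the source does not name the hash function, and the function names bucket_subdir and merge_dir are hypothetical.

```python
import hashlib


def bucket_subdir(key, n=3):
    """Map a key to one of 16**n subdirectory names (n hex digits)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()  # hash choice is an assumption
    return digest[:n]


def merge_dir(bucket, key, n=3):
    """Build the merge storage directory for a key inside a bucket."""
    return f"xstore_merge/{bucket}/{bucket_subdir(key, n)}"


subdir = bucket_subdir("154712366.sandbox")
target = merge_dir("bucket0", "154712366.sandbox")
```

With n = 3 there are 16^3 = 4096 possible subdirectories, so a merge job can split its work across at most that many independent directories.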
Suppose the keys of the newly stored files are 154712366.sandbox and 154717588.sandbox, and the files are stored through the upload interface into the following HDFS temporary storage directory:
xstore/bucket***/c69/154712366.sandbox
xstore/bucket***/c69/154717588.sandbox
Then, during the merge operation, the files in the temporary storage directory may be enumerated and appended one by one to the large file of the corresponding key, and the essential information stored in HBase for the key of each such file is modified at the same time. For example, the HDFS storage directory after merging the two files above may be:
xstore_merge/bucket***/c69/.sn
xstore_merge/bucket***/c69/000.mrg
xstore_merge/bucket***/c69/001.mrg
xstore_merge/bucket***/c69/002.mrg
xstore_merge/bucket***/c69/003.mrg
xstore_merge/bucket***/c69/004.mrg
In the subdirectory of the merge storage directory above, the merged files are numbered incrementally according to the sequence number in .sn, and each merge appends the newly stored files to the file with the largest sequence number, such as 004.mrg.
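The append-to-largest-sequence-number behaviour can be sketched as follows. Everything here is an illustrative assumption: big_files is an in-memory dictionary from sequence number to bytearray standing in for the .mrg files, the largest key of that dictionary plays the role of the .sn counter, and the max_size roll-over rule is invented for the example because the source does not say when a new .mrg file is started.

```python
def merge_into_big_file(key, data, big_files, metadata, max_size=8):
    """Append one small file to the newest big file and record where it landed."""
    serial = max(big_files) if big_files else 0
    current = big_files.setdefault(serial, bytearray())
    # Start a new big file when the current one would overflow (assumed rule).
    if current and len(current) + len(data) > max_size:
        serial += 1
        current = big_files.setdefault(serial, bytearray())
    offset = len(current)
    current.extend(data)                       # append, as in the merge operation
    # Update the essential information so the key can still be located.
    metadata[key] = {"path": f"{serial:03d}.mrg", "offset": offset, "length": len(data)}
    return metadata[key]


big_files, metadata = {}, {}
first = merge_into_big_file("a", b"12345", big_files, metadata)
second = merge_into_big_file("b", b"678", big_files, metadata)
third = merge_into_big_file("c", b"9", big_files, metadata)
```

Recording the offset and length alongside the path is what lets a later download or scan read one small file back out of a large merged file.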
In summary, the embodiment of the invention stores file content in the second distributed file system, which supports large files and massive numbers of files, and stores the essential information of files in the first distributed file system. Compared with existing HDFS, the embodiment of the invention has the following advantages:
First, while storing massive numbers of files, it can provide full and random scanning of files through MapReduce;
Second, it can provide a general flyback (rescan) framework based on Streaming, which supports plugging in third-party scanning tools;
Third, it can provide a set of server-side interfaces based on the HTTP protocol for clients to call, internally using a distributed lock to resolve access conflicts under high concurrency;
In addition, for the files stored in the second distributed file system, merging can shrink the number of files, which solves the system bottleneck caused by an excessive number of files in HDFS.
Furthermore, whereas existing HDFS obtains files by enumerating file lists, for document processing requests such as file status query, file download, file upload and file deletion, files can be located quickly through the essential information stored in the first file system, which improves file processing efficiency.
For simplicity of description, the method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the embodiment of the invention is not limited by the described order of actions, because according to the embodiment of the invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by the embodiment of the invention.
Referring to Fig. 9, a structural block diagram of a document handling device according to an embodiment of the invention is shown, which may specifically include the following modules:
A request receiving module 901, configured to receive a document processing request;
A directory locking module 902, configured to lock the file directory to which the pending file corresponding to the document processing request belongs; the essential information of the pending file is stored in the first file system, and the file content of the pending file is stored in the second file system;
An information obtaining module 903, configured to read the essential information of the pending file from the first distributed file system; the essential information may specifically include: the storage path of the file in the second distributed file system;
A request processing module 904, configured to process the document processing request according to the essential information of the pending file.
Optionally, the document processing request may specifically include a file download request, and the request processing module 904 may specifically include:
A file content reading submodule, configured to read the file content of the pending file from the second distributed file system according to the storage path;
A file content download submodule, configured to download the file content of the pending file.
Optionally, the document processing request may specifically include a file deletion request, and the request processing module 904 may specifically include:
A first deletion submodule, configured to delete the file content of the pending file from the second distributed file system according to the storage path;
A second deletion submodule, configured to delete the essential information of the pending file from the first distributed file system.
Optionally, the document processing request may specifically include a file status query request, the essential information may also include file status information, and the request processing module 904 includes:
A status information processing submodule, configured to obtain the file status information from the essential information of the pending file and return the file status information as the processing result.
Optionally, the document processing request may include a file upload request, the essential information may also include an upload completion mark, and the request processing module 904 may specifically include:
A first judging submodule, configured to judge, according to the upload completion mark in the essential information, whether the upload of the pending file has been completed;
A first result returning submodule, configured to return the processing result of completed upload when it is determined that the upload of the pending file has been completed.
Optionally, the essential information may also include a processing status indicator, and the request processing module 904 may also include:
A second judging submodule, configured to judge, according to the processing status indicator in the essential information, whether the pending file has a processing exception when it is determined that the upload of the pending file has not been completed;
An exception handling submodule, configured to perform exception handling for the pending file on the second file system when it is determined that the pending file has a processing exception;
A third judging submodule, configured to judge, after the exception handling is completed and according to the storage state of the pending file in the second file system, whether the pending file has been uploaded successfully;
A second result returning submodule, configured to return the processing result of successful upload when it is determined that the pending file has been uploaded successfully.
Optionally, the request processing module 904 may also include:
A first write processing submodule, configured to, when it is determined that the pending file has no processing exception or that the pending file has not been uploaded successfully, set the processing status indicator in the essential information to abnormal, write the pending file into the second file system, and set the processing status indicator in the essential information to normal after the write succeeds;
A third result returning submodule, configured to return the processing result of successful upload after the processing status indicator in the essential information is set to normal.
Optionally, the device may also include:
A second write processing submodule, configured to, when the essential information of the pending file does not exist in the first distributed file system, set the processing status indicator in the essential information to abnormal, write the pending file into the second file system, and set the processing status indicator in the essential information to normal after the write succeeds;
A fourth result returning submodule, configured to return the processing result of successful upload after the processing status indicator in the essential information is set to normal.
Optionally, the device may also include:
A scan request receiving module, configured to receive a random scanning request; the random scanning request carries the primary keys that need random scanning;
A path reading module, configured to read from the first distributed file system the storage path, in the second distributed file system, of the file corresponding to each primary key that needs random scanning;
A file content reading module, configured to obtain, according to the storage path, the file content of the file corresponding to each primary key that needs random scanning from the second distributed file system;
A scanning module, configured to scan the obtained file content using a scanning tool.
Optionally, the device may also include:
A file merging module, configured to periodically perform a merge operation on newly stored files.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the method of the disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple submodules or subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the document handling method and device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
The invention discloses A1, a document handling method, including:
Receiving a document processing request;
Locking the file directory to which the pending file corresponding to the document processing request belongs, wherein the essential information of the pending file is stored in a first file system and the file content of the pending file is stored in a second file system;
Reading the essential information of the pending file from the first distributed file system, wherein the essential information includes the store path of the file in the second distributed file system;
Processing the document processing request according to the essential information of the pending file.
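The four steps of A1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the in-memory dictionary `metadata_store` stands in for the first file system, and the names `lock_for`, `handle_request`, and the per-directory `threading.Lock` are assumptions introduced only for this example.

```python
import posixpath
import threading

# Hypothetical in-memory stand-in for the first file system (essential information).
metadata_store = {"/docs/a.txt": {"store_path": "/blocks/0001"}}

_directory_locks = {}
_locks_guard = threading.Lock()


def lock_for(directory):
    """Return the lock for a directory, creating it on first use."""
    with _locks_guard:
        return _directory_locks.setdefault(directory, threading.Lock())


def handle_request(file_key, handler):
    """Lock the file's directory, read its essential information, then process."""
    directory = posixpath.dirname(file_key)
    with lock_for(directory):
        # None is passed on when no essential information exists for the file.
        info = metadata_store.get(file_key)
        return handler(file_key, info)


# Usage: a handler that simply returns the store path in the second file system.
store_path = handle_request("/docs/a.txt", lambda key, info: info["store_path"])
```

Holding the directory lock for the whole request keeps two requests for files in the same directory from interleaving; a real deployment would presumably need a lock that is shared across machines rather than a process-local one.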
A2. The method as described in A1, wherein the document processing request includes a file download request, and the step of processing the document processing request according to the essential information of the pending file includes:
Reading the file content of the pending file from the second distributed file system according to the store path;
Downloading the file content of the pending file.
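As a rough sketch of the download path in A2, assuming two in-memory dictionaries as stand-ins for the first and second file systems (the names `metadata_store`, `content_store`, and `download` are hypothetical):

```python
# Hypothetical stand-ins: first file system (essential information) and
# second file system (file content addressed by store path).
metadata_store = {"report.pdf": {"store_path": "/data/part-0001"}}
content_store = {"/data/part-0001": b"example file bytes"}


def download(file_key):
    """Resolve the store path from the essential information, then read the content."""
    info = metadata_store[file_key]
    return content_store[info["store_path"]]


content = download("report.pdf")
```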
A3. The method as described in A1, wherein the document processing request includes a file deletion request, and the step of processing the document processing request according to the essential information of the pending file includes:
Deleting the file content of the pending file from the second distributed file system according to the store path;
Deleting the essential information of the pending file from the first distributed file system.
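The deletion order in A3 (content first, essential information second) can be illustrated as below. The dictionaries and the function name `delete_file` are assumptions made for this sketch only.

```python
# Hypothetical stand-ins for the first and second file systems.
metadata_store = {"old.log": {"store_path": "/data/part-0007"}}
content_store = {"/data/part-0007": b"log lines"}


def delete_file(file_key):
    """Delete the content via its store path, then delete the essential information."""
    info = metadata_store[file_key]
    # Tolerate content that an earlier, interrupted deletion already removed.
    content_store.pop(info["store_path"], None)
    del metadata_store[file_key]


delete_file("old.log")
```

The source does not explain the ordering. One plausible benefit is that, if the operation is interrupted between the two steps, the store path is still recorded and the deletion can be retried.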
A4. The method as described in A1, wherein the document processing request includes a file status query request, the essential information further includes file status information, and the step of processing the document processing request according to the essential information of the pending file includes:
Obtaining the file status information from the essential information of the pending file, and returning the file status information as the processing result.
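A status query as in A4 only touches the essential information and never reads the second file system. A minimal sketch, assuming a hypothetical `file_status` field and an in-memory `metadata_store`:

```python
# Hypothetical stand-in for the first file system; "file_status" is an assumed field name.
metadata_store = {"a.txt": {"store_path": "/data/a", "file_status": "available"}}


def query_status(file_key):
    """Return the file status recorded in the essential information."""
    return metadata_store[file_key]["file_status"]
```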
A5. The method as described in A1, wherein the document processing request includes a file upload request, the essential information further includes an upload completion flag, and the step of processing the document processing request according to the essential information of the pending file includes:
Determining, according to the upload completion flag in the essential information, whether the upload of the pending file has been completed;
When it is determined that the upload of the pending file has been completed, returning a processing result indicating that the upload is complete.
A6. The method as described in A5, wherein the essential information further includes a processing status indicator, and the step of processing the document processing request according to the essential information of the pending file further includes:
When it is determined that the upload of the pending file has not been completed, determining, according to the processing status indicator in the essential information, whether the pending file has a processing exception;
When it is determined that the pending file has a processing exception, performing exception handling for the pending file on the second file system;
After the exception handling is completed, determining, according to the storage state of the pending file in the second file system, whether the pending file has been uploaded successfully;
When it is determined that the pending file has been uploaded successfully, returning a processing result indicating a successful upload.
A7. The method as described in A6, wherein the step of processing the document processing request according to the essential information of the pending file further includes:
When it is determined that the pending file has no processing exception, or that the pending file has not been uploaded successfully, setting the processing status indicator in the essential information to abnormal, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
After the processing status indicator in the essential information is set to normal, returning a processing result indicating a successful upload.
A8. The method as described in A5, wherein the method further includes:
When the essential information of the pending file does not exist in the first distributed file system, setting the processing status indicator in the essential information to abnormal, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
After the processing status indicator in the essential information is set to normal, returning a processing result indicating a successful upload.
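The upload branches of A5 to A8 can be read as one decision flow, sketched below under several assumptions. The dictionaries stand in for the two file systems. The field names, the store path scheme, the no-op `handle_exception`, the equality check used as the "storage state" test, and setting the upload-complete mark after a write are all choices made for this illustration; the source does not specify them.

```python
NORMAL, ABNORMAL = "normal", "abnormal"

metadata_store = {}  # hypothetical first file system: key -> essential information
content_store = {}   # hypothetical second file system: store path -> bytes


def handle_exception(store_path):
    """Placeholder: the source does not say what the exception handling does."""


def write_file(key, info, data):
    """Mark the status abnormal, write the content, then mark it normal."""
    info["processing_status"] = ABNORMAL
    metadata_store[key] = info
    content_store[info["store_path"]] = data
    info["processing_status"] = NORMAL
    info["upload_complete"] = True  # assumption: the mark is set after the write
    return "uploaded"


def upload(key, data):
    info = metadata_store.get(key)
    if info is None:
        # A8: no essential information yet, so create it and write the file.
        info = {"store_path": "/data/" + key, "upload_complete": False}
        return write_file(key, info, data)
    if info["upload_complete"]:
        # A5: the upload was already completed.
        return "already uploaded"
    if info.get("processing_status") == ABNORMAL:
        # A6: run exception handling, then check the stored state.
        handle_exception(info["store_path"])
        if content_store.get(info["store_path"]) == data:
            return "uploaded"
    # A7: no exception recorded, or the stored content is not a successful upload.
    return write_file(key, info, data)
```

Setting the status to abnormal before the write and to normal only afterwards means that an interrupted write leaves a visible trace in the essential information, which is what lets a later request detect it and recover.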
A9. The method as described in any one of A1 to A8, wherein the method further includes:
Receiving a random scan request, wherein the random scan request carries the primary key that needs to be randomly scanned;
Reading, from the first distributed file system, the store path in the second distributed file system of the file corresponding to the primary key that needs to be randomly scanned;
Obtaining, according to the store path, the file content of the file corresponding to the primary key that needs to be randomly scanned from the second distributed file system;
Scanning the obtained file content with a scanning tool.
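The random scan in A9 reuses the same two-step lookup: primary key to store path, then store path to content. In the sketch below, the dictionaries, the function names, and the byte-signature check standing in for the scanning tool are all hypothetical.

```python
# Hypothetical stand-ins for the first and second file systems.
metadata_store = {
    "k1": {"store_path": "/data/p1"},
    "k2": {"store_path": "/data/p2"},
}
content_store = {
    "/data/p1": b"plain text",
    "/data/p2": b"xx BAD-SIGNATURE xx",
}


def scan_tool(content):
    """Hypothetical scanning tool: flags content containing a fixed byte signature."""
    return b"BAD-SIGNATURE" in content


def random_scan(primary_keys):
    """Look up each key's store path, fetch the content, and scan it."""
    results = {}
    for key in primary_keys:
        store_path = metadata_store[key]["store_path"]
        results[key] = scan_tool(content_store[store_path])
    return results


flags = random_scan(["k1", "k2"])
```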
A10. The method as described in any one of A1 to A8, wherein the method further includes:
Periodically merging newly stored files.
The invention also discloses B11, a document handling device, including:
A request receiving module, for receiving a document processing request;
A directory locking module, for locking the file directory to which the pending file corresponding to the document processing request belongs, wherein the essential information of the pending file is stored in a first file system and the file content of the pending file is stored in a second file system;
An information obtaining module, for reading the essential information of the pending file from the first distributed file system, wherein the essential information includes the store path of the file in the second distributed file system;
A request processing module, for processing the document processing request according to the essential information of the pending file.
B12. The device as described in B11, wherein the document processing request includes a file download request, and the request processing module includes:
A file content reading submodule, for reading the file content of the pending file from the second distributed file system according to the store path;
A file content download submodule, for downloading the file content of the pending file.
B13. The device as described in B11, wherein the document processing request includes a file deletion request, and the request processing module includes:
A first deletion submodule, for deleting the file content of the pending file from the second distributed file system according to the store path;
A second deletion submodule, for deleting the essential information of the pending file from the first distributed file system.
B14. The device as described in B11, wherein the document processing request includes a file status query request, the essential information further includes file status information, and the request processing module includes:
A status information processing submodule, for obtaining the file status information from the essential information of the pending file and returning the file status information as the processing result.
B15. The device as described in B11, wherein the document processing request includes a file upload request, the essential information further includes an upload completion flag, and the request processing module includes:
A first judging submodule, for determining, according to the upload completion flag in the essential information, whether the upload of the pending file has been completed;
A first result returning submodule, for returning a processing result indicating that the upload is complete when it is determined that the upload of the pending file has been completed.
B16. The device as described in B15, wherein the essential information further includes a processing status indicator, and the request processing module further includes:
A second judging submodule, for determining, when it is determined that the upload of the pending file has not been completed, whether the pending file has a processing exception according to the processing status indicator in the essential information;
An exception handling submodule, for performing exception handling for the pending file on the second file system when it is determined that the pending file has a processing exception;
A third judging submodule, for determining, after the exception handling is completed, whether the pending file has been uploaded successfully according to the storage state of the pending file in the second file system;
A second result returning submodule, for returning a processing result indicating a successful upload when it is determined that the pending file has been uploaded successfully.
B17. The device as described in B16, wherein the request processing module further includes:
A first write processing submodule, for setting the processing status indicator in the essential information to abnormal when it is determined that the pending file has no processing exception or that the pending file has not been uploaded successfully, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
A third result returning submodule, for returning a processing result indicating a successful upload after the processing status indicator in the essential information is set to normal.
B18. The device as described in B15, wherein the device further includes:
A second write processing submodule, for setting the processing status indicator in the essential information to abnormal when the essential information of the pending file does not exist in the first distributed file system, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
A fourth result returning submodule, for returning a processing result indicating a successful upload after the processing status indicator in the essential information is set to normal.
B19. The device as described in any one of B11 to B18, wherein the device further includes:
A scan request receiving module, for receiving a random scan request, wherein the random scan request carries the primary key that needs to be randomly scanned;
A path reading module, for reading, from the first distributed file system, the store path in the second distributed file system of the file corresponding to the primary key that needs to be randomly scanned;
A file content reading module, for obtaining, according to the store path, the file content of the file corresponding to the primary key that needs to be randomly scanned from the second distributed file system;
A scanning module, for scanning the obtained file content with a scanning tool.
B20. The device as described in any one of B11 to B18, wherein the device further includes:
A file merging module, for periodically merging newly stored files.

Claims (10)

1. A document handling method, including:
Receiving a document processing request;
Locking the file directory to which the pending file corresponding to the document processing request belongs, wherein the essential information of the pending file is stored in a first file system and the file content of the pending file is stored in a second file system;
Reading the essential information of the pending file from the first distributed file system, wherein the essential information includes the store path of the file in the second distributed file system;
Processing the document processing request according to the essential information of the pending file.
2. The method of claim 1, characterized in that the document processing request includes a file download request, and the step of processing the document processing request according to the essential information of the pending file includes:
Reading the file content of the pending file from the second distributed file system according to the store path;
Downloading the file content of the pending file.
3. The method of claim 1, characterized in that the document processing request includes a file deletion request, and the step of processing the document processing request according to the essential information of the pending file includes:
Deleting the file content of the pending file from the second distributed file system according to the store path;
Deleting the essential information of the pending file from the first distributed file system.
4. The method of claim 1, characterized in that the document processing request includes a file status query request, the essential information further includes file status information, and the step of processing the document processing request according to the essential information of the pending file includes:
Obtaining the file status information from the essential information of the pending file, and returning the file status information as the processing result.
5. The method of claim 1, characterized in that the document processing request includes a file upload request, the essential information further includes an upload completion flag, and the step of processing the document processing request according to the essential information of the pending file includes:
Determining, according to the upload completion flag in the essential information, whether the upload of the pending file has been completed;
When it is determined that the upload of the pending file has been completed, returning a processing result indicating that the upload is complete.
6. The method of claim 5, characterized in that the essential information further includes a processing status indicator, and the step of processing the document processing request according to the essential information of the pending file further includes:
When it is determined that the upload of the pending file has not been completed, determining, according to the processing status indicator in the essential information, whether the pending file has a processing exception;
When it is determined that the pending file has a processing exception, performing exception handling for the pending file on the second file system;
After the exception handling is completed, determining, according to the storage state of the pending file in the second file system, whether the pending file has been uploaded successfully;
When it is determined that the pending file has been uploaded successfully, returning a processing result indicating a successful upload.
7. The method of claim 6, characterized in that the step of processing the document processing request according to the essential information of the pending file further includes:
When it is determined that the pending file has no processing exception, or that the pending file has not been uploaded successfully, setting the processing status indicator in the essential information to abnormal, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
After the processing status indicator in the essential information is set to normal, returning a processing result indicating a successful upload.
8. The method of claim 5, characterized in that the method further includes:
When the essential information of the pending file does not exist in the first distributed file system, setting the processing status indicator in the essential information to abnormal, writing the pending file to the second file system, and, after the write succeeds, setting the processing status indicator in the essential information to normal;
After the processing status indicator in the essential information is set to normal, returning a processing result indicating a successful upload.
9. The method of any one of claims 1 to 8, characterized in that the method further includes:
Receiving a random scan request, wherein the random scan request carries the primary key that needs to be randomly scanned;
Reading, from the first distributed file system, the store path in the second distributed file system of the file corresponding to the primary key that needs to be randomly scanned;
Obtaining, according to the store path, the file content of the file corresponding to the primary key that needs to be randomly scanned from the second distributed file system;
Scanning the obtained file content with a scanning tool.
10. A document handling device, including:
A request receiving module, for receiving a document processing request;
A directory locking module, for locking the file directory to which the pending file corresponding to the document processing request belongs, wherein the essential information of the pending file is stored in a first file system and the file content of the pending file is stored in a second file system;
An information obtaining module, for reading the essential information of the pending file from the first distributed file system, wherein the essential information includes the store path of the file in the second distributed file system;
A request processing module, for processing the document processing request according to the essential information of the pending file.
CN201611022842.2A 2016-11-18 2016-11-18 A kind of document handling method and device Pending CN106776720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611022842.2A CN106776720A (en) 2016-11-18 2016-11-18 A kind of document handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611022842.2A CN106776720A (en) 2016-11-18 2016-11-18 A kind of document handling method and device

Publications (1)

Publication Number Publication Date
CN106776720A true CN106776720A (en) 2017-05-31

Family

ID=58969943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611022842.2A Pending CN106776720A (en) 2016-11-18 2016-11-18 A kind of document handling method and device

Country Status (1)

Country Link
CN (1) CN106776720A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5515532A (en) * 1993-09-22 1996-05-07 Kabushiki Kaisha Toshiba File management system for memory card
CN101079036A (en) * 2006-06-23 2007-11-28 腾讯科技(深圳)有限公司 Storage method and system for mass file
CN101187930A (en) * 2007-12-04 2008-05-28 浙江大学 Distribution type file system dummy directory and name space implementing method
CN101876992A (en) * 2009-11-17 2010-11-03 中国科学院自动化研究所 Method for managing image data warehouse
CN105868286A (en) * 2016-03-23 2016-08-17 中国科学院计算技术研究所 Parallel adding method and system for merging small files on basis of distributed file system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107580066A (en) * 2017-09-20 2018-01-12 郑州云海信息技术有限公司 The method, apparatus and system of file access in a kind of distributed NAS storage system
CN107633090A (en) * 2017-09-29 2018-01-26 郑州云海信息技术有限公司 A kind of method split based on distributed type file system client side lock
CN110162384A (en) * 2019-04-19 2019-08-23 深圳壹账通智能科技有限公司 Time-out time dynamic adjusting method and system based on Redis distributed lock
CN110377579A (en) * 2019-07-24 2019-10-25 南京中孚信息技术有限公司 File memory method, device and server
CN110597764A (en) * 2019-10-10 2019-12-20 深圳前海微众银行股份有限公司 File management method and device
CN110597764B (en) * 2019-10-10 2024-05-07 深圳前海微众银行股份有限公司 File downloading and version management method and device
CN112783850A (en) * 2021-02-09 2021-05-11 珠海豹趣科技有限公司 File enumeration method and device based on USN log, electronic equipment and storage medium
CN112783850B (en) * 2021-02-09 2023-09-22 珠海豹趣科技有限公司 File enumeration method and device based on USN (universal serial bus) log, electronic equipment and storage medium
CN115022307A (en) * 2022-07-26 2022-09-06 中银金融科技有限公司 File downloading method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN106776720A (en) A kind of document handling method and device
CN109391664B (en) System and method for multi-cluster container deployment
US10540173B2 (en) Version control of applications
JP6560308B2 (en) System and method for implementing a data storage service
CN109032824A (en) Database method of calibration, device, computer equipment and storage medium
US5519855A (en) Summary catalogs
CN109634932A (en) A kind of intelligence contract storage method and storage system
CN105653901A (en) Component repository management method and system
CN106844676A (en) Date storage method and device
JP2023512247A (en) Managing Objects in Shared Cache Using Multiple Chains
CN103577546A (en) Method and equipment for data backup, and distributed cluster file system
CN113515303B (en) Project transformation method, device and equipment
US10606805B2 (en) Object-level image query and retrieval
CN107844542A (en) A kind of distributed document storage method and device
CN114238085A (en) Interface testing method and device, computer equipment and storage medium
CN109614271A (en) Control method, device, equipment and the storage medium of multiple company-data consistency
US10951465B1 (en) Distributed file system analytics
CN108304555A (en) Distributed maps data processing method
CN115174158B (en) Cloud product configuration checking method based on multi-cloud management platform
US20230109530A1 (en) Synchronous object placement for information lifecycle management
US10942912B1 (en) Chain logging using key-value data storage
CN112347794A (en) Data translation method, device, equipment and computer storage medium
CN111400243A (en) Research and development management system based on pipeline service and file storage method and device
CN117170823B (en) Method and device for executing operation in batch container and electronic equipment
CN109446168B (en) Method for sharing configuration file based on InData-Kudu object storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531