CN108038188A - File processing method and device - Google Patents

File processing method and device

Info

Publication number
CN108038188A
Authority
CN
China
Prior art keywords
file
blocks
files
information
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711306239.1A
Other languages
Chinese (zh)
Inventor
王同庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN201711306239.1A
Publication of CN108038188A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the file processing method and device provided by the invention, when a file is stored, its content is divided into blocks and index information is established for each resulting file block. The file directory, the file block information and the file block index information are then stored in correspondence with one another, while the file itself is compressed for storage and the original file is deleted. Subsequent queries are run against the file block index information of each file in order to find the required files. Because the file is stored in compressed form and the original is deleted, the scheme saves storage space compared with the conventional approach of storing original files. Because queries run against the file block index information rather than the original files, query efficiency improves. The invention can therefore store and query various data files at low cost and with high efficiency.

Description

File processing method and device
Technical field
The invention belongs to the technical field of data file storage and query, and in particular relates to a file processing method and device.
Background
In the current internet and information age, large amounts of text information are generated, and with them large numbers of data files. Word, Excel, txt, html, java, xml and css are among the most basic and common formats.
Such volumes of data files inevitably raise the problem of how to store and query them. At present, storage and query are generally carried out through the commercial file storage and query services offered on the market by companies such as Google and Baidu. For small and medium-sized enterprises, however, these commercial services commonly suffer from high cost, a large file storage footprint and slow query speed.
There is therefore an urgent need in the art for a better data file processing method that meets the file storage and query needs of small and medium-sized enterprises and stores and queries various data files at low cost and with high efficiency.
Summary of the invention
In view of this, the object of the invention is to provide a file processing method and device that can store and query various data files at low cost and with high efficiency.
To this end, the invention discloses the following technical solutions:
A file processing method for file storage, the method comprising:
obtaining a file to be processed;
dividing the content of the file to be processed into blocks, obtaining the file blocks and file block information;
establishing index information for each file block, obtaining the file block index information of each file block;
storing a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another;
compressing the file to be processed, storing the resulting compressed file under the predetermined file directory, and deleting the file to be processed.
Preferably, in the above method, obtaining the file to be processed comprises:
obtaining a file to be processed that a user uploads to a file server.
Preferably, in the above method, dividing the content of the file to be processed into blocks comprises:
dividing the file to be processed into a corresponding number of file blocks based on a predetermined data volume threshold, wherein the data volume of each file block does not exceed the data volume threshold.
Preferably, in the above method, establishing index information for each file block and obtaining the file block index information of each file block comprises:
performing word segmentation on each file block to obtain the keyword list of each file block;
establishing a keyword index for each file block according to its keyword list.
A file processing method for file query, the method comprising:
obtaining file query information input by a user;
using the file query information to query the file block index information of the file blocks of each file, obtaining a file block index information query result;
generating, according to the file block index information query result, a file query result that matches the file query information.
Preferably, in the above method, obtaining the file query information input by the user comprises:
obtaining a keyword input by the user for file query.
Preferably, in the above method, the file block index information is a keyword index, and using the file query information to query the file block index information of the file blocks of each file and obtain the file block index information query result comprises:
performing a matching query for the keyword input by the user in the keyword index of each file block of each file, obtaining match information between the file block index information of each file and the keyword, and taking the match information as the file block index information query result.
Preferably, in the above method, performing the matching query for the keyword input by the user in the keyword index of each file block of each file comprises:
performing, based on the keyword input by the user, parallel keyword matching queries on the keyword indexes of the file blocks of each file.
Preferably, in the above method, generating the file query result that matches the file query information according to the file block index information query result comprises:
obtaining the matching degree between each file and the keyword according to the match information between the file block index information of each file and the keyword;
sorting the file directories of the files in descending order of matching degree, and outputting the sorted file directories as the file query result.
Preferably, the above method further comprises:
upon receiving a user's file download request for a file directory in the sorted file directories, downloading the compressed file package from that file directory.
A file processing device for file storage, the device comprising:
a first acquisition unit, configured to obtain a file to be processed;
a file blocking unit, configured to divide the content of the file to be processed into blocks, obtaining the file blocks and file block information;
an index establishing unit, configured to establish index information for each file block, obtaining the file block index information of each file block;
a storage unit, configured to store a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another, and to compress the file to be processed, store the resulting compressed file under the predetermined file directory, and delete the file to be processed.
Preferably, in the above device, the first acquisition unit is specifically configured to:
obtain a file to be processed that a user uploads to a file server.
Preferably, in the above device, the file blocking unit is specifically configured to:
divide the file to be processed into a corresponding number of file blocks based on a predetermined data volume threshold, wherein the data volume of each file block does not exceed the data volume threshold.
Preferably, in the above device, the index establishing unit is specifically configured to:
perform word segmentation on each file block to obtain the keyword list of each file block, and establish a keyword index for each file block according to its keyword list.
A file processing device for file query, the device comprising:
a second acquisition unit, configured to obtain file query information input by a user;
a query unit, configured to use the file query information to query the file block index information of the file blocks of each file, obtaining a file block index information query result;
a query result generation unit, configured to generate, according to the file block index information query result, a file query result that matches the file query information.
Preferably, in the above device, the second acquisition unit is specifically configured to:
obtain a keyword input by the user for file query.
Preferably, in the above device, the file block index information is a keyword index, and the query unit is specifically configured to:
perform a matching query for the keyword input by the user in the keyword index of each file block of each file, obtain match information between the file block index information of each file and the keyword, and take the match information as the file block index information query result.
Preferably, in the above device, the query unit performing the matching query for the keyword input by the user in the keyword index of each file block of each file specifically comprises:
performing, based on the keyword input by the user, parallel keyword matching queries on the keyword indexes of the file blocks of each file.
Preferably, in the above device, the query result generation unit is specifically configured to:
obtain the matching degree between each file and the keyword according to the match information between the file block index information of each file and the keyword; sort the file directories of the files in descending order of matching degree, and output the sorted file directories as the file query result.
Preferably, the above device further comprises:
a file download unit, configured to download, upon receiving a user's file download request for a file directory in the sorted file directories, the compressed file package from that file directory.
As the above scheme shows, in the file processing method and device provided by the invention, when a file is stored, its content is divided into blocks and index information is established for each resulting file block. The file directory, the file block information and the file block index information are then stored in correspondence with one another, while the file itself is compressed for storage and the original file is deleted. Subsequent queries are run against the file block index information of each file in order to find the required files. Because the file is stored in compressed form and the original is deleted, the scheme saves storage space compared with the conventional approach of storing original files. Because queries run against the file block index information rather than the original files, query efficiency improves. The invention can therefore store and query various data files at low cost and with high efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the file processing method for file storage provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of a user uploading files to the file server, provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of dividing file content into blocks and establishing a keyword index, provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of storing the file directory, file block information and index information in correspondence in a database, provided by an embodiment of the invention;
Fig. 5 is a flowchart of a file processing method for file query provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of querying the file block index information of each file according to the keyword input by the user, provided by an embodiment of the invention;
Fig. 7 is a schematic diagram of outputting query result information to the user terminal, provided by an embodiment of the invention;
Fig. 8 is a flowchart of another file processing method for file query provided by an embodiment of the invention;
Fig. 9 is a flowchart of yet another file processing method for file query provided by an embodiment of the invention;
Fig. 10 is a structural diagram of the file processing device for file storage provided by an embodiment of the invention;
Figs. 11-12 are structural diagrams of the file processing device for file query provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
The embodiments of the invention provide a file processing method and device that aim to store and query various data files at low cost and with high efficiency, so as to meet the file storage and query needs of small and medium-sized enterprises. The method and device are described in detail below through several embodiments.
In one embodiment of the invention, a file processing method for file storage is provided. Referring to the flowchart shown in Fig. 1, the method in this embodiment comprises the following steps:
Step 101: obtain a file to be processed.
The file to be processed may be, but is not limited to, a file that a user uploads to a file server. For example, referring to Fig. 2, it may be a file that the user selects on a local computer and uploads to the file server. In practice, the user is not limited to selecting and uploading one file at a time; the user can generally select several files at once and upload them together, for example 10 files in a single upload.
The file server can support uploading and storing many types of file, for example Word, Excel, txt, html, java, xml, css and other file types whose content can be read line by line. When applying the scheme, a user can therefore upload a file of any of these types to the file server according to actual need, and the file server then processes the uploaded file through the steps of the method.
Step 102: divide the content of the file to be processed into blocks, obtaining the file blocks and file block information.
In this step, the file to be processed can be divided into a corresponding number of file blocks based on a predetermined data volume threshold, which yields the file blocks and the corresponding file block information. The data volume of each file block does not exceed the data volume threshold.
The file block information may include, but is not limited to, the number of file blocks in the file, the file block numbers and the file block sizes.
For example, assume the data volume threshold is set to 64 KB (64 × 1024 bytes) and the file to be processed is a 500 KB Word file. Dividing the Word file according to this threshold, as in the schematic diagram of Fig. 3, yields 8 file blocks: blocks 1 to 7 are 64 KB each and block 8 is 52 KB.
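The threshold-based division in step 102 can be sketched as follows. This is a minimal illustration only: the function name `split_into_blocks` and the record fields are hypothetical, and the sketch assumes the file is treated as a raw byte string, which the source does not specify.

```python
def split_into_blocks(data: bytes, threshold: int = 64 * 1024) -> list[dict]:
    """Split data into consecutive blocks, none larger than `threshold` bytes."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    blocks = []
    for number, offset in enumerate(range(0, len(data), threshold), start=1):
        chunk = data[offset:offset + threshold]
        # Block number and size correspond to the "file block information".
        blocks.append({"block_number": number, "size": len(chunk), "content": chunk})
    return blocks


# A 500 KB file with a 64 KB threshold, as in the example above.
blocks = split_into_blocks(bytes(500 * 1024))
sizes_kb = [block["size"] // 1024 for block in blocks]
print(len(blocks), sizes_kb)  # 8 [64, 64, 64, 64, 64, 64, 64, 52]
```

Splitting on raw bytes can cut a multi-byte character in half at a block boundary; a real implementation would probably split on line or paragraph boundaries instead, since the supported file types are described as readable line by line.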
Step 103: establish index information for each file block, obtaining the file block index information of each file block.
This step establishes the index information of each file block from the block's data content. Specifically, the file block index information is built from the segmented words contained in the data content of each file block.
Because building the index requires the words contained in the file block, each file block must first undergo word segmentation. In practice, a file block generally contains several paragraphs separated by paragraph separators, and each paragraph contains several sentences separated by punctuation marks (for example the comma, full stop, semicolon and exclamation mark). When segmenting a file block, the block can therefore first be cut into paragraphs at the paragraph separators, and each paragraph can then be cut into sentences at the punctuation marks. Each sentence is then cut into words. This process is called word segmentation; its purpose is to cut up the file block content into a list of keywords that can be matched against the user's query conditions.
There are two word segmentation methods: matching segmentation and statistical segmentation. Matching segmentation compares the sentence content of the file block with the words in a preset dictionary containing a very large vocabulary; a word that matches is a hit. Statistical segmentation judges, from the probability with which two or more adjacent characters occur together, whether those characters form a word. The invention supports adding entries to the dictionary used for matching segmentation. For example, as everyday vocabulary changes, keywords such as "Jingdong" (JD.com), "WeChat" and "power of chaotic state" can be added to the dictionary. Word segmentation yields the keyword list of the file block. On this basis, the keyword index information of the file block can be obtained from the occurrences of each keyword of the list in the file block. Each record in the keyword index information of a file block may include the keyword, the keyword number, the number of occurrences, the file block number and the position in the file. According to statistics, keywords typically account for 3%-8% of a file's total vocabulary. Storing the keyword information of each file block for later queries, instead of querying the whole file, therefore improves file query efficiency.
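Matching segmentation and the resulting keyword index records can be sketched as below. The sketch uses forward maximum matching, which is one common way to do dictionary-based segmentation; the source does not say which matching strategy is used. The function names, the `max_len` parameter, the record field names and the tiny dictionary are all hypothetical, and positions are character offsets within the block rather than within the whole file.

```python
from collections import defaultdict


def match_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[tuple[str, int]]:
    """Forward maximum matching: at each position, take the longest dictionary hit."""
    hits, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                hits.append((candidate, i))
                i += length
                break
        else:
            i += 1  # no dictionary word starts here; move on one character
    return hits


def build_keyword_index(block_number: int, text: str, dictionary: set[str]) -> list[dict]:
    """Build one index record per distinct keyword found in the block."""
    positions = defaultdict(list)
    for word, position in match_segment(text, dictionary):
        positions[word].append(position)
    return [
        {
            "keyword": word,
            "keyword_number": number,
            "occurrences": len(found),
            "block_number": block_number,
            "positions": found,
        }
        for number, (word, found) in enumerate(positions.items(), start=1)
    ]


dictionary = {"文件", "索引", "微信"}  # new entries can be added to this set at any time
index = build_keyword_index(1, "文件索引和文件压缩", dictionary)
for record in index:
    print(record)
```

Here "文件" is found twice (offsets 0 and 5) and "索引" once (offset 2); characters that start no dictionary word are skipped.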
Step 104: store the predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another.
The predetermined file directory may be the directory that the user sets for saving the file when selecting and uploading it.
In this embodiment, after receiving the file uploaded by the user, the file server can store it in a directory named in date format (year, month, day, hour, minute and second, for example 20170302100834). The file server places the directory that the user set at upload time before the date-format directory, so that the user-set directory and the server-provided date-format directory together form the final directory of the file. For example, if the directory set by the user is wtq0952 and the date-format directory provided by the file server is 20170302100834, the final directory of the file is wtq0952/20170302100834/.
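The composition of the final directory can be illustrated with a short sketch. The function name `final_directory` is hypothetical, and the sketch assumes the date-format directory is derived from the upload time.

```python
from datetime import datetime


def final_directory(user_dir: str, uploaded_at: datetime) -> str:
    """Join the user-set directory with a yyyymmddHHMMSS directory."""
    stamp = uploaded_at.strftime("%Y%m%d%H%M%S")
    return f"{user_dir.strip('/')}/{stamp}/"


print(final_directory("wtq0952", datetime(2017, 3, 2, 10, 8, 34)))
# wtq0952/20170302100834/
```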
After the file server has divided the uploaded file into blocks and established the file block index information for each resulting file block, this step stores in correspondence the file directory, the file block information (number of file blocks, file block numbers, file block sizes and so on) and the index information of each file block (keyword, keyword number, number of occurrences, file block number, position in the file and so on).
Specifically, referring to Fig. 4, the correspondence between the file directory, the file block information and the index information of each file block can be stored in, for example, a database on the file server, to support subsequent file queries.
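One possible way to hold this correspondence in a database is sketched below using SQLite. The table and column names are assumptions made for illustration; the source only states that the three kinds of information are stored in correspondence in a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (
    file_id     INTEGER PRIMARY KEY,
    directory   TEXT NOT NULL,      -- final directory holding the compressed file
    block_count INTEGER NOT NULL
);
CREATE TABLE file_block (
    file_id      INTEGER NOT NULL REFERENCES file(file_id),
    block_number INTEGER NOT NULL,
    size_bytes   INTEGER NOT NULL,
    PRIMARY KEY (file_id, block_number)
);
CREATE TABLE block_keyword (
    file_id        INTEGER NOT NULL,
    block_number   INTEGER NOT NULL,
    keyword        TEXT NOT NULL,
    keyword_number INTEGER NOT NULL,
    occurrences    INTEGER NOT NULL,
    position       INTEGER NOT NULL,  -- first position of the keyword
    FOREIGN KEY (file_id, block_number) REFERENCES file_block(file_id, block_number)
);
CREATE INDEX idx_block_keyword ON block_keyword(keyword);
""")

# Hypothetical sample data: one file with two blocks, both containing "index".
conn.execute("INSERT INTO file VALUES (1, 'wtq0952/20170302100834/', 2)")
conn.executemany("INSERT INTO file_block VALUES (?, ?, ?)",
                 [(1, 1, 65536), (1, 2, 1024)])
conn.executemany("INSERT INTO block_keyword VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, "index", 1, 3, 10), (1, 2, "index", 1, 1, 5)])

# A keyword lookup touches only the index table, never the file content.
rows = conn.execute("""
    SELECT f.directory, SUM(k.occurrences)
    FROM block_keyword AS k JOIN file AS f USING (file_id)
    WHERE k.keyword = ?
    GROUP BY f.file_id
""", ("index",)).fetchall()
print(rows)  # [('wtq0952/20170302100834/', 4)]
```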
Step 105: compress the file to be processed, store the resulting compressed file under the predetermined file directory, and delete the file to be processed.
Besides storing the correspondence between the file directory, the file block information and the file block index information, the invention also stores the file in compressed form. In other words, unlike the conventional approach of storing the original file directly, the invention compresses the file to be processed before storing it.
Specifically, the invention compresses the file to be processed and stores the resulting compressed file in the file directory, for example the directory that the user set when uploading the file, and at the same time deletes the uploaded original file from the file server. This saves a significant amount of storage space on the file server and in turn reduces the cost of storing user files.
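Step 105 can be sketched as follows. The source does not name a compression format, so gzip is an assumption here, as are the function name `compress_and_remove` and the sample file name.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_and_remove(source: Path, target_dir: Path) -> Path:
    """Write a gzip copy of `source` into `target_dir`, then delete `source`."""
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / (source.name + ".gz")
    with source.open("rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    source.unlink()  # the original is removed only after the archive is written
    return archive


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.txt"
    original.write_bytes(b"file block index " * 1000)
    archive = compress_and_remove(original, Path(tmp) / "wtq0952" / "20170302100834")
    original_gone = not original.exists()
    restored = gzip.decompress(archive.read_bytes())

print(original_gone, len(restored))
```

Deleting the original only after the archive has been fully written avoids losing the file if compression fails part-way.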
In the file processing method provided by this embodiment, when a file is stored, its content is divided into blocks and index information is established for each resulting file block. The file directory, the file block information and the file block index information are then stored in correspondence with one another, while the file is compressed for storage and the original file is deleted. Compared with the conventional approach of storing original files, this saves storage space, so the scheme can store various data files at low cost.
Another embodiment of the invention discloses a file processing method for file query, which corresponds to the file processing method for file storage in the previous embodiment. Referring to the flowchart shown in Fig. 5, the method in this embodiment may comprise the following steps:
Step 501: obtain file query information input by a user.
In this step, referring to Fig. 6, obtaining the file query information input by the user may specifically mean obtaining one or more keywords that the user inputs for file query.
Step 502: use the file query information to query the file block index information of the file blocks of each file, obtaining a file block index information query result.
After the file query information, such as keywords, is obtained, files matching that information can be looked up on the file server.
In this embodiment, referring to Fig. 6, the matching query for the keyword input by the user is not a full-text query over each file stored on the file server. Instead, the query runs against the file block index information of the file blocks of each file. Specifically, the keywords in the file block index information of each file block are matched against the keyword input by the user, which yields the file block index information query result.
When querying the file block index information according to the keyword input by the user, factors such as keyword matching degree, keyword density and keyword position can be used to obtain match information, such as whether the file block contains the keyword, the number of times the keyword occurs in the file, the ratio of that number to the file's total vocabulary, and the positions at which the keyword occurs in the file. This match information is taken as the file block index information query result.
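The match information described above can be sketched as a lookup against a single block's keyword index. The index layout (a mapping from keyword to a list of positions), the function name `query_block_index` and the field names are hypothetical.

```python
def query_block_index(block_index: dict[str, list[int]],
                      total_words: int,
                      keywords: list[str]) -> dict[str, dict]:
    """Return match information for each queried keyword in one file block."""
    info = {}
    for keyword in keywords:
        positions = block_index.get(keyword, [])
        info[keyword] = {
            "found": bool(positions),
            "occurrences": len(positions),
            # Share of the block's vocabulary taken up by this keyword.
            "ratio": len(positions) / total_words if total_words else 0.0,
            "positions": positions,
        }
    return info


block_index = {"index": [3, 17, 42], "file": [0, 9]}
result = query_block_index(block_index, total_words=50, keywords=["index", "storage"])
print(result["index"]["occurrences"], result["storage"]["found"])  # 3 False
```

Only the stored index is consulted; the block content itself is never read during the query.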
Step 503: generate, according to the file block index information query result, a file query result that matches the file query information.
Once the file block index information of the files has been queried, this step can use the query result to obtain the matching degree between each file and the keyword, then collect the file directories of the files, sort them in descending order of matching degree, and output the sorted file directories as the file query result.
In practice, the number of keywords in a file that match the keyword input by the user (that is, the number of times the user's keyword occurs in the file) can be used to measure the matching degree between the file and the keyword. The larger this number, the higher the matching degree; the smaller the number, the lower the matching degree.
For example, if file 1 has 10 keyword matches and file 2 has 30, file 2 has a higher matching degree with the user's keyword than file 1. Once the matching degree of each file is known, the file directories of the files can be collected and sorted in descending order of matching degree (this may also be called reverse sorting), and the sorted file directories are output to the user as the file query result. Referring to Fig. 7, the result can be output to various kinds of terminal, such as the user's local computer, mobile phone or tablet, so that the user can then selectively download the required files according to the query result information.
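The ranking in step 503 can be sketched as summing per-block occurrence counts for each file and sorting in descending order. The function name `rank_files` and the directory names are hypothetical.

```python
from collections import Counter


def rank_files(block_hits: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sum per-block match counts by file directory and sort descending."""
    totals = Counter()
    for directory, occurrences in block_hits:
        totals[directory] += occurrences
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# (file directory, keyword occurrences in one block of that file)
hits = [("file1_dir/", 4), ("file2_dir/", 20), ("file1_dir/", 6), ("file2_dir/", 10)]
ranking = rank_files(hits)
print(ranking)  # [('file2_dir/', 30), ('file1_dir/', 10)]
```

With totals of 10 and 30, as in the example above, the directory of file 2 is listed first.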
As can be seen from the above scheme, the file processing method provided in this embodiment performs a file query by querying the file block index information corresponding to each file. Because the scheme queries the file block index information of a file rather than the original file, file query efficiency can be improved. The present invention can therefore query various data files at low cost and with high efficiency.
In a further embodiment of the present invention, with reference to the flowchart in Fig. 8 of a file processing method for implementing file query, step 502 (using the file query information to query the file block index information of the file blocks corresponding to each file) may be implemented by the following processing:
Step 5021: based on the keyword entered by the user, perform parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
In this embodiment, when keyword matching queries are performed on the keyword index information of the file blocks corresponding to each file, the system allocates a number of processes equal to the number of file blocks in the file to carry out the query. For example, if a file contains 8 file blocks, 8 processes are allocated when the file is queried and the 8 tasks run in parallel. Each process queries the index information of one file block for the keyword entered by the user, and the query result for the whole file is finally obtained by aggregating the results of the individual processes.
By allocating a number of processes corresponding to the number of file blocks in a file and querying the file block index information in parallel, the scheme of this embodiment can further improve file query efficiency.
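A minimal sketch of the one-worker-per-block query follows. It makes several assumptions that are not in the text: each file block index is modelled as a plain dictionary from keyword to occurrence count, the names `count_in_block` and `query_file` are hypothetical, and a thread pool stands in for the processes the embodiment describes so that the example stays short and self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_block(block_index, keyword):
    """Look up one keyword in one block index (keyword -> occurrence count)."""
    return block_index.get(keyword, 0)


def query_file(block_indexes, keyword):
    """Query every block index of a file in parallel and sum the results."""
    if not block_indexes:
        return 0
    # One worker per file block, mirroring "8 blocks, 8 tasks" in the text.
    with ThreadPoolExecutor(max_workers=len(block_indexes)) as pool:
        per_block = pool.map(count_in_block, block_indexes,
                             [keyword] * len(block_indexes))
        # Aggregate the per-block results into a result for the whole file.
        return sum(per_block)


blocks = [{"loan": 2, "rate": 1}, {"rate": 4}, {"loan": 3}]
total = query_file(blocks, "loan")
print(total)  # 5
```

For CPU-bound matching over large indexes, a process pool would be closer to the described design; the aggregation step would stay the same.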
In a further embodiment of the present invention, with reference to the flowchart in Fig. 9 of a file processing method for implementing file query, the file processing method may further include the following step:
Step 504: upon receiving a file download request from the user for a file directory in the sorted file directory list, download the compressed file package from that file directory.
After the file directories have been output to the user in descending order, the user can select the required file directory in order to download the corresponding file in that directory.
Because the present invention, in order to save storage space, stores the compressed file package rather than the original file in the file directory, the content downloaded from the file directory upon receiving the user's download request is the compressed file package. The user can subsequently decompress the downloaded package to obtain the required file.
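The decompression that the user performs after downloading might look like the sketch below. The text does not name a package format, so ZIP is an assumption here, as are the function name `extract_package` and the file names; the usage part builds a throwaway package only so that the example can run on its own.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_package(package_path, target_dir):
    """Unpack a downloaded package and return the names of its members."""
    with zipfile.ZipFile(package_path) as package:
        package.extractall(target_dir)
        return package.namelist()


# Usage: create a small package, then unpack it as a user would after download.
with tempfile.TemporaryDirectory() as workdir:
    package_path = Path(workdir) / "report.zip"  # hypothetical file name
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("report.txt", "quarterly loan report")
    names = extract_package(package_path, Path(workdir) / "out")
    restored = (Path(workdir) / "out" / "report.txt").read_text()
```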
It should be noted that in embodiments of the present invention, the individual character that " keyword " includes being made of a word is crucial Word, and the keyword being made of more than one word.
In another embodiment of the present invention, a file processing apparatus for implementing file storage is provided. With reference to the structural diagram shown in Fig. 10, the file processing apparatus for implementing file storage includes:
First acquisition unit 101, configured to obtain a file to be processed;
File blocking unit 102, configured to perform content blocking on the file to be processed to obtain the file blocks and file blocking information;
Index establishing unit 103, configured to establish index information for each file block to obtain the file block index information of each file block;
Storage unit 104, configured to store a predetermined file directory, the file blocking information, and the file block index information of each file block in correspondence with one another; and to compress the file to be processed, store the resulting compressed file under the predetermined file directory, and delete the file to be processed.
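The compress, store, and delete behaviour of the storage unit can be sketched as follows. The text does not specify a compression format; gzip is assumed here purely for illustration, and the function name `compress_and_remove` and the `.gz` suffix are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_and_remove(source, directory):
    """Compress `source` into `directory`, then delete the original file."""
    source = Path(source)
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / (source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Delete the original only after the compressed copy is fully written.
    source.unlink()
    return target
```

Deleting the original last means that a failure during compression leaves the uploaded file in place rather than losing it.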
In one implementation of this embodiment, the first acquisition unit 101 is specifically configured to: obtain the file to be processed that a user uploads to a file server.
In one implementation of this embodiment, the file blocking unit 102 is specifically configured to: divide the file to be processed into a corresponding number of file blocks based on a predetermined data-volume threshold, where the data volume of each file block does not exceed the data-volume threshold.
In one implementation of this embodiment, the index establishing unit 103 is specifically configured to: perform word segmentation on each file block to obtain the keyword list corresponding to each file block; and establish a keyword index for each file block according to its keyword list.
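The blocking and indexing units described above can be sketched as follows. Everything here is an illustrative assumption rather than the patented implementation: the threshold is measured in characters, word segmentation is approximated with a regular expression (Chinese text would need a real word segmenter), and the names `split_into_blocks`, `build_keyword_index`, and `index_file` are hypothetical.

```python
import re
from collections import Counter


def split_into_blocks(text, threshold):
    """Split text into blocks of at most `threshold` characters each."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [text[i:i + threshold] for i in range(0, len(text), threshold)]


def build_keyword_index(block):
    """Segment one block into words and count each keyword's occurrences."""
    words = re.findall(r"\w+", block.lower())
    return Counter(words)


def index_file(text, threshold):
    """Return one keyword index per file block, in block order."""
    return [build_keyword_index(block)
            for block in split_into_blocks(text, threshold)]


indexes = index_file("loan rate loan", 9)
print(indexes)
```

Cutting at fixed offsets can split a word across two blocks; a fuller implementation would probably cut at word or line boundaries while still keeping every block within the threshold.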
This embodiment also discloses a file processing apparatus for implementing file query, corresponding to the above file processing apparatus for implementing file storage. With reference to the structural diagram shown in Fig. 11, the file processing apparatus for implementing file query includes:
Second acquisition unit 201, configured to obtain file query information entered by a user;
Query unit 202, configured to use the file query information to query the file block index information of the file blocks corresponding to each file, to obtain a file block index information query result;
Query result generation unit 203, configured to generate, according to the file block index information query result, a file query result that matches the file query information.
In one implementation of this embodiment, the second acquisition unit 201 is specifically configured to: obtain the keyword entered by the user for performing the file query.
In one implementation of this embodiment, the file block index information is a keyword index, and the query unit 202 is specifically configured to: perform a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file, to obtain match information between the file block index information of each file and the keyword, and use the match information as the file block index information query result.
In one implementation of this embodiment, the query unit 202 performing a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file specifically includes: based on the keyword entered by the user, performing parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
In one implementation of this embodiment, the query result generation unit 203 is specifically configured to: obtain the matching degree between each file and the keyword according to the match information between the file block index information of each file and the keyword; and, according to the matching degree between each file and the keyword, sort the file directories of the files in descending order of matching degree and output the sorted file directory list as the file query result.
In one implementation of this embodiment, with reference to Fig. 12, the file processing apparatus for implementing file query may further include: a file download unit 204, configured to download the compressed file package from the corresponding file directory upon receiving a file download request from the user for a file directory in the sorted file directory list.
Because the file processing apparatus for implementing file storage and query disclosed in Embodiment 3 of the present invention corresponds to the file processing methods for implementing file storage and query provided in the above embodiments, its description is relatively brief. For related and similar details, refer to the description of the file processing methods for file storage and query in the above embodiments; they are not repeated here.
In summary, compared with the prior art, the file processing method and apparatus provided by the present invention have the following advantages. The present invention stores files on the file server in compressed form rather than storing the original files, which saves user resources and reduces user cost, making it particularly suitable for small and medium-sized enterprises. The present invention places few requirements on its operating environment and is highly portable: it is applicable to Windows and Unix/Linux environments, and to Oracle/DB2/Sybase/MySQL databases. Thus, for small and medium-sized enterprises that need to store and query files, the present invention can reduce file processing cost, improve file processing efficiency, and be migrated readily between running environments.
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
For convenience of description, the above system or apparatus is described in terms of modules or units divided by function. Of course, when the present invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
Finally, it should also be noted that relational terms such as first, second, third, and fourth are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (20)

  1. A file processing method, characterized in that it is used to implement file storage, the method comprising:
    obtaining a file to be processed;
    performing content blocking on the file to be processed to obtain the file blocks and file blocking information;
    establishing index information for each file block to obtain the file block index information of each file block;
    storing a predetermined file directory, the file blocking information, and the file block index information of each file block in correspondence with one another; and
    compressing the file to be processed, storing the resulting compressed file under the predetermined file directory, and deleting the file to be processed.
  2. The method according to claim 1, characterized in that obtaining the file to be processed comprises:
    obtaining the file to be processed that a user uploads to a file server.
  3. The method according to claim 1, characterized in that performing content blocking on the file to be processed comprises:
    dividing the file to be processed into a corresponding number of file blocks based on a predetermined data-volume threshold, wherein the data volume of each file block does not exceed the data-volume threshold.
  4. The method according to claim 1, characterized in that establishing index information for each file block to obtain the file block index information of each file block comprises:
    performing word segmentation on each file block to obtain the keyword list corresponding to each file block; and
    establishing a keyword index for each file block according to the keyword list corresponding to that file block.
  5. A file processing method, characterized in that it is used to implement file query and is based on the method according to any one of claims 1-4, the file processing method for implementing file query comprising:
    obtaining file query information entered by a user;
    using the file query information to query the file block index information of the file blocks corresponding to each file, to obtain a file block index information query result; and
    generating, according to the file block index information query result, a file query result that matches the file query information.
  6. The method according to claim 5, characterized in that obtaining the file query information entered by the user comprises:
    obtaining the keyword entered by the user for performing the file query.
  7. The method according to claim 6, characterized in that the file block index information is a keyword index, and using the file query information to query the file block index information of the file blocks corresponding to each file, to obtain the file block index information query result, comprises:
    performing a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file, to obtain match information between the file block index information of each file and the keyword, and using the match information as the file block index information query result.
  8. The method according to claim 7, characterized in that performing a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file comprises:
    based on the keyword entered by the user, performing parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
  9. The method according to claim 7 or 8, characterized in that generating, according to the file block index information query result, the file query result that matches the file query information comprises:
    obtaining the matching degree between each file and the keyword according to the match information between the file block index information of each file and the keyword; and
    sorting the file directories of the files in descending order of matching degree according to the matching degree between each file and the keyword, and outputting the sorted file directory list as the file query result.
  10. The method according to claim 9, characterized in that it further comprises:
    upon receiving a file download request from the user for a file directory in the sorted file directory list, downloading the compressed file package from the corresponding file directory.
  11. A file processing apparatus, characterized in that it is used to implement file storage, the apparatus comprising:
    a first acquisition unit, configured to obtain a file to be processed;
    a file blocking unit, configured to perform content blocking on the file to be processed to obtain the file blocks and file blocking information;
    an index establishing unit, configured to establish index information for each file block to obtain the file block index information of each file block; and
    a storage unit, configured to store a predetermined file directory, the file blocking information, and the file block index information of each file block in correspondence with one another; and to compress the file to be processed, store the resulting compressed file under the predetermined file directory, and delete the file to be processed.
  12. The apparatus according to claim 11, characterized in that the first acquisition unit is specifically configured to:
    obtain the file to be processed that a user uploads to a file server.
  13. The apparatus according to claim 11, characterized in that the file blocking unit is specifically configured to:
    divide the file to be processed into a corresponding number of file blocks based on a predetermined data-volume threshold, wherein the data volume of each file block does not exceed the data-volume threshold.
  14. The apparatus according to claim 11, characterized in that the index establishing unit is specifically configured to:
    perform word segmentation on each file block to obtain the keyword list corresponding to each file block; and establish a keyword index for each file block according to the keyword list corresponding to that file block.
  15. A file processing apparatus, characterized in that it is used to implement file query and is based on the apparatus according to any one of claims 1-4, the file processing apparatus for implementing file query comprising:
    a second acquisition unit, configured to obtain file query information entered by a user;
    a query unit, configured to use the file query information to query the file block index information of the file blocks corresponding to each file, to obtain a file block index information query result; and
    a query result generation unit, configured to generate, according to the file block index information query result, a file query result that matches the file query information.
  16. The apparatus according to claim 15, characterized in that the second acquisition unit is specifically configured to:
    obtain the keyword entered by the user for performing the file query.
  17. The apparatus according to claim 16, characterized in that the file block index information is a keyword index, and the query unit is specifically configured to:
    perform a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file, to obtain match information between the file block index information of each file and the keyword, and use the match information as the file block index information query result.
  18. The apparatus according to claim 17, characterized in that the query unit performing a matching query for the keyword entered by the user in the keyword index of each file block corresponding to each file specifically comprises:
    based on the keyword entered by the user, performing parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
  19. The apparatus according to claim 17 or 18, characterized in that the query result generation unit is specifically configured to:
    obtain the matching degree between each file and the keyword according to the match information between the file block index information of each file and the keyword; and, according to the matching degree between each file and the keyword, sort the file directories of the files in descending order of matching degree and output the sorted file directory list as the file query result.
  20. The apparatus according to claim 19, characterized in that it further comprises:
    a file download unit, configured to download the compressed file package from the corresponding file directory upon receiving a file download request from the user for a file directory in the sorted file directory list.
CN201711306239.1A 2017-12-11 2017-12-11 A kind of document handling method and device Pending CN108038188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711306239.1A CN108038188A (en) 2017-12-11 2017-12-11 A kind of document handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711306239.1A CN108038188A (en) 2017-12-11 2017-12-11 A kind of document handling method and device

Publications (1)

Publication Number Publication Date
CN108038188A true CN108038188A (en) 2018-05-15

Family

ID=62101528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711306239.1A Pending CN108038188A (en) 2017-12-11 2017-12-11 A kind of document handling method and device

Country Status (1)

Country Link
CN (1) CN108038188A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193917A (en) * 2010-03-01 2011-09-21 中国移动通信集团公司 Method and device for processing and querying data
CN102184211A (en) * 2011-05-03 2011-09-14 成都市华为赛门铁克科技有限公司 File system, and method and device for retrieving, writing, modifying or deleting file
CN102915365A (en) * 2012-10-24 2013-02-06 苏州两江科技有限公司 Hadoop-based construction method for distributed search engine
US20150261783A1 (en) * 2013-01-07 2015-09-17 Tencent Technology (Shenzhen) Company Limited Method and apparatus for storing and reading files
CN104699815A (en) * 2015-03-24 2015-06-10 北京嘀嘀无限科技发展有限公司 Data processing method and system
CN106250409A (en) * 2016-07-21 2016-12-21 中国农业银行股份有限公司 Data query method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362578A (en) * 2019-07-12 2019-10-22 西南大学 A kind of computer information data quick reference system
CN111026827A (en) * 2019-12-06 2020-04-17 北京地拓科技发展有限公司 Data service method and device for soil erosion factors and electronic equipment
CN111309678A (en) * 2020-02-22 2020-06-19 呼和浩特市奥祥电力自动化有限公司 Data circulation storage method and network message recording and analyzing device
CN111309678B (en) * 2020-02-22 2023-01-03 呼和浩特市奥祥电力自动化有限公司 Data circular storage method and network message recording and analyzing device
CN112734982A (en) * 2021-01-15 2021-04-30 北京小马慧行科技有限公司 Storage method and system for unmanned vehicle driving behavior data
CN116263792A (en) * 2023-04-21 2023-06-16 云目未来科技(湖南)有限公司 Method and system for crawling complex internet data
CN116263792B (en) * 2023-04-21 2023-07-18 云目未来科技(湖南)有限公司 Method and system for crawling complex internet data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180515
