CN108038188A - A file processing method and device - Google Patents
- Publication number: CN108038188A
- Application number: CN201711306239.1A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Classifications
- G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F16/00: Information retrieval; database structures therefor; file system structures therefor > G06F16/10: File systems; file servers
  - G06F16/14: Details of searching files based on file metadata > G06F16/148: File search processing
  - G06F16/13: File access structures, e.g. distributed indices
Abstract
The invention provides a file processing method and device. When a file is stored, its content is divided into blocks and index information is built for each resulting file block. The file directory, the file block information and the file block index information are then stored in correspondence with one another. The file itself is kept only in compressed form, and the original is deleted. Later queries run against the file block index information of each file to find the required files. Because the scheme stores the file directory, block information and block index information together and keeps only a compressed copy of the file, it saves considerable storage space compared with the conventional approach of storing original files. Because queries run against the block index information rather than the original files, query efficiency also improves. The invention therefore lets all kinds of data files be stored and queried at low cost and with high efficiency.
Description
Technical field
The invention belongs to the technical field of data file storage and querying, and in particular relates to a file processing method and device.
Background art
In the current Internet and information age, large amounts of text information are generated, which in turn produces large numbers of data files. Word, Excel, TXT, HTML, Java, XML and CSS are the most basic and common formats.
These volumes of data files inevitably raise the question of how to store and query them. Commercial file storage and query services offered by companies such as Google and Baidu are currently the usual answer. For small and medium-sized enterprises, however, these services commonly suffer from high cost, large storage footprint and slow query and retrieval.
There is therefore an urgent need in the art for a better data file processing method. Such a method should meet the file storage and query needs of small and medium-sized enterprises and store and query all kinds of data files at low cost and with high efficiency.
Summary of the invention
In view of this, an object of the invention is to provide a file processing method and device that can store and query all kinds of data files at low cost and with high efficiency.
To this end, the invention discloses the following technical solutions:
A file processing method for implementing file storage, the method comprising:
obtaining a file to be processed;
dividing the content of the file to be processed into blocks, obtaining the file blocks and file block information;
building index information for each file block, obtaining the file block index information of each file block;
storing a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another;
compressing the file to be processed, storing the resulting compressed file under the predetermined file directory, and deleting the file to be processed.
In the above method, preferably, obtaining the file to be processed comprises:
obtaining the file to be processed that a user uploads to a file server.
In the above method, preferably, dividing the content of the file to be processed into blocks comprises:
dividing the file to be processed into a corresponding number of file blocks based on a predetermined data-quantity threshold, wherein the data quantity of each file block does not exceed the data-quantity threshold.
In the above method, preferably, building index information for each file block to obtain the file block index information of each file block comprises:
performing word segmentation on each file block to obtain a keyword list for each file block;
building a keyword index for each file block according to its keyword list.
A file processing method for implementing file querying, the method comprising:
obtaining file query information input by a user;
using the file query information to query the file block index information of the file blocks corresponding to each file, obtaining a file block index information query result;
generating, according to the file block index information query result, a file query result matching the file query information.
In the above method, preferably, obtaining the file query information input by the user comprises:
obtaining a keyword that the user inputs for the file query.
In the above method, preferably, the file block index information is a keyword index. Using the file query information to query the file block index information of the file blocks corresponding to each file, obtaining the file block index information query result, then comprises:
performing matching queries for the keyword input by the user in the keyword indexes of the file blocks corresponding to each file, obtaining match information between the file block index information of each file and the keyword, and taking the match information as the file block index information query result.
In the above method, preferably, performing matching queries for the keyword input by the user in the keyword indexes of the file blocks corresponding to each file comprises:
performing, based on the keyword input by the user, parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
In the above method, preferably, generating, according to the file block index information query result, the file query result matching the file query information comprises:
obtaining the degree of match between each file and the keyword according to the match information between the file block index information of each file and the keyword;
sorting the file directories of the files in descending order of match degree, and outputting the file directory ranking result as the file query result.
In the above method, preferably, the method further comprises:
upon receiving a file download request from the user for a corresponding file directory in the file directory ranking result, downloading the compressed file package from that file directory.
A file processing device for implementing file storage, the device comprising:
a first acquisition unit, configured to obtain a file to be processed;
a file blocking unit, configured to divide the content of the file to be processed into blocks, obtaining the file blocks and file block information;
an index building unit, configured to build index information for each file block, obtaining the file block index information of each file block;
a storage unit, configured to store a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another, to compress the file to be processed, to store the resulting compressed file under the predetermined file directory, and to delete the file to be processed.
In the above device, preferably, the first acquisition unit is specifically configured to:
obtain the file to be processed that a user uploads to a file server.
In the above device, preferably, the file blocking unit is specifically configured to:
divide the file to be processed into a corresponding number of file blocks based on a predetermined data-quantity threshold, wherein the data quantity of each file block does not exceed the data-quantity threshold.
In the above device, preferably, the index building unit is specifically configured to:
perform word segmentation on each file block to obtain a keyword list for each file block, and build a keyword index for each file block according to its keyword list.
A file processing device for implementing file querying, the device comprising:
a second acquisition unit, configured to obtain file query information input by a user;
a query unit, configured to use the file query information to query the file block index information of the file blocks corresponding to each file, obtaining a file block index information query result;
a query result generation unit, configured to generate, according to the file block index information query result, a file query result matching the file query information.
In the above device, preferably, the second acquisition unit is specifically configured to:
obtain a keyword that the user inputs for the file query.
In the above device, preferably, the file block index information is a keyword index, and the query unit is specifically configured to:
perform matching queries for the keyword input by the user in the keyword indexes of the file blocks corresponding to each file, obtain match information between the file block index information of each file and the keyword, and take the match information as the file block index information query result.
In the above device, preferably, when performing matching queries for the keyword input by the user in the keyword indexes of the file blocks corresponding to each file, the query unit specifically:
performs, based on the keyword input by the user, parallel keyword matching queries on the keyword indexes of the file blocks corresponding to each file.
In the above device, preferably, the query result generation unit is specifically configured to:
obtain the degree of match between each file and the keyword according to the match information between the file block index information of each file and the keyword; sort the file directories of the files in descending order of match degree; and output the file directory ranking result as the file query result.
In the above device, preferably, the device further comprises:
a file download unit, configured to download the compressed file package from the corresponding file directory upon receiving a file download request from the user for that directory in the file directory ranking result.
As the above solutions show, the file processing method and device provided by the invention store a file as follows. Its content is divided into blocks and index information is built for each resulting block. The file directory, file block information and file block index information are then stored in correspondence, while the file is stored in compressed form and the original is deleted. Subsequent queries run against the file block index information of each file to find the required files. Keeping only a compressed copy of the file, together with the stored directory, block information and block index information, effectively saves storage space compared with conventional storage of original files. Because queries run against each file's block index information rather than the original file, query efficiency also improves. The invention therefore lets all kinds of data files be stored and queried at low cost and with high efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for the description are briefly introduced below. The drawings described below show only embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the file processing method for implementing file storage provided by an embodiment of the invention;
Fig. 2 is a schematic diagram, provided by an embodiment of the invention, of a user uploading files to the file server;
Fig. 3 is a schematic diagram, provided by an embodiment of the invention, of dividing file content into blocks and building a keyword index;
Fig. 4 is a schematic diagram, provided by an embodiment of the invention, of storing the file directory, file block information and index information in correspondence in a database;
Fig. 5 is a flowchart of a file processing method for implementing file querying provided by an embodiment of the invention;
Fig. 6 is a schematic diagram, provided by an embodiment of the invention, of querying the file block index information of files according to keywords input by a user;
Fig. 7 is a schematic diagram, provided by an embodiment of the invention, of outputting query result information to a user terminal;
Fig. 8 is a flowchart of another file processing method for implementing file querying provided by an embodiment of the invention;
Fig. 9 is a flowchart of yet another file processing method for implementing file querying provided by an embodiment of the invention;
Fig. 10 is a structural diagram of the file processing device for implementing file storage provided by an embodiment of the invention;
Figs. 11 and 12 are structural diagrams of the file processing device for implementing file querying provided by an embodiment of the invention.
Embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Embodiments of the invention provide a file processing method and device. They aim to store and query all kinds of data files at low cost and with high efficiency, meeting the file storage and query needs of small and medium-sized enterprises. The method and device are described in detail below through several embodiments.
One embodiment of the invention provides a file processing method for implementing file storage. As shown in the flowchart of Fig. 1, the method of this embodiment comprises the following steps:
Step 101: obtain the file to be processed.
The file to be processed may be, but is not limited to, a file that a user uploads to the file server. For example, referring to Fig. 2, it may be a file that the user selects on his or her local computer and uploads to the file server. In practice, the user is not limited to selecting and uploading one file at a time. Selecting several files at once is generally supported, so that multiple files can be uploaded in one operation; for example, the user may be allowed to select and upload 10 files at a time.
The file server may support uploading and storing many types of files, for example Word, Excel, TXT, HTML, Java, XML, CSS and other file types whose content can be read line by line. When applying the solution of the invention, the user can therefore upload any of these file types to the file server as needed, and the file server processes the uploaded file through the steps of the method of the invention.
Step 102: divide the content of the file to be processed into blocks, obtaining the file blocks and file block information.
In this step, the file to be processed may be divided into a corresponding number of file blocks based on a predetermined data-quantity threshold. This yields the file blocks and the corresponding file block information, and the data quantity of each file block does not exceed the threshold.
The file block information may include, but is not limited to, the number of blocks in the file, the block numbers and the block sizes.
For example, suppose the data-quantity threshold is set to 64K (64 × 1024 bytes) and the file to be processed is a 500K Word file. Referring to the block-division schematic in Fig. 3, the threshold divides the Word file into 8 file blocks: blocks 1 to 7 are 64K each and block 8 is 52K (500K - 7 × 64K = 52K).
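The threshold-based division can be sketched in Python as follows. This is a minimal illustration only. It splits raw bytes, whereas a real implementation would presumably extract text from formats such as Word first and might avoid cutting a word in half; the text does not specify either point. The names `split_into_blocks`, `FileBlock` and `BLOCK_THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass

KB = 1024
BLOCK_THRESHOLD = 64 * KB  # data-quantity threshold from the example: 64K


@dataclass
class FileBlock:
    number: int   # 1-based file block number
    data: bytes


def split_into_blocks(content: bytes, threshold: int = BLOCK_THRESHOLD):
    """Split content into blocks of at most `threshold` bytes.

    Returns (blocks, block_info), where block_info mirrors the file block
    information described in the text: block count, block numbers, block sizes.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    blocks = [FileBlock(i // threshold + 1, content[i:i + threshold])
              for i in range(0, len(content), threshold)]
    block_info = {
        "block_count": len(blocks),
        "block_numbers": [b.number for b in blocks],
        "block_sizes": [len(b.data) for b in blocks],
    }
    return blocks, block_info
```

Applied to a 500K input, it reproduces the example: 8 blocks, seven of 64K and a final one of 52K.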
Step 103: build index information for each file block, obtaining the file block index information of each file block.
This step builds index information for each file block from its data content. Specifically, the file block index information of each block is built from the segmented words contained in that block's data content.
Because a block's index information is built from the words it contains, each file block must first be segmented into words. In practice, a file block generally contains several paragraphs separated by paragraph separators. Each paragraph in turn contains several sentences separated by punctuation marks (such as full stops, semicolons and exclamation marks). Segmentation therefore first cuts the file block into paragraphs at the paragraph separators, then cuts each paragraph into sentences at the punctuation marks, and finally cuts each sentence into words. This last stage is called word segmentation. Its purpose is to split the block content into a list of keywords that can be matched against the user's query conditions.
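Here is a minimal Python sketch of the first two cutting stages. It assumes that a line break is the paragraph separator and that the sentence-ending marks are the Chinese and ASCII full stop, semicolon, exclamation mark and question mark; the text names only some of these. `split_block` is a hypothetical name. Cutting sentences into words is the segmentation step described next.

```python
import re

# Assumed sentence delimiters: Chinese and ASCII full stop, semicolon,
# exclamation mark and question mark. Note that an ASCII "." inside a
# number such as 3.5 would also split here; a real splitter would be stricter.
SENTENCE_DELIMITERS = re.compile(r"[。；！？.;!?]")


def split_block(text: str) -> list[list[str]]:
    """Split a block into paragraphs (by line), then each paragraph into sentences."""
    paragraphs = [p.strip() for p in text.splitlines() if p.strip()]
    return [[s.strip() for s in SENTENCE_DELIMITERS.split(p) if s.strip()]
            for p in paragraphs]
```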
There are two word segmentation methods: matching segmentation and statistical segmentation. Matching segmentation compares the content of each sentence in the file block with the words in a preset dictionary containing a large vocabulary; a matching word is a hit. Statistical segmentation judges from probability whether two or more adjacent characters that occur together form a single word. The invention supports adding entries to the dictionary used for matching segmentation. For example, as everyday language changes, keywords such as "京东" (JD), "微信" (WeChat) and "洪荒之力" ("primordial power") can be added to the dictionary. Word segmentation yields the keyword list of the file block. The keyword index information of the block can then be built from how each keyword in the list is distributed within the block. Each record in a block's keyword index may contain the keyword, the keyword number, the number of occurrences, the file block number, the position within the file, and so on. According to statistics, the ratio of keyword occurrences to the total vocabulary of a file is usually 3% to 8%. Saving only the keyword information of each file block for later queries, instead of searching the whole file, therefore improves query efficiency.
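Forward maximum matching is one common way to implement dictionary-based matching segmentation. The patent does not say which matching algorithm it uses, so treat the following Python sketch as an assumption. `fmm_segment` greedily takes the longest dictionary word at each position. `build_keyword_index` turns the hits of one block into records with the fields listed above (keyword, keyword number, occurrences, block number, positions). Positions here are character offsets within the block, and the function and field names are hypothetical.

```python
from collections import defaultdict


def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    max_len = max([len(w) for w in dictionary] + [1])
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def build_keyword_index(block_no: int, block_text: str,
                        dictionary: set[str]) -> list[dict]:
    """Build one index record per keyword found in a file block."""
    positions = defaultdict(list)
    offset = 0  # character offset of the current word within the block
    for word in fmm_segment(block_text, dictionary):
        if word in dictionary:  # only dictionary hits count as keywords
            positions[word].append(offset)
        offset += len(word)
    return [
        {"keyword": kw, "keyword_no": n, "occurrences": len(pos),
         "block_no": block_no, "positions": pos}
        for n, (kw, pos) in enumerate(sorted(positions.items()), start=1)
    ]
```

Adding a new term is just `dictionary.add("洪荒之力")`. Because the maximum word length is recomputed on each call, longer entries are picked up automatically.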
Step 104: store the predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another.
The predetermined file directory may be the directory that the user sets for saving the file to be processed when selecting and uploading it.
In this embodiment, after receiving the file uploaded by the user, the file server may store it in a directory named in date format (year, month, day, hour, minute and second, e.g. 20170302100834). The server places the directory that the user set for saving the file in front of this date-format directory. The user-set directory and the server-provided date-format directory thus together form the final directory of the file. For example, if the user sets the directory wtq0952 and the server provides the date-format directory 20170302100834, the final directory of the file is wtq0952/20170302100834/.
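The directory naming rule can be expressed directly. This Python sketch assumes the timestamp is the upload time formatted as year, month, day, hour, minute and second (`%Y%m%d%H%M%S`). `final_directory` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath


def final_directory(user_dir: str, uploaded_at: datetime) -> str:
    """Combine the user's directory with a server-side yyyyMMddHHmmss directory."""
    stamp = uploaded_at.strftime("%Y%m%d%H%M%S")
    return str(PurePosixPath(user_dir) / stamp) + "/"
```

For the example above, `final_directory("wtq0952", datetime(2017, 3, 2, 10, 8, 34))` returns `"wtq0952/20170302100834/"`.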
Once the uploaded file has been divided into blocks and index information has been built for each resulting block, this step stores three things in correspondence with one another: the file directory, the file block information (number of blocks, block numbers, block sizes, etc.) and the index information of each file block (keyword, keyword number, number of occurrences, file block number, position within the file, etc.).
Specifically, referring to Fig. 4, the correspondence between the file directory, the file block information and the index information of each file block can be stored in a database on the file server. This supports subsequent file query processing.
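Fig. 4 is not reproduced here, so the following schema is only one plausible layout. It is sketched with Python's built-in `sqlite3` module: one table holds the file block information and one holds the per-block keyword index, both keyed by file directory. Table names, column names and the JSON encoding of positions are assumptions. The number of blocks in a file is derived by counting rows rather than stored separately.

```python
import json
import sqlite3

# Hypothetical schema: directory <-> block info <-> per-block keyword index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_block_info (
    directory   TEXT NOT NULL,
    block_no    INTEGER NOT NULL,
    block_size  INTEGER NOT NULL,
    PRIMARY KEY (directory, block_no)
);
CREATE TABLE IF NOT EXISTS block_keyword_index (
    directory   TEXT NOT NULL,
    block_no    INTEGER NOT NULL,
    keyword     TEXT NOT NULL,
    keyword_no  INTEGER NOT NULL,
    occurrences INTEGER NOT NULL,
    positions   TEXT NOT NULL,  -- JSON list of offsets within the block
    PRIMARY KEY (directory, block_no, keyword)
);
CREATE INDEX IF NOT EXISTS idx_keyword ON block_keyword_index (keyword);
"""


def store_file(conn: sqlite3.Connection, directory: str,
               block_sizes: list[int], index_records: list[dict]) -> None:
    """Store one file's block info and keyword index in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO file_block_info VALUES (?, ?, ?)",
            [(directory, no, size)
             for no, size in enumerate(block_sizes, start=1)])
        conn.executemany(
            "INSERT INTO block_keyword_index VALUES (?, ?, ?, ?, ?, ?)",
            [(directory, r["block_no"], r["keyword"], r["keyword_no"],
              r["occurrences"], json.dumps(r["positions"]))
             for r in index_records])
```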
Step 105: compress the file to be processed, store the resulting compressed file under the predetermined file directory, and delete the file to be processed.
Besides storing the correspondence between the file directory, the file block information and the file block index information, the invention also stores the file in compressed form. Conventional storage keeps the original file directly; the invention instead stores the file to be processed in compressed form.
Specifically, the invention compresses the file to be processed and stores the resulting compressed file in the file directory, namely the directory the user set when uploading the file. At the same time it deletes the original uploaded file from the file server. This significantly saves storage space on the file server and thereby reduces the cost of storing user files.
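A minimal Python sketch of step 105 using the standard `zipfile` module. The ZIP format, the `.zip` naming and the decision to verify the archive before deleting the original are assumptions. The text only says the file is compressed, stored under the directory and the original deleted. `compress_and_remove` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def compress_and_remove(original: str, target_dir: str) -> Path:
    """Zip `original` into `target_dir` and delete the uncompressed original."""
    src = Path(original)
    dest = Path(target_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / (src.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    # Re-open the archive and verify member CRCs before deleting anything.
    with zipfile.ZipFile(archive) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise OSError(f"corrupt member {bad!r} in {archive}")
    src.unlink()
    return archive
```

Verifying before deleting matters because, once the original is removed, the archive is the only copy of the file's full content.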
With the file processing method of this embodiment, storing a file proceeds as follows. Its content is divided into blocks and index information is built for each resulting block. The file directory, file block information and file block index information are then stored in correspondence, the file is stored in compressed form, and the original is deleted. Compared with the conventional practice of storing original files, this effectively saves storage space, so the solution of the invention allows all kinds of data files to be stored at low cost.
Another embodiment of the invention discloses a file processing method for implementing file querying, which corresponds to the file storage method of the previous embodiment. As shown in the flowchart of Fig. 5, the method of this embodiment may comprise the following steps:
Step 501: obtain file query information input by a user.
In this step, referring to Fig. 6, obtaining the file query information may specifically mean obtaining one or more keywords that the user inputs for the file query.
Step 502: use the file query information to query the file block index information of the file blocks corresponding to each file, obtaining a file block index information query result.
Once the user's file query information, such as keywords, has been obtained, the file server can be searched for files matching it.
In this embodiment, referring to Fig. 6, matching the user's keywords does not involve a full-text search of each file stored on the file server. Instead, the file block index information of the blocks of each file is queried. The keywords in each block's index information are matched against the user's keywords to obtain the corresponding file block index information query result.
When the file block index information is queried with the user's keywords, match information can be obtained from factors such as keyword match degree, keyword density and keyword position. This includes whether the file block contains the keyword, the number of times the keyword occurs in the file, the ratio of that number to the total vocabulary of the file, and the positions at which the keyword occurs in the file. This match information is taken as the file block index information query result.
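The per-block match information can be sketched as follows. This Python version assumes each block's index maps a keyword to its positions and records the block's total word count. It computes density per block as occurrences divided by that count, whereas the text describes the ratio relative to the whole file. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockIndex:
    block_no: int
    total_words: int                 # total number of words in the block
    keywords: dict[str, list[int]]   # keyword -> positions within the block


@dataclass
class BlockMatch:
    block_no: int
    contains: bool      # does the block contain the keyword?
    occurrences: int    # how many times it occurs in the block
    density: float      # occurrences / total_words
    positions: list[int]


def match_block(index: BlockIndex, keyword: str) -> BlockMatch:
    """Match one query keyword against one block's keyword index."""
    positions = index.keywords.get(keyword, [])
    density = len(positions) / index.total_words if index.total_words else 0.0
    return BlockMatch(index.block_no, bool(positions), len(positions),
                      density, list(positions))
```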
Step 503: according to the file block index information query result, generate a file query result matching the file query information.
Once the file block index information has been queried and the query result obtained, this step derives from that result the degree of match between each file and the keyword. It then collects the file directories of the files, sorts them in descending order of match degree, and outputs the file directory ranking result as the file query result.
In practice, the degree of match between a file and the user's keyword can be measured by the number of matches between the keywords in the file and the user's keyword, that is, the number of times the user's keyword occurs in the file. The larger the number of matches, the higher the match degree, and vice versa.
For example, if file 1 has 10 keyword matches and file 2 has 30, file 2 matches the user's keyword better than file 1. Once the match degree of each file is known, the file directories can be collected and sorted in descending order of match degree (also called reverse sorting). The file directory ranking result is then output to the user as the file query result. Referring to Fig. 7, the result can be output to various types of terminals, such as the user's local computer, a mobile phone or a tablet (PAD), so that the user can selectively download the required files based on the query results.
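Ranking by match count reduces to summing each file's per-block occurrences and sorting in descending order. In this Python sketch, files with zero matches are dropped from the result, which the text does not state and is an assumption. `rank_files` is a hypothetical name.

```python
def rank_files(block_counts: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Rank file directories by total keyword matches, highest first.

    block_counts maps a file directory to the per-block occurrence counts
    of the user's keyword.
    """
    totals = [(directory, sum(counts))
              for directory, counts in block_counts.items()]
    # Files with no match at all are left out of the ranking (an assumption).
    return sorted((t for t in totals if t[1] > 0),
                  key=lambda t: t[1], reverse=True)
```

With the counts from the example, `rank_files({"file1/": [4, 6], "file2/": [10, 20]})` returns `[("file2/", 30), ("file1/", 10)]`. Python's `sorted` stays stable with `reverse=True`, so files with equal counts keep their input order.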
As the above shows, the file processing method of this embodiment handles a query by querying the file block index information corresponding to each file to find the required files. Because it queries each file's block index information rather than the original file, query efficiency improves, and the invention can query all kinds of data files at low cost and with high efficiency.
In a further embodiment of the present invention, Figure 8 shows a flowchart of a document handling method for file query. In this embodiment, step 202 (querying the file block index information of the file blocks corresponding to each file by using the file query information) can be implemented as follows:
Step 5021: based on the keyword entered by the user, perform parallel keyword-matching queries on the keyword indexes of the file blocks corresponding to each file.
In this embodiment, when the keyword indexes of a file's blocks are queried for the user-entered keyword, the system allocates a number of processes equal to the number of file blocks in the file, and those processes carry out the query. For example, if a file contains 8 file blocks, 8 processes are allocated when the file is queried and the 8 tasks run in parallel; each process queries the index information of one file block for the user-entered keyword, and the query result for the whole file is obtained by collecting the results of all the processes.
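The one-task-per-block query can be sketched as follows. The embodiment allocates one process per file block; this sketch swaps in a thread pool from Python's `concurrent.futures` for simplicity (a `ProcessPoolExecutor` follows the same pattern but needs picklable, module-level functions). The dictionary index layout and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def query_block(block_index: dict[str, int], keyword: str) -> int:
    """Match one block's keyword index against the keyword (one task per block)."""
    return block_index.get(keyword, 0)


def parallel_query(block_indexes: list[dict[str, int]], keyword: str) -> list[int]:
    """Query every block index of one file in parallel, one worker per block."""
    if not block_indexes:
        return []
    # One worker per file block, mirroring "8 blocks -> 8 processes".
    with ThreadPoolExecutor(max_workers=len(block_indexes)) as pool:
        # map() yields results in input order, so each count lines up with its block.
        return list(pool.map(query_block, block_indexes, [keyword] * len(block_indexes)))
```

Summing the per-block counts gives the file-level result that the collecting step assembles, for example `sum(parallel_query(indexes, "invoice"))`.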
By allocating as many processes as the file has blocks and querying each block's index information in parallel, this embodiment can further improve file query efficiency.
In a further embodiment of the present invention, Figure 9 shows a flowchart of a document handling method for file query. In this embodiment, the method may further include the following step:
Step 504: upon receiving a user's file download request for a corresponding file directory in the sorted file directory results, download the compressed file package from that file directory.
After the file directories are output to the user in sorted order, the user can select the required file directory and download the corresponding file in that directory.
Because the invention stores a compressed file package, rather than the original file, in the file directory to reduce storage usage, the content downloaded from the file directory in response to such a request is the compressed package. The user then decompresses the downloaded package to obtain the required file.
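On the user side, recovering the original file is an ordinary decompression. A minimal sketch, assuming the stored package is a ZIP archive (the text only says "compressed file package") and that the download is a plain read of the package bytes; both function names are hypothetical.

```python
import io
import zipfile
from pathlib import Path


def download_package(file_directory: Path, package_name: str) -> bytes:
    """Server side: return the compressed package stored under the file directory."""
    return (file_directory / package_name).read_bytes()


def extract_package(package_bytes: bytes) -> dict[str, bytes]:
    """Client side: decompress a downloaded package into {member name: content}."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```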
It should be noted that in the embodiments of the present invention, a "keyword" includes both single-character keywords, consisting of one character, and keywords consisting of more than one character.
In another embodiment of the present invention, a document handling apparatus for file storage is provided. Referring to the structural diagram in Figure 10, the apparatus includes:
First acquisition unit 101, configured to acquire a pending file;
File blocking unit 102, configured to divide the content of the pending file into blocks to obtain the file blocks and the file block information;
Index establishing unit 103, configured to establish index information for each file block to obtain the file block index information of each file block;
Storage unit 104, configured to store a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another; and to compress the pending file, store the resulting compressed file under the predetermined file directory, and delete the pending file.
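The storage unit's three duties (record the index data against the directory, compress, delete the original) might look like this in Python. This is a sketch under stated assumptions: a JSON side file stands in for the database record, ZIP is used as the compression format, and `store_file` and the `.index.json`/`.zip` naming are hypothetical.

```python
import json
import zipfile
from pathlib import Path


def store_file(pending: Path, file_directory: Path,
               block_info: list[dict], block_indexes: list[dict]) -> Path:
    """Store index data and a compressed copy of `pending`, then delete the original."""
    file_directory.mkdir(parents=True, exist_ok=True)

    # 1. Store the directory, the block information and the per-block indexes together.
    #    A JSON side file stands in for the database record (hypothetical layout).
    record = {
        "file_directory": str(file_directory),
        "block_info": block_info,
        "block_indexes": block_indexes,
    }
    index_path = file_directory / (pending.name + ".index.json")
    index_path.write_text(json.dumps(record), encoding="utf-8")

    # 2. Compress the pending file into a ZIP package under the same directory.
    package = file_directory / (pending.name + ".zip")
    with zipfile.ZipFile(package, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(pending, arcname=pending.name)

    # 3. Delete the original so only the compressed copy occupies storage.
    pending.unlink()
    return package
```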
In one implementation of this embodiment, the first acquisition unit 101 is specifically configured to acquire the pending file uploaded by the user to the file server.
In one implementation of this embodiment, the file blocking unit 102 is specifically configured to divide the pending file into a corresponding number of file blocks based on a predetermined data-volume threshold, where the data volume of each file block does not exceed the data-volume threshold.
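Splitting by a data-volume threshold amounts to fixed-size chunking. A minimal sketch, assuming the threshold is measured in bytes and that the file block information records each block's number, offset and size (the text does not specify its contents):

```python
def split_into_blocks(data: bytes, threshold: int) -> tuple[list[bytes], list[dict[str, int]]]:
    """Cut `data` into blocks of at most `threshold` bytes, plus per-block info."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    blocks: list[bytes] = []
    info: list[dict[str, int]] = []
    # Number of blocks is ceil(len(data) / threshold); only the last block may be short.
    for number, offset in enumerate(range(0, len(data), threshold)):
        block = data[offset:offset + threshold]
        blocks.append(block)
        info.append({"block": number, "offset": offset, "size": len(block)})
    return blocks, info
```

A byte-based cut can fall in the middle of a word or a multi-byte character; a production version might move each cut point back to the nearest line break so that no block exceeds the threshold and no word is split.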
In one implementation of this embodiment, the index establishing unit 103 is specifically configured to perform word segmentation on each file block to obtain a keyword list for each file block, and to build a keyword index for each file block from its keyword list.
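Word segmentation followed by indexing can be sketched as below. Assumption: words are taken as runs of word characters and lowercased, which suits space-delimited text only; Chinese text would need a dedicated word segmenter, because `\w+` matches a whole run of CJK characters as a single token. The index maps each keyword in a block to its occurrence count so that per-block match counts are available at query time; this layout and the function names are hypothetical.

```python
import re
from collections import Counter


def cut_words(text: str) -> list[str]:
    """Naive segmentation: runs of word characters, lowercased (assumption)."""
    return [word.lower() for word in re.findall(r"\w+", text)]


def build_keyword_index(block_text: str) -> dict[str, int]:
    """Keyword index for one file block: keyword -> occurrences in the block."""
    return dict(Counter(cut_words(block_text)))
```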
This embodiment also discloses a document handling apparatus for file query, corresponding to the above apparatus for file storage. Referring to the structural diagram in Figure 11, the apparatus for file query includes:
Second acquisition unit 201, configured to acquire the file query information entered by the user;
Query unit 202, configured to use the file query information to query the file block index information of the file blocks corresponding to each file and obtain a file block index query result;
Query result generation unit 203, configured to generate, from the file block index query result, a file query result that matches the file query information.
In one implementation of this embodiment, the second acquisition unit 201 is specifically configured to acquire the keyword entered by the user for the file query.
In one implementation of this embodiment, the file block index information is a keyword index, and the query unit 202 is specifically configured to match the user-entered keyword against the keyword index of each file block corresponding to each file, obtain match information between each file's block index information and the keyword, and use the match information as the file block index query result.
In one implementation of this embodiment, when matching the user-entered keyword against the keyword indexes of the file blocks corresponding to each file, the query unit 202 specifically performs parallel keyword-matching queries on those keyword indexes based on the user-entered keyword.
In one implementation of this embodiment, the query result generation unit 203 is specifically configured to obtain the matching degree between each file and the keyword from the match information between the file's block index information and the keyword; sort the file directories of the files in descending order of matching degree; and output the sorted file directories as the file query result.
In one implementation of this embodiment, referring to Figure 12, the document handling apparatus for file query may further include a file download unit 204, configured to download the compressed file package from the corresponding file directory upon receiving a user's file download request for that directory in the sorted file directory results.
Because the document handling apparatuses for file storage and query disclosed in Embodiment 3 of the present invention correspond to the document handling methods for file storage and query provided in the embodiments above, they are described only briefly; for related details, refer to the description of those methods above.
In conclusion compared with the prior art, document handling method and device provided by the invention have the advantage that:This
Invention is compressed file on file server and stores and non-memory original document, saves user resources, reduces use
Family cost, is especially suitable for medium-sized and small enterprises.The present invention is very low to applicable environmental requirement, and migration is good, Windows or Unix/
Linux environment can be applicable in, and oracle/DB2/Sybase/MySql databases also can be applicable in.So as to for needing to carry out
For the medium-sized and small enterprises of processing such as file storage, inquiry, the present invention can save file process cost, improve file process effect
Rate, has running environment good migration.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for parts that are the same or similar, the embodiments may be referred to one another.
For convenience of description, the above system or apparatus is described as divided into various modules or units by function. Of course, when the invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the description of the above embodiments, those skilled in the art will clearly understand that the invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the invention or in parts of them.
Finally, it should be noted that relational terms such as first, second, third and fourth are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.
Claims (20)
- 1. A document handling method for file storage, the method comprising: acquiring a pending file; dividing the content of the pending file into blocks to obtain file blocks and file block information; establishing index information for each file block to obtain the file block index information of each file block; storing a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another; and compressing the pending file, storing the resulting compressed file under the predetermined file directory, and deleting the pending file.
- 2. The method according to claim 1, wherein acquiring the pending file comprises: acquiring the pending file uploaded by a user to a file server.
- 3. The method according to claim 1, wherein dividing the content of the pending file into blocks comprises: dividing the pending file into a corresponding number of file blocks based on a predetermined data-volume threshold, wherein the data volume of each file block does not exceed the data-volume threshold.
- 4. The method according to claim 1, wherein establishing index information for each file block to obtain the file block index information of each file block comprises: performing word segmentation on each file block to obtain a keyword list for each file block; and establishing a keyword index for each file block according to its keyword list.
- 5. A document handling method for file query, based on the method according to any one of claims 1-4, the document handling method for file query comprising: acquiring file query information entered by a user; querying, by using the file query information, the file block index information of the file blocks corresponding to each file to obtain a file block index query result; and generating, according to the file block index query result, a file query result matching the file query information.
- 6. The method according to claim 5, wherein acquiring the file query information entered by the user comprises: acquiring a keyword entered by the user for the file query.
- 7. The method according to claim 6, wherein the file block index information is a keyword index, and querying, by using the file query information, the file block index information of the file blocks corresponding to each file to obtain the file block index query result comprises: matching the user-entered keyword against the keyword index of each file block corresponding to each file to obtain match information between each file's block index information and the keyword, and using the match information as the file block index query result.
- 8. The method according to claim 7, wherein matching the user-entered keyword against the keyword index of each file block corresponding to each file comprises: performing, based on the user-entered keyword, parallel keyword-matching queries on the keyword indexes of the file blocks corresponding to each file.
- 9. The method according to claim 7 or 8, wherein generating, according to the file block index query result, the file query result matching the file query information comprises: obtaining the matching degree between each file and the keyword according to the match information between the file's block index information and the keyword; and sorting the file directories of the files in descending order of matching degree, and outputting the sorted file directories as the file query result.
- 10. The method according to claim 9, further comprising: upon receiving a user's file download request for a corresponding file directory in the sorted file directory results, downloading the compressed file package from the corresponding file directory.
- 11. A document handling apparatus for file storage, the apparatus comprising: a first acquisition unit, configured to acquire a pending file; a file blocking unit, configured to divide the content of the pending file into blocks to obtain file blocks and file block information; an index establishing unit, configured to establish index information for each file block to obtain the file block index information of each file block; and a storage unit, configured to store a predetermined file directory, the file block information and the file block index information of each file block in correspondence with one another, and to compress the pending file, store the resulting compressed file under the predetermined file directory, and delete the pending file.
- 12. The apparatus according to claim 11, wherein the first acquisition unit is specifically configured to acquire the pending file uploaded by a user to a file server.
- 13. The apparatus according to claim 11, wherein the file blocking unit is specifically configured to divide the pending file into a corresponding number of file blocks based on a predetermined data-volume threshold, wherein the data volume of each file block does not exceed the data-volume threshold.
- 14. The apparatus according to claim 11, wherein the index establishing unit is specifically configured to perform word segmentation on each file block to obtain a keyword list for each file block, and to establish a keyword index for each file block according to its keyword list.
- 15. A document handling apparatus for file query, based on the apparatus according to any one of claims 1-4, the document handling apparatus for file query comprising: a second acquisition unit, configured to acquire file query information entered by a user; a query unit, configured to query, by using the file query information, the file block index information of the file blocks corresponding to each file to obtain a file block index query result; and a query result generation unit, configured to generate, according to the file block index query result, a file query result matching the file query information.
- 16. The apparatus according to claim 15, wherein the second acquisition unit is specifically configured to acquire a keyword entered by the user for the file query.
- 17. The apparatus according to claim 16, wherein the file block index information is a keyword index, and the query unit is specifically configured to match the user-entered keyword against the keyword index of each file block corresponding to each file, obtain match information between each file's block index information and the keyword, and use the match information as the file block index query result.
- 18. The apparatus according to claim 17, wherein, in matching the user-entered keyword against the keyword index of each file block corresponding to each file, the query unit specifically performs, based on the user-entered keyword, parallel keyword-matching queries on the keyword indexes of the file blocks corresponding to each file.
- 19. The apparatus according to claim 17 or 18, wherein the query result generation unit is specifically configured to obtain the matching degree between each file and the keyword according to the match information between the file's block index information and the keyword, sort the file directories of the files in descending order of matching degree, and output the sorted file directories as the file query result.
- 20. The apparatus according to claim 19, further comprising: a file download unit, configured to download the compressed file package from the corresponding file directory upon receiving a user's file download request for a corresponding file directory in the sorted file directory results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711306239.1A CN108038188A (en) | 2017-12-11 | 2017-12-11 | A kind of document handling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711306239.1A CN108038188A (en) | 2017-12-11 | 2017-12-11 | A kind of document handling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108038188A true CN108038188A (en) | 2018-05-15 |
Family ID: 62101528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711306239.1A Pending CN108038188A (en) | 2017-12-11 | 2017-12-11 | A kind of document handling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038188A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184211A (en) * | 2011-05-03 | 2011-09-14 | 成都市华为赛门铁克科技有限公司 | File system, and method and device for retrieving, writing, modifying or deleting file |
CN102193917A (en) * | 2010-03-01 | 2011-09-21 | 中国移动通信集团公司 | Method and device for processing and querying data |
CN102915365A (en) * | 2012-10-24 | 2013-02-06 | 苏州两江科技有限公司 | Hadoop-based construction method for distributed search engine |
CN104699815A (en) * | 2015-03-24 | 2015-06-10 | 北京嘀嘀无限科技发展有限公司 | Data processing method and system |
US20150261783A1 (en) * | 2013-01-07 | 2015-09-17 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for storing and reading files |
CN106250409A (en) * | 2016-07-21 | 2016-12-21 | 中国农业银行股份有限公司 | Data query method and device |
Timeline: 2017-12-11, application CN201711306239.1A filed in CN (publication CN108038188A, active, Pending)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110362578A (en) * | 2019-07-12 | 2019-10-22 | 西南大学 | A kind of computer information data quick reference system |
CN111026827A (en) * | 2019-12-06 | 2020-04-17 | 北京地拓科技发展有限公司 | Data service method and device for soil erosion factors and electronic equipment |
CN111309678A (en) * | 2020-02-22 | 2020-06-19 | 呼和浩特市奥祥电力自动化有限公司 | Data circulation storage method and network message recording and analyzing device |
CN111309678B (en) * | 2020-02-22 | 2023-01-03 | 呼和浩特市奥祥电力自动化有限公司 | Data circular storage method and network message recording and analyzing device |
CN112734982A (en) * | 2021-01-15 | 2021-04-30 | 北京小马慧行科技有限公司 | Storage method and system for unmanned vehicle driving behavior data |
CN116263792A (en) * | 2023-04-21 | 2023-06-16 | 云目未来科技(湖南)有限公司 | Method and system for crawling complex internet data |
CN116263792B (en) * | 2023-04-21 | 2023-07-18 | 云目未来科技(湖南)有限公司 | Method and system for crawling complex internet data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108038188A (en) | A kind of document handling method and device | |
US6691123B1 (en) | Method for structuring and searching information | |
Spiliopoulou et al. | A data miner analyzing the navigational behaviour of web users | |
CN102906751B (en) | A kind of method of data storage, data query and device | |
US8266147B2 (en) | Methods and systems for database organization | |
CN100456298C (en) | Advertisement information retrieval system and method therefor | |
CN102929901B (en) | The method and apparatus improving data warehouse performance | |
US8171029B2 (en) | Automatic generation of ontologies using word affinities | |
CN102667761A (en) | Scalable cluster database | |
CN105843841A (en) | Small file storage method and system | |
CN102375853A (en) | Distributed database system, method for building index therein and query method | |
CN108509437A (en) | A kind of ElasticSearch inquiries accelerated method | |
CN112269816B (en) | Government affair appointment correlation retrieval method | |
CN108701134A (en) | The searching method and device of the archiving method and device of database, the database of archive | |
US7089233B2 (en) | Method and system for searching for web content | |
JP2003076715A (en) | Method and system for retrieving web pages, program and recording medium | |
CN107644050A (en) | A kind of querying method and device of the Hbase based on solr | |
CN108984626B (en) | Data processing method and device and server | |
CN108681577A (en) | A kind of novel library structure data index method | |
CN108009290A (en) | A kind of data modeling and storage method of track traffic command centre gauze big data | |
CN102467544B (en) | Information smart searching method and system based on space fuzzy coding | |
CN101944116A (en) | Complex multi-dimensional hierarchical connection and aggregation method for data warehouse | |
CN104834739A (en) | Internet information storage system | |
CN106326280A (en) | Data processing method, apparatus and system | |
CN112800083B (en) | Government decision-oriented government affair big data analysis method and equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180515 |