CN107798132A - Distributed file format conversion and processing method, system, and computer-readable storage medium
- Publication number
- CN107798132A (application CN201711159328.8A)
- Authority
- CN
- China
- Prior art keywords
- queue
- service
- version
- file
- thumbnail
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
- G06F16/1794—Details of file format conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a distributed file format conversion and processing method. A file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue. A unified scheduling service takes the task number of each queue entry modulo the number of services corresponding to that queue and assigns each service its pending tasks according to the result. Each conversion service extracts its assigned tasks in batches, converts the files, stores them in the repository, and triggers generation of the thumbnail queue. Each thumbnail service extracts its assigned tasks in batches, processes them, and stores the results in the repository. Each content extraction service extracts its assigned tasks in batches, processes them, stores the results in the repository, and triggers generation of the index queue. Each index creation service extracts its assigned tasks in batches, creates the indexes, and stores them in the index database. The present invention also provides a distributed file format conversion and processing system and a computer-readable storage medium, improving the efficiency of batch file conversion and processing.
Description
Technical field
The present invention relates to file format conversion and processing technology, and in particular to a distributed file format conversion and processing method, system, and computer-readable storage medium.
Background
PDF (short for Portable Document Format) is a file format that is independent of the operating system platform: a PDF file can be used on Windows, on Unix, and on Apple's Mac OS alike. This property makes it a well-suited document format for distributing electronic documents and disseminating digital information over the Internet. As a result, more and more e-books, product descriptions, company announcements, online materials, and e-mails use PDF. PDF has also become an international standard format for archiving and preserving files and supports permanent long-term preservation; OFD is another format that currently supports permanent long-term preservation.
Converting the format of files to be stored requires file format conversion and content processing. Existing conversion and content processing services are each developed for specific software and operating systems, and their conversion and content processing efficiency is low. In particular, when converting and processing large batches of electronic files, they can only process the files one after another; they cannot convert and process files in parallel, and they cannot operate as an overall pipeline.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a distributed file format conversion and processing method that improves the efficiency of batch file conversion and content processing.
The first technical problem is solved as follows: a distributed file format conversion and processing method, comprising the following steps:
Step 1, by least one file upload services by pending file store into thesaurus, and generate turn version team
Row, thumbnail queue and extraction queue;
Step 2, by a unified dispatch service mission number of each queue number that services corresponding with the queue is entered
Row modulus, the pending task serviced according to corresponding to modulus value distribution is each;Each queue includes mission number and task
Time, the mission number represent integer sequence file, and the task time represents the sequencing of tasks carrying;
Step 3, to file turn version by a plurality of turns of version services and operate, it is each described to turn version service and adjusted according to unified
Modulus value that degree service calculates and turn the task time batch extracting version task to be turned of version queue and carry out turning version, and after version being turned
In file deposit thesaurus, while trigger generation thumbnail queue;
Step 4, thumbnail processing, each thumbnail service basis are carried out to file by a plurality of thumbnail services
The modulus value and the task time batch extracting of thumbnail queue that United Dispatching service calculates treat that thumbnail task is handled, and
Result is stored in thesaurus;
Step 5, extraction processing, each content extraction service root are carried out to file by a plurality of content extraction services
The modulus value and the task time batch extracting of extraction queue calculated according to United Dispatching service extracts task and carries out file content
Extraction is handled, and the content after processing is stored in into thesaurus, and triggers generation index queue;
Step 6, establishment, each establishment index service root are indexed to file by a plurality of establishment index services
The modulus value and the task time batch extracting index task of index queue calculated according to United Dispatching service carries out creating index behaviour
Make, and result is stored in index database;
The step 3 and step 5 are carried out in no particular order;The step 4 and step 6 are carried out in no particular order.
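The queue dependencies in steps 1 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the queue names, the tuple layout `(task_number, task_time)`, and the functions `upload` and `complete` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical queue names; the patent names the queues but gives no identifiers.
CONVERSION, THUMBNAIL, EXTRACTION, INDEX = "conversion", "thumbnail", "extraction", "index"

# Queues generated when a file is uploaded (step 1).
ON_UPLOAD = [CONVERSION, THUMBNAIL, EXTRACTION]

# Queue whose generation is triggered when a task from the key queue finishes
# (steps 3 and 5). Thumbnail and index tasks trigger nothing further.
TRIGGERS = {CONVERSION: THUMBNAIL, EXTRACTION: INDEX}


def upload(queues, task_number, task_time):
    """Step 1: enqueue the uploaded file on the three initial queues."""
    for name in ON_UPLOAD:
        queues[name].append((task_number, task_time))


def complete(queues, queue_name, task):
    """Remove a finished task and enqueue its follow-up task, if any."""
    queues[queue_name].remove(task)
    follow_up = TRIGGERS.get(queue_name)
    if follow_up is not None:
        queues[follow_up].append(task)


queues = defaultdict(list)
upload(queues, task_number=7, task_time=100)
complete(queues, EXTRACTION, (7, 100))  # extraction done, so an index task appears
```

Because the conversion and extraction branches share no queue, nothing in this sketch forces one branch to wait for the other, which matches the statement that steps 3 and 5 run in no particular order.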
Further, the queues in step 2 include the conversion queue, the thumbnail queue, the extraction queue, and the index queue, and the services corresponding to these queues are, respectively, the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user adjusts the file processing order by changing the task time in a queue.
The second technical problem to be solved by the present invention is to provide a computer-readable storage medium that improves the efficiency of batch file conversion and content processing.
The second technical problem is solved as follows: a computer-readable storage medium on which a computer program is stored, the program implementing the following steps when executed by a processor:
Step 1, by least one file upload services by pending file store into thesaurus, and generate turn version team
Row, thumbnail queue and extraction queue;
Step 2, by a unified dispatch service mission number of each queue number that services corresponding with the queue is entered
Row modulus, the pending task serviced according to corresponding to modulus value distribution is each;Each queue includes mission number and task
Time, the mission number represent integer sequence file, and the task time represents the sequencing of tasks carrying;
Step 3, to file turn version by a plurality of turns of version services and operate, it is each described to turn version service and adjusted according to unified
Modulus value that degree service calculates and turn the task time batch extracting version task to be turned of version queue and carry out turning version, and after version being turned
In file deposit thesaurus, while trigger generation thumbnail queue;
Step 4, thumbnail processing, each thumbnail service basis are carried out to file by a plurality of thumbnail services
The modulus value and the task time batch extracting of thumbnail queue that United Dispatching service calculates treat that thumbnail task is handled, and
Result is stored in thesaurus;
Step 5, extraction processing, each content extraction service root are carried out to file by a plurality of content extraction services
The modulus value and the task time batch extracting of extraction queue calculated according to United Dispatching service extracts task and carries out file content
Extraction is handled, and the content after processing is stored in into thesaurus, and triggers generation index queue;
Step 6, establishment, each establishment index service root are indexed to file by a plurality of establishment index services
The modulus value and the task time batch extracting index task of index queue calculated according to United Dispatching service carries out creating index behaviour
Make, and result is stored in index database;
The step 3 and step 5 are carried out in no particular order;The step 4 and step 6 are carried out in no particular order.
Further, the queues in step 2 include the conversion queue, the thumbnail queue, the extraction queue, and the index queue, and the services corresponding to these queues are, respectively, the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user adjusts the file processing order by changing the task time in a queue.
The third technical problem to be solved by the present invention is to provide a distributed file format conversion and processing system that improves the efficiency of batch file conversion and content processing.
The third technical problem is solved as follows: a distributed file format conversion and processing system, comprising a file upload module, a scheduling module, a file conversion module, a content extraction module, a thumbnail module, and an index creation module;
The file upload module stores the files to be processed in a repository through at least one file upload service and generates a conversion queue, a thumbnail queue, and an extraction queue;
The scheduling module, through a unified scheduling service, takes the task number of each queue entry modulo the number of services corresponding to that queue and assigns each service its pending tasks according to the result; each queue stores a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed;
The file conversion module converts the files through a plurality of conversion services; each conversion service extracts, in batches, the pending conversion tasks from the conversion queue according to the modulo value calculated by the unified scheduling service and the task time, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail module generates thumbnails of the files through a plurality of thumbnail services; each thumbnail service extracts, in batches, the pending thumbnail tasks from the thumbnail queue according to the modulo value calculated by the unified scheduling service and the task time, processes them, and stores the results in the repository;
The content extraction module extracts the content of the files through a plurality of content extraction services; each content extraction service extracts, in batches, the extraction tasks from the extraction queue according to the modulo value calculated by the unified scheduling service and the task time, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
The index creation module creates indexes for the files through a plurality of index creation services; each index creation service extracts, in batches, the index tasks from the index queue according to the modulo value calculated by the unified scheduling service and the task time, creates the indexes, and stores the results in the index database;
The file conversion module and the content extraction module execute in no particular order; the thumbnail module and the index creation module execute in no particular order.
Further, the queues in the scheduling module include the conversion queue, the thumbnail queue, the extraction queue, and the index queue, and the services corresponding to these queues are, respectively, the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user adjusts the file processing order by changing the task time in a queue.
The present invention has the following advantages:
1. The unified scheduling service schedules the four conversion and processing service types and their four queues independently, achieving overall pipelined operation: steps that previously had to run in sequence now run in parallel, greatly improving the efficiency of batch file format conversion and content processing;
2. The order of each queue can be adjusted as needed, so the execution order can be customized;
3. Each service type can be scaled horizontally and dynamically as needed, giving high flexibility.
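The horizontal scaling in advantage 3 can be illustrated by counting how many tasks each service instance receives under modulo assignment. This is a sketch only; the function name `load_per_service` is hypothetical, and it assumes task numbers are consecutive integers, which the source implies ("integer sequence") but does not state outright.

```python
from collections import Counter


def load_per_service(task_numbers, service_count):
    """Count the tasks each service receives when a task goes to the
    service whose index equals task_number % service_count."""
    return dict(Counter(n % service_count for n in task_numbers))


tasks = range(12)                      # twelve consecutive task numbers
three = load_per_service(tasks, 3)     # three instances: four tasks each
four = load_per_service(tasks, 4)      # add an instance: three tasks each
```

With consecutive task numbers the load stays even for any instance count; with gaps in the numbering it would only be approximately even.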
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a block diagram of the distributed electronic file format conversion and processing of the present invention.
Fig. 3 is a schematic diagram of the scheduling principle of the conversion services of the present invention.
Detailed description
Referring to Fig. 1 and Fig. 2, a distributed file format conversion and processing method of the present invention comprises the following steps:
Step 1, by least one file upload services by pending file store into thesaurus, and generate turn version team
Row, thumbnail queue and extraction queue;
Step 2, by a unified dispatch service mission number of each queue number that services corresponding with the queue is entered
Row modulus, the pending task serviced according to corresponding to modulus value distribution is each;Each queue includes mission number and task
Time, the mission number represent integer sequence file, and the task time represents the sequencing of tasks carrying;Wherein, team
Row include turning version queue, thumbnail queue, extract queue and index queue, and described turn of version queue, thumbnail queue, extraction team
Service corresponding to row and index queue, which is followed successively by, to be turned version service, thumbnail service, content extraction service and creates index service, institute
State turn version service, thumbnail service, content extraction service and create index service exist respectively with kit form, its number according to
Need to be increased and decreased, user can be adjusted permitted file processing sequence by changing the task time in queue.
Step 3, to file turn version by a plurality of turns of version services and operate, it is each described to turn version service and adjusted according to unified
Modulus value that degree service calculates and turn the task time batch extracting version task to be turned of version queue and carry out turning version, and after version being turned
In file deposit thesaurus, while trigger generation thumbnail queue;
Step 4, thumbnail processing, each thumbnail service basis are carried out to file by a plurality of thumbnail services
The modulus value and the task time batch extracting of thumbnail queue that United Dispatching service calculates treat that thumbnail task is handled, and
Result is stored in thesaurus;
Step 5, extraction processing, each content extraction service root are carried out to file by a plurality of content extraction services
The modulus value and the task time batch extracting of extraction queue calculated according to United Dispatching service extracts task and carries out file content
Extraction is handled, and the content after processing is stored in into thesaurus, and triggers generation index queue;
Step 6, establishment, each establishment index service root are indexed to file by a plurality of establishment index services
The modulus value and the task time batch extracting index task of index queue calculated according to United Dispatching service carries out creating index behaviour
Make, and result is stored in index database;
The step 3 and step 5 are carried out in no particular order;The step 4 and step 6 are carried out in no particular order.
Referring again to Fig. 1 and Fig. 2, a computer-readable storage medium of the present invention stores a computer program that implements the following steps when executed by a processor:
Step 1, by least one file upload services by pending file store into thesaurus, and generate turn version team
Row, thumbnail queue and extraction queue;
Step 2, by a unified dispatch service mission number of each queue number that services corresponding with the queue is entered
Row modulus, the pending task serviced according to corresponding to modulus value distribution is each;Each queue includes mission number and task
Time, the mission number represent integer sequence file, and the task time represents the sequencing of tasks carrying;Wherein, team
Row include turning version queue, thumbnail queue, extract queue and index queue, and described turn of version queue, thumbnail queue, extraction team
Service corresponding to row and index queue, which is followed successively by, to be turned version service, thumbnail service, content extraction service and creates index service, institute
State turn version service, thumbnail service, content extraction service and create index service exist respectively with kit form, its number can root
According to needing to be increased and decreased, user can be adjusted by changing the task time in queue to permitted file processing sequence.
Step 3, to file turn version by a plurality of turns of version services and operate, it is each described to turn version service and adjusted according to unified
Modulus value that degree service calculates and turn the task time batch extracting version task to be turned of version queue and carry out turning version, and after version being turned
In file deposit thesaurus, while trigger generation thumbnail queue;
Step 4, thumbnail processing, each thumbnail service basis are carried out to file by a plurality of thumbnail services
The modulus value and the task time batch extracting of thumbnail queue that United Dispatching service calculates treat that thumbnail task is handled, and
Result is stored in thesaurus;
Step 5, extraction processing, each content extraction service root are carried out to file by a plurality of content extraction services
The modulus value and the task time batch extracting of extraction queue calculated according to United Dispatching service extracts task and carries out file content
Extraction is handled, and the content after processing is stored in into thesaurus, and triggers generation index queue;
Step 6, establishment, each establishment index service root are indexed to file by a plurality of establishment index services
The modulus value and the task time batch extracting index task of index queue calculated according to United Dispatching service carries out creating index behaviour
Make, and result is stored in index database;
The step 3 and step 5 are carried out in no particular order;The step 4 and step 6 are carried out in no particular order.
A distributed file format conversion and processing system of the present invention comprises a file upload module, a scheduling module, a file conversion module, a content extraction module, a thumbnail module, and an index creation module;
The file upload module stores the files to be processed in a repository through at least one file upload service and generates a conversion queue, a thumbnail queue, and an extraction queue;
The scheduling module, through a unified scheduling service, takes the task number of each queue entry modulo the number of services corresponding to that queue and assigns each service its pending tasks according to the result; each queue stores a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed. The queues in the scheduling module include the conversion queue, the thumbnail queue, the extraction queue, and the index queue, and the services corresponding to these queues are, respectively, the conversion service, the thumbnail service, the content extraction service, and the index creation service. Each of these services exists as a component, and the number of each can be increased or decreased as needed. The user can adjust the file processing order by changing the task time in a queue.
The file conversion module converts the files through a plurality of conversion services; each conversion service extracts, in batches, the pending conversion tasks from the conversion queue according to the modulo value calculated by the unified scheduling service and the task time, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail module generates thumbnails of the files through a plurality of thumbnail services; each thumbnail service extracts, in batches, the pending thumbnail tasks from the thumbnail queue according to the modulo value calculated by the unified scheduling service and the task time, processes them, and stores the results in the repository;
The content extraction module extracts the content of the files through a plurality of content extraction services; each content extraction service extracts, in batches, the extraction tasks from the extraction queue according to the modulo value calculated by the unified scheduling service and the task time, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
The index creation module creates indexes for the files through a plurality of index creation services; each index creation service extracts, in batches, the index tasks from the index queue according to the modulo value calculated by the unified scheduling service and the task time, creates the indexes, and stores the results in the index database;
The file conversion module and the content extraction module execute in no particular order; the thumbnail module and the index creation module execute in no particular order.
The present invention is further described below with reference to a specific embodiment:
Four types of services for format conversion and content processing are created as components: the conversion service, the thumbnail service, the content extraction service, and the index creation service. Four corresponding queues store the information on the files pending for these four service types. Each queue stores two fields, task number and task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which the task is executed, so that each service can look up, in order, the files it needs to process;
Specifically, the file upload service stores each file in the repository once its upload completes and generates the conversion queue, the thumbnail queue, and the extraction queue;
The conversion service converts files according to the conversion queue, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail service generates thumbnails according to the thumbnail queue and stores them in the repository;
The content extraction service extracts file content according to the extraction queue, stores it in the repository, and triggers generation of the index queue;
The index creation service creates indexes according to the index queue and stores them in the index database for users to query.
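The two-field queue record described in this embodiment could be modeled as below. This is a sketch under assumptions: the class name `QueueEntry`, the field types, and the helper `in_execution_order` are hypothetical, since the source specifies only that each queue stores a task number and a task time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueEntry:
    task_number: int   # integer sequence number identifying the file
    task_time: float   # determines execution order (smaller runs first, by assumption)


def in_execution_order(entries):
    """Return the entries sorted by task time, i.e. in the order a
    service would look them up for processing."""
    return sorted(entries, key=lambda entry: entry.task_time)


entries = [QueueEntry(1, 30.0), QueueEntry(2, 10.0), QueueEntry(3, 20.0)]
ordered_numbers = [entry.task_number for entry in in_execution_order(entries)]
```

Keeping the task number separate from the task time means the assignment of a task to a service (derived from the number) does not change when its position in the order (derived from the time) is changed.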
Each of the four service types above is deployed as a plurality of instances that process their tasks in parallel. For each service type, the unified scheduling service allocates the tasks so that files are processed efficiently. The services execute as follows:
(1) The file upload service stores the files to be processed in the repository and generates the conversion queue, the thumbnail queue, and the extraction queue;
(2) The scheduling service takes the task number in the conversion queue modulo the number of conversion services (result A) to determine the tasks each conversion service executes;
Each conversion service, according to the modulo value (A), extracts conversion tasks from the conversion queue in batches in task-time order, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
(3) The scheduling service takes the task number in the thumbnail queue modulo the number of thumbnail services (result B) to determine the tasks each thumbnail service executes;
Each thumbnail service, according to the modulo value (B), extracts thumbnail tasks from the thumbnail queue in batches in task-time order, processes them, and stores the results in the repository;
(4) The scheduling service takes the task number in the extraction queue modulo the number of content extraction services (result C) to determine the tasks each content extraction service executes;
Each content extraction service, according to the modulo value (C), extracts extraction tasks from the extraction queue in batches in task-time order, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
(5) The scheduling service takes the task number in the index queue modulo the number of index creation services (result D) to determine the tasks each index creation service executes;
Each index creation service, according to the modulo value (D), extracts index tasks from the index queue in batches in task-time order, creates the indexes, and stores the results in the index database for users to query.
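The batch extraction each service performs in (2) to (5) amounts to filtering the queue by modulo value and then ordering by task time. The sketch below assumes a queue held as a list of `(task_number, task_time)` tuples and a hypothetical `batch_size` parameter; the source does not say how large a batch is or how the queue is stored.

```python
def extract_batch(queue, service_index, service_count, batch_size):
    """Return up to batch_size tasks owned by one service instance.

    A task belongs to the instance whose index equals
    task_number % service_count; owned tasks are taken in task-time order.
    """
    owned = [task for task in queue if task[0] % service_count == service_index]
    owned.sort(key=lambda task: task[1])  # earliest task time first
    return owned[:batch_size]


# (task_number, task_time) pairs; task 6 has the earliest task time.
queue = [(0, 5), (1, 1), (2, 2), (3, 3), (4, 4), (6, 0)]
batch = extract_batch(queue, service_index=0, service_count=3, batch_size=2)
```

Since every instance filters on a different remainder, no two instances of the same service type can pick up the same task, so no locking between instances is needed for assignment in this sketch.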
Referring to Fig. 3 and taking the conversion service as an example: when there are three conversion services (conversion service 1, conversion service 2, and conversion service 3), conversion service 1 executes all tasks in the conversion queue whose task number modulo 3 equals 0; conversion service 2 executes all tasks whose task number modulo 3 equals 1; and conversion service 3 executes all tasks whose task number modulo 3 equals 2. The unified scheduling service allocates tasks to the thumbnail services, the content extraction services, and the index creation services in the same way, so that work is spread across the services and overall processing efficiency improves.
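The three-service example of Fig. 3 can be reproduced in a few lines. The function name `assign` and the sample task numbers 1 to 9 are illustrative assumptions; only the modulo rule comes from the source.

```python
def assign(task_numbers, service_count):
    """Group task numbers by remainder. Remainder r corresponds to
    service r + 1 in the numbering used for Fig. 3."""
    assignment = {remainder: [] for remainder in range(service_count)}
    for number in task_numbers:
        assignment[number % service_count].append(number)
    return assignment


assignment = assign(range(1, 10), service_count=3)
# Remainder 0 -> conversion service 1, remainder 1 -> conversion service 2,
# remainder 2 -> conversion service 3.
```

For task numbers 1 to 9, conversion service 1 receives tasks 3, 6, and 9, conversion service 2 receives 1, 4, and 7, and conversion service 3 receives 2, 5, and 8.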
The various kinds of services above are independent of one another and execute in parallel. The unified dispatch service divides the tasks evenly among the services, which achieves pipelined operation and greatly improves the efficiency of batch file version conversion. In the present invention, different numbers of version-conversion and content-processing services can be set according to the task volume, so the services are horizontally scalable. In addition, users can adjust the task time in a queue themselves to change the order in which files are processed.
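The effect of editing a task time can be illustrated with a small example. It assumes a queue represented as a mapping from task number to task time, with smaller times processed first; both the representation and the name `processing_order` are hypothetical.

```python
def processing_order(queue):
    """Return task numbers in the order they would be processed (by task time)."""
    return [number for number, _ in sorted(queue.items(), key=lambda item: item[1])]


# Hypothetical queue: task number -> task time.
queue = {10: 100.0, 11: 101.0, 12: 102.0}
before = processing_order(queue)

# The user moves task 12 to the front by giving it an earlier task time.
queue[12] = 99.0
after = processing_order(queue)
```

Because the task number is untouched, the task stays with the same service; only its position within that service's batches changes.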
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the embodiments described are merely illustrative and are not intended to limit the scope of the present invention. Equivalent modifications and changes made by those skilled in the art in accordance with the spirit of the present invention shall all fall within the scope of protection claimed by the present invention.
Claims (12)
1. A distributed file version-conversion and processing method, characterized by comprising the following steps:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a version-conversion queue, a thumbnail queue and an extraction queue;
Step 2: a unified dispatch service takes the task number of each queue modulo the number of services corresponding to that queue, and assigns each corresponding service its tasks to be processed according to the modulus value; each queue includes a task number and a task time, the task number representing the integer sequence number of the file and the task time representing the order in which tasks are executed;
Step 3: a plurality of version-conversion services perform version conversion on the files; each version-conversion service extracts pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the version-conversion queue, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services perform thumbnail processing on the files; each thumbnail service extracts pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services perform extraction processing on the files; each content extraction service extracts extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracts the file content, stores the extracted content in the repository, and triggers generation of an index queue;
Step 6: a plurality of index-creation services create indexes for the files; each index-creation service extracts index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, creates the indexes, and stores the results in an index database;
Step 3 and step 5 are performed in no particular order; step 4 and step 6 are performed in no particular order.
2. The distributed file version-conversion and processing method according to claim 1, characterized in that: the queues in step 2 include the version-conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the version-conversion queue, thumbnail queue, extraction queue and index queue are, respectively, the version-conversion service, the thumbnail service, the content extraction service and the index-creation service.
3. The distributed file version-conversion and processing method according to claim 1, characterized in that: the version-conversion service, thumbnail service, content extraction service and index-creation service each exist in the form of components, and their numbers are increased or decreased as needed.
4. The distributed file version-conversion and processing method according to claim 1, characterized in that: the user adjusts the file processing order by modifying the task time in the queue.
5. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the following steps:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a version-conversion queue, a thumbnail queue and an extraction queue;
Step 2: a unified dispatch service takes the task number of each queue modulo the number of services corresponding to that queue, and assigns each corresponding service its tasks to be processed according to the modulus value; each queue includes a task number and a task time, the task number representing the integer sequence number of the file and the task time representing the order in which tasks are executed;
Step 3: a plurality of version-conversion services perform version conversion on the files; each version-conversion service extracts pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the version-conversion queue, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services perform thumbnail processing on the files; each thumbnail service extracts pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services perform extraction processing on the files; each content extraction service extracts extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracts the file content, stores the extracted content in the repository, and triggers generation of an index queue;
Step 6: a plurality of index-creation services create indexes for the files; each index-creation service extracts index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, creates the indexes, and stores the results in an index database;
Step 3 and step 5 are performed in no particular order; step 4 and step 6 are performed in no particular order.
6. The computer-readable storage medium according to claim 5, characterized in that: the queues in step 2 include the version-conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the version-conversion queue, thumbnail queue, extraction queue and index queue are, respectively, the version-conversion service, the thumbnail service, the content extraction service and the index-creation service.
7. The computer-readable storage medium according to claim 5, characterized in that: the version-conversion service, thumbnail service, content extraction service and index-creation service each exist in the form of components, and their numbers are increased or decreased as needed.
8. The computer-readable storage medium according to claim 5, characterized in that: the user adjusts the file processing order by modifying the task time in the queue.
9. A distributed file version-conversion and processing system, characterized by comprising a file upload module, a scheduling module, a file version-conversion module, a content extraction module, a thumbnail module and an index-creation module;
The file upload module stores the files to be processed in a repository through at least one file upload service and generates a version-conversion queue, a thumbnail queue and an extraction queue;
The scheduling module, through a unified dispatch service, takes the task number of each queue modulo the number of services corresponding to that queue, and assigns each corresponding service its tasks to be processed according to the modulus value; each queue includes a task number and a task time, the task number representing the integer sequence number of the file and the task time representing the order in which tasks are executed;
The file version-conversion module performs version conversion on the files through a plurality of version-conversion services; each version-conversion service extracts pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the version-conversion queue, performs the conversion, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail module performs thumbnail processing on the files through a plurality of thumbnail services; each thumbnail service extracts pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processes them, and stores the results in the repository;
The content extraction module performs extraction processing on the files through a plurality of content extraction services; each content extraction service extracts extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracts the file content, stores the extracted content in the repository, and triggers generation of an index queue;
The index-creation module creates indexes for the files through a plurality of index-creation services; each index-creation service extracts index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, creates the indexes, and stores the results in an index database;
The file version-conversion module and the content extraction module execute in no particular order; the thumbnail module and the index-creation module execute in no particular order.
10. The distributed file version-conversion and processing system according to claim 9, characterized in that: the queues in the scheduling module include the version-conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the version-conversion queue, thumbnail queue, extraction queue and index queue are, respectively, the version-conversion service, the thumbnail service, the content extraction service and the index-creation service.
11. The distributed file version-conversion and processing system according to claim 9, characterized in that: the version-conversion service, thumbnail service, content extraction service and index-creation service each exist in the form of components, and their numbers are increased or decreased as needed.
12. The distributed file version-conversion and processing system according to claim 9, characterized in that: the user adjusts the file processing order by modifying the task time in the queue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711159328.8A CN107798132B (en) | 2017-11-20 | 2017-11-20 | Distributed file transferring and processing method, system and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107798132A (en) | 2018-03-13 |
CN107798132B (en) | 2021-06-29 |
Family
ID=61535341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711159328.8A (Active, CN107798132B) | Distributed file transferring and processing method, system and computer readable storage medium | 2017-11-20 | 2017-11-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107798132B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521218A (en) * | 2011-12-15 | 2012-06-27 | 方正国际软件有限公司 | File combining method and file combining device |
CN103678705A (en) * | 2013-12-30 | 2014-03-26 | 南京大学 | Vector data concurrent conversion method from VCT file to shapefile file |
CN105824788A (en) * | 2016-03-18 | 2016-08-03 | 天津城建大学 | Method and system for converting PowerPoint file into word file |
CN106844453A (en) * | 2016-12-20 | 2017-06-13 | 江苏瀚远科技股份有限公司 | A kind of electronic document format conversion method |
US20170262329A1 (en) * | 2016-03-08 | 2017-09-14 | International Business Machines Corporation | Configuring and utilizing call-home systems |
Also Published As
Publication number | Publication date |
---|---|
CN107798132B (en) | 2021-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||