CN107798132A - Distributed file conversion and processing method, system, and computer-readable storage medium - Google Patents

Publication number
CN107798132A
Authority
CN
China
Prior art keywords
queue
service
version
file
thumbnail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711159328.8A
Other languages
Chinese (zh)
Other versions
CN107798132B (en
Inventor
杨迪
徐建兵
倪时龙
林振天
陈又咏
黄敬林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Shanghai Electric Power Co Ltd
Fujian Yirong Information Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Shanghai Electric Power Co Ltd
Fujian Yirong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Shanghai Electric Power Co Ltd, Fujian Yirong Information Technology Co Ltd
Priority to CN201711159328.8A
Publication of CN107798132A
Application granted
Publication of CN107798132B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a distributed file conversion and processing method. A file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue. A unified scheduling service takes the task number of each task in each queue modulo the number of services corresponding to that queue and assigns each service its pending tasks according to the resulting remainder. Each conversion service fetches its assigned tasks in batches, converts the files, stores the results in the repository, and triggers generation of the thumbnail queue. Each thumbnail service fetches its assigned tasks in batches, processes them, and stores the results in the repository. Each content extraction service fetches its assigned tasks in batches, processes them, stores the results in the repository, and triggers generation of the index queue. Each index creation service fetches its assigned tasks in batches, creates the index, and stores the results in the index store. The present invention also provides a distributed file conversion and processing system and a computer-readable storage medium, which improve the efficiency of batch file conversion and processing.

Description

Distributed file conversion and processing method, system, and computer-readable storage medium
Technical field
The present invention relates to file format conversion and processing technology, and in particular to a distributed file conversion and processing method, system, and computer-readable storage medium.
Background
PDF (short for Portable Document Format) is a file format that is independent of the operating system platform; that is, PDF files are usable on Windows, Unix, and Apple's Mac OS alike. This property makes it an ideal document format for distributing electronic documents and disseminating digital information over the Internet. As a result, more and more e-books, product descriptions, company announcements, online materials, and e-mails use PDF files, and PDF has also become an international standard format for archiving and preserving files, supporting permanent long-term preservation. OFD is currently another format that supports permanent long-term preservation.
When files to be stored undergo format conversion, both file conversion and content processing are required. Existing file conversion and content processing services are all developed for specific software and operating systems, and their efficiency is low. In particular, when large batches of electronic files are converted and processed, the files can only be handled one after another; conversion and content processing cannot run in parallel, and pipelined operation of the whole workflow cannot be achieved.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a distributed file conversion and processing method that improves the efficiency of batch file conversion and content processing.
This first technical problem is solved as follows: a distributed file conversion and processing method comprising the following steps:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue;
Step 2: a unified scheduling service takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed;
Step 3: a plurality of conversion services convert the files; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services generate thumbnails of the files; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services extract content from the files; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
Step 6: a plurality of index creation services build indexes for the files; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
Steps 3 and 5 are performed in no particular order relative to each other, as are steps 4 and 6.
Further, the queues in step 2 comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user can adjust the file processing order by changing the task times in a queue.
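The reordering described in the last clause can be sketched as follows. This is a minimal illustration only: the dictionary layout, the function names `processing_order` and `move_to_front`, and the convention that a smaller task time is served first are assumptions, not details given in the specification.

```python
# Hypothetical queue: task number -> task time (smaller time is served first, by assumption).
queue = {1: 100.0, 2: 200.0, 3: 300.0}

def processing_order(queue):
    """Return the task numbers in ascending task-time order."""
    return sorted(queue, key=queue.get)

def move_to_front(queue, task_number):
    """Give one task a task time earlier than every other entry."""
    queue[task_number] = min(queue.values()) - 1

before = processing_order(queue)  # order before the user intervenes
move_to_front(queue, 3)           # the user promotes task 3
after = processing_order(queue)   # task 3 is now served first
```

Because services read tasks in task-time order, changing a single field is enough to reprioritize a file without touching the assignment of tasks to services.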
The second technical problem to be solved by the present invention is to provide a computer-readable storage medium that improves the efficiency of batch file conversion and content processing.
This second technical problem is solved as follows: a computer-readable storage medium on which a computer program is stored, the program implementing the following steps when executed by a processor:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue;
Step 2: a unified scheduling service takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed;
Step 3: a plurality of conversion services convert the files; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services generate thumbnails of the files; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services extract content from the files; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
Step 6: a plurality of index creation services build indexes for the files; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
Steps 3 and 5 are performed in no particular order relative to each other, as are steps 4 and 6.
Further, the queues in step 2 comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user can adjust the file processing order by changing the task times in a queue.
The third technical problem to be solved by the present invention is to provide a distributed file conversion and processing system that improves the efficiency of batch file conversion and content processing.
This third technical problem is solved as follows: a distributed file conversion and processing system comprising a file upload module, a scheduling module, a file conversion module, a content extraction module, a thumbnail module, and an index creation module;
The file upload module stores the files to be processed in a repository through at least one file upload service and generates a conversion queue, a thumbnail queue, and an extraction queue;
The scheduling module, through a unified scheduling service, takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed;
The file conversion module converts the files through a plurality of conversion services; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail module generates thumbnails of the files through a plurality of thumbnail services; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
The content extraction module extracts content from the files through a plurality of content extraction services; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
The index creation module builds indexes for the files through a plurality of index creation services; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
The file conversion module and the content extraction module execute in no particular order relative to each other, as do the thumbnail module and the index creation module.
Further, the queues in the scheduling module comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service.
Further, the conversion service, the thumbnail service, the content extraction service, and the index creation service each exist as components, and the number of each can be increased or decreased as needed.
Further, the user can adjust the file processing order by changing the task times in a queue.
The invention has the following advantages:
1. The unified scheduling service schedules the four conversion and processing services and their four queues separately, realizing pipelined operation overall; steps that originally had to be performed in sequence are performed in parallel, greatly improving the efficiency of batch file format conversion and content processing;
2. The order of each queue can be adjusted as needed, so the execution order can be customized;
3. Each service can be scaled horizontally and dynamically as needed, giving high flexibility.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a schematic block diagram of distributed electronic file conversion and processing according to the present invention.
Fig. 3 is a schematic diagram of the scheduling principle for the conversion services of the present invention.
Detailed description of the embodiments
Referring to Fig. 1 and Fig. 2, a distributed file conversion and processing method of the present invention comprises the following steps:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue;
Step 2: a unified scheduling service takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed. The queues comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service; these services each exist as components, the number of each can be increased or decreased as needed, and the user can adjust the file processing order by changing the task times in a queue.
Step 3: a plurality of conversion services convert the files; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services generate thumbnails of the files; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services extract content from the files; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
Step 6: a plurality of index creation services build indexes for the files; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
Steps 3 and 5 are performed in no particular order relative to each other, as are steps 4 and 6.
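One way to read the ordering rule above is that conversion followed by thumbnailing, and extraction followed by indexing, form two chains that can run side by side. The sketch below illustrates that reading with Python's standard thread pool; the four worker functions are hypothetical stand-ins and do not reflect any real conversion, thumbnail, extraction, or indexing logic.

```python
from concurrent.futures import ThreadPoolExecutor

def convert(name):              # stand-in for step 3 (format conversion)
    return name + ".pdf"

def make_thumbnail(converted):  # stand-in for step 4 (thumbnail processing)
    return converted + ".thumb.png"

def extract_text(name):         # stand-in for step 5 (content extraction)
    return "text of " + name

def build_index(text):          # stand-in for step 6 (index creation)
    return {"indexed": text}

def conversion_chain(name):
    return make_thumbnail(convert(name))

def extraction_chain(name):
    return build_index(extract_text(name))

def process(name):
    """Run the two chains concurrently; neither waits for the other."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        thumb = pool.submit(conversion_chain, name)
        index = pool.submit(extraction_chain, name)
        return thumb.result(), index.result()

result = process("report.docx")
```

In the described system the hand-off between stages happens through queues rather than direct function calls, so this sketch only shows which stages are independent of each other.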
Referring again to Fig. 1 and Fig. 2, a computer-readable storage medium of the present invention has a computer program stored on it, and the program implements the following steps when executed by a processor:
Step 1: at least one file upload service stores the files to be processed in a repository and generates a conversion queue, a thumbnail queue, and an extraction queue;
Step 2: a unified scheduling service takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed. The queues comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service; these services each exist as components, the number of each can be increased or decreased as needed, and the user can adjust the file processing order by changing the task times in a queue.
Step 3: a plurality of conversion services convert the files; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
Step 4: a plurality of thumbnail services generate thumbnails of the files; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
Step 5: a plurality of content extraction services extract content from the files; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
Step 6: a plurality of index creation services build indexes for the files; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
Steps 3 and 5 are performed in no particular order relative to each other, as are steps 4 and 6.
A distributed file conversion and processing system of the present invention comprises a file upload module, a scheduling module, a file conversion module, a content extraction module, a thumbnail module, and an index creation module;
The file upload module stores the files to be processed in a repository through at least one file upload service and generates a conversion queue, a thumbnail queue, and an extraction queue;
The scheduling module, through a unified scheduling service, takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns each service its pending tasks according to the resulting remainder; each queue records a task number and a task time, where the task number is an integer sequence number identifying the file and the task time determines the order in which tasks are executed. The queues in the scheduling module comprise the conversion queue, the thumbnail queue, the extraction queue, and the index queue, which correspond respectively to the conversion service, the thumbnail service, the content extraction service, and the index creation service; these services each exist as components, the number of each can be increased or decreased as needed, and the user can adjust the file processing order by changing the task times in a queue.
The file conversion module converts the files through a plurality of conversion services; each conversion service uses the remainder computed by the unified scheduling service and the task times in the conversion queue to fetch pending conversion tasks in batches, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail module generates thumbnails of the files through a plurality of thumbnail services; each thumbnail service uses the remainder computed by the unified scheduling service and the task times in the thumbnail queue to fetch pending thumbnail tasks in batches, processes them, and stores the results in the repository;
The content extraction module extracts content from the files through a plurality of content extraction services; each content extraction service uses the remainder computed by the unified scheduling service and the task times in the extraction queue to fetch extraction tasks in batches, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
The index creation module builds indexes for the files through a plurality of index creation services; each index creation service uses the remainder computed by the unified scheduling service and the task times in the index queue to fetch index tasks in batches, creates the index, and stores the result in the index store;
The file conversion module and the content extraction module execute in no particular order relative to each other, as do the thumbnail module and the index creation module.
The present invention is further described below with reference to a specific embodiment:
Four types of services for conversion and content processing are created as components: the conversion service, the thumbnail service, the content extraction service, and the index creation service. Four corresponding queues store the information on the files awaiting each type of service, using two fields, task number and task time; the task number is an integer sequence number identifying the file, and the task time indicates the order in which the task is to be executed, so that each service can look up, in order, the files it needs to process;
The file upload service stores files in the repository once the upload completes and generates the conversion queue, the thumbnail queue, and the extraction queue;
The conversion service converts files according to the conversion queue, stores the converted files in the repository, and triggers generation of the thumbnail queue;
The thumbnail service generates thumbnails according to the thumbnail queue and stores them in the repository;
The content extraction service extracts file content according to the extraction queue, stores it in the repository, and triggers generation of the index queue;
The index creation service creates indexes according to the index queue and stores them in the index store for users to query.
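The queue layout and the queue-to-service pairing described above might be modeled as in the following sketch. The class name `QueueEntry`, the queue keys, and the two event handlers are illustrative assumptions; the specification only states that each queue stores a task number and a task time and names which service feeds which queue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueEntry:
    task_number: int   # integer sequence number identifying the file
    task_time: float   # determines execution order within the queue

# Each queue is consumed by exactly one type of service.
QUEUE_TO_SERVICE = {
    "conversion": "conversion service",
    "thumbnail": "thumbnail service",
    "extraction": "content extraction service",
    "index": "index creation service",
}

def on_upload(queues, task_number, task_time):
    """File upload service: enqueue the file in the three initial queues."""
    for name in ("conversion", "thumbnail", "extraction"):
        queues[name].append(QueueEntry(task_number, task_time))

def on_extracted(queues, task_number, task_time):
    """Content extraction service: hand the file on to the index queue."""
    queues["index"].append(QueueEntry(task_number, task_time))

queues = {name: [] for name in QUEUE_TO_SERVICE}
on_upload(queues, 1, 10.0)
on_extracted(queues, 1, 11.0)
```

Keeping only two fields per entry means a service needs nothing beyond the queue itself to decide what to process next.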
A plurality of instances of each of the four service types is provided so that tasks can be processed in parallel, and the unified scheduling service allocates tasks to each type of service so that files are processed as efficiently as possible. Specifically, the services execute as follows:
(1) The file upload service stores the files to be processed in the repository and generates the conversion queue, the thumbnail queue, and the extraction queue;
(2) The scheduling service takes each task number in the conversion queue modulo the number of conversion services (result A) to determine which tasks each conversion service executes;
According to the remainder (A), each conversion service fetches conversion tasks from the conversion queue in batches, in task-time order, converts the files, stores the converted files in the repository, and triggers generation of the thumbnail queue;
(3) The scheduling service takes each task number in the thumbnail queue modulo the number of thumbnail services (result B) to determine which tasks each thumbnail service executes;
According to the remainder (B), each thumbnail service fetches thumbnail tasks from the thumbnail queue in batches, in task-time order, processes them, and stores the results in the repository;
(4) The scheduling service takes each task number in the extraction queue modulo the number of content extraction services (result C) to determine which tasks each content extraction service executes;
According to the remainder (C), each content extraction service fetches extraction tasks from the extraction queue in batches, in task-time order, extracts the file content, stores the extracted content in the repository, and triggers generation of the index queue;
(5) The scheduling service takes each task number in the index queue modulo the number of index creation services (result D) to determine which tasks each index creation service executes;
According to the remainder (D), each index creation service fetches index tasks from the index queue in batches, in task-time order, creates the indexes, and stores the results in the index store for users to query.
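The batch fetch that every service performs in steps (2) to (5) can be sketched as a filter on the remainder followed by a sort on task time. The function name `fetch_batch`, the tuple layout, and the `batch_size` parameter are assumptions made for illustration; the specification does not state a batch size.

```python
def fetch_batch(queue, remainder, service_count, batch_size):
    """Return up to batch_size task numbers owned by one service, earliest task time first.

    queue is a list of (task_number, task_time) pairs.
    """
    mine = [(num, t) for num, t in queue if num % service_count == remainder]
    mine.sort(key=lambda entry: entry[1])        # task-time order
    return [num for num, _ in mine[:batch_size]]

queue = [(1, 50.0), (2, 10.0), (3, 40.0), (4, 20.0), (6, 5.0), (9, 30.0)]

# The service that owns remainder 0 out of three services takes its two earliest tasks.
batch = fetch_batch(queue, remainder=0, service_count=3, batch_size=2)
```

Since each service filters on a different remainder, two services of the same type never fetch the same task, and no lock on the queue is needed for that purpose.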
Referring to Fig. 3 and taking the conversion service as an example: when there are three conversion services (conversion service 1, conversion service 2, and conversion service 3), conversion service 1 executes all tasks in the conversion queue whose task number modulo 3 equals 0; conversion service 2 executes all tasks whose task number modulo 3 equals 1; and conversion service 3 executes all tasks whose task number modulo 3 equals 2. The thumbnail services, content extraction services, and index creation services are assigned tasks by the unified scheduling service in the same way, ensuring that tasks are executed according to the optimal scheme and improving overall processing efficiency.
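The three-service example can be reproduced with a few lines of code. The function name `partition_by_modulo` and the choice of task numbers 1 to 9 are assumptions for illustration; remainder 0 corresponds to conversion service 1, remainder 1 to conversion service 2, and remainder 2 to conversion service 3.

```python
def partition_by_modulo(task_numbers, service_count):
    """Group task numbers by task_number % service_count."""
    assignment = {remainder: [] for remainder in range(service_count)}
    for number in task_numbers:
        assignment[number % service_count].append(number)
    return assignment

# Nine tasks shared among three conversion services.
assignment = partition_by_modulo(range(1, 10), 3)
```

With consecutive task numbers each service receives the same number of tasks, which is the even split the example relies on.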
The services described above are independent of one another and run in parallel; the unified dispatch service divides tasks evenly among them, which realizes pipelined operation and greatly improves the efficiency of batch file conversion and processing. In the present invention, different numbers of conversion and content processing services can be configured according to the task volume, so the services are horizontally scalable. In addition, users can adjust the task time in a queue themselves to change the order in which files are processed.
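How task time drives batch extraction, and how changing it reorders processing, can be sketched as follows. This is only an illustration under stated assumptions: the `Task` class, the `next_batch` function, the batch size and the timestamps are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    number: int          # integer sequence number of the file (assumed field name)
    task_time: datetime  # earlier task time means earlier processing


def next_batch(queue, remainder, service_count, batch_size):
    """Return the next batch for the service that owns `remainder`.

    Keeps only tasks whose number mod service_count equals remainder,
    then takes the earliest ones by task time.
    """
    mine = [t for t in queue if t.number % service_count == remainder]
    mine.sort(key=lambda t: t.task_time)
    return mine[:batch_size]


# Six illustrative tasks, one minute apart, shared by two services.
queue = [Task(n, datetime(2017, 11, 20, 9, n)) for n in range(6)]

# The service with remainder 0 owns tasks 0, 2 and 4.
before = [t.number for t in next_batch(queue, 0, 2, batch_size=2)]

# A user moves task 4 ahead by giving it an earlier task time.
queue[4].task_time = datetime(2017, 11, 20, 8, 0)
after = [t.number for t in next_batch(queue, 0, 2, batch_size=2)]

print(before, after)
```

In this sketch, changing a single task time is enough to move a file to the front of its service's next batch, without touching the modulo assignment.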
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and do not limit the scope of the present invention. Equivalent modifications and changes made by those skilled in the art in accordance with the spirit of the present invention shall all fall within the scope of protection claimed by the present invention.

Claims (12)

1. A distributed file format conversion and processing method, characterized by comprising the following steps:
Step 1: storing files to be processed in a repository through at least one file upload service, and generating a conversion queue, a thumbnail queue and an extraction queue;
Step 2: using a unified dispatch service, taking the task number of each task in each queue modulo the number of services corresponding to that queue, and assigning to each corresponding service its pending tasks according to the modulus value; each queue includes a task number and a task time, the task number being an integer sequence number of the file and the task time indicating the order in which tasks are executed;
Step 3: converting the files through a plurality of conversion services, each conversion service extracting pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the conversion queue, performing the conversion, storing the converted files in the repository, and at the same time triggering generation of the thumbnail queue;
Step 4: performing thumbnail processing on the files through a plurality of thumbnail services, each thumbnail service extracting pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processing them, and storing the results in the repository;
Step 5: performing extraction processing on the files through a plurality of content extraction services, each content extraction service extracting extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracting the file content, storing the processed content in the repository, and triggering generation of the index queue;
Step 6: creating indexes for the files through a plurality of index creation services, each index creation service extracting index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, performing the index creation operation, and storing the result in the index database;
Step 3 and Step 5 are performed in no particular order; Step 4 and Step 6 are performed in no particular order.
2. The distributed file format conversion and processing method according to claim 1, characterized in that: the queues in Step 2 include the conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the conversion queue, thumbnail queue, extraction queue and index queue are, in that order, the conversion service, the thumbnail service, the content extraction service and the index creation service.
3. The distributed file format conversion and processing method according to claim 1, characterized in that: the conversion service, thumbnail service, content extraction service and index creation service each exist in the form of components, and their numbers are increased or decreased as needed.
4. The distributed file format conversion and processing method according to claim 1, characterized in that: a user adjusts the file processing order by modifying the task time in a queue.
5. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the following steps:
Step 1: storing files to be processed in a repository through at least one file upload service, and generating a conversion queue, a thumbnail queue and an extraction queue;
Step 2: using a unified dispatch service, taking the task number of each task in each queue modulo the number of services corresponding to that queue, and assigning to each corresponding service its pending tasks according to the modulus value; each queue includes a task number and a task time, the task number being an integer sequence number of the file and the task time indicating the order in which tasks are executed;
Step 3: converting the files through a plurality of conversion services, each conversion service extracting pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the conversion queue, performing the conversion, storing the converted files in the repository, and at the same time triggering generation of the thumbnail queue;
Step 4: performing thumbnail processing on the files through a plurality of thumbnail services, each thumbnail service extracting pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processing them, and storing the results in the repository;
Step 5: performing extraction processing on the files through a plurality of content extraction services, each content extraction service extracting extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracting the file content, storing the processed content in the repository, and triggering generation of the index queue;
Step 6: creating indexes for the files through a plurality of index creation services, each index creation service extracting index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, performing the index creation operation, and storing the result in the index database;
Step 3 and Step 5 are performed in no particular order; Step 4 and Step 6 are performed in no particular order.
6. The computer-readable storage medium according to claim 5, characterized in that: the queues in Step 2 include the conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the conversion queue, thumbnail queue, extraction queue and index queue are, in that order, the conversion service, the thumbnail service, the content extraction service and the index creation service.
7. The computer-readable storage medium according to claim 5, characterized in that: the conversion service, thumbnail service, content extraction service and index creation service each exist in the form of components, and their numbers are increased or decreased as needed.
8. The computer-readable storage medium according to claim 5, characterized in that: a user adjusts the file processing order by modifying the task time in a queue.
9. A distributed file format conversion and processing system, characterized by comprising a file upload module, a scheduling module, a file conversion module, a content extraction module, a thumbnail module and an index creation module;
The file upload module stores files to be processed in a repository through at least one file upload service, and generates a conversion queue, a thumbnail queue and an extraction queue;
The scheduling module, using a unified dispatch service, takes the task number of each task in each queue modulo the number of services corresponding to that queue, and assigns to each corresponding service its pending tasks according to the modulus value; each queue includes a task number and a task time, the task number being an integer sequence number of the file and the task time indicating the order in which tasks are executed;
The file conversion module converts the files through a plurality of conversion services, each conversion service extracting pending conversion tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the conversion queue, performing the conversion, storing the converted files in the repository, and at the same time triggering generation of the thumbnail queue;
The thumbnail module performs thumbnail processing on the files through a plurality of thumbnail services, each thumbnail service extracting pending thumbnail tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the thumbnail queue, processing them, and storing the results in the repository;
The content extraction module performs extraction processing on the files through a plurality of content extraction services, each content extraction service extracting extraction tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the extraction queue, extracting the file content, storing the processed content in the repository, and triggering generation of the index queue;
The index creation module creates indexes for the files through a plurality of index creation services, each index creation service extracting index tasks in batches according to the modulus value calculated by the unified dispatch service and the task time of the index queue, performing the index creation operation, and storing the result in the index database;
The file conversion module and the content extraction module are executed in no particular order, and the thumbnail module and the index creation module are executed in no particular order.
10. The distributed file format conversion and processing system according to claim 9, characterized in that: the queues in the scheduling module include the conversion queue, the thumbnail queue, the extraction queue and the index queue, and the services corresponding to the conversion queue, thumbnail queue, extraction queue and index queue are, in that order, the conversion service, the thumbnail service, the content extraction service and the index creation service.
11. The distributed file format conversion and processing system according to claim 9, characterized in that: the conversion service, thumbnail service, content extraction service and index creation service each exist in the form of components, and their numbers are increased or decreased as needed.
12. The distributed file format conversion and processing system according to claim 9, characterized in that: a user adjusts the file processing order by modifying the task time in a queue.
CN201711159328.8A 2017-11-20 2017-11-20 Distributed file transferring and processing method, system and computer readable storage medium Active CN107798132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711159328.8A CN107798132B (en) 2017-11-20 2017-11-20 Distributed file transferring and processing method, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107798132A true CN107798132A (en) 2018-03-13
CN107798132B CN107798132B (en) 2021-06-29

Family

ID=61535341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711159328.8A Active CN107798132B (en) 2017-11-20 2017-11-20 Distributed file transferring and processing method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107798132B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521218A (en) * 2011-12-15 2012-06-27 方正国际软件有限公司 File combining method and file combining device
CN103678705A (en) * 2013-12-30 2014-03-26 南京大学 Vector data concurrent conversion method from VCT file to shapefile file
CN105824788A (en) * 2016-03-18 2016-08-03 天津城建大学 Method and system for converting PowerPoint file into word file
CN106844453A (en) * 2016-12-20 2017-06-13 江苏瀚远科技股份有限公司 A kind of electronic document format conversion method
US20170262329A1 (en) * 2016-03-08 2017-09-14 International Business Machines Corporation Configuring and utilizing call-home systems

Also Published As

Publication number Publication date
CN107798132B (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant