Summary of the invention
To solve the problems and shortcomings of the prior art described above, the invention provides a text entry system and a text entry method that greatly improve entry efficiency, reduce cost, and improve quality. The technical solution is as follows:
A text entry system, comprising a layout analysis module, a layout processing module, and an entry merging module, wherein:
the layout analysis module processes the non-text content of the page layout, analyzes each unit block in the document by row-and-column scanning, and computes the language attribute of each unit block;
the layout processing module assists the layout analysis module by adjusting the unit blocks, and the unit block attributes, that require interactive layout analysis;
the entry merging module takes the document produced by layout analysis, performs recognition and entry separately for each language to generate separate entry texts, and merges those texts into the final entry text.
A text entry method, comprising:
processing the non-text content of the page layout;
analyzing each unit block in the document by row-and-column scanning, and computing the language attribute of each unit block;
adjusting the unit blocks, and the unit block attributes, that require interactive layout analysis;
performing recognition and entry on the document separately for each language to generate separate entry texts, and merging those texts into the final entry text.
The technical solution provided by the invention has the following beneficial effects:
it greatly improves entry efficiency, reduces cost, and improves quality;
through interactive layout adjustment and the integration of an independent entry system for each language, entry tasks can be completed quickly and with high quality. In testing, entry according to the invention saved 71.6% of annual cost.
Embodiment
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings:
Figure 1 shows the architecture of the text entry system, which comprises a layout analysis module, a layout processing module, and an entry merging module, wherein:
the layout analysis module processes the non-text content of the page layout, analyzes each unit block in the document by row-and-column scanning, and computes the language attribute of each unit block;
the layout processing module assists the layout analysis module by adjusting the unit blocks, and the unit block attributes, that require interactive layout analysis;
the entry merging module takes the document produced by layout analysis, performs recognition and entry separately for each language to generate separate entry texts, and merges those texts into the final entry text.
Processing the non-text content of the page layout covers black borders, noise specks, images, and other non-text content.
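The text names the kinds of non-text content but not how they are removed. The sketch below is one simple possibility, assuming a binarized page stored as a list of rows (1 = dark pixel): edge rows and columns that are almost entirely dark are treated as scanner black borders and cropped, and dark pixels with no dark neighbour are treated as noise specks and cleared. The function names and the `dark_ratio` threshold are illustrative assumptions, not part of the described method, and image regions are not handled here.

```python
from typing import List

Image = List[List[int]]  # binarized page: 1 = dark pixel, 0 = background


def crop_black_border(img: Image, dark_ratio: float = 0.9) -> Image:
    """Crop edge rows/columns in which at least `dark_ratio` of pixels are dark.

    Hypothetical helper: assumes a rectangular image and that scanner
    borders show up as nearly solid dark rows or columns at the edges.
    """
    def is_border(line: List[int]) -> bool:
        return sum(line) >= dark_ratio * len(line)

    top, bottom = 0, len(img)
    while top < bottom and is_border(img[top]):
        top += 1
    while bottom > top and is_border(img[bottom - 1]):
        bottom -= 1
    rows = img[top:bottom]
    if not rows:
        return []
    left, right = 0, len(rows[0])
    while left < right and is_border([r[left] for r in rows]):
        left += 1
    while right > left and is_border([r[right - 1] for r in rows]):
        right -= 1
    return [r[left:right] for r in rows]


def remove_specks(img: Image) -> Image:
    """Clear dark pixels that have no dark 8-neighbour (isolated noise)."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] and not any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ):
                out[y][x] = 0
    return out
```

A real system would more likely use connected-component size thresholds rather than a single-pixel test; the single-pixel rule is kept here only to make the idea visible.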
On the document whose non-text content has been processed, the following algorithm is used to make the layout analysis as accurate as possible:
1) Line scanning: scan the image line by line, count the dark pixels in each line, and use these statistics to find the upper and lower boundaries of each text line.
2) Column scanning: scan each text line column by column, count the dark pixels in each column, and use these statistics to find the left and right boundaries within the line, thereby obtaining the unit blocks.
3) Language identification of unit blocks: perform simple recognition on each block and analyze the features that distinguish Chinese from English, such as the aspect ratio of Chinese characters versus English words.
4) Post-processing: apply type-specific handling to different kinds of documents.
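Steps 1) to 3) describe projection-profile segmentation followed by a feature-based language guess. A minimal sketch, assuming the same binarized list-of-rows image: row sums locate text lines, column sums inside each line locate blocks, and nearby column runs are joined so that letter gaps do not split a word. The `min_gap` value and the rule "near-square means Chinese, wide means English" are assumptions chosen to illustrate the aspect-ratio feature, not the patented rule; a block's height is taken as the height of its text line.

```python
from typing import List, Tuple

Image = List[List[int]]          # 1 = dark pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right); bottom/right exclusive


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) spans where the projection count is non-zero."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img: Image, min_gap: int = 2) -> List[Box]:
    """Step 1: row projection finds text lines; step 2: column projection finds blocks."""
    boxes: List[Box] = []
    row_profile = [sum(row) for row in img]              # dark pixels per row
    for top, bottom in runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]   # dark pixels per column
        merged: List[List[int]] = []
        for left, right in runs(col_profile):
            # Join runs separated by fewer than min_gap blank columns.
            if merged and left - merged[-1][1] < min_gap:
                merged[-1][1] = right
            else:
                merged.append([left, right])
        boxes.extend((top, bottom, left, right) for left, right in merged)
    return boxes


def guess_language(box: Box, square_tol: float = 0.25) -> str:
    """Step 3 (hypothetical heuristic): near-square glyph blocks -> "zh", others -> "en"."""
    top, bottom, left, right = box
    ratio = (right - left) / (bottom - top)
    return "zh" if abs(ratio - 1.0) <= square_tol else "en"
```

For a line holding one 4x4 glyph and one 4x12 word, `segment` returns two boxes and `guess_language` labels them `"zh"` and `"en"`. Mixed or ambiguous blocks are exactly the cases the interactive step is meant to correct.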
Interactive layout analysis
After automatic layout analysis, the results are generally acceptable for most well-typeset documents. Documents with messy or complex layouts, however, need some interactive layout analysis: the unit blocks of the layout, and the language and other attributes of each block, are adjusted to ensure that the final layout analysis is correct.
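The interactive step amounts to applying an operator's corrections to the automatically produced blocks. The sketch below assumes a hypothetical `UnitBlock` record and a `Correction` record that can change a block's box, language, or other attributes, or delete it; these names and fields are illustrative only and do not come from the source.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right)


@dataclass
class UnitBlock:
    block_id: int
    box: Box
    language: str                                   # e.g. "zh" or "en"
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Correction:
    """One hypothetical operator edit; None means 'leave unchanged'."""
    block_id: int
    box: Optional[Box] = None
    language: Optional[str] = None
    attrs: Optional[Dict[str, str]] = None
    delete: bool = False


def apply_corrections(blocks: List[UnitBlock], corrections: List[Correction]) -> List[UnitBlock]:
    """Return new blocks with operator corrections applied; inputs are not mutated."""
    by_id = {c.block_id: c for c in corrections}
    result: List[UnitBlock] = []
    for block in blocks:
        c = by_id.get(block.block_id)
        if c is None:
            result.append(block)
            continue
        if c.delete:
            continue
        result.append(replace(
            block,
            box=c.box if c.box is not None else block.box,
            language=c.language if c.language is not None else block.language,
            attrs={**block.attrs, **(c.attrs or {})},
        ))
    return result
```

Keeping corrections as separate records, rather than editing blocks in place, leaves the automatic result available for comparison or undo.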
Recognition and entry by language
After interactive layout analysis, the document is divided by language and each part is submitted to the corresponding entry system. Chinese portions are recognized with both Han Wang and Wen Tong; English portions are recognized with both FineReader and Wen Tong. In each case, the portions on which the two engines disagree are extracted and sent for entry.
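The engine pairing and the "disagreements go to entry" rule can be sketched as a cross-check. The engines named in the text are real products, but no real API is used here: each engine is assumed to be wrapped by a hypothetical callable that takes a block id and returns recognized text, and agreement is assumed to be checked per whole block after trimming whitespace.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wrapper interface: block id -> recognized text.
Recognizer = Callable[[int], str]

# Engine pairing per language, as described in the text.
ENGINE_PAIRS: Dict[str, Tuple[str, str]] = {
    "zh": ("Han Wang", "Wen Tong"),
    "en": ("FineReader", "Wen Tong"),
}


def cross_check(
    blocks: List[Tuple[int, str]],        # (block_id, language)
    engines: Dict[str, Recognizer],
) -> Tuple[Dict[int, str], List[int]]:
    """Accept blocks where both engines agree; flag the rest for entry."""
    accepted: Dict[int, str] = {}
    flagged: List[int] = []
    for block_id, language in blocks:
        first, second = ENGINE_PAIRS[language]
        a = engines[first](block_id).strip()
        b = engines[second](block_id).strip()
        if a == b:
            accepted[block_id] = a
        else:
            flagged.append(block_id)
    return accepted, flagged
```

Finer granularity is possible, for example comparing line by line or character by character with `difflib.SequenceMatcher` so that only the differing spans are flagged; whole-block comparison is used here for brevity.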
Merging of entry results
The separate entry texts are merged to produce the final entry result.
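The merge itself is not specified beyond combining the texts. One plausible sketch, assuming each text source is a map from block id to text and that reading order is simply top-to-bottom, then left-to-right (a single-column layout): later sources override earlier ones, so manually entered text passed last wins. Blocks on the same line are joined with a space, which is an arbitrary choice that a real system would tune for Chinese text.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right)


def merge_entry_texts(boxes: Dict[int, Box], *sources: Dict[int, str]) -> str:
    """Merge per-block texts into one document in reading order.

    Later sources override earlier ones. Blocks sharing the same `top`
    are treated as one line. Raises ValueError if any block has no text.
    """
    texts: Dict[int, str] = {}
    for source in sources:
        texts.update(source)
    missing = sorted(bid for bid in boxes if bid not in texts)
    if missing:
        raise ValueError(f"no text for blocks {missing}")
    lines: Dict[int, List[Tuple[int, str]]] = {}
    for bid, (top, _bottom, left, _right) in boxes.items():
        lines.setdefault(top, []).append((left, texts[bid]))
    return "\n".join(
        " ".join(text for _left, text in sorted(lines[top]))
        for top in sorted(lines)
    )
```

Raising on missing blocks, rather than silently skipping them, makes any block that fell through both recognition and entry visible.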
Figure 2 shows the text entry method, which comprises:
processing the non-text content of the page layout;
analyzing each unit block in the document by row-and-column scanning, and computing the language attribute of each unit block;
adjusting the unit blocks, and the unit block attributes, that require interactive layout analysis;
performing recognition and entry on the document separately for each language to generate separate entry texts, and merging those texts into the final entry text.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.