Embodiment
In existing electronic documents, the image information and the image data in DIB form are generally recorded for each image. Image information mainly means color space information, including the type of the color space, the color information of each channel, the palette, and the like. Below, the method and apparatus for processing image data of an electronic document according to the present invention are described with reference to the drawings. Here, image data processing includes storing, searching, modifying, adding, and deleting image data.
(1) Storage of image data of an electronic document
With the method for storing image data of an electronic document according to the present invention, the image data scattered throughout the electronic document are stored together in a newly created image data package file; the offsets of these images in the image data package file and their image-related information are recorded through an index structure; and the descriptions at the places in the electronic document where the images are used are modified accordingly. The extension of the image data package file is ifc (Image File Cluster), and the file is hereinafter referred to as the IFC file.
(First embodiment)
Fig. 1 is a flowchart of the method for storing image data of an electronic document according to the first embodiment of the present invention.
Referring to Fig. 1, in step S1000, the images to be processed are collected from the electronic document. Electronic document formats that the present invention can process include PDF, XPS, CEB, MARS, and the like. In this step, an array may be used to record where each collected image is stored (its path on disk or its position in the electronic document).
In step S1002, the IFC file and the index structure are created. Here, the IFC file is a newly created file used to store image data and record index information. It comprises at least the file header information, the data areas that store the image data, the index, and the index entry. The file header information must be located at the beginning of the file and may define fields such as file type, version information, compression unit, and compression method. The index records at least the offset of the corresponding image data in the IFC file and the length of that image data. The index entry indicates the offset of the index in the IFC file. From the information recorded in the index entry and the index, the position and length of the data area corresponding to the current image data in the IFC file can be determined. The index entry is generally located at the end of the IFC file, which facilitates later search, modification, addition, and deletion operations. At the same time, memory is allocated for the index structure, and a table, tree, or other data structure is created in memory to receive the index information, image data information, and the like.
In step S1004, the image information and the image data of the current image are obtained from the collected images. As stated above, image information mainly means color space information, but it may also include other image-related information, such as the compression parameters of the image data in the original electronic document. Uncompressed image data are generally in DIB form. If the image data read from the electronic document are compressed, they are decompressed according to the compression parameters in the electronic document and restored to image data in DIB form.
In step S1006, an index number is assigned to the current image. In the IFC file, each image data item has a unique index number. Index numbers start from 1 and follow a depth-first traversal of the index. Index numbers may be assigned to the images one after another in the order in which the images were collected. Alternatively, images that share some common property may be organized together according to different strategies, that is, placed in the same data segment, so that the number of I/O operations can be reduced by prefetching and caching a data segment, which optimizes retrieval. In other words, each image data item is assigned the corresponding index number within the data segment to which it belongs. For example, under a segmentation strategy in which each data segment contains the same or a fixed number of image data items, each data segment contains the specified number of items and the remaining items go into the last data segment. Under a proximity strategy, image data that are used together are placed in the same data segment; for example, all images used on one page are placed in one data segment. Under a size strategy, images are segmented according to width, height, resolution, or data volume. Where no strategy is used, there is no restriction on the data segments or on how the index is organized, and index numbers are simply assigned to the images in collection order. In view of memory consumption, a data segment whose total data is too large is not well suited to caching. Segmentation may therefore be carried out according to a granularity, that is, the number of image data items in each data segment must not exceed the maximum segment granularity. To avoid index number conflicts, the index number assigned to each image is preferably (n-1) * maxcount + m, where n indicates that the image belongs to the n-th segment, maxcount is the maximum segment granularity, and m indicates that the image is the m-th image in that segment.
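As a non-limiting illustration, the fixed-quantity segmentation strategy and the index number formula above can be sketched in Python as follows. The function name `assign_index_numbers` and the image names are assumptions made for this sketch only; the description does not prescribe any particular implementation.

```python
def assign_index_numbers(image_names, maxcount):
    """Split images into segments of at most `maxcount` items and number them.

    Returns a list of (index_number, n, m, name) tuples, where n is the
    1-based segment number and m the 1-based position inside that segment.
    """
    if maxcount < 1:
        raise ValueError("maxcount must be at least 1")
    assigned = []
    for position, name in enumerate(image_names):
        n = position // maxcount + 1   # segment the image falls into
        m = position % maxcount + 1    # position inside that segment
        index_number = (n - 1) * maxcount + m
        assigned.append((index_number, n, m, name))
    return assigned


names = [f"P{i}" for i in range(1, 1201)]
numbered = assign_index_numbers(names, maxcount=1000)
print(numbered[0])     # (1, 1, 1, 'P1')
print(numbered[1000])  # (1001, 2, 1, 'P1001')
```

Because m never exceeds maxcount, two images in different segments can never receive the same index number.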
In step S1008, the image data of the current image are written into the data area of the IFC file that corresponds to the assigned index number. In step S1010, the index in the index structure that corresponds to the assigned index number is updated according to the image data of the current image. The index may record information such as the offset of the current image data in the IFC file, the data length, and the image width and height. From the index entry and the corresponding index information, the data area in the IFC file that corresponds to the assigned index number, and in which the image data of the current image are stored, can be determined.
In step S1012, the description at the place in the electronic document where the current image is used is replaced with a two-tuple. In the present invention, the two-tuple consists of a reference to the corresponding image information and the assigned index number. Where the image information is color space information, the reference to the corresponding image information is a reference to the color space of the current image. Because color space information is generally defined independently in existing electronic documents, it can be referenced directly.
In step S1014, it is determined whether any unprocessed image remains. If so, steps S1004 to S1012 are repeated. If all collected images have been processed, then in step S1016 the information recorded in the index and the index entry are written into the IFC file, and the electronic document in which the descriptions of the collected images have been replaced by the two-tuples is saved.
In addition, the method may further comprise a step of encoding the image data. As stated above, the compression mode of the image data can be set by defining fields such as compression unit and compression method in the file header information. For example, when the compression unit is the image data item, each image data item may be compressed individually with the specified compression method after it is obtained, for instance before or after it is written into the IFC file, and the IFC file as a whole need not be compressed again. When the compression unit is the data segment, each data segment may be compressed individually with the specified compression method after it is generated, and neither the whole image data package nor the individual image data items need be compressed again. When the compression unit indicates no compression (field value 0), the individual image data items are not compressed; instead, after all image data have been stored in the IFC file, the IFC file as a whole is compressed with a default compression method. In this way, the compression ratio of the image data can be further improved. The default compression method may be Flate, wavelet transform, DjVu, or the like. After the image data have been compressed, the compressed image data may be encrypted with a method such as AES or DES.
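The effect of the compression unit field can be illustrated with the following minimal Python sketch. It uses the standard `zlib` module as a stand-in for the Flate method; the function name `compress_segments` and the constant names are assumptions introduced here, and encryption is left out.

```python
import zlib

# Values of the compression unit field as described in the text.
NO_UNIT, UNIT_SEGMENT, UNIT_IMAGE = 0, 1, 2


def compress_segments(segments, unit):
    """Compress a list of segments, each a list of image byte strings.

    UNIT_IMAGE   -> every image is deflated on its own.
    UNIT_SEGMENT -> the images of a segment are concatenated, then deflated.
    NO_UNIT      -> nothing is compressed here; the caller deflates the
                    finished file as a whole.
    """
    if unit == UNIT_IMAGE:
        return [[zlib.compress(img) for img in seg] for seg in segments]
    if unit == UNIT_SEGMENT:
        return [zlib.compress(b"".join(seg)) for seg in segments]
    if unit == NO_UNIT:
        return segments
    raise ValueError(f"unknown compression unit: {unit}")
```

With the data segment as the unit, the lengths recorded in the index would refer to positions inside the decompressed segment, so a reader decompresses one segment and then slices out individual images.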
In addition, some of the images in an electronic document are likely to share identical image information. For example, the many small images that are stitched together into one large image use the same color space. In this case, the identical image information can be extracted and stored in an image information table, which removes a large amount of repeated, redundant description and saves storage overhead. The image information table may be kept in the IFC file, with its position in the IFC file recorded in the file header information. Alternatively, the image information table may be kept in the electronic document as a separate file. In that case, in step S1012, the description at the place in the electronic document where the current image is used is replaced with a reference to the image information table and the assigned index number. Alternatively, the identical image information may be defined directly in the electronic document in the document's own descriptive form. For example, where the identical image information is color space C1, a definition of color space C1 may be added to the electronic document, and the places where the related images are used may reference color space C1 directly.
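A minimal sketch of such an image information table is given below. The function name `build_image_info_table` and the color space names used in the example are illustrative assumptions; the description does not fix the table's format.

```python
def build_image_info_table(color_spaces):
    """Deduplicate image information.

    `color_spaces` holds one color space description per image, in index
    order. Returns (table, refs): `table` lists each distinct description
    once, and refs[i] is the table position used by image i.
    """
    table = []
    position_of = {}
    refs = []
    for description in color_spaces:
        if description not in position_of:
            position_of[description] = len(table)
            table.append(description)
        refs.append(position_of[description])
    return table, refs


table, refs = build_image_info_table(["DeviceRGB"] * 3 + ["DeviceGray"])
print(table)  # ['DeviceRGB', 'DeviceGray']
print(refs)   # [0, 0, 0, 1]
```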
In summary, through steps S1004 to S1016, the image data, their index information, and the index entry are written into the image data package file according to the index structure, and the descriptions at the places in the electronic document where the images are used are replaced with a reference to the corresponding image information and the assigned index number. Here, the index number represents the position of the corresponding image data in the index structure. The position of the image data in the index may also be represented by other information, such as an offset.
(Second embodiment)
As described for step S1006 of Fig. 1, the image data can be divided into meaningful segments according to different segmentation strategies. In that case, the index number assigned to the current image is the corresponding number within the data segment to which it belongs. Sometimes the original collection order already matches the segment order, so index numbers can be assigned to the image data one after another. Sometimes, however, the original collection order is not so regular. Then, under the various segmentation strategies, index numbers most likely cannot be assigned one after another; instead, each image is assigned the corresponding index number within its own data segment. In other words, relative to the collection order, the assigned index numbers jump around. In this case, image data are not written segment by segment; rather, image data are written to several data segments at the same time. The images may therefore be sorted after collection, so that index numbers can be assigned one after another in sorted order. Only after sorting can segment-by-segment operation be guaranteed under any segmentation strategy.
Below, the method for storing image data of an electronic document according to the second embodiment of the present invention is described with reference to Fig. 2. This method differs from the storage method shown in Fig. 1 in that an image sorting step is added, and the information recorded in the index is output to the IFC file segment by segment, rather than all index information being output to the IFC file together after the related information of all images has been recorded in the index structure. Only the steps that differ from Fig. 1 are described below.
After the image collection step S1000, step S1001 is executed, in which the collected images are sorted in a certain order. Here, the sort order may correspond to the segmentation strategy, and may be the order in which the images are used, or an order by image data volume, resolution, image width, image height, image information (such as color space or image type), image name, or the like.
In step S1006, because the images have been sorted according to a strategy that corresponds to the segmentation strategy, index numbers can be assigned to the image data one after another in sorted order. The result of sorting can therefore be said to optimize and guide the execution of the segmentation strategy.
In step S1018, it is determined whether the current segment has been fully processed. If not, steps S1004 to S1012 are repeated. Otherwise, step S1020 is executed. In step S1020, the information of the current segment recorded in the index is written into the IFC file.
In step S1022, it is determined whether any unprocessed data segment remains. If so, steps S1004 to S1020 are repeated. Otherwise, step S1024 is executed. In step S1024, the index entry is written into the IFC file, and the electronic document in which the descriptions of the collected images have been replaced by the two-tuples is saved.
Obviously, step S1020 may be omitted, and a step of writing all index information recording the image-related information of all segments into the IFC file together may instead be inserted between steps S1022 and S1024; in that case, it is only necessary to ensure that the offsets by which the parts point to one another are correct. By comparison, segment-by-segment output occupies little memory, whereas outputting everything together has simpler logic but occupies more memory at run time.
(Third embodiment)
As stated above, the collected image data can be divided into meaningful segments according to different strategies, so that the image data can be used more efficiently by prefetching and caching the image data of a segment. For the index structure, a two-level index structure is preferably used. A two-level index structure improves the flexibility of index organization and the speed of index loading, and thus the efficiency of operation, and it is applicable to a wider range of cases. The two-level index structure of this embodiment comprises a main index and segment indexes. Correspondingly, the main index entry (the offset of the main index in the IFC file) is recorded in the IFC file. The main index records at least the number of segment indexes, the offset of each segment index in the IFC file, and the number of image data items contained in each segment. Each segment index records at least the offset in the IFC file of the data segment that corresponds to the segment index, the offset within that data segment of the image data that corresponds to the current index, and the length of the image data. From the information recorded in the main index entry, the main index, and the segment indexes, the position and length of the current image data in the IFC file and in the index structure can be determined uniquely.
Below, an example of a two-level index structure and its corresponding IFC file is described in detail. The following IFC file is preferably used for images whose width and height are both less than 65536 and whose total data length is less than 65536 bytes, and more preferably for images whose data length does not exceed 4 KB and at least one of whose width and height is less than 4.
1. Basic structure of the image data package (IFC) file
The basic structure of the image data package file is:
[Header]
[Data Section 1]
[Section Index 1]
[Data Section 2]
[Section Index 2]
[Data Section n]
[Section Index n]
[Main Index]
[Main Index Entry]
Table 1: Description of the basic structure of the image data package
Item | Description
Header | File header information; identifies the file and points to the main index (MainIndex)
DataSection | Image data segment; stores the image data
SectionIndex | Segment index; contains the indexes of the image data in the corresponding segment and points to the corresponding image data segment
MainIndex | Main index; contains the global index information and points to the segment indexes
MainIndexEntry | Main index entry; 4 bytes long; records the offset of the main index in the image data package; located at the end of the file
2. File header information
The data structure of the file header information is shown in the table below.
Table 2: Data structure of the file header information
Item | Length (bytes) | Description
File type | 4 | Fixed to the four characters "!IFC"
Version | 4 | File version number, currently 0x00000001
Compression unit | 1 | See section 6 (image compression). 0: no compression; 1: the data segment is the compression unit; 2: the image data item is the compression unit
Compression method | 1 | See section 6 (image compression). 0: no compression; 1: Flate encoding; other values are reserved
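As a sketch only, the file header of Table 2 could be serialized with Python's `struct` module as shown below. Table 2 gives field widths but no byte order or alignment, so the little-endian, unpadded layout used here is an assumption, as are the function names.

```python
import struct

# Assumed layout: little-endian, no padding (Table 2 specifies widths only).
HEADER = struct.Struct("<4sIBB")  # file type, version, unit, method


def pack_header(unit, method, version=0x00000001):
    """Serialize the file header fields in the order listed in Table 2."""
    return HEADER.pack(b"!IFC", version, unit, method)


def unpack_header(data):
    """Parse a header and return (version, unit, method)."""
    file_type, version, unit, method = HEADER.unpack_from(data, 0)
    if file_type != b"!IFC":
        raise ValueError("not an IFC file")
    return version, unit, method


header = pack_header(unit=2, method=1)
print(len(header))            # 10
print(unpack_header(header))  # (1, 2, 1)
```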
3. Main index
The main index consists of multiple index descriptions; its basic structure is:
[Index Count]
[Section Max Count]
[Normative]
[Index Description 1]
[Index Description 2]
[Index Description n]
Each index description gives the descriptive information of the corresponding segment index, including the position of the segment index and the number of image data items it contains.
Table 3: Data structure of the main index information
Item | Length (bytes) | Description
Index count (Index Count) | 4 | Number of index descriptions contained in the main index
Maximum segment granularity (Section Max Count) | 2 | Maximum number of image data items that each segment may contain; range 0-65535
Normative flag (Normative) | 1 | Indicates whether the mapping between indexes and image data is normative. 0x00: not normative; the number of items in each of the first n-1 data segments is variable and need only not exceed the maximum segment granularity. 0x01: normative; each of the first n-1 data segments contains exactly the maximum segment granularity of items, and the n-th data segment contains all remaining items, whose number does not exceed the maximum segment granularity
Segment index position (in each index description) | 4 | Offset of the segment index in the image data package
Image data count (in each index description) | 2 | Number of image data items contained in this segment; range 0-65535
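The main index layout of Table 3 can be sketched in the same way. The byte order (little-endian, unpadded) and the function names are assumptions of this sketch, and the offsets in the usage example are arbitrary illustrative values.

```python
import struct

MAIN_HEAD = struct.Struct("<IHB")  # index count, section max count, normative
MAIN_ITEM = struct.Struct("<IH")   # segment index position, image data count


def pack_main_index(max_count, normative, descriptions):
    """descriptions: list of (segment_index_offset, image_count) pairs."""
    out = MAIN_HEAD.pack(len(descriptions), max_count, normative)
    for offset, count in descriptions:
        if count > max_count:
            raise ValueError("segment exceeds the section max count")
        out += MAIN_ITEM.pack(offset, count)
    return out


def unpack_main_index(data, start=0):
    """Return (max_count, normative, [(offset, count), ...])."""
    n, max_count, normative = MAIN_HEAD.unpack_from(data, start)
    first = start + MAIN_HEAD.size
    items = [MAIN_ITEM.unpack_from(data, first + i * MAIN_ITEM.size)
             for i in range(n)]
    return max_count, normative, items


# Two segments holding 1000 and 200 images; the offsets are made up.
main_index = pack_main_index(1000, 0x01, [(5000, 1000), (7000, 200)])
print(len(main_index))  # 19
```

Under these assumptions the main index costs 7 bytes plus 6 bytes per segment.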
4. Segment index
The segment index contains the position of the corresponding data segment and the information of each image data item. Its structure is:
[Data Section Position]
[Image Description 1]
[Image Description 2]
[Image Description n]
The image data information (Image Description) includes the width, height, offset, data length, and similar information.
Table 4: Data structure of the segment index information
Item | Length (bytes) | Description
Data segment position (Data Section Position) | 4 | Offset in the image data package of the data segment that corresponds to this segment index
Data position | 4 | Offset within the data segment of the image data that corresponds to this index, counted from the beginning of the data segment
Data length | 2 | Length in the data segment of the image data that corresponds to this index
Width | 2 | Width of the image that corresponds to this index, in pixels
Height | 2 | Height of the image that corresponds to this index, in pixels
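Because every image description in Table 4 has a fixed length, a record can be reached by byte arithmetic alone. The following sketch packs a segment index and reads back the m-th record; the little-endian, unpadded layout and the function names are assumptions made for illustration.

```python
import struct

SEG_HEAD = struct.Struct("<I")     # data section position
SEG_ITEM = struct.Struct("<IHHH")  # data position, data length, width, height


def pack_segment_index(section_offset, records):
    """records: list of (data_position, data_length, width, height)."""
    out = SEG_HEAD.pack(section_offset)
    for record in records:
        out += SEG_ITEM.pack(*record)
    return out


def read_record(data, start, m):
    """Read record m (1-based) of the segment index located at `start`.

    Returns (absolute_offset, length, width, height), where the absolute
    offset is the data section position plus the in-segment data position.
    """
    (section_offset,) = SEG_HEAD.unpack_from(data, start)
    pos = start + SEG_HEAD.size + (m - 1) * SEG_ITEM.size
    data_pos, length, width, height = SEG_ITEM.unpack_from(data, pos)
    return section_offset + data_pos, length, width, height
```

The number of records in a segment index is not stored in the segment index itself; it comes from the image data count in the main index.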
5. Number mapping
In the image data package, each image data item has a unique number. To avoid index number conflicts, the index number assigned to each image data item is (n-1) * maxcount + m, where n indicates that the image belongs to the n-th segment, maxcount is the maximum segment granularity, and m indicates that the image is the m-th image in that segment.
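Since m lies between 1 and maxcount, the mapping can be inverted with integer division, which is how a reader can go from an index number back to a segment and a position. A minimal sketch (the function name `locate` is an assumption):

```python
def locate(index_number, maxcount):
    """Invert index_number = (n - 1) * maxcount + m, with 1 <= m <= maxcount.

    Returns (n, m): the 1-based segment number and position in the segment.
    """
    if index_number < 1 or maxcount < 1:
        raise ValueError("index_number and maxcount must be positive")
    n, m = divmod(index_number - 1, maxcount)
    return n + 1, m + 1


print(locate(1, 1000))     # (1, 1)
print(locate(1000, 1000))  # (1, 1000)
print(locate(1001, 1000))  # (2, 1)
```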
6. Image compression
The data compression mode is specified by the compression unit and the compression method. When the compression unit is the data segment (field value 1), each data segment may be compressed individually with the specified compression method after it is generated, and neither the whole image data package nor the individual image data items need be compressed again. When the compression unit is the image data item (field value 2), each image data item may be compressed individually with the specified compression method after the image data in DIB form are obtained, and neither the whole image data package nor the data segments need be compressed again. When the compression unit and the compression method both indicate no compression (field value 0), the whole package is compressed with a method such as Flate after the image data package file has been generated.
To balance compression ratio against read/write efficiency, using the data segment as the compression unit is strongly recommended. In addition, because small images that occur in large numbers are often used for image stitching, no lossy image compression is provided here, so as to avoid substantial loss of quality.
When encrypting, if the compression unit value is not 0, the compressed data segments or the compressed image data items are encrypted with the specified method and password and stored in the document; if the compression unit value is 0, the normal encryption flow is followed.
7. Storage order
The actual physical order of the file header information, data segments, segment indexes, and main index described above is not restricted here; instead, offset values are used to locate the data blocks. An application may decide the actual physical storage order among them according to its needs. Whatever order is adopted, however, the file header information must be located at the beginning of the file and the main index entry must be located at the end of the file.
Below, a concrete application example of the image data storage method according to the third embodiment of the present invention is described in combination with the two-level index structure above.
Suppose that an image with a width of 600 and a height of 1200 is stored in the electronic document as 1200 small images, each with a height of 1 and a width of 600, denoted P1 to P1200. Because P1 to P1200 are stitched together to represent one image, they have the same color space, denoted color space C1. Here, the color space information is the image information referred to above.
In this example, images P1 to P1200 are sorted by image name, index numbers are assigned to the images in increments of 1, and a normative image data package is generated in which each data segment contains at most 1000 small images. That is, the maximum segment granularity is 1000; images P1 to P1000 belong to the first data segment, which contains 1000 image data items, and images P1001 to P1200 belong to the second data segment, which contains 200 image data items.
Referring to Fig. 3, in step S1100, an array is used to record where the 1200 images P1 to P1200 are stored (their paths on disk or their positions in the electronic document). In step S1101, the array is sorted by image name.
In step S1102, the file header information of the IFC file is written, with the compression unit set to 2 and the compression method set to 1. That is, Flate encoding is applied with the image data item as the compression unit. At the same time, an empty main index, empty segment indexes, and their data structures are created in memory.
In step S1104, the color space C1 of image P1 and the image data D1 in DIB form are obtained.
In step S1106, index number 1 is assigned to image P1.
In step S1107, image data D1 are compressed with Flate encoding to obtain the compressed image data CD1.
In step S1108, the compressed image data CD1 are written into the data segment of the IFC file that corresponds to index number 1.
In step S1110, the data position POS1, data length LEN1, width W1, and height H1 of the compressed image data CD1 form one record, Record1, which is recorded in segment index S1.
In step S1112, the description at the place in the electronic document where image P1 is used is replaced with color space C1 and index number 1.
Next, steps S1104 to S1112 are executed in turn for images P2 to P1000. As a result, 1000 records, Record1 to Record1000, are formed in segment index S1. Images P1 to P1000 have the same color space C1.
Thereafter, in step S1120, the offset of image data CD1 in the IFC file and the information of Record1 to Record1000 recorded in segment index S1 are first written into the IFC file, and the offset of segment index S1 in the IFC file together with the number of images in this segment, 1000, forms one item of segment index information, SR1, which is recorded in main index M1.
Then, for images P1001 to P1200, steps S1104 to S1112 are likewise executed repeatedly, forming 200 records, Record1001 to Record1200, in segment index S2. The color space of these images is also C1.
Likewise, for segment index S2, in step S1120, the offset of image data CD1001 in the IFC file and the information of Record1001 to Record1200 recorded in segment index S2 are written into the IFC file, and the offset of segment index S2 in the IFC file together with the number of images in this segment, 200, forms one item of segment index information, SR2, which is recorded in main index M1.
Finally, in step S1124, the information recorded in main index M1 and the main index entry are written into the IFC file. The information recorded in main index M1 comprises the number of segment indexes, 2; the maximum segment granularity, 1000; the normative flag, 0x01; the offset of segment index S1 in the IFC file and the number of images it contains, 1000; and the offset of segment index S2 in the IFC file and the number of images it contains, 200. Lastly, the electronic document in which the descriptions at the places where the images are used have been replaced by index numbers 1 to 1200 and color space C1, respectively, is saved.
After the above processing, and not counting the image data themselves, the two-level index structure above requires roughly 1200*14=16800 bytes.
In the electronic document, the image data and the color space are referenced in the following form.
<Image ID="1" Index="1" ColorSpace="C1">
</Image>
<Image ID="1200" Index="1200" ColorSpace="C1">
</Image>
If XML were used directly to describe the usage information of each image, a structure similar to the following would be used:
<Image ID="1" Width="600" Height="1">
<Loc>this_is_an_image1.BMP</Loc>
</Image>
<Image ID="1200" Width="600" Height="1">
<Loc>this_is_an_image1200.PNG</Loc>
</Image>
This requires approximately 1200*45=54000 extra bytes, about 3.2 times the size of the two-level index structure above. Moreover, to organize these images in the document, the name of each image and its offset in the electronic document must additionally be described, which requires roughly another 25*1200=30000 bytes. Taken together, the above embodiment improves storage efficiency by a factor of 4 to 5 over the original representation.
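The figures above can be checked with a few lines of arithmetic. The per-image byte counts (14, 45, and 25) are the estimates given in the text; the variable names are introduced here for illustration.

```python
images = 1200
index_bytes = images * 14   # two-level index, estimate from the text
xml_bytes = images * 45     # per-image XML description
naming_bytes = images * 25  # image names and offsets inside the document

print(index_bytes)                               # 16800
print(xml_bytes)                                 # 54000
print(round(xml_bytes / index_bytes, 1))         # 3.2
print((xml_bytes + naming_bytes) / index_bytes)  # 5.0
```

The combined ratio of 5.0 is the upper end of the stated 4 to 5 times improvement.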
The above calculation does not take the size of the image data into account. In actual use, small images are usually smaller still, often only one hundred to three hundred bytes. In this case, the smaller the index structure is relative to the original description information, the higher the compression efficiency it can provide. The original description information can often reach 30% of the size of the small image data, whereas the two-level index structure above averages only 3% to 5% of the size of each small image. The two-level index structure of this embodiment therefore significantly reduces both the storage overhead of the images in the electronic document and the memory overhead at run time.
Below, it is described how image data stored by the method for storing image data of an electronic document according to the present invention are searched, modified, added, and deleted.
(2) Method for searching image data of an electronic document
As stated above, with the method for storing image data of an electronic document according to the present invention, the images scattered throughout the electronic document are stored together in the IFC file and divided into meaningful segments according to different segmentation strategies. As a result, a set of image data sharing a common property can be extracted with a single I/O operation, and, by caching these image data, subsequent search and other operations can read directly from the cache without repeating the I/O operation many times.
(Fourth embodiment)
Fig. 4 is a flowchart of the method for searching image data of an electronic document according to the fourth embodiment of the present invention. The image data searched by this method are image data stored according to the storage method of the present invention. For a search, the position in the index structure of the image data to be found is first provided by the user or by some other means. As described above, the position of image data in the index structure may be represented by an index number or by other offset information. In this specification, the index number is used as an example to represent the position of the corresponding image data in the index structure.
In step S2000, the referenced image information is obtained from the electronic document saved as described above, according to the supplied index number. Where identical image information has been written into an image information table, the corresponding image information is obtained through that table.
In step S2002, the IFC file is opened and the index entry is obtained. The index entry indicates the offset of the index in the IFC file. Where the IFC file contains header information, the header information is obtained as well.
In step S2004, the index corresponding to the supplied index number is located by means of the index entry, and the information recorded in that index is extracted, including the offset of the corresponding image data in the IFC file and the length of that image data. The search may use a sequential or multi-way search algorithm or, where the length of the data structure is fixed, may be carried out by byte calculation.
In step S2006, the position and length of the corresponding image data in the IFC file are determined from the index number and the extracted index information, and the image data is read. Where the image data of a whole data segment needs to be read, all image data in the data segment to which this image data belongs are read.
In step S2008, the obtained image information and image data are returned to the user or to the cache.
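Steps S2002 to S2006 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the record layout (a 4-byte offset and a 4-byte length), the 4-byte index entry at the very end of the file, the little-endian byte order, the 0-based index numbers and the helper names `build_ifc` and `find_image` are all hypothetical and are not taken from the IFC format itself.

```python
import io
import struct

RECORD = struct.Struct("<II")  # assumed record: 4-byte offset + 4-byte length
ENTRY = struct.Struct("<I")    # assumed index entry: 4-byte offset of the index

def build_ifc(images):
    """Build a toy IFC-like blob: data area, then index, then index entry."""
    data = b"".join(images)
    records, pos = [], 0
    for img in images:
        records.append(RECORD.pack(pos, len(img)))
        pos += len(img)
    # The index starts right after the data area, so its offset is len(data).
    return data + b"".join(records) + ENTRY.pack(len(data))

def find_image(f, index_no):
    """S2002-S2006: index entry -> index record -> image bytes (0-based)."""
    f.seek(-ENTRY.size, io.SEEK_END)            # index entry sits at end of file
    (index_off,) = ENTRY.unpack(f.read(ENTRY.size))
    f.seek(index_off + index_no * RECORD.size)  # fixed-length records: byte calculation
    offset, length = RECORD.unpack(f.read(RECORD.size))
    f.seek(offset)
    return f.read(length)

blob = build_ifc([b"AAAA", b"BB", b"CCCCCC"])
image = find_image(io.BytesIO(blob), 1)
```

Because the records have a fixed length, locating a record needs no scan of the index, which is the "byte calculation" case mentioned in step S2004.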
Where the image data have been divided into meaningful segments according to different segmentation strategies, the caching step makes subsequent operations such as searching more efficient. Specifically, once all image data in a data segment have been extracted for the first time and cached, later searches can look in the cache first. If the image data being searched for is in the cache, the cached content is read directly and no I/O operation needs to be repeated. If it is not in the cache, the normal search flow shown in Fig. 4 is carried out.
(fifth embodiment)
As described above, fields such as the compression unit and the compression method may be defined in the header information of the IFC file. In that case, when image data is read from the IFC file, the image data, data segment or entire IFC file that has been read should be decoded according to the configured compression unit and compression method, so as to restore the image data in DIB format. Here, decoding includes decompression and decryption.
Fig. 5 is a flowchart of the electronic document image data search method according to the fifth embodiment of the present invention. This method differs from the search method of Fig. 4 in that it adds steps of decoding the image data according to the compression unit and compression method set in the IFC file header information. Only the differing steps are described below.
In step S2003, it is determined whether the compression unit set in the IFC file header information is "no compression" (that is, the image data are not compressed individually and the IFC file is encoded as a whole). If so, in step S2010 the entire IFC file is decoded according to the compression method set in the header information. Otherwise, step S2004 is executed.
In step S2012, it is determined whether the compression unit set in the IFC file header information is a data segment. If the compression unit is a data segment, then first, in step S2014, the position and length in the IFC file of the data segment containing the index number are determined from the index number and the extracted index information, and the data segment is read; then, in step S2018, the data segment that was read is decoded according to the compression method set in the header information. If the compression unit is image data, then first, in step S2016, the position and length in the IFC file of the image data corresponding to the index number are determined from the index number and the extracted index information, and the image data is read; then, in step S2020, the image data that was read is decoded according to the compression method set in the header information.
In step S2008, the image data, or all image data in the data segment, are returned to the user or to the cache as required.
(sixth embodiment)
An embodiment of searching image data of an electronic document stored with the above two-level index structure is described below with reference to Fig. 6. Suppose that the image information and image data of image P1010 are to be read. Because the data structure of the index has a fixed length, the segment index can be located by byte calculation.
In step S2100, the corresponding color space C1 is obtained from the saved electronic document according to index number 1010.
In step S2110, the IFC file is opened and its header information is read, yielding information such as compression unit 2 and compression method 1.
In step S2130, master index M1 is located through the master index entry recorded at the end of the IFC file, and the information recorded in master index M1 is read, including the number of segment indexes (2), the maximum segment granularity (1000), the normalization parameter (0x01) and so on. From index number 1010 it is determined that the image data to be found lies in data segment number 1010/1000+1=2 (integer division). Suppose the offset of the master index in the IFC file indicated by the master index entry is PM1. Then, according to the data structure of the master index information in Table 3, the reader jumps to offset PM1+(4+2+1)+(4+2)*(2-1) bytes in the IFC file, reads 4 bytes directly to obtain the offset PS2 of segment index S2 in the IFC file, and reads a further 2 bytes to obtain the number of image data items contained in the second data segment, namely 200.
In step S2140, from index number 1010, the maximum segment granularity 1000 and the normalization parameter 0x01, it is determined that P1010 is at record number 1010%1000=10 of segment index S2. From the position PS2 of segment index S2 and the data structure of the segment index information shown in Table 4, the position of the 10th record, Record1010, in the IFC file is calculated as PS2+(4+2+2+2)*(10-2) bytes. The next 4 bytes are read to obtain the position in the IFC file of CD1010, the data of P1010; a further 2 bytes give the length of CD1010, a further 2 bytes give the width of P1010, and a further 2 bytes give the height of P1010.
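The byte calculation of step S2130 can be illustrated with a short sketch. The 7-byte header and the 6-byte per-segment entry follow the (4+2+1) and (4+2) terms in the text; the little-endian byte order, the names used, and the value 5000 chosen for PM1 are assumptions made purely for illustration.

```python
import struct

MASTER_HEADER_SIZE = 4 + 2 + 1        # bytes preceding the first per-segment entry
SEGMENT_ENTRY = struct.Struct("<IH")  # 4-byte segment index offset + 2-byte image count
                                      # (byte order is an assumption)

def segment_number(index_no, max_granularity):
    """Data segment that holds index_no, using integer division as in the text."""
    return index_no // max_granularity + 1

def segment_entry_offset(pm1, seg_no):
    """Offset in the IFC file of the master index entry for segment seg_no."""
    return pm1 + MASTER_HEADER_SIZE + SEGMENT_ENTRY.size * (seg_no - 1)

seg = segment_number(1010, 1000)        # second data segment
off = segment_entry_offset(5000, seg)   # PM1 = 5000 is an arbitrary example value

# Reading the entry itself: here a hand-packed entry with PS2 = 9000, count = 200.
ps2, count = SEGMENT_ENTRY.unpack(SEGMENT_ENTRY.pack(9000, 200))
```

With these example values the entry for the second segment lies 7 + 6 bytes past PM1.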
In step S2150, image data CD1010 is read from the IFC file using the position and length of CD1010 obtained above.
In step S2160, CD1010 is decompressed using flate coding to obtain the raw image data D1010 of P1010.
In step S2170, the decompressed image data D1010 and the corresponding color space C1 are returned to the user.
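Steps S2140 to S2160 amount to reading one fixed-length record, reading the data it points to, and inflating that data. The sketch below assumes a little-endian 4+2+2+2-byte record and uses Python's `zlib` module as a stand-in for the flate coding; the function names and the toy file layout are hypothetical, and the record position is passed in rather than derived.

```python
import io
import struct
import zlib

# Assumed record layout: offset (4), length (2), width (2), height (2), little-endian.
RECORD = struct.Struct("<IHHH")

def read_record(f, record_pos):
    """Read one segment index record at the given file position."""
    f.seek(record_pos)
    return RECORD.unpack(f.read(RECORD.size))

def read_image(f, record_pos):
    """Read the record, fetch the compressed data, and decompress it."""
    offset, length, width, height = read_record(f, record_pos)
    f.seek(offset)
    compressed = f.read(length)
    return zlib.decompress(compressed), width, height

# Toy file: compressed data at offset 0, followed directly by its record.
raw = bytes(16 * 16)                      # a blank 16x16, 1-byte-per-pixel image
cd = zlib.compress(raw)
toy = cd + RECORD.pack(0, len(cd), 16, 16)
data, w, h = read_image(io.BytesIO(toy), len(cd))
```

Note that a 2-byte length field limits a single image's compressed data to 65535 bytes, which suits the small images discussed here.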
As described above, if the compression unit recorded in the header information of the IFC file is a data segment, the data segment is decoded after it has been read; if the compression unit is "no compression", the IFC file is decoded when it is opened.
Likewise, as described above, the first search can be carried out normally in the index according to the index number, after which all image data in the data segment containing that index number are cached in memory. The next search looks in the cached data segment first. For example, suppose the nearby-use strategy is adopted for segmentation, so that all image data of one page are stored in one data segment. When the page is opened for the first time, all image data in that data segment and its segment index information are read and cached, and every later search looks in the cache first. The flow of the cache-first search in this embodiment is as follows:
Check whether a cached segment index exists in memory;
If no cached segment index exists in memory, carry out the normal search flow of Fig. 6;
If a cached segment index exists in memory, determine from the supplied index number and the information recorded in that segment index whether the image data to be found is in the cached segment index;
If the image data to be found is not in the cached segment index, carry out the normal search flow of Fig. 6;
If the image data to be found is in the cached segment index, read the image data directly from the cache.
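The cache-first flow above can be sketched as a small class. The `load_segment` callable stands in for the normal search flow of Fig. 6 and is hypothetical, as are the class name and the `io_reads` counter, which is kept only to show when an I/O operation would occur. Only one segment is cached at a time in this sketch.

```python
class SegmentCache:
    """Cache-first lookup over one cached data segment (illustrative sketch)."""

    def __init__(self, load_segment, max_granularity):
        self._load = load_segment      # hypothetical: seg_no -> {index_no: bytes}
        self._gran = max_granularity
        self._seg_no = None            # number of the cached segment, if any
        self._images = {}              # cached image data of that segment
        self.io_reads = 0              # counts normal (uncached) lookups

    def get(self, index_no):
        seg_no = index_no // self._gran + 1
        if seg_no == self._seg_no and index_no in self._images:
            return self._images[index_no]   # present in cache: no I/O
        # No cached segment index, or image not in it: normal search flow.
        self._images = self._load(seg_no)
        self._seg_no = seg_no
        self.io_reads += 1
        return self._images[index_no]

store = {1: {5: b"c"}, 2: {1001: b"a", 1010: b"b"}}
cache = SegmentCache(store.__getitem__, 1000)
first = cache.get(1010)    # normal lookup, segment 2 is loaded and cached
second = cache.get(1001)   # served from the cached segment
```

A practical implementation might keep several segments cached at once; the single-segment design here simply mirrors the five-step flow.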
The above shows the advantage of the present invention in managing, in a unified way, image data scattered throughout an electronic document. In the original approach, an image is looked up only when it is used and its image data is then read, so every read inevitably causes an I/O operation. Moreover, because small images have no relationship to one another and carry no grouping information, prefetching is hard to implement and I/O operations cannot be saved. By contrast, the IFC file can be loaded into memory at the granularity of an image data segment or of the entire IFC file, which achieves data prefetching: an I/O operation is not necessarily needed at the time of use, and the prefetched data can be returned directly, so I/O performance improves. In particular, when all small images need to be extracted at once, the IFC file only has to be read once, whereas the original approach requires a traversal with many I/O operations.
(3) Electronic document image data modification method
When image data stored by the electronic document image data storage method of the present invention is modified, it is first determined whether the modified data is longer than the data before modification. If the modified data is no longer than the data before modification, the original image data is replaced in place in the IFC file and the corresponding index information is updated according to the modified image data. Otherwise, the modified image data is written at the end of the IFC file, the corresponding index information is updated according to this image data, and the updated index and index entry are written at the end of the IFC file.
The following gives a method of modifying image data where the image data of the electronic document has been stored with the storage method of the third embodiment. Suppose the image to be modified is P1001 and the new image data is ND1001.
If the length of ND1001 after compression is not greater than the length of CD1001, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to find the position of CD1001.
3. Replace CD1001 with the compressed data of ND1001.
4. Modify Record1001, the record for P1001 in segment index S2, changing its data length, width and height to the values of ND1001.
5. Save.
If the length of ND1001 after compression is greater than the length of CD1001, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2.
3. Write the compressed data of ND1001 at the end of the IFC file.
4. Modify Record1001 in S2, writing into it the offset in the IFC file of the compressed data of ND1001, its data length, and the width and height of the image.
5. Write segment index S2 at the end of the IFC file.
6. In master index M1, change the offset of segment index S2 in the IFC file to the new offset of segment index S2 in the IFC file.
7. Write master index M1 at the end of the IFC file.
8. Write the offset of master index M1 in the IFC file, that is, the master index entry, at the end of the IFC file.
9. Save.
Another method is to write the modified image data directly at the end of the IFC file regardless of its length, update the corresponding index information, and write the updated index and index entry at the end of the IFC file. In the above example, this means that however long the compressed data of ND1001 is, the image data is appended directly to the end of the IFC file and no replacement is performed.
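The choice between in-place replacement and appending can be sketched as follows. The IFC file is modelled as a `bytearray` and the index as a dictionary mapping an index number to an (offset, length) pair; both are simplifications assumed for illustration, and the rewriting of the segment index, master index and master index entry at the end of the file is deliberately left out.

```python
def modify(ifc, index, index_no, new_data, always_append=False):
    """Replace in place when the new data fits, otherwise append (sketch)."""
    offset, length = index[index_no]
    if not always_append and len(new_data) <= length:
        # New data is no longer than the old data: overwrite at the old position.
        ifc[offset:offset + len(new_data)] = new_data
        index[index_no] = (offset, len(new_data))
        return "in-place"
    # New data is longer (or appending is forced): write it at the end of the file.
    index[index_no] = (len(ifc), len(new_data))
    ifc.extend(new_data)
    # A real IFC file would now also receive the updated index and index entry.
    return "appended"

ifc = bytearray(b"AAAABBBB")
index = {1: (0, 4), 2: (4, 4)}
r1 = modify(ifc, index, 1, b"xy")        # fits: replaced in place
r2 = modify(ifc, index, 2, b"123456")    # too long: appended
```

Setting `always_append=True` corresponds to the alternative method in which the data is always written at the end of the file, trading some wasted space for a simpler write path.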
(4) Electronic document image data deletion method
When image data stored by the electronic document image data storage method of the present invention is deleted, the original IFC file may be fully unpacked and a new IFC file generated; alternatively, direct deletion or incremental deletion may be used.
Direct deletion comprises the following steps: first, modify the index, replacing the record in the index corresponding to the image data with an empty record; then, directly overwrite the image data to be deleted with zeros.
Incremental deletion comprises the following steps: first, modify the index, replacing the record in the index corresponding to the image data with an empty record; then, write the modified index and index entry again at the end of the file.
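The two deletion modes can be contrasted in a short sketch. A single-level index of (offset, length) records, the little-endian 4-byte fields, and the function names are assumptions made for illustration; the sketch does not reproduce the two-level index layout.

```python
import struct

RECORD = struct.Struct("<II")   # assumed record: 4-byte offset + 4-byte length
ENTRY = struct.Struct("<I")     # assumed index entry: 4-byte offset of the index

def delete_direct(ifc, index, pos):
    """Direct deletion: empty the record and zero the image data in place."""
    offset, length = index[pos]
    index[pos] = (0, 0)                            # empty record
    ifc[offset:offset + length] = bytes(length)    # overwrite the data with zeros

def delete_incremental(ifc, index, pos):
    """Incremental deletion: empty the record, append index and index entry."""
    index[pos] = (0, 0)                            # empty record; data untouched
    index_off = len(ifc)
    for rec in index:
        ifc.extend(RECORD.pack(*rec))              # rewritten index at end of file
    ifc.extend(ENTRY.pack(index_off))              # new index entry at end of file

a = bytearray(b"AAAABB")
idx_a = [(0, 4), (4, 2)]
delete_direct(a, idx_a, 0)

b = bytearray(b"AAAABB")
idx_b = [(0, 4), (4, 2)]
delete_incremental(b, idx_b, 1)
(new_index_off,) = ENTRY.unpack(b[-ENTRY.size:])
```

Direct deletion keeps the file size unchanged but rewrites existing bytes, whereas incremental deletion only appends, leaving the old data in place and unreferenced.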
The following gives a method of deleting image data where the image data of the electronic document has been stored with the storage method of the third embodiment. Suppose image P1001 is to be deleted.
In the case of direct deletion, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2, and find Record1001.
3. Set all information in Record1001 to 0.
4. Set all data at the original location of CD1001 to 0.
5. Save.
In the case of incremental deletion, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2.
3. Set all information in Record1001 of segment index S2 to 0.
4. Write segment index S2 at the end of the IFC file.
5. In master index M1, change the offset of segment index S2 in the IFC file to the new offset of segment index S2 in the IFC file.
6. Write master index M1 at the end of the IFC file.
7. Write the offset of master index M1, that is, the master index entry, at the end of the IFC file.
8. Save.
(5) Electronic document image data adding method
In the default mode, or in another mode such as the nearby-priority mode, the new image data is written directly at the end of the IFC file, an index number is allocated according to the generation mode, the index is updated according to this image data, and the updated index and index entry are written at the end of the IFC file.
Where a segmentation strategy such as the nearby-use strategy is adopted, it is first determined whether the data segment containing the other image data used together with this image data is full, that is, whether it has reached the maximum segment granularity. If it is not full, the image data is added to that data segment and a corresponding index number within the segment is allocated. Specifically, the allocated index number is the largest index number in the segment plus 1. Otherwise, a new data segment is created and an index number is allocated according to the generation mode. After the image data has been written, the corresponding index is updated according to this image data, and the updated index and index entry are written at the end of the IFC file.
The following gives a method of adding image data where the image data of the electronic document has been stored with the storage method of the third embodiment. Suppose a new small image P1201 is to be added.
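The index number allocation rule for the segmented case can be sketched as follows. The function name and the `next_fresh_no` parameter, which stands for whatever number the generation mode would produce for a new segment, are hypothetical.

```python
def allocate_index_no(segment_index_nos, max_granularity, next_fresh_no):
    """Return (index_no, new_segment_needed) for image data being added."""
    if len(segment_index_nos) < max_granularity:
        # Segment not full: the largest index number in the segment plus 1.
        return max(segment_index_nos) + 1, False
    # Segment full: a new data segment is created; the number comes from
    # the generation mode, represented here by next_fresh_no (an assumption).
    return next_fresh_no, True

# A segment holding index numbers 1001..1200 with maximum granularity 1000.
partial = list(range(1001, 1201))
new_no, new_segment = allocate_index_no(partial, 1000, 2001)
```

With a segment holding 200 images and a maximum granularity of 1000, the new image receives index number 1201 in the same segment.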
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1.
3. Read the information of segment index S2 recorded in master index M1.
4. Because this IFC file is normalized and the number of images in segment index S2 is less than the maximum granularity 1000, write CD1201 at the end of the IFC file.
5. Add Record1201 to segment index S2, filling in the offset and data length of CD1201 and the width and height of P1201.
6. Write the updated segment index S2 at the end of the IFC file.
7. Update the information of segment index S2 recorded in master index M1, writing into master index M1 the new offset of segment index S2 in the IFC file and the number of image data items it now contains, 201.
8. Write master index M1 at the end of the IFC file.
9. Write the new offset of master index M1, that is, the master index entry, at the end of the IFC file.
10. Save the IFC file.
(6) Electronic document image data processing device
The electronic document image data processing device according to the present invention comprises at least a storage unit, and may also comprise a search unit, a modification unit, a deletion unit and an adding unit. The search unit, modification unit, deletion unit and adding unit are each connected to the storage unit.
Referring to Fig. 7, the storage unit comprises at least collection module 10, image processing module 20 and output module 30. Image processing module 20 comprises at least index processing module 23, and output module 30 comprises IFC file module 31 and electronic document module 32.
Where image processing module 20 comprises only index processing module 23, collection module 10 collects the images to be processed from the electronic document and provides the image data and image information of the collected images to image processing module 20 and output module 30. Index processing module 23 builds the index structure in memory, allocates an index number to each collected image, and updates the index corresponding to the index number according to the image data of that image. IFC file module 31 writes the header information of the package file, writes the image data received from collection module 10 into the data area of the IFC file corresponding to the index number allocated in index processing module 23, and writes into the IFC file the information recorded in the index structure built in index processing module 23, together with information such as the index entry. Electronic document module 32 replaces the description at each place in the electronic document where an image is used with a two-tuple, namely a reference to the corresponding image information and the index number allocated in index processing module 23.
As shown in Fig. 7, image processing module 20 also comprises sorting module 21, which sorts the collected images so that they are arranged in the order of segmentation. The image data and image information are then provided to image processing module 20 and output module 30 in the sorted order.
Image processing module 20 also comprises coding module 22. When a compression unit and a compression method have been set in the header information of the IFC file, coding module 22 encodes the image data according to the compression unit and compression method received from IFC file module 31. Specifically, when the compression unit is image data, coding module 22 encodes the image data received from sorting module 21 with the specified coding method and then outputs the encoded image data to IFC file module 31. When the compression unit is a data segment, coding module 22 encodes the image data received from sorting module 21 segment by segment with the specified coding method, according to the segment information received from index processing module 23. Here, the segment information indicates which image data a given data segment contains, how many there are, and so on. When the compression unit is "no compression", coding module 22 encodes the entire IFC file with the specified coding method.
Image processing module 20 also comprises image information module 24, which extracts identical image information from the image information received from collection module 10 or sorting module 21 and records this identical image information in an image information table. In this case, electronic document module 32 replaces the description at the place in the electronic document where the current image is used with a reference to the image information table and the allocated index number. Alternatively, image information module 24 defines this identical image information directly in the electronic document, using the description method of the electronic document.
In the electronic document image data processing device according to the present invention, the search unit reads image data and the corresponding index information from the IFC file created in IFC file module 31, reads the corresponding image information from the electronic document saved in electronic document module 32, and decodes the image data where the image data read has been encoded. The modification unit modifies image data in the IFC file created in IFC file module 31 and the corresponding index information. The deletion unit deletes image data in the IFC file created in IFC file module 31 and the corresponding index information. The adding unit adds new image data to the IFC file created in IFC file module 31 and updates the corresponding index information.
The electronic document image data processing method and device according to the present invention have been described above. With the present invention, large numbers of scattered small images are processed and stored together, so that access efficiency is greatly improved, and the statements that use these small images in the document become simpler, reducing the amount of descriptive data. After the document is opened, the IFC file can be opened and loaded directly, and image data can subsequently be read quickly and directly from the loaded content of the IFC file, which reduces the number of I/O operations. The variety of available strategies also offers more flexibility and improves performance when locating and caching image data in the IFC file. For example, adopting the nearby-use strategy means that when one image is read, the image data in the same segment is likely to be used soon, so these image data can be prefetched and cached. If, when the IFC file is generated, the individual image data are not compressed but the entire IFC file is compressed instead, the compression ratio of the image data can be improved further.
Although several methods of storing, searching, modifying, adding and deleting image data of an electronic document have been described in the above embodiments, it should be understood that the intent of the present invention is to extract the image information and image data of images from an electronic document and to store the image data together in an image data package file, thereby significantly reducing storage overhead and achieving unified management. The present invention is not limited to the described embodiments, and any other similar variation or substitution should be included in the present invention. For example, in the step of replacing the description at the place in the electronic document where the current image is used with a two-tuple, the two-tuple in the embodiments consists of a reference to the corresponding image information and the allocated index number. Here, the index number represents the position of the image data in the index; that position may also be represented by other information such as an offset. In addition, the two-tuple may instead be a triple, a four-tuple and so on, or a single parameter, provided that it defines the image information and image data to be used. That is, the organizational form does not matter as long as the image information and image data can be obtained from these reference descriptions. The index structure is not limited to the structures shown in the embodiments; it may also be a structure in which the individual index items are scattered throughout the file and linked by pointers or offsets. Furthermore, the index may be separated from the image data and kept in a separate file. The present invention is not limited to existing document formats such as PDF, XPS, CEB and MARS, and also applies to document formats that describe image information and image data in a similar manner.