CN101894115B - Image data processing method of electronic document and device thereof - Google Patents

Image data processing method of electronic document and device thereof

Info

Publication number
CN101894115B
Authority
CN
China
Prior art keywords
image
data
view data
index
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101519024A
Other languages
Chinese (zh)
Other versions
CN101894115A (en
Inventor
仇睿恒
王毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Peking University Founder Research and Development Center
Original Assignee
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Peking University
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIDA FANGZHENG TECHN INST Co Ltd BEIJING, Peking University, Peking University Founder Group Co Ltd
Priority to CN2009101519024A
Publication of CN101894115A
Application granted
Publication of CN101894115B
Expired - Fee Related
Anticipated expiration

Abstract

The invention provides an image data processing method for an electronic document and a device thereof. The image data processing method includes methods of storing, searching, modifying, deleting and adding image data, and comprises the steps of: acquiring image information and image data from the images collected from the electronic document; allocating an index number; writing the image data into the corresponding data area of an IFC (Image File Cluster) file; updating the corresponding index information according to the image data; and replacing each description in the electronic document where the image is used with a reference to the corresponding image information together with the index number. Searching, modifying, deleting and adding are carried out on the image data in the IFC file through an index structure. The invention stores the image data scattered in the electronic document centrally in the IFC file and segments the image data meaningfully according to different segmentation strategies, thereby markedly reducing storage overhead and improving access efficiency.

Description

Image data processing method for electronic documents and device thereof
Technical field
The invention belongs to the field of electronic document data processing, and specifically relates to an image data processing method for electronic documents and a device thereof. The image data processing includes operations such as storing, searching, modifying, adding and deleting the image data of an electronic document.
Background
Many electronic document formats currently exist, and each describes the images it contains in a different way. For example, XPS and MARS documents organize the document format using XML and Zip packaging, and each image is stored separately. When a document contains many images, image-related information such as the header information and color space must be described for every image, which generates a large amount of redundant content and inflates the document, so that both the storage overhead of the document and its memory overhead at run time grow. Moreover, because the document must be accessed again each time an image is read, the number of I/O operations becomes excessive and image loading slows down. In addition, for documents that use Zip as the underlying physical container, a large number of images lowers the compression ratio. At the same time, as the amount of document data in the Zip container increases, the speed of opening the document also drops. As another example, in PDF documents, which are based on a binary format, the binary image data and the reusable image information are scattered throughout the document, which makes images difficult to extract and manage: the description of the entire document must be parsed in depth before the image data in it can be obtained.
In addition, tools for formats such as PDF and PS often splice a large number of small-size images into one large image for display or output. A small-size image usually means an image whose height, width or data size is below a certain threshold. For this kind of application it is necessary to identify which small-size images can be merged, and merging errors occur easily. As a result, not only is the storage overhead large, but the technique is also difficult to implement.
Summary of the invention
The technical problem to be solved by the invention is to address the above deficiencies of the prior art by providing an efficient image data processing method for electronic documents and a device thereof, so that the images scattered in an electronic document are managed in a unified and efficient way, thereby significantly reducing the storage overhead of the images in the electronic document and the memory overhead at run time, and improving the access efficiency and loading speed of the electronic document.
According to one aspect of the invention, a method for storing image data of an electronic document is provided. The method comprises the following steps: collecting images from the electronic document; creating an image data package file and an index structure; writing the image data of the collected images, together with its index information and the index entry, into the image data package file according to the index structure; and replacing each description in the electronic document where an image is used with a two-tuple, the two-tuple consisting of a reference to the image information of the collected image and the position of the corresponding image data in the index structure.
The position of the image data in the index structure is represented by the index number allocated to that image data in the index structure.
According to another aspect of the invention, a method for searching image data stored by the storage method of the invention is provided. The method comprises the following steps: obtaining, from the electronic document saved as above, the position in the index structure of the image data to be searched for and the referenced image information; opening the image data package file and obtaining the index entry from it; finding the index corresponding to the given position according to the index entry, and extracting the information recorded in that index; determining, from the given position and the extracted index information, the position and length in the image data package file of the image data to be searched for, and reading that image data; and returning the obtained image information and image data to the user or caching them.
According to another aspect of the invention, a method for modifying image data stored by the storage method of the invention is provided. The method comprises the following steps: judging whether the modified data is longer than the data before modification; if the modified data is no longer than the data before modification, replacing the data directly, starting at the beginning of the original image data in the image data package file, and updating the corresponding index information according to the modified image data; and if the modified data is longer than the data before modification, writing the modified image data at the end of the image data package file, updating the corresponding index information according to that image data, and writing the updated index information and index entry at the end of the image data package file.
According to another aspect of the invention, a method for deleting image data stored by the storage method of the invention is provided. The method comprises the following steps: replacing the record in the index corresponding to the image data to be deleted with a null record; and directly replacing the image data to be deleted with 0.
According to another aspect of the invention, a method for adding image data to the image data stored by the storage method of the invention is provided. The method comprises the following steps: judging whether the data segment that holds the other image data used together with the image data to be added is full; if the data segment is not full, adding the image data to that data segment and allocating the corresponding index number within it; if the data segment is full, creating a new data segment, writing the image data, and allocating the corresponding index number within the new data segment; updating the corresponding index information according to the image data; and writing the updated index information and index entry at the end of the image data package file.
According to another aspect of the invention, an image data processing device for electronic documents is provided. The device comprises a storage unit, and the storage unit comprises: a collection module, which collects the images to be processed from the electronic document; an index processing module, which builds the index structure in memory, allocates an index number to each collected image, and updates the index corresponding to the index number according to the image data of that image; an image data package file module, which writes the header information of the image data package file, writes the image data received from the collection module into the data area of the image data package file corresponding to the index number allocated by the index processing module, and writes the information recorded in the index structure built by the index processing module, together with the index entry and similar information, into the image data package file; and an electronic document module, which replaces each description in the electronic document where an image is used with a two-tuple, that is, a reference to the corresponding image information and the index number allocated by the index processing module.
The storage unit may further comprise a sorting module, an encoding module and an image information module.
The image data processing device may further comprise a search unit, a modification unit, a deletion unit and an adding unit.
According to the invention, the image data scattered in an electronic document is stored centrally in the image data package file through the index structure. This removes a large amount of repeated redundant information, makes the descriptions in the electronic document simpler, and saves a great deal of storage overhead and run-time memory overhead. Moreover, the collected image data is segmented meaningfully according to different segmentation strategies, so that all the image data in a data segment can be extracted at once and cached, and later searches look in the cache first. This reduces the number of I/O operations and improves access efficiency. In addition, extracting identical image information also removes a large amount of repeated redundant description. Furthermore, when the entire IFC file is compressed, the compression ratio of the image data can be improved further.
Brief description of the drawings
Fig. 1 is a flowchart of the method for storing image data of an electronic document according to the first embodiment of the invention.
Fig. 2 is a flowchart of the method for storing image data of an electronic document according to the second embodiment of the invention.
Fig. 3 is a flowchart of the method for storing image data of an electronic document according to the third embodiment of the invention.
Fig. 4 is a flowchart of the method for searching image data of an electronic document according to the fourth embodiment of the invention.
Fig. 5 is a flowchart of the method for searching image data of an electronic document according to the fifth embodiment of the invention.
Fig. 6 is a flowchart of the method for searching image data of an electronic document according to the sixth embodiment of the invention.
Fig. 7 is a block diagram of the storage unit of the image data processing device for electronic documents according to the invention.
Detailed description of the embodiments
In existing electronic documents, the image information of each image and its image data in DIB format are generally both recorded. Image information mainly means color space information, including the type of the color space, the color information of each channel, the palette, and so on. The image data processing method for electronic documents and the device thereof according to the invention are described below with reference to the drawings. Here, image data processing includes storing, searching, modifying, adding and deleting image data.
(1) Storing image data of an electronic document
With the storage method of the invention, the image data scattered in an electronic document is stored centrally in a newly created image data package file. An index structure records the offsets of these images in the image data package file together with their image-related information, and the descriptions in the electronic document where images are used are modified accordingly. The file extension of the image data package file is ifc (Image File Cluster); it is referred to below as the IFC file.
(First embodiment)
Fig. 1 is a flowchart of the method for storing image data of an electronic document according to the first embodiment of the invention.
Referring to Fig. 1, in step S1000 the images to be processed are collected from the electronic document. The electronic document formats that the invention can process include PDF, XPS, CEB, MARS and similar formats. In this step, an array can be used to record the path on disk where each collected image is stored, or its position in the electronic document.
In step S1002, the IFC file and the index structure are created. Here, the IFC file is a newly created file used to store image data and record index information. It contains at least the file header information, the data area that stores the image data, the index, and the index entry. The file header information must be located at the beginning of the file. Fields such as the file type, version information, compression unit and compression method can be defined in the file header information. The index records at least the offset of the corresponding image data in the IFC file and the length of that image data. The index entry indicates the offset of the index in the IFC file. From the information recorded in the index entry and the index, the position and length of the data area in the IFC file that corresponds to the current image data can be determined. The index entry is generally located at the end of the IFC file, which facilitates later operations such as searching, modifying, adding and deleting. At the same time, memory is allocated for the index structure, and a table, tree or other data structure is created in which index or image data information will be inserted.
In step S1004, the image information and image data of the current image are obtained from the collected image. As stated above, image information mainly means color space information, but it may also include other image-related information, such as the compression parameters of the image data in the original electronic document. Uncompressed image data is generally in DIB format. If the image data read from the electronic document is compressed, it is decompressed according to the compression parameters in the electronic document and restored to image data in DIB format.
In step S1006, an index number is allocated to the current image. In the IFC file, each piece of image data has a unique index number. Index numbers start from 1 and follow a depth-first traversal of the index. Index numbers can be allocated to images one after another in the order in which the images were collected. Alternatively, images that have something in common can be organized together according to different strategies, that is, placed in the same data segment, so that the number of I/O operations can be reduced by prefetching and caching a data segment, which optimizes retrieval. In other words, each piece of image data is allocated the corresponding index number within the data segment to which it belongs. For example, when the segmentation strategy is that each data segment contains the same number of image data items or a fixed amount of data, each data segment can hold the specified number of items, and the remaining image data goes into the last data segment. Under the proximity-of-use strategy, image data that is used together is placed in the same data segment; for example, all the images used on one page are placed in one data segment. Under the size strategy, images are segmented according to their width, height, resolution or data size. When no strategy is used, there is no restriction on the data segments or on how the index is organized, and index numbers are simply allocated one after another in the order of image collection. Memory usage must also be considered: if all the data of a data segment is too large, the segment is not well suited to caching. Segmentation can therefore be carried out according to a granularity, that is, the number of image data items in each data segment must not exceed the segment maximum granularity. To resolve index number conflicts effectively, the index number allocated to each image is preferably (n - 1) * maxcount + m, where n indicates that the image belongs to the n-th segment, maxcount is the segment maximum granularity, and m indicates that the image is the m-th image in that segment.
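The index number formula can be checked with a short Python sketch. The function names `index_number` and `locate` are hypothetical; the inverse mapping is not stated in the text and is simply derived from the formula, assuming n and m both start at 1.

```python
def index_number(n: int, m: int, maxcount: int) -> int:
    """Index number of the m-th image in the n-th segment (both 1-based)."""
    if n < 1 or not 1 <= m <= maxcount:
        raise ValueError("n must be >= 1 and m must lie in 1..maxcount")
    return (n - 1) * maxcount + m


def locate(index_no: int, maxcount: int) -> tuple:
    """Inverse mapping: recover (n, m) from an index number."""
    if index_no < 1:
        raise ValueError("index numbers start at 1")
    n0, m0 = divmod(index_no - 1, maxcount)
    return n0 + 1, m0 + 1
```

With maxcount = 100, the fifth image of the third segment gets index number 205, and `locate(205, 100)` returns `(3, 5)`. Because each segment owns a disjoint block of maxcount numbers, two images in different segments can never receive the same index number.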
In step S1008, the image data of the current image is written into the data area of the IFC file that corresponds to the allocated index number. In step S1010, the index in the index structure that corresponds to the allocated index number is updated according to the image data of the current image. The index can record information such as the offset of the current image data in the IFC file, the data length, and the image width and height. From the index entry and the corresponding index information, the data area in the IFC file that corresponds to the allocated index number can be determined; the image data of the current image is stored in that data area.
In step S1012, the description in the electronic document where the current image is used is replaced with a two-tuple. In the invention, the two-tuple consists of a reference to the corresponding image information and the allocated index number. When image information means color space information, the reference to the corresponding image information is a reference to the color space of the current image. Since existing electronic documents generally already define color space information independently, it can be referenced directly.
In step S1014, it is judged whether any unprocessed images remain. If so, steps S1004 to S1012 are repeated. If all the collected images have been processed, then in step S1016 the information recorded in the index and the index entry are written into the IFC file, and the electronic document, in which the descriptions of the collected images have been replaced with the two-tuples, is saved.
In addition, the method may also include a step of encoding the image data. As stated above, the compression mode of the image data can be set by defining fields such as the compression unit and compression method in the file header information. For example, when the compression unit is the image data, each piece of image data can be compressed individually with the specified compression method once it has been obtained, for instance before or after it is written into the IFC file, and the IFC file as a whole need not be compressed again. When the compression unit is the data segment, each data segment can be compressed individually with the specified compression method after it has been generated, and neither the whole image data package nor the individual image data needs to be compressed again. When the compression unit is no compression (field value 0), the individual image data is not compressed; instead, after all the image data has been stored in the IFC file, the IFC file as a whole is compressed with the default compression method. In this way the compression ratio of the image data can be improved further. The default compression method can be Flate, wavelet transform, DjVu or a similar compression method. After the image data has been compressed, it can be encrypted with methods such as AES or DES.
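The three compression units can be sketched in Python as follows. This is only an illustration: the constant names and the `encode_segment` function are hypothetical, and the standard `zlib` module (a DEFLATE implementation) stands in for the Flate method named in the text.

```python
import zlib

# Compression unit values as given in the file header description.
UNIT_NONE, UNIT_SEGMENT, UNIT_IMAGE = 0, 1, 2


def encode_segment(images, unit):
    """Return the byte blobs to write for one data segment."""
    if unit == UNIT_IMAGE:
        # Each piece of image data is compressed on its own.
        return [zlib.compress(img) for img in images]
    if unit == UNIT_SEGMENT:
        # The whole data segment is compressed as a single blob.
        return [zlib.compress(b"".join(images))]
    # Unit 0: data is stored as-is; the IFC file is compressed as a whole later.
    return list(images)
```

Per-image compression keeps random access to a single image cheap, while per-segment compression usually yields a better ratio when the images in a segment are similar, which fits the prefetch-and-cache use of data segments.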
In addition, some of the images in an electronic document are likely to share the same image information. For example, the many small-size images that are spliced into one large image use the same color space. In this case, the identical image information can be extracted and stored in an image information table, which removes a large amount of repeated redundant description and saves storage overhead. The image information table can be stored in the IFC file, with its position in the IFC file recorded in the file header information. Alternatively, the image information table can be stored in the electronic document as a separate file. In that case, in step S1012 the description in the electronic document where the current image is used is replaced with a reference to the image information table and the allocated index number. Alternatively, the identical image information can be defined directly in the electronic document in the document's own descriptive syntax. For example, if the identical image information is a color space C1, a definition of color space C1 can be added to the electronic document, and each place where a related image is used can reference color space C1 directly.
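The extraction of identical image information into a table can be sketched in Python. The function name `build_info_table` is hypothetical, and image information is represented here simply as a hashable value such as a color space name, which is an assumption made for illustration.

```python
def build_info_table(infos):
    """Deduplicate image information.

    Returns (table, refs): table holds each distinct item once, and refs[i]
    is the table position referenced by the i-th image.
    """
    table = []
    positions = {}
    refs = []
    for info in infos:
        if info not in positions:
            positions[info] = len(table)
            table.append(info)
        refs.append(positions[info])
    return table, refs
```

For three images using "DeviceRGB", "DeviceGray" and "DeviceRGB", the table holds two entries and the references are `[0, 1, 0]`, so the shared color space is described only once.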
In summary, through steps S1004 to S1016, the image data, its index information and the index entry are written into the image data package file according to the index structure, and each description in the electronic document where an image is used is replaced with a reference to the corresponding image information and the allocated index number. Here, the index number represents the position of the corresponding image data in the index structure. The position of image data in the index can also be represented by other information such as an offset.
(Second embodiment)
As described for step S1006 of Fig. 1, the image data can be segmented meaningfully according to different segmentation strategies. In that case, the index number allocated to the current image is its corresponding number within the data segment to which it belongs. Sometimes the original collection order of the images already matches the segment order, so index numbers can be allocated to the image data one after another. Sometimes, however, the original collection order is not so well ordered. Then, under a given segmentation strategy, index numbers most likely cannot be allocated one after another; instead, each image is allocated the corresponding index number within its own data segment. That is, relative to the collection order of the images, the allocated index numbers jump around. In this case, image data is not written segment by segment; instead, image data is written to several data segments at the same time. The images can therefore be sorted after they have been collected, so that index numbers can be allocated one after another in sorted order. Only after sorting can segment-by-segment operation be guaranteed under any segmentation strategy.
The method for storing image data of an electronic document according to the second embodiment of the invention is described below with reference to Fig. 2. This method differs from the storage method shown in Fig. 1 in that an image sorting step is added, and the information recorded in the index is output to the IFC file segment by segment, instead of all the index information being output to the IFC file together after the image-related information of all images has been recorded in the index structure. Only the steps that differ from Fig. 1 are described below.
After the image collection step S1000, step S1001 is executed. In step S1001, the collected images are sorted in a certain order. The sort order can correspond to the segmentation strategy; it can be the order in which the images are used, or an order by image data size, resolution, image width, image height, image information (such as color space or image type), image name, and so on.
In step S1006, because the images have been sorted according to the strategy corresponding to the segmentation strategy, index numbers can be allocated to the image data one after another in sorted order. It can therefore also be said that the sorting result optimizes and guides the execution of the segmentation strategy.
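Sorting before segmentation can be illustrated with a small Python sketch. The function name `sort_and_segment` and its parameters are hypothetical; the sketch assumes a single sort key and simply cuts the sorted list into segments of at most maxcount items.

```python
def sort_and_segment(images, key, maxcount):
    """Sort images by the strategy's key, then cut into segments.

    Every segment holds at most maxcount items, so index numbers can be
    allocated one after another and each segment can be written in turn.
    """
    ordered = sorted(images, key=key)
    return [ordered[i:i + maxcount] for i in range(0, len(ordered), maxcount)]
```

For example, sorting by data size with `key=len` and maxcount 2 places images of similar size in the same segment, which is one way to realize the size strategy described above.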
In step S1018, it is judged whether the current segment has been fully processed. If not, steps S1004 to S1012 are repeated. Otherwise, step S1020 is executed. In step S1020, the information of the current segment recorded in the index is written into the IFC file.
In step S1022, it is judged whether any unprocessed data segments remain. If so, steps S1004 to S1020 are repeated. Otherwise, step S1024 is executed. In step S1024, the index entry is written into the IFC file, and the electronic document, in which the descriptions of the collected images have been replaced with the two-tuples, is saved.
Obviously, step S1020 can be removed and, between step S1022 and step S1024, a step can be inserted that writes all the index information recording the image-related information of all segments into the IFC file together. In that case it is only necessary to ensure that the offsets by which the various parts point to one another are correct. By comparison, segment-by-segment output uses little memory, whereas outputting everything together is logically simple but uses more memory at run time.
(Third embodiment)
As stated above, the collected image data can be segmented meaningfully according to different strategies, so that the image data can be used more efficiently by prefetching and caching the image data of a segment. For the index structure, a two-level index structure is preferred. A two-level index structure improves the flexibility of index organization and the speed of index loading, and thereby improves operating efficiency and widens the range of application. In the two-level index structure of this embodiment, a main index and segment indexes are provided. Correspondingly, a main index entry (the offset of the main index in the IFC file) is recorded in the IFC file. The main index records at least the number of segment indexes, the offset of each segment index in the IFC file, and the number of image data items contained in each segment. A segment index records at least the offset in the IFC file of the data segment corresponding to the segment index, the offset within that data segment of the image data corresponding to the current index, and the image data length. From the information recorded in the main index entry, the main index and the segment indexes, the position and length of the current image data in the IFC file and in the index structure can be determined uniquely.
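How the two index levels combine to locate image data can be sketched in Python with in-memory structures. The class and function names are hypothetical, the main index is modeled as a plain list of segment indexes, and n and m are assumed to be 1-based as in the index number formula of the first embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageEntry:
    offset_in_section: int  # offset counted from the start of the data segment
    length: int             # image data length


@dataclass
class SegmentIndex:
    section_offset: int     # offset of the data segment in the IFC file
    images: List[ImageEntry]


def locate_image(main_index: List[SegmentIndex], n: int, m: int) -> Tuple[int, int]:
    """Return (absolute file offset, length) of the m-th image of segment n."""
    segment = main_index[n - 1]
    image = segment.images[m - 1]
    return segment.section_offset + image.offset_in_section, image.length
```

The absolute position is the data segment offset plus the image's offset within the segment, which is why the two levels together identify the image data uniquely.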
An example of a two-level index structure and its corresponding IFC file is described in detail below. The following IFC file is preferably used for images whose width and height are less than 65536 and whose total data length is less than 65536 bytes, and more preferably for images whose data length does not exceed 4k and at least one of whose width and height is less than 4.
1. Basic structure of the image data package (IFC) file
The basic structure of the image data package file is:
[Header]
[Data Section 1]
[Section Index 1]
[Data Section 2]
[Section Index 2]
[Data Section n]
[Section Index n]
[Main Index]
[Main Index Entry]
Table 1. Description of the basic structure of the image data package
Item | Description
Header | File header information; identifies the file itself and points to the main index (Main Index)
Data Section | Image data section; stores the image data
Section Index | Segment index; contains the indexes of the image data of the corresponding section and points to the corresponding image data section
Main Index | Main index; contains the global index information and points to the segment indexes
Main Index Entry | Main index entry; 4 bytes long; records the offset of the main index in the image data package; located at the end of the file
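Since the main index entry is a 4-byte offset at the very end of the file, a reader can find the main index without scanning the file. The Python sketch below shows this; the function name is hypothetical, and the little-endian unsigned encoding is an assumption, because the text does not state the byte order.

```python
import io
import struct


def read_main_index_offset(f) -> int:
    """Read the 4-byte main index entry stored at the end of an IFC file."""
    f.seek(-4, io.SEEK_END)
    # Assumption: the offset is an unsigned little-endian 32-bit integer.
    (offset,) = struct.unpack("<I", f.read(4))
    return offset
```

Keeping the entry at the end also suits the modification and adding methods, which append updated index information and a new index entry to the end of the file.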
2. File header information
The data structure of the file header information is shown in the table below.
Table 2. Data structure of the file header information
Item | Length (bytes) | Description
File type | 4 | Fixed to the four characters "!IFC"
Version | 4 | File version number, currently 0x00000001
Compression unit | 1 | See "6. Image compression". 0 means no compression; 1 means the compression unit is the data segment; 2 means the compression unit is the image data
Compression method | 1 | See "6. Image compression". 0 means no compression; 1 means Flate compression; other values are reserved
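The header layout in Table 2 maps directly onto a fixed 10-byte record. The Python sketch below packs and parses it with the standard `struct` module; the function names are hypothetical, and the little-endian byte order and the absence of padding between fields are assumptions not stated in the table.

```python
import struct

# File type (4 bytes), version (4), compression unit (1), compression method (1).
HEADER = struct.Struct("<4sIBB")
MAGIC = b"!IFC"


def pack_header(unit: int, method: int, version: int = 1) -> bytes:
    """Build the 10-byte file header."""
    return HEADER.pack(MAGIC, version, unit, method)


def parse_header(data: bytes):
    """Return (version, compression unit, compression method)."""
    magic, version, unit, method = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not an IFC file")
    return version, unit, method
```

For instance, `pack_header(2, 1)` describes a file whose image data is compressed individually with Flate.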
3. Main index
The main index consists of multiple index descriptions. Its basic structure is:
[Index Count]
[Section Max Count]
[Normative]
[Index Description 1]
[Index Description 2]
[Index Description n]
Each index description gives the descriptive information of the corresponding segment index, including the position of the segment index and the number of image data items it covers.
Table 3: Data structure of the master index information
Item | Length (bytes) | Description
Index count (Index Count) | 4 | Number of index descriptions contained
Maximum section granularity (Section Max Count) | 2 | Maximum number of image data items that each section can contain; range 0-65535
Normative (Normative) | 1 | Identifies whether the mapping between indexes and image data is normalized. 0x00 means not normalized: the number of items in each of the first n-1 data segments is not fixed and only has to be no greater than the maximum section granularity. 0x01 means normalized: each of the first n-1 data segments contains exactly as many items as the maximum section granularity, and the n-th data segment contains all remaining items, no more than the maximum section granularity.
Segment index position | 4 | Offset of the segment index within the image data packet
Image data count | 2 | Number of image data items contained in this section; range 0-65535
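As a rough illustration of Table 3, the master index can be serialized as a 7-byte head followed by one 6-byte index description per segment. The byte order and the function names `pack_main_index` and `parse_main_index` are assumptions made for this sketch.

```python
import struct

# Assumed little-endian encoding of the Table 3 fields.
MAIN_HEAD = struct.Struct("<IHB")   # Index Count, Section Max Count, Normative
INDEX_DESC = struct.Struct("<IH")   # segment index position, image data count

def pack_main_index(max_count, normative, descs):
    """descs: list of (segment index offset, image count) pairs."""
    out = MAIN_HEAD.pack(len(descs), max_count, normative)
    for pos, count in descs:
        out += INDEX_DESC.pack(pos, count)
    return out

def parse_main_index(buf, offset=0):
    """Return (max_count, normative, descs) for the master index at offset."""
    n, max_count, normative = MAIN_HEAD.unpack_from(buf, offset)
    base = offset + MAIN_HEAD.size
    descs = [INDEX_DESC.unpack_from(buf, base + i * INDEX_DESC.size)
             for i in range(n)]
    return max_count, normative, descs

blob = pack_main_index(1000, 0x01, [(10, 1000), (5000, 200)])
print(parse_main_index(blob))
```

Because every field has a fixed length, the description of segment k can be reached by byte arithmetic alone, without scanning the preceding entries.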
4. Segment index
A segment index contains the position of the corresponding data segment and the information for each image data item. Its structure is:
[Data Section Position]
[Image Description 1]
[Image Description 2]
[Image Description n]
The image description (Image Description) includes information such as width, height, offset, and data length.
Table 4: Data structure of the segment index information
Item | Length (bytes) | Description
Data segment position (Data Section Position) | 4 | Offset, within the image data packet, of the data segment corresponding to this segment index
Data position | 4 | Offset of the image data corresponding to this index entry within the data segment, counted from the start of the data segment
Data length | 2 | Length, within the data segment, of the image data corresponding to this index entry
Width | 2 | Width of the image corresponding to this index entry, in pixels
Height | 2 | Height of the image corresponding to this index entry, in pixels
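The segment index of Table 4 can be sketched in the same way: a 4-byte data segment position followed by fixed 10-byte image descriptions. The byte order and the helper names below are assumptions for illustration only.

```python
import struct

# Assumed little-endian encoding of the Table 4 fields.
SEG_HEAD = struct.Struct("<I")        # Data Section Position
IMAGE_DESC = struct.Struct("<IHHH")   # data position, data length, width, height

def pack_segment_index(segment_pos, records):
    """records: list of (data position, data length, width, height)."""
    out = SEG_HEAD.pack(segment_pos)
    for rec in records:
        out += IMAGE_DESC.pack(*rec)
    return out

def read_record(buf, index_pos, k):
    """Read the k-th (1-based) image description of the segment index at index_pos."""
    at = index_pos + SEG_HEAD.size + (k - 1) * IMAGE_DESC.size
    return IMAGE_DESC.unpack_from(buf, at)

blob = pack_segment_index(10, [(0, 30, 600, 1), (30, 28, 600, 1)])
print(read_record(blob, 0, 2))
```

Note that the 2-byte data length limits each stored image data item to 65535 bytes, which suits the small images this format targets.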
5. Number mapping
In the image data packet, each image data item has a unique number. To avoid index number conflicts, the index number allocated to each image data item is (n-1)*maxcount+m, where n means the image belongs to the n-th section, maxcount is the maximum section granularity, and m means the image is the m-th item in that section.
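The mapping and its inverse can be written out as a short sketch. The function names are hypothetical, and the inverse assumes that n and m are 1-based and that m never exceeds maxcount.

```python
def to_index_number(n: int, m: int, max_count: int) -> int:
    """Index number of the m-th image in the n-th section (both 1-based)."""
    return (n - 1) * max_count + m

def from_index_number(index: int, max_count: int) -> tuple:
    """Inverse mapping: recover (n, m) from an index number."""
    n, m = divmod(index - 1, max_count)
    return n + 1, m + 1

print(to_index_number(2, 10, 1000))
print(from_index_number(1010, 1000))
```

Subtracting 1 before dividing keeps boundary index numbers, such as the last item of a full section, in the correct section.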
6. Image compression
The data compression mode is specified by the compression unit and the compression method. When the compression unit is the data segment (field value 1), each data segment is compressed separately with the specified compression method after it is generated; the whole image data packet and the individual image data items need no further compression. When the compression unit is the image data (field value 2), each image data item is compressed individually with the specified compression method after its DIB-format image data is obtained; the whole image data packet and the data segments need no further compression. When both the compression unit and the compression method indicate no compression (field value 0), the whole packet is compressed with a method such as Flate after the image data packet file is generated.
To balance compression ratio and read/write efficiency, using the data segment as the compression unit is strongly recommended. In addition, because small images that occur in large numbers are often used for image stitching, lossy image compression is not provided here, so as to avoid substantial quality loss.
When encrypting, if the compression unit value is not 0, the compressed data segments or individual image data items are encrypted using the specified method and password and stored in the document; if the compression unit value is 0, the normal encryption flow is followed.
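The difference between the two compression units can be sketched as follows. Python's `zlib` stands in for the Flate coding here, and the choice to record offsets relative to the uncompressed segment when the unit is the data segment is an assumption of this sketch, not something the text specifies. `encode_segment` is a hypothetical helper.

```python
import zlib

def encode_segment(images, unit, method):
    """Return (stored segment bytes, [(offset, length)] per image).

    unit 1: compress the whole segment; unit 2: compress each image.
    Offsets refer to the segment before segment-level compression (assumption).
    """
    if method == 1 and unit == 2:
        images = [zlib.compress(d) for d in images]
    offsets, pos = [], 0
    for d in images:
        offsets.append((pos, len(d)))
        pos += len(d)
    body = b"".join(images)
    if method == 1 and unit == 1:
        body = zlib.compress(body)
    return body, offsets

imgs = [b"a" * 200, b"b" * 300]
per_image, _ = encode_segment(imgs, unit=2, method=1)
per_segment, _ = encode_segment(imgs, unit=1, method=1)
print(len(per_image), len(per_segment))
```

Per-segment compression lets the coder exploit redundancy across neighboring small images, at the cost of decompressing the whole segment to reach one image.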
7. Storage order
The actual physical order of the file header information, data segments, segment indexes, and master index described above is not restricted; data blocks are located by offset values instead. An application can decide their actual physical storage order according to its needs. Whatever order is adopted, however, the file header information must be at the beginning of the file and the master index entry must be at the end of the file.
Below, a concrete application example of the image data storage method according to the third embodiment of the invention is described in combination with the above secondary index structure.
Suppose an image 600 pixels wide and 1200 pixels high is stored in the electronic document as 1200 small images, each 1 pixel high and 600 pixels wide, denoted P1 to P1200. Because P1 to P1200 are stitched together to represent the same image, they share the same color space, denoted C1. Here, the color space information is the image information mentioned above.
In this example, images P1 to P1200 are sorted by image name, index numbers are allocated in increments of 1, and a normalized image data packet is generated in which each data segment contains at most 1000 small images. That is, the maximum section granularity is 1000; images P1 to P1000 belong to the first data segment, which contains 1000 image data items, and images P1001 to P1200 belong to the second data segment, which contains 200 image data items.
Referring to Fig. 3, in step S1100, an array is used to record the locations of the 1200 images P1 to P1200 (paths on disk or positions in the electronic document). In step S1101, the array is sorted by image name.
In step S1102, the file header information of the IFC file is written, with the compression unit set to 2 and the compression method set to 1; that is, each image data item is compressed individually using Flate coding. At the same time, an empty master index, empty segment indexes, and their data structures are created in memory.
In step S1104, the color space C1 of image P1 and its DIB-format image data D1 are obtained.
In step S1106, index number 1 is allocated to image P1.
In step S1107, image data D1 is compressed using Flate coding to obtain the compressed image data CD1.
In step S1108, the compressed image data CD1 is written into the data segment of the IFC file that corresponds to index number 1.
In step S1110, the data position POS1, data length LEN1, width W1, and height H1 of the compressed image data CD1 form a record Record1, which is recorded in segment index S1.
In step S1112, the description at the place where image P1 is used in the electronic document is replaced with color space C1 and index number 1.
Steps S1104 to S1112 are then executed in turn for images P2 to P1000. Segment index S1 thus holds 1000 records, namely Record1 to Record1000. Images P1 to P1000 have the same color space C1.
Thereafter, in step S1120, the offset of image data CD1 in the IFC file and the records Record1 to Record1000 held in segment index S1 are first written to the IFC file; then the offset of segment index S1 in the IFC file and the number of images in this section, 1000, form a segment index entry SR1, which is recorded in master index M1.
Then, for images P1001 to P1200, steps S1104 to S1112 are likewise repeated, forming 200 records in segment index S2, namely Record1001 to Record1200. The color space of these images is also C1.
Similarly, for segment index S2, in step S1120 the offset of image data CD1001 in the IFC file and the records Record1001 to Record1200 held in segment index S2 are written to the IFC file, and the offset of segment index S2 in the IFC file and the number of images in this section, 200, form a segment index entry SR2, which is recorded in master index M1.
Finally, in step S1124, the information recorded in master index M1 and the master index entry are written to the IFC file. The information recorded in master index M1 comprises the segment index count 2, the maximum section granularity 1000, the normative parameter 0x01, the offset of segment index S1 in the IFC file with its image count 1000, and the offset of segment index S2 in the IFC file with its image count 200. Lastly, the electronic document is saved, in which the descriptions at the places where the images are used have been replaced by index numbers 1 to 1200 and color space C1.
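The writing flow of steps S1102 to S1124 can be sketched end to end. This is a minimal sketch, not the patented implementation: it assumes little-endian fields, uses `zlib` as the Flate coder, places each segment index directly after its data segment, and works on an in-memory buffer. The function name `build_ifc` is hypothetical.

```python
import struct
import zlib

def build_ifc(images, max_count=1000):
    """images: sorted list of (width, height, dib_bytes). Returns packet bytes."""
    # S1102: header with compression unit 2 (image data) and method 1 (Flate)
    out = bytearray(struct.pack("<4sIBB", b"!IFC", 1, 2, 1))
    descs = []                                    # (segment index offset, image count)
    for start in range(0, len(images), max_count):
        chunk = images[start:start + max_count]
        seg_pos = len(out)                        # offset of this data segment
        records = []
        for width, height, data in chunk:
            cd = zlib.compress(data)              # S1107: compress one image
            # S1110: record position (relative to the segment), length, size
            records.append((len(out) - seg_pos, len(cd), width, height))
            out += cd                             # S1108: write compressed data
        descs.append((len(out), len(chunk)))      # segment index follows its data
        out += struct.pack("<I", seg_pos)         # S1120: write the segment index
        for rec in records:
            out += struct.pack("<IHHH", *rec)
    main_pos = len(out)
    out += struct.pack("<IHB", len(descs), max_count, 1)   # S1124: master index
    for pos, count in descs:
        out += struct.pack("<IH", pos, count)
    out += struct.pack("<I", main_pos)            # master index entry at file end
    return bytes(out)

strips = [(600, 1, bytes([i % 256]) * 1800) for i in range(1200)]
packet = build_ifc(strips)
print(len(packet))
```

Because every block is located by offset, the segment indexes could equally be written elsewhere, as long as the header stays first and the master index entry stays last.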
With the above processing, excluding the image data itself, the secondary index structure needs roughly 1200*14=16800 bytes.
In the electronic document, the image data and color space are referenced in the following form.
<Image ID="1" Index="1" ColorSpace="C1">
</Image>
<Image ID="1200" Index="1200" ColorSpace="C1">
</Image>
If XML is used directly to describe the usage information of each image, a structure similar to the following is used:
<Image ID="1" Width="600" Height="1">
<Loc>this_is_an_image1.BMP</Loc>
</Image>
<Image ID="1200" Width="600" Height="1">
<Loc>this_is_an_image1200.PNG</Loc>
</Image>
This requires roughly an extra 1200*45=54000 bytes, about 3.2 times the size of the above secondary index structure. Moreover, to organize these images within the document, the name of each image and its offset in the electronic document must also be described, which requires roughly a further 25*1200=30000 bytes. Taken together, the above embodiment improves storage efficiency by a factor of 4 to 5 over the original representation.
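The ratios quoted above follow directly from the per-image byte counts given in the text (14, 45, and 25 bytes), as this short check shows; the variable names are only for illustration.

```python
images = 1200
index_bytes = images * 14         # secondary index structure
xml_bytes = images * 45           # per-image XML description
name_offset_bytes = images * 25   # image names and offsets in the document

ratio_xml = xml_bytes / index_bytes
ratio_total = (xml_bytes + name_offset_bytes) / index_bytes
print(round(ratio_xml, 1), ratio_total)
```

The first ratio is about 3.2 and the combined ratio is 5.0, which is consistent with the stated 4 to 5 times improvement.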
The above calculation does not take the size of the image data into account. In practice, small images are usually even smaller, often only one hundred to three hundred bytes. In this case, the smaller the index structure is relative to the original description information, the greater the saving it provides. The original description information can often reach 30% of the size of a small image's data, whereas the above secondary index structure averages only 3%-5% of the size of each small image's data. The secondary index structure of this embodiment therefore significantly reduces the storage overhead of the images in the electronic document and the memory overhead at run time.
The following describes how to search, modify, add, and delete image data stored with the image data storage method for electronic documents according to the present invention.
(2) Method for searching image data of an electronic document
As described above, the image data storage method of the present invention gathers the images scattered throughout an electronic document and stores them in an IFC file, and segments them meaningfully according to different segmentation strategies. As a result, a group of image data items that have something in common can be extracted with a single I/O operation and, by caching these items, subsequent operations such as searching can read directly from the cache without repeating I/O operations.
(Fourth embodiment)
Fig. 4 is a flowchart of the method for searching image data of an electronic document according to the fourth embodiment of the invention. The image data searched by this method is image data stored according to the storage method of the present invention. For a search, the position in the index structure of the image data to be searched is first provided by the user or by other means. As mentioned above, an index number or other offset information can represent the position of image data in the index structure. In this description, the index number is used as an example.
In step S2000, the referenced image information is obtained from the saved electronic document according to the provided index number. Where identical image information has been written into an image information table, the corresponding image information is obtained through that table.
In step S2002, the IFC file is opened and the index entry is obtained. The index entry indicates the offset of the index in the IFC file. If the IFC file contains header information, the header information is also obtained.
In step S2004, the index corresponding to the provided index number is located according to the index entry, and the information recorded in that index is extracted, including the offset of the corresponding image data in the IFC file and the length of the image data. The search can use a sequential or multiway search algorithm or, when the data structures have fixed lengths, can be done by byte calculation.
In step S2006, the position and length of the corresponding image data in the IFC file are determined from the index number and the extracted index information, and the image data is read. If the image data of a whole data segment needs to be read, all image data in the data segment to which this image data belongs is read.
In step S2008, the obtained image information and image data are returned to the user or to the cache.
Where the image data has been segmented meaningfully according to different segmentation strategies, the caching step makes subsequent operations such as searching more efficient. Specifically, once all image data in a data segment has been extracted and cached on the first search, later searches can check the cached image data first. If the image data being searched for is in the cache, it is read directly from the cache without repeating the I/O operation. If it is not in the cache, the normal search flow shown in Fig. 4 is executed.
(Fifth embodiment)
As described above, fields such as the compression unit and compression method can be defined in the header information of the IFC file. In that case, when image data is read from the IFC file, the image data, data segment, or entire IFC file that was read should be decoded according to the configured compression unit and compression method to restore the DIB-format image data. Here, decoding includes decompression and decryption.
Fig. 5 is a flowchart of the method for searching image data of an electronic document according to the fifth embodiment of the invention. This method differs from the search method of Fig. 4 in that it adds steps for decoding the image data according to the compression unit and compression method set in the IFC file header information. Only the differing steps are described below.
In step S2003, it is determined whether the compression unit set in the IFC file header information is "no compression". If so, in step S2010 the entire IFC file is decoded according to the compression method set in the header information. Otherwise, step S2004 is executed.
In step S2012, it is determined whether the compression unit set in the IFC file header information is the data segment. If the compression unit is the data segment, then first, in step S2014, the position and length in the IFC file of the data segment containing the index number are determined from the index number and the extracted index information, and the data segment is read; then, in step S2018, the data segment that was read is decoded according to the compression method set in the header information. If the compression unit is the image data, then first, in step S2016, the position and length in the IFC file of the image data corresponding to the index number are determined from the index number and the extracted index information, and the image data is read; then, in step S2020, the image data that was read is decoded according to the compression method set in the header information.
In step S2008, the image data, or all image data in the data segment, is returned to the user or to the cache as required.
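The branch on the compression unit in steps S2012 to S2020 can be sketched as below. `zlib` again stands in for Flate, decryption is omitted, and the sketch assumes that for a segment-level unit the recorded offset and length refer to the decompressed segment. `read_image` is a hypothetical helper; whole-file decoding (S2010) is assumed to have been done by the caller.

```python
import zlib

def read_image(segment, offset, length, unit, method):
    """segment: one data segment as stored; offset/length from the segment index."""
    if method == 1 and unit == 1:
        # S2014/S2018: read and decode the whole data segment, then slice
        segment = zlib.decompress(segment)
        return segment[offset:offset + length]
    data = segment[offset:offset + length]        # S2016: read only this image
    if method == 1 and unit == 2:
        data = zlib.decompress(data)              # S2020: decode this image
    return data

stored = zlib.compress(b"x" * 100)
print(len(read_image(stored, 0, len(stored), unit=2, method=1)))
```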
(Sixth embodiment)
Below, an embodiment of searching image data of an electronic document stored with the above secondary index structure is described with reference to Fig. 6. Suppose the image information and image data of image P1010 are to be read. Because the data structures of the indexes have fixed lengths, the segment index can be located by byte calculation.
In step S2100, the corresponding color space C1 is obtained from the saved electronic document according to index number 1010.
In step S2110, the IFC file is opened and the header information is read, yielding a compression unit of 2, a compression method of 1, and so on.
In step S2130, master index M1 is located through the master index entry recorded at the end of the IFC file, and the information recorded in M1 is read, including the segment index count 2, the maximum section granularity 1000, and the normative parameter 0x01. From index number 1010, the image data to be searched is determined to be in data segment 1010/1000+1=2 (integer division). Suppose the offset of the master index in the IFC file indicated by the master index entry is PM1. Then, according to the data structure of the master index information in Table 3, the reader jumps to offset PM1+(4+2+1)+(4+2)*(2-1) bytes in the IFC file and reads 4 bytes directly to obtain the offset PS2 of segment index S2 in the IFC file, then reads a further 2 bytes to obtain the number of image data items in the 2nd data segment, 200.
In step S2140, from index number 1010, the maximum section granularity 1000, and the normative parameter 0x01, P1010 is determined to be at record 1010%1000=10 of segment index S2. From the position PS2 of segment index S2 and the data structure of the segment index information shown in Table 4, the position of the 10th record, Record1010, in the IFC file is PS2+4+(4+2+2+2)*(10-1) bytes, where the leading 4 bytes are the data segment position field. Reading the next 4 bytes gives the position of CD1010, the compressed data of P1010; reading 2 more bytes gives the length of CD1010; reading 2 more bytes gives the width of P1010; and reading 2 more bytes gives the height of P1010.
In step S2150, the image data CD1010 is read from the IFC file using the position and length of CD1010 obtained above.
In step S2160, CD1010 is decompressed using Flate coding to obtain the original image data D1010 of P1010.
In step S2170, the decompressed image data D1010 and the corresponding color space C1 are returned to the user.
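The byte-calculation lookup of steps S2110 to S2160 can be sketched on an in-memory packet. The sketch assumes little-endian fields laid out as in Tables 2 to 4 (a 7-byte master index head with 6-byte descriptions, and a 4-byte data segment position followed by 10-byte records), data positions relative to the start of the data segment, and `zlib` as the Flate coder. It uses index-1 in the division so that boundary index numbers map to the right section. `lookup` is a hypothetical name.

```python
import struct
import zlib

def lookup(pkt, index):
    """Return (width, height, dib_bytes) for a 1-based index number."""
    _, _, unit, method = struct.unpack_from("<4sIBB", pkt, 0)          # S2110
    (pm,) = struct.unpack_from("<I", pkt, len(pkt) - 4)                # S2130
    _, max_count, normative = struct.unpack_from("<IHB", pkt, pm)
    if normative != 1 or (unit, method) != (2, 1):
        raise NotImplementedError("sketch covers only the worked example's settings")
    n, m = divmod(index - 1, max_count)            # 0-based section and record
    ps, count = struct.unpack_from("<IH", pkt, pm + 7 + 6 * n)
    if m >= count:
        raise KeyError(index)
    (seg_pos,) = struct.unpack_from("<I", pkt, ps)                     # S2140
    dpos, dlen, width, height = struct.unpack_from("<IHHH", pkt, ps + 4 + 10 * m)
    cd = pkt[seg_pos + dpos:seg_pos + dpos + dlen]                     # S2150
    return width, height, zlib.decompress(cd)                          # S2160
```

Only three small fixed-size reads are needed before the image data itself is read, regardless of how many images the packet holds.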
As described above, if the compression unit recorded in the header information of the IFC file is the data segment, the data segment is decoded after it is read; if the compression unit is "no compression", the IFC file is decoded when it is opened.
Likewise, as described above, after a first normal search in the index by index number, all image data in the data segment containing that index number can be cached in memory, and the next search checks the cached data segment first. For example, suppose the nearby-use strategy is adopted for segmentation, so that all image data of one page is stored in one data segment. When the page is opened for the first time, all image data in this data segment and its segment index information are read and cached, and every later search checks the cache first. The flow of the cache-first search in this embodiment is as follows:
Check whether a cached segment index exists in memory;
If no cached segment index exists in memory, execute the normal search procedure of Fig. 6;
If a cached segment index exists in memory, determine from the provided index number and the information recorded in that segment index whether the image data to be searched is covered by the cached segment index;
If the image data to be searched is not covered by the cached segment index, execute the normal search procedure of Fig. 6;
If the image data to be searched is covered by the cached segment index, read the image data directly from the cache.
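The cache-first flow above can be sketched as a small class. The `load_segment` callable stands in for the normal search procedure of Fig. 6 and is hypothetical, as are the class name and the `io_reads` counter; a normalized packet is assumed.

```python
class SegmentCache:
    """Cache whole data segments so repeated lookups avoid I/O (sketch)."""

    def __init__(self, load_segment, max_count):
        self._load = load_segment   # hypothetical: section number -> list of image bytes
        self._max = max_count
        self._segments = {}
        self.io_reads = 0           # counts simulated I/O operations

    def get(self, index):
        n, m = divmod(index - 1, self._max)
        if n not in self._segments:
            # cache miss: normal search, one I/O for the whole segment
            self._segments[n] = self._load(n + 1)
            self.io_reads += 1
        return self._segments[n][m]  # cache hit: read directly from memory
```

With the nearby-use strategy, the first image requested from a page pays for one read and the remaining images on that page are served from memory.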
From the above, the advantage of the present invention for unified management of image data scattered in an electronic document can be seen. In the original approach, image data is searched for and read only when it is used, so every read inevitably causes an I/O operation. Moreover, since there is no relationship between the small images and no grouping information, prefetching is hard to implement to save I/O operations. The IFC file, by contrast, can be loaded into memory at the granularity of an image data segment or of the entire IFC file, which achieves data prefetching: using an image does not necessarily require an I/O operation, because the prefetched data can be returned directly, which improves I/O performance. In particular, when all small images need to be extracted at once, the IFC file only has to be read once, whereas the original approach needs many I/O operations to traverse them.
(3) Method for modifying image data of an electronic document
When image data stored by the image data storage method of the present invention is modified, it is first determined whether the modified data is longer than the data before modification. If the modified data is no longer than the original, the original image data in the IFC file is replaced in place and the corresponding index information is updated according to the modified image data. Otherwise, the modified image data is written at the end of the IFC file, the corresponding index information is updated according to this image data, and the updated index and index entry are written at the end of the IFC file.
Below, a method of modifying image data is given for the case where the image data of the electronic document is stored using the storage method of the third embodiment. Suppose the image to be modified is P1001 and the new image data is ND1001.
If the length of the compressed ND1001 is not greater than the length of CD1001, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to find the position of CD1001.
3. Replace CD1001 with the compressed ND1001.
4. Modify Record1001, the record for P1001 in segment index S2, setting its data length, width, and height to the values of ND1001.
5. Save.
If the length of the compressed ND1001 is greater than the length of CD1001, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2.
3. Write the compressed ND1001 at the end of the IFC file.
4. Modify Record1001 in S2, writing into it the offset of the compressed ND1001 in the IFC file, its data length, and the width and height of the image.
5. Write segment index S2 at the end of the IFC file.
6. In master index M1, change the offset of segment index S2 in the IFC file to the new offset of segment index S2.
7. Write master index M1 at the end of the IFC file.
8. Write the offset of master index M1 in the IFC file, that is, the master index entry, at the end of the IFC file.
9. Save.
Another approach is to write the modified image data directly at the end of the IFC file regardless of its length, update the corresponding index information, and write the updated index and index entry at the end of the IFC file. In the above example, that is, however long the compressed ND1001 is, the image data is appended directly at the end of the IFC file rather than replaced in place.
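The two modification paths can be sketched on an in-memory `bytearray`. The sketch assumes the same little-endian layout as Tables 3 and 4, a normalized packet with per-image `zlib` compression, and data positions measured from the start of the data segment (so appended data gets an offset relative to its original segment). `modify` is a hypothetical name.

```python
import struct
import zlib

def modify(pkt: bytearray, index: int, width: int, height: int, new_data: bytes) -> None:
    (pm,) = struct.unpack_from("<I", pkt, len(pkt) - 4)
    n_seg, max_count, _ = struct.unpack_from("<IHB", pkt, pm)
    n, m = divmod(index - 1, max_count)
    ps, count = struct.unpack_from("<IH", pkt, pm + 7 + 6 * n)
    (seg_pos,) = struct.unpack_from("<I", pkt, ps)
    rec_at = ps + 4 + 10 * m
    dpos, dlen, _, _ = struct.unpack_from("<IHHH", pkt, rec_at)
    cd = zlib.compress(new_data)
    if len(cd) <= dlen:
        # Not longer: replace in place and update the record
        start = seg_pos + dpos
        pkt[start:start + len(cd)] = cd
        struct.pack_into("<IHHH", pkt, rec_at, dpos, len(cd), width, height)
        return
    # Longer: append data, then the updated segment index, master index, and entry
    master = bytearray(pkt[pm:pm + 7 + 6 * n_seg])
    seg_index = bytearray(pkt[ps:ps + 4 + 10 * count])
    new_dpos = len(pkt) - seg_pos
    pkt += cd
    struct.pack_into("<IHHH", seg_index, 4 + 10 * m, new_dpos, len(cd), width, height)
    new_ps = len(pkt)
    pkt += seg_index
    struct.pack_into("<IH", master, 7 + 6 * n, new_ps, count)
    new_pm = len(pkt)
    pkt += master
    pkt += struct.pack("<I", new_pm)              # new master index entry
```

The append path leaves the old data and indexes in place as dead space, which keeps the write sequential at the cost of file growth.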
(4) Method for deleting image data of an electronic document
When image data stored by the image data storage method of the present invention is deleted, the original IFC file can be fully unpacked and a new IFC file generated; alternatively, direct deletion or incremental deletion can be used.
Direct deletion comprises the following steps: first, the index is modified by replacing the record in the index that corresponds to the image data with an empty record; then, the image data to be deleted is overwritten with 0.
Incremental deletion comprises the following steps: first, the index is modified by replacing the record in the index that corresponds to the image data with an empty record; then, the modified index and the index entry are written again at the end of the file.
Below, a method of deleting image data is given for the case where the image data of the electronic document is stored using the storage method of the third embodiment. Suppose image P1001 is to be deleted.
For direct deletion, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2 and find Record1001.
3. Set all the information in Record1001 to 0.
4. Set all the data at the original location of CD1001 to 0.
5. Save.
For incremental deletion, the following steps are carried out:
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1 and segment index S2.
3. Set all the information in Record1001 of segment index S2 to 0.
4. Write segment index S2 at the end of the IFC file.
5. In master index M1, change the offset of segment index S2 in the IFC file to the new offset of segment index S2.
6. Write master index M1 at the end of the IFC file.
7. Write the offset of master index M1, that is, the master index entry, at the end of the IFC file.
8. Save.
(5) Method for adding image data of an electronic document
If the default strategy or another strategy of that kind is adopted, the new image data is written directly at the end of the IFC file, an index number is allocated in the same way as during generation, the index is updated according to this image data, and the updated index and index entry are written at the end of the IFC file.
When a segmentation strategy such as the nearby-use strategy is adopted, it is first determined whether the data segment containing the other image data used together with this image data is full, that is, whether it has reached the maximum section granularity. If it is not full, the image data is added to that data segment and a corresponding index number in that data segment is allocated; specifically, the allocated index number is the largest index number in the section plus 1. Otherwise, a new data segment is created and an index number is allocated in the same way as during generation. After the image data is written, the corresponding index is updated according to the image data, and the updated index and index entry are written at the end of the IFC file.
Below, a method of adding image data is given for the case where the image data of the electronic document is stored using the storage method of the third embodiment. Suppose a new small image file P1201 is to be added.
1. Open the IFC file.
2. Use the search method of Fig. 6 to read master index M1.
3. Read the information of segment index S2 recorded in master index M1.
4. Because this IFC file is normalized and the number of images in segment index S2 is less than the maximum section granularity 1000, write CD1201 at the end of the IFC file.
5. Add Record1201 to segment index S2, filling in the offset and data length of CD1201 and the width and height of P1201.
6. Write the updated segment index S2 at the end of the IFC file.
7. Update the information of segment index S2 recorded in master index M1, writing the new offset of segment index S2 in the IFC file and the number of image data items it contains, 201, into master index M1.
8. Write master index M1 at the end of the IFC file.
9. Write the new offset of master index M1, that is, the master index entry, at the end of the IFC file.
10. Save the IFC file.
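The steps above can be sketched as an append-only operation on an in-memory `bytearray`. The layout assumptions are the same as for Tables 3 and 4 (little-endian, data positions relative to the data segment start), `zlib` stands in for Flate, and a new segment is started when the last one is full. `add_image` is a hypothetical name.

```python
import struct
import zlib

def add_image(pkt: bytearray, width: int, height: int, data: bytes) -> int:
    """Append one image to a normalized packet and return its index number."""
    (pm,) = struct.unpack_from("<I", pkt, len(pkt) - 4)
    n_seg, max_count, norm = struct.unpack_from("<IHB", pkt, pm)
    descs = [list(struct.unpack_from("<IH", pkt, pm + 7 + 6 * i)) for i in range(n_seg)]
    cd = zlib.compress(data)
    if descs and descs[-1][1] < max_count:
        ps, count = descs[-1]                       # last segment still has room
        seg_index = bytearray(pkt[ps:ps + 4 + 10 * count])
        (seg_pos,) = struct.unpack_from("<I", seg_index, 0)
    else:
        seg_pos, count = len(pkt), 0                # start a new data segment
        seg_index = bytearray(struct.pack("<I", seg_pos))
        descs.append([0, 0])
    seg_index += struct.pack("<IHHH", len(pkt) - seg_pos, len(cd), width, height)
    pkt += cd                                       # image data at end of file
    descs[-1] = [len(pkt), count + 1]
    pkt += seg_index                                # updated segment index
    new_pm = len(pkt)
    pkt += struct.pack("<IHB", len(descs), max_count, norm)   # updated master index
    for pos, cnt in descs:
        pkt += struct.pack("<IH", pos, cnt)
    pkt += struct.pack("<I", new_pm)                # new master index entry
    return (len(descs) - 1) * max_count + count + 1
```

In the worked example, adding P1201 to a packet whose second segment holds 200 images would yield index number 1201 under this numbering.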
(6) Image data processing device for electronic documents
The image data processing device for electronic documents according to the present invention comprises at least a storage unit and may also comprise a search unit, a modification unit, a deletion unit, and an adding unit. The search unit, modification unit, deletion unit, and adding unit are each connected to the storage unit.
Referring to Fig. 7, the storage unit comprises at least collection module 10, image processing module 20, and output module 30. Image processing module 20 comprises at least index processing module 23, and output module 30 comprises IFC file module 31 and electronic document module 32.
Where image processing module 20 comprises only index processing module 23, collection module 10 collects the images to be processed from the electronic document and provides the image data and image information of the collected images to image processing module 20 and output module 30. Index processing module 23 builds the index structure in memory, allocates an index number to each collected image, and updates the index corresponding to the index number according to the image data of that image. IFC file module 31 writes the header information of the data packet file, writes the image data received from collection module 10 into the data area of the IFC file that corresponds to the index number allocated by index processing module 23, and writes the information recorded in the index structure built by index processing module 23, together with the index entry and similar information, into the IFC file. Electronic document module 32 replaces the description at each place where an image is used in the electronic document with a two-tuple, that is, a reference to the corresponding image information and the index number allocated by index processing module 23.
As shown in Fig. 7, image processing module 20 also comprises sorting module 21, which sorts the collected images so that they are arranged in the order of segmentation. The image data and image information are then provided to image processing module 20 and output module 30 in the sorted order.
Image processing module 20 also comprises coding module 22. When a compression unit and compression method are set in the header information of the IFC file, coding module 22 encodes the image data according to the compression unit and compression method received from IFC file module 31. Specifically, when the compression unit is the image data, coding module 22 encodes the image data received from sorting module 21 with the specified coding method and outputs the encoded image data to IFC file module 31. When the compression unit is the data segment, coding module 22 encodes the image data received from sorting module 21 segment by segment with the specified coding method, according to the segment information received from index processing module 23. Here, the segment information indicates which image data items a given data segment contains and how many. When the compression unit is "no compression", coding module 22 encodes the entire IFC file with the specified coding method.
Image processing module 20 also comprises image information module 24, which extracts identical image information from the image information received from collection module 10 or sorting module 21 and records it in an image information table. In this case, electronic document module 32 replaces each description in the electronic document where the current image is used with a reference to the image information table and the allocated index number. Alternatively, image information module 24 defines the identical image information directly in the electronic document, using the document's own description method.
In the electronic document image data processing device according to the present invention, the lookup unit reads image data and the corresponding index information from the IFC file created by IFC file module 31, reads the corresponding image information from the electronic document saved by electronic document module 32, and decodes the image data if it has been encoded; the modification unit modifies image data and the corresponding index information in the IFC file created by IFC file module 31; the deletion unit deletes image data and the corresponding index information from that IFC file; and the adding unit adds new image data to that IFC file and updates the corresponding index information.
The electronic document image data processing method and device according to the present invention have been described above. With the present invention, large numbers of small, fragmentary images are processed and stored together, which greatly improves access efficiency and simplifies the statements that use these small images in the document, reducing the amount of descriptive data. After the document is opened, the IFC file can be opened and loaded directly, and image data is subsequently read straight from the loaded IFC file content, which reduces the number of I/O operations. The variety of segmentation strategies also offers more flexibility and improves performance when locating and caching image data in the IFC file. For example, under the nearby-use strategy, reading one image means that the other image data in the same segment is likely to be used soon, so that data can be prefetched and cached. Furthermore, if the individual image data is not compressed when the IFC file is generated and the entire IFC file is compressed instead, the compression ratio of the image data can be improved further.
Although the above embodiments describe several methods for storing, looking up, modifying, adding and deleting electronic document image data, it should be understood that the intent of the present invention is to extract the image information and image data of images from an electronic document and to store the image data together in an image data package file, thereby significantly reducing storage overhead and enabling unified management. The invention is not limited to the described embodiments; any similar variation or substitution falls within the invention. For example, in the step of replacing the description where the current image is used with a two-tuple, the two-tuple in the embodiments consists of a reference to the corresponding image information and the allocated index number. Here, the index number represents the position of the image data in the index; that position can also be represented by other information such as an offset. The two-tuple may equally be a triple, a quadruple and so on, or a single parameter, as long as it defines the image information and image data to be used. In other words, any organization is acceptable as long as the image information and image data can be obtained from the reference. The index structure is not limited to the structure shown in the embodiments; the individual index records may also be scattered throughout the file and linked by pointers or offsets. The index may also be kept separately from the image data, in an independent file. The invention is not limited to existing document formats such as PDF, XPS, CEB and MARS; it also applies to other document formats that describe image information and image data in a similar way.

Claims (33)

1. A method for storing image data of an electronic document, characterized in that the method comprises the following steps:
collecting images from the electronic document;
creating an image data package file and an index structure; and
writing the image data of the collected images, together with their index information and the index entry, into the image data package file according to the index structure, and replacing each description in the electronic document where an image is used with a two-tuple, the two-tuple consisting of a reference to the image information of the collected image and the index number allocated to its image data in the index structure;
wherein the index structure is a two-level index structure comprising a master index and segment indexes; each segment index records at least the offset in the image data package file of the data segment corresponding to that segment index, the offset within that data segment of the image data corresponding to the current index, and the length of the image data; the master index records at least the number of segment indexes, the offsets of the segment indexes in the image data package file, and the number of image data items contained in each data segment;
the image data package file is a newly created file used to store image data and record index information, and comprises at least file header information, a data area storing the image data, an index, and an index entry; the file header information is always located at the beginning of the image data package file, and the index entry is always located at its end; the file header information can define the file type, version information, compression unit and compression method; the index records at least the offset of the corresponding image data in the image data package file and the length of that image data; the index entry indicates the offset of the index in the image data package file; and from the information recorded in the index entry and the index, the position and length of the data area corresponding to the current image data in the image data package file can be determined.
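For illustration only, the package layout recited above (file header at the start, data area, index, index entry at the end) can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented format: it uses a single-level index instead of the two-level index, and the `MAGIC` tag, the field widths and the function names `build_package` and `read_image` are all hypothetical.

```python
import io
import struct

MAGIC = b"IFC0"  # hypothetical file-type tag; the source does not specify one


def build_package(images):
    """Pack raw image byte strings as: header | data area | index | index entry."""
    out = io.BytesIO()
    out.write(struct.pack("<4sH", MAGIC, 1))        # header: file type and version
    index = []
    for data in images:
        index.append((out.tell(), len(data)))       # offset and length of this image
        out.write(data)                             # data area
    index_offset = out.tell()
    for offset, length in index:
        out.write(struct.pack("<QQ", offset, length))
    # The index entry is written last, so a reader can find it from the file end.
    out.write(struct.pack("<QI", index_offset, len(index)))
    return out.getvalue()


def read_image(package, number):
    """Locate image `number` through the index entry and the index."""
    entry_size = struct.calcsize("<QI")
    index_offset, count = struct.unpack_from("<QI", package, len(package) - entry_size)
    if not 0 <= number < count:
        raise IndexError(number)
    record_size = struct.calcsize("<QQ")
    offset, length = struct.unpack_from("<QQ", package, index_offset + record_size * number)
    return package[offset:offset + length]
```

Because the index entry sits at a fixed distance from the end of the file, a reader needs only one seek to find the index, whatever the size of the data area.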
2. The method for storing image data of an electronic document according to claim 1, characterized in that the step of writing to the image data package file and replacing descriptions in the electronic document comprises the following steps:
obtaining the image information and image data of the current collected image;
allocating an index number to the current image;
writing the image data of the current image into the data area of the image data package file corresponding to the allocated index number;
updating, according to the image data of the current image, the index in the index structure corresponding to the allocated index number;
replacing the description in the electronic document where the current image is used with a two-tuple, the two-tuple consisting of a reference to the corresponding image information and the allocated index number;
determining whether any unprocessed images remain;
if unprocessed images remain, repeating the steps from obtaining the image information and image data of the current collected image through replacing the description where the current image is used with the two-tuple; and
if all collected images have been processed, outputting the information recorded in the index and the index entry to the image data package file, and saving the electronic document in which the descriptions of the collected images have been replaced by two-tuples.
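The per-image loop in these steps can be sketched as follows. This is a hedged illustration: the document is modelled as a plain Python list in which images are dictionaries with `info` and `data` keys, which is an assumption of this sketch and not a format taken from the source; `store_images` is a hypothetical name.

```python
def store_images(document):
    """Move image data out of `document` and leave (info, index number) behind.

    `document` is a list; image items are dicts with 'info' and 'data' keys
    (a modelling assumption). Returns the rewritten document, the data area
    and the index, where index[n] = (offset, length) for index number n.
    """
    data_area = bytearray()
    index = []
    rewritten = []
    for item in document:
        if not (isinstance(item, dict) and "data" in item):
            rewritten.append(item)                  # non-image content is untouched
            continue
        number = len(index)                         # allocate the next index number
        index.append((len(data_area), len(item["data"])))
        data_area.extend(item["data"])              # write into the data area
        rewritten.append((item["info"], number))    # the two-tuple reference
    return rewritten, bytes(data_area), index
```

After the loop, the index and the rewritten document are the two things that still have to be persisted, which matches the final step of the claim.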
3. The method for storing image data of an electronic document according to claim 2, characterized in that, in the step of allocating an index number, the current image is allocated the corresponding index number in the data segment to which it belongs according to a partition strategy; the partition strategy is a strategy in which each data segment contains the same number of image data items or a fixed amount of data, and the remaining image data is placed in the last data segment.
4. The method for storing image data of an electronic document according to claim 2, wherein, in the step of allocating an index number, the current image is allocated the corresponding index number in the data segment to which it belongs according to a nearby-use strategy; the nearby-use strategy is a strategy in which image data that is used together is placed in the same data segment.
5. The method for storing image data of an electronic document according to claim 2, wherein, in the step of allocating an index number, the current image is allocated the corresponding index number in the data segment to which it belongs according to a size strategy; the size strategy is a strategy in which segmentation is based on the width, height or resolution of the image, or on the size of its data.
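Two of these segmentation strategies can be sketched briefly. The sketch assumes images are identified by their position in the collection order; the function names and the single size threshold `limit` are hypothetical simplifications (the size strategy in the source may also use width, height or resolution).

```python
def segment_by_count(n_images, per_segment):
    """Partition strategy: every segment holds `per_segment` images, and the
    remainder goes into the last segment. Returns (segment, position) pairs."""
    if per_segment <= 0:
        raise ValueError("per_segment must be positive")
    return [divmod(i, per_segment) for i in range(n_images)]


def segment_by_size(sizes, limit):
    """Size strategy, reduced to one threshold: images smaller than `limit`
    bytes share one segment, the rest share another. Returns lists of image
    positions, omitting empty segments."""
    small = [i for i, size in enumerate(sizes) if size < limit]
    large = [i for i, size in enumerate(sizes) if size >= limit]
    return [segment for segment in (small, large) if segment]
```

A nearby-use strategy would need usage information from the document (for example, which images appear on the same page), so it is not sketched here.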
6. The method for storing image data of an electronic document according to claim 2, characterized in that, in the step of obtaining the image information and image data of the current collected image, if the obtained image data is compressed, it is decompressed according to the compression parameters in the electronic document and restored to image data in DIB format.
7. The method for storing image data of an electronic document according to claim 1, characterized in that the method further comprises a step of encoding the image data, wherein:
when the compression unit set in the file header information is the individual image data, each image data item, once obtained, is compressed separately with the compression method specified in the file header information, and the compressed image data is then encrypted with the specified encryption method;
when the compression unit set in the file header information is the data segment, each data segment, once obtained, is compressed separately with the compression method specified in the file header information, and the compressed data segment is then encrypted with the specified encryption method;
when the compression unit set in the file header information is "no compression", the entire image data package file is compressed with the default compression method after it has been generated, and the compressed image data package file is then encrypted with the specified encryption method.
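The effect of the compression unit can be illustrated with Python's standard `zlib` module. This is only a sketch under assumptions: the source does not name a compression method, the encryption step is omitted, the unit names `"image"`, `"segment"` and `"none"` are hypothetical, and in the `"none"` case the sketch compresses the concatenated image data rather than a complete package file with header and index.

```python
import zlib


def encode(segments, unit):
    """Compress image data at the granularity selected by `unit`.

    `segments` is a list of data segments, each a list of image byte strings.
    """
    if unit == "image":
        # Each image is compressed on its own: cheapest random access.
        return [[zlib.compress(image) for image in segment] for segment in segments]
    if unit == "segment":
        # Each segment is compressed as one block.
        return [zlib.compress(b"".join(segment)) for segment in segments]
    if unit == "none":
        # No per-item compression: everything is compressed as one stream.
        return zlib.compress(b"".join(b"".join(segment) for segment in segments))
    raise ValueError(unit)
```

Larger units usually compress better because redundancy across images is visible to the compressor, at the cost of decoding more data to reach a single image.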
8. The method for storing image data of an electronic document according to claim 2, characterized in that the method further comprises a step of writing the identical image information of multiple images into an image information table after the image information has been obtained from the collected images, and, at the same time, replacing the description in the electronic document where the current image is used with a reference to the identical image information in the image information table and the allocated index number.
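The image information table amounts to deduplicating identical image information. A minimal sketch, assuming image information is a flat dictionary with hashable values and that a table position serves as the reference (both are assumptions of this sketch; `build_info_table` is a hypothetical name):

```python
def build_info_table(references):
    """Collapse identical image information into one table row each.

    `references` is a list of (info_dict, index_number) two-tuples. Returns
    the table and the two-tuples rewritten as (table_position, index_number).
    """
    table = []
    positions = {}
    rewritten = []
    for info, number in references:
        key = tuple(sorted(info.items()))    # order-independent identity of the info
        if key not in positions:
            positions[key] = len(table)
            table.append(info)
        rewritten.append((positions[key], number))
    return table, rewritten
```

When many small images share the same width, height and colour settings, each shared description is stored once instead of once per use.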
9. The method for storing image data of an electronic document according to claim 8, characterized in that the image information table is stored in the image data package file, and the position of the image information table within the image data package file is recorded in the file header information of the image data package file.
10. The method for storing image data of an electronic document according to claim 8, characterized in that the image information table is stored in the electronic document as a separate file.
11. The method for storing image data of an electronic document according to claim 2, characterized in that the method further comprises a step of defining the identical image information of multiple images in the electronic document, using the description mode of the electronic document, after the image information has been obtained from the collected images.
12. The method for storing image data of an electronic document according to claim 1, characterized in that the method further comprises a step of sorting the collected images after the images have been collected from the electronic document.
13. The method for storing image data of an electronic document according to claim 12, characterized in that, in the sorting step, the sort order is the order in which the images are used, or an order by image data size, resolution, image width, image height, image information or image name.
14. The method for storing image data of an electronic document according to claim 12, characterized in that, when the image data is segmented according to the sort order, the step of writing to the image data package file and replacing descriptions in the electronic document comprises the following steps:
obtaining the image information and image data of the current collected image;
allocating index numbers to the current image successively in the sort order;
writing the image data of the current image into the data segment of the image data package file corresponding to the allocated index number;
updating, according to the image data of the current image, the index in the index structure corresponding to the allocated index number;
replacing the description in the electronic document where the current image is used with a two-tuple, the two-tuple consisting of a reference to the corresponding image information and the allocated index number;
determining whether the current segment has been fully processed;
if the current segment has not been fully processed, repeating the steps from obtaining the image information and image data of the current collected image through replacing the description where the current image is used with the two-tuple;
if the current segment has been fully processed, writing the information of the current segment recorded in the index into the image data package file;
determining whether any unprocessed data segments remain;
if unprocessed data segments remain, repeating the steps from obtaining the image information and image data of the current collected image through writing the information of the current segment recorded in the index into the image data package file; and
if all data segments have been processed, writing the index entry into the image data package file and saving the electronic document with the replaced descriptions.
15. A method for looking up image data stored by the image data storage method according to any one of claims 2-6, 8-11 and 14, characterized in that the method comprises the following steps:
obtaining the referenced image information from the electronic document saved as above, according to the supplied index number that was allocated in the index structure to the image data to be looked up;
opening the image data package file and obtaining the index entry from it;
locating, by means of the index entry, the index corresponding to the supplied index number, and extracting the information recorded in that index;
determining the position and length in the image data package file of the image data to be looked up, according to the supplied index number and the information extracted from the index, and reading the image data; and
returning the obtained image information and image data to the user or to a cache.
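How a lookup might resolve an index number through a two-level index can be sketched as below. The in-memory shape of the index (a list of segments, each with an `offset` and a list of `(offset_in_segment, length)` records) and the rule that index numbers run consecutively across segments are assumptions of this sketch, not details confirmed by the source; `locate` is a hypothetical name.

```python
def locate(segments, number):
    """Resolve a global index number to (absolute offset, length).

    `segments` stands in for the master index plus segment indexes: each item
    is {'offset': start of the data segment in the package file,
        'images': [(offset_in_segment, length), ...]}.
    """
    if number < 0:
        raise IndexError(number)
    remaining = number
    for segment in segments:
        count = len(segment["images"])       # image count kept by the master index
        if remaining < count:
            relative, length = segment["images"][remaining]
            return segment["offset"] + relative, length
        remaining -= count
    raise IndexError(number)
```

The returned offset and length are exactly what the final read step needs in order to fetch the image bytes from the package file.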
16. The method for looking up image data according to claim 15, characterized in that, when the image data package file contains file header information, the file header information and the index entry are obtained from the image data package file when it is opened, and the method further comprises a step of decoding the image data,
wherein, when a compression unit and a compression method are set in the file header information of the image data package file: when the compression unit is "no compression", the entire image data package file is decoded on opening according to the compression method set in the file header information; when the compression unit is the individual image data, the image data is decoded after being read, according to the compression method set in the file header information; when the compression unit is the data segment, the position and length in the image data package file of the data segment containing the index number are determined from the index number and the extracted index information, the data segment is read, and the data segment is then decoded according to the compression method set in the file header information.
17. The method for looking up image data according to claim 15, characterized in that, when identical image information has been recorded in an image information table, the corresponding image information is obtained through the reference to the identical image information in the image information table.
18. The method for looking up image data according to claim 15, characterized in that, when the image data has been segmented according to a partition strategy, the position and length in the image data package file of the data segment containing the image data to be looked up are determined from the supplied position and the extracted index information, and all image data in that data segment is read and cached; when image data in this data segment is looked up again, the following steps are performed:
checking whether a cached data segment exists in memory;
if no cached data segment exists in memory, performing the lookup method according to claim 15;
if a cached data segment exists in memory, determining whether the data segment containing the image data to be looked up is the cached data segment;
if the data segment to be looked up is not the cached data segment, performing the lookup method according to claim 15;
if the data segment to be looked up is the cached data segment, reading the image data directly from the cached data segment.
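The cached-segment check can be sketched as a small class that remembers the most recently loaded segment. The class name, the `load_segment` callback and the `loads` counter are hypothetical and exist only to make the behaviour observable; the source does not prescribe a cache size or replacement policy.

```python
class SegmentCache:
    """Keep the last data segment in memory and reuse it for nearby lookups."""

    def __init__(self, load_segment):
        self._load = load_segment    # callable: segment number -> list of image bytes
        self._number = None          # number of the cached segment, if any
        self._images = None
        self.loads = 0               # how many times a segment was read from the file

    def get(self, segment, position):
        if self._number != segment:
            # Cache miss: read the whole segment once and keep it.
            self._images = self._load(segment)
            self._number = segment
            self.loads += 1
        return self._images[position]
```

Under a nearby-use segmentation strategy, consecutive lookups tend to hit the same segment, so most calls to `get` would avoid file I/O.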
19. A method for modifying image data stored by the method for storing image data of an electronic document according to any one of claims 1 to 14, characterized in that the method comprises the following steps:
determining whether the modified data is longer than the data before modification;
if the modified data is no longer than the data before modification, replacing the data in place, starting at the original position of the image data in the image data package file, and updating the corresponding index information according to the modified image data; and
if the modified data is longer than the data before modification, writing the modified image data at the end of the image data package file, updating the corresponding index information according to that image data, and writing the updated index information and index entry at the end of the image data package file.
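The two branches of this modification method can be sketched on an in-memory data area. This is a simplified model: the data area is a `bytearray`, the index is a list of `(offset, length)` pairs, and rewriting the index and index entry at the end of the file is left out; `modify` is a hypothetical name.

```python
def modify(data_area, index, number, new_data):
    """Replace image `number`, in place if it fits, otherwise by appending.

    `data_area` is a bytearray and `index` a list of (offset, length) pairs;
    both are updated in place.
    """
    offset, length = index[number]
    if len(new_data) <= length:
        # Fits in the old slot: overwrite from the original start position.
        data_area[offset:offset + len(new_data)] = new_data
        index[number] = (offset, len(new_data))
    else:
        # Too long: append at the end and point the index at the new copy.
        index[number] = (len(data_area), len(new_data))
        data_area.extend(new_data)
```

In the append branch the old bytes stay in the file as unused space, which trades some storage for not having to shift the data that follows.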
20. A method for modifying image data stored by the method for storing image data of an electronic document according to any one of claims 1 to 14, characterized in that the method comprises the following steps:
writing the modified image data at the end of the image data package file; updating the corresponding index information according to said image data; and
writing the updated index information and index entry at the end of the image data package file.
21. A method for deleting image data stored by the method for storing image data of an electronic document according to any one of claims 1 to 14, characterized in that the method comprises the following steps:
replacing the record in the index corresponding to the image data to be modified with a null record; and
overwriting the image data to be deleted directly with zeros.
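A minimal sketch of this deletion variant, using the same simplified in-memory model (a `bytearray` data area and a list of `(offset, length)` index records). Representing the null record as `None` is an assumption of this sketch, and `delete` is a hypothetical name.

```python
def delete(data_area, index, number):
    """Delete image `number`: zero its bytes and blank its index record."""
    offset, length = index[number]
    data_area[offset:offset + length] = bytes(length)   # overwrite with zeros
    index[number] = None                                 # null record
```

The file size and every other offset stay unchanged, so no other index record needs to be touched.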
22. A method for deleting image data stored by the method for storing image data of an electronic document according to any one of claims 1 to 14, characterized in that the method comprises the following steps:
replacing the record in the index corresponding to the image data to be modified with a null record; and
rewriting the updated index information and index entry at the end of the image data package file.
23. A method for adding image data to the image data stored by the method for storing image data of an electronic document according to any one of claims 1 to 14, characterized in that the method comprises the following steps:
adding the new image data at the end of the image data package file, and allocating to this image data its corresponding index number in the index structure;
updating the corresponding index information according to said image data; and
writing the updated index information and index entry at the end of the image data package file.
24. A method for adding image data to the image data stored by the method for storing image data of an electronic document according to any one of claims 3 to 5, characterized in that the method comprises the following steps:
determining whether the data segment containing the other image data used together with the image data to be added is full;
if the data segment is not full, adding the image data to that data segment and allocating the corresponding index number in that data segment;
if the data segment is full, creating a new data segment, writing the image data into it, and allocating the corresponding index number in that data segment;
updating the corresponding index information according to said image data; and
writing the updated index information and index entry at the end of the image data package file.
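The full-segment check in this adding method can be sketched as follows. Segments are modelled as Python lists with a fixed `capacity`, and the caller is assumed to already know the `target` segment that holds the images used together with the new one; these modelling choices and the name `add_image` are hypothetical.

```python
def add_image(segments, capacity, target, data):
    """Add `data` to segment `target` if it has room, else open a new segment.

    `segments` is a list of lists of image bytes and is updated in place.
    Returns the (segment, position) allocated to the new image.
    """
    if len(segments[target]) < capacity:
        segments[target].append(data)
        return target, len(segments[target]) - 1
    segments.append([data])                 # target is full: start a new segment
    return len(segments) - 1, 0
```

Updating the index records and rewriting the index entry at the end of the file would follow, as in the remaining steps of the claim.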
25. A device for processing image data of an electronic document, comprising a storage module, characterized in that the storage module comprises a collection module, an image processing module and an output module, wherein:
the collection module collects the images to be processed from the electronic document and provides the image data and image information of the collected images to the image processing module and the output module; the image processing module comprises an index processing module, and the output module comprises an image data package file module and an electronic document module;
the index processing module builds an index structure in memory, allocates an index number to each collected image, and updates the index corresponding to the index number according to the image data of the image;
the image data package file module writes the header information of the image data package file, writes the image data of the images received from the collection module into the data area of the image data package file corresponding to the index number allocated by the index processing module, and writes the information recorded in the index structure built by the index processing module, together with the index entry information, into the image data package file; and
the electronic document module replaces each description in the electronic document where an image is used with a two-tuple, namely a reference to the corresponding image information and the index number allocated by the index processing module;
wherein the index structure is a two-level index structure comprising a master index and segment indexes; each segment index records at least the offset in the image data package of the data segment corresponding to that segment index, the offset within that data segment of the image data corresponding to the current index, and the length of the image data; the master index records at least the number of segment indexes, the offsets of the segment indexes in the image data package file, and the number of image data items contained in each data segment;
the image data package file is a newly created file used to store image data and record index information, and comprises at least file header information, a data area storing the image data, an index, and an index entry; the file header information is always located at the beginning of the image data package file, and the index entry is always located at its end; the file header information can define the file type, version information, compression unit and compression method; the index records at least the offset of the corresponding image data in the image data package file and the length of that image data; the index entry indicates the offset of the index in the image data package file; and from the information recorded in the index entry and the index, the position and length of the data area corresponding to the current image data in the image data package file can be determined.
26. The device for processing image data of an electronic document according to claim 25, characterized in that the storage module further comprises a sorting module, which sorts the collected images so that they are arranged in segmentation order.
27. The device for processing image data of an electronic document according to claim 25, characterized in that the storage module further comprises a coding module; where a compression unit and a compression method are set in the header information of the image data package file: when the compression unit is the individual image data, the coding module encodes the image data received from the collection module with the specified compression method and outputs the encoded image data to the image data package file module; when the compression unit is the data segment, the coding module encodes the image data of the images received from the collection module segment by segment with the specified compression method, according to the segment information received from the index processing module, and outputs the encoded data segments to the image data package file module; when the compression unit is "no compression", the coding module encodes the entire image data package file with the specified coding method and outputs the encoded image data package file to the image data package file module.
28. The device for processing image data of an electronic document according to claim 25, characterized in that the storage module further comprises an image information module, which extracts identical image information from the image information of the images received through the collection module and records it in an image information table; at the same time, the electronic document module replaces the description in the electronic document where the current image is used with a reference to the identical image information in the image information table and the allocated index number.
29. The device for processing image data of an electronic document according to claim 25, characterized in that the storage module further comprises an image information module, which defines identical image information directly in the electronic document according to the description method of the electronic document.
30. The device for processing image data of an electronic document according to claim 25, characterized in that the image data processing device further comprises a lookup unit, which reads image data and the corresponding index information from the image data package file into which the image data package file module has written the image data of the images, reads the corresponding image information from the electronic document of the electronic document module, and decodes the image data if the image data read has been encoded.
31. The device for processing image data of an electronic document according to claim 25, characterized in that the image data processing device further comprises a modification unit, which modifies image data and the corresponding index information in the image data package file into which the image data package file module has written the image data of the images.
32. The device for processing image data of an electronic document according to claim 25, characterized in that the image data processing device further comprises a deletion unit, which deletes image data and the corresponding index information from the image data package file into which the image data package file module has written the image data of the images.
33. The device for processing image data of an electronic document according to claim 25, characterized in that the image data processing device further comprises an adding unit, which adds new image data to the image data package file into which the image data package file module has written the image data of the images and updates the corresponding index information.
CN2009101519024A 2009-05-18 2009-07-02 Image data processing method of electronic document and device thereof Expired - Fee Related CN101894115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101519024A CN101894115B (en) 2009-05-18 2009-07-02 Image data processing method of electronic document and device thereof

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200910137590 2009-05-18
CN200910137590.1 2009-05-18
CN2009101519024A CN101894115B (en) 2009-05-18 2009-07-02 Image data processing method of electronic document and device thereof

Publications (2)

Publication Number Publication Date
CN101894115A CN101894115A (en) 2010-11-24
CN101894115B true CN101894115B (en) 2012-10-03

Family

ID=43103307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101519024A Expired - Fee Related CN101894115B (en) 2009-05-18 2009-07-02 Image data processing method of electronic document and device thereof

Country Status (1)

Country Link
CN (1) CN101894115B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841893A (en) * 2011-06-21 2012-12-26 北大方正集团有限公司 Method and device for processing fragmentation data in document
CN102331920B (en) * 2011-07-26 2015-04-08 深圳万兴信息科技股份有限公司 Data processing method and device
JP5713855B2 (en) * 2011-09-22 2015-05-07 株式会社ソニー・コンピュータエンタテインメント Information processing apparatus, information processing method, and data structure of content file
CN102567460B (en) * 2011-11-22 2014-05-28 中标软件有限公司 Method for image asynchronous decoding in document loading
CN102710381B (en) * 2012-06-07 2015-04-15 飞天诚信科技股份有限公司 Data transmission processing method and device
CN102929851B (en) * 2012-09-26 2015-09-30 周丽明 A kind of graphic file modification method in document
CN103902587B (en) * 2012-12-27 2017-06-27 联想(北京)有限公司 A kind of synchronous method of identification information and electronic equipment
CN103092991B (en) * 2013-02-08 2016-09-07 宁波江丰生物信息技术有限公司 The information processing method of image and device, display methods and device
CN103761277A (en) * 2014-01-09 2014-04-30 北京掌阔技术有限公司 ePub electronic book loading method and system
CN105721810B (en) * 2014-12-05 2019-06-04 北大方正集团有限公司 A kind of image compression and storing method and device
CN105808599A (en) * 2014-12-31 2016-07-27 高德软件有限公司 Information loading method and apparatus as well as electronic device
CN108733731B (en) * 2017-04-25 2021-12-24 珠海金山办公软件有限公司 Convenient method and device for changing multimedia resources in document and electronic equipment
CN107436848B (en) * 2017-08-03 2021-02-02 苏州浪潮智能科技有限公司 Method and device for realizing conversion between user data and compressed data
CN109815458A (en) * 2017-11-20 2019-05-28 北大方正集团有限公司 Method and apparatus for setting a picture to be repaired
CN108413942A (en) * 2018-02-08 2018-08-17 深圳凯达通光电科技有限公司 Monitoring system for landslide-generated water waves in inland waterways based on big data processing
CN110807300A (en) * 2018-07-18 2020-02-18 广州金山移动科技有限公司 Image processing method and device, electronic equipment and medium
CN109388612B (en) * 2018-09-14 2021-01-15 中国科学院光电研究院 Method, equipment, system and medium for generating data summary document
CN111723230B (en) * 2019-03-19 2023-11-28 珠海金山办公软件有限公司 Picture stitching method and device, electronic equipment and storage medium
CN110688347A (en) * 2019-09-24 2020-01-14 Oppo广东移动通信有限公司 File storage method, file storage device and terminal equipment
CN113129395B (en) * 2021-05-08 2021-09-10 深圳市数存科技有限公司 Data compression encryption system
CN115190217B (en) * 2022-07-07 2024-03-26 国家计算机网络与信息安全管理中心 Data security encryption method and device integrating self-coding network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1612252A (en) * 2003-10-31 2005-05-04 浙江中控技术股份有限公司 Real-time data on-line compression and decompression method
CN101393551A (en) * 2007-09-17 2009-03-25 鸿富锦精密工业(深圳)有限公司 Index establishing system and method for patent full text search

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
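宋林松. FoxBASE index file structure. Journal of Suzhou Institute of Silk Textile Technology (《苏州丝绸工学院学报》). 1993, Vol. 13, No. 2. *
旷海蓉 et al. Several techniques for improving file system efficiency. Computer Engineering (《计算机工程》). 1998, Vol. 24, No. 1. *
李晶皎 et al. Research on file system index structures. Journal of Northeastern University (Natural Science) (《东北大学学报(自然科学版)》). 2004, Vol. 25, No. 4. *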

Also Published As

Publication number Publication date
CN101894115A (en) 2010-11-24

Similar Documents

Publication Publication Date Title
CN101894115B (en) Image data processing method of electronic document and device thereof
US9405790B2 (en) System, method and data structure for fast loading, storing and access to huge data sets in real time
CN104380267B (en) Data compression/decompression device
CN102411616B (en) Method and system for storing data and data management method
US20070143664A1 (en) A compressed schema representation object and method for metadata processing
US20070150809A1 (en) Division program, combination program and information processing method
CN106233632A (en) Ozip compression and decompression
Wu Notes on design and implementation of compressed bit vectors
CN101963944B (en) Object storage method and system
CN1998241A (en) Method for encoding an XML document, decoding method, encoding and decoding method, coding device, and encoding and decoding device
US9244935B2 (en) Data encoding and processing columnar data
CN107741947A (en) Method for storing and retrieving random number keys based on the HDFS file system
US20150142763A1 (en) Bitmap compression for fast searches and updates
US9171054B1 (en) Systems and methods for high-speed searching and filtering of large datasets
CN111625531B (en) Merging device based on programmable device, data merging method and database system
JP3636977B2 (en) Variable length database device and access method
CN105337617B (en) High-efficiency compression method for FSN files
CN102214170A (en) Methods and systems for compressing and decompressing extensible markup language (XML) data
US10515092B2 (en) Structured record compression and retrieval
CN110399371A (en) Method, storage medium, and device for reducing memory consumption based on a Redis database
CN102385606A (en) Method and device for accessing distributed data warehouse
CN115438114B (en) Storage format conversion method, system, device, electronic equipment and storage medium
CN104111899A (en) Cache data storage method and system and cache data reading method
US7624326B2 (en) Encoding device and method, decoding device and method, program, and recording medium
US8185565B2 (en) Information processing apparatus, control method, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2022-09-14

Address after: No. 5 Yiheyuan Road, Haidian District, Beijing 100871

Patentee after: Peking University

Patentee after: New Founder Holdings Development Co., Ltd.

Patentee after: Peking University Founder R&D Center

Address before: No. 5 Yiheyuan Road, Haidian District, Beijing 100871

Patentee before: Peking University

Patentee before: Peking University Founder Group Co., Ltd.

Patentee before: Peking University Founder R&D Center

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2012-10-03