CN100501728C - Image processing method, system, program, program storage medium and information processing apparatus - Google Patents

Info

Publication number
CN100501728C
Authority
CN
China
Prior art keywords
data
picture
vectorization
file
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004800031470A
Other languages
Chinese (zh)
Other versions
CN1745381A (en)
Inventor
福冈茂雄
谷冈宏
宇佐美彰浩
太田健一
金田北洋
伊藤裕彦
加藤进一
秋庭朋宏
金津知俊
三沢玲司
寺尾仁秀
鹈沢充
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Publication of CN1745381A
Application granted
Publication of CN100501728C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Processing Or Creating Images (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The present invention provides a technique that identifies the original document file corresponding to a document to be copied by using a document identifier such as a two-dimensional barcode, prints the document from that original file without loss of image quality, and registers an unregistered document after vectorization so that image quality deterioration is suppressed at an early stage.

Description

Image processing method, system, program, program storage medium and information processing apparatus
Technical field
The present invention relates to a technique for scanning a printed document and converting it into digital data.
Background art
In recent years, amid growing concern over environmental issues, the move to paperless offices has been promoted, and various techniques for handling digital documents have been proposed.
For example, as a method of converting a paper document into a digital document, Japanese Patent Application Publication 2001-358863 describes a technique in which a paper document is scanned by a scanner, the scanned data is converted into a digital document format (for example, JPEG), and the converted data is transmitted. However, because the purpose of the technique described in Japanese Patent Application Publication 2001-358863 is to convert an image scanned by a scanner into a digital document such as JPEG, it does not consider searching for a saved file by using the printed document. Printing and scanning must therefore be repeated, and the converted digital document image gradually degrades.
Japanese Patent Application Publication 8-147445 discloses a technique that divides document data into regions according to their attributes and saves all regions as raw image data (or compressed image data). However, because each region is handled as an image, a large file size is required, and the image degrades when it is edited, for example when it is enlarged.
Techniques for searching for digital information corresponding to a paper document have also been proposed. For example, Japanese Patent Application Publication 10-063820 discloses a technique that identifies the corresponding digital information from a scanned input image, and describes an information processing apparatus that extracts the difference between the input image and the digital information and composites the extracted difference onto the identified digital information. Japanese Patent Application Publication 10-285378 discloses the following technique. In a digital multifunction peripheral (MFP) (having a copy function, scan function, print function, and the like), it is checked whether a scanned image contains a graphic code indicating a page ID, and if a graphic code is found, the corresponding page ID is searched for in a database. If the page ID is found in the database, the currently scanned image is discarded, the print data associated with the page ID is read out, and a print image is generated and printed on paper. If the corresponding page ID is not found in the database, the scanned image is copied directly onto paper in copy mode, or a PDL command is appended to the scanned image to convert it into PDL format and the converted data is sent in facsimile or filing mode.
However, because the technique of Japanese Patent Application Publication 10-063820 extracts difference information by searching for the original digital document corresponding to the output paper document, information additionally written on the paper document can be retained as a difference image. Because the difference information handles the scanned image directly, however, a large storage capacity is required. In addition, if the original digital document corresponding to the output paper document is not found, the processing ends. In the technique of Japanese Patent Application Publication 10-285378, if the original digital document corresponding to the paper document is not found, a PDL command is appended to the scanned image to convert the image into PDL format. However, when a PDL command is simply appended to the scanned image to convert it into PDL format, a large file size is required, and such files can fill up the database.
Summary of the invention
The present invention has been made in consideration of these problems, and its object is to provide a technique that identifies, from the image data of a document to be copied, the original document file corresponding to that document, prints the document from that file to prevent image quality deterioration, and registers an unregistered document after vectorization so as to suppress image quality deterioration at an early stage.
To achieve this object, an image processing system according to the present invention has, for example, the following arrangement. That is, the image processing system comprises:
search means for searching for original digital data stored in storage means on the basis of an input document image;
vectorization means for vectorizing the input document image when the original digital data cannot be identified as a result of the search by the search means; and
storage control means for storing, in the storage means, the vector data of the document image converted by the vectorization means.
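The interplay of the three means listed above can be sketched as follows. This is only an illustrative model, not the implementation of the embodiment: the class name `ImageProcessingSystem`, the stand-in functions `fake_search` and `fake_vectorize`, and the `doc-N` key scheme are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ImageProcessingSystem:
    """Toy model of the three means: search, vectorization, storage control."""
    search: Callable[[bytes, Dict[str, str]], Optional[str]]
    vectorize: Callable[[bytes], str]
    storage: Dict[str, str] = field(default_factory=dict)

    def process(self, document_image: bytes) -> str:
        # Search means: try to identify the original digital data.
        key = self.search(document_image, self.storage)
        if key is not None:
            return key
        # Vectorization means: runs only when the search fails.
        vector_data = self.vectorize(document_image)
        # Storage control means: register the vector data (hypothetical key scheme).
        key = f"doc-{len(self.storage) + 1}"
        self.storage[key] = vector_data
        return key


def fake_vectorize(image: bytes) -> str:
    # Stand-in for real vectorization: just tag the bytes.
    return "vector:" + image.decode("ascii")


def fake_search(image: bytes, storage: Dict[str, str]) -> Optional[str]:
    # Stand-in for real search: exact match on the stored vector data.
    target = fake_vectorize(image)
    for key, value in storage.items():
        if value == target:
            return key
    return None


system = ImageProcessingSystem(search=fake_search, vectorize=fake_vectorize)
first = system.process(b"page-a")   # not found: vectorized and registered
second = system.process(b"page-a")  # found: the registered entry is reused
```

The point of the sketch is the ordering: vectorization and registration happen only on a failed search, so a document is registered once and later scans resolve to the stored entry.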
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference numerals designate the same or similar parts throughout the figures.
Brief description of the drawings
Fig. 1 is a network system diagram in an embodiment of the present invention;
Fig. 2 is a block diagram of the multifunction peripheral (MFP) in the embodiment;
Fig. 3 is a block diagram of the administration PC in the embodiment;
Fig. 4 is a view showing the outer appearance of the MFP in the embodiment;
Fig. 5 is a flowchart showing the processing sequence of the administration PC in the embodiment;
Fig. 6 shows the data structure stored in the database in the embodiment;
Fig. 7 is a view for explaining the block segmentation performed when a paper document is scanned;
Fig. 8 shows the content of the data generated by block segmentation;
Fig. 9 is a flowchart showing the scanning/decoding processing sequence for a two-dimensional barcode in the embodiment;
Fig. 10 shows an example of a document with a two-dimensional barcode in the embodiment;
Fig. 11 is a flowchart showing the file search processing sequence in the embodiment;
Fig. 12 is a flowchart showing another file search processing sequence in the embodiment;
Figs. 13A and 13B are flowcharts showing the layout search processing sequence based on the layout of the scanned document in the embodiment;
Fig. 14 is a view for explaining the vectorization process for line segments in the embodiment;
Fig. 15 is a view for explaining the vectorization process for line segments in the embodiment;
Fig. 16 is a flowchart showing an outline of the grouping processing sequence in the embodiment;
Fig. 17 is a flowchart showing the graphic element detection processing sequence in the embodiment;
Fig. 18 shows the data structure of the DAOF;
Fig. 19 is a flowchart showing the application data generation processing sequence in the embodiment;
Fig. 20 is a detailed flowchart of the document structure tree generation process in step S8002 of Fig. 19;
Figs. 21A and 21B are views for explaining the document structure tree;
Fig. 22 is a flowchart showing the sequence for encoding pointer information as a two-dimensional barcode and appending the encoded pointer information to an image in the embodiment;
Fig. 23 shows an example of a region setting window in the embodiment;
Fig. 24 shows an example of a region setting window in the embodiment;
Fig. 25 shows an example of a region setting window in the embodiment;
Fig. 26 shows an example of a region setting window in the embodiment;
Fig. 27 shows an example of a region setting window in the embodiment;
Fig. 28 shows a setting confirmation window for the vectorization process in the embodiment;
Fig. 29 shows another setting confirmation window for the vectorization process in the embodiment;
Fig. 30 is a flowchart showing the processing sequence for one text block;
Fig. 31 shows the structure of a dictionary prepared when a text region is vectorized; and
Fig. 32 shows the structure of intermediate data when a text region is vectorized.
Detailed description of the embodiments
A preferred embodiment of the present invention will now be described in detail with reference to the accompanying drawings.
<System overview>
Fig. 1 is a block diagram showing an example of the arrangement of an image processing system according to the present invention.
This image processing system is illustrated in an environment in which offices 10 and 20 are connected through the Internet 104. Because the arrangement in office 20 is basically the same as that in office 10, office 10 is explained below.
An MFP (multifunction peripheral: a copying machine that also serves as a network scanner and a network printer) 100, an administration PC 101 for controlling the MFP 100, a client PC 102, a document management server (file server) 106 that manages document files, a database 105 that allows searches based on the format of document data and the like, and a proxy server 103 are connected to a LAN 107 formed in office 10. Fig. 1 shows only one client PC, but a plurality of client PCs may be connected.
The network 107 in office 10 is connected to the Internet through the proxy server 103. The same applies to office 20. The networks in offices 10 and 20 can therefore exchange information with each other over the Internet. It is desirable to prevent third parties from intruding into the networks in offices 10 and 20 by a known technique such as a VPN, but because that technique is not directly related to the present invention, its description is omitted.
The MFP 100 in this embodiment serves as an image scanning unit (scanner) for paper documents and performs part of the image processing (preprocessing and the like) of the scanned image signal, and supplies image data to the administration PC 101 through a LAN 109. The administration PC 101 is a general-purpose personal computer having an image storage device (hard disk), image processing means (an image processing circuit or program), a display device, and input devices (a keyboard and a pointing device such as a mouse).
Fig. 2 is a block diagram showing the arrangement of the MFP 100. Referring to Fig. 2, an image scanning unit 110 including an automatic document feeder (ADF) irradiates the image on each of one or more stacked documents with light from a light source, forms an image of the light reflected from the document on a solid-state image sensing element through a lens, and obtains a scanned image signal in raster order from the solid-state image sensing element as image data having a resolution of, for example, 600 dpi. In normal copy processing, the scanned image signal undergoes various correction processes in data processing equipment 115 to be converted into a recording signal, which is temporarily stored in a memory device 111 such as a hard disk. The MFP 100 notifies the administration PC 101 that the recording signal has been stored and, under the control of the administration PC 101, sequentially outputs the recording signal to a recording unit (also serving as a printer engine) 112, thereby forming an image on paper.
Because the MFP of this embodiment also serves as a network printer, it has a function of generating print image data based on PostScript. When the MFP operates as a network printer, the data processing equipment 115 converts print data, output from the client PC 102 and received from the LAN 107 through a network I/F 114, into recordable raster data and temporarily stores the raster data in the memory device 111. The MFP then notifies the administration PC 101 that the raster data has been stored and, under the control of the administration PC 101, sequentially outputs the raster data to the recording unit 112, thereby forming an image on paper. At this time, information indicating whether the image data was generated by scanning or by the network printer function is also sent to the administration PC.
Reference numeral 113 denotes an input device comprising various keys and switches and a touch panel formed on a liquid crystal display screen; and reference numeral 116 denotes a liquid crystal display.
In the above arrangement, the administration PC 101 is connected to the MFP 100 through the network 109 because, as will become clear from the description given later, the administration PC 101 directly accesses the memory device 111 in the MFP 100. The connection ensures high-speed access to the memory device 111 and avoids affecting traffic on the network 107.
Fig. 3 is a block diagram of the administration PC 101 in this embodiment. The administration PC 101 in this embodiment is implemented by a general-purpose personal computer. Referring to Fig. 3, reference numeral 30 denotes a CPU that controls the entire apparatus; reference numeral 31 denotes a ROM that stores a boot program, BIOS, and the like; and reference numeral 32 denotes a RAM used as a work area of the CPU 30. An OS and a program for managing the MFP 100 of this embodiment are loaded onto the RAM 32 and executed.
Reference numeral 34 denotes a display controller incorporating a video RAM. Reference numeral 35 denotes a display device. In this embodiment a liquid crystal display is used as the display device 35, but another display device such as a CRT may be used. Reference numeral 36 denotes a touch panel provided on the front surface of the screen of the display device 35. Reference numeral 37 denotes a network I/F for communicating with the MFP 100; and reference numeral 38 denotes an I/F for connecting to the network 107.
A personal computer normally includes input devices such as a keyboard and a pointing device (a mouse or the like). In this embodiment, however, the touch panel serves as the input device in place of these devices. The personal computer may include a keyboard and a mouse, or the keyboard, the mouse, and the touch panel may be used together.
Fig. 4 shows the outer appearance of the MFP 100 that houses the administration PC 101. As shown in Fig. 4, the display device 35 and the touch panel 36 of the administration PC 101 are arranged so that they appear to be part of the MFP 100. Note that a keyboard and a pointing device (a mouse or the like) of an ordinary PC may be accommodated in the housing of the MFP to facilitate maintenance of the administration PC 101.
As will be understood from the following description, the display device 35 and the touch panel 36 of the administration PC 101 substantially serve as the user interface of the MFP 100. Given this arrangement, the functions that implement the processing of the administration PC 101 could be added to the MFP 100 itself. However, because the administration PC 101 performs a wide variety of processes, this embodiment uses a general-purpose personal computer that is independent of the MFP as a multifunction peripheral, for the sake of easy development, management, and version updating of the programs that implement these processes.
<Overview of processing>
The processing performed by the system of this embodiment is outlined below.
In the MFP 100, a document set on the ADF is scanned by the image scanning unit 110 as image data with a resolution of 600 dpi. After preprocessing (various correction processes), the scanned image data is stored as image data for one page in the memory device 111 of the MFP 100. The administration PC 101 monitors, through the network 109, whether new image data has been stored in the memory device 111 of the MFP 100. Alternatively, the administration PC may receive a message through the network 109 when the MFP 100 stores a document image in the memory device 111.
In either case, when the administration PC 101 determines that image data has been stored in the memory device 111 of the MFP 100, it executes the following processing.
The scanned image data is divided into regions by image region separation (block segmentation). It is then checked whether pointer information indicating the location of the original file is included. For example, a two-dimensional barcode is scanned on the assumption that it is formed at a predetermined region position. If the two-dimensional barcode can be recognized and decoded, the decoded information contains the location information (file name plus path with server name) of the original file corresponding to the scanned document, and this location information is displayed on the display device 35 of the administration PC 101. For example, a message such as "Stored as 'xxxxx' in shared folder 'xxx' on file server (document management server) 'xxxxx'" is displayed.
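As a rough illustration of how decoded pointer information might be turned into the on-screen message, the sketch below assumes a `//server/folder/file` string layout. The text only states that the pointer holds a file name plus a path with a server name, so this layout, the `PointerInfo` type, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerInfo:
    server: str      # file server (document management server) name
    folder: str      # shared folder path on that server
    file_name: str   # name of the original file


def parse_pointer(decoded: str) -> PointerInfo:
    """Split a decoded barcode string of the assumed form //server/folder/.../file."""
    if not decoded.startswith("//"):
        raise ValueError("pointer must start with '//'")
    parts = decoded[2:].split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError("expected //server/folder/file")
    return PointerInfo(
        server=parts[0],
        folder="/".join(parts[1:-1]),
        file_name=parts[-1],
    )


def location_message(info: PointerInfo) -> str:
    # Message text modelled loosely on the example message in the description.
    return (
        f'"{info.file_name}" is stored in shared folder "{info.folder}" '
        f'on file server "{info.server}"'
    )


info = parse_pointer("//docsrv/shared/reports/q1.svg")
print(location_message(info))
```

Keeping the parsed pointer as a small immutable record makes it easy to reuse the same value later, both for fetching the original file and for regenerating the barcode at print time.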
Note that a two-dimensional barcode is used as the pointer information here, but other methods may be used. For example, a character string indicating a URL may be written in the document, and that object may undergo character recognition to obtain the pointer information. The pointer information may also be included in the image by a method that embeds information by slightly changing the spacing between adjacent characters, a method that embeds information as a digital watermark in a halftone image, or the like.
When the user scans a document with the MFP 100 in order to copy it, the administration PC 101 acquires the original file from the specified document management server, generates image data from the original file, and composites a two-dimensional barcode image containing the location information of the original file at a predetermined position in the generated image data. The administration PC 101 stores the composite image data in the memory device 111 of the MFP 100. At this time, the administration PC 101 deletes the previously scanned image data stored in the memory device 111, or overwrites the scanned image data with the image data generated from the original file. The administration PC 101 then outputs a print instruction command to the MFP 100 to make it execute print processing.
Whether the document was scanned in order to copy it or to confirm the location of the original file is determined from information, sent from the MFP 100 to the administration PC 101, that indicates whether the user pressed the key for confirming the location of the original file or the copy key provided on the control panel of the MFP. Even when the document was scanned to confirm the location, a button for selecting whether to issue a print instruction is displayed on the touch panel 36 after the above message is displayed on the display device 35, and print processing is executed when a touch on this button is detected.
The MFP 100 can also be used for editing purposes other than copying. For example, when a document is scanned by the MFP 100, its original file is located and information indicating its location is displayed. The user can therefore edit the original file again over the network from his or her client PC.
When a document is copied, image data is generated and printed from the original file corresponding to the document. Therefore, even when the document to be scanned is slightly soiled, a high-quality image can always be reproduced.
On the other hand, a document scanned by the MFP 100 may have no pointer information (two-dimensional barcode or the like). That is, the administration PC 101 determines that the image data stored in the memory device 111 of the MFP 100 has no barcode. In this case, the registered original files are searched for files close to the scanned document image data, thumbnail images are generated from the original data found by the search, and the thumbnail images and information indicating their locations are displayed as a candidate list on the display device 35 of the administration PC 101. The user then selects one file using the touch panel 36, and an image is printed from the corresponding original file. If the user knows the location of the original file, he or she may explicitly specify its location and file.
If no candidate corresponding to the document image is found, it is determined that the scanned document is not registered, a message to that effect is displayed on the display device 35, and a button for instructing whether to register the document is displayed so that the user can confirm registration.
When the user inputs a registration instruction (by touching a "register" button), the scanned document data is converted into a predetermined data format and registered as a new file in the document management server on the same network. Because the location of the file is determined at the time of registration, the location information is composited as a two-dimensional barcode at the time of printing.
Even when the original file is found, if it is a so-called image file such as JPEG or TIFF, the user can choose whether to re-register it after vectorization.
When the MFP operates as a network printer, the above messages are sent to the client PC that is the source of the print data and are displayed on the screen of the client PC.
Accordingly, when a new document is created by an application program running on the client PC and printed, the print data has no two-dimensional barcode, so the window for selecting the corresponding file is displayed as described above. If the new document is not registered in the database of the document management server, the user instructs that the document file he or she created be saved in the document management server.
<Details of processing>
The overview of this embodiment has been explained. The details of the processing sequence (mainly that of the administration PC 101) are described below with reference to the flowchart of Fig. 5. The following mainly explains the case in which a paper document is scanned by the MFP 100. The processing performed when the MFP 100 operates as a network printer can be understood from the description above and below.
The image scanning unit 110 (including the ADF) of the MFP 100 scans one page of a document and obtains an image signal of 600 dpi (8 bits per pixel) in raster order. The image signal undergoes preprocessing (conversion processing such as correction processing) in the data processing equipment 115 and is saved as image data for one page in the memory device 111.
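To give a feel for the data volume of one uncompressed page at 600 dpi and 8 bits per pixel, the following worked calculation assumes an A4 page (210 mm x 297 mm). The page size is an assumption for illustration only; the description does not state one, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def raster_size_bytes(width_mm: float, height_mm: float,
                      dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, ignoring any row padding."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed A4 page scanned at the stated 600 dpi, 8 bits per pixel.
a4_bytes = raster_size_bytes(210, 297, dpi=600, bits_per_pixel=8)
print(a4_bytes)  # 4961 x 7016 pixels -> 34806376 bytes (about 33 MiB)
```

A figure of this order for every page helps explain why the description stresses fast access to the memory device 111 and why storing documents as raster images is treated as costly compared with vector data.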
The flowchart of Fig. 5 shows the processing sequence executed by the CPU 30 when image data for one page has been stored in the memory device 111 of the MFP 100, that is, when a storage completion message is received from the MFP 100 or when it is detected that image data for one page has been stored in the memory device 111 of the MFP 100.
In step S40, the image data stored in the memory device 111 is loaded onto the RAM 32 through the LAN 109. In step S41, the image is separated into text/line-image regions and halftone image regions to obtain corresponding rectangular blocks. The text portions are divided into rectangular blocks that are grouped into clusters corresponding to paragraphs, and the line-image portions are divided into tables and figures made up of lines, which are then segmented. The halftone image portions are divided into blocks of independent objects such as halftone image parts and background parts.
When the flow advances to step S42, pointer information (a two-dimensional barcode or the like) recorded in the document image is detected, and the barcode is decoded as additional information. Then, pointer information indicating the storage location of the original digital file is detected from the information decoded in step S42 (step S43).
In the description of this embodiment, the pointer information is represented by a two-dimensional barcode. However, the present invention is not limited to the two-dimensional barcode, and a URL indicating the location of the original file may be written at a predetermined position in the document image. In some cases, pointer information that is invisible to the eye may be embedded by a so-called digital watermarking technique. Any of these methods may of course be used. As methods of embedding information as a digital watermark, a method that embeds information by adjusting the spacing between adjacent characters, a method that embeds information by changing pixel values in a halftone image, and the like can be used.
In step S44, it is checked whether pointer information has been detected. If it is determined that pointer information has been detected, the flow advances to step S45 to check whether the file is stored in the document management server (in office 10 or 20) specified by the pointer information. If the existence of the file can be confirmed, the flow advances to step S46 to display the storage address together with a print instruction button and a non-print instruction button on the display device 35. Note that an edit button, a transfer button, or the like may be displayed in place of the non-print instruction button.
If it is determined in step S47 that the non-print instruction button has been touched, the processing ends (if an edit button, a transfer button, or the like is displayed in place of the non-print instruction button, editing, transfer, or the like is executed). If it is determined in step S47 that the print instruction button has been touched, the flow advances to step S48 to load the original file specified by the pointer information, generate print image data from the original file, and generate a two-dimensional barcode image containing the pointer information and composite it at a predetermined position in the generated image data. The composite print image data is stored in the memory device 111 of the MFP 100 through the LAN 109. At this time, the image data that was obtained by scanning the document and stored in the memory device 111 is deleted. The flow then advances to step S49 to issue a print instruction to the data processing equipment 115 of the MFP 100. In response to this instruction, the data processing equipment 115 of the MFP 100 converts the print image data into data for the respective recording color components (in the case of a color printer) and controls the recording unit 112 to print the image.
On the other hand, if no pointer information is detected in step S44 (including the case in which no two-dimensional barcode is detected), or if pointer information is detected but the corresponding original file is not found in step S45, the flow advances to step S50, and character recognition (OCR) is performed on the blocks determined to be text regions in the above segmentation. In step S51, words are extracted from the OCR result to perform a full-text search, or a so-called layout search is performed on the basis of the layout of the objects and the attributes of each object. This search is performed on the database 105.
The content managed by the database 105 has, for example, the structure shown in Fig. 6. As shown in Fig. 6, one record consists of a pointer field indicating the location of the file, a layout field containing the positions, number, and attributes (information indicating text, line image, or halftone region) of the blocks that form one page, and the character code information contained in the text regions.
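The record layout described for Fig. 6 can be pictured with the following data classes. This is a sketch under assumptions: the type and field names (`DocumentRecord`, `Block`, `BlockAttribute`) and the use of pixel rectangles for block positions are illustrative choices, not the schema of the database 105.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BlockAttribute(Enum):
    TEXT = "text"
    LINE_IMAGE = "line image"
    HALFTONE = "halftone"


@dataclass
class Block:
    # Position and size of one block on the page (assumed to be in pixels).
    x: int
    y: int
    width: int
    height: int
    attribute: BlockAttribute


@dataclass
class DocumentRecord:
    pointer: str         # pointer field: location of the file
    blocks: List[Block]  # layout field: positions and attributes of the blocks
    text: str            # character code information of the text regions

    @property
    def block_count(self) -> int:
        # The number of blocks is derived from the layout field.
        return len(self.blocks)


record = DocumentRecord(
    pointer="//docsrv/shared/q1.svg",
    blocks=[
        Block(100, 80, 4000, 600, BlockAttribute.TEXT),
        Block(100, 900, 2000, 1500, BlockAttribute.HALFTONE),
    ],
    text="Quarterly report",
)
```

Storing the layout alongside the text lets a single record serve both search styles mentioned in the description, full-text search on `text` and layout search on `blocks`.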
The search of step S51 is performed by sending to the database 105 the positions and attribute information of the blocks of one scanned page and a plurality of words obtained by OCR.
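The word-based part of such a query can be illustrated with a very simple scoring scheme. The sketch below ranks records by Jaccard similarity between the OCR words and each record's words; this measure, the threshold, and the function names are assumptions chosen for brevity, and the layout-based matching is not modelled here.

```python
from typing import Dict, List, Set, Tuple


def word_score(query_words: Set[str], record_words: Set[str]) -> float:
    """Jaccard similarity between the OCR words and a record's words."""
    union = query_words | record_words
    if not union:
        return 0.0
    return len(query_words & record_words) / len(union)


def search_candidates(query_words: Set[str],
                      records: Dict[str, Set[str]],
                      threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Return (pointer, score) pairs at or above the threshold, best first."""
    hits = [(pointer, word_score(query_words, words))
            for pointer, words in records.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database content: pointer -> words of the text regions.
records = {
    "//srv/a": {"invoice", "total", "canon"},
    "//srv/b": {"meeting", "minutes"},
}
candidates = search_candidates({"invoice", "total", "2004"}, records)
print(candidates)  # [('//srv/a', 0.5)]
```

OCR output is noisy, so a set-overlap score with a threshold is more forgiving than exact matching; a production system would likely combine it with the layout comparison before presenting candidates.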
When this search request is processed and similar candidate files are found, the database 105 returns the search results (including pointer information and layout information). Thumbnail images of the candidates are therefore generated from the candidates' layout information and displayed as a list on the display device 35, prompting the user to select one of the candidates (step S52). Because each thumbnail image is small, it is generated by showing characters as filler symbols and by pasting suitable figures and image data in the figure and image portions. Alternatively, a thumbnail image may be generated by reducing the original file. The user visually compares the rough layout of the document he or she scanned with the layouts of the candidates shown on the display screen, and selects the desired thumbnail image by touching it.
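The idea of building a small thumbnail from layout information alone can be mimicked in text form. The sketch below maps each block onto a coarse character grid and fills it with a symbol per attribute; the grid rendering, the symbols, and the tuple layout are all assumptions for illustration, since the embodiment produces actual images.

```python
from typing import List, Tuple

# (x, y, width, height, attribute) in page pixels; attribute names are assumed.
Block = Tuple[int, int, int, int, str]

FILL = {"text": "#", "line": "+", "halftone": "%"}


def render_thumbnail(blocks: List[Block], page_w: int, page_h: int,
                     cols: int, rows: int) -> List[str]:
    """Draw each block as a filled rectangle on a cols x rows character grid."""
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for x, y, w, h, attr in blocks:
        # Scale page coordinates to grid cells; every block covers >= 1 cell.
        c0 = x * cols // page_w
        c1 = max(c0 + 1, (x + w) * cols // page_w)
        r0 = y * rows // page_h
        r1 = max(r0 + 1, (y + h) * rows // page_h)
        for r in range(r0, min(r1, rows)):
            for c in range(c0, min(c1, cols)):
                grid[r][c] = FILL.get(attr, "?")
    return ["".join(row) for row in grid]


thumb = render_thumbnail(
    [(0, 0, 100, 25, "text"), (0, 50, 50, 50, "halftone")],
    page_w=100, page_h=100, cols=10, rows=4,
)
print("\n".join(thumb))
```

Because only block rectangles and attributes are needed, such a preview can be produced from the search result itself without fetching the original file, which is the advantage the description attributes to layout-based thumbnails.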
As described above, when candidates are retrieved in step S51 and one of them is selected in step S52, the flow advances to step S53, and the corresponding original file is searched for using the pointer information associated with the selected thumbnail image. If the selected original file is a raster image file such as JPEG, TIFF, or BMP, the user is asked whether to register this image file after vectorization. If the user inputs a registration instruction, steps S54 to S57 below are executed. On the other hand, if the original file corresponding to the selected thumbnail is a vectorized file, the flow advances to step S46 without any query about vectorization, and the processing of step S47 and subsequent steps is executed.
If the search in step S51 finds no file similar to the document image, a message indicating that no similar image was found is issued in step S52, and it is determined in step S53 that no original digital file was found. The flow then advances to step S54.
If it is determined in step S45 that an original file was found, and this original file is a raster image file such as JPEG or TIFF, a query as to whether to register this file after vectorization is issued in step S53. When a registration instruction is received, the processing in steps S54 to S57 is executed.
If no original file is found, or if an original file is found but it is a raster image file such as JPEG or BMP and an instruction is received to convert this image file into vector codes and register the converted file, the flow advances to step S54, where the vectorization process (described later) is executed on each block according to preset contents.
For example, for a block determined to be a text region, the outlines of the character images are identified and outline vectors are extracted along them, thereby vectorizing the text region. In this case, the OCR result is saved as character codes corresponding to the vector data of the respective characters.
Instead of extracting outline vectors from the outlines of the character images as described above, since character codes are already available as the OCR result, the character size, style, and font needed to faithfully reproduce the document image may be identified, and the character codes of the OCR result may be vectorized using outline data prepared for each character type (font type and style).
Alternatively, the process may be switched automatically according to the OCR accuracy (for example, the recognition rate or similarity of the OCR result can serve as an index of OCR accuracy): if the OCR accuracy is high, "vectorization based on the OCR result and the outline data of each character type" is executed; if the OCR accuracy is low or no corresponding font is available, "vectorization that extracts outline vectors from the outlines of the character images" is executed.
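The automatic switching described above can be illustrated with a minimal sketch. The function name, the returned labels, and the threshold value of 0.9 are assumptions made for illustration only; the description does not specify how the OCR accuracy is thresholded.

```python
def choose_text_vectorization(ocr_accuracy, font_available, threshold=0.9):
    """Pick a vectorization path for one text block.

    ocr_accuracy: recognition rate or similarity of the OCR result (0.0 to 1.0).
    font_available: True if outline data exists for the recognized font.
    threshold: hypothetical cut-off; not taken from the description.
    """
    if ocr_accuracy >= threshold and font_available:
        # OCR result plus per-font outline data
        return "font_outline"
    # Fall back to tracing the outlines of the character image itself
    return "image_outline"
```

For instance, a block recognized with high accuracy but in a font for which no outline data is installed would fall back to outline tracing of the character image.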
If outline data for each character type is installed in the MFP or the client PC, text data in a corresponding format can be generated by identifying the character codes of the OCR result together with their character size, style, and font, and this text data can be used as the vectorized data.
It is assumed that which of these vectorization sequences is to be used is set in advance.
A block determined to be a graphic region is vectorized by extracting the outline vectors of the line image.
For a block determined to be a halftone region (for example, a natural image region such as a photograph), a compression technique such as JPEG is used.
The processing contents for each block are set according to the characteristic of that block. These vectorization processes are applied to each block, and the layout information of each block is saved.
After the vectorization process, the flow advances to step S55, where the blocks are merged into one file according to the layout information and the file is converted into an application data file that can be edited by an application program. Details of the generation of the application data file are described later. For example, image data and vector data are embedded in a document, and the resulting document is saved in rtf (Rich Text Format) format. By loading the saved rtf file with an application that can handle it, not only the text data but also the figure and image data can be restored to an editable state. Note that the application data file to be produced is not limited to the rtf format, which can embed objects, and may instead be converted into another file format such as SVG (Scalable Vector Graphics).
The flow advances to step S56, and the file is registered in the local document management server 106 to which management PC 101 is connected. In addition, the positions, number, and characteristics (text, figure, or halftone image type) of the blocks obtained by the block segmentation process, the character codes obtained by OCR, and the pointer information indicating where the file is stored in document management server 106 are registered together in database 105.
After this, index information (including pointer information) to be stored as a two-dimensional barcode is generated in step S57, and the flow then advances to step S46.
When the flow advances to step S46, the user is notified of the location of the registered file. If a print instruction is issued in step S47, a two-dimensional barcode is generated based on the generated index information and composited onto the image data read from storage device 111 of MFP 100 (step S48), and printing is performed.
In the above description, a document is placed on MFP 100 and scanned. As described above, MFP 100 of this embodiment also functions as a network printer. Therefore, the same processing is executed when print data of a document image is received from client PC 102. That is, when the MFP receives print data from a client PC, it likewise temporarily stores one page of the document image in storage device 111. In this case, however, the location of the original file is reported from management PC 101 to the client PC via network 107.
The contents of each process are described in detail below.
[Block segmentation process]
In the block segmentation process (also called block selection), the scanned image data of one page (left side of Fig. 7) is recognized as clusters of objects, the characteristic of each block (text, graphic, photo, line, table, and so on) is determined, and the image data is divided into regions having different characteristics, as shown on the right side of Fig. 7.
One embodiment of the block segmentation process is described below.
The input image is binarized into a monochrome image, and contour tracing is performed to extract clusters of pixels bounded by black pixels. For a black pixel cluster with a large area, contour tracing is also applied to the white pixels inside the cluster to extract white pixel clusters. In addition, black pixel clusters are extracted recursively from within white pixel clusters having a predetermined area or more.
The resulting black pixel clusters are classified into regions with different characteristics according to their size and shape. For example, a pixel cluster whose aspect ratio is close to 1 and whose size falls within a predetermined range is determined to correspond to a character. A portion where adjacent characters are regularly arranged and can be grouped is determined to be a text region. A flat (low-profile) pixel cluster is classified as a line region; the range occupied by a black pixel cluster that contains regularly arranged rectangular white pixel clusters is classified as a table region; a region where pixel clusters of indeterminate shape are scattered is classified as a photo region; and any other pixel cluster of arbitrary shape is classified as a graphic region.
Fig. 8 shows the block information for each block obtained by the block segmentation process, and the input file information used to manage the blocks contained in the input image.
This information for each block is used to convert the data into a format that can be edited by an application program (this processing is hereafter called vectorization), or to perform a search.
[Detection of pointer information]
The process of extracting the storage location of a file from the image information (corresponding to step S43) is described below.
Fig. 9 is a flowchart showing the sequence for decoding a two-dimensional barcode (QR code symbol) added to a document image and outputting a data character string. Fig. 10 shows an example of a document 310 to which a two-dimensional barcode has been added.
When document 310 is scanned by MFP 100, the data processing device in MFP 100 executes various processes, and the scanned image data of one page is stored in storage device 111. CPU 30 reads this image data via LAN 109 and scans it to detect the position of the predetermined two-dimensional barcode symbol 311 based on the result of the block segmentation process described above. The position detection pattern of a QR code consists of identical position detection element patterns placed at three of the four corners of the symbol, so this position detection pattern is detected first (step S300).
Next, the format information adjacent to the position detection pattern is decoded to obtain the error correction level and the mask pattern applied to the symbol (step S301).
After the model of the symbol is determined (step S302), the bit pattern of the encoding region is XORed with the mask pattern obtained from the format information to remove the mask (step S303). The symbol characters are then read according to the placement rule corresponding to the model, and the message data and error correction code words are decoded (step S304).
It is then checked whether the decoded code contains an error (step S305). If an error is detected, the flow advances to step S306 to correct it. The data code words from the error-corrected data are divided into segments based on the mode indicators and character count indicators (step S307). Finally, the data characters are decoded according to the specified mode, and the result is output (step S308).
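The unmasking of step S303 can be sketched as follows. This is only an illustration under stated assumptions: the symbol is given as a list of rows of 0/1 modules, `is_data` is a hypothetical same-sized grid marking which modules belong to the encoding region, and only mask pattern 000 of the QR code specification (dark where row + column is even) is shown.

```python
def qr_mask_000(row, col):
    """Mask pattern 000: a module is masked where (row + col) is even."""
    return (row + col) % 2 == 0


def remove_mask(modules, is_data, mask=qr_mask_000):
    """XOR each encoding-region module with the mask bit.

    Modules outside the encoding region (position detection patterns,
    format information, and so on) are returned unchanged.
    """
    return [
        [bit ^ 1 if is_data[r][c] and mask(r, c) else bit
         for c, bit in enumerate(row)]
        for r, row in enumerate(modules)
    ]
```

Because the operation is an XOR, applying it a second time restores the masked symbol, which is why the same routine serves for masking and unmasking.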
Note that the data encoded in the two-dimensional barcode represents the address information (pointer information) of the corresponding file, which consists of path information including a file server name and a file name. Alternatively, the address information may be the URL of the corresponding file.
This embodiment has described document 310 to which pointer information is added as a two-dimensional barcode. Alternatively, the pointer information may be recorded as a character string. In this case, a block containing a character string that follows a predetermined rule (for example, a text block at a predetermined position) is detected by the block selection process described above, and character recognition is applied to the characters of the string indicating the pointer information, thereby obtaining the address information of the original file.
Pointer information can also be given by applying imperceptible modulation to the spacing between adjacent characters in the character strings of character block 312 or 313 of document 310 shown in Fig. 10, thereby embedding information in the character spacing. For example, the pointer information can be obtained by detecting the character spacing when the character recognition process (described later) is executed. Pointer information can also be added as an invisible digital watermark in natural image 314.
[File search based on pointer information]
The file search process based on pointer information (corresponding to step S45) is described below with reference to Fig. 11.
A file server is specified based on the address contained in the pointer information (step S400).
As described above, the file server is the document management server in this embodiment, but in some cases it may be a client PC, or MFP 100 itself if an image storage and management function is added to MFP 100.
Note that the address is a URL or path information including a server name and a file name.
Once the file server has been specified, the address is sent to the file server (step S401).
Upon receiving the address, the file server searches for the corresponding file (step S402). If the file is not found ("No" in step S403), the file server sends a message to that effect to the management PC.
On the other hand, if the file is found ("Yes" in step S403), the file is transferred to the request source (the management PC) as described above (step S408).
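Steps S400 to S403 can be sketched roughly as below. The sketch assumes that the address is either a URL or a server-name-plus-path string written with backslashes, and it stands in for the file servers with a plain dictionary; the function names and the dictionary layout are hypothetical and are not part of the description.

```python
from urllib.parse import urlsplit


def split_pointer(address):
    """Return (server, path) from a URL or a server-name-plus-path address."""
    if "://" in address:
        parts = urlsplit(address)
        return parts.netloc, parts.path
    # Path information of the form <backslash><backslash>server<backslash>path
    server, _, path = address.lstrip("\\").partition("\\")
    return server, "\\" + path


def find_original(address, servers):
    """Look the file up on the specified server; None means it was not found."""
    server, path = split_pointer(address)      # step S400
    return servers.get(server, {}).get(path)   # steps S401 to S403
```

A return value of `None` corresponds to the "No" branch of step S403, in which case the caller would fall through to the layout search.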
[File search process]
The search process based on the layout positions of the blocks generated by block segmentation (corresponding to step S51) is described below with reference to the flowcharts of Figs. 13A and 13B. This process is executed when the input document image has no pointer information, when pointer information is available but no digital file is found, or when the digital file is an image file.
The following describes a case in which the blocks extracted by block segmentation have the information shown in Fig. 8 (block information and input file information).
As the contents of this information, the characteristic, coordinate position, width, height, and availability of OCR information are given as examples. The characteristic classifies each block as text, line, photo, picture, table, or the like. For simplicity, the blocks are arranged in ascending order of the X coordinate. That is, in Fig. 8, X1<X2<X3<X4<X5<X6, and the blocks are named blocks 1, 2, 3, 4, 5, and 6 accordingly. The total number of blocks indicates the number of blocks contained in the input image of one page, and is "6" in Fig. 8. The processing sequence for performing a layout search of the database for files similar to the input file using this information is described below with reference to the flowcharts of Figs. 13A and 13B.
In the flow of these flowcharts, the information obtained by scanning with MFP 100 and applying the block segmentation process is compared in turn with the same type of information in database 105.
In step S510, the similarity levels and the like (described later) are initialized. In step S511, the total numbers of blocks are compared. If the result in step S511 is "Yes", the information of the blocks in each file is compared in turn. When block information is compared, the characteristic, size, and OCR similarity levels are calculated in steps S513, S515, and S518, and the total similarity level is calculated from these levels in step S522. Known techniques can be used to calculate each similarity level, so their description is omitted. If it is determined in step S523 that the total similarity level is higher than a predetermined threshold Th, the file is determined to be a similar candidate in step S524. In Figs. 13A and 13B, N, W, and H are respectively the total number of blocks in the input document image, the width of each block, and the height of each block, and ΔN, ΔW, and ΔH are values that allow for error in the block information of the input file. Likewise, n, w, and h are respectively the total number of blocks of a file stored in the database, the width of each block, and the height of each block. Although not shown, position information X, position information Y, and the like may also be compared when sizes are compared in step S514.
The above search process is repeated for all data stored in database 105, and the database files whose similarity level exceeds the threshold Th and that were saved as candidates (step S524) are displayed as thumbnail images or the like (step S52). If the operator must choose among a plurality of candidates, the file is specified by the operator's input.
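The layout search loop can be sketched as follows. Since the description leaves the individual similarity calculations to known techniques, everything concrete here is an assumption: blocks are dictionaries with hypothetical keys `attr`, `w`, `h`, and `text`, each per-block score is the plain average of three partial scores, and the OCR similarity uses `difflib.SequenceMatcher` from the Python standard library as a stand-in.

```python
from difflib import SequenceMatcher


def layout_similarity(query, stored, dn=1, dw=10, dh=10):
    """Total similarity level between two block lists (0.0 to 1.0)."""
    # Step S511: compare the total numbers of blocks (N against n within dN)
    if abs(len(query) - len(stored)) > dn:
        return 0.0
    scores = []
    for q, s in zip(query, stored):
        attr = 1.0 if q["attr"] == s["attr"] else 0.0                 # characteristic
        size = 1.0 if (abs(q["w"] - s["w"]) <= dw
                       and abs(q["h"] - s["h"]) <= dh) else 0.0      # W, H within dW, dH
        ocr = SequenceMatcher(None, q.get("text", ""), s.get("text", "")).ratio()
        scores.append((attr + size + ocr) / 3)
    return sum(scores) / max(len(query), 1)


def find_candidates(query, database, th=0.8):
    """Return the keys of all stored files whose similarity exceeds Th."""
    return [key for key, blocks in database.items()
            if layout_similarity(query, blocks) > th]
```

The keys returned by `find_candidates` would be the pointer information of the candidate files, from which the thumbnails of step S52 are built.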
[Vectorization process]
The process of vectorizing image data (corresponding to step S54) is described in detail below.
In this embodiment, user interfaces are provided for setting the processing method for each type of block. The vectorization process is executed based on the settings made on these user interfaces.
For example, for a block of a text region, exactly one of "vector data", "binary image", or "multi-valued image" can be set, and "append OCR information" can be turned ON or OFF. When "vector data" is selected, the vectorization method can further be selected from "vectorization based on the outlines of the character images", "vectorization based on the OCR result and the outline data of each character type", "automatic switching based on the recognition rate", and the like, as described above (not shown in Fig. 23).
As shown in Fig. 24 or 25, for a block of a line image region (a region of a figure made up of lines, or the like) or a line block, exactly one of "vector data", "image", or "include in background object" can be set.
As shown in Fig. 26, for a block of a table region, exactly one of "vector data", "binary image", "multi-valued image", or "include in background object" can be set, and "append OCR information" can be turned ON or OFF. However, when the block is included in the background object, "append OCR information" is always OFF.
In addition, as shown in Fig. 27, for an image region such as a photograph, either "image" or "include in background object" can be set.
As default settings, it is assumed that "vector data" (vectorization based on outlines) and "append OCR information" are set for text regions, "vector data" is set for line image regions and line regions, "vector data" and "append OCR information" are set for table regions, and "image" is set for photo regions. When the user inputs an instruction to change the settings, the user interfaces shown in Figs. 23 to 27 are called up so that the desired values can be set. These settings are written to a non-volatile storage area that retains its contents even after power-off, such as a hard disk, and are read from the hard disk and applied automatically at the next and subsequent startups. That is, once these settings are made, processing is performed according to them unless they are changed.
The processing based on the contents set through the above interfaces is detailed below. For example, if "binary image" is set for a text region, the text region is binarized to obtain the vectorization result; if "multi-valued image" is set, the text region is compressed by, for example, JPEG to obtain the vectorization result. If "binary image" or "multi-valued image" is set and "append OCR information" is OFF, the image data alone is output as the vectorization result, without any character recognition result (the character recognition process is skipped); if "append OCR information" is ON, the character recognition process is executed, and the character codes, character sizes, and positions of the recognition result are output together with the image data as the vectorization result. If "vector data" is set, vectorization is executed based on the above settings. In this case, if "append OCR information" is ON, the character codes and the vector data are output as the vectorization result; if it is OFF, the vector data alone is output as the vectorization result.
For line image and line regions, the settings are likewise read, and processing is performed based on the settings read.
For example, when "vector data" is set, the outlines of the lines or line image in the block are extracted to generate outline vectors, thereby completing the vectorization process; that is, the data in the block is converted into reusable vector data. On the other hand, when "image" is set, the region is extracted as a single piece of image data and subjected to compression or the like to obtain the vectorization result. When "include in background object" is set, vectorization of the region is skipped, and the region is handled as part of the background object.
For table regions, the settings are likewise read, and processing is performed based on the settings read.
For example, when "vector data" is selected, the outlines of the ruled lines, characters, and so on are extracted to generate outline vectors, thereby completing the vectorization process. Note that an additional setting may be provided so that the ruled lines are vectorized based on outlines while the characters undergo the same vectorization as the text regions described above. When "append OCR information" is ON, the character codes and the vector data are output as the vectorization result; when it is OFF, the vector data alone is output. When "binary image" or "multi-valued image" is set and "append OCR information" is OFF, only the image data of the region is output as the vectorization result, without any character recognition result (no character recognition is performed). On the other hand, when "append OCR information" is ON, the character recognition process is executed, and the character codes, character sizes, and positions of the recognition result are output together with the image data as the vectorization result. When "include in background object" is set, the region is handled as part of the background object without any vectorization.
For photo regions, the settings are likewise read, and processing is performed based on the settings read. For example, when "image" is set, the region is compressed by, for example, JPEG. When "include in background object" is set, the region is handled as part of the background object.
[Character recognition]
In the character recognition process, the image extracted for each character is recognized using a pattern matching method to obtain the corresponding character code. In this recognition process, an observed feature vector, obtained by converting features extracted from the character image into a numerical sequence of several tens of dimensions, is compared with dictionary feature vectors obtained in advance for each character type, and the character type with the shortest distance is output as the recognition result. Various known methods can be used to extract the feature vector. For example, one known method divides a character into a mesh pattern and counts the character lines in each mesh cell as line elements according to their direction, yielding a (mesh count)-dimensional vector as the feature.
When character recognition is applied to a text region extracted by the block segmentation process, the writing direction (horizontal or vertical) of the region is determined, lines are extracted in the corresponding direction, and character images are then extracted. To determine the writing direction, the horizontal and vertical projections of the pixel values in the region are calculated; if the variance of the horizontal projection is greater than that of the vertical projection, the region is determined to be horizontally written; otherwise, it is determined to be vertically written. To decompose the region into character strings and characters in the case of horizontal writing, lines are extracted using the horizontal projection, and characters are extracted from the vertical projection of each extracted line. For a vertically written text region, the roles of horizontal and vertical are swapped. The character size can be detected at this time.
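The writing direction test can be sketched as follows, assuming the text region is given as a list of rows of 0/1 pixels (1 for black) and taking the "horizontal projection" to mean the per-row pixel sums and the "vertical projection" the per-column sums. That reading, the function name, and the use of the population variance are assumptions made for this illustration.

```python
from statistics import pvariance


def writing_direction(bitmap):
    """Return "horizontal" or "vertical" for a binarized text region."""
    row_sums = [sum(row) for row in bitmap]          # horizontal projection
    col_sums = [sum(col) for col in zip(*bitmap)]    # vertical projection
    # Horizontally written text alternates text lines and blank gaps,
    # so the per-row sums vary more than the per-column sums.
    if pvariance(row_sums) > pvariance(col_sums):
        return "horizontal"
    return "vertical"
```

The same per-row sums can then be reused to cut the region into lines, since blank gaps between lines show up as runs of zeros in the projection.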
[Font recognition]
Dictionaries for character recognition are prepared for each font type, and the font is output together with the character code upon matching, thereby recognizing the font of the character. That is, when a dictionary whose feature vector is at a short distance is identified by character recognition, the font of that dictionary is taken as the font of the character image.
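Character and font recognition by shortest distance can be sketched together as below. The nested dictionary layout (font name, then character code, then feature vector) and the use of the Euclidean distance are assumptions for illustration; the description only requires that the dictionary entry with the shortest distance be chosen.

```python
import math


def recognize(feature, dictionaries):
    """Return (character code, font) of the closest dictionary feature vector.

    dictionaries: {font_name: {char_code: feature_vector}}, one dictionary
    per font type, as assumed for this sketch.
    """
    font, code, _ = min(
        ((font, code, math.dist(feature, vec))
         for font, chars in dictionaries.items()
         for code, vec in chars.items()),
        key=lambda entry: entry[2],
    )
    return code, font
```

In practice the feature vectors would have several tens of dimensions, as described above; two-dimensional vectors are enough to exercise the sketch.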
[Character vectorization based on the OCR result]
Using the character codes and font information obtained by character recognition and font recognition, together with the outline data prepared for each character, the information of the character portion is converted into vector data. If the scanned image contains colored characters, the color of each character is extracted and recorded with the vector data.
Through the above processing, the image information belonging to a text block can be converted, based on the OCR result, into vector data with nearly faithful shapes, sizes, and colors.
[Vectorization based on outlines]
When outline-based vectorization of character images is set for a text region, or when vectorization is executed on a line image (figure), line, or table region, the information of the region is converted into vector data based on the outlines of the extracted pixel clusters, as described below.
More specifically, the point sequence of pixels forming an outline is divided into sections at points regarded as corners, and each section is approximated by a partial line or curve. A corner is a point of maximal curvature. As shown in Fig. 14, such a point is obtained as a point Pi at which the distance between Pi and the chord drawn between points Pi-k and Pi+k, which are k points to the left and right of Pi, becomes maximal. In addition, let R be (chord length)/(arc length) between Pi-k and Pi+k. A point at which R is equal to or smaller than a threshold can then be regarded as a corner. The sections obtained by dividing at the corners can be vectorized by applying the least squares method or the like to the point sequence for a line, and a cubic spline function or the like for a curve.
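The two corner criteria can be sketched as follows for a closed outline given as a list of (x, y) points. The values of k and of the ratio threshold are assumptions chosen for the example; the description does not give concrete numbers.

```python
import math


def chord_distance(p, a, b):
    """Distance from point p to the chord through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    chord = math.hypot(bx - ax, by - ay)
    if chord == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / chord


def corner_indices(points, k=2, ratio_threshold=0.75):
    """Indices i where R = chord length / arc length is at or below the threshold."""
    n = len(points)
    corners = []
    for i in range(n):
        a, b = points[(i - k) % n], points[(i + k) % n]
        chord = math.dist(a, b)
        # Arc length: sum of the 2k steps along the outline from Pi-k to Pi+k
        arc = sum(math.dist(points[j % n], points[(j + 1) % n])
                  for j in range(i - k, i + k))
        if arc > 0 and chord / arc <= ratio_threshold:
            corners.append(i)
    return corners
```

On a straight stretch the chord and the arc coincide, so R is 1; R drops as the outline bends, which is what makes it usable as a corner test. `chord_distance` gives the alternative criterion, the distance from Pi to the chord, whose local maxima mark the same points.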
When an object has an inner outline, it is similarly approximated by partial lines or curves using the point sequence of the white pixel outline extracted by block selection.
As described above, using partial line approximation of outlines, the outline of a figure of arbitrary shape can be vectorized. When the original document is a color image, the color of the figure is extracted from the color image and recorded with the vector data.
As shown in Fig. 15, when an outer outline is close to an inner outline or another outer outline over a given section, the two outlines can be combined and expressed as a line with a given width. More specifically, lines are drawn from each point Pi on one outline to the point Qi on the other outline that is at the shortest distance from it. When the distances PiQi remain roughly constant on average, the section is approximated by a line or curve using the midpoints Ri as the point sequence, and the mean of the distances PiQi is set as the width of that line or curve. In this way, a line, or a table ruled line formed as a set of lines, can be efficiently vectorized as a set of lines having a given width.
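The combination of two nearby outlines into one line with a width can be sketched as below. The test for "roughly constant" distances uses the population standard deviation against a hypothetical tolerance; that choice and the function name are assumptions, not part of the description.

```python
import math
from statistics import mean, pstdev


def merge_outlines(first, second, tolerance=0.5):
    """Return (midpoints Ri, line width), or None if the spacing is not constant.

    first, second: point sequences (x, y) of two facing outline sections.
    """
    # For each Pi, pick the point Qi on the other outline at the shortest distance
    pairs = [(p, min(second, key=lambda q: math.dist(p, q))) for p in first]
    distances = [math.dist(p, q) for p, q in pairs]
    if pstdev(distances) > tolerance:
        return None
    midpoints = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in pairs]
    return midpoints, mean(distances)
```

The returned midpoints would then be approximated by a line or curve in the same way as any other point sequence, with the mean distance kept as the stroke width.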
[Figure recognition]
The following describes the process of grouping the vectorized partial lines into individual graphic objects after the outlines of figures of arbitrary shape have been vectorized as described above.
Fig. 16 shows the processing sequence up to the grouping of vector data into individual graphic objects.
The start and end points of each piece of vector data are calculated (step S700). Using the start point and end point information of each vector, graphic elements are detected (step S701). Detecting a graphic element means detecting a closed figure formed by partial lines. Detection relies on the principle that each vector forming a closed shape has vectors connected to both of its ends. Next, other graphic elements or partial lines lying inside a graphic element are grouped with it to form a single graphic object (step S702). If no other graphic element or partial line lies inside a graphic element, that graphic element itself is set as a graphic object.
Fig. 17 is a flowchart showing the flow of graphic element detection. Unwanted vectors, whose two ends are not both connected to other vectors, are removed from the vector data to extract the vectors that form closed figures (step S710). Taking the start point of one of the closed-figure-forming vectors as the starting point, the vectors are traced clockwise in turn. This is continued until the starting point is reached again, and all vectors passed through are grouped into a closed figure that forms one graphic element (step S711). All closed-figure-forming vectors lying inside that closed figure are grouped as well. The start point of a vector that has not yet been grouped is then taken as a new starting point, and the above processing is repeated. Finally, among the unwanted vectors removed in step S710, those connected to vectors grouped into a closed figure in step S711 are detected and grouped into the same graphic element (step S712).
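The removal of unwanted vectors in step S710 can be sketched as follows, with each vector represented as a pair of end points. Repeating the removal until nothing changes is an assumption of this sketch (it also discards chains of dangling vectors); the description only states that vectors whose two ends are not both connected are removed.

```python
from collections import Counter


def closed_figure_vectors(vectors):
    """Keep only vectors whose both end points touch another vector."""
    remaining = list(vectors)
    while True:
        # Number of vector ends meeting at each point
        degree = Counter(point for vector in remaining for point in vector)
        kept = [v for v in remaining
                if degree[v[0]] > 1 and degree[v[1]] > 1]
        if len(kept) == len(remaining):
            return kept
        remaining = kept
```

The vectors that survive are the candidates to be traced into closed figures in step S711; the removed ones are revisited in step S712.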
With the above processing, a graphic block can be handled as an independently reusable graphic object.
[Conversion to application data]
The results of block segmentation and vectorization of one page of image data are converted into a file in an intermediate data format. This intermediate data format is hereafter called the document analysis output format (DAOF). Fig. 18 shows the data structure of the DAOF.
Referring to Fig. 18, reference numeral 791 denotes a header, which holds information about the document image data to be processed. Layout description data field 792 holds the characteristic information and block address information of each block identified by characteristic, such as TEXT (text), TITLE (title), CAPTION (caption), LINEART (line image), PICTURE (natural image), FRAME (frame), and TABLE (table). Character recognition description data field 793 holds the character recognition results obtained by applying character recognition to text blocks such as TEXT, TITLE, and CAPTION. Table description data field 794 stores details of the structure of TABLE blocks. Image description data field 795 stores the vector data generated by the vectorization of each block (or image data when the block is set to be saved as an image), the image data of PICTURE blocks, and so on.
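The DAOF layout can be pictured with a small sketch. The class and field names below are hypothetical and only mirror the five parts named in the description (791 to 795); the actual encoding of the DAOF is not given in the text.

```python
from dataclasses import dataclass, field

TEXT_LIKE = {"TEXT", "TITLE", "CAPTION"}


@dataclass
class LayoutBlock:
    attribute: str   # TEXT, TITLE, CAPTION, LINEART, PICTURE, FRAME, TABLE
    x: int
    y: int
    width: int
    height: int


@dataclass
class Daof:
    header: dict                                                # 791
    layout: list = field(default_factory=list)                  # 792
    character_recognition: dict = field(default_factory=dict)   # 793: block index -> text
    tables: dict = field(default_factory=dict)                  # 794: block index -> structure
    images: dict = field(default_factory=dict)                  # 795: block index -> vector or image data

    def text_block_indices(self):
        """Indices of the blocks whose results belong in field 793."""
        return [i for i, block in enumerate(self.layout)
                if block.attribute in TEXT_LIKE]
```

Keying the three description fields by block index is one simple way to tie each result back to its entry in the layout description.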
The DAOF itself is often saved as a file rather than used only as intermediate data. In that file state, however, a general document creation application cannot reuse the individual objects. The process of converting the DAOF into application data is therefore described in detail below.
Figure 19 shows the overall processing sequence for conversion into application data.
At step S8000, DAOF data is input. At step S8002, a file structure tree that serves as the basis of the application data is generated. At step S8004, the actual data in the DAOF is input according to the file structure tree, thereby generating the actual application data.
Figure 20 is a detailed flowchart of the structure tree generation processing of step S8002, and Figures 21A and 21B are explanatory views of the file structure tree. As a basic rule of the overall control, the processing proceeds from microblocks (single blocks) to macroblocks (sets of blocks). In the description of Figure 20, "block" refers to both microblocks and macroblocks.
At step S8100, the blocks are regrouped according to their relevance in the vertical direction. Immediately after the flow starts, the determination is made for individual microblocks. Relevance can be determined by checking, for example, whether the distance between adjacent blocks is small and whether the blocks have nearly the same width (or height, in the case of the horizontal direction). Information such as distance, width, and height can be extracted by referring to the DAOF.
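As a rough illustration of the relevance test of step S8100, the Python sketch below groups blocks that lie closely below one another and have nearly equal widths. The thresholds `max_gap` and `width_tol`, the `Block` class, and the coordinates in the usage lines are all assumptions chosen for the example; the text only says that the distance should be small and the widths nearly the same.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    name: str
    x: int
    y: int
    w: int
    h: int

def related_vertically(a: Block, b: Block,
                       max_gap: int = 20, width_tol: float = 0.1) -> bool:
    """True if b sits closely below a and both have nearly the same width."""
    gap = b.y - (a.y + a.h)
    similar_width = abs(a.w - b.w) <= width_tol * max(a.w, b.w)
    return 0 <= gap <= max_gap and similar_width

def group_vertically(blocks: List[Block]) -> List[List[str]]:
    """Chain blocks top to bottom, starting a new group when relevance breaks."""
    groups: List[List[Block]] = []
    # Visit blocks column by column, each column from top to bottom.
    for block in sorted(blocks, key=lambda b: (b.x, b.y)):
        if groups and related_vertically(groups[-1][-1], block):
            groups[-1].append(block)
        else:
            groups.append([block])
    return [[b.name for b in g] for g in groups]

# Usage with made-up coordinates for two columns of blocks.
page = [
    Block("T3", 0, 100, 100, 40), Block("T4", 0, 150, 100, 40),
    Block("T5", 0, 200, 100, 40),
    Block("T6", 120, 100, 100, 40), Block("T7", 120, 150, 100, 40),
]
print(group_vertically(page))  # [['T3', 'T4', 'T5'], ['T6', 'T7']]
```

The same test with the roles of x and y, and of width and height, exchanged would serve for the horizontal direction.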
Figure 21A shows an actual page structure, and Figure 21B shows the file structure tree of that page.
As a result of the processing of step S8100, T3, T4, and T5 are determined to form one group V1, and T6 and T7 are determined to form another group V2. These groups are generated as groups belonging to the same layer.
At step S8102, the presence or absence of an isolation mark in the vertical direction is checked. Physically, an isolation mark is an object that has the line characteristic in the DAOF.
Logically, an isolation mark is an element that explicitly divides blocks in an application. When an isolation mark is detected, the group is divided again within the same layer.
Then, at step S8104, the group length is used to check whether any further division remains. If the group length in the vertical direction equals the page height (the distance between the uppermost and lowermost ends of the blocks on the page), generation of the file structure tree ends.
In the case of Figures 21A and 21B, groups V1 and V2 have no isolation mark, and their group heights do not equal the page height. The flow therefore advances to step S8106.
At step S8106, the blocks are regrouped according to their relevance in the horizontal direction. The definition of relevance and the information used to determine it are the same as for the vertical direction.
In the case of Figures 21A and 21B, T1 and T2 form group H1, and V1 and V2 form group H2. Groups H1 and H2 are generated as groups belonging to the same layer, one level above V1 and V2.
At step S8108, the presence or absence of an isolation mark in the horizontal direction is checked.
Because Figures 21A and 21B include isolation mark S1, this mark is registered in the tree, producing the layer consisting of H1, S1, and H2.
At step S8110, the group length is used to check whether any further division remains.
If the group length in the horizontal direction equals the page width (the distance between the leftmost and rightmost ends of the blocks on the page), generation of the file structure tree ends. Otherwise, the flow returns to step S8102, and the processing is repeated one layer higher, starting from the relevance check in the vertical direction.
In the case of Figures 21A and 21B, the group length equals the page width, so the processing ends. Finally, the topmost layer V0, which represents the whole page, is added to the file structure tree.
After the file structure tree is completed, application data is generated at step S8006 based on this information.
A concrete example for the case of Figures 21A and 21B is as follows. Because H1 contains two blocks, T1 and T2, in the horizontal direction, it is output as two columns. After the internal information of T1 is output (text obtained as the character recognition result, images, and so on, with reference to the DAOF), a new column is started and the internal information of T2 is output. Isolation mark S1 is then output.
Because H2 contains two blocks, V1 and V2, in the horizontal direction, it is also output as two columns. The internal information of V1 is output in the order T3, T4, T5, and a new column is started. The internal information of V2 is then output in the order T6, T7. Because the conversion into application data proceeds in this output order, the converted application data can, for example, preserve the correct reading order of the text regions.
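The output order described above follows from a depth-first, left-to-right walk of the file structure tree. The Python sketch below rebuilds the tree of Figure 21B by hand from the group names in the text and lists the leaves in output order. The `Node` class and the `reading_order` function are hypothetical names, and column handling is left out.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def reading_order(node: Node) -> List[str]:
    """Depth-first, left-to-right walk that lists leaves in output order."""
    if not node.children:
        return [node.name]
    order: List[str] = []
    for child in node.children:
        order.extend(reading_order(child))
    return order

# Tree of Figure 21B as described in the text.
v1 = Node("V1", [Node("T3"), Node("T4"), Node("T5")])
v2 = Node("V2", [Node("T6"), Node("T7")])
h1 = Node("H1", [Node("T1"), Node("T2")])
h2 = Node("H2", [v1, v2])
v0 = Node("V0", [h1, Node("S1"), h2])

print(reading_order(v0))
# ['T1', 'T2', 'S1', 'T3', 'T4', 'T5', 'T6', 'T7']
```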
[appending pointer information]
The pointer information appending processing of step S48 in Fig. 5 is described below.
When a stored document is recorded on a sheet of paper, image data based on the pointer information is additionally recorded. When the printed document is later copied again, the original file data can easily be obtained, and a high-quality print can be produced.
Figure 22 is a flowchart showing the sequence for encoding (imaging) a data character string serving as the pointer information into a two-dimensional barcode (QR code symbol: JIS X0510) and appending the resulting barcode to an image.
The data encoded in the two-dimensional barcode represents the address information of the corresponding file. The address information consists of, for example, path information that includes the file server (document management server) name and the file name. Alternatively, it may consist of the URL of the corresponding file, a file ID managed in the database 105 that stores the corresponding file or in the storage device of the MFP 100 itself, or the like.
To identify the different types of characters to be encoded, the input data sequence is analyzed. An error detection and correction level is also selected, and the smallest model number (version) that can store the input data is selected (step S900).
The input data sequence is converted into a predetermined bit sequence, and an indicator of the mode (numeric, alphanumeric, kanji, and so on) and a terminator pattern are appended as required. The bit sequence is then converted into predetermined bit code words (step S901).
At this point, for error correction, the code word sequence is divided into a predetermined number of blocks according to the model number and the error correction level, error correction code words are generated for each block, and these are appended after the data code word sequence (step S902).
The data code words of the blocks obtained at step S902 are concatenated, followed by the error correction code words of each block and, if necessary, remainder code words (step S903).
Next, the code word modules are placed in a matrix together with the position detection patterns, separation patterns, timing patterns, and alignment patterns (step S904).
An optimal mask pattern for the encoding region of the symbol is then selected, and the mask pattern is applied by XORing it with the modules obtained at step S904 (step S905).
Finally, format information and model number (version) information are generated for the modules obtained at step S905, which completes the two-dimensional code symbol (step S906).
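The first two steps (S900 and S901) can be sketched in Python for the simplest case. The sketch assumes a small symbol (QR versions 1 to 9, where the character count of numeric mode occupies 10 bits) and covers numeric mode only. Error correction, module placement, and masking (S902 to S906) are not shown, and the terminator is appended without checking the symbol capacity. The function names are hypothetical.

```python
DIGITS = set("0123456789")
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

def select_mode(data: str) -> str:
    """S900 (part): pick the most compact mode that fits every character."""
    if all(ch in DIGITS for ch in data):
        return "numeric"
    if all(ch in ALPHANUMERIC for ch in data):
        return "alphanumeric"
    return "byte"

def encode_numeric(data: str) -> str:
    """S901 (part): mode indicator, character count, packed digits, terminator."""
    if select_mode(data) != "numeric":
        raise ValueError("numeric mode needs digits only")
    bits = "0001"                              # numeric mode indicator
    bits += format(len(data), "010b")          # 10-bit count, versions 1 to 9
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        width = {3: 10, 2: 7, 1: 4}[len(group)]   # bits per digit group
        bits += format(int(group), f"0{width}b")
    return bits + "0000"                       # terminator (capacity not checked)

print(encode_numeric("01234567"))
```

A pointer string such as a server path would normally fall into byte mode, since the alphanumeric set has no lowercase letters or backslashes. The numeric case is shown only because its bit packing is the easiest to follow.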
When, for example, a digital file to be printed is converted into raster data, the two-dimensional barcode that incorporates the address information is appended at a predetermined position on the raster data, and image formation is then carried out. A user who receives a sheet on which an image has been formed in this way scans the sheet with the image scanning unit 110, so that the storage location of the original digital file can be detected from the pointer information in step S43.
As other means of attaching additional information for the same purpose, the following may be used: a method of appending the pointer information directly to the document as a character string; so-called watermarking methods, including a method of embedding information by modulating the spacing of the character strings in the document (especially the spacing between adjacent characters) and a method of embedding information in a halftone image in the document; and the like.
[another embodiment relating to file access rights]
In general, when a network is built and a file server, typified by a document management server, is installed, reuse by third parties is often restricted.
The above embodiment was described on the premise that all files stored in the document management server can be freely accessed and that an entire file or some of its objects can be reused. Another embodiment is therefore described below with reference to Figure 12 for the case in which, when a file is searched for using the pointer information as in the above embodiment, the access rights to the file specified as the search result are restricted. Figure 12 replaces Figure 11.
Steps S400 to S403 are the same as those in the embodiment of Figure 11, and their description is omitted.
When the file is specified, the document management server checks the access right information of the file. If access to the file is restricted (step S404), the document management server requests the management PC 101 to send a password (step S405). The management PC 101 prompts the operator to enter a password and sends the entered password to the file server (step S406). In this embodiment, because a touch panel is used for input, a virtual keyboard is displayed on the display device 35, and the password is entered by touching the virtual keys.
The document management server authenticates the received password (step S407). If authentication succeeds, the server notifies the file address and, if the processing desired by the user is acquisition of image file data, transfers the file to the management PC 101 (step S408). Note that the authentication method used for access right control is not limited to the password method of steps S405 and S406. Any other authentication means may be adopted, such as widely used biometric authentication (for example, fingerprint authentication) or authentication using a card.
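The exchange of steps S404 to S408 can be mimicked in a few lines of Python. This is only a sketch under assumptions: the `DocumentServer` class and its methods are hypothetical, the network exchange between the management PC and the server is collapsed into one method call, and an unsalted SHA-256 digest is used purely for brevity (a real system would use a salted, slow password hash).

```python
import hashlib
import hmac
from typing import Dict, Optional

class DocumentServer:
    """Toy stand-in for the document management server (hypothetical API)."""

    def __init__(self) -> None:
        self._address: Dict[str, str] = {}
        self._password_hash: Dict[str, bytes] = {}

    @staticmethod
    def _digest(password: str) -> bytes:
        # Illustration only: unsalted digest, not suitable for production.
        return hashlib.sha256(password.encode("utf-8")).digest()

    def register(self, file_id: str, address: str,
                 password: Optional[str] = None) -> None:
        self._address[file_id] = address
        if password is not None:
            self._password_hash[file_id] = self._digest(password)

    def is_restricted(self, file_id: str) -> bool:
        """S404: does the file have restricted access rights?"""
        return file_id in self._password_hash

    def fetch_address(self, file_id: str,
                      password: Optional[str] = None) -> Optional[str]:
        """Return the file address, or None when authentication is missing or fails."""
        if self.is_restricted(file_id):
            if password is None:              # S405: a password is required
                return None
            expected = self._password_hash[file_id]
            if not hmac.compare_digest(self._digest(password), expected):  # S407
                return None
        return self._address[file_id]         # S408: notify the address

# Usage with made-up file ids and addresses.
server = DocumentServer()
server.register("doc-1", "//server/share/a.pdf")
server.register("doc-2", "//server/share/b.pdf", password="s3cret")
```

Swapping the password check for another factor, such as a fingerprint or card check, would only change the body of `fetch_address`; the surrounding flow stays the same.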
In the above embodiment, the file is specified by the pointer information additionally provided on the paper document. The same control can be applied when the file is specified by the search processing of steps S51 and S52 in Fig. 5.
The vectorization processing described in steps S54 to S56 of Fig. 5 can also be restricted. That is, when a restriction on the access rights to a given paper document is detected from a watermark or the like in the image information obtained by scanning that document, the vectorization processing is executed only when authentication succeeds, thereby limiting the use of highly confidential documents.
[another embodiment relating to file specification]
In the above embodiment, as described with reference to Fig. 5, the original file data is specified from the image information obtained by scanning a document either on the basis of the pointer information appended to the document or by searching for the corresponding digital file according to the object information described in the document. To specify the original file more accurately, both means can be used together. That is, even when the existence of the original file can be detected from the pointer information obtained from the document, a layout search based on layout information or a full-text search based on keywords obtained by character recognition is further performed on the detected file using the object information in the document, and the file is formally specified as the original file only if a high matching rate is obtained. For example, even when the lower part of the pointer information is doubtful and cannot be corrected by error correction, the file can still be specified by narrowing the search range. The file can therefore be specified quickly and accurately.
[another embodiment of text region vectorization]
Another embodiment of text region vectorization is described below.
At the start of the processing of one document, a dictionary in which no vector data is stored is prepared. When a given text block is vectorized, one character is extracted from the text block and features are extracted from the character image. The extracted features are matched against the dictionary. If they do not match, the feature extraction result of the character is registered in the dictionary, and intermediate data that refers to the character number of the newly registered character is generated. If the extracted features match an entry in the dictionary, intermediate data that refers to the character number of that character in the dictionary is generated. The intermediate data consists of the position and size of the character in the image and the character number of the matching character in the dictionary, and has, for example, the structure shown in Figure 31. "x" is the upper-left x coordinate of the character rectangle, "y" is the upper-left y coordinate of the character rectangle, "w" is the width of the character rectangle in pixels, "h" is the height of the character rectangle in pixels, and "n" indicates that the character matches the n-th character in the dictionary. When the processing of one text block is complete, vector data is generated from the images of the characters registered in the dictionary, and the intermediate data and the generated character vector data are combined to obtain the vectorization result of the text block.
Figure 30 is a flowchart showing the processing sequence for one text block.
Let Nv be the number of characters registered in the dictionary at the start of processing, and let N be the number of characters registered in the dictionary during processing. Thus Nv = N at the start of processing.
In the character extraction of step S1201, one character region is extracted from the image data of the input text block. Step S1202 checks whether a character was extracted in step S1201. If "Yes" at step S1202, the flow advances to step S1203; otherwise, it advances to step S1208.
At step S1203, the features of the character are extracted. These features include the aspect ratio, the position of the center of gravity, a histogram vector of the outline data, and the like (other features may also be used).
At step S1204, the features of the character extracted at step S1203 are matched against the characters registered in the dictionary. First, the aspect ratio and the center-of-gravity position are compared with those of the first character in the dictionary. If they differ greatly, the character is obviously a different character, so the next dictionary character is selected for comparison without comparing any other information. If the two aspect ratios are nearly equal, the histogram vectors of the outline data are compared. In this case, the ordinary distance between the vectors is calculated, and the smaller the distance, the better the match. Characters whose distance is equal to or less than a predetermined value are extracted as candidate characters. Matching against all characters in the dictionary is performed in this way.
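The two-stage comparison of step S1204 might look like the following Python sketch. The tolerances `coarse_tol` and `max_dist`, the `Feature` class, and the choice of returning the closest candidate are assumptions; the text only states that a cheap comparison of aspect ratio and center of gravity comes first and that an ordinary distance between histogram vectors is then compared with a predetermined value.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Feature:
    aspect: float                    # width / height of the character box
    centroid: Tuple[float, float]    # center of gravity, normalised to 0..1
    histogram: List[float]           # histogram vector of the outline data

def match(feature: Feature, dictionary: List[Feature],
          coarse_tol: float = 0.15, max_dist: float = 0.5) -> Optional[int]:
    """Return the index of the best-matching dictionary entry, or None."""
    best, best_dist = None, max_dist
    for n, entry in enumerate(dictionary):
        # Cheap rejection on aspect ratio and center of gravity.
        if abs(feature.aspect - entry.aspect) > coarse_tol:
            continue
        if math.dist(feature.centroid, entry.centroid) > coarse_tol:
            continue
        # Euclidean distance between the histogram vectors.
        d = math.dist(feature.histogram, entry.histogram)
        if d <= best_dist:
            best, best_dist = n, d
    return best

# Usage with made-up feature values.
dictionary = [
    Feature(0.5, (0.5, 0.5), [1.0, 0.0, 0.0]),
    Feature(1.0, (0.5, 0.5), [0.0, 1.0, 0.0]),
]
probe = Feature(1.02, (0.5, 0.52), [0.0, 0.9, 0.1])
print(match(probe, dictionary))  # 1
```

The early `continue` statements are what keep the search cheap: the histogram distance is computed only for entries that already agree on the coarse features.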
Step S1205 checks whether a matching character was found as a result of step S1204. If "Yes" at step S1205, the flow advances to step S1207; otherwise, it advances to step S1206.
At step S1206, the feature data of the character being processed is registered in the dictionary. The dictionary has the format shown in Figure 32, and the feature data is added at the end of the dictionary. The number of characters in the dictionary then increases by 1 (N = N + 1). The intermediate data ID stores the character number shown in Figure 31.
At step S1207, intermediate data that has the format shown in Figure 31 and is assigned the character number in the dictionary is generated.
When matching against the dictionary has been completed for all characters in the text block, step S1208 is executed. Because Nv characters were registered in the dictionary at the start of processing and N characters are registered at the end, vectorization is performed for the (N - Nv) newly registered characters. In this case, vector data is generated using at least one character image of each character stored in the dictionary.
At step S1209, the intermediate data and the vector data of the characters registered in the dictionary are combined, and the combined result is output as the vectorization result of the text block.
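Putting steps S1201 to S1209 together, the per-block loop can be sketched in Python as follows. To stay short, the sketch replaces feature matching with an exact comparison of a precomputed feature key and takes the outline tracer as a caller-supplied function `make_vector`; both are assumptions, as are the names `Intermediate` and `vectorize_text_block`. Character numbers are zero-based here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple

@dataclass
class Intermediate:
    x: int   # upper-left x of the character rectangle
    y: int   # upper-left y of the character rectangle
    w: int   # rectangle width in pixels
    h: int   # rectangle height in pixels
    n: int   # number of the matching dictionary entry

Char = Tuple[int, int, int, int, Hashable]   # x, y, w, h, feature key

def vectorize_text_block(chars: List[Char], dictionary: List[Hashable],
                         make_vector: Callable[[Hashable], str]):
    """Return (intermediate data, vectors for newly registered entries)."""
    nv = len(dictionary)                        # Nv: entries at the start
    intermediate: List[Intermediate] = []
    for x, y, w, h, feature in chars:           # S1201 to S1203
        if feature in dictionary:               # S1204, S1205 (exact match here)
            n = dictionary.index(feature)
        else:                                   # S1206: register a new entry
            dictionary.append(feature)
            n = len(dictionary) - 1
        intermediate.append(Intermediate(x, y, w, h, n))    # S1207
    # S1208: vectorize only the (N - Nv) entries added by this block.
    new_vectors: Dict[int, str] = {
        n: make_vector(dictionary[n]) for n in range(nv, len(dictionary))
    }
    return intermediate, new_vectors            # S1209 combines the two

# Usage: three characters, two of them identical.
chars = [(0, 0, 8, 10, "a"), (10, 0, 8, 10, "b"), (20, 0, 8, 10, "a")]
shared_dictionary: List[Hashable] = []
inter, vectors = vectorize_text_block(chars, shared_dictionary,
                                      lambda f: f"vec({f})")
print([d.n for d in inter], vectors)
```

Because the dictionary is passed in and mutated, a second text block of the same document reuses the entries registered by the first and only traces characters it has not seen before.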
With this processing, the same vector data is used for identical characters in each text block, so the characters in the document image are rendered consistently, which improves the appearance. In addition, because the vector data is generated from the character images in the document, vector data faithful to the scanned document can be produced.
[another embodiment of vectorization]
In the above embodiment, when the search device cannot specify the original file, the entire document image is vectorized. In a general document, however, not all objects are newly created; some may be carried over from other files. For example, a document creation application provides several patterns (wallpapers) of background objects, and the user normally selects and uses one of them. Such an object is therefore likely to exist as reusable vector data in other document files in the document file database.
Accordingly, as another embodiment of the vectorization processing of Fig. 5 (step S54), the database is searched for a file containing an object that substantially matches each of the objects obtained by block selection, and the vector data of each matched object is acquired individually from that file. As a result, the entire document need not be vectorized, so vectorization is faster and the image quality degradation caused by vectorization can be prevented.
On the other hand, when the original file can be specified as a PDF in the search processing of Fig. 5 (S51 to S53), that PDF often has, as an additional file, character codes obtained by applying character recognition to the text objects in the document. When such a PDF file is vectorized, using the character code file makes it possible to skip the character recognition in the vectorization processing of step S54 and subsequent steps. That is, the vectorization processing can be performed more quickly.
[another embodiment of vectorization]
When the vectorization processing is to be executed, an interface for confirming the current settings can be displayed on the display device 35, as shown in Figure 28. These settings are made using the interfaces shown in Figures 23 to 27 and are stored in a storage device such as a hard disk. When the user touches the "Convert" button in Figure 28, the vectorization processing starts. When the user touches the "Change" button, the settings windows shown in Figures 23 to 27 are displayed and the current settings can be changed. However, such changes are effective only for the current vectorization processing, and the setting information is not written to the hard disk. In this way, the vectorization processing can be changed according to the user's purpose or the document to be processed.
In addition, as shown in Figure 29, by providing a "Default" button, the changed settings can be written to a storage device such as the hard disk, thereby changing the settings that will be displayed as defaults in the next vectorization processing. In this way, vectorization processing suited to the user's purpose can be performed.
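The difference between a change that applies to the current run only and a change saved as the new default can be sketched in Python as follows. The class `VectorizationSettings`, the JSON file used as the store, and the setting keys are all assumptions made for the example; the text only says that settings are kept in a storage device such as a hard disk.

```python
import json
from pathlib import Path
from typing import Dict

class VectorizationSettings:
    """Per-characteristic settings with run-only and persistent changes."""

    def __init__(self, store: Path, factory: Dict[str, str]) -> None:
        self._store = store
        saved = json.loads(store.read_text()) if store.exists() else {}
        self._defaults = {**factory, **saved}
        self.current = dict(self._defaults)   # settings shown for this run

    def change(self, key: str, value: str) -> None:
        """'Change' button: affects the current run only, nothing is written."""
        self.current[key] = value

    def save_as_default(self) -> None:
        """'Default' button: persist the current settings for later runs."""
        self._defaults = dict(self.current)
        self._store.write_text(json.dumps(self._defaults))
```

A new `VectorizationSettings` object created after `change` alone still shows the old values, while one created after `save_as_default` shows the changed ones.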
As described above, according to this embodiment, when a paper document is copied by a copying machine, the location of the original digital file corresponding to the document can be identified, and printing is performed from the original digital file, so that a copy free from image quality degradation is obtained.
Even when a paper document whose original digital file is unavailable is scanned, vectorization is executed at an early stage to register a digital file of it, thereby suppressing further image quality degradation.
In addition, in the registration processing performed when a paper document whose original digital file is unavailable is scanned, the document is registered in a form that allows an application program to edit it again. This facilitates document correction and similar processing.
Because the paper document is converted into vector data, the required storage capacity is smaller than when the document image is stored as an image file.
Because the paper document is converted into vector data, image quality degradation can be suppressed even when image processing such as enlargement is applied.
Because the document image can be divided into regions by characteristic and the vectorization method can be set for each characteristic, data that matches the user's purpose can be produced.
As can easily be understood from the description of the embodiment, most of the processing that characterizes this embodiment is implemented by a computer program executed by the management PC 101, which is a general-purpose information processing apparatus. The present invention therefore includes such a computer program within its scope. Moreover, a program that runs on a computer is normally supplied on a computer-readable storage medium such as a CD-ROM and becomes executable after being copied or installed onto the system, so the present invention also includes such a computer-readable storage medium within its scope.
As described above, the present invention can provide the following environment. The original document file corresponding to a document to be copied is specified from the image data of that document, and printing is performed from the specified file to prevent image quality degradation. When the document to be copied has not been registered, registration processing is performed to suppress image quality degradation at an early stage.
Because many apparently widely different embodiments of the present invention can be made without departing from its spirit and scope, it is to be understood that the invention is not limited to its specific embodiments, but is defined by the claims.

Claims (27)

1. An image processing method, characterized by comprising:
a setting step of setting a vectorization processing method for each characteristic in accordance with a user's designation made through a user interface displayed on a display device;
a search step of searching for original digital data stored in a storage device on the basis of an input document image;
a vectorization step of, when original digital data corresponding to the input document image is not specified in said search step, dividing the document image into a plurality of regions according to the characteristics of the objects contained in the document image and, for each divided region, executing vectorization processing according to the vectorization processing method set for each characteristic in the setting step; and
a storing step of storing the vector data of the input document image converted in said vectorization step into the storage device searched in said search step.
2. The method as claimed in claim 1, further comprising a notification step of notifying the storage address of the original digital data corresponding to the input document image when the original digital data is specified in said search step, and notifying the storage address of the vector data of the input document image converted in said vectorization step when the original digital data corresponding to the input document image is not specified in said search step.
3. The method as claimed in claim 1, characterized in that said search step comprises the steps of: recognizing an identifier that is appended to the input document image and indicates the storage address of the original digital data; and searching for the original digital data according to the recognition result of the identifier.
4. The method as claimed in claim 3, characterized in that at least one of a two-dimensional barcode, a character string, and a watermark is used as the identifier indicating the storage address of the original digital data.
5. The method as claimed in claim 1, characterized in that said vectorization step comprises the steps of: extracting the outlines of significant pixels from a region having the text characteristic; and generating outline vectors according to the extracted outlines.
6. The method as claimed in claim 1, characterized in that said vectorization step comprises a step of generating vector data of a region having the text characteristic according to the character recognition result of the region and vector data predetermined for each character.
7. The method as claimed in claim 1, characterized in that said vectorization step comprises a step of generating vector data of a region having the text characteristic according to the character recognition result of the region and vector data prepared for each character type when the character recognition accuracy is high, and generating outline vectors according to the outlines of the pixels forming the characters when the character recognition accuracy is low.
8. The method as claimed in claim 1, characterized in that said vectorization step comprises the steps of: detecting identical character images in a region having the text characteristic; and generating vector data of the region by applying vector data generated from at least one of the identical character images to each of the detected identical character images.
9. The method as claimed in claim 1, characterized in that said vectorization step comprises the steps of: extracting the outlines of significant pixels from a region having the line or line image characteristic; and generating outline vectors according to the extracted outlines.
10. The method as claimed in claim 1, characterized in that said vectorization step comprises the steps of: extracting the outlines of significant pixels from a region having the table characteristic; and generating outline vectors according to the extracted outlines.
11. The method as claimed in claim 1, characterized in that said vectorization step comprises the steps of: generating outline vectors according to the outlines of the ruled lines in a region having the table characteristic; and applying, to the text part of the region having the table characteristic, vectorization processing similar to that applied to a region having the text characteristic.
12. The method as claimed in claim 1, characterized in that said vectorization step comprises a step of applying image compression processing to a region having the photo characteristic.
13. The method as claimed in claim 1, characterized in that said setting step comprises a step of allowing the user to set whether the region concerned is to be converted into vector data or handled as image data.
14. The method as claimed in claim 13, characterized in that said setting step comprises a step of allowing the user to set whether a character recognition result is to be appended to the vector data.
15. The method as claimed in claim 1, characterized in that said setting step comprises a step of allowing the user to set whether the region concerned is to be converted into vector data, handled as image data, or included in a background object.
16. The method as claimed in claim 1, characterized by further comprising a print control step of appending, to digital data stored in the storage device, information indicating the storage address of the digital data, and printing the digital data to which the information indicating the storage address has been appended.
17. The method as claimed in claim 1, characterized by further comprising:
a format conversion step of converting the vector data of the input document image converted in said vectorization step into a prescribed format that can be accessed by a document processing application program,
and further characterized in that said storing step comprises a step of storing the vector data of the input document image converted into the prescribed format in said format conversion step.
18. The method as claimed in claim 1, characterized in that said search step comprises a step of searching for the original digital data according to the character recognition result of a character image contained in the input document image.
19. The method as claimed in claim 1, characterized in that said search step comprises a step of searching for the original digital data according to the layout of the objects in the input document image.
20. The method as claimed in claim 1, characterized in that said search step comprises the steps of: displaying candidates when a plurality of candidates for the original document data corresponding to the input document image are found; and prompting the user to select one candidate in order to specify the original digital data corresponding to the input document image.
21. The method as claimed in claim 1, characterized in that said vectorization step further comprises a step of executing vectorization processing on the image when original digital data has been specified in the search step, the specified original digital data is image data, and a user instruction to execute vectorization processing is received.
22. The method as claimed in claim 1, wherein each of the divided regions is converted into reusable vector data by the vectorization processing executed in said vectorization step, and wherein the reusable vector data is stored in the storage device in said storing step.
23. The method as claimed in claim 22, wherein said vectorization step further comprises the steps of: searching the reusable vector data stored in the storage device for vector data of an object that substantially matches one of the divided regions; and acquiring the reusable vector data found by the search.
24. The method as claimed in claim 1, wherein said search step comprises a plurality of search processes, including at least two of the following: a search process using an identifier appended to the input document image, a search process using the character recognition result of a character image contained in the input document image, and a search process using the layout of the objects in the input document image.
25. The method as claimed in claim 1, wherein said search step comprises: a first search step of searching for the original digital data corresponding to the input document image by using an identifier appended to the input document image; and a second search step of searching, when the original digital data is not found in said first search step, for the original digital data corresponding to the input document image by using at least one of the layout of the objects and the character recognition result of a character image contained in the input document image.
26. An image processing system, characterized by comprising:
Setting means for setting a vectorization processing method for each attribute, in accordance with a user designation made through a user interface displayed on a display device;
Search means for searching for original digital data stored in a storage device, based on an input document image;
Vectorization means for, when the original digital data corresponding to the input document image is not specified by the search means, segmenting the document image into a plurality of regions in accordance with the attributes of the objects included in the document image, and performing vectorization processing on each segmented region in accordance with the vectorization processing method set for each attribute by the setting means; and
Storage control means for storing the vector data of the input document image, as converted by the vectorization means, in the storage device searched by the search means.
27. An information processing apparatus capable of controlling the operation of a device equipped with at least a scanner, characterized by comprising:
Setting means for setting a vectorization processing method for each attribute, in accordance with a user designation made through a user interface displayed on a display device;
Search means for searching for original digital data stored in a storage device, based on an input document image scanned by the scanner-equipped device;
Vectorization means for, when the original digital data corresponding to the input document image is not specified by the search means, segmenting the document image into a plurality of regions based on the attributes of the objects included in the document image, and performing vectorization processing on each segmented region in accordance with the vectorization processing method set for each attribute by the setting means; and
Storage control means for storing the vector data of the input document image, as converted by the vectorization means, in the storage device searched by the search means.
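The flow recited in claims 25 to 27 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name here (`Region`, `ScannedDocument`, `Storage`, `search_original`, `process`) is hypothetical, the storage is a pair of in-memory dictionaries, the second search uses only an exact match on the recognized text (layout matching is omitted), and the per-attribute vectorization methods are toy functions standing in for whatever the user selects through the user interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names in this sketch are hypothetical; the claims define no API.
VectorData = List[Tuple[str, int]]


@dataclass
class Region:
    """One segmented region of the scanned page."""
    attribute: str  # e.g. "text" or "photo"
    pixels: bytes


@dataclass
class ScannedDocument:
    identifier: Optional[str]  # decoded from the page, if one is present
    ocr_text: str              # character recognition result
    regions: List[Region] = field(default_factory=list)


@dataclass
class Storage:
    """In-memory stand-in for the storage that is searched and written to."""
    by_identifier: Dict[str, VectorData] = field(default_factory=dict)
    by_text: Dict[str, VectorData] = field(default_factory=dict)


def search_original(doc: ScannedDocument, storage: Storage) -> Optional[VectorData]:
    # First search: use the identifier appended to the document image.
    if doc.identifier is not None:
        hit = storage.by_identifier.get(doc.identifier)
        if hit is not None:
            return hit
    # Second search: fall back to the character recognition result.
    return storage.by_text.get(doc.ocr_text)


def process(
    doc: ScannedDocument,
    storage: Storage,
    methods: Dict[str, Callable[[Region], Tuple[str, int]]],
) -> VectorData:
    original = search_original(doc, storage)
    if original is not None:
        return original  # stored data found, so no vectorization is needed
    # Not found: vectorize each region with the method set for its attribute.
    vector_data = [methods[region.attribute](region) for region in doc.regions]
    # Register the result so that later searches can find it.
    key = doc.identifier or f"auto-{len(storage.by_identifier)}"
    storage.by_identifier[key] = vector_data
    storage.by_text[doc.ocr_text] = vector_data
    return vector_data


if __name__ == "__main__":
    # Toy per-attribute methods, standing in for the user's settings.
    methods = {
        "text": lambda r: ("outline", len(r.pixels)),
        "photo": lambda r: ("raster", len(r.pixels)),
    }
    storage = Storage()
    page = ScannedDocument(None, "hello", [Region("text", b"ab")])
    print(process(page, storage, methods))
```

Calling `process` a second time with the same page returns the stored entry instead of vectorizing again, which mirrors the intent of registering an unregistered document once and reusing it afterwards.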
CNB2004800031470A 2003-01-31 2004-01-27 Image processing method, system, program, program storage medium and information processing apparatus Expired - Fee Related CN100501728C (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP024965/2003 2003-01-31
JP2003024965 2003-01-31
JP035112/2003 2003-02-13
JP415485/2003 2003-12-12

Publications (2)

Publication Number Publication Date
CN1745381A CN1745381A (en) 2006-03-08
CN100501728C true CN100501728C (en) 2009-06-17

Family

ID=36140015

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004800031470A Expired - Fee Related CN100501728C (en) 2003-01-31 2004-01-27 Image processing method, system, program, program storage medium and information processing apparatus

Country Status (1)

Country Link
CN (1) CN100501728C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424349A (en) * 2013-08-22 2015-03-18 富士施乐株式会社 IMAGE RETRIEVAL SYSTEM, INFORMATION PROCESSING APPARATUS, and IMAGE RETRIEVAL METHOD

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008124996A (en) * 2006-11-15 2008-05-29 Canon Inc Image processing apparatus, and image processing method
US8599437B2 (en) * 2009-08-21 2013-12-03 Seiko Epson Corporation Printing control device, printer driver, conversion device, printer, printing system, control method for a printing control device, and control method for a printer
CN106161831A (en) * 2015-04-15 2016-11-23 佳能(苏州)系统软件有限公司 DPS and document processing method
CN108898205A (en) * 2017-05-09 2018-11-27 罗伯特·博世有限公司 The creation of binary graphics coding, authentication method and system
CN109360210B (en) * 2018-10-16 2019-10-25 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN115048915B (en) * 2022-08-17 2022-11-01 国网浙江省电力有限公司 Data processing method and system of electric power file based on operation platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1188946A (en) * 1996-12-27 1998-07-29 富士通株式会社 Apparatus and method for extracting management information from image
US5815704A (en) * 1995-02-22 1998-09-29 Kabushiki Kaisha Toshiba Document filing apparatus and method having document registration, comparison and retrieval capabilities
US6466329B1 (en) * 1997-03-28 2002-10-15 International Business Machines Corporation Method and apparatus for managing copy quality in the hardcopy or softcopy reproduction of either original pages or extrinsically received electronic page images

Also Published As

Publication number Publication date
CN1745381A (en) 2006-03-08

Similar Documents

Publication Publication Date Title
US7391917B2 (en) Image processing method
US7542605B2 (en) Image processing apparatus, control method therefor, and program
JP4251629B2 (en) Image processing system, information processing apparatus, control method, computer program, and computer-readable storage medium
JP4266784B2 (en) Image processing system and image processing method
US7640269B2 (en) Image processing system and image processing method
CN100448257C (en) Image processing apparatus and method therefor
US7681121B2 (en) Image processing apparatus, control method therefor, and program
CN100440108C (en) Image processing apparatus, control method therefor, and program
US8520006B2 (en) Image processing apparatus and method, and program
EP1455284A2 (en) Image processing method and image processing system
JP4393161B2 (en) Image processing apparatus and image processing method
JP3862694B2 (en) Image processing apparatus, control method therefor, and program
JP4338189B2 (en) Image processing system and image processing method
CN100501728C (en) Image processing method, system, program, program storage medium and information processing apparatus
JP4185858B2 (en) Image processing apparatus, control method therefor, and program
JP2005149097A (en) Image processing system and image processing method
JP2006134042A (en) Image processing system
JP2005157447A (en) Image processing system and method
JP2005149098A (en) Image processing system, image processor and image processing method
JP2006166207A (en) Information processor, information processing method, storage medium, and program
JP2008084127A (en) Image forming device
JP2006148663A (en) Image processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090617

Termination date: 20160127

EXPY Termination of patent right or utility model