CN102710887B - Image processing apparatus and image processing method - Google Patents

Image processing apparatus and image processing method

Info

Publication number
CN102710887B
Authority
CN
Grant status
Grant
Patent type
Prior art keywords
information
document
color
compression format
format
Prior art date
Application number
CN 201110409732
Other languages
Chinese (zh)
Other versions
CN102710887A (en)
Inventor
冈田茂
大谷和宏
小柳胜也
袖浦稔
安达真太郎
张臻瑞
上条裕义
Original Assignee
富士施乐株式会社

Abstract

The present invention discloses an image processing apparatus and an image processing method. The image processing apparatus includes the following components. A document type determining unit determines which document type a document belongs to on the basis of read information obtained as a result of reading the document with a document reader. A compression format setting unit sets, on the basis of the document type determined by the document type determining unit, a compression format used to generate image data from the read information. A generator compresses the read information using the compression format set by the compression format setting unit to generate image data corresponding to the document.

Description

Image processing apparatus and image processing method

Technical Field

[0001] The present invention relates to an image processing apparatus and an image processing method.

Background Art

[0002] Japanese Unexamined Patent Application Publication No. 2003-317034 discloses a document sorting system that uses a character recognition device to determine the frequency of occurrence of words stored in a basic dictionary and thereby determines the document type.

[0003] Japanese Unexamined Patent Application Publication No. 9-65143 discloses an image processing apparatus with a map document recognition function. The apparatus determines whether a document is a map and switches processing on the basis of the determination result.

[0004] Japanese Unexamined Patent Application Publication No. 2004-297212 discloses an image processing apparatus that determines the document type by finding features of data representing the entire content of an input document.

Summary of the Invention

[0005] An object of the present invention is to provide an image processing apparatus and an image processing method that can set a compression format suited to the document type without requiring the user to set the compression format.

[0006] According to a first aspect of the invention, there is provided an image processing apparatus including the following components. A document type determining unit determines which document type a document belongs to on the basis of read information obtained as a result of reading the document with a document reader. A compression format setting unit sets, on the basis of the document type determined by the document type determining unit, a compression format used to generate image data from the read information. A generator compresses the read information using the compression format set by the compression format setting unit to generate image data corresponding to the document.

[0007] According to a second aspect of the invention, the image processing apparatus according to the first aspect may further include a color information extraction unit that extracts color information from the read information. The compression format setting unit may set the compression format used to generate image data from the read information on the basis of the color information extracted by the color information extraction unit.

[0008] According to a third aspect of the invention, in the image processing apparatus according to the second aspect, when the document type determining unit has not determined the document type, the compression format setting unit may set the compression format on the basis of the color information extracted by the color information extraction unit.

[0009] According to a fourth aspect of the invention, in the image processing apparatus according to the second aspect, when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within a predetermined range of that most frequently occurring color value exceeds a threshold, the compression format setting unit may set, on the basis of the color information, a first compression format in which the number of colors contained in the read information is reduced to a predetermined number of colors.

[0010] According to a fifth aspect of the invention, in the image processing apparatus according to the fourth aspect, when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within a predetermined range of that most frequently occurring color value is less than or equal to the threshold, the compression format setting unit may set, on the basis of the color information, a second compression format in which the number of colors contained in the read information is greater than the number of colors used in the first compression format.
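The threshold rule of the fourth and fifth aspects can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `choose_format_by_color`, the single-channel integer color values, and the default values of `color_range` and `threshold` are all assumptions introduced here.

```python
from collections import Counter


def choose_format_by_color(pixels, color_range=16, threshold=0.8):
    """Return "first" (reduced-color) or "second" (more colors).

    pixels: iterable of integer color values for one channel.
    color_range and threshold are hypothetical tuning values.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels to analyze")

    # Most frequently occurring color value and its pixel count.
    dominant, dominant_count = counts.most_common(1)[0]

    # Pixels whose value lies within the predetermined range of the
    # dominant value (the dominant value itself is counted separately).
    nearby_count = sum(
        count
        for value, count in counts.items()
        if value != dominant and abs(value - dominant) <= color_range
    )

    proportion = (dominant_count + nearby_count) / total
    # Above the threshold: reduce the number of colors (first format).
    # At or below the threshold: keep more colors (second format).
    return "first" if proportion > threshold else "second"
```

A page that is mostly one background color falls into the first format, while a page with a broad spread of color values falls into the second.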

[0011] According to a sixth aspect of the invention, the image processing apparatus according to any one of the first to fifth aspects may further include a multi-page setting unit that, when the document includes a plurality of pages, sets whether a compression format is to be set for each of the plurality of pages. When the multi-page setting unit sets that a compression format is not to be set for each of the plurality of pages, the generator may generate image data corresponding to all of the plurality of pages using the compression format set for the first page.

[0012] According to a seventh aspect of the invention, in the image processing apparatus according to the sixth aspect, when the multi-page setting unit sets that a compression format is to be set for each of the plurality of pages, the document type determining unit may determine the document type for each of the plurality of pages, the compression format setting unit may set a compression format for each of the plurality of pages, and the generator may generate image data using the compression format set for each of the plurality of pages.
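The multi-page behavior of the sixth and seventh aspects can be sketched as follows. This is a minimal sketch under stated assumptions: `formats_for_pages`, the `per_page` flag, and the `choose_format` callback are hypothetical names that stand in for the multi-page setting unit and the per-page format decision.

```python
def formats_for_pages(pages, per_page, choose_format):
    """Return one compression format label per page.

    pages: list of page objects (any type accepted by choose_format).
    per_page: True if a format is to be set for every page.
    choose_format: hypothetical callback mapping a page to a format label.
    """
    if not pages:
        return []
    if per_page:
        # Seventh aspect: determine and set a format for each page.
        return [choose_format(page) for page in pages]
    # Sixth aspect: reuse the format set for the first page for all pages.
    first_format = choose_format(pages[0])
    return [first_format] * len(pages)
```

For example, with a callback that returns "mono" for text pages and "color" otherwise, a document whose pages are text, photo, text yields three "mono" entries when `per_page` is False and "mono", "color", "mono" when it is True.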

[0013] According to an eighth aspect of the invention, there is provided an image processing method including: determining which document type a document belongs to on the basis of read information obtained as a result of reading the document; setting, on the basis of the determined document type, a compression format used to generate image data from the read information; and compressing the read information using the set compression format to generate image data corresponding to the document.

[0014] According to the first aspect of the invention, it is possible to provide an image processing apparatus that can set a compression format suited to the document type without requiring the user to set the compression format.

[0015] According to the second aspect of the invention, in addition to the advantage obtained by the first aspect, a compression format can be set without requiring the user to set it even when the document type cannot be determined.

[0016] According to the third aspect of the invention, the compression format can be set more effectively, depending on whether the document type can be determined, than with an image processing apparatus that lacks the configuration of the third aspect.

[0017] According to the fourth aspect of the invention, in addition to the advantage obtained by the second aspect, the compression format can be set more effectively than with an image processing apparatus that lacks the configuration of the fourth aspect.

[0018] According to the fifth aspect of the invention, in addition to the advantage obtained by the fourth aspect, the compression format can be set more effectively than with an image processing apparatus that lacks the configuration of the fifth aspect.

[0019] According to the sixth aspect of the invention, in addition to the advantage obtained by any one of the first to fifth aspects, when a document has a plurality of pages, compression formats can be set for all the pages more effectively than with an image processing apparatus that lacks the configuration of the sixth aspect.

[0020] According to the seventh aspect of the invention, in addition to the advantage obtained by the sixth aspect, compression formats can be set for all the pages without requiring the user to set a compression format for every page.

[0021] According to the eighth aspect of the invention, it is possible to provide an image processing method that can set a compression format suited to the document type without requiring the user to set the compression format.

Brief Description of the Drawings

[0022] Exemplary embodiments of the present invention will be described in detail based on the following figures, wherein:

[0023] Fig. 1 illustrates an example of the hardware configuration of an image processing apparatus according to an exemplary embodiment of the present invention;

[0024] Fig. 2 illustrates the configuration of a processing program that runs on the image processing apparatus illustrated in Fig. 1;

[0025] Fig. 3 illustrates the configuration of the document type determining unit illustrated in Fig. 2;

[0026] Fig. 4 illustrates the configuration of the compression format setting unit illustrated in Fig. 2;

[0027] Figs. 5A and 5B illustrate an example of document determination information and an example of document type/compression format association information, respectively;

[0028] Fig. 6 is a frequency distribution diagram illustrating processing performed by the most-frequent-color determining portion illustrated in Fig. 4;

[0029] Figs. 7A to 7C are flowcharts illustrating processing performed by the processing program;

[0030] Fig. 8 illustrates a first document to be processed by the processing program according to the exemplary embodiment;

[0031] Fig. 9 illustrates a second document to be processed by the processing program according to the exemplary embodiment;

[0032] Fig. 10 illustrates a third document to be processed by the processing program according to the exemplary embodiment;

[0033] Fig. 11 illustrates a fourth document to be processed by the processing program according to the exemplary embodiment; and

[0034] Fig. 12 illustrates a fifth document to be processed by the processing program according to the exemplary embodiment.

Detailed Description

[0035] Fig. 1 illustrates an example of the hardware configuration of an image processing apparatus 2 according to an exemplary embodiment of the present invention.

[0036] As illustrated in Fig. 1, the image processing apparatus 2 includes a controller 21, a communication device 22, a recording device 24, a user interface (UI) device 25, and an image reader 27. The controller 21 includes a computing unit 212, such as a central processing unit (CPU), and a storage unit 214, such as a memory.

[0037] The UI device 25 includes a display unit, such as a liquid crystal display (LCD) or a cathode ray tube (CRT), and a keyboard or a touch panel.

[0038] The image reader 27 is, for example, a scanner. It reads an image or the like from a recording medium such as a document and converts the read image into read information in, for example, bitmap form.

[0039] That is, the image processing apparatus 2 has hardware components that serve as a computer capable of performing information processing and of communicating with other image processing apparatuses or terminals.

[0040] In the drawings, identical components and steps are denoted by identical reference numerals and step numbers, respectively.

[0041] In this exemplary embodiment, the image processing apparatus 2 includes the image reader 27. However, the image processing apparatus 2 may instead be a personal computer (PC) without an image reader, in which case the image processing apparatus 2 may be connected to an image reader via a local area network (LAN).

[0042] Fig. 2 illustrates the configuration of a processing program 3 that runs on the image processing apparatus 2 illustrated in Fig. 1.

[0043] As illustrated in Fig. 2, the processing program 3 includes a document read information receiver 302, a document read information storage unit 304, a manual compression format specifying unit 306, a document determination information storage unit 308, a document type/compression format storage unit 310, a color information extraction unit 314, a color distribution calculator 316, a document type determining unit 32, a compression format setting unit 34, a compression processor 372, an image data generator 374, a multi-page setting unit 376, an image data output unit 378, and an abnormality notification unit 380.

[0044] The processing program 3 is supplied to the image processing apparatus 2 via, for example, a recording medium 240 (illustrated in Fig. 1). The processing program 3 is then loaded into the storage unit 214 and executed, using the hardware resources of the image processing apparatus 2, under an operating system (OS) (not illustrated) installed in the image processing apparatus 2.

[0045] In this exemplary embodiment, the processing program 3 is implemented in software. However, all or some of the functions of the processing program 3 may instead be implemented in hardware such as a field-programmable gate array (FPGA).

[0046] Fig. 3 illustrates the configuration of the document type determining unit 32 illustrated in Fig. 2.

[0047] As illustrated in Fig. 3, the document type determining unit 32 includes a layout analysis portion 322, a character recognition portion 324, a specific character string determining portion 326, a specific character string position determining portion 330, a specific character string size determining unit 332, and a document determining portion 338.

[0048] Fig. 4 illustrates the configuration of the compression format setting unit 34 illustrated in Fig. 2.

[0049] As illustrated in Fig. 4, the compression format setting unit 34 includes a document type information receiver 342, a document type/compression format receiver 344, a document type/compression format associating portion 348, a compression format determining portion 350, a black-and-white format setting portion 362, a limited-color format setting portion 364, and a high-quality format setting portion 366.

[0050] In the processing program 3, the document read information receiver 302 receives the read information obtained from the image reader 27 (document read information) and outputs the document read information to the document read information storage unit 304.

[0051] The document read information storage unit 304 stores the document read information output from the document read information receiver 302.

[0052] The manual compression format specifying unit 306 receives manual compression format specification information obtained from the user via the UI device 25 and outputs the received information to the compression format setting unit 34. The manual compression format specification information indicates whether image data is to be generated by compressing the document read information in accordance with a predetermined compression format.

[0053] The manual compression format specification information includes an instruction to generate image data without compressing the document read information, an instruction to generate image data by compressing the document read information in accordance with a predetermined compression format (described later), and an instruction to generate image data by compressing the document read information in accordance with a desired compression format.

[0054] The document determination information storage unit 308 stores document determination information, which is described later with reference to Fig. 5A.

[0055] The document determination information is used to determine the type of the document being read and is obtained from the user via the UI device 25.

[0056] The document type/compression format storage unit 310 stores document type/compression format association information, which is described later with reference to Fig. 5B.

[0057] The document type/compression format association information is used to set a compression format in accordance with the document type and is obtained from the user via the UI device 25.

[0058] Fig. 5A illustrates an example of the document determination information, and Fig. 5B illustrates an example of the document type/compression format association information.

[0059] As illustrated in Fig. 5A, the document determination information includes a document type indicating the type of a document, a specific character string that characterizes that type of document, position information indicating a region surrounding the position where the specific character string is located, and size information indicating the size of the region where the specific character string is located.

[0060] As illustrated in Fig. 5B, the document type/compression format association information includes a document type and the compression format set in accordance with that document type.
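The two tables of Figs. 5A and 5B can be modeled, for illustration, as simple records and a lookup table. This is a sketch only: the class names, field names, coordinate values, size range, and the "black-and-white" entry are placeholders assumed here, since the figures themselves are not reproduced in this text. Only the type "Document A" and the string "AAA" come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Rectangle around the expected position of a string (hypothetical units)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class DocumentRule:
    """One row of the document determination information (Fig. 5A)."""
    document_type: str            # e.g. "Document A"
    specific_string: str          # string that characterizes the type
    position: Region              # position information
    size_range: Tuple[int, int]   # size information as (min, max), assumed form


# Placeholder rows; the numeric values are invented for illustration.
DOCUMENT_RULES = [
    DocumentRule("Document A", "AAA", Region(0, 0, 200, 50), (10, 30)),
]

# Document type/compression format association information (Fig. 5B).
# The format label is a placeholder, not a value taken from the figure.
FORMAT_BY_TYPE = {"Document A": "black-and-white"}


def format_for_type(document_type):
    """Return the associated format, or None if the type is not registered."""
    return FORMAT_BY_TYPE.get(document_type)
```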

[0061] The color information extraction unit 314 (Fig. 2) extracts, from the document read information stored in the document read information storage unit 304, color information corresponding to the colors contained in the read document.

[0062] The color distribution calculator 316 calculates a color distribution on the basis of the extracted color information.

[0063] If the color information is expressed in the three-dimensional RGB color space, the color information extraction unit 314 extracts from the document read information, for each pixel, the color value (for example, the luminance value) of each of the three colors red (R), green (G), and blue (B).

[0064] In this case, the color distribution calculator 316 calculates the frequency with which each color value occurs across the pixels and generates a frequency distribution diagram representing the relationship between color values and their frequencies of occurrence.
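The per-channel frequency distribution described above can be sketched as follows. The function name `channel_histograms` and the representation of pixels as `(r, g, b)` tuples are assumptions made for this illustration.

```python
from collections import Counter


def channel_histograms(pixels):
    """Count how often each color value occurs in each RGB channel.

    pixels: iterable of (r, g, b) tuples of integer color values.
    Returns a dict mapping "R", "G", "B" to a Counter of value -> frequency.
    """
    histograms = {"R": Counter(), "G": Counter(), "B": Counter()}
    for r, g, b in pixels:
        histograms["R"][r] += 1
        histograms["G"][g] += 1
        histograms["B"][b] += 1
    return histograms
```

Each Counter plays the role of one frequency distribution diagram: its keys are color values and its values are the number of pixels with that value.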

[0065] The following description covers the case where the color information is expressed in the three-dimensional RGB color space. However, the color information may instead be expressed in another color space, such as the L*a*b* color space.

[0066] In the document type determining unit 32, the layout analysis portion 322 (Fig. 3) analyzes the document read information to sort out the objects contained in the document, such as characters, tables, non-artificial pictures such as photographs, computer graphics (CG), and drawings. The layout analysis portion 322 then associates the sorted objects with position information.

[0067] The objects may be sorted by, for example, detecting various lines, frames, ruled lines, color information, and edges and by performing pattern matching. However, the object sorting technique is not limited to this type.

[0068] The character recognition portion 324 analyzes the document read information using, for example, an optical character recognition (OCR) function. In doing so, the character recognition portion 324 performs morphological analysis to divide the document read information into the smallest meaningful character strings.

[0069] Character recognition is performed as follows. Image data representing characters, obtained by reading the document, is checked against patterns stored in advance in order to identify the characters and generate character data (character strings).

[0070] The morphological analysis mentioned above refers to the following processing. Using a prestored dictionary containing information on grammar rules and words, a sentence is segmented into morphemes (the smallest meaningful units of language), and the part of speech of each resulting morpheme is determined.

[0071] The character recognition portion 324 calculates the positions of the detected character strings in order to generate position information that associates each character string with its position.

[0072] The specific character string determining portion 326 determines whether a specific character string contained in the document determination information supplied from the document determination information storage unit 308 is contained in the character strings detected by the character recognition portion 324.

[0073] If the specific character string is contained in the character strings detected by the character recognition portion 324, the specific character string determining portion 326 outputs information on the detected specific character string to the specific character string position determining portion 330 and the specific character string size determining unit 332.

[0074] If the specific character string is not contained in the character strings detected by the character recognition portion 324, the specific character string determining portion 326 outputs information indicating this (no-specific-character-string information) to the document determining portion 338.

[0075] If the position of the specific character string lies within the region surrounding the position indicated by the position information (which is contained in the document determination information and associated with the corresponding specific character string), the specific character string position determining portion 330 generates information indicating that the position of the specific character string matches the position contained in the position information, and outputs this information (position match information) to the document determining portion 338.

[0076] If the size of the specific character string is within the range of the size indicated by the size information (which is contained in the document determination information and associated with the corresponding specific character string), the specific character string size determining unit 332 generates information indicating that the size of the specific character string matches the size contained in the size information, and outputs this information (size match information) to the document determining portion 338.

[0077] If both the position match information and the size match information have been generated for a predetermined specific character string, the document determining portion 338 determines that the document corresponding to the document read information is the document associated with that specific character string, and generates document type information indicating the determined document type corresponding to the document read information.

[0078] The document determining portion 338 also outputs the document type information to the compression format setting unit 34.

[0079] Conversely, if either the position match information or the size match information has not been generated, or if the no-specific-character-string information has been generated, the document determining portion 338 generates document type information indicating that the document type corresponding to the document read information has not been determined.

[0080] The processing described above is now explained concretely with reference to Fig. 5A.

[0081] For example, assume that the specific character string determining portion 326 has detected the specific character string "AAA" in the document read information.

[0082] In this case, if the specific character string position determining portion 330 determines that the position of the specific character string "AAA" lies within the region surrounding the position indicated by position information #1, and the specific character string size determining unit 332 determines that the size of the specific character string "AAA" is within the range of the size indicated by size information #1, then position match information and size match information are generated for the specific character string "AAA".

[0083] Accordingly, the document determining portion 338 determines that the document type corresponding to the document read information is "Document A".
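The determination flow for the "AAA" example can be sketched as follows. This is a simplified illustration rather than the actual processing program: the dictionary keys, the use of a top-left coordinate for position, the use of character height for size, and all numeric values are assumptions.

```python
def determine_document_type(detected_strings, rules):
    """Return the matching document type, or None if none is determined.

    detected_strings: list of dicts with keys "text", "x", "y", "height"
        (hypothetical output of character recognition).
    rules: list of dicts with keys "document_type", "string",
        "region" as (left, top, right, bottom), and
        "size_range" as (min_height, max_height).
    """
    for rule in rules:
        left, top, right, bottom = rule["region"]
        min_size, max_size = rule["size_range"]
        for detected in detected_strings:
            # Specific character string check.
            if detected["text"] != rule["string"]:
                continue
            # Position match: the string lies inside the registered region.
            position_match = (
                left <= detected["x"] <= right
                and top <= detected["y"] <= bottom
            )
            # Size match: the string size lies inside the registered range.
            size_match = min_size <= detected["height"] <= max_size
            # Both matches are required in this embodiment.
            if position_match and size_match:
                return rule["document_type"]
    return None
```

With a rule that registers "AAA" for "Document A", a detected "AAA" inside the region and size range yields "Document A", while the same string outside the region yields None, which corresponds to the case where the document type is not determined.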

[0084] In the exemplary embodiment described above, when both the position match information and the size match information have been generated for a predetermined specific character string, the document determining portion 338 determines the document type containing that specific character string and generates the document type information. However, the document determining portion 338 may also determine the document type containing the specific character string when only one of the position match information and the size match information has been generated for the predetermined specific character string.

[0085]在上述示例性实施例中,位置信息和尺寸信息各自具有预定范围,并且判断特定字符串的位置和尺寸是否分别在位置信息和尺寸信息的范围内。 [0085] In the exemplary embodiment, position information and size information each having a predetermined range, and determines the position and size of a specific character string are within the scope of whether the position information and the size information. 作为选择,位置信息和尺寸信息也可以各自具有预定的特定值。 Alternatively, the position information and size information may each have a predetermined specific value. 于是,当特定字符串的位置和尺寸接近预定的特定值时,可以给予特定字符串的位置和尺寸较高的分数。 Thus, when the position and size of a particular string near the predetermined specific value, can be given a higher position and a size fraction of a particular string. 如果给予位置和尺寸的分数的总值超过阈值,则文档判断部分338可以判定包含该特定字符串的文档类型。 If the total score given position and size exceeds a threshold value, the document determination section 338 may determine the type of document that contains a specific string.
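
The score-based alternative could look like the sketch below. The patent does not say how the scores are computed, so the linear falloff, the tolerances, and the function names `closeness` and `matches_by_score` are all assumptions chosen for illustration.

```python
import math


def closeness(value: float, target: float, tolerance: float) -> float:
    """Score in [0, 1]: 1 at the target, falling linearly to 0 at `tolerance`."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


def matches_by_score(pos, size, target_pos, target_size,
                     pos_tol, size_tol, threshold) -> bool:
    """Sum a position score and a size score and compare with a threshold."""
    distance = math.hypot(pos[0] - target_pos[0], pos[1] - target_pos[1])
    total = closeness(distance, 0.0, pos_tol) + closeness(size, target_size, size_tol)
    return total > threshold
```

A string exactly at the target position and size scores 2.0 in this sketch, so a threshold such as 1.5 tolerates a modest deviation in one of the two measures.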

[0086] In the exemplary embodiment above, the document type is determined based on the position and size of the specific character string. However, the document type may also be determined in other ways. For example, to determine whether a document is a design drawing, the document type may be determined to be a design drawing if a predetermined number of horizontal and vertical ruled lines are contained at a specific position (for example, the lower right or the upper left).

[0087] In the compression format setting unit 34, the document type information receiver 342 (FIG. 4) receives the document type information from the document type determining unit 32 and outputs the received document type information to the document type/compression format association section 348.

[0088] The document type/compression format receiver 344 receives document type/compression format association information from the document type/compression format storage unit 310 and outputs the received association information to the document type/compression format association section 348.

[0089] Based on the document type/compression format association information, the document type/compression format association section 348 determines the compression format corresponding to the document type indicated by the document type information.

[0090] Further, if the document type/compression format association section 348 determines that the compression format is the black-and-white format, it instructs the black-and-white format setting section 362 to set the compression format. If it determines that the compression format is the limited color format, it instructs the limited color format setting section 364 to set the compression format. If it determines that the compression format is the high-quality format, it instructs the high-quality format setting section 366 to set the compression format.

[0091] The "black-and-white format setting" is the following data compression setting. When image data is generated, the colors to be expressed are limited to black and white only.

[0092] The "limited color format setting" is the following data compression setting. When image data is generated, the colors to be expressed are limited to a predetermined number of colors. A layer is generated for each color, and each layer is compressed as binary data, so as to generate image data in a file format that supports a multilayer structure (for example, Portable Document Format (PDF)).

[0093] The "high-quality format setting" is the following data compression setting. When image data is generated, objects such as characters, photographs, and CG are extracted from the original image. Layers are then generated, each layer is compressed in a format suited to the extracted object, and the layers are then integrated so as to generate image data in a file format that supports a multilayer structure (for example, PDF).

[0094] That is, when the data representing the document read information is compressed, the number of colors reproduced by the image data differs depending on which compression format is used (that is, the "black-and-white format setting", the "limited color format setting", or the "high-quality format setting"). The number of reproducible colors increases in the order of the "black-and-white format setting", the "limited color format setting", and the "high-quality format setting". The number of reproducible colors is largest when the "high-quality format setting" is used.

[0095] "Limited colors" are one or more representative colors extracted by calculation based on a frequency distribution (histogram). With the "limited color format setting", each pixel whose color differs from the predetermined limited colors is converted to the limited color that is closest to it in the RGB color space (that is, the limited color whose color value is closest to the color value of the pixel). By reducing the number of colors to the number of limited colors in this way, the "limited color format setting" achieves a higher compression ratio than the "high-quality format setting".
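
The conversion of each pixel to its closest limited color can be sketched as follows. The patent only says "closest in the RGB color space"; squared Euclidean distance is assumed here, and the function names and the example palette are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def nearest_limited_color(pixel: Color, limited_colors: Sequence[Color]) -> Color:
    """Pick the limited color with the smallest squared RGB distance (assumed metric)."""
    return min(
        limited_colors,
        key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)),
    )


def quantize(pixels: Sequence[Color], limited_colors: Sequence[Color]) -> List[Color]:
    """Map every pixel to its nearest limited color."""
    return [nearest_limited_color(p, limited_colors) for p in pixels]


# Hypothetical palette: white background, black text, red highlights.
PALETTE = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
```

After this step each limited color can be stored as its own binary layer, which is where the higher compression ratio of the limited color format comes from.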

[0096] If the document type information does not indicate a document type (that is, if no document type has been determined), the document type/compression format association section 348 outputs information indicating that the document type has not been determined (type-unspecified information) to the compression format determining section 350.

[0097] If the document type information indicates a document type and the corresponding compression format is the "color setting format", this means that either the "limited color format setting" or the "high-quality format setting" may be used.

[0098] Accordingly, the document type/compression format association section 348 outputs this information (color setting information) to the compression format determining section 350.

[0099] The above processing is described concretely with reference to FIG. 5B.

[0100] For example, if the document type is determined to be "Document A", the document type/compression format association section 348 instructs the limited color format setting section 364 to set the compression format so that "Document A" is compressed according to the "limited color format setting".

[0101] If the document type is determined to be "Document D", the document type/compression format association section 348 outputs type-unspecified information to the compression format determining section 350 (FIG. 4).

[0102] If the document type is determined to be "Document B", the document type/compression format association section 348 outputs color setting information to the compression format determining section 350.
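
The lookup performed against the association information of FIG. 5B can be sketched as a small table. The table contents follow the examples given in the text for Documents A, B, C, and D; the dictionary layout, the string labels, and the function name `associate` are assumptions for illustration.

```python
from typing import Optional, Tuple

# Hypothetical encoding of the document type/compression format association.
# "color" stands for the color setting format (limited color or high quality);
# None means no compression format is registered for the type.
ASSOCIATION = {
    "Document A": "limited_color",
    "Document B": "color",
    "Document C": "black_and_white",
    "Document D": None,
}


def associate(doc_type: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (outcome, compression_format) for a judged document type."""
    fmt = ASSOCIATION.get(doc_type) if doc_type is not None else None
    if fmt is None:
        return ("type_unspecified", None)   # defer to the determining section
    if fmt == "color":
        return ("color_setting", None)      # format still to be narrowed down
    return ("format_set", fmt)
```

Only the `format_set` outcome fixes the format directly; the other two outcomes hand the decision to the histogram-based analysis.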

[0103] When the document type/compression format association section 348 has not narrowed the compression format down to a single format, the compression format determining section 350 determines the compression format to be set.

[0104] The compression format determining section 350 includes a most frequent color determining section 354 and an included color determining section 356.

[0105] FIG. 6 is a frequency distribution (histogram) illustrating the processing performed by the most frequent color determining section 354.

[0106] In FIG. 6, the horizontal axis represents the color value (for example, luminance) of each RGB color, and the vertical axis represents the frequency of occurrence of pixels (number of pixels) for each color value.

[0107] In this exemplary embodiment, a two-dimensional histogram showing the frequency of occurrence of pixels with respect to RGB values is given as the color distribution. However, a three-dimensional histogram showing the frequency of occurrence of pixels with respect to the individual R, G, and B colors may also be generated.

[0108] For convenience of illustration, a two-dimensional histogram is shown as the color distribution in this exemplary embodiment. However, a three-dimensional histogram, in which the frequency of occurrence of pixels is calculated and analyzed for each of the individual R, G, and B colors, may also be generated.

[0109] In this case, in the three-dimensional histogram, a number of pixels is associated with the color corresponding to each of 256³ cubes, where 256 divisions are formed along each of the three dimensions of the RGB three-dimensional space.

[0110] The most frequent color determining section 354 identifies the color value corresponding to the most frequent point, and sets the color values contained within a predetermined color value width of that most frequent point as the most frequent color.

[0111] More specifically, in the example shown in FIG. 6, the most frequent color determining section 354 identifies point B as the most frequent point and sets the color values contained within the predetermined color value width W2 of point B as the most frequent color.

[0112] The most frequent color determining section 354 then determines whether the proportion of the most frequent color in the entire document is less than or equal to a threshold (for example, 80%).

[0113] If the proportion of the most frequent color is less than or equal to the threshold, the most frequent color determining section 354 instructs the high-quality format setting section 366 to set the compression format.
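
The peak search and ratio check can be sketched on a one-dimensional histogram like the one in FIG. 6. The sketch assumes each pixel is reduced to a single color value from 0 to 255 and treats the width as a half-width on either side of the peak; both choices, and the function name, are assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def most_frequent_color_ratio(values: Sequence[int], width: int) -> Tuple[int, float]:
    """Return (peak value, share of pixels within `width` of the peak)."""
    if not values:
        raise ValueError("at least one pixel value is required")
    hist = Counter(values)
    peak = max(hist, key=hist.get)  # the most frequent point
    in_band = sum(n for v, n in hist.items() if abs(v - peak) <= width)
    return peak, in_band / len(values)


def prefers_high_quality(values: Sequence[int], width: int,
                         threshold: float = 0.8) -> bool:
    """True when the dominant color covers no more than `threshold` of the page."""
    _, ratio = most_frequent_color_ratio(values, width)
    return ratio <= threshold
```

A page on which no single color dominates is likely to contain photographs or gradients, which is why a low ratio leads to the high-quality format.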

[0114] In contrast, if the most frequent color determining section 354 determines that the proportion of the most frequent color is greater than the threshold, the included color determining section 356 performs the following processing.

[0115] The included color determining section 356 samples and quantizes the histogram, then identifies the color values whose frequency of occurrence is greater than or equal to a threshold, and sets the color values contained within a predetermined range of the identified color values as the colors included in the document (included colors).
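
One way to picture the quantize-and-threshold step is the sketch below. The bin width of 32 levels per channel, the minimum pixel count, and the use of bin centers as representative colors are assumptions for illustration; the patent does not specify them.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def included_colors(pixels: Sequence[Color], step: int = 32,
                    min_count: int = 2) -> List[Color]:
    """Quantize RGB pixels into coarse bins and keep sufficiently frequent bins."""
    bins = Counter(tuple(v // step for v in p) for p in pixels)
    # Represent each surviving bin by its center color.
    return [
        tuple(b * step + step // 2 for b in key)
        for key, count in bins.items()
        if count >= min_count
    ]
```

Counting the returned list gives the number of included colors that is compared against the threshold in the next step.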

[0116] The included color determining section 356 also counts the number of included colors and determines whether that number is greater than or equal to a threshold.

[0117] If the number of included colors is greater than or equal to the threshold, the included color determining section 356 instructs the high-quality format setting section 366 to set the compression format.

[0118] If the number of included colors is determined to be less than the threshold and color setting information has been output from the document type/compression format association section 348, the included color determining section 356 instructs the limited color format setting section 364 to set the compression format.

[0119] If the number of included colors is determined to be less than the threshold and no color setting information has been output from the document type/compression format association section 348, the included color determining section 356 determines whether the included colors are limited to black and white only, or to colors whose color values are within a predetermined range of black or white.

[0120] If the result of this determination is affirmative (YES), the included color determining section 356 instructs the black-and-white format setting section 362 to set the compression format. If the result is negative (NO), the included color determining section 356 instructs the limited color format setting section 364 to set the compression format.
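
The decision sequence carried out by the compression format determining section 350 can be summarized in a short sketch. The default thresholds (80% follows the example in the text; the count threshold of 5 and the black/white tolerance of 16 levels are assumed values) and the function name `decide_format` are hypothetical.

```python
from typing import Sequence, Tuple

Color = Tuple[int, int, int]


def decide_format(most_frequent_ratio: float,
                  colors: Sequence[Color],
                  color_setting_given: bool,
                  ratio_threshold: float = 0.8,
                  count_threshold: int = 5,
                  bw_tolerance: int = 16) -> str:
    """Choose a compression format from the color analysis results."""
    # No dominant color: keep as much color information as possible.
    if most_frequent_ratio <= ratio_threshold:
        return "high_quality"
    # Many distinct colors: also use the high-quality format.
    if len(colors) >= count_threshold:
        return "high_quality"
    # Color setting information rules out the black-and-white format.
    if color_setting_given:
        return "limited_color"

    def near_black_or_white(c: Color) -> bool:
        return all(v <= bw_tolerance for v in c) or all(v >= 255 - bw_tolerance for v in c)

    if all(near_black_or_white(c) for c in colors):
        return "black_and_white"
    return "limited_color"
```

For a mostly white page containing only white, black, and red, the sketch returns the limited color format; removing the red leaves only colors near black or white and yields the black-and-white format.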

[0121] The black-and-white format setting section 362 sets the compression format to the black-and-white format and outputs information on the set compression format to the compression processor 372.

[0122] The limited color format setting section 364 sets the compression format to the limited color format and outputs information on the set compression format to the compression processor 372.

[0123] The high-quality format setting section 366 sets the compression format to the high-quality format and outputs information on the set compression format to the compression processor 372.

[0124] In this exemplary embodiment, the specific examples of the compression format are the black-and-white format, the limited color format, and the high-quality format. However, the compression format is not limited to these types.

[0125] The compression processor 372 (FIG. 2) compresses the document read information according to the compression format set by the black-and-white format setting section 362, the limited color format setting section 364, or the high-quality format setting section 366.

[0126] If the document includes a plurality of pages, the multi-page setting unit 376 sets whether the second and subsequent pages follow the compression format set for the first page, or whether a compression format is to be set for each page.

[0127] This setting may be obtained from the user via the UI device 25.

[0128] If the document includes a plurality of pages and the second and subsequent pages follow the compression format set for the first page, the compression processor 372 compresses the second and subsequent pages using the compression format set for the first page.

[0129] In contrast, if the document includes a plurality of pages and a compression format is to be set for each page, the compression processor 372 controls the document type determining unit 32 and the compression format setting unit 34 so that they execute the above processing for each page.
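
The two multi-page behaviors can be sketched as follows. The callback `choose_format` stands in for the combined work of the document type determining unit and the compression format setting unit; it and the function name `formats_for_pages` are assumptions for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def formats_for_pages(pages: Sequence[Page],
                      choose_format: Callable[[Page], str],
                      follow_first_page: bool) -> List[str]:
    """Return one compression format per page."""
    if not pages:
        return []
    if follow_first_page:
        # Analyze only the first page and reuse its format for the rest.
        first = choose_format(pages[0])
        return [first] * len(pages)
    # Otherwise analyze every page independently.
    return [choose_format(page) for page in pages]
```

Reusing the first page's format avoids repeating the analysis, while per-page selection suits documents that mix, for example, text pages and photographs.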

[0130] The image data generator 374 generates an image in, for example, TOF format from the information that has undergone compression processing. If the document includes a plurality of pages, the image data is generated by combining the pages that have undergone compression processing.

[0131] The image data output unit 378 displays the generated image data on the UI device 25 serving as a display device.

[0132] If the compression format determining section 350 fails to determine a compression format, the abnormality notification unit 380 displays a warning message on the UI device 25 serving as a display device, or causes the UI device 25 serving as a speaker to issue a warning.

[0133] FIGS. 7A to 7C are flowcharts illustrating the processing executed by the processing program 3.

[0134] In step S102, the document read information receiver 302 receives the document read information obtained as a result of reading the document.

[0135] In step S104, the manual compression format specifying unit 306 determines whether an instruction has been given to compress the document read information when generating image data. If the result of step S104 is negative, the processing proceeds to step S106. If the result of step S104 is affirmative, the processing proceeds to step S108.

[0136] In step S106, the image data generator 374 generates image data without compressing the document read information, and the processing ends.

[0137] In step S108, the manual compression format specifying unit 306 determines whether a compression format has been specified. If a compression format has been specified, the processing proceeds to step S110. If no compression format has been specified, that is, if the compression format is to be set automatically, the processing proceeds to step S112.

[0138] In step S110, the compression processor 372 performs compression processing using the specified compression format.

[0139] In step S112, the document type determining unit 32 determines whether the document belongs to a specific type. If the document belongs to a specific type, the processing proceeds to step S114. If it is determined in step S112 that the document does not belong to a specific type, or if the document type cannot be determined, the processing proceeds to step S162.

[0140] In step S114, the document type/compression format association section 348 of the compression format setting unit 34 determines whether a compression format corresponding to the document type has been specified. If the result of step S114 is affirmative, the processing proceeds to step S116. If the result of step S114 is negative, the processing proceeds to step S162.

[0141] In step S116, the document type/compression format association section 348 determines whether the specified compression format is the black-and-white format. If the black-and-white format has been specified, the processing proceeds to step S118. If the black-and-white format has not been specified, the processing proceeds to step S120.

[0142] In step S118, the black-and-white format setting section 362 sets the compression format to the black-and-white format, and the compression processor 372 performs compression processing using the black-and-white format.

[0143] In step S120, the document type/compression format association section 348 determines whether the specified compression format is one of the limited color format and the high-quality format, or whether only the color setting has been specified. If one of the limited color format and the high-quality format has been specified, the processing proceeds to step S122. If only the color setting has been specified, the processing proceeds to step S172.

[0144] In step S122, the limited color format setting section 364 or the high-quality format setting section 366 sets the corresponding compression format, and the compression processor 372 performs compression processing using the limited color format or the high-quality format.

[0145] In step S142, the multi-page setting unit 376 determines whether the document includes a plurality of pages. If the document includes a plurality of pages, the processing proceeds to step S144. If the document includes only one page, the processing proceeds to step S150.

[0146] In step S144, the multi-page setting unit 376 determines whether the second and subsequent pages follow the compression format set for the first page. If the result of step S144 is affirmative, the processing proceeds to step S146. If the result of step S144 is negative, the processing proceeds to step S148.

[0147] In step S146, the compression processor 372 performs compression processing on the second and subsequent pages using the compression format set for the first page.

[0148] In step S148, the compression processor 372 determines whether all of the pages have undergone compression processing. If the result of step S148 is affirmative, the processing proceeds to step S150. If the result of step S148 is negative, the processing returns to step S112.

[0149] In step S150, the image data generator 374 generates image data from the information that has undergone compression processing. The processing then ends.

[0150] In step S162, the most frequent color determining section 354 of the compression format setting unit 34 determines whether the proportion of the most frequent color is less than or equal to the threshold. If the result of step S162 is affirmative, the processing proceeds to step S184. If the result of step S162 is negative, the processing proceeds to step S164.

[0151] In step S164, the included color determining section 356 of the compression format setting unit 34 determines whether the number of included colors is greater than or equal to the threshold. If the result of step S164 is affirmative, the processing proceeds to step S184. If the result of step S164 is negative, the processing proceeds to step S166.

[0152] In step S166, the included color determining section 356 of the compression format setting unit 34 determines whether the included colors are limited to black and white only, or to colors whose color values are within a predetermined range of black or white. If the result of step S166 is affirmative, the processing proceeds to step S180. If the result of step S166 is negative, the processing proceeds to step S182.

[0153] In step S172, the most frequent color determining section 354 of the compression format setting unit 34 determines whether the proportion of the most frequent color is less than or equal to the threshold. If the result of step S172 is affirmative, the processing proceeds to step S184. If the result of step S172 is negative, the processing proceeds to step S174.

[0154] In step S174, the included color determining section 356 of the compression format setting unit 34 determines whether the number of included colors is greater than or equal to the threshold. If the result of step S174 is affirmative, the processing proceeds to step S184. If the result of step S174 is negative, the processing proceeds to step S182.

[0155] In step S180, the black-and-white format setting section 362 of the compression format setting unit 34 sets the compression format to the black-and-white format.

[0156] In step S182, the limited color format setting section 364 of the compression format setting unit 34 sets the compression format to the limited color format.

[0157] In step S184, the high-quality format setting section 366 of the compression format setting unit 34 sets the compression format to the high-quality format.

[0158] The processing executed by the processing program 3 according to this exemplary embodiment is described below using specific examples.

[0159] FIG. 8 shows a first document to be processed by the processing program 3 according to this exemplary embodiment.

[0160] The first document shown in FIG. 8 is an estimate sheet. The background of the document is white, the portion indicated by E is red, and the other characters and lines are black.

[0161] Assume now that Document A, one of the document types contained in the document judgment information shown in FIG. 5A, is an estimate sheet, that the corresponding specific character string "AAA" is "Estimate", that the corresponding position information #1 indicates "the top center of the document", and that the corresponding size information #1 indicates "a large size (font)".

[0162] In the example shown in FIG. 8, the character string "Estimate" is located at the top center of the document, and its size is larger than that of the other character strings.

[0163] Accordingly, the document type determining unit 32 (FIG. 2) determines, based on the document judgment information shown in FIG. 5A, that the document shown in FIG. 8 is an estimate sheet.

[0164] The document type/compression format association section 348 (FIG. 4) of the compression format setting unit 34 then determines, based on the document type/compression format association information, which compression format to use for the estimate sheet.

[0165] For example, if Document A shown in FIG. 5B is an estimate sheet, the document type/compression format association section 348 sets the compression format to the limited color format. If Document C shown in FIG. 5B is an estimate sheet, the document type/compression format association section 348 sets the compression format to the black-and-white format.

[0166] If Document B shown in FIG. 5B is an estimate sheet, the compression format is the "color format", and the document type/compression format association section 348 cannot determine which compression format to use (that is, the limited color format or the high-quality format). The compression format determining section 350 therefore determines the compression format.

[0167] White occupies 90% or more of the document shown in FIG. 8. The most frequent color determining section 354 therefore analyzes the histogram and determines that the most frequent color is white and that the proportion of white in the document exceeds the threshold (for example, 50%).

[0168] Further, the only included colors in the document shown in FIG. 8 are white, black, and red (portion E), so the included color determining section 356 determines that the number of included colors is less than or equal to the threshold.

[0169] Accordingly, the limited color format setting section 364 sets the compression format to the limited color format.

[0170] FIG. 9 shows a second document to be processed by the processing program 3 according to this exemplary embodiment.

[0171] The second document shown in FIG. 9 is a subway route map, which includes a white background, red, blue, yellow, and green lines, and black characters.

[0172] If the type of the document shown in FIG. 9 is not contained in the document judgment information shown in FIG. 5A, the document type determining unit 32 outputs type-unspecified information to the compression format determining section 350, and the compression format determining section 350 determines the compression format.

[0173] White occupies 50% or more of the document shown in FIG. 9. The most frequent color determining section 354 therefore analyzes the histogram and determines that the most frequent color is white and that the proportion of white in the document exceeds the threshold (for example, 50%).

[0174] Further, the included colors in the document shown in FIG. 9 are white, black, red, blue, yellow, and green. If the threshold is 5, the included color determining section 356 therefore determines that the number of included colors is greater than the threshold.

[0175] Accordingly, the high-quality format setting section 366 sets the compression format to the high-quality format.

[0176] FIG. 10 shows a third document to be processed by the processing program 3 according to the present exemplary embodiment.

[0177] The third document shown in FIG. 10 is a design drawing. The background of the document is white, the part denoted by F (the part whose dimensions were modified) is red, and the other characters and lines are black.

[0178] Assume now that document D, one of the document types included in the document determination information shown in FIG. 5A, is a design drawing; that the corresponding specific character string "DDD" is "drawing number"; that the corresponding position information #4 indicates "inside the frame at the lower right of the document"; and that the corresponding size information #4 indicates "a size that fits inside the frame".

[0179] In the example shown in FIG. 10, the character string "drawing number" is located inside the frame at the lower right of the document. The document type determining unit 32 therefore determines, based on the document determination information shown in FIG. 5A, that the document shown in FIG. 10 is a design drawing.
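The matching step described in paragraphs [0178] and [0179] might look roughly like the sketch below. All names (`Box`, `DetectedString`, `Rule`, `determine_document_type`) are hypothetical. Coordinates are assumed to be normalized to the page with the origin at the top left, and the size information is modeled, as an assumption, as an allowed range of string heights.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in normalized page coordinates (0..1)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top

    def contains(self, other: "Box") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class DetectedString:
    """A character string found on the page and where it was found."""
    text: str
    box: Box


@dataclass(frozen=True)
class Rule:
    """One row of (hypothetical) document determination information."""
    doc_type: str
    specific_string: str
    region: Box        # stands in for the position information
    min_height: float  # stands in for the size information
    max_height: float


def determine_document_type(strings: List[DetectedString],
                            rules: List[Rule]) -> Optional[str]:
    """Return the matching document type, or None if the type is unspecified."""
    for rule in rules:
        for s in strings:
            if rule.specific_string not in s.text:
                continue
            position_match = rule.region.contains(s.box)
            size_match = rule.min_height <= s.box.height <= rule.max_height
            if position_match and size_match:
                return rule.doc_type
    return None


# Hypothetical rule: "Drawing No." inside a lower-right frame.
rules = [Rule("design drawing", "Drawing No.", Box(0.6, 0.8, 1.0, 1.0), 0.01, 0.05)]
lower_right = [DetectedString("Drawing No. A-12", Box(0.7, 0.90, 0.9, 0.93))]
upper_left = [DetectedString("Drawing No. A-12", Box(0.1, 0.10, 0.3, 0.13))]

print(determine_document_type(lower_right, rules))
print(determine_document_type(upper_left, rules))
```

Returning `None` here plays the role of the type-unspecified information: the caller would then fall back to the color-based determination.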

[0180] Next, the document type/compression format association section 348 of the compression format setting unit 34 determines, based on the document type/compression format association information, which compression format to use for the design drawing.

[0181] For example, if document D shown in FIG. 5B is the design drawing, the document type/compression format association section 348 cannot determine the compression format, so the compression format determination section 350 determines the compression format.

[0182] White occupies 90% or more of the document shown in FIG. 10, so the most frequent color determination section 354 analyzes the frequency distribution and determines that the most frequent color is white and that the proportion of white in the document exceeds the threshold (for example, 50%).

[0183] Furthermore, the document shown in FIG. 10 contains only white, black, and red (part F), so the included color determination section 356 determines that the number of included colors is less than or equal to the threshold.

[0184] Accordingly, the limited color format setting section 364 sets the compression format to the limited color format.

[0185] FIG. 11 shows a fourth document to be processed by the processing program 3 according to the present exemplary embodiment.

[0186] The fourth document shown in FIG. 11 is a repair report. The background of the document is white, the part denoted by G (the printed part) is blue, the part denoted by H is red, and the other characters and lines are black.

[0187] If the document type of the document shown in FIG. 11 is not included in the document determination information shown in FIG. 5A, the document type determining unit 32 outputs type-unspecified information to the compression format determination section 350, and the compression format determination section 350 determines the compression format.

[0188] White occupies 50% or more of the document shown in FIG. 11, so the most frequent color determination section 354 analyzes the frequency distribution and determines that the most frequent color is white and that the proportion of white in the document exceeds the threshold (for example, 50%).

[0189] Furthermore, the document shown in FIG. 11 contains only white, black, blue, and red, so the included color determination section 356 determines that the number of included colors is less than the threshold (for example, 5).

[0190] Accordingly, the limited color format setting section 364 sets the compression format to the limited color format.

[0191] FIG. 12 shows a fifth document to be processed by the processing program 3 according to the present exemplary embodiment.

[0192] The document shown in FIG. 12 includes a plurality of documents, for example, an estimate, a design drawing, and a repair report.

[0193] In this case, if the multi-page setting unit 376 is set so that the second and subsequent pages follow the compression format already set for the first page, the second and subsequent pages are compressed in the compression format set for the estimate.

[0194] In contrast, if the multi-page setting unit 376 is set so that a compression format is set for each page, each page is compressed using the compression format set for the estimate, the design drawing, and the repair report, respectively.
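The two multi-page behaviors in paragraphs [0193] and [0194] can be illustrated with a short sketch. The function `formats_for_pages`, the `per_page` flag, and the format labels in the lookup table are all assumptions made for this example; the description does not state which format an estimate receives, so the label used here is a placeholder.

```python
from typing import Callable, List, Sequence


def formats_for_pages(pages: Sequence[str],
                      choose_format: Callable[[str], str],
                      per_page: bool) -> List[str]:
    """Return one compression format per page.

    per_page=True  : determine a format for every page individually.
    per_page=False : determine a format for the first page only and
                     reuse it for the second and subsequent pages.
    """
    if not pages:
        return []
    if per_page:
        return [choose_format(page) for page in pages]
    first_format = choose_format(pages[0])
    return [first_format] * len(pages)


# Placeholder lookup standing in for the real per-page determination.
HYPOTHETICAL_FORMATS = {
    "estimate": "format_for_estimate",
    "design drawing": "limited_color",
    "repair report": "limited_color",
}

pages = ["estimate", "design drawing", "repair report"]

print(formats_for_pages(pages, HYPOTHETICAL_FORMATS.__getitem__, per_page=False))
print(formats_for_pages(pages, HYPOTHETICAL_FORMATS.__getitem__, per_page=True))
```

Reusing the first page's format avoids analyzing the remaining pages, at the cost of possibly applying a format that suits the first page but not the later ones.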

[0195] Note that the color information extraction unit 314 and the color distribution calculator 316 may perform their processing only when the document type/compression format association section 348 cannot narrow the compression format down to a single compression format.

[0196] The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (7)

  1. An image processing apparatus comprising:
     a document type determining unit that determines, based on read information obtained as a result of reading a document with a document reader, which document type the document belongs to, the determination including:
       detecting character strings in the read information and calculating the positions of the detected character strings; and
       determining whether a specific character string is included in the detected character strings and, if the specific character string is determined to be included in the detected character strings:
         generating position match information for the specific character string when the position of the specific character string among the detected character strings lies within a region surrounding the position indicated by position information associated with the specific character string,
         calculating the size of the specific character string among the detected character strings, and generating size match information for the specific character string when that size is within the range of sizes indicated by size information associated with the specific character string, and
         determining that the document type is the document type associated with the specific character string when both the position match information and the size match information have been generated for the specific character string;
     a compression format setting unit that sets, based on the document type determined by the document type determining unit, a compression format for generating image data from the read information;
     a generator that compresses the read information using the compression format set by the compression format setting unit so as to generate image data corresponding to the document; and
     a color information extraction unit that extracts color information from the read information,
     wherein the compression format setting unit sets, based on the color information extracted by the color information extraction unit, the compression format for generating image data from the read information, and
     when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within a predetermined range from the most frequently occurring color value in the read information exceeds a threshold, the compression format setting unit sets, based on the color information, a first compression format in which the number of colors contained in the read information is reduced to a predetermined number of colors.
  2. The image processing apparatus according to claim 1, wherein when the document type determining unit has not determined the document type, the compression format setting unit sets the compression format based on the color information extracted by the color information extraction unit.
  3. The image processing apparatus according to claim 1, wherein when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within the predetermined range from the most frequently occurring color value in the read information is less than or equal to the threshold, the compression format setting unit sets, based on the color information, a second compression format in which the number of colors used is greater than the number of colors used in the first compression format.
  4. The image processing apparatus according to any one of claims 1 to 3, further comprising:
     a multi-page setting unit that, when the document includes a plurality of pages, sets whether a compression format is to be set for each of the plurality of pages,
     wherein when the multi-page setting unit sets that a compression format is not to be set for each of the plurality of pages, the generator generates image data corresponding to all of the plurality of pages using the compression format set for the first page.
  5. The image processing apparatus according to claim 4, wherein when the multi-page setting unit sets that a compression format is to be set for each of the plurality of pages, the document type determining unit determines the document type for each of the plurality of pages, the compression format setting unit sets a compression format for each of the plurality of pages, and the generator generates image data using the compression format set for each of the plurality of pages.
  6. An image processing method comprising:
     determining, based on read information obtained as a result of reading a document, which document type the document belongs to, the determination including:
       detecting character strings in the read information and calculating the positions of the detected character strings; and
       determining whether a specific character string is included in the detected character strings and, if the specific character string is determined to be included in the detected character strings:
         generating position match information for the specific character string when the position of the specific character string among the detected character strings lies within a region surrounding the position indicated by position information associated with the specific character string,
         calculating the size of the specific character string among the detected character strings, and generating size match information for the specific character string when that size is within the range of sizes indicated by size information associated with the specific character string, and
         determining that the document type is the document type associated with the specific character string when both the position match information and the size match information have been generated for the specific character string;
     setting, based on the determined document type, a compression format for generating image data from the read information; and
     compressing the read information using the set compression format so as to generate image data corresponding to the document,
     the method further comprising:
     extracting color information from the read information; and
     setting, based on the extracted color information, the compression format for generating image data from the read information,
     wherein when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within a predetermined range from the most frequently occurring color value in the read information exceeds a threshold, a first compression format is set based on the color information, the first compression format being one in which the number of colors contained in the read information is reduced to a predetermined number of colors.
  7. An image processing apparatus comprising:
     a document type determining unit that determines, based on read information obtained as a result of reading a document with a document reader, which document type the document belongs to, the determination including:
       detecting character strings in the read information and calculating the positions of the detected character strings; and
       determining whether a specific character string is included in the detected character strings and, if the specific character string is determined to be included in the detected character strings:
         generating position match information for the specific character string when the position of the specific character string among the detected character strings lies within a region surrounding the position indicated by position information associated with the specific character string,
         calculating the size of the specific character string among the detected character strings, and generating size match information for the specific character string when that size is within the range of sizes indicated by size information associated with the specific character string, and
         determining that the document type is the document type associated with the specific character string when both the position match information and the size match information have been generated for the specific character string;
     a color information extraction unit that extracts color information from the read information obtained as a result of reading the document;
     a compression format setting unit that, when the document type determining unit has determined the document type, sets a compression format for generating image data from the read information based on the document type determined by the document type determining unit, and that, when the document type determining unit has not determined the document type, sets the compression format for generating image data from the read information based on the color information extracted by the color information extraction unit; and
     a generator that compresses the read information using the compression format set by the compression format setting unit so as to generate image data corresponding to the document,
     wherein when the sum of the proportion of the most frequently occurring color value in the read information and the proportion of color values within a predetermined range from the most frequently occurring color value in the read information exceeds a threshold, the compression format setting unit sets, based on the color information, a first compression format in which the number of colors contained in the read information is reduced to a predetermined number of colors, and
     the colors in the first compression format are limited colors, the limited colors being a plurality of extracted representative colors, and each pixel in the read information whose color differs from the predetermined limited colors is converted by the generator into the one of the predetermined limited colors that is closest to it in color space.
CN 201110409732 2011-03-28 2011-12-09 The image processing apparatus and an image processing method CN102710887B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2011-069435 2011-03-28
JP2011069435A JP2012205181A (en) 2011-03-28 2011-03-28 Image processing device and program

Publications (2)

Publication Number Publication Date
CN102710887A (en) 2012-10-03
CN102710887B (en) 2016-12-14

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6853755B2 (en) * 2001-03-28 2005-02-08 Sharp Laboratories Of America, Inc. Method and apparatus for adaptive compression of scanned documents

Similar Documents

Publication Publication Date Title
US20040220898A1 (en) Information processing apparatus, method, storage medium and program
US20140270536A1 (en) Systems and methods for classifying objects in digital images captured using mobile devices
US6347156B1 (en) Device, method and storage medium for recognizing a document image
US7272269B2 (en) Image processing apparatus and method therefor
US7386789B2 (en) Method for determining logical components of a document
US5821929A (en) Image processing method and apparatus
US20040213458A1 (en) Image processing method and system
US20070253620A1 (en) Automated method for extracting highlighted regions in scanned source
US20020006220A1 (en) Method and apparatus for recognizing document image by use of color information
US20070237401A1 (en) Converting digital images containing text to token-based files for rendering
US7623712B2 (en) Image processing method and apparatus
US20060221357A1 (en) Information processing apparatus and method
US20020003897A1 (en) Apparatus and method for image-processing and computer program product for image-processing
US20110252315A1 (en) Image processing device, image processing method and non-transitory computer readable storage medium
US7433517B2 (en) Image processing apparatus and method for converting image data to predetermined format
US20070230810A1 (en) Image-processing apparatus, image-processing method, and computer program used therewith
JP2004234656A (en) Method for reformatting document by using document analysis information, and product
US20110167081A1 (en) Image processing apparatus and image processing method
US7106330B2 (en) Drawing comparison apparatus
US7376272B2 (en) Method for image segmentation to identify regions with constant foreground color
US20100171999A1 (en) Image processing apparatus, image processing method, and computer program thereof
US7860266B2 (en) Image processing system and image processing method
US20100232690A1 (en) Image processing apparatus, image processing method, and computer program
US20030039394A1 (en) Image processing device, image processing method, image processing program, and computer readable recording medium on which image processing program is recorded
US20070133031A1 (en) Image processing apparatus and image processing method