JP2004287865A

JP2004287865A - Image forming apparatus and method

Info

Publication number: JP2004287865A
Application number: JP2003079310A
Authority: JP
Inventors: Masayoshi Sakakibara; 正義榊原; Seizoku Go; 青粟呉; Masatoshi Tagawa; 昌俊田川
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-03-24
Filing date: 2003-03-24
Publication date: 2004-10-14

Abstract

PROBLEM TO BE SOLVED: To speed up printing in a device that prints documents, such as HTML documents, in which data to be incorporated into the output image can be described as references to external resources.

SOLUTION: A data acquisition part 12 downloads the display content data of each document element that a document analysis part 13 detects while analyzing the document. A layout part 15 then allocates the document elements to pages. Some document elements may not have finished downloading through the data acquisition part 12 when they are laid out on a page. For such an element, a decision part 16 estimates the element's display size, for example from the amount of its data already downloaded, and judges whether that estimate fits in the free area of the page. If the decision part judges that the element does not fit, it decides to lay the element out on the next or a subsequent page. The layout of the current page is then finalized with the document elements up to the one immediately before it. Printing of each page starts as soon as its layout is finalized.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

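[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an apparatus that obtains the display content data of each document element of a document, either from the document's own description or from an external source, and uses that data to generate an output image and form the image.
[0002]
[Prior art]
Hypertext documents such as HTML, XML, and XHTML-print documents are printed more and more often, and printers are becoming able to accept and print hypertext documents directly. For example, Patent Document 1 discloses an apparatus in which a printer acquires a hypertext document, together with the data of each of its document elements, from servers on the Internet and prints them. This process, in which the printer itself downloads and prints the data of a hypertext document, is also called pull printing.
[0003]
When printing a hypertext document, the printer analyzes the document sequentially from the beginning. When it detects a description such as the source file name of data to be incorporated into the output, it sends a request for that file to a server on the Internet. The printer renders the file returned by the server into a bitmap, incorporates it into the output image, and prints this output image on a predetermined medium such as paper.
[0004]
For example, as described in Patent Document 2, once the data of all document elements of a hypertext document have been collected, the mapping of each element onto pages is determined and page-break processing is performed. The page information is created after that.
[0005]
The data of the document elements of a hypertext document are generally stored on several different servers on the Internet, and acquiring them takes some time. The acquisition time also varies with factors such as server response time, so data do not necessarily arrive in the order in which the acquisition requests were issued. As a result, the data of an element belonging to a later page often arrive before the data of an element belonging to an earlier page.
[0006]
Printing must proceed in order from the first page. Data for a later page that arrive first therefore cannot be printed at that point and must be held in the printer until all the data for the preceding pages are available. For this reason, conventional apparatuses generally downloaded all the data needed to print a hypertext document into the printer before starting to print. Printing a large hypertext document therefore required a large-capacity storage device in the printer, which increased cost.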
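[0007]
One way to solve this problem is to print pages in order, starting with pages for which all constituent elements have been acquired, even if other document elements of the document are still missing. The data making up each page can then be deleted once that page has been printed. This makes it possible to print without holding all the data at once and without waiting for all the data to be acquired.
[0008]
For example, consider printing the HTML document shown in FIG. 1(A). In this case, the HTML document is acquired, and then the image it contains is acquired from a server. When all elements have been acquired, they are laid out onto pages starting from the beginning.
[0009]
The layout process starts with the first document element of the HTML document, the text whose content is "text portion...". This text is laid out in area B1 of the first page in FIG. 1(B). Next, the image indicated by the tag "<IMG SRC=...>" is allocated and laid out in area B2 of the first page. Next, the text whose content is "text following the image..." is allocated and laid out in area B3 of the first page. This text element does not fit entirely on the first page, so its remainder is laid out in area B4 of the second page.
[0010]
FIG. 1(C) shows a layout example in which the display size of the acquired image is larger than in FIG. 1(B). As in the previous example, the text area C1 and the image area C2 are allocated. The text "text following the image..." is allocated next. There is no free area left on the first page that can hold it, so it is laid out in area C3, starting at the top of the second page.
[0011]
FIG. 1(D) shows an example in which the display size of the acquired image is even larger. As in the previous examples, the text area D1 is allocated. The image area is allocated next. As area D2 shows, the image is now larger and does not fit in the free area of the first page. A page break is therefore inserted, and the image is laid out in area D3 at the top of the second page. The text "text following the image..." is laid out in area D4, which follows area D3.
[0012]
With this kind of processing, consider a page laid out as in FIG. 1(C). When printing of that page is finished, the data of the first element (the text "text portion...") and of the next element (the image) can be discarded. Processing can then continue, reusing the storage area for the second and subsequent pages.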
[0013]
[Patent Document 1]
JP-A-11-134125
[Patent Document 2]
JP-A-2000-066867
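[0014]
[Problems to be solved by the invention]
In the method described above, the decision to place the image (B2, C2, D2) on the first page or push it out to the second page is made only after the image has been fully acquired and its display size is known. As a result, even when the image ends up on the second page, layout and printing of the first page must wait until the image has been acquired.
[0015]
In other words, the above method cannot decide whether a document element at the boundary between two pages belongs on the earlier page or the next one until that element's data have been fully acquired. Finalizing the layout of the earlier page must therefore wait for that data, and the start of printing is delayed as well.
[0016]
To solve this problem, an object of the present invention is to provide an image forming apparatus and method capable of printing hypertext documents faster.
[0017]
[Means for Solving the Problems]
An apparatus according to the present invention comprises the following means:
(a) analysis means for analyzing the description of a document that contains a plurality of document elements;
(b) data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
(c) layout means for determining, based on the analysis result, the layout of each document element on each output page. The layout means determines the layouts of the document elements in order. If the display content data of the element to be laid out on an output page have not yet been acquired, the layout means estimates the display size of that data. If the estimated display size does not fit within the not-yet-laid-out area of that output page, the layout means decides to lay the element out on the next page and finalizes the output page with the document elements up to the one immediately preceding it;
(d) output means for generating the image data of each output page whose layout the layout means has finalized, based on the display content data and layout of each document element on that page, and forming the image on a medium.
[0018]
In a preferred aspect of the present invention, the layout means estimates the display size of display content data not yet acquired on the basis of the data type of that display content data.
[0019]
In another preferred aspect, the display content data of the element to be laid out on an output page may still be only partly acquired. In that case, the layout means estimates the display size of the portion acquired so far from the data size of that portion. If that estimated size does not fit within the not-yet-laid-out area of the output page, the layout means decides to lay the element out on the next page. It then finalizes the output page with the document elements up to the one immediately preceding it.
[0020]
In another aspect of the present invention, an apparatus comprises the following means:
(a) analysis means for analyzing the description of a document that contains a plurality of document elements;
(b) data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
(c) layout means for determining, based on the analysis result, the layout of each document element on each output page. The layout means determines the layouts of the document elements in order. If the display content data of the element to be laid out on an output page have not yet been acquired, the layout means decides, based on the size or the aspect ratio of the not-yet-laid-out area, to lay that element out on the next page. It then finalizes the output page with the document elements up to the one immediately preceding it;
(d) output means for generating the image data of each output page whose layout the layout means has finalized, based on the display content data and layout of each document element on that page, and forming the image on a medium.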
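[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention (hereinafter, "embodiments") are described below with reference to the drawings.
[0022]
FIG. 2 schematically shows an example of the hardware configuration of a printer according to an embodiment of the present invention.
[0023]
This printer includes a CPU (central processing unit) 201, a ROM (read-only memory) 202, a RAM (random access memory) 203, an operation panel 204, a LAN (local area network) interface 205, a print engine 206, and a hard disk drive (HDD) 207.
[0024]
The CPU 201 runs the control programs stored in the ROM 202. In doing so, it controls each part of the printer and carries out the printer's processing operations. One of these programs describes how to divide documents such as HTML or XHTML-print documents into pages and print them; this page-division procedure is described in detail later. The RAM 203 is the work memory the CPU 201 uses while running programs. It holds, for example, data the printer has received (image data, HTML data, etc.) and bitmap data rendered from them. The operation panel 204 is the user interface for setting print conditions on the printer. It has, for example, a touch-panel liquid crystal display and various input buttons. The LAN interface 205 handles communication with devices on the LAN, and with devices on the Internet via the LAN. The print engine 206 receives the bitmap data to be printed on instruction from the CPU 201 and prints it on a medium such as paper. The hard disk drive (HDD) 207 is rewritable non-volatile storage. It stores setting data and programs configured by users or service engineers, and it also serves as a save area for data that overflow the RAM 203.
[0025]
FIG. 3 shows an example configuration of a system to which the printer 302 is applied. In this example, a client PC (personal computer) 301 and the printer 302 are connected to a LAN 303. The client PC 301 and the printer 302 are connected to the Internet 304 via the LAN 303. A server 305 resides on the Internet 304.
[0026]
In the system of FIG. 3, the client PC 301 or a mobile device (not shown) may, for example, send the printer 302 a request to print an HTML document together with the HTML document data. The printer 302 then analyzes the received HTML data, generates bitmap data, and prints it on a medium. An HTML document may contain objects (document elements) indicated by a URL (Uniform Resource Locator) or the like. In that case the printer 302 follows the URL, downloads the object's data from the server 305 on the Internet 304 where the object resides, and uses it to generate the bitmap data. Sending an HTML document as-is to the printer 302 for printing in this way is also called direct printing. A print instruction can also be given by entering the URL of an HTML document directly on the operation panel 204 of the printer 302, or by sending that URL to the printer 302 from the client PC 301 or a mobile device (not shown) (pull printing). In this case, the printer 302 downloads the HTML document from the Internet 304 according to the given URL and prints it.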
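[0027]
The printer 302 of this embodiment downloads data held on other servers, as described above, and incorporates them into the output image for printing. Some document elements on a page may have display content data whose acquisition is not yet complete (hereinafter "unacquired"). For each such element, the printer decides whether to exclude it from the page. This reduces waiting for unacquired elements and makes printing faster.
[0028]
The document data processing mechanism 10 in the printer 302 is described with reference to FIG. 4. The processing mechanism 10 of FIG. 4 is realized by the CPU 201 executing programs and setting data stored in the ROM 202 or the HDD 207.
[0029]
In this processing mechanism 10, the main control unit 11 performs overall control of the processing mechanism 10. The data acquisition unit 12 acquires data over a network such as the LAN 20 or the Internet, in response to data acquisition requests from the main control unit 11. The document analysis unit 13 analyzes an input HTML document, or a MIME (Multipurpose Internet Mail Extensions) document that contains an HTML document. It detects the individual document elements and determines the logical structure they form. The document configuration information management unit 14 creates and manages document configuration information based on the detected document elements and logical structure (details are given later).
[0030]
The content data of a document element are called its display content data, meaning the content shown in the printed result. For some document elements, these data are already included in the MIME document together with the HTML document. Other elements are external references: the element's description in the HTML document contains link information, such as a URL, pointing to the display content data. For external references, the document analysis unit 13 asks the data acquisition unit 12, via the main control unit 11, to download the display content data. The RAM 203, which serves as work memory, stores several kinds of data:
- the HTML document data;
- the display content data of each document element detected while the HTML or MIME document is analyzed from the beginning;
- the document configuration information, which is created and updated step by step as the analysis progresses.
Display content data obtained from within a MIME document, or by external download, are often written in a description language such as HTML or a page description language, or are encoded or compressed. Such data must be rendered into an image data format the print engine 206 can handle (for example, a bitmap image). When this rendering is done in software, the RAM 203 is used as its working area.
[0031]
The layout unit 15 determines the area in which each document element is displayed, page by page starting from the first page. It does so based on the document configuration information and on the display content data of each element (or the data rendered from it). In other words, it allocates the display area of each document element to pages in order from the beginning; this is the layout process. Based on the resulting layout, a rendering unit (not shown) generates the image of each page, starting from the first page, in a page buffer (not shown) that is separate from the RAM 203. The page images are passed to the print engine 206 and printed on paper.
[0032]
When a page is being laid out, its layout sometimes cannot be finalized because some of its document elements have not yet been acquired. The decision unit 16 handles this case. It decides whether to wait until the unacquired element has been acquired and then lay it out, or to exclude that element from allocation and continue laying out the page. The decision is based on the partial layout of the elements acquired so far, on the document configuration information described above, and so on.
[0033]
The display area estimation unit 17 is a processing module that estimates the display size of a document element whose acquisition is still in progress. The decision unit 16 uses these estimates to decide whether to wait for an unacquired element. After the acquired elements have been laid out, some area remains on the page; this remaining area is the candidate area for the element still being acquired. If the size estimated by the display area estimation unit 17 is larger than this remaining area, the element is not laid out on this page, and page layout continues.
[0034]
FIG. 5 is a flowchart showing the processing procedure of the document analysis unit 13 and the document configuration information management unit 14 in the document data processing mechanism 10.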
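[0035]
Pull printing, for example, starts when the URL of the HTML document to print is entered from the client PC 301 or the operation panel 204. The data acquisition unit 12 then sends a request for the document data at that URL onto the LAN 20, for example as an HTTP/1.1 GET request. A server on the LAN 20, or on the Internet connected to it, returns the document data in response, and the data acquisition unit 12 acquires them (S100). The acquired document data are stored in the RAM 203. Once the document data have been acquired, information about the document is registered in the document configuration information management unit 14 (S101); the content of this information is described later with reference to FIGS. 6 and 7.
[0036]
Next, the document analysis unit 13 analyzes the document data sequentially from the beginning (S102). The document data are parsed in order. If the end of the document has not been reached (S103), parsing continues until a tag is detected that marks a document element (S104). When a document element is found, its information is registered in the document configuration information management unit 14. The new element's information is also reflected in any already registered document or element information that relates to it (S105).
[0037]
If the document element is text embedded in the HTML document, that text is recognized as the element's display content data, and information giving its location is registered in the document configuration information management unit 14. For such a text element, the display content data are already in the RAM 203 when the element is detected.
[0038]
In other cases, the HTML document describes a document element as link information pointing to its actual data (display content data). An example is an IMG tag holding the URL of an image to be incorporated into the document. The data of such an element must be acquired, for example by downloading. The document analysis unit 13 therefore checks whether the description of the detected element contains link information to data stored elsewhere, such as the URL of image data in an IMG tag (S106). It then acquires the data according to that link information (S107). If the link information is an external URL, the data acquisition unit 12 downloads the data at that URL from a server on the Internet. The link information may instead point to image data embedded as coded information within the MIME document containing the HTML document, or within an XHTML-print document (an inline image). In that case, analysis of the MIME or XHTML-print document continues until the code data of that inline image are found.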
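Steps S104 to S106 can be illustrated with a minimal sketch, assuming that a plain pass over the HTML is enough to find text runs and IMG links. The sketch uses Python's standard `html.parser`. The class name `ElementScanner`, the function `scan`, and the `("text" | "link", payload)` tuples are hypothetical illustrations, not the printer's actual parser.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ElementScanner(HTMLParser):
    """Rough stand-in for S104-S106: lists text runs and IMG links in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <IMG SRC=...> matches too
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # S106: link information found; the content must be fetched (S107)
                self.elements.append(("link", src))

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Embedded text: its display content data are already in memory ([0037])
            self.elements.append(("text", text))


def scan(html: str) -> List[Tuple[str, str]]:
    scanner = ElementScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.elements
```

For the FIG. 1(A)-style document `<BODY>text portion...<IMG SRC="http://aaa.bbb.jp/image.jpg">text following the image...</BODY>`, this yields a text element, a link element, and a second text element, in that order. A real implementation would also handle CSS, tables, and explicit width and height attributes.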
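[0039]
FIG. 5 shows the sequence of processes run when one piece of document data is acquired. Here, the HTML data and each piece of data referenced from it (such as an image) are each treated as one piece of document data. So when S107 requests linked data, a separate FIG. 5 process is started for that data. It runs concurrently with the requesting process and with any others, as the dashed loop in FIG. 5 indicates. Data may come from the Internet, and it is not known when each piece will arrive, so each piece is acquired asynchronously in this way. For linked data requested in S107, the process dedicated to that data acquires the display content data and registers information about them in the document configuration information management unit 14.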
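The concurrent, order-independent acquisition described in [0005] and [0039] can be shown with a small `asyncio` sketch. It assumes that each download can be modeled as a coroutine whose delay stands in for server response time. The functions `fetch` and `fetch_all` and the delay values are hypothetical; no real network access takes place.

```python
import asyncio
from typing import List, Tuple


async def fetch(did: str, delay: float) -> str:
    # Stand-in for one download process; `delay` models server response time
    await asyncio.sleep(delay)
    return did


async def fetch_all(requests: List[Tuple[str, float]]) -> List[str]:
    """Start every download at once and record the order in which they finish."""
    tasks = [asyncio.create_task(fetch(did, delay)) for did, delay in requests]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)   # a real printer would register the data here
    return finished
```

For example, `asyncio.run(fetch_all([("D-002", 0.2), ("D-003", 0.0)]))` returns `["D-003", "D-002"]`. D-003 finishes first even though D-002 was requested first, which is exactly why layout cannot assume in-order arrival.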
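[0040]
Next, the layout unit 15 lays out the document elements registered in the document configuration information management unit 14 (S108). This processing is described in detail later.
[0041]
Next, the document configuration information managed by the document configuration information management unit 14, and the procedure for creating it, are described with reference to FIGS. 6 and 7.
[0042]
FIG. 6(a) schematically shows the description of an input HTML document, together with how each part of the description maps to the data management items in the document configuration information management unit 14. FIG. 6(b) shows the data management structure that the document configuration information management unit 14 uses for the HTML document of (a). FIG. 7 conceptually shows a snapshot, taken at a certain point in time, of the document configuration information managed by the document configuration information management unit 14.
[0043]
The entire HTML description shown in FIG. 6(a) forms one piece of document data. This document data is given the unique identifier D-001, and the corresponding management information is created (see (a1) in FIG. 7). This management information is stored in the RAM 203 so that it can be looked up by the identifier D-001. The document configuration information management unit 14 assigns D-001 when a user or the client PC 301 inputs the URL to be printed, "http://xxx.yyy.jp/index.htm", to the printer 302. When the data acquisition unit 12 then downloads the HTML document at this URL, the document configuration information management unit 14 creates the management information for that document data. It registers in it whatever information about the document data is known at that point.
[0044]
For example, in FIG. 7(a1), the management information created for HTML document D-001 has six items:
(1) identifier (DID: document ID);
(2) data name (file name) of the document;
(3) path giving the location of the document data (extractable from the URL together with the file name);
(4) data size of the document data (known when the download completes);
(5) acquisition time (time taken for the download);
(6) data element (a pointer to the management information of the first document element in the document).
For an HTML document, every item except item (6), "data element", is known when the download completes, so those values are registered at that point.
[0045]
When the body of this HTML document (<BODY> ... </BODY>) is parsed from the beginning, the first thing detected is a document element (element) made of text data. The document configuration information management unit 14 assigns this element the unique identifier E-001001 and creates the corresponding management information (see (b1) in FIG. 7). This management information is stored in the RAM 203 so that it can be looked up by the identifier E-001001.
[0046]
For example, in FIG. 7(b1), the management information created for document element E-001001 has five items:
(1) identifier (EID: element ID);
(2) element type (text, table, list, image, link (reference information), etc.);
(3) rendering size (known once the content data are determined);
(4) next element (a pointer to the management information of the element that follows this one in the document data);
(5) content (a pointer to the element's display content data).
The element type is known as soon as the text element E-001001 is detected, so it is registered at that point. For text, layout conditions may be specified in advance: page size, margins, font type and size, character pitch, and so on, possibly in more detail through CSS (Cascading Style Sheets) or similar. In that case the rendering size can be computed as soon as the text data are acquired, so it can be registered as well.
[0047]
The text data that make up the content of element E-001001 are contained in HTML document D-001. They are therefore already available when element E-001001 is recognized, for example when its end tag is reached. The document configuration information management unit 14 assigns this content data (text) the identifier B-001001 (BID: block ID) and stores it in the RAM 203 so that it can be looked up by B-001001 (see (c1) in FIG. 7). A pointer to this content data (B-001001) is then registered in item (5), "content", of the management information of element E-001001 (see (b1) in FIG. 7).
[0048]
The "next element" item of the management information of element E-001001 is not known until analysis of document data D-001 progresses further, so it is left blank.
[0049]
Once the first document element E-001001 has been detected in document data D-001, the document configuration information management unit 14 registers a pointer to it in item (6), "data element", of the management information of document data D-001 (see (a1) in FIG. 7).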
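The management information of FIG. 7 can be modeled with simple records, as a minimal sketch assuming that one record per DID and per EID, plus a map of BIDs to raw bytes, is enough. The field names follow items (1) to (6) of [0044] and (1) to (5) of [0046]. The class names, the `ConfigStore` registry, and the use of `None` for "(undetermined)" and "(not acquired)" are hypothetical choices, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DocumentInfo:            # FIG. 7 (a1): one record per DID
    did: str
    name: str
    path: str
    size: Optional[int] = None            # known when the download completes
    fetch_time: Optional[float] = None    # seconds the download took
    first_element: Optional[str] = None   # "data element": EID of the first element


@dataclass
class ElementInfo:             # FIG. 7 (b1): one record per EID
    eid: str
    kind: str                             # text, table, list, image, link, ...
    draw_size: Optional[Tuple[float, float]] = None  # None = "(undetermined)"
    next_element: Optional[str] = None    # EID of the following element
    content: Optional[str] = None         # BID or DID pointer; None = "(not acquired)"


@dataclass
class ConfigStore:             # hypothetical registry looked up by DID / EID / BID
    documents: Dict[str, DocumentInfo] = field(default_factory=dict)
    elements: Dict[str, ElementInfo] = field(default_factory=dict)
    blocks: Dict[str, bytes] = field(default_factory=dict)


# Registering the first text element of D-001, as in [0043]-[0049]
store = ConfigStore()
store.documents["D-001"] = DocumentInfo("D-001", "index.htm", "http://xxx.yyy.jp/")
store.blocks["B-001001"] = "text portion...".encode()
store.elements["E-001001"] = ElementInfo("E-001001", "text", content="B-001001")
store.documents["D-001"].first_element = "E-001001"
```

The "next element" of E-001001 stays `None` until the parser reaches the following element, which mirrors the blank field described in [0048].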
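[0050]
The above describes the processing when the first element of the document, E-001001, is detected. As analysis of HTML document D-001 continues, the next element detected is an image, indicated by an <IMG> tag. The document configuration information management unit 14 assigns this element the unique identifier E-001002 and creates the corresponding management information (see (b2) in FIG. 7). It is now known that E-001002 is the element following E-001001, so a pointer to E-001002 is registered in the management information of E-001001.
[0051]
The content of element E-001002 is given as a link description, in this example the URL "http://aaa.bbb.jp/image.jpg". The "element type" of its management information therefore records that the element content is a link, and the data name and path are registered from this link. The "rendering size" cannot be known until the linked data are acquired, so at this point it is "(undetermined)".
[0052]
When a document element whose content data are given by a link to external data is detected, a process is started to download the linked data, as described above (S107 in FIG. 5). At the same time, the document configuration information management unit 14 assigns the unique identifier D-002 to the data to be downloaded and creates its management information. Once the process has downloaded the data, the unit registers information about them in that management information (see (a2) in FIG. 7). This embodiment thus treats two things as conceptually distinct:
- data downloaded from outside the printer 302, which are given a DID and handled as document data;
- document elements contained in the description of the HTML document, which are given an EID and handled as elements.
The image data downloaded through the link of element E-001002 are therefore first given the document data identifier D-002, and management information (a2) is created. Until the download finishes, "size" and "acquisition time" are "(undetermined)".
[0053]
The downloaded data D-002 are image data, so the document configuration information management unit 14 also gives them the unique identifier E-001003. It handles them as a document element and creates management information ((b3) in FIG. 7). It is already known that the "element type" of E-001003 is image. The rendering size, however, is not known until, for example, the compressed data have been decompressed, so its "rendering size" is "(undetermined)". The "content" is likewise not fixed until the download finishes, so it is "(not acquired)". Both items are registered once the download has finished.
[0054]
When the download finishes, the downloaded data size and the time taken are known, and they are registered in the "size" and "acquisition time" of D-002. The downloaded image data themselves are stored under a block ID (BID), and this block ID is registered in the "content" of E-001003. The image size is determined from the attribute information of the downloaded data and registered in the "rendering size".
[0055]
The link description of a document element may instead point to data embedded in the HTML document, such as an inline image (hereinafter "inline data"). Such data can be managed in the same way, with a hierarchy of management information (document data, document element data, and data entities) that starts from the linked data. In this case, for example, the identifier of the data given in the link description can be registered as the "data name" of the document-data-level management information. Since no download is needed, the document-data-level management information can also be omitted. Whether a link target lies outside or inside the HTML document can basically be judged as follows:
- if the link target is written as a URL, it lies outside;
- if it is written in the "cid:" (Content-ID) form, which MIME and XHTML-print documents use to refer to attached data, it lies inside.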
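The inside/outside rule of [0055] amounts to a scheme check. The following sketch assumes that "written as a URL" means anything whose scheme is not `cid`, relative references included. `link_location` is a hypothetical helper name; `urllib.parse.urlsplit` is the standard-library parser.

```python
from urllib.parse import urlsplit


def link_location(ref: str) -> str:
    """Classify a link as 'inline' (cid: reference to attached data) or 'external'."""
    scheme = urlsplit(ref).scheme.lower()
    if scheme == "cid":
        return "inline"    # attached part of the MIME / XHTML-print document: no download
    return "external"      # e.g. an http URL: start a download process (S107)
```

For example, `link_location("cid:part1.abc@example.com")` gives `"inline"` and `link_location("http://aaa.bbb.jp/image.jpg")` gives `"external"`. Only the external case needs a DID-level record for a download.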
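[0056]
Returning to the analysis of document data D-001: after element E-001002, another document element indicated by an <IMG> tag is detected. The document configuration information management unit 14 manages this element under identifier E-001004 ((b4) in FIG. 7), and the data to be downloaded from the link target in its tag under identifier D-003. A pointer to the next element, E-001004, is registered in the management information of element E-001002. The image data in D-003 are managed as a document element with identifier E-001005 ((b5) in FIG. 7). The actual image data for this element already exist at this point, so they are managed under identifier B-001002 ((c2) in FIG. 7).
[0057]
As analysis of document data D-001 continues, a text element is detected after element E-001004. It is managed under identifier E-001006, and a pointer to its text entity data B-00100x is registered in its management information ((b6) in FIG. 7).
[0058]
The above has described how the HTML document of FIG. 6(a) is analyzed and how document configuration information is built from it, using FIG. 7 as a concrete example. The collection of management information for the document data and document elements shown in FIG. 7 is the document configuration information. FIG. 6(b) uses arrows to show how the management information and entity data created or acquired during the analysis of FIG. 6(a) relate to one another. Solid-line blocks represent management information and entity data of document data and document elements. Dashed-line blocks represent links to outside the HTML document.
[0059]
In this way, the document configuration information is created by the analysis process of FIG. 5 and updated as the analysis and download processes progress. In this embodiment, the main control unit 11 and the layout unit 15 run the page layout processing for print output in parallel with this analysis. This processing is described with reference to FIG. 8.
[0060]
FIG. 8 is a flowchart showing an example of the layout processing in the layout unit 15. This processing can run in the middle of document analysis, as in S108 of FIG. 5. It can also be invoked whenever part of the display content data of an element being acquired arrives, or invoked periodically.
[0061]
The processing of FIG. 8 may be suspended before every document element of the document to be printed has been laid out. When it is suspended, information about its progress is retained. When it is invoked again, layout resumes from the retained information.
[0062]
When this processing is invoked, the layout unit 15 first checks whether there is page information p whose processing was suspended midway (S201). Page information p indicates the document elements to be laid out on the page currently being laid out. If S201 finds no page in progress, new page information is created and set as p (S202).
[0063]
Next, the document configuration information is searched for the first document element that has not yet been laid out, and that element is set as e (S203). A document element may itself consist of another document (for example, (b2) in FIG. 7). In that case the elements making up that document (for example, (a2) in FIG. 7) are searched recursively. Next, the document configuration information is checked to see whether the display content data of element e have been acquired (S204). For example, the "content" field of the element's management information (FIG. 7) is checked:
- if its value is "not acquired", the display content data have not been acquired;
- if it holds a pointer, and following that pointer leads to display content data identified by a BID, the data have been acquired;
- if the data pointed to hold a further pointer in their own "content" field, the pointers are followed recursively before the judgment is made.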
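The pointer-following check of S204 can be sketched as follows. The sketch assumes that the FIG. 7 records are reduced to three lookups: EID to its "content" pointer, DID to its first element, and the set of stored BIDs. The `B-`/`D-`/`E-` prefixes mirror FIG. 7, while the function name and the dict layout are hypothetical.

```python
from typing import Dict, Optional, Set


def is_acquired(ptr: Optional[str],
                elements: Dict[str, Optional[str]],
                documents: Dict[str, Optional[str]],
                blocks: Set[str]) -> bool:
    """S204 sketch: follow content pointers until a stored block (BID) is reached.

    elements:  EID -> value of its "content" field (None = not acquired)
    documents: DID -> EID of its first element (None = not known yet)
    blocks:    BIDs whose entity data are already stored
    """
    seen = set()
    while ptr is not None and ptr not in seen:
        seen.add(ptr)                 # guard against accidental pointer cycles
        if ptr.startswith("B-"):
            return ptr in blocks      # reached display content data
        if ptr.startswith("D-"):
            ptr = documents.get(ptr)  # descend into the linked document
        elif ptr.startswith("E-"):
            ptr = elements.get(ptr)   # read that element's "content" field
        else:
            return False
    return False
```

With the FIG. 7 snapshot (E-001002 pointing to D-002, whose element E-001003 is still "(not acquired)"), `is_acquired("E-001002", ...)` returns `False`. E-001004 resolves through D-003 and E-001005 to B-001002 and returns `True`.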
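[0064]
If this check shows that the display content data of the element have not been acquired, processing goes to S212. If they have been acquired, the display size of the element's display content data is calculated (S205). It is then checked whether this display size can be laid out in the free area of the current page p (S206). The free area is the area left after laying out the acquired elements that precede this one. The check can be made by determining whether the calculated display size fits within the free area.
[0065]
If it can be laid out, the document element is laid out on the current page p (S211).
[0066]
If S206 finds that layout is not possible, it is checked whether element e can be split for layout (S207). If it can, element e is split to match the free area and the first part is laid out on the current page p. The remaining part of e becomes the new document element e (S208). Since the free area of the current page p is now filled, the layout of p is finalized. The finalized page information is passed directly to print processing (or retained for later processing). New page information for the next page is then created and set as the current page p (S209). Processing then returns to S206 and continues with the remaining part of e.
[0067]
If S207 finds that element e cannot be split, the layout of the current page p is finalized and its page information is passed directly to print processing (or retained for later processing). New page information for the next page is then created and set as the current page p (S210), and element e is laid out on it (S211).
[0068]
Whether element e can be split in S207 is judged, for example, from its type. A text element may be judged splittable and an image element unsplittable. Such judgment rules can be registered in the apparatus in advance. This processing only checks whether an area can be split, but checks for other operations, such as reducing an area, may be added. If an element is judged reducible, for example, its display content data can be scaled down to fit the free area of the page and laid out there. Like the splittability check, this check can be automated by registering rules in advance. Rules based on element type might include, for example: image elements may be reduced; text elements with a specified font size may not be reduced.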
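The type-based rules of [0068] can be expressed as a small rule table plus two predicates. This sketch assumes one-dimensional heights for simplicity. The `Element` record, the `SPLITTABLE` table, `shrink_factor`, and the treatment of element types the source does not mention (treated as neither splittable nor shrinkable) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    kind: str                       # "text", "image", ...
    height: float                   # display height in layout units
    font_size_fixed: bool = False   # True if the document specifies a font size


# Splittability rule registered in advance (examples from [0068])
SPLITTABLE = {"text": True, "image": False}


def can_split(e: Element) -> bool:
    return SPLITTABLE.get(e.kind, False)   # unknown types: not splittable (assumption)


def can_shrink(e: Element) -> bool:
    if e.kind == "text" and e.font_size_fixed:
        return False               # text with a specified font size: keep as is
    return e.kind == "image"       # images: yes; other cases: no (assumption)


def shrink_factor(e: Element, free_height: float) -> Optional[float]:
    """Scale factor that makes e fit the free height, or None if shrinking is not allowed."""
    if not can_shrink(e) or e.height <= 0:
        return None
    return min(1.0, free_height / e.height)
```

A 400-unit image facing a 100-unit free area would be scaled by 0.25. A fixed-size text element gets `None` and goes through the split path of S207 and S208 instead.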
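[0069]
If S204 finds that acquisition of the display content data of element e is not complete, it is decided whether layout of the current page p should wait until that data has been acquired (S212). This decision processing is described in detail later.
[0070]
If S212 decides to wait until acquisition is complete, this processing ends (or is suspended). If it decides not to wait, the layout of the current page p is finalized according to that decision, and page information p is passed directly to print processing (or retained for later processing). New page information for the next page is then created and set as the current page p (S213), and this processing ends (is suspended).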
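The FIG. 8 flow (S201 to S213) can be condensed into a resumable loop. This is a minimal sketch that assumes one-dimensional element heights, a `None` height for unacquired data, and a caller-supplied `wait_for` callback standing in for the S212 decision. All names are hypothetical. One rule is added here, not stated in the source: an empty page always waits, because excluding the element would only produce a blank page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Elem:
    name: str
    height: Optional[float]        # None = display content data not yet acquired
    splittable: bool = False


@dataclass
class LayoutState:                 # progress retained between invocations ([0061])
    page_height: float
    next_index: int = 0
    current: List[str] = field(default_factory=list)          # page information p
    used: float = 0.0
    finished: List[List[str]] = field(default_factory=list)   # finalized pages


def finalize(st: LayoutState) -> None:
    st.finished.append(st.current)   # hand the finalized page to printing
    st.current, st.used = [], 0.0    # new page information p (S209/S210/S213)


def layout(st: LayoutState, elems: List[Elem],
           wait_for: Callable[[Elem, float], bool]) -> None:
    """One invocation of a FIG. 8-style pass; returns when it must suspend."""
    while st.next_index < len(elems):
        e = elems[st.next_index]                        # S203
        if e.height is None:                            # S204: not acquired
            # An empty page gains nothing from excluding e, so it always waits.
            if not st.current or wait_for(e, st.page_height - st.used):  # S212
                return                                  # suspend until more data arrive
            finalize(st)                                # S213: page fixed without e
            return
        remaining = e.height                            # S205
        while remaining > st.page_height - st.used:     # S206: does not fit
            free = st.page_height - st.used
            if e.splittable and free > 0:               # S207 -> S208
                st.current.append(e.name + " (part)")
                remaining -= free
                finalize(st)                            # S209
            elif st.current:                            # S207 -> S210
                finalize(st)
            else:
                break          # oversized unsplittable element on an empty page
        st.current.append(e.name)                       # S211
        st.used += remaining
        st.next_index += 1
```

A possible usage: lay out a 30-unit text, then reach an unacquired image. If `wait_for` returns `False`, page 1 is finalized with the text alone and printing can begin. A later call, made once the image has arrived, continues on page 2.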
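[0071]
S204 uses completion of acquisition of the display content data of element e as its criterion. In some cases, however, the display size can be calculated before acquisition is complete, for example when the tag of an image element states the image's display size explicitly. To cover such cases, S204 may instead use "the display size can be calculated" as its criterion. Even then, once the page layout is finalized, printing is held until the element's display content data have actually been acquired.
[0072]
Next, a detailed example of the processing in S212 is described with reference to FIGS. 9 and 10. FIG. 9 is a flowchart showing an example of how it is decided whether an unacquired document element is laid out on the page currently being processed. FIG. 10 shows an example of a free area available for layout.
[0073]
FIG. 10(A) shows a page in progress (A11) that still has a free area. A page margin (blank area), shown hatched, runs around the edge of this page. The part of page (A11) inside this margin is the page's layout-capable area (A12). In the illustrated example, two elements whose display content data have been acquired are already laid out in the layout-capable area (A12), in the order in which the HTML document describes them: a text element (area A13) and an image element (area A14).
[0074]
The processing of FIG. 9 first determines the free area available for layout on the current page (S301). For example, as shown in FIG. 10(B), the free area may be taken to be the shaded rectangle (B11) between two boundaries:
- the horizontal line along the bottom edge of the image element (A14), the last element laid out;
- the end of the page's displayable area.
In the example of FIG. 10, there is also a large empty space to the right of the image element (A14). The shaded rectangle to the right of the image element (C11) may therefore be taken as the free area instead, as shown in (C). Both shaded areas (B11) and (C11) may also be treated as free areas at the same time. In that case, when it is later checked whether the element of interest fits in a free area, the check is made for each of the two shaded areas:
- a shaded area in which the element fits becomes its layout location;
- if the element fits in both, a separately defined criterion decides which one to use.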
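The two candidate free areas of FIG. 10 can be computed from the layout-capable area and the bounding box of the last laid-out element. The sketch assumes page coordinates with y growing downward, and it assumes that area C11 spans only the height of the image element; the figure itself is not reproduced here. The "larger area wins" tie-break is a hypothetical placeholder for the "separately defined criterion".

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    x: float
    y: float      # top edge; y grows downward as on a page
    w: float
    h: float


def free_areas(layout_area: Rect, last: Rect) -> List[Rect]:
    """Candidate free areas after the last laid-out element (cf. FIG. 10 (B), (C))."""
    bottom = last.y + last.h
    below = Rect(layout_area.x, bottom, layout_area.w,
                 layout_area.y + layout_area.h - bottom)        # like B11
    right_x = last.x + last.w
    beside = Rect(right_x, last.y,
                  layout_area.x + layout_area.w - right_x, last.h)  # like C11 (assumed height)
    return [r for r in (below, beside) if r.w > 0 and r.h > 0]


def pick_area(candidates: List[Rect], w: float, h: float) -> Optional[Rect]:
    """Return a candidate the element fits in; tie-break by larger area (assumption)."""
    fitting = [r for r in candidates if w <= r.w and h <= r.h]
    return max(fitting, key=lambda r: r.w * r.h, default=None)
```

With a layout area of `Rect(10, 10, 180, 260)` and an image at `Rect(10, 60, 80, 100)`, the candidates are a 180 x 110 strip below the image and a 100 x 100 block to its right.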
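[0075]
Once the free area has been determined (here, assume it is the shaded area (B11)), its size is calculated. Element e is then excluded from the current page p, that is, not laid out there (S307), in either of two cases:
- the free area is no larger than a predetermined size (S302: Y);
- the free area's aspect ratio lies outside a predetermined range (S303: Y).
For example, S302 may use one sixth of the page's layout-capable area (A12) as the "predetermined size", and S303 may use an aspect ratio of 1/5 to 5/1 as the "predetermined range". When the only free space is too narrow for a document element (narrow vertically or horizontally), these checks allow steps such as estimating the element's display size (S304) to be skipped.
[0076]
If neither S302 nor S303 excludes the element, the display size of element e is estimated next (S304; details are given later). It is then checked whether this estimated display size exceeds the size of the free area (B11) (S305). If it does, element e is not laid out on the current page p, that is, it is excluded from p (S307). If S305 finds that the estimate does not exceed the free area, element e is laid out on the current page p, that is, it is not excluded (S306).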
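S301 to S307 reduce to three short-circuiting checks. The sketch assumes that "size" means area (width times height) and uses the example thresholds from [0075]: one sixth of the layout-capable area, and an aspect ratio between 1/5 and 5/1. The estimator is passed in as a callable so that S304 is skipped whenever S302 or S303 already decides the outcome. The function name and its signature are hypothetical.

```python
from typing import Callable


def exclude_from_page(free_w: float, free_h: float,
                      layout_w: float, layout_h: float,
                      estimate: Callable[[], float]) -> bool:
    """Return True if element e should be pushed to the next page (S307)."""
    free_size = free_w * free_h
    # S302: free area no larger than 1/6 of the layout-capable area (A12)
    if free_size <= (layout_w * layout_h) / 6:
        return True
    # S303: aspect ratio of the free area outside 1/5 .. 5/1
    aspect = free_w / free_h
    if not (1 / 5 <= aspect <= 5):
        return True
    # S304/S305: estimated display size larger than the free area
    return estimate() > free_size    # False corresponds to S306
```

A 180 x 40 strip on a 180 x 260 layout area is rejected by S302 before any estimate is made. A 180 x 110 area defers to the estimator, so an element estimated at 25,000 square units is pushed to the next page.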
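[0077]
One way to estimate the display size in S304 is to derive it uniformly from an attribute of element e. If the attribute used is the element's data type, a size value registered in advance for each data type serves as the estimate: display size X for text, display size Y for an image, and so on.
[0078]
Alternatively, the data size of the display content data of element e may be known, for example from the protocol exchange at the start of data acquisition. In that case, that data size can be converted into a display size and used as the estimate. The conversion factor from data size to display size can be defined for each data type.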
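The two static estimates of [0077] and [0078] can be combined into one function. This sketch assumes that the announced data size is available as a byte count, for example from an HTTP `Content-Length` header; the source says only "the protocol". Every numeric value, the MIME-type keys, and the rule that an unknown type gets the larger default are hypothetical placeholders for the source's unspecified X, Y, and per-type factors.

```python
from typing import Optional

# Hypothetical per-type defaults (the source only calls them X, Y, ...), in square units
DEFAULT_SIZE = {"text": 20_000.0, "image": 60_000.0}
# Hypothetical conversion factors from bytes to display area, per data type
BYTES_TO_AREA = {"image/jpeg": 1.0, "image/bmp": 0.25, "text/plain": 5.0}


def estimate_display_size(kind: str, mime: str,
                          content_length: Optional[int]) -> float:
    """S304 sketch: use the announced data size if known, else a per-type default."""
    if content_length is not None and mime in BYTES_TO_AREA:
        return content_length * BYTES_TO_AREA[mime]          # [0078]
    # [0077]; an unknown type gets the larger default (assumption)
    return DEFAULT_SIZE.get(kind, max(DEFAULT_SIZE.values()))
```

With these placeholder numbers, a 30,000-byte JPEG is estimated at 30,000 square units and a bitmap of the same byte count at 7,500, reflecting the idea that compressed formats cover more area per byte.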
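[0079]
Another method is to determine the display size dynamically, taking the data acquisition status into account. An example is described with reference to FIG. 11.
[0080]
FIG. 11 is a flowchart showing an example of how the display area estimation unit 17 dynamically estimates the display size from the time spent acquiring the display content data of element e. First, the elapsed time t since acquisition of the display content data of element e started is obtained (S401). The data acquisition unit 12 keeps this elapsed time for each document element being acquired.
[0081]
Next, the current data communication speed s is obtained (S402). This speed can be determined, for example, from the type of communication device (or line) currently used to acquire the display content data: 56 kbps for an analog modem, 8 Mbps for an ADSL (Asymmetric Digital Subscriber Line) modem, and so on. Alternatively, the data acquisition unit 12 may monitor the acquisition and compute a measured value of s. In that case, the current speed s can be estimated from the data sizes of the display content data of the last few document elements and the time it took to acquire them.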
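Both ways of obtaining s from [0081] fit in a few lines. The sketch assumes that s is expressed in bytes per second, so that the later product s * t is a byte count, and that a measured rate, when available, takes priority over the nominal line rate. The measured rate is computed as total bytes over total seconds of the last few downloads, which is one reasonable reading of "from the data sizes ... and the time taken". The function names and the `last_n=3` default are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Nominal line rates from [0081], converted to bytes per second (bits / 8)
NOMINAL_BPS = {"analog_modem": 56_000 / 8, "adsl": 8_000_000 / 8}


def measured_speed(history: Iterable[Tuple[int, float]],
                   last_n: int = 3) -> Optional[float]:
    """Aggregate bytes/s over the last few completed downloads (size_bytes, seconds)."""
    recent = list(history)[-last_n:]
    total_bytes = sum(size for size, _ in recent)
    total_secs = sum(secs for _, secs in recent)
    return total_bytes / total_secs if total_secs > 0 else None


def current_speed(device: str, history: Iterable[Tuple[int, float]] = ()) -> float:
    """S402 sketch: prefer a measured rate, fall back to the device's nominal rate."""
    measured = measured_speed(history)
    return measured if measured is not None else NOMINAL_BPS[device]
```

Without any download history, an ADSL line is assumed to deliver 1,000,000 bytes/s. After a single 4,000-byte transfer that took 2 seconds, the measured 2,000 bytes/s is used instead.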
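[0082]
Next, the display size calculation coefficient r for element e is obtained (S403). The coefficient r converts data size into display size and can be defined, for example, for each data type. The coefficients for each data type can be collected in advance into a table like the one in FIG. 12 and registered in the apparatus; the coefficient for element e is then read from this table. For example, for the same amount of acquired data, a JPEG image yields a larger display size than an ordinary bitmap image, because JPEG data are compressed. The coefficient r can be chosen with such tendencies in mind.
[0083]
The estimated display size w of element e is then calculated from these values, for example by equation (1) (S404). Here s * t approximates the amount of data received so far, and r converts that amount into a display size.
w = r * s * t ... (1)
[0084]
The processing of FIG. 11 is invoked through S212 of FIG. 8 and S304 of FIG. 9 when acquisition of element e's data starts, and again each time part of that data arrives. The estimate it produces is used in the check of S305 in FIG. 9. The estimate grows as data arrive, that is, as the elapsed time increases. As soon as S305 of FIG. 9 finds that it exceeds the free area, the layout of the current page can be completed early, even before acquisition is complete.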
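Equation (1), evaluated at successive polling times, shows when the S305 check first triggers. This sketch assumes an example coefficient table in the spirit of FIG. 12, whose actual values are not given, with JPEG assigned a larger area per byte than bitmap as [0082] suggests. `estimated_size`, `first_exclusion_time`, and the polling times are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical FIG. 12-style coefficient table: display area per byte received
COEFF = {"image/jpeg": 2.0, "image/bmp": 0.25}


def estimated_size(mime: str, speed_bps: float, elapsed_s: float) -> float:
    """Equation (1): w = r * s * t, the display size of the part received so far."""
    r = COEFF[mime]                     # S403
    return r * speed_bps * elapsed_s    # S404


def first_exclusion_time(mime: str, speed_bps: float, free_area: float,
                         ticks: Iterable[float]) -> Optional[float]:
    """First polling time at which w exceeds the free area (S305), or None."""
    for t in ticks:
        if estimated_size(mime, speed_bps, t) > free_area:
            return t
    return None
```

For a JPEG arriving at 7,000 bytes/s into a 19,800-unit free area, polls at 0.5, 1.0, 1.5, and 2.0 seconds give w = 7,000, 14,000, 21,000, and 28,000. The page can therefore be finalized at 1.5 seconds without waiting for the rest of the image. Because w only covers the data received so far, a result of "does not exceed" is provisional, as [0086] explains.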
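[0085]
In other words, the estimation of FIG. 11 estimates the display size of the part of element e's display content data received so far, based on the amount of data acquired up to now. In S305 of FIG. 9, the question is therefore whether the part received so far fits in the free area of the page. If that part is already larger than the free area, the complete display content data of element e certainly cannot fit there.
[0086]
Conversely, even if S305 finds that the estimated display size is no larger than the free area, there is no guarantee that the complete display content data of element e will fit. The judgment in S306 that "element e is not excluded from the current page p" is therefore provisional. Whether element e can be laid out on the current page is decided finally only once the size of its complete display content data is known. The value of this embodiment lies in detecting as early as possible not that element e "can" be laid out on the current page p, but that it "cannot" (or very likely cannot). If this is discovered earlier than in the conventional method, the layout of the current page p can be finalized and printing started at that point, without waiting for element e to be fully acquired.
[0087]
The above description takes a printer as the example, but as is clear from the description, the present invention can also be applied to a print server, or to a printer driver installed on a personal computer or similar device. In that case, once the layout of the current page p is finalized, the printer driver can create print data specifying the content of page p and send them to the printer with a print instruction.
[0088]
The above description takes the processing of an XHTML-print document as the example, but the present invention clearly applies to documents in general written in any description language, including HTML and XML, in which the display content data of document elements can be described by reference.
[0089]
[Effect of the Invention]
As described above, the present invention makes it possible to determine, before a document element's display content data have been fully acquired, whether that element is likely to end up not fitting on the page currently being laid out. When the element very likely cannot be laid out on the current page, the page layout is finalized with the elements up to the preceding one. The page can then be printed without waiting for the display content data of that element to arrive, which speeds up printing.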
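[Brief Description of the Drawings]
[FIG. 1] A diagram showing an example of HTML document printing.
[FIG. 2] A diagram showing an example of the hardware configuration of the printer.
[FIG. 3] A diagram showing an example configuration of a system to which the printer is applied.
[FIG. 4] A diagram showing an example configuration of the processing mechanism of the printer.
[FIG. 5] A flowchart showing an example of the processing procedure of the document analysis unit and the document configuration information management unit.
[FIG. 6] A diagram showing an example of document configuration information.
[FIG. 7] A diagram showing an example of configuration management information.
[FIG. 8] A flowchart showing an example of layout processing.
[FIG. 9] A flowchart showing an example of the processing that decides whether to exclude an unacquired element.
[FIG. 10] A diagram showing an example of a free area available for layout.
[FIG. 11] A flowchart showing an example of the processing that estimates display size from the time elapsed since acquisition started.
[FIG. 12] A diagram showing an example of a coefficient table for estimating display size from data size.
[Explanation of Reference Signs]
10: document data processing mechanism; 11: main control unit; 12: data acquisition unit; 13: document analysis unit; 14: document configuration information management unit; 15: layout unit; 16: decision unit; 17: display area estimation unit; 20: LAN.
[0001]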
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an apparatus that obtains display content data of each document element constituting a document from a description in the document or from the outside, generates an output image, and forms an image.
[0002]
[Prior art]
In recent years, hypertext documents such as HTML, XML, and XHTML-print documents have been printed more and more often, and printers are becoming able to accept and print hypertext documents. For example, Patent Document 1 discloses an apparatus in which a printer acquires a hypertext document, together with the data of each of its document elements, from servers on the Internet and prints them. This process, in which the printer itself downloads and prints the data of a hypertext document, is also called pull printing.
[0003]
When printing a hypertext document, the printer analyzes the document sequentially from the beginning. When it detects a description such as the source file name of data to be incorporated into the output, it sends a request for that file to a server on the Internet. The printer renders the file returned by the server into a bitmap, incorporates it into the output image, and prints the output image on a predetermined medium such as paper.
[0004]
For example, as described in Patent Document 2, once the data of all document elements of a hypertext document have been collected, the mapping of each element onto pages is determined and page-break processing is performed. The page information is created after that.
[0005]
Here, the data of each document element constituting the hypertext document is generally stored in a plurality of different servers on the Internet, and it takes some time to acquire the data. Also, the time required for data acquisition varies depending on the response time of the server and the like, so that data cannot always be acquired in the order in which acquisition requests were issued on the Internet. For this reason, for example, it is often the case that data of an element incorporated in a later page is acquired earlier than data of an element incorporated in a previous page.
[0006]
Printing must proceed in order from the first page. Data for a later page that arrive first therefore cannot be printed at that point and must be held in the printer until all the data for the preceding pages are available. For this reason, conventional apparatuses generally download all the data needed to print a hypertext document into the printer before starting to print. Printing a large hypertext document has therefore required a large-capacity storage device in the printer, which increases cost.
[0007]
One way to solve this problem is to print pages in order, starting with pages for which all constituent elements have been acquired, even if other document elements of the document are still missing. The data making up each page can then be deleted once that page has been printed. This makes it possible to print without holding all the data at once and without waiting for all the data to be acquired.
[0008]
For example, consider printing the HTML document shown in FIG. 1(A). In this case, the HTML document is acquired, and then the image it contains is acquired from a server. When all elements have been acquired, they are laid out onto pages starting from the beginning.
[0009]
The layout process starts with the first document element of the HTML document, the text whose content is "text portion...". This text is laid out in area B1 of the first page in FIG. 1(B). Next, the image indicated by the tag "<IMG SRC=...>" is allocated and laid out in area B2 of the first page. Next, the text whose content is "text following the image..." is allocated and laid out in area B3 of the first page. This text element does not fit entirely on the first page, so its remainder is laid out in area B4 of the second page.
[0010]
FIG. 1(C) shows a layout example in which the display size of the acquired image is larger than in FIG. 1(B). As in the previous example, the text area C1 and the image area C2 are allocated. The text "text following the image..." is allocated next. There is no free area left on the first page that can hold it, so it is laid out in area C3, starting at the top of the second page.
[0011]
FIG. 1(D) shows an example in which the display size of the acquired image is even larger. As in the previous examples, the text area D1 is allocated. The image area is allocated next. As area D2 shows, the image is now larger and does not fit in the free area of the first page. A page break is therefore inserted, and the image is laid out in area D3 at the top of the second page. The text "text following the image..." is laid out in area D4, which follows area D3.
[0012]
With this kind of processing, consider a page laid out as in FIG. 1(C). When printing of that page is finished, the data of the first element (the text "text portion...") and of the next element (the image) can be discarded. Processing can then continue, reusing the storage area for the second and subsequent pages.
[0013]
[Patent Document 1]
JP-A-11-134125
[Patent Document 2]
JP-A-2000-066867
[0014]
[Problems to be solved by the invention]
In the method described above, the decision to place the image (B2, C2, D2) on the first page or push it out to the second page is made only after the image has been fully acquired and its display size is known. As a result, even when the image ends up on the second page, layout and printing of the first page must wait until the image has been acquired.
[0015]
In other words, the method described above cannot decide whether a document element at the boundary between two pages belongs on the earlier page or the next one until that element's data have been fully acquired. Finalizing the layout of the earlier page must therefore wait for that data, and the start of printing is delayed as well.
[0016]
An object of the present invention is to provide an image forming apparatus and method capable of printing a hypertext document at higher speed in order to solve such a problem.
[0017]
[Means for Solving the Problems]
An apparatus according to the present invention comprises the following means:
(a) analysis means for analyzing the description of a document that contains a plurality of document elements;
(b) data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
(c) layout means for determining, based on the analysis result, the layout of each document element on each output page. The layout means determines the layouts of the document elements in order. If the display content data of the element to be laid out on an output page have not yet been acquired, the layout means estimates the display size of that data. If the estimated display size does not fit within the not-yet-laid-out area of that output page, the layout means decides to lay the element out on the next page and finalizes the output page with the document elements up to the one immediately preceding it;
(d) output means for generating the image data of each output page whose layout the layout means has finalized, based on the display content data and layout of each document element on that page, and forming the image on a medium.
[0018]
In a preferred aspect of the present invention, when estimating the display size of the display content data that has not been acquired, the layout unit uses the data type of the display content data as basic information for the estimation.
[0019]
In another preferred aspect, the display content data of the element to be laid out on an output page may still be only partly acquired. In that case, the layout means estimates the display size of the portion acquired so far from the data size of that portion. If that estimated size does not fit within the not-yet-laid-out area of the output page, the layout means decides to lay the element out on the next page. It then finalizes the output page with the document elements up to the one immediately preceding it.
[0020]
In another aspect of the present invention, an apparatus comprises the following means:
(a) analysis means for analyzing the description of a document that contains a plurality of document elements;
(b) data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
(c) layout means for determining, based on the analysis result, the layout of each document element on each output page. The layout means determines the layouts of the document elements in order. If the display content data of the element to be laid out on an output page have not yet been acquired, the layout means decides, based on the size or the aspect ratio of the not-yet-laid-out area, to lay that element out on the next page. It then finalizes the output page with the document elements up to the one immediately preceding it;
(d) output means for generating the image data of each output page whose layout the layout means has finalized, based on the display content data and layout of each document element on that page, and forming the image on a medium.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention (hereinafter, referred to as embodiments) will be described with reference to the drawings.
[0022]
FIG. 2 is a diagram schematically illustrating an example of a hardware configuration of the printer according to the embodiment of the present invention.
[0023]
This printer includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an operation panel 204, a LAN (Local Area Network) interface 205, a print engine 206, and a hard disk drive (HDD) 207.
[0024]
The CPU 201 controls the various parts of the printer by executing control programs stored in the ROM 202, and thereby implements the processing operations of the printer. These control programs include a program describing a control operation for printing a document such as HTML or XHTML-print by dividing the document into pages. The procedure of this page division will be described in detail later. The RAM 203 is used as a work memory area (work memory) when the CPU 201 executes the programs, and stores, for example, data received by the printer (image data, HTML data, and the like) and bitmap data obtained by developing that data. The operation panel 204 is a user interface device for setting printing conditions for the printer, and has, for example, a touch panel liquid crystal display and various input buttons. The LAN interface 205 performs communication processing for communicating with devices on the LAN, or with devices on the Internet via the LAN. The print engine 206 receives the bitmap data to be printed according to an instruction from the CPU 201 and prints it on a medium such as paper. The hard disk drive (HDD) 207 is a rewritable nonvolatile storage device that stores various setting data and programs set by a user or a service engineer, and is also used as a save area for data that overflows from the RAM 203.
[0025]
FIG. 3 is a diagram illustrating a configuration example of a system to which the printer 302 is applied. In this example, a client PC (personal computer) 301 and a printer 302 are connected on a LAN 303. The client PC 301 and the printer 302 are connected to the Internet 304 via the LAN 303. A server 305 exists on the Internet 304.
[0026]
In the system of FIG. 3, for example, when the client PC 301 or a mobile device (not shown) transmits a print request for an HTML document together with the HTML document data to the printer 302, the printer 302 analyzes the received HTML document data, generates bitmap data, and prints it on a medium. The HTML document may include an object (document element) indicated by a URL (Uniform Resource Locator) or the like; in that case, the printer 302 downloads the data of the object, according to the URL, from the server 305 on the Internet 304 where the object resides, and generates the bitmap data using that data. This method of sending the HTML document itself directly to the printer 302 for printing is also called direct printing. Alternatively, printing can be instructed by entering the URL of the HTML document directly on the operation panel 204 of the printer 302, or by sending the URL from the client PC 301 or a mobile device (not shown) to the printer 302 (pull printing). In this case, the printer 302 downloads the HTML document from the Internet 304 according to the given URL and performs the printing process.
[0027]
When the printer 302 of the present embodiment downloads data residing on another server as described above and prints it incorporated into an output image, and a document element forming a page has display content data whose acquisition is not yet complete (a "non-acquired" element), the printer decides whether to exclude that element from the page. This avoids waiting for the non-acquired element and achieves faster printing.
[0028]
The document data processing mechanism 10 in the printer 302 will be described with reference to FIG. 4. The processing mechanism 10 in FIG. 4 is realized by the CPU 201 executing programs and using setting data stored in the ROM 202 or the HDD 207.
[0029]
In the processing mechanism 10, the main control unit 11 is a means for performing overall control of the processing mechanism 10. The data acquisition unit 12 is a unit that acquires data via a network such as the LAN 20 or the Internet in response to a data acquisition request from the main control unit 11. The document analysis unit 13 analyzes an input HTML document or a MIME (Multipurpose Internet Mail Extensions) document including the HTML document, detects individual document elements, and obtains a logical structure constituted by the document element group. The document configuration information management unit 14 creates and manages document configuration information based on the detected document element and information on the logical structure (details will be described later).
[0030]
Here, for some document elements, the content data (called display content data, in the sense of the content shown in the print result) is already included in the MIME document together with the HTML document; for others, the data is referenced externally (that is, the description of the element in the HTML document includes link information, such as a URL, to the display content data). For the latter, the document analysis unit 13 requests, via the main control unit 11, that the data acquisition unit 12 download the display content data. Various data are stored in the RAM 203, the work memory: the HTML document data, the display content data of each document element detected as the HTML document and the MIME document are analyzed sequentially from the top, and the document configuration information that is created and updated as the analysis progresses. Note that display content data taken from a MIME document containing the HTML document, or downloaded from outside, is often described in a description language such as HTML or a page description language, or is encoded or compressed. In such cases the display content data must be developed into an image data format that the print engine 206 can handle (for example, a bitmap image). When this development is performed in software, the RAM 203 is used as its work area.
[0031]
The layout unit 15 determines the area in which each document element is displayed in order from the first page based on the document configuration information and the display content data of each document element (or data obtained by developing the data). That is, the display area of each document element is sequentially allocated to the page from the top. This is the layout processing. Based on the layout determined here, the drawing unit (not shown) generates a page image of each page in order from the first page on a page buffer (not shown) provided separately from the RAM 203. The page image generated in this way is passed to the print engine 206 and printed on paper.
[0032]
Here, when each page is laid out, the layout may not be determinable because some of the document elements constituting the page have not yet been acquired. The determination unit 16 is provided to handle this case. It decides whether to wait until acquisition of a non-acquired document element completes and then lay it out, or to exclude that element from allocation and continue laying out the page. This decision is made based on the result of partially laying out the already acquired document elements, the document configuration information described above, and the like.
[0033]
The display area estimation unit 17 is a processing module that estimates the display size of a document element still being acquired (that is, whose acquisition is not complete). The determination unit 16 uses this estimate to decide whether to wait for the non-acquired element. For example, if the display size estimated by the display area estimation unit 17 is larger than the area remaining in the page after the already acquired document elements have been laid out (this remaining area is the candidate area for laying out the non-acquired element), the determination unit 16 decides not to lay out that element on this page, and the page layout processing continues.
[0034]
FIG. 5 is a flowchart showing a processing procedure of the document analysis unit 13 and the document configuration information management unit 14 in the document data processing mechanism 10.
[0035]
For example, in the case of pull printing, when the URL of the HTML document data to be printed is input from the client PC 301 or the operation panel 204, the data acquisition unit 12 issues an acquisition request for the document data indicated by the URL onto the LAN 20, for example as an HTTP/1.1 GET request, and acquires the document data returned in response by a server on the LAN 20 or on the Internet connected to it (S100). The acquired document data is stored in the RAM 203. When the document data has been obtained, information about the document is registered in the document configuration information management unit 14 (S101); the content of this information will be described later with reference to FIGS. 6 and 7.
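For reference, the GET request mentioned in S100 has a simple text form. The sketch below builds that request by hand to show its structure; in practice a library such as Python's `http.client` would produce it. The `Connection: close` header is an assumption made to keep the sketch to one request per connection; the `Host` header is required by HTTP/1.1.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the document at `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",   # mandatory in HTTP/1.1
        "Connection: close",       # assumption: one request per connection
        "",
        "",                        # blank line ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")
```

For the example URL "http://xxx.yyy.jp/index.htm", this yields a request line `GET /index.htm HTTP/1.1` followed by `Host: xxx.yyy.jp`.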
[0036]
Next, the document analysis unit 13 analyzes the document data sequentially from the top (S102). In this analysis, the document data is parsed in order and, unless the end of the document has been reached (S103), parsing continues until a document element is found by tag detection (S104). When a document element is found, its information is registered in the document configuration information management unit 14, and any already registered document or document element information related to the new element is updated to reflect it (S105).
[0037]
Here, when the document element is text embedded in the HTML document, the text contained in the HTML document is recognized as the display content data of that element, and information indicating the location of this display content data is registered in the document configuration information management unit 14. In other words, when a text element embedded in the HTML document is detected, its display content data is already loaded in the RAM 203.
[0038]
On the other hand, when the HTML document describes a document element in the form of link information to the actual data (display content data), for example when an IMG tag contains the URL of the source of an image to be incorporated into the document, the data of the document element must be obtained by downloading or the like. Therefore, the document analysis unit 13 determines whether the description of the detected document element contains link information to data located elsewhere (for example, the URL of image data described in an IMG tag) (S106), and if so, performs data acquisition processing according to that link information (S107). If the link information is an external URL, the data acquisition unit 12 downloads the data indicated by the URL from a server on the Internet. If the link information points to image data embedded as coded information in the MIME document containing the HTML document, or in the XHTML-print document (called an in-line image), the analysis of the MIME document or XHTML-print document proceeds, and processing waits until the coded data of the in-line image is found.
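Steps S104 to S106 (find each element by tag detection, then check it for link information) can be sketched with Python's standard `html.parser`. This is a simplified illustration, not the printer's parser: it recognizes only non-blank text runs and `<img src=...>` links, and it does not skip text in the `<head>`.

```python
from html.parser import HTMLParser


class ElementScanner(HTMLParser):
    """Collect document elements in document order: inline text and <img> links."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[tuple[str, str]] = []  # (kind, payload)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Link information: the display content must be fetched (S107).
                self.elements.append(("link", src))

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text embedded in the HTML is display content already in memory.
            self.elements.append(("text", text))
```

Feeding it `<body><p>Hello</p><img src="http://aaa.bbb.jp/image.jpg"><p>Bye</p></body>` yields a text element, a link element, and another text element, in the same order in which the management information of FIG. 7 would be created.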
[0039]
FIG. 5 shows the series of processes performed for one piece of document data (here, not only the HTML data but also data such as an image it references is treated as a piece of document data). Therefore, when acquisition of link-destination data is requested in S107, the process of FIG. 5 is started separately for that data and runs concurrently with the process that issued the request; the broken-line loop in FIG. 5 indicates this. Because the data may come from the Internet, it is not known when it will become available, so it is acquired asynchronously in this way. For the link-destination data requested in S107, the separately started process acquires the display content data and registers information about it in the document configuration information management unit 14.
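The asynchronous acquisition can be modeled with a thread pool: S107 submits a download and returns at once, and the layout step later asks whether the download has finished. In this sketch `fake_fetch` is a hypothetical stand-in for a real network download, and the pool size is an arbitrary assumption.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def fake_fetch(url: str, delay: float) -> bytes:
    """Hypothetical stand-in for a download; `delay` simulates server latency."""
    time.sleep(delay)
    return url.encode()


executor = ThreadPoolExecutor(max_workers=4)  # pool size is an assumption


def request_acquisition(url: str, delay: float) -> Future:
    """Start acquiring link-destination data without blocking the analysis (S107)."""
    return executor.submit(fake_fetch, url, delay)


def acquisition_status(future: Future) -> str:
    """Map a pending download to the "contents" value checked during layout."""
    return "acquired" if future.done() else "(not acquired)"
```

The analysis loop keeps parsing while downloads proceed in the background, so a slow server delays only the elements it serves, not the analysis of the rest of the document.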
[0040]
Next, the layout unit 15 performs a layout process for the document elements registered in the document configuration information management unit 14 (S108). Details of this processing will be described later.
[0041]
Next, the document configuration information managed by the document configuration information management unit 14, and the procedure for creating it, will be described with reference to FIGS. 6 and 7.
[0042]
In FIG. 6, (a) schematically shows the description of an input HTML document, together with the correspondence between each description and each item managed by the document configuration information management unit 14, and (b) shows the data management structure in the document configuration information management unit 14 for the HTML document of FIG. 6(a). FIG. 7 conceptually shows a snapshot of the document configuration information managed by the document configuration information management unit 14 at a certain point in time.
[0043]
The entire HTML description shown in FIG. 6(a) forms one piece of document data. Unique identification information D-001 is assigned to this document data, and corresponding management information is created (see (a1) in FIG. 7). This management information is stored in the RAM 203 so that it can be retrieved using the identification information D-001. The document configuration information management unit 14 assigns D-001 when the URL to be printed, "http://xxx.yyy.jp/index.htm", is input to the printer 302 by the user or from the client PC 301. Then, when the data acquisition unit 12 downloads the HTML document data corresponding to the URL, the document configuration information management unit 14 creates the management information for the document data and registers in it the information about the document data known at that time.
[0044]
For example, in FIG. 7(a1), the management information of the HTML document D-001 contains six items: (1) identification information (DID: document ID); (2) the data name (file name) of the document; (3) the path of the document data (which can be extracted from the URL together with the file name); (4) the data size of the document data (known when the download is complete); (5) the acquisition time (time required for the download); and (6) data element (a pointer to the management information of the document element appearing first in the document). For an HTML document, all items except the sixth, "data element", are known when the download completes, so their values are registered in the management information at that point.
[0045]
When the body (<BODY> to </BODY>) of the HTML document is parsed from the beginning, a document element consisting of text data is detected first. The document configuration information management unit 14 assigns unique identification information E-001001 to this element and creates corresponding management information (see (b1) in FIG. 7). This management information is stored in the RAM 203 so that it can be retrieved using the identification information E-001001.
[0046]
For example, in FIG. 7(b1), the management information of the document element E-001001 contains five items: (1) identification information (EID: element ID); (2) the element type of the document element (text, table, list, image, or link (reference information)); (3) the drawing size of the document element (determined once the content data is determined); (4) next element (a pointer to the management information of the document element that follows this one in the document data); and (5) contents (a pointer to the display content data of the document element). Of these, the element type is known when the text element E-001001 is detected and is registered at that time. For text, if the layout conditions (page size, margins, font type and size, character pitch, and so on, which may further be specified by CSS (Cascading Style Sheets) or the like) are specified in advance, the drawing size can be computed as soon as the text data is obtained, so the drawing size can also be registered.
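The two kinds of management information can be pictured as the records below. This is a minimal sketch: the field names are hypothetical, item (3) of the document record is read as the data's path, and `None` stands for "(undecided)" or "(not acquired)". The helper shows how a newly detected element is linked in, either as the document's "data element" (head) or as the "next element" of its predecessor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementInfo:
    eid: str                               # (1) element ID, e.g. "E-001001"
    kind: str                              # (2) text, table, list, image, or link
    draw_size: Optional[tuple] = None      # (3) None while "(undecided)"
    next: Optional["ElementInfo"] = None   # (4) next element in the document
    contents: Optional[object] = None      # (5) block ID or document; None = "(not acquired)"


@dataclass
class DocumentInfo:
    did: str                               # (1) document ID, e.g. "D-001"
    name: str                              # (2) data (file) name
    path: str                              # (3) path, extracted from the URL
    size: Optional[int] = None             # (4) known when the download completes
    acq_time: Optional[float] = None       # (5) time required for the download
    head: Optional[ElementInfo] = None     # (6) "data element": first element


def append_element(doc: DocumentInfo, prev: Optional[ElementInfo],
                   elem: ElementInfo) -> None:
    """Register a newly detected element as the head (if first) or as the
    "next element" of the previously detected one."""
    if prev is None:
        doc.head = elem
    else:
        prev.next = elem
```

Following `head` and then each `next` pointer walks the elements in document order, which is the order in which the layout unit 15 allocates them to pages.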
[0047]
The text data forming the content of element E-001001 is contained in the HTML document D-001 and has already been obtained when element E-001001 is recognized (for example, when its end tag is recognized). The document configuration information management unit 14 therefore assigns the identification information (BID: block ID) B-001001 to this content data (text) and stores it in the RAM 203 so that it can be retrieved using B-001001 (see (c1) in FIG. 7). A pointer (B-001001) to the content data is then registered in the fifth item, "contents", of the management information of element E-001001 (see (b1) in FIG. 7).
[0048]
In the management information of element E-001001, "next element" is left blank, because the next element is not known until the analysis of document data D-001 proceeds further.
[0049]
When the first document element E-001001 of document data D-001 has been detected in this way, the document configuration information management unit 14 registers a pointer to this head element E-001001 in the sixth item, "data element", of the management information of document data D-001 (see (a1) in FIG. 7).
[0050]
The processing performed when the head element E-001001 of the document is detected has been described above. As the analysis of HTML document D-001 proceeds, a document element whose content is an image, indicated by an <IMG> tag, is detected next. The document configuration information management unit 14 assigns unique identification information E-001002 to this element and creates corresponding management information (see (b2) in FIG. 7). Since the element following E-001001 has now been found, a pointer to E-001002 is registered as the "next element" in the management information of E-001001.
[0051]
Since the content of element E-001002 is given as a link description (in this example, the URL "http://aaa.bbb.jp/image.jpg"), information indicating that the element content is a link is registered in the "element type" of its management information, and a data name and a path are registered according to the link. The "drawing size" is "(undecided)" at this point, because it cannot be known until the link-destination information is obtained.
[0052]
When a document element whose content is indicated by a link to external data is detected, the process for downloading the linked data is started as described above (S107 in FIG. 5). At this time, the document configuration information management unit 14 assigns unique identification information D-002 to the data to be downloaded and creates its management information; when the data is downloaded by that process, information about the data is registered in this management information (see (a2) in FIG. 7). Thus, in this embodiment, data downloaded from outside the printer 302 is conceptually distinguished from a document element contained in the description of the HTML document: the former is given identification information DID and treated as document data, and the latter is given identification information EID and treated as an element. Accordingly, the image data to be downloaded via the link of document element E-001002 is likewise first given the document-data identification information D-002, and management information (a2) is created for it. Until the download completes, its "size" and "acquisition time" are "(undetermined)".
[0053]
Since the data D-002 to be downloaded is image data, the document configuration information management unit 14 assigns unique identification information E-001003 to the image data, treats it as a document element, and creates its management information ((b3) in FIG. 7). At this point the element type of E-001003 is known to be an image, but its drawing size is not known until the compressed data or the like has been expanded, so the "drawing size" is "(undecided)". The "contents" is likewise "(not acquired)", because it is not determined until the download completes. The "drawing size" and "contents" are registered when the download completes.
[0054]
When the download completes, the size of the downloaded data and the time it took are determined and registered in the "size" and "acquisition time" of the management information of D-002. The downloaded image data itself is stored so that it can be identified by a block ID (BID), and this block ID is registered in the "contents" of E-001003. Further, the size of the image is determined from the attribute information of the downloaded data and registered in the "drawing size".
[0055]
When the link description of a document element indicates data embedded in the HTML document (such as an in-line image; hereinafter, in-line data), the data can be managed, as above, by a hierarchy of management information of document data, document element data, and data entities, starting from the link-destination data. In this case, for example, the identification information of the data indicated in the link description may be registered in the "data name" of the document-data-level management information. However, since no download is needed, creation of the document-data-level management information may be omitted. Whether a link destination is outside or inside the HTML document is basically determined by whether the link destination, written as a URL, takes the "cid:" (Content-ID) form used to refer to attached data in a MIME document or an XHTML-print document; a "cid:" link points inside the document.
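The inside/outside test reduces to inspecting the URL scheme. The sketch below treats only the `cid:` scheme as in-line, following the text; other in-line forms (for example `data:` URIs) are deliberately not handled, and relative URLs are treated as external because they would be resolved against the document's base URL and downloaded.

```python
from urllib.parse import urlsplit


def link_location(link: str) -> str:
    """Classify a link as in-line data ("cid:") or an external resource."""
    scheme = urlsplit(link).scheme  # urlsplit lowercases the scheme
    return "inline" if scheme == "cid" else "external"


def needs_download(link: str) -> bool:
    """Only external links need document-level management info and a download."""
    return link_location(link) == "external"
```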
[0056]
Returning to the analysis of document data D-001, a document element indicated by an <IMG> tag is detected after document element E-001002, so the document configuration information management unit 14 manages this element with identification information E-001004 ((b4) in FIG. 7), and the data to be downloaded from the link destination in this tag with identification information D-003. A pointer to the next element E-001004 is registered in the management information of E-001002. The image data in D-003 is then managed as the document element with identification information E-001005 ((b5) in FIG. 7), and its entity data with identification information B-100002 ((c2) in FIG. 7).
[0057]
As the analysis of document data D-001 proceeds further, a text element is detected after element E-001004. This element is managed with identification information E-001006, and a pointer to the entity data B-00100x holding its text is registered in its management information ((b6) in FIG. 7).
[0058]
The analysis of the HTML document of FIG. 6(a), and the construction of document configuration information that accompanies it, have been described above with reference to FIG. 7, a concrete example of document configuration information. The collection of document data management information and document element management information shown in FIG. 7 constitutes the document configuration information. FIG. 6(b) uses arrows to show the relationships between the management information and the entity data created or obtained through the analysis of FIG. 6(a): solid-line blocks indicate the management information of document data and document elements and the entity data, and dashed blocks indicate links outside the HTML document.
[0059]
As described above, the document configuration information is created by the analysis processing shown in FIG. 5 and updated as the analysis and download processing progress. In this embodiment, the main control unit 11 and the layout unit 15 perform page layout processing for print output in parallel with this analysis processing. This processing will be described with reference to FIG. 8.
[0060]
FIG. 8 is a flowchart illustrating an example of a layout process in the layout unit 15. This processing can be executed in the middle of the document analysis processing as in S108 of FIG. 5, but can also be called and executed when a part of the display content data of the document element being acquired is acquired. Further, a configuration in which this processing is periodically called is also possible.
[0061]
The processing in FIG. 8 may be interrupted without ending the layout of all document elements included in the document to be printed. At the time of interruption, information on the progress of processing is held, and when this processing is called again, the layout is continued based on the held information.
[0062]
When this process is called, first, the layout unit 15 checks whether there is page information p interrupted in the middle of the process (S201). The page information p is information indicating a document element group to be laid out for a page to be laid out at present. If it is determined in S201 that there is no page being processed, new page information is created and set to p (S202).
[0063]
Next, the first document element in the document configuration information that has not yet been laid out is searched for and set as e (S203). A document element may itself consist of another document (for example, (b2) in FIG. 7); in that case, the elements constituting that document (for example, (a2) in FIG. 7) are searched recursively. Next, the document configuration information is checked to see whether the display content data of document element e has been acquired (S204). For example, the "contents" field of the element's management information shown in FIG. 7 is checked: if its value is "(not acquired)", the display content data has not been acquired. If the field holds a pointer and following it reaches display content data identified by a BID, the display content data is judged to have been acquired. If the data at the pointer's destination itself has a pointer in its "contents" field, that pointer is followed recursively to make the determination.
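The check in S204 is a recursive pointer walk. The sketch below models FIG. 7 with hypothetical in-memory tables: a reference is acquired if it leads to a block (BID), and not acquired if the walk reaches "(not acquired)" (here `None`). For simplicity a linked document is judged by its head element only, which matches the single-image documents D-002 and D-003 in the example.

```python
# Hypothetical in-memory tables mirroring FIG. 7.
blocks = {"B-001001": b"text ..."}            # block ID -> entity data
documents = {"D-002": {"head": "E-001003"}}   # document ID -> management info
elements = {
    "E-001001": {"contents": "B-001001"},     # embedded text, already loaded
    "E-001002": {"contents": "D-002"},        # link element -> downloaded document
    "E-001003": {"contents": None},           # None means "(not acquired)"
}


def is_acquired(ref) -> bool:
    """Follow "contents" pointers recursively until a block or a gap is reached."""
    if ref is None:
        return False
    if ref in blocks:
        return True
    if ref in documents:
        return is_acquired(documents[ref]["head"])
    if ref in elements:
        return is_acquired(elements[ref]["contents"])
    return False  # dangling reference: treat as not yet acquired
```

Here `is_acquired("E-001002")` stays `False` until the download of D-002 registers a block ID in the "contents" of E-001003.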
[0064]
If the display content data of the document element has not been acquired, the process proceeds to S212. If it has been acquired, the display size of the element's display content data is calculated (S205), and it is checked whether the element can be laid out in the free area of the current page p (that is, the area remaining after each acquired document element preceding this one has been laid out) (S206). This check may simply test whether the calculated display size fits in the free area.
[0065]
As a result, if the layout is possible, the document element is laid out on the current page p (S211).
[0066]
If it is determined that the layout is not possible in S206, it is determined whether the document element e can be divided and laid out (S207). When it is determined that the document element e can be divided and laid out, the document element e is divided according to the free space and laid out on the current page p. As a result, the remaining part of the document element e is newly set as the document element e. (S208). As a result, the layout of the current page p is determined because the layout to the empty area has been completed. The determined page information is directly passed to processing for printing (or retained for later processing). Then, new page information corresponding to the next page is created, this is set as the current page p (S209), and the process returns to S206 to continue the processing for the remaining divided element e.
[0067]
In step S207, if the document element e cannot be divided, the layout of the current page p is determined, and the page information is directly passed to the processing for printing (or retained for later processing). Then, new page information corresponding to the next page is created, this is set as the current page p (S210), and the document element e is laid out (S211).
[0068]
The determination as to whether the document element e can be divided in S207 is performed based on, for example, the type of the document element. For example, it is determined that a text element can be divided and an image element cannot be divided. Such a determination rule may be registered in the present apparatus in advance. In this processing, it is determined whether or not the area can be divided. However, it is also possible to add a determination on the possibility of other processing such as the area reduction processing. For example, if it is determined that the document element can be reduced, the display content data of the element can be reduced to fit in the free area of the page and laid out in the free area. This determination can be automated by registering a determination rule in advance, similarly to the determination of the division possibility. For example, a determination rule based on the type of the element can be considered such that the image element can be reduced, and the text element can be reduced when the font size is specified.
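Such registered rules can be held in a small table keyed by element type. The sketch below is illustrative only: the table contents follow the examples in the text (text divisible, image reducible), while `min_scale`, a limit on how far an element may be shrunk, is an assumption, and the text's note that text may also be reduced when a font size is specified is omitted for brevity.

```python
# Hypothetical rule table keyed by element type.
RULES = {
    "text": {"divisible": True, "reducible": False},
    "image": {"divisible": False, "reducible": True},
}


def choose_action(kind: str, height: float, free_height: float,
                  min_scale: float = 0.5) -> str:
    """Pick how to handle an element against the page's free height.

    min_scale is an assumed lower bound on the reduction ratio.
    """
    rule = RULES.get(kind, {"divisible": False, "reducible": False})
    if height <= free_height:
        return "place"
    if rule["divisible"] and free_height > 0:
        return "divide"
    if rule["reducible"] and free_height / height >= min_scale:
        return "reduce"
    return "next_page"
```

Unknown types such as tables fall back to the most conservative rule, neither divided nor reduced, and simply move to the next page.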
[0069]
If it is determined in S204 that acquisition of the display content data of the document element e has not been completed, it is determined whether to wait for the layout processing of the current page p until the acquisition of the display content data is completed (S212). This determination processing will be described later in detail.
[0070]
If it is determined in S212 to wait until acquisition completes, the process ends (or is interrupted). If it is determined not to wait, the layout of the current page p is determined according to this result, and the page information p is passed directly to the printing processing (or retained for later processing). Then new page information for the next page is created and set as the current page p (S213), and the process ends (is interrupted).
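The whole procedure of FIG. 8 can be sketched as an interruptible loop over a one-dimensional page (heights only). This is a simplified model under stated assumptions: elements are hypothetical `Elem` records, a non-acquired element carries an optional size estimate, the S212 decision is reduced to "wait unless the estimate exceeds the free height", and an element too tall even for an empty page is placed anyway so that the loop cannot spin. Progress is held in `LayoutState`, so a later call resumes where the previous one stopped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Elem:
    name: str
    height: Optional[float]             # None: display content not yet acquired
    divisible: bool = False             # e.g. text may be split across pages
    est_height: Optional[float] = None  # estimate for a non-acquired element


@dataclass
class LayoutState:
    page_height: float
    pages: List[list] = field(default_factory=list)  # determined pages
    current: Optional[list] = None  # page information p in progress
    used: float = 0.0               # height already used on p
    index: int = 0                  # first element not yet laid out


def close_page(state: LayoutState) -> None:
    """Determine page p (hand it to printing) and start a new page."""
    state.pages.append(state.current)
    state.current, state.used = [], 0.0


def place(state: LayoutState, name: str, h: float) -> None:
    state.current.append(name)
    state.used += h
    state.index += 1


def run_layout(state: LayoutState, elems: List[Elem]) -> None:
    """One call of the layout procedure; returns when it must wait or runs out."""
    if state.current is None:                  # S201 -> S202
        state.current, state.used = [], 0.0
    while state.index < len(elems):            # S203
        e = elems[state.index]
        free = state.page_height - state.used
        if e.height is None:                   # S204 -> S212
            if not state.current or e.est_height is None or e.est_height <= free:
                return                         # wait for acquisition
            close_page(state)                  # S213: e goes to the next page
            return
        h = e.height                           # S205
        if h <= free:                          # S206 -> S211
            place(state, e.name, h)
        elif e.divisible and free > 0:         # S207 -> S208, S209
            state.current.append(e.name + "(part)")
            # The remainder replaces e in the caller's list.
            elems[state.index] = Elem(e.name, h - free, True)
            close_page(state)
        elif not state.current:                # oversize on an empty page
            place(state, e.name, h)
        else:                                  # S210; e is placed on the next pass
            close_page(state)


def finish(state: LayoutState) -> None:
    """Determine the last page once the whole document has been laid out."""
    if state.current:
        close_page(state)
```

With a page height of 100, elements A (40) and B (30) followed by a non-acquired C estimated at 50, the first call determines the page [A, B] and returns; once C arrives, a second call places it on the next page, so the first page can be printed without waiting for C.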
[0071]
S204 determines whether acquisition of the display content data of the document element e has been completed. In some cases, however, the display size can be calculated before acquisition completes, for example when the display size of an image element is specified in its tag. S204 may therefore take such cases into account and use "can the display size be calculated?" as its criterion. In that case, even after the layout of the page has been finalized, printing of the page is suspended until the display content data of the document element has actually been acquired.
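A minimal Python sketch of the relaxed S204 criterion, assuming the tag's attributes are available as a string mapping and that only plain integer `width`/`height` values count as "specified". The function names are hypothetical.

```python
from typing import Mapping, Optional, Tuple

def declared_size(attrs: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Return (width, height) if both are declared as plain integers, else None."""
    w, h = attrs.get("width"), attrs.get("height")
    if w is None or h is None:
        return None
    if not (w.isdigit() and h.isdigit()):
        return None  # e.g. percentages cannot be resolved without more context
    return int(w), int(h)

def layout_can_proceed(acquired: bool, attrs: Mapping[str, str]) -> bool:
    """Relaxed S204: lay out if the data is here OR its display size is already known.

    Printing of the page must still wait until the data has actually arrived.
    """
    return acquired or declared_size(attrs) is not None
```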
[0072]
Next, a detailed example of the processing of S212 will be described with reference to FIGS. 9 and 10. FIG. 9 is a flowchart illustrating an example of the processing that determines whether an unacquired document element is laid out on the page currently being processed, and FIG. 10 is a diagram illustrating examples of a free area available for layout.
[0073]
FIG. 10A shows a page (A11) that is partway through processing and still has a free area. A page margin (blank area), indicated by oblique hatching, is set around the periphery of the page, and the portion of the page (A11) inside this margin is the area (A12) in which layout is possible. In the illustrated example, a text element area (A13) and an image element area (A14), whose display content data have already been acquired, have been laid out in the layable area (A12) in the order in which they are described in the HTML document.
[0074]
In the processing of FIG. 9, a free area available for layout on the current page is determined first (S301). For example, as shown in FIG. 10B, the rectangular shaded area (B11) extending from the horizontal line at the lower edge of the image element (A14), the last document element already laid out, to the bottom of the page's layable area is determined to be the free area. Alternatively, because the area to the right of the image element (A14) is largely empty in the example of FIG. 10, the rectangular shaded area (C11) to the right of the image element (A14) may be determined to be the free area, as shown in FIG. 10C. Both shaded areas (B11) and (C11) may also be treated as free areas at the same time. In that case, when it is later determined whether the document element of interest fits in the free area, the determination is made for each of the two shaded areas, and a shaded area judged to be a fit is adopted as the layout location of the document element. If both are judged to fit, which one to adopt may be decided by a separately defined criterion.
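The two candidate free areas can be computed as in the Python sketch below, assuming axis-aligned rectangles with y growing downward. `Rect`, `free_areas`, and `choose_area` are hypothetical names, and the tie-break (prefer the topmost fitting area) is only one possible "separately defined criterion", not the one the text specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Rect:
    x: float
    y: float  # y grows downward, as on a printed page
    w: float
    h: float

    @property
    def right(self) -> float: return self.x + self.w
    @property
    def bottom(self) -> float: return self.y + self.h
    @property
    def area(self) -> float: return self.w * self.h

def free_areas(layable: Rect, last: Rect) -> List[Rect]:
    """Candidate free areas after the last laid-out element `last`."""
    below = Rect(layable.x, last.bottom, layable.w, layable.bottom - last.bottom)  # like B11
    right = Rect(last.right, last.y, layable.right - last.right, last.h)           # like C11
    return [r for r in (below, right) if r.w > 0 and r.h > 0]

def choose_area(candidates: List[Rect], fits: Callable[[Rect], bool]) -> Optional[Rect]:
    """Pick a fitting area; hypothetical tie-break: the topmost, then leftmost."""
    fitting = [r for r in candidates if fits(r)]
    if not fitting:
        return None
    return min(fitting, key=lambda r: (r.y, r.x))
```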
[0075]
Once a free area has been determined (here, the shaded area (B11)), its size is calculated. If this size is equal to or smaller than a predetermined size (Y in S302), or if the aspect ratio of the free area is outside a predetermined range (Y in S303), it is determined that the document element e is not laid out on the current page p, that is, it is excluded from the current page p (S307). For example, the "predetermined size" used as the criterion in S302 may be one-sixth of the layable area (A12) of the page, and the "predetermined range" used in S303 may be an aspect ratio of 1/5 to 5/1. With these checks, when the only area left is too narrow (vertically or horizontally) to be suitable for placing the document element, processing such as estimating the element's display size (S304) can be skipped.
[0076]
If exclusion is not decided in S302 or S303, the display size of the document element e is estimated next (S304; the details are described later). It is then determined whether the estimated display size exceeds the size of the free area (B11) (S305). If it does, it is determined that the element e is not laid out on the current page p (it is excluded from the current page p) (S307). If it does not, it is determined that the element e is laid out on the current page p (it is not excluded from the current page p) (S306).
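The S302 to S307 decision can be sketched in Python as follows, using the example thresholds from the text (one-sixth of the layable area, aspect ratio 1/5 to 5/1). Treating the display size as an area and passing the estimator as a callable are assumptions of this sketch; the callable lets the cheap checks skip the estimate entirely, as the text describes.

```python
from typing import Callable, Tuple

def exclude_from_current_page(free_w: float, free_h: float, layable_area: float,
                              estimate_size: Callable[[], float],
                              min_fraction: float = 1 / 6,
                              aspect_range: Tuple[float, float] = (1 / 5, 5.0)) -> bool:
    """True: defer element e to a later page (S307). False: keep it on page p (S306)."""
    if free_w <= 0 or free_h <= 0:
        return True
    # S302: the free area is no larger than the threshold fraction of the layable area.
    if free_w * free_h <= layable_area * min_fraction:
        return True
    # S303: the free area is too thin vertically or horizontally.
    lo, hi = aspect_range
    if not (lo <= free_w / free_h <= hi):
        return True
    # S304/S305: only now estimate the display size (here an area) and compare.
    return estimate_size() > free_w * free_h
```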
[0077]
One display size estimation method for S304 is to determine the display size uniformly from an attribute of the document element e. For example, taking the data type of the element e as the attribute, a size value is registered in advance for each data type (a display size of X for text, Y for an image, and so on), and the value registered for the data type of the element e is used as the estimate.
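A minimal Python sketch of the uniform, type-based estimate. The numbers standing in for X and Y are hypothetical, and so is the fallback for unregistered types (use the largest registered value, which errs toward deferring the element rather than overflowing the page).

```python
# Hypothetical values standing in for X (text) and Y (image), in display-area units.
FIXED_ESTIMATES = {"text": 6000.0, "image": 20000.0}

def fixed_estimate(data_type: str) -> float:
    """Return the pre-registered estimate for the element's data type.

    Unregistered types fall back to the largest registered value (an assumption).
    """
    return FIXED_ESTIMATES.get(data_type, max(FIXED_ESTIMATES.values()))
```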
[0078]
If the data size of the display content data of the document element e becomes known through the protocol when acquisition of the data starts, that data size may be converted into a display size and the converted value used as the estimate. The coefficient for converting data size into display size may be defined for each data type of the document element e.
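Assuming the data is fetched over HTTP, the size announced at the start of a transfer would typically arrive in the `Content-Length` response header. The Python sketch below converts that value with a per-type coefficient; the coefficient values and the function name are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical per-type coefficients: display-area units per byte of data.
SIZE_TO_DISPLAY = {"image/jpeg": 2.0, "image/bmp": 0.5, "text/html": 1.5}

def estimate_from_declared_length(headers: Mapping[str, str],
                                  content_type: str) -> Optional[float]:
    """Convert a declared data size into a display-size estimate, if possible."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    raw = normalized.get("content-length")
    coef = SIZE_TO_DISPLAY.get(content_type)
    if raw is None or coef is None or not raw.strip().isdigit():
        return None  # size not announced, unknown type, or malformed value
    return int(raw.strip()) * coef
```

Returning `None` lets the caller fall back to another method, such as the fixed per-type estimate or the dynamic estimate described next.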
[0079]
Another method determines the estimate dynamically, taking the data acquisition status into account. An example of this method is described with reference to FIG. 11.
[0080]
FIG. 11 is a flowchart illustrating an example of the processing by which the display area estimation unit 17 dynamically estimates the display size from the time spent acquiring the display content data of the document element e. In this processing, the elapsed time t since acquisition of the display content data of the document element e started is obtained first (S401). The data acquisition unit 12 manages this elapsed-time information for each document element being acquired.
[0081]
Next, the current data communication speed s is obtained (S402). The communication speed s can be determined, for example, from the type of communication device (or communication line) currently used to acquire the display content data: 56 kbps if the device is an analog modem, and 8 Mbps if it is an ADSL (Asymmetric Digital Subscriber Line) modem. Alternatively, the data acquisition unit 12 may monitor the acquisition status and calculate a measured value of the communication speed s. In that case, the current communication speed s can be estimated from the data sizes of the display content data of the last few document elements and the time taken to acquire them.
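Both ways of obtaining s can be sketched in Python as below. The nominal rates are the ones given in the text; expressing s in bytes per second (so that s × t is a byte count), the device labels, and the function names are assumptions of this sketch.

```python
from typing import Sequence, Tuple

# Nominal line rates from the text, in bits per second (device labels are hypothetical).
NOMINAL_BPS = {"analog_modem": 56_000, "adsl_modem": 8_000_000}

def nominal_speed(device: str) -> float:
    """Bytes per second derived from the device's nominal bit rate."""
    return NOMINAL_BPS[device] / 8

def measured_speed(recent: Sequence[Tuple[int, float]]) -> float:
    """Bytes per second from (bytes, seconds) pairs of recently fetched elements."""
    total_bytes = sum(b for b, _ in recent)
    total_secs = sum(t for _, t in recent)
    if total_secs <= 0:
        raise ValueError("no usable timing data")
    return total_bytes / total_secs

def current_speed(device: str, recent: Sequence[Tuple[int, float]] = ()) -> float:
    """Prefer a measured value when history exists; otherwise use the nominal rate."""
    return measured_speed(recent) if recent else nominal_speed(device)
```

Pooling bytes and seconds across recent elements weights longer transfers more heavily, which tends to be steadier than averaging per-element rates.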
[0082]
Next, the display size calculation coefficient r for the document element e is obtained (S403). The coefficient r converts a data size into a display size and can be defined, for example, for each data type. In that case, it suffices to define r in advance for each data type, register a table such as the one shown in FIG. 12 in the apparatus, and look up the coefficient r corresponding to the document element e in this table. For example, when a JPEG image is compared with an ordinary bitmap image, the same amount of acquired data yields a larger display size for the JPEG image, because its data is compressed. The coefficient r may be set with such tendencies in mind.
[0083]
Based on the values obtained in this manner, the estimated display size w of the document element e is then calculated, for example, by the following expression (1) (S404).

$$w = r \cdot s \cdot t \qquad (1)$$
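Expression (1) can be sketched in Python together with a per-type lookup of r in the spirit of FIG. 12. The table values are hypothetical (FIG. 12's actual contents are not reproduced here); they only respect the stated tendency that JPEG data yields more display size per byte than an uncompressed bitmap.

```python
# Hypothetical coefficient table in the spirit of FIG. 12 (display-area units per byte).
COEFFICIENT_R = {"image/jpeg": 2.0, "image/bmp": 0.5, "text/html": 1.5}

def estimated_display_size(data_type: str, speed_bytes_per_s: float,
                           elapsed_s: float) -> float:
    """Expression (1): w = r * s * t, where s * t approximates the bytes received so far."""
    r = COEFFICIENT_R[data_type]
    return r * speed_bytes_per_s * elapsed_s
```

For instance, with the analog-modem rate of 56 kbps (7,000 bytes/s), the hypothetical JPEG coefficient 2.0, and t = 0.5 s, expression (1) gives w = 2.0 × 7,000 × 0.5 = 7,000 display-area units.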
[0084]
The processing of FIG. 11 is invoked via S212 of FIG. 8 and S304 of FIG. 9 when acquisition of the data of the document element e starts and each time part of that data is acquired, and the value it calculates is used for the determination in S305 of FIG. 9. As more data is acquired (that is, as the elapsed time increases), the estimated size grows. If S305 of FIG. 9 then determines that the estimate exceeds the free area, the layout of the page currently being processed can be finalized early, even though acquisition is not yet complete.
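The repeated re-estimation can be sketched in Python as a scan over the moments when the estimate is recomputed. Modeling those moments as a list of elapsed times, and the function name itself, are assumptions of this sketch.

```python
from typing import Iterable, Optional

def first_exceed_time(check_times: Iterable[float], r: float, s: float,
                      free_area: float) -> Optional[float]:
    """Return the first elapsed time t at which w = r * s * t exceeds the free area.

    That is the earliest moment the current page's layout could be finalized.
    Returns None if the estimate never exceeds the free area at the given checks.
    """
    for t in check_times:
        if r * s * t > free_area:
            return t
    return None
```

With r = 2.0, s = 7,000 bytes/s, and a free area of 19,800 units, checks at 0.5, 1.0, 1.5, and 2.0 s give estimates of 7,000, 14,000, and then 21,000 units, so the page could be finalized at the 1.5 s check instead of waiting for the download to finish.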
[0085]
In other words, the area size estimation processing of FIG. 11 estimates the display size of the portion of the display content data of the document element e of interest that has been obtained so far, based on the amount of data obtained up to the current time. Step S305 in the procedure of FIG. 9 therefore determines whether the portion obtained so far fits in the free area of the page. If that portion is already larger than the free area, the complete display content data of the document element e naturally cannot fit in the free area either.
[0086]
On the other hand, even if S305 determines that the estimated display size is equal to or smaller than the free area, there is no guarantee that the complete display content data of the document element e will fit in it. The determination in S306 that "the element e is not excluded from the current page p" is therefore provisional; whether the document element e can finally be laid out on the current page is decided only when the size of its complete display content data is known. The significance of the processing according to the present embodiment lies not in detecting that the document element e can be laid out on the current page p, but in detecting as early as possible that it cannot (or very probably cannot). If it is found at an early stage that the document element e cannot be laid out on the current page p, the layout of the current page p can be finalized and printing started at that stage, without waiting for acquisition of the element e to complete.
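The asymmetry between a definite exclusion and a provisional keep can be sketched in Python with a three-way result. The enum and function names are hypothetical, and treating "exceeds" as definite assumes the partial estimate does not overshoot the true display size of the data received so far.

```python
from enum import Enum

class Decision(Enum):
    EXCLUDE = "exclude"           # S307: settled even before acquisition completes
    KEEP_PROVISIONAL = "keep?"    # S306 while data is still arriving
    KEEP_FINAL = "keep"           # the complete data is known to fit

def decide(partial_display_size: float, free_area: float, complete: bool) -> Decision:
    if partial_display_size > free_area:
        # The full element can only be larger than the part received so far.
        return Decision.EXCLUDE
    return Decision.KEEP_FINAL if complete else Decision.KEEP_PROVISIONAL
```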
[0087]
The above description takes as an example the case where the present invention is applied to a printer. As will be apparent from that description, however, the present invention can also be applied to a print server, or to a printer driver installed in a personal computer or the like. In the latter case, when the layout of the current page p is finalized, the printer driver may create print data specifying the print content of the current page p, transmit it to the printer, and instruct the printer to print.
[0088]
The above description also takes the processing of an XHTML-print document as an example. As will be apparent, however, the present invention is applicable to documents in general, including HTML and XML documents, that are written in a description language capable of describing the display content data of document elements by reference.
[0089]
[Effect of the Invention]
As described above, according to the present invention, whether a document element very probably cannot be laid out on the page currently being laid out can be determined before acquisition of that element's display content data is complete. When it is determined that the document element very probably cannot be laid out on that page, the page's layout is finalized up to the preceding document element, so the page can be printed without waiting for acquisition of that element's display content data to complete, which speeds up printing.
[Brief description of the drawings]
FIG. 1 illustrates an example of printing an HTML document.
FIG. 2 is a diagram illustrating an example of a hardware configuration of a printer.
FIG. 3 is a diagram illustrating a configuration example of a system to which a printer is applied.
FIG. 4 is a diagram illustrating a configuration example of a processing mechanism of a printer.
FIG. 5 is a flowchart illustrating an example of a processing procedure of a document analysis unit and a document configuration information management unit.
FIG. 6 is a diagram illustrating an example of document configuration information.
FIG. 7 is a diagram illustrating an example of configuration management information.
FIG. 8 is a flowchart illustrating an example of a layout process.
FIG. 9 is a flowchart illustrating an example of an unacquired element exclusion determination process.
FIG. 10 is a diagram illustrating an example of a free area that can be laid out.
FIG. 11 is a flowchart illustrating an example of a process of estimating a display size based on an elapsed time from the start of acquisition.
FIG. 12 is a diagram illustrating an example of a coefficient table for estimating a display size from a data size.
[Explanation of symbols]
10 document data processing mechanism, 11 main control section, 12 data acquisition section, 13 document analysis section, 14 document configuration information management section, 15 layout section, 16 determination section, 17 display area estimation section, 20 LAN.

Claims

Analysis means for analyzing a description of a document including a plurality of document elements;
Data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
Layout means for determining, based on the analysis result of the analysis means, the layout of each document element on each output page, wherein, when determining the layout of the document elements in order, if the display content data of a document element to be laid out on an output page has not been acquired, the layout means estimates the display size of that display content data, and if the estimated display size does not fit in the area of the output page not yet laid out, lays out the document element on the next page and finalizes the layout of the output page up to the document element immediately preceding it;
Output means for generating image data of an output page whose layout has been finalized by the layout means, based on the display content data of each document element included in the output page and its layout, and for forming the image on a medium;
An image forming apparatus comprising:

The image forming apparatus according to claim 1,
wherein, when estimating the display size of display content data that has not been acquired, the layout means uses the data type of the display content data as basic information for the estimation.

The image forming apparatus according to claim 1,
wherein, when the display content data of a document element to be laid out on the output page has not been acquired, the layout means estimates the display size of the portion of the display content data acquired up to the present time from the data size of that portion, and if the estimated display size of the acquired portion does not fit in the area of the output page not yet laid out, lays out the document element on the next page and finalizes the layout of the output page up to the document element immediately preceding it.

Analysis means for analyzing a description of a document including a plurality of document elements;
Data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
Layout means for determining, based on the analysis result of the analysis means, the layout of each document element on each output page, wherein, when determining the layout of the document elements in order, if the display content data of a document element to be laid out on an output page has not been acquired, the layout means decides, based on the size or the aspect ratio of the area of the output page not yet laid out, to lay out the document element on the next page, and finalizes the layout of the output page up to the document element immediately preceding it;
Output means for generating image data of an output page whose layout has been finalized by the layout means, based on the display content data of each document element included in the output page and its layout, and for forming the image on a medium;
An image forming apparatus comprising:

An image forming method that analyzes the description of a document including a plurality of document elements, acquires the display content data of each document element detected during the analysis, lays out the acquired display content data page by page from the first page in document-element order, generates an image of each page according to the layout result, and forms the image on a medium, the method comprising:
when the display content data of a document element to be laid out on a page has not been acquired, estimating the display size of the display content data based on at least the data type of the document element, and, if the estimated display size does not fit in the area of the page not yet laid out, laying out the document element on the next page and finalizing the layout of the page up to the document element immediately preceding it.

A program incorporated in a computer system for controlling image formation by a printer connected to the computer system, the program causing the computer system to function as:
Analysis means for analyzing the description of a document including a plurality of document elements;
Data acquisition means for acquiring the display content data of each document element detected during analysis by the analysis means;
Layout means for determining, based on the analysis result of the analysis means, the layout of each document element on each output page, wherein, when determining the layout of the document elements in order, if the display content data of a document element to be laid out on an output page has not been acquired, the layout means estimates the display size of that display content data, and if the estimated display size does not fit in the area of the output page not yet laid out, lays out the document element on the next page and finalizes the layout of the output page up to the document element immediately preceding it; and
Output means for generating print data for an output page whose layout has been finalized by the layout means, based on the display content data of each document element included in the output page and its layout, transmitting the print data to the printer, and instructing the printer to print.