JP2016200992A

JP2016200992A - Display position acquisition program, display position acquisition device and display position acquisition method

Info

Publication number: JP2016200992A
Application number: JP2015081041A
Authority: JP
Inventors: 勇作藤井; Yusaku Fujii
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-04-10
Filing date: 2015-04-10
Publication date: 2016-12-01
Anticipated expiration: 2035-04-10
Also published as: JP6565287B2

Abstract

PROBLEM TO BE SOLVED: To provide a display position acquisition program capable of acquiring the display position of a character element among the display elements of display data. SOLUTION: The display position acquisition program causes a computer to execute processing including: displaying display data that includes a character element and an image element as display elements; acquiring an image representing the display result of the display data; disabling the display of the image element in the acquired image; and performing character recognition on the image in which the display of the image element is disabled, thereby acquiring position information of each character element included in the display data. SELECTED DRAWING: Figure 3

Description

The present invention relates to a display position acquisition program, a display position acquisition device, and a display position acquisition method.

With the spread of the Internet, the number of types of Web browsers that display HTML documents obtained via the Internet has been increasing. When differences in Web browser version, operating system (OS), and the like are considered in addition to the browser type, a large number of combinations exist.

Web browsers differ from one another in rendering characteristics, and some have their own extensions. In addition, some HTML definitions are interpreted differently by each Web browser. As a result, the display result of a single HTML document may differ from one Web browser to another.

Therefore, the creator of an HTML document is required to test whether the HTML document is displayed as intended in each Web browser, that is, whether the display result of the HTML document is the same across the Web browsers.

At present, such a test is generally performed by actually displaying the HTML document in each Web browser and visually checking whether the display result matches that of the other Web browsers. With this method, however, anyone other than an expert needs time to identify which description in the HTML document causes a difference in the display result.

JP 2010-39815 A
JP 2006-171851 A

The coordinates of each display element included in the display result of an HTML document by a Web browser can be acquired using, for example, WebDriver, which is open source software. WebDriver is plug-in software that controls a Web browser from the outside; details are available at http://docs.seleniumhq.org/projects/webdriver/.

With WebDriver, it is possible to acquire the area governed by each tag described in the HTML document displayed on the Web browser, that is, the rectangular coordinates of the display area of each tag. For example, when the display area is divided by <div> tags, rectangular coordinates can be acquired for each area delimited by a <div> tag. When a paragraph is defined by a <p> tag, the rectangular coordinates of the display area of that paragraph can be acquired.

An image that is a display element in an HTML document is defined by a single <img> tag. Therefore, if the rectangular coordinates of the display area of a given <img> tag are acquired using WebDriver, those coordinates can be regarded as the display area of the image for that <img> tag. Strictly speaking, the display area of the <img> tag may differ from the actual display area of the image because of the padding and border attributes. However, since WebDriver can also acquire the values of the padding and border attributes, the actual display area of the image can be calculated accurately.
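The calculation of the actual image display area can be sketched as follows. This is a minimal illustration, assuming that the rectangle of the <img> tag and the per-side padding and border widths have already been obtained in pixels; the names `Rect`, `Edges`, and `image_content_rect` are hypothetical and do not appear in the original description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Rectangle in page pixels: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Edges:
    """Per-side widths in pixels (used for both padding and border)."""
    top: float = 0.0
    right: float = 0.0
    bottom: float = 0.0
    left: float = 0.0


def image_content_rect(tag_rect: Rect, padding: Edges, border: Edges) -> Rect:
    """Shrink the <img> tag rectangle by its border and padding.

    Assumption: tag_rect is the outer (border-box) rectangle of the tag.
    """
    left = padding.left + border.left
    top = padding.top + border.top
    right = padding.right + border.right
    bottom = padding.bottom + border.bottom
    return Rect(
        x=tag_rect.x + left,
        y=tag_rect.y + top,
        width=max(0.0, tag_rect.width - left - right),
        height=max(0.0, tag_rect.height - top - bottom),
    )
```

For example, a 120 x 90 tag rectangle at (10, 20) with 5 px of padding and a 1 px border on every side yields a 108 x 78 image area at (16, 26).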

In the case of a character string, however, unlike an image, one character string is not necessarily defined by one HTML tag. Moreover, the character strings displayed on a Web browser are not limited to static character strings written in advance in the <body> part of the HTML document. For example, a character string generated dynamically on the Web browser side by a program such as JavaScript (registered trademark) when the HTML document is displayed may also be displayed. That is, character strings not written in the <body> part of the HTML source code can also be displayed on the Web browser.

Accordingly, an object in one aspect is to make it possible to acquire the display position of a character element among the display elements of display data.

In one proposal, a display position acquisition program causes a computer to execute processing of: displaying display data that includes a character element and an image element as display elements; acquiring an image indicating the display result of the display data; invalidating the display of the image element in the acquired image; and performing character recognition on the image in which the display of the image element has been invalidated, thereby acquiring position information of each character element included in the display data.

According to one aspect, the display position of a character element among the display elements of display data can be acquired.

A diagram showing a system configuration example in the first embodiment.
A diagram showing a hardware configuration example of the display position acquisition device in the first embodiment.
A diagram showing a functional configuration example of the display position acquisition device in the first embodiment.
A flowchart for explaining an example of a processing procedure executed by the display position acquisition device in the first embodiment.
A diagram showing an example of an HTML document to be evaluated.
A diagram showing an example of a captured image.
A diagram showing an example of a captured image in which the image elements have been invalidated.
A diagram showing a functional configuration example of the display position acquisition device in the second embodiment.
A flowchart for explaining an example of a processing procedure executed by the display position acquisition device in the second embodiment.
A diagram showing a system configuration example in the third embodiment.
A diagram showing a functional configuration example of the display position acquisition device in the third embodiment.
A flowchart for explaining an example of a processing procedure executed by the HTTP proxy in the third embodiment.
A flowchart for explaining an example of a processing procedure executed by the display position acquisition device in the third embodiment.
A diagram showing a functional configuration example of the display position acquisition device in the fourth embodiment.
A flowchart for explaining an example of a processing procedure executed by the display position acquisition device in the fourth embodiment.
A diagram showing an example of the display area of each HTML element.
A diagram showing an example of the acquired rectangular coordinates of the display area of each HTML element.
A diagram for concretely explaining the effect of the fourth embodiment.
A diagram showing a functional configuration example of the display position acquisition device in the fifth embodiment.
A flowchart for explaining an example of a processing procedure executed by the display position acquisition device in the fifth embodiment.
A diagram showing an example of tag hierarchy information.
A diagram for explaining the terminal tags in the tag hierarchy information.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a diagram illustrating a system configuration example in the first embodiment. In FIG. 1, an HTTP server 20 and a display position acquisition device 10 are communicably connected via a network such as the Internet or a LAN (Local Area Network).

The HTTP server 20 is a computer that stores one or more HTML documents. On receiving an HTTP request, the HTTP server 20 returns the HTML document corresponding to the URL (Uniform Resource Locator) of the HTTP request.

The display position acquisition device 10 is a computer that, while an HTML document stored in the HTTP server 20 is displayed by a Web browser 11, acquires the display position of each character among the display elements (HTML elements) included in the HTML document.

Note that a function corresponding to the HTTP server 20 may be implemented in the display position acquisition device 10. That is, the display position acquisition device 10 and the HTTP server 20 may be realized by a single device.

FIG. 2 is a diagram illustrating a hardware configuration example of the display position acquisition device in the first embodiment. The display position acquisition device 10 in FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, a display device 106, an input device 107, and the like, which are connected to one another by a bus B.

A program that realizes the processing in the display position acquisition device 10 is provided on a recording medium 101. When the recording medium 101 on which the program is recorded is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. However, the program does not necessarily have to be installed from the recording medium 101 and may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 realizes the functions of the display position acquisition device 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network. The display device 106 displays a GUI (Graphical User Interface) or the like provided by the program. The input device 107 is a keyboard, a mouse, or the like, and is used for inputting various operation instructions.

Examples of the recording medium 101 include portable recording media such as a CD-ROM, a DVD disc, and a USB memory. Examples of the auxiliary storage device 102 include an HDD (Hard Disk Drive) and a flash memory. Both the recording medium 101 and the auxiliary storage device 102 correspond to computer-readable recording media.

FIG. 3 is a diagram illustrating a functional configuration example of the display position acquisition device in the first embodiment. In FIG. 3, the display position acquisition device 10 includes a plurality of Web browsers 11, an image acquisition unit 12, an OCR unit 13, an output unit 14, and the like. Each of these units is realized by processing that one or more programs installed in the display position acquisition device 10 cause the CPU 104 to execute.

The Web browser 11 acquires an HTML document from the HTTP server 20 and controls the display of the acquired HTML document. When the HTML document contains a script such as JavaScript (registered trademark), the Web browser 11 executes the script. The Web browsers 11 are, for example, of mutually different types.

The image acquisition unit 12 acquires an image that represents the display result of the HTML document by the Web browser 11 and in which, among the display elements (HTML elements) in the HTML document, the display areas of the image elements have been invalidated. In FIG. 3, the image acquisition unit 12 includes a browser display unit 121, a screen capture unit 122, an image element invalidation unit 123, and the like.

The browser display unit 121 inputs the URL corresponding to the HTML document to be evaluated into the Web browser 11 and causes the Web browser 11 to display the HTML document. The screen capture unit 122 acquires an image indicating the display result of the HTML document by the Web browser 11, for example, by screen capture. The image acquired by the screen capture unit 122 is referred to as a "captured image". The image element invalidation unit 123 invalidates, within the area of the captured image, the display of the image elements among the display elements of the HTML document. An image element is an image displayed on the basis of an <img> tag in the HTML document.

The OCR unit 13 executes OCR (Optical Character Reader) processing on the captured image in which the image elements have been invalidated by the image element invalidation unit 123, and acquires the character code and the position information of each character included in the captured image. The output unit 14 outputs the information acquired by the OCR unit 13.

The processing procedure executed by the display position acquisition device 10 will now be described. FIG. 4 is a flowchart for explaining an example of the processing procedure executed by the display position acquisition device in the first embodiment.

In step S101, the browser display unit 121 starts one of the plurality of Web browsers 11 installed in the display position acquisition device 10 and inputs the URL of the HTML document to be evaluated into that Web browser 11. As a result, the HTML document is displayed by the Web browser 11. The operating environment of the Web browser 11 can vary widely depending on the browser product, the OS (Operating System), the display size of the Web browser 11, combinations of these, and so on; it suffices that the Web browser 11 is started after whichever environment is to be evaluated has been set in advance.

FIG. 5 is a diagram illustrating an example of an HTML document to be evaluated. In FIG. 5, the HTML document d1 includes seven HTML elements (display elements): a center element e1, a div element e2, a div element e3, an img element e4, a div element e5, an img element e6, and an img element e7.

Subsequently, the screen capture unit 122 acquires an image of the display area in the window of the Web browser 11 (S102). That is, an image indicating the display result of the HTML document d1 is acquired. This image may be acquired using WebDriver. WebDriver is plug-in software that controls the Web browser 11 from the outside and can extract various information about the displayed content from the Web browser 11 displaying the HTML document d1. Details on WebDriver are available at http://docs.seleniumhq.org/projects/webdriver/. The image acquired in step S102 is hereinafter referred to as the "captured image".
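Steps S101 and S102 can be sketched as follows, assuming the Selenium WebDriver bindings for Python are used as the WebDriver implementation. The function name `capture_page` is hypothetical; `driver` is assumed to be any object that exposes the `get` and `get_screenshot_as_png` methods of a Selenium driver.

```python
def capture_page(driver, url: str) -> bytes:
    """Display the HTML document at `url` and return a PNG screenshot.

    `driver` is assumed to behave like a Selenium WebDriver instance,
    for example one created with `selenium.webdriver.Firefox()`.
    """
    driver.get(url)                         # S101: load and display the document
    return driver.get_screenshot_as_png()   # S102: capture the displayed result
```

Because the driver is passed in, the same function can be called once per browser under evaluation. Note that the screenshot typically covers only the visible viewport, so the browser window size is part of the environment to be fixed beforehand.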

For the HTML document d1, a captured image such as that shown in FIG. 6 is acquired, for example. FIG. 6 is a diagram illustrating an example of a captured image. In the captured image c1 shown in FIG. 6, the image g1 is displayed on the basis of the img element e4, the image g2 on the basis of the img element e6, and the image g3 on the basis of the img element e7.

Subsequently, the image element invalidation unit 123 selects one of the HTML tags included in the HTML document d1 as a processing target (S103). Hereinafter, the selected HTML tag is referred to as the "target tag". Information on the tags included in the HTML document d1 may be acquired using WebDriver or by another method.

Subsequently, the image element invalidation unit 123 determines whether the target tag is an <img> tag (S104). If the target tag is not an <img> tag (No in S104), the process proceeds to step S107. If the target tag is an <img> tag (Yes in S104), the image element invalidation unit 123 calculates the coordinate values of the display area (the area within the captured image c1) of the image element corresponding to the <img> tag (S105). These coordinate values may also be calculated using WebDriver. The image element invalidation unit 123 then fills the display area of the target tag in the captured image c1 with white (S106). As a result, the display of the image that would otherwise appear in that display area is invalidated (hidden). A color other than white may be used for the fill.
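The loop of steps S103 to S107 can be sketched as follows. This is a minimal illustration that assumes the captured image is held as a list of rows of RGB tuples and that the tag name and rectangle of each element have already been obtained; the function name `blank_image_regions` and the data layout are hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # (x, y, width, height) in image pixels
WHITE: Pixel = (255, 255, 255)


def blank_image_regions(
    image: Image,
    elements: Iterable[Tuple[str, Box]],
    fill: Pixel = WHITE,
) -> Image:
    """Fill the display area of every <img> element with `fill`, in place."""
    height = len(image)
    width = len(image[0]) if height else 0
    for tag, (x, y, w, h) in elements:        # S103: take the next target tag
        if tag.lower() != "img":              # S104: only <img> tags are processed
            continue
        # S105: display area of the element, clipped to the captured image
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(width, x + w), min(height, y + h)
        for row in range(y0, y1):             # S106: paint the area
            for col in range(x0, x1):
                image[row][col] = fill
    return image                              # S107: all tags processed
```

The fill color is a parameter because the description allows colors other than white.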

When processing has been completed for all tags in the HTML document d1 (Yes in S107), the captured image c1 becomes as shown in FIG. 7.

FIG. 7 is a diagram illustrating an example of a captured image in which the image elements have been invalidated. In the captured image c2 shown in FIG. 7, the images g1, g2, and g3 that were included in the captured image c1 of FIG. 6 have been invalidated (hidden).

Subsequently, the OCR unit 13 applies OCR to the captured image c2 (S108). As a result, a character code and position information (display coordinate values) are acquired for each character included in the captured image c2. The output unit 14 then outputs the character code and position information acquired for each character by the OCR unit 13 (S109). For example, a file containing the character code and position information of each character may be stored in the auxiliary storage device 102 in association with identification information or the like of the Web browser 11 used to display the HTML document d1.
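One possible shape for the output of step S109 is sketched below. The record layout, the use of JSON, and the names `CharPosition` and `serialize_result` are assumptions made for illustration; the description only requires that each character code and its position information be associated with identification information of the Web browser.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CharPosition:
    """One recognized character (hypothetical record layout)."""
    code: int     # character code; a Unicode code point is assumed here
    x: int        # left edge in captured-image pixels
    y: int        # top edge in captured-image pixels
    width: int
    height: int


def serialize_result(browser_id: str, chars: List[CharPosition]) -> str:
    """Bundle the per-character results with the browser identifier."""
    payload = {
        "browser": browser_id,
        "characters": [asdict(c) for c in chars],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

The returned string could then be written to a file in the auxiliary storage device, one file per browser.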

As described above, according to the first embodiment, the captured image c2, which is an image indicating the display result of the HTML document d1, is acquired, and OCR is applied to the captured image c2. As a result, the display positions of the character elements among the display elements included in the HTML document d1 can be acquired. That is, display positions can be acquired even when no tag is attached to each character element, and even for character elements that are displayed dynamically by a script such as JavaScript (registered trademark).

In addition, in the present embodiment, OCR is applied to the captured image c2 after the image elements, which tend to cause misrecognition in OCR processing, have been removed. This reduces errors in OCR processing such as errors in the layout analysis that determines where characters are placed. As a result, improved character recognition accuracy can be expected for the display result of the HTML document d1 by the Web browser 11.

By executing the processing shown in FIG. 4 for each Web browser 11, the position information of each character can be acquired for the same HTML document d1 in each Web browser 11. By comparing the position information of each character across the Web browsers 11, it is possible to check whether the display results of the Web browsers 11 differ. The position information may be compared using a known method. The comparison does not need to require strict agreement of the coordinate values. For example, if the vertical and horizontal positional relationships between the characters match, the compared display results may be determined to match each other.
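The relaxed comparison mentioned above can be sketched as follows. This is only one possible reading: it assumes that both results list the same characters in the same order as (character, x, y) tuples, and it treats coordinate differences within `tolerance` pixels as "aligned". The function name `same_relative_layout` and the tolerance parameter are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

CharAt = Tuple[str, int, int]  # (character, x, y)


def _direction(delta: int, tolerance: int) -> int:
    """Return -1, 0, or 1 for before / aligned / after."""
    if abs(delta) <= tolerance:
        return 0
    return 1 if delta > 0 else -1


def same_relative_layout(a: List[CharAt], b: List[CharAt], tolerance: int = 0) -> bool:
    """True if every pair of characters has the same left/right and
    above/below relationship in both results."""
    if [c for c, _, _ in a] != [c for c, _, _ in b]:
        return False
    for i, j in combinations(range(len(a)), 2):
        for axis in (1, 2):  # 1 = x (horizontal), 2 = y (vertical)
            da = a[i][axis] - a[j][axis]
            db = b[i][axis] - b[j][axis]
            if _direction(da, tolerance) != _direction(db, tolerance):
                return False
    return True
```

The pairwise check is quadratic in the number of characters, which is acceptable for a sketch but may need refinement for long pages.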

Next, a second embodiment will be described. The description of the second embodiment covers the points that differ from the first embodiment. Points not specifically mentioned in the second embodiment may be the same as in the first embodiment.

FIG. 8 is a diagram illustrating a functional configuration example of the display position acquisition device in the second embodiment. In FIG. 8, parts that are the same as or correspond to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted.

In FIG. 8, the arrangement of the image element invalidation unit 123 and the screen capture unit 122 differs from that in FIG. 3. This is because, in the second embodiment, the execution timing of the processing by the image element invalidation unit 123 and the processing by the screen capture unit 122, as well as the content of that processing, differ from the first embodiment.

FIG. 9 is a flowchart for explaining an example of the processing procedure executed by the display position acquisition device in the second embodiment.

Step S201 may be the same as step S101. In step S202, the image element invalidation unit 123 selects one of the HTML tags included in the HTML document d1 as a processing target. WebDriver may be used to select the HTML tag, as in step S103 of FIG. 4. The HTML tag selected in step S202 is hereinafter referred to as the "target tag".

Subsequently, the image element invalidation unit 123 determines whether the target tag is an <img> tag (S203). If the target tag is an <img> tag (Yes in S203), the image element invalidation unit 123 sets the value of visibility in the style attribute of the target tag to hidden (S204). This hides the HTML element of the target tag, that is, the image element.

On the other hand, if the target tag is not an <img> tag (No in S203), the image element invalidation unit 123 sets the value of background-image in the style attribute of the target tag to none (S205). This prevents an image from being displayed in the background of the HTML element of the target tag. However, the processing of step S205 is not essential. Steps S204 and S205 may be executed using WebDriver. In that case, the change to the tag attribute is reflected in the display result immediately. That is, the result of executing step S204 or S205 can be reflected at once in the display result of the HTML document d1 by the Web browser 11.
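Steps S202 to S206 can be sketched as follows, again assuming the Selenium WebDriver bindings for Python. Instead of visiting the tags one by one from the controlling program, this sketch injects a single JavaScript snippet that walks all elements inside the browser; that batching is a simplification chosen here, not something stated in the description. The names `HIDE_IMAGES_SCRIPT` and `hide_images` are hypothetical.

```python
# JavaScript executed inside the page: hide <img> elements (S204) and
# remove background images from every other element (S205, optional).
HIDE_IMAGES_SCRIPT = """
for (const el of document.querySelectorAll('*')) {
  if (el.tagName.toLowerCase() === 'img') {
    el.style.visibility = 'hidden';
  } else {
    el.style.backgroundImage = 'none';
  }
}
"""


def hide_images(driver) -> None:
    """Apply the style changes in the live page before the screen capture.

    `driver` is assumed to expose Selenium's `execute_script` method.
    """
    driver.execute_script(HIDE_IMAGES_SCRIPT)
```

Setting visibility to hidden keeps the space occupied by each image, so the positions of the surrounding characters are not expected to shift.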

When step S204 or S205 has been executed for all tags in the HTML document d1 (Yes in S206), the screen capture unit 122 acquires an image of the display area in the window of the Web browser 11 (S207). Step S207 may be executed in the same way as step S102 of FIG. 4. In step S207, an image of the display result of the HTML document d1 with the HTML elements of the <img> tags hidden is acquired. That is, the captured image c2 shown in FIG. 7 is acquired in step S207.

Subsequently, in steps S208 and S209, the same processing as in steps S108 and S109 of FIG. 4 is executed.

As described above, the second embodiment provides the same effects as the first embodiment.

Next, a third embodiment will be described. The description of the third embodiment covers the points that differ from the first or second embodiment. Points not specifically mentioned in the third embodiment may be the same as in the first or second embodiment.

FIG. 10 is a diagram illustrating a system configuration example in the third embodiment. In FIG. 10, parts that are the same as those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

In FIG. 10, the display position acquisition device 10 and an HTTP proxy 30 are communicably connected via a network such as a LAN or the Internet. The HTTP proxy 30 and the HTTP server 20 are likewise communicably connected via a network such as a LAN or the Internet.

The HTTP proxy 30 is a computer that relays an HTTP request from the display position acquisition device 10 to the HTTP server 20, and relays the HTTP response returned by the HTTP server 20 for that request to the display position acquisition device 10. Although no HTTP proxy is shown explicitly in the first embodiment, this is not intended to limit the first embodiment to a configuration in which no HTTP proxy is interposed between the HTTP server 20 and the display position acquisition device 10.

FIG. 11 is a diagram illustrating a functional configuration example of the display position acquisition device according to the third embodiment. In FIG. 11, parts identical or corresponding to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted.

In FIG. 11, the HTTP proxy 30 includes a proxy unit 31, an image element invalidation unit 32, and the like. The proxy unit 31 relays HTTP requests and HTTP responses between the display position acquisition device 10 and the HTTP server 20. When an HTTP response from the HTTP server 20 contains image data, the image element invalidation unit 32 replaces that image data with image data of a transparent image or a solid white image of the same size.

すなわち、第３の実施の形態では、ＨＴＭＬ文書ｄ１から参照されている各画像データが、ＨＴＴＰプロキシ３０の画像要素無効化部３２によって、透明画像又は白一色の画像データに置換される。 In other words, in the third embodiment, each piece of image data referred to from the HTML document d1 is replaced with a transparent image or white-color image data by the image element invalidation unit 32 of the HTTP proxy 30.

一方、図１１において、表示位置取得装置１０の画像取得部１２は、画像要素無効化部１２３を含まない。第３の実施の形態において、ＨＴＭＬ文書ｄ１の画像要素は、ＨＴＴＰプロキシ３０において無効化されるからである。 On the other hand, in FIG. 11, the image acquisition unit 12 of the display position acquisition device 10 does not include the image element invalidation unit 123. This is because the image element of the HTML document d1 is invalidated in the HTTP proxy 30 in the third embodiment.

図１２は、第３の実施の形態においてＨＴＴＰプロキシが実行する処理手順の一例を説明するためのフローチャートである。 FIG. 12 is a flowchart for explaining an example of a processing procedure executed by the HTTP proxy in the third embodiment.

When the proxy unit 31 receives an HTTP response addressed to the display position acquisition device 10 from the HTTP server 20 (S301), it determines whether the HTTP response contains image data (S302). That is, it determines whether the body of the HTTP response is image data. Whether the body of an HTTP response is image data can be determined, for example, by referring to the Content-Type field in the HTTP header of the response.

When the HTTP response contains image data (Yes in S302), the image element invalidation unit 32 generates image data of a transparent image or a solid white image whose size (height and width) is the same as that of the image data in the response (hereinafter referred to as the "original image data"), and replaces the original image data in the HTTP response with the generated image data (S303).

なお、Ｗｅｂブラウザ１１に対してＨＴＭＬ文書ｄ１のＵＲＬが入力されると、Ｗｅｂブラウザ１１は、当該ＵＲＬに対応するＨＴＭＬ文書ｄ１をＨＴＴＰサーバ２０から取得する。その後で、Ｗｅｂブラウザ１１は、ＨＴＭＬ文書ｄ１から参照されている画像データ等を、ＨＴＴＰサーバ２０から取得する。例えば、＜ｉｍｇ＞タグによって参照されている画像データや、背景に利用される画像データ等が、ＨＴＴＰサーバ２０から取得される。ステップＳ３０３では、この際のＨＴＴＰレスポンスが処理対象とされる。 Note that when the URL of the HTML document d1 is input to the Web browser 11, the Web browser 11 acquires the HTML document d1 corresponding to the URL from the HTTP server 20. Thereafter, the Web browser 11 acquires image data and the like referred to from the HTML document d1 from the HTTP server 20. For example, image data referenced by the <img> tag, image data used for the background, and the like are acquired from the HTTP server 20. In step S303, the HTTP response at this time is processed.

After step S303, or when the result of step S302 is No, the proxy unit 31 transmits the HTTP response to the display position acquisition device 10 (S304).
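The decision in steps S302 and S303 can be sketched as follows. This is a minimal illustration and not the implementation of the HTTP proxy 30: it assumes the response is available as a header dictionary plus a byte string, handles only PNG bodies (any other image format raises an error here), produces a solid white image rather than a transparent one, and updates Content-Length as an added assumption. The function names `is_image_response`, `png_size`, `white_png`, and `neutralize` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_image_response(headers):
    """S302: treat the body as image data when Content-Type is image/*."""
    content_type = headers.get("Content-Type", "")
    return content_type.split(";")[0].strip().lower().startswith("image/")


def png_size(data):
    """Read (width, height) from the IHDR chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def _chunk(kind, payload):
    """Serialize one PNG chunk: length, type, data, CRC of type and data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def white_png(width, height):
    """Build an 8-bit grayscale PNG of the given size with every pixel white."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter-type byte (0) followed by one byte per pixel.
    raw = (b"\x00" + b"\xff" * width) * height
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def neutralize(headers, body):
    """S302-S303: swap an image body for a white image of the same size."""
    if not is_image_response(headers):
        return headers, body  # No in S302: relay the response unchanged
    width, height = png_size(body)
    new_body = white_png(width, height)
    new_headers = {**headers,
                   "Content-Type": "image/png",
                   "Content-Length": str(len(new_body))}
    return new_headers, new_body
```

Keeping the replacement image the same size as the original matters because the layout of the surrounding text, and therefore the character positions obtained later, should not shift.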

図１３は、第３の実施の形態における表示位置取得装置が実行する処理手順の一例を説明するためのフローチャートである。図１３中、図４と同一ステップには同一ステップ番号を付し、その説明は省略する。 FIG. 13 is a flowchart for explaining an example of a processing procedure executed by the display position acquisition apparatus according to the third embodiment. In FIG. 13, the same steps as those in FIG. 4 are denoted by the same step numbers, and the description thereof is omitted.

図１３のステップＳ１０１では、ＨＴＴＰプロキシ３０による処理の効果により、ＨＴＭＬ文書ｄ１の各画像要素は、透明又は白一色で表示される。したがって、ステップＳ１０８では、画像要素が含まれないキャプチャ画像ｃ２に対してＯＣＲが適用される。 In step S101 of FIG. 13, each image element of the HTML document d1 is displayed in a transparent or white color due to the effect of the processing by the HTTP proxy 30. Accordingly, in step S108, OCR is applied to the captured image c2 that does not include an image element.

As described above, the third embodiment also provides the same effects as the first or second embodiment. In addition, in the third embodiment, not only images referenced by <img> tags but also background images, buttons rendered as images, and the like can be replaced with transparent or solid white images, so a further improvement in character recognition accuracy can be expected.

なお、表示位置取得装置１０に、ＨＴＴＰプロキシ３０に対応する機能が実装されてもよい。すなわち、表示位置取得装置１０とＨＴＴＰプロキシ３０とは、一つの装置によって実現されてもよい。更に、表示位置取得装置１０に、ＨＴＴＰサーバ２０に対応する機能が実装されてもよい。 Note that a function corresponding to the HTTP proxy 30 may be implemented in the display position acquisition device 10. That is, the display position acquisition device 10 and the HTTP proxy 30 may be realized by a single device. Furthermore, a function corresponding to the HTTP server 20 may be implemented in the display position acquisition device 10.

次に、第４の実施の形態について説明する。第４の実施の形態では第１〜第３の実施の形態と異なる点について説明する。第４の実施の形態において、特に言及されない点については、第１〜第３の実施の形態のいずれかの形態と同様でもよい。 Next, a fourth embodiment will be described. In the fourth embodiment, differences from the first to third embodiments will be described. In the fourth embodiment, points not particularly mentioned may be the same as those in any of the first to third embodiments.

FIG. 14 is a diagram illustrating a functional configuration example of the display position acquisition device according to the fourth embodiment. In FIG. 14, parts identical or corresponding to those in FIG. 3, FIG. 8, or FIG. 11 are denoted by the same reference numerals, and their description is omitted.

In FIG. 14, the image acquisition unit 12 is shown as a black box. This is because, in the fourth embodiment, the image acquisition unit 12 may have any of the configurations shown in FIG. 3, FIG. 8, and FIG. 11. That is, when the system configuration shown in FIG. 1 is adopted, the image acquisition unit 12 may have the configuration shown in FIG. 3 or FIG. 8. When the system configuration shown in FIG. 10 is adopted, the image acquisition unit 12 may have the configuration shown in FIG. 11.

図１４において、表示位置取得装置１０は、更に、タグ表示領域取得部１５を有する。タグ表示領域取得部１５は、表示位置取得装置１０にインストールされたプログラムが、ＣＰＵ１０４に実行させる処理により実現される。タグ表示領域取得部１５は、ＨＴＭＬ文書ｄ１内の各タグに対応するＨＴＭＬ要素について、当該ＨＴＭＬ文書ｄ１の表示結果の画像における表示領域の矩形座標を取得（特定）する。矩形座標とは、例えば、当該表示領域に係る矩形領域の対角の頂点の座標値でもよいし、いずれか１つの頂点の座標値と当該矩形領域の幅及び高さとであってもよい。 In FIG. 14, the display position acquisition device 10 further includes a tag display area acquisition unit 15. The tag display area acquisition unit 15 is realized by processing executed by the CPU 104 by a program installed in the display position acquisition device 10. The tag display area acquisition unit 15 acquires (identifies) the rectangular coordinates of the display area in the display result image of the HTML document d1 for the HTML element corresponding to each tag in the HTML document d1. The rectangular coordinates may be, for example, coordinate values of diagonal vertices of the rectangular area related to the display area, or may be coordinate values of any one vertex and the width and height of the rectangular area.
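The two representations of rectangular coordinates mentioned above (two diagonal vertices, or one vertex plus width and height) carry the same information. A minimal sketch of converting between them is shown below; the class name `Rect` and its methods are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Display area stored as the top-left vertex plus width and height."""
    x: int
    y: int
    width: int
    height: int

    @classmethod
    def from_corners(cls, x1, y1, x2, y2):
        """Build a Rect from any two diagonally opposite vertices."""
        left, top = min(x1, x2), min(y1, y2)
        return cls(left, top, abs(x2 - x1), abs(y2 - y1))

    def corners(self):
        """Return (left, top, right, bottom), the diagonal-vertex form."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


area = Rect.from_corners(110, 60, 10, 20)
print(area)            # Rect(x=10, y=20, width=100, height=40)
print(area.corners())  # (10, 20, 110, 60)
```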

図１５は、第４の実施の形態における表示位置取得装置が実行する処理手順の一例を説明するためのフローチャートである。 FIG. 15 is a flowchart for explaining an example of a processing procedure executed by the display position acquisition apparatus according to the fourth embodiment.

ステップＳ４１０において、画像取得部１２は、ＨＴＭＬ文書ｄ１の画像要素が無効化されたキャプチャ画像ｃ２を取得する。キャプチャ画像ｃ２の取得方法は、第１から第３の実施の形態のいずれの方法が採用されてもよい。 In step S410, the image acquisition unit 12 acquires the captured image c2 in which the image element of the HTML document d1 is invalidated. Any method of the first to third embodiments may be adopted as a method of acquiring the captured image c2.

続いて、タグ表示領域取得部１５は、ＨＴＭＬ文書ｄ１に含まれている各タグに対応する各ＨＴＭＬ要素の表示領域の矩形座標を取得する（Ｓ４２０）。 Subsequently, the tag display area acquisition unit 15 acquires the rectangular coordinates of the display area of each HTML element corresponding to each tag included in the HTML document d1 (S420).

図１６は、各ＨＴＭＬ要素の表示領域の一例を示す図である。図１６では、ＨＴＭＬ文書ｄ１（図５）に含まれているＨＴＭＬ要素ｅ１〜ｅ７のそれぞれに順番に対応する表示領域ａ１〜ａ７が矩形によって示されている。なお、図１６では、便宜上、ｉｍｇ要素に対応する画像も示されている。 FIG. 16 is a diagram illustrating an example of the display area of each HTML element. In FIG. 16, display areas a1 to a7 corresponding to the HTML elements e1 to e7 included in the HTML document d1 (FIG. 5) in order are indicated by rectangles. In FIG. 16, for convenience, an image corresponding to the img element is also shown.

FIG. 17 is a diagram illustrating an example of the acquired rectangular coordinates of the display area of each HTML element. FIG. 17 shows the rectangular coordinates of the display area of each HTML element in association with the XPath value of that element. In step S420, information such as that shown in FIG. 17 is acquired. Such information may be acquired using WebDriver, for example. Alternatively, when the system configuration shown in FIG. 10 is adopted, the HTTP proxy 30 may insert into the HTML document d1 JavaScript (registered trademark) that acquires the rectangular coordinates of the display area of the HTML element for each HTML tag in the HTML document d1. The rectangular coordinates of the display area of each HTML element may be acquired in this way.

Next, the OCR unit 13 selects one of the HTML tags included in the HTML document d1 as the processing target (target tag) (S430). The OCR unit 13 then applies OCR to the display area of the captured image c2 indicated by the rectangular coordinates acquired for the HTML element of the target tag (S440). For example, if the target tag corresponds to the HTML element e1, OCR is applied to the display area a1 in FIG. 16. The output unit 14 then outputs the acquired character codes and position information (S450).
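The per-area OCR of step S440 can be sketched as below. The sketch rests on several assumptions that the embodiment does not state: the captured image is modeled as a list of text rows in which a space stands for a white pixel, the OCR engine is replaced by a stand-in function `toy_ocr`, and the engine is assumed to report positions relative to the cropped area, so they are shifted back to page coordinates. All function names are hypothetical.

```python
def crop(image, rect):
    """Cut the rectangle (x, y, width, height) out of a row-major image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def ocr_region(image, rect, ocr):
    """S440: run OCR on one display area and report page coordinates."""
    x0, y0, _, _ = rect
    results = []
    for char, x, y, w, h in ocr(crop(image, rect)):
        # Positions come back relative to the crop, so shift them to the page.
        results.append((char, x + x0, y + y0, w, h))
    return results


def toy_ocr(region):
    """Stand-in for a real OCR engine: every non-blank cell is one character."""
    return [(cell, x, y, 1, 1)
            for y, row in enumerate(region)
            for x, cell in enumerate(row) if cell != " "]


page = ["      ",
        "  20  ",
        "    % "]
print(ocr_region(page, (2, 1, 2, 1), toy_ocr))
# [('2', 2, 1, 1, 1), ('0', 3, 1, 1, 1)]
```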

When steps S430 to S450 have been executed for all HTML tags (Yes in S460), the output unit 14 deletes OCR results that have been recorded more than once for the same character (S470). Because HTML elements form a hierarchy, one HTML element (a child element) may lie inside another HTML element (a parent element). In that case, OCR results for the child element are obtained twice: once from the OCR processing of the display area of the parent element and once from the OCR processing of the display area of the child element itself. For example, in FIG. 16, OCR results for "character string 6" and "character string 7" in the display area a5 are obtained both from the OCR processing of the display area a3 and from the OCR processing of the display area a5. Step S470 eliminates such duplicates.
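One possible way to eliminate the duplicates in step S470 is sketched below. The embodiment does not specify how a duplicate is detected; this sketch assumes that two results refer to the same character when the recognized character is identical and the bounding boxes coincide within a small tolerance. The function name `remove_duplicates` and the `tolerance` parameter are hypothetical.

```python
def remove_duplicates(results, tolerance=2):
    """S470: keep one entry per character whose boxes coincide within tolerance.

    results: list of (char, x, y, width, height) in page coordinates
    """
    unique = []
    for char, x, y, w, h in results:
        duplicate = any(
            char == c
            and abs(x - ux) <= tolerance and abs(y - uy) <= tolerance
            and abs(w - uw) <= tolerance and abs(h - uh) <= tolerance
            for c, ux, uy, uw, uh in unique
        )
        if not duplicate:
            unique.append((char, x, y, w, h))
    return unique


# The same "6" is reported by the parent area and, one pixel off, by the child.
merged = remove_duplicates([("6", 10, 50, 8, 12),
                            ("6", 11, 50, 8, 12),
                            ("7", 10, 70, 8, 12)])
print(merged)  # [('6', 10, 50, 8, 12), ('7', 10, 70, 8, 12)]
```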

As described above, according to the fourth embodiment, OCR is applied to the display area of each HTML element included in the HTML document d1. In general, character recognition is more likely to be accurate when OCR is applied separately to character strings that share common attributes than when it is applied to a range in which character strings with different attributes (size and so on) are mixed. In the present embodiment, OCR is applied to the display area of each HTML element rather than to the entire captured image c2, which raises the likelihood that OCR is executed on character strings with common attributes. As a result, a further improvement in character recognition accuracy can be expected.

例えば、以下のようなケースについて、第４の実施の形態によれば、ＯＣＲ処理における誤認識の可能性を低減することができる。図１８は、第４の実施の形態による効果を具体的に説明するための図である。 For example, in the following cases, according to the fourth embodiment, the possibility of erroneous recognition in the OCR process can be reduced. FIG. 18 is a diagram for specifically explaining the effect of the fourth embodiment.

As shown in (1) of FIG. 18, large and small characters are mixed in this example. If OCR is applied to the dashed rectangle shown in (1), misrecognition becomes more likely. For example, the analysis may assume that characters of the same size as "2" and "0" continue after "0", judge "%OF" to be a single character, and search for the character that most closely resembles "%OF". The likelihood of misrecognizing the "%OF" portion therefore increases.

In the fourth embodiment, by contrast, OCR is applied to each of the dashed rectangles shown in (2), exploiting the fact that "20", "%", and "OFF" are separate HTML elements in the HTML source code. This lowers the likelihood of the kind of misrecognition that arises in case (1).

次に、第５の実施の形態について説明する。第５の実施の形態では第４の実施の形態と異なる点について説明する。第５の実施の形態において、特に言及されない点については、第４の実施の形態と同様でもよい。 Next, a fifth embodiment will be described. In the fifth embodiment, differences from the fourth embodiment will be described. In the fifth embodiment, points not particularly mentioned may be the same as those in the fourth embodiment.

FIG. 19 is a diagram illustrating a functional configuration example of the display position acquisition device according to the fifth embodiment. In FIG. 19, parts identical or corresponding to those in FIG. 14 are denoted by the same reference numerals, and their description is omitted.

図１９において、表示位置取得装置１０は、更に、タグ階層構造解析部１６、画像更新部１７、及びタグ階層構造更新部１８等を有する。これら各部は、表示位置取得装置１０にインストールされる１以上のプログラムが、ＣＰＵ１０４に実行させる処理により実現される。 In FIG. 19, the display position acquisition apparatus 10 further includes a tag hierarchy structure analysis unit 16, an image update unit 17, a tag hierarchy structure update unit 18, and the like. Each of these units is realized by processing that one or more programs installed in the display position acquisition apparatus 10 cause the CPU 104 to execute.

The tag hierarchy structure analysis unit 16 analyzes the HTML document d1 and generates (specifies) information indicating the hierarchical structure, or parent-child relationships, of the tags (HTML elements) in the HTML document d1 (hereinafter referred to as "tag hierarchy information"). The tag hierarchy information is tree-structured information in which each tag (HTML element) is a node and an edge connects the nodes of tags (HTML elements) that are in a hierarchical relationship.

画像更新部１７は、キャプチャ画像ｃ２中において、ＯＣＲ処理が終了したＨＴＭＬ要素の表示領域内の文字列を無効化する。 The image update unit 17 invalidates the character string in the display area of the HTML element for which the OCR processing has been completed in the captured image c2.

タグ階層構造更新部１８は、ＯＣＲ処理が終了したＨＴＭＬ要素に対応するノードをタグ階層情報から削除する。 The tag hierarchy structure update unit 18 deletes the node corresponding to the HTML element for which the OCR processing has been completed from the tag hierarchy information.

FIG. 20 is a flowchart for explaining an example of a processing procedure executed by the display position acquisition device according to the fifth embodiment. In FIG. 20, steps identical to those in FIG. 15 are denoted by the same step numbers, and their description is omitted.

Following step S420, the tag hierarchy structure analysis unit 16 acquires the HTML source code from the Web browser 11, analyzes the hierarchical structure of the tags contained in that HTML source code, and generates the tag hierarchy information (S421). In the present embodiment, the HTML source code is the HTML data in the state after scripts and the like contained in the HTML document d1 have been executed. Executing such scripts can dynamically add HTML elements to the HTML document d1. In step S421, HTML source code that also includes the dynamically added HTML elements is acquired. In other words, the HTML source code is HTML data that corresponds to the display state in the Web browser 11. WebDriver may be used to acquire such HTML source code.

図２１は、タグ階層情報の一例を示す図である。図２１に示されるタグ階層情報において、各ノードの符号は、当該ノードに対応するＨＴＭＬ要素に対して図５において付されている符号に一致する。また、各ノードには、当該ノードに対応するタグ（ＨＴＭＬ要素）のＸＰａｔｈの値が付されている。 FIG. 21 is a diagram illustrating an example of tag hierarchy information. In the tag hierarchy information shown in FIG. 21, the code of each node matches the code attached in FIG. 5 to the HTML element corresponding to the node. Each node is assigned the XPath value of a tag (HTML element) corresponding to the node.
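Tag hierarchy information of this kind can be built, for example, by walking the HTML source code with a parser. The sketch below uses Python's standard `html.parser` module and labels each node with an indexed XPath such as `/html[1]/body[1]/div[1]`. The exact XPath notation used in FIG. 21 is not reproduced here, the input is assumed to be well-formed, and the class name `TagTreeBuilder` and the short `VOID_TAGS` list are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag (a partial list, for illustration only).
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagTreeBuilder(HTMLParser):
    """Collect parent-child relationships between tags, keyed by XPath."""

    def __init__(self):
        super().__init__()
        self.children = {}   # XPath of a node -> XPaths of its child nodes
        self._stack = []     # XPaths of the currently open elements
        self._counts = [{}]  # per open element: how often each child tag occurred

    def handle_starttag(self, tag, attrs):
        counts = self._counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        parent = self._stack[-1] if self._stack else ""
        xpath = f"{parent}/{tag}[{counts[tag]}]"
        self.children[xpath] = []
        if self._stack:
            self.children[parent].append(xpath)
        if tag not in VOID_TAGS:
            self._stack.append(xpath)
            self._counts.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed input: each end tag closes the innermost element.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()
            self._counts.pop()


builder = TagTreeBuilder()
builder.feed("<html><body><h1>t</h1><div><p>a</p><p>b</p>"
             "<img src='x.png'></div></body></html>")
leaves = [xpath for xpath, kids in builder.children.items() if not kids]
print(leaves)
```

In this example the terminal tags are the `h1`, the two `p` elements, and the `img`, since none of them has a child node.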

ステップＳ４３０の代わりに実行されるステップＳ４３０ａにおいて、ＯＣＲ部１３は、タグ階層情報における末端のタグのうちの一つのタグを、処理対象として選択する。 In step S430a executed instead of step S430, the OCR unit 13 selects one of the end tags in the tag hierarchy information as a processing target.

図２２は、タグ階層情報における末端のタグを説明するための図である。図２２において、破線の矩形に係るノードが、タグ階層情報における末端のタグに対応するノードである。すなわち、タグ階層情報における末端のタグとは、タグ階層情報を構成するノードのうち、子ノードを有さないノードに対応するタグをいう。 FIG. 22 is a diagram for explaining a terminal tag in the tag hierarchy information. In FIG. 22, a node associated with a broken-line rectangle is a node corresponding to the terminal tag in the tag hierarchy information. That is, the terminal tag in the tag hierarchy information refers to a tag corresponding to a node having no child node among the nodes constituting the tag hierarchy information.

In step S451, the image update unit 17 fills the display area of the tag (HTML element) selected in step S430a with, for example, solid white in the captured image c2. The display area can be identified from the rectangular coordinates acquired in step S420. When the tag corresponds to an img element, the image element corresponding to that img element has already been invalidated, so filling its display area with white serves no particular purpose. Steps S440 to S451 therefore need not be executed when an <img> tag is the processing target.

続いて、タグ階層構造更新部１８は、ステップＳ４３０ａにおいて選択されたタグに対応するノードを、タグ階層情報から削除する（Ｓ４５２）。その結果、それまで末端のタグでなかったＨＴＭＬタグが、末端のタグになる可能性が有る。 Subsequently, the tag hierarchy structure update unit 18 deletes the node corresponding to the tag selected in step S430a from the tag hierarchy information (S452). As a result, there is a possibility that an HTML tag that was not a terminal tag until then becomes a terminal tag.

上述したように、第５の実施の形態によれば、第４の実施の形態と同様の効果を得ることができる。但し、第５の実施の形態では、タグの階層構造において、末端のタグに係る表示領域から順に、ＯＣＲが適用される。また、ＯＣＲが適用された表示領域は、無効化される。したがって、第４の実施の形態（図１５）におけるステップＳ４７０の処理を不要とすることができる。また、他のタグを包含するタグに対応する表示領域に対してＯＣＲが実行される時点では、当該他のタグに含まれる文字の部分は、既に無効化されている。したがって、属性が共通する文字に対してＯＣＲが適用される可能性を高めることができる。 As described above, according to the fifth embodiment, the same effect as that of the fourth embodiment can be obtained. However, in the fifth embodiment, OCR is applied in order from the display area related to the terminal tag in the tag hierarchical structure. In addition, the display area to which the OCR is applied is invalidated. Therefore, the process of step S470 in the fourth embodiment (FIG. 15) can be made unnecessary. Further, at the time when the OCR is performed on the display area corresponding to the tag including another tag, the character portion included in the other tag has already been invalidated. Therefore, the possibility that OCR is applied to characters having common attributes can be increased.
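The leaf-first procedure of the fifth embodiment can be sketched as a loop over the tag hierarchy information. The sketch makes the following assumptions, none of which come from the embodiment itself: the captured image is a mutable grid of single-character cells in which a space stands for white, the OCR engine is replaced by the stand-in `toy_ocr`, and the hierarchy is given as a dictionary from each node to its child nodes. The function names are hypothetical.

```python
def toy_ocr(region):
    """Stand-in for a real OCR engine: every non-blank cell is one character."""
    return [(cell, x, y, 1, 1)
            for y, row in enumerate(region)
            for x, cell in enumerate(row) if cell != " "]


def process_leaf_first(children, rects, image, ocr):
    """Run OCR from the terminal tags upward, blanking each area afterwards.

    children: dict mapping each node to a list of its child nodes (a tree)
    rects:    dict mapping each node to (x, y, width, height)
    image:    list of lists of cells, modified in place
    ocr:      callable(region) -> list of (char, x, y, w, h) relative to region
    """
    remaining = {node: list(kids) for node, kids in children.items()}
    results = []
    while remaining:
        # S430a: pick a node that currently has no child node.
        leaf = next(n for n, kids in remaining.items() if not kids)
        x0, y0, w, h = rects[leaf]
        region = [row[x0:x0 + w] for row in image[y0:y0 + h]]
        # S440: recognize characters and record them in page coordinates.
        results += [(c, x + x0, y + y0, cw, ch) for c, x, y, cw, ch in ocr(region)]
        # Paint the processed area white so that ancestors do not see it again.
        for row in image[y0:y0 + h]:
            row[x0:x0 + w] = [" "] * len(row[x0:x0 + w])
        # S452: delete the node; its parent may now become a terminal tag.
        del remaining[leaf]
        for kids in remaining.values():
            if leaf in kids:
                kids.remove(leaf)
    return results


image = [list("AB  "), list("  CD")]
tree = {"root": ["kid"], "kid": []}
areas = {"root": (0, 0, 4, 2), "kid": (2, 1, 2, 1)}
print(process_leaf_first(tree, areas, image, toy_ocr))
# [('C', 2, 1, 1, 1), ('D', 3, 1, 1, 1), ('A', 0, 0, 1, 1), ('B', 1, 0, 1, 1)]
```

Because the child area is blanked before the parent area is processed, "C" and "D" are reported only once, which is why no separate duplicate-removal step is needed.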

なお、上記では、ステップＳ４１０において、画像要素が無効化されたキャプチャ画像ｃ２が取得される例について説明したが、ステップＳ４１０では、画像要素の無効化前のキャプチャ画像ｃ１が取得されてもよい。この場合、ステップＳ４４０及びＳ４５０は、処理対象のタグが＜ｉｍｇ＞タグである場合には実行せずに、ステップＳ４５１は、処理対象のタグが＜ｉｍｇ＞タグである場合であっても実行されるようにすればよい。 In the above description, the example in which the captured image c2 in which the image element is invalidated is obtained in step S410. However, in step S410, the captured image c1 before invalidation of the image element may be obtained. In this case, steps S440 and S450 are not executed when the tag to be processed is an <img> tag, and step S451 is executed even when the tag to be processed is an <img> tag. You can do so.

なお、上記各実施の形態において、ＨＴＭＬ文書は、表示データの一例である。ＨＴＭＬ要素は、表示要素の一例である。すなわち、ＨＴＭＬ文書以外の表示データであって、相互に包含関係又は階層構造を有する複数の表示要素を含む表示データに関して、本実施の形態が適用されてもよい。この場合、Ｗｅｂブラウザの代わりに、当該表示データに適したブラウザが用いられればよい。また、Ｗｅｂブラウザ１１は、表示制御部の一例である。画像要素無効化部１２３は、無効化部の一例である。画面キャプチャ部１２２は、画像取得部の一例である。ＯＣＲ部１３は、位置取得部の一例である。 In each of the above embodiments, the HTML document is an example of display data. The HTML element is an example of a display element. That is, the present embodiment may be applied to display data other than an HTML document, which is display data including a plurality of display elements having an inclusion relationship or a hierarchical structure. In this case, a browser suitable for the display data may be used instead of the Web browser. The web browser 11 is an example of a display control unit. The image element invalidation unit 123 is an example of an invalidation unit. The screen capture unit 122 is an example of an image acquisition unit. The OCR unit 13 is an example of a position acquisition unit.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

以上の説明に関し、更に以下の項を開示する。
（付記１）
コンピュータに、
文字要素と画像要素とを表示要素として含む表示データを表示し、
前記表示データの表示結果を示す画像を取得し、
取得された画像において、前記画像要素の表示を無効化し、
前記画像要素の表示が無効化された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する、
処理を実行させることを特徴とする表示位置取得プログラム。
（付記２）
コンピュータに、
文字要素と画像要素とを表示要素として含む表示データを表示し、
前記表示データの表示要素のうち、画像要素の表示を無効化し、
前記表示データの表示結果について、画像要素の表示が無効化された状態の画像を取得し、
取得された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する、
処理を実行させることを特徴とする表示位置取得プログラム。
（付記３）
前記表示データに含まれる各表示要素の表示領域を特定する処理を前記コンピュータに実行させ、
前記位置情報を取得する処理は、前記表示領域ごとに文字認識を行う、
ことを特徴とする付記１又は２記載の表示位置取得プログラム。
（付記４）
前記表示データに含まれる表示要素間の階層構造を特定する処理と、
前記表示領域ごとに文字認識が行われるたびに、文字認識が行われた表示領域を無効化する処理と、
を前記コンピュータに実行させ、
前記位置情報を取得する処理は、前記階層構造において末端の表示要素に係る表示領域から順に、文字認識を行う、
ことを特徴とする付記３記載の表示位置取得プログラム。
（付記５）
文字要素と画像要素とを表示要素として含む表示データを表示する表示制御部と、
前記表示データの表示結果を示す画像を取得する画像取得部と、
取得された画像において、前記画像要素の表示を無効化する無効化部と、
前記画像要素の表示が無効化された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する位置取得部と、
を有することを特徴とする表示位置取得装置。
（付記６）
文字要素と画像要素とを表示要素として含む表示データを表示する表示制御部と、
前記表示データの表示要素のうち、画像要素の表示を無効化する無効化部と、
前記表示データの表示結果について、画像要素の表示が無効化された状態の画像を取得する画像取得部と、
取得された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する位置取得部と、
を有することを特徴とする表示位置取得装置。
（付記７）
前記表示データに含まれる各表示要素の表示領域を特定する第１の特定部を有し、
前記位置取得部は、前記表示領域ごとに文字認識を行う、
ことを特徴とする付記５又は６記載の表示位置取得装置。
（付記８）
前記表示データに含まれる表示要素間の階層構造を特定する第２の特定部と、
前記表示領域ごとに文字認識が行われるたびに、文字認識が行われた表示領域を無効化する第２の無効化部と、
を有し、
前記位置取得部は、前記階層構造において末端の表示要素に係る表示領域から順に、文字認識を行う、
ことを特徴とする付記７記載の表示位置取得装置。
（付記９）
コンピュータが、
文字要素と画像要素とを表示要素として含む表示データを表示し、
前記表示データの表示結果を示す画像を取得し、
取得された画像において、前記画像要素の表示を無効化し、
前記画像要素の表示が無効化された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する、
処理を実行することを特徴とする表示位置取得方法。
（付記１０）
コンピュータが、
文字要素と画像要素とを表示要素として含む表示データを表示し、
前記表示データの表示要素のうち、画像要素の表示を無効化し、
前記表示データの表示結果について、画像要素の表示が無効化された状態の画像を取得し、
取得された画像に対して文字認識を行うことで、前記表示データに含まれる各文字要素の位置情報を取得する、
処理を実行することを特徴とする表示位置取得方法。
（付記１１）
前記表示データに含まれる各表示要素の表示領域を特定する処理を前記コンピュータが実行し、
前記位置情報を取得する処理は、前記表示領域ごとに文字認識を行う、
ことを特徴とする付記９又は１０記載の表示位置取得方法。
（付記１２）
前記表示データに含まれる表示要素間の階層構造を特定する処理と、
前記表示領域ごとに文字認識が行われるたびに、文字認識が行われた表示領域を無効化する処理と、
を前記コンピュータが実行し、
前記位置情報を取得する処理は、前記階層構造において末端の表示要素に係る表示領域から順に、文字認識を行う、
ことを特徴とする付記１１記載の表示位置取得方法。 Regarding the above description, the following items are further disclosed.
(Appendix 1)
On the computer,
Display the display data including the text element and the image element as display elements,
Obtaining an image indicating a display result of the display data;
In the acquired image, disable the display of the image element,
By performing character recognition on an image in which display of the image element is invalidated, position information of each character element included in the display data is acquired.
A display position acquisition program characterized by causing a process to be executed.
(Appendix 2)
On the computer,
Display the display data including the text element and the image element as display elements,
Among the display elements of the display data, invalidate the display of the image element,
For the display result of the display data, obtain an image in a state where the display of the image element is invalidated,
By performing character recognition on the acquired image, position information of each character element included in the display data is acquired.
A display position acquisition program characterized by causing a process to be executed.
(Appendix 3)
Causing the computer to execute a process of specifying a display area of each display element included in the display data;
The process of acquiring the position information performs character recognition for each display area.
The display position acquisition program according to supplementary note 1 or 2, characterized by:
(Appendix 4)
Processing for specifying a hierarchical structure between display elements included in the display data;
Each time character recognition is performed for each display area, a process for invalidating the display area for which character recognition has been performed;
To the computer,
The process of acquiring the position information performs character recognition in order from the display area related to the terminal display element in the hierarchical structure.
The display position acquisition program according to supplementary note 3, characterized by:
(Appendix 5)
A display control unit for displaying display data including character elements and image elements as display elements;
An image acquisition unit for acquiring an image indicating a display result of the display data;
In the acquired image, an invalidation unit for invalidating the display of the image element;
A position acquisition unit that acquires position information of each character element included in the display data by performing character recognition on an image in which display of the image element is invalidated;
A display position acquisition apparatus comprising:
(Appendix 6)
A display control unit for displaying display data including character elements and image elements as display elements;
Among the display elements of the display data, an invalidation unit for invalidating display of the image element,
For the display result of the display data, an image acquisition unit that acquires an image in a state where display of image elements is invalidated;
A position acquisition unit that acquires position information of each character element included in the display data by performing character recognition on the acquired image;
A display position acquisition apparatus comprising:
(Appendix 7)
A first specifying unit that specifies a display area of each display element included in the display data;
The position acquisition unit performs character recognition for each display area.
The display position acquisition device according to appendix 5 or 6, characterized in that.
(Appendix 8)
A second specifying unit for specifying a hierarchical structure between display elements included in the display data;
A second invalidation unit that invalidates the display area in which character recognition is performed each time character recognition is performed for each display area;
Have
The position acquisition unit performs character recognition in order from the display area related to the terminal display element in the hierarchical structure.
The display position acquisition device according to appendix 7, wherein
(Appendix 9)
A display position acquisition method in which a computer executes a process comprising:
displaying display data including a character element and an image element as display elements;
acquiring an image indicating a display result of the display data;
invalidating display of the image element in the acquired image; and
acquiring position information of each character element included in the display data by performing character recognition on the image in which display of the image element is invalidated.
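As a rough illustration of invalidating image elements in an already captured image, the following sketch overwrites the rectangles occupied by image elements with a background value before any character recognition would run. The grayscale grid representation, the function name `mask_image_elements`, and the background value 255 are assumptions for illustration; how the rectangles are obtained is not shown.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def mask_image_elements(pixels: Sequence[Sequence[int]],
                        image_boxes: Sequence[Box],
                        background: int = 255) -> List[List[int]]:
    """Return a copy of the captured image with image-element areas filled in."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    masked = [list(row) for row in pixels]
    for left, top, right, bottom in image_boxes:
        # Clamp each rectangle to the image bounds before filling it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = background
    return masked


captured = [[0, 0, 0, 0],
            [0, 9, 9, 0],
            [0, 9, 9, 0]]
masked = mask_image_elements(captured, [(1, 1, 3, 5)])
```

The masked copy, rather than the original capture, would then be passed to character recognition.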
(Appendix 10)
A display position acquisition method in which a computer executes a process comprising:
displaying display data including a character element and an image element as display elements;
invalidating display of the image element among the display elements of the display data;
acquiring an image of the display result of the display data in a state where display of the image element is invalidated; and
acquiring position information of each character element included in the display data by performing character recognition on the acquired image.
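One conceivable way to invalidate image elements before the display result is captured is to rewrite the HTML document so that images are not painted. The sketch below is an assumption, not the method disclosed here: it uses Python's standard `html.parser` module, the names `ImageHider` and `hide_images` are hypothetical, and it adds `visibility:hidden` to every `img` tag on the assumption that the image keeps its layout box and the surrounding text does not move. It does not handle CSS background images or processing instructions.

```python
from html import escape
from html.parser import HTMLParser


class ImageHider(HTMLParser):
    """Re-serialize HTML, adding visibility:hidden to every <img> tag."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs):
        if tag != "img":
            # Non-image tags are copied through exactly as written.
            self.out.append(self.get_starttag_text())
            return
        style = "visibility:hidden"
        kept = []
        for key, value in attrs:
            if key == "style" and value:
                style = value.rstrip("; ") + ";" + style
            elif key != "style":
                kept.append((key, value))
        kept.append(("style", style))
        rendered = "".join(
            f" {key}" if value is None else f' {key}="{escape(value, quote=True)}"'
            for key, value in kept
        )
        self.out.append(f"<img{rendered}>")

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def hide_images(html_text: str) -> str:
    parser = ImageHider()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


rewritten = hide_images('<p>Hi <img src="a.png" alt="x"> there</p>')
```

A rewrite of this kind could run in a proxy placed between the server and the browser, or in the browser itself; either placement is an assumption about deployment.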
(Appendix 11)
The display position acquisition method according to appendix 9 or 10, wherein the computer further executes a process of specifying a display area of each display element included in the display data, and
the process of acquiring the position information performs character recognition for each display area.
(Appendix 12)
The display position acquisition method according to appendix 11, wherein the computer further executes:
a process of specifying a hierarchical structure among the display elements included in the display data; and
a process of invalidating, each time character recognition is performed for a display area, the display area on which character recognition has been performed,
and the process of acquiring the position information performs character recognition in order, starting from the display area of a terminal display element in the hierarchical structure.

DESCRIPTION OF SYMBOLS
10 Display position acquisition device
11 Web browser
12 Image acquisition unit
13 OCR unit
14 Output unit
15 Tag display area acquisition unit
16 Tag hierarchical structure analysis unit
17 Image update unit
18 Tag hierarchical structure update unit
20 HTTP server
30 HTTP proxy
31 Proxy unit
32 Image element invalidation unit
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
106 Display device
107 Input device
121 Browser display unit
122 Screen capture unit
123 Image element invalidation unit
B Bus

Claims

A display position acquisition program that causes a computer to execute a process comprising:
displaying display data including a character element and an image element as display elements;
acquiring an image indicating a display result of the display data;
invalidating display of the image element in the acquired image; and
acquiring position information of each character element included in the display data by performing character recognition on the image in which display of the image element is invalidated.

A display position acquisition program that causes a computer to execute a process comprising:
displaying display data including a character element and an image element as display elements;
invalidating display of the image element among the display elements of the display data;
acquiring an image of the display result of the display data in a state where display of the image element is invalidated; and
acquiring position information of each character element included in the display data by performing character recognition on the acquired image.

The display position acquisition program according to claim 1 or 2, further causing the computer to execute a process of specifying a display area of each display element included in the display data,
wherein the process of acquiring the position information performs character recognition for each display area.

The display position acquisition program according to claim 3, further causing the computer to execute:
a process of specifying a hierarchical structure among the display elements included in the display data; and
a process of invalidating, each time character recognition is performed for a display area, the display area on which character recognition has been performed,
wherein the process of acquiring the position information performs character recognition in order, starting from the display area of a terminal display element in the hierarchical structure.

A display position acquisition device comprising:
a display control unit that displays display data including a character element and an image element as display elements;
an image acquisition unit that acquires an image indicating a display result of the display data;
an invalidation unit that invalidates display of the image element in the acquired image; and
a position acquisition unit that acquires position information of each character element included in the display data by performing character recognition on the image in which display of the image element is invalidated.

A display position acquisition device comprising:
a display control unit that displays display data including a character element and an image element as display elements;
an invalidation unit that invalidates display of the image element among the display elements of the display data;
an image acquisition unit that acquires an image of the display result of the display data in a state where display of the image element is invalidated; and
a position acquisition unit that acquires position information of each character element included in the display data by performing character recognition on the acquired image.

A display position acquisition method in which a computer executes a process comprising:
displaying display data including a character element and an image element as display elements;
acquiring an image indicating a display result of the display data;
invalidating display of the image element in the acquired image; and
acquiring position information of each character element included in the display data by performing character recognition on the image in which display of the image element is invalidated.

A display position acquisition method in which a computer executes a process comprising:
displaying display data including a character element and an image element as display elements;
invalidating display of the image element among the display elements of the display data;
acquiring an image of the display result of the display data in a state where display of the image element is invalidated; and
acquiring position information of each character element included in the display data by performing character recognition on the acquired image.