JP2012230540A

JP2012230540A - Document generation support method, document generation support device and document generation support program

Info

Publication number: JP2012230540A
Application number: JP2011098263A
Authority: JP
Inventors: Kouki Kusano; 孔希草野; Takehiko Ono; 健彦大野; Momoko Nakatani; 桃子中谷; Ai Nakane; 愛中根; Yurika Katagiri; 有理佳片桐; Chihiro Takayama; 千尋高山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-04-26
Filing date: 2011-04-26
Publication date: 2012-11-22
Anticipated expiration: 2031-04-26
Also published as: JP5466197B2

Abstract

PROBLEM TO BE SOLVED: To support the generation of easily understandable document data 11a. SOLUTION: A document generation support apparatus 1 includes three means. Browsing order acquisition means 31 acquires browsing order data indicating the reader's browsing flow, calculates assumed browsing coordinates for each component of the document data 11a, and outputs assumed browsing coordinate data 14a. Target word extraction means 32 extracts, from each component, the words that match target words together with their ease, calculates the distance between the component's assumed browsing coordinates and each extracted word, and outputs word information data 15a associating each word with its ease and distance. Word evaluation means 33 calculates a correction priority for each word in the word information data 15a, such that words with low ease and short distance receive high correction priority, and outputs weighted word information data 17a.

Description

The present invention relates to a document generation support method, a document generation support apparatus, and a document generation support program that support the generation of document data composed of a plurality of components.

Users now operate a growing variety of devices on a growing number of occasions. Device manufacturers must therefore assume that each device will be used by a wide range of users. Accordingly, instruction manuals and similar documents for these devices should be written so that any user can easily understand them.

To make an instruction manual that everyone can easily understand, it is preferable to explain how to use a device in plain words and to avoid the technical terms used for that device. However, it is generally difficult for the author to judge which words are uncommon technical terms, and in what situations they should not be used.

Word ease is used to judge which words are uncommon technical terms. One example of word ease is the word familiarity described in Non-Patent Document 1. Word familiarity is a score derived from a questionnaire survey asking whether respondents are familiar with technical terms used for devices. A word with high familiarity is easy to relate to and therefore has a high degree of ease. A word with low familiarity is hard to relate to and has a low degree of ease. There is also a method that uses the familiarity of words in an instruction manual to output analysis results that people can easily understand (see Patent Document 1).

Patent Document 1: JP 2007-11774 A

Non-Patent Document 1: Sato, Kasahara, Kanesugi, Amano, "Selection of basic vocabulary based on word familiarity," Transactions of the Japanese Society for Artificial Intelligence, Vol. 19, No. 6, pp. 502-510, 2004.

However, the same technical term can affect how understandable a manual is judged to be differently, depending on where in the manual the term appears. For example, a word in a spot the reader readily notices influences that judgment more than a word in a spot the reader is unlikely to look at.

The understandability of an instruction manual should therefore be evaluated with the positions of technical terms and the page layout taken into account. However, the methods described in the above patent document and non-patent document cannot easily evaluate understandability in this way.

In addition, the authors of an instruction manual are accustomed to its technical terms, so they find it difficult to evaluate the manual objectively. For this reason, usability experts are sometimes asked to analyze and evaluate the manual, or members of the intended readership take part in manual evaluation experiments. Such approaches, however, are very costly in both time and money.

There is therefore a need for technology in which a computer evaluates the understandability of document data such as instruction manuals and thereby supports the generation of document data. The evaluation should take into account the reader's browsing flow and the page layout that the author assumes.

Accordingly, an object of the present invention is to provide a document generation support method, a document generation support apparatus, and a document generation support program that support the generation of highly understandable document data.

To solve the above problem, a first feature of the present invention is a document generation support method for supporting the generation of document data composed of a plurality of components. The method comprises four steps:
(1) storing target word ease data that associates each target word subject to evaluation in the document data with the ease of that target word;
(2) acquiring browsing order data indicating the reader's browsing flow through the document data, calculating assumed browsing coordinates for each component, and outputting assumed browsing coordinate data, where the assumed browsing coordinates are the coordinates in the browsing flow most likely to catch the reader's eye;
(3) extracting, from each component, the words that match a target word together with their ease, calculating the distance between the component's assumed browsing coordinates and each extracted word, and outputting word information data associating each word with its ease and distance; and
(4) calculating a correction priority for each word in the word information data, such that words with low ease and short distance receive high correction priority, and outputting weighted word information data.
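The steps above only require that low ease and short distance raise the correction priority; no formula is given at this point. The following Python sketch shows one way to meet that requirement. The product form `(1 - ease) * scale / (scale + distance)`, the Euclidean distance, the `scale` parameter, and the names `WordInfo`, `distance`, `correction_priority`, and `weight_words` are all illustrative assumptions, not the patented formula.

```python
import math
from dataclasses import dataclass


@dataclass
class WordInfo:
    """One row of word information data: a word, its ease, and its distance."""
    word: str
    ease: float      # 0.0 (hard) to 1.0 (easy), looked up in the target word ease data
    distance: float  # distance from the component's assumed browsing coordinates


def distance(p, q):
    """Euclidean distance between two (x, y) points (an assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def correction_priority(info: WordInfo, scale: float = 100.0) -> float:
    """Hypothetical score that rises as ease falls and as distance shrinks.

    (1 - ease) grows for harder words; scale / (scale + distance) equals 1 at
    the assumed browsing coordinates and decays toward 0 farther away.
    `scale` is an illustrative tuning parameter, not taken from the source.
    """
    return (1.0 - info.ease) * scale / (scale + info.distance)


def weight_words(rows):
    """Return (word, ease, distance, priority) tuples, highest priority first."""
    scored = [(r.word, r.ease, r.distance, correction_priority(r)) for r in rows]
    return sorted(scored, key=lambda t: t[3], reverse=True)
```

With the default `scale` of 100, the priorities work out as follows:

- a hard word (ease 0.2) at 50 units from the assumed browsing coordinates scores about 0.53;
- the same word at 300 units scores 0.2;
- an easy word (ease 0.9) right at the coordinates scores about 0.1.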

In the step of outputting the weighted word information data, the correction priority may also take the display attributes of the word into account.
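Display attributes could enter the calculation as a multiplier on a word's priority. This hypothetical sketch scales by font size relative to an assumed body-text size and boosts colored or bold text. The attribute set, the 10-point baseline, and the 1.5 and 1.2 weights are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DisplayAttributes:
    """Hypothetical display attributes of one word occurrence."""
    font_size: float       # e.g. in points
    colored: bool = False  # rendered in a highlight color
    bold: bool = False


BASE_FONT_SIZE = 10.0  # assumed body-text size


def attribute_factor(attrs: DisplayAttributes) -> float:
    """Multiplier above 1 for conspicuous text, below 1 for small text."""
    factor = max(attrs.font_size, 0.0) / BASE_FONT_SIZE
    if attrs.colored:
        factor *= 1.5  # illustrative weight for colored text
    if attrs.bold:
        factor *= 1.2  # illustrative weight for bold text
    return factor


def weighted_priority(base_priority: float, attrs: DisplayAttributes) -> float:
    """Scale a distance/ease-based priority by the word's conspicuousness."""
    return base_priority * attribute_factor(attrs)
```

A multiplicative factor leaves body-size, unstyled text unchanged. Large colored headings rise in priority, and small print such as annotations falls.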

The method may further comprise a step of calculating a correction priority for each component from the correction priorities of its words, and outputting weighted component information data.
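One simple way to derive component priorities from word priorities is to sum them per component. A component containing many problem words then ranks above one with a single problem word. The tuple layout `(component_id, word, priority)` and the choice of summation (rather than, say, the maximum) are assumptions made for illustration.

```python
from collections import defaultdict


def component_priorities(weighted_words):
    """Aggregate word priorities into per-component priorities.

    `weighted_words` is an iterable of (component_id, word, priority) tuples,
    a hypothetical layout for the weighted word information data.
    Returns (component_id, total_priority) pairs, highest first.
    """
    totals = defaultdict(float)
    for component_id, _word, priority in weighted_words:
        totals[component_id] += priority
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```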

A second feature of the present invention is a document generation support apparatus for supporting the generation of document data composed of a plurality of components. The apparatus comprises:
(1) a target word ease data storage unit that stores target word ease data associating each target word subject to evaluation in the document data with the ease of that target word;
(2) browsing order acquisition means that acquires browsing order data indicating the reader's browsing flow through the document data, calculates assumed browsing coordinates for each component, and outputs assumed browsing coordinate data, where the assumed browsing coordinates are the coordinates in the browsing flow most likely to catch the reader's eye;
(3) target word extraction means that extracts, from each component, the words that match a target word together with their ease, calculates the distance between the component's assumed browsing coordinates and each extracted word, and outputs word information data associating each word with its ease and distance; and
(4) word evaluation means that calculates a correction priority for each word in the word information data, such that words with low ease and short distance receive high correction priority, and outputs weighted word information data.

The word evaluation means may also take the display attributes of the word into account when calculating the correction priority.

The apparatus may further comprise component evaluation means that calculates a correction priority for each component from the correction priorities of its words and outputs weighted component information data.

A third feature of the present invention is a document generation support program that causes a computer to execute the steps of the method according to the first feature.

The present invention can provide a document generation support method, a document generation support apparatus, and a document generation support program that support the generation of highly understandable document data.

FIG. 1 is a functional block diagram of the document generation support apparatus according to the embodiment of the present invention.
FIG. 2 shows an example of a preview display of document data according to the embodiment.
FIG. 3 shows an example of the source of the document data according to the embodiment.
FIG. 4 illustrates the components in the preview display of the document data according to the embodiment.
FIG. 5 shows an example of the data structure and data of the target word ease data according to the embodiment.
FIG. 6 shows an example of the data structure and data of the browsing order data according to the embodiment.
FIG. 7 shows an example of the data structure and data of the assumed browsing coordinate data according to the embodiment.
FIG. 8 shows an example of a screen on which the author has input a browsing flow in the embodiment.
FIG. 9 illustrates the correspondence among the components of the document data, the browsing flow, and the assumed browsing coordinates in the embodiment.
FIG. 10 is a flowchart of the browsing order acquisition process performed by the browsing order acquisition means according to the embodiment.
FIG. 11 shows an example of the data structure and data of the word information data according to the embodiment.
FIG. 12 shows an example of the data structure and data of the component information data according to the embodiment.
FIG. 13 illustrates the distance between the words in the word information data and the browsing flow in the embodiment.
FIG. 14 is a flowchart of the target word extraction process performed by the target word extraction means according to the embodiment.
FIG. 15 shows an example of the data structure and data of the weighted word information data according to the embodiment.
FIG. 16 is a flowchart of the word evaluation process performed by the word evaluation means according to the embodiment.
FIG. 17 shows an example of the data structure and data of the weighted component information data according to the embodiment.
FIG. 18 is a flowchart of the component evaluation process performed by the component evaluation means according to the embodiment.
FIG. 19 shows an example of an output result screen produced by the output means according to the embodiment (part 1).
FIG. 20 shows an example of an output result screen produced by the output means according to the embodiment (part 2).
FIG. 21 is a flowchart of the output process performed by the output means according to the embodiment.
FIG. 22 shows an example of a browsing order input by the author to the browsing order acquisition means according to a modification of the present invention.
FIG. 23 illustrates the correspondence between the components of the document data and the assumed browsing coordinates in the modification.

Embodiments of the present invention will now be described with reference to the drawings. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference signs.

(Embodiment)
The document generation support apparatus 1 according to the embodiment of the present invention evaluates document data 11a written by an author. The evaluation takes into account the reader's browsing flow and the page layout that the author assumes, and in this way the apparatus supports the generation of highly understandable document data.

For example, the document generation support apparatus 1 can suggest that the author correct the following first:
(1) difficult words close to the reader's browsing flow;
(2) words that stand out because of their size or color; and
(3) components containing many such words.
Following this suggestion, the author can concentrate corrections on these targets. In this way, the document generation support apparatus 1 according to the embodiment supports the generation of document data 11a that readers find easy to understand.

Here, the document data 11a is, for example, the data of a document that specialists distribute to general users. Specifically, it is an instruction manual for a device, a maintenance manual, or the like.

As shown in FIG. 1, the document generation support apparatus 1 according to the embodiment is a general-purpose computer comprising a storage device 10, a central processing control device 30, an input device 40, and an output device 50.

In the example shown in FIG. 1, two storage devices are provided for convenience: a first storage device 10a and a second storage device 10b. The apparatus may instead have a single storage device, or more than two. The storage devices 10a and 10b also need not be built into the document generation support apparatus 1; any storage device that the apparatus can read from may be used.

The first storage device 10a shown in FIG. 1 holds data stored in advance of processing, while the second storage device 10b holds data output by the processing. Where the two need not be distinguished, the embodiment may refer to them simply as the storage device 10.

The input device 40 is a mouse, a keyboard, or the like, through which the author gives instructions to the central processing control device 30. The output device 50 is a liquid crystal display or the like. It displays data on a screen in accordance with instructions from the central processing control device 30 so that the author can view it. The document generation support apparatus 1 is realized by installing and running, on such a general-purpose computer, a program that performs predetermined processing.

The storage device 10 comprises the following storage units:
- a document data storage unit 11;
- a target word ease data storage unit 12;
- a browsing order data storage unit 13;
- an assumed browsing coordinate data storage unit 14;
- a word information data storage unit 15;
- a component information data storage unit 16;
- a weighted word information data storage unit 17;
- a weighted component information data storage unit 18; and
- an evaluation result data storage unit 19.

The storage device 10 is a nonvolatile or volatile storage device such as a hard disk or RAM.

The document data storage unit 11 is the storage area of the storage device 10 in which the document data 11a is stored. The document data 11a is composed of a plurality of components, and a component is the unit of processing in the embodiment. A component may be a paragraph delimited by line breaks, or a sentence delimited by the full stop "。". When the document data 11a is a structured document such as HTML, a component may be the range delimited by a tag such as div; div tags group content for layout and for styling such as character size and color. When components are sentences, the document data 11a is split into sentences by, for example, string processing with regular expressions.
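The two text-based component definitions above can be sketched with Python's `re` module. This minimal illustration assumes plain-text input. `split_paragraphs` and `split_sentences` are hypothetical helper names.

```python
import re


def split_paragraphs(text: str):
    """Components as paragraphs: split on line breaks, dropping empty lines."""
    return [p.strip() for p in text.splitlines() if p.strip()]


def split_sentences(text: str):
    """Components as sentences: split just after each full stop '。'.

    The zero-width lookbehind keeps the '。' attached to its sentence
    (splitting on empty matches requires Python 3.7 or later).
    """
    return [s.strip() for s in re.split(r"(?<=。)", text) if s.strip()]
```

Splitting after the delimiter, rather than on it, keeps each sentence intact for the later word search. A final sentence with no closing "。" is still returned.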

The document data 11a is data from which the document generation support apparatus 1 can read the characters it contains. For example, it is electronic data such as PDF or HTML in which word information such as character size and color can also be specified. The document data 11a may also be obtained by scanning a paper document and converting the characters into electronic text with OCR (Optical Character Reader).

The embodiment describes the case in which the document data 11a is written in HTML and the components are defined by the units delimited by "div" tags.

An example of the document data 11a according to the embodiment will be described with reference to FIG. 2. The document data 11a shown in FIG. 2 is a preview display of the HTML source data shown in FIG. 3. The text is written horizontally, with characters running from left to right, in a two-column layout. The document has the title "Instruction Manual" and the sections "Overview", "Procedure 1", "Procedure 2", "Procedure 3", and "Notes". The "Notes" section is set small at the upper right and serves as an annotation.

As shown in FIGS. 3 and 4, the document data 11a in the embodiment has five components, E1 to E5. FIG. 4 maps the components E1 to E5 onto the preview display of FIG. 2.

In the HTML source shown in FIG. 3, each component is the character string between a classed <div> start tag and the immediately following </div>. As shown in FIG. 4, each component contains a subtitle and the text that follows it:
- first component E1: <div class="introduction">, subtitle "Overview";
- second component E2: <div class="step1">, subtitle "Procedure 1";
- third component E3: <div class="step2">, subtitle "Procedure 2";
- fourth component E4: <div class="step3">, subtitle "Procedure 3";
- fifth component E5: <div class="note">, subtitle "Notes".
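The div-based component extraction can be sketched with the standard-library `html.parser` module. The sketch rests on these assumptions:
- each component is a `div` carrying a `class` attribute;
- components are keyed by class name;
- each component ends at the matching `</div>`, which is the same as the immediately following `</div>` when a component contains no nested divs.

The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DivComponentParser(HTMLParser):
    """Collect the text inside each <div class="..."> as one component.

    A depth counter makes each component end at the </div> that matches
    its own start tag, even if further divs are nested inside it.
    """

    def __init__(self):
        super().__init__()
        self.components = {}  # class name -> list of text chunks
        self._current = None  # class of the component being read, if any
        self._depth = 0       # div nesting depth inside the current component

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1  # nested div inside a component
            return
        cls = dict(attrs).get("class")
        if cls:  # a classed div starts a new component
            self._current = cls
            self._depth = 1
            self.components[cls] = []

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:  # closing tag of the component itself
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.components[self._current].append(data)


def extract_components(html: str) -> dict:
    """Map each div class name to the whitespace-normalized text it encloses."""
    parser = DivComponentParser()
    parser.feed(html)
    parser.close()
    return {cls: " ".join(" ".join(chunks).split())
            for cls, chunks in parser.components.items()}
```

For instance, feeding `<div class="step1"><h2>Procedure 1</h2><p>Press the button.</p></div>` returns `{"step1": "Procedure 1 Press the button."}`. Text outside any classed div, such as the page title, is ignored.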

Although the embodiment determines components by <div> tags, any criterion may be used. Also, although the document data 11a shown in FIGS. 2 to 4 consists of text only, it may also include figures.

The target word ease data storage unit 12 is the storage area of the storage device 10 in which the target word ease data 12a is stored. As shown in FIG. 5, the target word ease data 12a associates each target word subject to evaluation in the document data with the ease of that target word. The ease of a target word is a numerical value expressing how difficult the word is for the reader. For example, words familiar to the reader are given a high ease and unfamiliar words a low ease. In the example shown in FIG. 5, the ease takes values from 0 to 1. The closer a word's ease is to 1, the easier it is to understand; the closer it is to 0, the harder.
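A minimal sketch of the target word ease data as an in-memory dictionary, together with a lookup that finds every occurrence of a target word in a component's text. The sample words and ease values are invented for illustration. Plain substring search is also an assumption: Japanese text has no spaces between words, so a production system might use a morphological analyzer instead.

```python
TARGET_WORD_EASE = {  # hypothetical sample of target word ease data
    "firmware": 0.2,
    "reboot": 0.4,
    "button": 0.9,
}


def validate_ease_table(table):
    """Reject ease values outside the 0-to-1 range used in FIG. 5."""
    for word, ease in table.items():
        if not 0.0 <= ease <= 1.0:
            raise ValueError(f"ease for {word!r} must be in [0, 1], got {ease}")


def find_target_words(text, table):
    """Return (word, ease, offset) for every target-word occurrence, in text order."""
    hits = []
    for word, ease in table.items():
        if not word:
            continue  # an empty key would match everywhere
        start = text.find(word)
        while start != -1:
            hits.append((word, ease, start))
            start = text.find(word, start + len(word))
    return sorted(hits, key=lambda h: h[2])
```

The character offset of each hit would then be mapped to page coordinates to measure its distance from the component's assumed browsing coordinates.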

The ease in the target word ease data 12a may be, for example, word familiarity (see Patent Document 1 and Non-Patent Document 1). Word familiarity is calculated by analyzing data from a questionnaire survey, conducted in advance, on whether respondents are familiar with technical terms used for devices and the like.

Each remaining storage unit is the storage area of the storage device 10 that holds one kind of data:
- browsing order data storage unit 13: browsing order data 13a and 13b;
- assumed browsing coordinate data storage unit 14: assumed browsing coordinate data 14a;
- word information data storage unit 15: word information data 15a;
- component information data storage unit 16: component information data 16a;
- weighted word information data storage unit 17: weighted word information data 17a;
- weighted component information data storage unit 18: weighted component information data 18a;
- evaluation result data storage unit 19: evaluation result data 19a.

The data listed above, from the browsing order data 13a and 13b through the evaluation result data 19a, is output by the respective means of the central processing control device 30 and is therefore described later.

The central processing control device 30 comprises browsing order acquisition means 31, target word extraction means 32, word evaluation means 33, component evaluation means 34, and output means 35.

The browsing order acquisition means 31 acquires browsing order data 13a indicating the reader's browsing flow F through the document data 11a. It then calculates assumed browsing coordinates for each component, that is, the coordinates in the browsing flow F most likely to catch the reader's eye, and outputs assumed browsing coordinate data 14a.

The browsing order data 13a indicates the reader's browsing flow F through the document data 11a in units of components. It also marks the range of the document data 11a that the author wants the reader to read. As shown in FIG. 6(a), for example, the browsing order data 13a represents the browsing flow F as a list of component identifiers. In FIG. 6(a), the flow runs through the first component E1, the second component E2, the third component E3, and the fourth component E4, in that order. The data contains no information on the fifth component E5, which indicates that E5 does not lie on the browsing flow F assumed by the author.
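Suppose the browsing order data is held as an ordered list of component identifiers, as in FIG. 6(a). Identifying off-flow components such as E5 then reduces to a membership test. The function names below are hypothetical.

```python
def off_flow_components(all_components, browsing_order):
    """Components missing from the browsing order data lie off the flow F."""
    on_flow = set(browsing_order)
    return [c for c in all_components if c not in on_flow]


def flow_positions(browsing_order):
    """Map each on-flow component to its 0-based position along the flow F."""
    return {c: i for i, c in enumerate(browsing_order)}
```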

When the author inputs the browsing flow F with the input device 40, the browsing order data storage unit 13 may also store the coordinates of the flow as browsing order data 13b, as shown in FIG. 6(b). The browsing order data 13b lists the x and y coordinates of the trajectory of the browsing flow F drawn by the author.

The assumed browsing coordinate data 14a holds the assumed browsing coordinates of each component. As shown in FIG. 7, for example, it associates each component with its assumed browsing coordinates. In FIG. 7, no assumed browsing coordinates are associated with the fifth component E5, which again indicates that E5 does not lie on the creator's browsing flow F.

Here, the assumed browsing coordinates are coordinates that are likely to catch the reader's eye given the browsing flow F, and they are calculated for each component. In this embodiment, the assumed browsing coordinates of a component are the point on the browsing flow F at which the flow first overlaps that component. Any other definition may be used as long as it yields coordinates that are likely to catch the reader's eye; for example, the calculation method may differ depending on whether the document data 11a is written vertically or horizontally.

As shown in FIG. 8, the browsing order acquisition means 31 has the creator draw, on a preview display of the document data 11a shown in FIG. 2, the browsing flow that the creator expects the reader to follow, using an input device 40 such as a mouse or a pen tablet. In the example shown in FIG. 6, the browsing flow F is entered as running from top to bottom in the left column of the screen and then from top to bottom in the right column. The precautions displayed at the upper right of the screen lie outside the browsing flow F.

The browsing order acquisition means 31 displays the browsing flow F superimposed on the preview display of the document data 11a. It calculates the x and y coordinates of the trajectory of the browsing flow F in the preview display and stores them in the browsing order data storage unit 13 as the browsing order data 13b shown in FIG. 6(b).

Further, the browsing order acquisition means 31 compares the coordinates of the entered browsing flow F with the coordinates of the display area of each component in the preview of the document data 11a, and thereby calculates the browsing flow in units of components. It stores this component-level browsing flow in the browsing order data storage unit 13 as the browsing order data 13a. In the example shown in FIG. 6, the component-level browsing flow is the first component E1, the second component E2, the third component E3 and the fourth component E4, in that order; the fifth component E5 falls outside the browsing flow.
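The comparison of the drawn trajectory with the component display areas can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each display area is an axis-aligned rectangle `(left, top, right, bottom)` in preview coordinates and that the trajectory of the browsing order data 13b is a list of sampled `(x, y)` points; the function names and the example layout are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed (left, top, right, bottom)


def inside(p: Point, box: Box) -> bool:
    """Return True if point p lies in the rectangle (edges included)."""
    x, y = p
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def component_order(trajectory: List[Point], boxes: Dict[str, Box]) -> List[str]:
    """List component IDs in the order the trajectory first enters them.

    Components never touched by the trajectory are left out, which is how a
    component such as E5 ends up outside the browsing flow F.
    """
    order: List[str] = []
    for p in trajectory:
        for cid, box in boxes.items():
            if cid not in order and inside(p, box):
                order.append(cid)
    return order


# Hypothetical two-column layout: E1/E2 on the left, E3/E4 on the right,
# E5 (the precautions) in the upper right, off the drawn flow.
boxes = {
    "E1": (0, 0, 100, 90),
    "E2": (0, 100, 100, 190),
    "E3": (110, 40, 200, 120),
    "E4": (110, 130, 200, 190),
    "E5": (110, 0, 200, 30),
}
trajectory = [(50, 5), (50, 95), (50, 150), (115, 80), (150, 125), (150, 160)]
print(component_order(trajectory, boxes))  # ['E1', 'E2', 'E3', 'E4']
```

Recording only the first entry into each area keeps the list in reading order even if the stroke later passes back over a component.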

The browsing order acquisition means 31 calculates the assumed browsing coordinates, one set per component, from the browsing order data 13a and the browsing order data 13b. In this embodiment, for each component it calculates the point on the browsing flow F that first overlaps that component, and generates the assumed browsing coordinate data 14a by associating the identifier of each component with that component's assumed browsing coordinates.

As shown in FIG. 8, the assumed browsing coordinates of the first component E1 are the point P1 on the browsing flow F where the flow meets the upper side of E1. Likewise, the assumed browsing coordinates of the second component E2 are the point P2 where the flow meets the upper side of E2, those of the third component E3 are the point P3 where the flow meets the left side of E3, and those of the fourth component E4 are the point P4 where the flow meets the upper side of E4. Because the browsing flow F does not overlap the fifth component E5, E5 has no assumed browsing coordinates. Accordingly, in the assumed browsing coordinate data 14a shown in FIG. 7, no assumed browsing coordinates are associated with E5.
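A minimal sketch of the "first overlap" rule, under the same assumptions as a rectangle-based comparison (rectangular display areas, a sampled trajectory; all names hypothetical). With a sampled trajectory, the first sample found inside a rectangle only approximates the point where F crosses its boundary; for a densely sampled vertical stroke it lands on the upper side, as P1 does in FIG. 8. Components the flow never reaches map to `None`, mirroring the empty entry for E5 in FIG. 7.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed (left, top, right, bottom)


def inside(p: Point, box: Box) -> bool:
    """Return True if point p lies in the rectangle (edges included)."""
    x, y = p
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def assumed_browsing_coords(
    trajectory: List[Point], boxes: Dict[str, Box]
) -> Dict[str, Optional[Point]]:
    """Map each component ID to the first trajectory point inside it, or None."""
    coords: Dict[str, Optional[Point]] = {cid: None for cid in boxes}
    for p in trajectory:
        for cid, box in boxes.items():
            if coords[cid] is None and inside(p, box):
                coords[cid] = p
    return coords


# A vertical stroke at x = 50 sampled every unit from y = 0 to y = 200.
stroke = [(50.0, float(y)) for y in range(0, 201)]
boxes = {"E1": (0, 10, 100, 90), "E5": (150, 0, 200, 30)}
print(assumed_browsing_coords(stroke, boxes))
# {'E1': (50.0, 10.0), 'E5': None}
```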

The browsing order acquisition process performed by the browsing order acquisition means 31 according to the embodiment of the present invention will be described with reference to FIG. 10.

First, in step S101 the browsing order acquisition means 31 reads the document data 11a from the storage device 10, and in step S102 it acquires the browsing flow F. As described above, the creator uses the input device 40 to draw the browsing flow F on the preview display of the document data 11a. The browsing order acquisition means 31 acquires the coordinates of the browsing flow F entered by the creator and can output them as the browsing order data 13b.

In step S103, the browsing order acquisition means 31 determines the browsing order of the components of the document data 11a according to the browsing flow F acquired in step S102. It compares the display area of each component with the coordinates of the browsing flow F, determines the component-level browsing flow, and outputs the browsing order data 13a. Once the component-level browsing flow has been determined, steps S105 and S106 are repeated for each component.

First, in step S105, the browsing order acquisition means 31 determines whether the browsing flow F overlaps the component in question. If the identifier of the component is contained in the browsing order data 13a, it judges that the component and the browsing flow F overlap. If they overlap, the browsing order acquisition means 31 calculates the assumed browsing coordinates of the component in step S106. If they do not overlap, it proceeds to steps S105 and S106 for the next component.

When steps S105 and S106 have been completed for all components, the browsing order acquisition means 31 outputs the assumed browsing coordinate data 14a, in which each component identifier is associated with its assumed browsing coordinates wherever these could be calculated.

Next, the target word extraction means 32 will be described. For each component, the target word extraction means 32 extracts the words in that component that match target words, together with the ease of each such word. It further calculates the distance between the assumed browsing coordinates of the component and each extracted word, and outputs word information data 15a that associates each word with its ease and this distance. The target word extraction means 32 also outputs component information data 16a.

The word information data 15a associates each word with its ease and distance. It may additionally associate each word with its appearance position in the flow and its display attributes. Display attributes are decoration information of the word, such as its character size and color.

FIG. 11 shows an example of the data structure and contents of the word information data 15a. The word information data 15a associates, for each word contained in the document data 11a, a word identifier, the word itself, its appearance position, its distance from the flow, its character size, its character color and its ease. The word identifiers are assigned by the target word extraction means 32. The word, character size and character color are obtained from the document data 11a; when the document data 11a is in HTML format, the target word extraction means 32 can easily obtain them from the tags of the source file. The ease is the value associated with the word in the target word ease data 12a. When a word is displayed as part of the caption of a figure, its character size may be adjusted upward as the figure becomes larger, so that the word is given higher importance within that caption.

The appearance position is the position of a word within the browsing flow F: it indicates whether the word appears at the beginning of the browsing flow F, toward its end, or outside it. The appearance position is calculated by referring, for example, to the browsing order data 13a shown in FIG. 6(a). The target word extraction means 32 assigns "first half", "middle" or "second half" of the browsing flow to each component, and "out of range" to components not contained in the browsing order data 13a. In the example shown in FIG. 11, words in the first component E1 are assigned "first half", words in the second component E2 and the third component E3 are assigned "middle", words in the fourth component E4 are assigned "second half", and words in the fifth component E5 are assigned "out of range". This embodiment determines the appearance position from the components, but the present invention is not limited to this.
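The text does not say how the components on the flow are divided into first half, middle and second half. The sketch below uses one hypothetical rule, chosen only because it reproduces the FIG. 11 example: the first component on the flow is "first half", the last is "second half", everything in between is "middle", and components absent from the browsing order data are "out of range". Other splits (for example by the fraction of the flow already covered) are equally plausible.

```python
from typing import Dict, List


def appearance_positions(order: List[str], components: List[str]) -> Dict[str, str]:
    """Label each component by where it sits in the browsing flow F.

    Hypothetical rule: first component in the flow is "first half", the last
    is "second half", anything in between is "middle", and components absent
    from the flow are "out of range".
    """
    labels: Dict[str, str] = {}
    for cid in components:
        if cid not in order:
            labels[cid] = "out of range"
            continue
        i = order.index(cid)
        if i == 0:
            labels[cid] = "first half"
        elif i == len(order) - 1:
            labels[cid] = "second half"
        else:
            labels[cid] = "middle"
    return labels


print(appearance_positions(["E1", "E2", "E3", "E4"], ["E1", "E2", "E3", "E4", "E5"]))
# {'E1': 'first half', 'E2': 'middle', 'E3': 'middle', 'E4': 'second half', 'E5': 'out of range'}
```

Each word then inherits the label of the component that contains it.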

The distance from the flow is the distance between a word and the assumed browsing coordinates of its component. In this embodiment, the distance from the flow takes the browsing flow into account, so it generally differs from the physical distance between the word and the assumed browsing coordinates. It is calculated for each component by considering how the reader proceeds through the component starting from that component's assumed browsing coordinates.

As shown in FIG. 12, the component information data 16a associates each component identifier with the identifiers of the words contained in that component.

The target word extraction means 32 extracts target words from the document data 11a. It performs morphological analysis on the document data 11a to extract nouns and, among those nouns, extracts the target words contained in the target word ease data 12a as the words used in the document data 11a. The words extracted here are the words evaluated by the document generation support apparatus 1 according to the embodiment. The target word extraction means 32 further acquires the character size and character color of each such word and, by referring to the target word ease data 12a, its ease. It assigns a word identifier to each acquired word, generates the word information data 15a by associating the word with its character size, character color and ease, and stores the data in the word information data storage unit 15.
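The filtering step can be sketched as below. Morphological analysis itself is out of scope: the sketch assumes the analyzer's output is already available as `(surface, part_of_speech)` pairs per component, and the `"noun"` tag, the `WordRecord` type, the example tokens and the ease for "router" are hypothetical (the 0.7 for "LAN" follows the FIG. 15 example). It also builds the component-to-word mapping of the component information data 16a.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WordRecord:
    word_id: str       # numbered W1, W2, ... in extraction order
    word: str
    component_id: str
    ease: float        # value taken from the target word ease data


def extract_target_words(
    tokens_by_component: Dict[str, List[Tuple[str, str]]],
    ease_data: Dict[str, float],
) -> List[WordRecord]:
    """Keep nouns that appear in the ease dictionary, numbering them in order."""
    records: List[WordRecord] = []
    for cid, tokens in tokens_by_component.items():
        for surface, pos in tokens:
            if pos == "noun" and surface in ease_data:
                wid = f"W{len(records) + 1}"
                records.append(WordRecord(wid, surface, cid, ease_data[surface]))
    return records


def component_info(records: List[WordRecord]) -> Dict[str, List[str]]:
    """Component ID mapped to the IDs of the target words it contains."""
    info: Dict[str, List[str]] = {}
    for r in records:
        info.setdefault(r.component_id, []).append(r.word_id)
    return info


tokens = {
    "E1": [("LAN", "noun"), ("connect", "verb"), ("cable", "noun")],
    "E2": [("router", "noun")],
}
ease = {"LAN": 0.7, "router": 0.5}  # hypothetical ease dictionary
recs = extract_target_words(tokens, ease)
print([(r.word_id, r.word, r.component_id) for r in recs])
# [('W1', 'LAN', 'E1'), ('W2', 'router', 'E2')]
print(component_info(recs))  # {'E1': ['W1'], 'E2': ['W2']}
```

Nouns missing from the ease dictionary ("cable" here) are simply not evaluated.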

For each acquired word, the target word extraction means 32 associates the word with the component that contains it, thereby generating the component information data 16a. Taking the browsing flow F into account, it also calculates the appearance position of each component, such as first half, middle, second half or out of range. In this embodiment, the target word extraction means 32 stores the appearance position of a component in the word information data 15a as the appearance position of each word in that component.

The target word extraction means 32 further calculates the distance from the flow for each word. In this embodiment, this distance is calculated from the browsing flow F over the whole document data 11a and from the way the reader proceeds through each component starting from its assumed browsing coordinates.

A method of calculating the distance from the flow according to the embodiment will be described with reference to FIG. 13. FIG. 13 is an enlarged view of the first component E1 of the document data 11a, on which part of the browsing flow F lies. As shown in FIG. 8, this part of the browsing flow F runs from top to bottom in FIG. 13. In this embodiment, the assumed browsing coordinates are the point on the browsing flow F that first overlaps each component, so the assumed browsing coordinates of the first component E1 are P1. Since the distance from the flow is calculated for each component by considering how the reader proceeds from that component's assumed browsing coordinates, the distance L from the flow is expressed by Formula 1 below.

Distance from the flow L = basic distance + (number of characters k) × (coefficient n), that is,

$$L = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2} + k \times n \qquad \text{(Formula 1)}$$

Here, the basic distance is the distance between the assumed browsing coordinates P1 = (x1, y1) and the upper-left vertex P0 = (x0, y0) of the first component E1. The number of characters k is the count of characters from the beginning of the character string of the first component E1 up to the word. The coefficient n is a length corresponding to the width of one character of the word.

When the reader follows the browsing flow F through the document data 11a and reads the character string of the first component E1, the reader's eyes travel from the assumed browsing coordinates P1 to the beginning of that string (the basic distance) and then along the string to the word being evaluated. By adding the number of characters k from the beginning of the character string of E1 to the word, multiplied by the coefficient n, to the basic distance, the target word extraction means 32 can calculate the distance from P1 to the word as the distance L from the flow.
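Formula 1 translates directly into code. The sketch below is illustrative only: it assumes the coordinates and the character width n share one unit (for example preview pixels), and the example values are made up.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def flow_distance(p_view: Point, p_origin: Point, k: int, n: float) -> float:
    """Formula 1: L = |P1 - P0| + k * n.

    p_view:   assumed browsing coordinates P1 = (x1, y1)
    p_origin: upper-left vertex P0 = (x0, y0) of the component
    k:        characters from the start of the component's text to the word
    n:        length corresponding to one character width
    """
    basic = math.hypot(p_view[0] - p_origin[0], p_view[1] - p_origin[1])
    return basic + k * n


print(flow_distance((40, 10), (10, -30), k=5, n=12))  # 110.0
```

Here the basic distance is 50 (a 30-40-50 right triangle) and the five preceding characters add 5 × 12 = 60.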

The distance L from the flow is calculated even for components that lie outside the browsing flow F in the browsing order data 13a. For a word contained in such a component, the target word extraction means 32 calculates the distance L as the distance between the word and the nearest assumed browsing coordinates. In the example shown in FIG. 9, the fifth component E5, which begins with "Precautions", lies outside the browsing flow F. The target word extraction means 32 therefore identifies the assumed browsing coordinates P3 of the third component E3 as those nearest to E5, and calculates the distance between P3 and the word as the distance L from the flow.
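A sketch of the fallback for out-of-range components. The text does not fix how "nearest" is measured or where a word's position is taken, so this version assumes two things: nearness is the straight-line distance to the out-of-range component's upper-left vertex, and the resulting distance is the straight line from the chosen coordinates to the word's position. The example coordinates are hypothetical but arranged so that P3 wins, as in FIG. 9.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def nearest_browsing_coord(anchor: Point, coords: Dict[str, Optional[Point]]) -> Point:
    """Return the assumed browsing coordinates closest to `anchor`.

    Assumption: `anchor` is the out-of-range component's upper-left vertex.
    Components without assumed coordinates (None) are skipped.
    """
    candidates = [p for p in coords.values() if p is not None]
    return min(candidates, key=lambda p: math.dist(p, anchor))


def out_of_range_distance(word_pos: Point, anchor: Point,
                          coords: Dict[str, Optional[Point]]) -> float:
    """Straight-line distance from the nearest assumed coordinates to the word."""
    return math.dist(nearest_browsing_coord(anchor, coords), word_pos)


coords = {"E1": (50, 10), "E2": (50, 100), "E3": (110, 60), "E4": (150, 130), "E5": None}
print(nearest_browsing_coord((110, 0), coords))  # (110, 60), playing the role of P3
print(out_of_range_distance((130, 20), (110, 0), coords))
```

`math.dist` requires Python 3.8 or later.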

The target word extraction process performed by the target word extraction means 32 according to the embodiment of the present invention will be described with reference to FIG. 14.

In step S201 the target word extraction means 32 reads the document data 11a from the storage device 10, and in step S202 it reads the target word ease data 12a. It then repeats steps S203 to S209 for each component of the document data 11a.

In step S203, the target word extraction means 32 extracts from the component the target words registered in the target word ease data 12a. In step S204, it associates the words extracted in step S203 with the component and outputs them to the component information data 16a. It further calculates the appearance position of the component in the browsing flow F, such as first half, middle, second half or out of range.

Next, steps S206 to S209 are repeated for each word extracted in step S203.

In step S206 the target word extraction means 32 extracts the ease of the word from the target word ease data 12a, and in step S207 it calculates the distance from the assumed browsing coordinates. In step S208, it acquires display attributes of the word, such as character size and color.

In step S209, the target word extraction means 32 generates a record associating the word with the appearance position calculated in step S205, the ease acquired in step S206, the distance from the assumed browsing coordinates calculated in step S207, and the display attributes acquired in step S208. It outputs the generated record to the word information data 15a.

When steps S203 to S209 have been completed for all components and words, the target word extraction process by the target word extraction means 32 ends.

Next, the word evaluation means 33 will be described. For each word in the word information data 15a, the word evaluation means 33 calculates a correction priority such that words with low ease and a short distance receive a high correction priority, and outputs weighted word information data 17a. The correction priority of a word indicates how strongly that word needs to be corrected in the document data 11a. Words with a high correction priority are preferably corrected by the operator to make the document data 11a easier to understand. The word evaluation means 33 may also calculate the correction priority taking into account not only the ease of the word and its distance from the flow but also its display attributes.

The weighted word information data 17a according to the embodiment will be described with reference to FIG. 15. The weighted word information data 17a shown in FIG. 15 differs from the word information data 15a shown in FIG. 11 in that it has additional correction priority and score items. The score is calculated from the items of the word information data 15a. In the example shown in FIG. 15, the correction priority of the words is their rank when sorted by score in descending order.

In this embodiment, the correction priority is set higher for difficult words that, given the browsing flow F, are likely to catch the reader's eye. Specifically, a word is considered more in need of correction the lower its ease, the shorter its distance from the assumed browsing coordinates, the earlier its appearance position, the larger its character size, and the more eye-catching its character color. In the example shown in FIG. 15, scores are set so that words more in need of correction receive higher scores, and correction priorities are set so that such words receive smaller priority numbers.

The following describes the case in which the word evaluation means 33 determines the correction priority of a word by comprehensively evaluating element points for (1) the ease of the word, (2) the distance from the assumed browsing coordinates, (3) the appearance position, (4) the character size and (5) the character color.

The word evaluation means 33 sets X1 = 1 − ease for (1) the ease of the word, X2 = distance point for (2) the distance from the assumed browsing coordinates, X3 = position point for (3) the appearance position, X4 = size point for (4) the character size, and X5 = salience point for (5) the character color. The distance point, position point, size point and salience point are each converted from the corresponding data of the word information data 15a into a value between 0 and 1. Each element point approaches 1 the shorter the distance from the assumed browsing coordinates, the earlier the appearance position, the larger the character size, and the more eye-catching the character color.

Using the element points X1 to X5, the word evaluation means 33 can calculate the score Y of a word by Formula 2.

$$Y = 20 \times \sum_{i=1}^{5} X_i \qquad \text{(Formula 2)}$$

Consider, for example, calculating the score Y of the word "LAN" with word identifier "W1" shown in FIG. 15, with the element points set in advance as follows: (1) for the ease of the word (0.7), X1 = 1 − 0.7 = 0.3; (2) for the distance from the browsing flow (close), X2 = 1; (3) for the appearance position (first half), X3 = 1; (4) for the character size (normal), X4 = 0.9; and (5) for the character color (black), X5 = 0.9.

In this case, the score is Y = 20 × ((1 − 0.7) + 1 + 1 + 0.9 + 0.9) = 82 points. Although this word has a high ease, it is very likely to catch the reader's eye, so its score Y comes out high. Here the score Y is calculated from all of the ease, distance point, position point, size point and salience point, but the present invention is not limited to this example. For instance, even a score Y calculated only from the ease and the distance point evaluates words with the reader's browsing flow taken into account. The score Y may also be calculated by weighting the ease, distance point, position point, size point and salience point individually.
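Formula 2, together with the optional per-element weighting mentioned above, can be sketched as follows. The `weights` parameter and its default of equal weights are an illustrative assumption; the conversion of raw distances, positions, sizes and colors into element points between 0 and 1 is not specified in the text and is taken as already done by the caller.

```python
from typing import Optional, Sequence


def word_score(ease: float, distance_pt: float, position_pt: float,
               size_pt: float, salience_pt: float,
               weights: Optional[Sequence[float]] = None) -> float:
    """Formula 2: Y = 20 * (X1 + ... + X5), optionally weighted per element.

    X1 = 1 - ease; the other four element points are expected in [0, 1],
    so with equal weights the score ranges from 0 to 100.
    """
    x = [1.0 - ease, distance_pt, position_pt, size_pt, salience_pt]
    w = list(weights) if weights is not None else [1.0] * len(x)
    if len(w) != len(x):
        raise ValueError("expected five weights")
    return 20.0 * sum(wi * xi for wi, xi in zip(w, x))


# Word "LAN" (W1) from FIG. 15.
print(round(word_score(0.7, 1.0, 1.0, 0.9, 0.9), 6))  # 82.0
```

The rounding only hides binary floating-point noise from `1.0 - 0.7`.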

The word evaluation means 33 outputs the score calculated in this way to the weighted word information data 17a as the score of word identifier "W1". After calculating the score Y for every word in the word information data 15a, the word evaluation means 33 sorts the words by score Y in descending order and assigns correction priorities according to the sorted order. It then associates each word with its calculated score and correction priority and outputs them to the weighted word information data 17a.
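The sort-and-rank step can be sketched in a few lines. Priority 1 marks the word most in need of correction, matching the convention that such words receive smaller priority numbers. How ties are broken is not stated in the text; this sketch relies on Python's stable sort, so tied words keep their input order. The scores for W2 and W3 are made up.

```python
from typing import Dict, List, Tuple


def assign_priorities(scores: Dict[str, float]) -> List[Tuple[str, float, int]]:
    """Sort by score (highest first) and number the result 1, 2, 3, ...

    sorted() stays stable with reverse=True, so tied scores keep their
    input order (a tie-breaking choice the text does not specify).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(wid, score, rank) for rank, (wid, score) in enumerate(ranked, start=1)]


print(assign_priorities({"W1": 82.0, "W2": 40.0, "W3": 65.0}))
# [('W1', 82.0, 1), ('W3', 65.0, 2), ('W2', 40.0, 3)]
```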

The word evaluation process performed by the word evaluation means 33 according to the embodiment of the present invention will be described with reference to FIG. 16.

In step S301, the word evaluation means 33 reads the word information data 15a from the storage device 10 and performs step S302 for each word in it. In step S302, it calculates the score of the word in the word information data 15a.

When the scores of all words in the word information data 15a have been calculated, the word evaluation means 33 assigns correction priorities in step S303 based on the scores calculated in step S302. In step S303, it also associates each word with its score calculated in step S302 and the correction priority assigned in step S303, and outputs them to the weighted word information data 17a. The word evaluation means 33 may also output the items of the word information data 15a to the weighted word information data 17a in association with each word.

Next, the component evaluation means 34 will be described. The component evaluation means 34 calculates a correction priority for each component based on the correction priorities of the words, and outputs weighted component information data 18a. It refers to the weighted word information data 17a to obtain the score of each target word in the document data 11a, and then refers to the component information data 16a to calculate, for each component, a score based on the scores of the words contained in that component.

As shown in FIG. 17, the weighted component information data 18a associates a component identifier with the identifiers of the words contained in that component, a correction priority, and a score. It differs from the component information data 16a shown in FIG. 12 in that it additionally has correction priority and score items. The score is calculated from the scores in the weighted word information data 17a. In the example shown in FIG. 17, the correction priority is the rank obtained by sorting the scores in descending order.

One way to calculate the score of a component is to add up the scores of all target words it contains. Using the scores calculated in this way, the component evaluation means 34 calculates the correction priority of each component so that, for example, the higher a component's score, the higher its correction priority.

The component evaluation process performed by the component evaluation means 34 will be described with reference to FIG. 18. The component evaluation means 34 reads the component information data 16a from the storage device 10 in step S401, and reads the weighted word information data 17a in step S402. It then repeats steps S403 and S404 for each component of the document data 11a.

In step S403, the component evaluation means 34 obtains the identifiers of all words contained in the component from the component information data 16a, and obtains the scores of those words from the weighted word information data 17a. In step S404, it sets the sum of the word scores obtained in step S403 as the score of the component.

Once scores have been calculated for all components, the component evaluation means 34 assigns correction priorities in step S405 based on the scores calculated in step S404. In step S406, it associates each component identifier with the score calculated in step S404 and the correction priority assigned in step S405, and outputs the result as the weighted component information data 18a.
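Steps S401 to S406 can be sketched as follows, using the summation method described above. Plain dictionaries stand in for the component information data 16a and the weighted word information data 17a; this representation, and treating a word with no recorded score as 0, are assumptions made for illustration.

```python
from __future__ import annotations


def evaluate_components(
    component_words: dict[str, list[str]],
    word_scores: dict[str, float],
) -> list[dict]:
    """Sketch of steps S401-S406.

    component_words: component id -> ids of the target words it contains
        (stands in for component information data 16a).
    word_scores: word id -> score
        (stands in for weighted word information data 17a).
    """
    totals = {}
    for comp_id, word_ids in component_words.items():
        # S403: look up the score of every word in the component
        # (assumption: a word without a recorded score counts as 0).
        # S404: the component score is the sum of those word scores.
        totals[comp_id] = sum(word_scores.get(w, 0.0) for w in word_ids)
    # S405: a higher total gives a higher correction priority (rank 1 = fix first).
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # S406: associate id, score, and priority (weighted component data 18a).
    return [
        {"component_id": c, "score": s, "priority": rank}
        for rank, (c, s) in enumerate(ranked, start=1)
    ]
```

A component with no target words ends up with a score of 0 and therefore the lowest priority.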

Next, the output means 35 will be described. The output means 35 outputs evaluation results based on the weighted word information data 17a and the weighted component information data 18a to the output device 50 so that the creator can view them. Various output methods are possible, for example listing the results as tabular data or superimposing them on a preview screen.

When the evaluation results are listed as tabular data, the output means 35 displays the weighted word information data 17a shown in FIG. 15 and the weighted component information data 18a shown in FIG. 17 on the output device 50 in table form.

An example of the display screen used when the evaluation results are shown on a preview screen will be described with reference to FIGS. 19 and 20. In this case, the output means 35 generates evaluation result data 19a, in which the evaluation results are superimposed on the document data 11a, and outputs it to the output device 50.

FIG. 19 shows an example of a result display screen P101 on which the output means 35 expresses correction priority with background color. In this example, the output means 35 outputs evaluation result data 19a in which each evaluated word and component is given a background that matches its correction priority. When the document data 11a is in HTML format, the output means 35 produces the evaluation result data 19a from the document data 11a by setting background colors with tags. Words and components with a high correction priority receive a highly conspicuous background, and the background becomes less conspicuous as the correction priority decreases. No background is superimposed on words and components that do not particularly need correction.
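For HTML document data, the tag-based background coloring might look like the sketch below. The palette, the rank thresholds, and the use of an inline `<span style="background-color: ...">` are all illustrative assumptions; the description says only that tags set a background whose conspicuousness falls with correction priority, and that items needing no correction get no background.

```python
from __future__ import annotations

import html

# Hypothetical palette: the more urgent the fix, the more conspicuous the colour.
PALETTE = [
    (2, "#ff9999"),  # priority ranks 1-2: strong highlight
    (5, "#ffe08a"),  # priority ranks 3-5: mild highlight
]


def background_for(priority: int | None) -> str | None:
    """Return a background colour for a priority rank, or None for no highlight."""
    if priority is None:
        return None
    for max_rank, colour in PALETTE:
        if priority <= max_rank:
            return colour
    return None  # low priority: leave the text unhighlighted


def highlight(text: str, priority: int | None) -> str:
    """Wrap text in a <span> whose background encodes the correction priority."""
    escaped = html.escape(text)
    colour = background_for(priority)
    if colour is None:
        return escaped
    return f'<span style="background-color: {colour}">{escaped}</span>'
```

Applying `highlight` to each evaluated word or component in the HTML source would yield something along the lines of the evaluation result data 19a; a real implementation would also have to preserve the surrounding markup.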

The output means 35 applies these background colors to the preview screen of the document data 11a according to the evaluation result data 19a and displays it. Because the creator can read the correction priority from the background color, the creator can identify the items to be corrected while keeping the flow of the whole preview screen in view.

FIG. 20 shows an example of a result display screen P102 on which the output means 35 expresses the correction priority of components with symbols. In this example, the output means 35 outputs evaluation result data 19a in which each evaluated component is marked with a symbol according to its correction priority: "×" for components with a high correction priority and "○" for components with a low correction priority.

The output means 35 displays the preview screen of the document data 11a with these symbols attached, according to the evaluation result data 19a. Visualizing the correction priority in this way lets the creator grasp it at a glance, identify the items to be corrected while considering the flow of the whole preview screen, and use the result to improve the document data 11a. The example in FIG. 20 also displays a score for the document data 11a as a whole, calculated from the correction priorities. For example, the output means 35 may display 100 points when every component is marked "○". The result display screens described with reference to FIGS. 19 and 20 are only examples; many other result display screens are possible.
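The description fixes only one point of the overall document score (100 points when every component is "○"), so the sketch below is a hypothetical scheme: a component is marked "×" when its score reaches an assumed threshold, and the document score is the share of "○" components scaled to 100. Both the threshold and the scoring rule are assumptions, not the method of the embodiment.

```python
from __future__ import annotations


def mark_components(component_scores: dict[str, float], threshold: float) -> dict[str, str]:
    """Mark each component "×" (needs fixing) or "○" (acceptable).

    threshold is a hypothetical cut-off on the component score.
    """
    return {
        comp_id: "×" if score >= threshold else "○"
        for comp_id, score in component_scores.items()
    }


def document_score(marks: dict[str, str]) -> int:
    """Hypothetical overall score: share of "○" components, scaled to 100."""
    if not marks:
        return 100  # nothing to criticise
    ok = sum(1 for m in marks.values() if m == "○")
    return round(100 * ok / len(marks))
```

Under this rule, a document whose components are all "○" scores 100, matching the example in the text, and each "×" lowers the score in proportion to the number of components.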

In the document generation support apparatus 1 according to the embodiment of the present invention, the creator specifies the browsing flow they expect of the reader, and thereby indicates which parts of the document data they intend the reader to read. Taking this creator-specified browsing flow into account, the document generation support apparatus 1 evaluates each word and component in the document data individually for how easily the reader can understand it. The apparatus can thus point out to the creator the words and components likely to hinder the reader's understanding, and by correcting them, the creator can produce document data that is easy for the reader to understand.

Furthermore, because it considers the reader's browsing flow, the document generation support apparatus 1 according to the embodiment can determine correction priorities at the level of individual components in the document data 11a. It can therefore indicate which parts to improve first, even for document data that must fit within limited page space.

The document generation support apparatus 1 according to the embodiment can also automate, by computer processing, the evaluation of whether document data is easy for the reader to understand. Because experts or test subjects no longer need to review the document data, the cost of producing document data can be reduced significantly.

As described above, the document generation support apparatus 1 according to the embodiment of the present invention can support the generation of document data that is easy to understand.

(Modification)
Next, browsing order acquisition means 31a, a modification of the browsing order acquisition means 31, will be described. In the embodiment above, the browsing order acquisition means 31 has the creator draw the trajectory of the browsing flow F with the input device 40; the browsing order acquisition means 31a instead has the creator enter the browsing order component by component.

As shown in FIG. 22, the creator enters the assumed browsing flow as a sequence of components of the document data 11a. The example in FIG. 22 indicates that the reader is expected to read the title "Instruction Manual" first, and then "Summary", "Procedure 1", "Procedure 2", and "Procedure 3", in that order. The parentheses following "Summary", "Procedure 1", "Procedure 2", and "Procedure 3" contain the character strings delimited by <p> and </p> in the source data shown in FIG. 3. The input method is not limited to the example in FIG. 22; any method may be used as long as the browsing order acquisition means 31a can determine, component by component, the order in which the creator expects the reader to browse.

When the browsing flow is entered in this way, the browsing order acquisition means 31a generates browsing order data 13b. As shown in FIG. 6(a), the browsing order data 13b lists component identifiers in the assumed browsing order.

The document data 11a is written horizontally, with characters running from left to right. In this modification, therefore, the assumed browsing coordinates of each component are taken to be the coordinates of its upper-left corner, as the point most likely to catch the reader's eye given the browsing flow F.

Accordingly, as shown in FIG. 23, the browsing order acquisition means 31a sets the upper-left vertex of the first component E1 as its assumed browsing coordinates P1. In the same way, it calculates the assumed browsing coordinates P2 of the second component E2, P3 of the third component E3, and P4 of the fourth component E4. It stores the assumed browsing coordinates P1 to P4 calculated in this way as the assumed browsing coordinate data 14a described with reference to FIG. 7.

Because the browsing flow also depends on the format of the document data 11a, the assumed browsing coordinates may be chosen differently. For example, when the document data 11a is written vertically, the upper-right vertex of each component may be used as its assumed browsing coordinates.
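Combining the browsing order data 13b (component identifiers in assumed browsing order) with each component's position on the page gives the assumed browsing coordinates. The sketch assumes the rendered layout supplies an axis-aligned bounding box per component, with the origin at the top-left of the page and y increasing downward; the `Box` type and function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of a rendered component (hypothetical layout input)."""
    left: float
    top: float
    width: float
    height: float


def assumed_browsing_coordinates(
    browsing_order: list[str],
    boxes: dict[str, Box],
    vertical: bool = False,
) -> list[tuple[str, float, float]]:
    """Return (component id, x, y) for each component in browsing order.

    Horizontal writing: the upper-left corner of each component.
    Vertical writing: the upper-right corner, where reading starts.
    """
    coords = []
    for comp_id in browsing_order:
        box = boxes[comp_id]
        x = box.left + box.width if vertical else box.left
        coords.append((comp_id, x, box.top))
    return coords
```

The resulting list corresponds to P1, P2, ... in FIG. 23 and could be stored as assumed browsing coordinate data 14a; the distances used by the target word extraction step would then be measured from these points.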

As described above, many methods are conceivable both for determining the assumed browsing flow and for determining the assumed browsing coordinates.

(Other embodiments)
Although the present invention has been described above by way of an embodiment and a modification, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the document generation support apparatus described in the best mode for carrying out the invention may be configured on a single piece of hardware as shown in FIG. 1, or on multiple pieces of hardware according to its functions and processing load. It may also be implemented on an existing information processing system.

The present invention naturally includes various embodiments not described here. Accordingly, the technical scope of the present invention is defined solely by the matters specifying the invention in the claims that are reasonable in light of the above description.

Description of symbols
1 Document generation support apparatus
10 Storage device
11 Document data storage unit
12 Target word ease data storage unit
13 Browsing order data storage unit
14 Assumed browsing coordinate data storage unit
15 Word information data storage unit
16 Component information data storage unit
17 Weighted word information data storage unit
18 Weighted component information data storage unit
19 Evaluation result data storage unit
30 Central processing control device
31 Browsing order acquisition means
32 Target word extraction means
33 Word evaluation means
34 Component evaluation means
35 Output means
40 Input device
50 Output device

Claims

A document generation support method for supporting generation of document data composed of a plurality of components, the method comprising:
a step of storing target word ease data in which target words to be evaluated in document data are associated with the ease of each target word;
a step of acquiring browsing order data indicating a reader's browsing flow in the document data, calculating, for each component, assumed browsing coordinates, which are coordinates easily noticed by the reader in the browsing flow, and outputting assumed browsing coordinate data;
a step of extracting words in the component that match the target words, together with the ease of each such word, calculating the distance between the assumed browsing coordinates of the component and each extracted word, and outputting word information data in which each word, its ease, and the distance are associated with one another; and
a step of calculating, for each word in the word information data, a correction priority such that words with lower ease and a shorter distance receive a higher correction priority, and outputting weighted word information data.

The document generation support method according to claim 1, wherein, in the step of outputting the weighted word information data, the correction priority is calculated taking a display attribute of the word into account.

The document generation support method according to claim 1, further comprising a step of calculating a correction priority for each component based on the correction priorities of the words, and outputting weighted component information data.

A document generation support apparatus that supports generation of document data composed of a plurality of components, the apparatus comprising:
a target word ease data storage unit that stores target word ease data in which target words to be evaluated in document data are associated with the ease of each target word;
browsing order acquisition means that acquires browsing order data indicating a reader's browsing flow in the document data, calculates, for each component, assumed browsing coordinates, which are coordinates easily noticed by the reader in the browsing flow, and outputs assumed browsing coordinate data;
target word extraction means that extracts words in the component that match the target words, together with the ease of each such word, calculates the distance between the assumed browsing coordinates of the component and each extracted word, and outputs word information data in which each word, its ease, and the distance are associated with one another; and
word evaluation means that calculates, for each word in the word information data, a correction priority such that words with lower ease and a shorter distance receive a higher correction priority, and outputs weighted word information data.

The document generation support apparatus according to claim 4, wherein the word evaluation means calculates the correction priority taking a display attribute of the word into account.

The document generation support apparatus, further comprising component evaluation means that calculates a correction priority for each component based on the correction priorities of the words and outputs weighted component information data.

A document generation support program for causing a computer to execute the steps according to any one of claims 1 to 3.