JPS63245760A - Document shaping device

Info

Publication number: JPS63245760A
Application number: JP62077465A
Authority: JP
Inventors: Miyoshi Fukui; 美佳福井; Miwako Doi; 美和子土井; Isamu Iwai; 岩井　勇
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1987-04-01
Filing date: 1987-04-01
Publication date: 1988-10-12

Abstract

PURPOSE: To make a sentence easy to read by breaking lines only at positions that are not unnatural in either pronunciation or meaning. CONSTITUTION: An original sentence is sent from an input part 1 to a document management part 3, or, when it has already been entered, is recalled from an original sentence storing part 2 and sent to the document management part 3. The document management part 3 sends the original sentence to a double-line detecting part 4. The detecting part 4 divides the sentence into clauses based on the contents of a clause decision rule dictionary 5, detects positions where a clause is split across two lines, and determines the position to be shaped and the number of characters to be processed. A shaping part 6 shapes the document based on the position and the number of characters determined by the double-line detecting part 4. The shaped sentence is displayed on a display part 8 by a display control part 7.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the invention]

(Industrial Field of Application) The present invention relates to a document formatting device that performs line-break prohibition (kinsoku) processing clause by clause when a document is displayed (printed).

(Prior Art) When a document is displayed (printed), fixing the number of characters per line and breaking lines mechanically can produce line breaks at positions that would never occur in a handwritten document. The document therefore needs to be formatted so that line breaks fall at positions where the text remains easy to read.

In current general display (printing) systems, characters prohibited at the start of a line, such as punctuation marks and closing brackets, and characters prohibited at the end of a line, such as opening brackets, are registered as prohibited characters. When a document is displayed, the characters at the end and the start of each line are checked against these prohibited characters, and prohibition processing is applied when a match is found. Some systems also detect digit strings that cross lines, that is, strings split between the end of one line and the start of the next, and apply prohibition processing to them. However, these systems do not handle the alphabetic strings (such as English words) and symbol strings that appear frequently in modern Japanese documents, and they do nothing about the words and phrases of the Japanese text itself, so they are far from adequate. For English text, some systems detect words and format accordingly, but detection based on the spaces between words cannot be applied to Japanese text.
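The conventional check described above can be sketched roughly as follows. This is a minimal illustration only: the two character sets, the function name `kinsoku_violations`, and the fixed-width line splitting are assumptions, since the source does not list the registered characters or describe an implementation.

```python
# Assumed (illustrative) sets of prohibited characters; real systems register their own.
LINE_START_PROHIBITED = set("、。，．）」』】")
LINE_END_PROHIBITED = set("（「『【")


def kinsoku_violations(text, columns):
    """Split text into fixed-width lines and report prohibited characters
    found at the start or end of a line as (line_index, position, char)."""
    lines = [text[i:i + columns] for i in range(0, len(text), columns)]
    violations = []
    for n, line in enumerate(lines):
        # A line-start check only makes sense after a mechanical break.
        if n > 0 and line[0] in LINE_START_PROHIBITED:
            violations.append((n, "line_start", line[0]))
        # A line-end check only makes sense if another line follows.
        if n < len(lines) - 1 and line[-1] in LINE_END_PROHIBITED:
            violations.append((n, "line_end", line[-1]))
    return violations
```

Note that a check of this kind looks only at single characters on either side of a break, which is why it cannot notice a Japanese clause or an English word being split.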

(Problems to be Solved by the Invention) As described above, when a conventional document formatting device displays a document with a fixed number of characters per line, it formats only prohibited characters and line-crossing digit strings. As a result, Japanese words and phrases, alphabetic strings, symbol strings, and the like are split across lines, which often makes the meaning of the document hard to grasp. In readability and appearance the result is inferior to a handwritten document.

Accordingly, an object of the present invention is to display (print) a document that is easy to understand and easy to read by formatting it so that words and phrases, taking the clause as the unit, do not cross lines.

[Structure of the invention]

(Means for Solving the Problems) The present invention is characterized by comprising input means for inputting document data, means for dividing a sentence into clauses, formatting processing means for formatting the document so that the clauses divided by said means do not cross lines, and display means for displaying the document formatted by the formatting processing means.

(Function) When a computer displays a document, the present invention detects the places where a word or phrase, taking the clause as the unit, crosses lines, so that formatting processing can be applied there.

(Example) FIG. 1 is a block diagram showing one embodiment of the present invention.

The original text is sent to the document management section 3 from the input section 1 or, if it has already been entered, is recalled from the original text storage section 2.

The document management section 3 sends this original text to the line-crossing detection section 4, treating the text up to each line feed code as one sentence.

Based on the contents of the clause judgment rule dictionary 5, the line-crossing detection section 4 divides the sentence into clauses, detects the places where the words of one clause extend over two lines, and determines the location to be formatted and the number of characters to be processed.

The formatting processing section 6 formats the document based on the location to be formatted and the number of characters to be processed, as determined by the line-crossing detection section 4.

The formatted sentence generated here is displayed on the display section 8 by the display control section 7.

The operation of the present invention is described below using the example of FIG. 2(a). The line-crossing detection section divides the input sentence, up to the line feed code, into clauses according to, for example, the line-crossing detection algorithm shown in FIG. 3(a).

The number of characters in each of the resulting clauses is stored in order in the array W(1), W(2), ... (see step 1 of the algorithm). In the example of FIG. 2, the values are stored as shown in (b). The total number of characters is 26.
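A small sketch of how the array W could be filled is given here. The clause segmentation itself (the clause judgment rule dictionary 5) is not reproduced; the list `clauses` is a hypothetical, already-segmented sentence invented so that its total length is 26 characters and it contains the clause 「ブロック図を」 discussed in the text. The actual sentence of FIG. 2 is not available in this document.

```python
# Hypothetical pre-segmented sentence (an assumption, not the sentence of FIG. 2).
clauses = ["第１図は、", "この", "発明の", "一実施例を", "示す", "ブロック図を", "表す。"]

# W holds the number of characters of each clause, in order.
# Python lists are 0-based, so W[0] corresponds to W(1) in the text.
W = [len(clause) for clause in clauses]

total_characters = sum(W)
```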

Next, the places are located where, when the sentence is displayed, the character at the end of a line and the character at the start of the next line belong to the same clause. As shown in step 2 of the algorithm, W(1), W(2), ... are subtracted in order from C, the number of columns per line. If C becomes 0, the end of the line coincides with a clause boundary, so processing moves on to the next line. If C becomes negative without having been 0, a line crossing is detected. In the example the number of columns is 18, so C = -5 once the length of the clause 「ブロック図を」 ("block diagram", with its object particle) has been subtracted, which shows that this clause crosses lines.
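The subtraction procedure can be sketched as below. The function name `detect_crossing` and the clause lengths in the usage note are assumptions for illustration; only the column count of 18, the 6-character clause, and the resulting C = -5 come from the text.

```python
def detect_crossing(w, columns):
    """Subtract clause lengths from the column count C in order.

    Returns (index, c) for the first clause that makes C negative
    (a line crossing), or None if every line ends on a clause boundary
    or the sentence ends first. Indexes are 0-based.
    """
    c = columns
    for i, length in enumerate(w):
        c -= length
        if c == 0:
            # Line end coincides with a clause boundary: start the next line.
            c = columns
        elif c < 0:
            return i, c
    return None
```

With hypothetical clause lengths `[5, 2, 3, 5, 2, 6, 3]` (total 26) and 18 columns, the running value of C is 13, 11, 8, 3, 1, and then -5 at the 6-character clause, which matches the value given in the text.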

The number of characters to process and the location to format in order to fix this line crossing are determined by step 3 of the algorithm. Let Top be the number of characters in the front part of the crossing clause (the part at the end of the line) and End the number of characters in its rear part (the part at the start of the next line). One of two operations must then be selected for this line: widen the line by Top characters so that the clause is pushed out to the next line, or pull in End characters. As shown in FIG. 3(b), in the former case the location to format runs from the first character of the line, character (eff+1), to character (s-w(i)), immediately before the crossing clause; in the latter case it includes the crossing clause and runs to character s. In this example Top is 1, for 「ブ」 alone, and End is 5, for 「ロック図を」.

The operation with the smaller number of characters to process is selected, so Top = 1 is chosen and the location to format is determined to be the 1st through 17th characters.
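The choice between pushing out and pulling in can be sketched as follows. The function name, the dictionary layout, the `line_start` parameter (standing in for the first character position of the line), and the tie-breaking rule when Top equals End are all assumptions; the source specifies only that the smaller count is chosen.

```python
def choose_fix(w, i, c, line_start=1):
    """Decide how to fix a crossing at clause index i (0-based).

    w          : clause lengths from the start of the line
    c          : negative remainder from the column subtraction
    line_start : 1-based position of the line's first character (assumed 1)
    """
    end = -c                 # characters that spilled onto the next line
    top = w[i] - end         # characters left at the end of this line
    s = line_start - 1 + sum(w[:i + 1])  # position of the clause's last character
    if top <= end:           # tie goes to push-out (an assumption)
        # Push the clause out: format up to the character before the clause.
        return {"action": "push_out", "count": top, "span": (line_start, s - w[i])}
    # Pull the rest of the clause in: format up to the clause's last character.
    return {"action": "pull_in", "count": end, "span": (line_start, s)}
```

For hypothetical clause lengths `[5, 2, 3, 5, 2, 6, 3]` with the crossing at index 5 and C = -5, this yields Top = 1, a push-out, and the span from the 1st to the 17th character, in line with the example.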

The formatting processing section applies formatting for the determined number of characters to this location. For example, a one-character push-out is performed, as in ordinary prohibition processing, and the formatted sentence shown in FIG. 2(c) is generated.

Depending on the capabilities of the display or printer, the formatting can also be done with variable pitch, that is, by adjusting the character spacing within the location to format.
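One way to picture variable-pitch formatting is as a scaling factor applied to the character pitch inside the location to format. The sketch below is an assumption about how such a factor might be computed; the source does not give a formula, and the function name `variable_pitch` is hypothetical.

```python
from fractions import Fraction


def variable_pitch(span_chars, delta):
    """Pitch multiplier that makes span_chars characters occupy
    span_chars + delta columns (delta > 0 widens, delta < 0 tightens)."""
    if span_chars <= 0 or span_chars + delta <= 0:
        raise ValueError("span and resulting width must be positive")
    return Fraction(span_chars + delta, span_chars)
```

Under this assumption, pushing out one character from an 18-column line means the 17 characters of the span are stretched over 18 columns (`variable_pitch(17, 1)`, a factor of 18/17), while pulling in five characters means 23 characters are squeezed into 18 columns (`variable_pitch(23, -5)`, a factor of 18/23).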

Various other methods are also conceivable that revise the sentence itself, such as inserting punctuation marks, changing words, or converting full-width alphanumeric and symbol strings to half-width.

Next, the operation of the present invention is described for the example of FIG. 4(a).

When this sentence is displayed with 22 columns, line crossings occur at two places, the ends of the second and third lines, as shown in FIG. 4(b). The algorithm of FIG. 3(a) is a sequential detection method: it detects crossings from the top and determines the location to format and the number of characters to process for each one in the order detected. It therefore first detects the crossing of W(10). The first line is not included in the location to format for this crossing, because formatting from the first line would shift the end of the first line away from the clause boundary there and cause a new line crossing. If the first line has to be included, the number of characters to process should be determined with Top = 4 and End = 7, that is, by choosing between pushing out 「IXMで」 and pulling in 「実行されるかを」. That approach is not used here, and the location to format is the second line only. A processing count of +2 gives the location to format shown by the dotted line in FIG. 4(c), and a count of -6 gives the one shown in FIG. 4(d). If the former is selected, the next line crossing is 「IXMコントローラに」 at W(12), with Top = 7 and End = 3; if the latter is selected, it is 「受け付けられる。」 at W(13), with Top = 5 and End = 3. The corresponding locations to format and processing counts are then determined as shown in FIGS. 4(e), (f), (g), and (h).

Because this algorithm selects the operation with the smaller number of characters to process, (c) is selected first and then only (f) is generated.
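The sequential method can be sketched as a line-by-line loop that fixes each crossing as it is found. In the example data below, only the 22-column width and the lengths of W(10) to W(13) (8, 7, 10, and 8 characters, which follow from the Top and End values in the text) are taken from the description; the lengths of the first nine clauses are hypothetical, as are the function name and the rule that a tie goes to push-out.

```python
def layout_sequential(w, columns):
    """Fill lines with whole clauses, choosing at each crossing the fix
    with the smaller character count.

    Returns (lines, decisions): lines is a list of lists of 0-based clause
    indexes; decisions is a list of (clause_index, signed_count), where
    +n means push out n characters and -n means pull in n characters.
    """
    lines, decisions = [], []
    i = 0
    while i < len(w):
        c, line = columns, []
        while i < len(w):
            if w[i] <= c:
                line.append(i)
                c -= w[i]
                i += 1
                if c == 0:      # line ends exactly on a clause boundary
                    break
            else:
                top, end = c, w[i] - c
                if line and top <= end:
                    decisions.append((i, +top))   # clause starts the next line
                else:
                    decisions.append((i, -end))   # clause is pulled into this line
                    line.append(i)
                    i += 1
                break
        lines.append(line)
    return lines, decisions


# First nine lengths are hypothetical; the last four follow from the text.
W = [5, 6, 4, 3, 4, 7, 5, 4, 4, 8, 7, 10, 8]
lines, decisions = layout_sequential(W, 22)
```

With this data the decisions are a push-out of 2 characters at W(10) (index 9) followed by a pull-in of 3 characters at W(12) (index 11), which corresponds to selecting (c) and then (f).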

In practice, an algorithm is also conceivable that generates all four combinations of locations to format and processing counts, (e), (f), (g), and (h), creates and displays the formatted sentence for each, and lets the user select the best one. Depending on the formatting method, the system may instead select whichever is easier to format.
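Generating every combination instead of committing to the smaller count can be sketched with a short recursion. As before, the function name and the first nine clause lengths are hypothetical, and the sketch assumes that no clause is longer than one line.

```python
def enumerate_fixes(w, columns, start=0):
    """Return every sequence of (clause_index, signed_count) decisions,
    branching into push-out (+Top) and pull-in (-End) at each crossing.
    Assumes each clause length is at most `columns`."""
    c, i = columns, start
    while i < len(w) and w[i] <= c:
        c -= w[i]
        i += 1
        if c == 0:
            c = columns          # clause boundary at line end: next line
    if i == len(w):
        return [[]]              # no further crossing
    top, end = c, w[i] - c
    results = []
    # Push out: the crossing clause begins the next line.
    for rest in enumerate_fixes(w, columns, i):
        results.append([(i, +top)] + rest)
    # Pull in: the next line begins after the crossing clause.
    for rest in enumerate_fixes(w, columns, i + 1):
        results.append([(i, -end)] + rest)
    return results


# First nine lengths are hypothetical; the last four follow from the text.
W = [5, 6, 4, 3, 4, 7, 5, 4, 4, 8, 7, 10, 8]
candidates = enumerate_fixes(W, 22)
```

For this data the recursion produces four candidates, matching the Top and End values given for W(12) and W(13); each could then be rendered and offered to the user for selection.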

In addition to combining this clause-level line-crossing detection with conventional prohibition processing, special cases that call for formatting can be handled by separate detection rules: subscripts indicating citations of references and the like, as shown in FIG. 5(a); digit strings with many digits, as shown in (b); expressions, as shown in (c); and in particular page crossings within fractions, as shown in (d).
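A separate detection rule of this kind might look like the sketch below for the digit-string case. The regular expression, the function name `crossing_numbers`, and the fixed-width line model are assumptions for illustration; the source does not define the rules themselves.

```python
import re

# Assumed pattern: a run of half- or full-width digits, possibly with
# embedded commas or periods (e.g. 1,234,567).
NUMBER = re.compile(r"[0-9０-９][0-9０-９,，.．]*")


def crossing_numbers(text, columns):
    """Return (digit_string, start_offset) for each digit string that would
    be split across two lines when text is broken every `columns` characters."""
    hits = []
    for m in NUMBER.finditer(text):
        first_line = m.start() // columns
        last_line = (m.end() - 1) // columns
        if first_line != last_line:
            hits.append((m.group(), m.start()))
    return hits
```

Each reported string could then be treated like a crossing clause, with its own Top and End counts, and passed to the same formatting step.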

[Effect of the invention]

According to the present invention, lines are broken only at places that are not unnatural in either pronunciation or meaning. This makes the text easier to read aloud and, because readers often vocalize mentally even when reading silently, easier to read silently and to understand. Moreover, prohibition processing that brings the document closer to what a person would consider natural is performed automatically, without troubling the user.

[Brief explanation of the drawings]

FIG. 1 is an overall configuration diagram showing one embodiment of the present invention, FIG. 2 is a diagram showing an example of an input document, FIG. 3 is a diagram showing the line-crossing detection algorithm and its operation, FIG. 4 is a diagram showing another example of an input document, and FIG. 5 is a diagram showing examples of other kinds of line crossing.

Claims

[Claims]

1. A document formatting device comprising: input means for inputting document data; means for dividing sentences in the document data into clauses; formatting processing means for formatting the document so that the clauses divided by said means do not cross lines; and display means for displaying the document formatted by the formatting processing means.