JPS6022227A - European text processor

Info

Publication number: JPS6022227A
Application number: JP58131455A
Authority: JP
Inventors: Akira Sakurai; 彰桜井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-07-19
Filing date: 1983-07-19
Publication date: 1985-02-04

Abstract

PURPOSE: To improve transmission efficiency by detecting words and the start and end of each sentence in an input text containing only small letters, and converting into capital letters the first letter of each detected proper noun and of each word that immediately follows the start or the end of a sentence. CONSTITUTION: A text containing only small letters, supplied from a character recognizer or the like, is fed to an input terminal 1. The words extracted by a word extracting part 2 are stored in a word memory 4 of a character output part 3 and are also supplied to a proper noun detecting part 5. The part 5 searches a proper noun table 6 for each extracted word to detect proper nouns in the input text, and delivers a detection signal to a character conversion part 9 when a proper noun is detected. At the same time, sentence start and sentence end detecting parts 7 and 8 detect the start and the end of a sentence, respectively, and send detection signals to the part 9. The part 9 converts into capital letters the first letter of each detected proper noun and of each word extracted immediately after the start or the end of a sentence, and the result is delivered from an output terminal 10.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a Roman text processing device that automatically converts text, such as English text consisting only of uppercase or only of lowercase letters, into mixed-case text.

[Prior art]

When a character recognition device recognizes text character by character, it is extremely difficult to distinguish the uppercase and lowercase forms of letters such as O, P, and S, and the probability of misrecognition is high. Conventionally, therefore, misrecognized characters are corrected in a post-processing step that follows character-by-character recognition and makes use of context and similar cues. The problem is that this makes the character recognition device expensive, yet still does not sufficiently reduce uppercase/lowercase recognition errors.

If a means were available to automatically convert text, such as English text consisting only of uppercase or only of lowercase letters, into mixed-case text, a character recognition device could be configured to recognize every character as uppercase only or lowercase only. This could lower the cost of the device and reduce uppercase/lowercase recognition errors. Likewise, in a system that transmits English text or the like, the sender could transmit text consisting only of uppercase or only of lowercase letters and the receiver could convert it to mixed-case text, which could improve transmission efficiency.

[Purpose]

The present invention has been made in view of these points. Its purpose is to provide a Roman text processing device that automatically converts text, such as English text consisting only of uppercase or only of lowercase letters, into mixed-case text.

[Embodiments]

An embodiment of the present invention will be described with reference to FIG. 1.

Text consisting only of lowercase (or only of uppercase) letters is supplied to input terminal 1 from a character recognition device, a telex, a computer, or the like. The word extraction unit 2 extracts words from the input text and outputs each extracted word as a string of lowercase character codes. Each extracted word is stored in the word memory 4 of the character output unit 3 and is also supplied to the proper noun detection unit 5.

Reference numeral 6 denotes a proper noun table in which proper nouns are stored. The proper noun detection unit 5 searches the proper noun table 6 for each extracted word to detect proper nouns in the input text, and outputs a detection signal when one is found. Reference numeral 7 denotes a sentence start detection unit that detects the start of a sentence (STX) in the input text and outputs a detection signal when it does so. Reference numeral 8 denotes a sentence end detection unit that detects the end of a sentence in the input text (a period, question mark, exclamation mark, or the like) and outputs a detection signal when it does so.

The detection signals output from the proper noun detection unit 5, the sentence start detection unit 7, and the sentence end detection unit 8 are supplied to the uppercase conversion unit 9 in the text output unit 3. Among the words extracted by the word extraction unit 2 and stored in the word memory 4, the uppercase conversion unit 9 converts to uppercase only the first letter of each of the following: a proper noun detected by the proper noun detection unit 5, a word extracted immediately after the sentence start detection unit 7 detects the start of a sentence, and a word extracted immediately after the sentence end detection unit 8 detects the end of a sentence. The converted words are sent from the output terminal 10 to a printer or the like.
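The signal flow of FIG. 1 can be approximated in software. The following Python snippet is only a minimal sketch under stated assumptions, not the circuit the patent describes: the function name `restore_case`, the contents of `PROPER_NOUNS` (a stand-in for proper noun table 6), and the choice to treat the beginning of the string as the STX signal are all illustrative and do not come from the source.

```python
import re

# Hypothetical stand-in for proper noun table 6; the patent does not list its contents.
PROPER_NOUNS = {"ricoh", "tokyo", "japan"}

# Sentence-end marks named in the description (period, question mark, exclamation mark).
SENTENCE_END = {".", "?", "!"}


def restore_case(text, proper_nouns=PROPER_NOUNS):
    """Convert single-case text to mixed case, loosely following FIG. 1."""
    out = []
    # Assumption: the beginning of the text plays the role of the STX signal (unit 7).
    capitalize_next = True
    # Split the text into letter runs (words) and the separators between them.
    for match in re.finditer(r"([A-Za-z]+)|([^A-Za-z]+)", text):
        word, separator = match.group(1), match.group(2)
        if word is not None:
            word = word.lower()  # word extraction unit 2 emits lowercase codes
            if capitalize_next or word in proper_nouns:  # units 7/8 and unit 5
                word = word[0].upper() + word[1:]  # uppercase conversion unit 9
            capitalize_next = False
            out.append(word)
        else:
            if any(ch in SENTENCE_END for ch in separator):
                capitalize_next = True  # sentence end detection unit 8 fired
            out.append(separator)
    return "".join(out)


print(restore_case("the device was built by ricoh in tokyo. it works! does it?"))
```

Because the sketch folds every word to lowercase first, it handles all-uppercase input the same way as all-lowercase input. Like the scheme in the description, it treats every period as a sentence end, so an abbreviation followed by a period would also cause the next word to be capitalized.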

An example of the conversion is shown in FIG. 2. In this figure, (a) is the input text, which is converted into the output text shown in (b). That is, the first letter of each sentence and the first letter of each proper noun (the circled letters in FIG. 2) are converted to uppercase.

Another embodiment of the present invention will be described with reference to FIG. 3.

In this figure, elements 1 to 10 are the same as the corresponding parts of the preceding embodiment, so their description is omitted and only the remaining parts are described below.

Reference numeral 11 denotes an italic word table in which words to be printed in italic type are registered. The italic detection unit 12 searches the italic word table 11 for each word sent from the output terminal 10 and, if the word is one to be printed in italic type, supplies a detection signal to the CG selector 13. Reference numeral 14 denotes a roman character generator that generates roman character patterns, and 15 denotes an italic character generator that generates italic character patterns.

When no detection signal is supplied from the italic detection unit 12, the CG selector 13 causes the roman character generator 14 to generate the roman character pattern corresponding to each character code of the word sent from the output terminal 10, and outputs it to the print buffer 16. When a detection signal is supplied from the italic detection unit 12, the CG selector 13 instead causes the italic character generator 15 to generate the italic character pattern corresponding to each character code of the word, and outputs it to the print buffer 16. The character pattern data collected in the print buffer 16 is output from the terminal 17 to a plotter or the like.
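The typeface selection of FIG. 3 amounts to a per-word table lookup that picks one of two character generators. The Python sketch below illustrates that idea only. The names `ITALIC_WORDS`, `roman_generator`, `italic_generator`, and `fill_print_buffer` are hypothetical, the table contents are invented for illustration, the generators return a `(typeface, character)` pair instead of real dot patterns, and the case-insensitive lookup is an assumption the source does not state.

```python
# Hypothetical contents for italic word table 11; the patent gives no examples.
ITALIC_WORDS = {"vitro", "vivo"}


def roman_generator(ch):
    """Stand-in for roman character generator 14 (a real one would emit a dot pattern)."""
    return ("roman", ch)


def italic_generator(ch):
    """Stand-in for italic character generator 15."""
    return ("italic", ch)


def fill_print_buffer(words, italic_words=ITALIC_WORDS):
    """Return the patterns that would be collected in print buffer 16."""
    print_buffer = []
    for word in words:
        # Italic detection unit 12: look the word up in the italic word table.
        # Assumption: the lookup ignores case, since words arrive already case-converted.
        detected = word.lower() in italic_words
        # CG selector 13: choose the generator according to the detection signal.
        generator = italic_generator if detected else roman_generator
        print_buffer.extend(generator(ch) for ch in word)
    return print_buffer


for pattern in fill_print_buffer(["In", "vivo"]):
    print(pattern)
```

The selection is made once per word, so every character of a registered word comes from the same generator, which matches the word-level behavior the description attributes to the CG selector.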

Thus, according to this embodiment, specific words can be converted to a different typeface.

[Effect]

As shown in the embodiments above, the present invention makes it possible to automatically convert Roman text consisting only of uppercase or only of lowercase letters into mixed-case text. Therefore, if the Roman text processing device according to the present invention is used, for example, as a post-processing device for a character recognition device, the character recognition device can be configured to recognize characters one by one as uppercase only or lowercase only, and an inexpensive character recognition device that does not produce uppercase/lowercase recognition errors can be realized. Furthermore, if the Roman text processing device according to the present invention is provided at the receiving terminal of a data transfer system, the sending terminal only needs to transfer text consisting only of uppercase or only of lowercase letters, which simplifies the sending terminal and improves transmission efficiency.

[Brief explanation of drawings]

FIG. 1 is a schematic block diagram showing one embodiment of the present invention, FIG. 2 is a diagram showing an example of input text and output text, and FIG. 3 is a schematic block diagram showing another embodiment of the present invention.

Claims

[Claims]

(1) A Roman text processing device that converts Roman input text consisting only of uppercase or only of lowercase letters into mixed-case text, comprising: a word extraction unit that extracts words from the input text; a sentence start detection unit that detects the start of a sentence in the input text; a sentence end detection unit that detects the end of a sentence in the input text; a proper noun table that stores proper nouns; a proper noun detection unit that detects proper nouns by searching the proper noun table for the words extracted by the word extraction unit; and a text output unit that, among the words extracted by the word extraction unit, outputs a word detected as a proper noun by the proper noun detection unit, a word extracted immediately after the sentence start detection unit detects the start of a sentence, or a word extracted immediately after the sentence end detection unit detects the end of a sentence with only its first letter in uppercase and its remaining letters in lowercase, and outputs every other word entirely in lowercase.