JPS63140329A

JPS63140329A - Sentence reading system

Info

Publication number: JPS63140329A
Application number: JP61286025A
Authority: JP
Inventors: Fukami Kamiyama; 神山　ふかみ
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-12-02
Filing date: 1986-12-02
Publication date: 1988-06-11

Abstract

PURPOSE: To read a sentence aloud smoothly without regard to spelling fluctuation in naturalized words (loanwords), by providing a conversion part which converts a naturalized word having fluctuation to a standard form. CONSTITUTION: When a naturalized word exists in text inputted from a sentence input part 11 and is not in the standard form, it is converted to the standard form at the conversion part 121, and a word dictionary 16 is searched. In other words, when the text inputted from the input part 11 contains a naturalized word having fluctuation, a dictionary retrieval part 12 sends it to the conversion part 121. The conversion part 121 converts it to the standard form by a prescribed conversion rule, and the word dictionary 16 is searched. A word identification part 13 selects an optimum word string based on the retrieved result, a reading string storage part 14 generates a reading string, and a voice output part 15 performs voice synthesis and outputs voice. In this way, the sentence can be read aloud without regard to fluctuation of naturalized words.

Description

[Detailed Description of the Invention]

[Summary]

In a text reading method for a Japanese information processing device, the present invention makes word dictionary search and word identification easier for loanwords whose spelling fluctuates in written text. A conversion section is provided that converts a loanword with fluctuation into a standard form; the word dictionary is searched with the loanword converted to the standard form, and the word is identified.

[Industrial application field]

The present invention relates to a Japanese information processing device, and more particularly to a text reading method that takes written text as input, performs text analysis to identify the words in its character string, and outputs the text as speech.

[Prior art and problems to be solved by the invention]

In Japanese information processing, a text reading method is already known that has a word dictionary storing the written forms of words and similar units (character strings making up words, phrases, clauses, etc.) together with their readings, grammar, and so on, and that refers to this dictionary to identify the words in written text and generate a reading sequence.

In this case, the written text may contain katakana loanwords with so-called "fluctuation" in spelling, for example ヴァイオリン alongside バイオリン (violin), or ウィスキー alongside ウイスキー (whiskey). Loanwords are words taken into Japanese mainly from Western languages. How to write original sounds that do not exist in Japanese has no single answer, so the spelling is not consistent from word to word and also varies from writer to writer.

Because Japanese tolerates such varied spellings of loanwords, the early development of word processing technology for loanwords has become an important issue for the text analysis used in machine translation, reading documents and mail aloud, output of database search results, reading machines, and similar applications.

FIG. 3 is a block diagram of a conventional text reading method. In FIG. 3, 31 is a text input section, 32 is a dictionary search section, 33 is a word identification section, 34 is a reading sequence storage section, 35 is a speech output section, and 36 is a word dictionary. The dictionary search section 32 and the word identification section 33 constitute a text analysis section 37.

The text analysis section 37 looks up the text entered from the text input section 31 (for example, a keyboard) in the word dictionary 36 and picks up candidate words, and the word identification section 33 selects the optimal combination of words. The word dictionary 36 stores each word's written form together with its reading, grammar, accent, and so on. From these, the reading sequence storage section 34 builds a reading sequence, and the speech output section 35 performs speech synthesis and outputs speech.

As is clear from the above, because loanword spellings are not fixed, every variation would have to be registered in the word dictionary. If a loanword appears in the text in a spelling that is not registered, word identification fails and the text can no longer be read aloud smoothly. For example, if the dictionary registers "データー", "ウイスキー", and "バリエーション" but the text writes them as "データ", "ウィスキー", and "ヴァリエーション", word identification fails as described.
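
A minimal sketch of the failure mode described above, assuming the conventional dictionary behaves like a plain exact-match table. The `dictionary` contents mirror the three example spellings; the readings, and the assumption that the whiskey pair differs only in small versus full-size イ, are illustrative and not taken from the patent.

```python
# Hypothetical exact-match dictionary: registered spelling -> reading.
dictionary = {
    "データー": "データー",
    "ウイスキー": "ウイスキー",
    "バリエーション": "バリエーション",
}

# Spellings as they appear in the input text (variants of the entries above).
text_spellings = ["データ", "ウィスキー", "ヴァリエーション"]

# Exact-match lookup finds none of the variants, so word identification fails.
misses = [w for w in text_spellings if w not in dictionary]
print(misses)
```

Registering every variant would fix each case individually, but the number of entries grows with every spelling habit the dictionary has to anticipate.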

[Means for solving problems]

The object of the present invention is to provide a text reading method that solves the problems described above. To that end, a standard form for writing katakana loanwords is set in advance, and a conversion section is provided that converts katakana loanwords with fluctuation in the written text into that standard form according to predetermined conversion rules. The word dictionary is then searched with the converted standard form, and the word is identified.

[Example]

FIG. 1 is a block diagram of a text reading method according to the present invention. In FIG. 1, 11 is a text input section, 12 is a dictionary search section, 13 is a word identification section, 121 is a loanword standard form conversion section, 14 is a reading sequence storage section, 15 is a speech output section, and 16 is a word dictionary. The dictionary search section 12, the word identification section 13, and the loanword standard form conversion section 121 constitute a text analysis section 17.

In this configuration, if a sentence entered from the text input section 11 (by keyboard or the like) contains a loanword that is not in the prescribed standard form, the conversion section 121 converts it to the standard form and the word dictionary 16 is searched. For example, suppose the sentence entered at the text input section 11 is "ヴァイオリンの音をコンピューターで" ("the sound of a violin, by computer"). The dictionary search section 12 sends the katakana portions of the sentence to the loanword standard form conversion section 121, which changes the "ヴァ" of "ヴァイオリン" to "バ" and removes the long-vowel marks from "コンピューター" to give "コンピュタ". The word dictionary 16 is then searched and candidate words are picked up. As shown in FIG. 2, the word dictionary 16 stores for each entry the "notation" (written in the standard form that the conversion produces), "reading", "grammar", "accent", and so on.
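
One way to picture the dictionary entries described above is as records keyed by their standard-form notation. This is only a sketch: the field names follow the four items named for FIG. 2, while `WordEntry`, the grammar labels, and the accent values are hypothetical placeholders, since the patent text does not give the contents of FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    notation: str  # written form, stored in the standard form
    reading: str   # katakana reading used to build the reading sequence
    grammar: str   # part-of-speech label (hypothetical tag set)
    accent: int    # accent position (placeholder value, not from the patent)


entries = [
    WordEntry("バイオリン", "バイオリン", "noun", 0),
    WordEntry("コンピュタ", "コンピューター", "noun", 0),
]

# Index by standard-form notation so converted input can be looked up directly.
word_dictionary = {e.notation: e for e in entries}

print(word_dictionary["コンピュタ"].reading)
```

Note that the notation only serves as a lookup key, so it can drop the long-vowel marks while the reading keeps them for speech output.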

The loanword standard form conversion section 121 holds conversion rules for the fluctuating parts of written text. As in the example above, "ヴァ, ヴィ, ヴ, ヴェ, ヴォ" are changed to "バ, ビ, ブ, ベ, ボ", all long-vowel marks are removed, and "ウィ, ウェ, ウォ" are changed to "ウイ, ウエ, ウオ".

This conversion is performed, for example, in software, using IF statements, GOTO statements, and the like.
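
The conversion rules can be sketched as a single left-to-right scan with explicit branches, loosely in the spirit of the IF-statement implementation mentioned above. The function name `to_standard_form` and the lookup tables are assumptions made for illustration, and the third rule (small ィ, ェ, ォ after ウ written full-size) is a reading of the rule text rather than a confirmed detail.

```python
# Hypothetical rule tables for the two-character patterns.
V_ROW = {"ァ": "バ", "ィ": "ビ", "ェ": "ベ", "ォ": "ボ"}  # ヴァ ヴィ ヴェ ヴォ
W_ROW = {"ィ": "イ", "ェ": "エ", "ォ": "オ"}              # ウィ ウェ ウォ


def to_standard_form(word: str) -> str:
    """Convert a katakana loanword to the assumed standard form."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "ー":
            i += 1                        # remove every long-vowel mark
        elif ch == "ヴ" and nxt in V_ROW:
            out.append(V_ROW[nxt])        # ヴァ -> バ, ヴィ -> ビ, ...
            i += 2
        elif ch == "ヴ":
            out.append("ブ")              # bare ヴ -> ブ
            i += 1
        elif ch == "ウ" and nxt in W_ROW:
            out.append("ウ" + W_ROW[nxt])  # ウィ -> ウイ, ...
            i += 2
        else:
            out.append(ch)                # everything else is kept as is
            i += 1
    return "".join(out)


for variant in ["ヴァイオリン", "コンピューター", "ウィスキー", "データ", "データー"]:
    print(variant, "->", to_standard_form(variant))
```

Because the standard form is only a lookup key, two spellings count as the same word whenever they convert to the same string, as "データ" and "データー" do here.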

After the candidate words are picked up, the word identification section 13 selects the optimal word string. The reading sequence storage section 14 generates and stores the reading sequence "バイオリンノオトオコンピューターデ" according to the determined word string, and the speech output section 15 performs speech synthesis based on the reading sequence and outputs speech.
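
The flow from input text to reading sequence can be sketched end to end as follows. This is a simplified illustration under several assumptions: the rules are applied as an ordered replacement table, word identification is replaced by greedy longest-match lookup (the patent does not say how the optimal word string is chosen), and the names `RULES`, `DICTIONARY`, and `read_aloud` are hypothetical.

```python
import re

# Ordered replacement rules: two-character ヴ patterns must come before bare ヴ.
RULES = [
    ("ヴァ", "バ"), ("ヴィ", "ビ"), ("ヴェ", "ベ"), ("ヴォ", "ボ"), ("ヴ", "ブ"),
    ("ウィ", "ウイ"), ("ウェ", "ウエ"), ("ウォ", "ウオ"),
    ("ー", ""),
]
KATAKANA_RUN = re.compile("[ァ-ヴー]+")

# Hypothetical dictionary: standard-form notation -> reading.
DICTIONARY = {
    "バイオリン": "バイオリン",
    "コンピュタ": "コンピューター",
    "の": "ノ",
    "音": "オト",
    "を": "オ",
    "で": "デ",
}


def to_standard_form(word: str) -> str:
    for variant, standard in RULES:
        word = word.replace(variant, standard)
    return word


def read_aloud(text: str) -> str:
    """Return the reading sequence for text, converting katakana runs first."""
    text = KATAKANA_RUN.sub(lambda m: to_standard_form(m.group()), text)
    readings = []
    pos = 0
    while pos < len(text):
        # Greedy longest match stands in for the word identification step.
        for end in range(len(text), pos, -1):
            reading = DICTIONARY.get(text[pos:end])
            if reading is not None:
                readings.append(reading)
                pos = end
                break
        else:
            raise LookupError(f"no dictionary entry for {text[pos:]!r}")
    return "".join(readings)


print(read_aloud("ヴァイオリンの音をコンピューターで"))
```

Only the katakana runs are converted, so the rest of the sentence reaches the dictionary unchanged, and the speech synthesis step that would consume the reading sequence is left out of the sketch.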

[Effect of the invention]

According to the present invention, in a text reading method for a Japanese information processing device that generates a reading from text, smooth word identification and text analysis of Japanese become possible without regard to fluctuation in loanword spelling.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the text reading method according to the present invention, FIG. 2 is a detailed view of the word dictionary in the configuration of FIG. 1, and FIG. 3 is a block diagram of the conventional text reading method.

(Explanation of reference numerals) 11, 31: text input section; 12, 32: dictionary search section; 13, 33: word identification section; 14, 34: reading sequence storage section; 15, 35: speech output section; 16, 36: word dictionary; 121: loanword standard form conversion section.

Claims

[Claims]

1. A text reading method for a Japanese information processing device that takes written text as input, performs text analysis to identify the words in its character string, and outputs speech, characterized in that a standard form for writing katakana loanwords is set in advance, a conversion section is provided that converts katakana loanwords with fluctuation in the written text into the standard form according to predetermined conversion rules, and a word dictionary search is performed on the converted standard form to identify the word.