JPS6184771A - Voice input device - Google Patents
Info
- Publication number
- JPS6184771A (application JP59206238A)
- Authority
- JP
- Japan
- Prior art keywords
- voice
- input
- input device
- character
- confirmation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention]

The present invention relates to a voice input device, and more particularly to a method for checking input results in a voice typewriter.
In a voice input device, particularly a so-called voice typewriter that converts input speech into a character string, it is important to confirm whether the input speech has been converted into the correct character string. Conventional devices employ various excellent schemes: confirming each syllable as it is input, as described in Japanese Examined Utility Model Publication No. Sho 44-552G; reading the input speech against the displayed characters by temporally associating playback with the display, as described in JP-A-54-136134; synthesizing speech from the converted characters by rule-based synthesis and playing it back alternately with the input speech for comparison; and outputting the input speech with its sound changed, for example to an echo, according to character-type designation input information.
In a system that performs kana-kanji conversion, that is, conversion into character strings containing homophones written with different characters, it is more reasonable to check the converted character string against the input speech than to output speech from the converted string by rule-based synthesis. However, a considerable number of people dislike hearing their own voice played back, for example from a tape recorder. It is therefore desirable to play the speech back in a voice in which the personal characteristics of the timbre have been modified.
Furthermore, the optimal form of read-back confirmation differs with the degree of completion of the target text, but no input system with a configuration flexible enough to handle these variations has been realized.
There is also the problem at input time that checking results while inputting interrupts the user's train of thought.
SUMMARY OF THE INVENTION

An object of the present invention is to solve the above problems and to provide input-result confirmation means suited to a flexible, easy-to-use voice input system that meets the user's particular requirements.
To achieve the above object, the present invention first provides means for recording the input speech collectively, in a form that can be reproduced as speech. This recording means records, in association with the speech, signals indicating word or phrase boundaries, either input externally at the same time as the speech or determined and output automatically during recognition or during the post-recognition Japanese-language processing for kana-kanji conversion. The input speech is then recognized according to an externally designated mode, either while it is being input or collectively afterward as directed by the user; after the recognition results undergo kana-kanji conversion and the like, means are provided for recording the converted results with symbols marking word or phrase boundaries attached.
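The recording scheme just described — speech stored in reproducible form, with word/phrase boundary signals recorded in association with it so that the same boundaries can later be attached to the converted text — can be sketched roughly as follows. All names and the frame representation are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class SpeechRecord:
    frames: list = field(default_factory=list)      # analysis frames of the input speech
    boundaries: list = field(default_factory=list)  # frame indices of word/phrase boundaries

    def append_frame(self, frame):
        self.frames.append(frame)

    def mark_boundary(self):
        # called by an external switch pressed by the speaker, or automatically
        # by the recognizer / kana-kanji converter when it finds a boundary
        self.boundaries.append(len(self.frames))

    def segments(self):
        """Yield (start, end) frame ranges, one per word or phrase."""
        start = 0
        for end in self.boundaries:
            yield (start, end)
            start = end
        if start < len(self.frames):
            yield (start, len(self.frames))

rec = SpeechRecord()
for i in range(10):
    rec.append_frame(f"frame{i}")
    if i in (3, 6):          # boundaries after the 4th and 7th frames
        rec.mark_boundary()

print(list(rec.segments()))  # [(0, 4), (4, 7), (7, 10)]
```

Because the boundaries are stored as indices into the same frame sequence that is played back, each segment can later be synthesized and displayed word by word or phrase by phrase.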
For playing back the input speech during read-back confirmation, the system is given: means for reproducing the speech with its timbre changed according to commands the user issues externally; means for changing the playback speed as designated; control means that, according to command, either plays back word by word or phrase by phrase, or plays back the entire input continuously up to a point specified by a separate command; and display aids (a cursor, blinking, color changes, and the like) that make the position of the character or character string corresponding to the word or phrase being played back apparent at a glance.
By providing modes that execute arbitrary combinations of the functions of these means, and configuring the system so that the user can freely select among them, input results can be confirmed in an easy-to-use manner matched to the user's requirements at any given time.
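As a rough illustration, the freely combinable confirmation modes can be modeled as a small set of independent options (timbre change, playback speed, per-phrase versus continuous playback, display aid) bundled into user-selectable presets. The option names and preset values here are hypothetical, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class ConfirmMode:
    alter_timbre: bool = False     # play back with modified voice quality
    speed: float = 1.0             # playback rate relative to the input
    per_phrase: bool = True        # pause after each word/phrase vs. continuous
    highlight: str = "cursor"      # display aid: "cursor", "blink", or "color"

# the user freely selects among preset combinations of the above functions
MODES = {
    "careful": ConfirmMode(alter_timbre=True, speed=0.8, per_phrase=True),
    "skim":    ConfirmMode(alter_timbre=True, speed=1.5, per_phrase=False,
                           highlight="color"),
}

mode = MODES["skim"]
print(mode.per_phrase, mode.speed)  # False 1.5
```

The point of the structure is that each option varies independently, so any combination the user requests is a valid mode rather than a special case.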
An embodiment of the present invention will now be described with reference to FIG. 1.
Input speech 1 is converted into a digital signal by A/D converter 2, analyzed by analysis unit 3, and then recorded in input speech recording unit 4. Any analysis method may be used so long as it is consistent with the recognition method of the recognition unit. Since the analysis results are also used for synthesis during read-back, a method suitable for synthesis is preferable, although the two kinds of processing may of course be performed separately. Using the same analysis for both is desirable for efficiency, but in many cases the sound-source analysis required for the synthesized speech is not used for recognition and therefore must be performed for synthesis only.

Recognition unit 5, upon recognition command signal 7 from control unit 6, requests the input speech (8), recognizes the analyzed input speech, and writes the recognition result into recognition result recording unit 9. Many configurations for recognition unit 5 are publicly known, and since its internal details are not essential to the present invention whichever method is adopted, their description is omitted. Kana-kanji conversion processing unit 10, in response to kana-kanji conversion command signal 11 from control unit 6, issues recognition result request signal 12, takes in the recognition result, converts the input into a sentence containing kanji while determining word or phrase boundaries through processing such as morphological analysis, and writes the output into kana-kanji conversion output recording unit 13; at the same time, it records symbols 14 indicating the positions of the determined word or phrase boundaries at the corresponding predetermined locations in input speech recording unit 4 and kana-kanji conversion output recording unit 13. Known techniques may likewise be used for converting kana character strings into kanji, so no description is given here. The word or phrase boundaries may, of course, also be entered by the speaker at voice input time, in synchronization with the input, using a separate switch or the like.
The processes of inputting the speech, recognizing it, and converting it into kana, kanji, and so on may be executed in parallel, pipeline fashion, or batch fashion for each part. Clearly, the selection between these modes can easily be changed in response to a user command by controlling the output timing of the command signals (7, 11, and so on) from control unit 6.
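A minimal sketch of the two execution orders mentioned above: the three stages (input/analysis, recognition, kana-kanji conversion) can run batch-style, with each stage finishing over all phrases before the next starts, or pipeline-style, with each phrase flowing through all stages in turn. The stage functions are stand-ins for illustration only, not the patent's actual processing.

```python
def analyze(p):   return f"feat({p})"    # stand-in for A/D conversion + analysis
def recognize(f): return f"kana({f})"    # stand-in for the recognition unit
def convert(k):   return f"kanji({k})"   # stand-in for kana-kanji conversion

def batch(phrases):
    feats = [analyze(p) for p in phrases]    # stage 1 over the whole input
    kanas = [recognize(f) for f in feats]    # then stage 2
    return [convert(k) for k in kanas]       # then stage 3

def pipeline(phrases):
    # each phrase passes through all stages before the next is taken up,
    # so recognition of phrase n can overlap (in time) with input of n+1
    return [convert(recognize(analyze(p))) for p in phrases]

assert batch(["a", "b"]) == pipeline(["a", "b"])  # same result, different timing
print(pipeline(["a"]))  # ['kanji(kana(feat(a)))']
```

The two orderings produce identical output; switching between them is purely a matter of when the control unit issues each stage's command, which is why the patent treats the choice as a user-selectable mode.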
Next, the read-back operation will be described. On command from control unit 6, the speech synthesis unit and character display unit 19 each request one word's or one phrase's worth of speech-synthesis and character-display data from input speech recording unit 4 and kana-kanji conversion output recording unit 13, respectively; the speech is synthesized, D/A-converted, and output as read-back speech, and the text is displayed as a mixed kanji-kana sentence. At this point, to change the timbre of the synthesized speech in response to a user command, control unit 6 causes modification signal generation unit 16, which modifies the synthesis parameters, to output coefficients that transform those parameters. For example, when the speech synthesis unit is an LSP synthesizer, well known to those skilled in the art, the voice quality and speaking rate can be changed by multiplying each LSP parameter by a fixed value to shift the formant positions of the synthesized speech, by multiplying the pitch frequency by a fixed value to change the pitch of the voice, or by supplying the synthesis parameters to the synthesis unit at intervals different from the analysis interval. The user enters confirmation/correction information 21 from key input unit 22. When confirmation information is entered, control unit 6 outputs the next word or phrase; when correction information is entered, the corresponding portion of kana-kanji conversion output recording unit 13 is corrected and the process moves on.
When a word or phrase boundary is corrected, the position of the corresponding boundary symbol in input speech recording unit 4 is corrected as well, so that no mismatch arises between the read-back and the character display.
It is also easy to provide a mode in which, at the user's direction, speech playback and character display proceed continuously, regardless of word or phrase boundaries, until an error-correction input 21 is entered; output then stops at the end of the word or phrase being output at that moment and resumes after the correction has been processed. Alternatively, the text can be displayed in advance in units of sentences or paragraphs, with the word or phrase corresponding to the read-back speech currently being output indicated by a cursor, blinking, color display, or the like.
Kanji include many homophones written with different characters, and errors are easier to spot when the characters are displayed within the whole sentence in this way.
Although the above embodiment has been described for Japanese, it goes without saying that an equivalent configuration is possible for other languages in which words with the same pronunciation have different spellings.
As described above, the present invention makes it possible, when text containing homophones is input by voice, to check and correct the text against the input speech with great ease.
FIG. 1 is a block diagram illustrating one embodiment of the present invention.
Claims (1)

[Claims]

1. A voice input device comprising: means for recording input speech; means for recognizing the input speech; means for converting the recognition result into characters or the like; character display means for displaying the converted character string in units of words, phrases, or longer units; means for inputting confirmation and correction information on the content displayed by the character display means; and means for playing back the speech in the input speech recording means divided into word or phrase units.

2. The voice input device according to claim 1, wherein the speech playback means comprises timbre modification means such that the reproduced sound has a timbre different from that of the input speech.

3. The voice input device according to claim 1, wherein the speech playback means is controlled to output sequentially in units of words or phrases each time information is entered from the confirmation and correction information input means, and the character display means has display aid means that makes clearly visible the position of the characters corresponding to the speech being output from the speech playback means.

4. The voice input device according to claim 1, wherein the speech playback means has speed changing means such that the reproduced sound has a speaking rate different from that of the input speech.

5. The voice input device according to claim 1, wherein the input speech recording means has sufficient capacity, and storage means of sufficient capacity is provided to record the output of the character conversion means, so that the time at which speech is input, the time at which the recognition operation is performed, and the time at which confirmation and correction are performed can be separated and processed independently.

6. The voice input device according to claim 1, wherein the confirmation and correction means normally stands in the state in which confirmation information has been entered, and further has a mode in which correction information need be entered only when correction is necessary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59206238A JPS6184771A (en) | 1984-10-03 | 1984-10-03 | Voice input device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59206238A JPS6184771A (en) | 1984-10-03 | 1984-10-03 | Voice input device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS6184771A true JPS6184771A (en) | 1986-04-30 |
JPH0554960B2 JPH0554960B2 (en) | 1993-08-13 |
Family
ID=16520030
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP59206238A Granted JPS6184771A (en) | 1984-10-03 | 1984-10-03 | Voice input device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS6184771A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6358283U (en) * | 1986-10-04 | 1988-04-18 | ||
- JP2003518266A (en) * | 1999-12-20 | 2003-06-03 | Koninklijke Philips Electronics N.V. | Speech reproduction for text editing of speech recognition system |
- JP2004529381A (en) * | 2001-03-29 | 2004-09-24 | Koninklijke Philips Electronics N.V. | Character editing during synchronized playback of recognized speech |
US7392194B2 (en) | 2002-07-05 | 2008-06-24 | Denso Corporation | Voice-controlled navigation device requiring voice or manual user affirmation of recognized destination setting before execution |
US8117034B2 (en) | 2001-03-29 | 2012-02-14 | Nuance Communications Austria Gmbh | Synchronise an audio cursor and a text cursor during editing |
- JP2019532318A (en) * | 2016-09-22 | 2019-11-07 | Zhejiang Geely Holding Group Co., Ltd. | Audio processing method and apparatus |
- 1984-10-03: application JP59206238A filed in Japan; patent JPS6184771A, active (Granted)
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6358283U (en) * | 1986-10-04 | 1988-04-18 | ||
- JP2003518266A (en) * | 1999-12-20 | 2003-06-03 | Koninklijke Philips Electronics N.V. | Speech reproduction for text editing of speech recognition system |
- JP2004529381A (en) * | 2001-03-29 | 2004-09-24 | Koninklijke Philips Electronics N.V. | Character editing during synchronized playback of recognized speech |
US8117034B2 (en) | 2001-03-29 | 2012-02-14 | Nuance Communications Austria Gmbh | Synchronise an audio cursor and a text cursor during editing |
US8380509B2 (en) | 2001-03-29 | 2013-02-19 | Nuance Communications Austria Gmbh | Synchronise an audio cursor and a text cursor during editing |
US8706495B2 (en) | 2001-03-29 | 2014-04-22 | Nuance Communications, Inc. | Synchronise an audio cursor and a text cursor during editing |
US7392194B2 (en) | 2002-07-05 | 2008-06-24 | Denso Corporation | Voice-controlled navigation device requiring voice or manual user affirmation of recognized destination setting before execution |
- JP2019532318A (en) * | 2016-09-22 | 2019-11-07 | Zhejiang Geely Holding Group Co., Ltd. | Audio processing method and apparatus |
US11011170B2 (en) | 2016-09-22 | 2021-05-18 | Zhejiang Geely Holding Group Co., Ltd. | Speech processing method and device |
Also Published As
Publication number | Publication date |
---|---|
JPH0554960B2 (en) | 1993-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3142803B2 (en) | A text-to-speech synthesizer | |
US6778962B1 (en) | Speech synthesis with prosodic model data and accent type | |
US8155958B2 (en) | Speech-to-text system, speech-to-text method, and speech-to-text program | |
JPS6184771A (en) | Voice input device | |
JPH07181992A (en) | Device and method for reading document out | |
JP2580565B2 (en) | Voice information dictionary creation device | |
JP3060276B2 (en) | Speech synthesizer | |
JPH07160289A (en) | Voice recognition method and device | |
JP2612030B2 (en) | Text-to-speech device | |
JPH0527787A (en) | Music reproduction device | |
JPS5991497A (en) | Voice synthesization output unit | |
JPH02251998A (en) | Voice synthesizing device | |
JP3414326B2 (en) | Speech synthesis dictionary registration apparatus and method | |
JP3034554B2 (en) | Japanese text-to-speech apparatus and method | |
JPS613241A (en) | Speech recognition system | |
JPH11259094A (en) | Regular speech synthesis device | |
JP2584236B2 (en) | Rule speech synthesizer | |
JP2570214B2 (en) | Performance information input device | |
JP2547612B2 (en) | Writing system | |
JP2647873B2 (en) | Writing system | |
JPS62113264A (en) | Speech document creating device | |
JP2647872B2 (en) | Writing system | |
JPH113096A (en) | Method and system of speech synthesis | |
JPS58154900A (en) | Sentence voice converter | |
JPS5913634Y2 (en) | language dictionary device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
LAPS | Cancellation because of no payment of annual fees |