JPS5934598A

JPS5934598A - Voice input unit for printing

Info

Publication number: JPS5934598A
Application number: JP57144151A
Authority: JP
Inventors: 道夫倉田
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1982-08-20
Filing date: 1982-08-20
Publication date: 1984-02-24

Abstract

(57) [Summary] This publication contains application data filed before the introduction of electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a voice input device for printing, which converts speech into kana codes, kanji codes, and symbol codes and inputs them into a computer phototypesetting system.

Conventionally, a manuscript for printing (meaning the document the operator refers to during phototypesetting; the same applies hereinafter) is input by an operator working a keyboard or similar device by hand. Data input therefore takes a great deal of labor, requires technical skill, and is physically tiring.

Voice input devices for printing have been proposed to overcome these drawbacks. In conventional voice input devices, however, the recognition rate falls because of changes in the operator's utterances over time or minute differences in where the microphone is worn, and the speed of manuscript input falls with it. Accordingly, an object of the present invention is to provide a voice input device for printing that is free of these drawbacks.

The invention is described below.

The present invention relates to a voice input device for printing. As shown in FIG. 1, it comprises: a parameter extraction device 2 that extracts feature parameters of the speech (VS) input through a microphone 1; a speech recognition device 3 that compares monosyllable feature parameters stored in advance in an internal storage device with those feature parameters and outputs the one with the highest similarity as the corresponding monosyllable code; a comparison/judgment device 4 that compares the input speech feature parameters with the corresponding monosyllable feature parameters in the internal storage device and judges whether or not to learn the feature parameters; and a kanji processing device 5 that has a word processor 51 performing kana-kanji conversion and a storage device 52 storing symbol codes corresponding to kanji codes and kana codes, receives the output codes from the speech recognition device 3, and outputs each of these codes from the storage device 52 to a computer phototypesetting system 6.
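The matching step performed by the speech recognition device 3 can be illustrated as a nearest-template search. The following Python sketch is only an illustration under stated assumptions: the patent says that the stored entry with the highest similarity is output, but it does not define the similarity measure. The negative squared Euclidean distance, the function names `similarity` and `recognize`, the fixed-length feature vectors, and the toy templates are all hypothetical.

```python
from typing import Dict, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: negative squared Euclidean distance (higher is more similar)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(features: Sequence[float],
              templates: Dict[str, Sequence[float]]) -> str:
    """Return the monosyllable code whose stored template is most similar."""
    return max(templates, key=lambda code: similarity(features, templates[code]))


# Hypothetical standard patterns for two monosyllables.
templates = {"ka": [1.0, 0.2, 0.1], "sa": [0.1, 0.9, 0.8]}
result = recognize([0.9, 0.3, 0.0], templates)
```

In the device itself the feature parameters arrive as a time series, so a real matcher would compare sequences of frames rather than single vectors; the sketch collapses that to one vector for brevity.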

As shown in FIG. 2, the parameter extraction device 2 comprises: a preprocessing circuit 21 that amplifies and preprocesses the voice signal VS; a bandpass filter group 22 that divides the preprocessed voice signal VSA into bands with mutually different center frequencies; a channel selection circuit 23 that sequentially selects the divided band signals under a control signal CS1; and a sampling circuit 24 that samples the selected channel signal CH at predetermined timing under a control signal CS2.

In this configuration, the voice signal VS from the microphone 1 is amplified and preprocessed by the preprocessing circuit 21 in the parameter extraction device 2 and supplied to the bandpass filter group 22, whose filters have mutually different center frequencies. The voice signals divided into the individual bands are selected one after another by the channel selection circuit 23 and passed to the sampling circuit 24 in the following stage, where they are sampled.

The channel selection circuit 23 and the sampling circuit 24 are timing-controlled by the control signals CS1 and CS2 from the speech recognition device 3, and the voice signal divided into the individual bands is output to the speech recognition device 3 in time series as feature parameters.
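The scan described above, in which the band outputs are selected one after another and sampled at controlled timing, can be sketched in Python as follows. This is a simplified illustration, not the circuit itself: it assumes the band signals have already been filtered and digitized, it treats all channels of a frame as read at the same instant, and the names `extract_frames`, `band_signals`, and `step` are hypothetical.

```python
from typing import List, Sequence


def extract_frames(band_signals: Sequence[Sequence[float]],
                   step: int) -> List[List[float]]:
    """Scan the band outputs channel by channel at every `step`-th sample.

    band_signals[c][t] is the (assumed already filtered) output of band c at
    sample index t. Each returned frame holds one value per band.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    length = min(len(sig) for sig in band_signals)
    frames = []
    for t in range(0, length, step):                # sampling timing
        frame = []
        for channel in range(len(band_signals)):    # sequential channel selection
            frame.append(band_signals[channel][t])
        frames.append(frame)
    return frames


# Two toy bands with six samples each, sampled at every second index.
bands = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
frames = extract_frames(bands, step=2)
```

Each frame is one feature vector; the list of frames is the time series that the recognizer receives.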

The speech recognition device 3, for its part, performs speech recognition according to the mode specified by the operator and outputs monosyllable codes to the word processor 51. The word processor 51 has a kana-kanji conversion function: it converts the necessary portions of the kana-input manuscript for printing into kanji and stores the input manuscript, together with its layout information, in the storage device 52. The storage device 52 stores the manuscript information page by page, and its contents are output from the storage device 52 to the computer phototypesetting system 6. The speech recognition device 3 also outputs the control signals CS1 and CS2, which give the sampling timing to the parameter extraction device 2, and it changes the sampling period according to the specified recognition mode in order to raise the recognition rate. For example, feature parameters are input at sampling intervals of about 2 milliseconds in monosyllable recognition mode and about 10 milliseconds in word recognition mode.
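The effect of the mode-dependent sampling period on the amount of feature data can be worked through with a short Python sketch. The two periods come from the text; the mode names, the table `SAMPLING_PERIOD_MS`, and the function `frames_for` are hypothetical, and the periods are treated as exact although the text gives them only as approximate.

```python
# Sampling periods per recognition mode, in milliseconds (approximate in the text).
SAMPLING_PERIOD_MS = {
    "monosyllable": 2,   # about 2 ms in monosyllable recognition mode
    "word": 10,          # about 10 ms in word recognition mode
}


def frames_for(duration_ms: int, mode: str) -> int:
    """Number of feature frames captured over `duration_ms` in the given mode."""
    period = SAMPLING_PERIOD_MS[mode]
    return duration_ms // period


mono_frames = frames_for(200, "monosyllable")
word_frames = frames_for(200, "word")
```

For the same 200 ms of speech, the monosyllable mode captures five times as many frames as the word mode, which is the finer time resolution that the shorter period buys.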

When voice input is started, the monosyllable feature parameters (standard patterns) that each operator has registered in advance are first loaded from an external storage device into the internal storage device. Because of the conditions at registration time or minute differences in where the microphone is worn, however, sufficient performance is not obtained for some kinds of pronunciation. It was found that updating the standard patterns at the point where voice input starts is an effective way to solve this problem. Updating the standard patterns, however, requires the operator to make an accurate judgment at every update, which is a heavy burden and is difficult in practice. In this invention, therefore, the device automatically distinguishes between a state in which the standard patterns need to be updated (learning mode) and a state in which the standard patterns have been updated sufficiently and the feature parameters in the internal storage device are suitable for judging input speech (non-learning mode), and it can notify the operator of the result.

FIGS. 3(A) and 3(B) show how the standard patterns in the internal storage device adapt through updating in learning mode. The vertical axis is the normalized evaluation value and the horizontal axis is the number of updates.

In these figures, N0 indicates the stable level of the evaluation value. FIG. 3(A) shows the updating of the standard pattern for the pronunciation "ka", and FIG. 3(B) shows the same for a second pronunciation (its notation is garbled in the source). As the figures make clear, updating has a large effect for some sounds and a small effect for others. The tendency varies between individuals, but in general updating the standard pattern is effective for sounds whose consonant part is "p", "t", "k", "m", or "n".

The device switches automatically from learning mode to non-learning mode on the basis of the stability of the evaluation value. Because the stable level of the evaluation value differs from sound to sound, the stable level for each sound is stored in the storage device in the speech recognition device 3, and the judgment is made by software as described later. The comparison/judgment device 4 compares the feature parameters of the input speech with the corresponding monosyllable feature parameters to calculate an evaluation value, and makes the judgment using the respective values held in the storage device in the speech recognition device 3.
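The comparison of an evaluation value against a per-sound stable level can be sketched in Python as below. The patent does not give the formula for the evaluation value or say whether larger or smaller values are better, so everything specific here is an assumption: the evaluation value is taken as a mean absolute difference (smaller is better), the stable level is treated as an upper threshold, and the table `STABLE_LEVEL` with its numbers is hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical per-syllable stable levels held by the recognizer.
STABLE_LEVEL: Dict[str, float] = {"ka": 0.10, "a": 0.05}


def evaluation_value(features: Sequence[float], template: Sequence[float]) -> float:
    """Assumed evaluation value: mean absolute difference (smaller is better)."""
    if len(features) != len(template):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(features, template)) / len(template)


def reached_stable_level(code: str, value: float) -> bool:
    """True when the evaluation value is at or below the syllable's stable level."""
    return value <= STABLE_LEVEL[code]


value = evaluation_value([0.5, 0.5], [0.5, 0.4])
```

Keeping one threshold per syllable reflects the statement that the stable level differs from sound to sound.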

Next, the flowcharts of the software used to judge the evaluation value are shown in FIG. 4 and described.

First, as shown in FIG. 4(A), initial values are set at initialization. For every monosyllable, a mask value is set that determines the learning or non-learning state from the number of times the stable level has been reached (step S11), and the predetermined stable level of the evaluation value is set (step S12). Learning mode is then set for every monosyllable (step S13).
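The three initialization actions (set a mask value, set the stable level, set learning mode, for every monosyllable) can be sketched in Python as follows. The data layout is an assumption made for illustration: the class `SyllableState`, the function `initialize`, the default mask value of 3, and the interpretation of the mask as a count of remaining stable hits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SyllableState:
    mask: int            # assumed: remaining stable hits before learning stops
    stable_level: float  # evaluation-value stable level for this syllable
    learning: bool       # True = learning mode, False = non-learning mode


def initialize(codes: Iterable[str],
               stable_levels: Dict[str, float],
               initial_mask: int = 3) -> Dict[str, SyllableState]:
    """Set the mask value, the stable level, and learning mode for each syllable."""
    return {
        code: SyllableState(mask=initial_mask,
                            stable_level=stable_levels[code],
                            learning=True)
        for code in codes
    }


states = initialize(["ka", "a"], {"ka": 0.10, "a": 0.05})
```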

FIG. 4(B) is the routine that makes the learning/non-learning judgment and the evaluation-value judgment during operation. It first determines whether the input speech is in learning mode or non-learning mode (step S21). In learning mode, the speech recognition device 3 calculates the evaluation value of the feature parameters of the input speech and compares it with the evaluation-value stable level of the corresponding monosyllable (steps S22, S23). If the stable level has not been reached, the corresponding monosyllable feature parameters are updated with the feature parameters of the input speech (steps S24, S25). If the stable level has been reached, the mask value is updated and it is determined whether the mask value has become "0" (steps S26, S27). When the mask value has become "0", the corresponding monosyllable feature parameters are judged to have reached a sufficiently stable level, and non-learning mode is set (step S28).
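The run-time routine of FIG. 4(B) can be sketched in Python as below. The control flow follows the text; the details it leaves open are filled with labeled assumptions: the evaluation value is a mean absolute difference (smaller is better), the template update is a simple blend controlled by a hypothetical `rate`, and the mask value is treated as a countdown that ends learning when it reaches zero. The names `Syllable`, `evaluate`, and `process` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Syllable:
    template: List[float]   # stored monosyllable feature parameters
    stable_level: float     # per-syllable stable level of the evaluation value
    mask: int               # assumed countdown of stable hits
    learning: bool = True   # True = learning mode, False = non-learning mode


def evaluate(features: Sequence[float], template: Sequence[float]) -> float:
    """Assumed evaluation value: mean absolute difference (smaller is better)."""
    return sum(abs(x - y) for x, y in zip(features, template)) / len(template)


def process(s: Syllable, features: Sequence[float], rate: float = 0.5) -> None:
    """One pass of the learning / non-learning judgment for a recognized syllable."""
    if not s.learning:                      # non-learning mode: nothing to do
        return
    value = evaluate(features, s.template)
    if value > s.stable_level:              # stable level not reached: update template
        s.template = [(1 - rate) * t + rate * f
                      for t, f in zip(s.template, features)]
        return
    s.mask -= 1                             # stable level reached: update mask value
    if s.mask <= 0:
        s.learning = False                  # switch to non-learning mode


ka = Syllable(template=[0.0, 0.0], stable_level=0.1, mask=2)
for _ in range(6):
    process(ka, [1.0, 1.0])
```

With these toy values the template is updated on the first four utterances, and the next two utterances each count as a stable hit, which exhausts the mask and moves the syllable to non-learning mode.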

As described above, the voice input device of this invention judges, at the start of voice input, whether or not the stable level has been reached, and if it has not, updates the corresponding pre-registered monosyllable feature parameters. The speech recognition rate can therefore be improved markedly.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention; FIG. 2 is a block diagram showing part of it in detail; FIGS. 3(A) and 3(B) each show how a standard pattern adapts through updating; and FIGS. 4(A) and 4(B) are flowcharts of the software used for the evaluation-value judgment of the present invention.

1: microphone, 2: parameter extraction device, 3: speech recognition device, 4: comparison/judgment device, 5: kanji processing device, 6: computer phototypesetting system, 21: preprocessing circuit, 24: sampling circuit, 51: word processor, 52: storage device.

Applicant's agent: 安形雄三

Claims

[Claims]
1. A voice input device for printing, comprising:
a) a parameter extraction device that extracts feature parameters of input speech;
b) a speech recognition device that compares monosyllable feature parameters stored in advance in an internal storage device with the feature parameters and outputs the one with the highest similarity as the corresponding monosyllable code;
c) a comparison/judgment device that compares the input speech feature parameters with the corresponding monosyllable feature parameters in the internal storage device and judges whether or not to learn the feature parameters; and
d) a kanji processing device that has a word processor for kana-kanji conversion of the monosyllable codes and a storage device for storing symbol codes corresponding to kanji codes and kana codes, receives the output codes from the speech recognition device, and outputs each of the codes from the storage device to a computer phototypesetting system,
characterized in that the feature parameters are learned when voice input starts and the monosyllable feature parameters in the internal storage device are thereby improved.