JP2001134290A

JP2001134290A - Speech recognition system, method and recording medium

Info

Publication number: JP2001134290A
Application number: JP31693899A
Authority: JP
Inventors: Hiroki Tanioka; 広樹谷岡
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1999-11-08
Filing date: 1999-11-08
Publication date: 2001-05-18
Anticipated expiration: 2019-11-08
Also published as: JP3438869B2

Abstract

(57) [Abstract] [Problem] To facilitate document creation by speech recognition.
[Solution] When a speaker inputs speech through a microphone 1A, the speech is passed from an interface 1B to a speech recognition unit 2A as digital speech information. The speech recognition unit 2A converts the received speech information into text consisting of one or more characters. An intonation score assigning unit 2B assigns the amounts of change in speed, volume, and frequency of the speech information to each converted character as an intonation score. The intonation scores are temporarily stored in a storage device 3. Based on the intonation scores stored in the storage device 3, a score marker creation unit 4A assigns display attributes, such as color attributes and font attributes, to each character in the text converted by the speech recognition unit 2A. A display data creation unit 4D creates display data in which display attributes are assigned to each character in the text in accordance with display conditions stored in a display condition storage unit 4C, and causes a display device 5 to display the data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a speech recognition system, a speech recognition method, and a computer-readable recording medium on which a program for realizing such a system is recorded, and more particularly to a system that attaches attributes to text generated as a result of speech recognition and outputs the text.

[0002]

[Description of the Related Art] Speech recognition systems that recognize speech input through a microphone, convert it into text, and output the text have long been known. Among such systems, Japanese Patent Application Laid-Open No. H11-119791 discloses one that analyzes changes in the phoneme pattern or in the speech power of the input speech so as to recognize even the speaker's emotion.

[0003]

[Problems to be Solved by the Invention] However, the above system assumes that, when the speaker utters a single word, it recognizes that word and at the same time recognizes an emotion level (see paragraphs 0071 to 0073 and 0077 to 0080 of the above publication). In other words, it does not consider how the emotion level should be reflected in the text when the speaker utters a sentence made up of multiple words and the speech is recognized and output as text consisting of a continuous character string. Consequently, applying the above system to document creation does not necessarily make document creation easier.

[0004] An object of the present invention is to provide a speech recognition system and method that facilitate document creation by speech recognition, and a computer-readable recording medium on which a program therefor is recorded.

[0005]

[Means for Solving the Problems] To achieve the above object, a speech recognition system according to a first aspect of the present invention comprises: speech recognition means for converting externally input speech information into a character string; change information assigning means for generating, on the basis of the speech information, change information indicating changes in the speech represented by the speech information, and assigning the generated change information to each predetermined unit in the character string converted by the speech recognition means; and attribute adding means for adding a display attribute to each predetermined unit in the character string in accordance with the change information assigned by the change information assigning means.
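The three means of the first aspect form a simple pipeline. The Python sketch below is illustrative only: the `Unit` type, the toy `(character, loudness)` input, the 0.2 threshold, and the 1.5 font scale are hypothetical stand-ins, not part of the disclosure, which leaves the recognizer and the exact form of the change information open.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Unit:
    """One predetermined unit of the recognized string (here a single character)."""
    text: str
    change: float = 0.0                                     # change information
    attrs: Dict[str, object] = field(default_factory=dict)  # display attributes


# Hypothetical toy input: (character, loudness) pairs, so the example runs
# without a real recognizer or audio front end.
Speech = List[Tuple[str, float]]


def run_pipeline(speech: Speech,
                 recognize: Callable[[Speech], List[Unit]],
                 assign_change: Callable[[Speech, List[Unit]], None],
                 add_attributes: Callable[[List[Unit]], None]) -> List[Unit]:
    units = recognize(speech)       # speech recognition means
    assign_change(speech, units)    # change information assigning means
    add_attributes(units)           # attribute adding means
    return units


def toy_recognize(speech: Speech) -> List[Unit]:
    return [Unit(ch) for ch, _ in speech]


def toy_assign_change(speech: Speech, units: List[Unit]) -> None:
    levels = [level for _, level in speech]
    mean = sum(levels) / len(levels)
    for unit, level in zip(units, levels):
        unit.change = level - mean  # deviation from the utterance's mean loudness


def toy_add_attributes(units: List[Unit], threshold: float = 0.2) -> None:
    for unit in units:
        if unit.change > threshold:
            unit.attrs["font_scale"] = 1.5  # enlarge units spoken with stress


units = run_pipeline([("a", 0.5), ("b", 1.0), ("c", 0.5)],
                     toy_recognize, toy_assign_change, toy_add_attributes)
```

Passing the three means as callables keeps the sketch neutral about how each one is actually implemented; only the louder middle unit receives a display attribute here.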

[0006] In the above speech recognition system, a display attribute corresponding to changes in the speech is automatically added to each predetermined unit in the character string converted by the speech recognition means. For example, where the speaker stresses part of an utterance through intonation, a display attribute that emphasizes the corresponding part of the character string can be added. The speech recognition system therefore makes document creation by speech recognition easier.

[0007] In the above speech recognition system, the change information assigning means may, for example, generate a score based on at least one of the speed, volume, and frequency of the speech represented by the speech information, and assign the generated score to each predetermined unit in the character string. In this case, the attribute adding means may add a display attribute to each predetermined unit in the character string on the basis of the score generated by the change information assigning means.
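One possible reading of a score "based on at least one of" speed, volume, and frequency is a weighted sum of per-unit relative changes. The sketch below assumes each feature has already been reduced to one value per unit and that change is measured against the utterance mean; the function names and weights are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def relative_change(values: Sequence[float]) -> List[float]:
    """Relative deviation of each unit's value from the utterance mean."""
    m = mean(values)
    if m == 0:
        return [0.0] * len(values)
    return [(v - m) / m for v in values]


def intonation_scores(features: Dict[str, Sequence[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted sum of relative changes over the features present in both
    dicts, so any non-empty subset of speed, volume, and pitch can be used."""
    used = [name for name in features if name in weights]
    if not used:
        raise ValueError("need at least one of speed, volume or pitch")
    n = len(features[used[0]])
    scores = [0.0] * n
    for name in used:
        changes = relative_change(features[name])
        if len(changes) != n:
            raise ValueError("every feature needs one value per unit")
        for i, c in enumerate(changes):
            scores[i] += weights[name] * c  # signed: louder/higher raises the score
    return scores
```

For example, `intonation_scores({"volume": [1.0, 1.0, 2.0]}, {"volume": 1.0})` gives approximately `[-0.25, -0.25, 0.5]`; leaving a feature out of `weights` simply excludes it.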

[0008] Further, the attribute adding means may add a display attribute to a predetermined unit in the character string when the score generated for that unit by the change information assigning means exceeds a predetermined threshold.

[0009] The speech recognition system may further include syntax analysis means for parsing the character string recognized by the speech recognition means. In this case, the attribute adding means may add a display attribute to each predetermined unit in the character string further in accordance with the result of parsing by the syntax analysis means.

[0010] The speech recognition system may further include learning means for learning the tendency of the change information that the change information assigning means assigns to specific ones of the predetermined units in the character string. In this case, the attribute adding means may add a display attribute to each predetermined unit in the character string further in accordance with the learning results accumulated in the learning means.

[0011] Providing the syntax analysis means or the learning means as described above prevents display attributes from being added at unnatural positions in the character string, so that no predetermined unit is unnaturally emphasized in the finally generated character string.

[0012] In the above speech recognition system, the display attribute may be, for example, a color attribute or a font attribute.

[0013] In the above speech recognition system, the predetermined unit to which a display attribute is added may be, for example, a character, a word, a phrase, a clause, or a sentence.

[0014] To achieve the above object, a speech recognition method according to a second aspect of the present invention includes: a speech recognition step of converting externally input speech information into a character string; a change information assigning step of generating, on the basis of the speech information, change information indicating changes in the speech represented by the speech information, and assigning the generated change information to each predetermined unit in the character string converted in the speech recognition step; and an attribute adding step of adding a display attribute to each predetermined unit in the character string in accordance with the change information assigned in the change information assigning step.

[0015] To achieve the above object, a computer-readable recording medium according to a third aspect of the present invention records a program for causing a computer device to function as: speech recognition means for converting externally input speech information into a character string; change information assigning means for generating, on the basis of the speech information, change information indicating changes in the speech represented by the speech information, and assigning the generated change information to each predetermined unit in the character string converted by the speech recognition means; and attribute adding means for adding a display attribute to each predetermined unit in the character string in accordance with the change information assigned by the change information assigning means.

[0016]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the accompanying drawings.

[0017] FIG. 1 is a block diagram showing the configuration of the speech recognition system according to this embodiment. As shown in the figure, the speech recognition system includes a speech information input device 1, a speech recognition device 2, a storage device 3, a display data creation device 4, and a display device 5.

[0018] The speech information input device 1 is a user interface device for inputting the user's voice (speech). It consists of a microphone 1A that converts speech uttered by the user into an analog electrical signal, and an interface 1B that converts the analog electrical signal output from the microphone 1A into digital speech information at a predetermined sampling interval.

[0019] The speech recognition device 2 generates, from the digital speech information converted by the interface 1B, a character string (text) of one or more characters in which each character carries an intonation score (described later). It has a speech recognition unit 2A and an intonation score assigning unit 2B.

[0020] The speech recognition unit 2A converts the speech information input from the interface 1B into text by applying speech recognition technology. Any of the various techniques conventionally known in the fields of speech recognition and kana-kanji conversion can be used as the technology that the speech recognition unit 2A applies to convert speech information into text.

[0021] The intonation score assigning unit 2B creates an intonation score indicating the degree of the time stamp (speed), volume, and pitch (frequency) of the speech represented by the speech information input from the interface 1B, and assigns it to each character in the text converted by the speech recognition unit 2A.

[0022] The storage device 3 temporarily stores the text in which the intonation score assigning unit 2B has assigned an intonation score to each character.

[0023] The display data creation device 4 creates, from the text with intonation scores stored in the storage device 3, display data to be displayed on the display device 5. It has a score marker creation unit 4A, a display condition setting unit 4B, a display condition storage unit 4C, and a display data creation unit 4D.

[0024] The score marker creation unit 4A reads the information stored in the storage device 3 and assigns display attributes, such as color attributes and font attributes, to each character on the basis of the intonation score that the intonation score assigning unit 2B assigned to that character in the text.

[0025] The display condition setting unit 4B sets, in response to user operations, display conditions for displaying text on the display device 5. The display condition storage unit 4C stores the display conditions set through the display condition setting unit 4B. The display conditions include the number of lines on the display device 5, the number of characters per line, and display attributes such as the default character color, font type, and font size.

[0026] The display data creation unit 4D creates display data to be displayed on the display device 5 from the text in which the score marker creation unit 4A has assigned display attributes to each character, in accordance with the display conditions stored in the display condition storage unit 4C.
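As an illustration of how the display conditions and the per-character display attributes might combine, the sketch below renders characters as HTML spans, falling back to the default color, font, and size from the display conditions and wrapping at a characters-per-line limit. HTML output, the `DisplayConditions` field names, and the `font_scale` attribute are assumptions; the disclosure does not specify a display format.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Tuple


@dataclass
class DisplayConditions:
    """Hypothetical stand-in for the contents of display condition storage unit 4C."""
    chars_per_line: int = 40
    color: str = "black"
    font_family: str = "serif"
    font_size_pt: float = 12.0


def make_display_data(chars: List[Tuple[str, Dict[str, object]]],
                      cond: DisplayConditions) -> str:
    """Turn (character, display attributes) pairs into HTML lines."""
    lines: List[str] = []
    current: List[str] = []
    for ch, attrs in chars:
        # Per-character attributes override the default display conditions.
        color = escape(str(attrs.get("color", cond.color)))
        family = escape(str(attrs.get("font_family", cond.font_family)))
        size = cond.font_size_pt * float(attrs.get("font_scale", 1.0))
        current.append(f'<span style="color:{color};font-size:{size:g}pt;'
                       f'font-family:{family}">{escape(ch)}</span>')
        if len(current) == cond.chars_per_line:  # wrap at the line width
            lines.append("".join(current))
            current = []
    if current:
        lines.append("".join(current))
    return "<br>\n".join(lines)
```

With `DisplayConditions(chars_per_line=2)`, three characters produce two lines, and a character carrying `{"font_scale": 1.5}` is rendered at 18 pt against the 12 pt default.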

[0027] The display device 5 is a user interface device composed of a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or the like, which displays the display data created by the display data creation unit 4D to present it to the user.

[0028] This speech recognition system is realized by, for example, a general-purpose computer such as a personal computer. The speech recognition device 2 and the display data creation device 4 are realized by a processor, a memory that stores its programs, and a memory that stores data, and may in practice be realized on the same computer system. The programs for realizing the devices 2 and 4 may be recorded on a computer-readable recording medium such as a CD-ROM and distributed, or may be delivered over a network. The storage device 3 is realized by an area allocated in, for example, an external storage device of the general-purpose computer.

[0029] The operation of the speech recognition system according to this embodiment will now be described with reference to the flowchart of FIG. 2.

[0030] First, the user of the system performs a predetermined operation to instruct the start of speech input (step S11). The user then speaks, into the microphone 1A, the speech corresponding to the text to be generated. The microphone 1A converts this speech into an analog electrical signal, and the interface 1B converts the signal into digital speech information and passes it to the speech recognition unit 2A (step S12).

[0031] Next, the speech recognition unit 2A applies speech recognition technology to convert the speech information passed from the interface 1B into text consisting of one or more characters. The speech recognition unit 2A passes the converted text to the intonation score assigning unit 2B, together with information indicating the time stamps (speed), volume, and pitch (frequency) of the speech extracted from the speech information (step S13).

[0032] Next, the intonation score assigning unit 2B assigns the amounts of change in the time stamps, volume, and pitch to each character in the text as an intonation score, and temporarily stores the text, with an intonation score assigned to each character, in the storage device 3 (step S14). Details of this process are described later.

[0033] Next, on the basis of the intonation scores stored in the storage device 3, the score marker creation unit 4A assigns display attributes such as color attributes and font attributes to each character in the text converted by the speech recognition unit 2A, and passes the text with display attributes to the display data creation unit 4D (step S15). Details of this process are described later.

[0034] Next, in accordance with the display conditions stored in the display condition storage unit 4C, the display data creation unit 4D creates display data in which display attributes are assigned to each character in the text passed from the score marker creation unit 4A, and passes the created display data to the display device 5 (step S16).

[0035] The display device 5 then displays the display data passed from the display data creation unit 4D (step S17), and the processing of this flowchart ends.

[0036] FIG. 3 is a flowchart showing in detail the process executed by the intonation score assigning unit 2B in step S14 of FIG. 2. The intonation score assigning unit 2B calculates, from the time stamps passed from the speech recognition unit 2A, the amount of change in the speed of the speech that the user input through the microphone 1A. The calculated amount of change in speech speed is temporarily stored as internal data D1 (step S21).
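Step S21 does not fix how the speed change is derived from the time stamps. A minimal sketch, assuming the recognizer supplies a hypothetical (start, end) time in seconds for each character, takes the per-character rate (characters per second) and reports its relative deviation from the utterance's mean rate; negative values mean slower-than-average speech.

```python
from typing import List, Tuple


def speed_changes(timestamps: List[Tuple[float, float]]) -> List[float]:
    """timestamps: assumed (start_s, end_s) for each recognized character.
    Returns each character's relative deviation of speaking rate from the
    utterance's mean rate."""
    rates = []
    for start, end in timestamps:
        duration = end - start
        if duration <= 0:
            raise ValueError("each character needs a positive duration")
        rates.append(1.0 / duration)  # characters per second
    mean_rate = sum(rates) / len(rates)
    return [(r - mean_rate) / mean_rate for r in rates]
```

For durations of 0.5 s, 0.5 s, and 1.0 s the rates are 2, 2, and 1 characters per second, giving changes of about +0.2, +0.2, and -0.4; in the patent's terms this list would be held as part of internal data D1.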

[0037] Next, the intonation score assigning unit 2B calculates, from the volume and pitch passed from the speech recognition unit 2A, the relative amount of change in the scalar data values of the speech that the user input through the microphone 1A. The scalar data values calculated here are also temporarily stored as internal data D1 (step S22).

[0038] The intonation score assigning unit 2B then determines, for the amount of change in speech speed and the relative amount of change in the scalar data values temporarily stored as internal data D1, which character in the text passed from the speech recognition unit 2A each corresponds to, that is, how they pair with the characters. The intonation score assigning unit 2B then assigns these paired amounts of change to each character in the text as an intonation score (step S23), and the processing of this flowchart ends.
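Step S23 pairs each change value with the character it belongs to. A minimal sketch, assuming change values arrive as time-stamped frames and each character has a (start, end) span from the recognizer, averages the summed feature changes of the frames that fall inside each span; a character with no frames gets a score of 0. The data layout and the simple sum across features are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_to_characters(chars: List[Tuple[str, float, float]],
                         frames: List[Tuple[float, Dict[str, float]]]
                         ) -> List[Tuple[str, float]]:
    """chars: (character, start_s, end_s); frames: (time_s, {feature: change}).
    Returns (character, intonation score) pairs."""
    result = []
    for ch, start, end in chars:
        # Sum the feature changes of each frame inside this character's span.
        inside = [sum(vals.values()) for t, vals in frames if start <= t < end]
        score = sum(inside) / len(inside) if inside else 0.0
        result.append((ch, score))
    return result
```

Using a half-open span `[start, end)` ensures a frame lying exactly on a boundary is counted for only one character.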

[0039] FIG. 4 is a flowchart showing in detail the process executed by the score marker creation unit 4A in step S15 of FIG. 2. This process is performed for each character in the text with intonation scores stored in the storage device 3; that is, the score marker creation unit 4A repeats this process as many times as there are characters.

[0040] First, the score marker creation unit 4A determines whether the intonation score attached to the character is below a preset threshold D2 (step S31). If it determines that the score is below the threshold (step S31; YES), the score marker creation unit 4A ends the processing of this flowchart for that character.

[0041] If, on the other hand, it determines that the score is not below the threshold, that is, the intonation score is equal to or greater than the threshold (step S31; NO), the score marker creation unit 4A determines, according to the corresponding intonation score, whether to map a color attribute to the character (step S32). If it determines that a color attribute should be mapped (step S32; YES), the score marker creation unit 4A maps a color attribute corresponding to the intonation score to the character (step S33).

[0042] If it determines that a color attribute should not be mapped (step S32; NO), or once the color attribute has been mapped, the score marker creation unit 4A determines, according to the corresponding intonation score, whether a font attribute (for example, font size) should be mapped to the character (step S34).

[0043] If it determines that a font attribute should not be mapped (step S34; NO), the score marker creation unit 4A ends the processing of this flowchart for that character. If, on the other hand, it determines that a font attribute should be mapped (step S34; YES), the score marker creation unit 4A maps a font attribute corresponding to the intonation score to the character (step S35), and the processing of this flowchart for that character ends.
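The per-character flow of steps S31 to S35 can be sketched as a small mapping function. The threshold D2 comes from the text, but its value, the separate bands that decide whether a color (S32) or a font attribute (S34) is mapped, the color names, and the scaling rule are all hypothetical.

```python
from typing import Dict


def map_attributes(score: float, d2: float = 0.3, color_min: float = 0.6,
                   font_min: float = 0.3, max_scale: float = 2.0) -> Dict[str, object]:
    """Display attributes for one character, following the shape of FIG. 4."""
    attrs: Dict[str, object] = {}
    if score < d2:                   # S31: below threshold D2, no attribute
        return attrs
    if score >= color_min:           # S32: should a color be mapped?
        attrs["color"] = "red" if score >= 1.0 else "orange"   # S33
    if score >= font_min:            # S34: should a font attribute be mapped?
        attrs["font_scale"] = min(max_scale, 1.0 + score)      # S35
    return attrs
```

With these illustrative values, a score of 0.5 only enlarges the character (`font_scale` 1.5), 0.75 also colors it orange, and 1.5 colors it red with the scale capped at 2.0.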

[0044] The text displayed on the display device 5 by the speech recognition system according to this embodiment will now be described using concrete examples. For simplicity, it is assumed here that the score marker creation unit 4A assigns only font attributes as display attributes. It is also assumed that the user has input two sentences through the microphone 1A: 「今日はとても楽しかった。」 ("Today was a lot of fun.") and 「おー、やられた。」 ("Oh, you got me.").

[0045] Suppose that in the first sentence the user speaks with stress on 「とても」 ("very"). The intonation scores of these three characters then become larger than those of the other parts and exceed the threshold. The score marker creation unit 4A therefore adds to these three characters a font attribute that makes the font size larger than the default. As a result, as shown in FIG. 5, the characters 「とても」 are displayed on the display device 5 larger than the other parts, and are thus emphasized.

[0046] Suppose that in the second sentence the user speaks the 「おー」 ("oh") part with, for example, a gradually fading voice. The intonation score then decreases gradually over this part. If the speech recognition unit 2A converts this part into the six characters 「おおおおおお」, the score marker creation unit 4A adds font attributes to the characters so that the font size becomes progressively smaller. As a result, as shown in FIG. 5, the characters are displayed on the display device 5 in progressively smaller sizes, representing the voice fading away.
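The two FIG. 5 examples can be reproduced numerically with a linear score-to-point-size rule. The scores, the 12 pt default, 4 pt per score unit, the clamping range, and the choice to threshold the magnitude of a signed score are all invented for illustration; the text says only that the scores for 「とても」 exceed the threshold and that the score over the fading 「おー」 decreases gradually.

```python
from typing import List, Sequence


def font_sizes(scores: Sequence[float], base_pt: float = 12.0,
               pt_per_unit: float = 4.0, threshold: float = 0.25,
               min_pt: float = 6.0, max_pt: float = 24.0) -> List[float]:
    """Hypothetical linear mapping from signed intonation score to point size."""
    sizes = []
    for s in scores:
        if abs(s) < threshold:
            sizes.append(base_pt)  # small change: keep the default size
        else:
            sizes.append(min(max_pt, max(min_pt, base_pt + pt_per_unit * s)))
    return sizes


# Invented scores for the two example sentences.
emphasis = list(zip("今日はとても", font_sizes([0.0, 0.0, 0.0, 0.5, 0.5, 0.5])))
fading = list(zip("おおおおおお", font_sizes([-0.25, -0.5, -0.75, -1.0, -1.25, -1.5])))
```

Here 「とても」 is set at 14 pt against 12 pt for 「今日は」, while the six 「お」 characters shrink from 11 pt to 6 pt, mirroring the fading voice.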

[0047] As described above, in the speech recognition system according to this embodiment, the intonation score assigning unit 2B assigns an intonation score to each character in the text converted by the speech recognition unit 2A, and the score marker creation unit 4A assigns display attributes such as color attributes and font attributes on the basis of these intonation scores.

[0048] In other words, simply by speaking into the microphone 1A, the user can automatically create text with display attributes. Characters corresponding to parts of the input speech stressed by intonation can also be emphasized in the text, by changing their color or enlarging their font size. The speech recognition system according to this embodiment therefore makes document creation by speech recognition easy.

[0049] The present invention is not limited to the above embodiment, and various modifications and applications are possible. Modifications of the above embodiment that are applicable to the present invention are described below.

[0050] In the above embodiment, the intonation score assigning unit 2B assigns intonation scores in units of characters, and the score marker creation unit 4A attaches display attributes such as color information in units of characters. However, display attributes may instead be attached in units of words, or further in units of phrases or clauses within a sentence, or in units of whole sentences.
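Moving from character units to word units mainly requires reducing the per-character scores over each word. The sketch below assumes a word segmentation is supplied separately (for example by a morphological analyzer) and uses the maximum score within each word by default; both the helper and the choice of `max` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def word_scores(char_scores: Sequence[float], words: Sequence[str],
                reduce: Callable[[Sequence[float]], float] = max
                ) -> List[Tuple[str, float]]:
    """Collapse per-character intonation scores into one score per word."""
    if any(not w for w in words):
        raise ValueError("empty word in segmentation")
    if sum(len(w) for w in words) != len(char_scores):
        raise ValueError("segmentation does not cover the character scores")
    result, i = [], 0
    for w in words:
        result.append((w, reduce(char_scores[i:i + len(w)])))
        i += len(w)
    return result
```

`max` lets a single stressed syllable mark the whole word; passing `reduce=lambda xs: sum(xs) / len(xs)` instead requires the stress to be spread across the word. The same reduction applies unchanged to phrases, clauses, or sentences.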

[0051] In the above embodiment, the intonation score assigning unit 2B creates an intonation score from the time stamps, volume, and pitch extracted from the speech information and assigns it to each character in the text. However, the intonation score assigning unit 2B may create the intonation score using any one or more of these. Alternatively, change information based on changes in the speech other than these three may be created, and display attributes may be added to each character in the text in accordance with that change information.

[0052] In the above embodiment, the score marker creation unit 4A assigns color attributes and font attributes to each character in the text as display attributes. However, other display attributes may be assigned to each character, for example character decoration attributes such as bold, italic, underline, overline, shading, background color, superscript, subscript, conversion to upper case (when the character is a letter of the alphabet), inversion, and rotation. As a font attribute, not only the font size but also the font type (Gothic, Mincho, etc.) may be assigned.

[0053] In the above embodiment, the score marker creation unit 4A assigns display attributes such as color attributes or font attributes to each character in the text converted by the speech recognition unit 2A solely on the basis of the intonation score that the intonation score assigning unit 2B assigned to that character. However, the score marker creation unit 4A may also take other information into account when assigning display attributes to each character.

[0054] FIG. 6 is a block diagram showing a speech recognition system according to another embodiment. In addition to the configuration of FIG. 1, the speech recognition system of FIG. 6 has morphological analysis means 6. The morphological analysis means 6 performs morphological analysis on the text that the speech recognition unit 2A generated by converting the speech information, and passes the result to the score marker creation unit 4A. The morphological analysis means 6 may be realized on the same computer as the speech recognition device 2 and/or the display data creation device 4.

[0055] In this case, the score marker creating unit 4A can also assign display attributes to each character in the text based on the analysis result of the morphological analysis means 6. Grammatically analyzing the text generated by the speech recognition unit 2A in this way prevents the display device 5 from showing unnatural text in which parts of speech that are normally not emphasized, such as particles and auxiliary verbs, are emphasized.
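One way to picture this filtering is a rule that ignores high scores on particles and auxiliary verbs. The Python sketch below assumes that a morphological analyzer has already split the text into tokens carrying a part-of-speech tag and a per-word intonation score. The `Token` type, the tag names `"particle"` and `"auxiliary_verb"`, and the 0.5 threshold are hypothetical. A real analyzer would use its own tag set.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str   # word as it appears in the recognized text
    pos: str       # part-of-speech tag from a morphological analyzer (assumed)
    score: float   # intonation score aggregated over the word's characters


# Hypothetical tags for parts of speech that are normally not emphasized.
UNEMPHASIZED_POS = {"particle", "auxiliary_verb"}


def emphasis_flags(tokens: list[Token], threshold: float = 0.5) -> list[bool]:
    """Return True for each token that should receive an emphasis attribute.

    A token qualifies only if its score exceeds the threshold AND its part
    of speech is not one that is normally left unemphasized.
    """
    return [
        tok.score > threshold and tok.pos not in UNEMPHASIZED_POS
        for tok in tokens
    ]
```

For example, with tokens for 明日 (noun, 0.9), は (particle, 0.8), 晴れ (noun, 0.3) and です (auxiliary verb, 0.7), only 明日 is flagged. The particle and auxiliary verb stay plain even though their scores are high.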

[0056] FIG. 7 is a block diagram showing a speech recognition system according to still another embodiment. The speech recognition system of FIG. 7 has a learning means 7 in addition to the configuration of FIG. 1. The learning means 7 learns the tendency of the scores that the intonation score assigning unit 2B assigns to specified words, and passes the learning result to the score marker creating unit 4A.

[0057] In this case, the score marker creating unit 4A can also assign display attributes to each character in the text based on the learning result of the learning means 7. By learning the tendency of the intonation scores assigned by the intonation score assigning unit 2B in this way, words that the user always stresses, for example because of a regional accent, can be left unemphasized in the text. This keeps unnatural text from being displayed on the display device 5.
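The patent does not say how the learning means 7 learns. The Python sketch below assumes one simple approach: it keeps a running average of each word's intonation score and counts only stress above that habitual level. The class name `WordIntonationLearner` and its methods are hypothetical.

```python
from collections import defaultdict


class WordIntonationLearner:
    """Learns each word's habitual intonation score (hypothetical sketch).

    Words a speaker always stresses (e.g. because of an accent) build up a
    high average. Subtracting that average keeps them from being emphasized.
    """

    def __init__(self) -> None:
        self._total: dict[str, float] = defaultdict(float)
        self._count: dict[str, int] = defaultdict(int)

    def observe(self, word: str, score: float) -> None:
        """Record one score assigned to `word` by the scoring stage."""
        self._total[word] += score
        self._count[word] += 1

    def baseline(self, word: str) -> float:
        """Average score seen so far for `word` (0.0 if never seen)."""
        n = self._count.get(word, 0)
        return self._total.get(word, 0.0) / n if n else 0.0

    def adjusted_score(self, word: str, score: float) -> float:
        """Score in excess of the word's usual level, clipped at zero."""
        return max(0.0, score - self.baseline(word))
```

Applying the attribute threshold to `adjusted_score` rather than to the raw score means that a word is emphasized only when it is stressed more than the speaker usually stresses it.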

[0058] The speech recognition system described above may further include a control means that controls whether the score marker creating unit 4A attaches display attributes to each character in the text. In this case, the user may tell the control means whether to attach display attributes to each character, and the control means may then control the score marker creating unit 4A according to that instruction.
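The Python sketch below shows the control means as a user-settable switch placed in front of the marking step. The `ScoreMarkerController` class, its `set_enabled` and `mark` methods, and the 0.5 default threshold are hypothetical names and values chosen for this illustration.

```python
class ScoreMarkerController:
    """Hypothetical control means wrapping the score marker creating step."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled

    def set_enabled(self, enabled: bool) -> None:
        # Called when the user instructs the system to turn attributes on or off.
        self.enabled = enabled

    def mark(
        self, chars: str, scores: list[float], threshold: float = 0.5
    ) -> list[tuple[str, bool]]:
        """Pair each character with an emphasis flag.

        All flags are False while the controller is disabled, so the text
        is shown without any display attributes.
        """
        if len(chars) != len(scores):
            raise ValueError("one score per character is required")
        return [(c, self.enabled and s > threshold) for c, s in zip(chars, scores)]
```

Because the switch only gates the output flags, the recognition and scoring stages can keep running while attributes are turned off.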

[0059] In the above embodiment, the voice information input device 1, consisting of the microphone 1A and the interface 1B, serves as the user interface device that inputs voice information to the speech recognition unit 2A. However, the present invention is not limited to this. For example, voice recorded in advance on a cassette tape or the like may be converted into digital voice information and passed to the speech recognition unit 2A. Alternatively, voice information may be read from a medium on which previously sampled voice information is recorded, and then input to the speech recognition unit 2A.
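The previously sampled voice information might be stored as a WAV file, for example. The Python sketch below reads 16-bit mono PCM samples with the standard-library `wave` module and computes a per-frame RMS volume, which is one possible input for the volume component of the intonation score. The WAV format, the 16-bit mono restriction, and the helper names `read_voice_info` and `frame_volumes` are assumptions made for this example.

```python
import array
import io
import math
import sys
import wave
from typing import BinaryIO, Union


def read_voice_info(source: Union[str, bytes, BinaryIO]) -> tuple[int, list[int]]:
    """Read pre-sampled 16-bit mono PCM voice information from a WAV source.

    `source` may be a file path, an open binary file, or the raw bytes of a
    WAV file read from some recording medium.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return rate, samples.tolist()


def frame_volumes(samples: list[int], frame_len: int) -> list[float]:
    """Root-mean-square volume of each frame (one possible volume measure)."""
    volumes = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        volumes.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return volumes
```

The speech recognition stage then receives the same kind of digital samples that it would get from the microphone path, so the rest of the pipeline does not need to know where the voice came from.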

[0060] In the above embodiment, the display device 5 serves as the user interface device that outputs the generated text. However, the present invention is not limited to this. For example, a printer that prints the text on a medium such as paper may be used. Alternatively, a communication device may be used that transmits the text over a network to a remote terminal device and has that terminal device output it.

[0061]

[Effects of the Invention]
As described above, the present invention automatically adds display attributes that reflect changes in the voice to the character string converted from the voice information. This makes it easy to create documents by speech recognition.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing executed by the speech recognition system of FIG. 1.

FIG. 3 is a flowchart showing in detail the processing executed by the intonation score assigning unit.

FIG. 4 is a flowchart showing in detail the processing executed by the score marker creating unit.

FIG. 5 is a diagram showing an example of text displayed on the display device in the speech recognition system of FIG. 1.

FIG. 6 is a block diagram showing the configuration of a speech recognition system according to another embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of a speech recognition system according to yet another embodiment of the present invention.

[Explanation of symbols]

1 Voice information input device
1A Microphone
1B Interface
2 Speech recognition device
2A Speech recognition unit
2B Intonation score assigning unit
3 Storage device
4 Display data creation device
4A Score marker creating unit
4B Display condition setting unit
4C Display condition storage unit
4D Display data creation unit
5 Display device
6 Morphological analysis means
7 Learning means

Claims

[Claims]

1. A speech recognition system comprising: speech recognition means for converting voice information input from outside into a character string; change information allocating means for generating, based on the voice information, change information indicating a change in the voice represented by the voice information, and allocating the generated change information to each predetermined unit in the character string converted by the speech recognition means; and attribute adding means for adding a display attribute to each predetermined unit in the character string in accordance with the change information allocated by the change information allocating means.

2. The speech recognition system according to claim 1, wherein the change information allocating means generates a score based on at least one of the speed, volume, and frequency of the voice represented by the voice information, and allocates the generated score to each predetermined unit in the character string, and the attribute adding means adds a display attribute to each predetermined unit in the character string based on the score generated by the change information allocating means.

3. The speech recognition system according to claim 2, wherein, when the score generated by the change information allocating means exceeds a predetermined threshold, the attribute adding means adds a display attribute to the corresponding predetermined unit in the character string.
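Claims 2 and 3 together describe a score built from speed, volume, and frequency that is compared with a threshold. The Python sketch below illustrates that idea under stated assumptions. The score is taken here as the weighted sum of each unit's relative deviation from the utterance average, but the claims do not fix a formula. The `UnitFeatures` type, the equal default weights, and the example threshold are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class UnitFeatures:
    """Acoustic features measured for one predetermined unit (e.g. a character)."""
    speed: float      # e.g. morae per second (assumed unit)
    volume: float     # e.g. RMS amplitude
    frequency: float  # e.g. fundamental frequency in Hz


def intonation_scores(
    units: list[UnitFeatures], weights: tuple[float, float, float] = (1.0, 1.0, 1.0)
) -> list[float]:
    """Score each unit by its weighted relative deviation from the average.

    Features whose average is zero are skipped to avoid division by zero.
    """
    means = [
        fmean(getattr(u, name) for u in units)
        for name in ("speed", "volume", "frequency")
    ]
    scores = []
    for u in units:
        values = (u.speed, u.volume, u.frequency)
        scores.append(
            sum(w * abs(v - m) / m for w, v, m in zip(weights, values, means) if m)
        )
    return scores


def units_to_mark(scores: list[float], threshold: float) -> list[bool]:
    """Claim 3: a unit gets a display attribute when its score exceeds the threshold."""
    return [score > threshold for score in scores]
```

For three units with equal speed and frequency and volumes of 100, 100, and 400, the scores are 0.5, 0.5, and 1.0. With a threshold of 0.8, only the loud third unit receives a display attribute.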

4. The speech recognition system according to any one of claims 1 to 3, further comprising parsing means for parsing the character string recognized by the speech recognition means, wherein the attribute adding means further adds a display attribute to each predetermined unit in the character string in accordance with the result of parsing by the parsing means.

5. The speech recognition system according to any one of claims 1 to 4, further comprising learning means for learning the tendency of the change information allocated by the change information allocating means to a specific one of the predetermined units in the character string, wherein the attribute adding means further adds a display attribute to each predetermined unit in the character string in accordance with the learning results accumulated in the learning means.

6. The speech recognition system according to claim 1, wherein the display attribute is a color attribute.

7. The speech recognition system according to claim 1, wherein said display attribute is a font attribute.

8. A speech recognition method comprising: a speech recognition step of converting voice information input from outside into a character string; a change information allocating step of generating, based on the voice information, change information indicating a change in the voice represented by the voice information, and allocating the generated change information to each predetermined unit in the character string converted in the speech recognition step; and an attribute adding step of adding a display attribute to each predetermined unit in the character string in accordance with the change information allocated in the change information allocating step.

9. A computer-readable recording medium on which is recorded a program that causes a computer device to function as: speech recognition means for converting voice information input from outside into a character string; change information allocating means for generating, based on the voice information, change information indicating a change in the voice represented by the voice information, and allocating the generated change information to each predetermined unit in the character string converted by the speech recognition means; and attribute adding means for adding a display attribute to each predetermined unit in the character string in accordance with the change information allocated by the change information allocating means.