JP2005257905A

JP2005257905A - Cat's emotion discriminating device based on audio feature analysis of mewing

Info

Publication number: JP2005257905A
Application number: JP2004067532A
Authority: JP
Inventors: Tetsuji Kawakami (川上 哲二); Matsumi Suzuki (鈴木 松美)
Original assignee: Takara Co Ltd
Current assignee: Takara Co Ltd
Priority date: 2004-03-10
Filing date: 2004-03-10
Publication date: 2005-09-22

Abstract

PROBLEM TO BE SOLVED: To provide an emotion discriminating device that objectively determines a cat's specific emotion from its mewing, and an emotion discriminating device whose determination result can be reflected in a game.

SOLUTION: The emotion discriminating device comprises: a conversion means 2 that converts the mewing of a cat into an electrical audio signal; an input audio pattern extracting means 6 that extracts, as an input audio pattern, the features of a relational map between time and frequency components of the audio signal; an emotion-specific reference audio pattern storage means 7 that stores reference audio patterns which represent, for each emotion, the features of the relational map between time and frequency components of the mewing with which a cat characteristically expresses that emotion; a comparison means 8 that compares the input audio pattern with the emotion-specific reference audio patterns; and a control means 10 that determines, from the comparison, the emotion correlating most highly with the input audio pattern and displays the result on a picture display means 1.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an emotion discriminating device based on audio feature analysis, and more particularly to a cat emotion discriminating device based on audio feature analysis of a cat's cries.

Animals have long been kept as pets that comfort people. Dogs and cats in particular are always at their owners' side, and owners have judged the animal's emotions from its behavior and cries, decided in their own way what the animal currently wants, and responded accordingly.

Establishing communication with animals can therefore fairly be called a long-standing dream of humankind. Various attempts have been made, and translation devices that translate the intentions of animals have been proposed (for example, Patent Document 1).

This translation device receives the sounds uttered by animals such as pets and livestock and converts them into audio signals, captures the animal's movements as video and converts them into video signals, and translates the animal's intentions by comparing these audio and video signals with voice and motion data analyzed in advance by ethology or similar disciplines.
Patent Document 1: Japanese Patent Laid-Open No. 10-3479

The problem to be solved by the present invention is as follows. The translation device described above may be able to translate an animal's intentions from its cries and movements, but it discloses no specific voice or motion data corresponding to specific emotions of a cat. It therefore cannot clearly establish the relationship between a cat's specific emotions and the cries that a cat characteristically makes when it has those emotions, set reference voice patterns corresponding to those emotions, and objectively determine the cat's emotion through audio feature analysis that compares the cat's cry with those reference voice patterns. In short, a cat's specific emotion cannot be determined objectively from its cry.

An object of the present invention is to solve the above problem by providing an emotion discriminating device that objectively determines a cat's specific emotion from its cry, and an emotion discriminating device whose determination result can be reflected in a game.

To solve the above problem, the cat emotion discriminating device based on audio feature analysis of a cat's cry according to the present invention compares an input cat voice with emotion-specific reference voice patterns and displays the corresponding emotion determination result as an image and sound. The device comprises: conversion means for converting a cat's cry into an electrical audio signal; input voice pattern extraction means for extracting, as an input voice pattern, the features of a relationship map between time and frequency components of the audio signal; emotion-specific reference voice pattern storage means for storing emotion-specific reference voice patterns that represent, for each emotion, the features of the relationship map between time and frequency components of the voice with which a cat characteristically expresses that emotion as a cry; comparison means for comparing the input voice pattern with the emotion-specific reference voice patterns; and control means for determining, from the comparison, the emotion that correlates most highly with the input voice pattern and causing image display means to display the result. The device is characterized in that the emotion-specific reference voice patterns are broadly classified into the emotions of satisfaction, courtship, threat, self-expression, joy, and demand, and include at least one of the following: for satisfaction, a reference voice pattern in which components are distributed in pulses at 0.03-second intervals, with particularly strong components around 500 Hz; for courtship, a reference voice pattern with a fundamental around 250 Hz and harmonics appearing up to around 6000 Hz, with particularly strong components from 4000 to 6000 Hz; for threat, a reference voice pattern in which components are distributed across the whole range from 0 to 8000 Hz and a white-noise-like sound lasts for about 0.8 seconds; for self-expression, a reference voice pattern with a fundamental frequency of about 500 Hz in which a noise-like component appears about 0.2 seconds after the start of the cry and components around 4000 Hz tend to be particularly strong; for joy, a reference voice pattern with a fundamental frequency of 400 to 600 Hz, harmonics distributed up to around 3200 Hz, and a slight up-and-down movement where the fundamental frequency is at its maximum; and for demand, a reference voice pattern with a fundamental frequency around 700 Hz and particularly strong components around 2000 to 4000 Hz.
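The six reference voice patterns listed in the paragraph above can be collected into a small lookup table. The following Python sketch is only illustrative: the `ReferencePattern` type, its field names, and the `emotions_covering` helper are assumptions introduced here, and frequencies that the text gives as "around" a value are stored as that nominal value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Band = Tuple[float, float]  # (low_hz, high_hz); hypothetical helper type


@dataclass(frozen=True)
class ReferencePattern:
    emotion: str
    fundamental_hz: Optional[Band]  # None where the text states no fundamental
    emphasized_hz: Optional[Band]   # band the text singles out (full spread for threat)
    note: str


REFERENCE_PATTERNS = (
    ReferencePattern("satisfaction", None, (500, 500), "pulses at 0.03 s intervals"),
    ReferencePattern("courtship", (250, 250), (4000, 6000), "harmonics up to about 6000 Hz"),
    ReferencePattern("threat", None, (0, 8000), "white-noise-like, about 0.8 s long"),
    ReferencePattern("self-expression", (500, 500), (4000, 4000),
                     "noise-like component about 0.2 s after onset"),
    ReferencePattern("joy", (400, 600), None,
                     "harmonics up to about 3200 Hz, slight wobble at peak fundamental"),
    ReferencePattern("demand", (700, 700), (2000, 4000), "strong components in this band"),
)


def emotions_covering(freq_hz: float) -> List[str]:
    """Return, in table order, the emotions whose emphasized band contains freq_hz."""
    return [p.emotion for p in REFERENCE_PATTERNS
            if p.emphasized_hz and p.emphasized_hz[0] <= freq_hz <= p.emphasized_hz[1]]
```

A table like this only restates the verbal descriptors; the device itself compares full time-frequency patterns rather than these summary values.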

The cat emotion discriminating device may also comprise storage means storing a game program and game data, and operation means operated by the user. In that case the control means judges whether the user's judgment is correct from two inputs, namely the cat's emotion determined by comparing the reference voice patterns stored in the emotion-specific reference voice pattern storage means with the input voice pattern extracted by the input voice pattern extraction means, and the cat's emotion that the user has judged and entered through the operation means, and causes the display means to display the result. The device thus does not merely determine the cat's emotion but also has a game element.

According to the invention of claim 1, a cat's cry is converted into an electrical audio signal; the features of the relationship map between time and frequency components of the audio signal are extracted as an input voice pattern; emotion-specific reference voice patterns are stored that represent, for each emotion, the features of the relationship map between time and frequency components of the voice with which a cat characteristically expresses that emotion as a cry; the input voice pattern is compared with the emotion-specific reference voice patterns; and the emotion correlating most highly with the input voice pattern is determined from that comparison. The cat's emotion can therefore be determined objectively on the basis of the emotion-specific reference voice patterns.

According to the invention of claim 2, the device comprises storage means storing a game program and game data and operation means operated by the user. The control means judges whether the user's judgment is correct from the cat's emotion determined by comparing the reference voice patterns stored in the emotion-specific reference voice pattern storage means with the input voice pattern extracted by the input voice pattern extraction means, and from the cat's emotion judged by the user through the operation means, and causes the display means to display the result. The device therefore not only determines the cat's emotion but also shows, in game form, how well the user understands the cat's emotions. It thus serves not merely as a device for interpreting a cat's cries but also as a way for the user to deepen their understanding of the cat through its cries.

The best mode is a cat emotion discriminating device that compares an input cat voice with emotion-specific reference voice patterns and displays the corresponding determination result as an image and sound. It comprises conversion means for converting a cat's cry into an electrical audio signal; input voice pattern extraction means for extracting, as an input voice pattern, the features of the relationship map between time and frequency components of the audio signal; emotion-specific reference voice pattern storage means for storing emotion-specific reference voice patterns that represent, for each emotion, the features of the relationship map between time and frequency components of the voice with which a cat characteristically expresses that emotion as a cry; comparison means for comparing the input voice pattern with the emotion-specific reference voice patterns; and control means for determining, from the comparison, the emotion correlating most highly with the input voice pattern and causing image display means to display the result. The cat's emotion can thereby be determined objectively from its cry.

FIG. 1 shows a cat emotion discriminating device A (hereinafter, the emotion discriminating device) based on audio feature analysis of cries according to the present invention. The emotion discriminating device A is shaped as a whole like a cat. On its face are a liquid crystal display 1, which is the image display means, and a microphone 3, which forms part of the conversion means 2 that converts a cat's cry into an electrical audio signal. On the front of the body is the operation means 5 operated by the user. Inside the body is the control means 10, which comprises the input voice pattern extraction means 6 that extracts, as an input voice pattern, the features of the relationship map between time and frequency components of the audio signal; the emotion-specific reference voice pattern storage means 7 that stores the emotion-specific reference voice patterns representing, for each emotion, the features of the relationship map between time and frequency components of the voice with which a cat characteristically expresses that emotion as a cry; the comparison means 8 that compares the input voice pattern with the emotion-specific reference voice patterns; and the emotion discriminating means 9 that determines the emotion correlating most highly with the input voice pattern. The result determined by the emotion discriminating means 9 is displayed on the liquid crystal display 1, the image display means, through an LCD driver 11.

The control means 10 may consist of a CPU, a ROM storing the control program that controls the entire emotion discriminating device together with data such as character data, image data, and reference voice patterns, and a work-area RAM.

The conversion means 2 converts the sound of a cat's cry into a digital audio signal and consists of the microphone 3 and an A/D converter 4. The microphone 3 picks up the cry and converts it into an electrical signal, and the A/D converter 4 digitizes that signal to produce the audio signal.
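The digitizing step performed by the A/D converter 4 can be pictured with a minimal Python sketch. The text does not state a bit depth or input range, so the 8-bit default, the normalized range of -1.0 to 1.0, and the function name `quantize` are all assumptions made for illustration.

```python
from typing import List, Sequence


def quantize(analog: Sequence[float], bits: int = 8) -> List[int]:
    """Map analog values in [-1.0, 1.0] to signed integer codes, clipping overloads."""
    full_scale = 2 ** (bits - 1) - 1  # 127 for the assumed 8 bits
    return [round(max(-1.0, min(1.0, v)) * full_scale) for v in analog]
```

For instance, `quantize([0.0, 1.0, -1.0, 2.0])` clips the out-of-range last value to full scale before rounding.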

The operation means 5 consists of cursor keys 5a used to select an item from the choices displayed on the screen, a cancel button 5b, an enter button 5c that confirms settings, and a reset button 5d that erases all settings. Holding down the cancel button 5b (for example, for 2 seconds) turns the power on or off.

The input voice pattern extraction means 6 extracts a characteristic pattern from the audio signal. It can be implemented with a CPU (a DSP may also be used) and need only include a ROM storing the program that makes it operate as the input voice pattern extraction means, a work-area RAM, and the like.

A voice pattern is generally expressed as a relationship map between time and frequency components of the audio signal. This relationship map represents the frequency distribution of the voice over time, with time on the horizontal axis and frequency on the vertical axis, and is preferably expressed as the distribution of voice energy within the individual grid cells obtained by dividing the map at fixed time intervals and fixed frequency intervals. Expressing the relationship map in this way allows the audio signal to be handled comprehensively and quantitatively.
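One way to picture such a relationship map is as a grid of energies computed frame by frame. The sketch below is an assumption-laden illustration rather than the method of the device: it uses a plain discrete Fourier transform on non-overlapping frames, and the function name `relationship_map` and all parameters are introduced here.

```python
import cmath
import math
from typing import List, Sequence


def relationship_map(samples: Sequence[float], sample_rate: float,
                     frame_len: int, n_bands: int, max_hz: float) -> List[List[float]]:
    """Return grid[band][frame]: energy per fixed frequency band and fixed time frame."""
    n_frames = len(samples) // frame_len
    band_width = max_hz / n_bands
    grid = [[0.0] * n_frames for _ in range(n_bands)]
    for t in range(n_frames):
        frame = samples[t * frame_len:(t + 1) * frame_len]
        for k in range(1, frame_len // 2):  # skip DC, stay below Nyquist
            freq = k * sample_rate / frame_len
            if freq >= max_hz:
                break
            # Naive DFT coefficient for bin k of this frame.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            grid[int(freq // band_width)][t] += abs(coeff) ** 2
    return grid
```

With a pure 500 Hz tone sampled at 8000 Hz, `relationship_map(tone, 8000, 64, 4, 4000)` concentrates nearly all energy in the lowest band of every frame. A real implementation would more likely use an FFT with windowing; the naive transform is used here only to keep the sketch dependency-free.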

The emotion-specific reference voice pattern storage means 7 stores the reference voice patterns corresponding to the various emotions and may be a ROM. The ROM may be a rewritable flash ROM so that the data can be rewritten to accommodate future updates to the reference voice patterns, additional emotions, and so on.

Like the voice pattern described above, a reference voice pattern is expressed as a relationship map between time and frequency components of the audio signal. The relationship map represents the frequency distribution of the voice over time, with time on the horizontal axis and frequency on the vertical axis, and is preferably expressed as the distribution of voice energy within the individual grid cells obtained by dividing the map at fixed time intervals and fixed frequency intervals. A reference voice pattern may also be a pattern in which the highly universal, characteristic parts of the relationship map are given large weights. In that case, when the reference voice patterns are compared with an input voice pattern, even a widely varying input voice pattern can be matched to the reference voice pattern for one of the emotions as long as it contains the highly universal, characteristic part corresponding to that emotion, which improves the reliability of the emotion determination.
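The effect of weighting the characteristic parts of a pattern can be sketched as follows. This is a hypothetical illustration: it assumes the grids have already been reduced to 0/1 cells, and the function name `weighted_agreement` and the weight values are not taken from the text.

```python
from typing import Sequence


def weighted_agreement(input_grid: Sequence[Sequence[int]],
                       reference_grid: Sequence[Sequence[int]],
                       weights: Sequence[Sequence[float]]) -> float:
    """Fraction of total weight carried by cells where input and reference agree."""
    matched = 0.0
    total = 0.0
    for in_row, ref_row, w_row in zip(input_grid, reference_grid, weights):
        for x, r, w in zip(in_row, ref_row, w_row):
            total += w
            if x == r:
                matched += w
    return matched / total if total else 0.0
```

In a one-row example with input `[[1, 0, 1]]` and reference `[[1, 1, 0]]`, uniform weights give an agreement of one third, whereas weights `[[3, 1, 1]]` that emphasize the first, characteristic cell raise it to 0.6.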

The comparison means 8 is the component that compares the input voice pattern with the emotion-specific reference voice patterns. It may be implemented with a CPU (a DSP may also be used), together with a ROM storing the program that makes the CPU operate as the comparison means 8, a work-area RAM, and the like. The comparison can be performed by, for example, pattern matching of the characterized patterns using Hamming processing. The result of the comparison is output as a degree of correlation.
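The text does not spell out what the Hamming processing involves. One plausible reading, adopted here purely as an assumption, is a Hamming-distance comparison of binarized grids. The function names `binarize` and `hamming_similarity` and the threshold are hypothetical.

```python
from typing import List, Sequence


def binarize(grid: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Mark each grid cell 1 if its energy reaches the threshold, else 0."""
    return [[1 if cell >= threshold else 0 for cell in row] for row in grid]


def hamming_similarity(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Return 1 minus the normalized Hamming distance between two equal-shape grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("patterns must have the same grid shape")
    total = sum(len(row) for row in a)
    mismatches = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return 1.0 - mismatches / total
```

A similarity of 1.0 means every cell agrees; the value can serve as the degree of correlation that the comparison outputs.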

The emotion discriminating means 9 is the component that determines, as the cat's emotion, the emotion corresponding to the reference voice pattern judged to correlate most highly when the comparison means 8 compares the input voice pattern with the emotion-specific reference voice patterns. It may be implemented with a CPU (a DSP may also be used), together with a ROM storing the program that makes the CPU operate as the emotion discriminating means 9, a work-area RAM, and the like.
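Selecting the most highly correlated emotion amounts to taking a maximum over the comparison results. A minimal Python sketch, with the function name `discriminate` and the dictionary layout assumed for illustration:

```python
from typing import Dict


def discriminate(correlations: Dict[str, float]) -> str:
    """Return the emotion with the highest correlation (first one listed wins a tie)."""
    if not correlations:
        raise ValueError("no comparison results to choose from")
    return max(correlations, key=correlations.get)
```

For example, `discriminate({"satisfaction": 0.4, "threat": 0.9, "joy": 0.7})` selects `"threat"`.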

The reference voice patterns corresponding to a cat's various emotions may be created as follows. Cries made by cats while they have various emotions are collected from many cats. The cat's emotion at the time each cry is collected is judged, on the basis of ethology, from the cat's behavior and attitude at that moment. The many collected cries are then classified by emotion, and the voice pattern common to the cries for each emotion is defined as the reference voice pattern for that emotion.
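The text leaves open how the "common" pattern is derived from the collected cries. As one assumed possibility, the sketch below averages the labelled grids cell by cell; the function name `build_reference_patterns` and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Grid = List[List[float]]


def build_reference_patterns(labelled: Iterable[Tuple[str, Grid]]) -> Dict[str, Grid]:
    """Group equal-shape grids by emotion label and average them cell by cell."""
    grouped: Dict[str, List[Grid]] = defaultdict(list)
    for emotion, grid in labelled:
        grouped[emotion].append(grid)
    references: Dict[str, Grid] = {}
    for emotion, grids in grouped.items():
        rows, cols = len(grids[0]), len(grids[0][0])
        references[emotion] = [
            [sum(g[r][c] for g in grids) / len(grids) for c in range(cols)]
            for r in range(rows)
        ]
    return references
```

Averaging keeps cells that are consistently energetic across cats and dilutes those that appear in only a few recordings, which is one simple way to approximate a shared pattern.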

As described above, this reference voice pattern may also be one in which the highly universal, characteristic parts are given large weights.

In this embodiment, six emotions are set as a cat's basic emotions: "satisfaction", "courtship", "threat", "self-expression", "joy", and "demand". The features of the voice pattern for each emotion are described below.

For the emotion of "satisfaction", as shown in FIG. 3, the voice pattern corresponding to the cry a satisfied cat makes has components distributed in pulses at 0.03-second intervals, with particularly strong components around 500 Hz. It often sounds like "goro-goro" (a purr).

For the emotion of "courtship", as shown in FIG. 4, the voice pattern corresponding to the cry a courting cat makes has a fundamental around 250 Hz and harmonics appearing up to around 6000 Hz, with particularly strong components from 4000 to 6000 Hz. It often sounds like "gyawaa".
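As a worked illustration of the courtship description, the sketch below lists the ideal integer harmonics of a 250 Hz fundamental up to 6000 Hz and picks out those in the 4000 to 6000 Hz band. Exact integer multiples are an idealization assumed here, and the function name `harmonic_series` is hypothetical.

```python
from typing import List


def harmonic_series(fundamental_hz: float, upper_hz: float) -> List[float]:
    """Return the integer multiples of the fundamental that do not exceed upper_hz."""
    count = int(upper_hz // fundamental_hz)
    return [fundamental_hz * n for n in range(1, count + 1)]


series = harmonic_series(250, 6000)
emphasized = [f for f in series if 4000 <= f <= 6000]
```

Under this idealization the fundamental and its overtones give 24 components in total, of which the 16th through 24th fall in the band the text describes as strongly distributed.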

For the emotion of "threat", as shown in FIG. 5, the voice pattern corresponding to the cry a threatening cat makes has components distributed across the whole range from 0 to 8000 Hz, with a white-noise-like sound lasting about 0.8 seconds. It often sounds like "shaa" (a hiss).

For the emotion of "self-expression", as shown in FIG. 6, the voice pattern corresponding to the cry a cat makes when expressing itself has a fundamental frequency of about 500 Hz, a noise-like component appearing about 0.2 seconds after the start of the cry, and a tendency toward particularly strong components around 4000 Hz. It often sounds like "nyaoon".

For the emotion of "joy", as shown in FIG. 7, the voice pattern corresponding to the cry a cat makes when expressing joy has a fundamental frequency of 400 to 600 Hz, harmonics distributed up to around 3200 Hz, and a slight up-and-down movement where the fundamental frequency is at its maximum. It often sounds like "funyaa".

For the emotion of "demand", as shown in FIG. 8, the voice pattern corresponding to the cry a cat makes when expressing a demand has a fundamental frequency of 400 to 600 Hz, harmonics distributed up to around 3200 Hz, and a slight up-and-down movement where the fundamental frequency is at its maximum. It often sounds like "nyaao".

Next, how the emotion discriminating device A configured as above is used will be described with reference to the flowcharts of FIGS. 9 to 12.

When the power is turned on, step ST1 checks whether the initial settings have been made. If they have not, the process proceeds to step ST2, where the initial setting screen is displayed. The user selects the cat's breed, name, sex, and personality from the displayed screens with the cursor buttons, pressing the enter button after each selection to complete the initial settings.

Once the initial settings are made, an opening sequence is displayed (step ST3), followed by the menu screen (step ST4), which lists titles such as translation mode, gesture translation mode, health check mode, fortune-telling mode, and partner mode. The user selects a mode with the cursor buttons (step ST5) and presses the enter button to run the selected mode (step ST6).

This embodiment describes, from among the menu items above, the translation mode and the partner mode, in which the program proceeds on the basis of the cat's emotion determined from its cry.

When the user selects "Cat Language Translation" on the menu screen with the cursor buttons and presses the enter button, the device enters the translation mode. A title screen is displayed in step ST10, and the process proceeds to step ST11 and waits for a voice. When a voice is received (step ST12), the process proceeds to step ST13 for voice analysis. In this analysis, the conversion means 2 converts the received cry into a digital electrical audio signal, and the input voice pattern extraction means 6 extracts a characteristic voice pattern from the converted audio signal and loads the extracted voice pattern into RAM in the form of a relationship map.

Next, the comparison means 8 reads the reference voice patterns for each emotion from the emotion-specific reference voice pattern storage means 7 and compares them with the input voice pattern loaded into RAM. This comparison can use, for example, pattern matching of the characterized patterns by Hamming processing. The comparison quantifies the correlation between the input voice pattern and each emotion. The emotion discriminating means 9 then determines the emotion with the highest correlation value to be the cat's emotion. The process proceeds to step ST14, where the determined emotion is shown on the display as text (for example, "I'm happy, meow" for satisfaction), and then to step ST15, where the cat's facial expressions representing that emotion (FIGS. 14(a) to 14(c)) are displayed repeatedly.

The user can judge the cat's emotion from the cat's expression shown on the display. To continue the translation mode, the user presses the enter button in step ST16, and the process returns to step ST11 and waits for a voice again. If the user presses the cancel button, the process proceeds to step ST17, where the display shows the question "Do you want to quit?" with the choices "Yes" and "No". Selecting "Yes" with the cursor buttons and pressing the enter button ends the translation mode and returns to the menu screen (step ST4), where a mode can be selected again. Selecting "No" and pressing the enter button returns the process to step ST11 to wait for a voice again.
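The translation-mode flow from step ST10 to step ST17 can be summarized as a small transition table. The sketch below is illustrative only: the step labels come from the text, while the event names (`done`, `voice_received`, `enter`, `cancel`, `yes`, `no`) and the function `run` are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# (current step, event) -> next step; event names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ST10", "done"): "ST11",            # title screen shown, wait for a voice
    ("ST11", "voice_received"): "ST12",  # a cry has been picked up
    ("ST12", "done"): "ST13",            # voice analysis
    ("ST13", "done"): "ST14",            # show the emotion as text
    ("ST14", "done"): "ST15",            # show the cat's facial expression
    ("ST15", "done"): "ST16",            # wait for enter or cancel
    ("ST16", "enter"): "ST11",           # continue: wait for the next voice
    ("ST16", "cancel"): "ST17",          # ask "Do you want to quit?"
    ("ST17", "yes"): "ST4",              # back to the menu screen
    ("ST17", "no"): "ST11",              # resume waiting for a voice
}


def run(events: Sequence[str], start: str = "ST10") -> List[str]:
    """Follow the events from the start step and return every step visited."""
    step = start
    trace = [step]
    for event in events:
        step = TRANSITIONS[(step, event)]
        trace.append(step)
    return trace
```

Feeding the events for one translation followed by cancel and "yes" walks the table from ST10 through ST17 and ends at the menu screen ST4.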

When the user selects "Partner Mode" on the menu screen and presses the enter button, the device enters the partner mode. A title screen is displayed in step ST20, and the process proceeds to step ST21, where a selection screen appears. The user selects the mode to run from "Think You Know?", "Partner Level", and "Recent Mood" with the cursor buttons.

If "Think You Know?" is selected, the process proceeds to step ST101 and waits for a voice. When a voice is received (step ST102), the process proceeds to step ST103, where the received voice is analyzed in the same way as in the translation mode described above. When the analysis is complete, a question screen is displayed (step ST104). The user judges which of the displayed emotions the cat's cry expresses, selects it with the cursor buttons, and confirms it with the enter button (step ST105). The device checks whether the emotion judged by the user matches the emotion obtained by analyzing the voice (step ST106). If they match, a correct-answer animation is displayed in step ST107; if they do not, an incorrect-answer animation is displayed in step ST108.

To continue the partner mode, the user presses the enter button in step ST109. The process returns to step ST21 and the selection screen is displayed again, so the user can select the mode to run.

If the user presses the cancel button, the process proceeds to step ST110, where the display shows the question "Do you want to quit?" with the choices "Yes" and "No". Selecting "Yes" with the cursor buttons and pressing the enter button ends the partner mode and returns to the menu screen (step ST4), where a mode can be selected again. Selecting "No" and pressing the enter button returns to the selection screen of step ST21, where the partner mode can be run again.

Each time this routine is repeated, correct and incorrect answers are tallied in counters held in memory, so that the correct answer rate can be determined.

To check the "Partner Level", the user selects "Partner Level" on the selection screen of step ST21 and presses the enter button, whereupon a results table is displayed (step ST201). The table lists the number of questions (the number of times "Think You Know?" has been run), the number of correct answers, the number of incorrect answers, and the correct answer rate (see FIG. 15(a)).
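The counters and the results table can be sketched as follows. This is only an illustration under stated assumptions: the class name `QuizStats`, its field names, and the choice to report a rate of 0 before any question has been answered are not from the specification, which states only that correct and incorrect answers are counted and that the four values are listed.

```python
from dataclasses import dataclass


@dataclass
class QuizStats:
    """Hypothetical counters for the quiz results."""
    correct: int = 0
    incorrect: int = 0

    def record(self, is_correct: bool) -> None:
        """Count one quiz result as correct or incorrect."""
        if is_correct:
            self.correct += 1
        else:
            self.incorrect += 1

    @property
    def questions(self) -> int:
        """Number of questions, i.e. how often the quiz has been run."""
        return self.correct + self.incorrect

    @property
    def correct_rate(self) -> float:
        """Correct answer rate in percent (assumed 0.0 with no questions)."""
        if self.questions == 0:
            return 0.0
        return 100.0 * self.correct / self.questions

    def result_table(self) -> dict:
        """The four values listed on the results screen (step ST201)."""
        return {
            "questions": self.questions,
            "correct": self.correct,
            "incorrect": self.incorrect,
            "correct_rate": self.correct_rate,
        }
```

Calling `record()` once per completed quiz round and `result_table()` when the results screen is opened keeps the displayed rate consistent with the two counters.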

After the user checks the table and presses the enter button, the process proceeds to step ST202, where a message corresponding to the correct answer rate is read from a table and the partner level is shown on the display. For example, if the cat's name was registered as "Tama" in the initial settings, the message "Your compatibility with Tama-chan is..." is displayed first, as shown in FIG. 15(b), followed by a comment summarizing the result (see FIG. 15(c)) and then the detailed result (see FIG. 15(d)). Pressing the enter button to confirm returns the process to step ST109, where the user chooses whether to continue or end partner mode.
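Reading a message that corresponds to the correct answer rate (step ST202) amounts to a threshold lookup. The sketch below is an assumption-laden illustration: the specification does not disclose the table contents, so the thresholds in `THRESHOLDS`, the texts in `COMMENTS`, and both function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical table: the specification does not give the actual
# thresholds (in percent) or the comment texts.
THRESHOLDS = [30.0, 60.0, 90.0]
COMMENTS = [
    "Just getting acquainted",   # rate below 30
    "Good friends",              # 30 <= rate < 60
    "Close partners",            # 60 <= rate < 90
    "Perfect match",             # rate of 90 or above
]


def partner_comment(correct_rate: float) -> str:
    """Look up the summary comment for a correct answer rate."""
    return COMMENTS[bisect_right(THRESHOLDS, correct_rate)]


def partner_screens(cat_name: str, correct_rate: float) -> list:
    """Texts shown in sequence, loosely following FIG. 15(b) to (d)."""
    return [
        f"Your compatibility with {cat_name} is...",
        partner_comment(correct_rate),
        f"Correct answer rate: {correct_rate:.0f}%",
    ]
```

Keeping the thresholds and comments in parallel lists means the table can be changed without touching the lookup logic.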

To check the "Recent Mood", the user selects "Recent Mood" on the selection screen of step ST21 and presses the enter button, whereupon a screen such as those shown in FIGS. 16(a) and 16(d) is displayed (step ST301). This screen shows the past ten voice determinations in chronological order as dots d. The higher a dot d1 to d10 sits on the vertical axis, the better the mood; the lower it sits, the worse the mood. The screen thus gives an overview of the cat's mood and allows past voice determinations to be reviewed. To review one, the user moves the cursor c left or right to select a dot d and presses the enter button; the process then proceeds to steps ST302 and ST303, where the comment and the cat's facial expression can be reviewed. For example, moving the cursor c to the far left as shown in FIG. 16(a) selects dot d1, and pressing the enter button displays a good-mood message and expression, as shown in FIGS. 16(b) and 16(c). Moving the cursor c to the right as shown in FIG. 16(d) selects dot d2, and pressing the enter button displays a bad-mood message and expression, as shown in FIGS. 16(e) and 16(f). After reviewing, pressing the enter button returns the process to step ST109, where the user chooses whether to continue or end partner mode.
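A history limited to the past ten determinations can be modeled with a fixed-length queue. The following is a sketch only: the names `Determination` and `MoodHistory`, the integer mood score, and the assumption that the leftmost dot d1 is the oldest entry are illustrative choices not stated in the specification.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Determination:
    """One past voice determination (field names are hypothetical)."""
    mood: int          # assumed score: higher means a better mood
    comment: str       # message shown in step ST302
    expression: str    # cat expression shown in step ST303


class MoodHistory:
    """Keeps only the most recent determinations (ten by default)."""

    def __init__(self, size: int = 10) -> None:
        # A deque with maxlen discards the oldest entry automatically.
        self._items = deque(maxlen=size)

    def add(self, item: Determination) -> None:
        self._items.append(item)

    def dots(self) -> list:
        """Vertical positions of the dots, oldest first (assumed d1)."""
        return [item.mood for item in self._items]

    def select(self, cursor: int) -> Determination:
        """Return the determination under the cursor (0 is leftmost)."""
        return self._items[cursor]
```

With this structure, an eleventh determination pushes the first one off the screen, which matches a display restricted to the past ten results.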

As described above, when "translation mode" is selected and a cat's cry is detected, the cat's emotion is promptly analyzed and displayed on the screen. When partner mode is selected, a cat's cry is detected, and the user judges which emotion the cry expresses, the device determines from the user's judgment and the analysis result how well the user understands the cat's emotions, and thereby evaluates the user's understanding of the cat as a partner. The device therefore not only reveals the cat's emotion at that moment from its cry but also evaluates how well the user understands the cat, providing a tool that helps build a deeper partnership.

FIG. 1: Perspective view showing an example of the emotion discriminating device of the present invention.
FIG. 2: Block diagram showing the configuration of the emotion discriminating device.
FIG. 3: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "satisfaction" (one division on the horizontal axis is 0.2 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIG. 4: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "courtship" (one division on the horizontal axis is 0.1 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIG. 5: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "threat" (one division on the horizontal axis is 0.1 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIG. 6: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "self-expression" (one division on the horizontal axis is 0.05 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIG. 7: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "joy" (one division on the horizontal axis is 0.1 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIG. 8: "Time-frequency component relation map" showing a typical voice pattern corresponding to the emotion of "demand" (one division on the horizontal axis is 0.2 seconds, one division on the vertical axis is 500 Hz, and characteristic portions are circled).
FIGS. 9 to 13: Flowcharts explaining how the emotion discriminating device is used.
FIG. 14: (a) to (c) are display screens showing examples of the cat's emotion as shown on the display.
FIG. 15: (a) to (d) are display screens showing examples of screens displayed in partner mode.
FIG. 16: (a) to (f) are display screens showing examples of screens displayed in partner mode.

Explanation of symbols

1 Image display means
2 Conversion means
5 Operation means
6 Input voice pattern extraction means
7 Emotion-specific reference voice pattern storage means
8 Comparison means
9 Emotion discrimination means
10 Control means

Claims

A cat emotion discriminating device based on audio feature analysis of a cat's cry, which compares an input cat cry with emotion-specific reference voice patterns and displays the corresponding emotion discrimination result as an image and audio, the device comprising:
conversion means for converting the cat's cry into an electrical audio signal;
input voice pattern extraction means for extracting, as an input voice pattern, features of a relational map between time and frequency components of the audio signal;
emotion-specific reference voice pattern storage means for storing, for each of various emotions, an emotion-specific reference voice pattern representing features of the relational map between time and frequency components of the cry by which a cat characteristically expresses that emotion;
comparison means for comparing the input voice pattern with the emotion-specific reference voice patterns; and
control means for determining, by the comparison, the emotion with the highest correlation to the input voice pattern and displaying the determination result on image display means,
wherein the emotion-specific reference voice patterns are broadly classified into a satisfaction emotion, a courtship emotion, a threat emotion, a self-expression emotion, a joy emotion, and a demand emotion, and include at least one of the following:
for the satisfaction emotion, a reference voice pattern in which components are distributed in pulses at intervals of 0.03 seconds, with components near 500 Hz distributed especially strongly;
for the courtship emotion, a reference voice pattern in which the fundamental is around 250 Hz and harmonics appear up to around 6000 Hz, with components from 4000 to 6000 Hz distributed especially strongly;
for the threat emotion, a reference voice pattern in which components are distributed across the whole range from 0 to 8000 Hz and a white-noise-like sound lasts for about 0.8 seconds;
for the self-expression emotion, a reference voice pattern in which the fundamental frequency is about 500 Hz and a noise-like component appears about 0.2 seconds after the start of the sound, with components around 4000 Hz tending to be especially strong;
for the joy emotion, a reference voice pattern in which the fundamental frequency is 400 to 600 Hz and harmonics are distributed up to around 3200 Hz, with a slight up-and-down movement at the portion where the fundamental frequency is at its maximum;
for the demand emotion, a reference voice pattern in which the fundamental frequency is near 700 Hz, with components around 2000 to 4000 Hz distributed especially strongly.

The cat emotion discriminating device according to claim 1, further comprising storage means storing a game program and game data, and operation means operated by a user, wherein the control means compares the reference voice patterns stored in the emotion-specific reference voice pattern storage means with the input voice pattern extracted by the input voice pattern extraction means, determines whether the user's judgment is correct based on the cat emotion determined from the comparison result and the cat emotion that the user has judged and entered by operating the operation means, and displays the determination result on the display means.