JPH09106339A - Information processor and data storing method

Info

Publication number: JPH09106339A
Application number: JP7263001A
Authority: JP
Inventors: Masabumi Matsumura; 正文松村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-10-11
Filing date: 1995-10-11
Publication date: 1997-04-22

Abstract

PROBLEM TO BE SOLVED: To provide an information processor, and a data storing method for it, that select and store at input-processing time only the data containing a voice signal that satisfies a desired condition. SOLUTION: Various kinds of data including a voice signal are input by an input part 3 and held by a storage part 4 as temporary stored data 8. An analyzing part 5 performs a prescribed analysis on the voice signal in the temporary stored data 8 held by the storage part 4, and when the analysis result matches a preset condition, the data including this voice signal are stored in a recording file 11. The quantity of data to be stored can thus be greatly reduced.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] This invention relates to an information processing apparatus that captures and processes various kinds of data such as voice and images, and to a data storage method applied to the apparatus. In particular, it relates to an information processing apparatus and a data storage method that, at input-processing time, select from a redundant and huge volume of data only the data containing a voice signal that satisfies a desired condition, thereby greatly reducing the volume of data to be stored.

[0002]

[Prior Art] Conventionally, one way of recording voice or images is to record continuously, as an aircraft flight recorder does. With this method, however, the limited recording capacity means that recordings older than a certain time cannot be kept. Methods that take this capacity limit into account also exist, such as recording at fixed time intervals, or recording only when an infrared sensor detects a signal, as some security cameras do. With such methods, however, there is no guarantee that the desired recording has been kept.

[0003] Furthermore, when recording voice signals conventionally, it was not possible to select and record only the needed voice even when its gender or speaker was known in advance. When recording a conversation, for example, unneeded voice had to be recorded as well, and the recorded data became redundant and huge. When the data were later searched and played back, the only search key was the recording time, that is, the time axis, so search and playback required a great deal of labor. The same applies to images. The amount of image data is even larger than that of voice, and recognizing an image to decide whether to record it requires advanced technology, so the decision to record was made by providing an external detection device such as an infrared sensor.

[0004]

[Problems to be Solved by the Invention] As described above, when voice or images were recorded conventionally, the needed voice could not be selected and recorded even when its gender or speaker was known in advance, so all of the data had to be recorded. With such a method the recorded data tend to be redundant, and it is clearly inadequate as a recording method.

[0005] This invention was made in view of these circumstances. Its object is to provide an information processing apparatus and a data storage method that, when recording voice, greatly reduce the amount of recorded data by analyzing the voice and deciding whether to record it, and that, when recording images, can select whether to record an image by analyzing the voice recorded at the same time.

[0006]

[Means for Solving the Problems] This invention is characterized by comprising: input means for inputting various data including a voice signal; a buffer that temporarily holds the various data including the voice signal input by the input means; a secondary storage device that stores the various data including the voice signal; analysis means for performing a predetermined analysis on the voice signal held in the buffer; and storage means for storing the various data including the voice signal in the secondary storage device when the analysis result of the analysis means matches a preset condition.

[0007] This invention is also characterized by comprising: input means for inputting various data including a voice signal; a buffer that temporarily holds the various data including the voice signal input by the input means; a secondary storage device that stores the various data including the voice signal; analysis means for performing a predetermined analysis on the voice signal held in the buffer; and storage means for storing, when the analysis result of the analysis means matches a preset condition, the various data including the voice signal in the secondary storage device, going back a predetermined period from that point in time.

[0008] This invention is also characterized in that the analysis means includes at least one of analysis means for gender, speaker recognition, voice pitch and loudness, and word recognition. This invention is further characterized by comprising the steps of: inputting various data including a voice signal; temporarily storing the input data including the voice signal in a buffer; analyzing, in parallel with the storing of the data into the buffer, the voice signal already stored in the buffer; and storing the various data including the voice signal in a secondary storage device when the analysis result matches a preset condition.

[0009] According to this invention, various data such as voice and images are accumulated in the buffer as temporary storage data while the voice signal contained in the data is analyzed to determine the gender, the speaker, the pitch and loudness of the speaker's voice, the words spoken, and so on. When the determination result matches a preset condition, the data accumulated as temporary storage data are recorded, going back along the time axis.

[0010] That is, selecting the data to record at input-processing time makes it possible to greatly reduce the amount of data, and also lets the operator search for and play back the needed recorded data easily and efficiently.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the functional configuration of an information processing apparatus according to the embodiment. As shown in FIG. 1, the information processing apparatus 1 of the embodiment comprises the following processing units: a control unit 2, an input unit 3, a storage unit 4, an analysis unit 5, a search unit 6, a reproduction unit 7, and an output unit 15.

[0012] The control unit 2 controls the information processing apparatus 1 as a whole, and the input unit 3 inputs various signals such as voice and images. The storage unit 4 holds the signals input from the input unit 3 as temporary storage data 8. It also manages a personal dictionary file 9, in which information on the feature quantities of speakers' voices is registered; a word dictionary file 10, in which information on the acoustic feature quantities of words is registered; and a recording file 11 (voice file 12, voice/image file 13, log file 14), which stores the voice, image, and other signals selected by the apparatus.

[0013] The analysis unit 5 analyzes the voice signal held by the storage unit 4 as temporary storage data 8 and determines the gender, the speaker, the characteristics of the speaker's voice, and so on. The principles of determination used by the analysis unit 5 are outlined below.

(1) Gender: The pitch (the fundamental frequency of vocal cord vibration) and similar quantities are calculated. A voice signal whose pitch is at or above a predetermined threshold is judged to be female, and any other is judged to be male.
(2) Speaker: The FFT (fast Fourier transform), frequency cepstrum, PARCOR coefficients, and similar quantities are calculated and matched against the information registered in the personal dictionary file 9 to identify the speaker.
(3) Voice pitch and loudness: These are determined from the pitch and the amplitude of the voice signal.
(4) Speech (words): General speech analysis and recognition techniques such as FFT, filter analysis, linear prediction analysis, formants, cepstrum coefficients, and hidden Markov models are used, and the result is matched against the information registered in the word dictionary file 10 to identify the speech.
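The gender determination in item (1) of paragraph [0013] can be illustrated with a minimal sketch. The patent states only that pitch is compared with a predetermined threshold; the autocorrelation pitch estimator, the 60 to 400 Hz search range, the 165 Hz threshold, and all function names below are assumptions made for illustration, not details taken from the source.

```python
import math

def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    The search range fmin..fmax is an assumed range for speech pitch.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal with itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def classify_gender(samples, sample_rate, threshold_hz=165.0):
    """Pitch at or above the threshold -> 'female', otherwise 'male'.

    threshold_hz is a hypothetical value; the patent does not give one.
    """
    pitch = estimate_pitch_hz(samples, sample_rate)
    return "female" if pitch >= threshold_hz else "male"

def sine(freq_hz, sample_rate=8000, n=1600):
    """Synthetic test tone standing in for a voiced speech frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

print(classify_gender(sine(200), 8000))
print(classify_gender(sine(100), 8000))
```

Real speech is far less clean than a sine tone, so a practical analyzer would need voicing detection and smoothing over several frames; the sketch shows only the threshold comparison itself.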

[0014] The search unit 6 searches the recording file 11 managed by the storage unit 4 for the corresponding voice file 12 or voice/image file 13, using keys such as time, gender, speaker, and feature quantities of the speaker's voice. The reproduction unit 7 plays back the voice and image files that are found. The output unit 15 outputs a signal for driving and controlling a recording device externally connected to the information processing apparatus, together with the voice and image signals themselves.
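The key-based search of paragraph [0014] might look like the following sketch. The `RecordHeader` fields, the file names, and the `search` function are hypothetical; the patent says only that time, gender, speaker, and voice features can serve as search keys.

```python
from dataclasses import dataclass

@dataclass
class RecordHeader:
    # Hypothetical header fields; the source lists them only informally.
    file_name: str
    start_time: float   # seconds since the start of the session
    gender: str
    speaker: str

def search(headers, **criteria):
    """Return the headers whose fields equal every given criterion."""
    return [h for h in headers
            if all(getattr(h, key) == value for key, value in criteria.items())]

# Illustrative log contents (made-up entries).
log = [
    RecordHeader("rec001.wav", 12.0, "female", "alice"),
    RecordHeader("rec002.wav", 95.5, "male", "bob"),
    RecordHeader("rec003.wav", 140.0, "female", "carol"),
]

for hit in search(log, gender="female"):
    print(hit.file_name)
```

Because the attributes are stored when the data are recorded, the search becomes a filter over small header records rather than a scan of the audio along the time axis.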

[0015] Various signals including a voice signal are input sequentially from the input unit 3 and held by the storage unit 4 as temporary storage data 8. When the held temporary storage data 8 exceed a predetermined amount, the data are erased starting from the oldest and replaced by new input signals. In parallel with this holding operation, the analysis unit 5 sequentially analyzes the voice signal contained in the held input signals and extracts the feature quantities of the voice. The time for which data are temporarily held must therefore be longer than the time required for analysis. Voice and other data recorded as a file may, for example as shown in FIG. 2, be recorded with the voice information attached as a header to the voice data to be recorded (FIG. 2(a)). Alternatively, the voice data may be saved unchanged in an existing format (for example, the wave format of Windows, manufactured and sold by Microsoft) with the voice information recorded as a log file (FIG. 2(b)).
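The temporary storage behavior of paragraph [0015], where the oldest data are discarded once a fixed amount is exceeded, is what a ring buffer does. A minimal sketch using Python's `collections.deque` follows; the class name, the frame-based sizing, and the 1.5 safety margin are assumptions for illustration.

```python
import math
from collections import deque

class TemporaryStore:
    """Fixed-capacity buffer: the oldest frame is dropped when a new one arrives."""

    def __init__(self, capacity):
        # deque with maxlen discards items from the opposite end automatically.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        self._frames.append(frame)

    def snapshot(self):
        """Frames currently held, oldest first."""
        return list(self._frames)

def required_capacity(analysis_seconds, frames_per_second, margin=1.5):
    """Frames needed so that data outlive the analysis time (margin is assumed)."""
    return math.ceil(analysis_seconds * frames_per_second * margin)

store = TemporaryStore(capacity=3)
for frame in range(5):
    store.push(frame)
print(store.snapshot())
```

`required_capacity` reflects the constraint stated in the paragraph: the buffer has to hold data for longer than the analysis takes, otherwise the frames would be overwritten before the decision to record them is made.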

[0016] The temporal relationship between the voice signal input from the input unit 3 and the recorded voice signal is as shown in FIG. 3. Voice signals continue to be input even while the analysis unit 5 is analyzing the voice input from the input unit 3 and making its various determinations. The storage unit 4 therefore holds, as temporary storage data 8, the data covering the time that the analysis unit 5 needs for its determinations, and only when the determination result matches the selection condition is the signal from "back in time" recorded as the recording file 11. At that time, voice information such as the recording time and gender is also recorded in the recording file 11 as a header.
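The "back in time" selection of paragraph [0016] can be sketched as picking, from the buffered frames, those that fall within a look-back interval ending at the moment the match is reported. The timestamped-tuple representation and the function name are assumptions for illustration.

```python
def frames_to_record(buffered, match_time, lookback):
    """Select the buffered frames in [match_time - lookback, match_time].

    buffered: list of (timestamp, frame) pairs, oldest first.
    lookback: how far back to reach, e.g. the analysis latency.
    """
    start = match_time - lookback
    return [(t, f) for (t, f) in buffered if start <= t <= match_time]

# Ten buffered frames at one-second intervals (made-up data).
buffered = [(t, f"frame{t}") for t in range(10)]
selected = frames_to_record(buffered, match_time=8, lookback=3)
print([t for t, _ in selected])
```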

[0017] The operating procedure for deciding by gender whether to record an input voice signal is now described with reference to FIG. 4. When a voice signal is input from the input unit 3 (step A1), the storage unit 4 holds it as temporary storage data 8 (step A2). Meanwhile, the analysis unit 5 analyzes the temporary storage data 8 held in the storage unit 4 and determines the gender (steps A3 and A4). If the result matches the gender desired by the operator (Y in step A5), a voice information header containing the recording time, gender, and so on is created (step A6), and the voice signal held in the storage unit 4 as temporary storage data 8 is stored in the recording file 11 together with the created header (step A7). This processing is repeated until no temporary storage data 8 remain (step A8).
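The loop of FIG. 4 (steps A3 to A8) can be sketched as follows. The analyzer is passed in as a function because the patent does not fix its implementation; `fake_gender`, the dictionary header, and the list standing in for the recording file 11 are placeholders invented for this example.

```python
def select_by_gender(segments, analyze_gender, wanted_gender):
    """Keep only the segments whose analyzed gender matches the wanted one.

    segments: iterable of (start_time, samples) taken from temporary storage.
    Returns a list of (header, samples) pairs standing in for recording file 11.
    """
    recording_file = []
    for start_time, samples in segments:         # repeat until data run out (A8)
        gender = analyze_gender(samples)         # analyze and determine (A3, A4)
        if gender == wanted_gender:              # matches operator's choice? (A5)
            header = {"recorded_at": start_time, "gender": gender}   # (A6)
            recording_file.append((header, samples))                 # (A7)
    return recording_file

def fake_gender(samples):
    """Placeholder analyzer for the demo only; not a real classifier."""
    return "female" if samples[0] > 0 else "male"

segments = [(0.0, [0.3, 0.1]), (1.0, [-0.2, 0.4]), (2.0, [0.5])]
kept = select_by_gender(segments, fake_gender, "female")
print([header["recorded_at"] for header, _ in kept])
```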

[0018] In this way, only voice signals of the gender desired by the operator are selected and stored at input-processing time. The amount of data to be stored can therefore be greatly reduced, and search, playback, and other data processing can also be performed efficiently.

[0019] Next, the operating procedure for deciding by speaker whether to record an input voice signal is described with reference to FIG. 5. When a voice signal is input from the input unit 3 (step B1), the storage unit 4 holds it as temporary storage data 8 (step B2). Meanwhile, the analysis unit 5 analyzes the temporary storage data 8 held in the storage unit 4 (step B3) and extracts the feature quantities of the voice data (step B4). The extracted feature quantities are then compared with the information on the voice feature quantities of each person who has been set as a speaker to be recorded and registered in advance in the personal dictionary file 9 (step B5).

[0020] If the feature quantities of the extracted voice data match those of a speaker desired by the operator (Y in step B6), a voice information header containing the recording time, the speaker, and so on is created (step B7), and the voice signal held in the storage unit 4 as temporary storage data 8 is stored in the recording file 11 together with the created header (step B8). Whether the speaker has finished speaking is detected by the analysis unit 5 analyzing the voice signal input from the input unit 3 with the method described above. This processing is repeated until no temporary storage data 8 remain (step B9).

[0021] In this way, only the voice signals of the speaker desired by the operator are selected and stored at input-processing time. As before, the amount of data to be stored can therefore be greatly reduced, and search, playback, and other data processing can also be performed efficiently.

[0022] Next, the operating procedure for deciding by voice pitch or loudness whether to record an input voice signal is described with reference to FIG. 6. In this case, as a first stage, the speaker of the input voice signal is identified by the method described above (steps C5 to C7). When the speaker of the voice signal matches the speaker desired by the operator (Y in step C8), a "speaker match flag" is set. This is done because voice pitch and loudness differ from speaker to speaker. These steps are of course unnecessary when the decision to record is based on a uniform pitch or loudness regardless of individual differences (N in step C3). The personal data are assumed to be registered in the personal dictionary file 9 in advance. Voice is recorded only when this flag is set (when a speaker is specified) and the pitch or loudness of the voice exceeds (or falls below) the desired threshold. Recording continues until the pitch or loudness falls below (or exceeds) the threshold, or until the speaker changes (steps C9 to C11). This processing is repeated until no temporary storage data 8 remain (step C12).
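The start and stop behavior in paragraph [0022] can be sketched as a gate over per-frame analysis results: recording runs while the speaker condition (if one is specified) and the threshold conditions hold, and a run ends when a threshold is no longer exceeded or the speaker changes. The frame dictionaries, the field names, and the choice to require every threshold that is supplied are assumptions; only the "exceeds" direction is shown, not the "falls below" variant.

```python
def gate_frames(frames, wanted_speaker=None, min_pitch=None, min_level=None):
    """Return runs of frame indices that would be recorded.

    frames: list of dicts with 'speaker', 'pitch_hz', and 'level' keys
            (hypothetical per-frame analysis results).
    A condition left as None is not checked, which corresponds to the
    case where no particular speaker is specified.
    """
    runs, current = [], []
    for index, frame in enumerate(frames):
        speaker_ok = wanted_speaker is None or frame["speaker"] == wanted_speaker
        pitch_ok = min_pitch is None or frame["pitch_hz"] > min_pitch
        level_ok = min_level is None or frame["level"] > min_level
        if speaker_ok and pitch_ok and level_ok:
            current.append(index)       # keep recording
        elif current:
            runs.append(current)        # threshold lost or speaker changed
            current = []
    if current:
        runs.append(current)
    return runs

frames = [
    {"speaker": "alice", "pitch_hz": 210, "level": 0.2},
    {"speaker": "alice", "pitch_hz": 220, "level": 0.8},
    {"speaker": "alice", "pitch_hz": 230, "level": 0.9},
    {"speaker": "bob",   "pitch_hz": 120, "level": 0.9},
    {"speaker": "alice", "pitch_hz": 215, "level": 0.7},
]
print(gate_frames(frames, wanted_speaker="alice", min_level=0.5))
```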

[0023] In this way, only voice signals having the loudness or pitch desired by the operator (from a specified or an unspecified speaker) are selected and stored at input-processing time. As before, the amount of data to be stored can therefore be greatly reduced, and search, playback, and other data processing can also be performed efficiently.

[0024] Next, the operating procedure for deciding by spoken word whether to record an input voice signal is described with reference to FIG. 7. In this case, the operator first sets the start point and end point of recording (step D1). For example, recording may run from 5 minutes before the word "dorobo" (thief) is recognized (start point) until 3 minutes after it is recognized (end point).
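The start and end points in paragraph [0024] define a time window around the moment the word is recognized. The sketch below computes that window using the 5-minute and 3-minute values from the example in the text; the function names and the merging of overlapping windows are additions made for illustration.

```python
def recording_window(recognized_at, before=300.0, after=180.0):
    """Window from `before` seconds ahead of recognition to `after` seconds past it.

    Times are in seconds from the start of input; the start is clamped at 0.
    """
    return (max(0.0, recognized_at - before), recognized_at + after)

def merge_windows(windows):
    """Merge overlapping windows so the same audio is not stored twice (assumed behavior)."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(recording_window(600.0))
print(merge_windows([recording_window(600.0),
                     recording_window(700.0),
                     recording_window(2000.0)]))
```

A start point 5 minutes before recognition implies that the temporary storage must hold at least 5 minutes of data in addition to the analysis time.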

[0025] When a voice signal is input from the input unit 3 (step D2), the storage unit 4 holds it as temporary storage data 8 (step D3). Meanwhile, the analysis unit 5 analyzes the temporary storage data 8 held in the storage unit 4 (step D4) and extracts the feature quantities of the voice data (step D5). The extracted feature quantities are then compared with the information on the voice feature quantities of each word that has been set as a word to be recorded and registered in advance in the word dictionary file 10 (step D6).

[0026] If the feature quantities of the extracted voice data match those of a word desired by the operator (Y in step D7), a voice information header containing the recording time, the start and end points, and so on is created (step D8), and the voice signal held in the storage unit 4 as temporary storage data 8 is stored in the recording file 11 together with the created header (step D9). This processing is repeated until no temporary storage data 8 remain (step D10).

[0027] In this way, only the voice signals before and after the utterance of a word desired by the operator are selected and stored at input-processing time. As before, the amount of data to be stored can therefore be greatly reduced, and search, playback, and other data processing can also be performed efficiently.

[0028] Next, the operating procedure for deciding by gender and word whether to record an input voice signal is described with reference to FIG. 8. In this case, a "gender match flag" is first set (step E7) when the determination described for FIG. 4 finds that the voice signal matches the gender desired by the operator (steps E5 to E6, Y). Voice is recorded only when this flag is set and, in addition, the determination described for FIG. 7 recognizes a desired word in the voice signal (steps E8 to E10, Y) so that a "word match flag" is set (step E11). Recording continues over the range between the start point and end point set by the operator, or until the gender changes.
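The combined determination of paragraph [0028] amounts to requiring that every selected condition holds, which the flags express as a logical AND. A minimal sketch with composable predicates follows; the segment dictionary and the helper names are hypothetical.

```python
def gender_is(wanted):
    """Condition corresponding to the gender match flag."""
    return lambda segment: segment["gender"] == wanted

def contains_word(word):
    """Condition corresponding to the word match flag."""
    return lambda segment: word in segment["words"]

def should_record(segment, conditions):
    """Record only when all conditions (flags) are satisfied."""
    return all(condition(segment) for condition in conditions)

# Made-up analysis result for one buffered segment.
segment = {"gender": "female", "words": ["help", "dorobo"]}
print(should_record(segment, [gender_is("female"), contains_word("dorobo")]))
```

Other combinations, such as speaker plus word, would follow the same pattern by adding or swapping predicates in the list.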

[0029] At this time, voice information such as the recording time, gender, spoken word, and the set start and end points of recording is also recorded in the file as a header. This narrows down the data to be recorded even further, and the amount of data to be stored can be greatly reduced.

[0030] Next, the operating procedure for deciding by speaker and word whether to record an input voice signal is described with reference to FIG. 9. In this case, a "speaker match flag" is first set (step F8) when the determination described for FIG. 5 finds that the voice signal matches that of the speaker desired by the operator (steps F6 to F7, Y). Voice is recorded only when this flag is set and, in addition, the determination described for FIG. 7 recognizes a desired word in the voice signal (steps F9 to F10, Y) so that a "word match flag" is set (step F11). Recording continues over the range between the start point and end point set by the operator, or until the speaker changes.

[0031] At this time, voice information such as the recording time, speaker, spoken word, and the set start and end points of recording is also recorded in the file as a header. As before, this narrows down the data further, and the amount of data to be stored can be greatly reduced.

[0032] Next, the operating procedure for deciding by the voice pitch or loudness of an arbitrary speaker, and by word, whether to record an input voice signal is described with reference to FIG. 10. In this case, a "voice pitch/loudness flag" is first set (step G11) when the determination described for FIG. 5 finds that the voice signal matches that of the speaker desired by the operator (Y in step G9) and the determination described for FIG. 6 recognizes the voice loudness or pitch desired by the operator (Y in step G10). Voice is recorded only when this flag is set and, in addition, the determination described for FIG. 7 recognizes a desired word in the voice signal so that a "word recognition flag" is set (step G16). Recording continues over the range between the start point and end point set by the operator, until the speaker changes, or until the pitch or loudness of the voice falls below (or exceeds) the threshold.

[0033] At this time, voice information on the recording time, speaker, spoken word, the set start and end points of recording, and the pitch and loudness of the voice is also recorded in the file as a header. As before, this narrows down the data further, and the amount of data to be stored can be greatly reduced.

[0034] As described above, when a voice signal is recorded, it is analyzed immediately at input-processing time to determine the gender, the speaker, the pitch and loudness of the voice, the speech (words), and so on, and only the voice desired by the operator is recorded. This makes it possible to greatly reduce the amount of data to be stored. Furthermore, if an interface such as that shown in FIG. 11 is provided, the operator can identify the voice visually and can search for and play back the needed recorded data easily and efficiently simply by operating the buttons 21 to 25.

[0035] Although this embodiment has mainly described the handling of voice data, it can also be applied, for example, to image data that include a voice signal, by analyzing the voice signal to decide whether to record the image data themselves.

[0036] When images and voice are recorded on an external recording device such as a video camera, another possible application is to input the voice and image signals output from the camera through the input unit 3 of the apparatus, select the data to be recorded by the procedure described above, and output from the output unit 15 the voice/image signals together with a signal for operating the external recording device. In many cases an external recording device such as a video recorder records voice and image signals as analog signals, so the output unit 15 performs digital-to-analog conversion before output. By feeding only the necessary portions back into the video camera through the output unit 15 of the apparatus, consumption of recording media such as tape can be kept down.

[0037]

[Effects of the Invention] As described in detail above, according to the present invention, when voice is recorded, the voice is first analyzed and its gender, speaker, spoken words, or the like are recognized in order to select whether to record it, and only what has been selected is recorded. The amount of recorded data can thus be greatly reduced, and the operator's labor during search and playback is also reduced.

[0038] Similarly, when images are recorded, for example, the voice input at the same time is analyzed as described above to select whether to record. Recording can thus be selected without recognizing the image, the amount of data at recording time can be greatly reduced, and the operator's labor during search and playback is also reduced.

[0039] Furthermore, when voice and images are recorded using an external recording device such as a video camera, the image and voice signals are input to this apparatus and the voice is analyzed; the apparatus outputs a signal only after selecting the voice and images that the operator wishes to record. The amount of audio and image data recorded on the external recording device can thereby be significantly reduced. In addition, because the log recorded on this apparatus remains as information, the operator's effort during search and playback is reduced.
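How a retained log can ease later search may be sketched as follows. This is only an illustration: the log format is an assumption (the fields `start_s`, `end_s`, `speaker`, and `words` are hypothetical), as is the helper `search_log`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record written when a segment is selected for recording."""
    start_s: float   # start time of the recorded segment, in seconds
    end_s: float     # end time of the recorded segment, in seconds
    speaker: str     # speaker recognized by the analysis
    words: tuple     # words recognized in the segment


def search_log(log, speaker=None, word=None):
    """Return (start, end) time ranges of entries matching the given criteria."""
    hits = []
    for entry in log:
        if speaker is not None and entry.speaker != speaker:
            continue
        if word is not None and word not in entry.words:
            continue
        hits.append((entry.start_s, entry.end_s))
    return hits


# Illustrative data only.
log = [
    LogEntry(12.0, 18.5, "alice", ("hello",)),
    LogEntry(40.0, 47.0, "bob", ("takeoff", "clear")),
    LogEntry(90.0, 95.0, "bob", ("landing",)),
]
bob_ranges = search_log(log, speaker="bob")
```

The operator can then jump straight to the returned time ranges on the external recording medium instead of reviewing the whole recording.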

[Brief description of the drawings]

FIG. 1 is a block diagram showing a schematic configuration of an information processing apparatus according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram showing a voice recording method according to the embodiment.

FIG. 3 is a diagram showing the temporal relationship between the input audio signal and the recorded audio signal according to the embodiment.

FIG. 4 is a flowchart illustrating the operation procedure for determining, based on gender, whether or not to record an input audio signal according to the embodiment.

FIG. 5 is a flowchart illustrating the operation procedure for determining, based on the speaker, whether or not to record an input audio signal according to the embodiment.

FIG. 6 is a flowchart illustrating the operation procedure for determining, based on voice pitch or voice volume, whether or not to record an input audio signal according to the embodiment.

FIG. 7 is a flowchart illustrating the operation procedure for determining, based on the spoken words, whether or not to record an input audio signal according to the embodiment.

FIG. 8 is a flowchart illustrating the operation procedure for determining, based on gender and words, whether or not to record an input audio signal according to the embodiment.

FIG. 9 is a flowchart illustrating the operation procedure for determining, based on the speaker and words, whether or not to record an input audio signal according to the embodiment.

FIG. 10 is a flowchart illustrating the operation procedure for determining, based on the voice pitch and voice volume of an arbitrary speaker and on words, whether or not to record an input audio signal according to the embodiment.

FIG. 11 is a diagram showing the interface that the information processing apparatus of the embodiment provides to the operator.

[Explanation of symbols]

1 ... information processing apparatus, 2 ... control unit, 3 ... input unit, 4 ... storage unit, 5 ... analysis unit, 6 ... search unit, 7 ... playback unit, 8 ... temporarily stored data, 9 ... personal dictionary file, 10 ... word dictionary file, 11 ... recording file, 12 ... audio file, 13 ... audio/image file, 14 ... log file, 15 ... output unit.

Claims

[Claims]

1. An information processing apparatus comprising: an input unit for inputting various data including an audio signal; a buffer for temporarily holding the various data including the audio signal input by the input unit; a secondary storage device for storing the various data including the audio signal; an analysis unit that performs a predetermined analysis on the audio signal held in the buffer; and a storage unit that stores the various data including the audio signal in the secondary storage device when the analysis result of the analysis unit matches a preset condition.

2. The information processing apparatus according to claim 1, wherein, when the analysis result of the analysis unit matches the preset condition, the storage unit stores in the secondary storage device the various data including the audio signal, going back a predetermined period from that point in time.

3. The information processing apparatus according to claim 1, wherein the analysis unit performs at least one of gender recognition, speaker recognition, voice pitch analysis, voice level analysis, and word recognition.

4. A data storage method comprising: a step of inputting various data including an audio signal; a step of temporarily storing the various data including the input audio signal in a buffer; a step of analyzing the audio signal already stored in the buffer, in parallel with the storing of the various data including the audio signal into the buffer; and a step of storing the various data including the audio signal in a secondary storage device when the analysis result matches a preset condition.
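
The buffering and look-back storage recited in claims 2 and 4 can be illustrated with a minimal sketch. It rests on several assumptions: the function name `record_with_lookback`, the frame-based representation, and the `matches` callback are all hypothetical; the look-back period is expressed as a number of frames; and analysis runs sequentially after each frame is buffered, whereas the claimed method performs it in parallel with buffering.

```python
from collections import deque


def record_with_lookback(frames, matches, lookback):
    """Store frames around each match, going back `lookback` frames.

    frames:   iterable of input frames (audio, or audio plus image data)
    matches:  callable returning True when the analysis result for a frame
              satisfies the preset condition (a stand-in for the analysis step)
    lookback: number of frames before the matching frame to store as well
    """
    # Temporary buffer holding only the most recent frames, with their indices.
    buffer = deque(maxlen=lookback + 1)
    stored = []          # stands in for the secondary storage device
    last_stored = -1     # index of the newest frame already stored

    for index, frame in enumerate(frames):
        buffer.append((index, frame))
        if matches(frame):
            # Copy the buffered frames to storage, skipping any frame that an
            # earlier match already stored.
            for i, f in buffer:
                if i > last_stored:
                    stored.append(f)
                    last_stored = i
    return stored


# Illustrative data only: "X" marks a frame whose analysis matches the condition.
result = record_with_lookback(list("abcXdefgX"), lambda f: f == "X", lookback=2)
```

A fixed-length `deque` discards the oldest frame automatically, so frames that never fall within the look-back window of a match are dropped without reaching storage.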