JP2007189398A - Voice recording device

Info

Publication number: JP2007189398A
Application number: JP2006004666A
Authority: JP
Inventors: Masahiro Hayashi; 正博林; Masanori Saito; 将徳斉藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2006-01-12
Filing date: 2006-01-12
Publication date: 2007-07-26

Abstract

PROBLEM TO BE SOLVED: To provide a voice recording device capable of recording, on a recording medium, the call content that the user judges to need recording, without troublesome operation.

SOLUTION: A buffer memory is provided that sequentially writes and stores voice data corresponding to the voice of a call from the start of the call. In response to a recording start command, the voice data stored in the buffer memory are read out and recorded on a recording medium. When the call ends, the entire content of the buffer memory is rewritten to an initial value. With this configuration, even if the user performs the recording start operation after the call has started, all of the conversation from the start of the call can be recorded on the recording medium.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a voice recording device that records the voice of a telephone call on a recording medium.

In recent years, communication systems have been built that enable voice calls between telephones using the Internet Protocol. As a telephone used in such a communication system, the so-called softphone, implemented in software on a personal computer (hereinafter referred to as a PC), is known. Furthermore, softphones having a recording function for recording the voice of a call have been proposed (see, for example, Patent Document 1).

A softphone with such a recording function records the voice of a call on a recording medium, such as a hard disk, in response to a recording start command from the user.

However, if the user starts recording only after the call has begun and its content has shown that recording is needed, the conversation that needs to be recorded cannot be captured in full from its beginning.

One conceivable approach is therefore to always start recording at the same time as the call and, after the call ends, to play back the recorded voice so that the user can decide whether the recording is needed.

With this approach, however, the user must listen to the recording even when it is long. In addition, to keep free space on the hard disk, the user must delete unneeded recording data, and that operation is troublesome.
Patent Document 1: JP 2005-057511 A

An object of the present invention is to provide a voice recording device capable of recording, on a recording medium, call content that the user has judged to need recording, without troublesome operation.

A voice recording device according to the present invention records voice data corresponding to the voice of a call on a recording medium, and comprises: a buffer memory that sequentially writes and stores the voice data from the start of the call; and recording control means that, in response to a recording start command, reads the voice data stored in the buffer memory in the order in which they were written and records the voice data read from the buffer memory on the recording medium, and that, in response to the end of the call, rewrites all of the stored content of the buffer memory to a predetermined initial value.

According to the voice recording device of the present invention, even when the user performs the recording start operation after the call has started, the entire call content from the start of the call can be recorded on the recording medium.

A buffer memory is provided that sequentially writes and stores voice data from the start of the call. In response to a recording start command, the voice data stored in this buffer memory are read out and recorded on the recording medium, and when the call ends, the entire content of the buffer memory is rewritten to the initial value.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 shows part of the configuration of a personal computer on which a softphone including a voice recording device according to the present invention is built.

As shown in FIG. 1, a softphone 1 is implemented in software on this personal computer (hereinafter referred to as a PC).

Connected to the PC are a headset 2 serving as the telephone handset, an IP (Internet Protocol) network 3 serving as the telephone line, a display device 4, and an operation device 5 that accepts various operations from the user. The PC is further equipped with a buffer memory 6 consisting of, for example, a RAM (Random Access Memory), and a recording memory 7 consisting of a hard disk device.

The headset 2 has a built-in microphone that converts the voice uttered by the user into a voice signal, and a built-in speaker that acoustically outputs the voice signal from the other party.

The operation device 5 accepts a call start command, telephone number input, a call request command, a call end command, and a recording start command from the user, and supplies a signal corresponding to each command to the softphone 1. That is, when the user, to start a call with the softphone 1, enters a telephone number and performs a call request command operation on the operation device 5, the operation device 5 supplies the softphone 1 with a telephone number signal indicating the entered digits together with a call request command signal. When the user performs a call start command operation on the operation device 5 to start a call in response to a call request from the other party, the operation device 5 supplies a call start command signal to the softphone 1. When the user performs a recording start command operation to record the voice of the call in progress, the operation device 5 supplies a recording start command signal to the softphone 1. When the user performs a call end command operation to end the call, the operation device 5 supplies a call end command signal to the softphone 1.

The softphone 1 consists of a telephone transmission/reception processing unit 11, a recording control unit 12, a voice text conversion processing unit 13, and a call content monitor image generation unit 14, all implemented in software.

When the telephone transmission/reception processing unit 11 receives a call request packet via the IP network 3, it supplies the display device 4 with an image signal that notifies the user of a call request from the sender (the other party) indicated by that packet. When the user then performs a call start command operation on the operation device 5, the operation device 5 supplies a call start command signal to the telephone transmission/reception processing unit 11 and the recording control unit 12. After this operation, the user starts talking with the other party using the headset 2. During this time, the telephone transmission/reception processing unit 11 determines whether a voice signal has been supplied from the headset 2 and, if so, converts the voice signal into voice packets as voice data and sends them out on the IP network 3. As a result, voice packets corresponding to the voice the user utters into the microphone of the headset 2 are transmitted via the IP network 3 to the sender (the other party) that made the call request. Furthermore, the telephone transmission/reception processing unit 11 supplies the voice packets corresponding to the user's voice, as transmitted voice packets P_TX, to the buffer memory 6, the recording memory 7, the recording control unit 12, the voice text conversion processing unit 13, and the call content monitor image generation unit 14.

When the user enters the subscriber telephone number of the other party and performs a call request command operation on the operation device 5, the operation device 5 supplies a telephone number signal indicating that number, together with a call request command signal, to the telephone transmission/reception processing unit 11 and the recording control unit 12. In response to the telephone number signal and the call request command signal, the telephone transmission/reception processing unit 11 generates a call request packet containing destination address information corresponding to the telephone number and sends it out on the IP network 3. Thereafter, when voice packets as voice data are received via the IP network 3, the telephone transmission/reception processing unit 11 demodulates the received voice packets into a voice signal and supplies it to the headset 2. As a result, the voice based on the voice packets transmitted from the other party is acoustically output from the speaker of the headset 2. Furthermore, the telephone transmission/reception processing unit 11 supplies the voice packets received in this way, as received voice packets P_RX, to the buffer memory 6, the recording memory 7, the recording control unit 12, the voice text conversion processing unit 13, and the call content monitor image generation unit 14.

In response to the write signal WR1 supplied from the recording control unit 12, the buffer memory 6 sequentially writes and stores the received voice packets P_RX and the transmitted voice packets P_TX supplied from the telephone transmission/reception processing unit 11. When a read signal RD is supplied from the recording control unit 12 during this time, the buffer memory 6 reads the stored content (P_RX or P_TX) in the order in which it was written and supplies it to the recording memory 7. If all of the stored content (P_RX or P_TX) has already been read, the buffer memory 6 supplies an empty signal EP to the recording control unit 12. When an erase signal ER is supplied from the recording control unit 12, the buffer memory 6 deletes its stored content by rewriting all of it to an initial value (for example, 0).
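The behavior of the buffer memory 6 under the four signals can be sketched as follows. This is only an illustrative model: the class name `BufferMemory`, its method names, the fixed `capacity`, and the overflow handling are assumptions, since the source does not specify the buffer size or what happens when it fills.

```python
class BufferMemory:
    """Hypothetical model of buffer memory 6; all names are illustrative."""

    def __init__(self, capacity=8):
        self.cells = [0] * capacity   # 0 stands in for the initial value
        self.write_pos = 0            # next cell to be written
        self.read_pos = 0             # next cell to be read

    def write(self, packet):
        """WR1: store one voice packet (P_RX or P_TX) in arrival order."""
        if self.write_pos >= len(self.cells):
            # Assumption: the source does not say how a full buffer is handled.
            raise OverflowError("buffer full")
        self.cells[self.write_pos] = packet
        self.write_pos += 1

    @property
    def empty(self):
        """EP: true when every stored packet has already been read."""
        return self.read_pos >= self.write_pos

    def read(self):
        """RD: return the oldest packet that has not been read yet."""
        if self.empty:
            raise LookupError("nothing left to read")
        packet = self.cells[self.read_pos]
        self.read_pos += 1
        return packet

    def erase(self):
        """ER: rewrite every cell to the initial value."""
        self.cells = [0] * len(self.cells)
        self.write_pos = 0
        self.read_pos = 0
```

Reading advances a pointer instead of removing data, so the stored packets stay in place until `erase()` overwrites them, which mirrors the distinction the text draws between "already read" (EP) and "deleted" (ER).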

When a recording write signal WRR1 is supplied from the recording control unit 12, the recording memory 7 takes in the received voice packets P_RX or transmitted voice packets P_TX read from the buffer memory 6 and stores them sequentially. When a recording write signal WRR2 is supplied from the recording control unit 12, on the other hand, the recording memory 7 directly takes in the received voice packets P_RX or transmitted voice packets P_TX supplied from the telephone transmission/reception processing unit 11 and stores them sequentially. When a playback signal RP is supplied from the recording control unit 12, the recording memory 7 reads the stored received voice packets P_RX or transmitted voice packets P_TX and supplies them to the recording control unit 12 as playback voice packets. The recording control unit 12 then supplies the playback voice packets to the telephone transmission/reception processing unit 11, which demodulates them into a voice signal and supplies it to the headset 2. As a result, the voice corresponding to the received voice packets P_RX or transmitted voice packets P_TX stored in the recording memory 7 is acoustically output from the speaker built into the headset 2. In other words, the call voice recorded in the recording memory 7 is played back.

The voice text conversion processing unit 13 first performs voice recognition processing on the received voice packets P_RX and the transmitted voice packets P_TX, thereby converting the voice represented by these packets into text data that represents it as a character string, and supplies the text data to the call content monitor image generation unit 14. That is, the voice text conversion processing unit 13 supplies the call content monitor image generation unit 14 with received text data T_RX, which represents the voice of the received voice packets P_RX as a character string, and transmitted text data T_TX, which represents the voice of the transmitted voice packets P_TX as a character string.

The call content monitor image generation unit 14 takes in the received voice packets P_RX, the transmitted voice packets P_TX, the received text data T_RX, and the transmitted text data T_TX described above, each individually and in sequence. Based on this information, it generates an image signal representing a call content monitor image which, as shown in FIG. 2, has a time axis display unit TM, an other-party utterance time zone display unit TZ_PA, an own-side utterance time zone display unit TZ_SF, and a voice text display unit TXT.

That is, the call content monitor image generation unit 14 extracts, from the header information of each received voice packet P_RX and transmitted voice packet P_TX it has taken in, time information representing the call start time and the current time. Based on this time information, it generates an image signal representing the time axis display unit TM shown in FIG. 2, which shows the span from the call start time to the current time, including a plurality of reference times, on a horizontal (or vertical) axis. Based on the transmitted voice packets P_TX, the call content monitor image generation unit 14 also generates an image signal representing the own-side utterance time zone display unit TZ_SF shown in FIG. 2, in which band-like regions BE (shown hatched), each indicating a section in which the user spoke, are arranged in time series along the time axis shown in the time axis display unit TM. Likewise, based on the received voice packets P_RX, it generates an image signal representing the other-party utterance time zone display unit TZ_PA shown in FIG. 2, in which band-like regions BE (shown hatched), each indicating a section in which the other party spoke, are arranged in time series along the same time axis. Furthermore, for each band-like region BE in the own-side utterance time zone display unit TZ_SF and the other-party utterance time zone display unit TZ_PA, the call content monitor image generation unit 14 extracts, from the received text data T_RX or the transmitted text data T_TX, the text data based on the N voice packets (N is a natural number) at the head of that region. It then generates an image signal representing the voice text display unit TXT, in which the character strings indicated by the extracted text data are arranged in association with the respective band-like regions BE, as shown in FIG. 2. In other words, for each utterance time zone, text data for n characters (n is a natural number) corresponding to the voice at its head is extracted from the received text data T_RX or the transmitted text data T_TX and displayed in the voice text display unit TXT.
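The grouping of packets into band-like regions BE, with a short text excerpt taken from the head of each region, might look roughly like the sketch below. It rests on several assumptions that the source does not state: packets are represented as `(timestamp, side, text)` tuples with the recognized text already attached, a packet exists only while someone is speaking, and a silence longer than `max_gap` seconds closes a region. The names `Band`, `build_bands`, `n_head`, and `max_gap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Band:
    side: str     # "TX" for the own side (TZ_SF), "RX" for the other party (TZ_PA)
    start: float  # seconds since the start of the call
    end: float
    text: str     # text recognized from the first N packets of the band


def build_bands(packets, n_head=2, max_gap=0.5):
    """Group time-ordered (timestamp, side, text) packets into utterance bands.

    n_head  : number of leading packets whose text is kept (N in the description)
    max_gap : assumed silence threshold in seconds; not given in the source
    """
    bands = []
    open_band = {}  # side -> index in `bands` of that side's most recent band
    counts = {}     # side -> number of packets in that side's most recent band
    for ts, side, text in packets:
        idx = open_band.get(side)
        if idx is not None and ts - bands[idx].end <= max_gap:
            # Same utterance continues: extend the band.
            band = bands[idx]
            band.end = ts
            counts[side] += 1
            if counts[side] <= n_head:
                band.text += text  # only the head of the band contributes text
        else:
            # First packet from this side, or a long silence: start a new band.
            bands.append(Band(side, ts, ts, text))
            open_band[side] = len(bands) - 1
            counts[side] = 1
    return bands
```

Each resulting `Band` carries what the monitor image needs for one region: which row it belongs to, where it sits on the time axis, and the excerpt to show next to it in the voice text display unit TXT.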

The call content monitor image generation unit 14 supplies the display device 4 with the image signals corresponding to the time axis display unit TM, the other-party utterance time zone display unit TZ_PA, the own-side utterance time zone display unit TZ_SF, and the voice text display unit TXT described above. In response to these image signals, the display device 4 displays the call content monitor image shown in FIG. 2.

The call content monitor image thus lets the user take in, at a glance, the call content from the start of the call to the present.

After the call start command signal or the call request command signal has been supplied from the operation device 5, the recording control unit 12 starts the recording control shown in FIG. 3 once the exchange of voice packets with the other party begins.

In FIG. 3, the recording control unit 12 first supplies the write signal WR described above to the buffer memory 6 repeatedly at a predetermined period (step S0). As a result of step S0, the buffer memory 6 sequentially writes and stores the received voice packets P_RX and the transmitted voice packets P_TX supplied from the telephone transmission/reception processing unit 11. In other words, from the start of the call, the buffer memory 6 stores the received voice packets P_RX and transmitted voice packets P_TX corresponding to the call voice. Next, the recording control unit 12 determines whether the recording start command signal has been supplied from the operation device 5 (step S1). If it is determined in step S1 that the recording start command signal has been supplied, the recording control unit 12 determines whether the empty signal EP has been supplied from the buffer memory 6 (step S2). If it is determined in step S2 that the empty signal EP has not been supplied, that is, if voice packets that have not yet been read remain in the buffer memory 6, the recording control unit 12 supplies the read signal RD to the buffer memory 6 repeatedly at a predetermined period while supplying the recording write signal WRR1 to the recording memory 7 (step S3). As a result of step S3, the received voice packets P_RX and transmitted voice packets P_TX stored in the buffer memory 6 are read out and stored in the recording memory 7. In other words, the received voice packets P_RX and transmitted voice packets P_TX corresponding to the call voice from the start of the call up to the present are read sequentially from the buffer memory 6 and stored in the recording memory 7. If, on the other hand, it is determined in step S2 that the empty signal EP has been supplied, that is, if all the voice packets stored in the buffer memory 6 have already been read, the recording control unit 12 supplies the recording write signal WRR2 to the recording memory 7 repeatedly at a predetermined period (step S4). As a result of step S4, the received voice packets P_RX and transmitted voice packets P_TX sent from the telephone transmission/reception processing unit 11 are stored directly in the recording memory 7.

After step S3 or S4 has been executed, or if it is determined in step S1 that the recording start command signal has not been supplied, the recording control unit 12 determines whether the call end command signal has been supplied from the operation device 5 (step S5). If it is determined in step S5 that the call end command signal has not been supplied, the recording control unit 12 returns to step S1 and repeats the operation described above. If, on the other hand, it is determined in step S5 that the call end command signal has been supplied, the recording control unit 12 supplies the erase signal ER to the buffer memory 6 (step S6). As a result of step S6, the buffer memory 6 deletes its stored content by rewriting all of it to the initial value (for example, 0).
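The flow of steps S0 to S6 can be illustrated with a small simulation. This is a simplified sketch, not the control logic of the embodiment itself: the function name `run_recording_control`, the per-cycle `ticks` script, and the use of Python lists for the two memories are all assumptions. Two further simplifications are labeled in the comments: the backlog is drained in a single cycle rather than one packet per read period, and a packet recorded directly in step S4 is counted as already read in the buffer, because the source does not say how double recording is avoided.

```python
def run_recording_control(ticks):
    """Simulate the S0-S6 loop over a scripted call.

    ticks: list of dicts, one per control cycle, with optional keys
        "packet" : voice packet (P_RX or P_TX) arriving in this cycle
        "record" : True when the recording start command is given
        "hangup" : True when the call end command is given
    Returns (recording_memory, buffer_memory_after_the_call).
    """
    buffer = []               # stands in for buffer memory 6
    read_pos = 0              # packets before this index have been read
    recording = []            # stands in for recording memory 7
    record_requested = False

    for tick in ticks:
        # S1: has the recording start command been given, now or earlier?
        if tick.get("record"):
            record_requested = True

        # S2/S3: EP is not asserted, so copy the unread backlog.
        # Assumption: the whole backlog is drained within one cycle.
        if record_requested and read_pos < len(buffer):
            recording.extend(buffer[read_pos:])
            read_pos = len(buffer)

        packet = tick.get("packet")
        if packet is not None:
            buffer.append(packet)          # S0: the buffer is always written
            if record_requested:
                recording.append(packet)   # S4: direct write (WRR2)
                read_pos = len(buffer)     # assumption: mark it as read

        # S5/S6: on call end, rewrite the buffer to the initial value.
        if tick.get("hangup"):
            buffer = [0] * len(buffer)
            break

    return recording, buffer


script = [
    {"packet": "p1"},
    {"packet": "p2"},
    {"packet": "p3", "record": True},   # user decides to record mid-call
    {"packet": "p4"},
    {"hangup": True},
]
print(run_recording_control(script))
# (['p1', 'p2', 'p3', 'p4'], [0, 0, 0, 0])
```

Although the recording command arrives only with the third packet, the recording still begins with `p1`, and the buffer holds nothing but initial values once the call is over.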

With this recording control, even when the user performs the recording start operation partway through a call, the call content is recorded in the recording memory 7 from the start of the call.

Accordingly, the user can check the call content from the start of the call to the present on the call content monitor image shown in FIG. 2 and perform the recording start operation only when the content is judged worth keeping, and the entire call content from the start of the call will still be recorded.

In the above embodiment, the content of the buffer memory 6 is stored in the recording memory 7 immediately in response to the recording start command operation (steps S1 to S3). Alternatively, the content of the buffer memory 6 may be stored in the recording memory 7 immediately after the call ends, that is, at the point between steps S5 and S6 in FIG. 3.
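A minimal sketch of this variation, under the assumption that the buffer and the recording memory are modeled as Python lists and that a flag remembers whether the user asked for recording at any point during the call (the names `end_call` and `record_requested` are hypothetical):

```python
def end_call(buffer, recording, record_requested):
    """Handle the end of a call in the deferred-copy variation.

    Between S5 and S6 the buffered packets are copied to the recording
    memory, but only if recording was requested during the call.
    """
    if record_requested:
        recording.extend(buffer)       # copy the whole call in one pass
    buffer[:] = [0] * len(buffer)      # S6: rewrite to the initial value
    return recording
```

In this form nothing is written to the recording memory while the call is in progress, at the cost of the buffer having to hold the entire call.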

Also, in the above embodiment, the call content monitor image shown in FIG. 2 is generated during the call from the received voice packets P_RX and transmitted voice packets P_TX sent from the telephone transmission/reception processing unit 11, but it may instead be generated from the voice packets played back from the recording memory 7.

Further, although the above embodiment describes the case where the voice recording device according to the present invention is applied to a softphone, it is equally applicable to an analog telephone.

FIG. 1 is a diagram showing the schematic configuration of a personal computer on which a softphone provided with the voice recording device according to the present invention is built.
FIG. 2 is a diagram showing an example of the call content monitor image.
FIG. 3 is a diagram showing the flow of the recording control performed in the recording control unit 12.

Brief description of symbols

1 Softphone
2 Headset
3 IP network
4 Display device
5 Operation device
6 Buffer memory
7 Recording memory
11 Telephone transmission/reception processing unit
12 Recording control unit
13 Voice text conversion processing unit
14 Call content monitor image generation unit

Claims

1. A voice recording device that records voice data corresponding to voice during a call on a recording medium, comprising:
a buffer memory that sequentially writes and stores the voice data from the start of the call; and
recording control means that, in response to a recording start command, reads the voice data stored in the buffer memory in the order in which they were written and records the voice data read from the buffer memory on the recording medium, and that, in response to the end of the call, rewrites all of the stored content of the buffer memory to a predetermined initial value.

2. The voice recording device according to claim 1, further comprising:
voice text conversion processing means that performs voice recognition processing on the voice data to convert the voice based on the voice data into text data represented by a character string; and
image generation means that generates an image signal for displaying the character string represented by the text data on a display.

3. The voice recording device according to claim 2, wherein the image generation means also causes the display to show an image representing a horizontal or vertical time axis, including a plurality of reference times, from the start time of the call to the current time.

4. The voice recording device according to claim 3, wherein the image generation means detects, based on the voice data, the other-party utterance time zones in which the other party spoke and the own-side utterance time zones in which the user's own side spoke, and causes the display to show band-like regions that individually represent these time zones in association with the time axis.

5. The voice recording device according to claim 4, wherein the image generation means causes the display to show a character string of n characters (n is a natural number), corresponding to the voice at the head of each of the other-party utterance time zones and the own-side utterance time zones, in association with the display position of the corresponding band-like region.