JP2017111339A

JP2017111339A - Voice reproduction device, voice reproduction method, and program

Info

Publication number: JP2017111339A
Application number: JP2015246371A
Authority: JP
Inventors: 一川竹; Hajime Kawatake
Original assignee: Sourcenext Corp
Current assignee: Sourcenext Corp
Priority date: 2015-12-17
Filing date: 2015-12-17
Publication date: 2017-06-22
Anticipated expiration: 2035-12-17
Also published as: JP6721981B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice reproduction device, a voice reproduction method, and a program that allow the voice associated with a desired portion of a character string obtained by voice recognition to be checked in a pinpoint manner. SOLUTION: A display control part 34 displays a character string that is a result of voice recognition. In response to designation of one or more characters included in the character string, a voice reproduction part 40 reproduces the voice from a reproduction position associated with the position of the one or more characters in the character string. SELECTED DRAWING: Figure 7

Description

本開示は、音声再生装置、音声再生方法及びプログラムに関する。 The present disclosure relates to an audio reproduction device, an audio reproduction method, and a program.

留守番電話のメッセージである音声の認識結果を文字列として表示する技術が存在する（特許文献１〜３参照）。このような技術によると、留守番電話のメッセージを目視により確認することができる。 There is a technique for displaying a voice recognition result, which is an answering machine message, as a character string (see Patent Documents 1 to 3). According to such a technique, the message of the answering machine can be visually confirmed.

特開平５−３４７６５８号公報 (JP-A-5-347658)
特開平１１−１５０６０３号公報 (JP-A-11-150603)
特開２００３−２２４７００号公報 (JP 2003-224700 A)

留守番電話のメッセージを目視により確認できたとしても、例えば日付や時間、場所などといった重要な部分については音声を再生させて音声認識に誤りがないかどうかが確認できることが望ましい。 Even if the message of the answering machine can be confirmed by visual observation, it is desirable that, for example, important parts such as date, time, and location can be reproduced to confirm whether or not there is an error in voice recognition.

また正確に音声認識ができておらず表示されている文字列からは意味する内容が不明である部分についても音声を再生させてその意味する内容を確認できることが望ましい。 It is also desirable that, for a portion where voice recognition was inaccurate and the meaning cannot be determined from the displayed character string, the voice can be reproduced so that the intended meaning can be confirmed.

例えば上述した場面などにおいては、音声の認識結果である文字列のうちの所望の部分に対応付けられる音声をピンポイントで確認できれば便利であるが、特許文献１〜３に記載の技術ではこのようなことはできなかった。 For example, in the situations described above, it would be convenient if the voice associated with a desired portion of the character string that is the voice recognition result could be checked in a pinpoint manner, but this was not possible with the techniques described in Patent Documents 1 to 3.

上記実情に鑑みて、本開示では、音声の認識結果である文字列のうちの所望の部分に対応付けられる音声をピンポイントで確認できる音声再生装置、音声再生方法及びプログラムを提案する。 In view of the above situation, the present disclosure proposes an audio reproduction device, an audio reproduction method, and a program that can pinpoint the voice associated with a desired portion of a character string that is a speech recognition result.

上記課題を解決するために、本開示に係る音声再生装置は、音声の認識結果である文字列を表示させる表示制御部と、前記文字列に含まれる１又は複数の文字の指定に応じて、当該１又は複数の文字の前記文字列における位置に対応付けられる再生位置から前記音声を再生させる音声再生部と、を含む。 In order to solve the above problem, an audio reproduction device according to the present disclosure includes a display control unit that displays a character string that is a voice recognition result, and an audio reproduction unit that, in response to designation of one or more characters included in the character string, reproduces the voice from a reproduction position associated with the position of the one or more characters in the character string.

本開示の一態様では、前記音声再生部は、前記文字列をそれぞれ１又は複数の文字から構成される複数の部分文字列に分割した場合における先頭からｎ番目の部分文字列が指定される際には、前記音声の再生時間を前記部分文字列の数で複数の部分時間に分割した場合における先頭から前記ｎ番目の部分時間に属する再生位置、又は、当該再生位置の所定時間前に相当する再生位置から前記音声を再生させる。 In one aspect of the present disclosure, suppose the character string is divided into a plurality of partial character strings, each consisting of one or more characters, and the playback time of the voice is divided into the same number of partial times. When the n-th partial character string from the beginning is designated, the audio reproduction unit reproduces the voice from a reproduction position belonging to the n-th partial time from the beginning, or from a reproduction position a predetermined time before that reproduction position.

この態様では、前記複数の部分時間のそれぞれは、前記音声の再生時間を等時間間隔で分割したものであってもよい。 In this aspect, each of the plurality of partial times may be obtained by dividing the audio playback time at equal time intervals.

また、前記ｎ番目の部分時間に属する再生位置は、前記ｎ番目の部分時間の先頭の再生位置であってもよい。 The playback position belonging to the nth partial time may be the start playback position of the nth partial time.
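
The equal-interval aspect described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `substring_index_at` and `start_of_partial_time`, the 0-based character offset, and the sample phrases are all hypothetical.

```python
from typing import Sequence


def substring_index_at(substrings: Sequence[str], char_offset: int) -> int:
    """Return the 1-based index n of the substring containing char_offset (0-based)."""
    if char_offset < 0:
        raise IndexError("char_offset lies outside the character string")
    end = 0
    for n, part in enumerate(substrings, start=1):
        end += len(part)
        if char_offset < end:
            return n
    raise IndexError("char_offset lies outside the character string")


def start_of_partial_time(n: int, num_substrings: int, duration_s: float) -> float:
    """Return the start (seconds) of the n-th of num_substrings equal partial times."""
    if not 1 <= n <= num_substrings:
        raise ValueError("n must be between 1 and num_substrings")
    return (n - 1) * duration_s / num_substrings


# Hypothetical transcript split into three substrings; 12-second recording.
parts = ["tomorrow ", "at 8 ", "I will visit"]
n = substring_index_at(parts, 10)  # offset 10 falls inside "at 8 "
start = start_of_partial_time(n, len(parts), 12.0)
```

Here the designated offset falls in the second of three substrings, so playback would start at the beginning of the second of three equal partial times.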

また、本開示の一態様では、前記表示制御部は、前記音声が再生されている部分に対応付けられる文字を強調表示させる。 Moreover, in one aspect of the present disclosure, the display control unit highlights a character associated with a portion where the sound is reproduced.

また、本開示に係る音声再生方法は、音声の認識結果である文字列を表示させるステップと、前記文字列に含まれる１又は複数の文字の指定に応じて、当該１又は複数の文字の前記文字列における位置に対応付けられる再生位置から前記音声を再生させるステップと、を含む。 In addition, an audio reproduction method according to the present disclosure includes a step of displaying a character string that is a voice recognition result, and a step of reproducing, in response to designation of one or more characters included in the character string, the voice from a reproduction position associated with the position of the one or more characters in the character string.

また、本開示に係るプログラムは、音声の認識結果である文字列を表示させる手順、前記文字列に含まれる１又は複数の文字の指定に応じて、当該１又は複数の文字の前記文字列における位置に対応付けられる再生位置から前記音声を再生させる手順、をコンピュータに実行させる。 In addition, a program according to the present disclosure causes a computer to execute a procedure of displaying a character string that is a voice recognition result, and a procedure of reproducing, in response to designation of one or more characters included in the character string, the voice from a reproduction position associated with the position of the one or more characters in the character string.

本開示の一実施形態に係る留守番電話システムの全体構成の一例を示す図である。 A diagram showing an example of the overall configuration of an answering machine system according to an embodiment of the present disclosure.

本開示の一実施形態に係る留守番電話プログラムを利用可能にするための手続の流れの一例を示す図である。 A diagram showing an example of the flow of the procedure for making the answering machine program according to an embodiment of the present disclosure available.

転送電話を受け付けた留守番電話処理サーバにより実行される処理の流れの一例を示すフロー図である。 A flowchart showing an example of the flow of processing executed by the answering machine processing server that has received a forwarded call.

メッセージ一覧画面の一例を示す図である。 A diagram showing an example of the message list screen.

音声再生画面の一例を示す図である。 A diagram showing an example of the audio reproduction screen.

音声再生画面の一例を示す図である。 A diagram showing an example of the audio reproduction screen.

音声再生画面の一例を示す図である。 A diagram showing an example of the audio reproduction screen.

音声再生画面の一例を示す図である。 A diagram showing an example of the audio reproduction screen.

音声再生画面の一例を示す図である。 A diagram showing an example of the audio reproduction screen.

本開示の一実施形態に係る携帯電話端末で実装される機能の一例を示す機能ブロック図である。 A functional block diagram showing an example of the functions implemented in the mobile phone terminal according to an embodiment of the present disclosure.

本開示の一実施形態に係る携帯電話端末において行われる処理の流れの一例を示すフロー図である。 A flowchart showing an example of the flow of processing performed in the mobile phone terminal according to an embodiment of the present disclosure.

以下、本発明の一実施形態について、図面を参照しながら説明する。 Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

図１は、本開示で提案する留守番電話システム１の全体構成の一例を示す図である。図１に示すように、本開示で提案する留守番電話システム１には、留守番電話処理サーバ１０、音声認識サーバ１２、及び、複数の携帯電話端末１４が含まれている。留守番電話処理サーバ１０及び携帯電話端末１４は電話通信網１６及びインターネット１８に接続されている。そのため留守番電話処理サーバ１０と携帯電話端末１４との間、携帯電話端末１４同士の間は互いに電話通信網１６やインターネット１８を介して通信可能となっている。また音声認識サーバ１２は、インターネット１８に接続されている。そのため音声認識サーバ１２は、留守番電話処理サーバ１０等とインターネット１８を介して互いに通信可能となっている。 FIG. 1 is a diagram illustrating an example of the overall configuration of an answering machine system 1 proposed in the present disclosure. As shown in FIG. 1, the answering machine system 1 proposed in the present disclosure includes an answering machine processing server 10, a voice recognition server 12, and a plurality of mobile phone terminals 14. The answering machine processing server 10 and the mobile phone terminal 14 are connected to a telephone communication network 16 and the Internet 18. Therefore, the answering machine processing server 10 and the mobile phone terminal 14 and the mobile phone terminals 14 can communicate with each other via the telephone communication network 16 and the Internet 18. The voice recognition server 12 is connected to the Internet 18. Therefore, the voice recognition server 12 can communicate with the answering machine processing server 10 and the like via the Internet 18.

留守番電話処理サーバ１０は、例えば留守番電話サービス等のサービスを提供するサーバコンピュータである。 The answering machine processing server 10 is a server computer that provides services such as an answering machine service.

音声認識サーバ１２は、例えば音声を受け付けて、当該音声に対しての音声認識結果である文字列等のテキストを生成するサービスを提供するサーバコンピュータである。音声認識サーバ１２は、本実施形態では例えば、留守番電話処理サーバ１０が録音した留守番電話のメッセージの音声を受け付ける。そして音声認識サーバ１２は、当該音声に対して音声認識処理を実行することで、当該音声の音声認識結果である文字列を含むテキストデータを生成する。そして音声認識サーバ１２は、生成されたテキストデータを留守番電話処理サーバ１０に送信する。 The voice recognition server 12 is a server computer that provides a service that receives, for example, voice and generates text such as a character string that is a voice recognition result for the voice. In the present embodiment, for example, the voice recognition server 12 accepts voice of an answering machine message recorded by the answering machine processing server 10. Then, the voice recognition server 12 generates text data including a character string that is a voice recognition result of the voice by executing voice recognition processing on the voice. Then, the voice recognition server 12 transmits the generated text data to the answering machine processing server 10.

携帯電話端末１４は、例えばスマートフォンなどの端末である。図１に示すように、本実施形態に係る携帯電話端末１４には、例えば、制御部１４ａ、記憶部１４ｂ、通信部１４ｃ、タッチパネル１４ｄ、音声入出力部１４ｅ、が含まれる。 The mobile phone terminal 14 is a terminal such as a smartphone. As shown in FIG. 1, the mobile phone terminal 14 according to the present embodiment includes, for example, a control unit 14a, a storage unit 14b, a communication unit 14c, a touch panel 14d, and an audio input/output unit 14e.

制御部１４ａは、例えば携帯電話端末１４にインストールされるプログラムに従って動作するマイクロプロセッサ等のプログラム制御デバイスである。 The control unit 14a is a program control device such as a microprocessor that operates according to a program installed in the mobile phone terminal 14, for example.

記憶部１４ｂは、例えばＲＯＭやＲＡＭ等の記憶素子などである。記憶部１４ｂには、制御部１４ａによって実行されるプログラムなどが記憶される。 The storage unit 14b is a storage element such as a ROM or a RAM, for example. The storage unit 14b stores a program executed by the control unit 14a.

通信部１４ｃは、例えば電話通信網１６を介した音声通信やデータ通信を行うための携帯電話通信ユニットや、インターネット１８を介したデータ通信を行うための無線ＬＡＮモジュールなどの通信インタフェースである。 The communication unit 14c is a communication interface such as a mobile phone communication unit for voice and data communication via the telephone communication network 16, or a wireless LAN module for data communication via the Internet 18.

タッチパネル１４ｄは、例えばタッチセンサ、及び、液晶ディスプレイや有機ＥＬディスプレイ等のディスプレイを含んで構成されており、制御部１４ａが生成する映像などを表示させる。またユーザはタッチパネル１４ｄを操作することで、制御部１４ａに対する操作入力を行えるようになっている。制御部１４ａは、タッチパネル１４ｄに対する操作入力に応じて各種の処理を実行する。 The touch panel 14d includes, for example, a touch sensor and a display such as a liquid crystal display or an organic EL display, and displays an image generated by the control unit 14a. Further, the user can perform an operation input to the control unit 14a by operating the touch panel 14d. The control unit 14a executes various processes in response to operation inputs on the touch panel 14d.

音声入出力部１４ｅは、例えばヘッドホンやスピーカ等の音声出力デバイスを含んでおり、通信部１４ｃが受信する音声データが表す音声などを出力する。また音声入出力部１４ｅは、マイク等の音声入力デバイスを含んでおり、例えば受け付ける音声を、通信部１４ｃを介して他の携帯電話端末１４に送信する。 The audio input/output unit 14e includes an audio output device such as headphones or a speaker, and outputs, for example, the audio represented by audio data received by the communication unit 14c. The audio input/output unit 14e also includes an audio input device such as a microphone and, for example, transmits the audio it picks up to another mobile phone terminal 14 via the communication unit 14c.

本実施形態に係る携帯電話端末１４は、本実施形態に係る留守番電話プログラムをインストールすることで、録音された留守番電話のメッセージである音声の認識結果を文字列としてタッチパネル１４ｄに表示させることができるようになっている。 By installing the answering machine program according to the present embodiment, the mobile phone terminal 14 according to the present embodiment can display, as a character string on the touch panel 14d, the recognition result of the voice that is a recorded answering machine message.

ここで、本実施形態に係る留守番電話プログラムを利用可能にするための手続の流れの一例を、図２に示すフロー図を参照しながら説明する。 Here, an example of the flow of procedures for making the answering machine program according to the present embodiment available will be described with reference to the flowchart shown in FIG.

まず携帯電話端末１４のユーザは、携帯電話端末１４の販売店等において、本実施形態に係る留守番電話プログラムの入手先となるＵＲＬとシリアル番号を入手する（Ｓ１０１）。 First, the user of the mobile phone terminal 14 obtains the URL and serial number from which the answering machine program according to the present embodiment is obtained at a store of the mobile phone terminal 14 (S101).

そして、ユーザは携帯電話端末１４からＳ１０１に示す手続で入手したＵＲＬにアクセスして、本実施形態に係る留守番電話プログラムをダウンロードし、当該留守番電話プログラムを携帯電話端末１４にインストールする（Ｓ１０２）。 Then, the user accesses the URL obtained in the procedure shown in S101 from the mobile phone terminal 14, downloads the answering machine program according to the present embodiment, and installs the answering machine program in the mobile phone terminal 14 (S102).

そしてユーザは、タッチパネル１４ｄを介して、Ｓ１０１に示す手続で入手したシリアル番号を入力する（Ｓ１０３）。すると、タッチパネル１４ｄに、無応答時転送の転送先として設定すべき電話番号が表示される（Ｓ１０４）。 Then, the user inputs the serial number obtained in the procedure shown in S101 via the touch panel 14d (S103). Then, the telephone number to be set as the transfer destination for the non-response transfer is displayed on the touch panel 14d (S104).

そしてユーザが携帯電話端末１４を操作して、Ｓ１０４に示す手続で表示された電話番号を無応答時転送の転送先として設定すると（Ｓ１０５）、ユーザは、本実施形態に係る留守番電話プログラムを利用可能となる。 Then, when the user operates the mobile phone terminal 14 and sets the telephone number displayed in the procedure shown in S104 as the transfer destination for the no-response transfer (S105), the answering machine program according to the present embodiment becomes available to the user.

例えば、あるユーザが、本実施形態に係る留守番電話プログラムが利用可能なユーザの携帯電話端末１４の電話番号に宛てて電話をかけたとする。以下、電話をかけたユーザを発信ユーザ、発信ユーザが電話をかけた相手のユーザを着信ユーザと呼ぶこととする。 For example, it is assumed that a user calls a telephone number of the user's mobile phone terminal 14 that can use the answering machine program according to the present embodiment. Hereinafter, a user who makes a call is referred to as a calling user, and a user to whom the calling user calls is called a receiving user.

ここで着信ユーザの携帯電話端末１４が無応答である場合は、上記Ｓ１０５に示す手続で設定された電話番号に宛てた電話としてこの電話が留守番電話処理サーバ１０に転送される。なお本実施形態ではシリアル番号と転送先の電話番号とが１対１で対応付けられているので、留守番電話処理サーバ１０は、転送先として設定されている電話番号に基づいて、どの電話番号に宛てた電話が転送されたのかを特定できるようになっている。 If the incoming user's mobile phone terminal 14 does not answer, the call is forwarded to the answering machine processing server 10 as a call addressed to the telephone number set in the procedure shown in S105. In the present embodiment, the serial number and the transfer destination telephone number are associated one to one, so the answering machine processing server 10 can identify, from the telephone number set as the transfer destination, which telephone number the forwarded call was originally addressed to.

以下、転送電話を受け付けた留守番電話処理サーバ１０により実行される処理の流れの一例を、図３に示すフロー図を参照しながら説明する。 Hereinafter, an example of the flow of processing executed by the answering machine processing server 10 that has accepted a forwarded call will be described with reference to the flowchart shown in FIG.

留守番電話処理サーバ１０は、発信ユーザの携帯電話端末１４からの転送電話の着信を受け付けると（Ｓ２０１）、発信ユーザの携帯電話端末１４に自動応答メッセージを発信する（Ｓ２０２）。この自動応答メッセージは発信ユーザの携帯電話端末１４の音声入出力部１４ｅから音声出力される。 When the answering machine processing server 10 receives a forwarded call originating from the calling user's mobile phone terminal 14 (S201), it sends an automatic response message to the calling user's mobile phone terminal 14 (S202). This automatic response message is output as audio from the audio input/output unit 14e of the calling user's mobile phone terminal 14.

そして留守番電話処理サーバ１０は、Ｓ２０１に示す処理で受け付けた着信に基づいて、発信ユーザが利用している携帯電話端末１４の電話番号を特定する（Ｓ２０３）。 The answering machine processing server 10 specifies the telephone number of the mobile phone terminal 14 used by the calling user based on the incoming call received in the process shown in S201 (S203).

その後、発信ユーザが留守番電話のメッセージを携帯電話端末１４の音声入出力部１４ｅを介して音声入力すると、当該メッセージの音声は留守番電話処理サーバ１０に送信される。そして留守番電話処理サーバ１０は、当該留守番電話のメッセージの音声を録音する（Ｓ２０４）。 Thereafter, when the calling user speaks an answering machine message into the audio input/output unit 14e of the mobile phone terminal 14, the voice of the message is transmitted to the answering machine processing server 10. The answering machine processing server 10 then records the voice of the answering machine message (S204).

すると留守番電話処理サーバ１０は、Ｓ２０４に示す処理で録音された音声のデータを音声認識サーバ１２に送信する（Ｓ２０５）。音声認識サーバ１２は、当該音声のデータを受信すると、当該音声に対して音声認識処理を実行する。そして音声認識サーバ１２は、当該音声の音声認識の結果である文字列を含むテキストデータを留守番電話処理サーバ１０に送信する。そして留守番電話処理サーバ１０は当該テキストデータを受信する（Ｓ２０６）。 Then, the answering machine processing server 10 transmits the voice data recorded in the process shown in S204 to the voice recognition server 12 (S205). When the voice recognition server 12 receives the voice data, the voice recognition server 12 performs voice recognition processing on the voice. Then, the voice recognition server 12 transmits text data including a character string that is a result of voice recognition of the voice to the answering machine processing server 10. The answering machine processing server 10 receives the text data (S206).

そして留守番電話処理サーバ１０は、Ｓ２０６に示す処理で受信したテキストデータやＳ２０４に示す処理で録音された音声のデータを含む留守番電話データを着信ユーザの携帯電話端末１４に送信する（Ｓ２０７）。なお本実施形態では当該留守番電話データには、上述のテキストデータや音声のデータの他に、例えば、着信／録音通知、発信ユーザの電話番号、当該音声の録音時刻、及び、当該音声の再生時間、のそれぞれを示すデータが含まれることとする。そして本処理例に示す処理は終了される。 The answering machine processing server 10 transmits the answering machine data including the text data received in the process shown in S206 and the voice data recorded in the process shown in S204 to the mobile phone terminal 14 of the incoming user (S207). In the present embodiment, the answering machine data includes, for example, an incoming / recording notification, a calling user's telephone number, a recording time of the voice, and a playback time of the voice, in addition to the text data and voice data described above. It is assumed that data indicating each of these is included. Then, the processing shown in this processing example is finished.
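
The contents of the answering machine data listed above can be pictured as a simple record. The following is only a sketch: the class name `AnsweringMachineData`, the field names, the field types, and the sample values are assumptions made for illustration, and the source does not specify any wire format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AnsweringMachineData:
    """Hypothetical container for the data sent to the terminal in S207."""
    transcript: str            # text data received from the recognition server (S206)
    audio: Optional[bytes]     # recorded voice data (S204); optional in this sketch
    caller_number: str         # telephone number of the calling user (S203)
    recorded_at: datetime      # recording time of the voice
    duration_s: float          # playback time of the voice, in seconds
    notification: str = "incoming call / recording"  # incoming/recording notice


# Sample values below are invented for the example.
message = AnsweringMachineData(
    transcript="I will visit the Toranomon branch office at 8.",
    audio=b"\x00\x01",
    caller_number="0300000000",
    recorded_at=datetime(2015, 12, 17, 9, 30),
    duration_s=24.0,
)
```

Keeping the playback time in the record matters for the rest of the embodiment, because the terminal derives playback positions from that duration and the transcript length alone.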

なお例えば音声認識の結果、録音された音声が無音であることが判明した場合には、Ｓ２０７に示す処理で、留守番電話処理サーバ１０は、Ｓ２０４に示す処理で録音されたメッセージの音声のデータを送信しなくてもよい。このようにすれば、送信されるデータのデータ量を低減できることとなる。またこの場合に、録音された音声が無音であったことを示すメッセージを送信するようにしてもよい。そして携帯電話端末１４が当該メッセージを表示するようにしてもよい。 Note that if, for example, voice recognition reveals that the recorded voice is silent, the answering machine processing server 10 need not transmit, in the process shown in S207, the voice data of the message recorded in the process shown in S204. This reduces the amount of data transmitted. In this case, a message indicating that the recorded voice was silent may be transmitted, and the mobile phone terminal 14 may display that message.

また留守番電話処理サーバ１０は、送信されるデータのデータ量を削減するために、無音の部分が除去（トリミング）された音声のデータを着信ユーザの携帯電話端末１４に送信するようにしてもよい。 In addition, the answering machine processing server 10 may transmit voice data from which the silent portion has been removed (trimmed) to the mobile phone terminal 14 of the incoming user in order to reduce the amount of data to be transmitted. .
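
One way to picture the trimming of silent portions is the sketch below. It is an assumption-laden illustration, not the method of the disclosure: it treats the audio as a list of signed PCM sample values, removes only leading and trailing silence, and uses a hypothetical amplitude threshold; the name `trim_silence` is invented here.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples whose absolute amplitude is below threshold.

    Quiet samples between the first and last loud sample are kept.
    """
    loud = [i for i, value in enumerate(samples) if abs(value) >= threshold]
    if not loud:
        return []  # the whole recording is treated as silent
    return list(samples[loud[0]:loud[-1] + 1])


trimmed = trim_silence([0, 10, 900, -1200, 3, 800, 0, 0])
```

If trimming is applied, it seems reasonable to assume that the playback time sent with the data would describe the trimmed audio, since the terminal maps characters to playback positions from that duration.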

Ｓ２０７に示す処理で送信された留守番電話データを受信した着信ユーザの携帯電話端末１４は、当該留守番電話データを記憶する。そして着信ユーザの携帯電話端末１４は、着信及び録音があったことを着信ユーザに通知する。 The mobile phone terminal 14 of the incoming user who has received the answering machine data transmitted in the process shown in S207 stores the answering machine data. Then, the mobile phone terminal 14 of the incoming user notifies the incoming user that there was an incoming call and recording.

そして着信ユーザが本実施形態に係る留守番電話プログラムを起動する処理を実行すると、図４に例示するメッセージ一覧画面２０がタッチパネル１４ｄに表示される。 When the incoming user executes processing for starting the answering machine program according to the present embodiment, a message list screen 20 illustrated in FIG. 4 is displayed on the touch panel 14d.

メッセージ一覧画面２０には、受信した留守番電話データに含まれるテキストデータを表すテキスト画像Ｉ１が、受信した時刻の順に時系列で並んで配置されている。またメッセージ一覧画面２０には、テキスト画像Ｉ１に対応付けて、発信ユーザ氏名画像Ｉ２、写真画像Ｉ３、再生アイコン画像Ｉ４、再生時間画像Ｉ５、及び、録音時刻画像Ｉ６が配置されている。 On the message list screen 20, text images I1 representing text data included in the received answering machine data are arranged in chronological order in the order of the received time. In addition, on the message list screen 20, an outgoing user name image I2, a photograph image I3, a reproduction icon image I4, a reproduction time image I5, and a recording time image I6 are arranged in association with the text image I1.

発信ユーザ氏名画像Ｉ２は例えば発信ユーザの氏名を表す画像である。写真画像Ｉ３は例えば発信ユーザの写真の画像である。本実施形態では例えば、着信ユーザの携帯電話端末１４にインストールされている連絡先情報アプリケーションにおいて、受信した留守番電話データに示されている発信ユーザの電話番号に関連付けられて管理されている氏名及び写真が特定される。そして本実施形態では、特定された氏名を表す画像が発信ユーザ氏名画像Ｉ２としてメッセージ一覧画面２０に配置され、特定された写真の画像が写真画像Ｉ３としてメッセージ一覧画面２０に配置される。 The calling user name image I2 is, for example, an image representing the name of the calling user. The photograph image I3 is, for example, an image of a photograph of the calling user. In the present embodiment, for example, the name and photograph that the contact information application installed in the incoming user's mobile phone terminal 14 manages in association with the calling user's telephone number, as indicated in the received answering machine data, are identified. An image representing the identified name is then arranged on the message list screen 20 as the calling user name image I2, and an image of the identified photograph is arranged on the message list screen 20 as the photograph image I3.

再生アイコン画像Ｉ４は、音声の再生を指示するためのアイコン画像である。また再生時間画像Ｉ５は、受信した留守番電話データに示されている再生時間を表す画像である。また録音時刻画像Ｉ６は、受信した留守番電話データに示されている録音時刻を表す画像である。 The reproduction icon image I4 is an icon image for instructing audio reproduction. The reproduction time image I5 is an image representing the reproduction time indicated in the received answering machine data. The recording time image I6 is an image representing the recording time indicated in the received answering machine data.

ここで着信ユーザが、例えば再生アイコン画像Ｉ４に対するタップ操作などといった、再生アイコン画像Ｉ４を選択する操作を行うと、図５Ａに例示する音声再生画面２２がタッチパネル１４ｄに表示される。 Here, when the incoming user performs an operation of selecting the reproduction icon image I4, such as a tap operation on the reproduction icon image I4, for example, an audio reproduction screen 22 illustrated in FIG. 5A is displayed on the touch panel 14d.

図５Ａに示す音声再生画面２２には、選択された再生アイコン画像Ｉ４に対応付けられるテキスト画像Ｉ１が配置されている。また本実施形態では、音声再生画面２２が表示されると、当該音声再生画面２２に配置されているテキスト画像Ｉ１に対応付けられる留守番電話のメッセージの音声の再生が開始されるようになっている。 On the audio reproduction screen 22 shown in FIG. 5A, a text image I1 associated with the selected reproduction icon image I4 is arranged. Further, in the present embodiment, when the voice reproduction screen 22 is displayed, the voice reproduction of the answering machine message associated with the text image I1 arranged on the voice reproduction screen 22 is started. .

また音声再生画面２２には、当該音声の再生時間を表す再生時間画像Ｉ５及び再生位置を表す再生位置画像Ｉ７が配置されている。また音声再生画面２２には、シークバー画像Ｉ８及び各種の操作画像Ｉ９が配置されている。着信ユーザはシークバー画像Ｉ８を操作することで、音声の再生位置を変えることができるようになっている。また着信ユーザは操作画像Ｉ９を操作することで音声の早送り、巻き戻し、停止、再生、２倍速等の操作を行うことができるようになっている。 On the audio playback screen 22, a playback time image I5 indicating the playback time of the sound and a playback position image I7 indicating the playback position are arranged. In addition, a seek bar image I8 and various operation images I9 are arranged on the audio reproduction screen 22. The called user can change the sound reproduction position by operating the seek bar image I8. The incoming user can operate the operation image I9 to perform operations such as fast forward, rewind, stop, playback, and double speed of the voice.

また本実施形態では、メッセージの音声の再生中には、テキスト画像Ｉ１が表す文字列のうち、再生位置に対応付けられる文字が強調表示される。図５Ａでは、強調表示されている文字が、カーソルＣで囲まれる文字として表現されている。 In the present embodiment, during the reproduction of the voice of the message, the character associated with the reproduction position is highlighted in the character string represented by the text image I1. In FIG. 5A, the highlighted character is expressed as a character surrounded by the cursor C.

なお再生中の音節や音素に対応付けられる文字が強調表示される必要はない。例えば単純に、テキスト画像Ｉ１が表す文字列に含まれる文字の数で再生時間を割った時間毎に強調表示される文字が変わるようにしてもよい。具体的には例えば、再生時間をＴ１秒、テキスト画像Ｉ１が表す文字列に含まれる文字の数をＮ１とした際に、先頭からｎ１番目の文字は、（（ｎ１−１）×Ｔ１／Ｎ１）秒から（ｎ１×Ｔ１／Ｎ１）秒までの再生位置である場合に強調表示されるようにしてもよい。 Note that the highlighted character need not be the one that corresponds to the syllable or phoneme being played. For example, the highlighted character may simply advance each time an interval equal to the playback time divided by the number of characters in the character string represented by the text image I1 elapses. Specifically, where the playback time is T1 seconds and the number of characters in the character string represented by the text image I1 is N1, the n1-th character from the beginning may be highlighted while the playback position is between ((n1-1) × T1 / N1) seconds and (n1 × T1 / N1) seconds.
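
The proportional highlighting rule above amounts to inverting the interval formula. The sketch below assumes a hypothetical helper name `highlighted_char_index`, treats each interval as closed at its start and open at its end, and clamps out-of-range positions; the source specifies neither the boundary handling nor the clamping.

```python
def highlighted_char_index(position_s: float, duration_s: float, num_chars: int) -> int:
    """Return the 1-based index n1 of the character to highlight.

    n1 satisfies (n1 - 1) * T1 / N1 <= position < n1 * T1 / N1,
    with T1 = duration_s and N1 = num_chars.
    """
    if duration_s <= 0 or num_chars <= 0:
        raise ValueError("duration_s and num_chars must be positive")
    n1 = int(position_s * num_chars / duration_s) + 1
    # Clamp so that positions at or beyond the end map to the last character.
    return min(max(n1, 1), num_chars)


# With T1 = 24 s and N1 = 132 (values used in the text), 10.8 s maps to character 60.
index = highlighted_char_index(10.8, 24.0, 132)
```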

ここで図５Ｂに示すように、例えば発信ユーザが虎ノ門支社へ訪問する時刻を表す文字（例えば先頭から６０番目の文字である「８」）を指定する操作を着信ユーザが行ったとする。すると図５Ｃに示すように、テキスト画像Ｉ１が表す文字列における指定された文字の位置に応じたものに再生位置が変更される。そして変更後の再生位置からメッセージの音声が再生される。 Here, as shown in FIG. 5B, for example, it is assumed that the receiving user performs an operation of designating a character (for example, “8” which is the 60th character from the top) indicating the time when the calling user visits the Toranomon branch office. Then, as shown in FIG. 5C, the reproduction position is changed according to the position of the designated character in the character string represented by the text image I1. Then, the voice of the message is reproduced from the reproduction position after the change.

例えば、再生時間をＴ１秒、テキスト画像Ｉ１が表す文字列に含まれる文字の数をＮ１とした際に、先頭からｎ１番目の文字が指定されたとする。この場合は本実施形態では例えば、（（（ｎ１−１）×Ｔ１／Ｎ１）−Δ）秒の再生位置からメッセージの音声が再生される。なおΔは所定のオフセット値であり、ここでは例えば２秒であるとする。図５Ｃの例では、Ｎ１＝１３２、Ｔ１＝２４秒、ｎ１＝６０であるので、（（（６０−１）×２４／１３２）−２）＝８．７２秒の再生位置から音声が再生されることとなる。このように本実施形態では、指定された文字に対応付けられる再生位置の所定時間前からメッセージの音声が再生されることとなる。 For example, suppose the playback time is T1 seconds, the number of characters in the character string represented by the text image I1 is N1, and the n1-th character from the beginning is designated. In this case, in the present embodiment, the voice of the message is reproduced from the playback position of (((n1-1) × T1 / N1) - Δ) seconds, for example. Here Δ is a predetermined offset value, assumed to be 2 seconds in this example. In the example of FIG. 5C, N1 = 132, T1 = 24 seconds, and n1 = 60, so the voice is reproduced from the playback position of (((60-1) × 24 / 132) - 2) = 8.72 seconds. In this way, in the present embodiment, the voice of the message is reproduced from a predetermined time before the playback position associated with the designated character.

なお上記Δの値は０であっても構わない。この場合は、指定された文字に対応付けられる再生位置からメッセージの音声が再生されることとなる。例えばＮ１＝１３２、Ｔ１＝２４秒、ｎ１＝６０である場合は、（（６０−１）×２４／１３２）＝１０．７２秒の再生位置から再生されることとなる。また文字の指定に応じてメッセージの音声が再生される再生位置から所定時間の部分（例えば５秒）が繰り返し再生されるようにしてもよい。 The value of Δ may be 0. In this case, the voice of the message is reproduced from the reproduction position associated with the designated character. For example, when N1 = 132, T1 = 24 seconds, and n1 = 60, playback is started from the playback position of ((60-1) × 24/132) = 10.72 seconds. Further, a portion of a predetermined time (for example, 5 seconds) may be repeatedly reproduced from the reproduction position where the voice of the message is reproduced according to the designation of the character.
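
The start-position formula and the optional repeated segment can be sketched as follows. The function names `playback_start` and `loop_segment` are hypothetical, and clamping the result to the range of the recording is an assumption added for the example; the source does not say what happens when the offset would move the position before the start of the audio.

```python
from typing import Tuple


def playback_start(n1: int, num_chars: int, duration_s: float,
                   offset_s: float = 2.0) -> float:
    """Return ((n1 - 1) * T1 / N1) - delta in seconds, clamped at 0 (assumption)."""
    if not 1 <= n1 <= num_chars:
        raise ValueError("n1 must be between 1 and num_chars")
    return max(0.0, (n1 - 1) * duration_s / num_chars - offset_s)


def loop_segment(start_s: float, duration_s: float,
                 length_s: float = 5.0) -> Tuple[float, float]:
    """Return the (start, end) of a segment to repeat, cut off at the end of the audio."""
    return start_s, min(duration_s, start_s + length_s)


start = playback_start(60, 132, 24.0)            # example of FIG. 5C, delta = 2 s
start_no_offset = playback_start(60, 132, 24.0, offset_s=0.0)
segment = loop_segment(start, 24.0)              # 5-second portion to repeat
```

With the values from the text, `start` is about 8.727 seconds and `start_no_offset` is about 10.727 seconds; the text gives these as 8.72 and 10.72, which corresponds to truncating after two decimal places.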

図６Ａは、テキスト画像Ｉ１の別の一例が配置された音声再生画面２２の一例を示す図である。音声認識の精度が悪い場合には、図６Ａに示すように、テキスト画像Ｉ１が表す文字列からは意味する内容が不明である部分が存在することがある。ここで意味する内容が不明である部分（図６Ａにおいては例えば先頭から５９番目の文字である「社」）を指定する操作を着信ユーザが行ったとする。するとこの場合についても図６Ｂに示すように、テキスト画像Ｉ１が表す文字列における指定された文字の位置に応じたものに再生位置が変更されて、変更後の再生位置からメッセージの音声が再生される。図６Ｂの例では、Ｎ１＝１１７、Ｔ１＝２４秒、ｎ１＝５９であるので、（（（５９−１）×２４／１１７）−２）＝９．８９秒の再生位置から音声が再生されることとなる。なおテキスト画像Ｉ１が表す文字列のうちの、音声認識の精度が悪い部分について、強調表示されるようにしてもよい。例えば音声認識の精度が悪い部分については他の文字とは異なる色で表示されるようにしてもよい。 FIG. 6A is a diagram illustrating an example of the audio reproduction screen 22 on which another example of the text image I1 is arranged. When the accuracy of voice recognition is poor, as shown in FIG. 6A, the character string represented by the text image I1 may contain a portion whose meaning cannot be determined. Suppose the incoming user performs an operation designating such a portion (in FIG. 6A, for example, 「社」, the 59th character from the beginning). In this case too, as shown in FIG. 6B, the playback position is changed to one that corresponds to the position of the designated character in the character string represented by the text image I1, and the voice of the message is reproduced from the changed playback position. In the example of FIG. 6B, N1 = 117, T1 = 24 seconds, and n1 = 59, so the voice is reproduced from the playback position of (((59-1) × 24 / 117) - 2) = 9.89 seconds. A portion of the character string represented by the text image I1 for which the accuracy of voice recognition is poor may be highlighted. For example, such a portion may be displayed in a color different from that of the other characters.

本実施形態によれば、着信ユーザはテキスト画像Ｉ１を目視することで、発信ユーザによって録音された留守番電話のメッセージの内容を知ることができる。その上本実施形態では、テキスト画像Ｉ１が表す文字列に含まれる文字を指定することで、当該文字に対応付けられる再生位置から音声が再生されるようになっている。 According to the present embodiment, the receiving user can know the content of the answering machine message recorded by the calling user by viewing the text image I1. In addition, in the present embodiment, by specifying a character included in the character string represented by the text image I1, sound is reproduced from a reproduction position associated with the character.

例えば再生時間をＴ１秒、テキスト画像Ｉ１が表す文字列に含まれる文字の数をＮ１とした際に、先頭からｎ１番目の文字が指定されたとすると、当該文字を表す音節又は音素は、（（ｎ１−１）×Ｔ１／Ｎ１）秒の再生位置で再生される可能性が高い。このことを踏まえ本実施形態では、先頭からｎ番目の文字が指定された場合に、余裕を持って当該音節又は音素を聞き取ることができる（（（ｎ１−１）×Ｔ１／Ｎ１）−Δ）秒の再生位置から音声が再生されるようになっている。なお上述したように、当該文字を表す音節又は音素が再生される可能性の高い（（ｎ１−１）×Ｔ１／Ｎ１）秒の再生位置から音声が再生されても構わない。このようにして本実施形態では、日付や時間、場所などといった重要な部分や、意味する内容が不明である部分などといった、表示されているテキスト画像Ｉ１が表す文字列のうちのユーザが確認したい部分の音声をピンポイントで確認できることとなる。 For example, where the playback time is T1 seconds and the number of characters in the character string represented by the text image I1 is N1, if the n1-th character from the beginning is designated, the syllable or phoneme representing that character is likely to be played at the playback position of ((n1-1) × T1 / N1) seconds. In view of this, in the present embodiment, when the n1-th character from the beginning is designated, the voice is reproduced from the playback position of (((n1-1) × T1 / N1) - Δ) seconds, which leaves enough margin for the syllable or phoneme to be heard. As described above, the voice may instead be reproduced from the playback position of ((n1-1) × T1 / N1) seconds, at which the syllable or phoneme representing the character is likely to be played. In this way, in the present embodiment, the user can check in a pinpoint manner the voice of whichever part of the character string represented by the displayed text image I1 the user wants to confirm, such as an important part (a date, time, or place) or a part whose meaning is unclear.

Furthermore, in the present embodiment, because the reproduction position can be determined by the simple method described above, there is no need to manage the reproduction positions of the syllables and phonemes that make up the voice of the message in association with the characters they represent. Consequently, data indicating the correspondence between those reproduction positions and characters need not be transmitted from the answering machine processing server 10 to the mobile phone terminal 14, and the communication volume corresponding to that data is saved.

The functions of the mobile phone terminal 14 according to the present embodiment and the processing executed by it are further described below, focusing on pinpoint reproduction of the voice in response to the designation of characters. The mobile phone terminal 14 according to the present embodiment serves as an audio reproduction device that reproduces the voice associated with a designated character.

FIG. 7 is a functional block diagram showing an example of the functions implemented in the mobile phone terminal 14 according to the present embodiment. The mobile phone terminal 14 according to the present embodiment need not implement all of the functions shown in FIG. 7, and may implement functions other than those shown in FIG. 7.

As shown in FIG. 7, the mobile phone terminal 14 according to the present embodiment functionally includes, for example, an answering machine data receiving unit 30, an answering machine data storage unit 32, a display control unit 34, a designation receiving unit 36, a reproduction position determination unit 38, and an audio reproduction unit 40. The answering machine data receiving unit 30 is implemented mainly by the communication unit 14c. The answering machine data storage unit 32 is implemented mainly by the storage unit 14b. The display control unit 34 and the designation receiving unit 36 are implemented mainly by the control unit 14a and the touch panel 14d. The reproduction position determination unit 38 is implemented mainly by the control unit 14a. The audio reproduction unit 40 is implemented mainly by the control unit 14a and the audio input/output unit 14e.

These functions are implemented by the control unit 14a executing a program (the answering machine program according to the present embodiment described above) that is installed in the mobile phone terminal 14, which is a computer, and that includes instructions corresponding to these functions. This program is supplied to the mobile phone terminal 14 via a computer-readable information storage medium such as an optical disc, a magnetic disk, a magnetic tape, a magneto-optical disk, or a flash memory, or via the Internet or the like.

In the present embodiment, the answering machine data receiving unit 30 receives, for example, the answering machine data transmitted by the answering machine processing server 10 in the process of S207 shown in FIG. 3.

In the present embodiment, the answering machine data storage unit 32 stores, for example, the answering machine data received by the answering machine data receiving unit 30.

In the present embodiment, the display control unit 34 causes the touch panel 14d to display, for example, the message list screen 20 shown in FIG. 4 and the audio reproduction screen 22 shown in FIGS. 5A to 5C, 6A, and 6B, on which a character string that is the speech recognition result is arranged. As described above, the display control unit 34 may also highlight the character associated with the portion of the voice being reproduced.
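One way to decide which character to highlight during playback is to invert the linear estimate ((n1 - 1) × T1 / N1) used for the reproduction position. The description does not state how the highlighted character is chosen, so the following is only a sketch under that assumption; `char_index_at` and its parameters are hypothetical names.

```python
import math


def char_index_at(position_s: float, total_chars: int, duration_s: float) -> int:
    """Return the 1-based index of the character assumed to be sounding at position_s.

    Assumption: characters are spread evenly over the playback time, which is the
    same simplification the description uses to pick a playback position.
    """
    if total_chars <= 0 or duration_s <= 0:
        raise ValueError("total_chars and duration_s must be positive")
    clamped = min(max(position_s, 0.0), duration_s)       # keep within the message
    index = math.floor(clamped * total_chars / duration_s) + 1
    return min(index, total_chars)                        # the very end maps to the last character


print(char_index_at(12.0, 8, 24.0))
```

A display routine could call this on each playback progress update and restyle the character at the returned index.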

In the present embodiment, the designation receiving unit 36 receives, for example, designation of one or more characters included in the character string that is the speech recognition result. When a tap operation is performed on the touch panel 14d, for example, the designation receiving unit 36 accepts, as the designated character, the character arranged at the tapped position or the character arranged at the position closest to the tapped position.
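The tap handling described above amounts to a nearest-neighbor lookup over the on-screen character positions. The sketch below assumes each character occupies an axis-aligned rectangle; the `CharBox` type, its fields, and the function names are hypothetical and do not come from the description.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CharBox:
    """Hypothetical on-screen rectangle of one character (index is 1-based)."""
    index: int
    x: float
    y: float
    width: float
    height: float


def distance_to_box(box: CharBox, tap_x: float, tap_y: float) -> float:
    """Distance from the tap to the rectangle; 0.0 when the tap is inside it."""
    dx = max(box.x - tap_x, 0.0, tap_x - (box.x + box.width))
    dy = max(box.y - tap_y, 0.0, tap_y - (box.y + box.height))
    return math.hypot(dx, dy)


def designated_char(boxes: Sequence[CharBox], tap_x: float, tap_y: float) -> int:
    """Return the index of the tapped character, or of the closest one otherwise."""
    if not boxes:
        raise ValueError("no characters are displayed")
    return min(boxes, key=lambda b: distance_to_box(b, tap_x, tap_y)).index


row = [CharBox(1, 0, 0, 10, 10), CharBox(2, 10, 0, 10, 10), CharBox(3, 20, 0, 10, 10)]
print(designated_char(row, 15, 5))
```

Because a tap inside a rectangle has distance zero, the same function covers both cases named in the text: a direct hit and the nearest character to a miss.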

The designation receiving unit 36 may instead receive designation of one of a plurality of partial character strings obtained by dividing the character string represented by the text image I1. Here, a partial character string refers to a piece obtained by dividing the character string represented by the text image I1 in a predetermined unit, such as by sentence, by word, by line, or by a predetermined number of characters. In the present embodiment, each partial character string consists of one or more characters. For example, when the character string represented by the text image I1 is divided into partial character strings one character at a time, each partial character string contains one character. The partial character strings may contain the same number of characters or different numbers of characters.
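To make the notion of partial character strings concrete, the sketch below divides a string by line and by a fixed number of characters, and maps a designated character to the partial character string that contains it. The function names are hypothetical, and sentence or word splitting is omitted because it would need language-specific rules that the description does not give.

```python
from typing import List, Sequence


def split_by_line(text: str) -> List[str]:
    """Divide the text into non-empty lines."""
    return [line for line in text.splitlines() if line]


def split_by_count(text: str, size: int) -> List[str]:
    """Divide the text into chunks of `size` characters (the last may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def substring_index_of_char(parts: Sequence[str], n: int) -> int:
    """Return the 1-based index of the part containing the n-th character (1-based)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    seen = 0
    for k, part in enumerate(parts, start=1):
        seen += len(part)
        if n <= seen:
            return k
    raise ValueError("n exceeds the total number of characters")


parts = split_by_count("abcdefg", 3)
print(parts, substring_index_of_char(parts, 4))
```

Calling `split_by_count(text, 1)` reproduces the one-character-per-part case mentioned in the text.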

In the present embodiment, the reproduction position determination unit 38 determines, for example, in response to designation of one or more characters included in the character string that is the speech recognition result, the reproduction position associated with the position of the one or more characters in the character string as the reproduction position from which the voice is to be reproduced.

Suppose, for example, that the designation receiving unit 36 receives designation of the n-th partial character string from the beginning. In this case, the reproduction position determination unit 38 may determine, as the reproduction position from which the voice is to be reproduced, a reproduction position belonging to the n-th partial time from the beginning when the reproduction time is divided into as many partial times as there are partial character strings. Alternatively, in this case, the reproduction position determination unit 38 may determine, as the reproduction position from which the voice is to be reproduced, a reproduction position a predetermined time (for example, 2 seconds) before a reproduction position belonging to the n-th partial time from the beginning. Here, the reproduction position belonging to the n-th partial time from the beginning may be, for example, the first reproduction position of that partial time. Each of the partial times may be obtained by dividing the reproduction time at equal time intervals.

For example, let the reproduction time be T2 seconds and the number of partial character strings be N2, and suppose that the n2-th partial character string from the beginning is designated. In this case, for example, the reproduction position of (((n2 - 1) × T2 / N2) - Δ) seconds may be determined as the reproduction position from which the voice is to be reproduced. Suppose, for example, that the character string represented by the text image I1 occupies eight lines as shown in FIG. 6A and that the partial character strings are obtained by dividing the character string line by line. In this case, T2 = 24 seconds and N2 = 8. If the partial character string on the third line is designated, the reproduction position of (((3 - 1) × 24 / 8) - 2) = 4 seconds may be determined as the reproduction position from which the voice is to be reproduced.
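The partial-time variant can be sketched by first dividing the reproduction time into equal intervals and then taking the start of the designated interval minus a margin. The names `partial_times` and `substring_playback_position`, the default margin of 2 seconds, and the clamping to 0 are assumptions made for illustration only.

```python
from typing import List, Tuple


def partial_times(duration_s: float, count: int) -> List[Tuple[float, float]]:
    """Divide the playback time into `count` equal (start, end) intervals."""
    if count <= 0:
        raise ValueError("count must be positive")
    step = duration_s / count
    return [(k * step, (k + 1) * step) for k in range(count)]


def substring_playback_position(n2: int, count: int, duration_s: float,
                                margin_s: float = 2.0) -> float:
    """Playback position for the n2-th partial character string (1-based).

    Uses the first position of the n2-th partial time, moved earlier by margin_s.
    Clamping to 0.0 is an assumption not stated in the description.
    """
    if not 1 <= n2 <= count:
        raise ValueError("n2 must be between 1 and count")
    start, _end = partial_times(duration_s, count)[n2 - 1]
    return max(0.0, start - margin_s)


# Line-based example: T2 = 24 s, N2 = 8 lines, third line designated
print(substring_playback_position(3, 8, 24.0))
```

With one partial character string per character, this reduces to the character-based estimate given earlier in the description.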

In the present embodiment, the audio reproduction unit 40 reproduces the voice, for example, in response to designation of one or more characters included in the character string that is the speech recognition result, from the reproduction position associated with the position of the one or more characters in the character string. In the present embodiment, for example, the audio reproduction unit 40 reproduces the voice from the reproduction position determined by the reproduction position determination unit 38. The audio reproduction unit 40 may also repeatedly reproduce a portion of a predetermined length (for example, 5 seconds) starting at the reproduction position from which the voice of the message is reproduced in response to the designation of a character. Further, in response to the designation of a start character and an end character, the audio reproduction unit 40 may repeatedly reproduce the span from the reproduction position associated with the start character to the reproduction position associated with the end character.
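The two repeat-playback options can be expressed as functions that return the (start, end) range to loop over. This is a sketch: the function names are hypothetical, the end of a fixed-length loop is capped at the end of the message, and a start and end character given in reverse order are swapped, none of which is specified in the description.

```python
from typing import Tuple


def loop_range_fixed(start_s: float, duration_s: float,
                     length_s: float = 5.0) -> Tuple[float, float]:
    """Loop over a fixed-length portion beginning at start_s (capped at the message end)."""
    return (start_s, min(start_s + length_s, duration_s))


def loop_range_between(start_char: int, end_char: int, total_chars: int,
                       duration_s: float) -> Tuple[float, float]:
    """Loop between the positions estimated for a start and an end character (1-based)."""
    first, last = sorted((start_char, end_char))          # tolerate reversed selection
    if not 1 <= first <= last <= total_chars:
        raise ValueError("character indices out of range")
    begin = (first - 1) * duration_s / total_chars
    end = (last - 1) * duration_s / total_chars
    return (begin, end)


print(loop_range_fixed(9.0, 24.0))
print(loop_range_between(3, 5, 8, 24.0))
```

A player would seek to the first value and jump back to it whenever playback reaches the second.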

In the present embodiment, the display control unit 34 also updates the display content of the audio reproduction screen 22 in accordance with the reproduction position determined by the reproduction position determination unit 38 as the position from which the voice is to be reproduced.

An example of the flow of processing performed in the mobile phone terminal 14 according to the present embodiment when the designation receiving unit 36 receives designation of a partial character string is described below with reference to the flow diagram shown in FIG. 8.

When the designation receiving unit 36 receives designation of a partial character string, the reproduction position determination unit 38 first identifies that partial character string (S301). The reproduction position determination unit 38 then determines, as described above, the reproduction position from which the voice is to be reproduced, based on the partial character string identified in S301 (S302). The display control unit 34 then updates the display content based on the reproduction position determined in S302, and the audio reproduction unit 40 reproduces the voice from the reproduction position determined in S302 (S303), whereupon the processing of this example ends.
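The S301 to S303 flow can be outlined as a single handler. This is a sketch only: `handle_designation` is a hypothetical name, and the display and playback steps are passed in as callbacks because the description does not define the interfaces of the display control unit 34 or the audio reproduction unit 40.

```python
from typing import Callable, Sequence


def handle_designation(parts: Sequence[str], designated_index: int, duration_s: float,
                       update_display: Callable[[float], None],
                       play_from: Callable[[float], None],
                       margin_s: float = 2.0) -> float:
    """Run the three steps for a designated partial character string (1-based index)."""
    # S301: identify the designated partial character string
    if not 1 <= designated_index <= len(parts):
        raise ValueError("designated_index out of range")

    # S302: determine the playback position from its ordinal position
    position = max(0.0, (designated_index - 1) * duration_s / len(parts) - margin_s)

    # S303: update the screen, then start playback from that position
    update_display(position)
    play_from(position)
    return position


log = []
handle_designation(["line"] * 8, 3, 24.0,
                   update_display=lambda p: log.append(("display", p)),
                   play_from=lambda p: log.append(("play", p)))
print(log)
```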

The present invention is not limited to the embodiment described above.

For example, the answering machine data may include data indicating the correspondence between the reproduction positions of the syllables and phonemes that make up the voice of the message and the characters those syllables and phonemes represent. The reproduction position determination unit 38 may then cause the voice to be reproduced from the reproduction position of the syllable or phoneme associated with the designated partial character string, or from a predetermined time before that reproduction position.
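When such correspondence data is available, the linear estimate can be replaced by a direct lookup. The description does not define the format of this data, so the sketch below assumes a hypothetical list `char_times` holding one start time in seconds per character; the function name and the clamping to 0 are likewise assumptions.

```python
from typing import Sequence


def position_from_alignment(char_times: Sequence[float], first_char: int,
                            margin_s: float = 0.0) -> float:
    """Look up the playback position of a partial character string.

    char_times[i] is the assumed start time of the (i + 1)-th character.
    first_char is the 1-based index of the first character of the designated part.
    """
    if not 1 <= first_char <= len(char_times):
        raise ValueError("first_char out of range")
    return max(0.0, char_times[first_char - 1] - margin_s)


times = [0.0, 0.4, 1.1, 3.5, 3.9]   # illustrative values only, not from the document
print(position_from_alignment(times, 4, margin_s=2.0))
```

This variant trades the communication savings of the main embodiment for positions that follow the actual speaking pace.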

Also, for example, the scope of application of the present invention is not limited to the mobile phone terminal 14. The present invention may be applied to computers in general, such as personal computers.

The specific character strings and numerical values given above and in the drawings are examples, and the present invention is not limited to these character strings and numerical values.

DESCRIPTION OF SYMBOLS: 1 answering machine system, 10 answering machine processing server, 12 speech recognition server, 14 mobile phone terminal, 14a control unit, 14b storage unit, 14c communication unit, 14d touch panel, 14e audio input/output unit, 16 telephone communication network, 18 Internet, 20 message list screen, 22 audio reproduction screen, 30 answering machine data receiving unit, 32 answering machine data storage unit, 34 display control unit, 36 designation receiving unit, 38 reproduction position determination unit, 40 audio reproduction unit.

Claims

An audio reproduction device comprising:
a display control unit that displays a character string that is a speech recognition result of a voice; and
an audio reproduction unit that, in response to designation of one or more characters included in the character string, reproduces the voice from a reproduction position associated with the position of the one or more characters in the character string.

The audio reproduction device according to claim 1, wherein,
when the n-th partial character string from the beginning is designated among a plurality of partial character strings, each consisting of one or more characters, obtained by dividing the character string, the audio reproduction unit reproduces the voice from a reproduction position belonging to the n-th partial time from the beginning among a plurality of partial times obtained by dividing the reproduction time of the voice by the number of partial character strings, or from a reproduction position a predetermined time before that reproduction position.

The audio reproduction device according to claim 2, wherein
each of the plurality of partial times is obtained by dividing the reproduction time of the voice at equal time intervals.

The audio reproduction device according to claim 2 or 3, wherein
the reproduction position belonging to the n-th partial time is the first reproduction position of the n-th partial time.

The audio reproduction device according to any one of claims 1 to 4, wherein
the display control unit highlights a character associated with the portion of the voice being reproduced.

An audio reproduction method comprising:
a step of displaying a character string that is a speech recognition result of a voice; and
a step of reproducing, in response to designation of one or more characters included in the character string, the voice from a reproduction position associated with the position of the one or more characters in the character string.

A program that causes a computer to execute:
a procedure of displaying a character string that is a speech recognition result of a voice; and
a procedure of reproducing, in response to designation of one or more characters included in the character string, the voice from a reproduction position associated with the position of the one or more characters in the character string.