JP4385710B2 - Audio signal processing apparatus and audio signal processing method

Info

Publication number: JP4385710B2
Application number: JP2003345147A
Authority: JP
Inventors: 浩幸武石; 豊一井
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2003-10-03
Filing date: 2003-10-03
Publication date: 2009-12-16
Anticipated expiration: 2023-10-03
Also published as: JP2005114781A

Abstract

PROBLEM TO BE SOLVED: To prevent speech speed conversion from being applied twice when, in addition to the speech speed conversion function on the receiving side, speech speed conversion processing is also performed on the transmission side for some broadcast programs.

SOLUTION: A transmission device 1 packetizes and multiplexes audio signals and also multiplexes and transmits, as information accompanying the audio signals, speech speed conversion completion information indicating whether each audio signal has already undergone speech speed conversion on the transmission side or at an earlier sound source. In an audio signal processing apparatus 10A, a speech speed conversion information detection part 13 detects the speech speed conversion completion information in the received multiplexed signal. A speech speed conversion processing part 14 decides whether the selected audio signal of the selected program, decoded and output by an audio signal decoder 12, was already speech speed converted before transmission. If it was, the processing part turns its speech speed conversion operation off, so that conversion is not duplicated on the transmission and reception sides. If it was not, the processing part performs speech speed conversion processing.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an audio signal processing apparatus and an audio signal processing method, and more particularly to an apparatus and method having a so-called speech speed conversion function that makes speech easier for elderly listeners to hear.

It is generally known that elderly people tend to find it harder than when they were young to follow the content of rapidly spoken speech. For this reason, audio signal processing apparatuses and methods with a so-called speech speed conversion function have long been known (see, for example, Patent Document 1). They locate the pauses in an input audio signal and, in exchange for shortening those pauses, stretch the speech in the voiced periods along the time axis without changing its pitch, so that the utterance as a whole becomes slower.

In the audio signal processing apparatus described in Patent Document 1, speech speed control information for controlling the speech speed is embedded in advance in transmission data, recording media, or the like, and a playback device that receives and reproduces the transmission data, or a playback device for the recording medium, controls the speech speed based on that information. A radio equipped with a speech speed conversion function has also been developed (see, for example, Non-Patent Document 1).

Patent Document 1: JP-A-8-146985
Non-Patent Document 1: Imai, Tsuzuki, Hasuda, Takeishi: "Development of radio with listening assistance function", IEICE Technical Report TL2003-7, June 2003

Broadcasters are also actively researching this kind of speech speed conversion technology, and it is conceivable that television and radio broadcasts will in future transmit audio signals that have already undergone speech speed conversion on the transmission side, for the convenience of elderly viewers and listeners. Consequently, if speech speed conversion comes to be performed on the transmission side for some programs, in addition to the speech speed conversion function on the receiving side, speech speed conversion may end up being applied twice.

In that case, speech speed conversion is applied again to an audio signal that has already been converted, and the speech may become slower than necessary and actually harder to listen to. With a conventional receiver, the user can turn the speech speed conversion function off manually, but the user then has to switch conversion on or off for each program, which is troublesome. Moreover, it is hard to expect elderly users to perform this switching themselves in actual use.

The present invention has been made in view of the above points. Its object is to provide an audio signal processing apparatus and an audio signal processing method that use information accompanying a broadcast audio signal to automatically prevent speech speed conversion from being performed twice, once on the transmission side and once on the receiving side.

Another object of the present invention is to provide an audio signal processing apparatus and an audio signal processing method that make it unnecessary to switch speech speed conversion on or off according to the program being received.

To achieve the above objects, the audio signal processing apparatus of the first invention comprises: receiving means for receiving a multiplexed signal in which an audio signal is multiplexed with accompanying information on speech speed conversion, which indicates whether the audio signal has undergone, on the transmission side, speech speed conversion processing that changes the time axis without changing the voice pitch; detecting means for detecting the accompanying information on speech speed conversion in the multiplexed signal received by the receiving means and determining its content; reproducing means for reproducing the audio signal from the multiplexed signal received by the receiving means; and speech speed conversion processing means. The speech speed conversion processing means does not perform speech speed conversion processing on the audio signal reproduced by the reproducing means when the accompanying information detected by the detecting means is determined to indicate that speech speed conversion has already been performed on the transmission side, and performs speech speed conversion processing on the reproduced audio signal when the accompanying information is determined to indicate that speech speed conversion has not been performed on the transmission side.

In this invention, the audio signal transmitted from the transmission side and the accompanying information on speech speed conversion are received, and whether to perform speech speed conversion processing on the reproduced audio signal can be determined automatically based on that accompanying information.

To achieve the above objects, the audio signal processing apparatus of the second invention comprises receiving means, detecting means, reproducing means, speech speed conversion processing means, and control means. The receiving means receives a multiplexed signal in which the following are multiplexed: a plurality of audio signals, including audio signals that have not undergone, on the transmission side, speech speed conversion processing that changes the time axis without changing the voice pitch, and audio signals that have undergone that processing; speech speed converted information indicating, for each of the plurality of audio signals, whether it is an audio signal that has undergone speech speed conversion processing; and corresponding speech speed converted audio information indicating whether, for an audio signal that has not undergone speech speed conversion processing, a corresponding audio signal that has undergone the processing exists. The detecting means detects the speech speed converted information and the corresponding speech speed converted audio information in the multiplexed signal received by the receiving means and determines their content. The reproducing means selects and reproduces one of the plurality of audio signals from the multiplexed signal received by the receiving means. The speech speed conversion processing means performs speech speed conversion processing on an audio signal.

When the reproducing means selects and reproduces one of the plurality of audio signals, the control means performs control as follows. If the detecting means determines that the selected audio signal has undergone speech speed conversion processing, the control means turns the speech speed conversion processing means off and no speech speed conversion processing is performed on the selected audio signal. If the detecting means determines that the selected audio signal has not undergone speech speed conversion processing and that a corresponding audio signal that has undergone the processing exists, the control means switches from the selected audio signal to the corresponding converted audio signal, turns the speech speed conversion processing means off, and no speech speed conversion processing is performed on the switched audio signal. If the detecting means determines that the selected audio signal has not undergone speech speed conversion processing and that no corresponding converted audio signal exists, the control means turns the speech speed conversion processing means on and speech speed conversion processing is performed on the selected audio signal.

To achieve the above objects, the audio signal processing method of the third invention comprises: a first step of receiving a multiplexed signal in which an audio signal is multiplexed with accompanying information on speech speed conversion, which indicates whether the audio signal has undergone, on the transmission side, speech speed conversion processing that changes the time axis without changing the voice pitch; a second step of detecting the accompanying information on speech speed conversion in the received multiplexed signal and determining its content; a third step of reproducing the audio signal from the received multiplexed signal; a fourth step of performing speech speed conversion processing on the reproduced audio signal when the accompanying information detected in the second step is determined to indicate that speech speed conversion has not been performed on the transmission side; and a fifth step of outputting the reproduced audio signal without speech speed conversion processing when the accompanying information detected in the second step is determined to indicate that speech speed conversion has already been performed on the transmission side.

According to the present invention, the audio signal transmitted from the transmission side and the accompanying information on speech speed conversion are received, and whether to perform speech speed conversion processing on the reproduced audio signal is determined automatically based on that accompanying information. The user of the receiving-side audio signal processing apparatus can therefore always listen to the received audio with the optimum speech speed conversion setting, without switching it each time.

Also according to the present invention, speech speed conversion processing is not performed on a received audio signal that has already been speech speed converted. This automatically prevents speech speed conversion from being performed twice, on the transmission side and on the receiving side, even if the user does not switch speech speed conversion each time a channel is selected and received.

Furthermore, according to the present invention, the speech speed converted audio signal transmitted from the transmitting apparatus is used wherever possible. This avoids, as far as possible, the situation in which speech speed conversion performed by the receiving-side audio signal processing apparatus fails because the audio has no pauses, and it also reduces the power consumption of the audio signal processing apparatus.

Next, the best mode for carrying out the present invention will be described with reference to the drawings. The audio signal processing apparatus and audio signal processing method of the present invention presuppose that the broadcaster sends information accompanying the audio signal, and use that information to switch the speech speed conversion function on and off. Several possible embodiments are described below.

FIG. 1 is a block diagram of a first embodiment of an audio signal processing apparatus according to the present invention. In FIG. 1, the audio signal processing apparatus 10A of the first embodiment is connected to a transmitting apparatus 1 via a transmission path 3. It receives, via the transmission path 3, a signal transmitted from the transmitting apparatus 1 in which many audio signals are packet-multiplexed, selects a desired audio signal from the received signal and decodes it to obtain an audio output, and has a speech speed conversion function for that audio output.

In a packetizing unit 2, the transmitting apparatus 1 packetizes each audio signal (FIG. 1 shows first to fifth audio signals) into audio packets and transmits them, distinguished from one another by the packet ID (PID) that each audio packet contains for packet identification. In addition to the audio packets, it transmits packets of control information/program information. So that the receiving side can select a desired audio signal, information such as the Program Association Table (PAT) and the Program Map Table (PMT) defined by MPEG (Moving Picture Experts Group) is sent as one kind of this control information/program information.

The PAT, which has a specific PID, carries the PIDs of the PMTs, which in turn convey information on the video and audio packets that make up each program. Each PMT lists, for its program, the PIDs of the video and audio packets that make up that program. By following these tables, a specific video or audio signal of a desired program can be extracted.
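The PAT-to-PMT-to-stream lookup described above can be sketched as follows. This is a minimal illustration only: the dictionaries stand in for the binary PAT and PMT sections of a real transport stream, and every program number, PID value, and function name here is a hypothetical assumption, not something given in the patent.

```python
# Hypothetical, simplified tables. Real PAT/PMT sections are binary structures.
PAT = {1: 0x0100, 2: 0x0200}  # program number -> PID of that program's PMT

PMTS = {  # PMT PID -> PIDs of the elementary streams of that program
    0x0100: {"video": [0x0111], "audio": [0x0112, 0x0113]},
    0x0200: {"video": [0x0211], "audio": [0x0212]},
}


def find_stream_pids(program_number, kind):
    """Follow PAT -> PMT to find the PIDs of one kind of stream in a program."""
    pmt_pid = PAT[program_number]  # the PAT tells us where the PMT is
    return PMTS[pmt_pid][kind]     # the PMT lists the program's stream PIDs


audio_pids = find_stream_pids(1, "audio")
```

A receiver would then filter the multiplexed signal for packets whose PID is in `audio_pids` and pass them to the decoder.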

Furthermore, in this embodiment, information indicating whether each audio signal has already undergone speech speed conversion on the transmission side or at an earlier sound source (the speech speed converted information shown in FIG. 2) is transmitted as information accompanying the audio signals. This speech speed converted information may always be transmitted with a dedicated, predetermined PID, or its PID may be recorded in the PMT or the like so that the receiving side can retrieve it. In FIG. 2, the information is sent as a table that gives, for the PID of each audio packet, "1" when that audio packet has been speech speed converted and "0" when it has not.
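The FIG. 2 table can be pictured as a simple mapping from audio PID to a 0/1 flag. The sketch below assumes the table has already been extracted as (PID, flag) rows. The PID values, the function names, and the choice to return `None` for a PID that is not listed are illustrative assumptions, since the patent does not define a concrete encoding.

```python
def build_converted_table(rows):
    """Turn (pid, flag) rows into a lookup table; flag 1 means already converted."""
    table = {}
    for pid, flag in rows:
        if flag not in (0, 1):
            raise ValueError(f"flag for PID {pid:#06x} must be 0 or 1")
        table[pid] = bool(flag)
    return table


def is_already_converted(table, pid):
    """Return True/False from the table, or None if the PID is not listed."""
    return table.get(pid)


# Hypothetical rows in the spirit of FIG. 2.
table = build_converted_table([(0x0112, 0), (0x0113, 1)])
```

Returning `None` for an unlisted PID leaves the fallback policy to the caller. The patent only covers the case in which the information is present.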

The audio signal processing apparatus 10A receives, at a receiving unit 11 via the transmission path 3, the multiplexed signal of audio packets and control information/program information including the speech speed converted information. The audio packets in the received signal are decoded by an audio signal decoder 12. The packet carrying the speech speed converted information is extracted from the received signal on the basis of its PID by a speech speed conversion information detection unit 13, which thereby detects the speech speed converted information.

Based on the speech speed converted information detected by the speech speed conversion information detection unit 13, a speech speed conversion processing unit 14 processes the decoded audio signal output from the audio signal decoder 12 according to the flowchart of FIG. 3. That is, the speech speed conversion processing unit 14 determines from the detected information whether the selected, decoded audio signal output from the audio signal decoder 12 had already undergone speech speed conversion before transmission (step S101 in FIG. 3). If so, it turns the speech speed conversion operation off and outputs the received audio signal to an output terminal 15 without conversion (step S102 in FIG. 3). If not, it turns the operation on, applies to the received audio signal a known speech speed conversion process that compresses or expands the time axis of the signal in speech sections and deletes silent sections of a predetermined length or longer, and then outputs the result to the output terminal 15 (step S103 in FIG. 3).
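The decision of FIG. 3 can be sketched as below. The conversion itself is replaced by a toy stand-in that works on a list of (kind, seconds) segments rather than on audio samples. The segment model, the stretch factor of 1.5, and the 0.5 s silence threshold are assumptions chosen for illustration, not values from the patent.

```python
def convert_speech_speed(segments, stretch=1.5, max_silence=0.5):
    """Toy stand-in for the known conversion: lengthen speech, drop long silences."""
    out = []
    for kind, seconds in segments:
        if kind == "speech":
            out.append((kind, seconds * stretch))  # stretch on the time axis
        elif seconds < max_silence:
            out.append((kind, seconds))            # short pauses are kept
        # silences of max_silence or longer are deleted
    return out


def process_audio(segments, already_converted):
    """Steps S101 to S103 of FIG. 3 for one decoded audio signal."""
    if already_converted:                  # S101: converted before transmission?
        return list(segments)              # S102: conversion off, pass through
    return convert_speech_speed(segments)  # S103: conversion on


decoded = [("speech", 1.0), ("silence", 0.8), ("speech", 2.0), ("silence", 0.2)]
passed_through = process_audio(decoded, already_converted=True)
converted = process_audio(decoded, already_converted=False)
```

Keeping the decision in `process_audio` separate from the conversion mirrors the text, where the known conversion process is only switched on or off by the accompanying information.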

As described above, in this embodiment, speech speed converted information indicating whether an audio signal has already undergone speech speed conversion on the transmission side or at an earlier sound source is transmitted, and the audio signal processing apparatus 10A automatically determines from that information whether the received audio signal has already been converted. Because no speech speed conversion processing is performed on a received audio signal that has already been converted, speech speed conversion is automatically prevented from being performed twice, on the transmission side and on the receiving side, even if the user does not switch speech speed conversion each time a channel is selected and received.

Next, a second embodiment of the present invention will be described. FIG. 4 is a block diagram of the second embodiment of the audio signal processing apparatus according to the present invention. In FIG. 4, components identical to those in FIG. 1 are denoted by the same reference numerals, and their description is omitted. The audio signal processing apparatus 10B of the second embodiment is connected to a transmitting apparatus 4 via the transmission path 3, and receives from the transmitting apparatus 4 a plurality of programs, each composed of a set of video and a plurality of audio signals.

As shown in FIG. 4, the transmitting apparatus 4 transmits a plurality of programs, such as program #01 and program #02, each consisting of one video signal and a plurality of audio signals. The video and audio signals of each program are distinguished by PID, packetized in a packetizing unit 5, and multiplexed. The PAT carries the information on each program's PMT, and each PMT carries the table of PIDs of the video and audio signals contained in its program. By following these tables, the audio signal processing apparatus 10B can identify the packets containing the video and audio signals of a desired program and obtain video and audio by decoding the information in those packets.

The audio signal processing apparatus 10B receives the packet-multiplexed transmission signal at its receiving unit 11. Control signals such as the PAT and PMT among the received packets are supplied to a microprocessor 17. Using this information together with information entered by the user through human interface means 16 (buttons, a keyboard, a display with cursor keys, or the like may be used as appropriate), the video and audio signals of the program the user wants are extracted. After error correction and other processing in the receiving unit 11, the audio signal is supplied to the audio signal decoder 12 and the video signal to a video signal decoder 18. Each program has a plurality of audio signals, and the user can also select which of them is decoded and output.

The video signal selected in this way, which has been encoded by a scheme such as MPEG-2, is decoded by the video signal decoder 18 and output as a video signal to a video output terminal 19. The selected audio signal is decoded by the audio signal decoder 12 and input to the speech speed conversion processing unit 14. As described later, depending on the case, either speech speed conversion is applied there, or the audio signal is left effectively unconverted, for example by turning the speech speed conversion processing off or by bypassing the speech speed conversion processing unit 14. The audio signal output from the speech speed conversion processing unit 14 passes through a D/A conversion unit (not shown) and is output as an analog audio signal to the audio output terminal 15.

FIG. 5(a) shows an example of the set of audio signals in a program. In this example there are three basic audio signals: Japanese main audio, Japanese sub audio, and English. For the Japanese main audio and the English audio, audio signals that have already undergone speech speed conversion on the transmission side are sent separately, in addition to the originals. There are also a music-only audio signal suitable for playback with the program's video as BGM (background music) and a "notices from the station" audio signal carrying items such as new program information and schedule changes, for a total of seven audio signals.

For these audio signals, when a speech speed converted version of an original audio signal exists, the correspondence is sent as a table, as corresponding speech speed converted audio information. FIG. 5(b) shows an example. The numbers on the right side of FIG. 5(b) correspond to the audio signal numbers within the program that are attached to the left of each audio signal in FIG. 5(a). Each original audio signal and its corresponding speech speed converted audio signal are listed in order, and after all audio signals having such a correspondence have been described, END information marks the end of the table.
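The table of FIG. 5(b), a sequence of original/converted pairs closed by an END marker, could be read as in the following sketch. The flat-list representation, the `END` sentinel value, and the signal numbering in the example (1 and 3 as originals, 4 and 5 as their converted versions) are assumptions for illustration. The actual numbers in the figure are not reproduced in the text.

```python
END = "END"  # hypothetical sentinel standing in for the END information


def parse_correspondence(entries):
    """Read (original, converted) pairs until END; return {original: converted}."""
    mapping = {}
    it = iter(entries)
    for original in it:
        if original == END:
            return mapping
        converted = next(it, END)  # each original must be followed by its pair
        if converted == END:
            raise ValueError(f"signal {original} has no converted counterpart")
        mapping[original] = converted
    raise ValueError("END marker missing")


# Assumed numbering: 1 = Japanese main, 3 = English, 4 and 5 = converted versions.
correspondence = parse_correspondence([1, 4, 3, 5, END])
```

A signal that does not appear as a key, such as the Japanese sub audio in this example, simply has no converted counterpart.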

This corresponding speech speed converted audio information can be sent separately from other information by giving it a PID specific to this transmission, which is described in the PMT. Assuming that the speech speed converted information of FIG. 2 is sent as well, the audio signal processing apparatus 10B supplies both kinds of information to the microprocessor 17 for processing.

FIG. 6 shows an example of the operation that the microprocessor 17 performs on the audio signal in the audio signal processing apparatus 10B. Steps identical to those in FIG. 3 are given the same step numbers. First, based on the detected speech speed converted information, the microprocessor 17 determines whether the selected audio signal of the selected program, decoded and output by the audio signal decoder 12, had already undergone speech speed conversion before transmission (step S201 in FIG. 6).

If the microprocessor 17 determines that speech speed conversion was performed before transmission, it turns the speech speed conversion operation off and the received audio signal is output to the output terminal 15 without conversion (step S102 in FIG. 6). If, on the other hand, it determines in step S201 from the speech speed converted information that the signal has not been converted, the microprocessor 17 determines whether a speech speed converted audio signal corresponding to the selected audio signal is being transmitted (step S202 in FIG. 6). This determination can be made by referring to the "corresponding speech speed converted audio information" shown in FIG. 5(b).

If step S202 finds that a corresponding speech speed converted audio signal is being transmitted, the audio signal extracted by the receiving unit 11 is switched to that signal (step S203 in FIG. 6). Because the audio signal decoded and reproduced by the audio signal decoder 12 is then already speech speed converted, the process moves to step S102 and the speech speed conversion processing of the speech speed conversion processing unit 14 is turned off. If, on the other hand, step S202 determines that no corresponding speech speed converted audio signal is being transmitted, the microprocessor 17 moves to step S103 and has the speech speed conversion processing unit 14 perform speech speed conversion on the audio signal output from the audio signal decoder 12.
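The three-way control of FIG. 6 can be summarized in a short sketch. It assumes that the speech speed converted information is available as a mapping from signal number to flag and the corresponding speech speed converted audio information as a mapping from original to converted signal number. The function name, the return convention, and the example numbering are hypothetical.

```python
def choose_audio(selected, converted_flags, correspondence):
    """Return (signal to decode, whether receiver-side conversion is on)."""
    if converted_flags[selected]:              # S201: already converted?
        return selected, False                 # S102: conversion off
    if selected in correspondence:             # S202: converted version sent?
        return correspondence[selected], False # S203 then S102: switch, conversion off
    return selected, True                      # S103: convert in the receiver


# Assumed example: signals 4 and 5 are converted versions of signals 1 and 3.
flags = {1: False, 2: False, 3: False, 4: True, 5: True, 6: False, 7: False}
pairs = {1: 4, 3: 5}

result_main = choose_audio(1, flags, pairs)  # switch to the converted signal 4
result_sub = choose_audio(2, flags, pairs)   # no converted version: convert locally
result_conv = choose_audio(4, flags, pairs)  # already converted: pass through
```

In this sketch, receiver-side conversion is switched on only in the last branch, which matches the preference in the text for using a transmitted converted signal whenever one exists.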

In this embodiment, speech-speed-converted audio sent from the transmission device 4 is used whenever possible, for the following reasons. When speech speed conversion is performed at the transmission device 4, it can be applied to the recorded voices of announcers and performers alone, and background music and the like can be added to the converted audio afterwards. In contrast, when the receiving-side audio signal processing apparatus 10B performs speech speed conversion on audio that already contains background music, the background music may be distorted, or, depending on the level of the background music, the sound may contain no pauses, so that the speech speed conversion itself fails. It is therefore preferable to perform speech speed conversion on the transmission side where possible. In addition, not operating the speech speed conversion function reduces the power consumption of the receiving-side audio signal processing apparatus 10B.

Next, a third embodiment of the present invention will be described. In this embodiment, instead of the "speech speed converted information" transmitted in the first embodiment, the "corresponding speech speed converted audio information" shown in FIG. 5(b) and the "speech speed conversion suitability information" shown in FIG. 5(c) are transmitted. This allows the receiving-side audio signal processing apparatus to apply speech speed conversion only to audio that has not been speech speed converted on the transmission side and that is also suitable for speech speed conversion.

FIG. 5(c) shows an example of speech speed conversion suitability information, in which each audio signal of the program shown in FIG. 5(a) is marked "1" if it is suitable for speech speed conversion and "0" if it is not. Here, the Japanese main audio (audio 1), the Japanese sub audio (audio 2), and the English audio (audio 3) are marked as suitable. The fourth and fifth audio signals are the results of applying speech speed conversion to the Japanese main audio and the English audio, respectively, so speech speed conversion should not be applied to them on the receiving side, and they are marked "0". The sixth, "BGM" (audio 6), is music rather than human speech and is therefore not suitable for speech speed conversion. The seventh, "Announcements from the station" (audio 7), is guidance spoken by an announcer and is therefore marked as suitable.
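As a small illustration, the example of FIG. 5(c) can be written as a lookup table. The sketch below assumes a plain Python dictionary keyed by audio number; the names `SUITABILITY`, `LABELS`, and `is_suitable` are hypothetical, and the patent does not specify how the flags are encoded in the multiplexed signal.

```python
from typing import Dict

# Audio number -> short description, following the example of FIG. 5(a).
LABELS: Dict[int, str] = {
    1: "Japanese main audio",
    2: "Japanese sub audio",
    3: "English audio",
    4: "speech-speed-converted Japanese main audio",
    5: "speech-speed-converted English audio",
    6: "BGM",
    7: "Announcements from the station",
}

# Audio number -> suitability flag, following the example of FIG. 5(c):
# 1 = suitable for receiver-side speech speed conversion, 0 = not suitable.
SUITABILITY: Dict[int, int] = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0, 7: 1}


def is_suitable(audio_number: int) -> bool:
    """Return True when the flag for this audio signal is 1."""
    return SUITABILITY[audio_number] == 1


# Audio signals the receiver may convert itself.
suitable_audio = [n for n in sorted(SUITABILITY) if is_suitable(n)]
```

Here `suitable_audio` contains audio 1, 2, 3, and 7, matching the description above.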

An example of the processing performed by the microprocessor in the receiving-side audio signal processing apparatus when such speech speed conversion suitability information is transmitted will be described with reference to the flowchart of FIG. 7. In FIG. 7, processing steps identical to those in FIGS. 3 and 6 are given the same reference numerals. First, the microprocessor determines whether the selected audio is suitable for speech speed conversion on the receiving side (step S301 in FIG. 7). This is done by extracting and referring to the transmitted speech speed conversion suitability information.

If the microprocessor determines in step S301 that the selected audio is not suitable, it does not perform speech speed conversion (step S102 in FIG. 7). In the example shown in FIG. 5(c), this covers two cases: the fourth or fifth audio signal, which has already been speech speed converted, is selected; or the sixth audio signal (BGM) is selected, which has not been speech speed converted on the transmission side but is music and therefore inherently unsuitable for speech speed conversion.

If, on the other hand, the microprocessor determines in step S301 that the selected audio is suitable for speech speed conversion, it determines whether speech-speed-converted audio corresponding to that audio is being transmitted (step S202 in FIG. 7). This determination can be made by extracting the "corresponding speech speed converted audio information" shown in FIG. 5(b) from the received signal and referring to it.

The processing from step S202 onward is the same as in the second embodiment. If corresponding speech-speed-converted audio is being transmitted, the apparatus switches to that audio and does not perform speech speed conversion on the receiving side (steps S203 and S102 in FIG. 7). If it is not being transmitted, the apparatus performs speech speed conversion on the receiving side (step S103 in FIG. 7).

Accordingly, in this embodiment, in the example of FIG. 5(b), corresponding speech-speed-converted audio exists for the first audio signal (Japanese main audio) and the third (English audio), namely the fourth and fifth audio signals respectively, and the apparatus switches to them. The second audio signal (Japanese sub audio) and the seventh ("Announcements from the station"), by contrast, are suitable for speech speed conversion as shown in FIG. 5(c) but have not been speech speed converted on the transmission side, so speech speed conversion is performed by the receiving-side audio signal processing apparatus.
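The flow of FIG. 7 and the outcome for each of the seven audio signals can be traced with a short Python sketch. It is a minimal illustration under stated assumptions: the table names `COUNTERPART` and `SUITABLE` and the functions `decide_third_embodiment` and `summarize` are hypothetical, the tables simply restate the examples of FIG. 5(b) and FIG. 5(c) as described in the text, and the result is reduced to the audio number to decode plus a flag saying whether the receiver converts it.

```python
from typing import Dict, Tuple

# FIG. 5(b) example: audio signal -> its speech-speed-converted counterpart.
COUNTERPART: Dict[int, int] = {1: 4, 3: 5}

# FIG. 5(c) example: 1 = suitable for receiver-side conversion, 0 = not.
SUITABLE: Dict[int, int] = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0, 7: 1}


def decide_third_embodiment(selected: int) -> Tuple[int, bool]:
    """Return (audio signal to decode, whether the receiver converts it)."""
    # S301: is the selected audio suitable for receiver-side conversion?
    if SUITABLE[selected] == 0:
        return selected, False        # S102: no conversion
    # S202: is a converted counterpart being transmitted?
    target = COUNTERPART.get(selected)
    if target is not None:
        return target, False          # S203 then S102: switch, no conversion
    return selected, True             # S103: convert on the receiving side


def summarize() -> Dict[int, Tuple[int, bool]]:
    """Apply the flow to every audio signal in the example."""
    return {n: decide_third_embodiment(n) for n in sorted(SUITABLE)}
```

In this sketch, audio 1 and 3 are replaced by audio 4 and 5, audio 2 and 7 are converted on the receiving side, and audio 4, 5, and 6 are played as received.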

Thus, according to this embodiment, when speech speed conversion has not been performed on the transmission side, the apparatus goes on to determine whether the audio signal is suitable for speech speed conversion. It can therefore automatically identify only those audio signals for which receiving-side speech speed conversion is appropriate and convert them.

The present invention is not limited to the above embodiments. For example, steps S202 and S203 may be omitted from FIG. 7, so that when the selected audio is suitable for speech speed conversion the process proceeds directly to step S103 and speech speed conversion is performed. The present invention also includes a computer program that causes a computer to implement the audio signal processing apparatuses 10A and 10B. In this case, the computer program may be loaded into the computer from a recording medium or downloaded to the computer via a communication network.
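The simplified variant, with steps S202 and S203 omitted, reduces to a single check. The following Python sketch is illustrative only; the function name `decide_simplified` and the dictionary argument are hypothetical.

```python
from typing import Dict


def decide_simplified(selected: int, suitable: Dict[int, int]) -> bool:
    """Return True when the receiver should run speech speed conversion.

    Only step S301 remains: a suitable audio signal goes to S103 (convert),
    an unsuitable one goes to S102 (no conversion). There is no switching
    to a pre-converted counterpart in this variant.
    """
    return suitable[selected] == 1
```

With the flags of the FIG. 5(c) example, audio 1 would be converted on the receiving side in this variant rather than replaced by audio 4.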

The audio signal processing method of the present invention can be applied on the receiver side when a broadcast service also transmits speech-speed-converted audio.

FIG. 1 is a block diagram of a first embodiment of the audio signal processing apparatus of the present invention.
FIG. 2 is a diagram showing an example of the speech speed converted information transmitted and received in the embodiment of FIG. 1.
FIG. 3 is a flowchart for explaining the operation of the speech speed conversion processing unit of FIG. 1.
FIG. 4 is a block diagram of a second embodiment of the audio signal processing apparatus of the present invention.
FIG. 5 is a diagram showing examples of a set of audio signals, corresponding speech speed converted audio information, and speech speed conversion suitability information.
FIG. 6 is a flowchart for explaining the operation of the embodiment of FIG. 4.
FIG. 7 is a flowchart for explaining the operation of the third embodiment of the present invention.

Explanation of symbols

1, 4 Transmission device
2, 5 Packetizing unit
3 Transmission path
10A, 10B Audio signal processing apparatus
11 Receiving unit
12 Audio signal decoder
13 Speech speed conversion information detection unit
14 Speech speed conversion processing unit
15 Audio output terminal
16 Human interface means
17 Microprocessor
18 Video signal decoder
19 Video output terminal

Claims

An audio signal processing apparatus comprising:
receiving means for receiving a multiplexed signal in which an audio signal is multiplexed with accompanying information relating to speech speed conversion, the accompanying information indicating whether speech speed conversion processing, which changes the time axis of the audio signal without changing its pitch, has been performed on the audio signal on the transmission side;
detecting means for detecting the accompanying information relating to speech speed conversion from the multiplexed signal received by the receiving means and determining its content;
reproducing means for reproducing the audio signal from the multiplexed signal received by the receiving means; and
speech speed conversion processing means which does not perform speech speed conversion processing on the audio signal reproduced by the reproducing means when the content of the accompanying information detected by the detecting means is determined to indicate that speech speed conversion has already been performed on the transmission side, and which performs speech speed conversion processing on the audio signal reproduced by the reproducing means when that content is determined to indicate that speech speed conversion has not been performed on the transmission side.

An audio signal processing apparatus comprising:
receiving means for receiving a multiplexed signal in which the following are multiplexed: a plurality of audio signals, including an audio signal on which speech speed conversion processing, which changes the time axis without changing the pitch, has not been performed on the transmission side and an audio signal on which it has been performed; speech speed converted information indicating, for each of the plurality of audio signals, whether that audio signal has undergone speech speed conversion processing; and corresponding speech speed converted audio information indicating whether there is a speech-speed-converted audio signal corresponding to an audio signal that has not undergone speech speed conversion processing;
detecting means for detecting the speech speed converted information and the corresponding speech speed converted audio information from the multiplexed signal received by the receiving means and determining their contents;
reproducing means for selecting and reproducing one of the plurality of audio signals from the multiplexed signal received by the receiving means;
speech speed conversion processing means for performing speech speed conversion processing on an audio signal; and
control means which, when the reproducing means selects and reproduces one of the plurality of audio signals and the detecting means determines that the selected audio signal has undergone speech speed conversion processing, turns off the speech speed conversion processing means so that speech speed conversion processing is not performed on the selected audio signal;
which, when the reproducing means selects and reproduces one of the plurality of audio signals and the detecting means determines that the selected audio signal has not undergone speech speed conversion processing and that a corresponding speech-speed-converted audio signal exists, switches from the selected audio signal to the corresponding speech-speed-converted audio signal and turns off the speech speed conversion processing means so that speech speed conversion processing is not performed on the audio signal switched to; and
which, when the reproducing means selects and reproduces one of the plurality of audio signals and the detecting means determines that the selected audio signal has not undergone speech speed conversion processing and that no corresponding speech-speed-converted audio signal exists, turns on the speech speed conversion processing means so that speech speed conversion processing is performed on the selected audio signal.

An audio signal processing method comprising:
a first step of receiving a multiplexed signal in which an audio signal is multiplexed with accompanying information relating to speech speed conversion, the accompanying information indicating whether speech speed conversion processing, which changes the time axis of the audio signal without changing its pitch, has been performed on the audio signal on the transmission side;
a second step of detecting the accompanying information relating to speech speed conversion from the received multiplexed signal and determining its content;
a third step of reproducing the audio signal from the received multiplexed signal;
a fourth step of performing speech speed conversion processing on the reproduced audio signal when the content of the accompanying information detected in the second step is determined to indicate that speech speed conversion has not been performed on the transmission side; and
a fifth step of outputting the reproduced audio signal without performing speech speed conversion processing on it when the content of the accompanying information detected in the second step is determined to indicate that speech speed conversion has already been performed on the transmission side.