JP2012235379A

JP2012235379A - Voice multiplexing device, voice hearing device and voice multiplexing method

Info

Publication number: JP2012235379A
Application number: JP2011103539A
Authority: JP
Inventors: Nobuhiro Kanbe; 信裕神戸
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2011-05-06
Filing date: 2011-05-06
Publication date: 2012-11-29

Abstract

PROBLEM TO BE SOLVED: To provide a voice multiplexing device capable of making a target sound easier to hear while keeping the processing load low. SOLUTION: The voice multiplexing device includes a voice input section 110 that inputs a first voice signal and a second voice signal; a first voice multiplexing section 120 that generates a first multiplexed voice signal by multiplexing the first voice signal and the second voice signal in a first multiplexing positional relationship; a second voice multiplexing section 130 that generates a second multiplexed voice signal by multiplexing the first voice signal and the second voice signal in a second multiplexing positional relationship different from the first multiplexing positional relationship; and a voice output section 140 that outputs the first multiplexed voice signal and the second multiplexed voice signal.

Description

The present invention relates to an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing method that multiplex and output a plurality of audio signals.

Situations in which a plurality of sounds are output simultaneously are becoming more common, for example in voice communication such as voice chat and telephone conferencing, and in radio broadcasting that multiplexes and distributes a plurality of pieces of audio data.

For a listener, however, when trying to select and listen to a desired sound (hereinafter referred to as the “target sound”) from among a plurality of sounds, the sounds other than the target sound are perceived as noise.

To address this, Patent Document 1, for example, describes a technique that creates a filter based on the state of each audio signal when no target sound is present, and uses the created filter to extract only the target sound. Patent Document 2, for example, describes a technique that processes the audio signals of sounds other than the target sound (hereinafter referred to as “non-target sounds”) so that they are heard as environmental sound. These conventional techniques can reproduce a plurality of sounds while making only the target sound easy to hear. Another known technique uses object coding, which encodes each sound individually, to multiplex, transmit, and receive a plurality of sounds so that the user can control each sound individually.

Patent Document 1: JP-A-5-87619
Patent Document 2: JP-A-8-186648

However, the conventional techniques described above have the problem that the processing load becomes high in a system that multiplexes, transmits, and receives a plurality of audio signals and reproduces them superimposed. Specifically, with the technique described in Patent Document 1, when the non-target sound is a discontinuous sound such as speech, the filter has to be re-created repeatedly as the non-target sound changes, which increases the processing load. With the techniques described in Patent Documents 1 and 2, when audio signals arrive from an unspecified number of people, as in voice chat, the processing load grows with the number of signals. In the object coding technique, individually compressed audio signals are multiplexed, transmitted, and received, but the processing load for controlling the sounds individually grows with the number of sounds.

Techniques for superimposing and outputting a plurality of sounds are expected to be applied in various fields, but a high processing load makes them difficult to apply to small portable terminals such as mobile phones. It is therefore desirable that such a technique be able to make the target sound easy to hear while keeping the processing load low.

An object of the present invention is to provide an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing method that can make a target sound easy to hear while keeping the processing load low.

An audio multiplexing apparatus of the present invention includes: an audio input unit that inputs a first audio signal and a second audio signal; a first audio multiplexing unit that generates a first multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a first multiplexing positional relationship; a second audio multiplexing unit that generates a second multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a second multiplexing positional relationship different from the first multiplexing positional relationship; and an audio output unit that outputs the first multiplexed audio signal and the second multiplexed audio signal.

An audio listening apparatus of the present invention includes: a multiplexed audio receiving unit that acquires the first multiplexed audio signal and the second multiplexed audio signal from the above audio multiplexing apparatus; a time adjustment unit that generates a superimposed audio signal obtained by superimposing the first multiplexed audio signal and the second multiplexed audio signal in an adjustable predetermined superimposition positional relationship; an operation unit that, based on a user operation, switches the predetermined superimposition positional relationship between a first superimposition positional relationship, in which an arbitrary position of the first audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal, and a second superimposition positional relationship, in which an arbitrary position of the second audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal; and an audio output unit that outputs the superimposed audio signal to an audio output device.

An audio multiplexing method of the present invention includes the steps of: inputting a first audio signal and a second audio signal; generating a first multiplexed audio signal by multiplexing the first audio signal and the second audio signal in a first multiplexing positional relationship; generating a second multiplexed audio signal by multiplexing the first audio signal and the second audio signal in a second multiplexing positional relationship different from the first multiplexing positional relationship; and outputting the first multiplexed audio signal and the second multiplexed audio signal.

The present invention makes it possible to make a target sound easy to hear while keeping the processing load low.

FIG. 1 is a block diagram showing an example of the configuration of an audio multiplexing apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing system according to Embodiment 2 of the present invention.
FIG. 3 is a diagram schematically showing an example of the configuration of an input audio signal in Embodiment 2 of the present invention.
FIG. 4 is a diagram schematically showing an example of the configuration of first and second multiplexed audio signals in Embodiment 2 of the present invention.
FIG. 5 is a diagram schematically showing examples of the configuration of a superimposed audio signal in Embodiment 2 of the present invention.
FIG. 6 is a flowchart showing an example of the operation of the audio multiplexing apparatus according to Embodiment 2 of the present invention.
FIG. 7 is a flowchart showing an example of the operation of the audio listening apparatus according to Embodiment 2 of the present invention.
FIG. 8 is a block diagram showing an example of the configuration of an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing system according to Embodiment 3 of the present invention.
FIG. 9 is a diagram schematically showing an example of the configuration of first and second multiplexed audio signals in Embodiment 3 of the present invention.
FIG. 10 is a diagram schematically showing examples of the configuration of a superimposed audio signal in Embodiment 3 of the present invention.
FIG. 11 is a block diagram showing an example of the configuration of an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing system according to Embodiment 4 of the present invention.
FIG. 12 is a flowchart showing an example of the operation of the audio multiplexing apparatus according to Embodiment 4 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

In each embodiment, multiplexing a plurality of audio signals and superimposing a plurality of multiplexed audio signals each include, at a minimum, setting the relative relationship between the positions of the audio signals on the time axis (hereinafter simply referred to as “positions”). The relative relationship set in multiplexing is referred to as the “multiplexing positional relationship”, and the relative relationship set in superimposition is referred to as the “superimposition positional relationship”.

(Embodiment 1)
Embodiment 1 of the present invention is an example of a basic aspect of the audio multiplexing apparatus according to the present invention.

FIG. 1 is a block diagram showing an example of the configuration of the audio multiplexing apparatus according to the present embodiment.

In FIG. 1, the audio multiplexing apparatus 100 includes an audio input unit 110, a first audio multiplexing unit 120, a second audio multiplexing unit 130, and an audio output unit 140.

The audio input unit 110 inputs a first audio signal and a second audio signal.

The first audio multiplexing unit 120 generates a first multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a first multiplexing positional relationship.

The second audio multiplexing unit 130 generates a second multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a second multiplexing positional relationship different from the first multiplexing positional relationship.

The audio output unit 140 outputs the first multiplexed audio signal and the second multiplexed audio signal.

The audio multiplexing apparatus 100 includes, for example, a CPU (central processing unit) and a storage medium such as a RAM (random access memory). In this case, each of the functional units described above is realized by the CPU executing a control program.

The audio multiplexing apparatus 100 configured in this way can output two kinds of multiplexed audio signals that differ in the position at which the second audio signal is multiplexed relative to the first audio signal.

These two kinds of multiplexed audio signals can be superimposed with only the first audio signal, or only the second audio signal, selectively brought into coincidence. The sound of the audio signal that coincides becomes clearer and easier to hear than the sound of the audio signal that does not. In other words, merely by adjusting the relative positional relationship used when superimposing these two kinds of multiplexed audio signals, either the first audio signal or the second audio signal can be selectively made easier to hear.

The audio multiplexing apparatus 100 can therefore make the target sound easy to hear while keeping the processing load low.
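
The selective-alignment idea of Embodiment 1 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the implementation described in this document: signals are short lists of samples, multiplexing and superimposition are both modelled as sample-wise addition, and the names `delay`, `mix`, and `D` are hypothetical.

```python
def delay(signal, n):
    """Shift a signal later by n samples, padding the start with silence."""
    return [0.0] * n + list(signal)


def mix(*signals):
    """Add signals sample by sample; shorter signals are treated as padded with silence."""
    length = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(length)]


# Hypothetical test signals: one impulse each, at different positions.
first = [1.0, 0.0, 0.0, 0.0]
second = [0.0, 1.0, 0.0, 0.0]
D = 2  # assumed delay, in samples, applied to the second signal

mux1 = mix(first, second)            # first multiplexing positional relationship
mux2 = mix(first, delay(second, D))  # second multiplexing positional relationship

# Superimpose as received: the two copies of the first signal coincide.
favor_first = mix(mux1, mux2)
# Delay the first multiplexed signal by D: the two copies of the second signal coincide.
favor_second = mix(delay(mux1, D), mux2)
```

In `favor_first` the impulse of the first signal adds up to amplitude 2 at one position, while the second signal appears twice at amplitude 1. In `favor_second` the roles are reversed. The only per-listener processing in this model is one delay and one addition, regardless of which signal is selected.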

(Embodiment 2)
Embodiment 2 of the present invention is an example of a specific aspect in which the present invention is applied to a voice chat system where an unspecified number of speakers converse about a plurality of topics at the same time.

First, the configuration of each apparatus and of the system according to the present embodiment is described.

FIG. 2 is a block diagram showing an example of the configuration of the audio multiplexing apparatus, the audio listening apparatus, and the audio multiplexing system according to the present embodiment.

In FIG. 2, the audio multiplexing system 200 includes first to fourth audio providing apparatuses 300-1 to 300-4, the audio multiplexing apparatus 100, an audio listening apparatus 400, and an audio output device 500.

The plurality of audio providing apparatuses 300 and the audio multiplexing apparatus 100, the audio multiplexing apparatus 100 and the audio listening apparatus 400, and the audio listening apparatus 400 and the audio output device 500 are each assumed to be communicably connected, wirelessly or by wire. The transmission band from the audio multiplexing apparatus 100 to the audio listening apparatus 400 is assumed to include a first channel and a second channel over which the two multiplexed audio signals can be carried separably. The two multiplexed audio signals may be transmitted over separate lines, or by time-division multiplexing or frequency-division multiplexing.
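
As one illustration of how two multiplexed audio signals could share a single link while remaining separable, the sketch below interleaves the samples of the two channels (a simple form of time-division multiplexing) and splits them again on the receiving side. The frame layout and the function names are assumptions made for this example; the text does not specify a particular scheme.

```python
def tdm_interleave(channel1, channel2):
    """Interleave two equal-length sample sequences as c1[0], c2[0], c1[1], c2[1], ..."""
    if len(channel1) != len(channel2):
        raise ValueError("both channels must have the same number of samples")
    frame = []
    for a, b in zip(channel1, channel2):
        frame.extend((a, b))
    return frame


def tdm_deinterleave(frame):
    """Recover the two channels from an interleaved frame."""
    return frame[0::2], frame[1::2]


# Hypothetical sample values for the first and second channels.
sent1 = [0.1, 0.2, 0.3]
sent2 = [0.9, 0.8, 0.7]
received1, received2 = tdm_deinterleave(tdm_interleave(sent1, sent2))
```

Because even slots always carry the first channel and odd slots the second, the receiver can separate the two signals without inspecting their content.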

The first to fourth audio providing apparatuses 300-1 to 300-4 have the same configuration and are referred to collectively below as the “audio providing apparatus 300” where appropriate.

Furthermore, the first to fourth audio providing apparatuses 300-1 to 300-4 and the audio listening apparatus 400 can have the same configuration, but they are described separately here according to whether they are on the audio-supplying side or the listening side.

The audio providing apparatus 300 is, for example, an information communication terminal carried by a user who takes part in voice chat. The audio providing apparatus 300 has a microphone, takes in sound including the user's speech, converts it into an audio signal (an electrical signal), and transmits the audio signal to the audio multiplexing apparatus 100. In the present embodiment, the audio signal is digital.

The audio multiplexing apparatus 100 is, for example, a voice chat server. It receives the four kinds of audio signals (hereinafter referred to, in order, as the “first to fourth audio signals”) sent from the first to fourth audio providing apparatuses 300-1 to 300-4. It then multiplexes the received first to fourth audio signals into two channels (two kinds) of multiplexed audio signals that differ in multiplexing method, and transmits them to the audio listening apparatus 400. The first to fourth audio signals as received are referred to below collectively as the “input audio signal”.

FIG. 3 is a diagram schematically showing an example of the configuration of the input audio signal assumed in the present embodiment.

As shown in FIG. 3, the input audio signal 610 is assumed here to consist of first to fourth audio signals 611 to 614. The positions of the first to fourth audio signals 611 to 614 at an arbitrary time t0 on the input audio signal 610 are denoted Vta, Vtb, Vtc, and Vtd, respectively.

The audio multiplexing apparatus 100 in FIG. 2 includes an audio input unit 110, a first audio multiplexing unit 120, a second audio multiplexing unit 130, and a multiplexed audio transmission unit 141.

The audio input unit 110 inputs the first to fourth audio signals.

Specifically, the audio input unit 110 receives the first to fourth audio signals transmitted from the first to fourth audio providing apparatuses 300-1 to 300-4, and uses the audio compression unit 111 to compress the amplitude of each of them. The audio input unit 110 then outputs the compressed first to fourth audio signals (hereinafter simply referred to as the “first to fourth audio signals”) to both the first audio multiplexing unit 120 and the second audio multiplexing unit 130.

At this time, the audio input unit 110 outputs the first to fourth audio signals so that, for an arbitrary position of the first audio signal, the positions of the second to fourth audio signals received at the same timing as that position coincide with it.

Compression in the present embodiment includes reducing the amplitude (sound pressure level) of each audio signal according to the number of audio providing apparatuses 300 that are transmitting audio signals at the same time (four in the present embodiment; hereinafter referred to as the “number of audio signals”). For example, compression is performed by dividing the amplitude of each audio signal by the number of audio signals.

Compression may also include reducing or increasing the amplitude of low-volume audio signals so that the maximum amplitude of each audio signal matches a predetermined upper limit.
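
The two compression variants described above can be sketched as follows. This is a minimal illustration, assuming that signals are lists of floating-point samples in the range -1.0 to 1.0; the function names `compress` and `match_peak` and the sample values are hypothetical.

```python
def compress(signals):
    """Divide every sample by the number of signals being transmitted at the same time."""
    count = len(signals)
    return [[sample / count for sample in signal] for signal in signals]


def match_peak(signal, upper_limit):
    """Scale a signal so that its maximum absolute amplitude equals upper_limit."""
    peak = max((abs(sample) for sample in signal), default=0.0)
    if peak == 0.0:
        return list(signal)  # silence stays silence
    gain = upper_limit / peak
    return [sample * gain for sample in signal]


# Four hypothetical input signals, each with a peak amplitude of at most 1.0.
inputs = [[1.0, -0.5], [0.5, 0.25], [-1.0, 0.5], [0.25, 1.0]]
compressed = compress(inputs)

# Mixing the compressed signals cannot exceed the original full-scale range.
mixed = [sum(samples) for samples in zip(*compressed)]
```

Dividing by the number of signals is a conservative choice: even if all four signals peak at the same instant with the same sign, their sum stays within the original range.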

The first audio multiplexing unit 120 generates a first multiplexed audio signal obtained by multiplexing the first to fourth audio signals in the first multiplexing positional relationship.

Specifically, the first audio multiplexing unit 120 outputs the input first to fourth audio signals as they are, in accordance with their input timing, to the multiplexed audio transmission unit 141 as the signals to be transmitted on the first channel.

That is, in the first multiplexing positional relationship, the relative positions of the received (input) first to fourth audio signals do not change, and they are transmitted (output) with the same timing as they were received (input).

The second audio multiplexing unit 130 generates a second multiplexed audio signal obtained by multiplexing the first to fourth audio signals in a second multiplexing positional relationship different from the first multiplexing positional relationship.

That is, in the second multiplexing positional relationship, the relative positions of the received (input) first to fourth audio signals change, and the transmission (output) timing differs from the reception (input) timing.

Specifically, the second audio multiplexing unit 130 uses the delay processing unit 131 to delay the second to fourth audio signals by predetermined times that differ from one another. The delay processing unit 131 is, for example, a digital delay that stores an audio signal for an arbitrary time and then outputs it. The second audio multiplexing unit 130 then outputs the first audio signal and the delayed second to fourth audio signals, in accordance with the delayed timing, to the multiplexed audio transmission unit 141 as the signals to be transmitted on the second channel.

That is, the second multiplexing positional relationship is one in which, relative to an arbitrary position of the first audio signal, the positions of the second to fourth audio signals received (input) at the same timing as that position are each delayed by the corresponding predetermined time.
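
A digital delay of the kind mentioned above can be modelled as a first-in, first-out buffer. The following sketch is an assumption-laden illustration rather than the described implementation: delays are expressed in samples, the signals on the second channel are combined by addition, and `DigitalDelay` and `second_multiplex` are hypothetical names.

```python
from collections import deque


class DigitalDelay:
    """Delay line: each sample comes out `delay_samples` calls after it went in."""

    def __init__(self, delay_samples):
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer = deque([0.0] * delay_samples)

    def process(self, sample):
        self._buffer.append(sample)
        return self._buffer.popleft()


def second_multiplex(signals, delays):
    """Mix signal 1 undelayed with signals 2 to 4 delayed by d1, d2, d3 samples."""
    lines = [DigitalDelay(0)] + [DigitalDelay(d) for d in delays]
    length = len(signals[0]) + max(delays)  # assumes equal-length inputs
    output = []
    for i in range(length):
        total = 0.0
        for signal, line in zip(signals, lines):
            sample = signal[i] if i < len(signal) else 0.0  # pad with silence
            total += line.process(sample)
        output.append(total)
    return output


# Four hypothetical signals, each a single impulse at the same input position.
signals = [[1.0], [1.0], [1.0], [1.0]]
second = second_multiplex(signals, delays=(1, 2, 3))  # d1 < d2 < d3
```

With these inputs the four impulses, which coincide on input, come out one sample apart on the second channel, so each signal has a distinct offset relative to the first audio signal.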

FIG. 4 is a diagram schematically showing an example of the configuration of the first and second multiplexed audio signals in the present embodiment, and corresponds to FIG. 3. FIG. 4(A) shows the configuration of the first multiplexed audio signal. FIG. 4(B) shows the configuration of the second multiplexed audio signal.

As shown in FIG. 4(A), the relative positional relationship of the first to fourth audio signals 611 to 614 in the first multiplexed audio signal 620 (the first multiplexing positional relationship) is almost identical to their relative positional relationship in the input audio signal 610 shown in FIG. 3.

That is, the positions Vtb, Vtc, and Vtd of the second to fourth audio signals 612 to 614 coincide with time t1, which corresponds to the position Vta of the first audio signal 611.

On the other hand, as shown in FIG. 4(B), the relative positional relationship of the first to fourth audio signals 611 to 614 in the second multiplexed audio signal 630 (the second multiplexing positional relationship) differs from their relative positional relationship in the input audio signal 610 shown in FIG. 3.

That is, the positions Vtb, Vtc, and Vtd of the second to fourth audio signals 612 to 614 are each delayed relative to time t1, which corresponds to the position Vta of the first audio signal 611.

The delay times d1 to d3 of the second to fourth audio signals 612 to 614 differ from one another. Delay time d3 is longer than delay time d2, and delay time d2 is longer than delay time d1. Information indicating the delay times d1 to d3 is assumed to be obtainable by the audio multiplexing apparatus 100, for example by being appended to the second multiplexed audio signal.

The multiplexed audio transmission unit 141 in FIG. 2 outputs the first multiplexed audio signal and the second multiplexed audio signal.

Specifically, the multiplexed audio transmission unit 141 transmits the input first multiplexed audio signal and second multiplexed audio signal to the audio listening apparatus 400 using the first channel and the second channel, respectively.

The audio listening apparatus 400 is, for example, a personal computer (voice chat client) used by a user who takes part in voice chat.

The audio listening apparatus 400 includes a multiplexed audio receiving unit 410, a time adjustment unit 420, an operation unit 430, and an audio output unit 440.

The multiplexed audio receiving unit 410 acquires the first multiplexed audio signal and the second multiplexed audio signal from the audio multiplexing apparatus 100.

Specifically, the multiplexed audio receiving unit 410 receives the first multiplexed audio signal and the second multiplexed audio signal transmitted from the audio multiplexing apparatus 100 over the two channels described above. The multiplexed audio receiving unit 410 then outputs the received first and second multiplexed audio signals to the time adjustment unit 420.

The time adjustment unit 420 generates a superimposed audio signal obtained by superimposing the first multiplexed audio signal and the second multiplexed audio signal in an adjustable predetermined superimposition positional relationship.

Specifically, under the control of the operation unit 430, the time adjustment unit 420 delays one of the first multiplexed audio signal and the second multiplexed audio signal, thereby adjusting the relative positional relationship between the two. The time adjustment unit 420 then superimposes the first and second multiplexed audio signals in the adjusted relative positional relationship to generate the superimposed audio signal, and outputs it to the audio output unit 440.

Based on a user operation, the operation unit 430 switches the predetermined superimposition positional relationship described above among at least a first, a second, a third, and a fourth superimposition positional relationship.

Specifically, the operation unit 430 has an operation interface, such as a dial or a slider, from which input values in the plus and minus directions can be obtained. The operation unit 430 switches the predetermined superimposition positional relationship so that an arbitrary position of the audio signal designated by the input value as the one to be made easier to hear coincides between the first multiplexed audio signal and the second multiplexed audio signal.

The first superimposition positional relationship is one in which an arbitrary position of the first audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal.

The second superimposition positional relationship is one in which an arbitrary position of the second audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal.

The third superimposition positional relationship is one in which an arbitrary position of the third audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal.

The fourth superimposition positional relationship is one in which an arbitrary position of the fourth audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal.

なお、第１〜第４の音声信号には、順に、１〜４の音声番号が割り当てられているものとする。そして、操作部４３０は、入力値にしたがって、あたかも音声番号を指定するポインタを移動させるように、第１の多重音声信号と第２の多重音声信号との重畳位置関係を切り替える。 It is assumed that the first to fourth audio signals are assigned audio numbers 1 to 4 in order. Then, operation unit 430 switches the superposition position relationship between the first multiplexed audio signal and the second multiplexed audio signal so as to move the pointer that designates the audio number according to the input value.

また、第１〜第４の重畳位置関係は、例えば、時間調整部４２０が、音声多重化装置１００から遅延時間ｄ１〜ｄ３を示す情報を取得して保持しておき、遅延時間ｄ１〜ｄ３に基づいて設定するものとする。 The first to fourth superposition position relationships are set, for example, as follows: the time adjustment unit 420 acquires and holds information indicating the delay times d1 to d3 from the audio multiplexing apparatus 100, and sets the relationships based on the delay times d1 to d3.

図５は、本実施の形態における重畳音声信号の構成の例を模式的に示す図であり、図４に対応するものである。図５（Ａ）は、音声番号１が指定されたとき（つまり第１の重畳位置関係で重畳が行われたとき）の重畳音声信号の構成を示す。図５（Ｂ）は、音声番号２が指定されたとき（つまり第２の重畳位置関係で重畳が行われたとき）の重畳音声信号の構成を示す。 FIG. 5 is a diagram schematically showing an example of the configuration of the superimposed audio signal in the present embodiment, and corresponds to FIG. 4. FIG. 5A shows the configuration of the superimposed audio signal when audio number 1 is designated (that is, when superimposition is performed in the first superposition position relationship). FIG. 5B shows the configuration of the superimposed audio signal when audio number 2 is designated (that is, when superimposition is performed in the second superposition position relationship).

図５（Ａ）に示すように、音声番号１が指定されたときの重畳音声信号６４０では、第１の音声信号６１１の位置Ｖｔａは、第１の多重音声信号６２０と第２の多重音声信号６３０との間で一致する。そして、他の第２〜第４の音声信号６１２〜６１４の各位置Ｖｔｂ、Ｖｔｃ、Ｖｔｄは、第１の多重音声信号６２０と第２の多重音声信号６３０との間で、いずれも一致しない。 As shown in FIG. 5A, in the superimposed audio signal 640 obtained when audio number 1 is designated, the position Vta of the first audio signal 611 coincides between the first multiplexed audio signal 620 and the second multiplexed audio signal 630. None of the positions Vtb, Vtc, and Vtd of the other second to fourth audio signals 612 to 614 coincides between the first multiplexed audio signal 620 and the second multiplexed audio signal 630.

位置が一致した状態で同一の２つの音声信号が重畳された重畳音声信号６４０は、振幅は倍になり、音量が増すことになる。一方、位置が一致していない状態で同一の２つの音声信号が重畳された重畳音声信号６４０は、振幅は倍にはならず、その音声は、残響あるいは反響を伴っているように聴こえ、輪郭がぼやけた音となる。 When two identical audio signals are superimposed with their positions coinciding, the amplitude in the superimposed audio signal 640 doubles and the volume increases. When two identical audio signals are superimposed with their positions not coinciding, the amplitude does not double, and the sound is heard as if accompanied by reverberation or echo, with a blurred outline.

したがって、図５（Ａ）に示すような重畳音声信号６４０の音声（以下「重畳音声」という）では、第１の音声信号６１１の音声のみが明瞭に聞こえ、第２〜第４の音声信号６１２〜６１４の各音声は、不明瞭に聞こえることになる。 Therefore, in the audio of the superimposed audio signal 640 shown in FIG. 5A (hereinafter referred to as “superimposed audio”), only the audio of the first audio signal 611 is heard clearly, and the audio of each of the second to fourth audio signals 612 to 614 is heard indistinctly.

図５（Ｂ）に示すように、音声番号２が指定されたときの重畳音声信号６５０では、第２の音声信号６１２の位置Ｖｔｂのみが、第１の多重音声信号６２０と第２の多重音声信号６３０との間で一致する。このような重畳音声では、第２の音声信号６１２の音声のみが明瞭に聞こえることになる。 As shown in FIG. 5B, in the superimposed audio signal 650 obtained when audio number 2 is designated, only the position Vtb of the second audio signal 612 coincides between the first multiplexed audio signal 620 and the second multiplexed audio signal 630. In such superimposed audio, only the audio of the second audio signal 612 is heard clearly.

したがって、音声聴取装置４００は、多重化位置関係を切り替えることにより、明瞭に聞こえる音声を切り替え、任意の音声信号を選択的に聞こえ易くすることができる。 Therefore, by switching the multiplexing positional relationship, the audio listening device 400 can switch which audio is heard clearly and can selectively make an arbitrary audio signal easier to hear.

なお、多重化位置関係は、非注目音声の遅延時間が短過ぎると、注目音声と非注目音声との聞こえ方の差が小さくなる。多重化位置関係は、逆に、非注目音声の遅延時間が長すぎると、第１の多重音声信号６２０における当該非注目音声と第２の多重音声信号６３０における当該非注目音声とが独立して、同じ音声が２度出力されたように聞こえてしまう。そこで、０以外の全ての遅延時間（図４（Ｂ）の遅延時間ｄ１〜ｄ３）は、数十ミリ秒から数百ミリ秒など、実験などによって予め定められた数値範囲に収まることが望ましい。 Regarding the multiplexing positional relationship, if the delay time of a non-target sound is too short, the difference between how the target sound and the non-target sound are heard becomes small. Conversely, if the delay time of a non-target sound is too long, the non-target sound in the first multiplexed audio signal 620 and the same non-target sound in the second multiplexed audio signal 630 are heard independently, as if the same sound were output twice. Therefore, it is desirable that all delay times other than 0 (the delay times d1 to d3 in FIG. 4B) fall within a numerical range determined in advance by experiment or the like, such as several tens of milliseconds to several hundred milliseconds.

更にいえば、第１〜第４の音声信号は、所定の時間ずつずれていることが望ましい。すなわち、遅延時間ｄ２は、遅延時間ｄ１の２倍であり、遅延時間ｄ３は遅延時間ｄ１の３倍であることが望ましい。これにより、時間調整部４２０は、重畳位置関係の調整を、遅延時間ｄ１を単位として行うことができ、その処理が容易となる。 Furthermore, it is desirable that the first to fourth audio signals are shifted by a predetermined time. That is, it is desirable that the delay time d2 is twice the delay time d1 and the delay time d3 is three times the delay time d1. Thereby, the time adjustment unit 420 can adjust the superposition position relationship in units of the delay time d1, and the processing becomes easy.
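The relationship between the two multiplexed signals and the superposition offset described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each audio signal is a list of samples, that the delay times are whole numbers of samples, and that d2 = 2 × d1 and d3 = 3 × d1 as suggested above. The helper names `delay`, `mix`, `multiplex`, and `superimpose` are hypothetical and do not appear in the specification.

```python
from typing import List, Sequence, Tuple


def delay(signal: Sequence[float], n: int) -> List[float]:
    """Delay a signal by n samples by prepending n samples of silence."""
    return [0.0] * n + list(signal)


def mix(signals: Sequence[Sequence[float]]) -> List[float]:
    """Sum signals sample by sample; shorter signals count as silence past their end."""
    length = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(length)]


def multiplex(signals: Sequence[Sequence[float]], d1: int) -> Tuple[List[float], List[float]]:
    """Return (first, second) multiplexed signals.

    first:  all signals mixed without delay.
    second: signal k (0-based) delayed by k * d1 samples before mixing.
    """
    first = mix(signals)
    second = mix([delay(s, k * d1) for k, s in enumerate(signals)])
    return first, second


def superimpose(first: Sequence[float], second: Sequence[float],
                target: int, d1: int) -> List[float]:
    """Superimpose the two channels so that signal `target` (0-based) coincides.

    Delaying the first channel by target * d1 samples lines up the copy of the
    target signal in the first channel with its delayed copy in the second.
    """
    return mix([delay(first, target * d1), second])


# Three one-sample "signals" with distinct amplitudes so each can be traced.
first, second = multiplex([[1.0], [10.0], [100.0]], d1=2)
print(superimpose(first, second, target=1, d1=2))
```

In this toy case the second signal (amplitude 10) appears twice at the same sample position, so its contribution there is doubled, while the other two signals appear at two different positions each.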

図２の音声出力部４４０は、重畳音声信号を音声出力装置５００へ出力する。 The audio output unit 440 in FIG. 2 outputs the superimposed audio signal to the audio output device 500.

具体的には、音声出力部４４０は、入力された重畳音声信号を、音声出力装置５００へ送信する。 Specifically, the audio output unit 440 transmits the input superimposed audio signal to the audio output device 500.

音声出力装置５００は、例えば、ユーザがパーソナルコンピュータに接続して使用するヘッドフォンである。音声出力装置５００は、音声聴取装置４００から送信された重畳音声信号を受信し、音声に変換して出力する。 The audio output device 500 is, for example, a headphone that is used by a user connected to a personal computer. The audio output device 500 receives the superimposed audio signal transmitted from the audio listening device 400, converts it into audio, and outputs it.

音声提供装置３００、音声多重化装置１００、および音声聴取装置４００は、例えば、ＣＰＵ、およびＲＡＭ（random access memory）などの記憶媒体等を有する。この場合、上述の各機能部は、ＣＰＵにより制御プログラムが実行することにより実現される。 The audio providing device 300, the audio multiplexing device 100, and the audio listening device 400 include, for example, a CPU and a storage medium such as a RAM (random access memory). In this case, each functional unit described above is realized by the control program being executed by the CPU.

このような音声多重化システム２００は、複数の音声信号の多重化位置が異なる、二種類の多重音声信号を出力することができる。 Such an audio multiplexing system 200 can output two types of multiplexed audio signals having different multiplexing positions of a plurality of audio signals.

このような二種類の多重音声信号は、複数の音声信号のうちの任意の１つのみを選択的に一致させた状態で、重畳することができる。一致した音声信号の音声は、一致していない音声信号の音声に比べて、より明瞭となり、より聞き取り易くなる。すなわち、このような二種類の多重音声信号は、重畳の際の相対位置関係を調整するだけで、任意の音声信号を、選択的に聞き取り易くすることができる。 Such two types of multiplexed audio signals can be superimposed in a state where only an arbitrary one of the plurality of audio signals is selectively made to coincide. The audio of the coinciding audio signal is clearer and easier to hear than the audio of the audio signals that do not coincide. That is, with these two types of multiplexed audio signals, any audio signal can selectively be made easier to hear simply by adjusting the relative positional relationship at the time of superimposition.

音声多重化システム２００は、特許文献１記載の技術のようにフィルタを何度も作成したり、特許文献２記載の技術のように音声信号ごとの音声処理を行ったりする必要がない。したがって、音声多重化システム２００は、従来技術に比べて、処理負荷を抑えた状態で、注目音を聞き取り易くすることができる。 The audio multiplexing system 200 does not need to create a filter many times as in the technique described in Patent Document 1 or perform audio processing for each audio signal as in the technique described in Patent Document 2. Therefore, the audio multiplexing system 200 can make it easier to hear the sound of interest in a state where the processing load is suppressed as compared with the prior art.

また、音声多重化システム２００は、ユーザが指定した音声番号の音声信号が一致するように、二種類の多重音声信号の重畳位置を調整することができる。これにより、音声多重化システム２００は、ユーザの所望の音声（注目音）のみを聞き取り易くすることができる。 Also, the audio multiplexing system 200 can adjust the superposition positions of the two types of multiplexed audio signals so that the audio signals of the audio numbers designated by the user match. Thereby, the voice multiplexing system 200 can make it easy to hear only the user's desired voice (noticeable sound).

また、音声多重化システム２００は、複数のユーザから発話音声を取得し、これを多重化して再生することができる。これにより、音声多重化システム２００は、多数人での音声チャットを、注目音のみを聞き取り易くした状態で実現することができる。 Also, the voice multiplexing system 200 can acquire speech voices from a plurality of users, multiplex them, and reproduce them. As a result, the voice multiplexing system 200 can realize voice chat with a large number of people in a state where only the target sound is easily heard.

また、音声多重化システム２００は、音声信号の多重化の際に、音声の振幅の圧縮を行うので、多重化された音声信号の振幅が大きくなり過ぎて再生音が歪むのを防ぐことができる。 In addition, since the audio multiplexing system 200 compresses the amplitude of the audio when multiplexing the audio signals, it can prevent the reproduced sound from being distorted by an excessively large amplitude of the multiplexed audio signal.

以上で、本実施の形態に係る各装置およびシステムの構成についての説明を終える。 This is the end of the description of the configuration of each device and system according to the present embodiment.

次に、本実施の形態に係る各装置の動作について説明する。 Next, the operation of each device according to the present embodiment will be described.

図６は、音声多重化装置１００の動作の一例を示すフローチャートである。 FIG. 6 is a flowchart showing an example of the operation of the audio multiplexing apparatus 100.

まず、ステップＳ１０１０において、音声入力部１１０は、第１〜第４の音声提供装置３００−１〜３００−４から送信された各音声信号（第１〜第４の音声信号）を、受信する。例えば、音声入力部１１０は、予め定められた周期毎に、音声信号の受信を行い、次のステップＳ１０２０へ進む。 First, in step S1010, the audio input unit 110 receives the audio signals (first to fourth audio signals) transmitted from the first to fourth audio providing apparatuses 300-1 to 300-4. For example, the audio input unit 110 receives the audio signals at every predetermined period and proceeds to the next step S1020.

そして、ステップＳ１０２０において、音声圧縮部１１１は、受信した各音声信号（第１〜第４の音声信号）の振幅を、それぞれ圧縮する。 In step S1020, the audio compression unit 111 compresses the amplitude of each of the received audio signals (first to fourth audio signals).

そして、ステップＳ１０３０において、第１の音声多重化部１２０は、第１の多重音声信号を生成し、第１のチャンネルでの送信の対象として出力する。すなわち、第１のチャンネルは、全ての音声が遅延なく多重化されたチャンネルなる。 Then, in step S1030, first audio multiplexing section 120 generates a first multiplexed audio signal and outputs it as a transmission target on the first channel. That is, the first channel is a channel in which all voices are multiplexed without delay.

そして、ステップＳ１０４０において、遅延処理部１３１は、音声信号ごとに定めた遅延を、各音声信号に設定する。すなわち、遅延処理部１３１は、各音声信号を、適宜、それぞれ異なる遅延時間で遅延させる処理（以下「遅延処理」という）を行う。 In step S1040, the delay processing unit 131 sets a delay determined for each audio signal for each audio signal. That is, the delay processing unit 131 performs processing (hereinafter referred to as “delay processing”) for appropriately delaying each audio signal with a different delay time.

そして、ステップＳ１０５０において、第２の音声多重化部１３０は、遅延処理後の音声信号（第１〜第４の音声信号）から第２の多重音声信号を生成し、第２のチャンネルでの送信の対象として出力する。すなわち、第２のチャンネルは、各音声が他の全ての音声とずれた状態で多重化されたチャンネルなる。 In step S1050, the second audio multiplexing unit 130 generates a second multiplexed audio signal from the delay-processed audio signals (first to fourth audio signals) and outputs it as the signal to be transmitted on the second channel. That is, the second channel is a channel in which each sound is multiplexed in a state shifted from all the other sounds.

そして、ステップＳ１０６０において、多重音声送信部１４１は、第１および第２の多重音声信号を、音声聴取装置４００へと送信する。 In step S1060, the multiplexed audio transmission unit 141 transmits the first and second multiplexed audio signals to the audio listening device 400.

そして、ステップＳ１０７０において、音声入力部１１０は、ユーザ操作などにより、音声の多重化の処理の終了要求があったか否かを判断する。 In step S1070, the voice input unit 110 determines whether or not there has been a request to end the voice multiplexing process by a user operation or the like.

音声入力部１１０は、終了要求がない場合（Ｓ１０７０：ＮＯ）、ステップＳ１０１０へ戻る。また、音声入力部１１０は、終了要求があった場合（Ｓ１０７０：ＹＥＳ）、一連の処理を終了する。 If there is no termination request (S1070: NO), the voice input unit 110 returns to step S1010. In addition, when there is an end request (S1070: YES), the voice input unit 110 ends a series of processes.

このような動作により、音声多重化装置１００は、音声提供装置３００から複数の音声信号を受信し、複数の音声信号の多重化位置関係が異なる二種類の多重音声信号を、音声聴取装置４００へ連続的に送信することができる。 With this operation, the audio multiplexing apparatus 100 receives a plurality of audio signals from the audio providing apparatus 300, and transmits two types of multiplexed audio signals having different multiplexing positional relationships to the audio listening apparatus 400. It can be transmitted continuously.
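The flow of steps S1010 to S1070 can be pictured with the following minimal Python sketch. It makes several assumptions that are not stated in the specification: each reception period delivers one sample per audio source, the amplitude compression of step S1020 is a simple fixed gain, the delays are counted in periods, and the end request of step S1070 is modeled by the input simply running out. The name `run_multiplexer` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple


def run_multiplexer(frames: Iterable[Sequence[float]],
                    delays: Sequence[int],
                    gain: float = 0.25) -> Iterator[Tuple[float, float]]:
    """Yield (first_channel, second_channel) samples for each input frame.

    frames: one tuple per period, holding one sample per audio source.
    delays: per-source delay in periods used for the second channel.
    gain:   fixed scale factor standing in for amplitude compression (assumption).
    """
    # One delay line per source, pre-filled with silence of the required length.
    lines: List[Deque[float]] = [deque([0.0] * d) for d in delays]

    for frame in frames:                          # S1010: receive one period
        scaled = [x * gain for x in frame]        # S1020: compress amplitude
        first = sum(scaled)                       # S1030: mix without delay
        delayed = []
        for line, x in zip(lines, scaled):        # S1040: per-source delay
            line.append(x)
            delayed.append(line.popleft())
        second = sum(delayed)                     # S1050: mix delayed signals
        yield first, second                       # S1060: hand over both channels
    # S1070: the loop ends when no more frames arrive (stand-in for an end request)


for pair in run_multiplexer([(4.0, 8.0), (0.0, 0.0), (0.0, 0.0)], delays=[0, 1]):
    print(pair)
```

A queue per source keeps the delay processing independent of the mixing step, which mirrors the separation between the delay processing unit 131 and the second audio multiplexing unit 130.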

図７は、音声聴取装置４００の動作の一例を示すフローチャートである。なお、時間調整部４２０は、例えば、第１の重畳位置関係を、所定の重畳位置関係の初期状態とする。 FIG. 7 is a flowchart showing an example of the operation of the audio listening device 400. Note that the time adjustment unit 420 sets, for example, the first superimposed position relationship as an initial state of the predetermined superimposed position relationship.

まず、ステップＳ２０１０において、多重音声受信部４１０は、音声多重化装置１００から送信された第１の多重音声信号および第２の多重音声信号を受信する。例えば、多重音声受信部４１０は、予め定められた周期毎に、第１の多重音声信号および第２の多重音声信号の受信を行い、次のステップＳ２０２０へ進む。 First, in step S2010, the multiplex audio receiving unit 410 receives the first multiplex audio signal and the second multiplex audio signal transmitted from the audio multiplexer 100. For example, the multiplex sound receiving unit 410 receives the first multiplex sound signal and the second multiplex sound signal at predetermined intervals, and proceeds to the next step S2020.

そして、ステップＳ２０２０において、時間調整部４２０は、第１の多重音声信号および第２の多重音声信号から重畳音声信号を生成する。そして、音声出力部４４０は、この重畳音声信号を、音声出力装置５００へ送信する。重畳音声信号は、上述の通り、第１の多重音声信号および第２の多重音声信号を、現在の所定の重畳位置関係で重畳したものである。 In step S2020, time adjustment section 420 generates a superimposed audio signal from the first multiplexed audio signal and the second multiplexed audio signal. Then, the audio output unit 440 transmits the superimposed audio signal to the audio output device 500. As described above, the superimposed audio signal is obtained by superimposing the first multiplexed audio signal and the second multiplexed audio signal in the current predetermined overlapping position relationship.

そして、ステップＳ２０３０において、多重音声受信部４１０は、ユーザ操作などにより、音声の多重化の処理の終了要求があったか否かを判断する。 In step S2030, the multiplexed sound receiving unit 410 determines whether or not there has been a request for termination of the sound multiplexing process by a user operation or the like.

多重音声受信部４１０は、終了要求がない場合（Ｓ２０３０：ＮＯ）、ステップＳ２０４０へ進む。また、多重音声受信部４１０は、終了要求があった場合（Ｓ２０３０：ＹＥＳ）、一連の処理を終了する。 If there is no termination request (S2030: NO), the multiplexed sound receiving unit 410 proceeds to step S2040. In addition, when there is an end request (S2030: YES), the multiplexed sound receiving unit 410 ends a series of processes.

ステップＳ２０４０において、操作部４３０は、プラス方向またはマイナス方向の入力値があったか、つまり、音声番号を指定するポインタ移動の操作の入力があったか否かを判断する。 In step S2040, the operation unit 430 determines whether there is an input value in the plus direction or the minus direction, that is, whether there is an input of a pointer movement operation that designates a voice number.

操作部４３０は、移動操作があった場合（Ｓ２０４０：ＹＥＳ）、ステップＳ２０５０へ進む。また、操作部４３０は、移動操作がない場合（Ｓ２０４０：ＮＯ）、ステップＳ２０１０へ戻る。 If there is a moving operation (S2040: YES), operation unit 430 proceeds to step S2050. If there is no movement operation (S2040: NO), operation unit 430 returns to step S2010.

ステップＳ２０５０において、操作部４３０は、ポインタ移動がプラス方向であるか否かを判断する。 In step S2050, operation unit 430 determines whether or not the pointer movement is in the plus direction.

すなわち、操作部４３０は、音声番号１から音声番号２へというように、音声番号が増大する方向にポインタが移動されたか否かを判断する。 That is, the operation unit 430 determines whether or not the pointer has been moved in the direction in which the voice number increases, such as from voice number 1 to voice number 2.

操作部４３０は、ポインタ移動がプラス方向である場合（Ｓ２０５０：ＹＥＳ）、ステップＳ２０６０へ進む。また、操作部４３０は、ポインタ移動がマイナス方向である場合（Ｓ２０５０：ＮＯ）、ステップＳ２０７０へ進む。 If the pointer movement is in the plus direction (S2050: YES), operation unit 430 proceeds to step S2060. If the pointer movement is in the minus direction (S2050: NO), operation unit 430 proceeds to step S2070.

ステップＳ２０６０において、操作部４３０は、現状よりも第１の多重音声信号（つまり第１のチャンネルの信号）を相対的に遅延させるように、時間調整部４２０の所定の重畳位置関係を切り替えて、ステップＳ２０１０へ戻る。 In step S2060, the operation unit 430 switches the predetermined superposition position relationship of the time adjustment unit 420 so that the first multiplexed audio signal (that is, the signal of the first channel) is delayed further, in relative terms, than at present, and the process returns to step S2010.

すなわち、操作部４３０は、第１の音声信号から第２の音声信号へというように、１つ大きい音声番号に対応する音声信号の任意の位置を、第１の多重音声信号と第２の多重音声信号との間で一致させる。これは、重畳音声信号を、図５（Ａ）に示す状態から、図５（Ｂ）に示す状態へと切り替えることに相当する。 That is, the operation unit 430 makes an arbitrary position of the audio signal corresponding to the next larger audio number, for example moving from the first audio signal to the second audio signal, coincide between the first multiplexed audio signal and the second multiplexed audio signal. This corresponds to switching the superimposed audio signal from the state shown in FIG. 5A to the state shown in FIG. 5B.

ステップＳ２０７０において、操作部４３０は、現状よりも第２の多重音声信号（つまり第２のチャンネルの信号）を相対的に遅延させるように、時間調整部４２０の所定の重畳位置関係を切り替えて、ステップＳ２０１０へ戻る。 In step S2070, the operation unit 430 switches the predetermined superposition position relationship of the time adjustment unit 420 so that the second multiplexed audio signal (that is, the signal of the second channel) is delayed further, in relative terms, than at present, and the process returns to step S2010.

すなわち、操作部４３０は、第２の音声信号から第１の音声信号へというように、１つ小さい音声番号に対応する音声信号の任意の位置を、第１の多重音声信号と第２の多重音声信号との間で一致させる。これは、重畳音声信号を、図５（Ｂ）に示す状態から、図５（Ａ）に示す状態へと切り替えることに相当する。 That is, the operation unit 430 makes an arbitrary position of the audio signal corresponding to the next smaller audio number, for example moving from the second audio signal to the first audio signal, coincide between the first multiplexed audio signal and the second multiplexed audio signal. This corresponds to switching the superimposed audio signal from the state shown in FIG. 5B to the state shown in FIG. 5A.

このような動作により、音声聴取装置４００は、複数の音声信号の多重化位置関係が異なる二種類の多重音声信号を重畳した重畳音声信号を、音声出力装置５００へ連続的に送信することができる。また、音声聴取装置４００は、ユーザが所望する音声信号が聞こえ易くなるように、重畳位置関係を調整することができる。 With such an operation, the audio listening device 400 can continuously transmit a superimposed audio signal on which two types of multiplexed audio signals having different multiplexing positional relationships of a plurality of audio signals are superimposed to the audio output device 500. . In addition, the audio listening device 400 can adjust the superposition position relationship so that an audio signal desired by the user can be easily heard.

なお、操作部４３０は、入力値の累積値の上限および下限の判定を行うことが望ましいが、ここでは省略している。入力値の累積値の上限は、音声信号の数から１引いた数（本実施の形態では４−１＝３）となる。また、入力値の累積値の下限は、０となる。 Note that the operation unit 430 desirably determines the upper limit and the lower limit of the cumulative value of input values, but is omitted here. The upper limit of the cumulative value of the input value is a number obtained by subtracting 1 from the number of audio signals (4-1 = 3 in the present embodiment). Further, the lower limit of the cumulative value of the input value is 0.

また、操作部４３０は、入力値の累積値に上限および下限を設けず、累積値が上限を超えたとき累積値を下限（０）にし、累積値が下限を下回ったとき累積値を上限（３）にするようにしてもよい。また、この場合、操作部４３０は、プラス方向の入力値、および、マイナス方向の入力値の一方のみを受け付けるようにすることができる。 Alternatively, without imposing an upper or lower limit on the cumulative value of the input values, the operation unit 430 may set the cumulative value to the lower limit (0) when it exceeds the upper limit, and to the upper limit (3) when it falls below the lower limit. In this case, the operation unit 430 may accept only one of the plus-direction input value and the minus-direction input value.

また、操作部４３０は、１方向の入力値のみを受け付ける場合、累積値が上限に達したとき、入力値をマイナス方向の値として扱い、累積値が下限に達したとき、入力値をプラス方向の値として扱うようにしてもよい。 When the operation unit 430 accepts input values in only one direction, it may treat the input value as a minus-direction value once the cumulative value has reached the upper limit, and as a plus-direction value once the cumulative value has reached the lower limit.
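The three ways of handling the cumulative value described above (limiting, wrapping around, and reversing direction) might be sketched in Python as follows. The function names `clamp_step`, `wrap_step`, and `bounce_step` are hypothetical, the cumulative value is assumed to be an integer from 0 to `upper`, and `bounce_step` assumes `upper` is at least 1.

```python
from typing import Tuple


def clamp_step(value: int, step: int, upper: int) -> int:
    """Limit check: the cumulative value stays within 0..upper."""
    return max(0, min(upper, value + step))


def wrap_step(value: int, step: int, upper: int) -> int:
    """Wrap-around: going past the upper limit returns to 0, and vice versa."""
    return (value + step) % (upper + 1)


def bounce_step(value: int, direction: int, upper: int) -> Tuple[int, int]:
    """One-directional input: reverse the direction at either limit.

    Returns the new cumulative value and the direction (+1 or -1) now in effect.
    Assumes upper >= 1.
    """
    if value >= upper:
        direction = -1      # at the upper limit, treat the input as minus
    elif value <= 0:
        direction = +1      # at the lower limit, treat the input as plus
    return value + direction, direction


# With four audio signals the upper limit is 4 - 1 = 3.
value, direction = 0, +1
trace = []
for _ in range(7):
    value, direction = bounce_step(value, direction, upper=3)
    trace.append(value)
print(trace)
```

With a single push button, for example, the reversing variant would let the user reach every audio number by pressing repeatedly, at the cost of having to pass through intermediate numbers.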

以上で、本実施の形態に係る各装置の動作についての説明を終える。 This is the end of the description of the operation of each device according to the present embodiment.

以上のように、本実施の形態に係る音声多重化システム２００は、複数の音声信号の多重化位置が異なる二種類の多重音声信号を生成し、これらの重畳音声信号を、その重畳位置関係を調整して出力することができる。これにより、音声多重化システム２００は、従来技術に比べて、処理負荷を抑えた状態で、複数の音声を同時に出力しつつ、注目音を聞き取り易くすることができる。 As described above, the audio multiplexing system 200 according to the present embodiment generates two types of multiplexed audio signals that differ in the multiplexing positions of a plurality of audio signals, and can output a superimposed audio signal of these with the superposition position relationship adjusted. As a result, the audio multiplexing system 200 can make the target sound easier to hear while simultaneously outputting a plurality of sounds, with a lower processing load than the prior art.

なお、音声多重化システム２００は、３種類以上の多重音声信号を生成し、これらを、それぞれの重畳位置関係を調整して出力するようにしてもよい。この場合、時間調整部４２０は、聞き取り易くする対象として指定された音声信号のみが全ての多重音声信号間で一致するように、重畳位置関係を調整すればよい。 Note that the audio multiplexing system 200 may generate three or more types of multiplexed audio signals, and adjust these superimposed positional relationships to output them. In this case, the time adjustment unit 420 may adjust the superposition position relationship so that only the audio signal designated as the target to be easily heard matches between all the multiplexed audio signals.

また、音声多重化システム２００は、音声信号がデジタルの場合、多重音声信号単位（チャンネル単位）で、サンプリング周波数を下げてもよい。例えば、音声多重化装置１００は、第１の多重音声信号については、高品質の音声信号のままで送信し、第２の多重音声信号については、そのサンプリング周波数を下げてから、送信する。これにより、音声多重化システム２００は、重畳音声信号の音声の音質を劣化させずに、扱うデータ量を低減し、処理負荷を低減することが可能となる。 In addition, when the audio signal is digital, the audio multiplexing system 200 may lower the sampling frequency in units of multiplexed audio signals (channel units). For example, the audio multiplexing apparatus 100 transmits the first multiplexed audio signal as it is as a high-quality audio signal, and transmits the second multiplexed audio signal after lowering its sampling frequency. As a result, the audio multiplexing system 200 can reduce the amount of data to be handled and the processing load without deteriorating the sound quality of the superimposed audio signal.

また、音声多重化システム２００は、多重音声信号の数（チャンネル数）が２である場合、それぞれの多重音声信号を、従来のステレオ音声の左右チャンネルに割り当ててもよい。これにより、音声多重化システム２００は、多重音声信号の通信処理を従来のステレオ音声のシステムと共通化することができる。 Also, when the number of multiplexed audio signals (number of channels) is 2, the audio multiplexing system 200 may assign each multiplexed audio signal to the left and right channels of conventional stereo audio. As a result, the audio multiplexing system 200 can share the communication processing of the multiplexed audio signal with the conventional stereo audio system.

また、音声多重化システム２００は、各音声信号を、ステレオ音声の左右チャンネルに割り当ててもよく、また、ステレオ音声により実現される仮想音響空間に立体的に配置してもよい。これにより、音声多重化システム２００は、注目音声を更に聞き分け易くすることができる。 Also, the audio multiplexing system 200 may assign each audio signal to the left and right channels of stereo audio, or may arrange them in a three-dimensional manner in a virtual acoustic space realized by stereo audio. Thereby, the voice multiplexing system 200 can make it easier to distinguish the target voice.

また、音声多重化システム２００は、遅延時間が上述の定められた数値範囲に収まるように、遅延時間の上限値を遅延時間の下限値で除算した数以下に、出力対象とする音声信号の数を制限しても良い。これにより、音声多重化システム２００は、音声が聞き取り辛くなるのを防ぐことができる。 Also, the audio multiplexing system 200 allows the number of audio signals to be output to be equal to or less than the number obtained by dividing the upper limit value of the delay time by the lower limit value of the delay time so that the delay time falls within the above-defined numerical range. May be limited. Thereby, the voice multiplexing system 200 can prevent the voice from becoming difficult to hear.
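As a small worked example of this limit, the following Python sketch computes the largest number of audio signals allowed by the rule above. The function name `max_signal_count` and the millisecond values are hypothetical; the specification only states that the range is determined in advance, for example several tens to several hundred milliseconds.

```python
def max_signal_count(min_delay_ms: int, max_delay_ms: int) -> int:
    """Largest signal count allowed by the rule: upper limit divided by lower limit."""
    if min_delay_ms <= 0 or max_delay_ms < min_delay_ms:
        raise ValueError("expected 0 < min_delay_ms <= max_delay_ms")
    # Integer division, since the number of signals must be a whole number.
    return max_delay_ms // min_delay_ms


# Hypothetical range of 50 ms to 300 ms.
print(max_signal_count(50, 300))
```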

なお、音声多重化システム２００は、複数の音声信号のうち、互いに聞き分け易い複数の音声信号が存在するとき、それらの音声信号の遅延時間をずらさないようにしてもよい。 Note that when there are a plurality of audio signals that are easy to distinguish from each other among the plurality of audio signals, the audio multiplexing system 200 may not shift the delay times of the audio signals.

例えば、音声多重化システム２００ａは、複数の音声信号を、仮想音源空間に円弧状に配置して出力する場合、位置が離れている音声信号については、遅延時間を一致させる。また、例えば、音声多重化システム２００ａは、音程が大きく異なる発話音声の音声信号については、遅延時間を一致させる。 For example, when the audio multiplexing system 200a outputs a plurality of audio signals arranged in an arc shape in the virtual sound source space, the delay times of the audio signals that are separated from each other are matched. In addition, for example, the audio multiplexing system 200a matches the delay times for speech signals of speech sounds having greatly different pitches.

これにより、音声多重化システム２００は、音声が聞き取り辛くなるのを防ぎつつ、同時に出力する音声信号の数を増やすことができる。 Thereby, the audio multiplexing system 200 can increase the number of audio signals to be simultaneously output while preventing the audio from becoming difficult to hear.

（実施の形態３） (Embodiment 3)
本発明の実施の形態３は、第２の多重音声信号の位相を反転させることにより、非注目音声の打ち消しを行う例である。 The third embodiment of the present invention is an example in which non-target speech is canceled by inverting the phase of the second multiplexed speech signal.

図８は、本実施の形態に係る音声多重化装置および音声聴取装置ならびに音声多重化システムの構成の一例を示すブロック図であり、実施の形態２の図２に対応するものである。図２と同一部分には同一符号を付し、これについての説明を省略する。 FIG. 8 is a block diagram showing an example of the configuration of the audio multiplexing apparatus, audio listening apparatus, and audio multiplexing system according to the present embodiment, and corresponds to FIG. 2 of the second embodiment. The same parts as in FIG. 2 are denoted by the same reference numerals, and their description is omitted.

図８において、音声多重化システム２００ａは、図１の音声多重化装置１００に代えて、音声多重化装置１００ａを有する。音声多重化装置１００ａの第２の音声多重化部１３０ａは、遅延処理部１３１に加えて、位相反転部１３２ａを有する。 In FIG. 8, a voice multiplexing system 200a includes a voice multiplexing apparatus 100a instead of the voice multiplexing apparatus 100 of FIG. In addition to the delay processing unit 131, the second audio multiplexing unit 130a of the audio multiplexing device 100a includes a phase inverting unit 132a.

位相反転部１３２ａは、第１の多重音声信号および第２の多重音声信号の一方に含まれる、第１の音声信号および第２の音声信号の位相を、それぞれ反転させる。本実施の形態において、位相反転部１３２ａは、第２の多重音声信号の第１〜第４の音声信号の位相を、全て反転させるものとする。位相反転部１３２ａは、位相反転を、遅延処理の前に行ってもよいし、遅延処理の後に行ってもよい。 The phase inverting unit 132a inverts the phases of the first audio signal and the second audio signal included in one of the first multiplexed audio signal and the second multiplexed audio signal, respectively. In the present embodiment, phase inversion section 132a inverts all the phases of the first to fourth audio signals of the second multiplexed audio signal. The phase inversion unit 132a may perform the phase inversion before the delay process or after the delay process.

なお、本実施の形態において、遅延処理部１３１は、第２の多重音声信号に含まれる音声信号を２グループに分け、そのうちの１つのグループの音声信号を、全て同一の遅延時間で遅延させるものとする。 In the present embodiment, the delay processing unit 131 divides the audio signals included in the second multiplexed audio signal into two groups and delays all the audio signals of one of the groups by the same delay time.

具体的には、遅延処理部１３１は、第１および第３の音声信号を非遅延グループとし、第２および第４の音声信号を遅延グループとして、第２および第４の音声信号を、第１の遅延時間で遅延させるものとする。 Specifically, the delay processing unit 131 treats the first and third audio signals as a non-delay group and the second and fourth audio signals as a delay group, and delays the second and fourth audio signals by the first delay time.

図９は、本実施の形態における第１および第２の多重音声信号の構成の一例を模式的に示す図であり、実施の形態２の図４に対応するものである。 FIG. 9 is a diagram schematically showing an example of the configuration of the first and second multiplexed audio signals in the present embodiment, and corresponds to FIG. 4 of the second embodiment.

図９（Ａ）に示す、第１の多重音声信号６２０における第１〜第４の音声信号６１１〜６１４の相対的な位置関係（第１の多重化位置関係）は、実施の形態２と同様、図３に示す入力音声信号６１０における相対的な位置関係と同一となる。 As in the second embodiment, the relative positional relationship (first multiplexing positional relationship) of the first to fourth audio signals 611 to 614 in the first multiplexed audio signal 620 shown in FIG. 9A is the same as the relative positional relationship in the input audio signal 610 shown in FIG. 3.

一方、図９（Ｂ）に示す、本実施の形態の第２の多重音声信号６３０は、第１〜第４の音声信号６１１〜６１４をそれぞれ位相反転した、第１〜第４の反転音声信号６１１'〜６１４'により構成される。そして、第１〜第４の反転音声信号６１１'〜６１４'の相対的な位置関係（第２の多重化位置関係）は、実施の形態２と異なり、遅延グループである第２と第４の反転音声信号６１２'、６１４'のみが、第１の遅延時間ｄ１で遅延している。 On the other hand, the second multiplexed audio signal 630 of the present embodiment shown in FIG. 9B is composed of first to fourth inverted audio signals 611' to 614', obtained by inverting the phases of the first to fourth audio signals 611 to 614, respectively. Unlike in the second embodiment, in the relative positional relationship (second multiplexing positional relationship) of the first to fourth inverted audio signals 611' to 614', only the second and fourth inverted audio signals 612' and 614', which form the delay group, are delayed by the first delay time d1.

すなわち、第３の反転音声信号６１３'の上述の位置Ｖｔｃは、第１の反転音声信号６１１'の位置Ｖｔａと一致している。そして、第２および第４の反転音声信号６１２'、６１４'の上述の各位置Ｖｔｂ、Ｖｔｄは、位置Ｖｔａに対応する時刻ｔ１に対して、それぞれ第１の遅延時間ｄ１だけ遅延している。 In other words, the above-described position Vtc of the third inverted audio signal 613 ′ matches the position Vta of the first inverted audio signal 611 ′. The above-described positions Vtb and Vtd of the second and fourth inverted audio signals 612 ′ and 614 ′ are respectively delayed by the first delay time d1 with respect to the time t1 corresponding to the position Vta.

音声聴取装置４００は、このような第１の多重音声信号および第２の多重音声信号を受信し、これらを重畳して、重畳音声信号を生成する。この際、音声聴取装置４００の時間調整部４２０は、操作部４３０からの制御を受けて、上述の第１の重畳位置関係と第２の重畳位置関係との間で、上述の所定の重畳位置関係を切り替える。 The audio listening device 400 receives the first multiplexed audio signal and the second multiplexed audio signal and superimposes them to generate a superimposed audio signal. At this time, the time adjustment unit 420 of the audio listening device 400 receives the control from the operation unit 430, and performs the above-described predetermined overlapping position between the first overlapping position relationship and the second overlapping position relationship. Switch relationships.

なお、本実施の形態において、遅延グループ（第２および第４音声信号）には音声番号１が割り当てられ、非遅延グループ（第１および第３の音声信号）には音声番号２が割り当てられているものとする。 In this embodiment, voice number 1 is assigned to the delay group (second and fourth voice signals), and voice number 2 is assigned to the non-delay group (first and third voice signals). It shall be.

図１０は、本実施の形態における重畳音声信号の構成の例を模式的に示す図であり、図５に対応するものである。 FIG. 10 is a diagram schematically showing an example of the configuration of the superimposed audio signal in the present embodiment, and corresponds to FIG. 5.

図１０（Ａ）に示すように、音声番号１（遅延グループ）が指定されたときの重畳音声信号６６０において、第１の音声信号６１１の位置Ｖｔａと、これを位相反転した第１の反転音声信号６１１'の位置Ｖｔａとは、一致する。また、同様に、第３の音声信号６１３の位置Ｖｔｃと、第３の反転音声信号６１３'の位置Ｖｔａとは、一致する。 As shown in FIG. 10A, in the superimposed audio signal 660 when the audio number 1 (delay group) is designated, the position Vta of the first audio signal 611 and the first inverted audio obtained by inverting the phase thereof. It coincides with the position Vta of the signal 611 ′. Similarly, the position Vtc of the third audio signal 613 matches the position Vta of the third inverted audio signal 613 ′.

音声信号に、その位相が反転した関係にある音声信号が重畳されると、音声信号は、相殺される。したがって、音声番号１が指定されたときの重畳音声信号６６０の重畳音声では、第１の音声信号６１１の音声および第３の音声信号６１３の音声（非遅延グループの音声）は聞こえなくなる。 When an audio signal having a reversed phase is superimposed on the audio signal, the audio signal is canceled. Therefore, in the superimposed sound of the superimposed sound signal 660 when the sound number 1 is designated, the sound of the first sound signal 611 and the sound of the third sound signal 613 (the sound of the non-delay group) cannot be heard.

また、音声番号１が指定されたときの重畳音声信号６６０において、第２の音声信号６１２の位置Ｖｔｂと、第２の反転音声信号６１２'の位置Ｖｔｂとは、第１の遅延時間ｄ１だけずれる。また、同様に、第４の音声信号６１４の位置Ｖｔｄと、第４の反転音声信号６１４'の位置Ｖｔｄとは、第１の遅延時間ｄ１だけずれる。したがって、第２の音声信号６１２の音声および第４の音声信号６１４の音声（遅延グループの音声）は、多少輪郭がぼやけるものの、聞こえることになる。 Further, in the superimposed audio signal 660 when the audio number 1 is designated, the position Vtb of the second audio signal 612 and the position Vtb of the second inverted audio signal 612 ′ are shifted by the first delay time d1. . Similarly, the position Vtd of the fourth audio signal 614 and the position Vtd of the fourth inverted audio signal 614 ′ are shifted by the first delay time d1. Therefore, the sound of the second sound signal 612 and the sound of the fourth sound signal 614 (the sound of the delay group) can be heard although the outline is somewhat blurred.

On the other hand, as shown in FIG. 10(B), in the superimposed audio signal 670 obtained when audio number 2 (the non-delay group) is designated, the pattern of coincidence and non-coincidence of the positions Vta, Vtb, Vtc, and Vtd is the reverse of that in the superimposed audio signal 660 shown in FIG. 10(A). Therefore, in the superimposed audio of the superimposed audio signal 670 obtained when audio number 2 is designated, the audio of the delay group becomes almost inaudible and only the audio of the non-delay group is heard.
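The cancellation pattern of FIG. 10 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the source: signals are plain lists of samples, the four sine tones are arbitrary test data, `D1 = 5` samples stands in for the first delay time d1, and the superposition offset is modeled by dropping the first `D1` samples of the second multiplexed signal (that is, playing it `D1` samples earlier).

```python
import math

def tone(freq_hz, n=64, rate_hz=800):
    """Hypothetical test signal: n samples of a sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def delay(signal, d):
    """Delay a signal by d samples by prepending silence."""
    return [0.0] * d + list(signal)

def invert(signal):
    """Phase inversion: flip the sign of every sample."""
    return [-x for x in signal]

def mix(*signals):
    """Add signals sample by sample, treating missing samples as silence."""
    n = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(n)]

def close(a, b, tol=1e-9):
    """True when two sample lists match within a small tolerance."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, b))

D1 = 5  # stands in for the first delay time d1, in samples (assumed value)
s611, s612, s613, s614 = tone(50), tone(70), tone(90), tone(110)

# First multiplexed signal: all four signals aligned.
first_mux = mix(s611, s612, s613, s614)
# Second multiplexed signal: the delay group (612, 614) is delayed by d1,
# then the whole multiplex is phase-inverted.
second_mux = invert(mix(s611, delay(s612, D1), s613, delay(s614, D1)))

# Audio number 1 (delay group): superimpose with no offset.
heard_1 = mix(first_mux, second_mux)
# Audio number 2 (non-delay group): play the second multiplex d1 earlier.
heard_2 = mix(first_mux, second_mux[D1:])

# heard_1 contains only the delay group (each signal minus its delayed copy).
only_delay_group = mix(s612, s614, invert(delay(s612, D1)), invert(delay(s614, D1)))
# heard_2 contains only the non-delay group (each signal minus its advanced copy).
only_non_delay_group = mix(s611, s613, invert(s611[D1:]), invert(s613[D1:]))
```

In this sketch the signals whose positions coincide vanish exactly, while the others survive as the difference between a signal and its shifted copy, which corresponds to the "somewhat blurred" audio described above.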

Thus, by switching the multiplexing positional relationship, the audio listening apparatus 400 can switch which audio is thinned out, and can make only the audio signals of either one of the two groups selectively audible.

Such an audio multiplexing system 200a can invert the phase of one of two kinds of multiplexed audio signals that differ in the multiplexing positions of the plurality of audio signals, and can output them with their superposition positional relationship adjusted. The audio multiplexing system 200a can thereby make sounds other than the noticeable sound inaudible and make the noticeable sound relatively easier to hear.

Note that when there are two audio signals that are particularly hard to tell apart, it is desirable for the audio multiplexing system 200a to perform grouping so that those audio signals belong to different groups.

For example, when outputting a plurality of audio signals arranged in an arc in a virtual sound source space, the audio multiplexing system 200a performs grouping so that the groups to which the audio signals belong alternate along the order of arrangement. As another example, the audio multiplexing system 200a performs grouping so that speech with similar pitch belongs to different groups.
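The two grouping rules above can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the use of identifiers instead of real audio data, and the heuristic of sorting by pitch and then alternating are hypothetical. The source states only that signals with similar pitch should end up in different groups.

```python
def group_alternating(signal_ids):
    """Assign signals to two groups alternately, following their order of arrangement."""
    non_delay_group, delay_group = [], []
    for index, signal_id in enumerate(signal_ids):
        target = non_delay_group if index % 2 == 0 else delay_group
        target.append(signal_id)
    return non_delay_group, delay_group

def group_by_pitch(pitch_hz_by_id):
    """Sort by pitch, then alternate, so neighbors in pitch land in different groups."""
    ordered = sorted(pitch_hz_by_id, key=pitch_hz_by_id.get)
    return group_alternating(ordered)

# Signals listed in their order along the arc in the virtual sound source space.
groups_arc = group_alternating([611, 612, 613, 614])

# Hypothetical speakers with estimated pitch in Hz; A/C and B/D are close in pitch.
groups_pitch = group_by_pitch({"A": 220.0, "B": 110.0, "C": 225.0, "D": 115.0})
```

With the arc order 611 to 614, the alternating rule reproduces the grouping used in FIG. 10: signals 611 and 613 fall in one group and signals 612 and 614 in the other.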

In this way, the audio multiplexing system 200a can increase the number of audio signals output simultaneously while preventing the audio from becoming hard to hear.

(Embodiment 4)
Embodiment 4 of the present invention is an example of a specific mode in which the present invention is applied to a portable player that stores and plays back a large number of audio signals (audio data).

FIG. 11 is a block diagram showing an example of the configuration of the audio multiplexing apparatus, the audio listening apparatus, and the audio multiplexing system according to the present embodiment, and corresponds to FIG. 2 of Embodiment 2. Parts identical to those in FIG. 2 are given the same reference numerals, and their description is omitted.

In FIG. 11, the audio multiplexing system 200b has an audio multiplexing apparatus 100b and an audio output device 500.

The audio multiplexing apparatus 100b has an audio input unit 110b in place of the audio input unit 110 of Embodiment 2, and further has a time adjustment unit 420, an operation unit 430, and an audio output unit 440. In the present embodiment, the audio multiplexing apparatus 100b is, for example, a portable player.

The audio input unit 110b holds a database that stores a large number of audio signals, and acquires a plurality of audio signals from that database. The audio input unit 110b has an audio compression unit 111, an audio holding unit 112b, and an audio search unit 113b.

The audio holding unit 112b is the database mentioned above. Meta information about the audio signal is attached to each audio signal stored in the audio holding unit 112b.

Various kinds of information can be used as the meta information.

When the database is a collection of audio data for a large number of music tracks, the meta information can include, for example, the artist name and the genre. When the database is a collection of audio data for a large number of lectures, the meta information can include, for example, the date, the speaker name, and the lecture theme. The meta information may further include the genre under which the lecture theme is classified.

Note that the audio input unit 110b itself may, for example, perform speech recognition on each piece of audio data and attach the recognition result to each audio signal as meta information.

The audio search unit 113b, for example, receives conditions for audio signals from the user and searches the audio holding unit 112b for audio signals whose attached meta information satisfies the input conditions. The audio search unit 113b then outputs the retrieved audio signals to the audio compression unit 111.

In the following description, it is assumed that a plurality of audio signals are always retrieved. If only one audio signal is retrieved, the downstream audio compression unit 111 may, for example, output that audio signal directly to the audio output unit 440.
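The metadata search described above can be sketched in Python as follows. The record type `StoredAudio`, the function `search`, the exact-match rule, and the sample entries are all hypothetical and chosen only for illustration. The source does not specify how the audio holding unit 112b stores meta information or how conditions are matched.

```python
from dataclasses import dataclass, field

@dataclass
class StoredAudio:
    """Hypothetical record: an audio signal plus its attached meta information."""
    title: str
    meta: dict = field(default_factory=dict)

def search(store, **conditions):
    """Return every record whose meta information matches all given conditions."""
    return [
        audio for audio in store
        if all(audio.meta.get(key) == value for key, value in conditions.items())
    ]

store = [
    StoredAudio("track-1", {"artist": "Artist A", "genre": "jazz"}),
    StoredAudio("track-2", {"artist": "Artist B", "genre": "jazz"}),
    StoredAudio("track-3", {"artist": "Artist A", "genre": "rock"}),
]

hits = search(store, genre="jazz")
single = search(store, artist="Artist A", genre="rock")

# With several hits the signals go on to multiplexing; with a single hit the
# signal could be passed straight to the output, as the text above notes.
needs_multiplexing = len(hits) > 1
bypass_multiplexing = len(single) == 1
```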

In the present embodiment, the first audio multiplexing unit 120 outputs the first multiplexed audio signal to the audio output unit 440. The second audio multiplexing unit 130 outputs the second multiplexed audio signal to the time adjustment unit 420.

Note that in the present embodiment, the first audio multiplexing unit 120, for example, aligns the start positions of all the audio signals for playback.

In this case, the first multiplexing positional relationship described above is a relationship in which the start positions of all the other audio signals (second audio signals) coincide with the start position of the first audio signal. The second multiplexing positional relationship described above is a relationship in which the start positions of all the other audio signals (second audio signals) are delayed by a predetermined time with respect to the start position of the first audio signal.
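A minimal Python sketch of these two multiplexing positional relationships, and of how superimposing the two multiplexed signals can bring one chosen signal into alignment, is given below. It rests on assumptions that are not in the source: signals are short lists of integer samples, each "other" signal is given its own distinct delay (the source says only "a predetermined time"), and the superposition is adjusted by dropping leading samples of the second multiplexed signal.

```python
def delay(signal, d):
    """Delay a signal by d samples by prepending silence."""
    return [0] * d + list(signal)

def mix(*signals):
    """Add signals sample by sample, treating missing samples as silence."""
    n = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(n)]

# Three toy signals, each a single pulse so it is easy to track in the output.
signals = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0]]
delays = [0, 2, 4]  # assumed per-signal delays; the first signal is not delayed

# First multiplexing positional relationship: all start positions coincide.
first_mux = mix(*signals)
# Second multiplexing positional relationship: the other signals start later.
second_mux = mix(*(delay(s, d) for s, d in zip(signals, delays)))

def superimpose(first, second, advance):
    """Superimpose after moving the second multiplex `advance` samples earlier."""
    return mix(first, second[advance:])

heard_0 = superimpose(first_mux, second_mux, delays[0])  # signal 0 aligned
heard_1 = superimpose(first_mux, second_mux, delays[1])  # signal 1 aligned
heard_2 = superimpose(first_mux, second_mux, delays[2])  # signal 2 aligned
```

In each result the pulse of the aligned signal appears once at double amplitude, while the pulses of the other signals appear twice at different positions, which is the sense in which the aligned signal stands out.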

Such an audio multiplexing system 200b can select a plurality of audio signals from among the many audio signals it holds, and can output the audio of the selected audio signals simultaneously. The audio multiplexing system 200b also generates two kinds of multiplexed audio signals that differ in the multiplexing positions of the plurality of audio signals, and outputs them with their superposition positional relationship adjusted. The audio multiplexing system 200b can therefore make any given audio signal easier to hear.

Next, the operation of the audio multiplexing apparatus 100b according to the present embodiment will be described.

FIG. 12 is a flowchart showing an example of the operation of the audio multiplexing apparatus 100b, and corresponds to FIGS. 6 and 7 of Embodiment 2. Steps identical to those in FIGS. 6 and 7 are given the same step numbers, and their description is omitted. The audio multiplexing apparatus 100b executes the processing shown in FIG. 12, for example, each time it is instructed to start a search for audio signals.

First, in step S1011b, the audio search unit 113b searches the audio holding unit 112b for audio signals.

Then, in step S1012b, the audio search unit 113b acquires the plurality of retrieved audio signals.

Then, in steps S1020 to S1050, the audio multiplexing apparatus 100b generates, from the plurality of audio signals, a first multiplexed audio signal and a second multiplexed audio signal having a different multiplexing positional relationship, in the same manner as in Embodiment 2.

Then, in steps S2020 to S2070, the audio multiplexing apparatus 100b generates a superimposed audio signal while adjusting the superposition positional relationship in response to user operations, in the same manner as the audio listening apparatus 400 of Embodiment 2, and outputs the superimposed audio signal to the audio output device 500.

For example, the time adjustment unit 420 processes the second multiplexed audio signal at predetermined intervals. When this interval is very short, the audio multiplexing apparatus 100b can switch which sound is made easier to hear partway through playback of the audio signals.

Through this operation, the audio multiplexing apparatus 100b can output the audio of a plurality of audio signals with a specific audio made easier to hear, and can switch which audio is made easier to hear during that output.
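Switching the emphasized audio during playback can be sketched in Python as a block-by-block superposition in which the current selection is read again at the start of each period. The function name, the callback `advance_for_block`, the integer toy data, and the choice to move the second multiplexed signal earlier by a number of samples are assumptions for illustration. The source states only that the time adjustment unit 420 processes the second multiplexed audio signal at predetermined intervals.

```python
def superimpose_periodic(first_mux, second_mux, period, advance_for_block):
    """Superimpose two multiplexed signals, re-reading the offset every `period` samples."""
    out = []
    for i, sample in enumerate(first_mux):
        # Hypothetical callback returning the offset currently selected by the user.
        advance = advance_for_block(i // period)
        j = i + advance
        # Samples beyond the end of the second multiplex are treated as silence.
        out.append(sample + (second_mux[j] if j < len(second_mux) else 0))
    return out

first_mux = [1, 2, 3, 4, 5, 6]
second_mux = [10, 20, 30, 40, 50, 60, 70, 80]

# The user keeps the initial selection for block 0, then switches to an
# offset of 2 samples from block 1 onward.
heard = superimpose_periodic(
    first_mux,
    second_mux,
    period=3,
    advance_for_block=lambda block: 0 if block == 0 else 2,
)
```

The shorter the period, the sooner a change of selection takes effect in the output, which matches the remark above about very short intervals.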

This concludes the description of the operation of the audio multiplexing apparatus 100b according to the present embodiment.

As described above, even when the number of audio signals is large, for example when a search returns many results, the audio multiplexing system 200b according to the present embodiment lets the user check a plurality of audio signals simultaneously and makes it easier to find the desired audio.

In addition, because the audio multiplexing system 200b integrates the audio multiplexing apparatus 100 of Embodiment 1 and the audio listening apparatus 400 into one unit, a communication circuit between them and separate housings become unnecessary. That is, the audio multiplexing system 200b can simplify the system as a whole.

Note that the audio multiplexing apparatus 100b may have the phase inversion unit 132a, as in Embodiment 2. In this case, the same effect as in Embodiment 2 can be obtained.

Among the embodiments described above, Embodiment 2 and Embodiment 3 have been described using examples that include the audio providing apparatus 300 as the apparatus that inputs audio signals. Embodiment 4 has been described using an example that includes the audio output device 500 as the device that converts the superimposed audio signal into sound. However, the application of the present invention is not limited to these.

For example, the audio multiplexing apparatus according to the present invention can be a headset that has an audio input function such as a microphone, the functions of the audio listening apparatus, and an audio output function such as a speaker.

In the above embodiments, the superposition positional relationship between the first multiplexed audio signal and the second multiplexed audio signal is switched by changing the designated audio number in numerical order, but the switching method is not limited to this.

The audio listening apparatus according to the present invention may, for example, accept designation of an audio number through numerical input or a key press, and switch the superposition positional relationship so that the audio signal with the designated audio number becomes easier to hear.

The audio listening apparatus may also, for example, move a pointer along a virtual axis on which the audio numbers are arranged and, when the pointer position coincides with the position of one of the audio numbers, switch the superposition positional relationship so that the audio signal with that audio number becomes easier to hear.

The audio listening apparatus may also switch the superposition positional relationship by, for example, accepting from the user an operation that slides the time axis of the second multiplexed audio signal relative to the time axis of the first multiplexed audio signal.

The present invention can also be applied to various systems and apparatuses other than the voice chat system and the portable player described above. For example, the present invention may be applied to a radio receiver that receives a plurality of radio broadcasts simultaneously and allows the audio of a desired radio broadcast to be selected.

The audio multiplexing apparatus, the audio listening apparatus, and the audio multiplexing method according to the present invention are useful as an audio multiplexing apparatus, an audio listening apparatus, and an audio multiplexing method that can make the noticeable sound easier to hear while keeping the processing load low.

100, 100a, 100b Audio multiplexing apparatus
110, 110b Audio input unit
111 Audio compression unit
112b Audio holding unit
113b Audio search unit
120 First audio multiplexing unit
130, 130a Second audio multiplexing unit
131 Delay processing unit
132a Phase inversion unit
140, 440 Audio output unit
141 Multiplexed audio transmission unit
200, 200a, 200b Audio multiplexing system
300 Audio providing apparatus
400 Audio listening apparatus
410 Multiplexed audio reception unit
420 Time adjustment unit
430 Operation unit
500 Audio output device

Claims

An audio multiplexing apparatus comprising:
an audio input unit that inputs a first audio signal and a second audio signal;
a first audio multiplexing unit that generates a first multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a first multiplexing positional relationship;
a second audio multiplexing unit that generates a second multiplexed audio signal obtained by multiplexing the first audio signal and the second audio signal in a second multiplexing positional relationship different from the first multiplexing positional relationship; and
an audio output unit that outputs the first multiplexed audio signal and the second multiplexed audio signal.

The audio multiplexing apparatus according to claim 1, wherein
the second audio multiplexing unit includes a delay processing unit that delays the second audio signal.

The audio multiplexing apparatus according to claim 1, wherein
the first multiplexing positional relationship is a relationship in which an arbitrary position of the first audio signal coincides with the position of the second audio signal input at the same timing as that position, and
the second multiplexing positional relationship is a relationship in which the position of the second audio signal input at the same timing as an arbitrary position of the first audio signal is delayed by a predetermined time with respect to that position.

The audio multiplexing apparatus according to claim 1, wherein
the first multiplexing positional relationship is a relationship in which the start position of the first audio signal and the start position of the second audio signal coincide, and
the second multiplexing positional relationship is a relationship in which the start position of the second audio signal is delayed by a predetermined time with respect to the start position of the first audio signal.

The audio multiplexing apparatus according to claim 1, further comprising
a phase inversion unit that inverts the phase of each of the first audio signal and the second audio signal included in one of the first multiplexed audio signal and the second multiplexed audio signal.

The audio multiplexing apparatus according to claim 1, wherein
the audio output unit transmits the first multiplexed audio signal and the second multiplexed audio signal to an audio listening apparatus that superimposes the first multiplexed audio signal and the second multiplexed audio signal in a predetermined superposition positional relationship and outputs the result.

The audio multiplexing apparatus according to claim 1, further comprising:
a time adjustment unit that generates a superimposed audio signal obtained by superimposing the first multiplexed audio signal and the second multiplexed audio signal in an adjustable predetermined superposition positional relationship; and
an operation unit that, based on a user operation, switches the predetermined superposition positional relationship at least between a first superposition positional relationship, in which an arbitrary position of the first audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal, and a second superposition positional relationship, in which an arbitrary position of the second audio signal coincides between the first multiplexed audio signal and the second multiplexed audio signal, wherein
the audio output unit outputs the superimposed audio signal to an audio output device.

An audio listening apparatus comprising:
a multiplexed audio reception unit that acquires the first multiplexed audio signal and the second multiplexed audio signal from the audio multiplexing apparatus according to claim 1;
a time adjustment unit that generates a superimposed audio signal obtained by superimposing the first multiplexed audio signal and the second multiplexed audio signal in an adjustable predetermined superposition positional relationship;
an operation unit that, based on a user operation, switches the predetermined superposition positional relationship between a first superposition positional relationship, in which arbitrary positions of the first audio signal included in each of the first multiplexed audio signal and the second multiplexed audio signal coincide, and a second superposition positional relationship, in which arbitrary positions of the second audio signal included in each of the first multiplexed audio signal and the second multiplexed audio signal coincide; and
an audio output unit that outputs the superimposed audio signal to an audio output device.

An audio multiplexing method comprising:
inputting a first audio signal and a second audio signal;
generating a first multiplexed audio signal by multiplexing the first audio signal and the second audio signal in a first multiplexing positional relationship;
generating a second multiplexed audio signal by multiplexing the first audio signal and the second audio signal in a second multiplexing positional relationship different from the first multiplexing positional relationship; and
outputting the first multiplexed audio signal and the second multiplexed audio signal.