JP2007187748A

JP2007187748A - Sound selective processing device

Info

Publication number: JP2007187748A
Application number: JP2006004011A
Authority: JP
Inventors: Satoru Suzuki; 哲鈴木; Shinichi Yoshizawa; 伸一芳澤; Yoshihisa Nakato; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2006-01-11
Filing date: 2006-01-11
Publication date: 2007-07-26

Abstract

PROBLEM TO BE SOLVED: To estimate and remove the unpleasant sound that causes irritation from among mixed sounds generated asynchronously in a real environment. SOLUTION: A sound selection processing device includes a sound acquisition unit, a sound structure feature extraction unit, a sound separation unit, a discomfort detection unit, a candidate sound selection determination unit, a candidate sound presentation specifying unit, a processed sound structure feature update unit, and a sound processing unit. When discomfort is detected, the device estimates the unpleasant sounds that may be causing the irritation, presents as few candidate sounds as possible, and determines the target sound after the user confirms it. With this configuration, the unpleasant sound causing the irritation is estimated from the mixed sound, the user specifies it from among the presented candidate sounds, and processing that sound quickly reduces the discomfort. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a sound selection processing device that selects a specific sound from a mixed sound composed of various sounds generated asynchronously around a user and processes that sound, thereby making unpleasant sounds that cause irritation harder to hear while preserving the sounds that are important to the user and that the user wants to hear.

We live surrounded by many kinds of sound. Unless we ourselves, or an acquaintance or friend, are the source, most of these miscellaneous sounds are ones we cannot quiet or silence by stopping the activity that produces them. Examples include strangers talking loudly and the sounds of public transportation. People can become irritated and uncomfortable when exposed to such uncontrollable sounds.

FIG. 18 depicts a scene modeling the sounds that occur inside a train. Inside a train, various sounds occur asynchronously: running sounds that accompany the train's travel, such as engine (motor) noise, level-crossing bells, and the sound of the train passing over the rails, as well as passengers' conversations, mobile phone operation sounds, and the conductor's announcements.

Using this example scene, we describe a situation in which the sounds a person wants to hear and does not want to hear differ depending on that person's position.

Person A was concentrating on reading a book on the train. Person B and person C, who had boarded at the station where the train had just stopped, stood directly in front of person A and began talking loudly. As a result, person A could no longer concentrate on the book because the voices were so loud, and began to get irritated. Person C, who is mainly the listener, finds person B's voice harder to hear on the train than it was during their conversation in the station before boarding, and has trouble following the conversation. Meanwhile, person D, who is riding this line for the first time, is concentrating on the in-car announcements to catch the name of the next stop. However, because of the train noise and the conversation between person B and person C, person D cannot hear the announcement well and cannot tell whether the next station is the one at which to get off. Person D is also uneasy, unable to tell when the train will approach a station and the next announcement will begin. Person D is, in turn, writing an e-mail on a mobile phone, and person B finds the operation sounds distracting and unsettling.

This scene shows that the unpleasant sounds that irritate people vary widely.

For example, like person B, people on a train are surprisingly unbothered by their own loud conversation with a friend. This is because the friend's voice is the sound they want to hear, and when the train's running noise makes the conversation harder to hear, they tend to speak even louder. By contrast, for person A and person D, who are in different positions, the train noise itself is not particularly bothersome, while person B's loud conversation can seem noisy.

FIG. 19(a) shows the separated sounds extracted from the mixed sound. FIG. 19(b) summarizes, for each sound in FIG. 18 (the announcement, the train noise, the mobile phone operation sound, and person B's voice), whether each person subjectively regards it as a sound they want to hear, a sound they are neutral about, or a sound they do not want to hear. As the figure shows, person A and person D share the same unwanted, irritating sounds, but person A has no particular sound they want to hear, whereas person D wants to hear the announcement. This shows that simply eliminating all sounds is not sufficient.

Person B and person C both want to hear person B's voice and do not want to hear the mobile phone sound, but person C does not want to hear any sound other than person B's voice. This shows that simply eliminating only the unwanted sounds is also insufficient.

Thus, even when people share the same sound environment at the same time and place, their subjective perception of each sound in that environment (as wanted, unwanted, or neutral) differs depending on the situation and on each person's context, and this difference is what gives rise to unpleasant sounds.

Furthermore, the complexity of the human auditory mechanism is also thought to affect the sense of being irritated by sound. People have the ability to select among sounds, known as the cocktail party effect, by which they can attend to and hear the sound they want to hear within a mixture of many sounds. There is also the phenomenon in which, when a person concentrates on something else, steady sounds that vary little become selectively inaudible: for example, while reading a book on the train like person A, the clickety-clack of the train passing over rail joints and the motor noise, which should have been audible a moment earlier, are at some point no longer noticed. Consequently, even if a sound is once judged to be irritating, there is no knowing whether it will be equally irritating on another occasion.

To reduce the discomfort caused by such uncontrollable sounds, people commonly try, for example, to shut out ambient sound with earplugs, or to create a particular sound environment by listening to music through headphones.

There are also headphones that cancel ambient sounds that interfere with listening to music and the like. For example, Patent Document 1 discloses an active noise reduction device that reduces unpleasant noise generated in a vehicle cabin by engine rotation and vehicle travel by making a signal of opposite phase and equal amplitude interfere with the noise. In particular, that invention prevents the abnormal sound that would otherwise arise when normal noise reduction is not performed because of an abnormality in the output signal of the microphone that detects the noise reduction error.

With such countermeasures, however, almost no ambient sound other than the music or other sound output from the headphones can be heard. Although the discomfort caused by ambient sound can be reduced, the user may miss a sound they want to hear when it occurs nearby. This approach is therefore effective only when the sounds to be heard or retained are fixed, and cannot cope with cases where the wanted and unpleasant sounds change from moment to moment, as in the situation of FIG. 18.

Therefore, to meet the desire to eliminate only the unpleasant sounds that cause irritation, a technique is needed that separates the mixed sound into the sounds of individual sources, selects the desired sound from among them, and processes it. FIG. 19(a) is a schematic diagram showing a mixed sound from a real environment decomposed by sound separation into three per-source sounds, ch (channel) 1 to ch 3. A technique is then needed to identify which of the separated sounds is the unpleasant sound causing the irritation.

Conventional sound separation techniques include methods that separate sounds on the basis of estimated source direction, using multiple directional microphones, array microphones, ICA (independent component analysis), beamforming, and the like. As a frequency-analysis approach, there is also a separation method that uses a sound analysis technique in which the time resolution and the frequency resolution are raised simultaneously and independently, reducing the overlap between sounds in the time-frequency domain so that the features of the target sound can be extracted. An auditory scene analysis method based on the human auditory mechanism, as proposed by Bregman, has also been put forward.

Meanwhile, known conventional techniques for selecting sounds include a method that selects a sound whose source position on a monitor coincides with the viewer's gaze direction on the monitor (for example, Patent Document 2), a method that selects sounds on the basis of the importance of the sound source (for example, Patent Document 3), and a method that conducts spoken dialogue by changing the speech synthesis content conveyed to the user according to the current location and situation (for example, Patent Document 4). In all of the methods described in Patent Documents 2 to 4, the input sound sources are fixed by source direction, current location, or source importance, so none of them includes a technique for separating sounds from sources that are not fixed.
JP 2004-352070 A
JP H09-090963 A
JP H09-275533 A
JP H11-351901 A

FIGS. 20(a) to 20(d) illustrate the main factors that give rise to situations in which a person is irritated by sound. As shown in FIG. 20, four representative factors can be considered. In the figure, the sounds occurring in the surrounding environment are classified by the user's subjective view as a) sounds the user wants to hear, b) sounds the user is neutral about, and c) sounds the user does not want to hear. The figure shows that this subjective view is affected by, and changes with, the temporal relationship between the sounds, so that a sound that causes no discomfort on its own turns into d) a sound that causes discomfort. Each factor is described below with reference to FIGS. 20(a) to 20(d).

FIG. 20(a) shows factor 1: the user is irritated because a sound they do not want to hear is sounding. Here, c) the unwanted sound sounds on its own and causes d) discomfort; afterwards, a) a wanted sound and b) a neutral sound occur and overlap it. The d) discomfort coincides with the interval in which the unwanted sound is sounding.

FIG. 20(b) shows factor 2: a sound that was not irritating when heard alone becomes irritating when it overlaps a sound the user wants to hear. Here, b) a neutral sound was sounding on its own when a) a wanted sound occurred, making the wanted sound hard to hear. The interval in which the sounds overlap thus amounts to a temporary c) unwanted sound (dotted line), and discomfort arises.

FIG. 20(c) shows factor 3: several sounds occur at the same time and the user is irritated because they cannot choose among them. In this example, three a) wanted sounds occur almost simultaneously and also overlap b) a neutral sound, so the user cannot decide which to listen to. The overlap interval thus amounts to a temporary c) unwanted sound, and discomfort arises.

FIG. 20(d) shows factor 4: the user has already caught the information they wanted, but is irritated because the sound goes on and on. In this example, a) the wanted sound keeps sounding even after b) the neutral sound has ended, so the latter part of the wanted sound amounts to a temporary unwanted sound and the user feels irritated.

Analysis of these factors shows that factor 1 is simply the presence of a sound that the individual user never wants to hear, whereas in factors 2 to 4 a sound that was wanted or neutral temporarily turns into an unwanted sound, and discomfort arises, because of the overlap or other relationship between one separated sound and the other separated sounds in the mixture.

In other words, for factor 1, the conventional methods (selecting by sound importance, deciding the removal target in advance and learning its acoustic characteristics, or selecting sounds according to place and time) suffice: as long as the separated sound the user never wants to hear can be learned, it is only necessary to detect whether that sound is among the separated sounds of the mixture and separate it. For factors 2 to 4, however, even if unwanted sounds are learned in advance, the user's subjective view changes with the relationship between the separated sounds, so the unwanted sound cannot always be identified from the current location or the importance of the source.

In addition, with the method that selects the sound whose source position matches the gaze direction, the source position may not be obtainable accurately in the first place, either because sounds originate from the same direction or because the sound field is diffuse. In such cases the irritating sound itself cannot be identified.

Accordingly, an object of the present invention is to solve the above conventional problems and quickly reduce discomfort caused by sound by providing a sound selection processing device that estimates, from among the various asynchronously occurring surrounding sounds, the unpleasant sounds causing irritation, presents them to the user so that the sound can be specified, and processes that sound, thereby making the unpleasant sound harder to hear while preserving the important sounds the user wants to hear.

To solve the above conventional problems, the user's discomfort with sound is detected by having the user press a button when irritated by a sound. This makes it unnecessary to determine whether the irritation stems from something other than sound.

With this configuration, the sounds estimated to be causing irritation are presented as candidates from the mixed sound, the user quickly specifies one of them, and the discomfort caused by the specified sound can be reduced.

A sound selection processing device of the present invention having another configuration, which adds a processing criterion input unit and a search target sound structure feature DB to the above configuration, performs sound separation on the basis of the sound structure features of the search target sounds and, by accepting processing conditions as input, lets the user hear the specified unpleasant sound processed as desired.

With this configuration, when sounds that are determined by time and place occur in the mixed sound, extracting their sound structure features in advance allows the remaining sounds, which vary more readily and are perceived as unpleasant, to be extracted more accurately. Furthermore, by judging whether the current sound environment, as represented by the set of other separated sounds, is the same as before, the determination can be made on an ongoing or continuous basis from the sound structure at the time of learning.

Furthermore, in addition to the above configuration, information on unpleasant sounds is exchanged with other sound selection processing devices.

With this configuration, sounds are selected and processed on the basis of not only the user's own estimated unpleasant sounds but also those of the other party, so unpleasant sounds can be removed more accurately.

The present invention can be realized not only as such a sound selection processing device, but also as a sound selection processing method whose steps are the characteristic means of the device, or as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

According to the sound selection processing device of the present invention, the unpleasant sound causing irritation can be quickly identified from the mixed sound and processed so that it is harder to hear. This reduces discomfort and stress caused by sound and makes them easier for the user to manage.

Moreover, when a plurality of sound selection processing devices of the present invention exchange information on unpleasant sounds, both communicating parties can share the sound structure features used to cancel the unpleasant sound, so the other party's discomfort can also be reduced quickly.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the sound selection processing device 100 according to this embodiment. The sound selection processing device 100 selectively removes, from the mixed sound occurring around the user, the sounds the user finds unpleasant. It includes a sound acquisition unit 101, a sound structure feature extraction unit 102, a sound separation unit 103, a sound processing unit 104, a candidate sound selection determination unit 105, a candidate sound presentation specifying unit 106, a processed sound structure feature update unit 107, a processed sound structure feature DB (database) 108, and a discomfort detection unit 131. The candidate sound selection determination unit 105 contains a sound structure feature comparison unit 133 and a masker separated sound estimation unit 134.

When the discomfort detection unit 131 detects the user's discomfort, the sound selection processing device 100 estimates the candidate unpleasant sounds that may be causing the irritation and presents them to the user, keeping the number of candidates as small as possible. The user reviews the presented candidates and selects the sound to be processed from among them.

As FIG. 1 shows, the configuration divides broadly into two parts: a separated-sound processing part, which separates the mixed sound into per-source sounds and processes them, and a separated-sound selection part, which presents the separated sounds that are candidates for processing and accepts the user's selection. The corresponding processing flows are described below with reference to FIGS. 2, 3, 4, 5, and 6.

First, the flow of separated-sound processing is described.
The sound acquisition unit 101 acquires sound by A/D-converting the mixed sound that occurs around the user and is input through a microphone. The sound acquisition unit 101 may use a plurality of microphones to acquire sound processed according to direction, or may acquire sound from data previously recorded on a hard disk. The sound may also be taken from the receiver of a mobile phone or similar device.

The sound structure feature extraction unit 102 captures the features of the sound using the Fourier transform, so that time-frequency information is represented as the sound structure features. A DCT (Discrete Cosine Transform) or the like may be used instead of the FFT (Fast Fourier Transform).
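The time-frequency representation described above can be illustrated with a short-time Fourier analysis. The following is a minimal sketch, not the patented implementation: the function name `stft_magnitudes`, the frame length, the hop size, and the Hann window are all assumptions chosen for illustration, and a direct DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math


def stft_magnitudes(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window (assumed) reduces leakage between neighbouring bins.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
            for n, x in enumerate(frame)
        ]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT for clarity; a practical system would use an FFT.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


# A pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

Each inner list is the spectrum of one frame, so the list of frames forms the time axis and the bin index forms the frequency axis. For the test tone, the strongest bin in the first frame is the one matching the tone's frequency.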

The sound separation unit 103 corresponds to sound separation means for separating the mixed sound into per-source sounds. As described for the conventional techniques, it decomposes the mixed sound into streams of separated sounds on the basis of the sound structure features. The separation method assumed in this embodiment extracts a plurality of separated sounds from a mixed sound made up of sounds from several sources by evaluating the continuity of sound features (for example, Nakatani, T., "Computational Auditory Scene Analysis Based on Residue-Driven Architecture and Its Application to Mixed Speech Recognition," dissertation). This method groups sound onsets and offsets, treats peaks whose rises and falls are synchronized as one peak group, and extracts a single sound stream by tracking the transitions of that group.

The sound processing unit 104 corresponds to sound processing means for processing the specified separated sound and reconstructing the mixed sound. It processes sounds whose sound structure features match those of the unwanted sounds stored in the processed sound structure feature DB 108 in accordance with the sound selection flow (described later). Based on a technique such as that described in Patent Document 1, it makes the specified separated sound inaudible by, for example, generating an opposite-phase signal of that separated sound.
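The opposite-phase idea can be shown with an idealized example. This is a sketch under strong assumptions: the separated sound is taken to be known exactly and perfectly time-aligned with the mixture, which a real active noise reduction system cannot assume. The helper names `antiphase` and `superpose` and the `gain` parameter are hypothetical.

```python
def antiphase(signal, gain=1.0):
    """Return the sign-inverted (opposite-phase) copy of a signal."""
    return [-gain * x for x in signal]


def superpose(a, b):
    """Sample-wise sum, standing in for acoustic superposition."""
    return [x + y for x, y in zip(a, b)]


announcement = [0.25, 0.5, -0.25, 0.0]   # sound the user wants to keep
voice = [0.5, -0.5, 0.5, -0.5]           # sound specified as unpleasant
mixed = superpose(announcement, voice)

# Equal amplitude and opposite phase cancels the specified sound.
cancelled = superpose(mixed, antiphase(voice))
# A smaller gain only attenuates it.
halved = superpose(mixed, antiphase(voice, gain=0.5))
```

With a gain of 1.0 the mixture reduces to the announcement alone. A gain below 1.0 leaves the unpleasant sound audible at a lower level, which corresponds to making it harder to hear rather than removing it.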

FIG. 2(a) is a flowchart of separated-sound processing in Embodiment 1. The processing is described with reference to this figure.

In S701, the sound selection processing device 100 sets the processing target section and prepares the sound structure features of the mixed sound for that section. The processing target section is the time span from when the sound acquisition unit 101 acquires the mixed sound, through separation by the sound separation unit 103, to reconstruction of the mixed sound by the sound processing unit 104. For example, the device sets a processing target section of 3 seconds and extracts the sound structure features of the mixed sound over those 3 seconds. The length of the section may be fixed in advance when the device is shipped. Next, in S702, the sound separation unit 103 extracts the separated sounds; this embodiment uses the method shown in FIG. 2(b), described later. After extraction, S703 determines whether separable sound streams were obtained. It is possible, for example, that the mixed sound yields only a single stream, that is, only background noise. In that case the mixed sound is output unchanged in S704. If separated sounds exist, in S705 the processing target sounds already specified are read from the processed sound structure feature DB 108 and the sound is processed to make them harder to hear, for example by applying the opposite phase of the target sound.
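The branch structure of S702 to S705 can be sketched as a single function. This is an illustrative outline only: `process_section` and its callable parameters (`separate`, `is_target`, `suppress`) are hypothetical names, and the separation and database lookup are replaced by stubs in the example.

```python
def process_section(mixed, separate, is_target, suppress):
    """One pass of S702-S705 over a single processing target section.

    mixed     -- samples of the section prepared in S701
    separate  -- callable returning a list of separated streams (S702)
    is_target -- callable telling whether a stream is a stored target sound
    suppress  -- callable removing one stream from the mix (S705)
    """
    streams = separate(mixed)              # S702: extract separated sounds
    if len(streams) <= 1:                  # S703: nothing separable
        return list(mixed)                 # S704: output the mix unchanged
    output = list(mixed)
    for stream in streams:                 # S705: process the target sounds
        if is_target(stream):
            output = suppress(output, stream)
    return output


def subtract(mix, stream):
    return [m - s for m, s in zip(mix, stream)]


voice = [1.0, 0.0, 1.0, 0.0]
train = [0.5, 0.5, 0.5, 0.5]
mix = [v + t for v, t in zip(voice, train)]

# Stub separation that returns the two known sources.
result = process_section(mix, lambda m: [voice, train],
                         lambda s: s is voice, subtract)
# Stub separation that finds only one stream (background noise only).
unchanged = process_section(mix, lambda m: [m], lambda s: True, subtract)
```

Passing the steps in as callables keeps the control flow of the flowchart visible while leaving the separation method and the suppression method open, as the text does.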

また、図２（ｂ）は、本実施の形態１における分離音抽出ピーク探索の流れ図である。音分離部１０３に対応するＳ７１１において、混合音の音構造特徴から代表的なピークを抽出し、そのピークを時間周波数方向に追跡を行なう。Ｓ７１２において、ピークの立ち上がりや立下りなどが同期して変動するピークをグルーピングし、一つの音ストリーム候補と見なす。Ｓ７１３において、音ストリーム候補に含まれるピーク数を検証し、閾値以上の数のピークが含まれていたら、それを１つの音として認める。Ｓ７１４において、グループごとに、抽出対象音をスペクトル波形から時間波形に変換し、Ｓ７１５において、混合音と抽出対象音との差分をとる。Ｓ７１２の結果、複数のグループが生じた場合には、Ｓ７１３からＳ７１５を繰り返して、グループ毎に分離音として抽出する。 FIG. 2B is a flowchart of the separated sound extraction peak search in the first embodiment. In S711 corresponding to the sound separation unit 103, a representative peak is extracted from the sound structure characteristics of the mixed sound, and the peak is traced in the time-frequency direction. In step S712, peaks whose rising and falling peaks fluctuate synchronously are grouped and regarded as one sound stream candidate. In S713, the number of peaks included in the sound stream candidate is verified, and if the number of peaks equal to or greater than the threshold is included, it is recognized as one sound. In S714, the extraction target sound is converted from a spectrum waveform to a time waveform for each group, and in S715, a difference between the mixed sound and the extraction target sound is obtained. As a result of S712, when a plurality of groups are generated, S713 to S715 are repeated and extracted as a separated sound for each group.
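
The grouping in S712 and the peak-count check in S713 can be sketched as follows. The sketch assumes each tracked peak has been reduced to a `(frequency, onset, offset)` tuple. The tolerance `tol` and the threshold `min_peaks` are hypothetical values chosen for illustration, and comparing each track with the first member of a group is a simplification.

```python
def group_peak_tracks(tracks, tol=0.05, min_peaks=3):
    """Group peak tracks whose onsets and offsets move together.

    tracks -- list of (freq_hz, onset_s, offset_s) tuples (assumed layout)
    """
    groups = []
    for track in sorted(tracks, key=lambda t: t[1]):
        _, onset, offset = track
        for group in groups:
            _, g_on, g_off = group[0]  # compare with the group's first track
            if abs(onset - g_on) <= tol and abs(offset - g_off) <= tol:
                group.append(track)    # S712: synchronous rise and fall
                break
        else:
            groups.append([track])     # start a new sound stream candidate
    # S713: accept a candidate as one sound only if it has enough peaks
    return [g for g in groups if len(g) >= min_peaks]
```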

なお、Ｓ７０５の分離音の加工の際に、加工音構造特徴ＤＢ１０８に保存されている、聞きたくないと特定された分離音が存在するかどうかを判定した上で、特定された分離音が存在する場合には、特定された分離音が混合音から減算されるように加工するようにしてもよい。 When the separated sound is processed in S705, it is determined whether there is a separated sound that is stored in the processed sound structure feature DB 108 and that is identified as not to be heard, and then the identified separated sound exists. In that case, the specified separated sound may be processed so as to be subtracted from the mixed sound.

図２（ｃ）は、本実施の形態１における特定音の有無の判定を行う分離音加工の流れ図である。ここで、特定音とは、聞きたくないとして分離音選択時にあらかじめ特定された音をいう。Ｓ７１６において、加工音構造特徴ＤＢ１０８に保存されている聞きたくないとして特定された特定音の音構造特徴量に対応する入力分離音の音構造特徴量を求め、Ｓ７１７において、特徴量間のベクトルの距離を例えばユークリッド距離によって求める。Ｓ７１８において、この距離が所定の判定値内にある場合には、入力分離音は特定音と同じ音であると判定することができる。同じ音と判定された場合にはその場合には、Ｓ７１９において、入力分離音を波形に変換した上で，Ｓ７２０混合音から入力分離音を減算する。この操作を、特定音に対して入力分離音ごとに行うことで、特定音の有無を確認した上で、特定された聞きたくない音を低減させることができる。 FIG. 2C is a flowchart of separated sound processing that determines the presence or absence of a specific sound in the first embodiment. Here, a specific sound is a sound identified in advance, at separated sound selection time, as one the user does not want to hear. In S716, the sound structure features of the input separated sound are computed that correspond to the sound structure features of the specific sound stored in the processed sound structure feature DB 108. In S717, the distance between the feature vectors is computed, for example as a Euclidean distance. In S718, if this distance is within a predetermined decision value, the input separated sound is judged to be the same sound as the specific sound. In that case, the input separated sound is converted to a waveform in S719 and subtracted from the mixed sound in S720. Performing this operation for each input separated sound against the specific sound makes it possible to confirm whether the specific sound is present and then reduce the identified unwanted sound.

なお距離尺度には、正規化した方向余弦を用いてもよい。さらに、所定時間方向にも考慮を加えて、前記距離所定値を、トラッキングを考慮した範囲に時間的に変動するように設定してもよい。 A normalized direction cosine may also be used as the distance measure. The time direction may additionally be taken into account, with the predetermined distance value set to vary over time within a range that allows for tracking.
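
The matching step of S717 and S718 can be sketched with both distance measures mentioned here. The feature vectors are assumed to be plain lists of numbers, and the function names and the threshold passed by the caller are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the normalized direction cosine; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def is_specific_sound(stored, candidate, threshold, dist=euclidean):
    """S718: same sound if the distance is within the decision value."""
    return dist(stored, candidate) <= threshold
```

The cosine variant ignores overall level, so it treats a quieter copy of the stored sound as a match, whereas the Euclidean variant does not.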

かかる構成によって、さらに、聞きたくない音として特定された分離音が一時的に止んだ場合にも、その有無を判断することによって、聞きたくない音として特定された分離音以外には影響が出ないように、音を加工することができる。 With this configuration, even when a separated sound identified as unwanted stops temporarily, determining whether it is present allows the sound to be processed without affecting anything other than the separated sound identified as unwanted.

次に、分離音選択時に関わる構成について説明する。 Next, the configuration involved in separated sound selection is described.
不快感検知部１３１は、利用者が不快な状態にあることを検知する不快感検知手段に相当する。不快感検知部１３１では、利用者による不快感ボタンの押下を検知することにより、不快感を検知する。 The discomfort detection unit 131 corresponds to discomfort detection means for detecting that the user is in an uncomfortable state. It detects discomfort by detecting that the user has pressed the discomfort button.

候補音選択決定部１０５は、不快感検知手段によって利用者が不快な状態にあることが検知されると、分離された音である各分離音間の関係を評価し、評価結果に基づいて、加工対象候補の分離音を推定する候補音選択決定手段に相当する。候補音選択決定部１０５では、所定時間分記録しておいた分離された音同士の関係を評価することにより、不快音つまり聞きたくない音の候補を推定する。なお、本実施の形態の構成（図１）では、音構造特徴比較部１３３とマスカー分離音推定部１３４からなる。音構造特徴比較部１３３は、前記各分離音から形成された信号を比較することによって、前記各分離音間の関係を評価する。マスカー分離音推定部１３４は、前記評価結果に基づいて、前記各分離音のうち、支配的となる分離音又は支配的となる周波数帯域を推定することによって、前記加工対象候補の分離音を推定する。マスカー分離音推定部１３４のマスカーとは、比較される分離音のうち、支配的な音や帯域を指す。 The candidate sound selection determination unit 105 corresponds to candidate sound selection determining means that, when the discomfort detection means detects that the user is in an uncomfortable state, evaluates the relationships among the separated sounds and, based on the evaluation result, estimates the separated sounds that are candidates for processing. It estimates candidates for the unpleasant sound, that is, the sound the user does not want to hear, by evaluating the relationships among the separated sounds recorded over a predetermined time. In the configuration of this embodiment (FIG. 1), it consists of a sound structure feature comparison unit 133 and a masker separated sound estimation unit 134. The sound structure feature comparison unit 133 evaluates the relationships among the separated sounds by comparing signals formed from them. The masker separated sound estimation unit 134 estimates the candidate separated sounds for processing by estimating, based on the evaluation result, the dominant separated sound or the dominant frequency band among the separated sounds. The "masker" in the masker separated sound estimation unit 134 refers to the dominant sound or band among the separated sounds being compared.

加工音提示特定部１０６は、推定された加工対象候補の分離音を利用者に提示して、選択を受け付け、選択された分離音を特定する候補音提示特定手段に相当する。加工音提示特定部１０６では、候補音選択決定部１０５において決定した候補音を利用者に提示し、利用者に確認のボタンなどをおさせることによって候補音提示特定を行なう。 The processed sound presentation specifying unit 106 corresponds to candidate sound presentation specifying means that presents the estimated candidate separated sounds to the user, accepts a selection, and identifies the selected separated sound. It presents the candidate sounds determined by the candidate sound selection determination unit 105 to the user and identifies the target by having the user press a confirmation button or the like.

加工音構造特徴更新部１０７は、加工音提示特定部で特定された、イライラする原因となる不快音の特徴を加工音構造特徴ＤＢ１０８に保存あるいは更新する。ここでいう音特徴とは、Ｓ７１４においてグルーピングされたピークの周波数や時間波形を特徴量化した値を指すものとする。この加工音構造特徴ＤＢ１０８の保存分離音構造特徴を有している場合に、逆相波形を作成することによって音を消滅させるように加工を行う。 The processed sound structure feature update unit 107 stores or updates the feature of the unpleasant sound that causes the frustration specified by the processed sound presentation specifying unit in the processed sound structure feature DB 108. The sound feature here refers to a value obtained by characterizing the peak frequency and time waveform grouped in S714. In the case where the processed sound structure feature DB 108 has the stored separated sound structure feature, processing is performed so that the sound is extinguished by creating a reverse phase waveform.

次に、分離音選択時の処理の流れについて説明する。 Next, the flow of processing at separated sound selection time is described.
図３は、本実施の形態１における分離音選択時の処理の流れ図である。不快感検知部１３１に相当するＳ６０１において、利用者によるボタンの押下などを利用して、不快感を検知すると、候補音選択決定部１０５に相当するＳ６０２において、音分離部１０３から受け取った分離音の中から、分離音同士の関係を評価することによって、イライラする原因となる不快音の候補を推定する。加工音提示特定部１０６に相当するＳ６０３において、その候補音を提示し、利用者に確認のボタンなどをおさせることによって候補音の提示および特定を行なう。そして、加工音構造特徴更新部１０７に相当するＳ６０４において、特定された加工音構造特徴を、加工音構造特徴ＤＢ１０８に保存する。保存された加工音構造特徴は、分離音加工時に利用される。 FIG. 3 is a flowchart of the processing at separated sound selection time in the first embodiment. In S601, corresponding to the discomfort detection unit 131, discomfort is detected, for example from the user pressing a button. Then, in S602, corresponding to the candidate sound selection determination unit 105, candidates for the unpleasant sound causing the irritation are estimated from the separated sounds received from the sound separation unit 103 by evaluating the relationships among them. In S603, corresponding to the processed sound presentation specifying unit 106, the candidate sounds are presented and one is identified by having the user press a confirmation button or the like. In S604, corresponding to the processed sound structure feature update unit 107, the sound structure features of the identified sound are stored in the processed sound structure feature DB 108. The stored features are used at separated sound processing time.

なお、Ｓ６０２で決定された候補音が一つであれば、Ｓ６０３で確認せずにそのまま、Ｓ６０４でその音構造特徴を保存するようにしてもよい。 If there is one candidate sound determined in S602, the sound structure feature may be stored in S604 without confirmation in S603.

ここからは、図３の各ステップにおける処理の詳細について説明する。 The processing in each step of FIG. 3 is described in detail below.
図４（ａ）は、本実施の形態１における不快感検知部１３１の動作を示す流れ図である。ここでは、Ｓ３０１で、利用者による不快感ボタンの押下を検知すると、不快感を検知する簡単な構成である。利用者による不快感ボタンの押下を検知することにより、利用者の不快感を検知する方法は、簡単だが、実際に、利用者自身の主観を捕らえるためには、その利用者自身の主観が反映されるため、確実な手段といえる。複数の音が鳴っている際に不快感ボタン押されたとき、その数秒前までの音のみを対象に、不快音を選択することにも用いることができる。 FIG. 4A is a flowchart showing the operation of the discomfort detection unit 131 in the first embodiment. This is a simple configuration in which discomfort is detected when S301 detects that the user has pressed the discomfort button. Detecting discomfort from a button press is simple, but because it directly reflects the user's own subjective judgment, it is a reliable means. When the discomfort button is pressed while several sounds are present, the press can also be used to restrict the selection of the unpleasant sound to sounds from the preceding few seconds.

図４（ｂ）は、不快感検知ボタンの押下タイミングから分離音の開始立ち上がり時点までの時間によって、候補音として分離音を選択することを説明する模式図である。同図では、不快感検知の下向きの矢印が、不快感検知ボタンの押されたタイミングを示している。 FIG. 4B is a schematic diagram for explaining that the separated sound is selected as the candidate sound according to the time from the pressing timing of the discomfort detection button to the start rising time of the separated sound. In the figure, the downward arrow of the discomfort detection indicates the timing when the discomfort detection button is pressed.

この例では、分離音Ａは、音の立ち上がりが不快感検知から所定時間以上離れているためイライラへの影響は薄れているものと捕らえて、イライラする原因となる不快音の候補は、分離音Ｂと分離音Ｃに限定することができる。 In this example, the separated sound A is regarded as having a weak influence on the frustration because the rise of the sound is more than a predetermined time away from the detection of the unpleasant feeling, and the unpleasant sound candidate causing the frustration is the separated sound. B and separated sound C can be limited.

なお、この所定の時間は、人によって個人差があることも考えられるので、特定された不快音の立ち上がり時刻とボタン押下の時刻の差分時間をもとに、決定するなどして、可変的に設定できるようにしても良い。 Note that this predetermined time may vary from person to person, so it can be variably determined, for example, based on the difference between the rise time of the specified unpleasant sound and the time when the button is pressed. It may be settable.

また、このように決めた所定時間とは別に、利用者がうるさいと感じる音の大きさの基準値を設定して、分離音がこの基準値よりも大きい場合に、不快感検知の時刻から分離音の開始時刻である立ち上がりまでの時間を短く設定するようにして、候補音の絞込みを行ってもよい。これは、例えば驚かされるほどの極端に大きい音が発生した場合や暗騒音そのものがうるさいと感じるレベルにある場合には、より敏感に音を不快に感じる傾向があるものと推測できるためである。すなわち、驚かされるほどの極端に大きい音が発生した場合には、利用者がその音に対して不快に感じる可能性が高い。従って、不快感検知の時刻から長くさかのぼって不快感を招いていると思われる他の分離音を探さなくても、不快感検知の時刻の直前に発生した極端に大きい音が不快音であるとみなしても、その確実性が高いからである。 In addition to the predetermined time determined in this way, a reference value is set for the loudness level that the user feels noisy, and when the separated sound is larger than this reference value, it is separated from the time of detection of discomfort. The candidate sounds may be narrowed down by setting the time until the rise, which is the start time of the sound, to be short. This is because, for example, when an extremely loud sound is generated that is surprising or when the background noise itself is at a level where it feels noisy, it can be estimated that the sound tends to be more sensitive and uncomfortable. That is, when an extremely loud sound that is surprising is generated, the user is likely to feel uncomfortable with the sound. Therefore, an extremely loud sound generated immediately before the time of discomfort detection is an unpleasant sound without searching for other separated sound that seems to be causing discomfort going back from the time of discomfort detection. This is because the certainty is high.

このような構成によって、大きな音に対して感度よく、候補音の絞込みができるようになるため、よりすばやく不快感を生じさせる音を特定することができる。 With such a configuration, the candidate sounds can be narrowed down with high sensitivity to loud sounds, so that sounds that cause discomfort can be identified more quickly.
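
The narrowing of candidates by onset time, with a shorter look-back when a loud sound is present, can be sketched as follows. The sketch reads the passage as "any separated sound above the loudness reference shortens the window for all sounds", which is one possible interpretation. The 10-second default follows the example given for S403, while `loud_db` and `loud_window_s` are hypothetical values.

```python
def select_candidates(onsets, levels_db, press_time,
                      window_s=10.0, loud_db=80.0, loud_window_s=2.0):
    """Return ids of separated sounds that remain candidates.

    onsets    -- dict: sound id -> onset time in seconds
    levels_db -- dict: sound id -> level in dB (assumed measure of loudness)
    """
    # A sound above the loudness reference shortens the look-back window
    if any(level > loud_db for level in levels_db.values()):
        window_s = loud_window_s
    # Keep sounds whose onset lies within the window before the button press
    return [sid for sid, t in onsets.items()
            if 0.0 <= press_time - t <= window_s]
```

With onsets at 0 s, 12 s, and 19 s and a press at 20 s, the default window keeps the last two sounds, matching the situation of separated sounds B and C in FIG. 4B.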

図５（ａ）は、本実施の形態１における候補音選択決定部１０５の処理の流れ図である。ここでは、分離音同士の関係を評価することにより、どの分離音が不快音つまり聞きたくない音であるかを推定する。 FIG. 5A is a flowchart of processing of the candidate sound selection determination unit 105 in the first embodiment. Here, by evaluating the relationship between separated sounds, it is estimated which separated sounds are unpleasant sounds, that is, sounds that one does not want to hear.

まず、Ｓ４０１において、不快感検知時の分離音の数が一つかどうかを判定する。分離音数が一つであれば、この分離音がイライラする原因である不快音であると特定し、終了する。分離音が１つ以上あった場合には、Ｓ４０２において変更が必要であれば、Ｓ４０３における分離音の選択に用いられる所定時間の指定を行い、Ｓ４０３において、図４（ｂ）に示すように、不快感検知時刻と各分離音の立ち上がり時刻の間隔が所定時間内（例えば１０秒）にあるかどうかの判定を行なうことにより、加工対象の選択候補とする分離音かどうかを判定する。なお、不快感検知時刻と各分離音の立ち上がり時刻の間隔を示す所定時間は、ユーザが不快な音を聞いてからボタンを押すまでに要する時間によっても変わってくる。従って、例えば、ユーザが前述の所定時間を示す秒数を手入力することなどにより、所定時間を設定できるようにしてもよい。この判定の結果、Ｓ４０４において、候補となる分離音が一つしかない場合には、その分離音が不快音であるとして特定する。この段階では、図２０の要因１と要因４の場合に分離音が単独であった場合が検出されていることになる。 First, S401 determines whether there is exactly one separated sound at the time discomfort is detected. If so, that separated sound is identified as the unpleasant sound causing the irritation, and the process ends. If there is more than one separated sound, S402 specifies, if a change is needed, the predetermined time used for selecting separated sounds in S403. S403 then determines, as shown in FIG. 4B, whether the interval between the discomfort detection time and the onset time of each separated sound is within the predetermined time (for example, 10 seconds), and thereby whether that separated sound is a selection candidate for processing. The predetermined time also depends on how long the user takes to press the button after hearing the unpleasant sound. It may therefore be made settable, for example by having the user enter the number of seconds manually. If, as a result of this determination, only one candidate separated sound remains in S404, that separated sound is identified as the unpleasant sound. At this stage, the cases of factor 1 and factor 4 in FIG. 20 in which the separated sound occurred alone have been detected.

そして、分離音が複数ある場合には、Ｓ４０５において、支配的分離音抽出ルーチンにおいて、支配的として算出された候補音を確定する。Ｓ４０６において支配的な候補音が見つかったかどうか判定を行ない、みつかったと判断された場合には、Ｓ４０７において、その音が聞きたくない音になる可能性が高いと判断して、聞きたくない音としてユーザに優先的に提示するように順番を入れ替える。また、逆に、支配的な候補音が決定されない、つまり各音がそれぞれの音に対してある帯域で支配的になっている場合には、これは図２０の要因の３に該当すると判断する。この場合、Ｓ４０７においては、一つずつ順番にユーザに提示を行ない、どれを残し、どれを消すかの指示を与えてもらう必要がある。この段階では、要因の１、２、４のいずれかで、複数の分離音が重なっている場合に該当する。 If there are multiple separated sounds, S405 runs the dominant separated sound extraction routine and fixes the candidate sound computed as dominant. S406 determines whether a dominant candidate sound was found. If one was found, S407 judges that this sound is likely to be the unwanted sound and reorders the candidates so that it is presented to the user first. Conversely, if no dominant candidate sound is determined, that is, each sound is dominant over the others in some band, this is judged to correspond to factor 3 in FIG. 20. In this case, S407 must present the sounds to the user one at a time and have the user indicate which to keep and which to remove. This stage corresponds to any of factors 1, 2, and 4 with multiple separated sounds overlapping.
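
The decision flow of S401 to S407 can be summarized in a short sketch. It assumes the time-window check of S403 and the dominance routine of S405 have already been run elsewhere and their results are passed in. The function name and return shape are hypothetical.

```python
def decide_candidates(separated, in_window, dominant):
    """Return (candidates in presentation order, needs_confirmation).

    separated -- ids of all separated sounds at detection time
    in_window -- ids whose onset lies within the predetermined time (S403)
    dominant  -- id found by the dominance routine (S405), or None
    """
    if len(separated) == 1:            # S401: a single sound is the cause
        return list(separated), False
    if len(in_window) == 1:            # S404: a single candidate remains
        return list(in_window), False
    if dominant in in_window:          # S406 -> S407: present it first
        rest = [s for s in in_window if s != dominant]
        return [dominant] + rest, True
    return list(in_window), True       # no dominant sound: present in turn
```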

図５（ｂ）は、図５（ａ）中のＳ４０５の支配的分離音抽出における処理の手順を示す流れ図であり、マスカー分離音推定部１３４の機能に対応するものである。ここでは、分離音同士の関係を評価することにより支配的な音を決定する。 FIG. 5B is a flowchart showing a processing procedure in the dominant separated sound extraction of S405 in FIG. 5A, and corresponds to the function of the masker separated sound estimation unit 134. Here, the dominant sound is determined by evaluating the relationship between the separated sounds.

まず比較するＳ５０２で、分離音の周波数帯域ごとに、最大のパワーを持つ信号を選択し、Ｓ５０３において、１対の分離音が有する、帯域代表信号（帯域で最大のパワーを持つ信号のこと）同士を比較し、Ｓ５０４において、さらに時間方向の遷移をみることによって、Ｓ５０５でマスカー分離音を決定する。 First, in S502, the signal with the maximum power is selected for each frequency band of each separated sound to be compared. In S503, the band representative signals (the signal with the maximum power in each band) of a pair of separated sounds are compared. In S504, the transition over time is also examined, and in S505 the masker separated sound is determined.

ここで、図６は、この流れを具体例で示した、本実施の形態１におけるＳ４０５の支配的分離音決定の処理を説明するための図である。 Here, FIG. 6 is a diagram for explaining the dominant separated sound determination process of S405 in the first embodiment, showing this flow as a specific example.

ここでは、支配的な帯域または音を決定する処理の流れを、混合音が次の３つの種類の音（Ａ純音、Ｂ、音声、Ｃ，暗騒音）から構成されており、これらが分離音として抽出された例を用いて説明する。ここでは、後述のように波形を重ねて比較するため、それぞれを、実線、破線、および一点鎖線で表記する。また、この図では、簡単なため、それぞれの波形は横軸を周波数とした線スペクトルで表されている。例えば、純音の波形は、単一周波数の周波数スペクトルで表される。また、音声の波形は、基本周波数とその高調波成分でピークを持つ周波数スペクトルの集まりで表される。暗騒音は、周波数に関係なく概ね一様な大きさの周波数スペクトルの集まりで表される。 The flow of determining the dominant band or sound is explained here using an example in which the mixed sound consists of three kinds of sound (A: pure tone, B: speech, C: background noise) that have been extracted as separated sounds. Because the waveforms are overlaid for comparison as described later, they are drawn with a solid line, a broken line, and a dash-dot line, respectively. For simplicity, each waveform in the figure is shown as a line spectrum with frequency on the horizontal axis. For example, a pure tone is represented by a spectrum at a single frequency. Speech is represented by a set of spectral lines with peaks at the fundamental frequency and its harmonics. Background noise is represented by a set of spectral lines of roughly uniform magnitude regardless of frequency.

まず、図６ａ）の各分離音波形から、周波数軸Ｆを所定の幅を持つ帯域に区切って得られる音構造信号の所定帯域ごとの代表値（例えば、帯域の中でパワーが最大となる周波数スペクトルをさす。）をｂ）帯域代表信号として選択し、このｂ）帯域代表信号をｃ）音構造比較信号として得る。つまりここでは、ｂ）とｃ）は同一の信号波形であるが、図１５との比較を簡単にするため異なる名称を有している。ここで、図６ａ）の横軸は周波数、縦軸は該当周波数のスペクトルパワーを示している。また図６ｂ）からｄ）は、ともに同じスケールを持つ。 First, from each separated sound waveform in FIG. 6a), the frequency axis F is divided into bands of a predetermined width, and a representative value of the sound structure signal for each band (for example, the spectral component with the maximum power in the band) is selected as the b) band representative signal. This b) band representative signal is then used as the c) sound structure comparison signal. Here b) and c) are thus the same signal waveform, but they carry different names to simplify the comparison with FIG. 15. In FIG. 6a), the horizontal axis is frequency and the vertical axis is the spectral power at that frequency. FIGS. 6b) to 6d) all share the same scale.

なお、この模式的な例では、所定帯域は５つの同一周波数帯幅の線形なバンド幅を想定している。しかし、ヒトの聴覚機構に関する知見として知られる、バークスケール等で記述される非線形なバンド幅を持つ臨界帯域をこの所定帯域として利用してもよい。また、帯域の代表値の選択方法は、帯域中でもっとも大きなパワーを持つ周波数成分にしてもよいし、帯域を臨界帯域とする場合には、その中心周波数のパワーを用いるようにしてもよい。 In this schematic example, the predetermined band is assumed to have a linear bandwidth of five identical frequency bandwidths. However, a critical band having a non-linear bandwidth described in a Bark scale or the like, which is known as knowledge about the human auditory mechanism, may be used as the predetermined band. The method of selecting the representative value of the band may be a frequency component having the largest power in the band, or when the band is a critical band, the power of the center frequency may be used.
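
Selecting the band representative values for the linear five-band case can be sketched as follows. The spectrum is assumed to be a list of `(frequency, power)` pairs, and the function name, the `f_max` parameter, and the use of `None` for empty bands are hypothetical choices for illustration. A Bark-scale variant would replace the equal-width band index with a lookup of critical band edges.

```python
def band_representatives(spectrum, f_max, n_bands=5):
    """Return the maximum power found in each equal-width band.

    spectrum -- list of (freq_hz, power_db) pairs (assumed layout)
    f_max    -- upper edge of the analysed frequency range in Hz
    """
    width = f_max / n_bands
    reps = [None] * n_bands          # None marks a band with no component
    for freq, power in spectrum:
        idx = min(int(freq // width), n_bands - 1)
        if reps[idx] is None or power > reps[idx]:
            reps[idx] = power        # keep the strongest component per band
    return reps
```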

次に、図６d）支配の帯域の評価において、比較する分離音について、分離音の対応する帯域ごとに、どちらの振幅が大きいかなどによって支配的となるか否かの判定を行う。ここで、振幅差が例えば、３ｄＢ以内である場合には、支配的な帯域が存在しないと判断する。その結果として、支配的かどうか判定できない帯域は、図中では、"―"記号で示している。ここで波形Ａと波形Ｂとの比較を行う際、Ａ／Ｂのように表記し、以降同様に波形の比較の記号として用いるものとする。 Next, in the evaluation of the dominant band in FIG. 6d, it is determined whether or not the separated sound to be compared becomes dominant depending on which amplitude is larger for each band corresponding to the separated sound. Here, when the amplitude difference is within 3 dB, for example, it is determined that there is no dominant band. As a result, a band that cannot be determined as dominant is indicated by a “-” symbol in the figure. Here, when the waveform A and the waveform B are compared, they are expressed as A / B, and are used similarly as symbols for waveform comparison.

この例では、Ａ／ＣおよびＡ／Ｂでの比較では、Ａが支配的であるのは、Ａの純音の散在する帯域だけであり、それ以外の帯域では、ＢとＣが支配的である。そこで、Ｂ／Ｃを比較すると、支配的な帯域数で比較すると、Ｂが２つ、Ｃが３つとＣが優位である。 In this example, in the comparison between A / C and A / B, A is dominant only in the band in which A's pure tone is scattered, and B and C are dominant in other bands. . Therefore, when comparing B / C, when comparing with the dominant number of bands, C is superior with 2 for B and 3 for C.
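
The per-band dominance evaluation can be sketched as follows, using the 3 dB margin given in the text. The band representative values are assumed to be dB levels in lists of equal length, and the function name and the sample values in the usage note are hypothetical.

```python
def dominant_bands(reps_x, reps_y, name_x, name_y, margin_db=3.0):
    """Compare two separated sounds band by band.

    Returns one label per band: the name of the dominant sound, or "-"
    when the level difference is within the margin.
    """
    result = []
    for x, y in zip(reps_x, reps_y):
        diff = x - y
        if abs(diff) <= margin_db:
            result.append("-")       # no dominant sound in this band
        else:
            result.append(name_x if diff > 0 else name_y)
    return result
```

For instance, `dominant_bands([50, 48, 30, 20, 35], [40, 40, 40, 40, 40], "B", "C")` labels two bands B and three bands C, the same split as the B/C comparison described above.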

そして、図６ｅ）マスカー音の決定において、分離音間の関係を全帯域で評価し、支配的な音の基準に基づいて、支配的な音を決定する。 6e) In determining the masker sound, the relationship between the separated sounds is evaluated over the entire band, and the dominant sound is determined based on the dominant sound standard.

この例では、時刻Ｔ−２から現在時刻Ｔにおける、分離音の支配帯域評価の結果を列挙して、変化がある帯域を抽出する。この結果、Ａ／ＣおよびＡ／Ｂでは、Ａが支配的であると判断できる。Ｂ／Ｃの比較の結果では、ＢとＣの変化の個数が同等であるため、支配的な分離音はないと判断できる。最終的には、Ａの音が支配的であると判断できる。 In this example, the results of evaluation of the dominant band of separated sound from the time T-2 to the current time T are listed, and a band with a change is extracted. As a result, in A / C and A / B, it can be determined that A is dominant. As a result of the comparison of B / C, it can be determined that there is no dominant separated sound because the number of changes of B and C is equal. Eventually, it can be determined that the sound of A is dominant.

ここで、図６ｅ）の横軸には帯域ごとの支配的な分離音の識別子の系列が示されており、縦軸には、フレーム単位の時刻Ｔ−２、時刻Ｔ−１などが示されている。なお、フレーム単位の時刻Ｔ−２、時刻Ｔ−１の区切りは、例えば、３０ｍｓ程度が好ましい。 Here, the horizontal axis of FIG. 6e) shows a sequence of dominant separated sound identifiers for each band, and the vertical axis shows time T-2, time T-1, etc. in units of frames. ing. For example, the interval between time T-2 and time T-1 in frame units is preferably about 30 ms.

なお、上記とは異なる支配的な音の判断基準として、ある帯域における対応分離音と振幅差が極端に大きい場合（例えば、２０ｄＢ）には、帯域内の影響だけではなく、分離音全体に与える影響が大きいと考えられる。従って、少なくとも１つの帯域で対応する分離音との振幅差が極端に大きい分離音があれば、それを支配的な音として決定してもよい。また、別の支配的な音の判断基準として、（１）分離音の所定帯域の時間振幅変化が大きい場合（例えば、ある所定帯域で２０ｄＢ以上の振幅変化がある場合）や、（２）一定時間内に所定回数以上の大きい振幅変動がある場合（例えば１０ｄＢ以上の振幅変動が所定時間（例えば１０秒間）あたり所定回数（例えば５回）以上ある場合）には、音の大きさの変動が激しいと判断して、支配的と捉えることもできる。これらの基準は、ヒトがより大きな音や音の立ち上がりや立下りなどの変動により注目する特性があることに基づいている。 In addition, as a criterion for determining a dominant sound different from the above, when the amplitude difference with the corresponding separated sound in a certain band is extremely large (for example, 20 dB), it is applied not only to the influence in the band but to the entire separated sound. The impact is considered large. Therefore, if there is a separated sound having an extremely large amplitude difference from the corresponding separated sound in at least one band, it may be determined as a dominant sound. In addition, as another criterion for determining a dominant sound, (1) when the temporal amplitude change of a predetermined band of separated sound is large (for example, when there is an amplitude change of 20 dB or more in a predetermined band), or (2) constant When there is a large amplitude fluctuation of a predetermined number or more in time (for example, when an amplitude fluctuation of 10 dB or more is a predetermined number (for example, 5 times) or more per predetermined time (for example, 10 seconds)) Judging that it is intense, it can be regarded as dominant. These criteria are based on the fact that humans are more interested in fluctuations such as louder sounds and rising and falling edges.
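
The alternative criteria can be sketched as two small checks. The thresholds of 20 dB, 10 dB, and 5 occurrences follow the examples in the text. Counting frame-to-frame changes as "amplitude fluctuations" and the function names are assumptions made for illustration, and the caller is expected to pass only the frames of the observation period (for example, 10 seconds).

```python
def is_dominant_by_gap(reps_x, reps_y, gap_db=20.0):
    """True if x exceeds y by an extreme margin in at least one band."""
    return any(x - y >= gap_db for x, y in zip(reps_x, reps_y))

def is_dominant_by_fluctuation(levels_db, jump_db=10.0, min_jumps=5):
    """True if one band shows enough large level changes in the period.

    levels_db -- per-frame levels of one band over the observation period
    """
    jumps = sum(1 for prev, cur in zip(levels_db, levels_db[1:])
                if abs(cur - prev) >= jump_db)
    return jumps >= min_jumps
```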

図６ｅ）のＢ／Ｃの波形の比較結果のように、優先的な順位が定められない候補音の場合、候補音提示特定部１０６で、支配的な帯域が多いほうの分離音から、先に提示しても良い。なお、時間方向に優位な帯域数をカウントしてもよい。すなわち、現時点により近い時刻（例えば、フレームなど）の分離音同士で、その分離音が支配的な帯域の数をカウントして、カウントされた帯域の数が多いものほど、加工対象の候補音として先に提示するようにしてもよい。また、このような場合、図４（ｂ）での不快感検知ボタン押下のタイミングからさかのぼって、より近い範囲内の分離音の優先度を上げるようにしてもよい。すなわち、図４（ｂ）の場合に、不快感検知ボタンの押下のタイミングからさかのぼって例えば、５秒の間に立ち上がりのある分離音を加工対象の候補としたとすると、この場合のように、支配的な帯域の数が多いものほど加工対象の候補音として優先する場合には、不快感検知ボタンの押下のタイミングからさかのぼって例えば、３秒の間だけ支配的な帯域の数を調べればよい。 For candidate sounds whose priority cannot be settled, as in the B/C comparison result in FIG. 6e), the candidate sound presentation specifying unit 106 may present first the separated sound with the larger number of dominant bands. The number of dominant bands may also be counted along the time direction. That is, among separated sounds at times (for example, frames) closer to the present, the number of bands in which each separated sound is dominant is counted, and the sounds with more counted bands are presented earlier as candidates for processing. In such a case, the priority of separated sounds may also be raised the closer they lie to the moment the discomfort detection button was pressed in FIG. 4B. For instance, if in FIG. 4B the candidates for processing are the separated sounds with an onset within, say, 5 seconds before the button press, then when prioritizing by the number of dominant bands it suffices to examine the number of dominant bands only within, say, 3 seconds before the button press.

あるいは、支配的な帯域を決め、その帯域に限り判定を行なうようにしても良い。その結果として、全ての帯域において、例えば、分離音Ｃが優位になった場合には、確実にＣがイライラの原因である不快音であると、判断することできるので、音を提示せずに候補を特定決定しても良い。支配的な帯域は、利用者の聴力、特に聞こえやすい帯域に基づいて決定しても良い。また、不快音として特定された音の帯域から決定するようにしても良い。かかる構成により、不快音推定に必要な帯域のみを計算することにより、推定精度を落とさずに推定に必要な処理を削減することができる。 Alternatively, a dominant band may be determined and the determination may be performed only for that band. As a result, in all bands, for example, when the separated sound C is dominant, it can be determined that C is an unpleasant sound that causes irritation, so no sound is presented. Candidates may be identified and determined. The dominant band may be determined based on the user's hearing, especially the band that is easy to hear. Further, it may be determined from the band of the sound specified as an unpleasant sound. With this configuration, by calculating only the band necessary for unpleasant sound estimation, it is possible to reduce processing necessary for estimation without reducing the estimation accuracy.

図７は、本実施の形態１における候補音提示特定部１０６の処理を示す流れ図である。Ｓ４０８において、候補音選択決定部１０５において優先順位をつけて決定された候補音を、ユーザに提示し、イライラの原因となる音を特定する。このとき、イライラの原因となる音を特定するために選択ボタンを設けて、ユーザが、提示された候補音の中からイライラの原因となる音を選択するためにボタンを押すことにより決定する。 FIG. 7 is a flowchart showing the processing of the candidate sound presentation specifying unit 106 in the first embodiment. In S 408, the candidate sounds determined by the candidate sound selection determination unit 105 with priorities are presented to the user, and the sound causing the irritation is specified. At this time, a selection button is provided in order to specify the sound that causes irritation, and the user presses the button to select the sound that causes irritation from the presented candidate sounds.

また、提示する音の内容は、候補音だけを聞かせても良いし、分離前の音から候補音の差分をとった音でも構わない。 Further, the content of the sound to be presented may be only the candidate sound, or may be a sound obtained by subtracting the candidate sound from the sound before separation.

さらに、Ｓ４０１やＳ４０４において候補音が一つに決まった場合においても、必ず候補音を提示してユーザに確認するようにさせることによって、不快音の音特徴の無駄な更新を行うことを防ぐことができる。 Furthermore, even when only one candidate sound is determined in S401 or S404, it is possible to prevent unnecessary updating of the sound characteristics of unpleasant sound by always presenting the candidate sound and confirming it to the user. Can do.

また、支配的な音の順位が決まらないときには、候補音となる分離音全てを取り除いた音を聞かせて、その状態のほうが好ましいかどうか一括してユーザに確認してもよい。こうすることによって、まずはユーザがうるささから開放されて静かにしたいという思いに対しては早く答えることができる。 When the order of dominant sounds cannot be settled, the user may instead be played the sound with all candidate separated sounds removed and asked in one step whether that state is preferable. This quickly satisfies the user's immediate wish to be free of the noise and have quiet.

かかる構成によれば、不快感検知時の直前の分離音同士の関係を評価することにより、イライラする原因となる不快音を推定することにより、すばやく音を特定することができ、不快感も低減させることができる。 According to such a configuration, by evaluating the relationship between the separated sounds immediately before detection of discomfort, it is possible to quickly identify the sound by estimating the discomfort sound that causes frustration, and to reduce discomfort Can be made.

また、候補音提示特定部１０６に、ユーザが聞きたい音を特定するための選択ボタンを設けるとしてもよい。この場合、候補音提示特定部１０６は、図７に示したＳ４０８の手順と同様にして、候補音選択決定部１０５で、例えば、不快音とは逆の優先順位をつけて決定された聞きたい音の候補音を、ユーザに提示する。そして、ユーザが、提示された候補音の中から聞きたい音を選択するためのボタンを押すことにより、ユーザが聞きたい音を決定する。このように決定された聞きたい音は、加工対象の候補音から除外され、ユーザは、聞きたい音が除外された不快音の候補から聞きたくない音を選択するとしてもよい。また、決定された聞きたい音は、加工音構造特徴更新部１０７によって音構造が抽出され、加工音構造特徴ＤＢ１０８の特定音とは別の領域に格納される。音加工部１０４は、加工音構造特徴ＤＢ１０８に格納されている聞きたい音の音構造特徴に基づいて、聞きたい音の同相波形を不快音が除去された混合音に加算する、又は、聞きたい音の分離音から生成される波形だけを増幅して混合音を再構成する。これによって、聞きたい音が利用者にとってより聞こえ易くなるようにすることができるという効果がある。 The candidate sound presentation specifying unit 106 may also be provided with a selection button for identifying a sound the user wants to hear. In this case, following the same procedure as S408 in FIG. 7, the candidate sound presentation specifying unit 106 presents to the user the candidates for the wanted sound, which the candidate sound selection determination unit 105 has determined with, for example, a priority order opposite to that for unpleasant sounds. The user then presses the button to select the wanted sound from the presented candidates, which determines the sound the user wants to hear. The wanted sound determined in this way is excluded from the candidates for processing, and the user may then select the unwanted sound from the unpleasant sound candidates that remain. The sound structure of the determined wanted sound is extracted by the processed sound structure feature update unit 107 and stored in the processed sound structure feature DB 108 in an area separate from the specific sounds. Based on the sound structure features of the wanted sound stored in the processed sound structure feature DB 108, the sound processing unit 104 either adds the in-phase waveform of the wanted sound to the mixed sound from which the unpleasant sound has been removed, or reconstructs the mixed sound by amplifying only the waveform generated from the separated sound of the wanted sound. This has the effect of making the wanted sound easier for the user to hear.

図８は、マセニーの心理モデルと本発明の対応関係を示す図である。以下では、図８に示すマセニーの心理モデル（新生理心理学３巻２章１２−１４ページ北大路書房に記載）をもとに、本発明を対応付けることにより、本発明の効果について説明する。 FIG. 8 is a diagram illustrating a correspondence relationship between Macheny's psychological model and the present invention. Below, the effect of the present invention will be described by associating the present invention with reference to Macheny's psychological model shown in FIG. 8 (described in New Physiological Psychology, Vol. 3, Chapter 2, pages 12-14, Kitaoji Shobo).

このモデルによれば、「ストレスは、自己への要求、生活の変化、役割の要求、イライラすることからなる要求がストレスの開始点であり、恐怖感のように条件付けられた経験が意識的な気づきを伴わないで直接ストレス反応を引き起こすことを除いて、まず要求に気づき、その気づきがことの激しさや重要性を見積もり（一時的評価）、それらに対処するための資源と比較される（二次元評価）。もしその人の対処資源がその要求に対処されるものであれば、要求は挑戦とみなされ、健康状態を維持する。しかし、要求と資源の見積もりが一致せず、資源に対して要求があまりにも強ければ、その要求はストレッサーとみなされ、ストレス反応が生起し、最終的にストレス症状に導かれる。」としている。 According to this model, "stress starts from demands, which consist of demands on the self, life changes, role demands, and irritations. Except where a conditioned experience such as fear triggers a stress response directly, without conscious awareness, the person first becomes aware of the demand, estimates its severity and importance (primary appraisal), and compares it with the resources available for coping with it (secondary appraisal). If the person's coping resources are adequate to the demand, the demand is regarded as a challenge and health is maintained. However, if the estimates of demand and resources do not match and the demand is too strong for the resources, the demand is regarded as a stressor, a stress response occurs, and stress symptoms eventually follow."

このモデルを用いて、本発明を音によるストレスを低減させるものと対応付けることができることを説明する。ストレス源となる音の存在に気づいて上記のようにストレッサーとなりイライラし始めたとき、不快感を検知しイライラの原因となる不快音を特定しその音を加工することによって、図中の太い矢印のように、ストレスの反応を起こしていた状態から抜け出すことができるため、ストレス源となる不快音が原因であるストレスから回避し、またはストレスを低減することができる。 Using this model, it will be explained that the present invention can be associated with the one that reduces the stress caused by sound. When you notice the presence of a stress source sound and become irritated as a stressor as described above, you can detect discomfort, identify the discomfort sound that causes irritation, and process the sound. Thus, it is possible to escape from the state in which the stress reaction has occurred, so that it is possible to avoid or reduce the stress caused by the unpleasant sound that is the source of stress.

また、このことは、ストレス源となる音を特定し消すことによって、ストレス源回避をすることができるため、その人の対処資源を増加させる役目も果たしている。また、帯域の一部だけを変化させることによってストレス源となる不快音の音色を変えることによって、ストレス源の音に対する印象を変えることができる。また、ストレス源を他の音に置き換えることによって、対処資源を増強できる。 This also serves to increase the person's coping resources, because identifying and eliminating the sound that is the stress source allows the stress source to be avoided. Moreover, changing only part of the frequency band alters the timbre of the unpleasant sound, which can change the impression that the stress-source sound makes. In addition, replacing the stress source with another sound can strengthen the coping resources.

（実施の形態１における第二の構成図） (Second configuration diagram in Embodiment 1)
図９は、本実施の形態１の第２の構成である音選択加工装置１１００の構成を示すブロック図である。図９において、図１と同じ構成要素については同じ符号を用い、説明を省略する。 FIG. 9 is a block diagram showing the configuration of a sound selection processing device 1100 that is the second configuration of the first embodiment. In FIG. 9, the same components as those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

図９において、音選択加工装置１１００は、図１の音選択加工装置１００の構成に加えて、加工基準入力部１３５と探索対象音構造特徴ＤＢ１３６とを有している。探索対象音構造特徴ＤＢ１３６は、上記の分離音抽出方法による抽出された分離音の音構造特徴を保存しておく保存部であって、音分離部１０３で対象とする分離音の音構造特徴を用いて音抽出を行なう。加工基準入力部１３５については、後に詳しく説明する。 In FIG. 9, the sound selection processing device 1100 includes a processing reference input unit 135 and a search target sound structure feature DB 136 in addition to the configuration of the sound selection processing device 100 of FIG. 1. The search target sound structure feature DB 136 is a storage unit that stores the sound structure features of the separated sounds extracted by the separated sound extraction method described above; the sound separation unit 103 performs sound extraction using the stored sound structure features of the target separated sounds. The processing reference input unit 135 will be described in detail later.

探索対象音構造特徴ＤＢ１３６には、加工音構造特徴更新部１０７において、聞きたくない音が更新される際に、聞きたくない音として特定された分離音以外の分離音が、その音が取得された時間や場所に対応づけて蓄積されている。これにより、電車の中などでは、取得された混合音の中から、まず探索対象音構造特徴ＤＢ１３６に音構造特徴が格納されている電車の中の背景音を除去してしまい、聞きたくない音として加工対象となる候補音をあらかじめ削減しておくことができる。 When the processed sound structure feature updating unit 107 updates the sound that the user does not want to hear, the separated sounds other than the one specified as that sound are accumulated in the search target sound structure feature DB 136 in association with the time and place at which they were acquired. As a result, on a train for example, the background sound of the train whose sound structure feature is stored in the search target sound structure feature DB 136 is removed first from the acquired mixed sound, so that the candidate sounds to be processed as sounds the user does not want to hear can be reduced in advance.

図１０は、本実施の形態１の第二の構成である音選択加工装置１１００における音分離部１０３での分離音探索対象音の有判定を伴う分離音抽出を示す流れ図である。この図は、図２（ｃ）と同じ構成要素については、同じ符号を用い、説明を省略する。図２（ｃ）が特定音を分離音と一致しているかを判定するのに対して、図１０では探索対象音をもとに分離音を抽出するように目的が異なっているが、特徴量の距離によって２つの分離音の一致度を判定して音を加工するという点では共通している。 FIG. 10 is a flowchart showing separated sound extraction, accompanied by a determination of whether a search target sound is present, in the sound separation unit 103 of the sound selection processing device 1100 that is the second configuration of the first embodiment. In this figure, the same components as those in FIG. 2C are denoted by the same reference numerals, and their description is omitted. The purposes differ: FIG. 2C determines whether a specific sound matches a separated sound, whereas FIG. 10 extracts separated sounds on the basis of the search target sounds. The two are alike, however, in that the degree of coincidence between two separated sounds is judged from the distance between their feature quantities before the sound is processed.

Ｓ７２１において、探索対象音と対応する混合音の音構造特徴量を算出する。Ｓ７１７において、混合音と抽出対象音の音構造特徴量との距離を求める。Ｓ７１８において、その距離が所定閾値以下であった場合、探索対象音が見つかったことを意味する。この場合、Ｓ７２２において探索対象音を時間波形に展開して、Ｓ７２３において、混合音から探索対象音との差分をとる。探索対象音ごとに、この操作を繰り返すことで、探索音として既に登録されている分離音を混合音から分離することができる。 In S721, the sound structure feature amount of the mixed sound corresponding to the search target sound is calculated. In S717, the distance between the mixed sound and the sound structure feature amount of the extraction target sound is obtained. If the distance is equal to or smaller than the predetermined threshold value in S718, it means that the search target sound has been found. In this case, the search target sound is developed into a time waveform in S722, and the difference between the mixed sound and the search target sound is obtained in S723. By repeating this operation for each search target sound, a separated sound that has already been registered as a search sound can be separated from the mixed sound.
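The loop over S717, S718, S722, and S723 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration only: the function names, the dictionary keys `features` and `waveform`, the use of Euclidean distance as the distance measure, and the simplification that one feature vector represents the mixed sound for every target. The patent does not specify any of these.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_search_targets(mixed_wave, mixed_features, targets, threshold):
    """Subtract each registered search target that is judged present.

    targets: list of dicts with hypothetical keys "features" and "waveform".
    Returns (residual_waveform, indices_of_targets_found).
    """
    residual = list(mixed_wave)
    found = []
    for index, target in enumerate(targets):
        # S717 / S718: distance test against a predetermined threshold.
        if feature_distance(mixed_features, target["features"]) <= threshold:
            # S722 / S723: expand the target to a time waveform and subtract it.
            residual = [r - w for r, w in zip(residual, target["waveform"])]
            found.append(index)
    return residual, found
```

Repeating the test for every registered target mirrors the statement that the operation is repeated for each search target sound; a real device would compute the features per target in S721 rather than reuse one vector.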

図１１は、本実施の形態１の第二の構成である音選択加工装置１１００における分離音選択時の処理を示す流れ図である。図３と同じ処理についての説明は省略する。Ｓ６０１において不快感検知を行ない、検知されなかった場合の処理が図３に示した構成図の場合と異なる。このとき、Ｓ６２１において、探索対象音に該当する音構造特徴の有無を調べ、探索対象音が存在し、かつ、Ｓ６２２において混合音を構成する音構造特徴に変化があったかどうかを調べ、変化が認められている場合には、Ｓ６２３において探索対象音として、音構造特徴を更新する。 FIG. 11 is a flowchart showing the processing at the time of separated sound selection in the sound selection processing device 1100 that is the second configuration of the first embodiment. Description of the processing that is the same as in FIG. 3 is omitted. Discomfort detection is performed in S601; what differs from FIG. 3 is the processing performed when no discomfort is detected. In that case, S621 checks whether a sound structure feature corresponding to a search target sound is present. If a search target sound exists, S622 checks whether the sound structure features constituting the mixed sound have changed, and if a change is recognized, the sound structure feature is updated as a search target sound in S623.

なお、上記音構造特徴の有無とは、図２（ｃ）および図１０にも記載したＳ７１８での特徴量間の距離を用いて判定を行う。一定時間分の距離を加算して判定するようにしてもよい。 The presence or absence of the sound structure feature is determined using the distance between feature quantities in S718, which appears in both FIG. 2C and FIG. 10. The determination may instead be made on distances accumulated over a fixed period of time.

また、上記音構造特徴の変化とは、上記一定時間の距離の尺度の総和が、探索音特徴量に一致している範囲であったとしても、例えば、周波数特性がスイープするようになったなどにより、一部の距離が特にずれているなどの場合を指すものである。所定距離と比較して探索対象音の更新とは、該当探索対象音の音構造特徴量自体を置き換えてもよいし、新たに別の探索対象音を追加しても良いし、当該探索対象音の変化の程度を表すように音構造特徴量の分布を持たせるようにして、その分布特徴を更新するようにしてもよい。また、実際に更新するかどうかは、利用者に確認させるようにしてもよい。 A change in the sound structure feature refers to a case where, even though the sum of the distance measures over the fixed period stays within the range regarded as matching the search sound feature quantity, some of the individual distances deviate markedly, for example because the frequency characteristic has begun to sweep. When such a deviation is found by comparison with a predetermined distance, the search target sound may be updated by replacing the sound structure feature quantity of that search target sound itself, by adding a new, separate search target sound, or by giving the sound structure feature quantity a distribution that represents the degree of change of that search target sound and updating the distribution. Whether the update is actually carried out may be left to the user to confirm.

かかる構成によれば，以前聞きたくない音として、判定した音に基づいて混合音を分離した上で、例えば、暗騒音やその場に特有の背景音を対象音としてその音構造特徴を探索対象音構造特徴ＤＢ１３６に記録しておき、分離加工時の音分離で利用することにより、よりすばやく正確に分離を行なうことができるようになる。すなわち、過去に聞こえていた探索対象音が現在取得された混合音に含まれているとき、混合音から、まず、探索対象音構造特徴ＤＢ１３６に蓄積されている過去に聞こえていた対象音を分離してしまうので、混合音に含まれている新たな不快音を容易に推定し、提示することが可能となる。また、これにより、音選択加工装置１１００の計算負荷を低減することができる。 According to such a configuration, after the mixed sound has been separated on the basis of sounds previously judged to be sounds the user does not want to hear, the sound structure features of, for example, background noise or background sounds peculiar to the place are recorded as target sounds in the search target sound structure feature DB 136 and used for sound separation at the time of separation processing, so that separation can be performed more quickly and accurately. That is, when a search target sound that was heard in the past is included in the currently acquired mixed sound, the target sounds heard in the past and accumulated in the search target sound structure feature DB 136 are separated from the mixed sound first, which makes it easy to estimate and present a new unpleasant sound included in the mixed sound. This also reduces the calculation load of the sound selection processing device 1100.

また、聞きたくない音が特定されて、その分離音の音構造特徴が加工音構造特徴ＤＢ１０８に保存される際に、現在保存されている音構造特徴量と、新たに保存しようとしている音構造特徴量の距離を、図２（ｃ）のＳ１７１のように判定して、所定閾値以下であった場合には、探索対象音構造特徴ＤＢ１３６への新たな保存をひかえるようにしてもよい。さらに同一の音源によるものかどうかをユーザに確認して、同じ不快音による音構造特徴を複数保存するようにしてもよい。さらには、その複数の不快音の遷移情報を含むようにしてもよい。 When a sound that the user does not want to hear is specified and the sound structure feature of its separated sound is stored in the processed sound structure feature DB 108, the distance between the currently stored sound structure feature quantity and the one about to be stored may be evaluated as in S171 of FIG. 2C, and if it is equal to or less than a predetermined threshold, new storage in the search target sound structure feature DB 136 may be withheld. Furthermore, the user may be asked to confirm whether the sounds come from the same sound source, so that a plurality of sound structure features arising from the same unpleasant sound are stored. Transition information among those unpleasant sounds may also be included.

かかる構成によって、分離音の比較などの内部的な処理では、複数のストリームが加工音構造特徴ＤＢ１０８に登録されて、ユーザに対しては、一つの音源の音が聞きたくない音として提示加工されるようになる。 With such a configuration, a plurality of streams are registered in the processed sound structure feature DB 108 for internal processing such as comparison of separated sounds, while to the user they are presented and processed as the sound of a single sound source that the user does not want to hear.

また、聞きたくない音が特定されて、加工音構造特徴ＤＢ１０８にその音構造特徴量が保存される際に、さらに、分離音の音構造特徴と同時に探索対象音構造特徴ＤＢ１３６に記録しておくようにしてもよい。保存されている分離音の構成から表現される音環境と同じかどうかその分離音構成を比較した結果を記録してもよい。この判定により、たとえば、聞きたくない音が連続的に変化していたとき、その音特徴構造ＤＢに保存されている特定音構造特徴量との距離が所定値を外れていても、その他の分離音の構成（例えば１０）のうち大部分の（例えば８）が一致して同じ環境が続いているということが判定できる。その上で、例えばユーザに確認して、同じ音かどうか判断させることによって、ユーザにとって聞きたくない同じ音が変化しているという情報を保存することができるため、音特徴構造ＤＢの不要な更新を妨げる事ができる。 When a sound that the user does not want to hear is specified and its sound structure feature quantity is stored in the processed sound structure feature DB 108, it may additionally be recorded in the search target sound structure feature DB 136 together with the sound structure features of the separated sounds. The result of comparing the separated sound composition, to decide whether the sound environment is the same as the one represented by the stored separated sound composition, may also be recorded. With this determination, when for example the sound that the user does not want to hear is changing continuously, it can be judged that the same environment continues because most (for example, 8) of the other separated sounds in the composition (for example, 10) match, even if the distance from the specific sound structure feature quantity stored in the sound feature structure DB falls outside the predetermined value. By then asking the user, for example, to confirm whether it is the same sound, the information that the same unwanted sound is changing can be saved, so that unnecessary updates of the sound feature structure DB can be prevented.
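The "8 of 10 separated sounds match" judgment can be sketched as follows. This is only an illustration under assumptions: the function name `same_environment`, the Euclidean metric, the per-sound match threshold, and the default fraction of 0.8 (taken from the 8-of-10 example) are not defined in the patent.

```python
import math


def same_environment(stored, current, match_threshold, min_fraction=0.8):
    """Judge whether the stored separated-sound composition still holds.

    stored, current: lists of feature vectors, one per separated sound.
    A stored sound counts as matched if any current sound lies within
    match_threshold of it (assumed Euclidean distance).
    """
    if not stored:
        return False
    matched = sum(
        1
        for s in stored
        if any(math.dist(s, c) <= match_threshold for c in current)
    )
    return matched / len(stored) >= min_fraction
```

When this returns true while the unwanted sound itself has drifted outside its own threshold, the device could ask the user whether it is still the same sound instead of registering a new entry.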

さらに、図１０のように、探索対象音との比較により分離音抽出を行う際に、探索対象の音の情報として、加工音構造特徴ＤＢ１０８に保存していた音とともに、探索対象音構造特徴ＤＢ１３６に記録しておいた、ＧＰＳなどによって取得される場所や時刻などの情報をもとに、場所や時間の類似度を求めて、それによって、探索対象音を絞り込むようにしてもよい。 Furthermore, when separated sounds are extracted by comparison with search target sounds as in FIG. 10, the information on the sounds to be searched may include, along with the sounds stored in the processed sound structure feature DB 108, information such as the place and time, acquired by GPS or the like, that was recorded in the search target sound structure feature DB 136. The similarity of place and time may be computed from this information and used to narrow down the search target sounds.

かかる構成によれば，混合音の連続性や探索対象音の絞込みを行った上で探索対象音に基づく音分離ができるので、よりすばやく音分離を行なうことができるようになる。 According to this configuration, since sound separation based on the search target sound can be performed after the continuity of the mixed sound and the search target sound are narrowed down, the sound separation can be performed more quickly.

さらに、反射などの場面を考慮して、Ｓ７１７の距離尺度を求めるようにしてもよい。たとえば、Ｓ７１７の距離尺度判定を行う前に、入力音と探索対象音の時間波形あるいは特徴的な周波数成分について時間方向の相互相関関数を計算し、相互相関値に位相遅れを確認できたときには、音源からの到達時間の異なる反射波とみなして、入力音から直接波に対する影響分を差し引くことにより、反射波に対する距離計算を行うようにしてもよい。探索対象音との一定時間の距離が閾値以内つまり、同じとみなされる限り、反射波も、音構造特徴の変化とみなして、消去対象とすることができる。 Furthermore, the distance measure of S717 may be computed with situations such as reflection taken into account. For example, before the distance measure determination of S717, a cross-correlation function in the time direction is calculated between the input sound and the search target sound, using their time waveforms or characteristic frequency components. When a phase lag is observed in the cross-correlation values, it is regarded as a reflected wave with a different arrival time from the sound source, and the distance for the reflected wave may be calculated after the contribution of the direct wave has been subtracted from the input sound. As long as the distance from the search target sound over the fixed period is within the threshold, that is, as long as the two are regarded as the same, the reflected wave is also treated as a change in the sound structure feature and can be made a target for erasure.
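As a rough illustration of detecting a delayed copy by time-direction cross-correlation, the following sketch searches for the lag that maximizes the correlation between a search target waveform and the input. The function name `best_lag`, the restriction to non-negative lags, and the brute-force search are assumptions; the patent only states that a cross-correlation function is computed and a phase lag is checked.

```python
def best_lag(reference, observed, max_lag):
    """Return the lag in samples (0..max_lag) that maximizes cross-correlation.

    reference: search target waveform; observed: input waveform.
    A nonzero result suggests a delayed (for example, reflected) copy.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] with observed[i + lag] over the valid overlap.
        score = sum(
            reference[i] * observed[i + lag]
            for i in range(len(reference))
            if i + lag < len(observed)
        )
        if score > best_score:
            best, best_score = lag, score
    return best
```

Once the lag is known, the delayed component could be time-aligned with the search target before the S717 distance is evaluated, which is one possible reading of the distance calculation for the reflected wave.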

かかる構成によれば、拡散音場のため、音源方向が正確に取得できない状況であったとしても、時間遅延と若干の音特性が変化した反射音も、探索対象音との類似度で判定を行うことにより、より確実に消去することができるものと考えられる。 According to such a configuration, even if the sound source direction cannot be accurately obtained due to the diffuse sound field, the reflected sound having a time delay and a slight change in sound characteristics is also determined based on the similarity to the search target sound. It is considered that the data can be erased more reliably by performing.

人は緊張や不快感を感じるとき、交感神経機能を賦活化させ、たとえば、心拍率、血圧、呼吸率の上昇を招くことが知られている。そこで、皮膚電気抵抗などの生体信号を用いて、不快感を検知しても良い。例えば、音に対する主観的な評価を推定する技術として、聴収音に対する生体信号を用いて快不快音を推定する方法（例えば、特開２００４―３３７２９４号公報）がある。ここで注意したいのは、ストレス源が音によるものかどうかを判定する必要がある点である。そのため、ラウドネス変化を用いた聴覚的評価尺度（例えば、特開２００１―１６５７６６号公報）などを組み合わせることによって、音による不快を感じているかどうかを判断するようにしても良い。 It is known that when a person feels tension or discomfort, sympathetic nerve function is activated, leading for example to increases in heart rate, blood pressure, and respiratory rate. Discomfort may therefore be detected using a biological signal such as skin electrical resistance. For example, as a technique for estimating the subjective evaluation of a sound, there is a method that estimates whether a sound is pleasant or unpleasant from biological signals measured in response to the sound being heard (for example, Japanese Patent Application Laid-Open No. 2004-337294). The point to note here is that it is necessary to determine whether the stress source is actually a sound. For that reason, whether the user feels discomfort due to sound may be judged by combining this with an auditory evaluation scale that uses loudness changes (for example, Japanese Patent Application Laid-Open No. 2001-165766).

図１２は、本実施の形態１の第二の構成である音選択加工装置１１００における不快感検知の処理を示す流れ図である。図４と同じ処理に付いての説明は省略する。この流れ図では、所定値以上のラウドネス変化が所定時間（例えば１０秒）あたり所定回数（例えば５回）以上発生し、生体信号による不快感判定閾値以上かつボタン押下の時にのみ、不快感検知とし、いずれかの条件を満たさない場合には、不快感ボタンの誤押ではないかと判定する。 FIG. 12 is a flowchart showing the discomfort detection processing in the sound selection processing device 1100 that is the second configuration of the first embodiment. Description of the processing that is the same as in FIG. 4 is omitted. In this flowchart, discomfort is detected only when loudness changes of at least a predetermined value occur at least a predetermined number of times (for example, 5 times) per predetermined time (for example, 10 seconds), the biological signal is at or above the discomfort determination threshold, and the button is pressed. If any of these conditions is not satisfied, the device judges that the discomfort button may have been pressed by mistake.
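The three-condition test of FIG. 12 can be sketched as below. The defaults of 10 seconds and 5 changes come from the examples in the text; the function name, the return labels, the `bio_threshold` value, and the treatment of the no-button case as simply "none" are assumptions added for illustration.

```python
def detect_discomfort(loudness_change_times, now, bio_level, button_pressed,
                      window_s=10.0, min_changes=5, bio_threshold=1.0):
    """Classify the current state as discomfort, possible mispress, or none.

    loudness_change_times: timestamps (seconds) at which the loudness change
    reached the predetermined value. bio_threshold is a placeholder value.
    """
    recent = [t for t in loudness_change_times if now - window_s <= t <= now]
    loud_ok = len(recent) >= min_changes
    bio_ok = bio_level >= bio_threshold
    if button_pressed and loud_ok and bio_ok:
        return "discomfort"
    if button_pressed:
        # Button pressed without supporting evidence: suspected erroneous press.
        return "possible_mispress"
    return "none"
```

Returning a distinct label for the suspected mispress lets the caller skip the search target computation in that case, which matches the stated benefit of avoiding wasted calculation.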

かかる構成によれば、ボタンを誤って押した場合かどうか判定することができるので、探索対象音の無駄な計算を抑えることができるようになる。 According to such a configuration, it is possible to determine whether or not the button has been pressed by mistake, so that it is possible to suppress unnecessary calculation of the search target sound.

加工基準入力部１３５は、候補音提示特定部１０６において加工音として特定した音に対して，音の大きさの加工条件を入力する処理部である。このことによって、消すか残すかといったはっきりした加工方法ではなく、柔軟な音加工を行なうことができる。さらには、聞きたい音、どちらでもない音も、その加工対象とすることで、聞きたくない音を、それ以外の音のおよそ半分の大きさ−６ｄＢで加工することによって、聞きたくない音を制御しつつ実際の環境での音に近い環境の音を作ることができる。 The processing reference input unit 135 is a processing unit for inputting a loudness processing condition for the sound specified as the processing sound by the candidate sound presentation specifying unit 106. This allows flexible sound processing rather than an all-or-nothing choice between erasing and keeping a sound. Furthermore, by also making sounds the user wants to hear and sounds that are neither wanted nor unwanted subject to processing, the unwanted sound can be processed to about half the level of the other sounds (−6 dB), which keeps the unwanted sound under control while producing a sound environment close to the actual one.
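The "about half, −6 dB" figure corresponds to an amplitude ratio of 10^(−6/20) ≈ 0.501. A minimal sketch of applying per-sound gains when the mixed sound is reconstructed follows; the function names and the dictionary-based stream layout are hypothetical, and equal-length streams are assumed.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(streams, gains_db):
    """Sum separated streams after applying a per-stream gain in dB.

    streams: dict mapping a label to a list of samples (equal lengths assumed).
    gains_db: dict mapping a label to a gain in dB; missing labels get 0 dB.
    """
    length = len(next(iter(streams.values())))
    mixed = [0.0] * length
    for label, samples in streams.items():
        gain = db_to_gain(gains_db.get(label, 0.0))
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```

For example, `remix(streams, {"unwanted": -6.0})` would attenuate only the stream labeled `unwanted` and leave the others unchanged, which is the kind of graded processing the processing reference input unit 135 is described as enabling.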

かかる構成によれば、特定の場所などに固有の音や利用者自身の声や利用者自身の所有物などが、音構造特徴として抽出しておくことにより、イライラとなりうる原因となる不快音をより絞り込むことが可能となるため、不快音の特定をより確実にできるようになる。 According to such a configuration, unpleasant sounds that cause irritation can be generated by extracting sounds unique to a specific place, user's own voice, user's own property, etc. as sound structure features. Since it becomes possible to narrow down more, an unpleasant sound can be specified more reliably.

なお、候補音提示特定部１０６において、提示する内容として、音自身に加え、探索対象音ＤＢに付与した、その音の種類等の属性を一緒に提示しても良い。また、その音の種類等の属性に応じて、提示の順番を変えるとしてもよい。探索対象ＤＢに含まれない分離音の場合には、ＧＭＭなどのモデルを通じて、その音を識別する手段を用いて、音情報を取得する。 In addition, in the candidate sound presentation specifying unit 106, in addition to the sound itself, attributes such as the type of the sound that are given to the search target sound DB may be presented together as the presented content. Further, the order of presentation may be changed according to attributes such as the type of sound. In the case of separated sounds that are not included in the search target DB, sound information is acquired using a means for identifying the sound through a model such as GMM.

かかる構成によれば、一つ一つ提示された分離音を聞くことなく、例えば、人の声だけを全て、一度に選択消去することもできるようになるため、非常に、効率的になるといえる。 According to this configuration, for example, it is possible to select and delete all human voices at once without listening to the separated sounds presented one by one, which can be said to be very efficient. .

さらに、加工音構造ＤＢ１０８を更新する際に聞きたくない音として特定された分離音の音構造特徴が、探索対象音構造特徴ＤＢ１３６に蓄積された音構造と距離とを求めて比較した上で更新するかどうかをユーザに提示して決定しても良い。また、今回特定された分離音と同時に発生した分離音群と探索対象ＤＢの類似性が認められた分離音と分離音群とを比較し、同じ環境か判断しても良い。なお、分離音群の比較や環境の比較には場所や時間や各分離音の音種別判定の結果を用いることができる。かかる構成によれば、聞きたくない音が入れ替わったことを検出できるようになると同時に、入れ替わることを考慮して音加工を行うことができるようになる。 Furthermore, when the processed sound structure DB 108 is updated, the sound structure feature of the separated sound specified as a sound the user does not want to hear may be compared, by computing its distance, with the sound structures accumulated in the search target sound structure feature DB 136, and whether to update may then be decided by presenting the question to the user. In addition, the group of separated sounds that occurred at the same time as the separated sound specified this time may be compared with the separated sound in the search target DB that was found to be similar and its group of separated sounds, to judge whether the environment is the same. The place, the time, and the result of the sound type determination for each separated sound can be used for comparing separated sound groups and environments. According to such a configuration, it becomes possible to detect that the sound the user does not want to hear has been replaced by another, and to perform sound processing that takes the replacement into account.

（実施の形態２） (Embodiment 2)
図１３は、本実施の形態２の音選択加工装置１５００の構成を示すブロック図である。図１３において、図１および図９と同じ構成要素については同じ符号を用い、説明を省略する。 FIG. 13 is a block diagram illustrating a configuration of the sound selection processing device 1500 according to the second embodiment. In FIG. 13, the same components as those in FIGS. 1 and 9 are denoted by the same reference numerals, and the description thereof is omitted.

図１３において、図９に示した構成に加え、音選択加工装置１５００は、加工音構造受信部１３７と加工音構造送信部１３８とを備えて構成をしている。また、候補音選択決定部１０５には、分離音マスキング概形決定部１３２をさらに有している。 In FIG. 13, in addition to the configuration shown in FIG. 9, the sound selection processing device 1500 includes a processed sound structure receiving unit 137 and a processed sound structure transmitting unit 138. The candidate sound selection determining unit 105 further includes a separated sound masking outline determining unit 132.

加工音構造受信部１３７は、他者が利用している、音選択加工装置において、他者がイライラする原因となる不快音であると特定した加工音を受信する。また、加工音構造送信部１３８は、同様に利用者自身が特定した加工音を相手に送信する。 The processed sound structure receiving unit 137 receives the processed sound identified as an unpleasant sound that causes the other person to be frustrated in the sound selection processing device used by the other person. Similarly, the processed sound structure transmission unit 138 transmits the processed sound specified by the user to the other party.

候補音選択決定部１０５に分離音マスキング概形決定部１３２を加えたことによって、候補音の選択を人の聴感上の音が別の音に与える影響を考慮してマスク音の推定を行なうことができるようになる。 By adding the separated sound masking outline determining unit 132 to the candidate sound selection determining unit 105, the selection of the candidate sound is performed by estimating the mask sound in consideration of the influence of the sound on human perception on another sound. Will be able to.

図１４は、本実施の形態２における候補音選択決定部１５０の処理を示す流れ図である。図７と同じ処理については説明を省略する。 FIG. 14 is a flowchart showing processing of the candidate sound selection determination unit 150 in the second embodiment. Description of the same processing as in FIG. 7 is omitted.

図７の構成との差異は、Ｓ５０２の音構造比較信号波形の形成において、Ｓ５１０で音構造比較のための帯域代表信号を選択した後、Ｓ５１１において、帯域ごとにその代表信号を推定マスキングレベル波形に変換し、Ｓ５１２において、各マスキングレベル概形を合成して、音構造比較信号を取得する点である。 The difference from the configuration of FIG. 7 is that in the formation of the sound structure comparison signal waveform in S502, after selecting a band representative signal for sound structure comparison in S510, the representative signal is estimated masking level waveform for each band in S511. In S512, the outlines of the masking levels are synthesized to obtain a sound structure comparison signal.

図１５は、本実施の形態２における候補音選択決定部１０５の比較の流れ図である。この図は、図６と同じ３つの種類の分離音から混合音が構成されており。図６と同じ処理については説明を省略するものとする。図１５では、ｃ）比較信号波形として、ｂ）帯域代表信号をｆ）マスキングレベル概形に変換し、その包落信号を得る点で異なる。 FIG. 15 is a diagram showing the flow of comparison in the candidate sound selection determination unit 105 according to the second embodiment. In this figure, the mixed sound is composed of the same three types of separated sounds as in FIG. 6. Description of the processing that is the same as in FIG. 6 is omitted. FIG. 15 differs in that, as c) the comparison signal waveform, b) the band representative signal is converted into f) a masking level outline and its envelope signal is obtained.

このように、一旦、マスキングレベルに変換することは、聴覚上の聞こえを考慮したうえで、分離音の比較を行うことに相当する。そのため、図６での信号分析による分離音の比較に比べて、より利用者の聞こえに近い判断ができることが期待できる。 Thus, once converting to the masking level is equivalent to comparing separated sounds in consideration of auditory hearing. Therefore, it can be expected that a judgment closer to the user's hearing can be made compared with the comparison of separated sounds by signal analysis in FIG.

図１６（ａ）は、特許文献（特開２０００―５０６６３１号公報公報マスキングを用いた２つの音の評価方法）に掲載されたマスキング手法の図５を示す図である。図１６（ｂ）は、特許文献（特開２０００―５０６６３１号公報）に掲載されたマスキング手法の図１９を示す図である。なお、マスキングレベルは、図１６（ａ）に示される手法のように、知見で得られた同時マスキングに、さらに図１６（ｂ）に示されるような継時マスキングを考慮して概形を決定するようにしてもよい。 FIG. 16A is a diagram showing FIG. 5 of the masking technique published in a patent document (Japanese Patent Laid-Open No. 2000-506631, a method of evaluating two sounds using masking). FIG. 16B is a diagram showing FIG. 19 of the masking technique published in the same patent document (Japanese Patent Laid-Open No. 2000-506631). The outline of the masking level may be determined by taking into account not only the simultaneous masking known from prior findings, as in the technique shown in FIG. 16A, but also successive (temporal) masking as shown in FIG. 16B.

図１６（ａ）は、１０００Ｈｚの正弦波を０ｄＢから１００ｄＢまで与えた際に、個々の音圧レベルが他の周波数領域に与える影響を調べるため、計測された減衰量のカーブである。この図は、外耳および中耳の伝達関数を考慮されている。図１６（ａ）に示す文献の図５の横軸は、Ｈｚ単位の周波数、縦軸は周波数領域全体にわたる耳の減衰量を表し、ｄＢ単位で示される。 FIG. 16A is a curve of measured attenuation amounts in order to examine the influence of individual sound pressure levels on other frequency regions when a 1000 Hz sine wave is applied from 0 dB to 100 dB. This figure takes into account the transfer functions of the outer ear and the middle ear. The horizontal axis in FIG. 5 of the document shown in FIG. 16A represents the frequency in Hz, and the vertical axis represents the ear attenuation over the entire frequency domain, and is expressed in dB.

この図より、大きい音０ｄＢで与えた１０００Ｈｚの正弦波の影響が、１０００Ｈｚだけでなく、かなり広帯域に渡って影響が出ていることが分かる。また小さい音１００ｄＢで与えたときにも、低周波方向では、影響が出ることが分かる。このような影響がでるのは、蝸牛が音の伝播とともに進行方向に振動するという物理的な構造上の特性が現れているものと解釈されている。 From this figure, it can be seen that the influence of a 1000 Hz sine wave given with a loud sound of 0 dB affects not only 1000 Hz but also a fairly wide band. It can also be seen that even when given a small sound of 100 dB, there is an effect in the low frequency direction. This effect is interpreted as a physical structural characteristic that the cochlea vibrates in the direction of travel as sound propagates.

つまり、信号分析上の単一の周波数の音に対して、聴感上は、他の周波数帯域への影響（マスキング）を考慮するようにしたほうが、より聞こえに近い判定ができるようになるものと考えられる。 In other words, for sounds with a single frequency in signal analysis, it is possible to make judgments closer to hearing by considering the influence (masking) on other frequency bands in terms of hearing. Conceivable.

なお、図１６（ａ）で示される周波数方向への音の影響は同時マスキングと呼ばれ、高い音は低い音で消されやすいという傾向をもち、さらに、図１６（ｂ）で示される継時マスキングは、時間的に接近する音は互いに影響しあう。という特性をもっている。このような人の聞こえを反映させることによって、より聞きたくない音の影響を考慮して音を選択できるようになる。つまり継時マスキングを考慮するためには、時間方向の成分も見る必要があるということになる。 The influence of sound in the frequency direction shown in FIG. 16A is called simultaneous masking, in which high-pitched sounds tend to be masked easily by low-pitched sounds. The successive (temporal) masking shown in FIG. 16B has the property that sounds close to each other in time affect one another. Reflecting these properties of human hearing makes it possible to select sounds with closer attention to the influence of the sound the user does not want to hear. In other words, taking successive masking into account requires looking at components in the time direction as well.

そこで、図１５のように、人の聞こえ特性を考慮して、支配的な音の推定処理ではマスキング変換した概形を用いて、支配的な帯域および時間遷移を考慮して、最終的に支配的な帯域あるいは、マスカーとなる分離音を決定する。 Therefore, as shown in FIG. 15, human hearing characteristics are taken into account: the dominant sound estimation process uses the outlines obtained by masking conversion, considers the dominant bands and their time transitions, and finally determines the dominant band or the separated sound that acts as the masker.

また、分離音同士の関係を評価した結果、例えば３つの分離音がそれぞれに、マスカーと判定されるような支配的な音が存在しない場合には、いずれも同じ程度の大きさで、入れ子の状態になっている場合には、図２０のイライラ音の発生要因３の複数の音が同時になっている場合ということが想定される。そのため、そのことを考慮して、提示を行なうようにしても良い。 Also, when evaluating the relationship between the separated sounds shows that, for example, none of three separated sounds is dominant enough to be judged a masker, and all are of about the same loudness and interleaved with one another, the situation can be assumed to be the case of irritating sound generation factor 3 in FIG. 20, in which a plurality of sounds occur at the same time. The presentation may therefore be made with this taken into account.

なお、本実施の形態において、音取得部１０１を同時に連携させることによって、アレイマイクのように方向に対する抑圧効果を持たせるようにしても良い。 In the present embodiment, the sound acquisition unit 101 may be linked simultaneously to provide a direction suppression effect like an array microphone.

かかる構成によって、相手の不快音が自分と同じであるかどうか判断し、同じであれば、アレイマイクとして動作させることによって、より指向性を用いて、音選択を行なうことができるため、別の切り口でも選択できるようになる。また、その要因となる音の、より音源に近い場所で収録された、音構造特徴を相手に送信することで、相手側でもより積極的に聞きたくない音として加工処理を行なうことが可能となる。 With such a configuration, it is determined whether the other party's unpleasant sound is the same as one's own; if it is, operating the devices as an array microphone allows sound selection that makes greater use of directivity, so that sounds can be selected from another angle as well. In addition, transmitting to the other party the sound structure feature of the causal sound, recorded at a place closer to the sound source, allows the other party's device to process it more actively as a sound that the other party does not want to hear.

また、加工音構造受信部１３７および加工音構造送信部１３８、送信しあうデータは、イライラする原因となる不快音として特定した音を送信することによって、相手が不快音と特定した音を、自ら聞いて、あるいは、相手から受信した音を、自分の探索対象音構造特徴ＤＢで該当する対象音があるかどうかを調べる構成をもたせてもよい。 The processed sound structure receiving unit 137 and the processed sound structure transmitting unit 138 exchange the sounds specified as unpleasant sounds that cause irritation. A configuration may therefore be provided in which the user listens to the sound that the other party has specified as unpleasant, or in which the sound received from the other party is checked against the user's own search target sound structure feature DB to see whether a corresponding target sound exists.

かかる構成によって、例えば、その相手からの不快音が、自分もしくは自分の所有物などが発生源である音であるとわかった場合、自ら、その音が出ないように努力するきっかけを得ることができる。 With such a configuration, for example, if it is found that the unpleasant sound from the other party is the sound that originates from you or your property, you can get an opportunity to make efforts to prevent that sound from coming out. it can.

図１７は、二人の人Ａと人Ｂが、不快音情報を通信しあう例を模擬的に示した図である。この例では、人Ａは、電車またぎ音と周囲の人Ｘの声を、人Ｂは、電車モータ音と同じく人Ｘの声を不快音と特定しており、その不快音の情報を互いに送っている。このとき、不快音同士を比較することによって共通な不快音として人Ｘの声があることが、お互いに分かる。そのため、人Ｘの声を消去するために、アレイマイクのように仮想マイク入力が増えたとみなして音分離抽出処理を行うことによって、より精度よく分離音を抽出することができるようになる。 FIG. 17 schematically shows an example in which two people, person A and person B, exchange unpleasant sound information. In this example, person A has specified the train straddling sound and the voice of a nearby person X as unpleasant sounds, and person B has specified the train motor sound and, likewise, the voice of person X; each sends the information on these unpleasant sounds to the other. Comparing the unpleasant sounds then shows both parties that the voice of person X is a common unpleasant sound. To erase the voice of person X, the sound separation and extraction process can therefore be performed as if the number of virtual microphone inputs had increased, as with an array microphone, so that the separated sound is extracted more accurately.
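Finding the unpleasant sound shared by person A and person B amounts to pairing up feature vectors from the two devices that are close enough to each other. The sketch below shows one way this could look; the function name, the labels, the Euclidean metric, and the feature values are all hypothetical and serve only to illustrate the comparison step.

```python
import math


def common_unpleasant(mine, theirs, threshold):
    """Return (my_label, their_label) pairs whose features lie within threshold.

    mine, theirs: dicts mapping a label to a feature vector for each sound
    that the respective user has specified as unpleasant.
    """
    pairs = []
    for my_label, my_feat in mine.items():
        for their_label, their_feat in theirs.items():
            if math.dist(my_feat, their_feat) <= threshold:
                pairs.append((my_label, their_label))
    return pairs
```

With person A holding the straddling sound and the voice of person X, and person B holding the motor sound and the voice of person X, only the two voice entries would be paired, identifying the sound for which the two devices could cooperate.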

In addition, the unpleasant separated sound sent from person B may be compared with person A's separated sounds other than A's own unpleasant sounds, and if it is judged to match a separated sound occurring around person A, that sound may be presented to person A. In this way, person A can learn, for example, that the sound is produced by an ornament attached to person A's bag rattling as the train sways. This gives person A an opportunity to take action to stop the sound.

With such a configuration, a shared sound environment resembling an array microphone can be formed, via a communication network or the like, with other people using the same sound selection processing device. Because a mixed sound free of unpleasant sounds can then be reconstructed taking into account both the other party's and one's own unpleasant sounds, a sound environment that supports better relationships can be built.

The data exchanged by the processed sound structure receiving unit 137 and the processed sound structure transmitting unit 138 need not be the sound itself that was identified as an unpleasant sound causing irritation; it may instead be a pulse sound synchronized with the identified sound. Furthermore, when a discriminator has determined the type of the sound, or when information such as its direction is known, attribute information such as the type and direction of the sound may be exchanged.

With such a configuration, providing attribute information such as the type and direction of a sound as higher-level information, rather than the acoustic signal itself, reduces the required amount of computation and data. For example, by directly telling the other user something like "sound XXX from direction YY", that user can start from their own decision, such as newly pressing the discomfort button.
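
To make the data-volume argument concrete, the following sketch compares the size of a small attribute message with one second of raw audio. The message layout (`sound_type`, `direction_deg`), the JSON encoding, and the 16 kHz, 16-bit mono audio format are assumptions chosen for illustration; the specification does not define a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SoundAttributeMessage:
    """Hypothetical attribute message: sound type and arrival direction."""
    sound_type: str
    direction_deg: float


def encode_message(msg):
    """Serialize the attribute message to UTF-8 JSON bytes."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode_message(payload):
    """Rebuild the attribute message from UTF-8 JSON bytes."""
    return SoundAttributeMessage(**json.loads(payload.decode("utf-8")))


def raw_pcm_size_bytes(duration_s, sample_rate_hz=16000, bytes_per_sample=2):
    """Size of uncompressed mono PCM audio for the assumed format."""
    return int(duration_s * sample_rate_hz * bytes_per_sample)


msg = SoundAttributeMessage(sound_type="voice", direction_deg=45.0)
payload = encode_message(msg)

print(len(payload), "bytes of attributes versus",
      raw_pcm_size_bytes(1.0), "bytes for one second of raw audio")
```

Under these assumptions the attribute message is a few dozen bytes, while one second of raw audio is 32,000 bytes, which is the kind of reduction the paragraph above describes.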

Each functional block in the block diagrams (FIG. 1, FIG. 9, FIG. 13, and so on) is typically realized as an LSI, which is an integrated circuit. The blocks may each be made into a separate chip, or a single chip may include some or all of them.

(For example, the functional blocks other than the memory may be integrated into one chip.)

Although the term LSI is used here, the circuit may also be called an IC, a system LSI, a super LSI, or an ultra LSI depending on the degree of integration.

The method of circuit integration is not limited to LSI; a dedicated circuit or a general-purpose processor may be used instead. An FPGA (Field Programmable Gate Array), which can be programmed after the LSI is manufactured, or a reconfigurable processor, in which the connections and settings of the circuit cells inside the LSI can be reconfigured, may also be used.

Furthermore, if an integrated circuit technology that replaces LSI emerges through advances in semiconductor technology or through another derived technology, the functional blocks may naturally be integrated using that technology. The application of biotechnology is one possibility.

Among the functional blocks, only the means for storing the data to be encoded or decoded may be configured separately rather than being integrated into the single chip.

Because the sound selection processing device according to the present invention has a mechanism for quickly selecting unpleasant sounds that cause irritation, it is useful in equipment that records video and audio and requires sound selection in order to edit out unwanted sounds. By selecting sounds while recording with an IC recorder or the like, it can also be applied to uses such as real-time sound filtering. For example, when recording a lecture, if expected sounds such as the lecturer or students moving chairs and desks, coughs, and sneezes are registered in advance as target sound structure features, only the lecture content can be selected and recorded.
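
The lecture-recording use can be sketched as follows, assuming that sound separation has already produced one labeled sample sequence per source and that the unwanted sound types were registered beforehand. The labels, the sample values, and the function name `reconstruct_mix` are hypothetical; the sketch only shows the final step of summing the sources that are kept.

```python
def reconstruct_mix(separated, registered_unwanted):
    """Sum all separated sounds whose label is not registered as unwanted.

    `separated` maps a label to a list of samples; sequences may differ in
    length, and shorter ones are treated as silent after they end.
    """
    kept = [samples for label, samples in separated.items()
            if label not in registered_unwanted]
    if not kept:
        return []
    mix = [0.0] * max(len(samples) for samples in kept)
    for samples in kept:
        for i, value in enumerate(samples):
            mix[i] += value
    return mix


# Illustrative separated sounds for a short stretch of a lecture recording.
separated = {
    "lecturer": [0.5, 0.25, 0.0],
    "cough": [0.25, 0.25, 0.25],
    "chair": [0.125, 0.0, 0.0],
}
registered_unwanted = {"cough", "chair"}

print(reconstruct_mix(separated, registered_unwanted))
# [0.5, 0.25, 0.0]
```

In a real device the registered entries would be sound structure features matched against each separated sound rather than text labels.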

Furthermore, by sharing unpleasant-sound information with others, the invention is also useful for hearing aids and similar devices that reduce the sounds neither party wants to hear, not only in remote calls such as those over mobile phones but also between people in physically close, noisy locations.

FIG. 1: Configuration diagram of the sound selection processing device according to Embodiment 1
FIG. 2: Flowchart of processing during separated sound processing in Embodiment 1
FIG. 3: Flowchart of processing during separated sound selection in Embodiment 1
FIG. 4: Flowchart of the discomfort detection unit in Embodiment 1
FIG. 5: Flowchart of the candidate sound selection determination unit in Embodiment 1
FIG. 6: Flowchart of candidate sound selection processing in Embodiment 1
FIG. 7: Flowchart of the candidate sound selection determination unit in Embodiment 1
FIG. 8: Diagram explaining the effect of the present invention by comparison of stress models
FIG. 9: Configuration diagram of the second configuration of Embodiment 1
FIG. 10: Flowchart of the comparison of separated sound extraction target sounds in the second configuration of Embodiment 1
FIG. 11: Flowchart of processing during separated sound selection in the second configuration of Embodiment 1
FIG. 12: Flowchart of discomfort detection in the second configuration of Embodiment 1
FIG. 13: Configuration diagram of the sound selection processing device according to Embodiment 2
FIG. 14: Flowchart of the candidate sound selection determination unit in Embodiment 2
FIG. 15: Flowchart of the comparison in the candidate sound selection determination unit in Embodiment 2
FIG. 16: Outline diagram of masking
FIG. 17: Diagram of an application example of the present invention
FIG. 18: Example of a sound scene inside a train
FIG. 19: Diagram explaining an example of subjectively desired sounds
FIG. 20: Example of factors that cause irritation through sound

Explanation of symbols

100 Sound selection processing device
101 Sound acquisition unit
102 Sound structure feature extraction unit
103 Sound separation unit
104 Sound processing unit
105, 150 Candidate sound selection determination unit
106 Candidate sound presentation specifying unit
107 Processed sound structure feature updating unit
108 Processed sound structure feature DB
131 Discomfort detection unit
132 Separated sound masking outline determination unit
133 Sound structure feature comparison unit
134 Masker separated sound estimation unit
135 Processing reference input unit
136 Search target sound structure feature DB
137 Processed sound structure receiving unit
138 Processed sound structure transmitting unit
1100 Sound selection processing device

Claims

A sound selection processing device that selectively removes unpleasant sounds from a mixed sound generated around a user, the device comprising:
sound separation means for separating the mixed sound into sounds for each sound source;
discomfort detection means for detecting that the user is in an uncomfortable state;
candidate sound selection determining means for, when the discomfort detection means detects that the user is in the uncomfortable state, evaluating the relationship among the separated sounds and estimating, based on the evaluation result, the separated sounds that are candidates for processing;
candidate sound presentation specifying means for presenting the estimated candidate separated sounds to the user, accepting a selection, and specifying the selected separated sound; and
sound processing means for processing the specified separated sound and reconstructing the mixed sound.

The sound selection processing device according to claim 1, wherein
the sound separation means separates the mixed sound by extracting a sound structure feature for each sound source included in the mixed sound, and
the candidate sound selection determining means includes:
a sound structure feature comparison unit that evaluates the relationship among the separated sounds by comparing the signals formed from the separated sounds; and
a masker separated sound estimation unit that estimates the candidate separated sounds for processing by estimating, based on the evaluation result, a dominant separated sound or a dominant frequency band among the separated sounds.

The sound selection processing device, wherein the discomfort detection means detects that the user is in an uncomfortable state by using any one of a change in input sound loudness, a fluctuation in skin electrical resistance, and detection of a button press.

The sound selection processing device according to claim 1, wherein
the candidate sound presentation specifying means presents the separated sounds separated from the acquired mixed sound to the user and accepts selection of a sound the user wants to hear from among the presented separated sounds, and
the sound processing means processes the selected sound so as to amplify it and reconstructs the mixed sound.

The sound selection processing device according to claim 2, wherein
the sound structure feature comparison unit compares the magnitudes of the signals formed from the separated sounds included in the mixed sound acquired within a predetermined time, and ranks the separated sounds as dominant separated sounds in descending order of signal magnitude, and
the candidate sound presentation specifying unit presents, in the order of the dominant separated sounds, either the candidate separated sound for processing or a sound obtained by subtracting the candidate separated sound from the mixed sound.

The sound selection processing device according to claim 1, further comprising:
search target sound structure feature storage means for accumulating separated sounds other than the separated sounds specified by the candidate sound presentation specifying means in the past; and
sound structure feature storage means for accumulating the separated sound newly specified by the candidate sound presentation specifying means,
wherein, when the acquired mixed sound includes a separated sound corresponding to one accumulated in the search target sound structure feature storage means, the sound structure feature storage means does not store the newly specified separated sound.

The sound selection processing device according to claim 1, further comprising
a discriminator that identifies the type of the separated sound specified by the candidate sound presentation specifying means,
wherein the candidate sound presentation specifying unit presents the type of the sound together with the estimated candidate separated sound for processing.

The sound selection processing device according to claim 1, further comprising:
sound acquisition means for acquiring the mixed sound generated in the surroundings;
a processed sound structure transmitting unit that transmits the sound structure of the separated sound specified by the candidate sound presentation specifying means to another sound selection processing device; and
a processed sound structure receiving unit that receives, from the other sound selection processing device, the sound structure of the separated sound specified in the other sound selection processing device,
wherein, when the separated sound specified by the candidate sound presentation specifying means matches the separated sound received from the other sound selection processing device, the sound acquisition means operates as an array microphone together with the sound acquisition means of the other sound selection processing device.

A sound selection processing method for selectively removing unpleasant sounds from a mixed sound generated around a user, the method comprising:
a sound separation step of separating the mixed sound into sounds for each sound source;
a discomfort detection step of detecting that the user is in an uncomfortable state;
a candidate sound selection determination step of, when the discomfort detection step detects that the user is in the uncomfortable state, evaluating the relationship among the separated sounds and estimating, based on the evaluation result, the separated sounds that are candidates for processing;
a candidate sound presentation specifying step of presenting the estimated candidate separated sounds to the user, accepting a selection, and specifying the selected separated sound; and
a sound processing step of processing the specified separated sound and reconstructing the mixed sound.

A program for a sound selection processing device that selectively removes unpleasant sounds from a mixed sound generated around a user, the program causing a computer to execute: a sound separation step of separating the mixed sound into sounds for each sound source; a discomfort detection step of detecting that the user is in an uncomfortable state; a candidate sound selection determination step of, when the discomfort detection step detects that the user is in the uncomfortable state, evaluating the relationship among the separated sounds and estimating, based on the evaluation result, the separated sounds that are candidates for processing; a candidate sound presentation specifying step of presenting the estimated candidate separated sounds to the user, accepting a selection, and specifying the selected separated sound; and a sound processing step of processing the specified separated sound and reconstructing the mixed sound.